Foods, Vol. 15, Pages 2069: Predicting and Co-Optimizing the Taste and Aroma of Green Tea During Spreading Using the TabPFN Model

To investigate how spreading conditions affect green tea taste and aroma and to develop a generalizable prediction model from small data for process optimization, this study integrated SEM, non-targeted dual-omics, and TabPFN to systematically analyze Echa No. 10 spreading. A central composite design was used. Dehydration-induced mechanical stress altered cell membrane permeability, driving non-volatile taste compound transformation and volatile aroma release. Two chemical-sensory proxies, relative polyphenol-to-amino acid ratio (R-PAR) and floral intensity index (FII), were established using ultra-high performance liquid chromatography–high-resolution mass spectrometry (UHPLC-HRMS) and headspace solid-phase microextraction–gas chromatography–mass spectrometry (HS-SPME-GC-MS). A prediction model was built with these indicators and TabPFN. Multi-objective optimization yielded optimum conditions: initial moisture 76.8%, temperature 26.2 °C, relative humidity 61.5%, air speed 0.85 m/s, achieving R-PAR 0.465 and FII 125.70. In a comparison with response surface methodology (RSM), partial least squares regression (PLSR), and support vector regression (SVR), TabPFN achieved prediction R² values of 0.81 (R-PAR) and 0.77 (FII), indicating favorable applicability and predictive capability on small-sample data. This study validates TabPFN’s suitability for small-sample tea processing modeling, quantifies the mapping between spreading and key taste/aroma metabolism, and provides a methodological foundation for digital precision and intelligent optimization in green tea production. 1. Introduction Tea is the most widely consumed beverage raw material worldwide, characterized by rich product diversity and pronounced flavor differentiation. Depending on the processing method and degree of fermentation, tea can be classified into six major categories: green tea, black tea, oolong tea, white tea, yellow tea, and dark tea [ 1]. 
Among these, green tea stands out for its abundance of natural bioactive compounds such as tea polyphenols and amino acids, endowing it with significant health benefits and nutritional value [ 2, 3]. According to the 2025 China Tea Production and Marketing Situation Analysis Report released by the China Tea Marketing Association [ 4], China’s total dry tea output reached 3,640,300 tons in 2025, of which green tea production amounted to 2,148,700 tons, accounting for 59.03% of the national total. Green tea thus represents the largest and most industrially established core tea category in China. Among the numerous evaluation criteria that determine the grade and market price of green tea, taste and aroma serve as the central quality factors, together contributing up to 55% of the total weighting in comprehensive sensory evaluation [ 5]. Machine learning techniques have been widely applied in the tea processing domain with considerable success. However, the accuracy of these models typically depends on large sample sizes, and their modeling capability becomes severely constrained in small-sample scenarios. Given that tea processing is restricted by seasonal harvesting windows and the high cost of wet chemistry experiments, only small-scale trials are usually feasible. The resulting small feature matrices can easily drive complex models into overfitting. The Tabular Prior-Data Fitted Network (TabPFN) is a machine learning model specifically designed for small-sample experimentation. Built upon the Transformer architecture, TabPFN undergoes offline meta-learning on massive synthetic tabular datasets, encoding complex statistical prior knowledge into its network parameters [ 15]. This architecture enables TabPFN to directly approximate Bayesian posterior inference through in-context learning, without the need for gradient updates or hyperparameter tuning for a given target task. 
It thereby circumvents the overfitting risk inherent in small-sample settings while ensuring that predictions at the extreme boundaries of process parameters remain consistent with biochemical reaction principles, offering a robust modeling solution for tea processing studies constrained by limited sample sizes. Against the above background, this study takes the fresh leaves of Echa No. 10 as the research material. Scanning electron microscopy (SEM) was employed to observe microstructural changes under dehydration stress. Because floral notes constitute the core aroma profile of this specific cultivar, dual-omics techniques were applied to screen and define two core chemical-sensory proxies of its taste and aroma quality: the relative polyphenol-to-amino acid ratio (R-PAR) and the Floral Intensity Index (FII). On this basis, the TabPFN model, designed for small-sample scenarios, was introduced to construct a spreading quality prediction model with R-PAR and FII as proxy indicators and to perform multi-objective collaborative global process optimization. The performance of TabPFN was then benchmarked against mainstream modeling approaches to verify its advantages and predictive reliability under this specific process scenario, aiming to provide a novel modeling framework to support the digital precision processing of green tea. The novelty of this study lies in: (1) introducing the TabPFN model into the field of tea processing quality prediction for the first time, and (2) achieving multi-objective collaborative optimization of spreading process for both taste and aroma using dual-quality indicators jointly constructed by non-targeted metabolomics and volatilomics. 2.1.1. Tea Samples and Reagents The fresh tea leaves used in this study were harvested from Xiaocun Township, Xianfeng County, Enshi Prefecture, Hubei Province, between 25 March and 15 April 2025. Picking was conducted daily at 8:00 a.m. and suspended on rainy days. The cultivar was Echa No. 
10, and the picking standard was strictly controlled to one bud and two leaves. Chromatographic-grade methanol, acetonitrile, formic acid, and isopropanol, along with deuterated styrene (25 μg/mL), were purchased from Sigma-Aldrich (St Louis, MO, USA). The 2.5% glutaraldehyde fixative and graded dehydration reagents were obtained from Beijing Solarbio Science & Technology Co., Ltd. (Beijing, China). Liquid nitrogen (purity ≥ 99%) was supplied by a local gas supplier. Ultrapure water produced by a Milli-Q ultrapure water system (Millipore, Burlington, MA, USA) was used throughout all experiments. 2.1.2. Experimental Equipment The JDKY-I multi-parameter controllable thin-layer spreading test bench, designed and manufactured by Changchun Jida Scientific Instrument Equipment Co., Ltd. (Changchun, China; manufactured in 2023), served as the simulation equipment for green tea spreading (see ). The equipment measures 550 mm × 550 mm × 1800 mm (length × width × height). Its technical parameters are: air velocity from 0 to 1 m/s with an accuracy of ±0.05 m/s; temperature from room temperature to 100 °C with an accuracy of ±1 °C; and relative humidity from 20% to 80% with an accuracy of ±2%. 
Other equipment: An HC103 halogen moisture analyzer (Mettler Toledo, Greifensee, Switzerland), a Shimadzu ATX124R analytical balance with 0.1 mg readability (Shimadzu Corporation, Tokyo, Japan), a MAS-II Plus microwave synthesis/extraction workstation (Shanghai Xinyi Microwave Chemical Technology Co., Ltd., Shanghai, China), a Scilogex D3024R low-temperature high-speed centrifuge (Scilogex Scientific Instruments Co., Ltd., Rocky Hill, CT, USA), an Agilent 1100 high-performance liquid chromatograph equipped with a UV detector, an Agilent 7890 gas chromatograph coupled with an Agilent 5975 mass spectrometer together with the corresponding solid-phase microextraction fiber assembly (Agilent Technologies, Inc., Santa Clara, CA, USA), a Vanquish ultra-high performance liquid chromatograph and a Q Exactive™ HF high-resolution mass spectrometer (Thermo Fisher Scientific, Waltham, MA, USA), a JSM-6700F cold field emission scanning electron microscope (JEOL Ltd., Tokyo, Japan), and a Milli-Q ultrapure water system (Millipore). 2.2. Spreading Experiment Utilizing Design-Expert software (version 8.0.6.1) in conjunction with the Central Composite Design (CCD) method, a four-factor, five-level spreading experiment comprising 21 runs was designed for green tea. The experimental factors included initial moisture content (matched to the nearest level, as the actual moisture of fresh leaves cannot be perfectly aligned with predefined values), spreading temperature, relative spreading humidity, and spreading airflow velocity. Each spreading treatment was repeated three times, yielding a total of 63 samples. The independent variables and their designated levels are detailed in . The specific spreading procedures were conducted as follows: (1) A 5 g (±0.0001 g) sample of fresh tea leaves was extracted to determine the initial moisture content using a halogen moisture analyzer; this measurement was performed in triplicate. 
(2) A 0.6 kg batch of fresh tea leaves was placed into the multi-parameter controllable spreading apparatus. The experiment was executed according to the specific parameter levels outlined in . Once spreading commenced, samples were taken every hour to estimate the moisture content by the gravimetric method. (3) When the estimated moisture content decreased to 71%, the sampling and weighing frequency was increased to every 30 min. The spreading process was deemed complete when the estimated moisture content reached 70%. Subsequently, the actual moisture content was determined in accordance with the national standard GB/T 8304-2002 (Tea – Determination of moisture content) [ 16]. (4) Following the methodology described in a previous study [ 17], the spread samples were subjected to microwave fixation at a power level of 6 kW. (5) The fixed samples were placed in a forced-air drying oven at 80 °C for 1 h, and then transferred to a desiccator to cool to room temperature for subsequent use. (6) The prepared samples were individually subjected to qualitative and quantitative analyses to determine the core taste and aroma compounds of the green tea. (7) Across all aforementioned steps, sampling was conducted using the quartering method, in reference to GB/T 8302-2002 (Tea – Sampling) [ 18]. Furthermore, the grinding of samples destined for quality component analysis was performed according to GB/T 8303-2002 (Tea – Preparation of ground sample and determination of dry matter content) [ 19]. For all subsequent physicochemical analyses, both UHPLC-HRMS and HS-SPME-GC-MS measurements for each spreading sample were performed in triplicate, and the arithmetic mean was used for subsequent data analysis and modeling. 2.3.1. Microstructural Observation via Scanning Electron Microscopy To investigate the effects of spreading-induced stress on the histological structure of fresh tea leaves at the microphysical level, the microstructures of the treated samples were observed using SEM. 
The observations encompassed the overall structure of the epidermal cells (100×), the evolution of mechanical epidermal shrinkage (500×), and the distribution and initial state of the stomata (500×), as well as the high-magnification microscopic features of stress-induced stomatal closure (2000×). Furthermore, the analysis included the contractive structure of the vascular bundles in the tea stalks (50×), the curling and structural damage characteristics of the trichomes (2500×), and the textural alterations of the cuticular wax alongside the evolution of drought-induced microcracks (2500×). By comparatively analyzing the differences in these microstructural features under various spreading conditions, the extent to which the intensity of environmental stress degraded the physical structure of the tea cells was qualitatively demonstrated. The sample preparation and observation procedures were conducted according to standard scanning electron microscopy (SEM) preparation methods for biological samples [ 20]. 2.3.2. UHPLC-HRMS Methodology The detection method was based on reference [ 21], with the following modifications tailored to the characteristics of fresh green tea leaves: Briefly, 100 mg of freeze-dried tea powder, previously ground in liquid nitrogen, was accurately weighed and mixed with 500 μL of an 80% aqueous methanol solution. The mixture was vortexed thoroughly and incubated in an ice bath for 5 min, followed by centrifugation at 15,000× g for 20 min at 4 °C. The resulting supernatant was diluted with ultrapure water to a final methanol concentration of 53% and centrifuged again. This final supernatant was collected for subsequent analysis. Simultaneously, quality control (QC) samples, prepared by mixing equal volumes of the experimental samples, and blank samples (53% methanol) were established. 
Chromatographic and Mass Spectrometric Parameters: Chromatographic separation was performed using a Vanquish UHPLC system coupled with a Hypersil Gold C18 column (100 × 2.1 mm, 1.9 μm). The column temperature was maintained at 40 °C, the injection volume was set to 2 μL, and the flow rate was held constant at 0.2 mL/min. The mobile phase consisted of a 0.1% aqueous formic acid solution (A) and methanol (B). The elution gradient was programmed as follows: 0–1.5 min, maintained at 2% B; 1.5–3.0 min, linearly increased to 85% B; 3.0–10.0 min, further increased to 100% B. Mass spectrometry was conducted using a Q Exactive HF high-resolution mass spectrometer. Fragmentation data within an m/ z range of 100–1500 were acquired in both positive and negative ion modes utilizing data-dependent acquisition (DDA). Data Processing and Metabolite Identification: The raw instrumental data were extracted and aligned using XCMS software (Version 3.16.0), followed by qualitative identification against the NovoMetDB database. After filtering out unstable characteristic peaks with a coefficient of variation (CV) > 30% in the QC samples, precise qualitative identification and pathway annotation of the metabolites were achieved by integrating a high-quality local MS2 spectral library with public databases, including the Kyoto Encyclopedia of Genes and Genomes (KEGG), Human Metabolome Database (HMDB), and Lipid Metabolites and Pathways Strategy (LIPID MAPS). 2.3.3. Calculation of R-PAR The concentration ratio of polyphenols to free amino acids serves as a core theoretical index for evaluating the balance between astringency and umami, as well as the overall quality of tea infusions. Given the significant disparities in the ionization efficiencies of various compounds in UHPLC-HRMS data, directly comparing raw peak areas lacks statistical validity. Consequently, this study introduced the min-max normalization pretreatment strategy, a classic approach in metabolomic data analysis [ 22]. 
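As a minimal sketch of this normalization and of the ratio index it feeds into (Equations (1) to (4) of this section), the following Python snippet scales each compound's peak areas across samples, averages the scaled values per compound class, and divides the two averages. The peak areas are hypothetical, the function names `min_max` and `r_par` are ours, and returning infinity when the amino acid index is zero is an assumption, since the text does not say how that case is handled.

```python
from statistics import mean


def min_max(values):
    """Scale one compound's peak areas across samples to [0, 1] (Equation 1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant peak area: min-max normalization is undefined")
    return [(v - lo) / (hi - lo) for v in values]


def r_par(polyphenol_areas, amino_acid_areas):
    """Return R_j = P_j / A_j for every sample j (Equations 2 to 4).

    Each argument is a list of rows: one row per compound, one column per sample.
    """
    norm_p = [min_max(row) for row in polyphenol_areas]
    norm_a = [min_max(row) for row in amino_acid_areas]
    ratios = []
    for j in range(len(norm_p[0])):
        p_j = mean(row[j] for row in norm_p)  # polyphenol retention index
        a_j = mean(row[j] for row in norm_a)  # amino acid retention index
        # Assumption: a zero amino acid index is reported as infinity.
        ratios.append(p_j / a_j if a_j > 0 else float("inf"))
    return ratios


# Hypothetical peak areas: 2 polyphenols and 2 amino acids in 3 samples.
polyphenols = [[100.0, 150.0, 200.0], [10.0, 30.0, 20.0]]
amino_acids = [[50.0, 100.0, 75.0], [8.0, 4.0, 6.0]]
ratios = r_par(polyphenols, amino_acids)
print(ratios)
```

Because every compound is scaled to its own range, the sample holding a compound's minimum contributes zero for that compound, so the index is only meaningful relative to the sample set it was computed on.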
Based on this dimensionality reduction strategy, the objective function for R-PAR was constructed. Referring to previous studies on the taste profiles of green tea [ 23, 24], the raw peak areas $x_{i,j}$ of the $m$ core polyphenols and $n$ core free amino acids, which decisively influence the astringent and umami qualities, were explicitly identified from the untargeted mass spectrometry data of the experimental groups. Subsequently, the corresponding peak areas of the $m + n$ compounds in the experimental samples were individually subjected to min-max normalization:

$$N_{i,j} = \frac{x_{i,j} - x_{i,\min}}{x_{i,\max} - x_{i,\min}} \tag{1}$$

The comprehensive retention index of polyphenols, $P_j$, is

$$P_j = \frac{1}{m} \sum_{i=1}^{m} N_{i,j} \tag{2}$$

the comprehensive retention index of amino acids, $A_j$, is

$$A_j = \frac{1}{n} \sum_{i=1}^{n} N_{i,j} \tag{3}$$

and the relative polyphenol-to-amino acid ratio (R-PAR, $R_j$) is

$$R_j = \frac{P_j}{A_j} \tag{4}$$

It should be noted that R-PAR is not a direct quantification of sensory taste, but rather a chemical-sensory proxy constructed on the basis of compounds with known taste contributions. It was determined that a lower R-PAR value indicates superior spreading quality, which aligns with the metabolic transformation objective of “suppressing bitterness and enhancing umami” during green tea processing [ 25]. Therefore, R-PAR can serve as a reliable chemical indicator for evaluating the evolution of taste quality under the present experimental conditions. 2.3.4. HS-SPME-GC-MS Determination Method The detection method was performed with slight modifications based on the previously reported literature to suit the characteristics of green tea [ 26]. Extraction and SPME Conditions: The tea samples were shredded, and 1.00 g of the solid sample was accurately weighed into a headspace vial. Subsequently, 20 μL of internal standard solution (deuterated styrene, CAS: 19361-62-7, 25 μg/mL in methanol) was added. The vial was equilibrated at 60 °C and 250 r/min for 15 min. 
The SPME fiber was then inserted into the headspace of the vial for extraction at 60 °C for 30 min. Finally, the fiber was inserted into the GC inlet and thermally desorbed at 260 °C for 5 min. GC-MS Parameters: Analysis was conducted using an Agilent 7890 GC coupled with an Agilent 5975 MS, equipped with an HP-5MS Ultra Inert capillary column (30 m × 0.25 mm × 0.25 μm, Agilent Technologies). The injection was performed in split mode (split ratio 25:1) with high-purity helium (99.999%) as the carrier gas at a constant flow rate of 1.0 mL/min. The oven temperature program was set as follows: initial temperature at 50 °C (held for 5 min), increased to 230 °C at a rate of 7 °C/min (held for 3 min), and finally increased to 320 °C at a rate of 40 °C/min (held for 5 min). The MS parameters were: ion source temperature 240 °C, quadrupole temperature 160 °C, ionization energy 70 eV, and mass scan range 50–500 m/ z. Data Processing and Identification: Raw data were converted to .abf format via the Analysis Base File Converter. Peak picking, alignment, and integration were performed using MSDIAL software (version 4.60). Accurate identification of metabolites was achieved by matching mass spectra and Retention Index (RI) with the NIST 2020 database. Finally, the relative content ( C i ) of each monomer was output based on the internal standard method to construct the volatile relative abundance matrix. This matrix served as the fundamental data input for the subsequent calculation of the Floral Intensity Index (FII), without further secondary conversion to absolute peak areas [ 27]. 2.3.5. Calculation of FII Water-deficit stress during the green tea spreading process drastically induces carotenoid degradation and terpenoid metabolic pathways, prompting the fresh leaves to emit pronounced floral and fruity aromas. 
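The index in this section is built on relative odor activity values (ROAV, Equation (5)): each compound's relative content is divided by its odor threshold, and the result is scaled so that the largest ratio in the whole sample set equals 100. A minimal Python sketch under stated assumptions follows. The contents and thresholds are illustrative placeholders rather than literature values, `roav_table` is a hypothetical helper name, and the aggregation of ROAV values into the FII is not reproduced here.

```python
def roav_table(samples, thresholds):
    """Compute ROAV for every compound in every sample.

    samples: list of dicts mapping compound name -> relative content (ug/kg).
    thresholds: dict mapping compound name -> odor threshold in water (ug/kg).
    The reference ratio is the largest content/threshold ratio found anywhere
    in the sample set, so values are comparable across samples.
    """
    top = max(s[c] / thresholds[c] for s in samples for c in s)
    return [
        {c: 100.0 * (s[c] / thresholds[c]) / top for c in s}
        for s in samples
    ]


# Illustrative placeholder values, not measured data or literature thresholds.
thresholds = {"linalool": 6.0, "geraniol": 8.0}
samples = [
    {"linalool": 120.0, "geraniol": 40.0},  # treatment A
    {"linalool": 60.0, "geraniol": 80.0},   # treatment B
]
table = roav_table(samples, thresholds)
print(table)
```

With a global reference, a compound can score 100 in only one sample; in this toy input, linalool in treatment A sets the scale and every other value is read against it.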
To eliminate environmental background noise in high-dimensional omics data and establish a direct mapping between physical abundance and human sensory perception, this study introduced the Relative Odor Activity Value (ROAV) quantitative method [ 28] to construct the FII, which exhibits high sensitivity to variations in the spreading process. By referencing the standard odor thresholds of volatile aroma compounds in the aqueous phase from the literature [ 29], and integrating the relative content and olfactory threshold of each component, the ROAV for each aroma component was calculated, as detailed in Equation (5):

$$\mathrm{ROAV}_i = \frac{C_i / T_i}{C_{\max} / T_{\max}} \times 100 \tag{5}$$

where $C_i$ is the relative content (μg/kg) of the odor-active compound; $T_i$ is the standard olfactory threshold of that compound in the aqueous phase; and $C_{\max}$ and $T_{\max}$ represent the relative content and the olfactory threshold, respectively, of the component exhibiting the maximum aroma contribution across the entire sample set. To achieve global comparability of aroma quality among different spreading treatment groups, the ROAV of the component with the maximum overall aroma contribution across the entire sample set (i.e., the compound with the highest ratio of relative content to its corresponding olfactory threshold) was defined as 100. According to the consensus in flavor chemistry evaluation, combined with the globally normalized ROAV calculation logic of this study, when ROAV ≥ 1, the compound is determined to be a core odor-active compound that dictates the aroma profile of the sample; when 0.1 30% in QC samples were strictly filtered out during the data preprocessing stage. Primary classification statistics (a,b) revealed that the metabolites were highly enriched in categories such as phenylpropanoids, polyketides, organic acids and their derivatives, as well as lipids. These abundant polyphenols and amino acid groups constitute the foundational taste substances of green tea. 
The KEGG annotation results ( Figure S2) indicated that metabolites exhibited the most significant enrichment signals in amino acid metabolism and carbohydrate metabolism, with a total of 62 amino acid-related metabolites annotated in both positive and negative modes, reflecting the strong influence of spreading and dehydration on fundamental carbon and nitrogen metabolism in tea leaves. Meanwhile, the HMDB database annotation results ( Figure S3) further revealed that metabolites were significantly clustered in phenylpropanoids and polyketides, which are key precursors for tea polyphenol synthesis. Additionally, the LIPIDMAPS database annotation results ( Figure S4) indicated a significant accumulation of lipids and lipid-like molecules, suggesting the activation of pathways involved in membrane lipid degradation. Based on these multi-dimensional annotation results, fresh tea leaves exhibit distinct coordinated responses in carbon and nitrogen metabolism during the spreading process. Nitrogen metabolism, represented by free amino acid metabolism, evolves synchronously with carbon metabolism, characterized by phenylpropanoid and polyphenol synthesis. The dynamic transformation between these two pathways constitutes the core biochemical basis for the fluctuation in the taste quality of the tea infusion, providing metabolomic evidence for the subsequent construction and optimization of R-PAR. 3.2.2. Screening of Core Taste-Active Compounds and Construction of R-PAR The screening of core taste-active compounds followed a rigorous stepwise procedure to ensure data reliability and sensory relevance. According to the quality control standards for metabolomic data, unstable feature peaks with a coefficient of variation (CV) greater than 30% in the quality control (QC) samples were first eliminated to remove instrumental noise and false-positive interferences from unstable detection. 
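The QC-based stability filter described above can be sketched in Python as follows. This is an illustration under assumptions rather than the pipeline used in the study: the feature names and peak areas are hypothetical, `stable_features` is our own helper, and the CV is computed with the sample standard deviation (n − 1), which the text does not specify.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * stdev(values) / mean(values)


def stable_features(qc_areas, max_cv=30.0):
    """Keep features whose CV across QC injections does not exceed max_cv."""
    return {
        name: areas
        for name, areas in qc_areas.items()
        if cv_percent(areas) <= max_cv
    }


# Hypothetical peak areas of two features in three pooled QC injections.
qc = {
    "feature_A": [1000.0, 1050.0, 980.0],  # CV of about 3.6%, retained
    "feature_B": [200.0, 520.0, 90.0],     # CV of about 83%, discarded
}
kept = stable_features(qc)
print(sorted(kept))
```

Because the QC samples are pooled aliquots of the same material, variation between their injections reflects instrument drift rather than biology, which is why the threshold is applied to QC injections only.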
Subsequently, unsupervised principal component analysis (PCA) was performed on the globally QC-cleaned UHPLC-HRMS profiles, and a preliminary set of candidate compounds highly responsive to spreading stress was identified using an absolute eigenvector loading threshold of |Loadings| > 0.05. This candidate pool was then refined using established knowledge of taste contribution [ 36] and the criterion of relative abundance dominance, further eliminating interfering substances with strong mass spectrometric response but low sensory activity. The final selection was narrowed to the core constituents governing the astringency–umami balance after spreading: among polyphenols, six flavonoid glycosides and three catechins known to drive the bitter–astringent evolution of green tea were selected; among amino acids, L-glutamic acid, L-theanine, and L-aspartic acid were chosen (detailed information on the core compounds is provided in ). In terms of compound identification, the mass errors of all targeted markers were controlled within 5 ppm, and accurate matching was achieved by integrating the NovoMetDB database and a high-quality local MS2 spectral library. After the above process, the 12 core substances shown in were finally established as the chemical basis for defining the R-PAR. This selection aligns with the consensus of previous metabolomic studies on tea processing [ 37]. Based on the twelve core taste substances ultimately identified, their corresponding chromatographic peak areas were extracted and subjected to min-max normalization. The relative polyphenol-to-amino acid ratios (R-PAR) for the various processing groups were calculated, as illustrated in . This dimensionless index was established to characterize the evolution of core taste substances under different spreading conditions. 3.3.1. 
Volatile Metabolite Profiling and Metabolic Pathway Analysis After chromatographic peak deconvolution and matching, along with rigorous denoising pretreatment (eliminating derivatization mismatches and trace environmental contaminants), a total of 261 volatile compounds were identified in the test samples; their detailed classification and retention indices (RI) are provided in Table S1. Based on their chemical structures, these compounds were categorized into nine classes (a). Among them, alcohols were the most abundant (67 compounds, 25.67%), followed by esters (40, 15.33%), ketones (38, 14.56%), alkenes (36, 13.79%), and aldehydes (30, 11.49%), together accounting for over 80% of the total. Alkanes (18, 6.90%) and aromatic compounds (15, 5.75%) each exceeded ten compounds, whereas acids (10, 3.83%) and lactones (7, 2.68%) constituted relatively low proportions in terms of compound numbers (b). Consistent with previous findings [ 38], esters, alcohols, and ketones were the predominant aroma constituents of steamed green tea, further confirming the reliability of the analytical results. To investigate the formation mechanism of aroma under the physical stress of spreading, the metabolites detected by HS-SPME-GC-MS were mapped to the HMDB, KEGG, and LIPIDMAPS databases (). Cross-annotation between HMDB (a) and LIPIDMAPS (b) revealed a predominance of lipids and lipid-like molecules. This finding indicates that as water is lost during spreading, the metabolic pathways of hydrolytic enzymes are significantly activated, promoting the release of fatty aldehydes and alcohols, which form an essential foundation for the aroma profile of the spread leaves. Furthermore, the accumulation of other core aroma compounds follows distinct pathway branches. 
The isoprenoids detected in LIPIDMAPS and the phenolic compounds in HMDB constitute the critical structural backbone of characteristic aroma components. The KEGG enrichment analysis (c) further indicates that aromatic compounds are highly active in carbohydrate and amino acid metabolic pathways. This suggests that during the spreading process, not only does membrane lipid degradation occur, but the enzymatic release of glycosidic aroma precursors and the transformation of free amino acids also undergo intense and synchronous evolution, thereby laying the foundation for the accumulation of compounds such as terpenoid alcohols (linalool) and aromatic alcohols (benzyl alcohol). These findings are consistent with previous research conclusions [ 39]. In summary, the aroma evolution of fresh tea leaves during the spreading process represents a dynamic balance driven by both lipid degradation metabolism and isoprenoid/aromatic metabolism. This principle provides theoretical support for subsequent calculations of core substance ROAV and the construction of FII. 3.4.1. Evaluation of the Tea Spreading Quality Prediction Model In terms of predictive performance, the model achieved an $R^2_{\mathrm{pred}}$ of 0.81 and an RMSE of just 0.11 for R-PAR; for FII, it attained an $R^2_{\mathrm{pred}}$ of 0.77 and an RMSE of 13.72. Notably, given the inherently high volatility of tea aroma compounds driven by environmental and biological variability, the RMSE of 13.72 for FII falls well within the accepted tolerance band for digital modeling in food processing. Overall, these results demonstrate that the TabPFN model can effectively learn the mapping between spreading conditions and the core taste and aroma components, with an inference accuracy that meets current analytical needs. This provides a viable machine-learning foundation for the subsequent global multi-objective optimization. 3.4.2. 
Multi-Objective Synergistic Optimization Results To identify the spreading parameter combination that delivers the optimal balance between taste and aroma, a grid-based virtual optimization was carried out using the TabPFN model. The search revealed that as the D-value approached its optimum, the model identified an optimal set of process conditions: initial moisture content 76.8%, spreading temperature 26.2 °C, relative humidity 61.5%, and airflow rate 0.85 m·s⁻¹. Under these conditions, the predicted R-PAR dropped to 0.465, while the FII climbed to 125.70. The 95% confidence intervals of the predicted means at the optimal point were: R-PAR [0.41, 0.52] and FII [119.3, 132.1]. To verify the reliability of the model predictions, fresh leaves of Echa No. 10 from the same season were used, and three independent, complete spreading and detection runs were performed as parallel validation experiments based on the practically achievable process parameters: initial moisture content 77.0%, spreading temperature 26.0 °C, relative humidity 61.5%, and airflow rate 0.9 m·s⁻¹. The comparative results are summarized in . The measured R-PAR averaged 0.478 ± 0.015, and the FII averaged 122.98 ± 3.26. The intra-group relative standard deviations (RSD) were 3.14% and 2.65%, respectively, both well below the commonly accepted 5% quality-control threshold for tea physicochemical testing, indicating satisfactory repeatability and reliable measurements. All model-predicted values fell completely within the 95% confidence intervals of the measured values, and the confidence intervals of the predicted means overlapped with those of the measured values, indicating no statistically significant difference between the predicted and measured values. This confirms the accuracy and reliability of the TabPFN model in predicting the dual quality indicators under the optimal spreading process. 
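To illustrate how a grid-based search over a composite D-value can work, the sketch below scores every grid point with a desirability function and keeps the best one. Several things here are assumptions and not taken from the study: the D-value is taken to be a Derringer-type geometric mean of a smaller-is-better score for R-PAR and a larger-is-better score for FII, the bounds and factor grids are invented, and `toy_predict` is a hypothetical stand-in for the fitted model, not TabPFN.

```python
from itertools import product
from math import sqrt


def toy_predict(moisture, temp, rh, air):
    """Hypothetical stand-in for a fitted predictor; NOT the study's model."""
    r_par = 0.45 + 0.002 * (moisture - 77) ** 2 + 0.001 * (temp - 26) ** 2
    fii = 125 - 0.05 * (rh - 62) ** 2 - 40 * (air - 0.85) ** 2
    return r_par, fii


def desirability(r_par, fii, r_bounds=(0.4, 1.0), f_bounds=(60.0, 130.0)):
    """Geometric mean of two linear desirabilities clipped to [0, 1].

    Assumed form: lower R-PAR and higher FII are both rewarded.
    """
    d_r = (r_bounds[1] - r_par) / (r_bounds[1] - r_bounds[0])
    d_f = (fii - f_bounds[0]) / (f_bounds[1] - f_bounds[0])
    d_r = min(1.0, max(0.0, d_r))
    d_f = min(1.0, max(0.0, d_f))
    return sqrt(d_r * d_f)


def grid_search(predict, grids):
    """Evaluate every combination of factor levels and return the best one."""
    best = max(product(*grids), key=lambda p: desirability(*predict(*p)))
    return best, desirability(*predict(*best))


# Invented factor grids (units as in the text: %, degC, %, m/s).
grids = [
    [75, 76, 77, 78, 79],      # initial moisture content
    [22, 24, 26, 28, 30],      # spreading temperature
    [56, 58, 60, 62, 64],      # relative humidity
    [0.65, 0.75, 0.85, 0.95],  # airflow rate
]
best, d_value = grid_search(toy_predict, grids)
print(best, round(d_value, 4))
```

A geometric mean is a common choice for this kind of composite because a setting that scores zero on either objective receives an overall score of zero, so the search cannot trade one quality attribute away entirely for the other.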
As corroborated by prior research, micro-perturbations in processing parameters exert a decisive influence on the characteristic aroma quality of tea [ 41]. In this study, the optimal spreading configuration identified by the model validates the accuracy of TabPFN in characterizing the complex nonlinear relationship between the spreading process and quality attributes. Furthermore, these findings substantiate the immense potential of process optimization for the targeted modulation of taste and aroma. 3.5.1. Model Performance Comparison compares the performance of TabPFN with three widely used models: response surface methodology (RSM), partial least squares regression (PLSR), and support vector regression (SVR). The results show that TabPFN exhibited superior generalization for both quality indicators, as visualized in . In the prediction of R-PAR, TabPFN achieved a predictive coefficient of determination ($R^2_{\mathrm{pred}}$) of 0.81 and a root mean square error (RMSE) of 0.11. Compared to the second-best performing SVR model ($R^2_{\mathrm{pred}}$ = 0.69, RMSE = 0.142), TabPFN demonstrated a 17.4% relative increase in $R^2_{\mathrm{pred}}$ and a 27.6% reduction in RMSE. Regarding FII prediction, the TabPFN model reached an $R^2_{\mathrm{pred}}$ of 0.77 and an RMSE of 13.72, representing improvements of 20.3% and 20.1%, respectively, over SVR ($R^2_{\mathrm{pred}}$ = 0.64, RMSE = 17.17). The above results indicate that the TabPFN model achieved a favorable level of performance in the small-sample food processing modeling scenario, but has not yet reached the threshold of excellence. This performance level demonstrates that the current model can effectively capture the major variation trends between process parameters and quality indicators, yet approximately 19–23% of the variation remains unexplained. This unexplained variation may originate from biological fluctuations among fresh leaf batches, inherent detection errors in non-targeted metabolomic data, and potential influencing factors not included in the input space. 
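The relative improvements and the unexplained share quoted in this subsection follow from simple arithmetic on the reported metrics, sketched below in Python. The helper name `relative_change` is ours. Note that recomputing from the rounded values printed in the text gives about 22.5% for the R-PAR RMSE reduction rather than 27.6%; this may reflect rounding of the reported RMSE of 0.11, which is an assumption on our part.

```python
def relative_change(new, old):
    """Signed relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


# Reported metrics: (TabPFN, SVR)
r2_rpar, rmse_rpar = (0.81, 0.69), (0.11, 0.142)
r2_fii, rmse_fii = (0.77, 0.64), (13.72, 17.17)

gain_r2_rpar = relative_change(*r2_rpar)     # increase in R2_pred for R-PAR
cut_rmse_rpar = -relative_change(*rmse_rpar)  # reduction in RMSE for R-PAR
gain_r2_fii = relative_change(*r2_fii)       # increase in R2_pred for FII
cut_rmse_fii = -relative_change(*rmse_fii)    # reduction in RMSE for FII

# Share of variation left unexplained is 1 - R2_pred, in percent.
unexplained = [round(100 * (1 - r2)) for r2 in (r2_rpar[0], r2_fii[0])]

print(round(gain_r2_rpar, 1), round(cut_rmse_rpar, 1))
print(round(gain_r2_fii, 1), round(cut_rmse_fii, 1))
print(unexplained)
```

Reading 1 − R²pred as an unexplained share is what yields the 19% (R-PAR) and 23% (FII) figures.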
Therefore, the results of this study should be positioned as a proof-of-concept validation of TabPFN in this application scenario, rather than as a mature model meeting industrial-grade precision requirements.

3.5.2. Mechanistic Interpretation of Performance Discrepancies and Application Suitability

From the standpoint of model principles and their suitability for green tea spreading, the inference mechanisms underlying the modeling strategies are compared in the accompanying figure. In small-sample process modeling of this kind, each of the three mainstream approaches has inherent limitations. RSM provides acceptable macroscopic predictive capability within the experimental design range, but it is constrained by the fixed structure of second-order polynomials and therefore cannot precisely characterize the higher-order nonlinear effects of processing parameters on the metabolism of taste and aroma compounds. The linear modeling assumption of PLSR cannot adequately capture the complex nonlinear associations between processing conditions and the metabolism of polyphenols and amino acids. For SVR, kernel extrapolation in the sparse regions of a small sample is dominated by a limited number of support vectors and lacks physical constraints consistent with enzymatic kinetics and continuous water loss patterns.

In contrast, TabPFN compensates for these core deficiencies. Through meta-learning on massive synthetic tabular datasets, general statistical priors are encoded in the network parameters. This allows the model to construct smooth and robust nonlinear mappings within a sparse process parameter space without gradient iteration or hyperparameter tuning on the small sample set, which mitigates the risks of overfitting and multicollinearity.
This helps explain the superior performance of TabPFN in the model comparison and supports the reliability of the optimal spreading process derived from the model.

3.6. Research Limitations

By integrating multi-dimensional omics techniques with a small-sample machine learning model, this study provides an effective pathway for the intelligent optimization of the green tea spreading process. However, to evaluate its scientific boundaries and engineering application potential, the following limitations require critical discussion:

(1) Limitations of proxy indicators and of sensory validation of the analytical samples. This study focused on the quality evolution during the spreading step alone; therefore, microwave fixation was applied directly to terminate the experiment once spreading was completed. From a processing perspective, the tea leaves did not undergo the complete manufacturing process, and consequently no quantitative descriptive analysis (QDA) by a professional sensory panel was conducted.

(2) The model development data were derived solely from spring fresh leaves of a single tea cultivar (Echa No. 10), and the model has not yet been validated on independent external datasets. In actual agricultural production, the metabolism of fresh tea leaves fluctuates with cultivar genotype and with environmental factors across harvesting seasons. Although TabPFN has demonstrated inferential robustness on small samples, this data scale remains insufficient to cover the extreme environmental conditions and fresh leaf moisture states encountered in natural settings.

(3) At the level of industrial application, this study was conducted on a multi-parameter controllable thin-layer test bench. In large-scale continuous industrial production, unevenness in tea leaf pile thickness and non-steady-state variations in workshop temperature and humidity will introduce substantial environmental noise.
The current model remains at the proof-of-concept stage. Before formal deployment in large-scale production, it is imperative to establish an expanded database covering multiple cultivars and multiple seasons, and to incorporate online monitoring data from actual production lines for dynamic model calibration and iterative updating.

This study systematically elucidated the microstructural alterations and the dynamic trajectories of taste and aroma profiles under water-deficit stress during green tea spreading. Furthermore, a predictive model for the dual quality indicators, suited to small-sample scenarios, was established. The core conclusions are as follows:

(1) SEM observations confirmed that water loss during spreading disrupts the biochemical compartmentalization within fresh leaf cells, thereby establishing the biophysical environment required for subsequent enzymatic transformations. Additionally, metabolomics-driven approaches were employed to quantify R-PAR and FII.

(2) To address the inherent challenge of small sample sizes in tea processing research, this study employed the TabPFN algorithm to construct a quality prediction model. The synergistic optimization of taste and aroma was achieved under the following optimal parameters: an initial moisture content of 76.8%, a spreading temperature of 26.2 °C, a spreading relative humidity of 61.5%, and a spreading airflow velocity of 0.85 m·s⁻¹. Under these optimal conditions, the predicted R-PAR and FII were 0.465 and 125.70, respectively.

(3) The predictive accuracy and generalization capacity of the TabPFN model exceeded those of the traditional baseline models (RSM and PLSR) and of a conventional machine learning algorithm (SVR). This supports the use of TabPFN for quality prediction in the green tea spreading process.
(4) The model in this study was constructed based on spring fresh leaves of a single tea cultivar, and its cross-cultivar and cross-season generalization ability remains to be further validated. R-PAR and FII are chemical proxy indicators, and their quantitative relationship with human sensory evaluation results still requires subsequent investigation to support the industrial application of the model.

Funding

This research was funded by the Key Research and Development Plan of Hubei Province, grant number 2025BBB053; the Hubei Provincial Rural Revitalization Project, grant number 2025EBA037 (to Shengpeng Wang); and the APC was funded by Junyi Chen.

The following abbreviations are used in this manuscript:

CCD: Central Composite Design
CV: Coefficient of Variation
DDA: Data-Dependent Acquisition
FII: Floral Intensity Index
GC-MS: Gas Chromatography–Mass Spectrometry
HMDB: Human Metabolome Database
HPLC: High-Performance Liquid Chromatography
HS-SPME: Headspace Solid-Phase Microextraction
KEGG: Kyoto Encyclopedia of Genes and Genomes
LIPID MAPS: Lipid Metabolites and Pathways Strategy
LOOCV: Leave-One-Out Cross-Validation
MS: Mass Spectrometry
PCA: Principal Component Analysis
PLSR: Partial Least Squares Regression
QC: Quality Control
RI: Retention Index
RMSE: Root Mean Square Error
ROAV: Relative Odor Activity Value
R-PAR: Relative Polyphenol-to-Amino Acid Ratio
RSD: Relative Standard Deviation
RSM: Response Surface Methodology
SEM: Scanning Electron Microscopy
SVR: Support Vector Regression
TabPFN: Tabular Prior-Data Fitted Network
UHPLC: Ultra-High Performance Liquid Chromatography
UHPLC-HRMS: Ultra-High Performance Liquid Chromatography–High-Resolution Mass Spectrometry

Figure 1. Multi-parameter controllable thin-layer spreading test bench: (a) Physical photograph; (b) Schematic diagram. 1. Air flow guiding device; 2. Axial fan; 3. Test bench door; 4. Material tray; 5. Sensor compartment; 6. Dehumidification fan; 7. Electric heater; 8. Test bench outer casing; 9. Electric sealing valve; 10. Test bench side panel; 11. Sensor.

Figure 2. Construction workfl
