Advance Journal of Food Science and Technology 6(10): 1143-1146, 2014
ISSN: 2042-4868; e-ISSN: 2042-4876
© Maxwell Scientific Organization, 2014
Submitted: July 24, 2014    Accepted: September 13, 2014    Published: October 10, 2014

Application of Statistical Analysis Software in Food Scientific Modeling

1Miaochao Chen, 2Kong Xiangsheng and 1Kan Chen
1Department of Mathematics, Chaohu University, Hefei, 238000, P.R. China
2Department of Computer and Information Engineering, Xinxiang University, Xinxiang City, Henan Province, P.R. China

Abstract: Food science research often runs into complex statistical analysis problems. This study uses the SPSS statistical analysis software to set up curve regression and multiple regression models, both of which are common in food science. The experimental results show that the method can be used effectively for statistical modeling in food science.

Keywords: Food science, modeling, SPSS statistical analysis software

INTRODUCTION

SPSS is a widely used program for statistical analysis in social science. It is also used by market researchers, health researchers, survey companies, government, education researchers, marketing organizations, data miners and others. The original SPSS manual (Nie et al., 1970) has been described as one of “sociology's most influential books” for allowing ordinary researchers to do their own statistical analysis. In addition to statistical analysis, data management (case selection, file reshaping, creating derived data) and data documentation (a metadata dictionary is stored in the data file) are features of the base software (Du and Ke, 2013). In food science research, sample sources are complex, production processes vary and quality control involves many processing steps (Eke et al., 2013).
To save manpower and material resources, speed up the development of new products and control product quality better, food science researchers need to establish suitable mathematical models and reveal the general laws behind their data. However, if traditional statistical methods are used to analyze the experimental data and build a mathematical model, the calculation becomes very complicated. Drawing on long experience in experimental design and in teaching statistical analysis, the authors take practical problems from food science research and introduce the application of SPSS statistical software to modeling in this field (Jiwen et al., 2013).

MATERIALS AND METHODS

In food science research, two variables sometimes have a linear relationship, but in most cases the relationship is nonlinear. For example, when fitting a Michaelis-Menten equation and calculating the Michaelis-Menten constant, we can measure the specific activity of the enzyme at different substrate concentrations; 9 pairs of data were obtained, as shown in Table 1. The scatter diagram of these data (Fig. 1) shows that the points are distributed along a curve (Zhao et al., 2011).

Fig. 1: The relationship between the substrate concentration and the specific activity of the enzyme

The basic task of curve regression analysis is to establish a curve regression equation from the actual observations of two correlated variables x and y, so as to reveal the form of the curve relation between x and y.
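For reference, the Michaelis-Menten relationship mentioned above is usually written in the standard textbook form below. The symbols v (reaction rate, taken here to correspond to the specific activity), [S] (substrate concentration), V_max and K_m are assumptions of this sketch and do not appear in the original text. The right-hand expression is the well-known double-reciprocal (Lineweaver-Burk) rearrangement, which illustrates how a curved relationship between two measured variables can become linear after a transformation.

```latex
v = \frac{V_{\max}\,[S]}{K_m + [S]}
\qquad\Longrightarrow\qquad
\frac{1}{v} = \frac{1}{V_{\max}} + \frac{K_m}{V_{\max}}\cdot\frac{1}{[S]}
```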
There are usually three ways to determine the type of curve relationship between x and y: first, use professional knowledge together with known theoretical laws and practical experience; second, when no theoretical law or practical experience is available, plot the measured points on rectangular coordinate paper, observe which function curve is closest to their distribution trend and use that function to describe the relationship; third, in the SPSS viewer, judge by the squared value of the correlation coefficient: the larger it is, the better the model (Zagorska et al., 2011).

Corresponding Author: Kan Chen, Department of Mathematics, Chaohu University, Hefei, 238000, P.R. China

Table 1: Measured values in the study of the causes of non-enzymatic browning in the manufacturing process of a brand of peach juice
Determination No.  Leucoanthocyanin X1  Anthocyanin X2  Maillard reaction X3  Ascorbic content X4  Non-enzymatic browning package value y
1                  0.045                0.032           0.008                 2.654                9.520
2                  0.063                0.013           0.007                 2.891                8.960
3                  0.054                0.025           0.005                 3.251                8.420
4                  0.062                0.012           0.004                 3.218                8.320
5                  0.060                0.009           0.013                 2.854                7.650
6                  0.053                0.016           0.011                 3.264                7.850
7                  0.041                0.021           0.015                 2.785                7.210
8                  0.074                0.011           0.008                 3.216                8.470
9                  0.046                0.015           0.021                 3.518                9.460
10                 0.065                0.021           0.018                 2.920                10.910
11                 0.071                0.016           0.017                 3.464                12.410
12                 0.094                0.012           0.011                 3.217                9.760
13                 0.048                0.015           0.021                 2.768                8.320
14                 0.057                0.021           0.017                 3.276                7.870
15                 0.021                0.013           0.013                 2.896                7.980
16                 0.095                0.021           0.011                 3.217                8.120

Curve regression analysis:
Step 1: Input the data in Table 1 into the SPSS data editor and select “Analyze-Regression-Curve Estimation” in turn, as shown in Fig. 2, to open the (Curve Estimation) dialog box.
Step 2: Select the variable “X” and move it into the “Variables” box, then move “Y” into the “Dependents” box. In the “Models” column, select all the options and press (OK) to finish.

The “Models” column offers 11 kinds of fitting models. The function expression represented by each option is as follows:

• Linear: linear model, $y = b_0 + b_1 x$
• Quadratic: quadratic polynomial, $y = b_0 + b_1 x + b_2 x^2$
• Compound: compound model, $y = b_0 b_1^{x}$
• Growth: growth model, $y = e^{b_0 + b_1 x}$
• Logarithmic: logarithmic model, $y = b_0 + b_1 \ln(x)$
• Cubic: cubic polynomial, $y = b_0 + b_1 x + b_2 x^2 + b_3 x^3$
• S: S curve, $y = e^{b_0 + b_1/x}$
• Exponential: exponential model, $y = b_0 e^{b_1 x}$
• Inverse: hyperbolic curve, $y = b_0 + b_1/x$
• Power: power model, $y = b_0 x^{b_1}$
• Logistic: logistic model, $y = 1/(1/u + b_0 b_1^{x})$, where u is the upper bound of the curve

Multiple regression analysis: In food science research, many independent variables can affect the dependent variable y. For instance, the factors that affect the quality of food products include the processing temperature, the sterilization pressure, the sterilization time, the pH value, etc. It is therefore necessary to discuss the regression of one dependent variable on several independent variables, as well as the related problems (Wythe et al., 2013).
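As a rough illustration of what regressing one dependent variable on several independent variables involves, the sketch below fits an ordinary least-squares model with NumPy. It is not the SPSS procedure used in this study: the function name `fit_multiple_regression` is hypothetical, the data are synthetic and chosen so that the true coefficients are known, and all predictors are entered at once with no stepwise selection.

```python
import numpy as np


def fit_multiple_regression(X, y):
    """Ordinary least-squares fit of y on the columns of X, with an intercept.

    Returns (intercept, coefficients, r_squared).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first fitted parameter is the constant term.
    design = np.column_stack([np.ones(len(y)), X])
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ params
    ss_res = np.sum((y - fitted) ** 2)    # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
    r_squared = 1.0 - ss_res / ss_tot
    return params[0], params[1:], r_squared


# Synthetic data (NOT taken from the paper): y = 2 + 3*x1 - 1*x2 exactly.
X_demo = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 1.0], [4.0, 2.0]])
y_demo = 2.0 + 3.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1]

intercept, coefs, r2 = fit_multiple_regression(X_demo, y_demo)
print("intercept:", intercept)
print("coefficients:", coefs)
print("R squared:", r2)
```

Because the synthetic response is an exact linear function of the two predictors, the fit should recover the constant term and both coefficients up to floating-point error. With real measurements the residuals are nonzero and R squared falls below 1.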
The basic task of multiple regression analysis is to establish the regression equation between the dependent variable and all the independent variables from their actual observed values, so as to reveal the specific form of the relation between them. The purpose is to use the established regression equation for prediction and control. The calculation of multiple regression analysis is complex, and SPSS makes the computational process almost foolproof. For example, in a study of the causes of non-enzymatic browning in the manufacturing process of a brand of peach juice, the leucoanthocyanin (X1), anthocyanin (X2), Maillard reaction (X3), ascorbic content (X4) and non-enzymatic browning package value (Y) were measured, and the results are shown in Table 1. We now make a regression analysis of the relationship between Y and X1, X2, X3 and X4:

Step 1: After inputting the data in Table 1 into the SPSS data editor, select “Analyze-Regression-Linear” in turn, as shown in Fig. 3, to open the (Linear Regression) dialog box.

Table 2: Summary table of models
                                           Change statistics
Model  R      R2     Adjusted R2  S.E.E.   R2 change  F change  df1  df2  Sig. F change
1      0.821  0.674  0.651        0.86640  0.674      28.971    1    14   0.000
2      0.885  0.784  0.751        0.73227  0.110      6.598     1    13   0.023
3      0.939  0.882  0.853        0.56235  0.098      10.044    1    12   0.008
a (model 1) predictors: (constant), X2; b (model 2) predictors: (constant), X2, X1; c (model 3) predictors: (constant), X2, X1, X4; d dependent variable: Y; S.E.E.: Standard error of the estimate

Table 3: Output results of the curve regression
Model summary
Equation     R2     F        df1  df2  Sig.
Linear       0.691  15.639   1    7    0.005
Logarithmic  0.886  54.382   1    7    0.000
Inverse      0.984  435.179  1    7    0.000
Quadratic    0.923  35.739   2    6    0.000
Cubic        0.986  115.349  3    5    0.000
Compound     0.592  10.172   1    7    0.015
Power        0.809  29.638   1    7    0.001
S            0.962  179.038  1    7    0.000
Growth       0.592  10.173   1    7    0.015
Exponential  0.594  10.172   1    7    0.015
Logistic     0.591  10.170   1    7    0.015

Step 2: Move “Y” from the left list into the “Dependent” box (dependent variable) on the right and move “X1”, “X2”, “X3” and “X4” into the “Independent” box (independent variables). In “Method”, select “Stepwise” (stepwise regression method). Enter, Forward, Backward and other methods can also be selected here, but the results differ (Fig. 4).

Step 3: Press the (Statistics…) button to open the dialog shown in Fig. 5. Check “Estimates” (estimated values), “Confidence intervals” (confidence interval), “Model fit” (goodness-of-fit test of the regression model), “R squared change” (change in the square of the correlation coefficient) and, under the residuals, “Durbin-Watson”. Then click the (Continue) button to return to the (Linear Regression) dialog box and press (OK) to finish.

RESULTS AND DISCUSSION

Curve regression analysis: The output results are shown in Table 3.
In the above results, the meaning of each column of data is:

• Dependent Variable: dependent variable
• Equation: the selected model used for fitting
• R Square: the squared value of the correlation coefficient
• F: F value
• df1 and df2: degrees of freedom
• Sig.: significance level
• Constant (b0): constant term
• b1-b3: the fitted values of the undetermined coefficients in the model

Table 3 (continued): Parameter estimates
Equation     Constant  b1       b2      b3
Linear       24.337    4.373
Logarithmic  18.697    20.686
Inverse      68.286    -64.850
Quadratic    2.534     17.289   -1.190
Cubic        -19.443   36.714   -5.509  0.264
Compound     24.459    1.119
Power        20.755    0.551
S            4.378     -1.787
Growth       3.197     0.113
Exponential  24.459    0.113
Logistic     0.041     0.893

The results in Table 3 are the best-fit results for each model, and the models can be compared through the squared values of their correlation coefficients (R Square): the larger the value, the better the model. For the cubic polynomial model (CUB), Rsq = 0.986, the largest value among the models, so this model is the most appropriate for fitting. The fitted cubic polynomial is:

$y = -19.443 + 36.714x - 5.509x^2 + 0.264x^3$

Multiple regression analysis: Table 2 is the summary table of models. For every step it lists the correlation coefficient (R), the square of the correlation coefficient (R Square), the adjusted square of the correlation coefficient (Adjusted R Square), the standard error of the estimate (Std. Error of the Estimate) and the change statistics (Change Statistics), which include the change in the square of the correlation coefficient (R Square Change), the F value (F Change), the first degree of freedom (df1), the second degree of freedom (df2) and the significance probability (Sig. F Change). The footnote under the table shows the terms used for prediction at every step (including independent variables and constant terms).
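The columns of the model summary are linked by standard formulas, so they can be cross-checked by hand. The sketch below recomputes the adjusted R square and the F change from the R square values printed in Table 2. The helper names `adjusted_r2` and `f_change` are hypothetical, and the sample size of 16 is an assumption taken from the 16 determinations listed in Table 1.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (constant not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def f_change(r2_new, r2_old, df1, df2):
    """F statistic for the increase in R^2 when df1 predictors are added."""
    return ((r2_new - r2_old) / df1) / ((1.0 - r2_new) / df2)


N_OBS = 16  # assumption: the 16 determinations listed in Table 1

# (number of predictors, R2, R2 of the previous step, df1, df2) as printed in Table 2
steps = [
    (1, 0.674, 0.000, 1, 14),
    (2, 0.784, 0.674, 1, 13),
    (3, 0.882, 0.784, 1, 12),
]

for p, r2, r2_prev, df1, df2 in steps:
    print(
        "model", p,
        "adjusted R2 =", round(adjusted_r2(r2, N_OBS, p), 3),
        "F change =", round(f_change(r2, r2_prev, df1, df2), 3),
    )
```

The recomputed values should be close to the tabulated ones. Small differences are expected, because the inputs are the R square values rounded to three decimals rather than the full-precision values that SPSS uses internally.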
The table of coefficient analysis lists, for every step, the constant term and the variables with their unstandardized coefficients (the values B and their standard errors, Std. Error), the standardized coefficients (Beta), the t value and the significance level (Sig.), together with the 95% confidence intervals of the coefficients and the constant term (95% Confidence Interval for B). Based on the above information, the multiple regression equation obtained by the stepwise regression method is:

$y = -79.641x_1 + 207.215x_2 + 1.491x_4 + 5.484$

and the correlation coefficient of the model is 0.939.

REFERENCES

Du, L. and Y. Ke, 2013. Research and development on food nutrition statistical analysis software system. Adv. J. Food Sci. Technol., 5(12): 1637-1640.
Eke, M.O., N.I. Olaitan and H.I. Sule, 2013. Nutritional evaluation of yoghurt-like product from baobab (Adansonia digitata) fruit pulp emulsion and the micronutrient content of baobab leaves. Adv. J. Food Sci. Technol., 5(10): 1266-1270.
Jiwen, G., R. Guihua, M. Wenjie, C. Huafeng and W. Shuyuan, 2013. Water quality assessment of Gufu river in three gorges reservoir (China) using multivariable statistical methods. Adv. J. Food Sci. Technol., 5(7): 908-920.
Wythe, H., C. Wilkinson, J. Orme, L. Meredith and E. Weitkamp, 2013. Food hygiene challenges in older people: Intergenerational learning as a health asset. WIT Tr. Biomed. Health, 16: 211-224.
Zagorska, J., I. Eihvalde, I. Gramatina and S. Sarvi, 2011. Evaluation of colostrum quality and new possibilities for its application. Proceeding of the 6th Baltic Conference on Food Science and Technology: Innovations for Food Science and Production (FOODBALT-2011), pp: 45-49.
Zhao, H., B. Guo, Y. Wei, B. Zhang, S. Sun, L. Zhang and J. Yan, 2011.
Determining the geographic origin of wheat using multielement analysis and multivariate statistics. J. Agr. Food Chem., 59(9): 4397-4402.