TO: Ray Rhew (chair of the AIAA/GTTC Internal Balance Working Group)
FROM: N. Ulbrich and T. Volden, Jacobs Technology Inc., M/S 227-5 and M/S 227-1, NASA Ames Research Center, Moffett Field, CA 94035-1000; Email: norbert.m.ulbrich@nasa.gov, thomas.r.volden@nasa.gov
SUBJECT: New Chapter for AIAA Recommended Practice
DATE: Dec. 13, 2010

Below is the promised “new” chapter for the AIAA Recommended Practice. It describes the evaluation of a math model of balance calibration data using techniques and metrics from both numerical mathematics and statistics. The techniques and metrics are applicable to both the iterative and the non-iterative balance calibration data analysis methods that are used in the wind tunnel testing community today. The “new” chapter is an optimized and compressed version of some sections of the following technical paper that was presented in 2010:

Ulbrich, N. and Volden, T., “Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data,” AIAA 2010-4545, paper presented at the 27th AIAA Aerodynamic Measurement Technology and Ground Testing Conference, Chicago, Illinois, June/July 2010; the paper was also presented at the 7th International Symposium on Strain-Gauge Balances, Williamsburg, Virginia, May 2010.

The “new” chapter should be inserted AFTER chapter “3.1.1 Math Model.” The text should be included as a new chapter of the AIAA Recommended Practice (not as a sub-chapter) because it addresses many different topics. The “new” chapter should probably be numbered “3.1.2.” Existing chapters with the same or a higher number should be renumbered.

3.1.2 Math Model Evaluation

3.1.2.1 General Remarks

New criteria for the evaluation of a regression model of balance calibration data have been introduced since the publication of the first edition of the present document (Ref. [X-1]).
The new criteria are of interest to users of both the iterative and the non-iterative balance calibration data analysis method because they may be applied to any solution of a multivariate global regression analysis problem. The new criteria, if correctly applied, ensure that (i) the calibration data is not overfitted and (ii) the user of the balance gets a realistic assessment of the predictive capability of the math model of the calibration data. A numerical technique called Singular Value Decomposition was introduced, for example, to screen a math model of the balance calibration data for unwanted linear dependencies (Ref. [X-2]). In addition, a set of objective metrics was suggested for the evaluation of the math model of balance calibration data. These metrics have been used in linear regression analysis and statistics since the 1970s. They screen a math model for near-linear dependencies, test the statistical significance of individual terms of the math model, and attempt a direct assessment of its predictive capability (see Ref. [X-3], p. 84, pp. 125-126, pp. 141-142, pp. 323-341, for a description of the new evaluation criteria). Only the most important aspects of the new evaluation criteria are summarized in the next sections. The reader should review the listed references for more detailed descriptions.

3.1.2.2 Math Term Selection

Different steps can be identified that help an analyst build a math model of balance calibration data. Some of these steps were already addressed in section 3.1.1 above. Initially, the analyst has to decide if the balance calibration data is to be processed using the iterative or the non-iterative method. This decision determines whether balance loads or strain-gage outputs are the independent variables of the global regression analysis problem. Then, a suitable combination of function classes (e.g., intercept, linear terms, absolute value terms, square terms, combined terms, etc.)
has to be chosen that best represents the calibration data. At that point it is not yet clear which individual terms of the regression model are actually supported by the calibration load schedule design. Often, as pointed out in chapter 3.1.1, an analyst uses empirical knowledge of the design of the balance and of the applied calibration loads in order to select the best combination of individual math model terms. However, it is also possible to use a numerical technique from linear algebra and a set of objective evaluation metrics from statistics to select the best term combination. Both approaches are discussed in more detail below.

3.1.2.3 Linear Dependency Test

Individual terms of a math model are the regressors for the global regression analysis of balance calibration data. The regressors are computed as functions of the independent variables. Some regressors, e.g., a combined normal-axial force term, may not be supported by the calibration data because no combined normal-axial force loadings were applied during the balance calibration. Such an unsupported regressor would introduce a linear dependency in the math model that would lead to a singular solution of the global regression analysis problem. Therefore, all unsupported regressors need to be identified and removed before the math model can be evaluated in more detail. In general, the application of a numerical technique from linear algebra called Singular Value Decomposition (SVD) is recommended for testing the linear dependency of regressors. The technique is applied to the vector space defined by the regressors in order to identify and eliminate linearly dependent terms in the math model. The application of SVD helps define the largest math model for a given calibration data set and function class selection that still leads to a nonsingular solution of the regression analysis problem. References [X-2] and [X-4] provide more information about SVD.
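As an illustration, the SVD-based screening described above can be sketched in a few lines of code (a minimal example assuming NumPy; the load schedule and term names are hypothetical, and only single-component loadings are applied, so the combined term is not supported by the data):

```python
import numpy as np

# Hypothetical single-component load schedule: normal force (N) and axial
# force (A) are never applied together, so the combined term N*A is zero
# at every calibration point and is not supported by the data.
N = np.array([0.0, 0.5, 1.0, 0.00, 0.0, 0.0, 0.0])
A = np.array([0.0, 0.0, 0.0, 0.25, 0.5, 1.0, 0.0])

# Candidate regressors: intercept, N, A, and the combined term N*A.
X = np.column_stack([np.ones_like(N), N, A, N * A])

# Singular values of the regressor matrix; values near zero flag linearly
# dependent (i.e., unsupported) regressors.
s = np.linalg.svd(X, compute_uv=False)
tol = s.max() * max(X.shape) * np.finfo(float).eps
rank = int((s > tol).sum())
print(rank, X.shape[1])   # 3 4 -> rank deficiency: one term must be removed
```

In practice the analyst would drop candidate terms until the rank of the regressor matrix equals the number of its columns; the names N and A are placeholders, not the notation of the Recommended Practice.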
SVD essentially replaces the empirical “selection-by-inspection” process that would otherwise have to be used to remove unsupported terms from the math model.

3.1.2.4 Near-Linear Dependency Test

“Near-linear” dependencies (also called “collinearity” or “multicollinearity”) between terms of a math model also have to be avoided because they can diminish the predictive capability of the math model (see Ref. [X-5] for more details). In addition, it has been observed that a direct connection exists between the divergence of the iteration equation that the iterative method uses for the analysis of balance calibration data and the presence of massive “near-linear” dependencies in a math model (see Ref. [X-6], p. 4). Therefore, terms that cause “near-linear” dependencies in a math model need to be identified and removed. The removal of these terms (i) helps prevent overfitting of the calibration data and (ii) avoids convergence problems if the iterative method is used for the analysis. Different techniques are recommended in the literature to diagnose and avoid “near-linear” dependencies in a math model (see, e.g., Ref. [X-3], pp. 334-340). Most analysts prefer the variance inflation factor (VIF) for this purpose (see Ref. [X-7] for a detailed explanation of the steps needed to compute the VIFs of a math model). The VIF is computed for each individual term of a math model. “Near-linear” dependencies in a math model are negligible if the largest VIF of all terms of a tested math model is smaller than the recommended threshold of 10.

3.1.2.5 Test of Statistical Significance of Terms

This test allows an analyst to identify and remove statistically insignificant terms of the math model that may cause overfitting of the calibration data. The test of the statistical significance of individual coefficients of a regression model looks at the standard error of each coefficient. The standard error is an estimate of the standard deviation of the coefficient.
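Since “near-linear” dependencies inflate the standard errors of the affected coefficients, the VIF test of section 3.1.2.4 and the standard error are closely related. A minimal sketch of the textbook VIF computation, VIF_j = 1/(1 - R_j^2), on synthetic data (NumPy assumed; all names and numbers are illustrative):

```python
import numpy as np

def vifs(X):
    """Variance inflation factor for each column of the regressor matrix X
    (intercept excluded from X): VIF_j = 1 / (1 - R_j^2), where R_j^2 is the
    coefficient of determination of column j regressed on all other columns."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta = np.linalg.lstsq(others, y, rcond=None)[0]
        resid = y - others @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Synthetic regressors: x2 is almost a copy of x1 (a near-linear
# dependency), while x3 is independent of both.
rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
x2 = x1 + 0.01 * rng.normal(size=50)
x3 = rng.normal(size=50)
v = vifs(np.column_stack([x1, x2, x3]))
print(v)   # VIFs of the near-linear pair far exceed the threshold of 10
```

The near-linear pair produces VIFs well above the recommended threshold of 10, while the independent regressor stays close to 1; one of the two offending terms would be removed from the math model.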
The standard error is a measure of the precision with which the regression coefficient is estimated. A general rule from statistics says that a coefficient should be included in the math model if it is large compared to its standard error. Traditionally, the “t-statistic of a coefficient” is used in science and engineering to quantitatively compare a coefficient with its standard error. The t-statistic equals the ratio of the coefficient value to its standard error (for more detail see, e.g., Ref. [X-3], p. 84). A coefficient is probably “significant” if its t-statistic is greater than the critical value of Student's t-distribution. This comparison can also be performed using the p-value of the coefficient. The p-value of a coefficient is determined from a comparison of the t-statistic with values of Student's t-distribution. With a p-value of, e.g., 0.01 (or 1 %), one can say with 99 % probability of being correct that the coefficient has some effect. The threshold for the p-value may range from a conservative value of 0.0001 to a liberal value of 0.05. A decrease of the p-value threshold, e.g., from 0.001 to 0.0001, tightens the term rejection criterion. Consequently, the final math model will have fewer terms and is expected to have smaller extrapolation errors if it is applied near the boundaries of the fitted data set. In the case of a force balance, for example, the “t-statistic of a coefficient” will usually tell an analyst that the primary gage loadings are the most important terms of the math model (something an analyst would intuitively anticipate). In other words, the “t-statistic of a coefficient” is closely related to the “physics” contained in the calibration data because the “most relevant” terms are identified using an objective metric.
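A hedged sketch of the t-statistic test on synthetic data (NumPy assumed; the regressors x1 and x2 and all numbers are illustrative): the response depends strongly on x1 and not at all on x2, so only the x1 coefficient should survive the test.

```python
import numpy as np

# Synthetic data: the response depends strongly on x1 and not at all on x2.
rng = np.random.default_rng(2)
n = 100
x1 = rng.uniform(-1.0, 1.0, n)
x2 = rng.uniform(-1.0, 1.0, n)
y = 3.0 * x1 + 0.05 * rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])     # intercept + two regressors
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
dof = n - X.shape[1]                          # degrees of freedom
sigma2 = (resid @ resid) / dof                # residual variance estimate
cov = sigma2 * np.linalg.inv(X.T @ X)         # covariance of the coefficients
se = np.sqrt(np.diag(cov))                    # standard error per coefficient
t = beta / se                                 # t-statistic per coefficient

# For ~100 points, the 95 % two-sided critical value of Student's
# t-distribution is close to 2; terms with |t| below it are candidates
# for removal from the math model.
print(np.abs(t) > 2.0)
```

A production implementation would compare each t-statistic against the exact critical value (or the equivalent p-value) for the chosen significance level and the actual degrees of freedom rather than the large-sample approximation used here.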
It is interesting to note that the “t-statistic of a coefficient” and the “percent contribution of a coefficient” are used in the wind tunnel testing community for the same purpose, i.e., to assess the importance of an individual term of the regression model of the calibration data. The “percent contribution of a coefficient,” however, has two disadvantages compared with the “t-statistic of a coefficient”: (1) the “percent contribution of a coefficient” is known only in the wind tunnel testing community, whereas the “t-statistic of a coefficient” is applied in every field of science and engineering that performs a multivariate regression analysis of data; (2) the “percent contribution of a coefficient” is computed by evaluating all independent variables simultaneously at their maximum values, i.e., it assesses the significance of individual terms using an artificial condition that may never exist in a real-world application of the regression model of the balance calibration data. Therefore, after considering these disadvantages, some groups within the wind tunnel testing community consider the “t-statistic of a coefficient” the more reliable test.

3.1.2.6 Hierarchy Rule

Some analysts prefer to use a “hierarchical” math model, i.e., a math model that has no missing lower-order terms, for the analysis of balance calibration data. Only a “hierarchical” math model can correctly model calibration data whenever the independent variables have a constant shift (see the discussion of the “hierarchy rule” in Refs. [X-7], pp. 6-8, and [X-8]). Often it is unclear if a given experimental data set needs to be modeled using a “hierarchical” math model, because an analytic first-principles description of the experimental data, which could serve as the basis for the math model term selection, may not be known.
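Whether a candidate term set obeys the hierarchy rule can be checked mechanically. In the sketch below (an illustrative encoding, not the notation of the Recommended Practice), each term is a tuple of exponents of the independent variables; a model is “hierarchical” if every lower-order parent of each term is also present:

```python
from itertools import product

def is_hierarchical(terms):
    """Return True if the term set obeys the hierarchy rule. Each term is a
    tuple of exponents of the independent variables; the rule requires every
    lower-order 'parent' of a term (every componentwise smaller exponent
    tuple) to be present as well."""
    term_set = set(terms)
    for term in terms:
        for parent in product(*(range(e + 1) for e in term)):
            if parent not in term_set:
                return False
    return True

# Two variables x1, x2; the tuple (1, 1) encodes the combined term x1*x2.
full = [(0, 0), (1, 0), (0, 1), (1, 1)]   # intercept, x1, x2, x1*x2
missing = [(0, 0), (1, 0), (1, 1)]        # the lower-order term x2 is missing
print(is_hierarchical(full), is_hierarchical(missing))   # True False
```

A term selection procedure that enforces the hierarchy rule would add the missing parents (here x2) back into the model instead of rejecting it.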
In that case, it is left to the analyst to decide if (i) the math model should be made “hierarchical” to prevent a suspected problem with a “non-hierarchical” model or if (ii) the analysis should be performed using a “non-hierarchical” model.

3.1.2.7 Predictive Capability Tests

The predictive capability of the final regression model still needs to be tested after the completion of the term selection process. These tests make sure that the math model meets the expected accuracy requirements. The predictive capability may be tested using two types of points: data points and confirmation points. Data points are the original calibration points that are used to compute the math model coefficients with a global regression analysis approach. Confirmation points, on the other hand, are points that are independent of the regression analysis of the calibration data. Reference [X-9] provides additional information about the two types of points. In general, three types of residuals may be computed from data points and confirmation points. The first residual type is defined as the difference between the measured and the fitted loads at the data points. This residual type is widely used in the wind tunnel testing community for the assessment of the predictive capability of a regression model. The second residual type is the so-called PRESS residual of the data points. PRESS residuals are very useful because they are recommended in the literature for comparing the predictive capability of different math models (see, e.g., the discussion of PRESS residuals in Ref. [X-3], pp. 125-126 and pp. 141-142). The third residual type is defined by a set of confirmation points that is independent of the calibration data set. These residuals are computed as the difference between the measured and the fitted load at a confirmation point.
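PRESS residuals do not require refitting the model once per data point: the standard identity e_PRESS,i = e_i / (1 - h_ii) computes them from a single fit, where h_ii is the i-th diagonal element of the hat matrix. A minimal sketch on synthetic data (NumPy assumed; all names and numbers are illustrative):

```python
import numpy as np

def press_residuals(X, y):
    """PRESS residuals from a single fit: e_i / (1 - h_ii), where h_ii is
    the i-th diagonal element of the hat matrix H = X (X'X)^-1 X'. This
    equals the residual point i would have if it were left out of the fit."""
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return e / (1.0 - np.diag(H))

# Synthetic one-variable fit; verify the identity for the first data point
# against an explicit leave-one-out regression.
rng = np.random.default_rng(3)
X = np.column_stack([np.ones(20), rng.uniform(-1.0, 1.0, 20)])
y = 2.0 * X[:, 1] + 0.1 * rng.normal(size=20)

beta_loo = np.linalg.lstsq(X[1:], y[1:], rcond=None)[0]   # fit without point 0
loo_resid = y[0] - X[0] @ beta_loo
print(np.isclose(press_residuals(X, y)[0], loo_resid))    # True
```

Because the identity is exact, the PRESS residuals give a leave-one-out assessment of the predictive capability of the math model at the cost of one regression.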
Three test metrics may be computed from the three residual types:

(1) the standard deviation of the load residuals of the data points,
(2) the standard deviation of the PRESS residuals of the data points,
(3) the standard deviation of the load residuals of the confirmation points.

These three standard deviations can easily be compared with the accuracy requirements of the balance load predictions in order to decide if the calibration data and the math model meet expectations. The standard deviation of the load residuals of the confirmation points is the most important test metric because the predictive performance of the math model is explicitly tested at points that were not used in the global regression analysis of the original calibration data. Sometimes certain load prediction accuracy requirements cannot be met. This observation can have several explanations: (i) a suboptimal function class combination was selected for the regression analysis that could not model important characteristics of the calibration data; (ii) a suboptimal load schedule was used during the calibration that omitted important loads or load combinations needed for a proper characterization of the physical behavior of the balance; (iii) large load and/or gage output measurement errors occurred that were not detected during the calibration; (iv) balance design characteristics (e.g., bolted joints) cause hysteresis effects that cannot easily be modeled using a global regression analysis approach. A detailed examination of the math model, the load schedule design, the measurement error sources, and the balance design may be needed to improve the analysis results and/or the calibration data so that the desired accuracy of the load predictions can be met.
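As a final illustration, the three standard deviations can be computed and compared with a hypothetical accuracy requirement (synthetic one-variable data, NumPy assumed; the tolerance value is invented for the sketch and is not a recommendation):

```python
import numpy as np

# Synthetic one-variable calibration fit; all names and numbers are
# illustrative, not from a real balance calibration.
rng = np.random.default_rng(4)
X = np.column_stack([np.ones(30), rng.uniform(-1.0, 1.0, 30)])
y = 5.0 * X[:, 1] + 0.02 * rng.normal(size=30)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta                                   # residuals at data points
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)      # leverages h_ii
press = e / (1.0 - h)                              # PRESS residuals

# Independent confirmation points that were not used in the fit.
Xc = np.column_stack([np.ones(10), rng.uniform(-1.0, 1.0, 10)])
yc = 5.0 * Xc[:, 1] + 0.02 * rng.normal(size=10)
ec = yc - Xc @ beta                                # confirmation residuals

metrics = (np.std(e, ddof=1), np.std(press, ddof=1), np.std(ec, ddof=1))
tolerance = 0.1                                    # hypothetical requirement
print(all(m < tolerance for m in metrics))
```

If any of the three standard deviations exceeded the requirement, the analyst would revisit the math model, the load schedule, and the measurement error sources as discussed above.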
References

[X-1] AIAA/GTTC Internal Balance Technology Working Group, “Recommended Practice, Calibration and Use of Internal Strain-Gage Balances with Application to Wind Tunnel Testing,” AIAA R-091-2003, sponsored and published by the American Institute of Aeronautics and Astronautics, Reston, Virginia, 2003.

[X-2] Ulbrich, N. and Volden, T., “Strain-Gage Balance Calibration Analysis Using Automatically Selected Math Models,” AIAA 2005-4084, paper presented at the 41st AIAA/ASME/SAE/ASEE Joint Propulsion Conference and Exhibit, Tucson, Arizona, July 2005.

[X-3] Montgomery, D. C., Peck, E. A., and Vining, G. G., Introduction to Linear Regression Analysis, 4th ed., John Wiley & Sons, Inc., New York, 2006, p. 84, pp. 125-126, pp. 141-142, pp. 323-341.

[X-4] Ulbrich, N. and Bader, J., “Analysis of Sting Balance Calibration Data Using Optimized Regression Models,” AIAA 2009-5372, paper presented at the 45th AIAA/ASME/SAE/ASEE Joint Propulsion Conference and Exhibit, Denver, Colorado, August 2009, pp. 3-4.

[X-5] DeLoach, R. and Ulbrich, N., “A Comparison of Two Balance Calibration Model Building Methods,” AIAA 2007-0147, paper presented at the 45th AIAA Aerospace Sciences Meeting and Exhibit, Reno, Nevada, January 2007.

[X-6] Ulbrich, N. and Volden, T., “Regression Analysis of Experimental Data Using an Improved Math Model Search Algorithm,” AIAA 2008-0833, paper presented at the 46th AIAA Aerospace Sciences Meeting and Exhibit, Reno, Nevada, January 2008.

[X-7] Ulbrich, N., “Regression Model Optimization for the Analysis of Experimental Data,” AIAA 2009-1344, paper presented at the 47th AIAA Aerospace Sciences Meeting and Exhibit, Orlando, Florida, January 2009.

[X-8] DeLoach, R., “The Role of Hierarchy in Response Surface Modeling of Wind Tunnel Data,” AIAA 2010-0931, paper presented at the 48th AIAA Aerospace Sciences Meeting and Exhibit, Orlando, Florida, January 2010.
[X-9] Ulbrich, N., “Optimization of Regression Models of Experimental Data using Confirmation Points,” AIAA 2010-0930, paper presented at the 48th AIAA Aerospace Sciences Meeting and Exhibit, Orlando, Florida, January 2010.