
TO: Ray Rhew (chair of AIAA/GTTC’s Internal Balance Working Group)
FROM: N. Ulbrich and T. Volden, Jacobs Technology Inc., M/S 227-5, M/S 227-1, NASA Ames Research Center, Moffett Field, CA 94035-1000
Email: norbert.m.ulbrich@nasa.gov, thomas.r.volden@nasa.gov
SUBJECT: New Chapter for AIAA Recommended Practice
DATE: Dec. 13, 2010
Below is the promised “new” chapter for the AIAA Recommended Practice. It describes the
evaluation of a math model of balance calibration data using different techniques and metrics
from both numerical mathematics and statistics. The techniques and metrics are applicable to
both the iterative and the non-iterative balance calibration data analysis methods that are used in
the wind tunnel testing community today.
The “new” chapter is an optimized and compressed version of some sections from the following
technical paper that was presented in 2010: Ulbrich, N. and Volden, T., “Regression Model Term
Selection for the Analysis of Strain-Gage Balance Calibration Data,” AIAA 2010-4545, paper
presented at the 27th AIAA Aerodynamic Measurement Technology and Ground Testing
Conference, Chicago, Illinois, June/July 2010; paper was also presented at the 7th International
Symposium on Strain-Gauge Balances, Williamsburg, Virginia, May 2010.
The “new” chapter should be inserted AFTER chapter “3.1.1 Math Model.” The text should be
included as a new chapter in the AIAA Recommended Practice (not as a sub-chapter) because it
addresses a wide range of topics. The “new” chapter should probably have the number “3.1.2.”
Existing chapters with the same number and higher should be renumbered.
3.1.2 Math Model Evaluation
3.1.2.1 General Remarks
New criteria for the evaluation of a regression model of balance calibration data have been
introduced since the publication of the first edition of the present document (Ref. [X-1]). The new
criteria are of interest to users of both the iterative and non-iterative balance calibration data
analysis method as the new criteria may be applied to any solution of a multivariate global
regression analysis problem.
The new criteria, if correctly applied, ensure that (i) the calibration data is not overfitted and
that (ii) the user of the balance obtains a realistic assessment of the predictive capability of the
math model of the calibration data. A numerical technique called Singular Value
Decomposition was introduced, for example, to screen a math model of the balance calibration
data for unwanted linear dependencies (Ref. [X-2]). In addition, a set of objective metrics was
suggested for the evaluation of the math model of balance calibration data. These metrics have
been used in linear regression analysis and statistics since the 1970s. They screen a math model
for near-linear dependencies, test the statistical significance of individual terms of the math
model, and try to directly assess its predictive capability (see Ref. [X-3], p.84, pp.125-126,
pp.141-142, pp.323-341, for a description of the new evaluation criteria). Only the most important
aspects of the new evaluation criteria are summarized in the next sections. The reader should
review listed references for more detailed descriptions.
3.1.2.2 Math Term Selection
Different steps can be identified that help an analyst build a math model of balance calibration
data. Some of the steps were already addressed in section 3.1.1 above. Initially, the analyst has
to decide if balance calibration data is to be processed using the iterative or the non-iterative
method. This decision determines if balance loads or strain-gage outputs are the independent
variables of the global regression analysis problem. Then, a suitable combination of function
classes (e.g., intercept, linear terms, absolute value terms, square terms, combined terms, etc.)
has to be chosen that will best represent the calibration data. At that point it is not yet clear which
individual terms of the regression model are actually supported by the calibration load schedule
design. Often, as pointed out in chapter 3.1.1, an analyst uses empirical knowledge about the
design of the balance and the applied calibration loads in order to select the best combination of
individual math model terms. However, it is also possible to use a numerical technique from linear
algebra and a set of objective evaluation metrics from statistics for the selection of the best term
combination. They are discussed in more detail below.
3.1.2.3 Linear Dependency Test
Individual terms of a math model are the regressors for the global regression analysis of balance
calibration data. The regressors are computed as a function of the independent variables. Some
regressors, e.g., a combined normal-axial force term, may not be supported by the calibration
data because no combined normal-axial force loadings were applied during the balance
calibration. Such an unsupported regressor would introduce a linear dependency in the math
model that would lead to a singular solution of the global regression analysis problem. Therefore,
all unsupported regressors need to be identified and removed from the math model before the
math model can be evaluated in more detail.
In general, the application of a numerical technique from linear algebra is recommended in order
to test the linear dependency of regressors. The technique is called Singular Value
Decomposition (SVD). The technique is applied to the vector space defined by the regressors in
order to identify and eliminate linearly dependent terms in the math model. The application of
SVD helps define the largest math model for a given calibration data set and function class
selection that will lead to a nonsingular solution of the regression analysis problem. References
[X-2] and [X-4] provide more information about SVD. SVD essentially replaces the empirical
"selection-by-inspection process" that would otherwise have to be used in order to remove
unsupported terms from the math model.
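The SVD-based test can be sketched in a few lines. The example below is hypothetical (the load values and term names are illustrative only): a calibration in which the normal force N and the axial force A are never applied simultaneously, so the cross term N·A is identically zero and is not supported by the load schedule. A singular value that is numerically zero reveals the linear dependency.

```python
import numpy as np

# Hypothetical load schedule: A is only loaded when N = 0, so the
# combined term N*A is identically zero and unsupported by the data.
N = np.array([1.0, 2.0, 3.0, 0.0, 0.0, 0.0])
A = np.array([0.0, 0.0, 0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(N), N, A, N * A])  # intercept, N, A, N*A
terms = ["intercept", "N", "A", "N*A"]

# Singular Value Decomposition of the regressor matrix. A singular value
# that is (numerically) zero indicates a linear dependency among the columns.
s = np.linalg.svd(X, compute_uv=False)
tol = s.max() * max(X.shape) * np.finfo(float).eps
rank = int((s > tol).sum())

print("singular values:", s)
print("rank =", rank, "of", X.shape[1], "regressors")
# rank < number of regressors => at least one term is unsupported and must
# be removed before the regression problem has a nonsingular solution.
```

Here the rank is 3 for 4 regressors, so one term (the N·A term) must be removed before the regression analysis can proceed.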
3.1.2.4 Near-Linear Dependency Test
“Near-linear” dependencies (also called “collinearity” or “multicollinearity”) between terms of a
math model also have to be avoided as they could diminish the predictive capability of a math
model (see Ref. [X-5] for more details). In addition, it has been observed that a direct connection
exists between the divergence of the iteration equation that the iterative method uses for the
analysis of balance calibration data and the presence of massive “near-linear” dependencies in a
math model (see Ref. [X-6], p. 4). Therefore, terms that cause “near-linear” dependencies in a
math model need to be identified and removed. The removal of these terms (i) helps prevent
overfitting of the calibration data and (ii) avoids convergence problems if the iterative method is
used for analysis.
Different techniques are recommended in the literature to diagnose and avoid “near-linear”
dependencies in a math model (see, e.g., Ref. [X-3], pp. 334-340). Most analysts prefer to use
the variance inflation factor (VIF) for this purpose (see Ref. [X-7] for a detailed explanation of
steps that are needed to compute VIFs of a math model). The VIF is computed for each individual
term of a math model. “Near-linear” dependencies in a math model are negligible if the largest
VIF of all terms of a tested math model is smaller than the recommended threshold of 10.
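A minimal sketch of the VIF computation is shown below (the regressor data is an assumed example, not balance data): the VIF of regressor j equals 1/(1 - R²ⱼ), where R²ⱼ is the coefficient of determination obtained by regressing column j on all other regressors.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF of each column of X: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes
    from regressing column j on all remaining columns (plus an intercept)."""
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Assumed example: x2 is nearly a copy of x1 (a "near-linear" dependency),
# while x3 is independent of both.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 0.01 * rng.normal(size=50)
x3 = rng.normal(size=50)
vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(vifs)  # VIFs of x1 and x2 far exceed the threshold of 10
```

In this example the first two VIFs exceed the threshold of 10 by orders of magnitude, so one of the two near-collinear terms would have to be removed.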
3.1.2.5 Test of Statistical Significance of Terms
This test allows an analyst to identify and remove statistically insignificant terms of the math
model that may cause overfitting of calibration data. The test of the statistical significance of
individual coefficients of a regression model looks at the standard error of each coefficient. The
standard error is an estimate of the standard deviation of the coefficient. It is a measure of the
precision with which the regression coefficient is estimated. A general rule from statistics says
that a coefficient should be included in the math model if it is large compared to its standard error.
Traditionally, the “t-statistic of a coefficient” is used in science and engineering to quantitatively
compare a coefficient with its standard error. The t-statistic equals the ratio between the
coefficient value and its standard error (for more detail see, e.g., Ref. [X-3], p. 84). A coefficient is
probably “significant” if the absolute value of its t-statistic exceeds the critical value of Student's
t-distribution. This comparison can also be performed using the p-value of the coefficient. The
p-value of a coefficient is determined from a comparison of the t-statistic with Student's
t-distribution. A p-value of, e.g., 0.01 (or 1 %) means that there is only a 1 % probability of
observing a t-statistic of this magnitude if the true value of the coefficient were zero; the
coefficient is therefore very likely to have a real effect. The threshold for the p-value may range
from a conservative value of 0.0001 to a liberal value of 0.05. A decrease of the p-value
threshold, e.g., from 0.001 to 0.0001, tightens the term rejection criterion. Therefore, the final
math model will have fewer terms
and is expected to have smaller extrapolation errors if it is applied in the vicinity of the boundaries
of the fitted data set.
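The computation of the t-statistics can be sketched as follows. The data in this example is assumed (a single hypothetical gage whose response depends strongly on a load F while a candidate square term has no real effect); the block uses the common rule of thumb that the critical value of Student's t-distribution is about 2 for a 95 % confidence level and many degrees of freedom.

```python
import numpy as np

# Assumed example: response depends strongly on the applied load F,
# while the candidate square term F**2 has no real effect.
rng = np.random.default_rng(1)
F = rng.uniform(-1.0, 1.0, size=40)
y = 2.5 * F + 0.05 * rng.normal(size=40)

X = np.column_stack([np.ones_like(F), F, F**2])   # intercept, F, F^2
n, k = X.shape

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
dof = n - k
sigma2 = (resid @ resid) / dof                    # residual variance estimate
cov = sigma2 * np.linalg.inv(X.T @ X)             # coefficient covariance matrix
se = np.sqrt(np.diag(cov))                        # standard errors

t_stat = beta / se                                # t-statistic of each coefficient
for name, t in zip(["intercept", "F", "F^2"], t_stat):
    print(f"{name:9s}  t = {t:8.2f}")
# Rule of thumb: |t| well above the critical value (about 2 for a 95 %
# confidence level and many degrees of freedom) marks a significant term.
```

As expected, the linear load term dominates, while the t-statistic of the square term is small, so that term would be rejected.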
In the case of a force balance, for example, the “t-statistic of a coefficient” will usually tell an
analyst that the primary gage loadings are the most important terms of the math model (this is
something that an analyst would intuitively anticipate). In other words, the “t-statistic of a
coefficient” is closely related to the "physics" that is contained in the calibration data as the "most
relevant" terms are identified using an objective metric.
It is interesting to note that the “t-statistic of a coefficient” and the “percent contribution of a
coefficient” are used in the wind tunnel testing community for the same purpose, i.e., the
assessment of the importance of an individual term of the regression model of the calibration
data. The “percent contribution of a coefficient,” however, has two disadvantages if compared
with the “t-statistic of a coefficient”:
(1) The “percent contribution of a coefficient” is only known in the wind tunnel testing community.
The “t-statistic of a coefficient,” on the other hand, is applied in every field of science and
engineering that performs a multivariate regression analysis of data.
(2) The “percent contribution of a coefficient” is computed by using the independent variables all
simultaneously at their maximum values. In other words, it assesses the significance of individual
terms by using an artificial condition that may never exist in a real-world application of the
regression model of the balance calibration data.
Therefore, after considering the disadvantages discussed above, some groups within the wind
tunnel testing community believe that the “t-statistic of a coefficient” is the more reliable test.
3.1.2.6 Hierarchy Rule
Some analysts prefer to use a “hierarchical” math model, i.e., a math model that does not have
missing lower order terms, for the analysis of balance calibration data. Only a “hierarchical” math
model has the ability to correctly model calibration data whenever independent variables have a
constant shift (see the discussion of the “hierarchy rule” in Refs. [X-7], pp.6-8, and [X-8]).
Often it is unclear if a given experimental data set needs to be modeled using a “hierarchical”
math model. An analytic first principles description of the experimental data may not be known
that could be used as the basis for the math model term selection. Thus, it is left to the analyst to
decide if (i) the math model should be made “hierarchical” to prevent a suspected problem with a
“non-hierarchical” model or if (ii) the analysis should be performed using a “non-hierarchical”
model.
3.1.2.7 Predictive Capability Tests
The predictive capability of the final regression model still needs to be tested after the completion
of the term selection process. These tests make sure that the math model meets expected
accuracy requirements. The predictive capability may be tested using two types of points: data
points and confirmation points. Data points are the original calibration points that are used to
compute the math model coefficients using a global regression analysis approach. Confirmation
points, on the other hand, are points that are independent of the regression analysis of the
calibration data. Reference [X-9] provides some additional information about the two types of
points.
In general, three types of residuals may be computed from data points and confirmation points.
The first residual type is defined as the difference between the measured and fitted loads at the
data points. This residual is widely used in the wind tunnel testing community for the assessment
of the predictive capability of a regression model. The second residual type is the so-called
PRESS residual of the data points. PRESS residuals are very useful as they are recommended in
the literature for the comparison of the predictive capability of different math models (see, e.g.,
the discussion of PRESS residuals in Ref. [X-3], pp.125-126 and pp.141-142). The third residual
type is defined by a set of confirmation points that is independent of the calibration data set.
These residuals are computed as the difference between the measured and fitted load at a
confirmation point. Three test metrics may be computed from the three residual types:
(1) The standard deviation of the load residuals of data points,
(2) The standard deviation of the PRESS residuals of data points,
(3) The standard deviation of the load residuals of confirmation points.
These three standard deviations can easily be compared with the accuracy requirements of the
balance load predictions in order to decide if the calibration data and the math model meet
expectations. The standard deviation of the load residuals of the confirmation points is the most
important test metric as the math model's predictive performance is explicitly tested at points that
were not used to perform the global regression analysis of the original calibration data.
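The first two test metrics can be sketched with assumed example data as shown below. One common formulation computes the PRESS residual of data point i as the ordinary residual divided by (1 - hᵢᵢ), where hᵢᵢ is the i-th diagonal element of the hat matrix; this avoids refitting the model n times with one point left out.

```python
import numpy as np

# Assumed calibration data: a single load F and a quadratic response.
rng = np.random.default_rng(2)
F = rng.uniform(-1.0, 1.0, size=30)                     # applied load
y = 1.8 * F + 0.4 * F**2 + 0.02 * rng.normal(size=30)   # measured output

X = np.column_stack([np.ones_like(F), F, F**2])         # fitted math model
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ beta                                    # ordinary residuals
H = X @ np.linalg.inv(X.T @ X) @ X.T                    # hat matrix
press_resid = resid / (1.0 - np.diag(H))                # PRESS residuals

sd_resid = resid.std(ddof=1)
sd_press = press_resid.std(ddof=1)
print(f"std. dev. of load residuals:  {sd_resid:.5f}")
print(f"std. dev. of PRESS residuals: {sd_press:.5f}")
# Each PRESS residual is at least as large in magnitude as the ordinary
# residual, so the PRESS-based metric is the more conservative of the two.
```

The third metric, the standard deviation of the confirmation-point residuals, would be computed the same way from loads that were withheld from the fit.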
Sometimes, it may happen that certain load prediction accuracy requirements cannot be met.
This observation could have several explanations: (i) a suboptimal function class combination
was selected for the regression analysis of the balance calibration data that could not model
important characteristics of the calibration data; (ii) a suboptimal load schedule was used during
the calibration of the balance that omitted important loads or load combinations that are needed
for a proper characterization of the physical behavior of the balance; (iii) large load and/or gage
output measurement errors occurred that were not detected during the calibration of the balance;
(iv) balance design characteristics (like, e.g., bolted joints) cause hysteresis effects that cannot
easily be modeled using a global regression analysis approach. A detailed examination of the
math model, the load schedule design, measurement error sources, and the balance design may
be needed in order to improve the analysis results and/or the calibration data so that the desired
accuracy of the load predictions can be met.
References
[X-1] AIAA/GTTC Internal Balance Technology Working Group, “Recommended Practice,
Calibration and Use of Internal Strain-Gage Balances with Application to Wind Tunnel Testing,”
AIAA R-091-2003, sponsored and published by the American Institute of Aeronautics and
Astronautics, Reston, Virginia, 2003.
[X-2] Ulbrich, N. and Volden, T., “Strain-Gage Balance Calibration Analysis Using Automatically
Selected Math Models,” AIAA 2005-4084, paper presented at the 41st AIAA/ASME/SAE/ASEE
Joint Propulsion Conference and Exhibit, Tucson, Arizona, July 2005.
[X-3] Montgomery, D. C., Peck, E. A., and Vining, G. G., Introduction to Linear Regression
Analysis, 4th ed., John Wiley & Sons, Inc., New York, 2006, p.84, pp.125-126, pp.141-142,
pp.323-341.
[X-4] Ulbrich, N., and Bader, J., “Analysis of Sting Balance Calibration Data Using Optimized
Regression Models,” AIAA 2009-5372, paper presented at the 45th AIAA/ASME/SAE/ASEE Joint
Propulsion Conference and Exhibit, Denver, Colorado, August 2009, pp. 3-4.
[X-5] DeLoach, R. and Ulbrich, N., “A Comparison of Two Balance Calibration Model Building
Methods,” AIAA 2007-0147, paper presented at the 45th AIAA Aerospace Sciences Meeting and
Exhibit, Reno, Nevada, January 2007.
[X-6] Ulbrich, N. and Volden, T., “Regression Analysis of Experimental Data Using an Improved
Math Model Search Algorithm,” AIAA 2008-0833, paper presented at the 46th AIAA Aerospace
Sciences Meeting and Exhibit, Reno, Nevada, January 2008.
[X-7] Ulbrich, N., “Regression Model Optimization for the Analysis of Experimental Data,” AIAA
2009-1344, paper presented at the 47th AIAA Aerospace Sciences Meeting and Exhibit, Orlando,
Florida, January 2009.
[X-8] DeLoach, R., “The Role of Hierarchy in Response Surface Modeling of Wind Tunnel Data,”
AIAA 2010-0931, paper presented at the 48th AIAA Aerospace Sciences Meeting and Exhibit,
Orlando, Florida, January 2010.
[X-9] Ulbrich, N., “Optimization of Regression Models of Experimental Data using Confirmation
Points,” AIAA 2010-0930, paper presented at the 48th AIAA Aerospace Sciences Meeting and
Exhibit, Orlando, Florida, January 2010.