ChEn 477 Linear Analysis VBA Tool Instructions (Dr. Baxter)

Contents
Program Objective
Using the Program
Short Description of Program Input
  Straight lines and polynomials
  More complex analyses
More Complete Description of Program Input
  𝑓(𝑥) values
  Constant Term
  Polynomial Order
  Dependent Variable
  Range of Predictions
  Other Entries
Running the Program
Output
  Numerical Output
  Graphical Output
Program Objective
The attached Excel file is a (linear) data analysis macro that will compute:
1. parameter best estimates
2. estimate of the experimental error
3. the predicted values of a model over a specified range
4. confidence intervals for the predicted values
5. single- or multi-point intervals for the average of new data
6. confidence intervals for the parameters
7. confidence regions for the parameters taken two at a time
This program can analyze polynomials of any order (including straight lines, or first-order polynomials) or
any other linear model, that is, models of the form
$$y = a_0 + a_1 f_1(x) + a_2 f_2(x) + \cdots$$
For example, it can analyze the model
$$y = a \cos x + \frac{b}{\sinh x} + c \exp x + d\,\frac{\ln x}{x}$$
where the values 𝑎, 𝑏, 𝑐 and 𝑑 represent the first four values of 𝑎𝑖 in the first equation above. The program
cannot analyze the model
𝑦 = 𝑎 cos(𝑏𝑥)
or similar equations because 𝑏 is a nonlinear coefficient of this equation. In this context, a model is linear
if its derivatives with respect to each of its coefficients do not depend on any of the coefficients.
Otherwise, the model is nonlinear and requires a different analysis.
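As a worked check of this definition, the partial derivatives of the two example models can be compared. The sketch below takes the linear example as y = a cos x + b/sinh x + c exp x + d (ln x)/x, which is the form implied by the column list given later in this document; that reading is an assumption.

```latex
% Linear example: no derivative contains a coefficient
\frac{\partial y}{\partial a} = \cos x, \qquad
\frac{\partial y}{\partial b} = \frac{1}{\sinh x}, \qquad
\frac{\partial y}{\partial c} = \exp x, \qquad
\frac{\partial y}{\partial d} = \frac{\ln x}{x}

% Nonlinear example, y = a \cos(bx): both derivatives depend on a coefficient
\frac{\partial y}{\partial a} = \cos(bx), \qquad
\frac{\partial y}{\partial b} = -a\,x\,\sin(bx)
```

In the first model every derivative is a function of x alone, so the model is linear in its coefficients. In the second, the derivatives contain a and b, so it is not.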
The program input should be self-explanatory. The following instructions should provide details that may
be less obvious.
Using the Program
To use the macro, go to the View menu and select View Macros or go to the Developer menu and select
Macros. In either case, select what should be the only macro on the list (Linear Analysis). If the Developer
menu is not in the Excel workbook, use the View menu or go to the File menu, select Options, and then
select the Customize Ribbon option. This opens a dialog box describing the current menu layout. In
the right box, check the Developer box. This should add a Developer menu to the other menus at the top
of the workbook.
Macros in general are potential sources of viruses and other nefarious things. Sometimes default security
settings disable them. To resolve an error indicating security is preventing the macro from running, go to
the Developer menu and click on the Macro Security button. This will open a dialog box that indicates the
current security settings and alternative options. Change (lower) the security so the macro will run. To
make the changes take effect, save, exit, and reopen the workbook.
The first two headings under the Short Description of Program Input section below are the short version of
how to use this program to analyze polynomials or other more complex models, respectively. The
remainder of this document summarizes the details of the input and output.
Short Description of Program Input
Straight lines and polynomials
This analysis involves only a few steps:
1. Put the values of the independent variable, called 𝑥 here, in the range box labeled 𝑓(𝑥).
2. Put the dependent variable or measured data in the box labeled 𝑦.
3. Enter what order polynomial you want to fit (linear = 1, quadratic = 2, etc.).
4. Click “OK.”
5. That is it.
The more detailed explanation below explains the rest of the boxes.
More complex analyses
The only difference in input if the model is not a polynomial is the x-data entry. For example, assume the
model is the second equation above. This analysis requires one column each for cos 𝑥, 1/sinh 𝑥, exp 𝑥, and
(ln 𝑥)/𝑥, each evaluated at each value of 𝑥 for which you have measured data. In this case, you do not
need a column for 𝑥 itself. If the model included a term that is a constant times 𝑥, you would have a
column of 𝑥 values as well. If there is a constant term, not multiplied by any function of 𝑥, check the
box labeled include constant term. Enter the data in the “x or f(x)” box by highlighting the four
columns and n rows that contain these four functions of 𝑥.
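The column layout described above can be sketched outside Excel. The example below is a minimal illustration, assuming the model y = a cos x + b/sinh x + c exp x + d (ln x)/x and solving the least-squares normal equations in plain Python. The function names, the sample x values, and the coefficients are hypothetical, and the macro's internal algorithm is not documented here, so this is a stand-in rather than the macro's method.

```python
import math

def design_row(x):
    """The four f(x) columns for one measured x (assumed model form)."""
    return [math.cos(x), 1.0 / math.sinh(x), math.exp(x), math.log(x) / x]

def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def fit_linear_model(X, y):
    """Least-squares coefficients from (X^T X) beta = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear(XtX, Xty)

# Hypothetical noise-free data generated from known coefficients a, b, c, d.
x_data = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
true_coeffs = [2.0, -1.0, 0.5, 3.0]
X = [design_row(x) for x in x_data]          # n rows by four columns
y_data = [sum(c * f for c, f in zip(true_coeffs, row)) for row in X]
coeffs = fit_linear_model(X, y_data)
```

Each row of X corresponds to one worksheet row, and each column to one highlighted column of function values. Note that 𝑥 itself never appears as a column.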
More Complete Description of Program Input
The macro begins by preparing the workbook for the analysis. This includes several steps:
1. Removing the graphs and worksheets associated with previous analyses. The macro will search
for any charts or worksheets that contain key parts of the default names it gives to the worksheet
and charts it creates. If it finds one or more, it warns the user they are about to be deleted. If the
user wants to preserve those results, which would not be unusual, the user should cancel the
macro at this point and rename them. Otherwise, the macro deletes them.
2. Moving to a worksheet. After deleting the charts and worksheets that it may have previously
created, the macro moves to the next available worksheet on which it expects to receive data.
Data can be entered from any worksheet, but not from charts, etc.
3. Opening a dialog box seeking the following input. Every entry for which a default makes sense is
prefilled with a default value.
𝑓(𝑥) values
The first or top box should contain a range specification for the values of the independent variable (x).
The cursor should initially be located here. Clicking in this box shrinks the dialog box so the workbook is
visible. Move to the worksheet containing the data if it is not already visible and highlight the x values.
The data should appear in columns on the spreadsheet, not in rows. Data entered in this box affects the
remainder of the macro in important ways, as described below:
1. If the user selects a single column of data, the macro treats these data as x-values and fits the
model to them. The user can specify that the fit should involve a
polynomial of any order, but the data are considered to be selected values of x in the polynomial.
2. If the user selects two or more columns of data, the macro treats the columns as functions of x or
functions of x, y, z, etc. That is, the data are considered to be such things as sin(x), exp(x), 1/x, etc.
For example, if the model equation includes temperature, pressure, and one mole fraction, there
would be three data columns, one each for temperature, pressure, and mole fraction, and with
one entry per row for the values of temperature, pressure, and mole fraction corresponding to
each value of the independent variable, or the measured data point. In this case, the macro does
not know the actual values of x. It only knows the values of the functions of x. For this reason,
subsequent plots and analyses can only be done at the specified values of x, not at intermediate
values as is done if only x is specified. If this option is used (two or more columns of data), the
order of the polynomial cannot be set and that box is disabled.
Constant Term
If the model includes a constant term, such as an intercept in the equation for a straight line, either enter
a column of ones as one of the columns of data (think of this as 𝑥⁰ in the multicolumn entry) or check the
box that indicates the model includes a constant. If this box is checked, the first parameter in the list of
parameters computed is the constant term. For a traditional polynomial fit with a constant, enter a single
column of data, check the constant box, and put the order of the polynomial in the polynomial order box.
Alternatively, enter a column of 1s and a series of columns containing 𝑥ⁿ with 𝑛 ranging from one to the
order of the polynomial model and leave the polynomial order box blank.
Polynomial Order
In the case of a polynomial in one variable, such as temperature, the data can include one column for T,
one for T², one for T³ and so on up to the order of the polynomial model. Alternatively and more
conveniently, the dialog box can specify that the model is a polynomial of a given order. In this case, the
program only needs one column of data (T) and the code computes the higher order values. The indicated
box accepts the order of the polynomial. The code can provide more continuous correlations and error
analyses if the polynomial order is set with a single column of numbers rather than choosing several
columns of numbers.
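The equivalence between the two ways of entering a polynomial can be illustrated with a short sketch. It expands a single column of T values into the columns T, T², ..., optionally preceded by a column of ones for the constant term. The function name and arguments are hypothetical and are not taken from the macro.

```python
def polynomial_columns(t_values, order, include_constant=True):
    """Expand one column of T values into rows of [1, T, T^2, ..., T^order].

    The leading 1 (T^0) is present only when include_constant is True,
    mirroring the constant-term check box.
    """
    start = 0 if include_constant else 1
    return [[t ** k for k in range(start, order + 1)] for t in t_values]

# A cubic with a constant term: each row is [1, T, T^2, T^3].
rows = polynomial_columns([1.0, 2.0, 3.0], order=3)
```

Entering these columns by hand and leaving the order box blank describes the same model as entering the single T column with the order set to 3, although the document notes that only the single-column form gives continuous curves.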
Dependent Variable
The dependent variable (y values) appears as a range in the third box and must be in a single column and
have the same number of rows as the x values. These are the measured results.
Range of Predictions
A multiplicative factor indicates over what range of the independent variable (f(x) values) to make predictions.
For example, a value of 1.2 tells the program to predict values over a range 20% larger than that of the
measured values. The expanded range is centered on the data, so the predictions extend 10% beyond each
end of the measured range of the independent variable.
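The arithmetic behind the range factor can be sketched as follows. This assumes the extra 20% is measured relative to the span of the measured x values and split evenly between the two ends. The function name and the number of prediction points are hypothetical.

```python
def prediction_range(x_values, factor=1.2, n_points=5):
    """Equally spaced prediction points over a range `factor` times the data span.

    The widened range is centered on the data, so half of the extra
    width is added at each end.
    """
    lo, hi = min(x_values), max(x_values)
    center = 0.5 * (lo + hi)
    half_width = 0.5 * factor * (hi - lo)
    start = center - half_width
    step = 2.0 * half_width / (n_points - 1)
    return [start + i * step for i in range(n_points)]

# Data spanning 0 to 10 with a factor of 1.2: predictions run from about -1 to 11.
x_pred = prediction_range([0.0, 2.5, 7.0, 10.0], factor=1.2, n_points=5)
```

A factor of 1.0 would restrict predictions to the measured range, and values above 1.0 extrapolate beyond it.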
Other Entries
The other boxes indicate what the code should predict. One logical use of them is to leave them all active
and to delete any superfluous information from the charts. They include options for
1. predicting the confidence interval for the predicted values,
2. predicting the single- or multi-point confidence interval for additional data and the number of
points for which these should be computed,
3. predicting the parameter confidence intervals,
4. predicting the joint confidence regions for the parameters,
5. indicating the confidence level at which these intervals and regions are computed.
Most of these are check boxes. The exceptions are the confidence level, which should be a fraction or a
percent, and the number of points used to compute the new data confidence intervals. The latter number
most commonly would be one. If it is greater than one, the program computes the confidence interval in
which the average of that many newly measured y values at a single x value would be expected to lie.
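For reference, the two kinds of interval can be written in the standard textbook form for linear least squares. These expressions are not quoted from the macro, and the symbols are assumptions: X is the matrix of f(x) columns, x_0 the row of function values at the prediction point, s the estimated standard deviation, n the number of data points, p the number of parameters, and q the number of new points.

```latex
% Confidence interval for the predicted (mean) value at x_0
\hat{y}_0 \pm t_{\alpha/2,\,n-p}\; s \sqrt{x_0^{T} (X^{T}X)^{-1} x_0}

% Interval for the average of q new measurements at x_0
\hat{y}_0 \pm t_{\alpha/2,\,n-p}\; s \sqrt{\frac{1}{q} + x_0^{T} (X^{T}X)^{-1} x_0}
```

With q = 1 the second expression is the usual single-point prediction interval. As q grows, the 1/q term shrinks and the interval approaches the confidence interval for the predicted value.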
Running the Program
To begin computations, click the OK button. The macro checks consistency of the input data and indicates
problems if there is an issue. If it finds none, it computes the statistics for the data and writes the results
in worksheets and charts. It runs efficiently and should return results almost instantly, even for very large
data sets. The biggest factors in how long it runs are the number of parameters in the model and time it
takes to build the charts.
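The document does not list the macro's consistency checks. The sketch below shows the kind of checks implied by the input requirements stated earlier (y in a single column with the same number of rows as x, and a polynomial order only for a single x column). The function name, arguments, and messages are all hypothetical.

```python
def check_inputs(x_columns, y_values, polynomial_order=None):
    """Return a list of problem descriptions; an empty list means the input looks consistent.

    x_columns is a list of columns (each a list of numbers) and
    y_values is a single column of measured values.
    """
    if not x_columns or not x_columns[0]:
        return ["no x data selected"]
    problems = []
    n_rows = len(x_columns[0])
    if any(len(col) != n_rows for col in x_columns):
        problems.append("x columns have different numbers of rows")
    if len(y_values) != n_rows:
        problems.append("y must have the same number of rows as x")
    if len(x_columns) > 1 and polynomial_order is not None:
        problems.append("polynomial order applies only to a single x column")
    return problems

ok = check_inputs([[1.0, 2.0, 3.0]], [2.0, 4.0, 6.0], polynomial_order=1)
bad = check_inputs([[1.0, 2.0, 3.0]], [2.0, 4.0])
```

Collecting every problem before reporting, rather than stopping at the first, lets all input issues be fixed in one pass.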
Output
The program deletes charts or worksheets with the same names as the ones the program will create. It
warns about this in a dialog box prior to deleting them and offers a chance to exit the program before
doing so. To preserve a previous set of results or to avoid the prompts about deleting them, rename the
charts and worksheets prior to each new analysis.
Numerical Output
A worksheet entitled Data Summary summarizes all of the data and statistical results. These appear in
several groups of columns, separated by blank columns, as described below for each group of columns.
1. xexp - The values of the independent variables 𝑓(𝑥) that the program uses in the computations.
If multiple columns of 𝑥 data are selected, this column is replaced by the predicted
𝑦 values.
2. yexp - The values of the dependent variable 𝑦.
3. ypred - The predicted values of the dependent variable at each value of the independent variable.
4. resid - The residuals (difference between the predicted and measured values) at each value of the
independent variable.
5. est std dev - The estimated experimental error or standard deviation in the data.
6. coefficients - The best estimates of the parameters ordered in a column, with one row for each
parameter. The parameters appear in the same order as the x-values appear or, in the case of a
polynomial, in the order of increasing order of 𝑥. If there is a constant in the model, the first
parameter is for the constant.
7. coef std error - The standard error for each coefficient or parameter.
8. p 95% conf interval +/- - The parameter confidence interval at the confidence level specified in the
input (95% in this case).
9. xpred - The values of the independent variables used in the predictions. These, unlike the input
values of the independent variable, are generally equally spaced and cover the range specified in
the input.
10. ypred - The predicted values over the same range as xpred.
11. s err ypred +/- - The standard error in the prediction as a function of xpred.
12. Pred 0.95 CI +/- - The predicted confidence interval for the mean over the same range. This
represents the interval from the mean, not the limit. That is, the range of uncertainty is obtained
by adding (upper limit) and subtracting (lower limit) this column from the predicted values given
in the column ypred.
13. b1 JCR 0.95, b2+ JCR 0.95, and b2- JCR 0.95 - The joint confidence region for a pair of parameters. For example,
b1 represents the first parameter and b2 would represent the second parameter. These entries
come in groups of three columns. The first column contains the range of values for one parameter
and would normally be plotted as an x-value on a plot. The second and third columns are the two
ranges of the second parameter and would normally be plotted as two y values that, when
combined, form an ellipse. For example, b2+ JCR 0.95 is the upper half of the ellipse and b2- JCR
0.95 the lower half of the ellipse, each of which contains values for parameter b2 that correspond
to the value of parameter b1 in the same row. All combinations of parameters taken two at a time
are included in a series of columns, three columns for each set of two parameters. In general, the
joint confidence region is a p-dimensional ellipsoid. However, such ellipsoids are difficult to
display if p is 3 and impossible to display if p is larger than 3. Therefore, the program plots a series
of projections of the p-dimensional ellipsoid onto a 2-dimensional space. That is, each series
represents the 2-dimensional shadow that the p-dimensional shape would cast in the
2-dimensional space of just two parameters.
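The three-column layout for a joint confidence region (one column of b1 values, then the upper and lower b2 values) can be sketched for a generic ellipse. The example assumes the ellipse is given as a quadratic form d^T A d = c about the best estimates, where d is the offset from the estimates. How the macro obtains A and c is not stated in this document, so the function name and inputs are hypothetical.

```python
import math

def ellipse_halves(center, A, c, n_points=5):
    """Rows of (b1, b2_upper, b2_lower) tracing the ellipse d^T A d = c.

    center is (b1_hat, b2_hat); A is a symmetric positive-definite
    2x2 matrix given as ((a11, a12), (a12, a22)).
    """
    b1_hat, b2_hat = center
    (a11, a12), (_, a22) = A
    det = a11 * a22 - a12 * a12
    d1_max = math.sqrt(a22 * c / det)      # largest offset in b1 on the ellipse
    rows = []
    for i in range(n_points):
        d1 = -d1_max + 2.0 * d1_max * i / (n_points - 1)
        # Solve a22*d2^2 + 2*a12*d1*d2 + (a11*d1^2 - c) = 0 for d2.
        disc = max(0.0, (a12 * d1) ** 2 - a22 * (a11 * d1 * d1 - c))
        root = math.sqrt(disc)
        upper = (-a12 * d1 + root) / a22
        lower = (-a12 * d1 - root) / a22
        rows.append((b1_hat + d1, b2_hat + upper, b2_hat + lower))
    return rows

# Simplest case: a unit circle centered at the origin.
circle = ellipse_halves((0.0, 0.0), ((1.0, 0.0), (0.0, 1.0)), 1.0, n_points=5)
```

Plotting the second and third entries against the first gives the two halves that close into an ellipse, which matches the b1, b2+, and b2- column grouping described above.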
Graphical Output
The program also generates the following graphs:
1. The data with the prediction, the confidence interval for the prediction, and the single- or
multi-point confidence interval for where the average of q additional measurements would lie. The value
of q is the one entered in the dialog box. This plot depends continuously on 𝑥 with continuous
confidence intervals if a single column of 𝑥 data is entered. If multiple 𝑥 data columns are
entered, this plot is a parity plot, or a plot of predicted 𝑦 vs. measured 𝑦, since the macro does not
know what the values of 𝑥 are or how they change between the measured values. Similarly, the
confidence limits, etc. appear as points at the discrete values of the measured data rather than
as lines as a function of 𝑥.
2. A plot of the residuals as a function of 𝑥. If there is more than one independent variable the
residuals are plotted as a function of the measured values. Otherwise, they appear as a function
of 𝑥. These should not show any obvious pattern with respect to their average, which will be zero,
and should be normally distributed about this value.
3. A series of plots for the parameters showing the confidence intervals and joint confidence regions.
There are two joint confidence regions, one each at the confidence level specified in the dialog
box and at this value squared. The second of these is more comparable to the union of the
confidence intervals (that is, the probability that both parameters lie between their limits at some
confidence level is (confidence level)²; for example, 0.95² = 0.9025).
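The residual check described in item 2 above can be reproduced by hand for a straight line. The sketch below uses the closed-form least-squares slope and intercept, takes the residual as measured minus predicted (the sign convention is an assumption), and estimates the standard deviation as the square root of the residual sum of squares divided by n - 2. The data values are made up for illustration.

```python
import math

def straight_line_fit(x, y):
    """Closed-form least-squares intercept and slope for y = a0 + a1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical measurements.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = straight_line_fit(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
mean_residual = sum(residuals) / len(residuals)

# Two fitted parameters, so n - 2 degrees of freedom.
est_std_dev = math.sqrt(sum(r * r for r in residuals) / (len(x) - 2))
```

Because the model includes a constant term, the residuals average to zero up to rounding error. A visible trend or curvature in them would suggest that the model is missing a term.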