Contents
Program Objective
Using the Program
Short Description of Program Input
    Straight lines and polynomials
    More complex analyses
More Complete Description of Program Input
    𝑓(𝑥) values
    Constant Term
    Polynomial Order
    Dependent Variable
    Range of Predictions
    Other Entries
Running the Program
Output
    Numerical Output
    Graphical Output

Program Objective

The attached Excel file is a (linear) data analysis macro that will compute:
1. parameter best estimates
2. estimate of the experimental error
3. the predicted values of a model over a specified range
4. confidence intervals for the predicted values
5. single- or multi-point intervals for the average of new data
6. confidence intervals for the parameters
7. confidence regions for the parameters taken two at a time

This program can analyze polynomials of any order (including zero-order polynomials and straight lines) or any other linear model, that is, models of the form

$$y = a_0 + a_1 f_1(x) + a_2 f_2(x) + \cdots$$

For example, it can analyze the model

$$y = a \cos x + \frac{b}{\sinh x} + c \exp x + d\,\frac{\ln x}{x}$$

where the values 𝑎, 𝑏, 𝑐 and 𝑑 represent the first four values of 𝑎𝑖 in the first equation above. The program cannot analyze the model 𝑦 = 𝑎 cos(𝑏𝑥) or similar equations because 𝑏 is a nonlinear coefficient of this equation. In this context, a model is linear if its derivatives with respect to each of its coefficients do not depend on any of the coefficients. Otherwise, the model is nonlinear and requires a different analysis.

The program input should be self-explanatory. The following instructions provide details that may be less obvious.

Using the Program

To use the macro, go to the View menu and select View Macros or go to the Developer menu and select Macros. In either case, select what should be the only macro on the list (Linear Analysis).
If the Developer menu is not in the Excel workbook, use the View menu or go to the File menu, select Options, and then select the Customize Ribbon option. This will open a dialog box describing the current menu layout. In the right box, check the Developer box. This should add a Developer menu to the other menus at the top of the workbook.

Macros in general are potential sources of viruses and other nefarious things. Sometimes default security settings disable them. To resolve an error indicating that security is preventing the macro from running, go to the Developer menu and click on the Macro Security button. This will open a dialog box that indicates the current security settings and alternative options. Change (lower) the security so the macro will run. To make the changes take effect, save, exit, and reopen the workbook.

The first two headings under the Short Description of Program Input section below are the short version of how to use this program to analyze polynomials or other more complex models, respectively. The remainder of this document summarizes the details of the input and output.

Short Description of Program Input

Straight lines and polynomials

This analysis involves only a few steps:
1. Put the values of the independent variable, called 𝑥 here, in the range box labeled 𝑓(𝑥).
2. Put the dependent variable or measured data in the box labeled 𝑦.
3. Enter what order polynomial you want to fit (linear = 1, quadratic = 2, etc.).
4. Click “OK.”
5. That is it. The more detailed explanation below explains the rest of the boxes.

More complex analyses

The only difference in input if the model is not a polynomial is the x-data entry. For example, assume the model is the second equation above. This analysis requires one column each for cos 𝑥, 1/sinh 𝑥, exp 𝑥, and (ln 𝑥)/𝑥, each evaluated at each value of 𝑥 for which you have measured data. In this case, you do not need a column for 𝑥 itself.
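As an illustration of the columns this kind of model needs, here is a minimal Python sketch that evaluates the four functions at each measured 𝑥. The helper name `function_columns` and the sample 𝑥 values are hypothetical, and reading the example model as cos 𝑥, 1/sinh 𝑥, exp 𝑥, and (ln 𝑥)/𝑥 is itself an assumption. In practice the same columns would be built with worksheet formulas.

```python
import math


def function_columns(x_values):
    """Hypothetical helper: one row per measured x, one column per model function."""
    rows = []
    for x in x_values:
        rows.append([
            math.cos(x),         # column multiplied by a
            1.0 / math.sinh(x),  # column multiplied by b
            math.exp(x),         # column multiplied by c
            math.log(x) / x,     # column multiplied by d
        ])
    return rows


# Made-up x values; x must be positive so that ln x and 1/sinh x are defined.
x_measured = [0.5, 1.0, 1.5, 2.0]
for row in function_columns(x_measured):
    # Tab-separated text can be pasted into four adjacent worksheet columns.
    print("\t".join(f"{value:.6f}" for value in row))
```

Note that 𝑥 itself never appears as a column; only the function values do.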
If the model included a term that is a constant times 𝑥, you would have a column of 𝑥 values as well. If there is a constant term, not multiplied by any function of 𝑥, check the box that says include constant term. Enter the data in the “x or f(x)” box by highlighting the four columns and n rows that contain these four functions of x.

More Complete Description of Program Input

The macro begins by preparing the workbook for the analysis. This includes several steps:
1. Removing the graphs and worksheets associated with previous analyses. The macro will search for any charts or worksheets that contain key parts of the default names it gives to the worksheets and charts it creates. If it finds one or more, it warns the user they are about to be deleted. If the user wants to preserve those results, which would not be unusual, the user should cancel the macro at this point and rename them. Otherwise, the macro deletes them.
2. Moving to a worksheet. After deleting the charts and worksheets that it may have previously created, the macro moves to the next available worksheet on which it expects to receive data. Data can be entered from any worksheet, but not from charts, etc.
3. Opening a dialog box seeking the following input. There are default values for each entry that can have a default value.

𝑓(𝑥) values

The first or top box should contain a range specification for the values of the independent variable (x). The cursor should initially be located here. Clicking in this box shrinks the dialog box so the workbook is visible. Move to the worksheet containing the data if it is not already visible and highlight the x values. The data should appear in columns on the spreadsheet, not in rows. Data entered in this box affects the remainder of the macro in important ways, as described below:
1. If the user selects a single column of data, the macro treats these data as x-values and performs the statistical correlation on them.
   The user can specify that the correlation should involve a polynomial of any order, but the data are treated as values of x in the polynomial.
2. If the user selects two or more columns of data, the macro treats the columns as functions of x or functions of x, y, z, etc. That is, the data are considered to be such things as sin(x), exp(x), 1/x, etc. For example, if the model equation includes temperature, pressure, and one mole fraction, there would be three data columns, one each for temperature, pressure, and mole fraction, with one entry per row giving the values of temperature, pressure, and mole fraction that correspond to each measured data point. In this case, the macro does not know the actual values of x. It only knows the values of the functions of x. For this reason, subsequent plots and analyses can only be done at the specified values of x, not at intermediate values as is done if only x is specified. If this option is used (two or more columns of data), the order of the polynomial cannot be set and that box is disabled.

Constant Term

If the model includes a constant term, such as an intercept in the equation for a straight line, either enter a column of ones as one of the columns of data (think of this as x^0 in the multicolumn entry) or check the box that indicates the model includes a constant. If this box is checked, the first parameter in the list of parameters computed is the constant term. For a traditional polynomial fit with a constant, enter a single column of data, check the constant box, and put the order of the polynomial in the polynomial order box. Alternatively, enter a column of ones and a series of columns containing 𝑥^𝑛, with 𝑛 ranging from one to the order of the polynomial model, and leave the polynomial order box blank.
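The two equivalent ways of entering a polynomial with a constant term can be illustrated with a short Python sketch. It expands a single column of 𝑥 into the columns 1, 𝑥, 𝑥^2, ... and then solves the least-squares normal equations for the parameter best estimates. The function names and the sample data are hypothetical, and the normal-equation solver is a generic textbook approach, not necessarily the method the macro uses internally.

```python
def design_matrix(x_values, order, include_constant=True):
    """Expand one column of x into columns x^1..x^order, optionally preceded by a column of ones."""
    start = 0 if include_constant else 1
    return [[x ** k for k in range(start, order + 1)] for x in x_values]


def solve_normal_equations(X, y):
    """Least-squares coefficients from (X^T X) b = X^T y, by Gaussian elimination with partial pivoting."""
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        # Bring the largest remaining entry in this column onto the diagonal.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b


# Made-up data generated from y = 1 + 2x + 3x^2 (no noise).
x = [0, 1, 2, 3, 4]
y = [1, 6, 17, 34, 57]
X = design_matrix(x, order=2, include_constant=True)
coefficients = solve_normal_equations(X, y)
print([round(value, 6) for value in coefficients])
```

With the constant included, the first coefficient returned is the constant term, matching the parameter ordering described above. Calling `design_matrix` with `include_constant=False` gives the columns for a model without an intercept.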
Polynomial Order

In the case of a polynomial in one variable, such as temperature, the data can include one column for T, one for T^2, one for T^3 and so on up to the order of the polynomial model. Alternatively and more conveniently, the dialog box can specify that the model is a polynomial of a given order. In this case, the program only needs one column of data (T) and the code computes the higher order values. The indicated box accepts the order of the polynomial. The code can provide more continuous correlations and error analyses if the polynomial order is set with a single column of numbers rather than by choosing several columns of numbers.

Dependent Variable

The dependent variable (y values) appears as a range in the third box. It must be in a single column and have the same number of rows as the x values. These are the measured results.

Range of Predictions

A multiplicative factor sets the range of the independent variable (f(x) values) over which to make predictions. For example, a value of 1.2 makes predictions over a range 20% larger than the range of the measured values. The expanded range is centered on the data, so the predictions extend 10% of the data range below the lowest and 10% above the highest measured value of the independent variable.

Other Entries

The other boxes indicate what the code should predict. One logical use of them is to leave them all active and to delete any superfluous information from the charts. They include options for
1. predicting the confidence interval for the predicted values,
2. predicting the single- or multi-point confidence interval for additional data and the number of points for which these should be computed,
3. predicting the parameter confidence intervals,
4. predicting the parameter joint confidence regions,
5. indicating the confidence level at which these intervals and regions are computed.
Most of these are check boxes.
The exceptions are the confidence level, which should be a fraction or a percent, and the number of points used to compute the new data confidence intervals. The latter number most commonly would be one. If it is greater than one, the program computes the confidence interval in which the average of that many newly measured y values at a single x value would be expected to lie.

Running the Program

To begin computations, click the OK button. The macro checks the consistency of the input data and reports any problems it finds. If it finds none, it computes the statistics for the data and writes the results in worksheets and charts. It runs efficiently and should return results almost instantly, even for very large data sets. The biggest factors in how long it runs are the number of parameters in the model and the time it takes to build the charts.

Output

The program deletes charts or worksheets with the same names as the ones the program will create. It warns about this in a dialog box prior to deleting them and offers a chance to exit the program before doing so. To preserve a previous set of results or to avoid the prompts about deleting them, rename the charts and worksheets prior to each new analysis.

Numerical Output

A worksheet entitled Data Summary summarizes all of the data and statistical results. These appear in several groups of columns, separated by blank columns, as described below for each group of columns.
1. xexp - The values of the independent variables 𝑓(𝑥) that the program uses in the computations. If multiple columns of 𝑥 data are selected, this column is replaced by the predicted 𝑦 values.
2. yexp - The values of the dependent variable 𝑦.
3. ypred - The predicted values of the dependent variable at each value of the independent variable.
4. resid - The residuals (difference between predicted and measured values) at each value of the independent variable.
5. est std dev - The estimated experimental error or standard deviation in the data.
6. coefficients - The best estimates of the parameters ordered in a column, with one row for each parameter. The parameters appear in the same order as the x-value columns or, in the case of a polynomial, in order of increasing power of 𝑥. If there is a constant in the model, the first parameter is the constant.
7. coef std error - The standard error for each coefficient or parameter.
8. p 95% conf interval +/- - The parameter confidence interval at the confidence level specified in the input (95% in this case).
9. xpred - The values of the independent variables used in the predictions. These, unlike the input values of the independent variable, are generally equally spaced and cover the range specified in the input.
10. ypred - The predicted values over the same range as xpred.
11. s err ypred +/- - The standard error in the prediction as a function of xpred.
12. Pred 0.95 CI +/- - The predicted confidence interval for the mean over the same range. This represents the interval from the mean, not the limit. That is, the range of uncertainty is obtained by adding (upper limit) and subtracting (lower limit) this column from the predicted values given in the column ypred.
13. b1 JCR 0.95, b2+ JCR 0.95, and b2- JCR 0.95 - The ranges of values that define a joint confidence region for a pair of parameters. For example, b1 represents the first parameter and b2 the second parameter. These entries come in groups of three columns. The first column contains the range of values for one parameter and would normally be plotted as the x-value on a plot. The second and third columns are the two ranges of the second parameter and would normally be plotted as two y values that, when combined, form an ellipse. For example, b2+ JCR 0.95 is the upper half of the ellipse and b2- JCR 0.95 the lower half of the ellipse, each of which contains values for parameter b2 that correspond to the value of parameter b1 in the same row.
    All combinations of parameters taken two at a time are included in a series of columns, three columns for each set of two parameters. In general, the joint confidence region is a p-dimensional ellipsoid. However, such ellipsoids are difficult to display if p is 3 and impossible to display if p is larger than 3. Therefore, the program plots a series of projections of the p-dimensional ellipsoid onto a 2-dimensional space. That is, each series represents the 2-dimensional shadow that the p-dimensional shape would cast in the 2-dimensional space of just two parameters.

Graphical Output

The program also generates the following graphs:
1. The data with the prediction, the confidence interval for the prediction, and the single- or multi-point confidence interval for where the average of q additional measurements would lie. The value of q is the one entered in the dialog box. This plot depends continuously on 𝑥, with continuous confidence intervals, if a single column of 𝑥 data is entered. If multiple 𝑥 data columns are entered, this plot is a parity plot, or a plot of predicted 𝑦 vs. measured 𝑦, since the macro does not know what the values of 𝑥 are or how they change between the measured values. Similarly, the confidence limits, etc. appear as points at the discrete values of the measured data rather than as lines as a function of 𝑥.
2. A plot of the residuals. If there is more than one independent variable, the residuals are plotted as a function of the measured values. Otherwise, they appear as a function of 𝑥. The residuals should not show any obvious pattern with respect to their average, which will be zero, and should be normally distributed about this value.
3. A series of plots for the parameters showing the confidence intervals and joint confidence regions. There are two joint confidence regions, one at the confidence level specified in the dialog box and one at this value squared.
   The second of these is more comparable to the individual confidence intervals taken together (that is, the probability that both parameters lie between their limits at some confidence level is (confidence level)^2).
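As a small arithmetic illustration of the two levels mentioned above, the following Python sketch returns the specified confidence level and its square. The function name is hypothetical, and the sketch assumes the level is given as a fraction rather than a percent.

```python
def joint_region_levels(confidence_level):
    """Return the two levels for the joint regions: the level itself and its square."""
    # Assumption: the level is a fraction such as 0.95, not a percent such as 95.
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence level must be a fraction between 0 and 1")
    return confidence_level, confidence_level ** 2


level, squared = joint_region_levels(0.95)
print(f"{level:.4f} {squared:.4f}")  # prints 0.9500 0.9025
```

For the default 95% level, the second region is therefore drawn at roughly the 90% level.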