CHEG REU Polymath Regression

POLYMATH WORKSHOP - Regression and Data Analysis

Data Table

The data table is used for input, manipulation and storage of numerical data. The data are stored column-wise: every column is associated with a name (variable) and can be addressed separately. The stored data can be regressed (fitting a straight line, various curves and equations to the data using multiple linear, polynomial and nonlinear regression techniques), analyzed (interpolated, differentiated, integrated, and summarized with various statistics) and plotted.

Linear & Polynomial Regression

This part of the program will fit a polynomial of the form:

P(x) = a0 + a1*x + a2*x^2 + ... + an*x^n

where a0, a1, ..., an are regression parameters, to a set of N tabulated values of x (independent variable) versus y (dependent variable). The highest degree allowed for a polynomial is N - 1 (thus n <= N - 1). The program calculates the coefficients a0, a1, ..., an by minimizing the sum of squares of the deviations between the calculated P(x) above and the corresponding value of y for each value of x.

Multiple Linear Regression

This part of the program will fit a linear function of the form:

y(x1, x2, ..., xn) = a0 + a1*x1 + a2*x2 + ... + an*xn

where a0, a1, ..., an are regression parameters, to a set of N tabulated values of x1, x2, ..., xn (independent variables) versus y (dependent variable). Note that the number of data points must be at least n + 1 (thus N >= n + 1). The program calculates the coefficients a0, a1, ..., an by minimizing the sum of squares of the deviations between the calculated values and the data for y.

Nonlinear Regression

This part of the program will fit a nonlinear function of the form:

y = f(x1, x2, ..., xn, a0, a1, a2, ..., am)

where a0, a1, ..., am are regression parameters, to a set of N tabulated values of x1, x2, ..., xn (independent variables) versus y (dependent variable).
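As an aside on the polynomial and multiple linear regression sections above: the least-squares criterion they describe can be sketched in a few lines of plain Python. This is only an illustration of the idea (solving the normal equations), not a description of how POLYMATH computes its results. The function names solve, least_squares and polyfit are hypothetical, and the numbers are generated from known coefficients rather than taken from any data set in this workshop.

```python
def solve(a, b):
    """Solve the square system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares(columns, y):
    """Return [a0, a1, ..., an] minimizing sum((a0 + a1*x1 + ... + an*xn - y)^2).

    columns is a list of n data columns (x1, x2, ..., xn), each of length N.
    """
    rows = [[1.0] + list(xs) for xs in zip(*columns)]  # leading 1.0 multiplies a0
    k = len(rows[0])                                   # number of parameters, n + 1
    if len(rows) < k:
        raise ValueError("need at least n + 1 data points")
    # Normal equations: (X^T X) a = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def polyfit(x, y, degree):
    """Polynomial regression as multiple linear regression on x, x^2, ..., x^degree."""
    return least_squares([[xi ** d for xi in x] for d in range(1, degree + 1)], y)


# Illustrative data generated from y = 1 + 2*x + 3*x^2 (not workshop data).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0 + 2.0 * xi + 3.0 * xi ** 2 for xi in x]
print(polyfit(x, y, 2))  # coefficients a0, a1, a2
```

Treating each power of x as its own column shows why polynomial regression is a special case of multiple linear regression, and why the limits n <= N - 1 and N >= n + 1 are the same requirement: there must be at least as many data points as parameters.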
Note that, for nonlinear regression, the number of data points must be at least m + 1 (thus N >= m + 1).

POLYMATH Problem Data Set

The following table presents vapor pressure versus temperature for benzene.

For POLYMATH:
Let TC = Temperature (°C)
Let P = Pressure (mm Hg)

ENTER BENZENE DATA SET INTO POLYMATH

Linear Regression (Polynomial of degree 1)

General:
P(x) = a0 + a1*x

Problem 1 (Linear Regression):
P(TC) = a0 + a1*TC

PROBLEM 1 - SOLVE FOR BENZENE DATA SET

Polynomial Regression (n is degree of polynomial)

General:
P(x) = a0 + a1*x + a2*x^2 + ... + an*x^n

Problem 2 (Second Degree Polynomial Regression):
P(TC) = a0 + a1*TC + a2*TC^2

PROBLEM 2 - SOLVE FOR BENZENE DATA SET

Multiple Linear Regression

General:
y(x1, x2, ..., xn) = a0 + a1*x1 + a2*x2 + ... + an*xn

Problem 3 (Riedel equation):
log(P) = A + B/T + C*log(T) + D*T^2
where T is the temperature in Kelvin and A, B, C and D are the parameters.

For POLYMATH:
TC = Temperature (°C)
P = Pressure (mm Hg)

Variable Transformations:
TK = TC + 273.15
logP = log(P)
Trec = 1/TK
T2 = TK^2
logT = log(TK)

Regression model in the transformed variables:
logP = a0 + a1*Trec + a2*logT + a3*T2

PROBLEM 3 - SOLVE FOR BENZENE DATA SET

Nonlinear Regression

General:
y = f(x1, x2, ..., xn, a0, a1, a2, ..., am)

Example Nonlinear Regression Problem (Antoine Equation):
logP = A + B/(TC + C)

For POLYMATH:
TC = Temperature (°C)
P = Pressure (mm Hg)

Variable Transformations:
logP = log(P)

Initial estimates for the parameters must also be provided. For this example, the initial estimates are A = 6, B = -1000, and C = 200.

PROBLEM 4 - DETERMINE ANTOINE EQUATION CONSTANTS FOR BENZENE DATA SET
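The variable transformations for Problem 3 and the quantity minimized in Problem 4 can be sketched in plain Python as follows. This is a rough illustration under stated assumptions, not POLYMATH input: the function names riedel_columns, antoine_logp and antoine_sse are hypothetical, log is assumed to mean the base-10 logarithm, and the two data rows are placeholders chosen for easy arithmetic, not values from the benzene table.

```python
import math

# Placeholder rows only; these are NOT the benzene data from the workshop table.
TC = [0.0, 50.0]      # temperature, deg C
P = [10.0, 1000.0]    # pressure, mm Hg


def riedel_columns(tc, p):
    """Build the transformed columns used in Problem 3 (log assumed to be base 10)."""
    tk = [t + 273.15 for t in tc]                 # TK = TC + 273.15
    return {
        "logP": [math.log10(v) for v in p],       # logP = log(P)
        "Trec": [1.0 / t for t in tk],            # Trec = 1/TK
        "logT": [math.log10(t) for t in tk],      # logT = log(TK)
        "T2": [t ** 2 for t in tk],               # T2 = TK^2
    }


def antoine_logp(tc, a, b, c):
    """Antoine equation in the form used here: logP = A + B/(TC + C)."""
    return a + b / (tc + c)


def antoine_sse(params, tc, p):
    """Sum of squared deviations between log(P) data and the Antoine prediction."""
    a, b, c = params
    return sum((math.log10(pi) - antoine_logp(ti, a, b, c)) ** 2
               for ti, pi in zip(tc, p))


initial = (6.0, -1000.0, 200.0)   # initial estimates A, B, C given for Problem 4
columns = riedel_columns(TC, P)
print(columns["Trec"])
print(antoine_sse(initial, TC, P))
```

A nonlinear regression routine starts from the initial estimates and adjusts A, B and C to make this sum of squares as small as possible. Problem 3, in contrast, needs no initial estimates, because after the transformations the model is linear in a0, a1, a2 and a3.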