Linear regression and calibration curves
Chemistry 321, Summer 2014

In the next few labs, you will be generating calibration curves, generally testing the linear response of an instrument by measuring some property of standards of known concentration.

You will record data that perhaps looks like this:

[A] (mM)    Absorbance
0           0
0.1         0.058
0.2         0.122
0.4         0.223
0.8         0.433

Linear regression answers the question: Which line has the best “fit” to the data?

17.3 Linear Regression Analysis
• Regression analysis is used to predict the value of one variable (the dependent variable) on the basis of other variables (the independent variables).
• Dependent variable: denoted Y
• Independent variables: denoted X1, X2, …, Xk
• If we only have ONE independent variable, the model is
  y = β0 + β1x + ε
  (ε is the random error term), which is referred to as simple linear regression. We would be interested in estimating β0 and β1 from the data we collect. Note that β0 represents the y-intercept of the “best-fit” line and β1 is the slope of that line.

17.4 Least Squares Line
The least squares line is the line that minimizes the sum of the squared vertical differences between the data points and the line. These differences are called residuals or errors.

17.5 Least Squares Line… [sure glad we have computers now]
• The coefficients b1 and b0 for the least squares line are calculated as:
  b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
  b0 = ȳ − b1·x̄
  where x̄ and ȳ are the means of the x and y values.

17.6 Least Squares Line…
See if you can estimate the y-intercept and slope from this data:

x    y
1    6
2    1
3    9
4    5
5    17
6    12

y = 0.934 + 2.114x

17.7 Coefficient of Determination…
• Tests thus far have shown whether a linear relationship exists; it is also useful to measure the strength of the relationship. This is done by calculating the coefficient of determination, R².
• The coefficient of determination is the square of the coefficient of correlation (r), hence R² = r².
• r will be computed shortly; the relationship R² = r² holds for models with only one independent variable.

17.8 Coefficient of Determination
• Unlike the value of a test statistic, the coefficient of determination does not have a critical value that enables us to draw conclusions.
• In general, the higher the value of R², the better the model fits the data.
• R² = 1: Perfect match between the line and the data points.
• R² = 0: There is no linear relationship between x and y.

17.9 When can you use linear regression?
• Linear regression requires us to satisfy three assumptions about the distributions of the two quantitative variables, plus one weaker guideline:
  • No outliers
  • An (expected) linear relationship between the variables
  • Equal variance of the residuals across predicted values
  • (weaker) At least ten data points
• Whether an analysis conforms to these assumptions is generally judged by visual inspection of the scatter plot of the dependent variable against the independent variable. (Big hint: In Excel, use “scatter”, not “line” graphs.)

7/12/2016 Slide 10

[Figure: Calibration curve for the spectrophotometer. x-axis: [A] (mM), 0 to 1; y-axis: Absorbance, 0 to 0.5.]

So go ahead and plot your points and label your axes as usual. (These are the data from the second slide.)

[Figure: The same calibration curve with the best-fit line, y = 0.5375x + 0.006, R² = 0.99893.]

Using whatever software you like (don’t do it by hand; I used Excel for this), display both the equation of the best-fit line and the coefficient of determination (R²).

[Figure: The same calibration curve and best-fit line, y = 0.5375x + 0.006, R² = 0.99893.]

So you can use either the equation or the graph itself to determine the concentration of your sample from its absorbance.

Summary
• Linear regression provides additional statistical information about the relationship between two quantitative variables.
• The coefficient of determination, R², indicates the percentage of variance in the dependent variable that is accounted for by variability in the independent variable.
• The regression equation is the formula for the trend or fit line, which enables us to predict the dependent variable for any given value of the independent variable.
• The regression equation has two parts: the intercept and the slope.
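
For anyone curious about what the spreadsheet is doing when it draws the trendline, here is a minimal Python sketch of the same calculation. It is not part of the original slides: the function names fit_line and concentration_from_absorbance are made up for illustration, and the "unknown" absorbance of 0.300 is a hypothetical value, not a measurement. Run on the calibration data from the second slide, it should give a slope, intercept, and R² that agree with the Excel trendline shown above.

```python
# Minimal sketch: least squares calibration line from the standard formulas.
# fit_line and concentration_from_absorbance are illustrative names only.

def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) of the least squares line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sums of squared deviations and cross-products about the means
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # With one independent variable, R^2 equals the squared correlation r^2
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def concentration_from_absorbance(absorbance, slope, intercept):
    """Invert y = slope*x + intercept to estimate a concentration."""
    return (absorbance - intercept) / slope


# Calibration standards from the second slide
conc_mM = [0, 0.1, 0.2, 0.4, 0.8]
absorbance = [0, 0.058, 0.122, 0.223, 0.433]

slope, intercept, r2 = fit_line(conc_mM, absorbance)
print(f"y = {slope:.4f}x + {intercept:.3f}, R^2 = {r2:.5f}")

# Hypothetical unknown sample (0.300 is an invented absorbance reading)
unknown_mM = concentration_from_absorbance(0.300, slope, intercept)
print(f"estimated [A] = {unknown_mM:.3f} mM")
```

The last step is the point of a calibration curve: once the line is fitted to the standards, a measured absorbance is converted back to a concentration by solving the regression equation for x. The estimate is only trustworthy for absorbances that fall within the range covered by the standards.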