Correlation with a Non-Linear Emphasis: Day 2

Correlation measures the strength and direction of the linear association between two quantitative variables. Before you use correlation, you must check several conditions:

Quantitative Variables Condition: Are both variables quantitative?
Straight Enough Condition: Is the form of the scatterplot straight enough that a linear relationship makes sense? If the relationship is not linear, the correlation will be misleading.
Outlier Condition: Outliers can distort the correlation dramatically. If an outlier is present, it is often good to report the correlation both with and without that point.

Warning: Correlation ≠ Causation
A hidden variable that stands behind a relationship and determines it by simultaneously affecting the other two variables is called a lurking (confounding) variable. Scatterplots and correlation coefficients NEVER prove causation.

Don’t ever assume a relationship is linear just because the correlation coefficient is high. To determine whether or not a relationship is linear, always look at the residual plot.

Residuals
A residual is the signed vertical distance between a data point and the graph of the regression equation: residual = observed y - predicted y. The residual is positive if the data point is above the graph, negative if the data point is below the graph, and 0 only when the graph passes through the data point.

What should you look for to tell whether a relationship is not linear?
Sometimes a high “r” value for a linear regression is deceptive. You must look at the scatterplot AND at the pattern the residuals make. If the residuals show a curved pattern, the relationship is NOT linear.

Checking for linearity
A scatterplot of the residuals vs. the x-values should be the most boring scatterplot you’ve ever seen. It shouldn’t have any interesting features, like a direction or a shape. It should stretch horizontally, with about the same amount of scatter throughout. It should show no bends and no outliers.
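The residual check described above can be sketched in a few lines of Python. This is a minimal illustration, not calculator code: the helper names (linear_fit, residuals, sign_pattern) are hypothetical, and the data are made up so that they lie exactly on the curve y = x². It shows a high linear r alongside a clearly curved residual pattern.

```python
import math


def linear_fit(xs, ys):
    """Least-squares line y-hat = a + b*x; also returns the correlation r."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope
    a = my - b * mx            # intercept: the line passes through (mean x, mean y)
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


def residuals(xs, ys, a, b):
    """Residual = observed y - predicted y (positive above the line, negative below)."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def sign_pattern(res, tol=1e-9):
    """Summarize residual signs as a string such as '+--+'; '0' means on the line."""
    return "".join("+" if e > tol else "-" if e < -tol else "0" for e in res)


# Hypothetical data lying exactly on y = x**2 (not from the lesson).
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

a, b, r = linear_fit(xs, ys)
res = residuals(xs, ys, a, b)
print(f"y-hat = {a:.3f} + {b:.3f}x, r = {r:.3f}")  # y-hat = -9.333 + 7.000x, r = 0.979
print("residual signs:", sign_pattern(res))       # residual signs: +----+
```

Even though r is about 0.979, the residuals run positive, then negative, then positive again. That U-shaped pattern is exactly the kind of curve that says the relationship is not linear.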
Some Non-Linear Regression Shapes
[Figures: Positive Quadratic Regression; Negative Quadratic Regression]

More Non-Linear Regression Shapes
[Figures: Positive Exponential Regression; Negative Exponential Regression]

Quadratic and Exponential on the GDC
[Calculator screens: Quadratic; Exponential]

Example 1: The scatterplot could possibly be linear, so you must check the residual pattern.

x     y
5     16.3
10    9.7
15    8.1
20    4.2
45    1.9
25    3.4
60    1.3

NOTE: The residuals (RESID) are available only AFTER you run a LINEAR REGRESSION. After running the linear regression, change the Ylist of the stat plot to RESID (press 2nd STAT and select RESID).

[Residual plot]
Notice the curved pattern in the residuals.

NOTE!!!!!! Even if the curved pattern in the residuals looks quadratic, we cannot conclude that the model is quadratic until we check the “r” values of other curved functions and see how well the data fit each one. You should also consider “real-life” implications when deciding.

When the residuals are curved, check the correlation for both the exponential and the quadratic regressions and choose the one with the stronger correlation.

A check on the exponential regression yields an r-value of -0.956 (r² ≈ 0.914). That is strong and negative, but check out the quadratic….

This is the quadratic regression…..
Equation: y = 0.00946x² - 0.839x + 18.5
R² ≈ 0.933 (a quadratic regression gives R² rather than a correlation r; √R² ≈ 0.966)
This fit is even stronger than the exponential.

Example 2: Is it linear?

x     y
0     1
-3    0.125
-4    0.0625
3     8
4     16
5     32

Look at the residuals…… There is a curved pattern in the residuals, so the relationship is NOT linear; it is either quadratic or exponential (positive). Use the “r” value to help you decide.

And the winner is… the exponential regression: every point lies exactly on y = 2^x, so r = 1.
Here is the equation you should use for predictions: y = 1(2)^x

Homework: Follow the flowchart.
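The comparison of curved models in both examples can be reproduced with a short Python sketch. It is not the GDC's own code, and the function names (linreg, expreg, quadreg) are hypothetical. Two assumptions are worth stating: expreg fits y = a·b^x by fitting a straight line to (x, ln y), which we assume mirrors the calculator's exponential regression (the r = -0.956 quoted for Example 1 is consistent with this); and quadreg returns R² because a quadratic fit has no single correlation coefficient r.

```python
import math


def linreg(xs, ys):
    """Least-squares line y = c + m*x; returns (c, m, r)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    m = sxy / sxx
    return my - m * mx, m, sxy / math.sqrt(sxx * syy)


def expreg(xs, ys):
    """Fit y = a * b**x via a line through (x, ln y); r refers to (x, ln y).

    Assumption: this linearize-then-fit approach mirrors the GDC's
    exponential regression. Requires every y > 0.
    """
    c, m, r = linreg(xs, [math.log(y) for y in ys])
    return math.exp(c), math.exp(m), r


def quadreg(xs, ys):
    """Least-squares y = a*x**2 + b*x + c; returns (a, b, c, R^2)."""
    n = len(xs)
    ws = [x * x for x in xs]                 # treat x**2 as a second predictor
    mx, mw, my = sum(xs) / n, sum(ws) / n, sum(ys) / n
    # Centered sums of squares and cross-products for the normal equations
    suu = sum((x - mx) ** 2 for x in xs)
    sww = sum((w - mw) ** 2 for w in ws)
    suw = sum((x - mx) * (w - mw) for x, w in zip(xs, ws))
    suy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    swy = sum((w - mw) * (y - my) for w, y in zip(ws, ys))
    syy = sum((y - my) ** 2 for y in ys)
    det = suu * sww - suw ** 2
    b = (suy * sww - suw * swy) / det        # Cramer's rule on the 2x2 system
    a = (suu * swy - suw * suy) / det
    c = my - b * mx - a * mw
    r_squared = (b * suy + a * swy) / syy    # explained / total variation
    return a, b, c, r_squared


# Example 1 from the lesson
x1 = [5, 10, 15, 20, 45, 25, 60]
y1 = [16.3, 9.7, 8.1, 4.2, 1.9, 3.4, 1.3]
qa, qb, qc, q_r2 = quadreg(x1, y1)
ea, eb, e_r = expreg(x1, y1)
print(f"Example 1 quadratic:   y = {qa:.5f}x^2 + ({qb:.3f})x + {qc:.1f}, R^2 = {q_r2:.3f}")
print(f"Example 1 exponential: r = {e_r:.3f}, r^2 = {e_r ** 2:.3f}")

# Example 2 from the lesson
x2 = [0, -3, -4, 3, 4, 5]
y2 = [1, 0.125, 0.0625, 8, 16, 32]
qa2, qb2, qc2, q_r2_2 = quadreg(x2, y2)
ea2, eb2, e_r2 = expreg(x2, y2)
print(f"Example 2 quadratic:   R^2 = {q_r2_2:.3f}")
print(f"Example 2 exponential: y = {ea2:.3f}({eb2:.3f})^x, r = {e_r2:.3f}")
```

Run as written, the sketch should recover the Example 1 quadratic y = 0.00946x² - 0.839x + 18.5 with R² ≈ 0.933, against an exponential r ≈ -0.956 (r² ≈ 0.914), and an exact exponential fit y = 1(2)^x with r = 1 for Example 2. Keep in mind that the exponential r describes how well ln y fits a line, not y itself, so comparing it with the quadratic R² is a rule of thumb rather than a strict test.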