Statistics and Risk Management

Regression

Video URL: jukebox.esc13.net/untdeveloper/Videos/Regression.mov

Vocabulary List:

Variable: an attribute or characteristic of a statistical unit that differs among a population of units (height, age, location, etc.)
Independent Variable: the variable that changes or can be controlled; affects the dependent variable.
Dependent Variable: the variable that is measured to determine the effect of an independent variable.
Correlation: the measurement of a relationship between two variables.
Regression: the technique of predictive relationships based upon correlational data.
Scatter chart: a graph representing data points (pairs of variables) charted along the x and y axes.
Linear Regression: the technique of fitting a straight line (regression line) to the data points on a scatter chart to determine the relationship between two variables.
Regression Line: a line drawn through the data points on a scatter chart showing the relationship between the variables.

(Easton and McColl, 1997)

Easton, V.J., and McColl, J.H. (1997). Statistics glossary. Available from http://www.stats.gla.ac.uk/steps/glossary/index.html

Copyright © Texas Education Agency, 2012. All rights reserved.

Resources:

Introduction to Linear Regression and Correlation Analysis
Use this link to: calculate and interpret the simple correlation between two variables, determine whether the correlation is significant, calculate and interpret the simple linear regression equation for a set of data, and understand the assumptions behind regression analysis.
www.fordham.edu/economics/vinod/correl-regr.ppt

Regression and Correlation Analysis
This site analyzes the concepts of regression and correlation, discusses the regression model, explains the least squares method, and defines the relationship between correlation and regression analysis.
http://abyss.uoregon.edu/~js/glossary/correlation.html

Regression
Through this interactive Regression lesson, instructors can set up an online lesson that correlates with their textbooks. The learner tab explains regression, the activity tab contains the actual activity, the help tab provides assistance on using the activity, and the instructor tab allows teachers to set up the activity to fit with their lecture concepts.
http://www.shodor.org/interactivate/activities/Regression/

Regression Line Example
This video from Khan Academy provides a detailed example of how to calculate the slope and intercept of a regression line. It provides supplemental instruction to accompany a teacher's lecture on regression.
http://www.khanacademy.org/math/statistics/v/regression-line-example

Regression Practice Test                Name:_____________________

TRUE and FALSE:

1. The correlation demonstrating the relationship between 2 sets of data is not a calculated value.
   A. True   B. False
2. Correlation and Regression Analysis are related in that they both deal with relationships among variables.
   A. True   B. False
3. The correlation coefficient is a measure of linear association with only one variable.
   A. True   B. False
4. Regression and correlation analysis should be interpreted as establishing a cause-and-effect relationship.
   A. True   B. False
5. A correlation coefficient of -1 indicates 2 variables are related in a negative linear sense.
   A. True   B. False
6. Linear regression consists of finding the best-fitting straight line through the points.
   A. True   B. False

MATCHING:

A. Linear Regression
B. Criterion Variable
C. Simple Regression
D. Regression Analysis
E. Least Squares Method

7. __________ When there is only one predictor variable, we use this prediction method.
8. __________ The prediction of scores on one variable from the scores on a second variable.
9. __________ Identifies the relationship between a dependent variable and one or more independent variables.
10. __________ Most widely used procedure for developing estimates of the model parameters.
11. __________ The variable we are predicting.

MULTIPLE CHOICE:

12. ____ is the measurement of a relationship between two variables.
   A. Regression   B. Variance   C. Correlation   D. Deviation
13. ____ is the technique of predictive relationships based upon correlational data.
   A. Regression   B. Variance   C. Correlation   D. Deviation
14. Values of the correlation coefficient are always between ____ and ____.
   A. -1 and 1   B. -2 and 2   C. -3 and 3   D. -4 and 4
15. A correlation coefficient of ____ indicates 2 variables are related in a positive linear sense.
   A. -1   B. 0   C. +1   D. +2
16. A correlation coefficient of ____ indicates that there is no linear relationship between the 2 variables.
   A. -1   B. 0   C. +1   D. +2
17. The best-fitting line in a linear regression is called the ____.
   A. Line of Correlation   B. Regression Line   C. Line of Best Fit   D. None of the above
18. If a point is much higher than the regression line, it will have a _________ error of prediction.
   A. Small   B. Positive   C. Large   D. Negative
19. The Excel formula to find correlation is _______.
   A. =CORREL(Factor1,Factor2)   B. =CALC(Factor1,Factor2)   C. CORREL(Factor1,Factor2)   D. CALC(Factor1,Factor2)
20. _______ are simply deviations from the mean.
   A. Outliers   B. Raw Scores   C. Deviation Scores   D. None of the above

Regression Practice Test KEY

 1. B    2. A    3. B    4. B    5. A
 6. A    7. C    8. A    9. D   10. E
11. B   12. C   13. A   14. A   15. C
16. B   17. B   18. C   19. A   20. C
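
Several practice test questions (12 through 16, 19, and 20) concern the correlation coefficient and deviation scores. As a rough illustration of how that coefficient can be calculated by hand, here is a minimal Python sketch. The function name pearson_r and the study-hours data are made up for this example and do not come from the lesson; the result is the same kind of value that the Excel formula =CORREL returns.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the correlation coefficient for two equal-length lists.

    pearson_r is a hypothetical helper name chosen for this sketch.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviation scores: each raw score minus the mean of its variable.
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    sum_xx = sum(dx * dx for dx in dev_x)
    sum_yy = sum(dy * dy for dy in dev_y)
    # Note: this divides by zero if either variable never changes.
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical data, invented for illustration only.
hours_studied = [1, 2, 3, 4, 5]
quiz_score = [2, 4, 6, 8, 10]  # rises exactly in step with hours studied

print(pearson_r(hours_studied, quiz_score))        # perfect positive line
print(pearson_r(hours_studied, quiz_score[::-1]))  # perfect negative line
```

Because the made-up scores lie exactly on a straight line, the two results sit at the extremes of the range described in question 14. Real data almost always falls somewhere in between.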

Student Assignment 7.1a
Regression – Linear Correlation                Name:_____________________

You noticed that the average points scored in your high school football conference games seem to correlate with the average nighttime temperature for the football season over the last 10 years.

        Temp    Score
#1      67.5    12.6
#2      55.0     9.8
#3      58      10.1
#4      60      10.0
#5      59       9.8
#6      66.6    13.6
#7      60.4    11.3
#8      55.8    11.4
#9      59.6     9.8
#10     62.5    10.1

Is that a positive correlation or an inverse correlation? Do you think it is a strong correlation? Why?

Student Assignment 7.2a
Regression – Linear Regression                Name:_____________________

You are examining the safety record at a plant for an insurance company. You have looked at the accidents per ten thousand hours figure and are trying to identify a correlation with the overtime worked per ten thousand hours.

        OtHr    Accidents
#1      1000    2.5
#2       900    2.6
#3       800    1.9
#4       700    1.95
#5       600    1.85
#6       500    1.5
#7       400    1.4
#8       300    1.5
#9       200    1.2
#10      100    0.9

The plant has a new government contract and expects 1500 hours of overtime for several periods during the year. Will there be a significantly increased risk of accidents during those periods?

Explore Activity: Height vs. Arm Span

Many body measures are related to each other, including your height and arm span. For many people, these two measures are about the same. To explore this relationship you will need the following:

- tape measure (inches or centimeters)
- computer with statistics software or a spreadsheet (or a graphing calculator with stat functions)
- several people you can measure (like your classmates!)

Measure your height and arm span (you will probably need to work with a partner) and record these two measures in a central location to share with your class.
Once everyone in your class has done this, you will have a set of data (in pairs) that can be plotted on a scatterplot (you can do this with a computer or even with graph paper and pencil). Plot the points (arm span, height) so arm span will be on the horizontal (x) axis and height on the vertical (y) axis. Note that you do not have to start the two axes at (0, 0). Rather, start each at a convenient value slightly below the smallest arm span and smallest height.

1. How would you describe the relationship between arm span and height?
2. Are there any unusual cases? Describe them.

A least-squares linear regression equation can be produced to represent the relationship between arm span and height. Your software (or graphing calculator) may have a built-in command to produce this equation:

\[ \text{height} = a + b \times \text{arm span} \]

where $a$ = the y-intercept (height intercept) and $b$ = the slope of the line (increase in height for a 1 unit increase in arm span). You can also calculate this equation from the means and standard deviations of height and arm span:

\[ b = r \, \frac{s_y}{s_x} \qquad a = \bar{y} - b\,\bar{x} \]

where $r$ = correlation, $s_y$ = standard deviation of height, $s_x$ = standard deviation of arm span, $\bar{y}$ = mean height, and $\bar{x}$ = mean arm span. All of these can be calculated using formulas from earlier lessons.

Graph the regression equation on your scatterplot. Notice it goes "through the middle" of your plotted points. You could now use the equation to predict a person's height if you know that person's arm span.

1. Use your regression equation to calculate a person's height if you know the arm span is 64 inches.
2. Use your regression equation to calculate a person's height if you know the arm span is 72 inches.

Other similar relationships could be explored:

- Height vs. femur (thigh bone) length
- Height vs. forearm (elbow to wrist)
- Height vs. vertical length of skull

Video Links – Check out the relevant links at Khan Academy for more information and examples: http://www.khanacademy.org/#statistics
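
For classes working on a computer, the Explore Activity calculation (slope equals the correlation times the ratio of the standard deviations, and intercept equals mean height minus slope times mean arm span) could be sketched in Python roughly as follows. The function name regression_from_summary and the measurements are hypothetical, invented only to show the steps; substitute the pairs your class records.

```python
from statistics import mean, stdev


def regression_from_summary(xs, ys):
    """Return (a, b, r) for the line y = a + b * x.

    Uses the means and sample standard deviations, as in the activity.
    regression_from_summary is a hypothetical name chosen for this sketch.
    """
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1)
    # Correlation: sum of products of deviation scores, scaled by the
    # standard deviations.
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (
        (n - 1) * s_x * s_y
    )
    b = r * s_y / s_x          # slope
    a = mean_y - b * mean_x    # intercept
    return a, b, r


# Hypothetical measurements in inches, NOT real class data.
arm_span = [60, 62, 64, 66, 68, 70, 72]
height = [61, 62, 65, 66, 67, 71, 72]

a, b, r = regression_from_summary(arm_span, height)
print(f"height = {a:.2f} + {b:.2f} * arm span   (r = {r:.3f})")

# Predictions for the two arm spans asked about in the activity.
for span in (64, 72):
    print(f"arm span {span} in -> predicted height {a + b * span:.1f} in")
```

The sketch uses the sample standard deviation (dividing by n - 1) for both variables. Either convention gives the same slope as long as the correlation and both standard deviations are calculated the same way.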