Regression

• This chapter is on regression
• We will learn the difference between dependent and independent variables
• We will look at the line of best fit
• We will see how to calculate the equation of the line of best fit (the regression equation) and how to interpret it

Variables, and the line of best fit (7A)

The equation of a straight line is usually given in the form $y = a + bx$. If $y = a + bx$, then $a$ is the y-intercept (where the line cuts the y-axis) and $b$ is the gradient of the line.

You can draw any line like this by choosing values for $x$ and substituting them into the equation.

Sketch the line $y = 2x + 3$:

x    0    1    2
y    3    5    7

[Graph: the line y = 2x + 3 drawn through these points, with the x-axis running from 0 to 3 and the y-axis from 0 to 8]

The independent (explanatory) variable is independent of the other variable. It is plotted on the x-axis.

The dependent (response) variable is the one whose values are determined by the independent variable. It is plotted on the y-axis.

For example, if we are looking at album sales and the number of stores that stock albums, the album sales will depend on the number of stores selling them. So album sales is the dependent variable and the number of stores is the independent variable.

The formula for the line of best fit will be in the form $y = a + bx$, where

$b = \dfrac{S_{xy}}{S_{xx}}$ and $a = \bar{y} - b\bar{x}$

So you must always calculate $b$ first!

The regression line goes through the middle of the points plotted. Mathematically, each point is a vertical distance $e$ from the line, and each of these distances is known as a residual. The regression line minimises the sum of the squares of these residuals, so $\sum e^2$ is a minimum.

[Graph: scatter of points about the regression line, with the vertical residuals labelled e1 to e5]

For the following set of data:
a) Calculate $S_{xx}$ and $S_{xy}$.
b) Work out the equation of the regression line.
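As a quick check of the formulas above before they are applied to the exercise, here is a minimal Python sketch (the function name `regression_line` is hypothetical, chosen only for illustration). It computes $S_{xx}$, $S_{xy}$, then $b$, then $a$ for the three points tabulated earlier for $y = 2x + 3$. Because those points lie exactly on a straight line, the regression line should reproduce it and every residual should be zero.

```python
def regression_line(xs, ys):
    """Return (a, b) for the regression line y = a + b*x of y on x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    # Sxx = sum(x^2) - (sum x)^2 / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    # Sxy = sum(xy) - (sum x)(sum y) / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    b = s_xy / s_xx                    # gradient: always calculated first
    a = sum_y / n - b * (sum_x / n)    # intercept: a = y-bar - b * x-bar
    return a, b


# The three points used to sketch y = 2x + 3
xs = [0, 1, 2]
ys = [3, 5, 7]

a, b = regression_line(xs, ys)

# Residual e = vertical distance of each point from the fitted line
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(f"y = {a} + {b}x")
print("residuals:", residuals)
```

For data that do not lie exactly on a line, the same function applies unchanged; the residuals are then non-zero and the fitted line is the one that makes the sum of their squares as small as possible.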
Summary statistics for the data: $n = 5$, $\sum x = 300$, $\sum y = 288.6$, $\sum x^2 = 22000$, $\sum y^2 = 16879.14$, $\sum xy = 18238$, $\bar{x} = 60$, $\bar{y} = 57.72$

a) $S_{xx} = \sum x^2 - \dfrac{(\sum x)^2}{n} = 22000 - \dfrac{(300)^2}{5} = 4000$

$S_{xy} = \sum xy - \dfrac{\sum x \sum y}{n} = 18238 - \dfrac{300 \times 288.6}{5} = 922$

b) $b = \dfrac{S_{xy}}{S_{xx}} = \dfrac{922}{4000} = 0.2305$

$a = \bar{y} - b\bar{x} = 57.72 - (0.2305 \times 60) = 43.89$

$y = 43.89 + 0.2305x$

Give answers in full or, if rounded, to 3 s.f.

Coding and Regression Equations (7B)

As with other topics we have looked at, coding can be used to make the numbers easier to work with. However, the coded regression line will most likely be different from the actual regression line.

To calculate the actual regression line, you must substitute the codes for x and y into the coded regression formula.

The following coding was used to alter a set of data:

$r = \dfrac{x - 2}{10}$, $t = 5y$

This is the formula for the coded regression line: $t = 2r + 5$

Calculate the actual regression line for the original data, x and y.

$t = 2r + 5$
$5y = 2\left(\dfrac{x - 2}{10}\right) + 5$    Substitute the codes for t and r
$50y = 2(x - 2) + 50$    Multiply all parts by 10 to cancel the division by 10
$50y = 2x - 4 + 50$    Expand the bracket
$50y = 2x + 46$    Simplify by grouping
$y = \dfrac{2x + 46}{50}$    Divide by 50 to leave y on its own

OR: $y = 0.04x + 0.92$

Eight samples of carbon steel were produced with different percentages (c) of carbon in them. Each sample was heated until it melted and the temperature (m) recorded. The results were coded so that:

$x = 10c$, $y = \dfrac{m - 700}{5}$

The following table shows the coded results:

Carbon (x)           1    2    3    4    5    6    7    8
Melting point (y)   35   28   24   16   15   12    8    6

Calculate $S_{xy}$ and $S_{xx}$.

$\sum x = 36$, $\sum y = 144$, $\sum xy = 478$

$S_{xy} = \sum xy - \dfrac{\sum x \sum y}{n} = 478 - \dfrac{36 \times 144}{8} = -170$
$S_{xx} = \sum x^2 - \dfrac{(\sum x)^2}{n} = 204 - \dfrac{(36)^2}{8} = 42$

Calculate the regression line of y on x.

Summary statistics for the coded carbon steel results: $n = 8$, $\sum x = 36$, $\sum y = 144$, $\sum x^2 = 204$, $\sum xy = 478$, $S_{xx} = 42$, $S_{xy} = -170$

$b = \dfrac{S_{xy}}{S_{xx}} = \dfrac{-170}{42} = -\dfrac{85}{21} = -4.048$

$a = \bar{y} - b\bar{x} = \dfrac{\sum y}{n} - b\dfrac{\sum x}{n} = \dfrac{144}{8} - \left(-4.048 \times \dfrac{36}{8}\right) = \dfrac{507}{14} = 36.21$

$y = 36.21 - 4.048x$

Calculate the regression line of m on c, given the coding $x = 10c$, $y = \dfrac{m - 700}{5}$.

$y = 36.21 - 4.048x$
$\dfrac{m - 700}{5} = 36.21 - 4.048(10c)$    Substitute the codes for y and x
$\dfrac{m - 700}{5} = 36.21 - 40.48c$    Multiply out the bracket
$m - 700 = 181.07 - 202.4c$    Multiply by 5 to cancel the division
$m = 881.07 - 202.4c$    Add 700

Remember, with longer decimals, make a note of the fraction your calculator gives, so you can get the exact value later on. Here the intercept uses the exact value $a = \frac{507}{14}$, so $5a = 181.07$ rather than $5 \times 36.21 = 181.05$.

Applying and Interpreting the Regression Equation (7C)

A regression equation can be used to predict the dependent variable, based on a chosen value of the independent variable.

Interpolation: estimating a value that is within the range of the data you have.

Extrapolation: estimating a value outside the range of the data you have. Because it is outside that range, an extrapolated value can be unreliable.

Generally, avoid extrapolating unless asked, and even then treat the answers 'with caution'.

The results from an experiment in which different masses were placed on a spring, and the resulting length of the spring measured, are shown below.

Mass, x (kg)      20    40     60     80     100
Length, y (cm)    48    55.1   56.3   61.2   68

The regression line was calculated to be: $y = 43.89 + 0.2305x$

Estimate the value of y when x = 35 kg. Is this interpolation or extrapolation?

$y = 43.89 + (0.2305 \times 35) = 51.96$ cm    Include the unit!
This is interpolation, as x = 35 is within the range of the data we have.

For the same spring data (masses from 20 kg to 100 kg) and regression line $y = 43.89 + 0.2305x$, where x represents mass and y represents spring length:

Estimate the value of y when x = 120 kg. Is this interpolation or extrapolation?

$y = 43.89 + (0.2305 \times 120) = 71.55$ cm    Include the unit!

This is extrapolation, as x = 120 is outside the range of the data we have.

Interpret the '43.89' in the equation.

If x = 0, y = 43.89. If the mass is 0 kg, the length of the spring is 43.89 cm. So the 43.89 represents the starting length of the spring.

Interpret the '0.2305' in the equation.
If we increase x by 1, y increases by 0.2305. If the mass increases by 1 kg, the length of the spring increases by 0.2305 cm. So the 0.2305 represents the increase in the length of the spring for each extra kilogram of mass added.

Summary

• We have learnt how to calculate a line of best fit
• We have used coding and learnt how to 'undo' it by substitution
• We have learnt how to interpret a regression equation
• We have looked at interpolation and extrapolation
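Two of the ideas in the summary, undoing a coding by substitution and labelling an estimate as interpolation or extrapolation, can be sketched in a few lines of Python. This is only an illustration: the function names `decode_line` and `predict` are hypothetical, and the sketch assumes every coding has the linear form X = (x - x_shift) / x_div, Y = (y - y_shift) / y_div, which covers both codings used in this chapter. Exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction


def decode_line(A, B, x_shift, x_div, y_shift, y_div):
    """Undo a linear coding by substitution.

    Coded variables: X = (x - x_shift) / x_div and Y = (y - y_shift) / y_div.
    Coded line:      Y = A + B*X.
    Returns (a, b) such that y = a + b*x in the original variables.
    """
    b = y_div * B / x_div
    a = y_shift + y_div * A - b * x_shift
    return a, b


def predict(a, b, x, x_min, x_max):
    """Estimate y = a + b*x and say whether x lies inside the data range."""
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return a + b * x, kind


# First coding example: r = (x - 2)/10, t = 5y (so y_div = 1/5), t = 5 + 2r
a1, b1 = decode_line(5, 2, 2, 10, 0, Fraction(1, 5))
print(a1, b1)   # 23/25 and 1/25, i.e. y = 0.92 + 0.04x

# Carbon steel: x = 10c (so x_div = 1/10), y = (m - 700)/5,
# with the exact coded coefficients a = 507/14 and b = -85/21
a2, b2 = decode_line(Fraction(507, 14), Fraction(-85, 21),
                     0, Fraction(1, 10), 700, 5)
print(float(a2), float(b2))

# Spring example: masses in the data run from 20 kg to 100 kg
for mass in (35, 120):
    length, kind = predict(43.89, 0.2305, mass, 20, 100)
    print(f"x = {mass} kg: y = {length:.2f} cm ({kind})")
```

Passing the calculator's fractions rather than rounded decimals follows the earlier advice about keeping exact values; with rounded inputs the decoded intercept can differ in the second decimal place.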