Having learnt how to calculate the equation of a least-squares regression line, you are well on your way to performing a regression analysis. A full regression analysis involves several steps:

- Constructing a scatterplot to investigate the nature of the relationship between the variables
- Calculating the correlation coefficient (r) to measure the strength of the relationship
- Determining the equation of the regression line
- Interpreting the coefficients (a, the y-intercept, and b, the gradient) of the least-squares line y = a + bx
- Using the regression line to make predictions
- Using the coefficient of determination (r²) to measure the predictive power of the linear relationship
- Using a residual plot to test the assumption of linearity
- Writing a report on your findings

Life expectancy (in years) and birth rate (number of births per 1000 people) have been determined for 10 countries, as given below.

Birth rate (per 1000)      30  38  38  43  34  42  31  32  26  34
Life expectancy (years)    66  54  43  42  49  45  64  61  61  66

1. Construct a scatterplot (use a calculator)
Show a scaled graph, complete with labelled axes and a title.

2. Calculate r and comment on the strength of the relationship
r = -0.8069
There appears to be a strong, negative linear association between life expectancy and birth rate.

3. Find the least-squares regression line (using a calculator)
a = 105.37, b = -1.44
y = a + bx
y = 105.37 - 1.44x
Life expectancy = 105.37 - (1.44 × birth rate)

For the regression equation y = a + bx:
- The slope b predicts the change in y when x increases by one unit. If b is positive, y increases as x increases. If b is negative, y decreases as x increases.
- The y-intercept a is the value of y when x = 0.

4. Interpret the coefficients
Slope: on average, life expectancy (y) decreases by 1.44 years for each increase in birth rate (x) of one birth per 1000 people.
Intercept: on average, the life expectancy for countries with a zero birth rate is 105.37 years. Note that x = 0 lies well outside the observed birth rates (26 to 43), so this value is an extrapolation and should be interpreted with caution.
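Steps 2 and 3 above rely on a calculator. As a cross-check, the sketch below computes r, a and b directly from the table using the standard sums of squared deviations. The function name least_squares and the variable names are illustrative choices, not part of the original notes.

```python
import math

# Data from the table above.
birth_rate = [30, 38, 38, 43, 34, 42, 31, 32, 26, 34]       # x, births per 1000 people
life_expectancy = [66, 54, 43, 42, 49, 45, 64, 61, 61, 66]  # y, years


def least_squares(x, y):
    """Return (a, b, r) for the line y = a + b*x, where r is the correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products of deviations from the means.
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = s_xy / s_xx                    # gradient
    a = mean_y - b * mean_x            # y-intercept
    r = s_xy / math.sqrt(s_xx * s_yy)  # correlation coefficient
    return a, b, r


a, b, r = least_squares(birth_rate, life_expectancy)
print(f"a = {a:.2f}, b = {b:.2f}, r = {r:.4f}")  # a = 105.37, b = -1.44, r = -0.8069
```

The printed values agree with the calculator results quoted in steps 2 and 3.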
Regression lines are used to predict y values for a given x. For example, find the life expectancy for a country with a birth rate of 35 births per 1000 people.

5. Use the regression line for predictions
Life expectancy = 105.37 - (1.44 × birth rate)
If x = 35: y = 105.37 - (1.44 × 35) = 54.97
On average, a country with a birth rate of 35 per 1000 people would have a life expectancy of approximately 55 years.

6. Find r² × 100
r² = 0.651, so r² × 100 = 65.1%
Therefore, 65.1% of the variation in life expectancy can be explained by the variation in birth rate.

When fitting a regression line using the least-squares method, the key assumption is that the relationship between the variables is linear. Linearity is checked by inspecting the scatterplot and by performing a residual analysis, which compares the predicted values with the actual values.

Residual value = data value - predicted value

7. For a birth rate of 31, use the equation to find the predicted life expectancy
y = 105.37 - (1.44 × 31)
y = 60.73 (predicted value when x = 31)
Actual value when x = 31 is 64.
Residual = 64 - 60.73
Residual = 3.27

For country A:
Predicted life expectancy = 105.4 - 1.44(34) = 56.4 yrs
Actual life expectancy = 66 yrs (from table)
Residual value = data value - predicted value = 66 - 56.4 = 9.6 yrs
The residual is positive because the actual data value lies above the regression line.

For country B:
Predicted life expectancy = 105.4 - 1.44(34) = 56.4 yrs
Actual life expectancy = 49 yrs (from table)
Residual value = data value - predicted value = 49 - 56.4 = -7.4 yrs
The residual is negative because the actual data value lies below the regression line.

Conclusion: The residual plot shows no clear pattern. The residuals are randomly scattered around the x-axis. This confirms that the use of a linear equation to describe the relationship between life expectancy and birth rate is appropriate.

If a residual plot shows points randomly scattered above and below zero, then a linear model is appropriate for the data. If a clear pattern (such as a curve) is present, then a relationship exists but it is not linear.

Report template:
From the scatterplot, we see there is a ____ (strong/moderate/weak), ____ (positive/negative), ____ (linear/non-linear) relationship between ____ (y variable [DV]) and ____ (x variable [IV]) for this sample.
The correlation coefficient is r = ____.
There are ____ (no?) outliers.
The equation of the least-squares regression line is: ____ [DV] = ____ (a) + (____ (b) × ____ [IV])
The slope of the regression line predicts that, on average, ____ [DV] ____ (decreases/increases) by ____ (units) for a one-unit ____ (decrease/increase) in ____ [IV].
The coefficient of determination indicates that ____ (r² × 100) % of the variation in ____ [DV] can be explained by the variation in ____ [IV].
The residual plot shows ____ (a/no pattern), indicating the original data was ____ (not linear/linear).

8. Report
From the scatterplot, we see there is a strong, negative, linear relationship between life expectancy and birth rate for this sample. The correlation coefficient is r = -0.807. There are no apparent outliers.
The equation of the least-squares regression line is: Life expectancy = 105.4 - (1.44 × birth rate)
The slope of the regression line predicts that, on average, life expectancy decreases by 1.44 years for an increase in birth rate of one birth per 1000 people.
The coefficient of determination indicates that 65.1% of the variation in life expectancy can be explained by the variation in birth rate.
The residual plot shows no clear pattern, indicating the original data was linear.
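The arithmetic in steps 5 to 7 can be reproduced with a short script. This is a minimal sketch that assumes the rounded values quoted in the worked example (a = 105.37, b = -1.44, r = -0.8069) rather than re-deriving them; the names predict and residual are illustrative only.

```python
# Rounded values quoted in the worked example (assumed here, not re-derived).
A = 105.37   # y-intercept
B = -1.44    # gradient
R = -0.8069  # correlation coefficient

# Data from the table.
birth_rate = [30, 38, 38, 43, 34, 42, 31, 32, 26, 34]       # x, births per 1000 people
life_expectancy = [66, 54, 43, 42, 49, 45, 64, 61, 61, 66]  # y, years


def predict(x):
    """Predicted life expectancy (years) for a birth rate of x per 1000 people."""
    return A + B * x


def residual(x, y):
    """Residual value = data value - predicted value."""
    return y - predict(x)


print(f"Step 5: prediction at x = 35 is {predict(35):.2f}")      # 54.97
print(f"Step 6: r^2 x 100 = {R ** 2 * 100:.1f}%")                # 65.1%
print(f"Step 7: residual at x = 31 is {residual(31, 64):.2f}")   # 3.27

# Residual for every country; plotting these against birth rate gives the residual plot.
residuals = [residual(x, y) for x, y in zip(birth_rate, life_expectancy)]
for x, y, res in zip(birth_rate, life_expectancy, residuals):
    print(f"x = {x:2d}  actual = {y:2d}  predicted = {predict(x):6.2f}  residual = {res:+6.2f}")
```

Because this sketch uses a = 105.37, the residuals for countries A and B come out as 9.59 and -7.41, which round to the 9.6 and -7.4 obtained in the text with a = 105.4.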