Topic 10 Regression Analysis—Estimation Activity 10-5: Verbal SAT Scores Use the data in the SPSS data set SAT99.SAV. a) Does there appear to be a linear relation between the percentage taking the examination and the average Verbal SAT score? A fitted line plot is shown below. There does appear to be a linear relationship between percent taking and Verbal SAT score. Linear Regression 575 Verba l SAT 550 525 500 Verbal SAT = 572.18+ -1.07 * Pct R-Square = 0.79 475 20 40 60 80 Pe rce nt b) c) d) Compute a regression line that uses percentage taking the examination to predict the average Verbal SAT score. Report your regression equation. From the plot we see that Verbal SAT = 572.18-1.07 Percent What would you predict for the average Verbal SAT score for a state where 50% take the examination. Verbal SAT 572.18 1.07(50) 518.68 What would be the change in the average Verbal SAT score for a state where the percentage taking the examination increased by 10 percent? Change 1.07(10) 10.7 e) Construct a 99% confidence interval for 1 . Does it indicate that percentage taking the examination is a useful predictor for the average Verbal SAT score? 1.074 2.678(.080) 1.288 1 .860 The interval does not 0. So percent taking the examination is a useful predictor of average Verbal SAT score. Activity 10-6: Physical Measurements a) b) Enter the physical measurements taken on members of your class in an SPSS data set. Do the two measurements appear to be linearly related? Results will vary according to the data. Compute a regression line that uses forearm length to predict height. Report your regression equation. Results will vary according to the data. What would you predict would be the height for a person whose forearm measures 17.5 inches? Results will vary according to the data. d) Construct a 90% confidence interval for 1 . Results will vary according to the data. e) Construct a 95% confidence interval for predicting the average height of people whose forearm measures 19 inches. Results will vary according to the data. f) Construct a 99% confidence interval for predicting the height of an individual whose forearm measures 15 inches. Results will vary according to the data. c) Activity 10-7: Aircraft Operating Costs The SPSS data set AIRCRAFT.SAV contains data on a number of commercial aircraft. The variables are described in the variable view. a) Does there appear to be a linear relation between fuel consumption and operating cost? A fitted line plot is shown below. It does show a linear relationship between fuel consumption and operating cost. 7000 Linear Regression Operating Cost ($/hr) = 451.61 + 1.84 * fuel R-Square = 0.92 Operating Cost ($/hr) 6000 5000 4000 3000 2000 1000 2000 3000 Fue l Consumption (gal/hr) b) c) d) Compute a regression line that uses fuel consumption to predict operating cost. Report your regression equation. From the fitted line plot we see that the regression equation is Operating Cost = 451.61+1.84 Fuel Consumption If the fuel consumption for an aircraft increases by 50 gallons per hour, by how much would you predict that the operating cost would change? Change = 1.84(50) = 92 Construct a 99% confidence interval for 1 . Does it indicate that fuel consumption is a useful predictor for operating cost? 1.837 2.771(.109) 1.535 1 2.139 The interval does not contain 0. So, fuel consumption is a useful predictor of operating cost. e) Construct a 90% confidence interval for the average operating cost for airplanes that consume 2500 gallons per hour. From SPSS we get that the confidence interval is (4084.7239, 5285.18495). Activity 10-7: Measuring Cherry Trees The SPSS data set CHERRY_TREES.SAV contains the diameter, height, and volume of a sample of black cherry trees. a) Do the height and volume appear to be linearly related? A fitted line plot is shown below. It indicates that there is a linear relationship between height and volume. Linear Regression Volum e (cubic feet) 30.0 Volume (cubic feet) = -34.25 + 0.72 * hgt R-Square = 0.65 25.0 20.0 15.0 10.0 65 70 75 80 85 He ight (fe et) b) c) d) e) f) Fit a regression line that uses height to predict the volume. Report your regression equation. From the fitted line plot we see that Volume = -34.25 + .72 Height What would you predict the volume to be for a tree that is 77 feet tall? Volume 34.25 .72(77) 21.19 By how much will the volume change if a tree’s height increased by 2 feet? .72(2)=1.44 Construct a 95% confidence interval for predicting the volume of an individual tree with a height of 65 feet. From SPSS we get that the prediction interval is (4.10286, 20.69744) Create a new variable that is the height times the square of the diameter. Use your new variable as the explanatory variable to fit a regression that uses it to predict volume. Volume 1.095 .002 Height Diameter 2 g) Which of these two regression models does the better job of predicting volume? Explain your reasoning. For this we need to look at R2. The results are shown below. They show that height times the square of the diameter is a better predictor. Predictor Height Height x Diameter2 R2 .655 .868 Activity 10-8: Aircraft (cont’d.) a) Describe the relationship between Speed and Range for the aircraft listed in the SPSS data set AIRCRAFT.SAV. Is it strong or weak? Linear or non-linear? A fitted line plot is shown below. It shows that there is a relationship, but that it is not a linear one. 5000 Linear Regression 4000 Range (mi) 3000 Range (mi) = -5879.74 + 16.21 * speed R-Square = 0.72 2000 1000 0 350 400 450 500 Spe ed (mph) For non-linear relationships practitioners use mathematical transformations to “linearize” the relationship. Frequently used transformations are logarithms and square roots. Try various combinations of logarithms and square roots on the variables Speed and Range. Which one gives the most linear relationship? You can find these functions in the Functions box in Transform > Compute. The logarithm function is LG10, and the square root function is SQRT. The combinations you should try are b) Speed vs. log Range Log Speed vs. Range Speed vs. Square Root Range Square Root Speed vs. Range The fitted line plots are shown below. The best one is clearly Speed vs. logRange. 5000 Linear Regression 3.50 4000 lgrange = 0.64 + 0.01 * speed R-Square = 0.91 3.25 3.00 2.75 3000 Range (mi) lgrange Range (mi) = -40390.12 + 15781.47 * lgspeed R-Square = 0.67 2000 1000 2.50 0 350 400 450 500 2.55 Spe ed (mph) 2.60 2.65 lgspe e d 2.70 Linear Regression 60.00 5000 Linear Regressi on Linear Regression 4000 40.00 3000 Range (mi) = -12732.33 + 668.55 * sqrtspeed R-Square = 0.70 2000 1000 20.00 Range (mi) sqrtrange sqrtrange = -61.07 + 0.21 * speed R-Square = 0.83 0 350 400 450 Spe ed (mph) c) d) 500 18.00 19.00 20.00 21.00 22.00 23.00 sqrtspe e d Fit a regression line using the transformation you chose in part (b). Report your regression equation. Using Analyze > Regression > Linear we get the following more accurate results. Log Range = .636 + .005 Speed What would you predict as the range for an aircraft having a speed of 500 miles per hour. If the response variable is Log Range you will need to raise 10 to the power of your prediction. If the response is Square Root Range you will need to square your prediction. LogRange .636 .005(500) 3.136 Range 103.136 1367.73