Topic 10

advertisement
Topic 10
Regression Analysis—Estimation
Activity 10-5: Verbal SAT Scores
Use the data in the SPSS data set SAT99.SAV.
a) Does there appear to be a linear relation between the percentage taking the examination and the average
Verbal SAT score?
A fitted line plot is shown below. There does appear to be a linear relationship between percent taking and
Verbal SAT score.

Linear Regression
 

575







Verba l SAT

550










525




500




 
 








Verbal SAT = 572.18+ -1.07 * Pct
R-Square = 0.79
475
20
40
60
80
Pe rce nt
b)
c)
d)
Compute a regression line that uses percentage taking the examination to predict the average Verbal SAT
score. Report your regression equation.
From the plot we see that Verbal SAT = 572.18-1.07 Percent
What would you predict for the average Verbal SAT score for a state where 50% take the examination.
Verbal SAT  572.18  1.07(50)  518.68
What would be the change in the average Verbal SAT score for a state where the percentage taking the
examination increased by 10 percent?
Change  1.07(10)  10.7
e)
Construct a 99% confidence interval for 1 . Does it indicate that percentage taking the examination is a
useful predictor for the average Verbal SAT score?
1.074  2.678(.080)  1.288  1  .860
The interval does not 0. So percent taking the examination is a useful predictor of average Verbal SAT
score.
Activity 10-6: Physical Measurements
a)
b)
Enter the physical measurements taken on members of your class in an SPSS data set. Do the two
measurements appear to be linearly related?
Results will vary according to the data.
Compute a regression line that uses forearm length to predict height. Report your regression equation.
Results will vary according to the data.
What would you predict would be the height for a person whose forearm measures 17.5 inches?
Results will vary according to the data.
d) Construct a 90% confidence interval for 1 .
Results will vary according to the data.
e) Construct a 95% confidence interval for predicting the average height of people whose forearm measures
19 inches.
Results will vary according to the data.
f)
Construct a 99% confidence interval for predicting the height of an individual whose forearm measures 15
inches.
Results will vary according to the data.
c)
Activity 10-7: Aircraft Operating Costs
The SPSS data set AIRCRAFT.SAV contains data on a number of commercial aircraft. The variables are described
in the variable view.
a)
Does there appear to be a linear relation between fuel consumption and operating cost?
A fitted line plot is shown below. It does show a linear relationship between fuel consumption and
operating cost.

7000
Linear Regression
Operating Cost ($/hr) = 451.61 + 1.84 * fuel
R-Square = 0.92
Operating Cost ($/hr)


6000



5000

4000
 


3000

2000


 
 



1000
2000
3000
Fue l Consumption (gal/hr)
b)
c)
d)
Compute a regression line that uses fuel consumption to predict operating cost. Report your regression
equation.
From the fitted line plot we see that the regression equation is
Operating Cost = 451.61+1.84 Fuel Consumption
If the fuel consumption for an aircraft increases by 50 gallons per hour, by how much would you predict
that the operating cost would change?
Change = 1.84(50) = 92
Construct a 99% confidence interval for 1 . Does it indicate that fuel consumption is a useful predictor for
operating cost?
1.837  2.771(.109)  1.535  1  2.139
The interval does not contain 0. So, fuel consumption is a useful predictor of operating cost.
e)
Construct a 90% confidence interval for the average operating cost for airplanes that consume 2500
gallons per hour.
From SPSS we get that the confidence interval is (4084.7239, 5285.18495).
Activity 10-7: Measuring Cherry Trees
The SPSS data set CHERRY_TREES.SAV contains the diameter, height, and volume of a sample of black cherry
trees.
a)
Do the height and volume appear to be linearly related?
A fitted line plot is shown below. It indicates that there is a linear relationship between height and
volume.

Linear Regression
Volum e (cubic feet)
30.0

Volume (cubic feet) = -34.25 + 0.72 * hgt
R-Square = 0.65

25.0








20.0




15.0
10.0



65
70
75
80
85
He ight (fe et)
b)
c)
d)
e)
f)
Fit a regression line that uses height to predict the volume. Report your regression equation.
From the fitted line plot we see that Volume = -34.25 + .72 Height
What would you predict the volume to be for a tree that is 77 feet tall?
Volume  34.25  .72(77)  21.19
By how much will the volume change if a tree’s height increased by 2 feet?
.72(2)=1.44
Construct a 95% confidence interval for predicting the volume of an individual tree with a height of 65
feet.
From SPSS we get that the prediction interval is (4.10286, 20.69744)
Create a new variable that is the height times the square of the diameter. Use your new variable as the
explanatory variable to fit a regression that uses it to predict volume.
Volume  1.095  .002  Height  Diameter 2 
g)
Which of these two regression models does the better job of predicting volume? Explain your reasoning.
For this we need to look at R2. The results are shown below. They show that height times the square of
the diameter is a better predictor.
Predictor
Height
Height x Diameter2
R2
.655
.868
Activity 10-8: Aircraft (cont’d.)
a)
Describe the relationship between Speed and Range for the aircraft listed in the SPSS data set
AIRCRAFT.SAV. Is it strong or weak? Linear or non-linear?
A fitted line plot is shown below. It shows that there is a relationship, but that it is not a linear one.
5000

Linear Regression
4000
Range (mi)



3000

Range (mi) = -5879.74 + 16.21 * speed

 
R-Square = 0.72

2000



1000







 
0
350
400
450
500
Spe ed (mph)
For non-linear relationships practitioners use mathematical transformations to “linearize” the relationship.
Frequently used transformations are logarithms and square roots. Try various combinations of logarithms
and square roots on the variables Speed and Range. Which one gives the most linear relationship? You
can find these functions in the Functions box in Transform > Compute. The logarithm function is LG10,
and the square root function is SQRT. The combinations you should try are
b)
Speed vs. log Range
Log Speed vs. Range
Speed vs. Square Root Range
Square Root Speed vs. Range
The fitted line plots are shown below. The best one is clearly Speed vs. logRange.

5000
Linear Regression



3.50
4000
lgrange = 0.64 + 0.01 * speed

 
R-Square = 0.91


3.25

 
3.00



2.75
 





3000
Range (mi)
lgrange





Range (mi) = -40390.12 + 15781.47 * lgspeed

R-Square = 0.67
2000


1000





 



 

2.50
0
350
400
450
500
2.55
Spe ed (mph)
2.60
2.65
lgspe e d
2.70
Linear Regression

60.00
5000
Linear Regressi on

Linear Regression
4000




40.00






 



3000

Range (mi) = -12732.33 + 668.55 * sqrtspeed


R-Square = 0.70

2000

 
1000



20.00
Range (mi)
sqrtrange

sqrtrange = -61.07 + 0.21 * speed

 
R-Square = 0.83










 
0
350
400
450
Spe ed (mph)
c)
d)
500
18.00
19.00
20.00
21.00
22.00
23.00
sqrtspe e d
Fit a regression line using the transformation you chose in part (b). Report your regression equation.
Using Analyze > Regression > Linear we get the following more accurate results.
Log Range = .636 + .005 Speed
What would you predict as the range for an aircraft having a speed of 500 miles per hour. If the response
variable is Log Range you will need to raise 10 to the power of your prediction. If the response is Square
Root Range you will need to square your prediction.
LogRange  .636  .005(500)  3.136  Range  103.136  1367.73
Download