Homework 8 Key

Stat 4220 Homework
Due March 27
1) A research group is studying whether the height of a building affects the ground temperature
around the building. To find out, they randomly sampled 250 buildings and recorded each building's
height and the ground temperature around it. The output from the regression is shown below. Test
whether there is evidence that building height affects ground temperature.
Regression Analysis: Temperature versus Height
The regression equation is
Temperature = 75.0 - 0.00164 Height

Predictor   Coef        SE Coef     T        P
Constant    74.9795     0.3069      244.35   0.000
Height      -0.001638   0.001072    -1.53    0.127

S = 3.34230   R-Sq = 0.5%   R-Sq(adj) = 0
H0: beta1 = 0
HA: beta1 ≠ 0
Alpha = 0.05 (or another reasonable level)
df = n - 2 = 250 - 2 = 248
t = -1.53
p-value = 0.127
Fail to reject H0, since 0.127 > 0.05.
Our data do not show that building height significantly affects ground temperature.
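As a cross-check on the Minitab numbers, here is a minimal Python sketch that recomputes the slope t statistic as Coef / SE Coef and the two-sided p-value from Student's t distribution with n - 2 degrees of freedom. It uses only the standard library, so the tail area is approximated with Simpson's rule on the t density; the helper names (`t_pdf`, `two_sided_p`) are illustrative, not from any package.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value 2*P(T > |t|), using Simpson's rule on [0, |t|]."""
    b = abs(t)
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # P(0 < T < |t|)
    return 1 - 2 * area

n = 250
coef, se = -0.001638, 0.001072         # Height row of the Minitab output
t = coef / se
df = n - 2                             # simple linear regression: n - 2
p = two_sided_p(t, df)
print(round(t, 2), df, round(p, 3))
```

The recomputed p-value lands close to Minitab's 0.127; any small gap most likely comes from rounding in the printed Coef and SE Coef.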
2) A study on the average wattage of Laramie power lines sampled 100 randomly chosen
power lines and found a 99% confidence interval of (1158.8, 1262.3) Watts.
Which of the following sentences is statistically accurate?
a) If we did a new study of power lines in Laramie we have a 99% probability of getting a
confidence interval for the Wattage between 1158.8 and 1262.3
b) We are 99% confident that the true population average Wattage for all power lines
anywhere in the US is between 1158.8 and 1262.3
c) Of all Laramie power lines 99% of them will have a Wattage level that falls within the
interval (8.8, 10.3)
d) There is a 99% probability the next confidence interval done would correctly capture
the true average wattage of power lines in Laramie
e) 99% of the time that an interval on the wattage of power lines is made from Laramie
power lines the population average will be between 8.8 and 10.3
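"99% confidence" describes the long-run behavior of the interval procedure, not any single interval. A small simulation makes this concrete: it repeatedly samples from a hypothetical normal population of wattages (the mean of 1210 W and SD of 200 W are made-up values chosen only to be in a plausible range), builds a 99% z-interval from each sample of 100 (treating σ as known for simplicity), and counts how often the true mean is captured.

```python
import random
from statistics import NormalDist, fmean

def coverage_rate(mu, sigma, n, conf, reps, seed=1):
    """Fraction of z-intervals xbar +- z* * sigma / sqrt(n) that contain mu."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 2.576 for 99%
    margin = z_star * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps

# Hypothetical population values, not taken from the Laramie study
rate = coverage_rate(mu=1210, sigma=200, n=100, conf=0.99, reps=2000)
print(rate)  # should be close to 0.99
```

About 99% of the simulated intervals capture the true mean, yet each individual interval either contains it or does not.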
3) A regression model was fit to determine how time spent studying for a test affects the grade. The
plot of the residuals is given below. Based on this plot, which of the assumptions necessary for
regression do you think may have been violated?
[Figure: Time studying Residual Plot. Residuals (vertical axis, -80 to 100) versus Time studying (horizontal axis, 0 to 12).]
Two good answers: The best is the constant variance assumption (homoscedasticity), since the
residuals appear to form a fan shape.
A student who argues that normality should be checked, because the two (or three) points at the
top might be outliers, should also receive credit.
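A rough numeric companion to eyeballing a fan shape is to compare the spread of the residuals for small and large values of the predictor. The sketch below is only a descriptive check (not a formal test such as Breusch-Pagan), and the residuals are hypothetical values built to fan out, not the homework data; `spread_ratio` is an illustrative name.

```python
from statistics import median, pstdev

def spread_ratio(x, resid):
    """SD of residuals above the median x divided by SD of those at or below it."""
    cut = median(x)
    low = [r for xi, r in zip(x, resid) if xi <= cut]
    high = [r for xi, r in zip(x, resid) if xi > cut]
    return pstdev(high) / pstdev(low)

# Hypothetical residuals whose size grows with x (a fan shape)
x = list(range(1, 13))
resid = [(-1) ** i * 6 * i for i in x]
print(round(spread_ratio(x, resid), 2))
```

A ratio near 1 is consistent with constant variance; a ratio well above 1 points to the kind of fanning seen in the plot.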
4) Which of the following data sets has the highest value of R?
[Four plots, labeled A through D]
Plot B is the only plot for which R makes sense. The other plots show categorical variables, so a
correlation coefficient does not apply.
A study on how exercise time affects heart rate produced the following output.
5) According to the output, if I exercise for time = 150, what should my predicted heart rate be?
82.5234
6) After exercising, everybody has a different heart rate, so there is a lot of variability in
heart rates. How much of that variability is explained by exercise time?
R^2 = 0.5791743, so about 57.9% of the variability in heart rate is explained by exercise time.
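R^2 can be read as 1 - SSE/SST: the share of the total variability in the response that the fitted line accounts for. The heart-rate data are not reproduced in this key, so this minimal sketch fits a least-squares line to five hypothetical toy points and computes R^2 that way.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def r_squared(x, y):
    """Proportion of the variability in y explained by the fitted line."""
    b0, b1 = fit_line(x, y)
    ybar = sum(y) / len(y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    sst = sum((yi - ybar) ** 2 for yi in y)                         # total
    return 1 - sse / sst

# Hypothetical toy data, not from the exercise study
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(r_squared(x, y))
```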
A study was done to compare tree height with trunk thickness. The following output
was generated from the regression model.
Simple linear regression results:
Dependent Variable: Tree Height
Independent Variable: Trunk Size
Height = 26.540844 + 8.024617 Trunk
Sample size: 51
R (correlation coefficient) = 0.9415
R-sq = 0.88648456
Estimate of error standard deviation: 8.624407

Parameter estimates:
Parameter   Estimate    Std. Err.
Intercept   26.540844   2.7416365
Trunk       8.024617    0.41022122
7) Assuming the conditions are met, test whether trunk size is a good predictor of tree height.
H0: beta1 = 0
HA: beta1 ≠ 0
Alpha = 0.05
t = 8.024617/0.41022122 = 19.56, df = 51 - 2 = 49
p-value ≈ 0 (far below 0.0001)
Reject H0.
Trunk size is a good (statistically significant) predictor of tree height.
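In simple linear regression the slope t statistic can be computed two equivalent ways: as Estimate / Std. Err., or from the correlation as t = r*sqrt(n - 2)/sqrt(1 - r^2). This sketch computes both from the printed output as a consistency check; because the output is rounded, the two agree only to a few decimals. The function names are illustrative.

```python
import math

def slope_t_from_coef(coef, se):
    """t statistic for H0: beta1 = 0 from the estimate and its standard error."""
    return coef / se

def slope_t_from_r(r, n):
    """Same t statistic via the identity t = r*sqrt(n-2)/sqrt(1-r^2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

n = 51
t_coef = slope_t_from_coef(8.024617, 0.41022122)
r = math.sqrt(0.88648456)      # positive root, since the slope is positive
t_r = slope_t_from_r(r, n)
print(round(t_coef, 2), round(t_r, 2), n - 2)
```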
The output below examines whether salary increases as you get older.
Simple linear regression results:
Dependent Variable: Salary
Independent Variable: Age
Salary = 43130.348 + 8.739329 Age
Sample size: 100
R (correlation coefficient) = 0.0142
R-sq = 2.0028706E-4
Estimate of error standard deviation: 10103.091

Parameter estimates:
Parameter   Estimate    Std. Err.   DF   T-Stat       P-Value
Intercept   43130.348   3426.885    98   12.5858755   <0.0001
Slope       8.739329    62.372787   98   0.14011447   0.00489
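8) Would it be a good idea to use this model to predict salary given a specific age?
No. The residual plot is clearly not random scatter, which shows the regression assumptions
are badly violated. In addition, R-sq is only about 0.0002, so age explains almost none of the
variability in salary.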
9) The temperature of the reactor in a nuclear submarine is normally distributed. A
random sample taken at 3 different times showed an average temperature of 324°C with a
standard deviation of 54°C. Find a 95% confidence interval for the true average
temperature of the sub's reactor. Use t* = 4.303 (df = 2).
324 +- 4.303*54/sqrt(3) = (189.846, 458.154)
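The one-sample t interval is xbar +- t* * s/sqrt(n). This minimal sketch plugs in the numbers from the problem, with t* = 4.303 taken from the key rather than computed; `t_interval` is an illustrative name.

```python
import math

def t_interval(xbar, s, n, t_star):
    """One-sample t confidence interval: xbar +- t* * s / sqrt(n)."""
    margin = t_star * s / math.sqrt(n)
    return xbar - margin, xbar + margin

lo, hi = t_interval(xbar=324, s=54, n=3, t_star=4.303)
print(f"({lo:.3f}, {hi:.3f})")
```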
10) A 95% confidence interval for μ1-μ2, based on two independent samples of sizes
38 and 40, respectively, is (45.6, 56.7).
a) Is the difference between the two sample means included in the 95%
confidence interval? Yes (it is the center of the interval).
b) Is the difference between the two population means included in the 95%
confidence interval? Maybe (we cannot know whether this particular interval captured it).
c) Would the interval contain more values if the sample sizes were
increased? No (the interval would get narrower).
d) Is the probability that the difference between the two population means,
μ1-μ2, falls between 45.6 and 56.7 equal to 0.95? No.
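Because a confidence interval for μ1-μ2 is built as (difference in sample means) +- margin, the sample difference sits exactly at the center, which is why (a) is yes. The sketch below recovers the center and margin from the reported endpoints, then illustrates (c) under a stated assumption: if both sample sizes were doubled with the same sample SDs, the standard error would shrink by a factor of 1/sqrt(2) (ignoring the small change in t*), so the interval would get narrower.

```python
def center_and_margin(lower, upper):
    """Point estimate (midpoint) and margin of error (half-width) of an interval."""
    return (lower + upper) / 2, (upper - lower) / 2

diff, margin = center_and_margin(45.6, 56.7)
# Assumption: both sample sizes doubled, sample SDs unchanged, t* treated as fixed
smaller_margin = margin / 2 ** 0.5
print(round(diff, 2), round(margin, 2), round(smaller_margin, 2))
```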
14) The average August temperatures (y) and geographic latitudes (x) of 20 cities in the
United States were studied. The regression equation for these data is
Temperature = 113.6 - 1.01*(latitude)
a. What is the slope of the line?
-1.01
Make sure they have the negative
b. Interpret the slope (how the mean August temperature is affected by a change in latitude)
For each one-degree increase in latitude, the mean August temperature is predicted to drop by 1.01 degrees.
c. Estimate the mean August temperature for a city with latitude of 32.
81.28
d. San Francisco has a latitude of 38. What would you predict for the mean August
temperature of San Francisco?
75.22
e. Given that the mean August temperature in San Francisco is actually 64 calculate the
residual (prediction error) for San Francisco.
-11.22 Make sure they have the negative
f. The latitude at the equator is 0. Estimate the average August temperature at the equator.
113.6
g. Explain why we should not use this equation to estimate average August temperature at the
equator.
It would be extrapolation: the data come from 20 U.S. cities, none of which is near the equator, so a latitude of 0 is far outside the range of the data.
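Parts c through f all come from plugging latitudes into the fitted line, and the residual in part e is observed minus predicted. A minimal sketch (`predict_temp` is an illustrative name):

```python
def predict_temp(latitude):
    """Predicted mean August temperature from the fitted regression line."""
    return 113.6 - 1.01 * latitude

pred_32 = predict_temp(32)        # part c
pred_sf = predict_temp(38)        # part d: San Francisco
residual_sf = 64 - pred_sf        # part e: observed minus predicted
pred_equator = predict_temp(0)    # part f: an extrapolation, see part g
print(round(pred_32, 2), round(pred_sf, 2), round(residual_sf, 2), round(pred_equator, 2))
```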
15) A car was driven 20 different times with different octane levels. Using the output from the
regression, give a 99% confidence interval for the effect of octane on the car's mileage (the slope).
Simple linear regression results:
Dependent Variable: mileage
Independent Variable: octane
mileage = -53.426544 + 0.8503097 octane
Sample size: 20
R (correlation coefficient) = 0.9134
R-sq = 0.8343458
Estimate of error standard deviation: 1.8180993

Parameter estimates:
Parameter   Estimate     Std. Err.    DF   T-Stat      P-Value
Intercept   -53.426544   7.824635     18   -6.827992   <0.0001
Slope       0.8503097    0.08930362   18   9.52156     <0.0001
0.8503 +- 2.878*0.0893 = (0.593, 1.107), using t* = 2.878 with df = 18
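The confidence interval for the slope is b1 +- t* * SE(b1). This sketch uses the unrounded estimate and standard error from the output, with t* = 2.878 (df = 18) taken from the key; `slope_ci` is an illustrative name.

```python
def slope_ci(b1, se, t_star):
    """Confidence interval for a regression slope: b1 +- t* * SE(b1)."""
    margin = t_star * se
    return b1 - margin, b1 + margin

lo, hi = slope_ci(b1=0.8503097, se=0.08930362, t_star=2.878)
print(f"({lo:.3f}, {hi:.3f})")
```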
16) 50 different companies are competing for a bid with WYDOT to build roads. Each company
submitted a sample of its asphalt for WYDOT to test. The plot below shows the relationship
between asphalt strength and asphalt tar concentration for each company.
What would be an appropriate conclusion based on this graph?
Check all that apply (there may be more than one correct answer)
________ The more tar that is put into the asphalt the stronger the asphalt will be
______X__ High levels of tar concentration are associated with stronger asphalt
________ Tar causes asphalt to be stronger
______X__ There is a correlation between tar concentration and asphalt strength
________ The stronger the asphalt is the more tar that will be put into it
17) a) 3 +- 1.984*1.2/sqrt(214) = (2.837, 3.163)
b) We are 95% confident that the true population average is between 2.837 and 3.163.
c) It would be narrower.