Homework 8

Stat 4220 Homework
Due March 27
1) A research group is studying whether the height of a building affects the ground temperature
around the building. To find out, they randomly sampled 250 buildings and recorded each building's
height and the temperature around it. The output from the regression is shown below. Test
whether there is evidence that building height affects the ground temperature.
Regression Analysis: Temperature versus Height
The regression equation is
Temperature = 75.0 - 0.00164 Height

Predictor       Coef   SE Coef       T      P
Constant     74.9795    0.3069  244.35  0.000
Height     -0.001638  0.001072   -1.53  0.127

S = 3.34230   R-Sq = 0.5%   R-Sq(adj) = 0
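The T column in output like this can be reproduced by hand. The following is a minimal Python sketch, assuming the usual slope test statistic (coefficient divided by its standard error) and n - 2 degrees of freedom for simple linear regression; the variable names are illustrative and not part of the output.

```python
# Values copied from the Height row of the regression output.
coef = -0.001638      # estimated slope
se_coef = 0.001072    # standard error of the slope
n = 250               # number of buildings sampled

# Test statistic for H0: slope = 0 (assumed form: estimate / standard error).
t_stat = coef / se_coef
df = n - 2            # degrees of freedom for simple linear regression

print(round(t_stat, 2), df)
```

The rounded statistic can then be compared with the T and P columns of the output.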
2) A study on the average wattage of Laramie power lines sampled 100 randomly chosen
power lines and found a 99% confidence interval of (1158.8, 1262.3) Watts.
Which of the following sentences is statistically accurate?
a) If we did a new study of power lines in Laramie we have a 99% probability of getting a
confidence interval for the Wattage between 1158.8 and 1262.3
b) We are 99% confident that the true population average Wattage for all power lines
anywhere in the US is between 1158.8 and 1262.3
c) Of all Laramie power lines 99% of them will have a Wattage level that falls within the
interval (1158.8, 1262.3)
d) There is a 99% probability the next confidence interval done would correctly capture the
true average wattage of power lines in Laramie
e) 99% of the time that an interval on the wattage of power lines is made from Laramie
power lines the population average will be in (1158.8, 1262.3)
3) A regression model was fit to determine how time studying for a test affects grade. The
plot of the residuals is given below. Based on this plot which assumptions necessary for
regression do you think may have been violated?
[Figure: "Time studying Residual Plot". Vertical axis: Residuals; horizontal axis: Time studying.]
4) Which of the following data sets has the highest value of R?
[Figure: four plots of data sets, labeled A), B), C), and D).]
A study on how the time of exercise affects heart rate had the following output
5) According to the output, if I exercise for time=150, what should be my heart rate?
6) After exercising everybody has different heart rates, which means there is a lot of
variability in heart rates. How much of that variability is explained by exercise time?
A study was done to compare tree height with trunk thickness. The following output
was generated from the regression model.
Simple linear regression results:
Dependent Variable: Tree Height
Independent Variable: Trunk Size
Height = 26.540844 + 8.024617 Trunk
Sample size: 51
R (correlation coefficient) = 0.9415
R-sq = 0.88648456
Estimate of error standard deviation: 8.624407

Parameter estimates:
Parameter   Estimate    Std. Err.
Intercept   26.540844   2.7416365
Trunk       8.024617    0.41022122
7) Assuming the conditions are met, test whether trunk size is a good predictor of tree height.
The output below studies whether salary should increase each year that you get older.
Simple linear regression results:
Dependent Variable: Salary
Independent Variable: Age
Salary = 43130.348 + 8.739329 Age
Sample size: 100
R (correlation coefficient) = 0.0142
R-sq = 2.0028706E-4
Estimate of error standard deviation: 10103.091

Parameter estimates:
Parameter   Estimate    Std. Err.   DF   T-Stat       P-Value
Intercept   43130.348   3426.885    98   12.5858755   <0.0001
Slope       8.739329    62.372787   98   0.14011447   0.00489
8) Would it be a good idea to use this model to predict salary given a specific age?
9) The temperature of the reactor in a nuclear submarine is normally distributed. A
random sample of 3 different times showed an average temperature of 324°C with a
standard deviation of 54°C. Find a 95% confidence interval for the true average
temperature of the sub’s reactor.
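A small-sample interval of this kind has the form mean ± t* · s/√n. The following is a minimal Python sketch of that arithmetic, assuming the t-table critical value 4.303 for 95% confidence with n - 1 = 2 degrees of freedom; the function name `t_interval` is hypothetical and introduced only for illustration.

```python
import math

def t_interval(mean, sd, n, t_crit):
    """Return (lower, upper) for mean +/- t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin

# t_crit = 4.303 is an assumed t-table value (95% confidence, 2 df).
lower, upper = t_interval(mean=324, sd=54, n=3, t_crit=4.303)
print(round(lower, 1), round(upper, 1))
```

With only three observations the critical value is large, so the interval is wide relative to the standard deviation.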
10) A 95% confidence interval for μ1-μ2, based on two independent samples of sizes
38 and 40, respectively, is (45.6, 56.7).
a) Is the difference between the two sample means included in the 95%
confidence interval?
b) Is the difference between the two population means included in the 95%
confidence interval?
c) Would the interval contain more values if the sample sizes were
increased?
d) Is the probability that the difference between the two population means,
μ1-μ2, falls between 45.6 and 56.7 equal to 0.95?
11) The average August temperatures (y) and geographic latitudes (x) of 20 cities in the
United States were studied. The regression equation for these data is
Temperature = 113.6 – 1.01*(latitude)
a. What is the slope of the line?
b. Interpret the slope (how the mean August temperature is affected by a change in latitude).
c. Estimate the mean August temperature for a city with latitude of 32.
d. San Francisco has a latitude of 38. What would you predict for the mean August
temperature of San Francisco?
e. Given that the mean August temperature in San Francisco is actually 64 calculate the
residual (prediction error) for San Francisco.
f. The latitude at the equator is 0. Estimate the average August temperature at the equator.
g. Explain why we should not use this equation to estimate average August temperature at the
equator.
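The prediction and residual parts of problem 11 are direct arithmetic on the fitted line. The following is a minimal Python sketch, assuming the residual is defined as observed minus predicted; the function name `predicted_temperature` is hypothetical.

```python
def predicted_temperature(latitude):
    """Fitted mean August temperature from the equation in problem 11."""
    return 113.6 - 1.01 * latitude

pred_32 = predicted_temperature(32)   # city at latitude 32
pred_sf = predicted_temperature(38)   # San Francisco, latitude 38

# Residual (prediction error), assumed to be observed minus predicted.
observed_sf = 64
residual_sf = observed_sf - pred_sf

print(round(pred_32, 2), round(pred_sf, 2), round(residual_sf, 2))
```

The same function would return a value at latitude 0, but the line was fit only to latitudes of cities in the United States.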
12) A car was driven 20 different times with different octane levels. Using the output from the
regression, give a 99% confidence interval for the effect of octane on the car.
Simple linear regression results:
Dependent Variable: mileage
Independent Variable: octane
mileage = -53.426544 + 0.8503097 octane
Sample size: 20
R (correlation coefficient) = 0.9134
R-sq = 0.8343458
Estimate of error standard deviation: 1.8180993

Parameter estimates:
Parameter   Estimate     Std. Err.    DF   T-Stat      P-Value
Intercept   -53.426544   7.824635     18   -6.827992   <0.0001
Slope       0.8503097    0.08930362   18   9.52156     <0.0001
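A confidence interval for a regression slope has the form estimate ± t* · (standard error). The following is a minimal Python sketch using the Slope row of the output, assuming the t-table critical value 2.878 for 99% confidence with 18 degrees of freedom; the variable names are illustrative.

```python
# Values from the Slope row of the parameter estimates.
estimate = 0.8503097
std_err = 0.08930362

# Assumed t-table critical value: 99% confidence, DF = 18.
t_crit = 2.878

margin = t_crit * std_err
lower, upper = estimate - margin, estimate + margin
print(round(lower, 4), round(upper, 4))
```

Both endpoints are in units of mileage per unit of octane, matching the units of the slope.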
13) 50 different companies are competing for a bid with WYDOT to build roads. Each company
submitted a sample of their asphalt for the WYDOT to test. The plot below shows the relationship
between asphalt strength and the asphalt tar concentration from each company.
What would be an appropriate conclusion based on this graph?
Check all that apply (there may be more than one correct answer)
________ The more tar that is put into the asphalt the stronger the asphalt will be
________ High levels of tar concentration are associated with stronger asphalt
________ Tar causes asphalt to be stronger
________ There is a correlation between tar concentration and asphalt strength
________ The stronger the asphalt is the more tar that will be put into it
14) In January 2013, the journal Pediatrics published data collected from 214 mother and infant pairs of
low-income African-American mothers aged 18 to 35 years in central North Carolina. Data was
collected on the number of televisions in the household. The data showed a mean of 3.0 televisions
with a standard deviation of 1.2.
“Maternal Characteristics and perception of Temperament Associated with Infant TV Exposure“ Pediatrics February 1, 2013 vol. 131 no. 2 e390-e397 doi: 10.1542/peds.20121224 http://pediatrics.aappublications.org.libproxy.uwyo.edu/content/131/2/e390/T2.expansion.html
a) Construct a 95% Confidence interval
b) Interpret your interval with a proper English sentence
c) If the t-score were obtained from a computer instead of from the t-table, how would the interval change?
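For part c, the effect of the critical value on the margin of error can be sketched in Python. This is an illustration under stated assumptions: that a printed t-table without a row for 213 degrees of freedom is read at the df = 100 row (t* = 1.984), and that software returns approximately 1.971 for df = 213. Both critical values are assumptions, not values given in the problem.

```python
import math

n, mean, sd = 214, 3.0, 1.2
std_err = sd / math.sqrt(n)   # standard error of the sample mean

# Assumed critical values for 95% confidence.
t_table = 1.984      # t-table row for df = 100 (assumed nearest lower row)
t_computer = 1.971   # approximate software value for df = 213

margin_table = t_table * std_err
margin_computer = t_computer * std_err

print(round(mean - margin_table, 3), round(mean + margin_table, 3))
print(round(mean - margin_computer, 3), round(mean + margin_computer, 3))
```

Under these assumptions the two intervals share the same center and differ only slightly in width.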