Stat 4220 Homework
Due March 27

1) A research group is studying whether the height of a building affects the ground temperature around the building. To find out, they randomly sampled 250 buildings and recorded each building's height and the temperature around it. The output from the regression is shown below. Test whether there is evidence that building height affects the ground temperature.

Regression Analysis: Temperature versus Height

The regression equation is
Temperature = 75.0 - 0.00164 Height

Predictor   Coef        SE Coef    T        P
Constant    74.9795     0.3069     244.35   0.000
Height      -0.001638   0.001072   -1.53    0.127

S = 3.34230   R-Sq = 0.5%   R-Sq(adj) = 0

H0: beta1 = 0
HA: beta1 ne 0
Alpha = 0.05 or something reasonable
Df = n - 2 = 250 - 2 = 248
T = -1.53
p-value = 0.127
Fail to reject H0. Our data do not show that building height significantly affects ground temperature.

2) A study on the average wattage of Laramie power lines sampled 100 randomly chosen power lines and found a 99% confidence interval of (1158.8, 1262.3) Watts. Which of the following sentences is statistically accurate?
a) If we did a new study of power lines in Laramie, we have a 99% probability of getting a confidence interval for the wattage between 1158.8 and 1262.3.
b) We are 99% confident that the true population average wattage for all power lines anywhere in the US is between 1158.8 and 1262.3.
c) Of all Laramie power lines, 99% of them will have a wattage level that falls within the interval (8.8, 10.3).
d) There is a 99% probability the next confidence interval done would correctly capture the true average wattage of power lines in Laramie.
e) 99% of the time that an interval on the wattage of power lines is made from Laramie power lines, the population average will be between 8.8 and 10.3.

3) A regression model was fit to determine how time studying for a test affects grade. The plot of the residuals is given below. Based on this plot, which assumptions necessary for regression do you think may have been violated?
[Figure: "Time studying Residual Plot"; vertical axis: Residuals, horizontal axis: Time studying]

Two good answers: The best is the constant variance assumption (homoscedasticity), since there appears to be a fan shape. A student who argues that normality should be checked, because the two (or three) points at the top might just be outliers, deserves to have it marked correct as well.

4) Which of the following data sets has the highest value of R?
A) B) C) D)
Plot B is the only plot that has any value of R. The other plots are categorical.

A study on how the time of exercise affects heart rate had the following output.

5) According to the output, if I exercise for time=150, what should be my heart rate?
82.5234

6) After exercising, everybody has different heart rates, which means there is a lot of variability in heart rates. How much of that variability is explained by exercise time?
R^2 = 0.5791743, so about 57.9% of the variability in heart rate is explained by exercise time.

A study was done to compare tree height with trunk thickness. The following output was generated from the regression model.

Simple linear regression results:
Dependent Variable: Tree Height
Independent Variable: Trunk Size
Height = 26.540844 + 8.024617 Trunk
Sample size: 51
R (correlation coefficient) = 0.9415
R-sq = 0.88648456
Estimate of error standard deviation: 8.624407

Parameter estimates:
Parameter   Estimate    Std. Err.
Intercept   26.540844   2.7416365
Trunk       8.024617    0.41022122

7) Assuming the conditions are met, test if trunk size is a good predictor of tree height.
H0: beta1 = 0
HA: beta1 ne 0
Alpha = 0.05
T = 8.024617/0.41022122 = 19.56
P-value = 0*2 = 0
Reject H0. Trunk size is a good predictor of tree height.

The output below studies whether salary should increase each year that you get older.

Simple linear regression results:
Dependent Variable: Salary
Independent Variable: Age
Salary = 43130.348 + 8.739329 Age
Sample size: 100
R (correlation coefficient) = 0.0142
R-sq = 2.0028706E-4
Estimate of error standard deviation: 10103.091

Parameter estimates:
Parameter   Estimate    Std. Err.   DF   T-Stat       P-Value
Intercept   43130.348   3426.885    98   12.5858755   <0.0001
Slope       8.739329    62.372787   98   0.14011447   0.00489

8) Would it be a good idea to use this model to predict salary given a specific age?
No. The residual plot is clearly not random scatter, showing the regression assumptions are terribly violated.

9) The temperature of the reactor in a nuclear submarine is normally distributed. A random sample of 3 different times showed an average temperature of 324°C with a standard deviation of 54°C. Find a 95% confidence interval for the true average temperature of the sub's reactor.
T = 4.303 (df = 3 - 1 = 2)
324 +- 4.303*54/sqrt(3) = (189.846, 458.154)

10) A 95% confidence interval for μ1-μ2, based on two independent samples of sizes 38 and 40, respectively, is (45.6, 56.7).
a) Is the difference between the two sample means included in the 95% confidence interval? yes
b) Is the difference between the two population means included in the 95% confidence interval? maybe
c) Would the interval contain more values if the sample sizes were increased? no
d) Is the probability that the difference between the two population means, μ1-μ2, falls between 45.6 and 56.7 equal to 0.95? no

14) The average August temperatures (y) and geographic latitudes (x) of 20 cities in the United States were studied. The regression equation for these data is
Temperature = 113.6 - 1.01*(latitude)
a. What is the slope of the line?
-1.01 (Make sure they have the negative.)
b. Interpret the slope (how the mean August temperature is affected by a change in latitude).
Each one-degree increase in latitude is associated with a 1.01-degree drop in mean August temperature.
c. Estimate the mean August temperature for a city with latitude of 32.
113.6 - 1.01*32 = 81.28
d. San Francisco has a latitude of 38. What would you predict for the mean August temperature of San Francisco?
113.6 - 1.01*38 = 75.22
e. Given that the mean August temperature in San Francisco is actually 64, calculate the residual (prediction error) for San Francisco.
64 - 75.22 = -11.22 (Make sure they have the negative.)
f. The latitude at the equator is 0.
Estimate the average August temperature at the equator.
113.6
g. Explain why we should not use this equation to estimate average August temperature at the equator.
It is extrapolation (the data set was American cities, and none are on the equator).

15) A car was driven 20 different times with different octane levels. Using the output from the regression, give a 99% confidence interval for the effect of octane on the car.

Simple linear regression results:
Dependent Variable: mileage
Independent Variable: octane
mileage = -53.426544 + 0.8503097 octane
Sample size: 20
R (correlation coefficient) = 0.9134
R-sq = 0.8343458
Estimate of error standard deviation: 1.8180993

Parameter estimates:
Parameter   Estimate     Std. Err.    DF   T-Stat      P-Value
Intercept   -53.426544   7.824635     18   -6.827992   <0.0001
Slope       0.8503097    0.08930362   18   9.52156     <0.0001

0.85 +- 2.878*0.0893 = (0.593, 1.107)

16) 50 different companies are competing for a bid with WYDOT to build roads. Each company submitted a sample of its asphalt for WYDOT to test. The plot below shows the relationship between asphalt strength and the asphalt tar concentration from each company. What would be an appropriate conclusion based on this graph? Check all that apply (there may be more than one correct answer).
________ The more tar that is put into the asphalt the stronger the asphalt will be
______X__ High levels of tar concentration are associated with stronger asphalt
________ Tar causes asphalt to be stronger
______X__ There is a correlation between tar concentration and asphalt strength
________ The stronger the asphalt is the more tar that will be put into it

17)
a) 3 +- 1.984*1.2/sqrt(214) = (2.837, 3.1627)
b) We are 95% confident the average is between 2.837 and 3.1627.
c) It would be narrower.
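
The interval and test-statistic arithmetic in problems 7, 9, 15 and 17 can be checked with a short script. This is only a minimal sketch and not part of the original key: the helper name t_interval is made up for illustration, and the critical values (4.303, 2.878, 1.984) are simply copied from the answers above rather than looked up by the code.

```python
import math


def t_interval(estimate, std_error, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * std_error.

    Hypothetical helper for checking the key; t_crit must be supplied
    by the caller (here it is copied from the answers above).
    """
    margin = t_crit * std_error
    return (estimate - margin, estimate + margin)


# Problem 9: mean of n = 3 readings, so the standard error is s / sqrt(n).
reactor = t_interval(324, 54 / math.sqrt(3), 4.303)

# Problem 15: the slope and its standard error come straight from the output.
octane = t_interval(0.8503097, 0.08930362, 2.878)

# Problem 17 a): same form as problem 9, with n = 214.
prob17 = t_interval(3, 1.2 / math.sqrt(214), 1.984)

# Problem 7: the slope test statistic is estimate / standard error.
t_trunk = 8.024617 / 0.41022122

for name, (lo, hi) in [("9", reactor), ("15", octane), ("17a", prob17)]:
    print(f"Problem {name}: ({lo:.3f}, {hi:.3f})")
print(f"Problem 7: T = {t_trunk:.2f}")
```

Passing the critical value in by hand keeps the script free of any statistics library; a student with a t table can change one number and rerun it.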