Regression Problems:

1. A researcher wants to know if there is a relationship between the number of shopping centers in a state and the retail sales (in billions $) of that state. A random sample of 8 states is listed below. After determining, via a scatter-plot, that the data followed a linear pattern, the regression line was found. Using the given data and the given regression output, answer the following questions.

State   1      2      3      4      5      6      7      8
Num     630    370    616    700    430    568    1200   2976
Sales   15.5   7.5    13.9   18.7   8.2    13.2   23.0   87.3

Output: a = -4.930, b = 0.030, r = 0.991

a. What is the equation of the regression line?
b. Interpret the slope in the words of the problem.
c. Find r² and interpret its meaning in the words of the problem.
d. Is there a significant linear relationship between Num and Sales?
e. Find the error for predicting the sales of a state with 1200 stores.
f. Use the regression line to predict the sales for a state with 100 stores.

Answers:
a. y-hat = -4.930 + 0.030x
b. Slope = 0.030 means that for every increase of 1 shopping center, retail sales increase by 0.030 billion dollars, on average.
c. r² = .982 means that there is a 98.2% reduction in error in predicting retail sales by using number of shopping centers rather than the sample mean.
d. The test statistic was 18.30 with a p-value of 1.7 * 10^-6, so there is sufficient evidence that the true slope is different from 0, meaning there is a significant linear relationship between Num and Sales. Or, the 95% confidence interval for the true slope was 0.026 to 0.034, which does not contain 0, so there is sufficient evidence that the true slope is different from 0, meaning there is a significant linear relationship between Num and Sales.
e. x = 1200, y = 23.0, y-hat = -4.930 + 0.030(1200) = 31.07, so error = 23.0 - 31.07 = -8.07
f. x = 100 is below the smallest Num in the sample (370), so this is an example of extrapolation and the line should not be used for this prediction.

2. A pharmaceutical company is investigating the relationship between advertising expenditures and the sales of some over-the-counter (OTC) drugs.
The following data represent a sample of 10 common OTC drugs. Find the equation of the regression line, using Advertising dollars as the independent variable and Sales as the response variable. Interpret the slope of the line in the words of the problem. Find r² and interpret it in the words of the problem. Use the line to predict the Sales if Advertising dollars = $50 million. Note that AD = Advertising dollars in millions $ and S = Sales in millions $.

AD   22   25   29   35   38    42    46    52    65    88
S    64   74   82   90   100   120   120   142   180   230

Calculator Output: a = 6.629, b = 2.569, r = .996

Answers:
y-hat = 6.629 + 2.569x
The slope = 2.569 means that for every increase of $1 million in Advertising, Sales increase by $2.569 million, on average.
r² = .993, so there is a 99.3% reduction in error for predicting Sales by using Advertising dollars rather than the sample mean.
y-hat = 6.629 + 2.569(50) = 135.079, so the predicted Sales are about $135.079 million.

3. A chemical company wants to study the effect of extraction time on the efficiency of an extraction process. They obtained a random sample of extraction times and the corresponding efficiency scores. The output from Excel is given below. What is the regression line? Interpret the slope and R² in the words of the problem. Use the regression line to estimate the efficiency for an extraction time of 20. You can assume 20 is in the range of the x's.

Regression Statistics
Multiple R   0.864
R Square     0.746
Std Error    5.139
Obs          15

            Coefficients   Std Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept   39.022         4.173079    9.350943   3.9E-07    30.00684    48.03761
Time        0.764          0.123639    6.178365   3.33E-05   0.496782    1.030995

Answers:
y-hat = 39.022 + .764x
The slope = .764 means that for every increase of 1 unit in extraction time, the efficiency score increases by .764 units, on average.
r² = .746 means that there is a 74.6% reduction in error for predicting efficiency by using extraction time rather than the sample mean.
y-hat = 39.022 + .764(20) = 54.302
The model is useful because the true slope is significantly different from 0 (the 95% CI for the true slope, 0.497 to 1.031, does not contain 0) and r² = .746 is reasonably high.

4. The following is output from Excel for a regression analysis. The researcher wanted to predict total cholesterol (mg/100ml) using weight (kg) as the predictor variable. Using the output, please answer the following questions.

a. Use ŷ to predict the total cholesterol for a subject who weighs 70 kg.
b. Find the coefficient of determination and explain what it means in the words of the problem.
c. Find and interpret the 95% confidence interval for B.
d. Do you think weight is a good predictor of total cholesterol? Explain.

SUMMARY OUTPUT

Regression Statistics
Multiple R       0.265293
R Square         0.070381
Standard Error   76.65431
Observations     25

ANOVA
Source     df   SS       MS       F
Regress    1    10231    10231    1.741
Residual   23   135145   5875.8
Total      24   145377

            Coeff    Std Err   t Stat   P-value   Lower 95%   Upper 95%
Intercept   199.30   85.82     2.322    0.0294    21.77       376.825
Weight      1.62     1.229     1.320    0.1999    -0.921      4.1656

Answers:
a. y-hat = 199.30 + 1.62x, so for x = 70, y-hat = 199.30 + 1.62(70) = 312.7 mg/100ml.
b. r² = .070 means that there is a 7% reduction in error for predicting total cholesterol by using weight rather than the sample mean.
c. The 95% CI for B is (-0.921, 4.1656), which means that we are 95% confident that the true slope is between -0.921 and 4.166.
d. This is not a good model because r² is low and B could be 0, since 0 is in the 95% CI for B.
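
As a check on the calculator output quoted in Problems 1 and 2, the intercept a, slope b, and correlation r can be recomputed from the raw data. The following is a minimal sketch in plain Python, not the method the worksheet itself uses; the helper name fit_line and the variable names are illustrative assumptions. The t statistic for Problem 1 is computed from r with n - 2 degrees of freedom, which is one standard form of the slope test.

```python
from math import sqrt

def fit_line(x, y):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x.

    fit_line is a hypothetical helper written for this check only.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # intercept
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    return a, b, r

# Problem 1: number of shopping centers vs. retail sales (billions $)
num = [630, 370, 616, 700, 430, 568, 1200, 2976]
sales = [15.5, 7.5, 13.9, 18.7, 8.2, 13.2, 23.0, 87.3]
a1, b1, r1 = fit_line(num, sales)
# t statistic for testing slope = 0, with n - 2 degrees of freedom
t1 = r1 * sqrt(len(num) - 2) / sqrt(1 - r1 ** 2)

# Problem 2: advertising dollars vs. sales (both in millions $)
ad = [22, 25, 29, 35, 38, 42, 46, 52, 65, 88]
s = [64, 74, 82, 90, 100, 120, 120, 142, 180, 230]
a2, b2, r2 = fit_line(ad, s)
pred_50 = a2 + b2 * 50  # prediction using the unrounded coefficients

print(f"Problem 1: a={a1:.3f} b={b1:.3f} r={r1:.3f} r^2={r1**2:.3f} t={t1:.2f}")
print(f"Problem 2: a={a2:.3f} b={b2:.3f} r={r2:.3f} r^2={r2**2:.3f}")
print(f"Problem 2 prediction at AD = 50: {pred_50:.3f}")
```

Because the script keeps the unrounded coefficients, its predictions can differ slightly from the worked answers, which use the rounded values of a and b from the calculator output.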