Regression Problems and Answers



Regression Problems:

1.

A researcher wants to know if there is a relationship between the number of shopping centers in a state and the retail sales (in billions $) of that state. A random sample of 8 states is listed below. After determining from a scatter plot that the data follow a linear pattern, the researcher found the regression line. Using the given data and the given regression output, answer the following questions.

State   Num (shopping centers)   Sales (billions $)
1       630                      15.5
2       370                      7.5
3       616                      13.9
4       700                      18.7
5       430                      8.2
6       568                      13.2
7       1200                     23.0
8       2976                     87.3

Regression output: a = -4.930, b = 0.030, r = 0.991
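As a check on the regression output, here is a minimal Python sketch that recomputes a, b, and r from the eight data points with the usual least-squares formulas. The helper name `least_squares` is our own, not part of the original. It also confirms that r² equals 1 - SSE/SST, the proportional reduction in squared prediction error that the answers below call a "reduction in error".

```python
from math import sqrt

# Problem 1 data: x = number of shopping centers, y = retail sales (billions $)
x = [630, 370, 616, 700, 430, 568, 1200, 2976]
y = [15.5, 7.5, 13.9, 18.7, 8.2, 13.2, 23.0, 87.3]

def least_squares(x, y):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # the line passes through (x_bar, y_bar)
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    return a, b, r

a, b, r = least_squares(x, y)

# r^2 as a proportional reduction in error:
# SST = squared error when every y is predicted by y_bar,
# SSE = squared error when y is predicted by the fitted line.
y_bar = sum(y) / len(y)
sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
pre = 1 - sse / sst

print(f"a = {a:.3f}, b = {b:.3f}, r = {r:.3f}")      # a = -4.930, b = 0.030, r = 0.991
print(f"r^2 = {r**2:.3f}, 1 - SSE/SST = {pre:.3f}")  # both 0.982
```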

a. What is the equation of the regression line?
b. Interpret the slope in the words of the problem.
c. Find r² and interpret its meaning in the words of the problem.
d. Find the error for predicting the sales of a state with 1200 shopping centers.
e. Use the regression line to predict the sales for a state with 100 shopping centers.

a. y-hat = -4.930 + 0.030x

b. Slope = 0.030 means that for every increase of 1 shopping center, retail sales increase by 0.030 billion dollars ($30 million), on average.

c. r² = (0.991)² = .982 means that there is a 98.2% reduction in error in predicting retail sales by using number of shopping centers.

d. For x = 1200, y = 23.0 and y-hat = -4.930 + 0.030(1200) = 31.07, so the error = y - y-hat = 23.0 - 31.07 = -8.07 (billion $).

e. The sample contains only states with 370 to 2976 shopping centers, so predicting for 100 would be extrapolation, and you should not do it.
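Parts d and e can be sketched in a few lines, assuming the rounded coefficients from the output. The `predict` helper is hypothetical: it refuses to predict outside the sampled x range (370 to 2976), which is one way to encode the extrapolation rule from part e.

```python
# Problem 1, parts d and e, using the rounded line from the output
A, B = -4.930, 0.030       # intercept and slope as reported
X_MIN, X_MAX = 370, 2976   # smallest and largest x in the sample

def predict(x_new):
    """Return y-hat; raise ValueError instead of extrapolating (hypothetical helper)."""
    if not X_MIN <= x_new <= X_MAX:
        raise ValueError(f"x = {x_new} is outside [{X_MIN}, {X_MAX}]: extrapolation")
    return A + B * x_new

# Part d: error (residual) = observed y - predicted y-hat
error_1200 = 23.0 - predict(1200)
print(f"error at x = 1200: {error_1200:.2f}")  # -8.07

# Part e: x = 100 is below the smallest observed x, so no prediction is made
try:
    predict(100)
except ValueError as exc:
    print(exc)
```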

2.

A pharmaceutical company is investigating the relationship between advertising expenditures and the sales of some over-the-counter (OTC) drugs. The following data represent a sample of 10 common OTC drugs, where AD = advertising dollars (millions $) and S = sales (millions $).

AD    S
22    64
25    74
29    82
35    90
38    100
42    120
46    120
52    142
65    180
88    230

Find the equation of the regression line, using advertising dollars as the independent variable and sales as the response variable. Interpret the slope of the line in the words of the problem. Find r² and interpret it in the words of the problem. Use the line to predict sales if advertising dollars = $50 million.

Calculator output: a = 6.629, b = 2.569, r = .996, so y-hat = 6.629 + 2.569x.

The slope = 2.569 means that for every increase of $1 million in advertising, sales increase by $2.569 million, on average. r² = .993, so there is a 99.3% reduction in error for predicting sales using advertising dollars. For x = 50, y-hat = 6.629 + 2.569(50) = 135.079, so predicted sales are about $135.1 million.

3.

A chemical company wants to study the effect of extraction time on the efficiency of an extraction process. They obtained a random sample of extraction times and the corresponding efficiency scores. The output from Excel is given below. What is the regression line? Interpret the slope and R² in the words of the problem. Use the regression line to estimate the efficiency for an extraction time of 20. You can assume 20 is in the range of the x's.

Regression Statistics
Multiple R     0.864
R Square       0.746
Std Error      5.139
Obs            15

            Coefficients   Std Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept   39.022         4.173079    9.350943   3.9E-07    30.00684    48.03761
Time        0.764          0.123639    6.178365   3.33E-05   0.496782    1.030995

y-hat = 39.022 + .764x

The slope = .764 means that for every increase of 1 unit in extraction time, the efficiency score increases by .764 units on average. r² = .746 means that there is a 74.6% reduction in error for predicting efficiency using extraction time. For x = 20, y-hat = 39.022 + .764(20) = 54.302.
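In the Excel output, each t Stat is the coefficient divided by its standard error. The sketch below (the variable names are our own) recomputes those ratios from the rounded coefficients, so they agree with the output only to about two decimal places, and repeats the prediction at an extraction time of 20.

```python
# Problem 3: rounded coefficients and standard errors from the Excel output
coef = {"Intercept": 39.022, "Time": 0.764}
std_err = {"Intercept": 4.173079, "Time": 0.123639}

# t Stat = coefficient / standard error (tests H0: true coefficient = 0)
t_stat = {name: coef[name] / std_err[name] for name in coef}

# Estimated efficiency for an extraction time of 20 (inside the range of the x's)
y_hat_20 = coef["Intercept"] + coef["Time"] * 20

for name, t in t_stat.items():
    print(f"{name}: t = {t:.3f}")
print(f"y-hat at 20 = {y_hat_20:.3f}")  # 54.302
```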

The model is useful: the true slope is significantly different from 0, because the 95% CI for the true slope, 0.497 to 1.031, does not contain 0 (the P-value for Time is 3.33E-05), and r² = .746 is reasonably high.
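The confidence interval in the output is the slope plus or minus a t critical value times its standard error, with df = n - 2 = 13. Because Python's standard library has no t distribution, this sketch assumes t = 2.160, the tabled two-sided 95% value for 13 degrees of freedom.

```python
# Problem 3: 95% CI for the true slope = b +/- t* x SE(b)
T_CRIT_13 = 2.160  # assumed: two-sided 95% t value for df = 15 - 2 = 13, from a t table
b, se_b = 0.764, 0.123639

lower = b - T_CRIT_13 * se_b
upper = b + T_CRIT_13 * se_b
contains_zero = lower <= 0 <= upper  # False here, so the slope is significant at the 5% level

print(f"95% CI for slope: ({lower:.3f}, {upper:.3f}); contains 0: {contains_zero}")
```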

4.

The following is output from Excel for regression analysis. The researcher wanted to predict the total cholesterol (mg/100ml) using weight (kg) as the predictor variable.

Using the output, answer the following questions.

a. Use ŷ to predict the total cholesterol for a subject who weighs 70 kg.
b. Find the coefficient of determination and explain what it means in the words of the problem.
c. Find and interpret the 95% confidence interval for B.
d. Do you think weight is a good predictor of total cholesterol? Explain.

SUMMARY OUTPUT

Regression Statistics
Multiple R        0.265293
R Square          0.070381
Standard Error    76.65431
Observations      25

ANOVA
Source       df   SS       MS       F
Regression   1    10231    10231    1.741
Residual     23   135145   5875.8
Total        24   145377

            Coeff    Std Err   t Stat   P-value   Lower 95%   Upper 95%
Intercept   199.30   85.82     2.322    0.0294    21.77       376.825
Weight      1.62     1.229     1.320    0.1999    -0.921      4.1656

y-hat = 199.30 + 1.62x, so for x = 70, y-hat = 199.30 + 1.62(70) = 312.7 mg/100ml.

r² = .070 means that there is a 7% reduction in error for predicting total cholesterol using weight.
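The entries of an ANOVA table are tied together by a few identities: SST = SSR + SSE, R² = SSR/SST, MS = SS/df, F = MS(Regression)/MS(Residual), and the Standard Error is the square root of MS(Residual). This sketch recomputes them from the rounded sums of squares, along with the prediction at 70 kg; small differences from the printed output (for example, 145376 versus 145377 for the total) come from rounding.

```python
from math import sqrt

# Problem 4: sums of squares and degrees of freedom from the ANOVA table
ss_reg, ss_res = 10231, 135145
df_reg, df_res = 1, 23

ss_tot = ss_reg + ss_res     # 145376; the output shows 145377 because of rounding
r_squared = ss_reg / ss_tot  # coefficient of determination
ms_reg = ss_reg / df_reg     # mean square for regression
ms_res = ss_res / df_res     # mean square for residuals (MSE)
f_stat = ms_reg / ms_res     # F statistic
std_error = sqrt(ms_res)     # the "Standard Error" under Regression Statistics

# Part a: predicted total cholesterol (mg/100ml) for a 70 kg subject
y_hat_70 = 199.30 + 1.62 * 70

print(f"R^2 = {r_squared:.4f}, F = {f_stat:.3f}, Standard Error = {std_error:.2f}")
print(f"y-hat at 70 kg = {y_hat_70:.1f}")
```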

The 95% CI for B is (-0.921, 4.166), which means that we are 95% confident that the true slope is between -0.921 and 4.166 mg/100ml per kg of weight.

This is not a good model because r² is low and B could be 0, since 0 is in the 95% CI for B.
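The last two answers rest on the same fact: a slope is significant at the 5% level exactly when its 95% CI excludes 0, which is equivalent to |t| exceeding the critical value. This sketch assumes t = 2.069 for df = 23 from a t table. Because it uses the rounded slope 1.62, the interval comes out near (-0.923, 4.163) rather than the output's (-0.921, 4.1656).

```python
# Problem 4: CI for the slope B and the matching t test
T_CRIT_23 = 2.069     # assumed: two-sided 95% t value for df = 25 - 2 = 23, from a t table
b, se_b = 1.62, 1.229  # rounded slope and standard error for Weight

lower = b - T_CRIT_23 * se_b
upper = b + T_CRIT_23 * se_b
t_stat = b / se_b

# 0 inside the CI  <=>  |t| <= critical value, so B could be 0
contains_zero = lower <= 0 <= upper
significant = abs(t_stat) > T_CRIT_23

print(f"95% CI for B: ({lower:.3f}, {upper:.3f}); t = {t_stat:.3f}; significant: {significant}")
```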
