Least-Squares Regression part 4


AP Statistics

The Least-Squares Regression Line

1.

Following are the mean heights of Kalama children:

Age (months)   18    19    20    21    22    23    24    25    26    27    28    29
Height (cm)    76.1  77.0  78.1  78.2  78.8  79.7  79.9  81.1  81.2  81.8  82.8  83.5

a) Sketch a scatter plot.
b) What is the correlation coefficient? Interpret in terms of the problem.
c) Calculate and interpret the slope.
d) Calculate and interpret the y-intercept.
e) Write the equation of the regression line. Draw the regression line.
f) Predict the height of a 32-month-old child.
g) Calculator tricks.

Graphing a scatter plot.

Determining the regression line.

Drawing the regression line.

Using the regression line for predictions.
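The calculator steps for problem 1 can also be checked by hand. The following is a minimal Python sketch, assuming the textbook formulas slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x); the helper name `least_squares` is hypothetical and not part of the worksheet.

```python
from math import sqrt

def least_squares(xs, ys):
    """Return (intercept, slope, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r

# Kalama data from problem 1.
ages = list(range(18, 30))  # months
heights = [76.1, 77.0, 78.1, 78.2, 78.8, 79.7, 79.9, 81.1, 81.2, 81.8, 82.8, 83.5]  # cm

intercept, slope, r = least_squares(ages, heights)
predicted_32 = intercept + slope * 32  # part f)

print(f"height = {intercept:.2f} + {slope:.3f} * age, r = {r:.3f}")
print(f"predicted height at 32 months: {predicted_32:.1f} cm")
```

Note that 32 months lies outside the observed ages (18 to 29 months), so the prediction in part f) is an extrapolation.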

2.

Good runners take more steps per second as they speed up. Here are the average numbers of steps per second for a group of top female runners at different speeds. The speeds are in feet per second.

Speed (ft/s)       15.86  16.88  17.50  18.62  19.97  21.06  22.11
Steps per second   3.05   3.12   3.17   3.25   3.36   3.46   3.55

a) You want to predict steps per second from running speed. Which is the explanatory variable? Make a scatterplot on your calculator of the data with this goal in mind.
b) What is the correlation coefficient? Interpret in terms of the problem.
c) Calculate and interpret the slope.
d) Calculate and interpret the y-intercept.
e) Write the equation of the regression line. Draw the regression line.
f) If you need to cover 20 ft/s to win a race, predict the steps per second you’ll need to maintain.

3.

According to the article “First-Year Academic Success...” (1999), there is a mild correlation (r = .55) between high school GPA and college GPA. The high school GPAs have a mean of 3.7 and a standard deviation of 0.47. The college GPAs have a mean of 2.86 with a standard deviation of 0.85.

a) What is the explanatory variable?
b) What is the slope of the regression line of college GPA on high school GPA? Intercept? Interpret these in context of the problem.
c) Billy Bob’s high school GPA is 3.2. What could we expect of him in college?
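When only summary statistics are given, as in problem 3, the line can be built from r, the two means, and the two standard deviations. The sketch below assumes the standard relations slope = r * (s_y / s_x) and intercept = mean(y) - slope * mean(x); the function name `line_from_summary` is a hypothetical helper.

```python
def line_from_summary(r, mean_x, sd_x, mean_y, sd_y):
    """Return (intercept, slope) of the least-squares line from summary statistics."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Problem 3: x = high school GPA, y = college GPA.
intercept, slope = line_from_summary(r=0.55, mean_x=3.7, sd_x=0.47,
                                     mean_y=2.86, sd_y=0.85)

# Part c): expected college GPA for a high school GPA of 3.2.
predicted = intercept + slope * 3.2

print(f"college GPA = {intercept:.3f} + {slope:.3f} * (high school GPA)")
print(f"predicted college GPA at 3.2: {predicted:.2f}")
```

The same helper applies to the other summary-statistic problems on this worksheet (for example, problems 6, 9, and 13) by substituting their values.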


4.

Car dealers across North America use the “Red Book” to help them determine the value of used cars that their customers trade in when purchasing new cars. The book lists on a monthly basis the amount paid at recent used-car auctions and indicates the values according to condition and optional features, but does not inform the dealers as to how odometer readings affect the trade-in value. In an experiment to determine whether the odometer reading should be included, ten 3-year-old cars are randomly selected of the same make, condition, and options. The trade-in value (in $100) and mileage (in 1000s of miles) are shown below.

Odometer   59  92  61  72  52  67  88  62  95  83
Trade-in   37  31  43  39  41  39  35  40  29  33

a) Find the sample regression line for determining how the odometer reading affects the trade-in value of the car.
b) Interpret the slope in terms of the problem.
c) Calculate and interpret the correlation coefficient.
d) Calculate and interpret the coefficient of determination.
e) Predict the trade-in value of a car with 60,000 miles.
f) What would be the odometer reading of a car with a trade-in value of $4200?
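Parts c) and d) of problem 4 are closely linked: for a least-squares line, the coefficient of determination is the square of r, and it also equals 1 - SSE/SST. The following sketch, with variable names chosen here for illustration, computes it both ways from the odometer data.

```python
from math import sqrt

odometer = [59, 92, 61, 72, 52, 67, 88, 62, 95, 83]  # thousands of miles
trade_in = [37, 31, 43, 39, 41, 39, 35, 40, 29, 33]  # hundreds of dollars

n = len(odometer)
mean_x = sum(odometer) / n
mean_y = sum(trade_in) / n
sxx = sum((x - mean_x) ** 2 for x in odometer)
syy = sum((y - mean_y) ** 2 for y in trade_in)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(odometer, trade_in))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / sqrt(sxx * syy)

# Coefficient of determination, first as r squared ...
r_squared = r ** 2
# ... then as the share of total variation not left in the residuals.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(odometer, trade_in))
r_squared_alt = 1 - sse / syy

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}, 1 - SSE/SST = {r_squared_alt:.3f}")
```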

5.

The scatterplot shows the advertised prices (in thousands of dollars) plotted against ages (in years) for a random sample of Plymouth Voyagers on several dealers’ lots.

A computer printout showing the results of fitting a straight line to the data by the method of least squares gives:

Price = 12.37 – 1.13 Age

R-sq = 75.5%

[Figure: Plymouth Voyagers Scatter Plot. Horizontal axis: Age_in_years (ticks 2 to 10); vertical axis: price (ticks 2 to 14).]

a) Find the correlation coefficient for the relationship between price and age of Voyagers based on these data.
b) What is the slope of the regression line? Interpret it in the context of these data.
c) How will the size of the correlation coefficient change if the 10-year-old Voyager is removed from the data set? Explain.
d) How will the slope of the LSRL change if the 10-year-old Voyager is removed from the data?
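For part a) of problem 5, the printout reports R-sq rather than r. A minimal sketch of the conversion, assuming simple linear regression (where r is the square root of R-sq and takes the sign of the slope); the function name `r_from_r_squared` is hypothetical.

```python
from math import copysign, sqrt

def r_from_r_squared(r_squared, slope):
    """Recover r from R-sq (as a proportion), using the slope's sign."""
    return copysign(sqrt(r_squared), slope)

# Values read from the printout: R-sq = 75.5%, slope = -1.13.
r = r_from_r_squared(0.755, -1.13)
print(f"r = {r:.3f}")
```

The sign matters here: the slope is negative, so the correlation between price and age is negative as well.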

6.

The paper “A Cross-National Relationship between Sugar Consumption and Major Depression?” (Depression and Anxiety [2002]) concluded that there was a strong correlation (r = .9444) between refined sugar consumption (calories per person per day) and annual rate of major depression (cases per 100 people) based on data from 6 countries. The average sugar consumption was 340.83 calories per person per day with a standard deviation of 110.56 calories, while the annual rate of depression was 4.26 cases with a standard deviation of 1.338 cases.

a) What is the slope of the regression line of annual rate of depression based on sugar consumption? What is the y-intercept? Interpret the two in context of the problem.
b) Use the regression line to predict the depression rate of the United States if the average person consumes 300 calories per person per day.
c) New Zealand’s depression rate is 5.7 annual cases per 100 people. Use the model to find the possible sugar consumption. Does the regression line allow us to make this prediction? Explain.


Linear Regression part 2

7.

In one of the Boston city parks there has been a problem with muggings in the summer months. A police cadet took a random sample of 10 days (out of the 90-day summer) and compiled the following data. For each day, x represents the number of police officers on duty in the park and y represents the number of reported muggings on that day.

x   10  15  16  1  4  6  18  12  14  7
y   5   2   1   9  7  8  1   5   3   6

a) Sketch a scatter plot. What is the regression line?
b) What is the correlation coefficient? Interpret in terms of the problem.
c) Interpret the slope in terms of the problem.
d) Find the coefficient of determination and interpret in terms of the problem.
e) Predict the number of muggings if there are 9 police officers on duty.

8.

Foal weight at birth is an indicator of health, so it is of interest to breeders of thoroughbred horses. Is foal weight related to the weight of the mare? The accompanying data are from the article “Suckling Behavior Does Not Measure Milk Intake in Horses” (Animal Behavior [1999]).

Observation        1    2    3    4      5    6      7    8    9    10    11     12   13     14   15
Mare weight (kg)   556  638  588  550    580  642    568  642  556  616   549    504  515    551  594
Foal weight (kg)   129  119  132  123.5  112  113.5  95   104  104  93.5  108.5  95   117.5  128  127.5

a) Find the equation of the regression line.
b) Interpret the slope in terms of the problem.
c) Interpret the y-intercept in terms of the problem.
d) Calculate and interpret the correlation coefficient.
e) Calculate and interpret the coefficient of determination.

9.

One measure of the success of knee surgery is postsurgical range of motion for the knee joint. Postsurgical range of motion was recorded for 12 patients who had surgery following a knee dislocation. The age of each patient was also recorded (“Reconstruction…” American Journal of Sports Medicine). The average age was 25.83 years with a standard deviation of 7.578 years. The average range of motion was 130.1 degrees with a standard deviation of 11.927 degrees. The correlation coefficient was r = .5534.

a) If we use age to try to predict the range of motion, what is the slope? What is the y-intercept? Interpret the two in context of the problem.
b) Use the regression line to predict the range of motion of someone 32 years of age.
c) Use the regression line to predict the range of motion of someone 50 years of age. Do you feel this is an accurate prediction? Explain your thoughts.


10.

The average prices (in dollars) per ounce of gold and silver for the years 1986 through 1994 are given below.

Year     1986  1987  1988  1989  1990  1991  1992  1993  1994
Gold     368   478   438   383   385   363   345   361   389
Silver   5.47  7.01  6.53  5.50  4.82  4.04  3.94  4.30  5.30

a. What is the explanatory variable? Explain.
b. Find the regression line for gold predicting silver.
c. Interpret the slope and y-intercept.
d. What is the correlation coefficient? Interpret.
e. Find the regression line for silver predicting gold.
f. Interpret the slope and y-intercept.
g. What is the correlation coefficient? Interpret. Compare your answer to part ‘d’.
h. What is the coefficient of determination? Interpret.
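Problem 10 fits the line in both directions. The following sketch (the helper name `fit` is hypothetical) illustrates two facts worth checking against your calculator output: swapping the explanatory and response variables changes the slope but not r, and the product of the two slopes equals r squared.

```python
from math import sqrt

gold = [368, 478, 438, 383, 385, 363, 345, 361, 389]            # dollars per ounce
silver = [5.47, 7.01, 6.53, 5.50, 4.82, 4.04, 3.94, 4.30, 5.30]  # dollars per ounce

def fit(xs, ys):
    """Return (intercept, slope, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, sxy / sqrt(sxx * syy)

# Gold predicting silver (part b) and silver predicting gold (part e).
a_gs, slope_gold_to_silver, r_gs = fit(gold, silver)
a_sg, slope_silver_to_gold, r_sg = fit(silver, gold)

print(f"silver = {a_gs:.3f} + {slope_gold_to_silver:.4f} * gold, r = {r_gs:.3f}")
print(f"gold = {a_sg:.1f} + {slope_silver_to_gold:.2f} * silver, r = {r_sg:.3f}")
print(f"product of slopes = {slope_gold_to_silver * slope_silver_to_gold:.4f}")
```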

11.

Newsweek gave the following 1994 average weekly earnings from allowances, chores, work, and gifts for children of ages 4 through 12.

Age        4      5      6      7       8       9       10      11      12
Earnings   $5.87  $7.42  $7.62  $10.63  $10.65  $10.69  $12.01  $13.79  $20.19

a. Construct a scatter plot.
b. Interpret the slope in terms of the problem.
c. Find the coefficient of determination and interpret in terms of the problem.
d. Find the correlation coefficient and interpret in terms of the problem.
e. Predict the weekly earnings of a child who is age 16. Do you think this is a good prediction? Explain.


Linear Regression part 3

Residuals

A residual is the difference between an observed value of the response variable and the value predicted by the regression line.

residual = observed y – predicted y    OR    RESID = y – ŷ

Special property: the mean of the residuals is always zero

Residual plot    x: explanatory variable    y: RESID (2nd/LIST: RESID)

Shape:

Let’s reexamine the data from #1. These data produce a favorable residual plot that indicates the line is a good model for the data.

Age (months) 18 19 20 21 22 23 24 25 26 27 28 29

Height (cm) 76.1 77.0 78.1 78.2 78.8 79.7 79.9 81.1 81.2 81.8 82.8 83.5

Interpret. Sketch the residual plot.
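The residual definition and the zero-mean property can be checked numerically on the Kalama data. This is a minimal sketch under the usual least-squares formulas; the variable names are chosen here for illustration, and the printed list corresponds to what the calculator stores in RESID.

```python
ages = list(range(18, 30))  # months
heights = [76.1, 77.0, 78.1, 78.2, 78.8, 79.7, 79.9, 81.1, 81.2, 81.8, 82.8, 83.5]  # cm

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(heights) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, heights))
         / sum((x - mean_x) ** 2 for x in ages))
intercept = mean_y - slope * mean_x

# residual = observed y - predicted y
residuals = [y - (intercept + slope * x) for x, y in zip(ages, heights)]
mean_residual = sum(residuals) / n

for age, res in zip(ages, residuals):
    print(f"{age:>3}  {res:+.3f}")
print(f"mean of residuals is zero up to rounding: {abs(mean_residual) < 1e-9}")
```

Plotting `residuals` against `ages` gives the residual plot; a scatter with no clear curve or fan shape supports the linear model.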

Now try the following data set. These data produce a residual plot that indicates the line is not a good model.

x   1  3  4  5  6  8  10  12  15
y   [values missing]

Sketch the residual plot.

Interpret.

Note: r =

Now try the following data set. These data also produce a residual plot that indicates the line begins to fail as a good model.

x   2  4   7   9   12  15  20  21  25  27  29  30
y   9  13  25  30  35  49  65  75  70  73  99  79

Sketch the residual plot.

Note: r =

Interpret.


12.

Success in hunting varies greatly among species of animals. Lions, who hunt singly, are rarely successful in more than 10 percent of their hunts. Wild African dogs, who hunt in packs, are among the most efficient of all hunters, succeeding at a rate of over 90 percent of their hunts.

In the early 1960’s, researcher Jane Goodall discovered that chimpanzees were not solely vegetarian in their diets, as had previously been thought. This discovery spurred a tremendous amount of primate research. Some of the latest primatology research has been done on chimpanzees to find out if larger hunting parties increase the chances of a successful hunt. The results of one such research project are summarized in the table for the number of chimpanzees in the hunting party versus the percentage of successful hunts.

Number of Chimps     1   2   3   4   5   6   7   8   9   10  12  13  14  15  16
Percent of Success   20  30  28  42  40  58  45  62  65  63  75  75  78  75  82

a. Construct a scatter plot.
b. Determine the regression line.
c. Interpret the y-intercept. Does the interpretation make sense in this context?
d. Interpret the slope.
e. Find the correlation coefficient and interpret in terms of the problem.
f. Find the coefficient of determination and interpret in terms of the problem.
g. Sketch the residual plot. Interpret in terms of the problem.

13.

How quickly can athletes return to their sport following injuries requiring surgery? The paper “Arthroscopic Distal Clavicle Resection for Isolated Atraumatic Osteolysis in Weight Lifters” (American Journal of Sports Medicine, 1998) found a moderate positive (r = .55) linear relationship between a lifter’s age and the number of days after arthroscopic shoulder surgery before being able to return to the sport, among 10 weight lifters. The average age of the weight lifters was 30.4 years with a standard deviation of 2.875 years. The average number of days before being able to return to their sport was 3.2 days with a standard deviation of 1.398 days.

a. Determine the line to predict the number of days based on the age of the weight lifter.
b. Determine the coefficient of determination and interpret in terms of the problem.
c. Given that the lifters ranged from 26 to 34 years old, predict the number of days for a 28-year-old lifter. Do you feel this prediction is accurate? Explain.

Linear Regression part 4


14.

The growth and decline of forests is a matter of great public and scientific interest. The paper “Relationships Among Crown Condition, Growth, and Stand Nutrition in Seven Northern Vermont Sugarbushes” included a scatter plot of y = mean crown dieback (%), which is one indicator of growth retardation, and x = soil pH. A statistical computer package, MINITAB, gives the following analysis:

The regression equation is dieback = 31.0 – 5.79 soil pH

Predictor   Coef     Stdev   t-ratio   p
Constant    31.040   5.445   5.70      0.000
soil pH     -5.792   1.363   -4.25     0.001

s = 2.981    R-sq = 51.5%

a) What is the equation of the least squares line?
b) Where else in the printout do you find the information for the slope and y-intercept?
c) Roughly, what change in crown dieback would be associated with an increase of 1 in soil pH?
d) What value of crown dieback would you predict when soil pH = 4.0?
e) Would it be sensible to use the least squares line to predict crown dieback when soil pH = 5.67?
f) What is the correlation coefficient?
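Reading a MINITAB printout comes down to picking the intercept from the Constant row and the slope from the predictor row of the Coef column. The sketch below, with a hypothetical function name `predict_dieback`, uses those two coefficients for the prediction in part d).

```python
# Coefficients copied from the Coef column of the printout in problem 14.
INTERCEPT = 31.040   # "Constant" row
SLOPE = -5.792       # "soil pH" row

def predict_dieback(soil_ph):
    """Predicted mean crown dieback (%) for a given soil pH."""
    return INTERCEPT + SLOPE * soil_ph

predicted = predict_dieback(4.0)
print(f"predicted dieback at pH 4.0: {predicted:.2f}%")
```

Whether a prediction such as the one in part e) is sensible depends on the range of soil pH values in the original data, which the printout alone does not show.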

15.

The following output data from MINITAB shows the number of teachers (in thousands) for each of the states plus the District of Columbia against the number of students (in thousands) enrolled in grades K-12.

Predictor   Coef       Stdev      t-ratio   p
Constant    4.486      2.025      2.22      0.031
Enroll      0.053401   0.001692   31.57     0.000

s = 2.589    R-sq = 81.5%

a) What is the equation of the least squares line? Interpret the slope.
b) Find the correlation coefficient and coefficient of determination. Interpret in the context of the problem.
c) Predict the number of students if the number of teachers in the state is 40,000.
d) Predict the number of teachers if the number of students in the state is 35,700.

16.

The following output data from MINITAB shows the height of girls (in cm) based on the number of years old.

Predictor   Coef     Stdev    t-ratio   p
Constant    76.61    1.188    64.52     0.000
Age(yrs)    6.3661   0.1672   38.02     0.000

s = 1.518    R-sq = 99.5%

a) What is the equation of the least squares line? Interpret the slope.
b) Find the correlation coefficient and coefficient of determination. Interpret in the context of the problem.
c) Predict the height of a 3-year-old girl.
d) Predict the age if a girl is 135 cm.


17.

Women made significant gains in the 1970’s in terms of their acceptance into professions that had been traditionally populated by men. To measure just how big these gains were, we will compare the percentage of professional degrees awarded to women in 1973-1974 to the percentage awarded in 1978-1979 for selected fields of study.

Field                  Degrees in 73-74   Degrees in 78-79
Dentistry              2.0%               11.9%
Law                    11.5               28.5
Medicine               11.2               23.1
Optometry              4.2                13.0
Osteopathic medicine   2.8                15.7
Podiatry               1.1                7.2
Theology               5.5                13.1
Veterinary medicine    11.2               28.9

a) What is the regression line?
b) Interpret the slope in terms of the problem.
c) Find the coefficient of determination and interpret in terms of the problem.
d) Sketch the residual plot. Interpret.
e) Find the residual for optometry.
f) Find the residual for veterinary medicine. Did the regression line over or under predict? Explain.
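Parts e) and f) ask for the residual of a single observation. The following sketch fits the line to the eight fields and reports residual = observed - predicted for the two named fields; the dictionary `residual_by_field` is a name introduced here for illustration. A positive residual means the line under-predicted, and a negative one means it over-predicted.

```python
fields = ["Dentistry", "Law", "Medicine", "Optometry",
          "Osteopathic medicine", "Podiatry", "Theology", "Veterinary medicine"]
pct_1973_74 = [2.0, 11.5, 11.2, 4.2, 2.8, 1.1, 5.5, 11.2]      # explanatory
pct_1978_79 = [11.9, 28.5, 23.1, 13.0, 15.7, 7.2, 13.1, 28.9]  # response

n = len(fields)
mean_x = sum(pct_1973_74) / n
mean_y = sum(pct_1978_79) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(pct_1973_74, pct_1978_79))
         / sum((x - mean_x) ** 2 for x in pct_1973_74))
intercept = mean_y - slope * mean_x

# residual = observed y - predicted y, keyed by field name
residual_by_field = {
    name: y - (intercept + slope * x)
    for name, x, y in zip(fields, pct_1973_74, pct_1978_79)
}

for name in ("Optometry", "Veterinary medicine"):
    res = residual_by_field[name]
    verdict = "under-predicted" if res > 0 else "over-predicted"
    print(f"{name}: residual = {res:+.2f} (line {verdict})")
```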

18.

Shells of mollusks function as both part of the skeletal system and as protective armor. It has been argued that many features of these shells were the result of natural selection in the constant battle against predators. The paper “Postmortem Changes in Strength of Gastropod Shells” included a scatter plot of data on x = shell height (cm) and y = breaking strength (newtons). The least squares line for a sample of 38 hermit crab shells was

[equation missing]

a. What are the slope and intercept of this line?
b. When shell height increases by 1 cm, by how much does breaking strength tend to change?
c. What breaking strength would you predict when shell height is 2 cm?
d. Does this approximate linear relationship appear to hold for shell heights as small as 1 cm? Explain your thoughts.

19.

The following is a table of the number of registered automatic weapons (in thousands) of selected states and their corresponding murder rates.

Weapons   116  [remaining values missing]
Rates     [values missing]

a. Determine the regression line.
b. Predict the number of weapons for a state with a rate of 8.5.
c. Predict the murder rate for a state with 10,000 registered automatic weapons.


Linear Regression part 5

20.

The data come from a study of ice cream consumption that spanned the springs and summers of three years. The ice cream consumption (pints per capita per year), family income of consumers ($1000 per year), and the temperature (degrees Fahrenheit) are listed below.

Consumption   [values missing]
Income        18  25  [remaining values missing]
Temperature   41  56  63  68  69  65  61  47  32  24

a. Complete two scatter plots with consumption being the response variable for each plot.
b. Find the two regression lines.
c. Interpret the slopes.
d. Interpret the coefficients of determination.
e. Sketch and interpret both residual plots.
f. Which do you think is the better predictor of consumption? Explain.
g. Predict the consumption for a temperature of 53 degrees.
h. Predict the consumption for an income of $17,500.
i. Predict the income and temperature for 3 gallons a year.

21.

Given the following data sets, find the regression line. Sketch the residual plot and comment on the likelihood of the regression line being a good model.

x   2   3   4    5    6    7    8    9
y   86  96  103  110  115  120  130  131

x   3   6   8   9   11  14  18  20
y   19  22  39  50  75  87  96  125


22.

The following is the points-per-game average for Magic Johnson for each of his 13 years of regular season play and the ensuing playoffs.

Year       80  81  82  83  84  85  86  87  88  89  90  91  96
Season     18.0  [remaining values missing]
Playoffs   [values missing]

a) Sketch a scatter plot. Find the sample regression line.
b) Interpret the slope in terms of the problem.
c) Interpret the y-intercept in terms of the problem.
d) Calculate and interpret the correlation coefficient.
e) Calculate and interpret the coefficient of determination.
f) Predict the playoff average for a season average of 20 points.
g) Sketch the residual plot. Interpret.

23.

The reasons given by workers for quitting their jobs generally fall into one of two categories: (1) worker quits to seek or take a different job, or (2) worker quits to withdraw from the labor force.

Economic theory suggests that wages and quit rates are related. A MINITAB printout of the simple linear regression of quit rate (100s) and average hourly wage is shown below.

Predictor   Coef       Stdev     t-ratio   p
Constant    4.8615     0.5201    9.35      0.000
AveWage     -0.34655   0.05866   -5.91     0.000

s = 0.4862    R-sq = 72.9%

a. Determine the regression line. Interpret the coefficient of determination and slope.
b. If a company has a quit rate of 225 employees, what might their average hourly wage be?
c. What is the correlation coefficient? What does it say about the problem?

24.

The following information comes from the September 1998 issue of Beanie World Magazine.

Name                    Age (months) as of 9/1998   Retired or Current   Value ($)
Batty the Bat           12                          C                    12
Bumble the Bee          28                          R                    600
Digger the Red Crab     40                          R                    150
Echo the Dolphin        17                          R                    20
Iggy the Iguana         10                          C                    10
Inch the Inchworm       28                          R                    20
Kiwi the Toucan         40                          R                    165
Mistic the Unicorn      11                          R                    45
Nuts the Squirrel       21                          C                    10
Patty the Platypus      64                          R                    800
Princess the Bear       12                          C                    65
Rex the Tyrannosaurus   40                          R                    825
Splash the Orca Whale   52                          R                    150
Stripes the Tiger       40                          R                    400

a. Discuss how you could display all the above information except the names on a scatter plot.
b. Determine the regression line.
c. Interpret the slope.
d. Predict the value of a Beanie that is 45 weeks old. How good is the prediction?
e. Are there any outliers? Are there any influential points? Explain.
f. What is the residual for Digger the Red Crab?
g. What is the residual for Princess the Bear?
