Correlation and Regression

STAT412
1. Standardized tests, such as the Comprehensive Test of Basic Skills (CTBS), are used by school systems to evaluate the performance
of their students. CTBS scores for two different areas, such as math and reading, can be plotted in a scatterplot for a group of
randomly selected students to see whether the scores show evidence of a linear relationship. Construct a scatterplot and calculate the
Pearson correlation coefficient. Interpret the results.
Student   Reading Score   Math Score
1         47              42
2         71              81
3         64              68
4         35              43
5         43              50
6         60              75
7         38              47
8         59              59
9         67              69
10        56              57
11        67              57
12        57              54
13        69              75
14        38              38
15        54              59
16        76              63
17        53              57
18        40              40
19        47              52
20        23              22
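As a hedged illustration of the calculation this problem asks for, the Python sketch below computes the Pearson correlation coefficient from its definition, r = S_xy / sqrt(S_xx * S_yy), using the reading and math scores listed above. The helper name `pearson_r` and the variable names are illustrative choices, not part of the assignment.

```python
from math import sqrt

# Reading and math scores for the 20 students listed above.
reading = [47, 71, 64, 35, 43, 60, 38, 59, 67, 56,
           67, 57, 69, 38, 54, 76, 53, 40, 47, 23]
math_score = [42, 81, 68, 43, 50, 75, 47, 59, 69, 57,
              57, 54, 75, 38, 59, 63, 57, 40, 52, 22]


def pearson_r(x, y):
    """Pearson correlation from sums of squared deviations (hypothetical helper)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


r = pearson_r(reading, math_score)
print(f"r = {r:.3f}")
```

A value of r near +1 or -1 points to a strong linear association; the scatterplot should still be checked before interpreting the number.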
2. The paper "The Relation Between Freely Chosen Meals and Body Habits" (Amer. J. Clinical Nutrition (1983): 32-40) reported
results of an investigation into the relationship between body build and energy intake of an individual's diet. A measure of body build
is the Quetelet index (x) with a high value of x indicating a thickset individual. The variable reflecting energy intake is y=dietary
energy density. There were nine subjects in the investigation. Calculate r with and without subject 9 included. What do you learn?
Subject   x     y
1         221   0.67
2         228   0.86
3         223   0.78
4         211   0.54
5         231   0.91
6         215   0.44
7         224   0.90
8         233   0.94
9         268   0.93
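The comparison requested here can be sketched in Python as follows. This is a minimal example, assuming the nine (x, y) pairs listed above; `pearson_r`, `quetelet`, and `density` are illustrative names only. It computes r twice, once with all nine subjects and once with subject 9 dropped, so the influence of that single point can be seen directly.

```python
from math import sqrt

# Quetelet index (x) and dietary energy density (y) for subjects 1 to 9.
quetelet = [221, 228, 223, 211, 231, 215, 224, 233, 268]
density = [0.67, 0.86, 0.78, 0.54, 0.91, 0.44, 0.90, 0.94, 0.93]


def pearson_r(x, y):
    """Pearson correlation from sums of squared deviations (hypothetical helper)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


r_all = pearson_r(quetelet, density)                 # subjects 1 to 9
r_without_9 = pearson_r(quetelet[:-1], density[:-1])  # subject 9 removed

print(f"r with subject 9:    {r_all:.3f}")
print(f"r without subject 9: {r_without_9:.3f}")
```

Subject 9 has an x value far from the others, so it is worth locating that point on a scatterplot when interpreting the two results.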
3. Underinflated or overinflated tires can increase tire wear and decrease gas mileage. A manufacturer of a new tire tested the tire for
wear at different pressures. Construct scatterplot and calculate Pearson correlation coefficient. The researcher for this study calculated
a Pearson correlation coefficient and concluded no relationship existed between the two variables. Was he correct in his conclusion?
Pressure   Mileage (thousands)
30         29.5
30         30.2
31         32.1
31         34.5
32         36.3
32         35.0
33         38.2
33         37.6
34         37.7
34         36.1
35         33.6
35         34.2
36         26.8
36         27.4
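One way to examine the researcher's conclusion is sketched below in Python, assuming the 14 (pressure, mileage) pairs listed above. Besides the Pearson coefficient, the sketch averages mileage at each pressure level; the grouping step and all names (`pearson_r`, `mean_mileage`) are illustrative additions, not part of the original study.

```python
from collections import defaultdict
from math import sqrt

pressure = [30, 30, 31, 31, 32, 32, 33, 33, 34, 34, 35, 35, 36, 36]
mileage = [29.5, 30.2, 32.1, 34.5, 36.3, 35.0, 38.2,
           37.6, 37.7, 36.1, 33.6, 34.2, 26.8, 27.4]


def pearson_r(x, y):
    """Pearson correlation from sums of squared deviations (hypothetical helper)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


r = pearson_r(pressure, mileage)

# Average mileage at each pressure level, to expose any curved pattern
# that a single linear-correlation number would hide.
by_pressure = defaultdict(list)
for p, m in zip(pressure, mileage):
    by_pressure[p].append(m)
mean_mileage = {p: sum(v) / len(v) for p, v in sorted(by_pressure.items())}

print(f"r = {r:.3f}")
for p, m in mean_mileage.items():
    print(f"pressure {p}: mean mileage {m:.2f}")
```

The Pearson coefficient measures only linear association, so the per-level means and the scatterplot are the better guide to whether any relationship exists.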
4. The following table provides information on life expectancies for a sample of 22 countries. It also lists the number of people per
television set in each country.
Country          Life Expectancy   People Per TV
Angola           44                200
Australia        76.5              2
Cambodia         49.5              177
Canada           76.5              1.7
China            70                8
Egypt            60.5              15
France           78                2.6
Haiti            53.5              234
Iraq             67                18
Japan            79                1.8
Madagascar       52.5              92
Mexico           72                6.6
Morocco          64.5              21
Pakistan         56.5              73
Russia           69                3.2
South Africa     64                11
Sri Lanka        71.5              28
Uganda           51                191
United Kingdom   76                3
United States    75.5              1.3
Vietnam          65                29
Yemen            50                38
Instructions: When Minitab is used to answer a question below, copy the output from Minitab into your document.
a) Use Minitab to produce a scatterplot of life expectancy vs. people per television set. Does there appear to be an association between
the two variables? Elaborate briefly.
b) Have Minitab calculate the value of the Pearson correlation coefficient between life expectancy and people per television.
c) Since the association is so strongly negative, one might conclude that simply sending television sets to the countries with lower life
expectancies would cause their inhabitants to live longer. Comment on this argument.
d) If two variables have a correlation close to +1 or –1, indicating a strong linear association between them, does it follow that there
must be a cause-and-effect relationship between them?
This example illustrates the very important distinction between association and causation. Two variables may be strongly associated
(as measured by the correlation coefficient) without a cause-and-effect relationship existing between them. Often the explanation is
that both variables are related to a third variable not being measured; this variable is often called a lurking variable.
e) In the case of life expectancy and television sets, suggest a lurking variable that is associated both with a country’s life expectancy
and with the prevalence of televisions in the country.
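The idea of a lurking variable can be illustrated with a small simulation. The Python sketch below uses synthetic data only (not the country data above): a hypothetical variable `z` drives both `x` and `y`, and neither of `x` or `y` influences the other. All names, the seed, and the noise levels are assumptions chosen for illustration.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation from sums of squared deviations (hypothetical helper)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


rng = random.Random(412)  # fixed seed so the run is repeatable
n = 5000

z = [rng.gauss(0, 1) for _ in range(n)]       # synthetic lurking variable
x = [zi + rng.gauss(0, 0.5) for zi in z]      # depends on z, not on y
y = [zi + rng.gauss(0, 0.5) for zi in z]      # depends on z, not on x

r_xy = pearson_r(x, y)

# Subtract z from both variables; only independent noise remains.
x_resid = [xi - zi for xi, zi in zip(x, z)]
y_resid = [yi - zi for yi, zi in zip(y, z)]
r_resid = pearson_r(x_resid, y_resid)

print(f"r(x, y) = {r_xy:.3f}")
print(f"r after removing z = {r_resid:.3f}")
```

In this setup `x` and `y` are strongly correlated even though changing one would do nothing to the other; the association disappears once the shared driver is removed.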
5. If you peruse the bookshelves of a typical college professor, you will find a variety of books ranging from textbooks to esoteric technical
publications to paperback novels. In order to determine whether or not the price of a book can be determined by the number of pages it contains, a
college professor recorded the number of pages and price for 15 books on one shelf. The data are shown below.
Pages   Price      Pages   Price      Pages   Price
104*    32.95      342*    49.95      436     5.95
188*    24.95      378     4.95       458*    60.00
220*    49.95      385     5.99       466*    49.95
264*    79.95      417     4.95       469     5.99
336     4.50       417*    39.75      585     5.95
*Denotes Hardback Book
a. Relating # of pages and price, would you expect a positive or negative correlation?
b. Construct scatterplot and calculate Pearson correlation coefficient. Depict type of book on your scatterplot.
c. Construct scatterplot and calculate Pearson correlation coefficient using just the data for the hardback books.
d. Construct scatterplot and calculate Pearson correlation coefficient using just the data for the paperback books.
e. What do you learn from parts b, c, and d above?
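Parts b, c, and d can be sketched in Python as below, assuming the 15 (pages, price) pairs in the table and the asterisk marking for hardbacks. The tuple layout and the names `books`, `corr_for`, and `pearson_r` are illustrative choices. The sketch computes r for all books together and then separately for each type of book.

```python
from math import sqrt

# (pages, price, is_hardback) for the 15 books; True marks the starred entries.
books = [
    (104, 32.95, True), (188, 24.95, True), (220, 49.95, True),
    (264, 79.95, True), (336, 4.50, False), (342, 49.95, True),
    (378, 4.95, False), (385, 5.99, False), (417, 4.95, False),
    (417, 39.75, True), (436, 5.95, False), (458, 60.00, True),
    (466, 49.95, True), (469, 5.99, False), (585, 5.95, False),
]


def pearson_r(x, y):
    """Pearson correlation from sums of squared deviations (hypothetical helper)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


def corr_for(rows):
    """Correlation between pages and price for a subset of the books."""
    return pearson_r([r[0] for r in rows], [r[1] for r in rows])


r_all = corr_for(books)
r_hardback = corr_for([b for b in books if b[2]])
r_paperback = corr_for([b for b in books if not b[2]])

print(f"all books:  r = {r_all:.3f}")
print(f"hardbacks:  r = {r_hardback:.3f}")
print(f"paperbacks: r = {r_paperback:.3f}")
```

Comparing the sign of the pooled coefficient with the signs within each group is the point of part e.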
6. Studies have shown that people who suffer sudden cardiac arrest (SCA) have a better chance of survival if a defibrillator is
administered very soon after cardiac arrest. How is survival rate related to the time between when cardiac arrest occurs and when the
defibrillator shock is delivered? This question is addressed in the paper “Improving Survival from Sudden Cardiac Arrest: The Role of
Home Defibrillators” (by J.K. Stross, University of Michigan, February 2002). The accompanying data give y = survival rate (percent)
and x = mean call-to-shock time (minutes) for a cardiac rehabilitative center (where cardiac arrests occurred while victims were
hospitalized and so the call-to-shock time tended to be short) and for four communities of different sizes.
Mean call-to-shock time, x   2    6    7    9    12
Survival rate, y             90   45   30   5    2
Do the following by hand and on Minitab.
a. Construct a scatter plot.
b. Calculate the Pearson correlation coefficient.
c. Determine the equation of the least squares line that can be used for predicting a value of y based on a value of x.
d. Compute SSE = $\sum (y - \hat{y})^2$ for the least squares line.
e. Why do we call the least squares line the “best fitting line”?
f. Calculate $r^2$ using the following formula: $r^2 = \dfrac{\sum (y - \bar{y})^2 - \sum (y - \hat{y})^2}{\sum (y - \bar{y})^2}$. Interpret the $r^2$ value.
g. Using your equation in part c, draw the least squares line on the scatterplot you constructed in part a.
h. Use your prediction equation to predict SCA survival rate for a community with a mean call-to-shock time of 5 min.
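The hand calculations in parts b through h can be checked with a short script. The Python sketch below is one possible arrangement, assuming the five (x, y) pairs given above; the variable names are illustrative. It uses the usual least squares formulas, slope = S_xy / S_xx and intercept = mean(y) - slope * mean(x), and then the SSE and r^2 definitions from parts d and f.

```python
from math import sqrt

x = [2, 6, 7, 9, 12]     # mean call-to-shock time (minutes)
y = [90, 45, 30, 5, 2]   # survival rate (percent)

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Sums of squares and cross-products about the means.
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)   # total sum of squares
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

r = s_xy / sqrt(s_xx * s_yy)                 # part b

slope = s_xy / s_xx                          # part c
intercept = mean_y - slope * mean_x

fitted = [intercept + slope * xi for xi in x]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # part d

r_squared = (s_yy - sse) / s_yy              # part f

pred_5 = intercept + slope * 5               # part h

print(f"r = {r:.4f}")
print(f"y-hat = {intercept:.3f} + ({slope:.3f}) x")
print(f"SSE = {sse:.2f}, r^2 = {r_squared:.4f}")
print(f"predicted survival rate at x = 5: {pred_5:.2f}")
```

For simple linear regression the value from the part f formula should match the square of r from part b, which gives a quick consistency check on the hand work.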
7. Physical characteristics of sharks are of interest to surfers and scuba divers as well as to marine researchers. Because it is difficult
to measure jaw width in living sharks, researchers would like to determine whether it is possible to estimate jaw width from body
length, which is more easily measured. The following data on x = length (in feet) and y = jaw width (in inches) for 44 sharks was
found in various articles appearing in the magazines Skin Diver and Scuba News:
x   18.7   12.3   18.6   16.4   15.7   18.3   14.3   16.6    9.4   18.2   13.2
y   17.5   12.3   21.8   17.2   16.2   19.9   13.3   15.8   10.2   19.0   16.8

x   14.6   15.8   14.9   17.6   12.1   16.4   13.6   15.3   16.1   13.5   19.1
y   13.9   14.7   15.1   18.5   12.0   13.8   14.2   16.9   16.0   15.9   17.9

x   16.7   17.8   16.2   12.6   17.8   13.8   16.2   22.8   16.8   13.6   13.2
y   15.2   18.2   16.7   11.6   17.4   14.2   15.7   21.2   16.3   13.0   13.3

x   12.2   15.2   14.7   12.4   13.2   15.8   15.7   19.7   18.7   13.2   16.8
y   14.8   15.9   15.3   11.9   11.6   14.3   14.3   21.3   20.8   12.2   16.9

Use Minitab to answer the following questions.
a. Construct a scatterplot.
b. Calculate the Pearson correlation coefficient.
c. Determine the equation for the least squares line.
d. Calculate $R^2$ and interpret it.
e. Conduct a test of $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$ at $\alpha = .05$.
f. Estimate the mean jaw width for sharks of length 15 ft using a 95% confidence interval.
g. Assess the reasonableness of the assumptions that are required for parts e and f.
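For parts c, e, and f of problem 7, the Python sketch below shows one way to cross-check Minitab output, assuming the 44 (length, jaw width) pairs given in the problem. It uses the standard simple linear regression formulas: standard error of the slope s / sqrt(S_xx), and standard error of the estimated mean response s * sqrt(1/n + (x0 - mean_x)^2 / S_xx). The critical value `T_CRIT = 2.018` is an assumption read from a t table for 42 degrees of freedom, and all variable names are illustrative.

```python
from math import sqrt

length = [18.7, 12.3, 18.6, 16.4, 15.7, 18.3, 14.3, 16.6, 9.4, 18.2, 13.2,
          14.6, 15.8, 14.9, 17.6, 12.1, 16.4, 13.6, 15.3, 16.1, 13.5, 19.1,
          16.7, 17.8, 16.2, 12.6, 17.8, 13.8, 16.2, 22.8, 16.8, 13.6, 13.2,
          12.2, 15.2, 14.7, 12.4, 13.2, 15.8, 15.7, 19.7, 18.7, 13.2, 16.8]
jaw = [17.5, 12.3, 21.8, 17.2, 16.2, 19.9, 13.3, 15.8, 10.2, 19.0, 16.8,
       13.9, 14.7, 15.1, 18.5, 12.0, 13.8, 14.2, 16.9, 16.0, 15.9, 17.9,
       15.2, 18.2, 16.7, 11.6, 17.4, 14.2, 15.7, 21.2, 16.3, 13.0, 13.3,
       14.8, 15.9, 15.3, 11.9, 11.6, 14.3, 14.3, 21.3, 20.8, 12.2, 16.9]

n = len(length)
mean_x, mean_y = sum(length) / n, sum(jaw) / n
s_xx = sum((x - mean_x) ** 2 for x in length)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(length, jaw))

# Part c: least squares line.
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Residual standard deviation with n - 2 degrees of freedom.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(length, jaw))
s = sqrt(sse / (n - 2))

# Part e: t statistic for H0: beta_1 = 0.
se_slope = s / sqrt(s_xx)
t_stat = slope / se_slope
T_CRIT = 2.018  # assumed two-sided 5% critical value, t distribution, 42 df
reject_h0 = abs(t_stat) > T_CRIT

# Part f: 95% confidence interval for the mean jaw width at length 15 ft.
x0 = 15
y0 = intercept + slope * x0
se_mean = s * sqrt(1 / n + (x0 - mean_x) ** 2 / s_xx)
ci = (y0 - T_CRIT * se_mean, y0 + T_CRIT * se_mean)

print(f"y-hat = {intercept:.3f} + {slope:.3f} x")
print(f"t = {t_stat:.2f}, reject H0 at 5%: {reject_h0}")
print(f"95% CI for mean jaw width at 15 ft: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval in part f is for the mean jaw width of all sharks of that length, not a prediction interval for a single shark, which would be wider.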