Stat 101L – Exam 2
March 7, 2008
Name: ________________________
INSTRUCTIONS: Read the questions carefully and completely. Answer each question
and show work in the space provided. Partial credit will not be given if work is not
shown. When asked to explain, describe, or comment, do so within the context of the
problem. Be sure to include units when dealing with quantitative variables.
1. [15 pts] Short answer.
a) [2] Statistics is about … _______________. (Fill in the blank with one
word.)
b) [5] The correlation between x and y is r = –0.90. Additionally, x̄ = 10,
ȳ = 20, sx = 3.0 and sy = 6.0. What are the values of the estimated slope, b1, and
the estimated y-intercept, b0, for the line of best fit? Also give the
equation of the line of best fit.
c) [2] What does a correlation coefficient of r = 0 indicate about the
relationship between two numerical variables?
d) [2] A numerical summary of a sample is called a ______________, while
a numerical summary of a population is called a _______________.
e) [2] When participants do not know which treatment group they are in, the
experiment is said to be _________________.
f) [2] What is replication within an experiment?
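For part b), the formula sheet gives b1 = r(sy/sx) and b0 = ȳ − b1·x̄. The arithmetic can be sketched in Python as follows. This is a minimal sketch, assuming the stated values 10 and 20 are the sample means of x and y; the function name `fit_from_summary` is hypothetical and chosen only for illustration.

```python
def fit_from_summary(r, x_bar, y_bar, s_x, s_y):
    """Return (b0, b1) for the least squares line from summary statistics."""
    b1 = r * s_y / s_x         # slope: b1 = r * sy / sx
    b0 = y_bar - b1 * x_bar    # intercept: b0 = y-bar - b1 * x-bar
    return b0, b1

# Values from part b); 10 and 20 are ASSUMED to be the means of x and y.
b0, b1 = fit_from_summary(r=-0.90, x_bar=10, y_bar=20, s_x=3.0, s_y=6.0)
print(f"predicted y = {b0:.1f} + ({b1:.1f}) * x")
```

Under that assumption the slope is negative, matching the sign of r, and the line passes through the point of means (10, 20).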
2. [37 pts] An experiment is done with people with chronic back pain. The
experiment examines the relationship between the dose of a pain medication
and the number of hours of relief from pain. Sixteen individuals with chronic
back pain agree to participate in the experiment. The individuals are randomly
assigned to treatment groups: 0.5 mg, 1 mg, 1.5 mg and 2 mg of a pain medication
so that there are 4 individuals in each treatment group.
a) [2] Answer the question Who? for this problem.
b) [4] Answer the question What? for this problem. Be sure to identify the
type of variable (categorical or numerical) and include units where
appropriate.
c) [3] Why is this an experiment and not an observational study?
d) [3] Is there a placebo? Explain briefly.
e) [2] There is a very important outside variable associated with the
participants that cannot be controlled in this experiment. What is that
variable?
f) [3] Below is a plot of hours of relief versus amount of medication.
[Scatter plot: Relief (hours), vertical axis from 0 to 15, versus Dose (mg), horizontal axis from 0 to 2]
Describe the relationship in terms of direction, form, strength, and indicate
any unusual points.
Below is partial JMP output for the least squares regression line, i.e. the line of best fit.
Linear Fit
Predicted Relief (hours) = 1.39 + 3.34*Dose (mg)
Summary of Fit
RSquare                       0.454301
RSquare Adj                   0.415323
Root Mean Square Error        2.187619
Mean of Response              5.5625
Observations (or Sum Wgts)    16
g) [5] Give an interpretation of the slope within the context of the problem.
h) [2] Give an interpretation of the y-intercept within the context of the
problem.
i) [2] Use the least squares regression line to predict the hours of relief for a
dose of 2 mg.
j) [2] One of the participants in the experiment who was given 2 mg of the
pain reliever experienced 9.2 hours of relief. What is the residual for this
participant?
k) [3] Graph the least squares regression line on the plot in f). In order to get
full credit it must be obvious to me that you are using the equation to draw
the line.
l) [2] How much of the variability in the hours of relief can be explained by
the linear relationship with dose?
m) [4] On the next page is a plot of residuals. Describe what you see in the
plot and what this tells you about the least squares regression line for
predicting the hours of relief for a given dosage of the pain reliever.
[Residual plot: Residual, vertical axis from -5 to 5, versus Dose (mg), horizontal axis from 0 to 2]
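Parts i), j), and l) of problem 2 all read values off the fitted line and the Summary of Fit. The sketch below shows that arithmetic in Python, using only the coefficients and RSquare printed in the JMP output; the names `predict_relief` and `residual` are hypothetical helpers, not JMP functions.

```python
# Coefficients copied from the JMP "Linear Fit" output in problem 2.
B0 = 1.39   # intercept, in hours
B1 = 3.34   # slope, in hours per mg

def predict_relief(dose_mg):
    """Predicted hours of relief from the fitted line."""
    return B0 + B1 * dose_mg

def residual(observed_hours, dose_mg):
    """Residual = observed value - predicted value."""
    return observed_hours - predict_relief(dose_mg)

predicted = predict_relief(2.0)   # part i): dose of 2 mg
res = residual(9.2, 2.0)          # part j): participant with 9.2 hours
r_squared = 0.454301              # RSquare from Summary of Fit, part l)

print(f"predicted: {predicted:.2f} h, residual: {res:.2f} h, "
      f"variability explained: {r_squared:.1%}")
```

A positive residual means the participant had more hours of relief than the line predicts for that dose.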
3. [8 pts] One of the goals of re-expression is to make the scatter in a scatter plot
more even across all levels of the explanatory variable. With this in mind, the
response in the pain relief experiment (problem 2) was re-expressed using
logarithms (power = 0) and negative reciprocals (power = –1). Refer to the JMP
output Re-expression of Hours of Relief.
a) [2] Describe the plot of residuals for the linear fit for log(Relief).
b) [1] Has the log re-expression achieved the goal of making the scatter more
even across all levels of the explanatory variable?
c) [2] Describe the plot of residuals for the linear fit for –1/Relief.
d) [1] Has the negative reciprocal re-expression achieved the goal of making
the scatter more even across all levels of the explanatory variable?
e) [2] Is the linear model for –1/Relief adequate? Explain briefly.
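The two re-expressions named in problem 3 can be sketched in Python as shown below. The relief values are hypothetical and serve only to show the transformations; the use of the natural logarithm is also an assumption, since the problem does not state the base. The function name `re_express` is illustrative.

```python
import math

def re_express(values, power):
    """Re-express positive values for the two powers used in problem 3.

    power = 0  -> logarithm (natural log here, an ASSUMPTION)
    power = -1 -> negative reciprocal, -1/y
    """
    if power == 0:
        return [math.log(v) for v in values]
    if power == -1:
        return [-1.0 / v for v in values]
    raise ValueError("only power 0 and power -1 are sketched here")

# HYPOTHETICAL relief times in hours, for illustration only.
relief_hours = [1.0, 2.0, 4.0, 8.0]
print(re_express(relief_hours, 0))    # log(Relief)
print(re_express(relief_hours, -1))   # -1/Relief
```

Both re-expressions keep the original ordering of the values; the negative sign on the reciprocal is what keeps larger relief times larger after re-expression.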
4. [10 pts] An article in the Des Moines Register on January 14, 2007 reported on
the relationship between folate in the diet and Alzheimer’s disease. An earlier
study in 2005 using information gathered from the Baltimore Longitudinal Study
of Aging reported that diets high in folate might help reduce the risk of
Alzheimer's disease. Source: www.cbsnews.com
In that study the researchers analyzed the diets of 579 volunteers
(359 men, 220 women) 60 and older without Alzheimer's disease
and followed them for nine years. The researchers looked at what
percentage of participants’ diets contained antioxidant vitamins (E,
C, carotenoids) and B vitamins (folate, B-6, and B-12).
During the follow-up period 57 participants developed Alzheimer's
disease.
The researchers then compared the nutrient intake of those who
developed Alzheimer's disease with that of those who did not
develop the disease. They show that those with a higher dietary
intake of folate had an almost 60 percent lower rate of the disease.
a) [3] Why was this an observational study and not an experiment? Explain
briefly.
b) [3] Was this a prospective or a retrospective study? Explain briefly.
c) [2] What was the explanatory variable? Is it numerical or categorical?
d) [2] What was the response variable? Is it numerical or categorical?
5. [5 pts] In a game of chance you pay $5 to roll one fair ten-sided die (like the ones
used in lab). The number that you roll indicates how much money you win. Roll
a 9 you win $9, roll a 1 you win $1, etc. Use row 3 in the Table of Random Digits
below to simulate the outcomes of 20 games. How much are your simulated
winnings? Based on this simulation, do you think you can make money playing
this game?
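The bookkeeping for the simulation in problem 5 can be sketched in Python. The sketch assumes the die faces are 0 through 9 with a 0 paying $0, one digit per game, read left to right. It uses only the first 15 digits of row 3 as they appear in the table; the remaining 5 of the 20 games would use the next five digits of that row. The function name `simulate_winnings` is hypothetical.

```python
def simulate_winnings(digits, fee=5):
    """Return (games played, net winnings) when each digit is one roll
    that pays its face value in dollars and each game costs `fee`."""
    rolls = [int(ch) for ch in digits if ch.isdigit()]
    payout = sum(rolls)
    cost = fee * len(rolls)
    return len(rolls), payout - cost

# First 15 digits of row 3 as read from the table (ASSUMED reading).
row3_start = "03272 41230 81739"
games, net = simulate_winnings(row3_start)
print(f"{games} games, net winnings: {net} dollars")
```

If the die instead treated the face 0 as 10, every 0 in the row would pay $10, so the labeling of the die is worth confirming before drawing a conclusion.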
I think I scored _________ out of 75 points on this exam.
Formulas
$\bar{x} = \frac{\sum x}{n}$
$\bar{y} = \frac{\sum y}{n}$
$s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
$s_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$
$z_x = \frac{x - \bar{x}}{s_x}$
$z_y = \frac{y - \bar{y}}{s_y}$
$r = \frac{\sum z_x z_y}{n - 1}$
$b_1 = r \frac{s_y}{s_x}$
$b_0 = \bar{y} - b_1 \bar{x}$
$\hat{y} = b_0 + b_1 x$
$\text{residual} = y - \hat{y}$
Table of Random Digits
Row
1    96299 07196 98642 20639 23185 69929 56282 14125 38872 94168
2    71622 35940 81807 59225 18192 80777 08710 84395 69563 86280
3    03272 41230 81739 74797 70406 69273 18564 72532 78340 36699
4    46376 58596 14365 63685 56555 72944 42974 96463 63533 24152
5    47352 42853 42903 97504 56655 88606 70355 61406 38757 70657
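The formulas on this sheet chain together: means and standard deviations give z-scores, z-scores give r, and r gives the slope and intercept. A minimal Python sketch of that chain follows, using hypothetical data that lie exactly on a line so the result is easy to check by hand. The function name `least_squares` is illustrative.

```python
import math

def least_squares(xs, ys):
    """Return (r, b0, b1) computed step by step from the sheet's formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    z_x = [(x - x_bar) / s_x for x in xs]
    z_y = [(y - y_bar) / s_y for y in ys]
    r = sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)
    b1 = r * s_y / s_x          # slope
    b0 = y_bar - b1 * x_bar     # y-intercept
    return r, b0, b1

# HYPOTHETICAL data lying exactly on the line y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
r, b0, b1 = least_squares(xs, ys)
print(f"r = {r:.2f}, predicted y = {b0:.2f} + {b1:.2f} * x")
```

Because the hypothetical points fall on a line with positive slope, the sketch should report a correlation of 1 and recover that line.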