Chapter 4 Review:
More About Relationships
Between Two Variables
Group Members:
Qianya Meng
Nikta Kheiri
Min Kim
1st period
12/14/11
The Big Idea
• Transform the graph to achieve linearity
• Transform exponential data, y = ab^x, to achieve
linearity, and derive an equation from the transformed
data to use for extrapolation.
• Transform power-function data, y = ax^p, to achieve
linearity, and derive an equation from the transformed
data to use for extrapolation.
• Learn to use marginal and conditional
distributions.
• Recognize relationships between two variables.
Vocabulary You Need to Know
• Transforming or re-expressing the data is applying a
function such as the logarithm or square root to a
quantitative variable
• Log Rules:
• 1) log_b(mn) = log_b(m) + log_b(n)
• 2) log_b(m/n) = log_b(m) - log_b(n)
• 3) log_b(m^n) = n · log_b(m)
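The three log rules can be checked numerically. The following is a minimal sketch in Python; the values m = 8, n = 4, and base b = 2 are arbitrary choices for illustration, not taken from the chapter.

```python
import math

# Arbitrary illustration values (assumptions, not from the chapter).
m, n, b = 8.0, 4.0, 2.0


def log_b(x, base=b):
    """Logarithm of x in the given base."""
    return math.log(x, base)


# Rule 1: the log of a product is the sum of the logs.
rule1 = math.isclose(log_b(m * n), log_b(m) + log_b(n))
# Rule 2: the log of a quotient is the difference of the logs.
rule2 = math.isclose(log_b(m / n), log_b(m) - log_b(n))
# Rule 3: the log of a power is the exponent times the log.
rule3 = math.isclose(log_b(m ** n), n * log_b(m))

print(rule1, rule2, rule3)
```

Rules 1 and 3 together are what turn y = ab^x into log y = log a + (log b)x.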
Vocabulary
• Linear growth increases by a fixed amount in
each equal time period.
• Exponential growth model
• Log y = log a + (log b)x
• Predicted y = ab^x
• Power law model
• Log y = log a + p log x
• Predicted y = ax^p
Vocabulary
• A two-way table describes two categorical variables
• Marginal distributions are the distributions of the row
variable alone and of the column variable alone, found
from the row and column totals
• Conditional distribution of the column variable, given
a value of the row variable
• Conditional distribution of the row variable, given
a value of the column variable
• Simpson’s paradox: an association or comparison that
holds for all of several groups can reverse direction
when the data are combined to form a single group
Vocabulary
• Causation: Changes in x cause changes in y
• Common response: Changes in both x and y
are caused by changes in a lurking variable z
• Confounding: The effect (if any) of x on y is
confounded with the effect of a lurking
variable z
Key Topics Covered in this Chapter
• Modeling nonlinear data
• Relations in categorical data
• Establishing causation
Formulas You Should Know
• Exponential growth model
• log y = log a + (log b)x
• Predicted y = ab^x
• Power law model
• log y = log a + p log x
• Predicted y = ax^p
Calculator Key Strokes
• Exponential growth modeling
• Enter the explanatory data into L1 and the response data into
L2
• Draw the scatterplot of y versus x
• Define L3 as the logarithm (common or natural) of L2, then
make a scatterplot of L3 versus L1
• Perform the least-squares regression on the transformed
data
• Draw the scatterplot
• Plot the residuals versus L1
• With the regression equation in Y1, define Y2 = e^(Y1) if you
used natural logs, or Y2 = 10^(Y1) if you used common logs.
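The same workflow can be sketched outside the calculator. This is a minimal Python sketch, not the calculator's own routine: the helper `least_squares` and the data (which follow y = 5 · 1.5^x exactly) are hypothetical and chosen only so the recovered values are easy to check.

```python
import math


def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data following y = 5 * 1.5^x exactly (not from the chapter).
x_vals = [0, 1, 2, 3, 4]                      # plays the role of L1
y_vals = [5 * 1.5 ** x for x in x_vals]       # plays the role of L2

# L3: natural log of the response.
ln_y = [math.log(y) for y in y_vals]

# Least-squares line for ln y versus x (the equation stored in Y1).
intercept, slope = least_squares(x_vals, ln_y)

# Residuals on the transformed scale, to be plotted against x.
residuals = [ly - (intercept + slope * x) for x, ly in zip(x_vals, ln_y)]

# Inverse transformation (Y2 = e^(Y1)), rewritten as y = a * b^x.
a = math.exp(intercept)
b = math.exp(slope)


def predict(x):
    """Predicted y on the original scale."""
    return a * b ** x


print(a, b, predict(5))
```

Because the made-up data are exactly exponential, the residuals are essentially zero; with real data the residual plot is what shows whether the exponential model is reasonable.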
Calculator Key Strokes
• Power law modeling
• Enter the explanatory data into L1 and the response data into L2
• Draw the scatterplot of y versus x
• Define L3 as the (natural) logarithm of L1 and define L4 as the
(natural) logarithm of L2
• Plot L4 versus L3
• Calculate the regression equation for the transformed data and
store it in Y1
• Construct a residual plot
• Define Y2 as (10^a)(x^b) if you used common logs or (e^a)(x^b) if you
used natural logs, where a is the intercept and b is the slope of the
regression equation
• Plot Y2 and the scatterplot for the original data together
• To make a prediction for the value x = k, evaluate Y2(k) on the home
screen
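The power law steps differ from the exponential ones only in that both variables are logged. A minimal Python sketch follows; the helper `fit_line` and the data (which follow y = 2x^3 exactly) are hypothetical and used only for illustration.

```python
import math


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data following y = 2 * x^3 exactly (not from the chapter).
x_data = [1, 2, 3, 4, 5]                  # L1
y_data = [2 * x ** 3 for x in x_data]     # L2

# L3 = ln(L1), L4 = ln(L2).
ln_x = [math.log(x) for x in x_data]
ln_y = [math.log(y) for y in y_data]

# Regression of ln y on ln x: ln y = ln a + p * ln x.
ln_a, p = fit_line(ln_x, ln_y)

# Inverse transformation: y = (e^intercept) * x^slope.
a_coef = math.exp(ln_a)


def power_predict(k):
    """Predicted y at x = k on the original scale (the Y2(k) step)."""
    return a_coef * k ** p


print(a_coef, p, power_predict(6))
```

The slope of the log-log line is the power p itself, which is why a straight log-log plot points to a power law rather than an exponential model.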
Helpful Hints
• When the explanatory variable is years, transform
the data to “years since” so that the values are
smaller and don’t create overflow problems
when you perform the inverse transformation
• If there is a clear explanatory/response
relationship, compare the conditional
distributions of the response variable for the
separate values of the explanatory variable
• Even when direct causation is present, it is rarely
a complete explanation of an association
between two variables
Q1
Some college students collected data on the intensity of light at various depths in a lake.
Here are their data:
Depth (m)    Light intensity (lumens)
5            168.00
6            120.42
7            86.31
8            61.87
9            44.34
10           31.78
11           22.78
a) Make a scatterplot suitable for predicting light intensity from depth. Describe the form
of the relationship.
b) To verify that the decrease in light intensity follows an exponential model, calculate the
ratio of light intensity at consecutive depths. Start with 120.42/168.00 = 0.717. What do
you conclude?
c) Take the natural logarithm (ln) of the light intensity measurements and plot these values
against the corresponding depths. Does this transformation achieve linearity?
d) Calculate the least-squares regression equation for the transformed data. Interpret the
slope and y intercept of this equation in this setting.
e) Construct and interpret a residual plot.
f) Perform the inverse transformation to express light intensity as an exponential function
of depth in the lake. Display a scatterplot of the original data with the exponential model
superimposed. Is your exponential function a satisfactory model for the data?
g) Use your model to predict the light intensity at a depth of 22 meters. The actual light
intensity reading at that depth was 0.58 lumens. Does this surprise you?
Answer Q1
• A) The relationship is strong, negative, and curved.
• B) The ratios are all about 0.717, so an exponential model is appropriate.
• C) Yes, it achieves linearity.
• D) If x = depth and y = ln(light intensity), then predicted y = 6.7891 - 0.3330x.
The intercept, 6.7891, provides an estimate for the average
value of the natural log of the light intensity at the surface of the
lake (depth 0). The slope, -0.3330, indicates that the natural log of
the light intensity decreases on average by 0.3330 for each
one-meter increase in depth.
• E) The residual plot shows a fairly random scatter and relatively
small residuals, so the linear model is appropriate.
• F) If x = depth and y = light intensity, predicted y = (e^6.789)(e^(-0.333x)). It is a
satisfactory model.
• G) At 22 m, the predicted light intensity would be 0.584 lumens. No,
not surprised.
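Parts B, F, and G can be checked with a few lines of arithmetic. This minimal Python sketch uses the data from Q1 and the coefficients stated in answer D; the variable and function names are only illustrative.

```python
import math

# Lake data from Q1.
depth = [5, 6, 7, 8, 9, 10, 11]
intensity = [168.00, 120.42, 86.31, 61.87, 44.34, 31.78, 22.78]

# Part B: ratio of light intensity at consecutive depths.
ratios = [later / earlier for earlier, later in zip(intensity, intensity[1:])]

# Coefficients of ln(intensity) = intercept + slope * depth, from answer D.
intercept, slope = 6.7891, -0.3330


def predicted_intensity(d):
    """Part F: inverse transformation back to the original scale."""
    return math.exp(intercept + slope * d)


# e^slope is the per-meter multiplier implied by the fitted model.
growth_factor = math.exp(slope)

# Part G: prediction at 22 meters.
prediction_22 = predicted_intensity(22)

print([round(r, 3) for r in ratios], round(growth_factor, 3), round(prediction_22, 3))
```

The per-meter multiplier e^(-0.3330) agrees with the consecutive ratios from part B, which is another way to see that the fitted exponential model matches the data.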
Q2
Some high school physics students dropped a ball and
measured its height at various points along its descent.
Table 4.3 shows the time since release and the distance
the ball had fallen.
Time (s)    Distance (cm)
0.16        12.1
0.24        29.8
0.25        32.7
0.30        42.8
0.30        44.2
0.32        55.8
0.36        63.5
0.36        65.1
0.50        124.6
0.50        129.7
0.57        150.2
0.61        182.2
0.61        189.4
0.68        220.4
0.72        254.0
0.72        261.0
0.83        334.6
0.88        375.5
0.89        399.1
a) Make a scatterplot suitable for predicting distance
fallen from time since release. Describe the direction,
form, and strength of the relationship.
b) Perform an appropriate transformation to achieve
linearity. Then find a least-squares regression model for
the transformed data.
c) Comment on the quality of your model in (b) by
referring to a residual plot and r^2.
d) Make a scatterplot of the points (time, √distance) to
see if this transformation works. Then find a least-squares
regression model for the transformed data.
e) Comment on the quality of your model in (d) by
referring to a residual plot and r^2.
f) Use the two models you obtained in (b) and (d) to
predict the distance that the object had fallen after 0.47
seconds. Which prediction do you think is closer to the
actual value? Why?
Answer Q2
• (a) The relationship is curved, strong, and positive.
• (b) If x = time and y = distance, predicted y = 0.99 +
490.416x^2
• (c) r^2 = 0.9984 and the residual plot shows random
scatter and fairly small residuals, so this looks like
an appropriate model
• (d) Yes. Predicted √y = 0.1046 +
22.0428x
• (e) r^2 = 0.9986 and the residual plot shows no pattern,
which suggests a good model
• (f) Using the model from (b): 109.32 cm. Using the model from
(d): 109.51 cm
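The two predictions in (f) come from plugging x = 0.47 into each model and, for the square-root model, undoing the transformation. A minimal Python sketch, using only the coefficients stated in answers (b) and (d); the function names are illustrative.

```python
def predict_squared_time(t):
    """Model from (b): distance is linear in time squared."""
    return 0.99 + 490.416 * t ** 2


def predict_sqrt_distance(t):
    """Model from (d): sqrt(distance) is linear in time, so square the result."""
    sqrt_distance = 0.1046 + 22.0428 * t
    return sqrt_distance ** 2


t = 0.47  # seconds since release
pred_b = predict_squared_time(t)
pred_d = predict_sqrt_distance(t)

print(round(pred_b, 2), round(pred_d, 2))
```

Forgetting to square the output of the model in (d) would give about 10.46, which is the predicted square root of the distance rather than the distance itself.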
Q3
Here are data from eight schools on smoking among students and among
their parents.
                          Neither parent smokes    One parent smokes    Both parents smoke
Student does not smoke    1168                     1823                 1380
Student smokes            188                      416                  400
a) How many students are described in the two-way table?
b) What percent of these students smoke?
c) Give the marginal distribution of parents’ smoking behavior, both in counts
and in percents.
d) Calculate three conditional distributions of students’ smoking behavior:
one for each of the three parental smoking categories. Describe the
relationship between the smoking behaviors of students and their parents
in a few sentences.
Answer Q3
• A) 5375 students
• B) 18.7%
• C) Both parents smoke: 1780, 33.1%. One parent smokes: 2239, 41.7%.
Neither parent smokes: 1356, 25.2%.
• D) Student smokes, given both parents smoke: 400/(400+1380) = .2247.
Student doesn’t smoke, given both parents smoke:
1380/(400+1380) = .7753. Student smokes, given one parent smokes:
416/(416+1823) = .1858. Student doesn’t smoke, given one parent smokes:
1823/(416+1823) = .8142. Student smokes, given neither parent smokes:
188/(188+1168) = .1386. Student doesn’t smoke, given that neither parent
smokes: 1168/(188+1168) = .8614. Students who smoke are most likely to
come from families where one or more of their parents smoke.
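The marginal and conditional distributions above can be reproduced directly from the counts. A minimal Python sketch using the Q3 table; the dictionary layout and key names are illustrative choices.

```python
# Counts from the Q3 two-way table, keyed by parental smoking category.
counts = {
    "neither": {"no_smoke": 1168, "smoke": 188},
    "one": {"no_smoke": 1823, "smoke": 416},
    "both": {"no_smoke": 1380, "smoke": 400},
}

# Part A: total number of students.
total = sum(sum(column.values()) for column in counts.values())

# Part B: percent of all students who smoke.
percent_smokers = 100 * sum(c["smoke"] for c in counts.values()) / total

# Part C: marginal distribution of parents' smoking (counts and percents).
marginal_counts = {k: sum(c.values()) for k, c in counts.items()}
marginal_percents = {k: 100 * v / total for k, v in marginal_counts.items()}

# Part D: conditional proportion of students who smoke, given each category.
smoke_given_parents = {
    k: c["smoke"] / marginal_counts[k] for k, c in counts.items()
}

print(total, round(percent_smokers, 1))
print(marginal_counts)
print({k: round(v, 4) for k, v in smoke_given_parents.items()})
```

Each conditional proportion divides by the column total for that parental category, not by the grand total; dividing by 5375 instead would give joint proportions, which do not answer part D.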
Q4
Whether a convicted murderer gets the death penalty seems to be influenced
by the race of the victim. Here are data on 326 cases in which the defendant
was convicted of murder.
           White defendant                 Black defendant
           White victim    Black victim    White victim    Black victim
Death      19              0               11              6
Not        132             9               52              97
a) Use these data to make a two-way table of defendant’s race vs. death
penalty.
b) Show that Simpson’s paradox holds: a higher percent of white defendants
are sentenced to death overall, but for both Black and white victims a higher
percent of Black defendants are sentenced to death.
c) Use the data to explain why the paradox holds in language that a judge
could understand.
Answer Q4
• A) White defendant: 19 yes, 141 no. Black defendant:
17 yes, 149 no.
• B) Overall death penalty: 11.9% of white defendants,
10.2% of Black defendants. For white victims, 12.6% of white
defendants and 17.5% of Black defendants; for Black victims,
0% and 5.8%.
• C) The death penalty is more likely when the victim
was white (14%) rather than Black (5.4%). Because
most convicted killers are of the same race as their
victims, white defendants are more often sentenced to death
overall.
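The reversal can be verified by computing the death-penalty rate overall and within each victim group. A minimal Python sketch using the Q4 counts; the `cases` layout and the `death_rate` helper are illustrative.

```python
# (defendant race, victim race) -> (death sentences, other sentences), from Q4.
cases = {
    ("white", "white"): (19, 132),
    ("white", "black"): (0, 9),
    ("black", "white"): (11, 52),
    ("black", "black"): (6, 97),
}


def death_rate(defendant, victim=None):
    """Percent sentenced to death for a defendant race, optionally within one victim race."""
    death = total = 0
    for (d_race, v_race), (yes, no) in cases.items():
        if d_race == defendant and (victim is None or v_race == victim):
            death += yes
            total += yes + no
    return 100 * death / total


# Combined data: white defendants have the higher rate.
overall = {d: death_rate(d) for d in ("white", "black")}

# Within each victim group: Black defendants have the higher rate.
by_victim = {
    v: {d: death_rate(d, v) for d in ("white", "black")}
    for v in ("white", "black")
}

print({d: round(r, 1) for d, r in overall.items()})
print({v: {d: round(r, 1) for d, r in rates.items()} for v, rates in by_victim.items()})
```

The victim's race acts as the lurking variable here: the combined rates mix two groups with very different death-penalty rates and very different shares of white and Black defendants.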
Q5
A study showed that women who work in the production of
computer chips have abnormally high numbers of miscarriages.
The union claimed that exposure to chemicals used in production
causes the miscarriages. Another possible explanation is that
these workers spend most of their time standing up. Can we
conclude that exposure to chemicals causes more miscarriages?
Why or why not?
Answer Q5
• No. The “number of hours standing up at
work” is a confounding variable.
Q6
A study finds that high school students who take the SAT, enroll
in an SAT coaching course, and then take the SAT a second time
raise their SAT mathematics scores from a mean of 521 to a
mean of 561. What factors other than taking the course might explain
this improvement?
Answer Q6
• The variable “knowledge gained as a result of
taking the SAT previously” is a confounding
variable.