Chapter 4 More About Relationships Between Two Variables

4.1 Transforming to Achieve Linearity
4.2 Relationship Between Categorical Variables
4.3 Establishing Causation
How do you determine if data is linear?
• Look at the scatterplot (does it follow a straight line?)
• Look at the residual plot (are the residuals randomly scattered,
with no curved pattern?)
• Look at the correlation coefficient, r (is it close
to -1 or 1?)
If the answer to any of these questions is no,
then a line is probably not a good fit and a
curved function may be more appropriate.
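The three checks above can be sketched in Python. This is a minimal illustration, not a replacement for the calculator: the data is made up (it follows y = x^2), and the helper name linear_fit is an assumption of this sketch.

```python
import math

def linear_fit(xs, ys):
    """Least-squares line y = slope*x + intercept, plus the correlation r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Made-up curved data (y = x^2), for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 4, 9, 16, 25, 36]

slope, intercept, r = linear_fit(xs, ys)
# Residual = observed y minus the y predicted by the line.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

print(f"r = {r:.3f}")
print("residuals:", [round(e, 2) for e in residuals])
```

Here r is close to 1, yet the residuals run positive, negative, positive. The curved residual pattern is what reveals that a line is not a good fit, which is why all three checks are needed.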
Curved Functions Tested in AP Stats
Exponential Regression y = ab^x
• (x, log y) is linear
• Linear Regression on (x, log y) is log y = ax + b
• Algebraically solve for y
Power Regression y = ax^b
• (log x, log y) is linear
• Linear Regression on (log x, log y) is log y = a log x + b
• Algebraically solve for y
Note: in each log equation, a and b are that line's slope and intercept.
They are generally not the same numbers as a and b in y = ab^x or y = ax^b.
What if the data is not linear?
• Transform the data to determine
whether Exponential or Power
Regression is appropriate.
• Run a Linear Regression on the
transformed data.
• Perform an inverse transformation
to turn the equation into
Exponential or Power.
Transform the Data to Determine if Exponential or
Power Regression is Appropriate
1. Enter data into L1 and L2.
2. See that it is not linear. (scatterplot is curved, residual
plot is curved)
3. Transform the data into logarithms.
– Enter L3 = log x and L4 = log y.
4. Look at (x, log y) and (log x, log y) for linearity.
– If (x, log y) is linear, use exponential regression.
– If (log x, log y) is linear, use power regression.
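Steps 3 and 4 can be sketched in Python as follows. The data is hypothetical (generated from y = 3 * 2^x), and comparing the two correlations is only one rough way to judge linearity; the scatterplots and residual plots should still be checked.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r between two lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data generated from y = 3 * 2^x (L1 and L2 on the calculator).
xs = [1, 2, 3, 4, 5, 6]
ys = [6, 12, 24, 48, 96, 192]

log_x = [math.log10(x) for x in xs]  # plays the role of L3
log_y = [math.log10(y) for y in ys]  # plays the role of L4

r_exponential = correlation(xs, log_y)   # linearity of (x, log y)
r_power = correlation(log_x, log_y)      # linearity of (log x, log y)

suggested = "exponential" if abs(r_exponential) > abs(r_power) else "power"
print(f"r for (x, log y)     = {r_exponential:.4f}")
print(f"r for (log x, log y) = {r_power:.4f}")
print("More linear transformation suggests:", suggested)
```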
Run a Linear Regression on
the Transformed Data
Exponential
1. Run Linear Regression on (x, log y): 4:LinReg L1,L4
2. Write as log y = ax + b
3. Define your variables as they were originally
(x = ?, y = predicted ?)
Power
1. Run Linear Regression on (log x, log y): 4:LinReg L3,L4
2. Write as log y = a log x + b
3. Define your variables as they were originally
(x = ?, y = predicted ?)
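For the exponential case, the regression on (x, log y) can be sketched in Python. The data is hypothetical (y = 3 * 2^x) and least_squares is a helper written for this sketch; it is not a claim about how the calculator computes LinReg internally.

```python
import math

def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: x in L1, y in L2, generated from y = 3 * 2^x.
xs = [1, 2, 3, 4, 5, 6]
ys = [6, 12, 24, 48, 96, 192]
log_y = [math.log10(y) for y in ys]  # L4 = log y

# Equivalent in spirit to LinReg L1, L4.
slope, intercept = least_squares(xs, log_y)

# x = original explanatory variable, y = predicted response.
print(f"log(predicted y) = {slope:.4f} x + {intercept:.4f}")
```

For the power case, the same helper would be applied to (log x, log y) instead.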
Perform an Inverse Transformation to Turn the
Equation into Exponential or Power
Exponential Example: log y = ax + b → y = ab^x
log y = 5x + 2
10^(log y) = 10^(5x + 2)
y = 10^(5x) · 10^2
y = (10^5)^x · 10^2
y = (100,000)^x · 100
y = 100 · (100,000)^x
Power Example: log y = a log x + b → y = ax^b
log y = 5 log x + 2
log y = log x^5 + 2
10^(log y) = 10^(log x^5 + 2)
y = x^5 · 10^2
y = x^5 · 100
y = 100x^5
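The two worked examples can be checked numerically. In this sketch the function names are assumptions chosen for illustration; each takes the slope and intercept of the log equation and returns the a and b of the final exponential or power form.

```python
import math

def exponential_from_line(slope, intercept):
    """log y = slope*x + intercept  ->  y = a * b^x."""
    a = 10 ** intercept
    b = 10 ** slope
    return a, b

def power_from_line(slope, intercept):
    """log y = slope*log x + intercept  ->  y = a * x^b."""
    a = 10 ** intercept
    b = slope
    return a, b

# Exponential example: log y = 5x + 2
exp_a, exp_b = exponential_from_line(5, 2)
print(f"y = {exp_a} * {exp_b}^x")

# Power example: log y = 5 log x + 2
pow_a, pow_b = power_from_line(5, 2)
print(f"y = {pow_a} * x^{pow_b}")

# Spot check at x = 1.5: both forms of each equation should agree.
x = 1.5
exp_direct = 10 ** (5 * x + 2)
exp_model = exp_a * exp_b ** x
pow_direct = 10 ** (5 * math.log10(x) + 2)
pow_model = pow_a * x ** pow_b
print(math.isclose(exp_direct, exp_model), math.isclose(pow_direct, pow_model))
```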
Non-Linear Regression in the Calculator
1. Enter data into L1 and L2.
2. See that it is not linear. (scatterplot is curved, residual
plot is curved)
3. Run 0:ExpReg L1, L2, Y1
4. Run A:PwrReg L1, L2, Y2
5. See which fits the data better.
– Look at the graph and see which curve follows the data better.
– Look at r and r^2 to see which line, (x, log y) or (log x, log y),
fits the data better.
6. Write out the equation from the calculator (NO LOGS) and
define x = ? and y = predicted ?
By Hand vs. The Calculator
x = L1, y = L2, log x = L3, log y = L4
LinReg L1, L4 = ExpReg L1, L2
– By hand: log y = ax + b
– Calculator: y = ab^x
LinReg L3, L4 = PwrReg L1, L2
– By hand: log y = a log x + b
– Calculator: y = ax^b
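The by-hand route for the power case can be sketched end to end in Python: regress on (log x, log y), then undo the logs to get the NO LOGS form. The data is hypothetical (y = 2x^3), and this sketch only mirrors the equivalence stated above; it does not reproduce the calculator's own routine.

```python
import math

def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data generated from y = 2 * x^3.
xs = [1, 2, 3, 4, 5]          # L1
ys = [2, 16, 54, 128, 250]    # L2
log_x = [math.log10(x) for x in xs]  # L3
log_y = [math.log10(y) for y in ys]  # L4

# By hand: LinReg L3, L4 gives log y = slope * log x + intercept.
slope, intercept = least_squares(log_x, log_y)

# Inverse transformation: y = 10^intercept * x^slope.
a = 10 ** intercept
b = slope
print(f"log y = {slope:.3f} log x + {intercept:.3f}")
print(f"y = {a:.3f} x^{b:.3f}")

# Prediction at x = 6 using the power form.
prediction = a * 6 ** b
print(f"predicted y at x = 6: {prediction:.1f}")
```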
Categorical Data in Two Way Tables
Marginal Distribution: the distribution of only
one of the variables.
             Chocolate   Vanilla   Strawberry
Freshmen        10          12         16
Sophomores      11          19         22
Juniors         25           7         13
Seniors         10          22          2
Find the marginal distribution of ice cream flavors.
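A minimal Python sketch of this calculation, using the counts from the table above. The dictionary layout is just one convenient way to store a two-way table.

```python
# Counts from the two-way table: grade level (rows) by flavor (columns).
table = {
    "Freshmen":   {"Chocolate": 10, "Vanilla": 12, "Strawberry": 16},
    "Sophomores": {"Chocolate": 11, "Vanilla": 19, "Strawberry": 22},
    "Juniors":    {"Chocolate": 25, "Vanilla": 7,  "Strawberry": 13},
    "Seniors":    {"Chocolate": 10, "Vanilla": 22, "Strawberry": 2},
}
flavors = ["Chocolate", "Vanilla", "Strawberry"]

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of flavor: column totals divided by the grand total.
flavor_totals = {f: sum(row[f] for row in table.values()) for f in flavors}
marginal = {f: flavor_totals[f] / grand_total for f in flavors}

for f in flavors:
    print(f"{f}: {flavor_totals[f]}/{grand_total} = {marginal[f]:.1%}")
```

The marginal distribution ignores grade level entirely: only the column totals and the grand total are used.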
Conditional Distribution: the distribution of
one variable given a specific condition of the
other variable.
Find the conditional distribution of grade level
among those who prefer chocolate.
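A minimal Python sketch of this conditional distribution, using the chocolate column of the ice cream table. Only the students who prefer chocolate are counted in the denominator.

```python
# Chocolate column of the two-way table, by grade level.
chocolate = {
    "Freshmen": 10,
    "Sophomores": 11,
    "Juniors": 25,
    "Seniors": 10,
}

# Condition: prefers chocolate, so the denominator is the chocolate total.
chocolate_total = sum(chocolate.values())

conditional = {grade: count / chocolate_total for grade, count in chocolate.items()}

for grade, share in conditional.items():
    print(f"{grade}: {chocolate[grade]}/{chocolate_total} = {share:.1%}")
```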
Using the two-way table of ice cream preferences:
1. What percent of students like strawberry?
2. What percent of seniors like vanilla?
3. What percent of chocolate lovers are juniors?
4. What percent of students are freshmen?
5. What percent of students are vanilla-loving seniors?
6. What percent of upperclassmen like chocolate?
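One way to check answers to these six questions is to be explicit about each denominator. This sketch assumes "upperclassmen" means juniors and seniors together; the variable names are chosen for illustration.

```python
# Counts from the two-way table: grade level (rows) by flavor (columns).
table = {
    "Freshmen":   {"Chocolate": 10, "Vanilla": 12, "Strawberry": 16},
    "Sophomores": {"Chocolate": 11, "Vanilla": 19, "Strawberry": 22},
    "Juniors":    {"Chocolate": 25, "Vanilla": 7,  "Strawberry": 13},
    "Seniors":    {"Chocolate": 10, "Vanilla": 22, "Strawberry": 2},
}

def row_total(grade):
    return sum(table[grade].values())

def column_total(flavor):
    return sum(row[flavor] for row in table.values())

everyone = sum(row_total(g) for g in table)
upper = ["Juniors", "Seniors"]  # assumption: upperclassmen = juniors + seniors

answers = {
    1: 100 * column_total("Strawberry") / everyone,                 # marginal
    2: 100 * table["Seniors"]["Vanilla"] / row_total("Seniors"),    # conditional on grade
    3: 100 * table["Juniors"]["Chocolate"] / column_total("Chocolate"),  # conditional on flavor
    4: 100 * row_total("Freshmen") / everyone,                      # marginal
    5: 100 * table["Seniors"]["Vanilla"] / everyone,                # joint
    6: 100 * sum(table[g]["Chocolate"] for g in upper)
           / sum(row_total(g) for g in upper),                      # conditional on grade group
}

for number, percent in answers.items():
    print(f"{number}. {percent:.1f}%")
```

Questions 2 and 5 use the same count (22 vanilla-loving seniors) but different denominators, which is the difference between a conditional percent and a percent of all students.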
Simpson’s Paradox
Suppose two people, Lisa and Bart, are editors for the St. Louis
Post-Dispatch. Each entry below is the number of articles improved out of
the number of articles edited. Answer the following questions given the data:

        Week 1      Week 2      Total
Lisa    60 / 100    1 / 10      61 / 110
Bart    9 / 10      30 / 100    39 / 110

What percentage of her articles did Lisa improve in Week 1? _________ Bart? _________
Who improved a higher percentage of articles in Week 1? ______________________
What percentage of her articles did Lisa improve in Week 2? _________ Bart? _________
Who improved a higher percentage of articles in Week 2? ______________________
What percentage of her articles did Lisa improve in Total? _________ Bart? _________
Who improved a higher percentage of articles in Total? ______________________
HOW CAN THIS BE??
In the first week, Lisa improves 60 percent of the articles
she edits while Bart improves 90 percent of the articles he
edits. In the second week, Lisa improves just 10 percent of
the articles she edits, while Bart improves 30 percent.
Both times, Bart improved a much higher percentage of
articles than Lisa, yet when the two weeks are combined,
Lisa has improved a much higher percentage than Bart!
The reversal comes from unequal weighting: Lisa edited 100 of
her 110 articles in Week 1, when both editors had high rates,
while Bart edited 100 of his 110 in Week 2, when both had low rates.
          Lisa     Bart
Week 1    60.0%    90.0%
Week 2    10.0%    30.0%
Total     55.5%    35.5%
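The percentages in the table above can be recomputed from the raw counts. A minimal sketch, using the improved / edited counts given for Lisa and Bart:

```python
# (articles improved, articles edited) for each editor and week.
counts = {
    "Lisa": {"Week 1": (60, 100), "Week 2": (1, 10)},
    "Bart": {"Week 1": (9, 10), "Week 2": (30, 100)},
}

def percent(improved, edited):
    return 100 * improved / edited

# Weekly rates: each week is judged on its own denominator.
weekly = {
    editor: {week: percent(*pair) for week, pair in weeks.items()}
    for editor, weeks in counts.items()
}

# Combined rates: add numerators and denominators; do NOT average the weekly rates.
total = {
    editor: percent(
        sum(improved for improved, _ in weeks.values()),
        sum(edited for _, edited in weeks.values()),
    )
    for editor, weeks in counts.items()
}

for editor in counts:
    week1 = weekly[editor]["Week 1"]
    week2 = weekly[editor]["Week 2"]
    print(f"{editor}: Week 1 {week1:.1f}%, Week 2 {week2:.1f}%, Total {total[editor]:.1f}%")
```

Bart has the higher rate in each week, but Lisa has the higher combined rate, because the combined rate weights each week by how many articles were edited in it.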
Establishing Causation