Regression II

[Figure: example data sets labeled OK, Non-normal, Non-linear, and Unequal variance]
Non-linear regression
• There are nearly unlimited options here
• Keep it simple! Only use a particular non-linear fit if the data strongly suggest it
• I’ll discuss three types:
– Quadratic regression
– Smoothing
– Logistic regression
Non-linear regression
[Figure: two curves fitted to the same data. One is complex and goes through all the data points; the other is simpler but still provides a good fit to the data.]
Non-linear regression
• Three types of non-linear regression:
– Quadratic regression
– Smoothing
– Logistic regression
Quadratic regression
• $Y = a + bX + cX^2$
• Fits a parabolic curve to predict Y from X
• Often fitted by least squares: a, b, and c are chosen to minimize the sum of squared residuals (equivalently, MS_residual)
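As a minimal sketch in pure Python (the function names `fit_quadratic` and `ms_residual_quadratic` are our own, not from any library), the least-squares fit of $Y = a + bX + cX^2$ can be found by solving the three normal equations for a, b, and c. Because three parameters are estimated, the residual mean square uses n - 3 degrees of freedom.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of Y = a + bX + cX^2; returns (a, b, c)."""
    # s[k] = sum of x^k (s[0] = n); t[k] = sum of y * x^k
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix of the normal equations:
    # [[s0, s1, s2], [s1, s2, s3], [s2, s3, s4]] @ [a, b, c] = [t0, t1, t2]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for k in range(col, 4):
                m[r][k] -= factor * m[col][k]
    # Back substitution
    coef = [0.0] * 3
    for i in range(2, -1, -1):
        rest = sum(m[i][k] * coef[k] for k in range(i + 1, 3))
        coef[i] = (m[i][3] - rest) / m[i][i]
    return tuple(coef)


def ms_residual_quadratic(xs, ys, a, b, c):
    """SS_residual / (n - 3): three parameters (a, b, c) were estimated."""
    ss = sum((y - (a + b * x + c * x * x)) ** 2 for x, y in zip(xs, ys))
    return ss / (len(xs) - 3)


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4]
    ys = [1, 6, 17, 34, 57]  # hypothetical data lying exactly on 1 + 2X + 3X^2
    print(fit_quadratic(xs, ys))  # approximately (1.0, 2.0, 3.0)
```

With real data the points will not lie exactly on a parabola, and the residual mean square will be greater than zero.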
Quadratic regression
[Figure: parabolas for c > 0 (opening upward) and c < 0 (opening downward)]
Quadratic regression
• $Y = a + bX + cX^2$
• Three parameters to estimate from the data: a, b, and c
• More complex model than a straight line (one extra parameter)
• Requires more data to get a good fit
Smoothing
• Runs a line (without any formula) through the data
• Can curve or be straight, depending on the data
• Several types: kernel, spline, lowess
• Each has a smoothing parameter that determines how much the line bends
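A minimal sketch of one kernel-type smoother, assuming a Gaussian kernel (the Nadaraya-Watson estimator); the function names `kernel_smooth` and `smooth_curve` are hypothetical. Here `bandwidth` plays the role of the smoothing parameter: small values let the line bend to follow nearby points, while large values flatten it toward the overall mean of Y.

```python
import math


def kernel_smooth(xs, ys, x0, bandwidth):
    """Gaussian-kernel (Nadaraya-Watson) estimate of Y at the point x0."""
    # Each observation gets a weight that shrinks with its distance from x0,
    # so nearby points count more. Note: with a tiny bandwidth and x0 far
    # from every data point, all weights underflow to 0 (division by zero).
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def smooth_curve(xs, ys, grid, bandwidth):
    """Evaluate the smoother at each point of grid to trace out the line."""
    return [kernel_smooth(xs, ys, g, bandwidth) for g in grid]


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4, 5]
    ys = [0.2, 0.9, 4.1, 8.8, 16.3, 24.9]  # hypothetical curved data
    grid = [0.5 * i for i in range(11)]
    print(smooth_curve(xs, ys, grid, bandwidth=0.5))  # bends with the data
    print(smooth_curve(xs, ys, grid, bandwidth=50.0))  # nearly flat
```

Trying several bandwidths on the same data is a quick way to see how the smoothing parameter controls the trade-off between following the data and keeping the line simple.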
Logistic Regression
• Used when Y is binary: either 0 or 1
• Example: survival
• Predicts the probability of success (Y = 1) from X; the log-odds of success is modeled as a linear function of X
[Figure: logistic regression curves; LD50 labeled]
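As a sketch, assuming the usual logistic model $P(Y = 1) = 1 / (1 + e^{-(a + bX)})$ fitted by maximum likelihood with Newton-Raphson iterations (the function name `fit_logistic` is hypothetical), the code below estimates a and b. The X value at which the predicted probability is 0.5 solves $a + bX = 0$, so it is $X = -a/b$; when X is a dose and Y records death, this is the LD50, the dose at which half of the individuals are predicted to die. The sketch assumes the 0s and 1s overlap in X; if they are perfectly separated, the maximum-likelihood estimates do not exist.

```python
import math


def fit_logistic(xs, ys, iterations=25):
    """Fit P(Y=1) = 1 / (1 + exp(-(a + b*X))) by Newton-Raphson; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(a + b * x))) for x in xs]
        # Gradient of the log-likelihood with respect to (a, b)
        ga = sum(y - pi for y, pi in zip(ys, p))
        gb = sum((y - pi) * x for x, y, pi in zip(xs, ys, p))
        # Information matrix (negative Hessian), weighted by p(1 - p)
        w = [pi * (1.0 - pi) for pi in p]
        iaa = sum(w)
        iab = sum(wi * x for wi, x in zip(w, xs))
        ibb = sum(wi * x * x for wi, x in zip(w, xs))
        det = iaa * ibb - iab * iab
        # Newton step: solve the 2x2 system (information) * delta = gradient
        a += (ibb * ga - iab * gb) / det
        b += (iaa * gb - iab * ga) / det
    return a, b


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]  # hypothetical doses
    ys = [0, 0, 1, 0, 1, 1]  # hypothetical outcomes (1 = died)
    a, b = fit_logistic(xs, ys)
    print("a =", a, "b =", b, "LD50 =", -a / b)
```

For these hypothetical data, the outcomes are symmetric about X = 3.5 (flipping X to 7 - X and Y to 1 - Y gives the same data set), so the estimated LD50, -a/b, comes out at 3.5.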
Quick Reference Summary:
Confidence Interval for Regression Slope
• What is it for? Estimating the slope $\beta$ of the linear equation $Y = \alpha + \beta X$ between an explanatory variable X and a response variable Y
• What does it assume? The relationship between X and Y is linear; each Y at a given X is a random sample from a normal distribution with equal variance
• Parameter: $\beta$
• Estimate: b
• Degrees of freedom: n-2
• Formulae:
$$b - t_{\alpha(2),df}\,SE_b < \beta < b + t_{\alpha(2),df}\,SE_b$$
$$SE_b = \sqrt{\frac{MS_{residual}}{\sum (X_i - \bar{X})^2}}$$
$$MS_{residual} = \frac{\sum (Y_i - \bar{Y})^2 - b \sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 2}$$
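A minimal sketch of these formulas in pure Python (the function names `slope_stats` and `slope_ci` are hypothetical). The critical value $t_{\alpha(2),df}$ is passed in as an argument, since it would normally be looked up in a t table; Python's standard library has no t distribution.

```python
import math


def slope_stats(xs, ys):
    """Return (b, SE_b, MS_residual) for the least-squares line Y = a + bX."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)                      # sum of (Xi - Xbar)^2
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))  # sum of (Xi - Xbar)(Yi - Ybar)
    syy = sum((y - ybar) ** 2 for y in ys)                      # sum of (Yi - Ybar)^2
    b = sxy / sxx
    ms_residual = (syy - b * sxy) / (n - 2)                     # df = n - 2
    se_b = math.sqrt(ms_residual / sxx)
    return b, se_b, ms_residual


def slope_ci(xs, ys, t_crit):
    """Confidence interval b +/- t_crit * SE_b.

    t_crit is t_{alpha(2), n-2}, assumed to come from a t table.
    """
    b, se_b, _ = slope_stats(xs, ys)
    return b - t_crit * se_b, b + t_crit * se_b


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]  # hypothetical data
    ys = [2, 4, 5, 4, 5]
    print(slope_stats(xs, ys))
    print(slope_ci(xs, ys, t_crit=3.182))  # t_{0.05(2),3}
```

For these hypothetical data, b = 0.6, MS_residual = 0.8, and SE_b = √0.08 ≈ 0.283; with df = 3 and $t_{0.05(2),3} = 3.182$, the 95% confidence interval is roughly -0.30 < β < 1.50.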
Quick Reference Summary:
t-test for Regression Slope
• What is it for? To test the null hypothesis that the population parameter $\beta$ equals a null hypothesized value, usually 0
• What does it assume? Same as the confidence interval for the regression slope
• Test statistic: t
• Null distribution: t with n-2 d.f.
• Formula:
$$t = \frac{b - \beta_0}{SE_b}$$
(with the usual null value $\beta_0 = 0$, this is $t = b / SE_b$)
t-test for Regression Slope
• Null hypothesis: $\beta = 0$
• From the sample, compute the test statistic $t = b / SE_b$
• Compare it with the null distribution: t with n-2 df
• How unusual is this test statistic?
– P < 0.05: reject Ho
– P > 0.05: fail to reject Ho
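A minimal sketch of the decision step (the function name `slope_t_test` is hypothetical), written for a general null value $\beta_0$ that defaults to 0. Comparing |t| with the two-sided critical value $t_{\alpha(2),n-2}$ from a t table gives the same decision as comparing P with α.

```python
def slope_t_test(b, se_b, t_crit, beta0=0.0):
    """Two-sided t-test of Ho: beta = beta0 for a regression slope.

    t_crit is the critical value t_{alpha(2), n-2}, assumed to come from a t table.
    Returns (t, reject_ho).
    """
    t = (b - beta0) / se_b
    # |t| at or beyond the critical value corresponds to P <= alpha
    return t, abs(t) >= t_crit


if __name__ == "__main__":
    # Hypothetical values: b = 0.6, SE_b = 0.25, n = 5, so df = 3
    t, reject = slope_t_test(0.6, 0.25, t_crit=3.182)
    print(t, "reject Ho" if reject else "fail to reject Ho")
```

With these hypothetical values, t = 2.4, which is smaller than $t_{0.05(2),3} = 3.182$, so we fail to reject Ho.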
Class Activity
• Are taller people smarter, or dumber, than shorter people in this class?
• Trivia quiz, followed by a group calculation
Trivia quiz
• Get out blank piece of paper
• Number your paper from 1 to 10
• Answer each multiple choice question
Question 1
• Which of the following has the longest recorded life span?
A. Termite
B. Indian elephant
C. Freshwater oyster
D. Chimpanzee
Question 2
• What was the first genetically engineered organism?
A. Corn
B. Mouse
C. Sheep
D. Tobacco
Question 3
• What animal has the highest blood pressure?
A. Giraffe
B. Blue whale
C. Elephant
D. Flea
Question 4
• What happens to the critical value of a $\chi^2$ (chi-squared) distribution (with constant $\alpha$) as you increase the degrees of freedom?
A. Increases
B. Decreases
C. Stays the same
D. None of the above
Question 5
• In the TV show The Simpsons, what is the name of Springfield Elementary's lunch lady?
A. Lurleen
B. Mary
C. Ashley
D. Doris
Question 6
• Which of the following means: “the quality by which a person claims to know something intuitively, instinctively, or from the gut without regard to evidence, logic, intellectual examination, or actual facts”?
A. Factuality
B. Statistics
C. Truthiness
D. Hypothesis
Question 7
• Who invented the ANOVA?
A. Dr. Harmon
B. Karl Pearson
C. R. A. Fisher
D. Kareem Abdul-Jabbar
Question 8
• An experiment that investigates all treatment combinations of two or more variables is called a(n):
A. Randomized block design
B. Kruskal-Wallis design
C. Factorial design
D. Interaction
Question 9
• After class one day, Shelly comes home and decides to make chocolate chip cookies. The bag she uses contains 200 chocolate chips, and she ends up making 20 cookies, which gives an average of 10 chips per cookie. She wants the first cookie she (randomly) chooses to be the perfect cookie. What is the probability that this first cookie will have at least 13 chocolate chips?
A. About 5%
B. About 30%
C. About 10%
D. About 20%
Question 10
• Which of the following is NOT an assumption of linear regression?
• A. Relationship between X and Y is linear
• B. Each Y at a given X is a random sample from a normal distribution
• C. Equal variance of Y at each X
• D. X is drawn from a normal distribution
Now, use your data
• Test the following null hypothesis:
• Ho: The slope of the relationship between height (X) and score on the trivia quiz (Y) is zero ($\beta = 0$)
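$$b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
$$SE_b = \sqrt{\frac{MS_{residual}}{\sum (X_i - \bar{X})^2}}$$
$$MS_{residual} = \frac{\sum (Y_i - \bar{Y})^2 - b \sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 2}$$
$$t = \frac{b}{SE_b}$$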