Stat 301 – Lecture 34
Sex of Turtles

What determines the sex (male
or female) of turtles?


Genetics?
Environment?
Sex of Turtles

Experiment:
Turtle eggs (all one species)
from Illinois.
 Several eggs in a box.
 Three boxes incubated at each
of five different temperatures.

Sex of Turtles
Temperature   Female   Male   %Male
27.2               9      1     10%
27.7               3      7     70%
28.3               0     13    100%
28.4               3      7     70%
29.9               1     10     91%
Sex of Turtles
Temperature   Female   Male   %Male
27.2               8      0      0%
27.7               2      4     67%
28.3               3      6     67%
28.4               3      5     63%
29.9               0      8    100%
Sex of Turtles
Temperature   Female   Male   %Male
27.2               8      1     11%
27.7               2      6     75%
28.3               1      7     88%
28.4               2      7     78%
29.9               0      9    100%
Sex of Turtles

Proportion of males





Overall: 91/136 = 0.67
Temp < 27.5: 2/27 = 0.07
Temp < 28.0: 19/51 = 0.37
Temp < 28.5: 64/108 = 0.59
Temp < 30.0: 91/136 = 0.67
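These proportions can be reproduced by adding the three tables of box counts together. The following is a minimal Python sketch; the names `boxes`, `combined` and `proportion_below` are illustrative and not part of the lecture.

```python
# Counts of (female, male) hatchlings per temperature, one dict per table above.
boxes = [
    {27.2: (9, 1), 27.7: (3, 7), 28.3: (0, 13), 28.4: (3, 7), 29.9: (1, 10)},
    {27.2: (8, 0), 27.7: (2, 4), 28.3: (3, 6), 28.4: (3, 5), 29.9: (0, 8)},
    {27.2: (8, 1), 27.7: (2, 6), 28.3: (1, 7), 28.4: (2, 7), 29.9: (0, 9)},
]

# Add the three boxes together at each temperature.
combined = {}
for box in boxes:
    for temp, (female, male) in box.items():
        f, m = combined.get(temp, (0, 0))
        combined[temp] = (f + female, m + male)


def proportion_below(cutoff):
    """Return (males, total) over all eggs incubated below the cutoff temperature."""
    males = sum(m for t, (f, m) in combined.items() if t < cutoff)
    total = sum(f + m for t, (f, m) in combined.items() if t < cutoff)
    return males, total


for cutoff in (27.5, 28.0, 28.5, 30.0):
    males, total = proportion_below(cutoff)
    print(f"Temp < {cutoff}: {males}/{total} = {males / total:.2f}")
```

Each "Temp <" line is cumulative: it pools every egg incubated below that temperature, so the last line equals the overall proportion.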
Sex of Turtles
[Figure: Proportion of male turtles vs. incubation temperature. Vertical axis: Proportion of Male Turtles (0.0 to 1.0); horizontal axis: Incubation Temperature (27.0 to 30.0).]
Sex of Turtles
Is there some way to predict the
proportion of male turtles given
the incubation temperature?
 At what temperature will you get
a 50/50 split of males and
females?

Comment
One is interested in predicting a
chance, probability, proportion or
percentage.
Unlike other prediction situations,
the response is bounded: it must lie
between 0 and 1 (0% and 100%).
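To see why the bound matters, one can fit an ordinary straight line to the observed proportions and look at what it predicts. The sketch below is only an illustration. It assumes the turtle counts added over the three boxes at each temperature and an unweighted least squares fit; the helper name `fit_line` is made up for this example.

```python
# Counts added over the three boxes: temperature -> (males, total eggs).
counts = {27.2: (2, 27), 27.7: (17, 24), 28.3: (26, 30), 28.4: (19, 27), 29.9: (27, 28)}

temps = list(counts)
props = [males / total for males, total in counts.values()]


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


a, b = fit_line(temps, props)
for t in (25.0, 27.2, 29.9, 30.0):
    print(f"{t:.1f}  predicted proportion = {a + b * t:.3f}")
```

Under these assumptions the fitted line gives a value above 1 at 29.9 degrees and a negative value at 25 degrees, neither of which is a possible proportion.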

Logistic Regression
Logistic regression is a statistical
technique that can be used in
binary response problems.
 Logistic regression is different
from ordinary least squares
regression.

Sex of Turtles

Binary response
$Y_i = 1$: Male
$Y_i = 0$: Female

Probability
$\text{Prob}(Y_i = 1) = \pi_i$
$\text{Prob}(Y_i = 0) = 1 - \pi_i$
Model
The mean response is $\pi_i$.
There is a constraint on the
response:
$0 \le \pi_i \le 1$

Curvilinear Response

When the response variable is
binary, or a binomial
proportion, the shape of the
mean response is a curve.
Curvilinear Response
[Figure: response y (0.0 to 1.0) plotted against x (50 to 150).]
Curvilinear Model

Logistic model
$$\pi_i = \frac{e^{\beta_0 + \beta_1 X_i}}{1 + e^{\beta_0 + \beta_1 X_i}}$$
Logistic Model

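The logit transformation
$$\ln\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 X_i$$

Use the observed proportion of males
at each temperature to estimate $\pi_i$.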
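The logit is the logistic model solved for its linear part. A short derivation, using the same symbols as the logistic model ($\pi_i$, $\beta_0$, $\beta_1$, $X_i$):

```latex
\begin{align*}
\pi_i &= \frac{e^{\beta_0 + \beta_1 X_i}}{1 + e^{\beta_0 + \beta_1 X_i}} \\
1 - \pi_i &= \frac{1}{1 + e^{\beta_0 + \beta_1 X_i}} \\
\frac{\pi_i}{1 - \pi_i} &= e^{\beta_0 + \beta_1 X_i} \\
\ln\!\left(\frac{\pi_i}{1 - \pi_i}\right) &= \beta_0 + \beta_1 X_i
\end{align*}
```

The left side is the log-odds, which can take any real value, so a straight line in $X_i$ no longer conflicts with the bound on $\pi_i$.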
Combined Turtle Data
Temperature   Female   Male   Total   Proportion Male     Logit
27.2              25      2      27            0.0741   -2.5257
27.7               7     17      24            0.7083    0.8873
28.3               4     26      30            0.8667    1.8718
28.4               8     19      27            0.7037    0.8650
29.9               1     27      28            0.9643    3.2958
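The last two columns follow directly from the counts. A minimal Python sketch of the calculation is given below; the male count at 29.9 degrees is taken as 27, the sum over the three boxes, and the names `combined`, `logit` and `rows` are illustrative.

```python
import math

# Combined counts: temperature -> (female, male).
combined = {27.2: (25, 2), 27.7: (7, 17), 28.3: (4, 26), 28.4: (8, 19), 29.9: (1, 27)}


def logit(p):
    """Log-odds of a proportion p, defined only for 0 < p < 1."""
    return math.log(p / (1 - p))


rows = []
for temp, (female, male) in combined.items():
    total = female + male
    prop = male / total
    rows.append((temp, total, prop, logit(prop)))
    print(f"{temp:5.1f} {total:3d} {prop:.4f} {logit(prop):8.4f}")
```

The logit is undefined for a proportion of exactly 0 or 1. That happens in some individual boxes (for example 0% and 100% male) but not in the combined data.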
Maximum Likelihood
An alternative to the method of
least squares is the method of
maximum likelihood.
 The idea is to come up with
estimates of model parameters
that maximize the likelihood of
getting the data we have.

Maximum Likelihood
Choose $\beta_0$ and $\beta_1$ so as to
maximize the likelihood.
Similar to least squares, we will
get two equations with two
unknowns ($\beta_0$ and $\beta_1$).
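The two likelihood equations have no closed-form solution, so they are solved iteratively. The sketch below uses Newton-Raphson on the combined turtle counts. This is one common approach and an assumption here: the lecture does not say which algorithm the software uses. The function names `prob`, `log_likelihood` and `fit` are illustrative.

```python
import math

# Combined turtle data: temperature, males, total eggs.
temps = [27.2, 27.7, 28.3, 28.4, 29.9]
males = [2, 17, 26, 19, 27]
totals = [27, 24, 30, 27, 28]


def prob(b0, b1, x):
    """Logistic model: probability of a male at temperature x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def log_likelihood(b0, b1):
    """Binomial log-likelihood, dropping the constant binomial coefficients."""
    total = 0.0
    for x, y, n in zip(temps, males, totals):
        p = prob(b0, b1, x)
        total += y * math.log(p) + (n - y) * math.log(1.0 - p)
    return total


def fit(max_iter=50, tol=1e-8):
    """Newton-Raphson on the two likelihood equations (an assumed solver)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        u0 = u1 = 0.0          # score: first derivatives of the log-likelihood
        i00 = i01 = i11 = 0.0  # information: negative second derivatives
        for x, y, n in zip(temps, males, totals):
            p = prob(b0, b1, x)
            w = n * p * (1.0 - p)
            u0 += y - n * p
            u1 += (y - n * p) * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Solve the 2x2 system (information) * step = score by Cramer's rule.
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0 += step0
        b1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1


b0_hat, b1_hat = fit()
print(f"b0 = {b0_hat:.4f}, b1 = {b1_hat:.4f}")
print(f"log-likelihood = {log_likelihood(b0_hat, b1_hat):.3f}")
```

Setting the two score sums `u0` and `u1` to zero gives the "two equations with two unknowns"; each Newton step moves the estimates toward the values that satisfy both.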

Maximum Likelihood

Need computer software to
perform the analysis.

JMP.
Logistic Regression

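Combined turtle data
$$\ln\left(\frac{\hat{\pi}_i}{1 - \hat{\pi}_i}\right) = -61.3183 + 2.2110\,X_i$$
where $X_i$ is the incubation temperature.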
Sex of Turtles
Temp   Predicted logit   Predicted $\hat{\pi}_i$
27.2           -1.1791                     0.235
27.7           -0.0736                     0.482
28.3            1.2530                     0.778
28.4            1.4741                     0.814
29.9            4.7906                     0.992
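The predicted values come from plugging each temperature into the fitted equation and then undoing the logit. A minimal Python sketch, assuming the reported coefficients with a negative intercept (consistent with the predicted logits above); the function names are illustrative.

```python
import math

B0, B1 = -61.3183, 2.2110  # fitted coefficients reported in the lecture


def predicted_logit(temp):
    """Predicted log-odds of a male at the given temperature."""
    return B0 + B1 * temp


def predicted_proportion(temp):
    """Back-transform the predicted logit to a proportion between 0 and 1."""
    z = predicted_logit(temp)
    return math.exp(z) / (1.0 + math.exp(z))


for temp in (27.2, 27.7, 28.3, 28.4, 29.9):
    print(f"{temp:.1f}  {predicted_logit(temp):8.4f}  {predicted_proportion(temp):.3f}")
```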
Logistic Regression
Sex of Turtles
[Figure: Fitted Curve Plot. Vertical axis: Proportion Male (0.0 to 1.0); horizontal axis: Temperature (27 to 30).]
Sex of Turtles

Temperature to give a 50:50
split

Logistic regression:
Temperature $\approx 27.7329^\circ$
Interpretation

The coefficients in a logistic
regression are often difficult to
interpret because the effect of
increasing X by one unit varies
depending on where X is. This
is the essence of a nonlinear
model.
Interpretation

Consider first the interpretation
of the odds, the ratio $\pi / (1 - \pi)$.
If $\pi = 0.75$, then the odds are
$0.75 / 0.25 = 3$, or 3 to 1. Males are three times
as likely as females.
Interpretation

In logistic regression we model
the log-odds. The predicted
log-odds is given by a linear
equation; in the turtle example:
$$\ln\left(\frac{\hat{\pi}_i}{1 - \hat{\pi}_i}\right) = -61.3183 + 2.2110\,X_i$$
Interpretation

The predicted odds for that
value of $X_i$ is:
$$\frac{\hat{\pi}_i}{1 - \hat{\pi}_i} = e^{-61.3183 + 2.2110\,X_i}$$
So if we increase $X_i$ by 1 unit,
we multiply the predicted odds
by:
$$e^{2.2110} = 9.125$$
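The multiplier does not depend on where $X_i$ is, because the intercept and the $X_i$ term cancel in the ratio of odds. A short derivation, writing $\hat{\beta}_0$ and $\hat{\beta}_1$ for the fitted intercept and slope (notation introduced here for illustration):

```latex
\begin{align*}
\frac{\text{odds at } X_i + 1}{\text{odds at } X_i}
  &= \frac{e^{\hat{\beta}_0 + \hat{\beta}_1 (X_i + 1)}}{e^{\hat{\beta}_0 + \hat{\beta}_1 X_i}}
   = e^{\hat{\beta}_1} \\
e^{2.2110} &\approx 9.125
\end{align*}
```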
Turtle Example

At 27 degrees the predicted
odds for a male turtle are 0.20,
about 1 to 5; that is, a turtle is 5 times
more likely to be female than
male.
Turtle Example

At 28 degrees the predicted
odds for a male are 9.125 times
the odds at 27 degrees:
0.20 × 9.125 = 1.825. Now males are almost
twice as likely as females.
Turtle Example

At 29 degrees the predicted
odds for a male are 9.125 times
the odds at 28 degrees:
1.825 × 9.125 = 16.65. Now males are over 16
times more likely than females.
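The odds at each temperature can also be computed directly from the fitted equation. A minimal Python sketch, assuming the reported coefficients with a negative intercept; the function name `predicted_odds` is illustrative.

```python
import math

B0, B1 = -61.3183, 2.2110  # fitted coefficients reported in the lecture


def predicted_odds(temp):
    """Predicted odds of a male: the exponential of the predicted log-odds."""
    return math.exp(B0 + B1 * temp)


for temp in (27, 28, 29):
    print(f"{temp} degrees: odds = {predicted_odds(temp):.3f}")

print(f"multiplier per degree = {math.exp(B1):.3f}")
```

The slides round the odds at 27 degrees to 0.20 before multiplying, so the direct calculation gives slightly smaller values at 28 and 29 degrees (roughly 1.80 and 16.5). The conclusions are the same.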
Interpretation

The intercept can be
interpreted if the value of zero
for the explanatory variable
makes sense within the context
of the problem.
Turtle Example

Turtle eggs will not incubate at
a temperature of zero
(freezing), therefore the
intercept does not have an
interpretation for this context.
Inference for Logistic
Regression
Whole Model Test: analogous
to the model F test in simple linear regression (SLR).
Parameter Estimates:
analogous to individual t-tests
in SLR.

Wald $\chi^2$-test.
Inference for Logistic
Regression

Whole Model Test:
Chi Square = 49.566,
P-value < 0.0001
Because the P-value is so small,
temperature is a statistically
significant predictor of the
proportion male.
Inference for Logistic
Regression

Individual parameter estimates
Chi Square for Temperature is
26.33, P-value < 0.0001
Temperature is a statistically
significant predictor of the
proportion male.
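Both chi-square statistics can be approximated from the combined counts and the reported coefficients. The sketch below rests on two assumptions not stated in the lecture: that the whole model test is a likelihood ratio test against a model with no temperature term, and that the parameter test is a Wald test using the inverse information matrix. Both are referred to a chi-square distribution with 1 degree of freedom. All names are illustrative.

```python
import math

temps = [27.2, 27.7, 28.3, 28.4, 29.9]
males = [2, 17, 26, 19, 27]
totals = [27, 24, 30, 27, 28]
B0, B1 = -61.3183, 2.2110  # fitted coefficients reported in the lecture


def log_lik(probs):
    """Binomial log-likelihood without the constant binomial coefficients."""
    return sum(
        y * math.log(p) + (n - y) * math.log(1.0 - p)
        for y, n, p in zip(males, totals, probs)
    )


def chi2_pvalue_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Whole model test: fitted model versus one common proportion for all temperatures.
overall = sum(males) / sum(totals)
null_probs = [overall] * len(temps)
fit_probs = [1.0 / (1.0 + math.exp(-(B0 + B1 * x))) for x in temps]
lr_chi2 = 2.0 * (log_lik(fit_probs) - log_lik(null_probs))

# Wald test for the slope: estimate squared over its estimated variance, with the
# variance taken from the inverse of the information matrix at the fitted values.
weights = [n * p * (1.0 - p) for n, p in zip(totals, fit_probs)]
i00 = sum(weights)
i01 = sum(w * x for w, x in zip(weights, temps))
i11 = sum(w * x * x for w, x in zip(weights, temps))
var_b1 = i00 / (i00 * i11 - i01 * i01)
wald_chi2 = B1 ** 2 / var_b1

print(f"Whole model chi-square = {lr_chi2:.3f}, p = {chi2_pvalue_1df(lr_chi2):.2e}")
print(f"Wald chi-square        = {wald_chi2:.2f}, p = {chi2_pvalue_1df(wald_chi2):.2e}")
```

Because the coefficients are rounded to four decimals, the results should agree with the reported 49.566 and 26.33 only approximately.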

JMP Data Table
Temperature   Sex         Count
27.2          2. Female      25
27.7          2. Female       7
28.3          2. Female       4
28.4          2. Female       8
29.9          2. Female       1
27.2          1. Male         2
27.7          1. Male        17
28.3          1. Male        26
28.4          1. Male        19
29.9          1. Male        27
JMP Analyze

Fit Y by X
Y, Response: Sex
 X, Factor: Temperature
 Freq: Count
