Stat 301 – Lecture 34

Sex of Turtles
What determines the sex (male or female) of turtles? Genetics? Environment?

Sex of Turtles
Experiment: Turtle eggs (all one species) from Illinois. Several eggs in a box. Three boxes incubated at each of five different temperatures.

Sex of Turtles, Box 1
Temperature (°C)   Female   Male   %Male
27.2               9        1      10%
27.7               3        7      70%
28.3               0        13     100%
28.4               3        7      70%
29.9               1        10     91%

Sex of Turtles, Box 2
Temperature (°C)   Female   Male   %Male
27.2               8        0      0%
27.7               2        4      67%
28.3               3        6      67%
28.4               3        5      63%
29.9               0        8      100%

Sex of Turtles, Box 3
Temperature (°C)   Female   Male   %Male
27.2               8        1      11%
27.7               2        6      75%
28.3               1        7      88%
28.4               2        7      78%
29.9               0        9      100%

Sex of Turtles
Proportion of males (cumulative: all eggs incubated below the cutoff)
Overall:       91/136 = 0.67
Temp < 27.5:    2/27  = 0.07
Temp < 28.0:   19/51  = 0.37
Temp < 28.5:   64/108 = 0.59
Temp < 30.0:   91/136 = 0.67

Sex of Turtles
[Figure: Proportion of male turtles vs. incubation temperature; y-axis: proportion of male turtles (0.0 to 1.0), x-axis: incubation temperature (27.0 to 30.0)]

Sex of Turtles
Is there some way to predict the proportion of male turtles given the incubation temperature? At what temperature will you get a 50/50 split of males and females?

Comment
One is interested in predicting a chance, probability, proportion, or percentage. Unlike in other prediction situations, the response is bounded: it must lie between 0 and 1 (0% and 100%).

Logistic Regression
Logistic regression is a statistical technique that can be used in binary response problems. It is different from ordinary least squares regression.

Sex of Turtles
Binary response: $Y_i = 1$ if male, $Y_i = 0$ if female.
Probability: $P(Y_i = 1) = \pi_i$, $P(Y_i = 0) = 1 - \pi_i$.

Model
The mean response is $\pi_i$. There is a constraint on the response: $0 \le \pi_i \le 1$.

Curvilinear Response
When the response variable is binary, or a binomial proportion, the shape of the mean response is a curve rather than a straight line, so that it stays between 0 and 1.
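As a check on the arithmetic in the box tables and the "Proportion of males" slide, here is a minimal Python sketch that pools the three boxes and recomputes the cumulative proportions. It assumes, as the counts suggest, that each of the three tables lists one box per temperature; the names `BOXES`, `pool`, and `cumulative_male_proportions` are illustrative, not part of the lecture.

```python
# Assumption: each table above is one box at each temperature.
# Counts are (female, male) keyed by incubation temperature (deg C).
BOXES = [
    {27.2: (9, 1), 27.7: (3, 7), 28.3: (0, 13), 28.4: (3, 7), 29.9: (1, 10)},
    {27.2: (8, 0), 27.7: (2, 4), 28.3: (3, 6), 28.4: (3, 5), 29.9: (0, 8)},
    {27.2: (8, 1), 27.7: (2, 6), 28.3: (1, 7), 28.4: (2, 7), 29.9: (0, 9)},
]


def pool(boxes):
    """Sum female and male counts across boxes at each temperature."""
    pooled = {}
    for box in boxes:
        for temp, (f, m) in box.items():
            pf, pm = pooled.get(temp, (0, 0))
            pooled[temp] = (pf + f, pm + m)
    return dict(sorted(pooled.items()))


def cumulative_male_proportions(pooled, cutoffs):
    """(males, total, proportion male) among all eggs incubated below each cutoff."""
    out = {}
    for c in cutoffs:
        males = sum(m for t, (_, m) in pooled.items() if t < c)
        total = sum(f + m for t, (f, m) in pooled.items() if t < c)
        out[c] = (males, total, males / total)
    return out


pooled = pool(BOXES)
for temp, (f, m) in pooled.items():
    print(f"{temp}: female={f}, male={m}, total={f + m}, proportion male={m / (f + m):.4f}")
for c, (m, n, p) in cumulative_male_proportions(pooled, [27.5, 28.0, 28.5, 30.0]).items():
    print(f"Temp < {c}: {m}/{n} = {p:.2f}")
```

The pooled counts (for example 25 females and 2 males at 27.2 °C, and 1 female and 27 males at 29.9 °C) are the combined data used for the logistic regression fit.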
Curvilinear Response
[Figures: two example curvilinear mean-response curves; y-axis from 0.0 to 1.0, x-axis from 50 to 150]

Curvilinear Model
Logistic model:
$$\pi_i = \frac{e^{\beta_0 + \beta_1 X_i}}{1 + e^{\beta_0 + \beta_1 X_i}}$$

Logistic Model
The logit transformation:
$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 X_i$$
Use the observed proportion, $p_i$, to estimate $\pi_i$.

Combined Turtle Data
Temperature (°C)   Female   Male   Total   Proportion male, $p_i$   $\log(p_i/(1 - p_i))$
27.2               25       2      27      0.0741                   -2.5257
27.7               7        17     24      0.7083                    0.8873
28.3               4        26     30      0.8667                    1.8718
28.4               8        19     27      0.7037                    0.8650
29.9               1        27     28      0.9643                    3.2958

Maximum Likelihood
An alternative to the method of least squares is the method of maximum likelihood. The idea is to find the values of the model parameters that maximize the likelihood of observing the data we actually have.

Maximum Likelihood
Choose $\beta_0$ and $\beta_1$ so as to maximize the likelihood. As with least squares, we get two equations in two unknowns ($\beta_0$ and $\beta_1$). Unlike least squares, these equations are nonlinear and have no closed-form solution, so they must be solved iteratively.

Maximum Likelihood
We need computer software, such as JMP, to perform the analysis.

Logistic Regression
Combined turtle data:
$$\log\left(\frac{\hat{\pi}_i}{1 - \hat{\pi}_i}\right) = -61.3183 + 2.2110\,\text{Temp}_i$$

Sex of Turtles
Temp (°C)   Predicted log-odds   Predicted proportion male, $\hat{\pi}_i$
27.2        -1.1791              0.235
27.7        -0.0736              0.482
28.3         1.2530              0.778
28.4         1.4741              0.814
29.9         4.7906              0.992

Logistic Regression
Sex of Turtles: Fitted Curve Plot
[Figure: fitted logistic curve of proportion male (0.0 to 1.0) vs. temperature (27 to 30 °C)]

Sex of Turtles
Temperature to give a 50:50 split ($\hat{\pi} = 0.5$, i.e., predicted log-odds of 0):
Logistic regression: $X = -\hat{\beta}_0/\hat{\beta}_1 \approx 27.7329\,^\circ$C

Interpretation
The coefficients in a logistic regression are often difficult to interpret because the effect on the predicted probability of increasing X by one unit depends on the value of X. This is the essence of a nonlinear model.

Interpretation
Consider first the interpretation of the odds, $\pi/(1 - \pi)$. If $\pi = 0.75$, then the odds are $0.75/0.25 = 3$, or 3 to 1: males are three times as likely as females.

Interpretation
In logistic regression we model the log-odds.
The predicted log-odds is given by the linear equation; in the turtle example,
$$\log\left(\frac{\hat{\pi}_i}{1 - \hat{\pi}_i}\right) = -61.3183 + 2.2110\,X_i$$

Interpretation
The predicted odds for that value of $X_i$ are:
$$\frac{\hat{\pi}_i}{1 - \hat{\pi}_i} = e^{-61.3183 + 2.2110\,X_i}$$
So if we increase $X_i$ by 1 unit, we multiply the predicted odds by $e^{2.2110} \approx 9.125$.

Turtle Example
At 27 degrees the predicted odds for a male turtle are about 0.20, roughly 1 to 5; that is, a female is about 5 times as likely as a male.

Turtle Example
At 28 degrees the predicted odds for a male are 9.125 times those at 27 degrees: $0.20 \times 9.125 \approx 1.825$. Now males are almost twice as likely as females.

Turtle Example
At 29 degrees the predicted odds for a male are 9.125 times those at 28 degrees: $1.825 \times 9.125 \approx 16.65$. Now males are over 16 times as likely as females.

Interpretation
The intercept can be interpreted only if a value of zero for the explanatory variable makes sense within the context of the problem.

Turtle Example
Turtle eggs will not incubate at a temperature of zero (freezing); therefore, the intercept has no meaningful interpretation in this context.

Inference for Logistic Regression
Whole Model Test: analogous to the model F-test in SLR.
Parameter Estimates: analogous to the individual t-tests in SLR (Wald $\chi^2$ test).

Inference for Logistic Regression
Whole Model Test: Chi-square = 49.566, P-value < 0.0001.
Because the P-value is so small, temperature is a statistically significant predictor of the proportion male.

Inference for Logistic Regression
Individual parameter estimates: Chi-square for Temperature = 26.33, P-value < 0.0001.
Temperature is a statistically significant predictor of the proportion male.

JMP Data Table
Temperature   Sex         Count
27.2          2. Female   25
27.7          2. Female   7
28.3          2. Female   4
28.4          2. Female   8
29.9          2. Female   1
27.2          1. Male     2
27.7          1. Male     17
28.3          1. Male     26
28.4          1. Male     19
29.9          1. Male     27

JMP
Analyze > Fit Y by X
Y, Response: Sex
X, Factor: Temperature
Freq: Count
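For readers without JMP, the same analysis can be sketched in plain Python. This is a minimal sketch, not JMP's implementation: it treats the grouped data (the Freq: Count layout) as 136 binary responses, solves the maximum-likelihood equations by Newton-Raphson (one standard iterative method, which coincides with Fisher scoring for the logit link; JMP's own algorithm may differ), and assumes the Whole Model Test is a likelihood-ratio chi-square and the parameter test a Wald chi-square. All function names are illustrative.

```python
import math

# Combined turtle data from the JMP data table:
# males and total eggs at each incubation temperature (deg C).
TEMPS = [27.2, 27.7, 28.3, 28.4, 29.9]
MALES = [2, 17, 26, 19, 27]
TOTALS = [27, 24, 30, 27, 28]


def inv_logit(eta):
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def score_and_info(b0, b1):
    """Score vector and Fisher information of the binomial log-likelihood at (b0, b1)."""
    g0 = g1 = i00 = i01 = i11 = 0.0
    for t, y, n in zip(TEMPS, MALES, TOTALS):
        p = inv_logit(b0 + b1 * t)
        r = y - n * p        # observed minus expected number of males
        w = n * p * (1 - p)  # binomial variance at this temperature
        g0 += r
        g1 += r * t
        i00 += w
        i01 += w * t
        i11 += w * t * t
    return (g0, g1), (i00, i01, i11)


def fit_newton(iters=50, tol=1e-10):
    """Maximum-likelihood (b0, b1) by Newton-Raphson, starting from (0, 0)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        (g0, g1), (i00, i01, i11) = score_and_info(b0, b1)
        det = i00 * i11 - i01 * i01
        # Newton step: solve information @ step = score for the 2x2 case.
        s0 = (i11 * g0 - i01 * g1) / det
        s1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if abs(s0) < tol and abs(s1) < tol:
            break
    return b0, b1


def log_lik(probs):
    """Binomial log-likelihood, dropping log C(n, y) terms (they cancel in the LR test)."""
    return sum(y * math.log(p) + (n - y) * math.log(1 - p)
               for y, n, p in zip(MALES, TOTALS, probs))


def chi2_1df_pvalue(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


b0, b1 = fit_newton()
fitted = [inv_logit(b0 + b1 * t) for t in TEMPS]
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
for t, p in zip(TEMPS, fitted):
    print(f"  {t}: predicted proportion male = {p:.3f}")
print(f"50:50 temperature = {-b0 / b1:.4f}")
print(f"odds multiplier per degree = {math.exp(b1):.3f}")

# Whole model test: likelihood ratio vs. a constant proportion male.
p_null = sum(MALES) / sum(TOTALS)
lr = 2 * (log_lik(fitted) - log_lik([p_null] * len(TEMPS)))
print(f"LR chi-square = {lr:.3f}, p = {chi2_1df_pvalue(lr):.1e}")

# Wald test for temperature: (b1 / SE)^2, SE from the inverse information at the MLE.
_, (i00, i01, i11) = score_and_info(b0, b1)
se_b1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))
wald = (b1 / se_b1) ** 2
print(f"Wald chi-square = {wald:.2f}, p = {chi2_1df_pvalue(wald):.1e}")
```

Up to rounding, the printed values should agree with the slides: estimates near -61.3183 and 2.2110, a 50:50 temperature near 27.73 °C, an odds multiplier near 9.125, and chi-square statistics near 49.57 and 26.33. Evaluating the odds equation directly gives about 1.80 at 28 degrees and 16.5 at 29 degrees; the slides' 1.825 and 16.65 come from chaining the rounded value 0.20.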