Building a Probability of Detection Model for Mount Graham Red Squirrel
Valerie Cousineau
Kevin Doubleday
Ruoyu Huang
1 Executive Summary
Tim Jessen, an MS student from the School of Natural Resources and Environment, came to the
Statistics Consulting Lab for assistance with a research project involving the detection of a species of
squirrel using a non-invasive sampling technique. Initially, Tim was interested in estimating the
probability of detecting a squirrel on an array of hair tubes, and wanted to adjust for the distance on
the array at which a squirrel was detected. It was determined that construction of a model to predict
the odds of detecting this species of squirrel on an array without adjusting for distance was more
feasible. Average weekly temperature, average weekly humidity, and weekly precipitation were used
as the predictors with a repeated subject term for each array included. A logistic model was
constructed and the details of the model, including the interpretation of the predictor estimates, were
reported.
2 Detailed Summary
2.1 Background
The Mount Graham red squirrel (MGRS) is a territorial
animal that resides at a location called a midden. The
MGRS protects its midden and will generally seek food
and resources radially from this central location. Of
interest is detection of the MGRS near a midden.
Detection will be carried out using a series of hair tubes
filled with food and placed at 10, 20, 30, and 50 meters
away from a midden. Figure 1 shows the sampling
scheme with the midden as the dark circle in the middle of
the array of hair tubes. The hair tubes will be checked
three times, once per week. Presence of hair from MGRS
indicates detection at that particular midden. For this
study, 32 middens were selected based on known presence
of a MGRS.
Figure 1. Array of Hair Tubes Around a Midden
2.2 The Logistic Prediction Model
We would like to build a prediction model for the probability of detecting a MGRS on an array
(Figure 1) while adjusting for average temperature, average humidity, and amount of precipitation
occurring during the sampling time frame. We will model the probability that an MGRS is detected
on an individual array using the binary outcome of MGRS detected (Y = 1) versus MGRS not
detected (Y = 0). Note that a MGRS is detected if at least one hair tube on the array contains hair
from a MGRS during the sampling time frame. A logistic model for predicting the odds of detection
was fit with a repeated subject effect for each unique sampling site and average temperature, average
humidity, and total precipitation included as covariates. The repeated subject effect accounts for the
variability within each sampling site across the three sampling times. The final logistic model is
tabulated in Table 1.
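The detection rule described above (an array counts as a detection if at least one of its hair tubes holds MGRS hair) can be sketched in a few lines. The record layout, the midden labels, and the number of tubes per array below are hypothetical and made up for illustration; they are not the study data.

```python
# Hypothetical weekly check records: for each (midden, week), a list of
# booleans, one per hair tube, True if MGRS hair was found in that tube.
# These values are invented for illustration only.
checks = {
    ("midden_01", 1): [False, False, True, False],
    ("midden_01", 2): [False, False, False, False],
    ("midden_02", 1): [True, True, False, False],
}


def detected(tube_results):
    """Return 1 if any tube on the array held MGRS hair, else 0."""
    return int(any(tube_results))


# One binary outcome Y per array per sampling week.
outcomes = {key: detected(tubes) for key, tubes in checks.items()}

for (midden, week), y in sorted(outcomes.items()):
    print(midden, week, y)
```

Each array then contributes one binary outcome per sampling week, which is the repeated structure the subject effect is meant to handle.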
3 Interpretation of Logistic Model Estimates
3.0 Explanation of Log Odds and Probability?
Odds are defined as the ratio of the probability that an event occurs to the probability that it does
not occur. For instance, if the probability of some event A is 0.8, written as P(A) = 0.8, then the
odds of event A occurring are
Odds of A = P(A)/(1 – P(A)) = 0.8/0.2 = 4
The log odds of an event are the natural logarithm of the odds of the event. Some texts write this as
ln(odds). In the example above, where the odds of event A are 4, we obtain
log(odds of A) = log(4) = 1.386
The probability of event A can be recovered from the odds of the event as P(event A) = (Odds
of A)/(1 + (Odds of A)). In the above example we can convert the odds back to a probability,
P(A) = 4/(1+4) = 0.8
which matches the probability we started with.
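The conversions above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names are chosen here for illustration.

```python
import math


def odds_from_prob(p):
    """Odds = P(A) / (1 - P(A))."""
    return p / (1.0 - p)


def log_odds_from_prob(p):
    """Natural logarithm of the odds."""
    return math.log(odds_from_prob(p))


def prob_from_odds(odds):
    """Probability = odds / (1 + odds)."""
    return odds / (1.0 + odds)


p_a = 0.8
odds_a = odds_from_prob(p_a)
print(round(odds_a, 3))                     # odds of A
print(round(log_odds_from_prob(p_a), 3))    # log odds of A
print(round(prob_from_odds(odds_a), 3))     # back to the probability of A
```

Run on P(A) = 0.8, this reproduces the odds of 4, the log odds of 1.386, and the round trip back to 0.8.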
3.1 What is Being Modeled?
The logistic model returns estimates for the intercept and each predictor. The dependent (response)
variable in a logistic model is the log odds of Y=1, where we are using the natural logarithm. In this
case we are modelling the log odds of detecting a MGRS at an array.
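Written out, the quantity being modeled takes the following form. The symbols p and β0 through β3 are notation introduced here for illustration; the predictors are the three covariates named in Section 2.2, and the repeated subject term is omitted for simplicity.

```latex
\log\!\left(\frac{p}{1 - p}\right)
  = \beta_0 + \beta_1\,\mathrm{Temp} + \beta_2\,\mathrm{Humidity} + \beta_3\,\mathrm{Precip},
\qquad p = P(Y = 1)
```

Here β0 corresponds to the intercept estimate and β1, β2, β3 to the temperature, humidity, and precipitation estimates, respectively.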
3.2 Interpretation of the Intercept
The intercept estimates the log odds of detection of a MGRS if all the other variables in the model
take a value of zero. In other words, if the average temperature is zero degrees, the average humidity
is zero, and there is no precipitation then the log odds of detection is 1.2677. We derive the odds of
detecting an MGRS as $e^{1.2677} \approx 3.55$. The intercept in this case does not have a particularly
useful interpretation since the variables temperature, precipitation, and humidity are not likely to all
take values of zero.
3.3 Interpretation of the Predictor Estimates
The predictor estimates from a logistic model can be interpreted in two ways. First, the estimate can
be interpreted as the change in the log odds of detection for a one unit increase in the predictor when
all other predictors are held constant. Second, the exponentiated estimate is the ratio of the odds of
detection for two observations whose predictor values are one unit apart ($e^{\text{estimate}} \approx$
Odds Ratio), again with all other predictors held constant. Consider the estimate for temperature
(-0.0066). Under the first interpretation, a one degree increase in temperature decreases the log odds
of detection by 0.0066.

Table 1. Parameter Estimates for Logistic Prediction Model of MGRS Detection.
| Parameter     | Model Estimate, log(OR) | Standard Error | 95% C.I.      | p-value |
|---------------|-------------------------|----------------|---------------|---------|
| Intercept     | 1.2677                  | 0.8715         | (-0.44, 2.98) | 0.15    |
| Temperature   | -0.0066                 | 0.0379         | (-0.08, 0.07) | 0.86    |
| Humidity      | -0.0064                 | 0.0174         | (-0.04, 0.03) | 0.71    |
| Precipitation | -0.0109                 | 0.0170         | (-0.04, 0.02) | 0.52    |

The second interpretation is a little more useful. The exponentiated estimate can be interpreted as the
odds ratio of detection for any two temperatures that are one degree apart. For instance if we are
interested in the change in detection odds for the temperatures 10 and 11 degrees we could write,
$$\frac{\text{Odds of detection when temperature is 11}}{\text{Odds of detection when temperature is 10}} \approx e^{-0.0066} = 0.99.$$
We can then see that for each one degree increase in temperature there is roughly a one percent
decrease in the odds of detecting a MGRS at an array. Since a one degree increase in temperature
may not be of interest the odds ratio for a larger increase in temperature can be calculated by
multiplying the estimate by the desired increase before exponentiation. For instance,
$$\frac{\text{Odds of detection when temperature is 20}}{\text{Odds of detection when temperature is 10}} \approx e^{10 \times (-0.0066)} = 0.94.$$
Hence, for a 10 degree increase in temperature the odds of detecting a MGRS decrease by about 6%.
Similar interpretations can be applied to the other predictors.
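The odds ratio calculation for an arbitrary increase in a predictor can be sketched as follows. The helper names are hypothetical; the temperature estimate is the value reported in Table 1.

```python
import math

TEMP_ESTIMATE = -0.0066  # temperature estimate from Table 1


def odds_ratio(estimate, increase=1.0):
    """Odds ratio for a given increase in the predictor: exp(estimate * increase)."""
    return math.exp(estimate * increase)


def percent_change_in_odds(estimate, increase=1.0):
    """Percent change in the odds implied by the odds ratio."""
    return 100.0 * (odds_ratio(estimate, increase) - 1.0)


print(round(odds_ratio(TEMP_ESTIMATE), 2))                  # one degree increase
print(round(odds_ratio(TEMP_ESTIMATE, 10), 2))              # ten degree increase
print(round(percent_change_in_odds(TEMP_ESTIMATE, 10), 1))  # percent change for ten degrees
```

Multiplying the estimate by the increase before exponentiating is equivalent to raising the one unit odds ratio to that power, which is why the ten degree odds ratio is not simply ten times the one degree change.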
We can also estimate the log odds, odds, and probability of detection for a given set of conditions.
For instance, say we are interested in estimates for a week with an average temperature of 10°, an
average humidity of 30%, and 0 inches of precipitation. We could find the log odds of detecting a
MGRS as,
log(odds) ≈ 1.2677 – 0.0066*Temp – 0.0064*Humidity – 0.0109*Precip
= 1.2677 – 0.0066*10 – 0.0064*30 – 0.0109*0
= 1.0097
So, the log odds of detecting a MGRS given an average temperature of 10°, an average humidity of
30%, and no precipitation are 1.0097. This means the odds of detection under these conditions are
odds = $e^{1.0097} \approx 2.74$. Additionally, the probability of detecting a MGRS under these
conditions is P(detection of MGRS given conditions) = 2.74/(1 + 2.74) = 0.733.
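The worked prediction above can be reproduced with a short function. This is a sketch that plugs in the Table 1 estimates; the function and dictionary names are hypothetical, and the repeated subject term is ignored, as it is in the hand calculation.

```python
import math

# Estimates from Table 1.
COEF = {
    "intercept": 1.2677,
    "temp": -0.0066,
    "humidity": -0.0064,
    "precip": -0.0109,
}


def predict(temp, humidity, precip):
    """Return (log odds, odds, probability) of detection for the given conditions."""
    log_odds = (
        COEF["intercept"]
        + COEF["temp"] * temp
        + COEF["humidity"] * humidity
        + COEF["precip"] * precip
    )
    odds = math.exp(log_odds)
    prob = odds / (1.0 + odds)
    return log_odds, odds, prob


log_odds, odds, prob = predict(temp=10, humidity=30, precip=0)
print(round(log_odds, 4), round(odds, 2), round(prob, 3))
```

Changing the arguments to `predict` gives the corresponding estimates for any other combination of weekly conditions.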
Note also that a negative estimate for a predictor indicates that the odds of detection are predicted to
decrease for an increase in the predictor. Similarly, a positive estimate indicates that the odds of
detection are predicted to increase for an increase in the predictor.
Last, it is worth noting that the p-values for all predictors are very large, all greater than 0.50. While
p-values are not usually evaluated in prediction models, the fact that they are all so large could be
viewed as evidence that the predictors are not adding significant predictive value to the model.
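As a consistency check, the intervals and p-values in Table 1 can be recomputed from the estimates and standard errors. The sketch below assumes Wald-type intervals (estimate ± 1.96 × standard error) and two-sided normal p-values. The report does not state how the software computed these quantities, so this is an assumption, although it agrees with the tabulated values to two decimals.

```python
import math

# (estimate, standard error) pairs from Table 1.
TABLE_1 = {
    "Intercept": (1.2677, 0.8715),
    "Temperature": (-0.0066, 0.0379),
    "Humidity": (-0.0064, 0.0174),
    "Precipitation": (-0.0109, 0.0170),
}

Z_975 = 1.96  # approximate 97.5th percentile of the standard normal


def wald_summary(estimate, std_error):
    """Return (lower, upper, p_value) under the assumed Wald/normal approach."""
    z = estimate / std_error
    lower = estimate - Z_975 * std_error
    upper = estimate + Z_975 * std_error
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return lower, upper, p_value


for name, (est, se) in TABLE_1.items():
    lower, upper, p = wald_summary(est, se)
    print(
        f"{name:<14} CI=({lower:.2f}, {upper:.2f}) p={p:.2f} "
        f"OR CI=({math.exp(lower):.2f}, {math.exp(upper):.2f})"
    )
```

Every predictor interval contains zero on the log odds scale, so the corresponding odds ratio interval contains one, which is consistent with the large p-values discussed above.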