Maximum Entropy applications for modeling spatial data

Maximum Entropy: spatial modeling with imperfect data
Spatial modeling with imperfect data
• Example: species distribution models
• What’s wrong with available data, and why we want to use it anyway
• Comparison of different approaches
• Maximum Entropy approach: Maxent
• Model evaluation: AUC
Building models with imperfect data
• Example: constructing (or reconstructing) species distributions
• Paleoecology, conservation, speciation, invasion
• Often the data are presence-only (Elith et al. 2006):
– Museum records
– Herbaria
– Fossil localities
– Reported sightings
• Sparse data
• Spatial bias
• Temporal bias
• Uncertainty in absence records
Building models with imperfect data
Elith et al. 2006:
• Evaluated methods for modeling species distributions from presence-only data
• Compared 16 methods for 226 species in 6 geographic regions
• Models were built from presence-only data plus climate and environmental layers
• Evaluated against independent presence-absence datasets
Building models with imperfect data
Elith et al. 2006:
• Models that use only occurrences:
– Envelope
• BIOCLIM
– Distance based
• DOMAIN, LIVES
• Models that characterize the background (pseudo-absence):
– GLM, GAM, MARS, GARP, MAXENT, BRT, GDM, MARSCOMM
• Some models were implemented in several ways
Building models with imperfect data
Elith et al. 2006:
[Figures from Elith et al. 2006]
Maxent: a maximum entropy approach
• Each occurrence is a latitude-longitude pair giving the location of an observation or collection
• The layers that inform the model cover the same geographic area and are in raster format
• The model approximates the realized niche of the species
• It is assumed that the realized niche and the fundamental niche of the species coincide
• Sampling over a larger geographic area (and thus including more of the variation in environmental conditions encountered by the species) may increase the fraction of the fundamental niche represented by the occurrences
[Figure: fundamental niche and realized niche]
Maxent methods
• The approximation of an unknown probability distribution should satisfy any known constraints and, subject to those constraints, should have maximum entropy (Jaynes, 1957)
• Maximum entropy is an epistemic approach to Bayes’ rule
• The Monkey Example:
– A team of monkeys is employed to create images by throwing balls at a grid of bins
– Every so often the grid is removed and replaced by a new one
– Eventually the monkeys will create multiple copies of every possible arrangement
Maxent methods
• The Monkey Example (cont.):
– Given some evidence about the true grid, some of the monkeys’ grids can be ruled out
– Those left constitute the feasible set, and the arrangement that appears most often is a reasonable choice
– Assuming the monkeys are not biased, this choice is consistent with the data but noncommittal about information we do not have
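The monkey argument can be made concrete by counting. A grid with bin counts n_1, ..., n_M can be produced by N!/(n_1! ... n_M!) different sequences of throws, so among the grids that survive the evidence, the one the monkeys produce most often is the one with the largest count; as N grows, (1/N) times the log of that count approaches the entropy of the proportions n_i/N. The Python sketch below uses a hypothetical 12-ball, 3-bin grid and a made-up piece of evidence (the mean bin index is 0.75), enumerates the feasible set, and checks that the most frequent grid is also the maximum entropy one.

```python
from itertools import product
from math import factorial, log, prod

N_BALLS = 12          # hypothetical: balls thrown to make one grid
N_BINS = 3            # hypothetical: bins (pixels) in the grid
TARGET_INDEX_SUM = 9  # hypothetical evidence: mean bin index must be 9 / 12 = 0.75

def multiplicity(counts):
    """Number of distinct throw sequences giving these bin counts: N! / prod(n_i!)."""
    return factorial(sum(counts)) // prod(factorial(c) for c in counts)

def entropy(counts):
    """Shannon entropy (nats) of the proportions n_i / N."""
    total = sum(counts)
    return -sum(c / total * log(c / total) for c in counts if c > 0)

def consistent_with_evidence(counts):
    return sum(i * c for i, c in enumerate(counts)) == TARGET_INDEX_SUM

# Every possible grid (count vector), then the feasible set that survives the evidence.
all_grids = [c for c in product(range(N_BALLS + 1), repeat=N_BINS) if sum(c) == N_BALLS]
feasible = [c for c in all_grids if consistent_with_evidence(c)]

most_frequent = max(feasible, key=multiplicity)  # what the monkeys produce most often
max_entropy = max(feasible, key=entropy)         # what the maximum entropy rule picks
print(most_frequent, max_entropy)                # prints (6, 3, 3) (6, 3, 3)
```

With larger N the feasible grids multiply, but the most frequent one keeps tracking the maximum entropy proportions.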
Maxent methods
• π is the unknown probability distribution over a finite set X (the set of pixels or points in the study area)
• The distribution assigns a non-negative probability π(x) to each point x
• These probabilities sum to 1
• The best approximation of π is the probability distribution π̂ that satisfies the constraints and has maximum entropy
• The entropy of π̂ is:
$$H(\hat{\pi}) = -\sum_{x \in X} \hat{\pi}(x) \ln \hat{\pi}(x)$$
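The entropy H(π̂) = −Σ_x π̂(x) ln π̂(x) can be computed directly for any candidate distribution over the pixels. This minimal Python sketch (NumPy assumed; the four-pixel distributions are illustrative) shows that, with no constraints at all, the uniform distribution attains the maximum ln|X|, so a maximum entropy model departs from uniform only as far as the constraints require.

```python
import numpy as np

def entropy(p):
    """H(p) = -sum_x p(x) ln p(x) in nats; pixels with p(x) = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    if np.any(p < 0) or not np.isclose(p.sum(), 1.0):
        raise ValueError("p must be a probability distribution over the pixels")
    nonzero = p[p > 0]
    return float(-(nonzero * np.log(nonzero)).sum())

n_pixels = 4
uniform = np.full(n_pixels, 1.0 / n_pixels)
peaked = np.array([0.7, 0.1, 0.1, 0.1])
print(entropy(uniform), np.log(n_pixels))  # equal: uniform attains ln |X|
print(entropy(peaked))                     # smaller: this distribution is more committal
```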
Maxent methods
• Constraints on π̂ for the layers informing the model:
– Linear features: the expected value of each continuous variable under π̂ should be close to its observed value (its mean at the occurrence localities)
– Quadratic features: the variance of each continuous variable should be close to its observed value
– Product features: the covariance of two continuous variables should be close to its observed value
– Threshold features: the probability under π̂ that a continuous variable exceeds a threshold should be close to the observed proportion of occurrences above that threshold
– Binary features: the probability under π̂ of each category of a categorical variable should be close to the observed proportion of occurrences in that category
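Each feature type above is a transformation of the environmental layers, and each constraint targets the mean of that feature over the occurrence localities. The Python sketch below (NumPy assumed) builds such a feature matrix from hypothetical flattened layers (temperature, precipitation and a three-class land-cover map), a hypothetical temperature threshold of 25, and a random set of occurrence pixels; the layer names, values and helper function are illustrative, not the Maxent program's internals.

```python
import numpy as np

def make_features(temp, precip, landcover, temp_threshold, categories):
    """Return feature names and an (n_pixels, n_features) matrix of Maxent-style features.

    The variables, threshold and category codes are hypothetical stand-ins for real layers.
    """
    cols = {
        "temp": temp,                          # linear
        "precip": precip,                      # linear
        "temp^2": temp ** 2,                   # quadratic: with the linear one, pins the variance
        "precip^2": precip ** 2,
        "temp*precip": temp * precip,          # product: with the linears, pins the covariance
        f"temp>{temp_threshold}": (temp > temp_threshold).astype(float),  # threshold
    }
    for k in categories:                       # binary: one indicator per category
        cols[f"landcover=={k}"] = (landcover == k).astype(float)
    names = list(cols)
    return names, np.column_stack([cols[n] for n in names])

rng = np.random.default_rng(0)
n_pixels = 1000
temp = rng.normal(20.0, 5.0, n_pixels)         # hypothetical raster values, flattened
precip = rng.gamma(2.0, 500.0, n_pixels)
landcover = rng.integers(0, 3, n_pixels)
names, F = make_features(temp, precip, landcover, temp_threshold=25.0, categories=[0, 1, 2])

occ = rng.choice(n_pixels, size=50, replace=False)  # hypothetical occurrence pixels
observed = F[occ].mean(axis=0)  # the observed values the constraints ask E_pi_hat[f] to approach
for name, value in zip(names, observed):
    print(f"{name:>14s}  {value:10.3f}")
```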
Maxent methods
• The regularization parameter β_j governs how closely constraint j must match its observed value (without regularization they must be equal)
• The program allows a user-specified proportion of occurrence localities to be reserved from model training for model testing
• Absences can be randomly selected (pseudo-absences, for presence-only data) or specified by the user (if presence-absence data are available)
• The model runs either for a set number of iterations or until the gain from each iteration falls below a set threshold
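Under these constraints the maximum entropy distribution takes the Gibbs (log-linear) form q(x) = exp(λ·f(x))/Z, and finding it is equivalent to maximizing the log likelihood of the occurrences with an L1 penalty Σ β_j|λ_j| (Phillips et al. 2006). The Python sketch below is a stand-in under stated assumptions, not the Maxent program's own solver: it uses proximal gradient descent, takes gain to be the mean log probability of the occurrences relative to a uniform distribution, stops after `max_iter` steps or when the gain changes by less than `tol`, and holds out a share of occurrences for testing. The data, the step size `lr` and the function name `fit_maxent` are hypothetical.

```python
import numpy as np

def fit_maxent(F, occ, beta=0.0, lr=1.0, max_iter=5000, tol=1e-10):
    """Fit q(x) = exp(F[x] @ lam) / Z over all pixels by L1-regularized maximum likelihood.

    Minimizes -lam @ mu_obs + ln Z(lam) + sum_j beta_j |lam_j|; at the optimum
    |E_q[f_j] - mu_obs_j| <= beta_j (equality of means when beta = 0).
    Proximal gradient descent; lr = 1.0 assumes a few features scaled to [0, 1].
    """
    F = np.asarray(F, dtype=float)
    n_pixels, n_features = F.shape
    mu_obs = F[occ].mean(axis=0)                   # observed feature means at occurrences
    beta = np.broadcast_to(np.asarray(beta, dtype=float), (n_features,))

    def distribution(lam):
        s = F @ lam
        w = np.exp(s - s.max())                    # subtract max to avoid overflow
        return w / w.sum()

    def gain(q):
        # Mean log probability of the occurrences; a uniform q has gain 0.
        return float(np.mean(np.log(q[occ])) + np.log(n_pixels))

    lam = np.zeros(n_features)
    q = distribution(lam)
    current = previous = gain(q)
    for _ in range(max_iter):
        grad = F.T @ q - mu_obs                    # E_q[f] minus observed mean
        z = lam - lr * grad
        lam = np.sign(z) * np.maximum(np.abs(z) - lr * beta, 0.0)  # soft-threshold (L1)
        q = distribution(lam)
        current = gain(q)
        if abs(current - previous) < tol:          # gain per iteration below threshold
            break
        previous = current
    return lam, q, current

rng = np.random.default_rng(1)
F = rng.random((300, 2))                   # hypothetical features already scaled to [0, 1]
presences = np.flatnonzero(F[:, 0] > 0.5)  # hypothetical species favouring high f_0
rng.shuffle(presences)
test_occ, train_occ = presences[:15], presences[15:75]  # reserve 20% of 75 for testing

lam, q, train_gain = fit_maxent(F, train_occ, beta=0.01)
test_gain = float(np.mean(np.log(q[test_occ])) + np.log(len(F)))
print(lam, train_gain, test_gain)
```

Raising `beta` pulls λ toward zero and the prediction toward uniform; comparing the training gain with the gain on held-out occurrences is one way to spot overfitting.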
Maxent example:
brown-throated three-toed sloth, Bradypus variegatus
[Figure: log contribution of each variable to the raw prediction value]
Other Maxent Applications
(Siva 1990)
Model Evaluation
Area under ROC curve (AUC)
• AUC: area under the ROC (Receiver Operating Characteristic) curve
• Contingency table:

                               Actual value (data)
Predicted outcome (model)      Presence (pos)         Absence (neg)
Presence (pos)                 True Positive (TP)     False Positive (FP)
Absence (neg)                  False Negative (FN)    True Negative (TN)
• Sensitivity: True Positive Rate, TPR = TP / (TP + FN)
• Specificity: True Negative Rate, TNR = TN / (TN + FP)
• The ROC curve plots sensitivity (TPR) against 1 - specificity, the False Positive Rate: FPR = FP / (FP + TN) = 1 - TNR
• An example (P = 100 actual presences, N = 100 actual absences):

                          Actual presence (P = 100)   Actual absence (N = 100)
Predicted presence (91)   TP = 63                     FP = 28
Predicted absence (109)   FN = 37                     TN = 72

• TPR = 63/100 = 0.63
• FPR = 28/100 = 0.28
Predictors A, B, C and C' (each with P = 100 actual presences, N = 100 actual absences):

Predictor   Predicted pos   Predicted neg   TP   FP   FN   TN   TPR    FPR
A           91              109             63   28   37   72   0.63   0.28
B           154             46              77   77   23   23   0.77   0.77
C           112             88              24   88   76   12   0.24   0.88
C'          112             88              88   24   12   76   0.88   0.24

Image from Wikipedia
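Each predictor A, B, C and C' is a single point (FPR, TPR) in ROC space. This short Python sketch recomputes sensitivity, specificity, FPR and accuracy from the TP, FP, FN and TN counts given above, and reports whether each point lies above, on or below the diagonal TPR = FPR, which corresponds to random guessing. The helper name `rates` is illustrative.

```python
# Confusion-matrix counts (TP, FP, FN, TN) for each predictor, from the example above.
predictors = {
    "A": (63, 28, 37, 72),
    "B": (77, 77, 23, 23),
    "C": (24, 88, 76, 12),
    "C'": (88, 24, 12, 76),
}

def rates(tp, fp, fn, tn):
    tpr = tp / (tp + fn)                   # sensitivity
    tnr = tn / (tn + fp)                   # specificity
    fpr = fp / (fp + tn)                   # 1 - specificity
    acc = (tp + tn) / (tp + fp + fn + tn)  # accuracy
    return tpr, tnr, fpr, acc

for name, counts in predictors.items():
    tpr, tnr, fpr, acc = rates(*counts)
    position = "above" if tpr > fpr else "on" if tpr == fpr else "below"
    print(f"{name:>2}: TPR={tpr:.2f} FPR={fpr:.2f} TNR={tnr:.2f} ACC={acc:.2f} ({position} the diagonal)")
```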
[Figure: ROC space, True Positive Rate (0 to 1) against False Positive Rate (0 to 1)]
• AUC > 0.5: higher predictive power than random
• AUC = 0.5: random chance
• AUC < 0.5: worse than random
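A Maxent prediction is a continuous value per point, so each choice of threshold yields one contingency table and one (FPR, TPR) point; sweeping the threshold from high to low traces the ROC curve, and the area under it is the AUC. The Python sketch below (the scores and labels are illustrative) computes the curve and its area by the trapezoidal rule, and checks the result against the equivalent rank interpretation: the probability that a randomly chosen presence scores higher than a randomly chosen absence. With presence-only data, the absences here would be the randomly selected pseudo-absences.

```python
def roc_curve(scores, labels):
    """ROC points (FPR, TPR) from sweeping a threshold down through the scores.

    labels: 1 for presence (pos), 0 for absence or pseudo-absence (neg).
    Tied scores are passed together, so ties give a diagonal segment.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one presence and one absence")
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:  # take all ties at once
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

def auc_rank(scores, labels):
    """Probability that a random presence outscores a random absence (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]  # illustrative model outputs
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(auc(roc_curve(scores, labels)), auc_rank(scores, labels))  # prints 0.75 0.75
```

A model that ranks every presence above every absence reaches AUC = 1; one that assigns the same score everywhere sits on the diagonal at AUC = 0.5.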
References:
Elith, J., Graham, C. H., Anderson, R. P., Dudík, M., Ferrier, S., Guisan, A., Hijmans, R. J., Huettmann, F., Leathwick, J. R., Lehmann, A., Li, J., Lohmann, L. G., Loiselle, B. A., Manion, G., Moritz, C., Nakamura, M., Nakazawa, Y., Overton, J. McC., Peterson, A. T., Phillips, S. J., Richardson, K. S., Scachetti-Pereira, R., Schapire, R. E., Soberón, J., Williams, S., Wisz, M. S. and Zimmermann, N. E. 2006. Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29: 129-151.
Jaynes, E. T. 1957. Information theory and statistical mechanics. Physical Review 106: 620-630.
Lobo, J. M., Jiménez-Valverde, A. and Real, R. 2007. AUC: a misleading measure of the performance of predictive distribution models. Global Ecology and Biogeography.
Phillips, S. J., Dudík, M. and Schapire, R. E. 2004. A maximum entropy approach to species distribution modeling. Proceedings of the 21st International Conference on Machine Learning, Banff, Canada.
Phillips, S. J., Anderson, R. P. and Schapire, R. E. 2006. Maximum entropy modeling of species geographic distributions. Ecological Modelling 190: 231-259.
Siva, D. S. 1990. Bayesian inductive inference, maximum entropy & neutron scattering. Los Alamos Science, Summer: 180-206.
Maxent program website (it’s free): http://www.cs.princeton.edu/~schapire/maxent/