Document

advertisement

Some dates

check out “outline version 3.0.pdf”

• Return reviews to reviewees (use track changes and

“comments) by 20 Sep – send also to Lee Hsiang

• Revised version to Lee Hsiang by 27 Sep

• Writing assignment (Sepkoski and 10 commandments: ASAP )

• All R assignments in one annotated R file by 20 Sep noon; results (including plots), description of data you downloaded and interpretation in a separate pdf file.

All models are wrong but some are useful.

-Box 1976 J. Am. Stat. Assoc. 71:791-799

Introduction to likelihood, AIC and model selection

Models

• Model

– an idea (formalized or not) about how the world (or a part of it) works.

– may be conceptual, verbal or mathematical.

– competing hypotheses can be represented as different models.

– mathematical models can be developed to project the consequences of hypotheses

Parameter – a true characteristic of the “population” of interest

Estimator – an equation or process used to produce a parameter estimate from observed data

4

Illustrating likelihood (Steve Wang’s e.g.)

Clothing

(Observed)

Jacket

Weather

(unknown)

Cold

0.8

Hot

0.1

Tshirts

Total

0.2

1.0

0.9

1.0

Rows: statement about an unknown and unobserved state

Likelihood (weather|clothing)

Columns: state of observed data

Probability(clothing|weather)

Total

0.9

1.1

5

Illustration of MLE

• (Steve Wang’s example from 2010)

• Sampled 297 gastropod shells from mid Palaeocene Alabama. 138 had drill holes.

• Of all gastropods that are in that locality (including those not sampled or not preserved), what proportion drilled? Let this = p

• Let n = sampled gastropods, let x = drilled of those sampled

Illustration of MLE

• (Steve Wang’s example from 2010)

• Sampled 297 gastropod shells from mid Palaeocene Alabama. 138 had drill holes.

• Of all gastropods that are in that locality (including those not sampled or not preserved), what proportion drilled? Let this = p

• Let n = sampled gastropods, let x = drilled of those sampled

• L(p)=P(x|p)

The likelihood of the true parameter is the probability of the observed data (x) given the parameter

Binomial distribution

• A gastropod sampled can be of two states: drilled or not drilled.

• Hence this data can be modeled using a binomial distribution

• Repeat n trials under the same conditions (= picking up gastropods)

• Each trial has two outcomes, drilled or NOT drilled

• The probability of being drilled is p and NOT drilled is 1-p

If A is an event, then

𝑃 𝐴 = 1 − 𝑃 𝐴

A

Not A

Binomial distribution

• A gastropod sampled can be of two states: drilled or not drilled.

• Hence this data can be modeled using a binomial distribution

• Repeat n trials under the same conditions (= picking up gastropods)

• Each trial has two outcomes, drilled or NOT drilled

• The probability of being drilled is p and NOT drilled is 1-p

• The trials are independent (because one gastropod is drilled has nothing to do with another gastropod being drilled…. Think about this one)

• 𝑃 π‘₯ 𝑝 = 𝑛!

π‘₯! 𝑛−π‘₯ !

𝑝 π‘₯ (1 − 𝑝) 𝑛−π‘₯

=

• 𝐿(𝑝) = 𝑛!

π‘₯! 𝑛−π‘₯ !

𝑝 π‘₯ (1 − 𝑝) 𝑛−π‘₯ 𝑛 π‘˜ 𝑝 π‘₯ (1 − 𝑝) 𝑛−π‘₯

Binomial distribution

• 𝑃 π‘₯ 𝑝 = 𝑛!

π‘₯! 𝑛−π‘₯ !

𝑝 π‘₯ (1 − 𝑝) 𝑛−π‘₯ n = 297 (total gastropods)

X = 138 (drilled)

110 120 130 x

140 150 160 170

• 𝐿(𝑝) = 𝑛!

π‘₯! 𝑛−π‘₯ !

𝑝 π‘₯ (1 − 𝑝) 𝑛−π‘₯

0.35

0.40

0.45

p

0.50

0.55

0.60

Find MLE

• Find the value of p that maximizes L(p)

• 𝐿(𝑝) = 𝑛!

π‘₯! 𝑛−π‘₯ !

𝑝 π‘₯ (1 − 𝑝) 𝑛−π‘₯

= 𝑛 π‘₯ 𝑝 π‘₯ (1 − 𝑝) 𝑛−π‘₯

• log 𝐿 𝑝 = log( 𝑛!

π‘₯! 𝑛−π‘₯ !

) + π‘₯π‘™π‘œπ‘” 𝑝 + 𝑛 − π‘₯ log 1 − 𝑝

• Solve for p at which d/dp log L(p) is zero to get x/n

• a trivial example with a closed form estimate, but mostly not true, have to use computer intensive ways to “search” for the maximum

0.35

0.40

0.45

p

0.50

0.55

0.60

Examples of Paleo Maximum

Likelihood uses

• Foote (2003). "Origination and extinction through the Phanerozoic:

A new approach." Journal of Geology 111(2): 125-148.

• Hunt (2007). "The relative importance of directional change, random walks, and stasis in the evolution of fossil lineages."

Proceedings of the National Academy of Sciences of the United

States of America 104(47): 18404-18408.

• Wagner (2000). "Likelihood tests of hypothesized durations: determining and accommodating biasing factors." Paleobiology

26(3): 431-449.

• Liow, Fortelius et al. (2008). "Higher origination and extinction rates in larger mammals." Proceedings of the National Academy of

Sciences of the United States of America 105(16): 6097-6102.

• And many many more

Simple Likelihood Ratio example

• What can we do with a Likelihood?

• Compare two models using likelihood

– by using a likelihood ratio, limited use, like hypothesis testing, but sometimes useful

• 779 ancestor-descendent mammal pairs in NAm (Cenozoic)

• 442 of descendants are larger than their ancestors (56.7%)

• Based in these data, is there a general size increase in mammals in NAm over time or not?

Likelihood Ratio Test (LRT)

• used to evaluate the difference between nested models

• One model is considered nested in another if the first model can be generated by imposing restrictions on the parameters of the second

• Λ =

𝐿(𝑝 π‘œ

)

• 442 of descendants are larger than their ancestors (56.7%)

• Λ =

𝐿(𝑝 π‘œ

)

=

𝐿(.5)

𝐿(.567)

Likelihood Ratio Test (LRT)

• used to evaluate the difference between nested models

• One model is considered nested in another if the first model can be generated by imposing restrictions on the parameters of the second

• Λ =

𝐿(𝑝 π‘œ

)

• 442 of descendants are larger than their ancestors (56.7%)

• Λ =

𝐿(𝑝 π‘œ

)

=

𝐿(.5)

𝐿(.567)

R class exercise, hint the function you need is called choose to write the binomial coefficient 𝑛 π‘₯

.

• Write this equation using R 𝐿(𝑝) = 𝑛!

𝑝 π‘₯ (1 − 𝑝) 𝑛−π‘₯

= 𝑛 𝑝 π‘₯ π‘₯! 𝑛−π‘₯ !

π‘₯

(1 − 𝑝) 𝑛−π‘₯

#class exercise mammal body size example from Wang 2010 a=(choose(779, 442))*(0.5^442)*((1-0.5)^(779-442)) b=(choose(779, 442))*(0.567^442)*((1-0.567)^(779-442)) lambda=a/b

-2*log(lambda)

Test statistic is -2log( Λ )

(follows a chi sq distribution so we can use that)

H testing, L ratios, Model comparison

• Hypothesis testing compares 2 models, one null and one alternative (L ratio test as well).

• Sometimes the null makes sense, but often it does not, hence the only thing we are finding out with a significance test is whether we have enough data to show that null is false.

• Why is previous example ok?

Valid inference

Fischer 1922: valid inference=

1. model specification

2. estimation of model parameters

3. estimation of precision

• In much of science, neither the model parameters nor the model is known!

• Hence problem with step 1. --- missing model formulation and selection!

• We need to be able to formulate models and select among them, but how?

Kullback-Leibler Distance Illustration (basis for AIC)

KL dist 1

Model 1

TRUTH

KL dist 2

Model 2 KL dist 4

KL dist 3

Model 3

Model 4

In practice

• Make a set of candidate models: think long and hard about

“realistic” and non trivial-models (~4-20)

• Write them down and identify (or make) a global model

• A global model is one where all variables thought to be important are included

• If global model fits data adequately, the selected model that is more parsimonious will also fit the data (this is an empirical result)

• Note: if a really good model is not in the candidate set, we’re screwed*

• Compare the models (model comparison)

• There are many information criteria “out there”

• There we argue for the AIC (Akaike Information Criteria)

Akaike Information Criteria

• AIC is based on the Kullback-Leibler distance (solid theoretical basis)

• Does not assume that “truth” is in the candidate set of models.

• Measures relative distance to truth

• AIC chooses the candidate model with the smallest expected K-L distance.

• Balance between narrowing distance to truth and precision of estimates

• Can be calculated using either least squares regression

(caveats) and better, likelihood , 𝐿 θ π‘‘π‘Žπ‘‘π‘Ž, π‘šπ‘œπ‘‘π‘’π‘™

• 𝐴𝐼𝐢 = 2𝐾 − 2log(𝐿)

𝐴𝐼𝐢 = −2log(𝐿) + 2𝐾

K

Akaike Information Criteria

• Balance model fit with estimator precision

• 𝐴𝐼𝐢 = 2𝐾 − 2log(𝐿)

• Models with small AIC values are preferred

• However it is differences between AICs for different models that is crucial

26

Kullback-Leibler Distance Illustration (basis for AIC)

KL dist 1

Model 1

TRUTH

KL dist 2

Model 2 KL dist 4

KL dist 3

Model 3

Model 4

ΔAIC

Model 1

Model 2

Model 3

Model 4

ΔAIC

• 

AIC = AIC –min(AIC)

• 

AIC measures the distance between the AIC for the model being considered and the AIC of the model with the lowest

AIC in the candidate set of models

• Information lost when given model is used instead of the top model

• Absolute AIC values have little meaning, they reflect sample size mostly

• A rule of thumb is



AICc

ο‚£

2 substantial support and should be used for making inferences.



AICc of about 4 to 7 have considerably less support



AICc > 10 have essentially no support

29

AIC model weights

• Model weights can be calculated that are ad-hoc measures of support for each model in the candidate set

• 𝐴𝐼𝐢 𝑀𝑑𝑗 = exp(− 𝑖=π‘š 𝑖=1

Δ𝐴𝐼𝐢 𝑗 𝑒π‘₯𝑝(−

)

2

Δ𝐴𝐼𝐢

2 π‘š )

• Uses:

– For getting model averaged estimates

– -evaluate the importance of specific factors (by summing across models with the overlapping variables)

30

Adjusted AICs

• Adjusted AIC for amount of data versus number of parameters trying to estimate (small sample adjustment)

2𝐾(𝐾 + 1)

𝐴𝐼𝐢 𝑐

= 𝐴𝐼𝐢 + 𝑛 − 𝐾 − 1

• Adjusted AIC for overdispersion/poor model fit (coun data)

𝑄𝐴𝐼𝐢 = −[2log(𝐿( θ )/ )]+2K

• Both

𝑄𝐴𝐼𝐢 𝑐

= 𝑄𝐴𝐼𝐢 +

2𝐾(𝐾+1) 𝑛−𝐾−1

31

R demo

References

• Readings:

– Gary White’s lecture notes on Model selection

– Johnson and Omland TREE 2004 (an easy to read over view on model selection written for biologists, but should be understandable to paleobiologists)

• Burnham and Anderson 2002 Model Selection and

Multimodel Inference: A practical Information-Theoretic

Approach

– This is “the” book on model selection. The material for this lecture came largely out of Chapters 1-3

33

Continued R assignment

• Using the same data you downloaded, tabulate the number of occurrences per unit time (of your choice) and plot that (either for the entire taxonomic group, or split up into genera or species, depending on how much data you happen to have, using an R script

• Fit models of occurrences ~ time as you see fit after plotting out the data.

We showed some non linear models in class, but you might like some linear models as well, check out http://stat.ethz.ch/R-manual/R-devel/library/stats/html/lm.html

• Write a summary of your observations

• This is more a simple exercise in “semi-canned” model fitting just so you feel comfortable with models and comparing them using AIC. All the maximum likelihood is done for you in R, but hopefully you understand the principle from the short lecture. There are many model fitting and optimization “machines” in R.

• In preparation for next lecture, download: (only runs in windows) http://warnercnr.colostate.edu/~gwhite/mark/mark.htm#Introduction

weblinks

http://www.r-bloggers.com/fitting-a-modelby-maximum-likelihood/

And watch another TED talk http://www.ted.com/talks/peter_donnelly_sho ws_how_stats_fool_juries.html

Download