
Ordered probit models


Ordered Probit

• Many discrete outcomes are responses to questions that have a natural ordering but no quantitative interpretation

• Examples:

– Self reported health status

• (excellent, very good, good, fair, poor)

– Do you agree with the following statement

• Strongly agree, agree, disagree, strongly disagree


• Can use the same type of model as in the previous section to analyze these outcomes

• Another ‘latent variable’ model

• Key to the model: there is a monotonic ordering of the qualitative responses


Self reported health status

• Excellent, very good, good, fair, poor

• Coded as 1, 2, 3, 4, 5 on the National Health Interview Survey

• We will code as 5,4,3,2,1 (easier to think of this way)

• Asked on every major health survey

• Important predictor of health outcomes, e.g. mortality

• Key question: what predicts health status?

• Important to note: the numbers 1-5 mean nothing in terms of their value; they only order the responses from lowest to highest

• The example below is easily adapted to include categorical variables with any number of outcomes

Model

• $y_i^*$ = latent index of reported health

• The latent index measures your own scale of health. As $y_i^*$ crosses successive threshold values, you move from reporting poor to fair, then good, then very good, then excellent health

• $y_i$ = (1, 2, 3, 4, 5) for (poor, fair, good, very good, excellent)

• Interval decision rule

• $y_i = 1$ if $y_i^* \le u_1$

• $y_i = 2$ if $u_1 < y_i^* \le u_2$

• $y_i = 3$ if $u_2 < y_i^* \le u_3$

• $y_i = 4$ if $u_3 < y_i^* \le u_4$

• $y_i = 5$ if $y_i^* > u_4$
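The interval decision rule can be sketched in a few lines of Python. This is an illustration only, not part of the course's Stata programs: the threshold values and the function name `category_from_latent` are made up for the example, whereas in the model the thresholds are estimated.

```python
from bisect import bisect_left

# Hypothetical thresholds u1 < u2 < u3 < u4 (illustrative values only).
CUTS = [-1.0, 0.0, 1.0, 2.0]


def category_from_latent(y_star, cuts=CUTS):
    """Map a latent index y* to a category 1..5 using the interval rule.

    y = 1 if y* <= u1, y = j + 1 if u_j < y* <= u_{j+1}, y = 5 if y* > u4.
    """
    # bisect_left counts the thresholds strictly below y*, so a value that
    # sits exactly on a threshold is assigned to the lower category.
    return bisect_left(cuts, y_star) + 1


if __name__ == "__main__":
    for y_star in (-2.0, -1.0, -0.5, 0.5, 1.5, 2.0, 3.0):
        print(y_star, category_from_latent(y_star))
```

Using `bisect_left` rather than `bisect_right` is what makes the upper bound of each interval inclusive, matching the "≤" in the rule.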


• As with logit and probit models, we will assume $y_i^*$ is a function of observed and unobserved variables

• $y_i^* = \beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k + \varepsilon_i$

• $y_i^* = x_i\beta + \varepsilon_i$

• The threshold values ($u_1, u_2, u_3, u_4$) are unknown. We do not know the value of the index necessary to push you from very good to excellent.

• In theory, the threshold values are different for everyone

• The computer will estimate not only the β’s but also the thresholds, which are averages across people

• As with probit and logit, the model will be determined by the assumed distribution of ε

• In practice, most people pick normal, generating an ‘ordered probit’ (I have no idea why)

• We will generate the math for the probit version

Probabilities

• Let's do the two extreme outcomes, $\Pr(y_i = 1)$ and $\Pr(y_i = 5)$, first

• $\Pr(y_i = 1)$

• = $\Pr(y_i^* \le u_1)$

• = $\Pr(x_i\beta + \varepsilon_i \le u_1)$

• = $\Pr(\varepsilon_i \le u_1 - x_i\beta)$

• = $\Phi[u_1 - x_i\beta] = 1 - \Phi[x_i\beta - u_1]$

• $\Pr(y_i = 5)$

• = $\Pr(y_i^* > u_4)$

• = $\Pr(x_i\beta + \varepsilon_i > u_4)$

• = $\Pr(\varepsilon_i > u_4 - x_i\beta)$

• = $1 - \Phi[u_4 - x_i\beta] = \Phi[x_i\beta - u_4]$

Sample one for y=3

• $\Pr(y_i = 3) = \Pr(u_2 < y_i^* \le u_3)$

= $\Pr(y_i^* \le u_3) - \Pr(y_i^* \le u_2)$

= $\Pr(x_i\beta + \varepsilon_i \le u_3) - \Pr(x_i\beta + \varepsilon_i \le u_2)$

= $\Pr(\varepsilon_i \le u_3 - x_i\beta) - \Pr(\varepsilon_i \le u_2 - x_i\beta)$

= $\Phi[u_3 - x_i\beta] - \Phi[u_2 - x_i\beta]$

= $1 - \Phi[x_i\beta - u_3] - 1 + \Phi[x_i\beta - u_2]$

= $\Phi[x_i\beta - u_2] - \Phi[x_i\beta - u_3]$

Summary

• $\Pr(y_i = 1) = 1 - \Phi[x_i\beta - u_1]$

• $\Pr(y_i = 2) = \Phi[x_i\beta - u_1] - \Phi[x_i\beta - u_2]$

• $\Pr(y_i = 3) = \Phi[x_i\beta - u_2] - \Phi[x_i\beta - u_3]$

• $\Pr(y_i = 4) = \Phi[x_i\beta - u_3] - \Phi[x_i\beta - u_4]$

• $\Pr(y_i = 5) = \Phi[x_i\beta - u_4]$
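The five formulas in the summary share one pattern: each probability is the difference between two adjacent values of Φ[xβ − u], with 1 and 0 at the ends. A minimal Python sketch of that pattern follows. The function names, the index value 0.3, and the thresholds are hypothetical choices for illustration, not estimates from the lecture.

```python
from math import erf, sqrt


def norm_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def ordered_probit_probs(xb, cuts):
    """Return [Pr(y=1), ..., Pr(y=J)] for index xb and ordered thresholds."""
    # Phi[xb - u_j] is Pr(y > j); pad with Pr(y > 0) = 1 and Pr(y > J) = 0.
    above = [1.0] + [norm_cdf(xb - u) for u in cuts] + [0.0]
    return [above[j] - above[j + 1] for j in range(len(cuts) + 1)]


if __name__ == "__main__":
    # Hypothetical index value and thresholds, for illustration only.
    probs = ordered_probit_probs(xb=0.3, cuts=[-1.0, 0.0, 1.0, 2.0])
    for k, p in enumerate(probs, start=1):
        print(f"Pr(y={k}) = {p:.4f}")
    print("sum =", sum(probs))
```

Because the terms telescope, the five probabilities always sum to one, which is a quick sanity check on any hand calculation.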


Likelihood function

• There are 5 possible choices for each person

• Only 1 is observed

• $L = \sum_i \ln[\Pr(y_i = k)]$, where $k$ is the outcome observed for person $i$
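As a rough illustration of the log likelihood, the Python sketch below sums the log probability of each person's observed outcome. The helper names, the four observations, and the thresholds are all hypothetical. An estimation routine would search over β and the thresholds to maximize this quantity, which is not attempted here.

```python
from math import erf, log, sqrt


def norm_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def outcome_prob(k, xb, cuts):
    """Pr(y = k) for k = 1..len(cuts)+1 in the ordered probit."""
    upper = 1.0 if k == 1 else norm_cdf(xb - cuts[k - 2])
    lower = 0.0 if k == len(cuts) + 1 else norm_cdf(xb - cuts[k - 1])
    return upper - lower


def log_likelihood(y, xb, cuts):
    """Sum over people of ln Pr(y_i = observed outcome)."""
    return sum(log(outcome_prob(k, x, cuts)) for k, x in zip(y, xb))


if __name__ == "__main__":
    # Hypothetical data: observed outcomes and index values x_i * beta.
    y = [1, 3, 5, 4]
    xb = [-0.5, 0.4, 2.5, 1.2]
    cuts = [-1.0, 0.0, 1.0, 2.0]
    print(log_likelihood(y, xb, cuts))
```

Only the probability of the outcome each person actually reported enters the sum; the other four probabilities for that person do not appear.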


Programming example

• Cancer control supplement to the 1994 National Health Interview Survey

• Question: what observed characteristics predict self reported health (1-5 scale)

• 1=poor, 5=excellent

• Key covariates: income, education, age, current and former smoking status

• Programs

• sr_health_status.do, .dta, .log


• desc;

variable name   storage type   display format   variable label
male            byte           %9.0g            =1 if male
age             byte           %9.0g            age in years
educ            byte           %9.0g            years of education
smoke           byte           %9.0g            current smoker
smoke5          byte           %9.0g            smoked in past 5 years
black           float          %9.0g            =1 if respondent is black
othrace         float          %9.0g            =1 if other race (white is ref)
sr_health       float          %9.0g            1-5 self reported health, 5=excel, 1=poor
famincl         float          %9.0g            log family income

• tab sr_health;

   1-5 self |
   reported |
    health, |
   5=excel, |
     1=poor |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        342        2.65        2.65
          2 |        991        7.68       10.33
          3 |      3,068       23.78       34.12
          4 |      3,855       29.88       64.00
          5 |      4,644       36.00      100.00
------------+-----------------------------------
      Total |     12,900      100.00

In STATA

• oprobit sr_health male age educ famincl black othrace smoke smoke5;


Ordered probit estimates                          Number of obs   =      12900
                                                  LR chi2(8)      =    2379.61
                                                  Prob > chi2     =     0.0000
Log likelihood = -16401.987                       Pseudo R2       =     0.0676

------------------------------------------------------------------------------
   sr_health |      Coef.   Std. Err.       z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .1281241   .0195747     6.55   0.000     .0897583    .1664899
         age |  -.0202308   .0008499   -23.80   0.000    -.0218966    -.018565
        educ |   .0827086   .0038547    21.46   0.000     .0751535    .0902637
     famincl |   .2398957   .0112206    21.38   0.000     .2179037    .2618878
       black |   -.221508    .029528    -7.50   0.000    -.2793818   -.1636341
     othrace |  -.2425083   .0480047    -5.05   0.000    -.3365958   -.1484208
       smoke |  -.2086096   .0219779    -9.49   0.000    -.2516855   -.1655337
      smoke5 |  -.1529619   .0357995    -4.27   0.000    -.2231277   -.0827961
-------------+----------------------------------------------------------------
       _cut1 |   .4858634    .113179          (Ancillary parameters)
       _cut2 |   1.269036     .11282
       _cut3 |   2.247251   .1138171
       _cut4 |   3.094606   .1145781
------------------------------------------------------------------------------

Interpret coefficients

• Marginal effects/changes in probabilities are now a function of 2 things

– Point of expansion (x’s)

– Frame of reference for outcome (y)

• STATA

– Picks mean values for x’s

– You pick the value of y


Continuous x’s

• Consider y=5

• $d\Pr(y_i = 5)/dx_i = d\Phi[x_i\beta - u_4]/dx_i = \beta\phi[x_i\beta - u_4]$

• Consider y=3

• $d\Pr(y_i = 3)/dx_i = \beta\phi[x_i\beta - u_2] - \beta\phi[x_i\beta - u_3]$
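One way to check a marginal-effect formula like these is to compare it with a finite-difference derivative of the probability itself. The Python sketch below does that for every outcome, under assumed values: the coefficient 0.5, the index 0.3, the thresholds, and the function names are all hypothetical.

```python
from math import erf, exp, pi, sqrt


def norm_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density, phi(z)."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def outcome_prob(k, xb, cuts):
    """Pr(y = k) for k = 1..len(cuts)+1."""
    upper = 1.0 if k == 1 else norm_cdf(xb - cuts[k - 2])
    lower = 0.0 if k == len(cuts) + 1 else norm_cdf(xb - cuts[k - 1])
    return upper - lower


def marginal_effect(k, xb, cuts, beta):
    """d Pr(y=k)/dx = beta * (phi[xb - u_{k-1}] - phi[xb - u_k])."""
    upper = 0.0 if k == 1 else norm_pdf(xb - cuts[k - 2])
    lower = 0.0 if k == len(cuts) + 1 else norm_pdf(xb - cuts[k - 1])
    return beta * (upper - lower)


def numeric_effect(k, xb, cuts, beta, h=1e-5):
    """Central difference: moving x by h moves the index by beta * h."""
    up = outcome_prob(k, xb + beta * h, cuts)
    down = outcome_prob(k, xb - beta * h, cuts)
    return (up - down) / (2.0 * h)


if __name__ == "__main__":
    cuts, xb, beta = [-1.0, 0.0, 1.0, 2.0], 0.3, 0.5  # hypothetical values
    for k in range(1, 6):
        print(k, marginal_effect(k, xb, cuts, beta), numeric_effect(k, xb, cuts, beta))
```

With a positive β the effect is negative for y=1 and positive for y=5, while the sign for the middle outcomes depends on where the index sits relative to the thresholds. The effects across all five outcomes sum to zero because the probabilities sum to one.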


Discrete X’s

• $x_i\beta = \beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k$

– $x_{2i}$ is yes or no (1 or 0)

• $\Delta\Pr(y_i = 5) = \Phi[\beta_0 + x_{1i}\beta_1 + \beta_2 + x_{3i}\beta_3 + \dots + x_{ki}\beta_k - u_4] - \Phi[\beta_0 + x_{1i}\beta_1 + x_{3i}\beta_3 + \dots + x_{ki}\beta_k - u_4]$

• Change in the probability between $x_{2i} = 1$ and $x_{2i} = 0$

Ask for marginal effects

• mfx compute, predict(outcome(5));


• mfx compute, predict(outcome(5));

Marginal effects after oprobit
      y  = Pr(sr_health==5) (predict, outcome(5))
         =  .34103717
------------------------------------------------------------------------------
variable |      dy/dx    Std. Err.       z    P>|z|  [    95% C.I.   ]       X
---------+--------------------------------------------------------------------
    male*|   .0471251      .00722     6.53   0.000    .03298    .06127  .438062
     age |  -.0074214      .00031   -23.77   0.000  -.008033   -.00681  39.8412
    educ |   .0303405      .00142    21.42   0.000   .027565   .033116  13.2402
 famincl |   .0880025      .00412    21.37   0.000    .07993   .096075  10.2131
   black*|  -.0781411      .00996    -7.84   0.000  -.097665  -.058617  .124264
 othrace*|  -.0843227      .01567    -5.38   0.000  -.115043  -.053602   .04124
   smoke*|  -.0749785      .00773    -9.71   0.000   -.09012  -.059837  .289147
  smoke5*|  -.0545062      .01235    -4.41   0.000  -.078719  -.030294  .081395
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
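The entries in this table can be approximately reproduced by hand from the oprobit coefficients, `_cut4`, and the sample means in the X column. The Python sketch below (an illustration, not the course's Stata code) evaluates Pr(y=5) at the means, the continuous marginal effect for age, and the 0-to-1 discrete change for male. The dictionary and function names are hypothetical; the numbers are copied from the output shown in these slides.

```python
from math import erf, exp, pi, sqrt

# Coefficients and _cut4 from the oprobit output; means from the mfx X column.
COEF = {"male": .1281241, "age": -.0202308, "educ": .0827086,
        "famincl": .2398957, "black": -.221508, "othrace": -.2425083,
        "smoke": -.2086096, "smoke5": -.1529619}
MEAN = {"male": .438062, "age": 39.8412, "educ": 13.2402,
        "famincl": 10.2131, "black": .124264, "othrace": .04124,
        "smoke": .289147, "smoke5": .081395}
CUT4 = 3.094606


def norm_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density, phi(z)."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def index(x):
    """x * beta (oprobit reports no constant; it is absorbed in the cuts)."""
    return sum(COEF[name] * x[name] for name in COEF)


def prob_excellent(x):
    """Pr(y = 5) = Phi[x * beta - u4]."""
    return norm_cdf(index(x) - CUT4)


def age_marginal_effect(x):
    """Continuous case: beta_age * phi[x * beta - u4]."""
    return COEF["age"] * norm_pdf(index(x) - CUT4)


def male_discrete_change(x):
    """Dummy case: Pr(y=5 | male=1) - Pr(y=5 | male=0), other x's fixed."""
    return prob_excellent({**x, "male": 1.0}) - prob_excellent({**x, "male": 0.0})


if __name__ == "__main__":
    print("Pr(y=5) at means:", prob_excellent(MEAN))
    print("age dy/dx       :", age_marginal_effect(MEAN))
    print("male 0 -> 1     :", male_discrete_change(MEAN))
```

The three printed values should land close to the reported .341, -.0074 and .0471. Small differences are expected because the coefficients and means are rounded in the printed output.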


Interpret the results

• Males are 4.7 percentage points more likely to report excellent health

• Each year of age decreases chance of reporting excellent by 0.7 percentage points

• Current smokers are 7.5 percentage points less likely to report excellent health


Minor notes about estimation

• Wald tests/-2 log likelihood tests are done the exact same way as in PROBIT and LOGIT

• Use PRCHANGE to calculate marginal effects for a specific person

• prchange, x(age=40 black=0 othrace=0 smoke=0 smoke5=0 educ=16);

– When a variable is NOT specified (famincl), STATA takes the sample mean.

• PRCHANGE will produce results for all outcomes

male
          Avg|Chg|           1           2           3           4           5
    0->1  .0203868   -.0020257  -.00886671  -.02677558  -.01329902   .05096698

age
           Avg|Chg|           1           2           3           4
Min->Max  .13358317    .0184785   .06797072   .17686112   .07064757
   -+1/2  .00321942   .00032518   .00141642   .00424452   .00206241
  -+sd/2  .03728014   .00382077   .01648743   .04910323    .0237889
MargEfct  .00321947   .00032515   .00141639   .00424462   .00206252
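Two features of the PRCHANGE output can be checked with simple arithmetic on the male row: the changes across the five outcomes net to roughly zero (the probabilities must still sum to one), and Avg|Chg| is the mean of their absolute values. A short Python sketch, with hypothetical variable and function names and the numbers copied from the male row above:

```python
# Changes in Pr(y=k) for male 0 -> 1, copied from the prchange output.
MALE_0_TO_1 = {1: -.0020257, 2: -.00886671, 3: -.02677558,
               4: -.01329902, 5: .05096698}
REPORTED_AVG_ABS_CHANGE = .0203868


def net_change(changes):
    """Sum of the changes across outcomes (should be about zero)."""
    return sum(changes.values())


def avg_abs_change(changes):
    """Mean absolute change across outcomes, the Avg|Chg| column."""
    return sum(abs(c) for c in changes.values()) / len(changes)


if __name__ == "__main__":
    print("net change    :", net_change(MALE_0_TO_1))
    print("avg |change|  :", avg_abs_change(MALE_0_TO_1))
    print("reported value:", REPORTED_AVG_ABS_CHANGE)
```

For this person, being male shifts probability out of outcomes 1 through 4 and into outcome 5, so the single positive entry offsets the four negative ones.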
