Analyzing Proportions

advertisement
A single proportion
Multiple proportions
Analyzing Proportions
Maarten L. Buis
Institut für Soziologie
Eberhard Karls Universität Tübingen
www.maartenbuis.nl
Maarten L. Buis
Analyzing Proportions
A single proportion
Multiple proportions
The problem
I
A proportion is bounded between 0 and 1, this means that:
I
I
I
the effect of explanatory variables tends to be non-linear,
and
the variance tends to decrease when the mean gets closer
to one of the boundaries.
This makes linear regression unattractive.
Maarten L. Buis
Analyzing Proportions
A single proportion
Multiple proportions
Solutions
I
model the distribution of the dependent variable(s) with
either
I
I
I
I
a beta distribution, betafit
a zero/one inflated beta distribution, zoib
a Dirichlet distribution, dirifit
model how the mean proportion relates to explanatory
variables using
I
I
a fractional logit, glm
a fractional multinomial logit, fmlogit
Maarten L. Buis
Analyzing Proportions
A single proportion
Multiple proportions
the beta distribution
I
A flexible distribution bounded between 0 and 1 (excluding
0 and 1)
I
Two parameters: the mean and a scale parameter.
I
The variance is a function of the mean and the scale
parameter: the variance is largest when the mean is 0.5.
Maarten L. Buis
Analyzing Proportions
A single proportion
Multiple proportions
Some pictures
µ=.5, φ= 4
σ=.22
6
density
4
2
0
0
.2
.4
.6
y
Maarten L. Buis
Analyzing Proportions
.8
1
A single proportion
Multiple proportions
Some pictures
µ=.5, φ= 6
σ=.19
6
density
4
2
0
0
.2
.4
.6
y
Maarten L. Buis
Analyzing Proportions
.8
1
A single proportion
Multiple proportions
Some pictures
µ=.5, φ= 10
σ=.15
6
density
4
2
0
0
.2
.4
.6
y
Maarten L. Buis
Analyzing Proportions
.8
1
A single proportion
Multiple proportions
Some pictures
µ=.5, φ= 20
σ=.11
6
density
4
2
0
0
.2
.4
.6
y
Maarten L. Buis
Analyzing Proportions
.8
1
A single proportion
Multiple proportions
Some pictures
µ=.25, φ= 20
σ=.09
6
density
4
2
0
0
.2
.4
.6
y
Maarten L. Buis
Analyzing Proportions
.8
1
A single proportion
Multiple proportions
Some pictures
µ=.15, φ= 20
σ=.08
6
density
4
2
0
0
.2
.4
.6
y
Maarten L. Buis
Analyzing Proportions
.8
1
A single proportion
Multiple proportions
Some pictures
µ=.5, φ= 2
σ=.29
6
density
4
2
0
0
.2
.4
.6
y
Maarten L. Buis
Analyzing Proportions
.8
1
A single proportion
Multiple proportions
Some pictures
µ=.5, φ= 1
σ=.35
6
density
4
2
0
0
.2
.4
.6
y
Maarten L. Buis
Analyzing Proportions
.8
1
A single proportion
Multiple proportions
betafit
I
Fits a beta distribution, where the mean and scale
parameter are functions of explanatory variables
I
Various types of partial and marginal effects: dbetafit
I
Can be installed by typing in Stata ssc install
betafit
Maarten L. Buis
Analyzing Proportions
A single proportion
Multiple proportions
example
. use http://fmwww.bc.edu/repec/bocode/c/citybudget.dta, clear
(Spending on different categories by Dutch cities in 2005)
. betafit governing , mu(minorityleft noleft houseval popdens) nolog
ML fit of beta (mu, phi)
Number of obs
=
Wald chi2(4)
=
Log likelihood = 768.06704
Prob > chi2
=
Std. Err.
z
governing
Coef.
minorityleft
noleft
houseval
popdens
_cons
-.102143
.1047123
.2970051
-.1247097
-2.607601
.059603
.0611709
.0483488
.0262695
.0856001
-1.71
1.71
6.14
-4.75
-30.46
0.087
0.087
0.000
0.000
0.000
-.2189627
-.0151804
.2022432
-.176197
-2.775374
.0146768
.2246049
.391767
-.0732223
-2.439828
/ln_phi
4.168403
.0715323
58.27
0.000
4.028202
4.308604
phi
64.61219
4.621857
56.15986
74.33662
Maarten L. Buis
P>|z|
394
109.99
0.0000
[95% Conf. Interval]
Analyzing Proportions
A single proportion
Multiple proportions
example
. dbetafit, at(minorityleft 0 noleft 0)
discrete
change
minorityleft
noleft
houseval
popdens
Min --> Max
coef.
se
-.0084
.0094
.0879
-.0494
Marginal
Effects
houseval
popdens
.0047
.0057
.0217
.0075
.0101
-.0099
MFX at x
coef.
se
.0254
-.0107
E(governing|x) =
minorityleft
noleft
houseval
popdens
+-SD/2
coef.
se
.0022
.0019
+-1/2
coef.
.0255
-.0107
se
.0056
.0021
Max MFX
coef.
se
.0056
.0021
.0743
-.0312
.0121
.0066
.0945
x
0
0
1.492
.7629
mean
.434
.3858
1.492
.7629
sd
.4963
.4874
.3971
.9303
min
0
0
.72
.025
max
1
1
3.63
5.711
Maarten L. Buis
Analyzing Proportions
A single proportion
Multiple proportions
What about 0s and 1s?
I
I
betafit ignores 0s and 1s.
If we want to include those, we have to make a decision
about how those 0s and 1s came about:
I
I
I
I
0s and 1s represent very low or very high proportions that
“by accident” resulted in a proportion of 0 or 1.
Implies a fractional logit, which in Stata can be estimated
using glm.
0s and 1s represent distinct processes
Implies a zero-one inflated beta, which in Stata can be
estimated using zoib
I
Alternatively, you can transform your dependent variable to
“push” your 0s and 1s a tiny bit inwards
I
Smithson and Verkuilen (2006) propose
y’ = (y*(N - 1) + .5)/N
Maarten L. Buis
Analyzing Proportions
A single proportion
Multiple proportions
Fractional logit
I
I
0s and 1s occur through the same process as the other
proportions
Only models the mean, this means:
I
I
I
less sensitive to errors in other parts of the model, e.g. the
variance, but
not suitable when interest is in other quantities than the
mean, e.g. the variance
Can be estimated with glm in combination with the
link(logit) family(binomial) robust options.
Maarten L. Buis
Analyzing Proportions
A single proportion
Multiple proportions
example
. use "http://fmwww.bc.edu/repec/bocode/k/k401.dta", clear
(source: Papke and Wooldridge 1996)
. replace totemp = totemp/10000
(4734 real changes made)
. glm prate mrate totemp age sole, ///
>
family(binomial) link(logit) vce(robust) nolog
note: prate has noninteger values
Generalized linear models
Optimization
: ML
No. of obs
Residual df
Scale parameter
(1/df) Deviance
(1/df) Pearson
[Binomial]
[Logit]
AIC
BIC
Deviance
= 1023.737134
Pearson
= 1377.971352
Variance function: V(u) = u*(1-u/1)
Link function
: g(u) = ln(u/(1-u))
Log pseudolikelihood = -1366.491144
prate
Coef.
mrate
totemp
age
sole
_cons
.5734427
-.0577987
.0308946
.3635964
1.074062
Robust
Std. Err.
.0799253
.011467
.0027881
.0476003
.0489076
z
7.17
-5.04
11.08
7.64
21.96
Maarten L. Buis
P>|z|
0.000
0.000
0.000
0.000
0.000
=
=
=
=
=
4734
4729
1
.2164807
.2913875
= .5794217
= -38995.55
[95% Conf. Interval]
.416792
-.0802736
.0254301
.2703017
.9782051
.7300934
-.0353238
.0363591
.4568912
1.169919
Analyzing Proportions
A single proportion
Multiple proportions
example
. mfx, at(mean sole=0)
Marginal effects after glm
y = Predicted mean prate (predict)
= .86775841
variable
mrate
totemp
age
sole*
dy/dx
.0658047
-.0066326
.0035453
.0364495
Std. Err.
.00803
.00132
.00033
.00471
z
8.19
-5.02
10.69
7.73
P>|z|
[
0.000
0.000
0.000
0.000
.050058 .081551
-.009224 -.004041
.002895 .004195
.027209
.04569
95% C.I.
]
X
.746335
.462107
13.1398
0
(*) dy/dx is for discrete change of dummy variable from 0 to 1
Maarten L. Buis
Analyzing Proportions
A single proportion
Multiple proportions
zoib: zero one inflated beta
I
A zero/one inflated beta model consists of three parts:
I
I
I
I
a logistic regression model for whether or not the proportion
equals 0,
a logistic regression model for whether or not the proportion
equals 1,
a beta model for the proportions between 0 and 1.
This model is for situations where you believe that the
decisions for proportions of 0 and/or 1 are governed by a
different process as the other proportions.
Maarten L. Buis
Analyzing Proportions
A single proportion
Multiple proportions
example
. zoib prate mrate totemp age sole, ///
>
oneinflate( mrate totemp age sole) robust nolog
ML fit of oib
Number of obs
Wald chi2(4)
Log pseudolikelihood = -1293.6594
Prob > chi2
Robust
Std. Err.
z
P>|z|
=
=
=
4734
136.47
0.0000
prate
Coef.
proportion
mrate
totemp
age
sole
_cons
[95% Conf. Interval]
.1524644
-.0265332
.0216248
.0604715
.8738362
.0466905
.0092522
.0020206
.0376378
.0354738
3.27
-2.87
10.70
1.61
24.63
0.001
0.004
0.000
0.108
0.000
.0609527
-.0446673
.0176645
-.0132972
.8043088
.243976
-.0083992
.0255852
.1342402
.9433636
oneinflate
mrate
totemp
age
sole
_cons
.7935556
-.1416409
.020835
.9044132
-1.472011
.0653962
.0354509
.003494
.0654829
.0702084
12.13
-4.00
5.96
13.81
-20.97
0.000
0.000
0.000
0.000
0.000
.6653814
-.2111235
.0139869
.7760692
-1.609617
.9217297
-.0721584
.0276832
1.032757
-1.334405
1.77591
.0358677
49.51
0.000
1.705611
1.84621
ln_phi
_cons
Maarten L. Buis
Analyzing Proportions
A single proportion
Multiple proportions
example
. mfx, predict(pr) at(mean sole = 0)
Marginal effects after zoib
y = Proportion (predict, pr)
= .85369833
variable
mrate
totemp
age
sole*
dy/dx
.0566366
-.0100315
.0034952
.053115
Std. Err.
.00679
.00208
.00031
.0047
z
8.34
-4.83
11.37
11.31
P>|z|
[
0.000
0.000
0.000
0.000
.043326 .069947
-.014104 -.005959
.002893 .004098
.043908 .062322
95% C.I.
]
X
.746335
.462107
13.1398
0
(*) dy/dx is for discrete change of dummy variable from 0 to 1
Maarten L. Buis
Analyzing Proportions
A single proportion
Multiple proportions
example
. mfx, predict(pr1) at(mean sole = 0)
Marginal effects after zoib
y = probability of having value 1 (predict, pr1)
= .33817515
variable
mrate
totemp
age
sole*
dy/dx
.1776078
-.031701
.0046631
.2198069
Std. Err.
.01555
.00796
.00078
.01548
z
11.42
-3.98
6.01
14.20
P>|z|
[
0.000
0.000
0.000
0.000
.147125 .208091
-.047304 -.016098
.003143 .006184
.189476 .250138
95% C.I.
]
X
.746335
.462107
13.1398
0
(*) dy/dx is for discrete change of dummy variable from 0 to 1
.
. mfx, predict(prcond) at(mean sole = 0)
Marginal effects after zoib
y = proportion conditional on not having value 0 or 1 (predict, prcond)
=
.778942
variable
mrate
totemp
age
sole*
dy/dx
.0262531
-.0045688
.0037236
.0102369
Std. Err.
.00781
.0016
.00035
.00632
z
3.36
-2.86
10.57
1.62
P>|z|
[
0.001
0.004
0.000
0.105
.010948 .041558
-.007698 -.001439
.003033 .004414
-.002149 .022623
95% C.I.
]
X
.746335
.462107
13.1398
0
(*) dy/dx is for discrete change of dummy variable from 0 to 1
Maarten L. Buis
Analyzing Proportions
A single proportion
Multiple proportions
Comparing models
beta
mrate
0.027
(3.30)
beta with
transformed y
0.033
(17.20)
totemp
-0.005
(-2.86)
-0.006
(-5.23)
-0.007
(-5.02)
-0.010
(-4.83)
age
0.004
(10.54)
0.001
(8.38)
0.004
(10.69)
0.003
(11.37)
sole (d)
0.011
(1.62)
2711
0.038
(13.74)
4734
0.036
(7.73)
4734
0.053
(11.31)
4734
N
flogit
zoib
0.066
(8.19)
0.057
(8.34)
Marginal effects; z statistics in parentheses
(d) for discrete change of dummy variable from 0 to 1
Maarten L. Buis
Analyzing Proportions
A single proportion
Multiple proportions
types of questions
I
How are the proportions related to one another?
I
I
I
I
I
Proportions are automatically (negatively) correlated: if you
spent more on one thing, there is less left over for the rest.
The question is how much association between proportions
exist nett of this automatic correlation.
Literature exists on this question, most notably Aitchinson
(2003 [1986]).
Have not been implemented in Stata.
How are the proportions related to explanatory variables?
I
Two options:
I
I
I
dirifit: Fits a Dirichlet distribution, which is an extension
of the beta distribution to multiple proportions.
fmlogit: Fits a fractional multinomial logit, which is an
extension of the fractional logit to multiple proportions.
Both assume that all correlation between proportions is due
to the ‘automatic correlation’
Maarten L. Buis
Analyzing Proportions
A single proportion
Multiple proportions
example
. use http://fmwww.bc.edu/repec/bocode/c/citybudget.dta, clear
(Spending on different categories by Dutch cities in 2005)
. replace social = social + education + recreation
(395 real changes made)
. dirifit governing safety social urbanplanning, ///
>
muvar(minorityleft noleft houseval popdens) nolog
ML fit of Dirichlet (mu, phi)
Number of obs
Wald chi2(12)
Log likelihood = 1725.1477
Prob > chi2
Coef.
Std. Err.
z
P>|z|
=
=
=
392
189.54
0.0000
[95% Conf. Interval]
mu2
minorityleft
noleft
houseval
popdens
_cons
.1215461
.0423453
-.104733
.0030234
.7199195
.0900454
.0925709
.0729776
.0385816
.129466
1.35
0.46
-1.44
0.08
5.56
0.177
0.647
0.151
0.938
0.000
-.0549397
-.1390903
-.2477665
-.0725951
.4661708
.2980319
.2237809
.0383004
.0786419
.9736682
mu3
minorityleft
noleft
houseval
popdens
_cons
.0492506
-.2055446
-.4818642
.1282351
2.237157
.078676
.0813187
.0694126
.0331747
.1187476
0.63
-2.53
-6.94
3.87
18.84
0.531
0.011
0.000
0.000
0.000
-.1049516
-.3649264
-.6179105
.063214
2.004415
.2034528
-.0461629
-.3458179
.1932563
2.469898
mu4
minorityleft
noleft
houseval
popdens
_cons
.1530504
-.0205669
-.1563326
.1285933
.9378096
.0863215
.0891623
.070531
.0354878
.1247067
1.77
-0.23
-2.22
3.62
7.52
0.076
0.818
0.027
0.000
0.000
-.0161367
-.1953218
-.2945709
.0590385
.693389
.3222375
.1541879
-.0180943
.1981481
1.18223
/ln_phi
3.60327
.0405736
88.81
0.000
3.523747
3.682792
phi
36.71809
1.489786
33.91125
39.75726
mu2 = safety
mu3 = social
mu4 = urbanplanning
base outcome = governing
Maarten L. Buis
Analyzing Proportions
A single proportion
Multiple proportions
example
. ddirifit, at(minorityleft 0 noleft 0 )
discrete
change
Min --> Max
coef.
se
+-SD/2
coef.
se
+-1/2
coef.
se
governing
minorityleft
noleft
houseval
popdens
-.0078
.0099
.0937
-.0461
.0066
.0074
.0233
.0115
.0115
-.0087
.0024
.0027
.0293
-.0093
.0062
.0029
safety
minorityleft
noleft
houseval
popdens
.0072
.0257
.0926
-.0792
.0088
.0096
.0254
.0152
.013
-.0149
.003
.0035
.0333
-.0159
.0077
.0037
social
minorityleft
noleft
houseval
popdens
-.0159
-.0527
-.264
.0865
.0114
.012
.0304
.0251
-.0366
.0164
.0045
.0042
-.0935
.0174
.0114
.0045
urbanplann~g
minorityleft
noleft
houseval
popdens
.0165
.0171
.0777
.0387
.0097
.0102
.0265
.0219
.0121
.0073
.0033
.0034
.0309
.0078
.0085
.0036
Maarten L. Buis
Analyzing Proportions
A single proportion
Multiple proportions
example
Marginal
Effects
MFX at x
coef.
se
governing
houseval
popdens
.0293
-.0093
.0061
.0029
safety
houseval
popdens
.0334
-.0159
.0077
.0037
social
houseval
popdens
-.0937
.0174
.0115
.0045
.031
.0078
.0085
.0035
urbanplann~g
houseval
popdens
E(governing|x) =
E(safety|x) =
E(social|x) =
E(urbanplann~g|x) =
.0993
.175
.5032
.2225
x
0
0
1.483
.7839
mean
.4337
.3878
1.483
.7839
minorityleft
noleft
houseval
popdens
sd
.4962
.4879
.3902
.9408
min
0
0
.72
.025
max
1
1
3.63
5.711
Maarten L. Buis
Analyzing Proportions
A single proportion
Multiple proportions
example
. fmlogit governing safety social urbanplanning, ///
>
eta(minorityleft noleft houseval popdens) nolog
ML fit of fractional multinomial logit
Number of obs
Wald chi2(12)
Prob > chi2
Log pseudolikelihood = -480.22927
Coef.
Robust
Std. Err.
z
P>|z|
=
=
=
392
232.02
0.0000
[95% Conf. Interval]
eta_safety
minorityleft
noleft
houseval
popdens
_cons
.1893804
.0826287
-.1389243
.0115828
.7472656
.0596091
.0617178
.0555301
.021217
.0920767
3.18
1.34
-2.50
0.55
8.12
0.001
0.181
0.012
0.585
0.000
.0725486
-.038336
-.2477614
-.0300018
.5667985
.3062121
.2035934
-.0300872
.0531673
.9277326
eta_social
minorityleft
noleft
houseval
popdens
_cons
.1274304
-.1631202
-.5152946
.1456129
2.289081
.0813587
.0843848
.0849965
.0257634
.1415551
1.57
-1.93
-6.06
5.65
16.17
0.117
0.053
0.000
0.000
0.000
-.0320297
-.3285114
-.6818845
.0951176
2.011638
.2868905
.002271
-.3487046
.1961081
2.566524
eta_urbanp~g
minorityleft
noleft
houseval
popdens
_cons
.234597
.0302709
-.1766753
.1601436
.9790566
.1064367
.1142185
.0856439
.0417171
.1526485
2.20
0.27
-2.06
3.84
6.41
0.028
0.791
0.039
0.000
0.000
.0259849
-.1935932
-.3445343
.0783796
.6798709
.443209
.254135
-.0088163
.2419076
1.278242
Maarten L. Buis
Analyzing Proportions
A single proportion
Multiple proportions
example
. dfmlogit, at(minorityleft 0 noleft 0 )
discrete
change
Min --> Max
coef.
se
+-SD/2
coef.
se
+-1/2
coef.
se
governing
minorityleft
noleft
houseval
popdens
-.0138
.0056
.1036
-.0525
.0066
.0072
.0258
.0085
.0124
-.0103
.0027
.0021
.0317
-.011
.0069
.0022
safety
minorityleft
noleft
houseval
popdens
.0065
.0252
.0847
-.0839
.0064
.0069
.0184
.0105
.0123
-.016
.0024
.0024
.0313
-.017
.0062
.0025
social
minorityleft
noleft
houseval
popdens
-.0122
-.0513
-.2721
.0792
.0128
.0129
.0458
.0315
-.0377
.016
.007
.0045
-.0963
.017
.0177
.0047
urbanplann~g
minorityleft
noleft
houseval
popdens
.0196
.0205
.0838
.0572
.014
.0152
.0422
.038
.0131
.0103
.0055
.0057
.0333
.011
.0141
.0061
Maarten L. Buis
Analyzing Proportions
A single proportion
Multiple proportions
example
Marginal
Effects
MFX at x
coef.
se
governing
houseval
popdens
.0317
-.011
.0065
.0024
safety
houseval
popdens
.0314
-.017
.0061
.0026
social
houseval
popdens
-.0966
.017
.0178
.0047
urbanplann~g
houseval
popdens
.0335
.011
.0139
.006
E(governing|x) =
E(safety|x) =
E(social|x) =
E(urbanplann~g|x) =
.098
.1699
.5046
.2276
x
0
0
1.483
.7839
mean
.4337
.3878
1.483
.7839
minorityleft
noleft
houseval
popdens
sd
.4962
.4879
.3902
.9408
min
0
0
.72
.025
max
1
1
3.63
5.711
Maarten L. Buis
Analyzing Proportions
Summary
I
I
Proportions are bounded: regress won’t work well.
one proportion:
I
I
I
I
no 0s and/or 1s: betafit or fractional logit
0s and/or 1s: zoib or fractional logit
interest in variance: fractional logit won’t work
multiple proportions:
I
I
relationship between these proportions: no solution in Stata
(yet)
relationship between mean proportions and explanatory
variables: dirifit or fmlogit
Maarten L. Buis
Analyzing Proportions
References
Aitchison, J.
The Statistical Analysis of Compositional Data.
Caldwell, NJ: The Blackburn Press, 2003 [1986].
Ferrari, S.L.P. and Cribari-Neto, F.
Beta regression for modelling rates and proportions.
Journal of Applied Statistics, 31(7): 799–815, 2004.
Paolino, P.
Maximum likelihood estimation of models with beta-distributed dependent variables.
Political Analysis, 9(4): 325–346, 2001.
Papke, L.E. and Wooldridge, J.M.
Econometric Methods for Fractional Response Variables with an Application to 401(k) Plan Participation
Rates.
Journal of Applied Econometrics, 11(6):619–632, 1996.
Smithson, M. and Verkuilen, J.
A better lemon squeezer? Maximum likelihood regression with beta-distributed dependent variables.
Psychological Methods, 11(1):54–71, 2006.
Maarten L. Buis
Analyzing Proportions
Download