Analyzing Proportions

Maarten L. Buis
Institut für Soziologie
Eberhard Karls Universität Tübingen
www.maartenbuis.nl

Sections: A single proportion; Multiple proportions


A single proportion

The problem

- A proportion is bounded between 0 and 1. This means that:
    - the effect of explanatory variables tends to be non-linear, and
    - the variance tends to decrease when the mean gets closer to one of the boundaries.
- This makes linear regression unattractive.

Solutions

- Model the distribution of the dependent variable(s) with either
    - a beta distribution: betafit
    - a zero/one inflated beta distribution: zoib
    - a Dirichlet distribution: dirifit
- Model how the mean proportion relates to explanatory variables using
    - a fractional logit: glm
    - a fractional multinomial logit: fmlogit

The beta distribution

- A flexible distribution bounded between 0 and 1 (excluding 0 and 1).
- Two parameters: the mean (µ) and a scale parameter (φ).
- The variance is a function of the mean and the scale parameter: the variance is largest when the mean is 0.5.

Some pictures

[Figures: beta densities, density plotted against y on (0, 1), one panel per parameter setting]
    µ = .5,  φ = 4,  σ = .22
    µ = .5,  φ = 6,  σ = .19
    µ = .5,  φ = 10, σ = .15
    µ = .5,  φ = 20, σ = .11
    µ = .25, φ = 20, σ = .09
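The σ printed with each picture is consistent with Var(y) = µ(1 - µ)/(1 + φ). A minimal Stata sketch, assuming that parameterization (it is inferred from the panel values and is not stated on the slides), recomputes the standard deviation for the eight (µ, φ) pairs pictured:

```stata
* Standard deviation of a beta distribution in a mean/scale parameterization.
* Assumption: Var(y) = mu*(1 - mu)/(1 + phi); inferred from the panels, not from the slides.
foreach pair in ".5 4" ".5 6" ".5 10" ".5 20" ".25 20" ".15 20" ".5 2" ".5 1" {
    local mu  : word 1 of `pair'
    local phi : word 2 of `pair'
    display "mu = `mu'  phi = `phi'  sd = " %4.2f sqrt(`mu'*(1 - `mu')/(1 + `phi'))
}
```

Holding µ fixed, a larger φ gives a smaller standard deviation; holding φ fixed, moving µ away from .5 also shrinks it, which is the pattern the pictures show.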
[Figures: beta densities, density plotted against y on (0, 1), further parameter settings]
    µ = .15, φ = 20, σ = .08
    µ = .5,  φ = 2,  σ = .29
    µ = .5,  φ = 1,  σ = .35

betafit

- Fits a beta distribution, where the mean and scale parameter are functions of explanatory variables.
- Various types of partial and marginal effects: dbetafit
- Can be installed by typing in Stata: ssc install betafit

example

    . use http://fmwww.bc.edu/repec/bocode/c/citybudget.dta, clear
    (Spending on different categories by Dutch cities in 2005)

    . betafit governing , mu(minorityleft noleft houseval popdens) nolog

    ML fit of beta (mu, phi)                         Number of obs   =        394
                                                     Wald chi2(4)    =     109.99
    Log likelihood =  768.06704                      Prob > chi2     =     0.0000

    -------------------------------------------------------------------------------
       governing |      Coef.  Std. Err.        z   P>|z|    [95% Conf.   Interval]
    -------------+-----------------------------------------------------------------
    minorityleft |   -.102143    .059603    -1.71   0.087     -.2189627    .0146768
          noleft |   .1047123   .0611709     1.71   0.087     -.0151804    .2246049
        houseval |   .2970051   .0483488     6.14   0.000      .2022432     .391767
         popdens |  -.1247097   .0262695    -4.75   0.000      -.176197   -.0732223
           _cons |  -2.607601   .0856001   -30.46   0.000     -2.775374   -2.439828
    -------------+-----------------------------------------------------------------
         /ln_phi |   4.168403   .0715323    58.27   0.000      4.028202    4.308604
    -------------+-----------------------------------------------------------------
             phi |   64.61219   4.621857                       56.15986    74.33662
    -------------------------------------------------------------------------------

example

    . dbetafit, at(minorityleft 0 noleft 0)

    discrete change     Min --> Max          +-SD/2             +-1/2
                        coef.     se       coef.     se       coef.     se
    minorityleft       -.0084  .0047
    noleft              .0094  .0057
    houseval            .0879  .0217       .0101  .0022       .0255  .0056
    popdens            -.0494  .0075      -.0099  .0019      -.0107  .0021

    Marginal Effects      MFX at x           Max MFX
                        coef.     se       coef.     se
    houseval            .0254  .0056       .0743  .0121
    popdens            -.0107  .0021      -.0312  .0066

    E(governing|x) =  .0945

                            x    mean      sd     min     max
    minorityleft            0    .434   .4963       0       1
    noleft                  0   .3858   .4874       0       1
    houseval            1.492   1.492   .3971     .72    3.63
    popdens             .7629   .7629   .9303    .025   5.711

What about 0s and 1s?

- betafit ignores 0s and 1s.
- If we want to include those, we have to make a decision about how those 0s and 1s came about:
    - 0s and 1s represent very low or very high proportions that "by accident" resulted in a proportion of 0 or 1. This implies a fractional logit, which in Stata can be estimated using glm.
    - 0s and 1s represent distinct processes. This implies a zero-one inflated beta, which in Stata can be estimated using zoib.
- Alternatively, you can transform your dependent variable to "push" your 0s and 1s a tiny bit inwards.
    - Smithson and Verkuilen (2006) propose y' = (y*(N - 1) + .5)/N

Fractional logit

- 0s and 1s occur through the same process as the other proportions.
- It only models the mean. This means that it is:
    - less sensitive to errors in other parts of the model, e.g. the variance, but
    - not suitable when interest is in quantities other than the mean, e.g. the variance.
- Can be estimated with glm in combination with the link(logit) family(binomial) robust options.

example

    . use "http://fmwww.bc.edu/repec/bocode/k/k401.dta", clear
    (source: Papke and Wooldridge 1996)

    . replace totemp = totemp/10000
    (4734 real changes made)

    . glm prate mrate totemp age sole, ///
    >     family(binomial) link(logit) vce(robust) nolog
    note: prate has noninteger values

    Generalized linear models                         No. of obs      =      4734
    Optimization     : ML                             Residual df     =      4729
                                                      Scale parameter =         1
    Deviance         =  1023.737134                   (1/df) Deviance =  .2164807
    Pearson          =  1377.971352                   (1/df) Pearson  =  .2913875

    Variance function: V(u) = u*(1-u/1)               [Binomial]
    Link function    : g(u) = ln(u/(1-u))             [Logit]

                                                      AIC             =  .5794217
    Log pseudolikelihood = -1366.491144               BIC             = -38995.55

    -------------------------------------------------------------------------------
                 |                Robust
           prate |      Coef.  Std. Err.        z   P>|z|    [95% Conf.   Interval]
    -------------+-----------------------------------------------------------------
           mrate |   .5734427   .0799253     7.17   0.000       .416792    .7300934
          totemp |  -.0577987    .011467    -5.04   0.000     -.0802736   -.0353238
             age |   .0308946   .0027881    11.08   0.000      .0254301    .0363591
            sole |   .3635964   .0476003     7.64   0.000      .2703017    .4568912
           _cons |   1.074062   .0489076    21.96   0.000      .9782051    1.169919
    -------------------------------------------------------------------------------

example

    . mfx, at(mean sole=0)

    Marginal effects after glm
          y  = Predicted mean prate (predict)
             =  .86775841

    variable |      dy/dx  Std. Err.       z   P>|z|   [    95% C.I.    ]         X
    ---------+---------------------------------------------------------------------
       mrate |   .0658047     .00803    8.19   0.000    .050058   .081551   .746335
      totemp |  -.0066326     .00132   -5.02   0.000   -.009224  -.004041   .462107
         age |   .0035453     .00033   10.69   0.000    .002895   .004195   13.1398
       sole* |   .0364495     .00471    7.73   0.000    .027209    .04569         0
    (*) dy/dx is for discrete change of dummy variable from 0 to 1

zoib: zero one inflated beta

- A zero/one inflated beta model consists of three parts:
    - a logistic regression model for whether or not the proportion equals 0,
    - a logistic regression model for whether or not the proportion equals 1,
    - a beta model for the proportions between 0 and 1.
- This model is for situations where you believe that the decisions for proportions of 0 and/or 1 are governed by a different process than the other proportions.

example

    . zoib prate mrate totemp age sole, ///
    >     oneinflate( mrate totemp age sole) robust nolog

    ML fit of oib                                    Number of obs   =       4734
                                                     Wald chi2(4)    =     136.47
    Log pseudolikelihood = -1293.6594                Prob > chi2     =     0.0000

    -------------------------------------------------------------------------------
                 |                Robust
           prate |      Coef.  Std. Err.        z   P>|z|    [95% Conf.   Interval]
    -------------+-----------------------------------------------------------------
    proportion   |
           mrate |   .1524644   .0466905     3.27   0.001      .0609527     .243976
          totemp |  -.0265332   .0092522    -2.87   0.004     -.0446673   -.0083992
             age |   .0216248   .0020206    10.70   0.000      .0176645    .0255852
            sole |   .0604715   .0376378     1.61   0.108     -.0132972    .1342402
           _cons |   .8738362   .0354738    24.63   0.000      .8043088    .9433636
    -------------+-----------------------------------------------------------------
    oneinflate   |
           mrate |   .7935556   .0653962    12.13   0.000      .6653814    .9217297
          totemp |  -.1416409   .0354509    -4.00   0.000     -.2111235   -.0721584
             age |    .020835    .003494     5.96   0.000      .0139869    .0276832
            sole |   .9044132   .0654829    13.81   0.000      .7760692    1.032757
           _cons |  -1.472011   .0702084   -20.97   0.000     -1.609617   -1.334405
    -------------+-----------------------------------------------------------------
    ln_phi       |
           _cons |    1.77591   .0358677    49.51   0.000      1.705611     1.84621
    -------------------------------------------------------------------------------

example

    . mfx, predict(pr) at(mean sole = 0)

    Marginal effects after zoib
          y  = Proportion (predict, pr)
             =  .85369833

    variable |      dy/dx  Std. Err.       z   P>|z|   [    95% C.I.    ]         X
    ---------+---------------------------------------------------------------------
       mrate |   .0566366     .00679    8.34   0.000    .043326   .069947   .746335
      totemp |  -.0100315     .00208   -4.83   0.000   -.014104  -.005959   .462107
         age |   .0034952     .00031   11.37   0.000    .002893   .004098   13.1398
       sole* |    .053115      .0047   11.31   0.000    .043908   .062322         0
    (*) dy/dx is for discrete change of dummy variable from 0 to 1

example

    . mfx, predict(pr1) at(mean sole = 0)

    Marginal effects after zoib
          y  = probability of having value 1 (predict, pr1)
             =  .33817515

    variable |      dy/dx  Std. Err.       z   P>|z|   [    95% C.I.    ]         X
    ---------+---------------------------------------------------------------------
       mrate |   .1776078     .01555   11.42   0.000    .147125   .208091   .746335
      totemp |   -.031701     .00796   -3.98   0.000   -.047304  -.016098   .462107
         age |   .0046631     .00078    6.01   0.000    .003143   .006184   13.1398
       sole* |   .2198069     .01548   14.20   0.000    .189476   .250138         0
    (*) dy/dx is for discrete change of dummy variable from 0 to 1
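The zoib predictions fit together as a mixture, which a short worked equation makes explicit. This is a sketch under an assumption: because only oneinflate() was specified, the probability of a 0 is taken to be zero, so the overall mean combines the point mass at 1 with the beta part. The notation is illustrative; .33817515 is the predict(pr1) value above and .778942 is the predict(prcond) value in the next output.

```latex
\begin{align*}
E(y \mid x) &= \Pr(y = 1 \mid x) + \{1 - \Pr(y = 1 \mid x)\}\, E(y \mid 0 < y < 1,\, x) \\
            &= .33817515 + (1 - .33817515) \times .778942 \\
            &\approx .8537
\end{align*}
```

This agrees with the predict(pr) value of .85369833 reported for the same evaluation point.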
    . mfx, predict(prcond) at(mean sole = 0)

    Marginal effects after zoib
          y  = proportion conditional on not having value 0 or 1 (predict, prcond)
             =  .778942

    variable |      dy/dx  Std. Err.       z   P>|z|   [    95% C.I.    ]         X
    ---------+---------------------------------------------------------------------
       mrate |   .0262531     .00781    3.36   0.001    .010948   .041558   .746335
      totemp |  -.0045688      .0016   -2.86   0.004   -.007698  -.001439   .462107
         age |   .0037236     .00035   10.57   0.000    .003033   .004414   13.1398
       sole* |   .0102369     .00632    1.62   0.105   -.002149   .022623         0
    (*) dy/dx is for discrete change of dummy variable from 0 to 1

Comparing models

                     beta        beta with        flogit         zoib
                                 transformed y
    mrate            0.027       0.033            0.066          0.057
                    (3.30)      (17.20)          (8.19)         (8.34)
    totemp          -0.005      -0.006           -0.007         -0.010
                   (-2.86)      (-5.23)         (-5.02)        (-4.83)
    age              0.004       0.001            0.004          0.003
                   (10.54)      (8.38)          (10.69)        (11.37)
    sole (d)         0.011       0.038            0.036          0.053
                    (1.62)      (13.74)          (7.73)        (11.31)
    N                2711        4734             4734           4734

    Marginal effects; z statistics in parentheses
    (d) for discrete change of dummy variable from 0 to 1


Multiple proportions

types of questions

- How are the proportions related to one another?
    - Proportions are automatically (negatively) correlated: if you spend more on one thing, there is less left over for the rest.
    - The question is how much association between proportions exists net of this automatic correlation.
    - Literature exists on this question, most notably Aitchison (2003 [1986]).
    - These methods have not been implemented in Stata.
- How are the proportions related to explanatory variables?
    - Two options:
        - dirifit: fits a Dirichlet distribution, which is an extension of the beta distribution to multiple proportions.
        - fmlogit: fits a fractional multinomial logit, which is an extension of the fractional logit to multiple proportions.
    - Both assume that all correlation between proportions is due to the "automatic correlation".
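How explanatory variables can be turned into mean proportions that sum to one can be sketched in a few lines of Stata. The sketch assumes a multinomial-logit link with governing as the base outcome. That link is not spelled out on the slides, although with the dirifit coefficients from the example that follows it reproduces the E(.|x) values that ddirifit reports there (.0993, .175, .5032, .2225). The scalar names are made up for illustration.

```stata
* Linear predictors at minorityleft = 0, noleft = 0, houseval = 1.483, popdens = .7839
* (coefficients and evaluation point copied from the dirifit/ddirifit output)
scalar xb_safety = .7199195 - .104733*1.483  + .0030234*.7839
scalar xb_social = 2.237157 - .4818642*1.483 + .1282351*.7839
scalar xb_urban  = .9378096 - .1563326*1.483 + .1285933*.7839

* Assumed multinomial-logit link: the base outcome (governing) has linear predictor 0
scalar denom = 1 + exp(xb_safety) + exp(xb_social) + exp(xb_urban)

display "E(governing|x)     = " %5.4f 1/denom
display "E(safety|x)        = " %5.4f exp(xb_safety)/denom
display "E(social|x)        = " %5.4f exp(xb_social)/denom
display "E(urbanplanning|x) = " %5.4f exp(xb_urban)/denom
```

Because every mean shares the same denominator, raising one proportion necessarily lowers the others; this is the "automatic correlation" that both models build in.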
example

    . use http://fmwww.bc.edu/repec/bocode/c/citybudget.dta, clear
    (Spending on different categories by Dutch cities in 2005)

    . replace social = social + education + recreation
    (395 real changes made)

    . dirifit governing safety social urbanplanning, ///
    >     muvar(minorityleft noleft houseval popdens) nolog

    ML fit of Dirichlet (mu, phi)                    Number of obs   =        392
                                                     Wald chi2(12)   =     189.54
    Log likelihood =  1725.1477                      Prob > chi2     =     0.0000

    -------------------------------------------------------------------------------
                 |      Coef.  Std. Err.        z   P>|z|    [95% Conf.   Interval]
    -------------+-----------------------------------------------------------------
    mu2          |
    minorityleft |   .1215461   .0900454     1.35   0.177     -.0549397    .2980319
          noleft |   .0423453   .0925709     0.46   0.647     -.1390903    .2237809
        houseval |   -.104733   .0729776    -1.44   0.151     -.2477665    .0383004
         popdens |   .0030234   .0385816     0.08   0.938     -.0725951    .0786419
           _cons |   .7199195    .129466     5.56   0.000      .4661708    .9736682
    -------------+-----------------------------------------------------------------
    mu3          |
    minorityleft |   .0492506    .078676     0.63   0.531     -.1049516    .2034528
          noleft |  -.2055446   .0813187    -2.53   0.011     -.3649264   -.0461629
        houseval |  -.4818642   .0694126    -6.94   0.000     -.6179105   -.3458179
         popdens |   .1282351   .0331747     3.87   0.000       .063214    .1932563
           _cons |   2.237157   .1187476    18.84   0.000      2.004415    2.469898
    -------------+-----------------------------------------------------------------
    mu4          |
    minorityleft |   .1530504   .0863215     1.77   0.076     -.0161367    .3222375
          noleft |  -.0205669   .0891623    -0.23   0.818     -.1953218    .1541879
        houseval |  -.1563326    .070531    -2.22   0.027     -.2945709   -.0180943
         popdens |   .1285933   .0354878     3.62   0.000      .0590385    .1981481
           _cons |   .9378096   .1247067     7.52   0.000       .693389     1.18223
    -------------+-----------------------------------------------------------------
         /ln_phi |    3.60327   .0405736    88.81   0.000      3.523747    3.682792
    -------------+-----------------------------------------------------------------
             phi |   36.71809   1.489786                       33.91125    39.75726
    -------------------------------------------------------------------------------
    mu2 = safety
    mu3 = social
    mu4 = urbanplanning
    base outcome = governing

example

    . ddirifit, at(minorityleft 0 noleft 0 )

    discrete change     Min --> Max          +-SD/2             +-1/2
                        coef.     se       coef.     se       coef.     se
    governing
      minorityleft     -.0078  .0066
      noleft            .0099  .0074
      houseval          .0937  .0233       .0115  .0024       .0293  .0062
      popdens          -.0461  .0115      -.0087  .0027      -.0093  .0029
    safety
      minorityleft      .0072  .0088
      noleft            .0257  .0096
      houseval          .0926  .0254        .013   .003       .0333  .0077
      popdens          -.0792  .0152      -.0149  .0035      -.0159  .0037
    social
      minorityleft     -.0159  .0114
      noleft           -.0527   .012
      houseval          -.264  .0304      -.0366  .0045      -.0935  .0114
      popdens           .0865  .0251       .0164  .0042       .0174  .0045
    urbanplann~g
      minorityleft      .0165  .0097
      noleft            .0171  .0102
      houseval          .0777  .0265       .0121  .0033       .0309  .0085
      popdens           .0387  .0219       .0073  .0034       .0078  .0036

example

    Marginal Effects      MFX at x
                        coef.     se
    governing
      houseval          .0293  .0061
      popdens          -.0093  .0029
    safety
      houseval          .0334  .0077
      popdens          -.0159  .0037
    social
      houseval         -.0937  .0115
      popdens           .0174  .0045
    urbanplann~g
      houseval           .031  .0085
      popdens           .0078  .0035

    E(governing|x)    =  .0993
    E(safety|x)       =   .175
    E(social|x)       =  .5032
    E(urbanplann~g|x) =  .2225

                            x    mean      sd     min     max
    minorityleft            0   .4337   .4962       0       1
    noleft                  0   .3878   .4879       0       1
    houseval            1.483   1.483   .3902     .72    3.63
    popdens             .7839   .7839   .9408    .025   5.711

example

    . fmlogit governing safety social urbanplanning, ///
    >     eta(minorityleft noleft houseval popdens) nolog

    ML fit of fractional multinomial logit           Number of obs   =        392
                                                     Wald chi2(12)   =     232.02
    Log pseudolikelihood = -480.22927                Prob > chi2     =     0.0000

    -------------------------------------------------------------------------------
                 |                Robust
                 |      Coef.  Std. Err.        z   P>|z|    [95% Conf.   Interval]
    -------------+-----------------------------------------------------------------
    eta_safety   |
    minorityleft |   .1893804   .0596091     3.18   0.001      .0725486    .3062121
          noleft |   .0826287   .0617178     1.34   0.181      -.038336    .2035934
        houseval |  -.1389243   .0555301    -2.50   0.012     -.2477614   -.0300872
         popdens |   .0115828    .021217     0.55   0.585     -.0300018    .0531673
           _cons |   .7472656   .0920767     8.12   0.000      .5667985    .9277326
    -------------+-----------------------------------------------------------------
    eta_social   |
    minorityleft |   .1274304   .0813587     1.57   0.117     -.0320297    .2868905
          noleft |  -.1631202   .0843848    -1.93   0.053     -.3285114     .002271
        houseval |  -.5152946   .0849965    -6.06   0.000     -.6818845   -.3487046
         popdens |   .1456129   .0257634     5.65   0.000      .0951176    .1961081
           _cons |   2.289081   .1415551    16.17   0.000      2.011638    2.566524
    -------------+-----------------------------------------------------------------
    eta_urbanp~g |
    minorityleft |    .234597   .1064367     2.20   0.028      .0259849     .443209
          noleft |   .0302709   .1142185     0.27   0.791     -.1935932     .254135
        houseval |  -.1766753   .0856439    -2.06   0.039     -.3445343   -.0088163
         popdens |   .1601436   .0417171     3.84   0.000      .0783796    .2419076
           _cons |   .9790566   .1526485     6.41   0.000      .6798709    1.278242
    -------------------------------------------------------------------------------

example

    . dfmlogit, at(minorityleft 0 noleft 0 )

    discrete change     Min --> Max          +-SD/2             +-1/2
                        coef.     se       coef.     se       coef.     se
    governing
      minorityleft     -.0138  .0066
      noleft            .0056  .0072
      houseval          .1036  .0258       .0124  .0027       .0317  .0069
      popdens          -.0525  .0085      -.0103  .0021       -.011  .0022
    safety
      minorityleft      .0065  .0064
      noleft            .0252  .0069
      houseval          .0847  .0184       .0123  .0024       .0313  .0062
      popdens          -.0839  .0105       -.016  .0024       -.017  .0025
    social
      minorityleft     -.0122  .0128
      noleft           -.0513  .0129
      houseval         -.2721  .0458      -.0377   .007      -.0963  .0177
      popdens           .0792  .0315        .016  .0045        .017  .0047
    urbanplann~g
      minorityleft      .0196   .014
      noleft            .0205  .0152
      houseval          .0838  .0422       .0131  .0055       .0333  .0141
      popdens           .0572   .038       .0103  .0057        .011  .0061

example

    Marginal Effects      MFX at x
                        coef.     se
    governing
      houseval          .0317  .0065
      popdens           -.011  .0024
    safety
      houseval          .0314  .0061
      popdens           -.017  .0026
    social
      houseval         -.0966  .0178
      popdens            .017  .0047
    urbanplann~g
      houseval          .0335  .0139
      popdens            .011   .006

    E(governing|x)    =   .098
    E(safety|x)       =  .1699
    E(social|x)       =  .5046
    E(urbanplann~g|x) =  .2276

                            x    mean      sd     min     max
    minorityleft            0   .4337   .4962       0       1
    noleft                  0   .3878   .4879       0       1
    houseval            1.483   1.483   .3902     .72    3.63
    popdens             .7839   .7839   .9408    .025   5.711


Summary

- Proportions are bounded: regress won't work well.
- One proportion:
    - no 0s and/or 1s: betafit or fractional logit
    - 0s and/or 1s: zoib or fractional logit
    - interest in variance: fractional logit won't work
- Multiple proportions:
    - relationship between these proportions: no solution in Stata (yet)
    - relationship between mean proportions and explanatory variables: dirifit or fmlogit

References

Aitchison, J. The Statistical Analysis of Compositional Data. Caldwell, NJ: The Blackburn Press, 2003 [1986].

Ferrari, S.L.P. and Cribari-Neto, F. Beta regression for modelling rates and proportions. Journal of Applied Statistics, 31(7): 799–815, 2004.

Paolino, P. Maximum likelihood estimation of models with beta-distributed dependent variables. Political Analysis, 9(4): 325–346, 2001.

Papke, L.E. and Wooldridge, J.M. Econometric Methods for Fractional Response Variables with an Application to 401(k) Plan Participation Rates. Journal of Applied Econometrics, 11(6): 619–632, 1996.

Smithson, M. and Verkuilen, J. A better lemon squeezer? Maximum likelihood regression with beta-distributed dependent variables. Psychological Methods, 11(1): 54–71, 2006.