Javier Aparicio
July 27, 2001

Regressions with Censored Data: Tobit Estimation of PAC Contributions
(use PAC_tobit.dta)

We will model the amount of AFL-CIO PAC contributions as a function of the candidate's membership on the Budget Committee (V19) or the Ways and Means Committee (V35), party affiliation (PARTY), seniority (SENIOR), the 1990 vote obtained (VOTE90), and the ideological “distance” between the PAC and the candidate (DISTANCE). The data contain lower-censored (211 obs.) and upper-censored (20 obs.) observations at 0 and 5000 dollars, respectively. First, we estimate a simple OLS model with the data “as is”, which we call the censored OLS model. Next, we exclude the censored observations and estimate a truncated OLS model. Finally, we estimate a Tobit model for the entire sample, with both the lower and upper censoring points properly specified.

. * MODEL 1. OLS CENSORED MODEL
. reg contrib v19 v35 party senior vote90 distance

      Source |       SS       df       MS              Number of obs =     347
-------------+------------------------------           F(  6,   340) =   18.35
       Model |  162478358     6  27079726.3            Prob > F      =  0.0000
    Residual |  501622144   340  1475359.25            R-squared     =  0.2447
-------------+------------------------------           Adj R-squared =  0.2313
       Total |  664100502   346  1919365.61            Root MSE      =  1214.6

------------------------------------------------------------------------------
     contrib |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         v19 |   -377.864    229.575    -1.65   0.101    -829.4302    73.70213
         v35 |  -513.8839   260.6351    -1.97   0.049    -1026.544   -1.223525
       party |  -618.8379    295.288    -2.10   0.037    -1199.659   -38.01659
      senior |  -17.34461   8.680039    -2.00   0.046    -34.41795    -.271268
      vote90 |   -14.7655   4.655749    -3.17   0.002     -23.9232   -5.607803
    distance |  -341.3641   143.8918    -2.37   0.018    -624.3943   -58.33388
       _cons |   2668.555   335.0562     7.96   0.000     2009.511    3327.599
------------------------------------------------------------------------------
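The censoring described in the introduction can be written as a latent-variable model, which is the setup the Tobit estimator assumes. In this sketch y* is an unobserved “desired” contribution and x collects the six regressors. The normal error is the standard Tobit assumption rather than something established in this handout. The observation counts are the ones reported for the data.

```latex
% Latent-variable (two-limit Tobit) model for the AFL-CIO PAC data
y_i^* = x_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0,\sigma^2)

y_i = \begin{cases}
  0     & \text{if } y_i^* \le 0      \quad \text{(211 obs.)} \\
  y_i^* & \text{if } 0 < y_i^* < 5000 \quad \text{(116 obs.)} \\
  5000  & \text{if } y_i^* \ge 5000   \quad \text{(20 obs.)}
\end{cases}
```

The OLS model above treats the observed y as if it were y*, even though 231 of the 347 observations sit at one of the two limits.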
. predict yhat_cen
(option xb assumed; fitted values)

In this first estimation, all variables have a negative effect on PAC contributions that is statistically significant at the 5% level, except V19 (Budget Committee), whose p-value is 0.101. Thus Republican candidates, as well as more ideologically distant candidates, tend to receive smaller AFL-CIO PAC contributions, as expected. More surprisingly, contributions decrease with seniority and with Budget and Ways and Means Committee membership. One would have expected more influential candidates (those with more seniority or important committee positions) to receive larger contributions.

. * MODEL 2. OLS TRUNCATED MODEL
. reg contrib v19 v35 party senior vote90 distance if contrib > 0 & contrib < 5000

      Source |       SS       df       MS              Number of obs =     116
-------------+------------------------------           F(  6,   109) =    2.59
       Model |  16989820.0     6  2831636.66           Prob > F      =  0.0221
    Residual |   119305154   109  1094542.70           R-squared     =  0.1247
-------------+------------------------------           Adj R-squared =  0.0765
       Total |   136294974   115  1185173.69           Root MSE      =  1046.2

------------------------------------------------------------------------------
     contrib |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         v19 |  -452.5959   390.3753    -1.16   0.249    -1226.307    321.1152
         v35 |  -615.2362   486.9393    -1.26   0.209    -1580.334    349.8617
       party |   361.2142   794.4016     0.45   0.650    -1213.264    1935.692
      senior |  -6.232589   11.99617    -0.52   0.604     -30.0086    17.54343
      vote90 |  -26.07335   7.879505    -3.31   0.001    -41.69027   -10.45642
    distance |  -347.8259    393.511    -0.88   0.379    -1127.752       432.1
       _cons |   3396.926   575.6287     5.90   0.000     2256.049    4537.803
------------------------------------------------------------------------------

. predict yhat_tr
(option xb assumed; fitted values)

This second model excludes all censored observations and therefore discards two thirds of the data (231 of 347 observations).
As a result, most variables become statistically insignificant, and PARTY even changes sign. The only variable that remains significant is the vote obtained in the previous election (VOTE90), which still has a negative effect on contributions.

. * MODEL 3. TOBIT model
. tobit contrib v19 v35 party senior vote90 distance, ll ul

Tobit estimates                                   Number of obs   =        347
                                                  LR chi2(6)      =     198.43
                                                  Prob > chi2     =     0.0000
Log likelihood = -1152.4198                       Pseudo R2       =     0.0793

------------------------------------------------------------------------------
     contrib |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         v19 |  -1441.746    658.317    -2.19   0.029    -2736.619   -146.8724
         v35 |  -2412.174   826.4691    -2.92   0.004    -4037.793    -786.555
       party |  -1275.197   1028.501    -1.24   0.216    -3298.202    747.8075
      senior |  -51.60711    21.7074    -2.38   0.018    -94.30437   -8.909856
      vote90 |  -41.40697   12.84263    -3.22   0.001    -66.66772   -16.14621
    distance |  -2913.713   617.6242    -4.72   0.000    -4128.545    -1698.88
       _cons |    5616.36   948.0951     5.92   0.000     3751.509    7481.211
-------------+----------------------------------------------------------------
         _se |   2395.664   175.3974          (Ancillary parameter)
------------------------------------------------------------------------------

  Obs. summary:        211  left-censored observations at contrib<=0
                       116     uncensored observations
                        20 right-censored observations at contrib>=5000

. predict yhat_tob
(option xb assumed; fitted values)

The Tobit estimation again uses the entire sample, but treats the censored observations properly. Observations at 0 or 5000 dollars enter the likelihood through the probability of being censored, rather than as exact values. Because the ll and ul options are given without values, Stata uses the minimum and maximum of contrib as the censoring points. The observation summary confirms that these are 0 and 5000. The coefficients have the same signs as in Model 1, but they are much larger in absolute value. They are between 2.8 and 4.7 times the OLS estimates for V19, V35, SENIOR and VOTE90, about twice as large for PARTY, and more than eight times as large for DISTANCE. V19 is now significant at the 5% level (p = 0.029). However, being a Republican candidate is no longer significant (PARTY, p = 0.216).
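For reference, the two-limit Tobit maximizes the standard textbook log-likelihood below. It assumes the latent model y* = x'β + ε with ε ~ N(0, σ²), censoring points 0 and 5000, and Φ and φ denoting the standard normal CDF and density. The _se row in the output is the estimate of σ. The second formula, obtained by differentiating E[y | x] under the same assumptions, is the effect of a regressor on the expected observed contribution. It equals the Tobit coefficient scaled by the probability of being uncensored.

```latex
% Two-limit Tobit log-likelihood (lower limit 0, upper limit 5000)
\ln L(\beta,\sigma) =
    \sum_{i:\, y_i = 0} \ln \Phi\!\left(\frac{0 - x_i'\beta}{\sigma}\right)
  + \sum_{i:\, 0 < y_i < 5000} \left[\ln \phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right) - \ln \sigma\right]
  + \sum_{i:\, y_i = 5000} \ln\!\left[1 - \Phi\!\left(\frac{5000 - x_i'\beta}{\sigma}\right)\right]

% Effect of regressor j on the expected observed contribution
\frac{\partial E[y_i \mid x_i]}{\partial x_{ij}}
  = \beta_j \left[\Phi\!\left(\frac{5000 - x_i'\beta}{\sigma}\right)
                - \Phi\!\left(\frac{0 - x_i'\beta}{\sigma}\right)\right]
```

Averaged over the sample, the bracketed probability is estimated by the share of uncensored observations, 116/347 ≈ 0.33. If the regressors were jointly normal, OLS on the censored data would estimate roughly β_j times that share, which implies an attenuation of about three. They are not jointly normal here, because V19, V35 and PARTY are binary. This is only a heuristic, but it is consistent with the Tobit coefficients being about 2.8 to 4.7 times the OLS ones for four of the six regressors.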
To compare the Tobit estimates with those of the censored and truncated OLS models, we generate fitted values of the dependent variable for different levels of ideological distance, holding all the other explanatory variables at their means. This requires some data manipulation for each model, as follows. (Note how miserable life was before Clarify existed!)

. * OLS CENSORED MODEL
. quietly reg contrib v19 v35 party senior vote90 distance

. * Creating holding variables
. gen v19h = v19
. gen v35h = v35
. gen partyh = party
. gen seniorh = senior
. gen vote90h = vote90
. gen distanceh = distance
. drop v19 v35 party senior vote90

. * Generate hypothetical mean values & predict
. egen v19 = mean(v19h)
. egen v35 = mean(v35h)
. egen party = mean(partyh)
. egen senior = mean(seniorh)
. egen vote90 = mean(vote90h)
. predict y1, xb

. * Drop the hypothetical mean values
. drop v19 v35 party senior vote90

. * Bring back real variable values
. gen v19 = v19h
. gen v35 = v35h
. gen party = partyh
. gen senior = seniorh
. gen vote90 = vote90h

. ** OLS TRUNCATED MODEL
. quietly reg contrib v19 v35 party senior vote90 distance if contrib > 0 & contrib < 5000

. * Substitute again for variable means & predict
. drop v19 v35 party senior vote90
. egen v19 = mean(v19h) if contrib > 0 & contrib < 5000
(231 missing values generated)
. egen v35 = mean(v35h) if contrib > 0 & contrib < 5000
(231 missing values generated)
. egen party = mean(partyh) if contrib > 0 & contrib < 5000
(231 missing values generated)
. egen senior = mean(seniorh) if contrib > 0 & contrib < 5000
(231 missing values generated)
. egen vote90 = mean(vote90h) if contrib > 0 & contrib < 5000
(231 missing values generated)
. predict y2, xb
(231 missing values generated)

. * Drop the hypothetical mean values
. drop v19 v35 party senior vote90

. * Bring back real variable values
. gen v19 = v19h
. gen v35 = v35h
. gen party = partyh
. gen senior = seniorh
. gen vote90 = vote90h

. ** TOBIT model
. quietly tobit contrib v19 v35 party senior vote90 distance, ll ul
. drop v19 v35 party senior vote90

. * Generate hypothetical mean values & predict
. egen v19 = mean(v19h)
. egen v35 = mean(v35h)
. egen party = mean(partyh)
. egen senior = mean(seniorh)
. egen vote90 = mean(vote90h)
. predict y3, xb

. * Drop the hypothetical mean values
. drop v19 v35 party senior vote90

. * Bring back real variable values
. gen v19 = v19h
. gen v35 = v35h
. gen party = partyh
. gen senior = seniorh
. gen vote90 = vote90h
. gen index=0

. graph y1 y2 y3 index distance, s(opd.) c(...l) key1(s(o) "censored yhat") key2(s(p) "truncated yhat") key3(s(d) "tobit yhat")

[Figure: censored yhat, truncated yhat and tobit yhat plotted against distance (.0036 to 4.23125), with a horizontal line at zero; vertical axis from -12638.6 to 2028.55]

As the graph above indicates, the Tobit estimates imply a much stronger negative effect of ideological distance than either the censored or the truncated OLS model. Because the other regressors are held at their means, each fitted line is a constant plus the DISTANCE coefficient times distance. Its slope is therefore simply that coefficient: -341.4 for censored OLS, -347.8 for truncated OLS and -2913.7 for Tobit. After tobit, predict with the xb option returns the linear index for the latent variable, not the expected observed contribution. This is why the Tobit line can fall far below zero even though observed contributions cannot. (The horizontal line in the graph marks the zero contribution level.)

. * MODEL 4. PROBIT ESTIMATION
. probit give v19 v35 party senior vote90 distance

Probit estimates                                  Number of obs   =        347
                                                  LR chi2(6)      =     209.50
                                                  Prob > chi2     =     0.0000
Log likelihood = -127.60213                       Pseudo R2       =     0.4508

------------------------------------------------------------------------------
        give |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         v19 |  -.6317307   .3198683    -1.97   0.048    -1.258661   -.0048003
         v35 |  -.9641854   .3737206    -2.58   0.010    -1.696664   -.2317066
       party |  -.1928536   .4614771    -0.42   0.676    -1.097332     .711625
      senior |   -.028079   .0111763    -2.51   0.012    -.0499841   -.0061738
      vote90 |  -.0167377   .0064355    -2.60   0.009    -.0293511   -.0041244
    distance |  -1.618576   .3011001    -5.38   0.000    -2.208721   -1.028431
       _cons |   2.713734   .4979086     5.45   0.000     1.737851    3.689617
------------------------------------------------------------------------------
note: 2 failures and 0 successes completely determined.

The probit estimation (Model 4) gives results roughly similar to the Tobit results of Model 3. PARTY remains insignificant, while the other variables still have a negative and significant effect on the probability of giving a contribution (GIVE). The two models agree in sign and in significance at the 5% level for every regressor, so the substantive conclusions of Models 3 and 4 do not differ.
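One rough way to quantify “roughly similar” relies on the fact that, if the Tobit model is correct, P(y > 0 | x) = Φ(x'β/σ). The probit coefficients should then estimate β/σ. The table below divides the Model 3 coefficients by the estimated σ (_se = 2395.664) and sets them beside the Model 4 probit coefficients. It assumes that GIVE equals 1 exactly when contrib > 0, which the handout implies but does not state.

```latex
% Tobit coefficients scaled by sigma-hat = 2395.664 vs. probit coefficients
\begin{array}{lrrr}
                & \hat\beta_{\text{Tobit}} & \hat\beta_{\text{Tobit}}/\hat\sigma & \hat\beta_{\text{Probit}} \\
\text{v19}      & -1441.746 & -0.602  & -0.632  \\
\text{v35}      & -2412.174 & -1.007  & -0.964  \\
\text{party}    & -1275.197 & -0.532  & -0.193  \\
\text{senior}   &   -51.607 & -0.0215 & -0.0281 \\
\text{vote90}   &   -41.407 & -0.0173 & -0.0167 \\
\text{distance} & -2913.713 & -1.216  & -1.619  \\
\text{\_cons}   &   5616.36 &  2.344  &  2.714
\end{array}
```

The scaled Tobit and probit coefficients are close for V19, V35 and VOTE90 and further apart for SENIOR, DISTANCE and the constant. They differ most for PARTY, which is imprecisely estimated in both models. This comparison is informal and is not a specification test.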