Solution 2005 (postponed)

UNIVERSITY OF OSLO
DEPARTMENT OF ECONOMICS

Exam: ECON4135 - Applied statistics and econometrics, fall 2005, continuation exam
Date of exam: Monday, January 16, 2006
Time for exam: 2:30 p.m. – 5:30 p.m.
The problem set covers 4 pages
Resources allowed: All written and printed resources, as well as calculators, are allowed
Grades given: A (best), B, C, D, E and F, with E as the weakest passing grade.

Comments in arial font

Scientific journals constitute the medium of communication between scientists, and also the memory (storage) of science. The economics of (scientific) journals is interesting. Bergstrom¹ argues that journals owned by private publishers are grossly overpriced, and he recommends several actions to reduce the large profits made by these publishers. Bergstrom provides data to substantiate his case. There are 180 economic journals in his database, of which 16 are published by scholarly societies such as the American Economic Association. These 16 journals are published on a non-profit basis, as opposed to the remaining journals that have private publishers. We shall particularly be interested in the separation between society journals and privately published journals.

Consider the variables:

P: Library subscription price for the journal per year (USD).
Y: Number of libraries subscribing to the journal.
C: Total number of times papers in the journal were cited in 1998.
A: Age of the journal (years).
N: Number of pages in the journal in 1998.
S: Binary variable (dummy); 1 if non-profit (scholarly society), 0 otherwise.

1. Figure 1 shows dummy S for society journal plotted against age A. Would you think that age of journals is normally distributed within the two groups of journals? Explain what it would mean that A and S are stochastically independent. Would $E(A \mid S = 1) = E(A \mid S = 0)$ if A and S are independent?

No, the distribution seems skewed, with a long tail towards high ages, particularly for privately published journals.
Yes, since the conditional densities of A given $S = s$ for the two values of s are equal (and equal the marginal density), the conditional expected values must be equal.

¹ Bergstrom, T.C. 2000. Free Labor for Costly Journals? Journal of Economic Perspectives. 15: 183-198.

2. To estimate the mean journal age in the two groups one could consider the regression presented in R1 below. What is the estimated mean age for privately owned journals, and what is it for society journals? It is of interest to estimate the difference in age distribution between the two groups. What is the p-value for testing the null hypothesis $H_0: E(A \mid S = 1) = E(A \mid S = 0)$, versus a two-sided alternative? What would the p-value be for testing $H_0$ versus the one-sided alternative $H_1: E(A \mid S = 1) > E(A \mid S = 0)$? Can you find a 95% confidence interval for the difference in mean age?

Let $\mu_s = E(A \mid S = s)$. Then $\hat{\mu}_0 = 33.98$ years, and $\hat{\mu}_1 = 33.98 + 12.52 = 46.50$ years. The two-sided p-value is 0.032, and the one-sided is half of that. The 95% confidence interval for $\mu_1 - \mu_0$ is $(1.12, 23.91)$ years.

3. Regression R2 is similar to R1, but now the response variable is $LA = \log(A)$. Histograms are shown in Figure 2. Calculate estimated mean log age in the two groups of journals. Do your results agree with those in point 2? If you now want to test the independence between A and S you might test $H_0: E(LA \mid S = 1) = E(LA \mid S = 0)$ versus a two-sided alternative. What is the p-value? Why is it different from what you found in point 2? Would you prefer to compare age between the two groups on the arithmetic scale (age in years) or on the logarithmic scale?

The estimated mean log ages are 3.3149 = log(27.52) for private and 3.7223 = log(41.36) for society journals (in log year units).
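The group means quoted in points 2 and 3 follow directly from the intercept and dummy coefficient of R1 and R2. A minimal Python sketch of that arithmetic is given here; the constant and function names are illustrative assumptions, and the coefficients are copied from the Stata output rather than re-estimated from data.

```python
import math

# Coefficients copied from the Stata tables R1 (age in years) and R2 (log age).
R1_CONS, R1_S = 33.98171, 12.51829
R2_CONS, R2_S = 3.314908, 0.4074081


def group_means(cons, slope):
    """Fitted means for S = 0 (private) and S = 1 (society) in a dummy regression."""
    return cons, cons + slope


mean_age_private, mean_age_society = group_means(R1_CONS, R1_S)
mean_log_private, mean_log_society = group_means(R2_CONS, R2_S)

# Back-transforming a mean log age gives a centre on the year scale
# (a geometric-mean-type quantity, not the arithmetic mean).
geo_private = math.exp(mean_log_private)
geo_society = math.exp(mean_log_society)

print(f"mean age:      private {mean_age_private:.2f}, society {mean_age_society:.2f}")
print(f"mean log age:  private {mean_log_private:.4f}, society {mean_log_society:.4f}")
print(f"exp(mean log): private {geo_private:.2f}, society {geo_society:.2f}")
```

In both groups the back-transformed mean log age comes out below the arithmetic mean age.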
The two sets of results do not quite agree: the estimated centre of the distribution on the arithmetic scale is systematically higher than that obtained from the mean log age, because log(a) is a concave function of a, and thus $E(\log(A)) \le \log(E(A))$ by Jensen's inequality. The two-sided p-value is now 0.003. I would rather use the logarithmic scale because the distribution is more symmetric on that scale (Figure 2), and the mean is thus a more meaningful measure of the distribution centre. Also, the two conditional distributions separate better on the log scale, as indicated by the smaller two-sided p-value.

4. If now the issue is to find determinants for what makes a journal society published rather than privately published, logistic regression might be useful. Regression R3 shows the result of fitting the equation $P(S = 1) = F(\beta_0 + \beta_1 LA)$, where $F(y) = 1/(1 + e^{-y})$ is the cumulative logistic distribution function. Explain why the estimated probability of a one year old journal being society published is 0.00275. What is the estimated probability of a hundred year old journal being society published? What is the age which would make the probability about ½ for the journal being society published? Note that natural logarithms are used.

For $A = 1$, $LA = 0$ and $P(S = 1) = F(\beta_0)$, which is estimated as $1/(1 + e^{-(-5.8916)}) = 0.00275$. For a hundred year old journal, $\log(100) = 4.605$. Therefore, the estimated probability is $1/(1 + e^{-(-5.891597 + 1.01383 \cdot 4.60517)}) = 0.2275$. To get the estimated probability equal to ½, the exponent must be 0. That is, $\log(A) = 5.891597/1.01383 = 5.8112$ and $A = 334.0$ years, which is well outside the range of the data.

5. A more complex logistic regression is shown in R4, where $LY = \log(Y)$ and $LN = \log(N)$ etc. How would you explain to your fellow economist who has never heard of logistic regression what these results mean? Are the data markedly better fitted by this regression than by R3?
I would only try to explain what we can read out from the signs of the estimated regression coefficients. Everything else being constant, if the age increases, the probability is reduced; if the number of subscribing libraries increases, it is also reduced; but if the number of pages increases, the probability increases, as it does when the number of citations increases. The most important determinant is the price: ceteris paribus, if the price increases, the probability of the journal being non-profit decreases, as expected. In R3, LA is clearly significant, but not in model R4, presumably since LA, LY, LN and LC are quite strongly correlated. None of these has a significant logistic regression effect on its own, but as a collective they are strongly significant with p-value 0.0058. Model R4 certainly fits the data better, with the log likelihood increased by 13 units on 4 extra parameters. The overall significance of the set of covariates is increased, with the p-value decreased from 0.0144 to 0.0000.

6. Let the odds of a journal being society published be $O = P(S = 1)/P(S = 0)$. Show that $\log(O) = \beta_0 + \beta_1 LA + \beta_2 LY + \beta_3 LN + \beta_4 LC + \beta_5 LP$ when $P(S = 1) = F(\beta_0 + \beta_1 LA + \beta_2 LY + \beta_3 LN + \beta_4 LC + \beta_5 LP)$. Could you interpret $\beta_5$ as the price elasticity of the odds for a journal being society published? What is the 95% confidence interval for the price elasticity on the odds?

$\log\left( \frac{1/(1 + e^{-y})}{1 - 1/(1 + e^{-y})} \right) = \log(e^{y}) = y$. Thus the expression for $\log(O)$, and $\frac{d}{dP}\log(O) = \beta_5 \frac{d}{dP}\log(P)$. $\beta_5$ is therefore the price elasticity of the odds for being society- rather than privately published. The 95% confidence interval is given in R4, and is $(-2.27, -0.61)$.

7. Explain, with your statistically ignorant fellow economist in mind, what is meant by the interval having degree of confidence 95%. Our sample of 180 economics journals is about the total population of English language journals in the field of economics.
The sample can therefore not rightly be thought of as a random sample from a big population. Is this a problem for your interpretation of the confidence interval?

If the experiment was repeated over again many times, and if the same method was used to calculate the 95% interval, it would cover the true value in 95% of the replicates in the long run. But what should the experiment be (my dear friend)? The history of the field of economics has had its realization so far, with its 180 journals in the English language. You can probably not envisage the development being repeated independently a large number of times. But you might think hypothetically. If the logistic model is true, in the hypothetical sense that age, subscription etc. developed as they did, but that for each journal a coin is flipped to determine whether it should be an $S = 1$ or an $S = 0$ journal, with success probability determined by the logistic equation, then replicated data from the assumed model can be simulated. And if simulated a good many times, the 95% intervals calculated by the method will, if Stata is right, cover the true value of the parameter in about 95% of the replications. From this explanation, it is really no problem that the sample is the complete population of economics journals in the English language.

R1

Regression with robust standard errors          Number of obs =     180
                                                F(  1,   178) =    4.70
                                                Prob > F      =  0.0315
                                                R-squared     =  0.0193
                                                Root MSE      =  25.534
------------------------------------------------------------------------------
             |               Robust
       Age A |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   Society S |   12.51829   5.774714     2.17   0.032     1.122582      23.914
       _cons |   33.98171    2.02106    16.81   0.000     29.99339    37.97003
------------------------------------------------------------------------------

R2

Regression with robust standard errors          Number of obs =     180
                                                F(  1,   178) =    9.05
                                                Prob > F      =  0.0030
                                                R-squared     =  0.0335
                                                Root MSE      =   .6261
------------------------------------------------------------------------------
             |               Robust
  Log age LA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   .4074081   .1354304     3.01   0.003     .1401523    .6746639
       _cons |   3.314908   .0497229    66.67   0.000     3.216786    3.413031
------------------------------------------------------------------------------

R3

Logit estimates                                 Number of obs =     180
                                                LR chi2(1)    =    5.98
                                                Prob > chi2   =  0.0144
Log likelihood = -51.000589                     Pseudo R2     =  0.0554
------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          LA |    1.01383   .4217932     2.40   0.016     .1871301    1.840529
       _cons |  -5.891597   1.571597    -3.75   0.000     -8.97187   -2.811323
------------------------------------------------------------------------------

R4

Logit estimates                                 Number of obs =     180
                                                LR chi2(5)    =   32.11
                                                Prob > chi2   =  0.0000
Log likelihood = -37.935238                     Pseudo R2     =  0.2974
------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          LA |  -.0690083   .5582708    -0.12   0.902    -1.163199    1.025182
          LY |  -.4324953   .4960825    -0.87   0.383    -1.404799    .5398084
          LN |   1.825738   .9343176     1.95   0.051    -.0054905    3.656967
          LC |   .6336932   .4133132     1.53   0.125    -.1763857    1.443772
          LP |  -1.438341   .4243647    -3.39   0.001    -2.270081   -.6066017
       _cons |  -8.368748   5.135016    -1.63   0.103    -18.43319    1.695699
------------------------------------------------------------------------------

. test LC LN LA LY

 ( 1)  LC = 0
 ( 2)  LN = 0
 ( 3)  LA = 0
 ( 4)  LY = 0

           chi2(  4) =   14.54
         Prob > chi2 =    0.0058

Figure 1. Dummy S for society journal plotted against age A.

Figure 2. Histogram of log age by journal type (privately published to the left).
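The logistic calculations in points 4 and 5 can be reproduced from the printed R3 and R4 output. The sketch below is a hedged illustration only: the function and variable names are assumptions, the coefficients are copied from the tables, and the likelihood-ratio comparison of R4 against R3 is not reported in the solution itself but is computed here from the two log likelihoods. It uses the closed form of the chi-square upper tail with 4 degrees of freedom, $e^{-x/2}(1 + x/2)$.

```python
import math

# R3 coefficients (logit of S on LA = log(A)), copied from the Stata output.
B0, B1 = -5.891597, 1.01383
# Log likelihoods of R3 and R4, and the Wald statistic from `test LC LN LA LY`.
LL_R3, LL_R4 = -51.000589, -37.935238
WALD_CHI2_4DF = 14.54


def logistic(y):
    """Cumulative logistic distribution function F(y) = 1 / (1 + exp(-y))."""
    return 1.0 / (1.0 + math.exp(-y))


def prob_society(age):
    """Estimated P(S = 1) under R3 for a journal of the given age in years."""
    return logistic(B0 + B1 * math.log(age))


def chi2_sf_4df(x):
    """Upper tail probability of a chi-square variable with 4 degrees of freedom."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


p_age_1 = prob_society(1)
p_age_100 = prob_society(100)
age_half = math.exp(-B0 / B1)  # age at which the linear predictor is 0

# Likelihood-ratio statistic for the 4 extra covariates in R4 (illustrative).
lr_stat = 2.0 * (LL_R4 - LL_R3)
lr_pvalue = chi2_sf_4df(lr_stat)
wald_pvalue = chi2_sf_4df(WALD_CHI2_4DF)

print(f"P(S=1 | A=1)   = {p_age_1:.5f}")
print(f"P(S=1 | A=100) = {p_age_100:.4f}")
print(f"age with P = 1/2: {age_half:.1f} years")
print(f"LR statistic = {lr_stat:.2f}, p-value = {lr_pvalue:.2g}")
print(f"Wald p-value = {wald_pvalue:.4f}")
```

Evaluated with the coefficients as printed in R3, this gives probabilities of roughly 0.00275 and 0.227 and a break-even age of about 334 years. The Wald p-value matches the 0.0058 reported by Stata, and the likelihood-ratio statistic of about 26.1 equals the difference between the two LR chi2 values in R4 and R3 (32.11 and 5.98).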
