POLS 7012 Spring 2008
HYPOTHESIS TESTING

Topic: Confidence intervals and hypothesis testing for means and proportions. Concepts of sampling and random data generation.
STATA commands and features: ci, sample, ttest, prtest
Data set: wvs.dta, taken from the World Values Survey 1991.
More information: http://www.worldvaluessurvey.org/services/index.html
Readings: Alan Agresti and Barbara Finlay (1997). Statistical Methods for the Social Sciences, 3rd ed. Upper Saddle River, NJ: Prentice Hall. [Chapters 4-6]

1. HYPOTHESIS TESTING FOR MEANS

This week we look at the issues surrounding hypothesis testing and sampling. Much of the theory is covered in the ‘Statistical Methods for the Social Sciences’ lectures and in the textbooks, so these notes concentrate on the STATA commands.

When we generate a sample mean in STATA it is an unbiased estimate of μ, the population mean. But it will certainly not be a perfect estimate, since we are using a sample and not the entire population. If we took another sample we would expect its mean to be a little different; by chance, the mean we estimate is bound to be at least a little higher or a little lower than the population mean.

For the estimate of the mean to be of value, we must have some idea of how precise it is. That is, how close to the population mean is the sample mean likely to be? This is most commonly assessed by generating a confidence interval around the sample mean. Confidence intervals are calculated according to a particular method so that, under repeated sampling, the population parameter of interest (e.g. the mean) is contained in the confidence interval with a given probability. So, for instance, the population mean lies within the ‘95% confidence interval’ in 95% of random samples. This is not the same as saying that the particular interval computed from our one sample contains the population mean with probability .95, although this is a common misinterpretation.
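
The repeated-sampling interpretation can be checked by simulation. The do-file below is only a sketch under assumed values: the "population" proportion of 0.48, the sample size of 1000, the 500 replications and the program name onesample are all made up for illustration and are not taken from wvs.dta. It draws many random samples, computes an exact binomial 95% interval in each, and records how often the interval contains the assumed population value.

```stata
* Sketch only: 0.48, 1000 and 500 are assumed values, and onesample is a
* hypothetical helper name. Running this replaces the data in memory.
set seed 7012

capture program drop onesample
program define onesample, rclass
    * version 10.1 asks newer Stata releases to accept the older
    * "ci varname, binomial" syntax used in these notes
    version 10.1
    drop _all
    set obs 1000
    * one 0/1 draw per observation, equal to 1 with probability 0.48
    * (runiform() needs Stata 10.1 or later; uniform() is the older name)
    generate byte y = runiform() < 0.48
    ci y, binomial
    * 1 if this sample's interval contains the assumed population value
    return scalar covered = (r(lb) <= 0.48) & (r(ub) >= 0.48)
end

* repeat the sampling 500 times, keeping the 0/1 coverage indicator
simulate covered = r(covered), reps(500) nodots: onesample

* the mean of covered is the share of intervals containing 0.48
summarize covered
```

The mean of covered should come out close to .95 (the exact binomial interval tends to cover slightly more often than its nominal level). Each individual interval either contains 0.48 or it does not; the 95% describes the procedure across samples, not any single interval.
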
Most often people use either the 95% or 99% confidence interval. The process of generating confidence intervals for means in STATA is simple. We use the .ci (confidence interval) command:

. ci [varlist] [weight] [if exp] [in range] [, level(#) binomial poisson exposure(varname) total ]

We will use this command to find a confidence interval for the mean of trust (the variable which we created last week from peoptrust). First, recreate this variable:

Week 3 Page 1 of 6

. recode peoptrust (1=1 "Trusting") (2=0 "Not Trusting"), gen(trust)
. summarize trust

Using our new trust variable, results are as follows:

. ci trust, binomial

                                                         -- Binomial Exact --
    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
       trust |       3432    .4769814    .0085258         .460151    .4938509

The confidence interval estimated by STATA gives the range of plausible values for the mean in the population. Because the sample is large, the confidence interval is quite narrow. A rough rule of thumb is that the confidence interval is the mean plus or minus twice the standard error: here .477 plus or minus 2 x .0085 gives roughly .460 to .494, which agrees with the output. The larger our sample, the smaller our standard error, and therefore the narrower the confidence interval and the more precise our estimate of the population mean.

Note that we have included the binomial option on the .ci command. By doing so, the estimation takes into account the fact that the variable cannot be less than 0 or greater than 1.

EXERCISE 1
Examine the confidence intervals for the means of the other variables we were interested in last week: pride, lifesat, and decision (recode the national pride variable as we did last week, where 1 = proud and 0 = not proud, and remember to deal appropriately with missing data). Use the binomial option when necessary. Opinion polls often quote a sampling error of +/- 2 percentage points. Why are our estimates from the WVS somewhat more precise?

2. A NOTE ON SAMPLING

Having learned the .ci command, we should take a moment to look at the relationship between sample size and confidence intervals. You can use the .sample command to draw a random sample from the data in memory. We are currently using about 3500 cases. Use the .sample command to select a smaller random sample of anywhere between about 500 and 1000 cases, replacing # with the number of cases you want to keep:

. preserve
. sample #, count

Then check the mean and confidence interval for the trust variable again.

. ci trust, binomial

Note that as the sample size decreases, the confidence interval widens. The reason for this has to do with sampling theory, which is covered in the lectures. To return to using the full dataset, type:

. restore

EXERCISE 2
Experiment with the data to find what sort of sample size gives a confidence interval of around two percentage points either side.

3. COMPARISON OF MEANS

If we find that two groups have different means for a given variable, how do we know whether this is due to chance in the particular sample that we have drawn? What we want to know is whether the difference we observe is larger than we would expect from sampling variation alone if the two population means were equal. Typically, when a difference at least as large as the one in our sample would arise in fewer than five per cent of samples if the population difference were zero, we say there is a statistically significant difference in means. Notice that ‘statistical significance’ is not the same as substantive significance. Even very small differences can be statistically significant when the sample is large enough for us to be confident that they reflect a real difference in the population.

The .ttest command in STATA tests the difference in means. We will use it to test the difference in life satisfaction between Britain and France, so we have to exclude the USA and Sweden from the analysis using an if qualifier:

. ttest lifesat if nation < 3, by(nation)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
  France |     549    7.063752    .1854138    4.344383    6.699544    7.427961
 Britain |    1052    7.613118    .1044634    3.388223    7.408137    7.818098
---------+--------------------------------------------------------------------
combined |    1601    7.424735    .0937565    3.751431    7.240836    7.608633
---------+--------------------------------------------------------------------
    diff |           -.5493656    .1970979               -.9359629   -.1627682
------------------------------------------------------------------------------
    diff = mean(France) - mean(Britain)                           t =  -2.7873
Ho: diff = 0                                     degrees of freedom =     1599

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0027         Pr(|T| > |t|) = 0.0054          Pr(T > t) = 0.9973

Make sure you know which p-value to read from the output. Here we just want to know whether the two means are different; there is no directional component to the hypothesis, that is, we don’t think one should be higher than the other. So we read the middle p-value (Ha: diff != 0), which is 0.0054. A directional hypothesis, by contrast, would be that we expect life satisfaction in France to be higher than in Britain. Our example here shows that life satisfaction is higher in Britain by a statistically significant margin.

EXERCISE 3
Compare means for Britain and France for decision. Where do significant differences exist? Interpret the results.

Now compare means for Britain with Sweden and the USA in turn. Again, where do significant differences exist? Hint: either recode into two new variables, where Britain equals 1 and the other country you are comparing equals 0, or use if statements to restrict each test to the two countries being compared.

If you are using a binary variable, like our trust variable, then testing the difference in means might not be the best way to proceed.
This is because the ttest command relies on some distributional assumptions that may not be true for a 0/1 (binary or dummy) variable. Rather, you might consider testing the difference in proportions, using prtest. We will use it to examine the trust variable (which you will need to generate again if it was lost after the sampling exercise in section 2).

. recode peoptrust (1=1 "Trusting") (2=0 "Not Trusting"), gen(trust)
. prtest trust if nation<3, by(nation)

Two-sample test of proportion                 France: Number of obs =      549
                                             Britain: Number of obs =     1052
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      France |   .2276867    .017897                      .1926093    .2627641
     Britain |   .4448669   .0153217                       .414837    .4748968
-------------+----------------------------------------------------------------
        diff |  -.2171802   .0235596                     -.2633562   -.1710043
             |  under Ho:   .0254254    -8.54   0.000
------------------------------------------------------------------------------
        diff = prop(France) - prop(Britain)                       z =  -8.5419
    Ho: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.0000         Pr(|Z| < |z|) = 0.0000          Pr(Z > z) = 1.0000

The final line of the output gives the relevant p-value for each test defined by the alternative hypothesis Ha. A p-value is the probability of observing the sample we have, or one more extreme in a particular direction, assuming that the null hypothesis (Ho) is true. The null hypothesis is the same for all three tests: there is no difference between France and Britain in the level of trust. The middle test has an alternative hypothesis that the difference is not equal to zero. The p-value for this test is the probability, if the null hypothesis is true, of observing a sample difference between France and Britain that is as big as or bigger than the one we have observed in our sample, in either direction.
Since this probability is practically zero (it is displayed as 0.0000, so it is less than 0.00005), we can say that there is a statistically significant difference in the level of trust between France and Britain.

EXERCISE 4
Compare means for Britain and the other three countries for war (willingness to fight in a war if the country was invaded). Note that this variable may need recoding. Where do significant differences exist? Interpret the results.

Are there significant differences between men and women in their willingness to fight? Are these differences significant in all countries? (Hint: use if statements to test this in each country.) Interpret the results.

Stata Exercise:

1. According to authors like Putnam, education is positively associated with trust in other people. To what extent is this true for our WVS sample? Examine summary statistics for each education group in the dataset. Present one graph that summarises the relationship between education and trust. Is education also associated with life satisfaction? Present one graph that summarises the relationship between education and life satisfaction. Describe your results.

2. Create a new two-group education variable, justifying your selection of the groups, and test the hypothesis that the means/proportions of the trust and life satisfaction variables are different between the two educational groups. Present the results in one or two tables and provide a brief description.

3. There is a possibility that the relationship between (1) education and (2) social trust or life satisfaction varies from country to country. Using significance testing, compare the means or proportions for social trust and life satisfaction by educational group (our recoded binary variable from above) in each country. Present your results concisely and describe your findings in no more than 500 words.