Districts Training Programme
Module 4 Practical 8

Practical 8: Tests of hypothesis

Objectives: By the end of this practical you should be able to:

- set up null and alternative hypotheses for a given problem involving a single measurement variable or two measurement variables
- construct, carry out and interpret results from a one-sample t-test
- construct, carry out and interpret results from a two-sample t-test
- interpret output from different software packages providing results for t-tests

1. A researcher working for the Ministry of Agriculture is interested in examining possible changes in the time to harvest of sorghum in a region where it has recently been introduced. Past studies suggest that, under similar conditions, the mean time to harvest for that variety is 105 days. To examine whether the mean time to harvest has changed, the researcher collects data from the fields of 15 farmers, giving the following results (in days):

122  105  98  99  116  114  112  105  116  94  108  96  94  113  111

Go into Stata and enter these data using the data editor.

(a) Set up appropriate null and alternative hypotheses for this problem, defining clearly any population parameters used.

(b) Use appropriate software to compute the mean, standard deviation and standard error of the mean. Note them down below.

(c) Now compute the t-statistic needed to test the null hypothesis specified in (a), and carry out the test. You will need to find the appropriate test using the Stata Statistics menu.

(d) Interpret the results of your t-test and write down your conclusions in a way that an official from the Ministry of Agriculture can understand.

2. Open the Stata file named unhs_hh&poverty.dta, which you used in the previous practical. One question of interest is whether the mean consumption expenditure per adult equivalent, available in the variable named welfare, differs between male-headed and female-headed households.
For this purpose, first look at box plots of welfare compared across male-headed and female-headed households (identified by the variable hsex) and sketch them below. Do you think the data in either group follow a normal distribution?

Consider an analysis based on the log of consumption expenditure per adult equivalent (available in the variable log_welf) and investigate the question posed at the start of this exercise. You should:

- state your null and alternative hypotheses;
- estimate the mean of log_welf for male-headed and female-headed households separately; and
- carry out a significance test to investigate whether or not there is a difference between the two groups in terms of their average expenditure.

Record your results below and write a short paragraph summarizing your findings. Remember to report your final results in terms of expenditure values rather than log values.

3. The data for this exercise come from a survey to investigate cardiovascular (heart) disease among bus drivers and conductors. Part of the data for 125 workers in the survey is shown in Appendix 1 at the end of this handout. Have a brief look at these data. Note that high values of the variables Serum Triglyceride (ST) and Systolic Blood Pressure (SBP) indicate risk of heart disease. Normal levels are <100 mg/dL for Serum Triglyceride and <120 mm Hg for Systolic Blood Pressure.

The data are available in Busdata.dta, but you will not need to access this data file for the work below. However, for interpreting the output below, you may like to note that the variable job has codes 1=driver, 2=conductor, while the variable smoking has codes 1=nonsmoker, 2=smoker.

(a) Given below is Stata output from a t-test of the null hypothesis H0: μ = 120 versus the alternative H1: μ ≠ 120, where μ is the population mean systolic blood pressure for conductors.
Here, the test uses the value 120 because this corresponds to the expected value for healthy men.

Stata Output(1):

. ttest systolic==120

One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean   Std. Err.   Std. Dev.    [95% Conf. Interval]
---------+--------------------------------------------------------------------
Systolic |      66    184.7576    10.34199    84.01869    164.1032    205.4119
------------------------------------------------------------------------------
    mean = mean(systolic)                                         t =   6.2616
Ho: mean = 120                                   degrees of freedom =       65

    Ha: mean < 120               Ha: mean != 120               Ha: mean > 120
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000

Interpret the results above and write a brief report which presents the key results and conclusions.

(b) Also produced below are results of a two-sample t-test to determine whether the mean systolic blood pressure of conductors varies according to their smoking status. The analysis for drivers is also given.

Stata Output(2) for conductors:

. ttest systolic, by(smoking)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean   Std. Err.   Std. Dev.    [95% Conf. Interval]
---------+--------------------------------------------------------------------
       1 |      52    175.6154    10.83936    78.16377    153.8545    197.3763
       2 |      14    218.7143    26.39169    98.74865    161.6985    275.7301
---------+--------------------------------------------------------------------
combined |      66    184.7576    10.34199    84.01869    164.1032    205.4119
---------+--------------------------------------------------------------------
    diff |            -43.0989    24.91893               -92.88018     6.68238
------------------------------------------------------------------------------
    diff = mean(1) - mean(2)                                      t =  -1.7296
Ho: diff = 0                                     degrees of freedom =       64

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0443         Pr(|T| > |t|) = 0.0885          Pr(T > t) = 0.9557

Stata Output(3) for drivers:

. ttest systolic, by(smoking)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean   Std. Err.   Std. Dev.    [95% Conf. Interval]
---------+--------------------------------------------------------------------
       1 |      40      223.45    13.96203    88.30366    195.2091    251.6909
       2 |      19    188.1053    16.06673    70.03324    154.3503    221.8602
---------+--------------------------------------------------------------------
combined |      59    212.0678     10.9256    83.92112    190.1978    233.9378
---------+--------------------------------------------------------------------
    diff |            35.34474    23.11743               -10.94711    81.63658
------------------------------------------------------------------------------
    diff = mean(1) - mean(2)                                      t =   1.5289
Ho: diff = 0                                     degrees of freedom =       57

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.9341         Pr(|T| > |t|) = 0.1318          Pr(T > t) = 0.0659

Again, interpret both sets of results above and write a brief report which presents the key results and conclusions.

(c) The first table below shows the percentage of drivers and conductors who are at risk from heart disease on the basis of their systolic blood pressure (≥120 mm Hg).
This is followed by the results of a test of whether the proportion at risk differs between drivers and conductors. Interpret the results below, then present and summarise your conclusions.

Stata Output(4):

. tab bprisk job, col

+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

Whether at |
   risk of |       Job type
     heart | (1=driver;2=conductor
disease on |           )
        BP |    driver  conductor |     Total
-----------+----------------------+----------
   no risk |         8         21 |        29
           |     13.56      31.82 |     23.20
-----------+----------------------+----------
   at risk |        51         45 |        96
           |     86.44      68.18 |     76.80
-----------+----------------------+----------
     Total |        59         66 |       125
           |    100.00     100.00 |    100.00

. prtesti 59 0.8644 66 0.6818

Two-sample test of proportion                      x: Number of obs =       59
                                                   y: Number of obs =       66
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.        z   P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |      .8644    .0445719                     .7770407    .9517593
           y |      .6818    .0573333                     .5694289    .7941711
-------------+----------------------------------------------------------------
        diff |      .1826    .0726206                     .0402662    .3249338
             |  under Ho:    .0756293     2.41   0.016
------------------------------------------------------------------------------
        diff = prop(x) - prop(y)                                  z =   2.4144
    Ho: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.9921         Pr(|Z| < |z|) = 0.0158          Pr(Z > z) = 0.0079

4. This final exercise is aimed at giving you further practice in setting up hypotheses, and in conducting and interpreting tests of the null hypothesis.
EITHER work on a data set of your own to compare means or proportions of one or more key responses of interest across groupings defined by another factor, and then interpret and report the results of your analyses;

OR use the file unhs_hh&poverty.dta to answer the questions posed below for your own district. Follow the steps used in the previous practical to select data for the district of your choice.

Questions to answer with respect to data from your selected district:

(a) Consider again the variable log_welf, i.e. the logarithm of the household's monthly consumption expenditure per adult equivalent, used as a proxy for the household's income. Investigate whether the mean of log_welf differs between households where the head has zero years of education and those where the head has more than zero years of education. You will need to start by specifying the null and alternative hypotheses, then conduct the test, interpret the results and report your conclusions.

(b) The variable hlitrate refers to whether or not the household head is literate. Compare the proportion of households with literate heads versus those where the household head is not literate. Do this separately for rural households and urban households. Then interpret your results and write a short summary of your conclusions.
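
When interpreting a two-sample test of proportions such as Stata Output(4) in exercise 3(c), it can help to see where the z statistic comes from. The following is a minimal sketch in Python and is not part of the original practical. The function name two_proportion_z is hypothetical, and the sketch assumes the test uses the pooled proportion to estimate the standard error under the null hypothesis, which is what the "under Ho" line of the Stata output appears to report. The counts are read from the cross-tabulation in Stata Output(4).

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Return (z, two-sided p-value) for H0: p1 = p2.

    Hypothetical helper for illustration. It uses the pooled
    proportion to estimate the standard error under H0.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se_under_h0 = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_under_h0
    # Two-sided tail area of the standard normal distribution:
    # P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided

# Counts at risk from the cross-tabulation in exercise 3(c):
# 51 of 59 drivers and 45 of 66 conductors.
z, p = two_proportion_z(51, 59, 45, 66)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

The printed values should be close to the z and P>|z| entries on the "under Ho" line of Stata Output(4). Small differences in later decimal places can arise because the Stata command was given proportions rounded to four decimal places rather than the raw counts.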