Module 4 Practical 8
Practical 8
Tests of hypothesis
Objectives:
By the end of this practical you should be able to:

- set up a null and alternative hypothesis for a given problem involving a single measurement
  variable, or two measurement variables
- construct, carry out and interpret results from a one-sample t-test
- construct, carry out and interpret results from a two-sample t-test
- interpret output from different software packages providing results for t-tests

1. A researcher working for the Ministry of Agriculture is interested in examining possible
changes in the time to harvest of sorghum in a region where it has been recently introduced.
Past studies suggest that, under similar conditions, the mean time to harvest for that variety
is 105 days.
To examine whether the mean time to harvest has changed, the researcher collects data from the
fields of 15 farmers, giving the following results:
122  105   98   99  116
114  112  105  116   94
108   96   94  113  111
Go into Stata and enter these data using the data editor.
(a) Set up appropriate null and alternative hypotheses for this problem, defining clearly any
population parameters used.
(b) Use appropriate software to compute the mean, standard deviation and standard error of the
mean. Note them down below.
Districts Training Programme
(c) Now compute the t-statistic needed to test the null hypothesis specified in (a), and carry out
the test. You will need to find the appropriate test using the Stata Statistics menu.
(d) Interpret the results of your t-test and write down your conclusions in a way that an official
from the Ministry of Agriculture can understand.
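The arithmetic behind parts (b) and (c) can be cross-checked outside Stata. The following is a
minimal Python sketch, using only the standard library, that computes the summary statistics and
the one-sample t-statistic for the 15 harvest times against the historical value of 105 days.
The variable and function names are illustrative and not part of the practical. The p-value is
still best read from the Stata output, since the Python standard library has no t distribution.

```python
import math
import statistics

# Time to harvest (days) for the 15 farmers, as listed in the question.
harvest_days = [122, 105, 98, 99, 116, 114, 112, 105, 116, 94,
                108, 96, 94, 113, 111]

def one_sample_t(values, mu0):
    """Return (mean, sd, se, t, df) for a one-sample t-test of H0: mean = mu0."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # sample SD (divisor n - 1)
    se = sd / math.sqrt(n)             # standard error of the mean
    t = (mean - mu0) / se
    return mean, sd, se, t, n - 1

mean, sd, se, t, df = one_sample_t(harvest_days, 105)
print(f"mean={mean:.4f} sd={sd:.4f} se={se:.4f} t={t:.4f} df={df}")
```

The figures printed here should agree with those you obtain from Stata in parts (b) and (c).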
2. Open the Stata file named unhs_hh&poverty.dta, which you used in the previous practical.
One question of interest is to investigate whether the mean consumption expenditure per adult
equivalent, available in the variable named welfare, differs across male headed and female headed
households.
For this purpose, first look at box plots of welfare for male headed and female headed
households (variable hsex) and sketch them below. Do you think the data in either group
follow a normal distribution?
Consider an analysis based on the log of consumption expenditure per adult equivalent (available
in variable log_welf) and use it to investigate the question posed at the start of this exercise.
You should:
- state your null and alternative hypotheses;
- estimate the mean of log_welf for male and female headed households separately; and
- carry out a significance test to investigate whether or not there is a difference between the
  two groups in terms of their average expenditure.
Record your results below; and write a short paragraph summarizing your findings. Remember
to report your final results in terms of expenditure values rather than log values.
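Reporting in expenditure terms means undoing the log transformation. A minimal sketch of that
step follows, assuming log_welf is the natural logarithm of welfare (the practical does not state
the base) and using made-up placeholder numbers rather than results from the survey.
Exponentiating a mean of logs gives a geometric mean, and exponentiating a difference in log
means gives a ratio of geometric means. The function name and its arguments are hypothetical.

```python
import math

def back_transform(mean_log_a, mean_log_b, diff_ci_low, diff_ci_high):
    """Convert log-scale t-test results to the original expenditure scale.

    Assumes natural logs. diff_ci_low/high bound (mean_log_a - mean_log_b).
    """
    return {
        "geo_mean_a": math.exp(mean_log_a),
        "geo_mean_b": math.exp(mean_log_b),
        "ratio_a_to_b": math.exp(mean_log_a - mean_log_b),
        "ratio_ci": (math.exp(diff_ci_low), math.exp(diff_ci_high)),
    }

# Placeholder values for illustration only; substitute your own Stata output.
result = back_transform(10.0, 9.5, 0.4, 0.6)
print(result)
```

A ratio of about 1.65, for example, would be reported as geometric mean expenditure being
roughly 65% higher in group a than in group b.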
3. The data for this exercise come from a survey to investigate cardiovascular (heart) disease
among bus drivers and conductors. Part of the data for 125 workers in the survey is shown in
Appendix 1 at the end of this handout. Have a brief look at these data. Note that high values of
the variables Serum Triglyceride (ST) and Systolic Blood Pressure (SBP) indicate risk of
heart disease. Normal levels are <100 mg/dL for Serum Triglyceride and <120 mm Hg for
Systolic Blood Pressure. The data are available in Busdata.dta, but you will not need to access
this data file for the work below. However, for interpreting the output below, you may like to
note that the variable job has codes 1=driver, 2=conductor, while the variable smoking has codes
1=non-smoker, 2=smoker.
(a) Given below is Stata output from a t-test of the null hypothesis H0: μ = 120 versus the
alternative H1: μ ≠ 120, where μ is the population mean systolic blood pressure for
conductors. The test uses the value 120 because this corresponds to the expected value for
healthy men.
Stata Output(1):

. ttest systolic==120

One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
Systolic |      66    184.7576    10.34199    84.01869    164.1032    205.4119
------------------------------------------------------------------------------
    mean = mean(systolic)                                         t =   6.2616
Ho: mean = 120                                   degrees of freedom =       65

    Ha: mean < 120               Ha: mean != 120               Ha: mean > 120
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000

Interpret the results above and write a brief report which presents the key results and
conclusions.
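When reading output such as Stata Output(1), it can help to confirm how the standard error and
t-statistic follow from the summary figures. The sketch below is one way to do that in Python
with the standard library only; the function name is illustrative, and the inputs are copied
from the output above.

```python
import math

def t_from_summary(n, mean, sd, mu0):
    """One-sample t-statistic from summary figures: returns (se, t, df)."""
    se = sd / math.sqrt(n)             # standard error of the mean
    return se, (mean - mu0) / se, n - 1

# Figures copied from Stata Output(1).
se, t, df = t_from_summary(n=66, mean=184.7576, sd=84.01869, mu0=120)
print(f"se={se:.5f} t={t:.4f} df={df}")
```

The printed values should match the Std. Err., t and degrees of freedom reported by Stata, up to
rounding.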
(b) Also produced below are results of a two-sample t-test to determine whether the mean
systolic blood pressure of conductors varies according to their smoking status. The analysis for
drivers is also given.
Stata Output(2) for conductors:

. ttest systolic, by(smoking)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       1 |      52    175.6154    10.83936    78.16377    153.8545    197.3763
       2 |      14    218.7143    26.39169    98.74865    161.6985    275.7301
---------+--------------------------------------------------------------------
combined |      66    184.7576    10.34199    84.01869    164.1032    205.4119
---------+--------------------------------------------------------------------
    diff |            -43.0989    24.91893               -92.88018     6.68238
------------------------------------------------------------------------------
    diff = mean(1) - mean(2)                                      t =  -1.7296
Ho: diff = 0                                     degrees of freedom =       64

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0443         Pr(|T| > |t|) = 0.0885          Pr(T > t) = 0.9557

Stata Output(3) for drivers:

. ttest systolic, by(smoking)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       1 |      40      223.45    13.96203    88.30366    195.2091    251.6909
       2 |      19    188.1053    16.06673    70.03324    154.3503    221.8602
---------+--------------------------------------------------------------------
combined |      59    212.0678     10.9256    83.92112    190.1978    233.9378
---------+--------------------------------------------------------------------
    diff |            35.34474    23.11743               -10.94711    81.63658
------------------------------------------------------------------------------
    diff = mean(1) - mean(2)                                      t =   1.5289
Ho: diff = 0                                     degrees of freedom =       57

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.9341         Pr(|T| > |t|) = 0.1318          Pr(T > t) = 0.0659

Again, interpret both sets of results above and write a brief report which presents the key results
and conclusions.
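The equal-variance t-statistics in Outputs (2) and (3) can be reproduced from the group sizes,
means and standard deviations alone. The following Python sketch (standard library only, with an
illustrative function name) pools the two group variances, forms the standard error of the
difference and divides. Group 1 is non-smokers and group 2 is smokers, following the coding of
the smoking variable given earlier.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variance two-sample t-test from summary figures.

    Returns (difference in means, its standard error, t, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff, se_diff, diff / se_diff, df

# Figures copied from Stata Output(2) (conductors) and Output(3) (drivers).
conductors = pooled_t(52, 175.6154, 78.16377, 14, 218.7143, 98.74865)
drivers = pooled_t(40, 223.45, 88.30366, 19, 188.1053, 70.03324)
print(conductors)
print(drivers)
```

Note that the sign of t depends only on which group is subtracted from which; here, as in the
Stata output, the difference is mean(1) - mean(2).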
(c) The first table below shows the percentage of drivers and conductors who are at risk of
heart disease on the basis of their systolic blood pressure (120 mm Hg or above). This is
followed by the results of a test comparing whether the proportion at risk differs between
drivers and conductors. Interpret the results below, then present and summarise your conclusions.
Stata Output(4):

. tab bprisk job, col

+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

Whether at |
   risk of |       Job type
     heart | (1=driver;2=conductor
disease on |           )
        BP |    driver  conductor |     Total
-----------+----------------------+----------
   no risk |         8         21 |        29
           |     13.56      31.82 |     23.20
-----------+----------------------+----------
   at risk |        51         45 |        96
           |     86.44      68.18 |     76.80
-----------+----------------------+----------
     Total |        59         66 |       125
           |    100.00     100.00 |    100.00

. prtesti 59 0.8644 66 0.6818

Two-sample test of proportion                      x: Number of obs =       59
                                                   y: Number of obs =       66
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |      .8644   .0445719                      .7770407    .9517593
           y |      .6818   .0573333                      .5694289    .7941711
-------------+----------------------------------------------------------------
        diff |      .1826   .0726206                      .0402662    .3249338
             |  under Ho:   .0756293     2.41   0.016
------------------------------------------------------------------------------
        diff = prop(x) - prop(y)                                  z =   2.4144
    Ho: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.9921         Pr(|Z| < |z|) = 0.0158          Pr(Z > z) = 0.0079

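The "under Ho" standard error and the z-statistic in the proportion test come from pooling the
two samples, which is what distinguishes them from the standard error used for the confidence
interval. A minimal Python sketch of that calculation follows, using the standard library only;
the function name is illustrative, x denotes drivers and y denotes conductors, as in the
prtesti command above.

```python
import math
from statistics import NormalDist

def two_proportion_z(n1, p1, n2, p2):
    """Two-sample z-test of H0: p1 = p2 using the pooled standard error."""
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    se0 = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se0
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return se0, z, p_two_sided

# Proportions at risk copied from Stata Output(4): x = drivers, y = conductors.
se0, z, p_value = two_proportion_z(59, 0.8644, 66, 0.6818)
print(f"se0={se0:.7f} z={z:.4f} p={p_value:.4f}")
```

The two-sided p-value printed here corresponds to the middle alternative (Ha: diff != 0) in the
Stata output.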
4. This final exercise is aimed at giving you further practice in setting up hypotheses, and in
conducting and interpreting tests of the null hypothesis.
EITHER work on a data set of your own to compare means or proportions of one or more
key responses of interest across groupings defined by another factor, and then interpret and
report the results of your analyses;
OR use the file unhs_hh&poverty.dta to answer the questions posed below for your own district.
Follow the steps used in the previous practical to select data for the district of your choice.
Questions to answer with respect to data from your selected district:
(a) Consider again the variable log_welf, i.e. the logarithm of the household’s monthly
consumption expenditure per adult equivalent, used as a proxy for the household’s income.
Investigate whether the mean of log_welf differs between households where the head has zero
years of education and those where the head has more than zero years of education. You will need
to start by specifying the null and alternative hypotheses, then conduct the test, interpret the
results and report your conclusions.
(b) The variable hlitrate refers to whether or not the household head is literate. Compare the
proportion of households with literate heads versus those where the household head is not
literate. Do this separately for rural households and urban households. Then interpret your
results and write a short summary of your conclusions.