Biostatistics course
Part 8
Inferences of a mean
Dr. Sc Nicolas Padilla Raygoza
Department of Nursing and Obstetrics
Division of Health Sciences and Engineering
Campus Celaya Salvatierra
University of Guanajuato Mexico
Biosketch
• Medical Doctor, Autonomous University of Guadalajara.
• Pediatrician certified by the Mexican Council of Certification in Pediatrics.
• Postgraduate Diploma in Epidemiology, London School of Hygiene and Tropical Medicine, University of London.
• Master of Science in Epidemiology, Atlantic International University.
• Doctor of Science in Epidemiology, Atlantic International University.
• Titular Professor A, full time, University of Guanajuato.
• Level 1, National System of Researchers.
• padillawarm@gmail.com
Competencies
• The reader will apply a Z test to make inferences about a mean.
• The reader will obtain a confidence interval for a mean.
• The reader will apply a t test for a mean in a small sample.
• The reader will obtain a confidence interval for a mean in a small sample.
Introduction
• If we measure the stature of the students of FEOC, we can obtain its mean and standard deviation:
Number of students: 269
Mean stature: 161.6 cm
Standard deviation: 6.3 cm
Median: 159 cm
Range: 149 to 185 cm
Notation
• For population parameters we use Greek letters; for sample statistics we use Roman letters.
Parameter            Population   Sample
Mean                 μ            X̄
Standard deviation   σ            s
Sampling distribution
• If we take many samples of the same size from the same population, each sample can have a different mean and standard deviation.
• If we plot these sample means, we obtain a sampling distribution.
• If the sample size is large, the distribution of the sample means is approximately Normal, even when the distribution of the data in the population is not Normal.
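The behaviour described above can be checked with a small simulation. The sketch below is only an illustration: it assumes a skewed (exponential) population with mean 1 and standard deviation 1, which is not the FEOC data, and draws 1000 samples of 269 values each. The seed and the constant names are arbitrary choices.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

SAMPLE_SIZE = 269   # same n as the FEOC sample
NUM_SAMPLES = 1000  # number of repeated samples

def draw_sample_mean(n):
    """Mean of n draws from an assumed exponential population (mean 1, SD 1)."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

sample_means = [draw_sample_mean(SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_se = 1.0 / math.sqrt(SAMPLE_SIZE)  # population SD divided by sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"SD of sample means:   {sd_of_means:.3f}")
print(f"sigma / sqrt(n):      {expected_se:.3f}")
```

Although the population is strongly skewed, the sample means cluster symmetrically around the population mean, and their spread is close to the population standard deviation divided by the square root of n.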
Sampling distribution (contd…)
Stature (cm)   n     %       Cumulative %
149            2     0.7     0.7
150            3     1.1     1.8
152            6     2.2     4.0
154            12    4.5     8.5
155            27    10.0    18.5
157            29    10.8    29.3
158            26    9.7     39.0
159            33    12.3    51.3
163            37    13.8    65.1
164            16    5.9     71.0
165            24    8.9     79.9
168            18    6.7     86.6
169            14    5.2     91.8
171            6     2.2     94.0
174            7     2.6     96.6
175            1     0.4     97.0
177            4     1.5     98.5
179            2     0.7     99.2
184            1     0.4     99.6
185            1     0.4     100.0
Total          269   100.0
Data of students from FEOC. If we take another 999 samples of students, we can plot the distribution of their means.
[Figure: Sampling distribution of 1000 samples, n = 269. Horizontal axis: means of stature (cm); vertical axis: frequency.]
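The summary statistics quoted in the Introduction can be recomputed from the FEOC frequency table. The following is a minimal sketch using only the Python standard library; the variable names are illustrative.

```python
import statistics

# (stature in cm, number of students) taken from the FEOC frequency table
FREQUENCIES = [
    (149, 2), (150, 3), (152, 6), (154, 12), (155, 27),
    (157, 29), (158, 26), (159, 33), (163, 37), (164, 16),
    (165, 24), (168, 18), (169, 14), (171, 6), (174, 7),
    (175, 1), (177, 4), (179, 2), (184, 1), (185, 1),
]

# Expand the table into one value per student
statures = [height for height, count in FREQUENCIES for _ in range(count)]

n = len(statures)
mean = statistics.mean(statures)
sd = statistics.stdev(statures)       # sample SD (divides by n - 1)
median = statistics.median(statures)

print(f"n = {n}, mean = {mean:.1f}, SD = {sd:.1f}, median = {median}")
print(f"range = {min(statures)} to {max(statures)}")
```

The results agree with the values reported for the sample: 269 students, a mean of 161.6 cm, a standard deviation of about 6.3 cm, a median of 159 cm and a range of 149 to 185 cm.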
95% confidence intervals
• Confidence intervals use probability theory to draw conclusions about a population from data obtained from a sample.
• It is difficult to study an entire population; because of this, we study samples.
• Methods for obtaining estimates and testing hypotheses are important for making inferences.
95% confidence intervals (contd…)
• The 95% confidence interval for a mean is calculated as:
$$\bar{X} \pm 1.96 \times SE$$
where $\bar{X}$ is the estimate obtained from the sample, 1.96 is the multiplier of the standard error for 95% confidence, and SE is the standard error.
• We expect the 95% confidence interval around the sample mean to include the population mean 95% of the time, if we were to draw thousands of samples.
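The "95% of the time" interpretation can be illustrated by simulation. The sketch below assumes, purely for illustration, a Normal population whose true mean and standard deviation equal the FEOC sample values (161.6 and 6.3); it then draws 2000 samples of 269 and counts how often the interval contains the true mean.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

POP_MEAN, POP_SD = 161.6, 6.3  # assumed "true" population values for the simulation
N, NUM_SAMPLES = 269, 2000

covered = 0
for _ in range(NUM_SAMPLES):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    x_bar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(N)
    # Does this sample's 95% CI contain the population mean?
    if x_bar - 1.96 * se <= POP_MEAN <= x_bar + 1.96 * se:
        covered += 1

coverage = covered / NUM_SAMPLES
print(f"intervals containing the population mean: {coverage:.1%}")
```

The proportion of intervals that contain the population mean should come out close to 95%, with small variations from run to run if the seed is changed.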
95% confidence intervals (contd…)
• We calculate the 95% confidence interval for the first sample of 269 students from FEOC:
$\bar{X} = 161.6$
$SE = 6.3/\sqrt{269} = 0.38$
$95\%\ CI = 161.6 \pm 1.96 \times 0.38 = 161.6 \pm 0.74$, that is, 160.86 to 162.34
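The same calculation can be written as a short function. This is a minimal sketch; the function name and its signature are illustrative choices, not part of the course material.

```python
import math

def confidence_interval(mean, sd, n, multiplier=1.96):
    """Return (SE, lower limit, upper limit) of a confidence interval for a mean."""
    se = sd / math.sqrt(n)
    margin = multiplier * se
    return se, mean - margin, mean + margin

se, lower, upper = confidence_interval(161.6, 6.3, 269)
print(f"SE = {se:.2f}, 95% CI = {lower:.2f} to {upper:.2f}")
```

Because the code keeps the unrounded standard error, its limits (about 160.85 to 162.35) differ in the second decimal from the hand calculation, which rounds the standard error to 0.38 before multiplying.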
95% confidence intervals (contd…)
• We can use confidence intervals with another confidence level; we only need to change the multiplier of the standard error:
For example, for 90%, change it to 1.64.
For 95.4%, change it to 2.
For 99%, change it to 2.58 (a multiplier of 3 corresponds to about 99.7%).
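The multiplier for any confidence level can be obtained from the quantile function of the standard Normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `multiplier` is illustrative.

```python
from statistics import NormalDist

def multiplier(confidence):
    """Two-sided Normal multiplier for a confidence level such as 0.95."""
    tail = (1 - confidence) / 2          # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: multiplier = {multiplier(level):.2f}")
```

This applies to large samples, where the Normal distribution is used; for small samples the multiplier comes from the t distribution instead.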
Hypothesis test for a mean
• A hypothesis test checks whether our estimate is compatible with a specific value.
• Our sample of 269 students had a mean stature of 161.6 cm, with a standard deviation of 6.3 cm and a standard error of 0.38 cm.
• A similar study of students from the School of Accounting and Administration (FCA) obtained a mean stature of 167 cm.
• How can we determine whether the stature of students from FEOC is equal to or different from that of students from FCA?
• Mean of FEOC: 161.6 cm
• Mean of FCA: 167 cm
• We can see that the two means are obviously different.
• However, we do not know whether the observed difference is real or due to sampling error, because 161.6 is only one of the many estimates we could have obtained.
Hypothesis test for a mean (contd…)
• To evaluate whether the observed difference is real, we state two hypotheses.
The null hypothesis states that the means of both populations are the same (the study population is the students from FEOC and the reference population is the students from FCA).
The null hypothesis is written as H0.
If the hypothesized mean is μ0 and the mean of the study population is μ, then the null hypothesis is written as H0: μ = μ0.
Alternative hypothesis
The alternative hypothesis states that the means of the two populations are not equal.
It is usually written as H1: μ ≠ μ0.
Hypothesis test for a mean (contd…)
• Once the null hypothesis is stated, we calculate the probability of obtaining the observed data if the null hypothesis is true.
• To obtain this probability, we calculate a test statistic and compare it with the distribution implied by the null hypothesis.
• In many cases this will be the Normal distribution.
Hypothesis test for a mean (contd…)
• The general form of a test statistic compares the estimate observed in the sample with the value expected if the null hypothesis is true.
• It also takes into account the variability in the population, through the standard error.
• This test statistic is called Z and is equal to:
$$Z = \frac{\bar{X} - \mu_0}{SE}$$
The test statistic is therefore a standardized difference between $\bar{X}$ and $\mu_0$.
Example
• The sample of students from FEOC:
Mean = 161.6 cm
s = 6.3 cm
95% CI = 160.86 to 162.34 cm
• Null hypothesis: there is no difference between the mean statures of students from FEOC and FCA.
H0: μ = 167 cm
We need to use the Z test:
$$Z = \frac{\bar{X} - \mu_0}{SE(\bar{X})} = \frac{161.6 - 167}{0.38} = -14.21$$
A value this far from zero corresponds to p < 0.001, so we reject the null hypothesis.
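The Z test of this example can be reproduced in a few lines. The sketch below is illustrative: the function name `z_test` is an assumption, and the two-tailed p value is obtained from the complementary error function `math.erfc`, which equals the probability that a standard Normal variable is at least as far from zero as the observed Z.

```python
import math

def z_test(sample_mean, sd, n, mu0):
    """Return (Z statistic, two-tailed p value) for H0: mu = mu0."""
    se = sd / math.sqrt(n)
    z = (sample_mean - mu0) / se
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) under H0
    return z, p_two_tailed

z, p = z_test(161.6, 6.3, 269, 167)
print(f"Z = {z:.2f}, p = {p:.3g}")
```

With the unrounded standard error, Z is about -14.06 rather than -14.21; the difference comes only from rounding the standard error to 0.38 in the hand calculation, and the conclusion (p far below 0.001) is the same.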
Small samples
• If the sample size is small, we use the t distribution.
• Its shape depends on the degrees of freedom, which reflect how small the sample size is.
• The degrees of freedom of a t distribution are equal to the sample size minus 1.
Small samples (contd…)
• The fewer the degrees of freedom, the lower the probability of values close to the mean and the higher the probability of values in the tails.
• That is, t distributions with few degrees of freedom have less probability around the mean and heavier tails than the Normal distribution.
• However, as the sample size and the degrees of freedom increase, the t distribution becomes more similar to the Normal distribution.
• There are published tables of selected values of the area under the t distribution, which we use when calculating confidence intervals and hypothesis tests.
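The heavier tails of the t distribution can be seen by simulation. The sketch below assumes a standard Normal population (an illustrative choice), computes the t statistic for many small and large samples, and measures how often it falls beyond ±1.96, the Normal cut-off for 5%. The sample sizes 5 and 100 and the function name are arbitrary.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

def tail_fraction(n, reps=5000, cutoff=1.96):
    """Fraction of simulated t statistics with |t| > cutoff for samples of size n."""
    extreme = 0
    for _ in range(reps):
        sample = [random.gauss(0.0, 1.0) for _ in range(n)]
        se = statistics.stdev(sample) / math.sqrt(n)
        t = statistics.fmean(sample) / se  # true mean is 0, so this is (x_bar - mu) / SE
        if abs(t) > cutoff:
            extreme += 1
    return extreme / reps

small = tail_fraction(5)    # 4 degrees of freedom
large = tail_fraction(100)  # 99 degrees of freedom
print(f"n = 5:   {small:.1%} of t values beyond ±1.96")
print(f"n = 100: {large:.1%} of t values beyond ±1.96")
```

With only 4 degrees of freedom, clearly more than 5% of the t values fall beyond ±1.96, while with 99 degrees of freedom the proportion is close to the 5% expected under the Normal distribution.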
Small samples (contd…)
• When the sample size is small (less than 100), the formulas for the confidence interval and the hypothesis test are:
95% CI
Estimate ± multiplier × standard error
The estimate is the sample mean.
The multiplier is the value of t corresponding to p = 0.05 with degrees of freedom equal to the sample size minus 1.
Hypothesis test
To test H0: μ = μ0 against H1: μ ≠ μ0:
$$t = \frac{\bar{X} - \mu_0}{SE}$$
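A worked small-sample example may help here. The ten statures below are hypothetical values invented for illustration, not data from the course; the hypothesized mean of 167 cm is the FCA value used earlier, and the multiplier 2.262 is the two-tailed 5% point of the t distribution with 9 degrees of freedom, as found in standard t tables.

```python
import math
import statistics

# Hypothetical statures (cm) of 10 students; illustrative values only
statures = [158, 163, 159, 165, 157, 168, 155, 164, 159, 163]
MU0 = 167           # hypothesized mean (FCA value from the earlier example)
T_CRIT_DF9 = 2.262  # two-tailed 5% point of t with 9 degrees of freedom (from t tables)

n = len(statures)
df = n - 1                                      # degrees of freedom
x_bar = statistics.mean(statures)
se = statistics.stdev(statures) / math.sqrt(n)  # SE = s / sqrt(n)

t_stat = (x_bar - MU0) / se
ci_lower = x_bar - T_CRIT_DF9 * se
ci_upper = x_bar + T_CRIT_DF9 * se
reject_h0 = abs(t_stat) > T_CRIT_DF9            # two-tailed test at the 5% level

print(f"mean = {x_bar:.1f}, SE = {se:.2f}, df = {df}")
print(f"t = {t_stat:.2f}, 95% CI = {ci_lower:.2f} to {ci_upper:.2f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

For a different sample size, the multiplier must be replaced by the tabulated t value for the corresponding degrees of freedom.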
P values
• One or two tails?
• We know that the p value is the probability of obtaining a result at least as extreme as the one found in our sample, if the null hypothesis is true.
• But what does extreme mean?
• When the alternative hypothesis is H1: μ ≠ μ0, extreme results can occur by chance on either side of the hypothesized mean, μ0.
• Because of this, we use the two-tailed tables of the Normal and t distributions.
P values (contd…)
• There are less common occasions where the alternative hypothesis is H1: μ < μ0 or H1: μ > μ0.
• Then, extreme values can occur only to the left, or only to the right, of the hypothesized mean.
• How small is a small p value?
• Many people use a p value of 0.05 as the cut-off point. This is an arbitrary value, but a sensible one. It means that we are prepared to reject the null hypothesis wrongly about one time in 20 when it is true.
• Note that when a two-tailed test has a p value less than 0.05, the 95% confidence interval does not include the hypothesized value.
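The difference between one-tailed and two-tailed p values, and the link between p < 0.05 and the 95% confidence interval, can be sketched as follows for the large-sample (Normal) case. The estimate (mean 165.0, standard error 0.9) is hypothetical, and the function names are illustrative.

```python
import math

def p_values(z):
    """Return (two-tailed, lower-tail, upper-tail) p values for a Z statistic."""
    upper = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z), for H1: mu > mu0
    lower = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z), for H1: mu < mu0
    two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return two_tailed, lower, upper

def ci_excludes(mean, se, mu0, multiplier=1.96):
    """True if the confidence interval around the mean does not contain mu0."""
    return not (mean - multiplier * se <= mu0 <= mean + multiplier * se)

# Hypothetical estimate: mean 165.0 with SE 0.9, tested against mu0 = 167
mean, se, mu0 = 165.0, 0.9, 167.0
z = (mean - mu0) / se
two_tailed, lower, upper = p_values(z)

print(f"Z = {z:.2f}")
print(f"two-tailed p = {two_tailed:.4f}, lower-tail p = {lower:.4f}, upper-tail p = {upper:.4f}")
print(f"95% CI excludes mu0: {ci_excludes(mean, se, mu0)}")
```

In this example the two-tailed p value is below 0.05 and, consistently, the 95% confidence interval does not contain the hypothesized value; the two-tailed p value is twice the smaller of the one-tailed values.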
P values (contd…)
• If we obtain a p value of 0.048, can we reject the null hypothesis?
• If we obtain a p value of 0.052, can we not reject the null hypothesis?
• When p values are between 0.03 and 0.07, the exact p value should be reported, because they lie on the borderline of statistical significance.
Showing the results
• We should show the results with their confidence intervals.
• State clearly what the null and alternative hypotheses are.
• Show the p value of each test; it is sufficient to say p < 0.001 when applicable.
• Do not misinterpret p values:
A small p value leads us to reject the null hypothesis.
A high p value only means that we do not reject the null hypothesis; it does not show that it is true.