Uploaded by Mohammed .deker

Chapter 5 Estimation

CHAPTER: FIVE
ESTIMATION
Statistics
• Descriptive: organising, summarising & describing data
• Inferential: prediction, generalising, relationships, significance
Estimation – describing population parameters in
terms of sample statistics.
A parameter is a numerical descriptive measure of a
population.
A statistic is a numerical descriptive measure of a
sample.
The sample drawn from a population is used to provide
estimates of the population parameters.
Each sample statistic has a corresponding
population parameter.
Sample statistic            Corresponding population parameter
x̄ (sample mean)             μ (population mean)
s (sample SD)               σ (population SD)
s² (sample variance)        σ² (population variance)
p (sample proportion)       P or π (population proportion)
Types of Estimation
1. Point Estimation
A single numerical value used to estimate the
corresponding population parameter.
• x̄ is an estimator of the population mean μ
• s is an estimator of the population standard deviation σ
• p is an estimator of the population proportion π
• r is an estimator of the population correlation coefficient ρ
• OR̂ (the sample odds ratio) is an estimator of the population OR
• RR̂ (the sample relative risk) is an estimator of the population RR
Properties of a good estimator
a) Unbiasedness: A sample statistic whose
mean over repeated samples is equal to the
population parameter it estimates is unbiased.
• The sample mean is an unbiased estimator of the
population mean μ; for a symmetrical distribution the
sample median is also unbiased for μ.
b) Minimum variance: An estimator which has the
smallest standard error is a good estimator.
• For a symmetrical distribution such as the normal,
the mean has the smaller standard error.
• If the distribution is skewed, the median may have
the smaller standard error.
2. Interval Estimation
Interval estimation is the estimation of a
parameter by an interval, or range of
values, within which it is likely to lie.
• A point estimate gives no indication
of how far it may be from the parameter.
• A more useful method of estimation is to
compute an interval which has a high
probability of containing the parameter.
Interval estimation is more informative than point
estimation because it takes the variation
between samples into account.
The interval within which the population parameter
is expected to lie is called the confidence interval.
The upper and lower values of the interval
are called the confidence limits.
We cannot be 100% sure that the population
parameter lies in the interval, but we have a
stated level of confidence that it does.
1. Confidence Interval for One Population Mean
Confidence interval – the estimate of a parameter as
a range of plausible values.
Margin of error – the precision of the point estimate,
that is, the half-width of the confidence interval.
– When we generalize to a whole population based on a
single value, some accuracy is lost.
• A confidence interval extends either side of the
sample mean by a multiple of the standard error.
• It is most common to calculate a 95% confidence
interval, which extends 1.96 standard errors (SE)
either side of the mean.
Level of confidence – a measure of how certain the
results are.
Example – a level of confidence of 95% means that, in
repeated sampling, 95% of the time the sample we pick
is sufficiently representative of the whole population
for the interval to contain the parameter.
Possibility of error – the risk that the sample on
which the estimate has been based was
misleading and differs from the general
population more than expected.
Example – if the level of confidence is 95%, the risk
is 5%.
The probability that the confidence interval will
indeed include the parameter is called the degree
of confidence or confidence level, 1 − α.
• α is chosen by the researcher; the most common
values of α are 0.05, 0.01, 0.001 and 0.1.
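Case I: σ is known and the population is normal
If X is distributed as $N(\mu, \sigma)$, then we can
determine the Z-value by
$Z = \dfrac{X - \mu}{\sigma}$, which is distributed as $N(0, 1)$.
The sampling distribution of the mean is also normally
distributed, with mean $\mu_{\bar{x}} = \mu$ and standard deviation
$\sigma_{\bar{x}} = \sigma/\sqrt{n}$, so
$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, which is distributed as $N(0, 1)$.
Then the $(1-\alpha)100\%$ confidence interval is:
$\bar{X} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$
• 100(1-α)% CI for μ when σ is known (sampling from a
normal population or a large sample):
– Estimator: $\bar{X}$
– Reliability coefficient: $Z_{\alpha/2}$
– Standard error: $\sigma/\sqrt{n}$
– Precision of the estimate (margin of error): $Z_{\alpha/2}\,\sigma/\sqrt{n}$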
Example
• A physical therapist wished to estimate, with 99%
confidence, the mean maximal strength of a
particular muscle in a certain group of individuals. He
assumes that strength scores are approximately
normally distributed with a variance of 144. A sample
of 15 subjects who participated in the experiment
yielded a mean of 84.3. Estimate the mean maximal
strength of this muscle for the total
population.
Solution
• α = 0.01 ⇒ Zα/2 = 2.575 ≈ 2.58
• x̄ = 84.3, n = 15, σ = √144 = 12
• 84.3 ± 2.58(12/√15) ⇒ 84.3 ± 8.0 ⇒ (76.3, 92.3)
• ⇒ We are 99% confident that the population mean is
between 76.3 and 92.3.
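The known-σ interval above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `z_interval_mean` is made up for illustration and is not from the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_mean(xbar, sigma, n, conf=0.95):
    """CI for a mean when sigma is known: xbar +/- z * sigma / sqrt(n)."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # reliability coefficient Z(alpha/2)
    margin = z * sigma / sqrt(n)             # margin of error
    return xbar - margin, xbar + margin


# Muscle-strength example: variance 144, so sigma = 12
low, high = z_interval_mean(xbar=84.3, sigma=12, n=15, conf=0.99)
print(f"99% CI: ({low:.1f}, {high:.1f})")  # prints 99% CI: (76.3, 92.3)
```

The reliability coefficient is computed from the standard normal distribution rather than read from a table, so it is about 2.576 instead of the rounded 2.58; the limits agree to one decimal place.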
Case II: Unknown variance (small sample size, n ≤ 30)
Here we use Student's t-distribution to
determine the confidence interval.
Then the $(1-\alpha)100\%$ confidence interval is:
$\bar{X} \pm t_{\alpha/2,\,n-1}\, s/\sqrt{n}$
Example
• A study of hypoxemia during the immediate postoperative period reported the fractions of ideal weight
for 11 patients who became severely hypoxemic
during transfer to the recovery room. The mean is
1.51 and the standard deviation is 0.33. Estimate the
95% C.I. for the population mean fraction of ideal
weight, where the population consists of hypoxemic
patients similar to those in the study (the data are
normally distributed; use α = 0.05).
Solution
$t_{\alpha/2,\,n-1} = t_{0.025,\,10} = 2.2281$
$1.51 \pm 2.2281 \times 0.33/\sqrt{11}$
1.51 ± 0.222
(1.288, 1.732)
We are 95% sure that the population mean μ lies
between 1.288 and 1.732.
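The t-based interval can be sketched in the same way. The Python standard library has no t-distribution quantile function, so this sketch takes the critical value as an argument (here the table value 2.2281 quoted in the example); `t_interval_mean` is an illustrative name, not something defined in the slides.

```python
from math import sqrt


def t_interval_mean(xbar, s, n, t_crit):
    """CI for a mean when sigma is unknown: xbar +/- t_crit * s / sqrt(n).

    t_crit is the table value of t(alpha/2) with n - 1 degrees of freedom.
    """
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin


# Hypoxemia example: n = 11, so 10 degrees of freedom and t(0.025, 10) = 2.2281
low, high = t_interval_mean(xbar=1.51, s=0.33, n=11, t_crit=2.2281)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```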
2. C.I. for the difference between two
population means (normally distributed)
i) Known variances (2 independent samples)
• A 100(1‐α)% C.I. for μ1 ‐ μ2 is
$(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2} \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
ii) Unknown variances
a) Equal variances (2 independent samples)
• A 100(1‐α)% C.I. for μ1 ‐ μ2 is
$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,n_1+n_2-2} \sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}$
where
$s_p^2 = \dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$
* If $0.5 \le s_1^2/s_2^2 \le 2$, then we assume that the
population variances are equal.
Example
A sample of 10 twelve-year-old boys and a sample
of 10 twelve-year-old girls yielded mean heights of
59.8 inches (boys) and 58.5 inches (girls), with S1 = 3
and S2 = 4 inches respectively. Assuming the
populations are normally distributed, find the 95% CI for
the difference in mean height between boys and
girls at this age.
Solution
$s_1^2/s_2^2 = (3)^2/(4)^2 = 9/16 = 0.56$ ⇒ 0.5 < 0.56 < 2
• We assume that the population variances are equal.
$s_p = \sqrt{\dfrac{9(9) + 9(16)}{18}} = \sqrt{12.5} \approx 3.5$
• α = 0.05 ⇒ α/2 = 0.025 ⇒ t0.025,18 = 2.101
$(59.8 - 58.5) \pm 2.101 \sqrt{\dfrac{(3.5)^2}{10} + \dfrac{(3.5)^2}{10}}$ ⇒ 1.3 ± 3.3
⇒ (‐2, 4.6)
• We are 95% sure that μ1 ‐ μ2 is between ‐2
and 4.6.
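The equal-variance case, including the variance-ratio rule of thumb, can be sketched as follows. The function names are hypothetical, and the critical value t(0.025, 18) = 2.101 is passed in from the table because the standard library does not provide t quantiles.

```python
from math import sqrt


def variances_look_equal(s1, s2):
    """Rule of thumb from the text: 0.5 <= s1^2 / s2^2 <= 2."""
    return 0.5 <= s1**2 / s2**2 <= 2


def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """CI for mu1 - mu2 assuming equal population variances."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df  # pooled variance
    margin = t_crit * sqrt(sp2 / n1 + sp2 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin


# Height example: boys (59.8, SD 3) and girls (58.5, SD 4), 10 per group
equal_ok = variances_look_equal(3, 4)
low, high = pooled_t_interval(59.8, 3, 10, 58.5, 4, 10, t_crit=2.101)
print(equal_ok, f"95% CI: ({low:.1f}, {high:.1f})")
```

The sketch keeps the pooled variance unrounded (12.5) instead of rounding the pooled SD to 3.5, so the limits differ from the hand calculation only in the second decimal place.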
b) Unequal variances (2 independent samples)
A 100(1‐α)% C.I. for μ1 – μ2 is:
$(\bar{X}_1 - \bar{X}_2) \pm t'_{\alpha/2,\,f} \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
where the degrees of freedom f are given by:
$f = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1-1} + \dfrac{(s_2^2/n_2)^2}{n_2-1}}$
Example
The serum progesterone levels for 29 women
with ectopic pregnancies and 20 women with
early intrauterine pregnancies are obtained.
The data are normally distributed with mean
5.6 ng/ml and standard deviation of 3.6 for the
ectopic pregnancies and mean 30.9 ng/ml and
standard deviation of 6.9 for the early
intrauterine pregnancies. Calculate a 95%
confidence interval for μ1 - μ2.
Solution
$s_1^2/s_2^2 = (3.6)^2/(6.9)^2 = 0.27 \approx 0.3$
⇒ 0.3 < 0.5, so we do not assume equal variances.
$f = \dfrac{\left[\dfrac{(3.6)^2}{29} + \dfrac{(6.9)^2}{20}\right]^2}{\dfrac{\left[(3.6)^2/29\right]^2}{29-1} + \dfrac{\left[(6.9)^2/20\right]^2}{20-1}} \approx 26$
• α = 0.05 ⇒ t α/2, 26 = 2.056
• The 95% C.I. for μ1‐μ2 is then
$(5.6 - 30.9) \pm 2.056 \sqrt{\dfrac{(3.6)^2}{29} + \dfrac{(6.9)^2}{20}}$
⇒ −25.3 ± 3.5 ⇒ (−28.8, −21.8)
At the 95% level of confidence, the difference in
mean serum progesterone level between women having
ectopic and intrauterine pregnancies lies between
-28.8 and -21.8 ng/ml.
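The unequal-variance calculation is easy to get wrong by hand, so a short sketch may help. The degrees-of-freedom formula is the one given above (often called the Welch-Satterthwaite approximation); the function names are illustrative, and the critical value 2.056 for 26 degrees of freedom is taken from the t table.

```python
from math import sqrt


def approx_df(s1, n1, s2, n2):
    """Degrees of freedom f for the unequal-variance interval."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


def unequal_var_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """CI for mu1 - mu2 without assuming equal variances."""
    margin = t_crit * sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin


# Progesterone example: ectopic (5.6, SD 3.6, n 29), intrauterine (30.9, SD 6.9, n 20)
f = approx_df(3.6, 29, 6.9, 20)
low, high = unequal_var_interval(5.6, 3.6, 29, 30.9, 6.9, 20, t_crit=2.056)
print(f"f = {f:.1f}, 95% CI: ({low:.1f}, {high:.1f})")
```

The computed f is a little above 26; rounding it down to a whole number before reading the t table is the cautious choice, since fewer degrees of freedom give a slightly wider interval.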
c) Paired sample
• Convert the two paired samples into a single
sample of differences:
$d_i = X_{1i} - X_{2i}$, i = 1, 2, ..., n.
A 100(1‐α)% C.I. for μ1 ‐ μ2 is:
$\bar{d} \pm t_{\alpha/2,\,n-1}\, \dfrac{s_d}{\sqrt{n}}$
Example
• A study on the effect of low‐calorie intake on
abnormal pulmonary physiology in patients with
chronic hypercapnic respiratory failure.
• Measurement of patients’ arterial oxygen tension
(AOT, mm Hg) before and after the weight loss program:
Patient       1    2    3    4    5    6    7    8
Before       70   59   53   54   44   58   64   43
After        82   66   65   62   74   77   68   59
Difference   12    7   12    8   30   19    4   16
Mean of the differences = 13.5, SD of the differences = 8.2
• A 90% C.I.: α = 0.1 ⇒ α/2 = 0.05 and
• t 0.05,7 = 1.895
$13.5 \pm (1.895)\, \dfrac{8.2}{\sqrt{8}}$ = 13.5 ± 5.5 ⇒ (8, 19)
• ⇒ We can be 90% sure that the interval from 8
to 19 mm Hg contains the actual mean
increase in arterial oxygen tension for patients
after the weight reduction program.
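The paired calculation can be reproduced from the raw measurements. This sketch uses only the standard library; the variable names are illustrative, the differences are taken as after minus before (as in the table), and the critical value 1.895 for 7 degrees of freedom comes from the t table.

```python
from math import sqrt
from statistics import mean, stdev

# Arterial oxygen tension (mm Hg) for the 8 patients in the example
before = [70, 59, 53, 54, 44, 58, 64, 43]
after = [82, 66, 65, 62, 74, 77, 68, 59]

diffs = [a - b for a, b in zip(after, before)]  # one difference per patient
d_bar = mean(diffs)    # mean of the differences
s_d = stdev(diffs)     # sample SD of the differences (n - 1 denominator)

t_crit = 1.895         # t(0.05, 7) from the table, for a 90% interval
margin = t_crit * s_d / sqrt(len(diffs))
low, high = d_bar - margin, d_bar + margin
print(f"mean = {d_bar}, SD = {s_d:.1f}, 90% CI: ({low:.1f}, {high:.1f})")
```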
3. C.I. for a population proportion (large
sample size)
A 100(1- α)% C.I. for π is:
$p \pm Z_{\alpha/2} \sqrt{\dfrac{p(1-p)}{n}}$
Example:
In a study on dental health practice, of 300
adults interviewed, 123 said that they
regularly had a dental check-up twice a year.
What is the 95% C.I. for π?
p = 123/300 = 0.41 is a point estimate of π.
α = 0.05 ⇒ Z0.025 = 1.96
$0.41 \pm 1.96 \sqrt{\dfrac{(0.41)(0.59)}{300}}$ ⇒ 0.41 ± 0.056
(0.35, 0.47)
• At the 95% level of confidence, the proportion of the
adult population who regularly have a dental
check-up twice a year is between 0.35 and 0.47.
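The large-sample interval for a proportion can be sketched as below; `proportion_interval` is an illustrative name, and the normal quantile is computed with the standard library instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, conf=0.95):
    """Large-sample CI for a proportion: p +/- z * sqrt(p(1 - p) / n)."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sqrt(p * (1 - p) / n)
    return p, p - margin, p + margin


# Dental check-up example: 123 of 300 adults
p, low, high = proportion_interval(123, 300)
print(f"p = {p:.2f}, 95% CI: ({low:.3f}, {high:.3f})")
```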
4. C.I. for the difference between two population
proportions (large sample size)
A 100(1‐α)% C.I. for π1 ‐ π2 is:
$(p_1 - p_2) \pm Z_{\alpha/2} \sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}$
Example
• Two hundred patients suffering from a certain
disease were randomly divided into two equal
groups. Of the first group, who received the standard
treatment, 78 recovered within three days. Out of the
other 100, who were treated by a new method, 90
recovered within three days. The physician wished to
estimate the true difference in the proportions who
would recover within three days.
Solution
• The estimate of the difference in the population
proportions is
• p1 - p2 = 0.78 – 0.90 = ‐0.12
• The 95% C.I. is:
$(0.78 - 0.90) \pm 1.96 \sqrt{\dfrac{0.78(0.22)}{100} + \dfrac{0.90(0.10)}{100}}$
-0.12 ± 0.10
⇒ (−0.22, −0.02)
• We are 95% sure that the difference between the
population proportions is between –0.22
and –0.02. Note that the negative signs merely reflect
the fact that better results were obtained by using the
new treatment.
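A minimal sketch of the two-proportion interval, under the same large-sample assumption; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_interval(x1, n1, x2, n2, conf=0.95):
    """Large-sample CI for pi1 - pi2 from two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # standard error
    diff = p1 - p2
    return diff, diff - z * se, diff + z * se


# Treatment example: 78/100 recovered (standard) vs 90/100 (new method)
diff, low, high = two_proportion_interval(78, 100, 90, 100)
print(f"difference = {diff:.2f}, 95% CI: ({low:.2f}, {high:.2f})")
```

Swapping the order of the two groups only changes the sign of the limits, which is why the negative interval above simply favours the new treatment.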
Sample Size Estimation
What is sample size?
• Sample size: the number of study subjects
required to estimate a population parameter
• Researchers always ask themselves, “How big a
sample do I need?”
– Too large a sample: too expensive and time
consuming
– Too small a sample: inadequate precision
to give a good estimate or to show a difference
Sample size
• Sample size is dependent on the type of
design,
– Descriptive vs analytic
– Case control vs Cohort
– Analytic cross sectional vs Case control
• It is also dependent on the type of major
variable used (categorical vs continuous
variable)
Sample size depends on
1. Estimated variability
2. The precision (margin of error)
3. The sampling method (clustering, design effect)
4. Size of population
5. Feasibility (cost)
6. Confidence level (Z value of certainty)
Why is it important to consider sample size?
• In studies concerned with estimating some
characteristic of a population (e.g. the prevalence of
asthma in children), sample size calculations are
important to ensure that estimates are obtained with
the required precision or confidence.
• For example, a prevalence of 10% from a sample of
size 20 would have a 95% confidence interval of 1%
to 31%, which is not very precise or informative.
• On the other hand, a prevalence of 10% from a
sample of size 400 would have a 95% confidence
interval of 7% to 13%, which may be considered
sufficiently accurate.
• Sample size calculations help to avoid estimates that are too imprecise to be useful.
Therefore, to obtain the optimum sample
size, decisions on the following are important:
1. How large an error can be tolerated in the estimate (d)
2. The confidence level that the error will not
exceed the tolerated one (1 − α)
3. Our advance guess of the population variance or
proportion.
Where do we get this knowledge?
• Previous published studies
• Pilot studies
• If information is lacking, there is no good
way to calculate the sample size!
Based on the parameter to be estimated (whether
population mean or proportion), we have two sets
of formulas.
1. Estimation of population mean (μ)
• The maximum error made in estimating the population
parameter is given by $d = Z_{\alpha/2}\,\sigma/\sqrt{n}$;
this is the formula for the margin of error.
• Solving for n, we square both sides:
$d^2 = \dfrac{(Z_{\alpha/2})^2 \sigma^2}{n}$
$n = \dfrac{(Z_{\alpha/2})^2 \sigma^2}{d^2}$
Example: A population of cancer patients has a
survival standard deviation of 43.3 months. If one
wants to conduct a sample survey on this
population, how large a sample is needed so that
95% of the means of samples of that size will be
within 6 months of the population mean? The
population size is 480 patients.
Solution:
$n = \dfrac{(1.96)^2 (43.3)^2}{(6)^2} = \dfrac{(1.96)^2 (43.3)^2}{36} \approx 200$
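The sample size formula for a mean can be sketched in a few lines. The function name is illustrative; the normal quantile is computed rather than taken as 1.96, and the last step rounds up, which is a common convention but an assumption here, since the slides round to the nearest whole number.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, d, conf=0.95):
    """Unrounded n = (z * sigma / d)^2 for estimating a mean to within d."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return (z * sigma / d) ** 2


# Cancer survival example: sigma = 43.3 months, tolerated error d = 6 months
n_raw = sample_size_for_mean(sigma=43.3, d=6)
n = ceil(n_raw)  # round up so the precision target is met
print(f"unrounded n = {n_raw:.2f}, rounded up n = {n}")
```

The unrounded value is just over 200, so rounding up gives 201 where the worked solution reports 200.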
2. Estimation of population proportion (P or π)
Assuming simple random sampling and approximate
normality of the sampling distribution of the sample
proportion leads to the following formula:
$n = \dfrac{(Z_{\alpha/2})^2\, \pi (1 - \pi)}{d^2}$
Where π = an advance guess for the population
proportion of the most important
variable
q = 1 − π
d = precision required (0.01-0.05, that is, 1% to 5%)
• If there is no information for an advance guess of the
population proportion, then it is usually taken as
0.5.
Example:
What sample size do we require to achieve a 95%
confidence interval of width ± 5% (that is, to be
within 5% of the true value)? A study some years
ago found that approximately 30% were smokers.
Solution:
$n = \dfrac{(1.96)^2 (0.3 \times 0.7)}{(0.05)^2} = 322.7 \approx 323$
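The proportion formula, including the default guess of 0.5 when nothing is known, can be sketched as follows; the function name and its default arguments are illustrative choices.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p_guess=0.5, d=0.05, conf=0.95):
    """n = z^2 * p(1 - p) / d^2, rounded up to a whole subject."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z**2 * p_guess * (1 - p_guess) / d**2)


n_smokers = sample_size_for_proportion(p_guess=0.30, d=0.05)
n_no_info = sample_size_for_proportion(d=0.05)  # no prior guess, so p = 0.5
print(n_smokers, n_no_info)  # prints 323 385
```

Using 0.5 gives the largest possible value of π(1 − π), so it is the conservative choice when no earlier estimate exists.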
Common sample size calculations for two populations
• Comparing 2 independent groups - means
• Comparing 2 related groups - means
• Comparing 2 independent groups - proportions
• Comparing 2 related groups - proportions
Sample size estimation for tests between two
independent sample proportions
Formula:
Where
N= the sample size estimate
Zcv=Z critical value for alpha (.05 alpha has a Zcv of
1.96)
Z power=Z value for 1-beta (.80 power has a Z of
0.842)
P1=expected proportion for sample 1
P2=expected proportion for sample 2
Proportion Example
Alpha=.05
Power=.80
P1=.70
P2=.80
p= .75
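The formula for this slide did not survive in the text, so the sketch below should be read as an assumption. It uses one widely taught normal-approximation formula for the number of subjects per group, built from the same quantities defined above (Zcv, Zpower, P1, P2 and the average proportion p). It may differ from the formula the slide showed, and the function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Assumed formula (per group):
    n = [z_cv*sqrt(2*p*(1-p)) + z_power*sqrt(p1*(1-p1) + p2*(1-p2))]^2 / (p1-p2)^2
    where p is the average of p1 and p2.
    """
    z_cv = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    z_power = NormalDist().inv_cdf(power)       # about 0.842 for power = .80
    p_bar = (p1 + p2) / 2
    numerator = (z_cv * sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Proportion example: alpha = .05, power = .80, P1 = .70, P2 = .80
n_group = n_per_group_two_proportions(0.70, 0.80)
print(n_group)
```

Under this assumed formula the example needs a little under 300 subjects in each group; other textbook variants give answers within a few subjects of this.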
Sample size estimation for tests between two
independent sample means
where
N= the sample size estimate
Zcv=Z critical value for alpha (.05 alpha has a
Zcv of 1.96)
Zpower=Z value for 1-beta (.80 power has a Z
of 0.842)
s=standard deviation
D=the expected difference between the two
means.
Mean Example
Alpha=.05
Power=.80
D=10
S=20
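The formula for two means is also missing from the text. As an assumption, the sketch below uses the common normal-approximation formula n = 2(Zcv + Zpower)² s² / D² per group, which is expressed in the quantities defined above; it is not confirmed to be the slide's own formula, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_means(sd, diff, alpha=0.05, power=0.80):
    """Assumed formula (per group): n = 2 * (z_cv + z_power)^2 * sd^2 / diff^2."""
    z_cv = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * (z_cv + z_power) ** 2 * sd**2 / diff**2)


# Mean example: alpha = .05, power = .80, D = 10, S = 20
n_group = n_per_group_two_means(sd=20, diff=10)
print(n_group)  # prints 63
```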
Sample Size Adjustments
• If sampling is from a finite population of size N, then
$n' = \dfrac{n}{1 + n/N}$
• The initial sample size may
need to be increased to allow for the
expected response rate, loss to follow-up, lack of
compliance, and any other predicted reasons for loss
of subjects
• Design effect: for complex cluster sampling, multiply n by
the design effect (common values are 2, 3, …5).
• Separate sample size calculation should be
done for each important outcome & then use the
maximum estimate
• Allowing for response rates & other losses to the
sample
– The expected response rate
– Loss to follow up
– Lack of compliance
– Other losses
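The adjustments above can be chained in a short sketch. Only the finite-population formula n' = n / (1 + n/N) comes from the slides. Dividing by the expected response rate is a common way to allow for losses but is an assumption here, and the 90% response rate and design effect of 2 are hypothetical values chosen for illustration.

```python
from math import ceil


def finite_population_adjust(n, population_size):
    """n' = n / (1 + n/N) for sampling from a finite population of size N."""
    return n / (1 + n / population_size)


def inflate_for_losses(n, expected_response_rate):
    """Assumption: divide by the proportion of subjects expected to remain."""
    return n / expected_response_rate


def apply_design_effect(n, design_effect):
    """Multiply n by the design effect for cluster sampling."""
    return n * design_effect


# Cancer survival example from earlier: n = 200 from a population of N = 480
n_finite = ceil(finite_population_adjust(200, 480))
n_with_losses = ceil(inflate_for_losses(n_finite, 0.90))  # hypothetical 90% response
n_clustered = apply_design_effect(n_finite, 2)            # hypothetical design effect
print(n_finite, n_with_losses, n_clustered)  # prints 142 158 284
```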
Failure to Achieve Required Sample Size
• Patient refusal to consent
• Bad time of the study (heavy working time
for participants)
• Adverse media publicity
• Weak recruiting staff
• Lack of genuine commitment to the project
• Lack of staffing in wards or units
• Too many projects attempting to recruit
the same subjects
Possible Solutions
• Pilot studies
• Have a plan to regularly monitor recruitment or
create recruitment targets
• Ask for extension in time and/or funding
• Review your staff's commitment to other ongoing
trials or other distractions
• Regular visits to field sites