Binomial Distribution
(James Bernoulli, 1713)
Probability Models: Binomial, Multinomial, Poisson, other models

Define a discrete random variable:
Y = number of "successful" outcomes in a series of n independent identical binary trials

Note: Y is a sum of n i.i.d. Bernoulli random variables
For the i-th trial define

\[ X_i = \begin{cases} 0 & \text{if the outcome is a failure} \\ 1 & \text{if the outcome is a success} \end{cases} \]

with $\Pr\{X_i = 1\} = \pi$ and $\Pr\{X_i = 0\} = 1 - \pi$. Then

\[ Y = \sum_{i=1}^{n} X_i \]

and

\[ \Pr\{Y = k\} = \binom{n}{k} \pi^k (1-\pi)^{n-k} = \frac{n!}{k!\,(n-k)!}\, \pi^k (1-\pi)^{n-k}, \qquad k = 0, 1, 2, \ldots, n \]

Example:
Observe a sample of n = 5 nesting pairs
$\pi = 0.6$ is the proportion of all nesting pairs of a bird species that are successful
Random variable: Y = observed number of successful pairs

\[ Y \sim \mathrm{Bin}(n = 5, \pi = .6) \]
What is the probability of observing Y = 2 successful pairs?

\[ \Pr(Y = 2) = \binom{5}{2} (.6)^2 (.4)^3 = (10)(.6)^2(.4)^3 = .2304 \]

Note that

\[ \binom{5}{2} = \frac{5!}{[2!][(5-2)!]} = \frac{(5)(4)(3)(2)(1)}{[(2)(1)][(3)(2)(1)]} = 10 \]

There are 10 ways to get 2 successes in n = 5 trials, and the probability of each outcome is $(.6)^2(.4)^3$:

Outcome      Probability
S S F F F    (.6)(.6)(.4)(.4)(.4)
S F S F F    (.6)(.4)(.6)(.4)(.4)
S F F S F    (.6)(.4)(.4)(.6)(.4)
S F F F S    (.6)(.4)(.4)(.4)(.6)
F S S F F    (.4)(.6)(.6)(.4)(.4)
F S F S F    (.4)(.6)(.4)(.6)(.4)
F S F F S    (.4)(.6)(.4)(.4)(.6)
F F S S F    (.4)(.4)(.6)(.6)(.4)
F F S F S    (.4)(.4)(.6)(.4)(.6)
F F F S S    (.4)(.4)(.4)(.6)(.6)
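The counting argument above can be checked by brute force. The following is a minimal Python sketch, not part of the original course code: the function names `binom_pmf` and `sequences_with_k_successes` are hypothetical and chosen only for illustration. It lists every S/F sequence of length 5 with exactly 2 successes and compares the count times the common sequence probability with the binomial formula.

```python
from itertools import product
from math import comb

def binom_pmf(k, n, pi):
    """Pr{Y = k} for Y ~ Bin(n, pi), using the formula n!/(k!(n-k)!) pi^k (1-pi)^(n-k)."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

def sequences_with_k_successes(n, k):
    """All length-n strings of S/F that contain exactly k successes."""
    return ["".join(s) for s in product("SF", repeat=n) if s.count("S") == k]

n, k, pi = 5, 2, 0.6
seqs = sequences_with_k_successes(n, k)
per_sequence = pi**k * (1 - pi)**(n - k)  # every arrangement has the same probability

print(seqs)                                # the arrangements, starting with SSFFF
print(len(seqs))                           # number of arrangements
print(round(len(seqs) * per_sequence, 4))  # count times common probability
print(round(binom_pmf(k, n, pi), 4))       # binomial formula
```

Both computations should agree with the value .2304 obtained above.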
Bin(n = 5, π = .6) distribution

\[ \Pr\{Y = 0\} = \binom{5}{0}(.6)^0(.4)^5 = .01024 \]
\[ \Pr\{Y = 1\} = \binom{5}{1}(.6)^1(.4)^4 = .0768 \]
\[ \Pr\{Y = 2\} = \binom{5}{2}(.6)^2(.4)^3 = .2304 \]
\[ \Pr\{Y = 3\} = \binom{5}{3}(.6)^3(.4)^2 = .3456 \]
\[ \Pr\{Y = 4\} = \binom{5}{4}(.6)^4(.4)^1 = .2592 \]
\[ \Pr\{Y = 5\} = \binom{5}{5}(.6)^5(.4)^0 = .07776 \]

Note that 0! is defined as one, and

\[ \binom{5}{0} = \frac{5!}{0!\,5!} = \frac{5!}{(1)(5!)} = 1 \]

Mean (or expectation)

\[ E(Y) = (0)\Pr\{Y=0\} + (1)\Pr\{Y=1\} + (2)\Pr\{Y=2\} + (3)\Pr\{Y=3\} + (4)\Pr\{Y=4\} + (5)\Pr\{Y=5\} = 3 \]

In general

\[ E(Y) = \sum_{k=0}^{n} k \binom{n}{k} \pi^k (1-\pi)^{n-k} = n\pi \]

where n is the sample size and $\pi$ is the population success rate.
Variance:

\[ V(Y) = \sum_{k=0}^{n} (k - n\pi)^2 \Pr\{Y = k\} = \sum_{k=0}^{n} (k - n\pi)^2 \binom{n}{k} \pi^k (1-\pi)^{n-k} = n\pi(1-\pi) \]

When $Y \sim \mathrm{Bin}(n = 5, \pi = .6)$ we have

\[ V(Y) = (5)(.6)(.4) = 1.2 \]

The standard deviation is

\[ \sigma = \sqrt{V(Y)} = \sqrt{n\pi(1-\pi)} = 1.0954 \]

Moment Generating Function:

\[ g(t) = E(e^{tY}) = \sum_{k=0}^{n} e^{tk} \binom{n}{k} \pi^k (1-\pi)^{n-k} = (1 - \pi + \pi e^t)^n \]

The r-th moment about zero is

\[ E(Y^r) = \frac{\partial^r g(t)}{\partial t^r} \]

evaluated at t = 0. The r-th central moment is

\[ E(Y - E(Y))^r = \sum_{j=0}^{r} (-1)^j \binom{r}{j} [E(Y)]^j E(Y^{r-j}) \]
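Third central moment:

\[ E(Y - n\pi)^3 = \sum_{k=0}^{n} (k - n\pi)^3 \Pr\{Y = k\} = \sum_{k=0}^{n} (k - n\pi)^3 \binom{n}{k} \pi^k (1-\pi)^{n-k} = n\pi(1-\pi)(1-2\pi) \]

Skewness:

\[ \sqrt{\beta_1} = \frac{E(Y - n\pi)^3}{[V(Y)]^{3/2}} = \frac{(1-2\pi)}{\sqrt{n\pi(1-\pi)}} \]

Fourth central moment:

\[ E(Y - n\pi)^4 = \sum_{k=0}^{n} (k - n\pi)^4 \Pr\{Y = k\} = \sum_{k=0}^{n} (k - n\pi)^4 \binom{n}{k} \pi^k (1-\pi)^{n-k} = 3(n\pi(1-\pi))^2 + n\pi(1-\pi)(1 - 6\pi(1-\pi)) \]

Kurtosis:

\[ \beta_2 = \frac{E(Y - n\pi)^4}{[V(Y)]^2} = 3 + \frac{(1 - 6\pi(1-\pi))}{n\pi(1-\pi)} \]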
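The closed forms for the mean, variance, skewness, and kurtosis can be checked against the defining sums. The following is a small Python sketch for the Bin(n = 5, π = .6) example, offered as an illustration rather than as part of the course code; the helper names `binom_pmf` and `central_moment` are hypothetical.

```python
from math import comb, sqrt

def binom_pmf(k, n, pi):
    """Pr{Y = k} for Y ~ Bin(n, pi)."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

def central_moment(r, n, pi):
    """E(Y - n*pi)^r computed directly as a sum over k = 0, ..., n."""
    mean = n * pi
    return sum((k - mean)**r * binom_pmf(k, n, pi) for k in range(n + 1))

n, pi = 5, 0.6
npq = n * pi * (1 - pi)

# Moments from the defining sums
mean = sum(k * binom_pmf(k, n, pi) for k in range(n + 1))
var = central_moment(2, n, pi)
skew = central_moment(3, n, pi) / var**1.5
kurt = central_moment(4, n, pi) / var**2

# Closed forms from the slides
skew_formula = (1 - 2 * pi) / sqrt(npq)
kurt_formula = 3 + (1 - 6 * pi * (1 - pi)) / npq

print(mean, var)           # compare with n*pi = 3 and n*pi*(1-pi) = 1.2
print(skew, skew_formula)  # the two skewness values should agree
print(kurt, kurt_formula)  # the two kurtosis values should agree
```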
Inference for π

Suppose Y is the number of successful outcomes for a series of n independent and identical binary trials (simple random sampling with replacement) where $\pi$ is the probability of obtaining a successful outcome on any single trial (selection).

\[ Y \sim \mathrm{Bin}(n, \pi) \]

Define the sample proportion

\[ p = \frac{Y}{n} \]

Properties of p = Y/n
p is the maximum likelihood estimator for $\pi$
$E(p) = \pi$
$V(p) = \pi(1-\pi)/n$
$\dfrac{p - \pi}{\sqrt{\pi(1-\pi)/n}}$ is approximately N(0, 1), for large n
$\dfrac{p - \pi}{\sqrt{p(1-p)/n}}$ is approximately N(0, 1), for large n

Note: n is "large" if $n\pi \ge 5$ and $n(1-\pi) \ge 5$.
[Figures: plots of the Binomial(10, 0.10), Binomial(10, 0.25), Binomial(10, 0.50), Binomial(10, 0.95), Binomial(25, 0.10), Binomial(50, 0.10), and Binomial(100, 0.10) distributions.]
Tests of hypotheses:

Example: Is the sex ratio 1:1 for early run Chinook salmon caught by hook and line?

           count   percent
females      172     59.11
males        119     40.89
total        291    100

Here the estimated proportion of females is

\[ p = \frac{172}{291} = 0.5911 \]

Test against a two-sided alternative
null hypothesis $H_0: \pi = \pi_0$
alternative $H_A: \pi \ne \pi_0$

Reject $H_0$ if

\[ |Z| = \frac{|p - \pi_0|}{\sqrt{\pi_0(1-\pi_0)/n}} > Z_{\alpha/2} \]

where $\alpha$ is the significance level (or type I error level).

Example:
$H_0$: sex ratio is 1:1 (or $\pi = \pi_0 = 0.5$)
where $\pi$ is the proportion of females among all early run Chinook salmon that could be caught by hook and line.

\[ p = \frac{172}{291} = 0.5911 \]

\[ Z = \frac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}} = \frac{0.5911 - .5}{\sqrt{(.5)(.5)/291}} = 3.107 \]

Since $Z_{.025} = 1.96$ and $Z = 3.107 > 1.96$, the null hypothesis is rejected at the $\alpha = .05$ level of significance.

p-value = .00095 + .00095 = .0019
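The test statistic and two-sided p-value for the salmon example can be reproduced with a few lines of Python. This is an illustrative sketch only; the function name `proportion_z_test` is hypothetical, and the normal distribution comes from the standard library class `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(y, n, pi0):
    """Large sample Z test of H0: pi = pi0 against a two-sided alternative.

    Returns the sample proportion, the Z statistic, and the two-sided p-value.
    """
    p = y / n
    z = (p - pi0) / sqrt(pi0 * (1 - pi0) / n)      # standard error computed under H0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # both tails of the N(0, 1) distribution
    return p, z, p_value

# 172 females among 291 early run Chinook salmon, H0: pi = 0.5
p, z, p_value = proportion_z_test(172, 291, 0.5)
print(round(p, 4), round(z, 3), round(p_value, 4))
```

The printed values should be close to p = 0.5911, Z = 3.107, and a p-value of .0019.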
/* Program to analyze the 1999 Chinook
   salmon data. This program is stored
   in the file
                chinook1.sas
*/

data set1;
  infile 'c:\courses\alaska\sas\hdata.dat';
  input (year month day biweek run gear
         age sex length)
        (4. 2. 2. 1. 1. 1. 2. $1. 4.);
  rage=int(age/10);
  oage=age-(10*rage);
run;

proc sort data=set1; by gear run;
run;

proc freq data=set1; by gear run;
  table sex / binomial (p=.5);
run;

----------------------- gear=1 run=1 -----------------------

The FREQ Procedure

                              Cumulative   Cumulative
sex   Frequency   Percent     Frequency     Percent
F        172       59.11         172         59.11
M        119       40.89         291        100.00

Binomial Proportion for sex = F
Proportion               0.5911
ASE                      0.0288
95% Lower Conf Limit     0.5346
95% Upper Conf Limit     0.6476

Exact Conf Limits
95% Lower Conf Limit     0.5322
95% Upper Conf Limit     0.6481

Test of H0: Proportion = 0.5
ASE under H0             0.0293
Z                        3.1069
One-sided Pr > Z         0.0009
Two-sided Pr > |Z|       0.0019

Sample Size = 291
----------------------- gear=1 run=2 -----------------------

The FREQ Procedure

                              Cumulative   Cumulative
sex   Frequency   Percent     Frequency     Percent
F        199       55.12         199         55.12
M        162       44.88         361        100.00

Binomial Proportion for sex = F
Proportion               0.5512
ASE                      0.0262
95% Lower Conf Limit     0.4999
95% Upper Conf Limit     0.6026

Exact Conf Limits
95% Lower Conf Limit     0.4983
95% Upper Conf Limit     0.6033

Test of H0: Proportion = 0.5
ASE under H0             0.0263
Z                        1.9474
One-sided Pr > Z         0.0257
Two-sided Pr > |Z|       0.0515

Sample Size = 361

----------------------- gear=2 run=1 -----------------------

The FREQ Procedure

                              Cumulative   Cumulative
sex   Frequency   Percent     Frequency     Percent
F        165       44.96         165         44.96
M        202       55.04         367        100.00

Binomial Proportion for sex = F
Proportion               0.4496
ASE                      0.0260
95% Lower Conf Limit     0.3987
95% Upper Conf Limit     0.5005

Exact Conf Limits
95% Lower Conf Limit     0.3979
95% Upper Conf Limit     0.5021

Test of H0: Proportion = 0.5
ASE under H0             0.0261
Z                       -1.9314
One-sided Pr < Z         0.0267
Two-sided Pr > |Z|       0.0534

Sample Size = 367
----------------------- gear=2 run=2 -----------------------

The FREQ Procedure

                              Cumulative   Cumulative
sex   Frequency   Percent     Frequency     Percent
F        168       38.53         168         38.53
M        268       61.47         436        100.00

Binomial Proportion for sex = F
Proportion               0.3853
ASE                      0.0233
95% Lower Conf Limit     0.3396
95% Upper Conf Limit     0.4310

Exact Conf Limits
95% Lower Conf Limit     0.3394
95% Upper Conf Limit     0.4328

Test of H0: Proportion = 0.5
ASE under H0             0.0239
Z                       -4.7891
One-sided Pr < Z         <.0001
Two-sided Pr > |Z|       <.0001

Sample Size = 436

An approximate $(1-\alpha) \times 100\%$ confidence interval for $\pi$ includes all values of $\pi_0$ that satisfy

\[ \frac{|p - \pi_0|}{\sqrt{p(1-p)/n}} < Z_{\alpha/2} \]

The upper and lower limits are

\[ p_U = p + Z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}} \]

\[ p_L = p - Z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}} \]
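The limits $p_L$ and $p_U$ are simple to compute directly. The following Python sketch is an illustration under the assumption that the exact normal quantile (1.959964 for a 95% interval) is acceptable in place of the rounded 1.96; the function name `normal_approx_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def normal_approx_interval(y, n, level=0.95):
    """Large sample normal approximation interval p -/+ Z_{alpha/2} sqrt(p(1-p)/n)."""
    p = y / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # Z_{alpha/2}
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Salmon data: 172 females among 291 fish (gear=1, run=1)
p_lower, p_upper = normal_approx_interval(172, 291)
print(round(p_lower, 4), round(p_upper, 4))
```

For the salmon data this should agree with the limits 0.5346 and 0.6476 reported by PROC FREQ. Nothing in the formula keeps the limits inside [0, 1], so a small count such as 1 success in 10 trials produces a negative lower limit.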
$(p_L, p_U)$ is not an exact 95% confidence interval because
the binomial distribution is
   discrete
   bounded
   skewed
a large sample normal approximation is used

Example:
95% confidence interval for proportion of females among early run salmon that could be caught with hook and line

$\alpha = 1 - 0.95 = .05$
$\alpha/2 = .025$
$Z_{\alpha/2} = 1.96$
$n = 291$
$p = 172/291 = 0.5911$

\[ p_U = .5911 + (1.96)\sqrt{\frac{(.5911)(.4089)}{291}} = .6476 \]

\[ p_L = .5911 - (1.96)\sqrt{\frac{(.5911)(.4089)}{291}} = .5346 \]

An approximate 95% confidence interval is (.535, .648)

Round the final answer to 1/3 of the standard error for p:

\[ \frac{1}{3}\sqrt{\frac{p(1-p)}{n}} = \frac{1}{3}(.0293) = .0097 \]

"Exact" confidence intervals:

The lower limit is the value of $\pi$ for which

\[ \frac{\alpha}{2} = \Pr(Y \ge y) = \sum_{j=y}^{n} \binom{n}{j} \pi^j (1-\pi)^{n-j} = \frac{1}{B(y, n-y+1)} \int_0^{\pi} t^{y-1}(1-t)^{n-y}\, dt = I_{\pi}(y, n-y+1) \]

Note: Use integration by parts with

\[ B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} \quad \text{and} \quad \Gamma(a) = (a-1)! \]
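The upper limit is the value of $\pi$ for which

\[ \frac{\alpha}{2} = \Pr(Y \le y) = \sum_{j=0}^{y} \binom{n}{j} \pi^j (1-\pi)^{n-j} = 1 - \sum_{j=y+1}^{n} \binom{n}{j} \pi^j (1-\pi)^{n-j} \]

or

\[ 1 - \frac{\alpha}{2} = \sum_{j=y+1}^{n} \binom{n}{j} \pi^j (1-\pi)^{n-j} = \frac{1}{B(y+1, n-y)} \int_0^{\pi} t^{y}(1-t)^{n-y-1}\, dt = I_{\pi}(y+1, n-y) \]

For the lower limit, note that $F = F_{(v_2, v_1), \alpha/2}$ satisfies

\[ \frac{\alpha}{2} = I_{\frac{v_1}{v_1 + v_2 F}}\left(\frac{v_1}{2}, \frac{v_2}{2}\right) \]

so that

\[ p_L = \frac{v_1}{v_1 + v_2 F_{(v_2, v_1), \alpha/2}} \]

where
$v_1 = 2Y$
$v_2 = 2(n - Y + 1)$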
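For the upper limit, note that $F = F_{(v_4, v_3), 1-\alpha/2}$ satisfies

\[ 1 - \frac{\alpha}{2} = I_{\frac{v_3}{v_3 + v_4 F}}\left(\frac{v_3}{2}, \frac{v_4}{2}\right) \]

so that

\[ p_U = \frac{v_3}{v_3 + v_4 F_{(v_4, v_3), 1-\alpha/2}} \]

where
$v_3 = 2(Y + 1)$
$v_4 = 2(n - Y)$

Since

\[ F_{(v_4, v_3), 1-\alpha/2} = \frac{1}{F_{(v_3, v_4), \alpha/2}} \]

we have

\[ p_U = \frac{v_3 F_{(v_3, v_4), \alpha/2}}{v_4 + v_3 F_{(v_3, v_4), \alpha/2}} \]

where
$v_3 = 2(Y + 1)$
$v_4 = 2(n - Y)$

D. Collett (1991), Modelling Binary Data, page 25.
Johnson & Kotz (1969), Discrete Distributions, page 59.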
Example:
Observe Y = 1 success in n = 10 trials
Then

\[ p = \frac{y}{n} = 0.10 \]

Construct a 95% confidence interval for $\pi$

Large sample normal approximation

\[ p \pm 1.96\sqrt{p(1-p)/n} \implies 0.10 \pm .186 \]

use (0, 0.286)??

Exact 95% confidence interval:
$v_1 = 2Y = 2$
$v_2 = 2(n - Y + 1) = 20$
$F_{(20, 2), .025} = 39.45$
$v_3 = 2(Y + 1) = 4$
$v_4 = 2(n - Y) = 18$
$F_{(4, 18), .025} = 3.61$

The confidence limits are

\[ p_L = \frac{2}{2 + (20)(39.45)} = 0.0025 \]

and

\[ p_U = \frac{(4)(3.61)}{18 + (4)(3.61)} = 0.445 \]

/* This is a SAS/IML program to compute
   confidence intervals for a binomial
   success rate. The program is stored
   in the file
                binci.sas
*/

proc iml;
start binci;
  x = 1;      * Enter number of successes;
  n = 10;     * Enter total number of trials;
  a = .95;    * Enter confidence level;
  a2=1-((1-a)/2);
  v1=x;
  v2=n-x+1;
  v3=v1+1;
  v4=v2-1;
  invb1=1;
  if(v1 > 0) then invb1 = betainv(a2,v2,v1);
  invb2=1;
  if(v4 > 0) then invb2 = betainv(a2,v3,v4);
  pl= 1-invb1;
  pu = invb2;
  print 'Exact confidence intervals';
  print pl pu;
  z = probit(a2);
  p = v1/n;
  pl = p - z*sqrt(p*(1-p)/n);
  pu = p + z*sqrt(p*(1-p)/n);
  print 'Confidence intervals based on the',
        'large sample normal approximation';
  print pl pu;
finish;

run binci;

Exact confidence intervals
       PL          PU
0.0025286   0.4450161

Confidence intervals based on the
large sample normal approximation
       PL          PU
-0.085939   0.2859385
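The exact limits can also be found straight from their definitions, without F or beta quantiles, by solving $\Pr(Y \ge y) = \alpha/2$ and $\Pr(Y \le y) = \alpha/2$ numerically for $\pi$. The following Python sketch uses plain bisection. It is an illustration and not the method used in the SAS or S-PLUS programs; all function names are hypothetical.

```python
from math import comb

def binom_pmf(j, n, pi):
    """Pr{Y = j} for Y ~ Bin(n, pi)."""
    return comb(n, j) * pi**j * (1 - pi)**(n - j)

def prob_at_least(y, n, pi):
    """Pr(Y >= y); increases from 0 to 1 as pi goes from 0 to 1 (for y >= 1)."""
    return sum(binom_pmf(j, n, pi) for j in range(y, n + 1))

def prob_at_most(y, n, pi):
    """Pr(Y <= y); decreases from 1 to 0 as pi goes from 0 to 1 (for y < n)."""
    return sum(binom_pmf(j, n, pi) for j in range(0, y + 1))

def bisect_increasing(f, target, tol=1e-12):
    """Solve f(pi) = target on [0, 1] for a function f that increases in pi."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_interval(y, n, level=0.95):
    """'Exact' limits: Pr(Y >= y) = alpha/2 at p_lower, Pr(Y <= y) = alpha/2 at p_upper."""
    half_alpha = (1 - level) / 2
    if y == 0:
        p_lower = 0.0   # same convention as the programs above
    else:
        p_lower = bisect_increasing(lambda pi: prob_at_least(y, n, pi), half_alpha)
    if y == n:
        p_upper = 1.0
    else:
        # 1 - Pr(Y <= y) increases in pi, so solve it for 1 - alpha/2
        p_upper = bisect_increasing(lambda pi: 1 - prob_at_most(y, n, pi), 1 - half_alpha)
    return p_lower, p_upper

p_lower, p_upper = exact_interval(1, 10)
print(round(p_lower, 5), round(p_upper, 5))
```

For Y = 1 and n = 10 the result should agree with the SAS/IML output above (0.0025286, 0.4450161) to the displayed precision.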
#  This code is stored in the file
#
#         binci.ssc
#
#  This code creates confidence
#  intervals for a binomial
#  proportion using
#
#     Large sample normal theory
#     An exact interval
#     Another approximation
#
#  x = observed number of successes
#  n = number of trials
#  a = level of confidence (e.g. 0.95)

x <- c(1)
n <- c(10)
a <- c(.95)
p <- x/n

#  Confidence interval based on
#  large sample normal theory.

a2 <- 1-((1-a)/2)
plower <- p - qnorm(a2)*sqrt(p*(1-p)/n)
pupper <- p + qnorm(a2)*sqrt(p*(1-p)/n)

#  Round to 5 decimal places and print
#  results

round(plower,5)
round(pupper,5)

#  Use quantiles from the F-distribution
#  to construct an exact confidence interval
#  The function qf( , , ) computes
#  quantiles of the F-distribution
#  (each if ... else is kept on one line so that
#  it parses as a single statement at top level)

a2 <- 1-((1-a)/2)
if (x > 0) f1 <- qf(a2,2*(n-x+1),2*x) else f1 <- c(1)
plower <- x/(x + (n-x+1)*f1)
if (n > x) f2 <- qf(a2,2*(x+1),2*(n-x)) else f2 <- c(1)
pupper <- (x+1)*f2/((n-x)+(x+1)*f2)

#  Print results

round(plower,5)
round(pupper,5)

#  The prop.test function creates
#  a confidence interval using a method
#  proposed by Fleiss, 2nd ed.
#  pages 14-15. It also can test
#  a null hypothesis that the
#  success rate is a specific value
#  using the option p=...

prop.test(x,n,conf.level=a,p=.45)

#  To compute and display just
#  the confidence interval use

prop.test(x,n)$conf.int
#  This is the output for the code
#  stored in the file
#
#         binci.spl

x <- c(1)
n <- c(10)
a <- c(.95)
p <- x/n

#  Confidence interval based on
#  large sample normal theory.

a2 <- 1-((1-a)/2)
plower <- p - qnorm(a2)*sqrt(p*(1-p)/n)
pupper <- p + qnorm(a2)*sqrt(p*(1-p)/n)

#  Round to 5 decimal places and print
#  results

round(plower,5)
[1] -0.08594
round(pupper,5)
[1] 0.28594

#  Use quantiles from the F-distribution
#  to construct an exact confidence interval
#  The function qf( , , ) computes
#  quantiles of the F-distribution

a2 <- 1-((1-a)/2)
if (x > 0) f1 <- qf(a2,2*(n-x+1),2*x) else f1 <- c(1)
[1] 39.44791
plower <- x/(x + (n-x+1)*f1)
if (n > x) f2 <- qf(a2,2*(x+1),2*(n-x)) else f2 <- c(1)
[1] 3.608344
pupper <- (x+1)*f2/((n-x)+(x+1)*f2)

#  Print results

round(plower,5)
[1] 0.00253
round(pupper,5)
[1] 0.44502

#  The prop.test function creates
#  a confidence interval using a method
#  proposed by Fleiss, 2nd ed.
#  pages 14-15. It also can test
#  a null hypothesis that the
#  success rate is a specific value
#  using the option p=...

prop.test(x,n,conf.level=a,p=.45)

        1-sample proportions test with continuity
        correction

data:  x out of n, null probability 0.45
X-square = 3.6364, df = 1, p-value = 0.0565
alternative hypothesis: true P(success) in
   Group 1 is not equal to 0.45
95 percent confidence interval:
 0.005242302 0.458846016
sample estimates:
 prop'n in Group 1
               0.1

Warning messages:
  Expected counts < 5. Chi-square/normal
  approximation may not be appropriate.
  in: prop.test(x, n, conf.level = a, p = 0.45)

#  To compute and display just
#  the confidence interval use

prop.test(x,n)$conf.int
[1] 0.005242302 0.458846016
attr(, "conf.level"):
[1] 0.95
Suppose you observe Y = 0 successes in n trials, then

\[ p = \frac{Y}{n} = 0 \]

and a 95% confidence interval based on the large sample normal approximation yields

\[ p \pm (1.96)\sqrt{p(1-p)/n} \implies (0, 0) \]

Results for "exact" 95% confidence intervals depend on n:

    n    Lower Limit   Upper Limit
    5         0          .5218
   10         0          .3085
   20         0          .1684
   50         0          .07112
  100         0          .03622
 1000         0          .003682
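When Y = 0 the exact upper limit has a closed form, because $\Pr(Y \le 0) = (1-\pi)^n$. Setting this equal to $\alpha/2$ gives $\pi_U = 1 - (\alpha/2)^{1/n}$. The following Python sketch evaluates it for the sample sizes in the table; it is an illustration only, and the function name `upper_limit_zero_successes` is hypothetical.

```python
def upper_limit_zero_successes(n, level=0.95):
    """Exact upper confidence limit when Y = 0: solve (1 - pi)^n = alpha/2 for pi."""
    half_alpha = (1 - level) / 2
    return 1 - half_alpha ** (1 / n)

for n in (5, 10, 20, 50, 100, 1000):
    # four significant digits, for comparison with the table of upper limits
    print(n, float(f"{upper_limit_zero_successes(n):.4g}"))
```

The values should agree with the upper limits tabulated above to four significant digits.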
Example: Drinking Survey

In a survey of people aged 18 or over in England and Wales conducted by Gallup in 1985, each respondent was asked

"Thinking about the last 7 days, on how many of those days did you have at least one alcoholic drink?"

Binary response:
Drinker: at least one day
Non-drinker: zero days

Construct a 95% confidence interval for
$\pi$ = proportion of "drinkers" in the 18 and over population

Sample size: n = 928
Observed number of drinkers: Y = 570
Sample proportion: p = 570/928 = .614

Approximate 95% confidence interval:

\[ p \pm (1.96)\sqrt{\frac{p(1-p)}{n}} \implies (0.5829, 0.6455) \]

"Exact" method: (0.5820, 0.6457)
Is a normal approximation to a binomial distribution an appropriate model to use in this case?
Simple random sample with replacement
Simple random sample without replacement
Cluster sampling
Stratification

Example: Iowa Poll (1999)
Sample size: n = 801
Maximum margin of error: plus or minus .035
How many observations are needed?

Specify a margin of error
M = 0.035
M = 0.010

Compute n:

\[ M = (1.96)\sqrt{\frac{p(1-p)}{n}} \implies n = \left[\frac{1.96}{M}\right]^2 p(1-p) \]

This is maximized when p = 0.5

Sample Size (n)

    p     M = .035   M = .01
  .001         4         39
  .01         32        381
  .1         283       3458
  .2         502       6147
  .3         659       8068
  .4         753       9220
  .5         784       9604
  .6         753       9220
  .7         659       8068
  .8         502       6147
  .9         283       3458
  .99         32        381
  .999         4         39
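The sample size table can be regenerated from the margin-of-error formula. The following Python sketch is an illustration with a hypothetical function name, `n_for_margin`. It assumes the exact normal quantile (1.959964) in place of the rounded 1.96 and rounds up to the next integer.

```python
from math import ceil
from statistics import NormalDist

def n_for_margin(p, margin, level=0.95):
    """Smallest n with Z_{alpha/2} * sqrt(p(1-p)/n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil((z / margin) ** 2 * p * (1 - p))

for p in (0.001, 0.01, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(p, n_for_margin(p, 0.035), n_for_margin(p, 0.01))
```

The required n is symmetric in p and 1 - p, so only the values up to p = 0.5 are printed. The largest requirement occurs at p = 0.5, which is why a maximum margin of error is usually quoted for that case.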
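How many observations are needed to test the null hypothesis $H_0: \pi = \pi_0$?

Specify a significance level (or Type I error level)
$\alpha$ = probability of rejecting $H_0: \pi = \pi_0$ when $H_0$ is true
Typical values
$\alpha$ = .10
$\alpha$ = .05
$\alpha$ = .01

Specify an alternative of practical importance
$H_0: \pi = \pi_0$
$H_A: \pi = \pi_A$

Specify the desired power of the test to reject $H_0: \pi = \pi_0$ when $H_A: \pi = \pi_A$ is true.
power = probability that $H_0: \pi = \pi_0$ is rejected when $H_A: \pi = \pi_A$ is true.
Typical values are
power = 0.80
power = 0.90
power = 0.95

Type II error probability
$\beta$ = 1 - power = probability that $H_0: \pi = \pi_0$ is not rejected when $H_A: \pi = \pi_A$ is true

Sample size needed to test $H_0: \pi = \pi_0$ vs. $H_A: \pi \ne \pi_0$ (two sided alternative) is

\[ n = \frac{\left[ Z_{\beta}\sqrt{\pi_A(1-\pi_A)} + Z_{\alpha/2}\sqrt{\pi_0(1-\pi_0)} \right]^2}{[\pi_0 - \pi_A]^2} \]

Derivation of sample size formula:
Let P denote the sample proportion, viewed as a random variable. Then

\[ 1 - \beta = \text{power} = \Pr\{\text{reject } H_0 \mid \pi = \pi_A\} = \Pr\left\{ \frac{|P - \pi_0|}{\sqrt{\pi_0(1-\pi_0)/n}} > Z_{\alpha/2} \,\Big|\, \pi = \pi_A \right\} \]

\[ = \Pr\left\{ \frac{P - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}} > Z_{\alpha/2} \,\Big|\, \pi = \pi_A \right\} + \Pr\left\{ \frac{P - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}} < -Z_{\alpha/2} \,\Big|\, \pi = \pi_A \right\} \]

When $\pi_A > \pi_0$ the second probability is negligible, so

\[ 1 - \beta \approx \Pr\left\{ P > \pi_0 + Z_{\alpha/2}\sqrt{\frac{\pi_0(1-\pi_0)}{n}} \,\Big|\, \pi = \pi_A \right\} = \Pr\left\{ \frac{P - \pi_A}{\sqrt{\pi_A(1-\pi_A)/n}} > \frac{\pi_0 - \pi_A + Z_{\alpha/2}\sqrt{\pi_0(1-\pi_0)/n}}{\sqrt{\pi_A(1-\pi_A)/n}} \,\Big|\, \pi = \pi_A \right\} \]

The quantity on the left of the last inequality is approximately standard normal, so the quantity on the right must be the lower $\beta$ percentile of the standard normal distribution, $-Z_{\beta}$. Solving

\[ -Z_{\beta} = \frac{\pi_0 - \pi_A + Z_{\alpha/2}\sqrt{\pi_0(1-\pi_0)/n}}{\sqrt{\pi_A(1-\pi_A)/n}} \]

for n gives the sample size formula above. The case $\pi_A < \pi_0$ is symmetric.

Example:
Want an 80% chance of rejecting the hypothesis of a 1:1 sex ratio among salmon caught in nets during the early run, when at least 58% are female or less than 42% are female. Will use $\alpha$ = .05 as the Type I error level.

$H_0: \pi = \pi_0 = .50$
$H_A: \pi \ne \pi_0$ and $\pi_A$ = .58 or .42, and $|\pi_0 - \pi_A|$ = .08

Significance level: $\alpha$ = .05
$Z_{\alpha/2} = Z_{.025} = 1.96$
power = .80 and $\beta$ = 1 - power = .20
$Z_{\beta} = Z_{.20} = .842$

\[ n = \frac{\left[ .842\sqrt{(.58)(.42)} + 1.96\sqrt{(.5)(.5)} \right]^2}{[.58 - .50]^2} = 304.3 \]

round up to n = 305

Sample size for tests against one sided alternatives:
$H_0: \pi = \pi_0$ vs. $H_A: \pi > \pi_0$

\[ n = \frac{\left[ Z_{\beta}\sqrt{\pi_A(1-\pi_A)} + Z_{\alpha}\sqrt{\pi_0(1-\pi_0)} \right]^2}{[\pi_0 - \pi_A]^2} \]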
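Both sample size formulas are easy to evaluate numerically. The following Python sketch is an illustration, separate from the SAS and S-PLUS programs in these notes; the function name `n_for_test` and its arguments are hypothetical. It uses exact normal quantiles rather than the rounded values 1.96 and .842, and rounds up to the next integer.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_for_test(pi0, pi_a, alpha=0.05, power=0.80, two_sided=True):
    """Sample size for testing H0: pi = pi0 with the stated power when pi = pi_a."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)  # Z_{alpha/2} or Z_{alpha}
    z_beta = z(power)                                          # Z_{beta}, beta = 1 - power
    numerator = z_beta * sqrt(pi_a * (1 - pi_a)) + z_alpha * sqrt(pi0 * (1 - pi0))
    return ceil((numerator / (pi0 - pi_a)) ** 2)

# Salmon example: pi0 = .50, pi_A = .58, alpha = .05, power = .80
print(n_for_test(0.50, 0.58))

# pi0 = .7 against pi_A = .6 for several power values
for pw in (0.80, 0.90, 0.95, 0.99):
    print(pw, n_for_test(0.7, 0.6, power=pw), n_for_test(0.7, 0.6, power=pw, two_sided=False))
```

The first call should give the n = 305 found in the example. Because the alternative enters only through $\pi_A(1-\pi_A)$ and $(\pi_0 - \pi_A)^2$, the alternatives .58 and .42 require the same sample size when $\pi_0$ = .50.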
SAS and S-PLUS code
SAS and JMP:
No built in function for sample
size determination for binomial
proportions
S-PLUS for Windows:
Click on Statistics
Select Power and Sample Size
Select Binomial Proportion
Fill in the boxes
Click the Options tab and click off
Continuity Correction
Click Okay
SAS code:

/* This program computes sample sizes needed to
   obtain a test of a hypothesis about a single
   proportion with a specified power value.
   It also computes the number of observations
   needed to obtain a confidence interval with
   a specified accuracy. This program is stored
   in the file
                  size1p.sas
*/

proc iml;
start samples;
  p0 = .7;                   /* Enter the proportion
                                corresponding to the
                                null hypothesis */
  pa = {.6 .5 .4 .3};        /* Enter alternatives */
  power = {.8 .9 .95 .99};   /* Power values */
  alpha = .05;               /* Type I error level */

  za = probit(1-alpha/2);
  za1 = probit(1-alpha);
  nb = ncol(power);
  np = ncol(pa);
  size = j(1,np);
  size1 = j(1,np);

  print ,,,,,, 'Sample sizes for tests of '
               'a single proportion';

  do i1 = 1 to np;     /* Cycle across alternatives */
    p2 = pa[1,i1];
    do i2 = 1 to nb;   /* Cycle across power levels */
      zb = probit(power[1,i2]);
      size[1,i2] = ((za*sqrt(p0*(1-p0))
           +zb*sqrt(p2*(1-p2)))**2)/((p0-p2)**2);
      size1[1,i2] = ((za1*sqrt(p0*(1-p0))
           +zb*sqrt(p2*(1-p2)))**2)/((p0-p2)**2);
    end;
    print,,,,,,, p0 p2 alpha power;
    size = int(size) + j(1,np);
    print 'Sample sizes (2-sided test):' size;
    size = int(size1) + j(1,np);
    print 'Sample sizes (1-sided test):' size;
  end;

  /* Compute sample size needed to obtain a
     confidence interval with a specified
     margin of error
  */

  p = {.5 .4 .3 .2 .1 .01};  /* Enter possible
                                values of the
                                true proportion */
  level = 0.95;              /* Enter confidence level */
  me = { 0.035};             /* Enter desired margin of
                                error */

  /* Compute needed sample sizes */

  p = t(p);
  alpha2 = (1-level)/2;
  np = nrow(p);
  n = ((probit(1-alpha2)/me)**2)#p#(j(np,1)-p);
  n = int(n) + j(np,1);

  percent = level*100;
  print ,,,,,, 'Sample sizes for ' percent
               'percent confidence interval:';
  print p n;
finish;

run samples;

Sample sizes for tests of a single proportion

 P0    P2    ALPHA   POWER
0.7   0.6    0.05    0.8   0.9   0.95   0.99

Sample sizes (2-sided test):   SIZE   172   233   291   416
Sample sizes (1-sided test):   SIZE   136   191   244   359

 P0    P2    ALPHA   POWER
0.7   0.5    0.05    0.8   0.9   0.95   0.99

Sample sizes (2-sided test):   SIZE    44    60    75   107
Sample sizes (1-sided test):   SIZE    35    49    63    92

 P0    P2    ALPHA   POWER
0.7   0.4    0.05    0.8   0.9   0.95   0.99

Sample sizes (2-sided test):   SIZE    20    26    33    47
Sample sizes (1-sided test):   SIZE    16    22    28    40

 P0    P2    ALPHA   POWER
0.7   0.3    0.05    0.8   0.9   0.95   0.99

Sample sizes (2-sided test):   SIZE    11    14    18    25
Sample sizes (1-sided test):   SIZE     9    12    15    21

Sample sizes for 95 percent confidence interval:

    P      N
  0.5    784
  0.4    753
  0.3    659
  0.2    502
  0.1    283
  0.01    32
S-PLUS code

#  This program computes sample sizes needed to
#  obtain a test of a hypothesis about a single
#  proportion with a specified power value.
#  It also computes the number of observations
#  needed to obtain a confidence interval with
#  a specified accuracy. This program is stored
#  in the file
#                  size1p.ssc

# Specify the null hypothesis
p0 <- c(.7)

# Enter a vector of alternatives
pa <- c(.6, .5, .4, .3)

# Enter power values
power <- c(.8, .9, .95, .99)

# Enter the type I error level
alpha <- .05

za <- qnorm(1-alpha/2)
za1 <- qnorm(1-alpha)
nb <- length(power)
np <- length(pa)
cat("Sample sizes for tests of a single proportion")

# Cycle across the list of alternatives
# and obtain a sample size for each of
# the requested power values

for(i1 in 1:np){
  p2 <- pa[i1]
  zb <- qnorm(power)
  size <- ((za*sqrt(p0*(1-p0))
        +zb*sqrt(p2*(1-p2)))^2)/((p0-p2)^2)
  size1 <- ((za1*sqrt(p0*(1-p0))
        +zb*sqrt(p2*(1-p2)))^2)/((p0-p2)^2)

  # Increase sample size to next largest integer
  # and print results

  size <- ceiling(size)
  cat("\n \n \n p0=",p0,"pA=", p2, "alpha=",
      alpha,"power=", power)
  cat("\n Sample sizes (2-sided test): " , size)
  size1 <- ceiling(size1)
  cat("\n \n p0=",p0,"pA=", p2, "alpha=",
      alpha,"power=", power)
  cat("\n Sample sizes (1-sided test): " , size1)
}

# Compute sample size needed to obtain a
# confidence interval with a specified
# margin of error

# Enter possible values of the true proportion
p <- c(.5, .4, .3, .2, .1, .01)

# Enter the confidence level
level <- c(0.95)

# Enter desired margin of error
me <- c(0.035)

# Compute needed sample sizes
alpha2 <- (1.0-level)/2
np <- length(p)
one <- rep(1,np)
n <-((qnorm(one-alpha2)/me)^2)*p*(one-p)
n <- ceiling(n)
sizes<-t(rbind(p,n))
percent <- level*100
cat("\n \n \n Sample sizes for", percent,
    "percent confidence intervals \n",
    "with margin of error ", me, "\n")
sizes
# This is the output from the S-PLUS code
# in the file size1p.ssc.

Sample sizes for tests of a single proportion

p0= 0.7 pA= 0.6 alpha= 0.05 power= 0.8 0.9 0.95 0.99
Sample sizes (2-sided test): 172 233 291 416

p0= 0.7 pA= 0.6 alpha= 0.05 power= 0.8 0.9 0.95 0.99
Sample sizes (1-sided test): 136 191 244 359

p0= 0.7 pA= 0.5 alpha= 0.05 power= 0.8 0.9 0.95 0.99
Sample sizes (2-sided test): 44 60 75 107

p0= 0.7 pA= 0.5 alpha= 0.05 power= 0.8 0.9 0.95 0.99
Sample sizes (1-sided test): 35 49 63 92

p0= 0.7 pA= 0.4 alpha= 0.05 power= 0.8 0.9 0.95 0.99
Sample sizes (2-sided test): 20 26 33 47

p0= 0.7 pA= 0.4 alpha= 0.05 power= 0.8 0.9 0.95 0.99
Sample sizes (1-sided test): 16 22 28 40

p0= 0.7 pA= 0.3 alpha= 0.05 power= 0.8 0.9 0.95 0.99
Sample sizes (2-sided test): 11 14 18 25

p0= 0.7 pA= 0.3 alpha= 0.05 power= 0.8 0.9 0.95 0.99
Sample sizes (1-sided test): 9 12 15 21

Sample sizes for 95 percent confidence intervals
with margin of error 0.035

         p     n
[1,]  0.50   784
[2,]  0.40   753
[3,]  0.30   659
[4,]  0.20   502
[5,]  0.10   283
[6,]  0.01    32