Chapter 5: Common Distributions
In this chapter we examine four of the distributions that will be frequently encountered later
in the course.
5.1 The Normal Distribution
The normal distribution is the most widely used distribution in statistics. Continuous data such as mass, length, etc., can often be modelled using a normal distribution.

The normal distribution has two parameters: the mean (μ) and the variance (σ²). If a random variable X has a normal distribution then we can write this as:

X ~ N[μ, σ²].

A normal distribution with μ = 0 and σ² = 1 is referred to as a standard normal distribution (and a random variable with this distribution is usually denoted Z).

Important result: If X is a random variable distributed as N[μ, σ²], then

(X − μ)/σ ~ N[0, 1].

The process of subtracting the mean and dividing by the standard deviation is referred to as standardisation:
General Normal: X ~ N[μ, σ²]  →  Standard Normal: Z ~ N[0, 1], via z = (x − μ)/σ.

[Figure: the pdf of N[0, 9] (plotted as dnorm(x, sd = 3)) alongside the pdf of N[0, 1] (dnorm(x)).]
Example:
The fully grown lengths (in mm) of a certain insect can be regarded as having the following normal distribution:

X ~ N[64, 16].

What is the probability that an insect has length less than 59 mm?

Solution: Applying the standardisation formula,

z = (x − μ)/σ = (59 − 64)/4 = −1.25.

Thus,

P(X < 59) = P(Z < −1.25) = P(Z > 1.25) = 1 − Φ(1.25) = 1 − 0.8944 = 0.1056.
5.1.1 Percentage points
Definition: Consider a random variable X with some distribution. The (upper) 100α% point is the value of x such that:

P(X > x) = α.

For the standard normal distribution, we will denote the (upper) 100α% point by z_α, i.e.:

P(Z > z_α) = α.
In statistical tables (e.g. Lindley and Scott), there is a separate percentage point table covering the most used values of α. In Lindley and Scott:

• P represents 100α;
• x(P) represents the value of z_α.

Extract:

P = 100α     10%      5%       2%       1%       0.1%
α            0.1      0.05     0.02     0.01     0.001
x(P) = z_α   1.2816   1.6449   2.0537   2.3263   3.0902

For example, the 10% point for the standard normal is z_0.1 = 1.2816.
Example 1:
Let X ~ N[50, 16]. Find the value of x such that P(X > x) = 0.05, i.e. find the (upper) 5% point.

Solution: If X ~ N[50, 16], then

(X − 50)/4 ~ N[0, 1].

The 5% point for the standard normal is z_0.05 = 1.6449. Thus, the 5% point for a N[50, 16] distribution can be obtained by solving

(x − 50)/4 = 1.6449.

So, the 5% point is x = 50 + 1.6449 × 4 = 56.5796.
Example 2:
Let Z ~ N[0, 1]. Find the value of z such that P(Z < z) = 0.01 (i.e. find the lower 1% point).

Solution: The upper 1% point for a standard normal is z_0.01 = 2.3263. Therefore, P(Z > 2.3263) = 0.01.

By symmetry, we must also have P(Z < −2.3263) = 0.01. So, the lower 1% point is −2.3263.
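In R, percentage points are quantiles, so qnorm reproduces both examples (a sketch, assuming base R):

    qnorm(0.95, mean = 50, sd = 4)   # upper 5% point of N[50, 16]: approx 56.58
    qnorm(0.01)                      # lower 1% point of N[0, 1]: approx -2.3263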
5.2 The chi-squared distribution
5.2.1 Introduction
The chi-squared (χ²) distribution has a single parameter called the degrees of freedom; this can be any positive integer. The χ² distribution with n degrees of freedom is denoted χ²_n.

Probability density function:
If X ~ χ²_n, then the p.d.f. of X (for x > 0) is given by:

f(x) = [1 / (2^{n/2} Γ(n/2))] x^{n/2 − 1} e^{−x/2}.

For x ≤ 0, f(x) = 0.

This density is written in terms of the gamma function, Γ. Some of the key properties of this function are:

• Γ(x) = (x − 1)Γ(x − 1);
• Γ(1/2) = √π;
• Γ(x) = (x − 1)! if x is a natural number.
The degrees of freedom, n, define the shape of the χ² density. For n < 3, the density has a mode at zero. For n ≥ 3, the mode moves further away from zero as n increases. The shapes of some specific densities are given below.
[Figure: graph of several chi-squared densities, for n = 1, 2, 4 and 8.]
5.2.2 Finding probabilities
Probabilities associated with the χ² distribution can be looked up in probability tables. Lindley and Scott list the d.o.f. (which they denote ν) along the top of each column. Then for each value x listed, the values in the table are the probability that X < x.
Extracts:

ν = 3.0             ν = 7.0
x     P(X < x)      x      P(X < x)
0.0   0.0000        1.0    0.0052
0.5   0.0811        2.0    0.0402
1.0   0.1987        3.0    0.1150
1.5   0.3177        4.0    0.2202
2.0   0.4276        5.0    0.3400
2.5   0.5247        6.0    0.4603
3.0   0.6084        7.0    0.5711
3.5   0.6792        8.0    0.6674
4.0   0.7385        9.0    0.7473
etc                 10.0   0.8114

Example 1:
If X ~ χ²_3, then P(X < 2.5) = 0.5247.

Example 2:
Suppose X ~ χ²_7. Find P(X > 10).

Solution: From tables, P(X < 10) = 0.8114, so P(X > 10) = 1 − 0.8114 = 0.1886.
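The same probabilities can be obtained with pchisq (a sketch, assuming base R):

    pchisq(2.5, df = 3)                     # approx 0.5247
    pchisq(10, df = 7, lower.tail = FALSE)  # P(X > 10), approx 0.1886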
5.2.3 Percentage points
The 100α% point for the χ²_n distribution is denoted χ²_{n,α}. Therefore, if X ~ χ²_n, then

P(X > χ²_{n,α}) = α.
The percentage points of the χ² distribution are in a separate table in Lindley and Scott. In this table, the degrees of freedom are listed down the rows and P is 100α.

Extract:

P          99      95      10      5       1
ν = 1.0    0.000   0.004   2.706   3.841   6.635
ν = 2.0    0.020   0.103   4.606   5.991   9.210
ν = 3.0    0.115   0.352   6.251   7.815   11.34
ν = 4.0    0.297   0.711   7.779   9.488   13.28
ν = 5.0    0.554   1.145   9.236   11.07   15.09
ν = 6.0    0.872   1.635   10.64   12.59   16.81
ν = 7.0    1.239   2.167   12.02   14.07   18.48
ν = 8.0    1.646   2.733   13.36   15.51   20.09

For example, χ²_{5,0.1} = 9.236, so if X ~ χ²_5 then P(X > 9.236) = 0.1.

The chi-squared distribution is not symmetric (unlike the normal distribution). So if we want a lower percentage point (i.e. a value of x such that P(X < x) = α), then we cannot simply negate the corresponding upper percentage point. Instead we need to find χ²_{n,1−α}.
Example 1:
Let X ~ χ²_8. Find the lower 1% point (i.e. the value of x such that P(X < x) = 0.01).

Solution: The lower 1% point is denoted χ²_{8,0.99}; its value is 1.646.
Example 2:
Suppose X ~ χ²_10. Find the value of t for which P(X > t) = 0.1321.

Solution: Here, t would be the 13.21% point for the distribution. But 0.1321 is a non-standard value of α, so we need to use the distribution function table to find t:

P(X > t) = 0.1321  ⟺  P(X < t) = 1 − 0.1321 = 0.8679.

Going through the distribution table we find that t = 15.
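Percentage points of the χ² distribution come from qchisq (a sketch, assuming base R):

    qchisq(0.01, df = 8)      # lower 1% point of chi-squared_8: approx 1.646
    qchisq(0.8679, df = 10)   # the value t with P(X < t) = 0.8679: approx 15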
5.3 The Student t-distribution
5.3.1 Introduction
Definition: Suppose that we have two independent random variables Y and Z, such that:

Y ~ N[0, 1] and Z ~ χ²_n.

Then the random variable X defined by

X = Y / √(Z/n)

has a t-distribution with n degrees of freedom, denoted t_n.
The t-distribution is symmetric about zero and its general shape is like the bell shape of a normal distribution. However, the tails of the t-distribution can approach zero much more slowly than those of the normal distribution, i.e. the t-distribution is more heavy-tailed than the normal. The degrees of freedom define how heavy-tailed the t-distribution is.

Note:
The t-distribution with n = 1 is sometimes referred to as the Cauchy distribution. This is so heavy-tailed that its mean and variance do not exist! (This is because the integrals specifying the mean and variance are not absolutely convergent.)

Important note:
The density of a t-distribution converges to that of the standard normal as n → ∞.
The diagram below shows how the t-distribution varies for different degrees of freedom.

[Figure: densities of the t_2, t_5 and t_20 distributions compared with the standard normal.]
5.3.2 Probabilities
Probabilities associated with the t-distribution can be looked up in tables. In Lindley and Scott, the degrees of freedom are again denoted by ν and are listed along the top of the columns. Then for each value t listed, the values in the table are the probability that X < t.
Example 1:
Let X ~ t_3. Then P(X < 2.5) = 0.9561.

Example 2:
Let X ~ t_12. Find P(X > 2.5).

Solution: P(X > 2.5) = 1 − P(X < 2.5) = 1 − 0.986 = 0.014.
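These are pt calculations in R (a sketch, assuming base R):

    pt(2.5, df = 3)                       # approx 0.9561
    pt(2.5, df = 12, lower.tail = FALSE)  # P(X > 2.5), approx 0.014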
5.3.3 Percentage points
The 100α% point for the t_n distribution is denoted by t_{n,α}. If X ~ t_n, then:

P(X > t_{n,α}) = α.

Percentage points for the t-distribution are tabulated separately. The degrees of freedom for the distribution are listed down the rows and P = 100α.
Example 1:
Find the 5% point for t_6.

Solution: Directly from tables, this is seen to be t_{6,0.05} = 1.943. (Thus P(X > 1.943) = 0.05.)

As the t-distribution is symmetric, finding lower percentage points is simple.

Example 2:
Let X ~ t_10. Find the value of t such that P(X < t) = 0.01 (i.e. find the lower 1% point).

Solution: The upper 1% point is t_{10,0.01} = 2.764. But

P(X > 2.764) = 0.01  ⟹  P(X < −2.764) = 0.01.

So, the lower 1% point, t, is −2.764.
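The qt function gives the same points directly (a sketch, assuming base R):

    qt(0.95, df = 6)    # upper 5% point of t_6: approx 1.943
    qt(0.01, df = 10)   # lower 1% point of t_10: approx -2.764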
Note: To find non-standard percentage points (such as the 12.5% point), we need to use the t-distribution function table.
5.4 The (Fisher's) F-distribution
5.4.1 Introduction
Definition: Consider two independent random variables Y and Z such that

nY ~ χ²_n and mZ ~ χ²_m.

The random variable X defined by

X = Y/Z

is then said to have an F-distribution with n and m degrees of freedom, denoted F_{n,m}.

The F-distribution therefore has two parameters, both of which are degrees of freedom. The order of the degrees of freedom is important! The F_{n,m} distribution is not the same as the F_{m,n} distribution.
Note: The density for the F-distribution is only defined for positive values of x. The values
of the two degrees of freedom define the shape of the distribution. Plots of the F-distribution
for various values of n and m are shown below.
[Figure: graphs of several F distributions, for (n, m) = (2, 2), (4, 4), (8, 8) and (20, 20).]

[Figure: graphs of several more F distributions, for (n, m) = (2, 4), (4, 2), (5, 10) and (10, 20).]
Lindley and Scott do not have tables for looking up probabilities associated with the F-distribution.
5.4.2 Percentage points
Separate tables giving the 10, 5, 2.5, 1, 0.5 and 0.1 percentage points for F-distributions with different combinations of degrees of freedom can be found in Lindley and Scott.

We will denote the (upper) 100α% point for the F_{n,m} distribution by F_{n,m,α}. If X ~ F_{n,m}, then:

P(X > F_{n,m,α}) = α.

In the table of the 100α percentage points for the F-distribution, the first degrees of freedom is denoted ν₁ and listed along the columns. The second degrees of freedom is denoted by ν₂ and listed down the rows.
Extract: 1% points of the F-distribution

ν₂ \ ν₁    1       2       3       4       5
1          4052    4999    5403    5625    5764
2          98.50   99.00   99.17   99.25   99.30
3          34.12   30.82   29.46   28.71   28.24
4          21.20   18.00   16.69   15.98   15.52
5          16.26   13.27   12.06   11.39   10.97

For example, the (upper) 1% point for an F_{5,3} distribution is 28.24. We write F_{5,3,0.01} = 28.24.
Example:
Find the 5% point for both the F_{5,10} and the F_{10,5} distributions.

Solution: From the 5% points table:

F_{5,10,0.05} = 3.326
F_{10,5,0.05} = 4.735

Notice that these are not the same.
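In R, F percentage points come from qf; note how the order of the degrees of freedom matters (a sketch, assuming base R):

    qf(0.95, df1 = 5, df2 = 10)   # approx 3.326
    qf(0.95, df1 = 10, df2 = 5)   # approx 4.735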
The tables in Lindley and Scott give the upper percentage points only, i.e. they give the values of x such that P(X > x) = α, for small values of α. Since the F-distribution is not symmetric, to find lower percentage points we cannot simply use the negative of the corresponding upper percentage point:

P(X < x) ≠ P(X > −x).

The density is in fact not even defined for x < 0.
5.4.3 Finding lower percentage points

Result: Suppose that X = Y/Z ~ F_{n,m}. Then

X⁻¹ = Z/Y ~ F_{m,n}.

Proof:
X = Y/Z ~ F_{n,m} if nY ~ χ²_n and mZ ~ χ²_m. But by definition of the F-distribution, this means that

Z/Y ~ F_{m,n},

as required.
We can use this result to find lower percentage points for F-distributions:

Important result:
The lower 100α% percentage point for the F_{n,m} distribution is the reciprocal of the upper 100α% percentage point of the F_{m,n} distribution.

Proof:
If X ~ F_{n,m} and x represents the lower 100α% percentage point for this distribution, then P(X < x) = α. But

P(X < x) = α  ⟺  P(1/X > 1/x) = α.

As 1/X ~ F_{m,n}, 1/x is (by definition) the upper 100α% percentage point of the F_{m,n} distribution. So,

x = 1/F_{m,n,α}.
Example 1:
Let X ~ F_{5,10}. Suppose we wish to find x such that P(X < x) = 0.05, i.e. we want to find the lower 5% point of the F_{5,10} distribution.

Solution: The lower 5% point of the F_{5,10} distribution is the reciprocal of the upper 5% point of the F_{10,5} distribution. So,

x = 1/F_{10,5,0.05} = 1/4.735 = 0.2112.
Example 2:
Suppose X ~ F_{4,7}. Find the upper and lower 10% points.

Solution: The upper 10% point can be found directly from tables:

F_{4,7,0.1} = 2.961.

The lower 10% point is the reciprocal of the upper 10% point of the F_{7,4} distribution:

Lower 10% point = F_{4,7,0.9} = 1/F_{7,4,0.1} = 1/3.979 = 0.2513.
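qf confirms the reciprocal relationship directly (a sketch, assuming base R):

    qf(0.05, df1 = 5, df2 = 10)       # lower 5% point of F_{5,10}: approx 0.2112
    1 / qf(0.95, df1 = 10, df2 = 5)   # the same value via the reciprocal rule
    qf(0.10, df1 = 4, df2 = 7)        # lower 10% point of F_{4,7}: approx 0.2513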
Exercise:
Suppose X ~ F_{2,4}. Find the upper and lower 1% points.
5.5 Some additional facts about distributions

1) If X_1, ..., X_n are independent with X_i ~ N[μ_i, σ_i²], i = 1, ..., n, then

a_0 + Σ_{i=1}^{n} a_i X_i ~ N[ a_0 + Σ_{i=1}^{n} a_i μ_i ,  Σ_{i=1}^{n} a_i² σ_i² ];

2) If X_1, ..., X_n are i.i.d. as N[0, 1], then
(a) X_i² ~ χ²_1, for i = 1, 2, ..., n;
(b) Σ_{i=1}^{n} X_i² ~ χ²_n;

3) If X_1, ..., X_n are independent with X_i ~ χ²_{k_i}, i = 1, ..., n, then

Σ_{i=1}^{n} X_i ~ χ²_k, where k = k_1 + ... + k_n;

4) If X ~ t_n, then X² ~ F_{1,n}.

These results are not proved in this course.
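Although not proved here, results such as 2(b) are easy to check by simulation. A minimal sketch in R (the simulation sizes are illustrative choices):

    set.seed(1)
    # 5000 replicates of the sum of 4 squared independent N[0,1] values
    s <- replicate(5000, sum(rnorm(4)^2))
    mean(s); var(s)   # compare with chi-squared_4: mean 4, variance 8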
Chapter 6: Sampling Distributions
6.1 Parameters
The purpose of many statistical investigations is to learn about the distribution of some random variable X. Many aspects of X's distribution may be of interest, but attention often focuses on one or two particular population characteristics.
Example 1:
A bakery needs to decide how many loaves of fresh bread it should put out on its shelves each day. If they put out too many, then they will lose money as stale bread will not sell, and if they put out too few, then they will lose potential sales. Therefore, to help the bakery make its order, interest might focus on the mean number of loaves, μ, usually sold on a particular day.
Example 2:
Suppose that a company has the job of packing a certain breakfast cereal into boxes, so that each box contains approximately 500 g of cereal. The weight of cereal in each box varies around 500 g due to the variability of the cereal product. The company wants to check that the amount going into each box doesn't vary too much about 500 g: weights greater than 500 g will lose the company money and weights less than 500 g could lead to customer dissatisfaction. In this case, attention may focus on the variability of weights in the boxes as described by σ, the standard deviation of weights.
Example 3:
When testing a new drug, a doctor might not be interested so much in the number of people cured by the drug, but rather the proportion, π, of people who are cured by the drug.
We call μ, σ and π population parameters. To learn about such parameters, we can observe a random sample of n observations, x_1, ..., x_n, and then use these data to calculate estimates for the parameter(s) of interest. For example, a sample mean could be used to estimate μ.
Definition: Any quantity computed from values in a sample is called a (sample) statistic.

Example:
All the numerical summaries introduced in Chapter 2 are statistics, as they are all calculated from values in the random sample. This includes statistics such as the sample mean (which utilises all the observations in its calculation) and the sample median (which only takes account of the middle observations).
It is important to realise that there is a difference between population parameters and sample statistics. A population parameter is a characteristic of the distribution of the random variable; it is typically unknown and cannot be observed. By contrast, a statistic is a characteristic of the sample and can be observed. For example, the population mean μ has some fixed (but unknown) value. On the other hand, the sample mean, X̄, can be observed and therefore can be known for a particular sample. The observed value of X̄, however, can vary from sample to sample (as different samples will give different values of x_1, ..., x_n). The value of a statistic, therefore, is subject to sampling variability.

Definition: As a statistic is a function of the random variables X_1, ..., X_n, it is itself a random variable. The distribution of a statistic is called its sampling distribution.

The sampling distribution of a statistic describes the long-run behaviour of the statistic's values when many different samples, each of size n, are obtained and the value of the statistic is computed for each sample.
6.2 The sampling distribution of the sample mean
To investigate the sampling distribution for X̄, we will consider several experiments.

Experiment 1: We generate 500 random samples (each of size n) from N[100, 400]. For each of these 500 samples we calculate x̄, so we have a random sample of 500 observations from the sampling distribution of X̄. This was repeated for n = 5, 20, 50.
[Figure: histograms of the 500 sample means for n = 5, n = 20 and n = 50.]
Observations: In each case the distribution seems roughly normal and it is clear that each of
these histograms is centred roughly at 100 (the mean of the normal distribution from which
the samples were generated). We can also see that as the sample size n increases, the
variability in the sampling distributions decreases (look carefully at the scales on the
horizontal axes).
These points can also be seen if we look at some statistics relating to each histogram above:

Sample size           n = 5    n = 20   n = 50
Mean                  100.07   99.83    100.05
Standard deviation    8.17     4.40     2.81
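Experiment 1 is easy to reproduce. A minimal sketch in R for the n = 20 case (assuming base R; note that N[100, 400] has standard deviation 20):

    set.seed(1)
    n <- 20
    xbar <- replicate(500, mean(rnorm(n, mean = 100, sd = 20)))
    mean(xbar)   # close to 100
    sd(xbar)     # close to 20 / sqrt(n) = 4.47
    hist(xbar)   # roughly normal in shape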
We will do a similar set of experiments to see what the sampling distribution for X̄ is like when we are not sampling from the normal distribution.

Experiment 2: We generate 500 random samples (each of size n) from a uniform U[0, 1] distribution. Again, for each of these 500 samples we calculate x̄, so we have a random sample of 500 observations from the sampling distribution of X̄. This was repeated for n = 5, 10, 20, 50.

Note: If X ~ U[0, 1], then E[X] = 0.5 and Var[X] = 1/12 (so s.d. = 0.289).
[Figure: histograms of the 500 sample means for n = 5, 10, 20 and 50.]
Observations: The shapes of the histograms relating to the sample means look more and more like normal distributions as n increases, despite the data being sampled from a uniform distribution. The histograms in each case seem to centre on 0.5 (the mean of the U[0, 1] distribution). Also, the variability of the sampling distributions decreases as the sample size becomes larger.
The mean and standard deviation for the data in the four situations above are given below:

Sample size           n = 5   n = 10   n = 20   n = 50
Mean                  0.491   0.504    0.502    0.499
Standard deviation    0.133   0.095    0.068    0.042
Important Result:
For an independent random sample X_1, ..., X_n from a distribution with mean μ and variance σ², the sampling distribution for X̄ has the following properties:

1. E[X̄] = μ.
2. Var[X̄] = σ²/n. The standard deviation of X̄ (often called the standard error) is therefore σ/√n.
3. If each X_i ~ N[μ, σ²], then X̄ ~ N[μ, σ²/n], regardless of the size of n.
4. If X_1, ..., X_n are not normally distributed, then when n is large (say at least 30) the distribution of X̄ is approximately N[μ, σ²/n].
Proof:
1. E[X̄] = E[(1/n) Σ_{i=1}^{n} X_i] = (1/n) Σ_{i=1}^{n} E[X_i] = (1/n)(nμ) = μ (as required).

2. Because we are assuming that the random variables are independent, we can also write:

Var[X̄] = Var[(1/n) Σ_{i=1}^{n} X_i] = (1/n²) Σ_{i=1}^{n} Var[X_i] = (1/n²)(nσ²) = σ²/n (as required).

3. A linear combination of normally distributed random variables also has a normal distribution. The mean and variance are as given above.

4. Not proved here.
Note:
Part (4) of the above result is the Central Limit Theorem, an extremely powerful and useful
result in Statistics.
Example 1:
X_1, ..., X_20 are independently and identically distributed N[30, 5]. Find the sampling distribution for X̄.

Solution: Here n = 20 and so X̄ ~ N[30, 5/20] = N[30, 0.25].
Example 2:
X_1, ..., X_40 are i.i.d. Po(10) random variables. What, approximately, is the sampling distribution for X̄?

Solution: The sample size can be considered large enough for the Central Limit Theorem to be applied. The sampling distribution can therefore be considered approximately normal. A Po(10) distribution has mean and variance equal to 10, therefore X̄ ~ N[10, 10/40] = N[10, 0.25] (roughly).
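This approximation can be seen empirically. A minimal sketch in R (the simulation size is an illustrative choice):

    set.seed(1)
    xbar <- replicate(500, mean(rpois(40, lambda = 10)))
    mean(xbar); sd(xbar)   # approx 10 and sqrt(0.25) = 0.5
    hist(xbar)             # roughly normal, by the Central Limit Theorem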
6.3 Sampling distribution of the sample proportion
In many statistical investigations we are interested in learning about the proportion of individuals, or objects, in a population that possess a specified property. For example, we might be interested in what proportion of patients are alive 5 years after diagnosis of a particular cancer, or in the proportion of UK adults who would like a ban on blood sports. Denote the true population proportion of interest by π. Note that π is a population parameter.

To learn about π, we could observe a random sample in which each of the n observations is either a "success" or a "failure". The sample proportion, p, is given by:

p = (number of successes) / n.

The sample proportion is clearly a sample statistic. It makes sense to use p to learn about π. We are therefore interested in the sampling distribution for p. To investigate it, we will look at two experiments in which we generate random samples of observed values of p.
Experiment 1:
Suppose that we generate 500 samples of size n, where each sampled value is either a success (with probability π = 0.25) or a failure (with probability 1 − π = 0.75). We then calculate the observed proportion of "successes" in each of the 500 samples. We will do this for n = 5, 10, 20 and 50.
[Figure: histograms of the 500 sample proportions for n = 5, 10, 20 and 50.]
Observations:
For a sample of size 5, the possible values of p are 0, 0.2, 0.4, 0.6, 0.8 and 1. The sampling
distribution for p gives the probability of each of these 6 values. The histogram for the case n
= 5 is positively skewed.
As n increases, the histograms become more and more symmetrical and in fact when n = 50
the histogram clearly resembles a normal curve centred on 0.25. In addition, increasing the
sample size decreases the range of observed values for p.
Experiment 2:
Once again we will generate 500 samples, but this time we will use the sample sizes n = 10, 25, 50 and 100, and we will take the true proportion of successes, π, to be 0.07. So once again each observation in each sample is either a success (S) with probability 0.07, or a failure (F) with probability 0.93.
[Figure: histograms of the 500 sample proportions for n = 10, 25, 50 and 100.]
Observations:
When n = 10, the possible values for p are 0, 0.1, 0.2, ..., 1. The histogram for the 500 samples is very positively skewed and no values greater than 0.4 were observed for p. [Notice how in the previous experiment, the density for p was not very skewed when n = 10.]

As n increases to 25 and 50, the histograms still look positively skewed. However, when the sample size reaches 100, the histogram begins to look slightly more normal. We therefore note that in this experiment we need larger sample sizes than in Experiment 1 before the sampling distribution for p looks approximately normal.

We also note that increasing the sample size again results in a narrowing of the range of observed values for p.
Thus, to summarise the observations from this experiment:

• Densities are roughly centred about π = 0.07.
• The variance of p decreases as n increases.
• As the sample size increases, the density for p becomes approximately normal. However, the density tends to normality much more slowly than when we had π = 0.25. Therefore, it appears that the rate at which the sampling distribution for p tends to normality depends not only on the sample size n, but also on the value of π.
Important result:
If p is the sample proportion of successes in a random sample of size n, where π is the true proportion of successes, then the following results hold:

• The expected value of p is π.
• The standard error (i.e. s.d.) of p is √(π(1 − π)/n).
• When n is sufficiently large, the sampling distribution for p is approximately normal.

Note: The further the value of π is from 0.5, the larger the value of n must be in order for the normal approximation of the sampling distribution for p to be accurate.
Rule of thumb:
If both nπ ≥ 5 and n(1 − π) ≥ 5, then we may use the normal approximation for p.
Proof:
Let X = total number of successes in the sample. Then X ~ Bi[n, π] and so:

E[X] = nπ;
Var[X] = nπ(1 − π), so sd[X] = √(nπ(1 − π)).

But, by definition, the sample proportion is p = X/n, and so

E[p] = E[X/n] = (1/n) E[X] = (1/n) nπ = π.

Also,

Var[p] = Var[X/n] = (1/n²) Var[X] = (1/n²) nπ(1 − π) = π(1 − π)/n.

Taking square roots, we get the required standard error for p.

The normality approximation is simply an application of the Central Limit Theorem, so that for large n,

p ~ N[π, π(1 − π)/n], approximately.
Example 1:
Suppose that the proportion of women who believe that they are underpaid is 0.55.
a) If we had a random sample of size 10, could we assume that the sampling distribution for p is approximately normal?
b) For a random sample of 400, what are the mean value and standard deviation for p?
c) In a sample of size 400, what is the probability that we observe the proportion of women who believe they are underpaid to be greater than 0.6?

Solution:
a) π = 0.55 and n = 10, so nπ = 5.5 and n(1 − π) = 4.5. As these are not both ≥ 5, we cannot assume that the distribution of p is normal with a sample size of only 10.

b) n = 400, so:

E[p] = π = 0.55;
Var[p] = π(1 − π)/n = (0.55 × 0.45)/400 = 0.000619, so sd[p] = 0.0249.

For n = 400, nπ = 220 and n(1 − π) = 180, and so p's distribution can be considered approximately normal. Therefore:

p ~ N[0.55, 0.000619].

c) P(p > 0.6) = P(Z > (0.6 − 0.55)/0.0249) = P(Z > 2.008) = 1 − Φ(2.008) = 1 − 0.9778 = 0.0222, approximately.
Example 2:
Suppose that the true proportion of individuals with a particular disease is 0.02. What minimum sample size would be needed before p's distribution can be assumed to be approximately normal?

Solution: For approximate normality we need nπ ≥ 5 and n(1 − π) ≥ 5. Now,

n(0.02) ≥ 5  ⟹  n ≥ 250;
n(0.98) ≥ 5  ⟹  n ≥ 5.102.

Therefore, to assume approximate normality for p, we would need a sample size of at least 250.
Exercise:
90% of the population are right-handed. In a sample of 200 people, what is the probability that the sample proportion who are right-handed is less than 0.86?
6.4 Sampling distribution for the sample variance
When we want to learn about the variance, σ², of a population, it is natural to first look towards the sample variance, S². We are therefore interested in the sampling distribution for S².

In general, the sampling distribution for S² does not follow any fixed rules, so here we will only look at the case when X_1, ..., X_n are i.i.d. N[μ, σ²].

Important result:
If X_1, ..., X_n are i.i.d. N[μ, σ²], where μ is unknown, then

(n − 1)S²/σ² ~ χ²_{n−1}.

Proof: The proof will not be given in this course.
Experiment: To demonstrate that this result does in fact hold in practice, 500 samples were generated from N[100, 400] for various sample sizes n, and the value of

(n − 1)S²/σ² = (n − 1)S²/400

was calculated for each of the 500 samples. Histograms of these samples then demonstrate what the sampling distribution for (n − 1)S²/σ² looks like in each case.

[Figure: histograms of the statistic (n − 1)S²/400 for n = 3, 5, 10 and 20.]
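The experiment can be reproduced directly, since R's var function computes the sample variance S² (with divisor n − 1). A minimal sketch for the n = 10 case (assuming base R):

    set.seed(1)
    n <- 10
    stat <- replicate(500, (n - 1) * var(rnorm(n, mean = 100, sd = 20)) / 400)
    hist(stat, freq = FALSE)                   # histogram of the 500 statistics
    curve(dchisq(x, df = n - 1), add = TRUE)   # overlay the chi-squared_9 density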
Observations:
In the case when n = 3, the histogram for the sample of 500 observations of (n − 1)S²/σ² is heavily positively skewed and resembles a χ²_2 distribution. The histograms for the other cases, where n = 5, 10 and 20, also resemble chi-squared distributions (with the respective degrees of freedom being 4, 9 and 19).