Biostatistics
Unit 5 – Samples
Sampling distributions
Sampling distributions are important in the
understanding of statistical inference. Probability
distributions permit us to answer questions about
sampling and they provide the foundation for
statistical inference procedures.
Definition
The sampling distribution of a statistic is the
distribution of all possible values of the statistic,
computed from samples of the same size randomly
drawn from the same population. When sampling a
discrete, finite population, a sampling distribution
can be constructed. Note that this construction is
difficult with a large population and impossible with
an infinite population.
Construction of sampling distributions
1. From a population of size N, randomly draw all
possible samples of size n.
2. Compute the statistic of interest for each sample.
3. Create a frequency distribution of the statistic.
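The three construction steps can be sketched in Python for a very small population. The population values and the sample size below are hypothetical, chosen only so that every possible sample can be listed, and the sketch assumes sampling without replacement with the sample mean as the statistic of interest.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Hypothetical population of size N = 5 (illustrative values, not from the unit).
population = [2, 4, 6, 8, 10]
n = 2  # sample size

# Step 1: list all possible samples of size n (assumed: without replacement).
samples = list(combinations(population, n))

# Step 2: compute the statistic of interest (here the sample mean) for each sample.
sample_means = [mean(s) for s in samples]

# Step 3: create a frequency distribution of the statistic.
frequency = Counter(sample_means)

for value in sorted(frequency):
    print(value, frequency[value])

print("mean of the sample means:", mean(sample_means))
print("population mean:", mean(population))
```

With only 10 possible samples the enumeration is trivial. The number of samples grows quickly with N and n, which is why the construction becomes impractical for large populations.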
Properties of sampling distributions
We are interested in the mean, standard deviation
and appearance of the graph (functional form) of a
sampling distribution.
Types of sampling distributions
We will study the following types of sampling
distributions.
A) Distribution of the sample mean
B) Distribution of the difference between two means
C) Distribution of the sample proportion
D) Distribution of the difference between two
proportions
Sampling distribution of $\bar{x}$
Consider a finite population with mean $\mu$ and variance
$\sigma^2$. When sampling from a normally distributed
population, it can be shown that the distribution of
the sample mean, $\bar{x}$, will have the following properties.
Properties of the sampling distribution
1. The distribution of $\bar{x}$ will be normal.
2. The mean, $\mu_{\bar{x}}$, of the distribution of the values of
$\bar{x}$ will be the same as the mean of the population
from which the samples were drawn:
$\mu_{\bar{x}} = \mu$.
3. The variance, $\sigma_{\bar{x}}^2$, of the distribution of
$\bar{x}$ will be equal to the variance of the population
divided by the sample size:
$\sigma_{\bar{x}}^2 = \sigma^2 / n$.
Standard error
The square root of the variance of the sampling
distribution is called the standard error of the mean
or the standard error.
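As a rough numerical check of the mean and variance properties and of the standard error, the sketch below enumerates every ordered sample of size n from a small population. The population values are hypothetical, and the sketch assumes sampling with replacement, which is the setting in which the variance of the sample mean equals the population variance divided by n exactly.

```python
from itertools import product
from math import isclose, sqrt
from statistics import mean, pvariance

# Hypothetical population (illustrative values, not from the unit).
population = [2, 4, 6, 8, 10]
n = 2  # sample size

mu = mean(population)           # population mean
sigma2 = pvariance(population)  # population variance

# All ordered samples of size n, drawn with replacement (an assumption of this sketch).
sample_means = [mean(s) for s in product(population, repeat=n)]

mean_of_means = mean(sample_means)      # should match mu
var_of_means = pvariance(sample_means)  # should match sigma2 / n
standard_error = sqrt(var_of_means)     # square root of that variance

print("mean of sample means:", mean_of_means, "population mean:", mu)
print("variance of sample means:", var_of_means, "sigma^2 / n:", sigma2 / n)
print("standard error:", standard_error)
```

Sampling without replacement from a finite population gives a slightly smaller variance, so the exact equality checked here should not be expected in that case.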
Nonnormally distributed populations
When the sampling is done from a nonnormally
distributed population, the central limit theorem is
used.
The central limit theorem
Given a population of any nonnormal functional form
with mean $\mu$ and variance $\sigma^2$, the sampling
distribution of $\bar{x}$, computed from samples of size n
from this population, will have mean $\mu$ and
variance $\sigma^2/n$ and will be approximately normally
distributed when the sample size is large (as a rule of
thumb, n of 30 or higher).
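A small simulation can make the theorem concrete. The sketch below is only an illustration under stated assumptions: the population is taken to be exponential with rate 1 (mean 1, variance 1, strongly skewed), and the sample size, number of repetitions and seed are hypothetical choices.

```python
import random
from statistics import mean, pvariance

random.seed(0)  # fixed seed so that the run is repeatable

n = 30              # sample size (the rule-of-thumb threshold)
num_samples = 5000  # number of simulated samples (hypothetical choice)

# Assumed population: exponential with rate 1, so mean = 1 and variance = 1.
mu, sigma2 = 1.0, 1.0

# Draw many samples of size n and record each sample mean.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

print("mean of sample means:", mean(sample_means), "expected:", mu)
print("variance of sample means:", pvariance(sample_means), "expected:", sigma2 / n)
```

The simulated mean and variance should land close to the theoretical values, up to simulation noise. A histogram of sample_means would show a shape much closer to a normal curve than the skewed population itself.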
The central limit theorem
Note that the standard deviation of the sampling
distribution is used in calculations of z scores and is
equal to $\sigma_{\bar{x}} = \sigma / \sqrt{n}$.
Example
Given the information below, what is the probability
that the sample mean $\bar{x}$ is greater than 53?
(1) Write the given information
$\mu$ = 50
$\sigma$ = 16
n = 64
$\bar{x}$ = 53
Example
(2) Sketch a normal curve
Example
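(3) Convert $\bar{x}$ to a z score
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{53 - 50}{16 / \sqrt{64}} = \frac{3}{2} = 1.5$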
Example
(4) Find the appropriate value(s) in the table
A value of z = 1.5 gives an area of .9332. This is
subtracted from 1 to give
the probability P (z > 1.5) = .0668
Example
(5) Complete the answer
The probability that $\bar{x}$ is greater than 53 is .0668.
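The same calculation can be checked in Python, treating the value 53 as a sample mean from n = 64 observations. The sketch uses the standard library's NormalDist in place of the printed table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu = 50     # population mean
sigma = 16  # population standard deviation
n = 64      # sample size
x_bar = 53  # sample mean of interest

standard_error = sigma / sqrt(n)     # 16 / 8 = 2
z = (x_bar - mu) / standard_error    # (53 - 50) / 2 = 1.5
p_greater = 1 - NormalDist().cdf(z)  # upper-tail area beyond z

print(f"z = {z:.2f}, P(sample mean > 53) = {p_greater:.4f}")
```

Rounded to four decimals, the result agrees with the table-based answer of .0668.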
Distribution of the difference between two means
It often becomes important to compare two
population means. Knowledge of the sampling
distribution of the difference between two means is
useful in studies of this type. It is generally assumed
that the two populations are normally distributed.
Sampling distribution of $\bar{x}_1 - \bar{x}_2$
Plotting sample differences against frequency gives
a normal distribution with mean equal to
$\mu_1 - \mu_2$,
which is the difference between the two population
means.
Variance
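The variance of the distribution of the sample
differences is equal to
$\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$.
Therefore, the standard error of the differences
between two means would be equal to
$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$.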
Converting to a z score
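To convert to the standard normal distribution, we
use the formula
$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
We find the z score by assuming that there is no
difference between the population means, so that
$\mu_1 - \mu_2 = 0$.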
Sampling from normal populations
This procedure is valid even when the population
variances are different or when the sample sizes are
different. Suppose we are given two normally distributed
populations with means $\mu_1$ and $\mu_2$ and variances
$\sigma_1^2$ and $\sigma_2^2$, respectively.
(continued)
Sampling from normal populations
The sampling distribution of the difference,
$\bar{x}_1 - \bar{x}_2$,
between the means of independent samples of
size n1 and n2 drawn from these populations is
normally distributed with mean
$\mu_1 - \mu_2$
and variance
$\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$.
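The mean and variance of the difference between two sample means can be illustrated with a short simulation. All numbers below (population means, standard deviations, sample sizes, number of repetitions, seed) are hypothetical and chosen only for illustration; the sketch assumes two independent normal populations.

```python
import random
from statistics import mean, pvariance

random.seed(1)  # fixed seed so that the run is repeatable

# Hypothetical population parameters and sample sizes (not from the unit).
mu1, sigma1, n1 = 10.0, 4.0, 20
mu2, sigma2, n2 = 7.0, 3.0, 15
num_repeats = 5000

differences = []
for _ in range(num_repeats):
    # Independent samples from each normal population.
    x_bar1 = sum(random.gauss(mu1, sigma1) for _ in range(n1)) / n1
    x_bar2 = sum(random.gauss(mu2, sigma2) for _ in range(n2)) / n2
    differences.append(x_bar1 - x_bar2)

expected_mean = mu1 - mu2                                # difference of population means
expected_variance = sigma1**2 / n1 + sigma2**2 / n2      # sum of the two variances of the means

print("simulated mean:", mean(differences), "expected:", expected_mean)
print("simulated variance:", pvariance(differences), "expected:", expected_variance)
```

The simulated values should be close to, but not exactly equal to, the theoretical mean and variance, since they come from a finite number of repetitions.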
Example
In a study of annual family expenditures for general
health care, two populations were surveyed with the
following results:
Population 1: n1 = 40, $\bar{x}_1 = \$346$
Population 2: n2 = 35, $\bar{x}_2 = \$300$
Example
If the variances of the populations are
$\sigma_1^2 = 2800$ and $\sigma_2^2 = 3250$, what is the
probability of obtaining sample results
($\bar{x}_1 - \bar{x}_2$) as large as those
shown if there is no difference in the means of the
two populations?
Solution
(1) Write the given information
n1 = 40, $\bar{x}_1 = \$346$, $\sigma_1^2 = 2800$
n2 = 35, $\bar{x}_2 = \$300$, $\sigma_2^2 = 3250$
Solution
(2) Sketch a normal curve
Solution
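(3) Find the z score
$z = \frac{(346 - 300) - 0}{\sqrt{\frac{2800}{40} + \frac{3250}{35}}} = \frac{46}{\sqrt{70 + 92.86}} = \frac{46}{12.76} \approx 3.6$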
Solution
(4) Find the appropriate value(s) in the table
A value of z = 3.6 gives an area of .9998. This is
subtracted from 1 to give the probability
P (z > 3.6) = .0002
Solution
(5) Complete the answer
The probability that $\bar{x}_1 - \bar{x}_2$
is as large as given is .0002.
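The health-care expenditure example can be reproduced with a few lines of Python. This is a sketch that uses the standard library's NormalDist in place of the printed table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Given information from the example.
n1, x_bar1, var1 = 40, 346, 2800
n2, x_bar2, var2 = 35, 300, 3250

observed_diff = x_bar1 - x_bar2  # 346 - 300 = 46
hypothesized_diff = 0            # "no difference in the means"

# Standard error of the difference between two means.
standard_error = sqrt(var1 / n1 + var2 / n2)
z = (observed_diff - hypothesized_diff) / standard_error
p_upper = 1 - NormalDist().cdf(z)  # upper-tail area beyond z

print(f"standard error = {standard_error:.2f}, z = {z:.2f}, P = {p_upper:.4f}")
```

The computed z is about 3.6 and the upper-tail probability rounds to .0002, in line with the table lookup.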
Distribution of the sample proportion ($\hat{p}$)
While statistics such as the sample mean are
derived from measured variables, the sample
proportion is derived from counts or frequency data.
Properties of the sample proportion
Construction of the sampling distribution of the
sample proportion is done in a manner similar to that
of the mean and the difference between two
means. When the sample size is large, the
distribution of the sample proportion is approximately
normally distributed because of the central limit
theorem.
Mean and variance
The mean of the distribution, $\mu_{\hat{p}}$, will be equal to
the true population proportion, p, and the variance of
the distribution, $\sigma_{\hat{p}}^2$, will be equal to p(1-p)/n.
The z-score
The z-score for the sample proportion is
$z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$
Example
In the mid seventies, according to a report by the
National Center for Health Statistics, 19.4 percent of
the adult U.S. male population was obese. What is
the probability that in a simple random sample of
size 150 from this population fewer than 15 percent
will be obese?
Solution
(1) Write the given information
n = 150
p = .194
Find P($\hat{p}$ < .15)
Solution
(2) Sketch a normal curve
Solution
(3) Find the z score
$z = \frac{.15 - .194}{\sqrt{(.194)(.806)/150}} = \frac{-.044}{.0323} \approx -1.36$
Solution
(4) Find the appropriate value(s) in the table
A value of z = -1.36 gives an area of .0869 which is
the probability
P (z < -1.36) = .0869
Solution
(5) Complete the answer
The probability that $\hat{p}$ < .15 is .0869.
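A short Python check of the obesity example, again using the standard library's NormalDist in place of the printed table (variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

n = 150       # sample size
p = 0.194     # population proportion
p_hat = 0.15  # sample proportion of interest

# Standard error of the sample proportion under the normal approximation.
standard_error = sqrt(p * (1 - p) / n)
z = (p_hat - p) / standard_error
p_lower = NormalDist().cdf(z)  # lower-tail area below z

print(f"z = {z:.2f}, P(sample proportion < .15) = {p_lower:.4f}")
```

Because z is not rounded to two decimals before the lookup, the computed area can differ from the table-based .0869 in the third or fourth decimal place.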
Distribution of the difference between two proportions
This is for situations with two population
proportions. We assess the probability associated
with a difference in proportions computed from
samples drawn from each of these populations. The
appropriate distribution is the distribution of the
difference between two sample proportions.
Sampling distribution of $\hat{p}_1 - \hat{p}_2$
The sampling distribution of the difference between
two sample proportions is constructed in a manner
similar to the difference between two means.
(continued)
Sampling distribution of $\hat{p}_1 - \hat{p}_2$
Independent random samples of size n1 and n2 are
drawn from two populations of dichotomous
variables where the proportions of observations with
the character of interest in the two populations are p1
and p2 , respectively.
Mean and variance
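The distribution of the difference between two
sample proportions, $\hat{p}_1 - \hat{p}_2$, is approximately
normal.
The mean is $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$.
The variance is $\sigma_{\hat{p}_1 - \hat{p}_2}^2 = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}$.
These are true when n1 and n2 are large.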
The z score
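The z score for the difference between two
proportions is given by the formula
$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}}$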
Example
In a certain area of a large city it is hypothesized that 40
percent of the houses are in a dilapidated condition. A
random sample of 75 houses from this section and 90
houses from another section yielded a difference, $\hat{p}_1 - \hat{p}_2$,
of .09. If there is no difference between the two areas in
the proportion of dilapidated houses, what is the
probability of observing a difference this large or larger?
Solution
(1) Write the given information
n1 = 75, p1 = .40
n2 = 90, p2 = .40
$\hat{p}_1 - \hat{p}_2$ = .09
Find P($\hat{p}_1 - \hat{p}_2 \geq .09$)
Solution
(2) Sketch a normal curve
Solution
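(3) Find the z score
$z = \frac{.09 - 0}{\sqrt{\frac{(.40)(.60)}{75} + \frac{(.40)(.60)}{90}}} = \frac{.09}{\sqrt{.0032 + .0027}} \approx 1.17$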
Solution
(4) Find the appropriate value(s) in the table
A value of z = 1.17 gives an area of .8790 which is
subtracted from 1 to give the probability
P (z > 1.17) = .121
Solution
(5) Complete the answer
The probability of observing $\hat{p}_1 - \hat{p}_2$
of .09 or greater is .121.
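The dilapidated-housing example can be checked the same way. The sketch below assumes both population proportions equal the hypothesized .40 and uses the standard library's NormalDist in place of the printed table; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n1, p1 = 75, 0.40
n2, p2 = 90, 0.40
observed_diff = 0.09  # observed difference between the sample proportions

# Standard error of the difference between two sample proportions.
standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z = (observed_diff - (p1 - p2)) / standard_error
p_upper = 1 - NormalDist().cdf(z)  # probability of a difference this large or larger

print(f"z = {z:.3f}, P(difference >= .09) = {p_upper:.3f}")
```

Carrying full precision gives a z slightly above 1.17, so the computed probability comes out marginally below the table-based .121. The gap reflects rounding of intermediate values, not a different method.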
fin