15-f11-bgunderson-iln

advertisement
Author(s): Brenda Gunderson, Ph.D., 2011
License: Unless otherwise noted, this material is made available under the
terms of the Creative Commons Attribution–Non-commercial–Share
Alike 3.0 License: http://creativecommons.org/licenses/by-nc-sa/3.0/
We have reviewed this material in accordance with U.S. Copyright Law and have tried to maximize your
ability to use, share, and adapt it. The citation key on the following slide provides information about how you
may share and adapt this material.
Copyright holders of content included in this material should contact open.michigan@umich.edu with any
questions, corrections, or clarification regarding the use of content.
For more information about how to cite these materials visit http://open.umich.edu/education/about/terms-of-use.
Any medical information in this material is intended to inform and educate and is not a tool for self-diagnosis
or a replacement for medical evaluation, advice, diagnosis or treatment by a healthcare professional. Please
speak to your physician if you have questions about your medical condition.
Viewer discretion is advised: Some medical content is graphic and may not be suitable for all viewers.
Some material sourced from:
Mind on Statistics
Utts/Heckard, 3rd Edition, Duxbury, 2006
Text Only: ISBN 0495667161
Bundled version: ISBN 1111978301
Material from this publication used with permission.
Attribution Key
for more information see: http://open.umich.edu/wiki/AttributionPolicy
Use + Share + Adapt
{ Content the copyright holder, author, or law permits you to use, share and adapt. }
Public Domain – Government: Works that are produced by the U.S. Government. (17 USC §
105)
Public Domain – Expired: Works that are no longer protected due to an expired copyright term.
Public Domain – Self Dedicated: Works that a copyright holder has dedicated to the public domain.
Creative Commons – Zero Waiver
Creative Commons – Attribution License
Creative Commons – Attribution Share Alike License
Creative Commons – Attribution Noncommercial License
Creative Commons – Attribution Noncommercial Share Alike License
GNU – Free Documentation License
Make Your Own Assessment
{ Content Open.Michigan believes can be used, shared, and adapted because it is ineligible for copyright. }
Public Domain – Ineligible: Works that are ineligible for copyright protection in the U.S. (17 USC § 102(b)) *laws in
your jurisdiction may differ
{ Content Open.Michigan has used under a Fair Use determination. }
Fair Use: Use of works that is determined to be Fair consistent with the U.S. Copyright Act. (17 USC § 107) *laws in your
jurisdiction may differ
Our determination DOES NOT mean that all uses of this 3rd-party content are Fair Uses and we DO NOT guarantee that
your use of the content is Fair.
To use this content you should do your own independent analysis to determine whether or not your use will be Fair.
Stat 250 Gunderson Lecture Notes
Learning about a Population Mean
Part 1: Distribution for a Sample Mean
Chapter 9: Section 5, SD Module 3 and Sections 9 and 10
Recall Parameters, Statistics and Sampling Distributions
We go back to the scenario where we have one population of interest but now the response
being measured is quantitative (not categorical). We want to learn about the value of the
population mean  We take a random sample and use the sample statistic, the sample mean
X , to estimate the parameter. When we do this, the sample mean may not be equal to the
population mean, in fact, it could change every time we take a new random sample.
So recall that a statistic is a random variable and it will have a probability distribution. This
probability distribution is called the sampling distribution of the statistic.
We turn to understanding the sampling distribution of the sample mean which will be used to
construct a confidence interval estimate for the population mean and to test hypotheses about
the value of a population mean.
9.5 SD Module 3: Sampling Distribution for One Sample Mean
Many responses of interest are measurements – height, weight, distance, reaction time, scores.
We want to learn about a population mean and we will do so using the information provided
from a sample from the population.
Example: How many hours per week do you work?
A poll was conducted by The Heldrich Center for Workforce Development (at Rutgers
University). A probability sample of 1000 workers resulted in a mean number of hours worked
per week of 43.
Population =
Parameter =
Sample =
Statistic =
Can anyone say how close this observed sample mean x of 43 is to the population mean ?
If we were to take another random sample of the same size, would we get the same value for
the sample mean?
107
So what are the possible values for the sample mean x if we took many random samples of the
same size from this population? What would the distribution of the possible x values look like?
What can we say about the distribution of the sample mean?
Distribution of the Sample Mean – Main Results
Let  = mean for the population of interest.
Let  = standard deviation for the population of interest.
Let x = the sample mean for a random sample of size n.
If all possible random samples of the same size n are taken and x is computed for each, then …
 The average of all of the possible sample mean values (from all possible random samples of
the same size n) is equal to ___________________________.
Thus the sample mean is an __________________ estimator of the population mean.

The standard deviation of all of the possible sample mean values is equal to the original
population standard deviation divided by n .
Standard deviation of the sample mean is given by: s.d.( x ) = 
n
What about the shape of the sampling distribution? The first two bullets above provide what
the mean and the standard deviation are for the possible sample mean values. The final two
bullets tell us that the shape of the distribution will be (approximately) normal.

If the parent (original) population has a normal distribution,
then the distribution of the possible values of x , the sample mean, is normal.
Take a random sample of ANY size n
and compute the sample mean.
Original Population
Sample Mean Population

If the parent (original) population is not necessarily normally distributed
but the sample size n is large, then the distribution of the possible values of x ,
the sample mean is approximately normal.
Take a random sample of size n where n is LARGE,
and compute the sample mean.
Original Population
108
Sample Mean Population
This last result is called the
.
The CLT is formally presented in Section 9.10 of your text (starting on page 345). The C in CLT is
for CENTRAL. The CLT is an important or central result in statistics. As it turns out, all normal
curve approximations for the big five statistics are really applications of the CLT and can be
derived using the results from Section 8.8.
The Stat 250 formula card summarizes distribution of a sample mean as follows:
Sample Means
Mean: EX    X  
Standard Deviation: s.d .( X )   X 

n
Sampling Distribution of X
  
.
If X has the N (  ,  ) distribution, then X is N   ,

n


If X follows any distribution with mean  and standard deviation  and n is large,

 
.
then X is approximately N   ,

n

This last result is the Central Limit Theorem
From Utts, Jessica M. and Robert F. Heckard. Mind on Statistics, Fourth Edition. 2012.
Used with permission.
Try It! ACT scores
Suppose that ACT scores for students follow a normal model with mean of 18.6 and standard
deviation of 5.9.
a. What is the probability that a single student randomly selected will score 21 or higher?
b. Suppose two students will be randomly selected. What is the probability they will both
score 21 or higher?
c. A random sample of 50 students is obtained. What is the probability that the mean score
for these 50 students will be 21 or higher?
109
d. What if the normal distribution assumption were not given? Would your answer to
part c change? Explain.
Try It! Actual Flight Times
Suppose the random variable X represents the actual flight time (in minutes) for Delta Airlines
flights from Cincinnati to Tampa follows a uniform distribution over the range of 110 minutes to
130 minutes.
a. Sketch the distribution for X (include axes labels and some values on the axes).
b. Suppose we were to repeatedly take a random sample of size 100 from this distribution and
compute the sample mean for each sample. What would the histogram of the sample mean
values look like? Provide a smoothed out sketch about the distribution of the sample mean,
include all details that you can.
Try It! True or False
Determine whether each of the following statements is True or False. A true statement is
always true. Clearly circle your answer.
a. The central limit theorem is important in statistics because
for a large random sample, it says the sampling distribution of
the sample mean is approximately normal.
True
False
b. The sampling distribution of a parameter is the distribution
of the parameter value if repeated random samples are obtained.
True
False
110
More on the Standard Deviation of X
The standard deviation of the sample mean is given by: s.d.( x ) = 
n
This quantity would give us an idea about how far apart a sample mean x and the true
population mean  are expected to be on average.
We can interpret the standard deviation of the sample mean as approximately the average
distance of the possible sample mean values (for repeated samples of the same size n) from the
true population mean 
From Utts, Jessica M. and Robert F. Heckard. Mind on Statistics, Fourth Edition. 2012.
Used with permission.
Note:
If the sample size increases, the standard deviation decreases, which says the possible sample
mean values will be closer to the true population mean (on average).
The s.d.( x ) is a measure of the accuracy of the process of using a sample mean to estimate the
population mean. This quantity  does not tell us exactly how far away a particular observed
n
x value is from 
In practice, the population standard deviation  is rarely known, so the sample standard
deviation s is used. As with proportions, when making this substitution we call the result the
standard error of the mean s.e.( x ) =
s
. This terminology makes sense, because this is a
n
measure of how much, on average, the sample mean is in error as an estimate of the
population mean.
Standard error of the sample mean is given by:
s.e.( x ) =
s
n
This quantity is an estimate of the standard deviation of x .
So we can interpret the standard error of the sample mean as estimating, approximately, the
average distance of the possible x values (for repeated samples of the same size n) from the
true population mean 
From Utts, Jessica M. and Robert F. Heckard. Mind on Statistics, Fourth Edition. 2012.
Used with permission.
Moreover, we can use this standard error to create a range of values that we are very confident
will contain the true population mean  namely, x (few)s.e.( x ). This is the basis for
confidence interval for the population mean  discussed more in the Chapter 11.
111
9.9 Preparing for Statistical Inference: Standardized Statistics
In our ACT Scores and Actual Flight Times examples earlier, we have already constructed and
used a standardized z-statistic for a sample mean.
x
z = 
has (approximately) a standard normal distribution N(0,1).
n
Dilemma = _____________________________________
If we replace the population standard deviation  with the sample standard deviation s, then
x
won’t be approximately N(0,1); instead it has a ________________________________
s
n
Student’s t-Distribution
A little about the family of t-distributions ...
 They are symmetric, unimodal, centered at 0.
 They are flatter with heavier tails compared to the N(0,1) distribution.
 As the degrees of freedom (df) increases ... the t distribution approaches the N(0,1)
distribution.
 We can still use the ideas about standard scores for a frame of reference.
 Tables A.2 and A.3 summarize percentiles for various t-distributions
From Utts, Jessica M. and Robert F. Heckard. Mind on Statistics, Fourth Edition. 2012.
Used with permission.
112
We will see more on t-distributions in Chapters 11 and 13 when we do inference about
population mean(s).
9.10 Generalizations beyond the Big Five
The sampling distribution of a statistic is the distribution of possible values of the statistic for
repeated samples of the same size from a population.
So far we have discussed the sampling distribution of a sample proportion (SD Module 1), the
sampling distribution of the difference between two sample proportions (SD Module 2), and
the sampling distribution of the sample mean (SD Module 3). In all cases, under specified
conditions the sampling distribution was approximately normal.
Every statistic has a sampling distribution, but the appropriate distribution may not always be
normal, or even be approximately bell-shaped.
You can construct an approximate sampling distribution for any statistic by actually taking
repeated samples of the same size from a population and constructing a histogram for the
values of the statistic over the many samples.
The web applet examined in your computer labs could simulate the sampling distribution of the
median, the range, and many other statistics. Example 9.19 on pages 347-348 of your text also
shows a good example.
113
Additional Notes
A place to … jot down questions you may have and ask during office hours, take a few extra notes, write
out an extra practice problem or summary completed in lecture, create your own short summary about
this chapter.
114
Download