Chapter 18 – Sampling Distribution Models

How accurate is our sample?
- Different polls sometimes show different results for the same question.
- Since each poll samples a different group of people, we should expect some variation in the results.
- We could try drawing lots of samples and looking at the variation among those samples.

Experiment: Simulating a sample
- A recent US Census Bureau study (source) reports that about 30% of Americans 25 or older have a Bachelor's degree.
- Open a blank Minitab worksheet and generate some random data:
  Calc > Random Data > Bernoulli
  Enter 200 rows
  Store in columns C1-C20
  Event probability: 0.3

Proportion estimates for samples of size 5
- We can treat each row as a sample. Each value is 1 (a success) or 0 (a failure), so the mean of a row is the proportion of successes in that sample.
- Samples of size 5:
  Calc > Row Statistics > Mean
  Input variables: C1-C5
  Store result in: C21
- Look at these sample proportions. Are they close to the population proportion of 30%?
- Draw a histogram of the sample proportions in C21.

Proportion estimates for samples of size 10
- Samples of size 10:
  Calc > Row Statistics > Mean
  Input variables: C1-C10
  Store result in: C22
- Look at these sample proportions. Are they close to the population proportion of 30%?
- Draw a histogram of the sample proportions in C22.

Proportion estimates for samples of size 20
- Samples of size 20:
  Calc > Row Statistics > Mean
  Input variables: C1-C20
  Store result in: C23
- Look at these sample proportions. Are they close to the population proportion of 30%?
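For readers without Minitab, the three sampling steps above can be mirrored in a short Python sketch. It assumes only the standard library; the worksheet variable, the helper `row_proportions`, and the seed are illustrative choices, not part of the Minitab exercise. Each row of 20 Bernoulli(0.3) values plays the role of one worksheet row, and the mean of its first n values is that row's sample proportion.

```python
import random
import statistics

# Hypothetical stand-in for the Minitab worksheet: 200 rows (samples) x 20 columns,
# each cell Bernoulli(0.3): 1 = has a Bachelor's degree, 0 = does not.
random.seed(18)  # arbitrary seed, only so the run is reproducible
P = 0.3
ROWS, COLS = 200, 20
worksheet = [[1 if random.random() < P else 0 for _ in range(COLS)] for _ in range(ROWS)]


def row_proportions(rows, n):
    """Mean of the first n columns of each row (like Row Statistics > Mean on C1-Cn)."""
    return [sum(row[:n]) / n for row in rows]


results = {}
for n in (5, 10, 20):
    props = row_proportions(worksheet, n)
    results[n] = props
    print(f"n = {n:2d}: mean of sample proportions = {statistics.mean(props):.3f}, "
          f"SD = {statistics.stdev(props):.3f}")
```

Each list in `results` corresponds to one of the columns C21, C22, and C23 and can be plotted as a histogram; the printed SDs should shrink as n grows, while the means stay near 0.3.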
- Draw a histogram of the sample proportions in C23.

Sampling Distribution Model for a Proportion
- As the sample size grew, our histograms of the sample proportions started to look like a Normal model.
- The larger the sample size, the better the Normal model works.
- Assumptions:
  - Independence: sampled values must be independent of each other.
  - Sample size: n must be large enough.

Conditions to check for assumptions
- Randomization Condition:
  - Experiments should have treatments randomly assigned.
  - Survey samples should be a simple random sample or, if not, some other representative, unbiased sample.
- 10% Condition:
  - The sample size n must be no more than 10% of the population.
- Success/Failure Condition:
  - The sample must be large enough that we expect at least 10 successes and 10 failures, that is, $np \ge 10$ and $nq \ge 10$.

Sampling Distribution Model for a Proportion
- If the sampled values are independent and the sample size is large enough, the sampling distribution of $\hat{p}$ is modeled by a Normal model with
  $\mu(\hat{p}) = p$ and $SD(\hat{p}) = \sqrt{\dfrac{pq}{n}}$, where $q = 1 - p$.

Example: Proportion of Vegetarians
- 7% of the US population is estimated to be vegetarian. If a random sample of 200 people resulted in 20 people reporting themselves as vegetarians, is this an unusually high proportion?
- Conditions:
  - Randomization
  - 10% Condition
  - Success/Failure

Vegetarians Example continued
- Since our conditions were met, it's OK to use a Normal model.
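The two numeric conditions for the vegetarian example can be checked directly: $np = 200(0.07) = 14$ and $nq = 200(0.93) = 186$ are both at least 10, and a sample of 200 is no more than 10% of any population of at least 2,000 people, which the US population easily exceeds. A minimal sketch of those checks follows; the helper names are hypothetical, not a Minitab or library feature, and the Randomization Condition is left out because it depends on how the sample was collected, not on any number.

```python
def success_failure_ok(n: int, p: float, minimum: int = 10) -> bool:
    """Success/Failure Condition: expect at least `minimum` successes (np) and failures (nq)."""
    return n * p >= minimum and n * (1 - p) >= minimum


def min_population_for_ten_percent(n: int) -> int:
    """10% Condition: n <= 0.10 * N, so the population size N must be at least 10 * n."""
    return 10 * n


# Vegetarian example: n = 200 people, p = 0.07.
n, p = 200, 0.07
print(f"np = {n * p:.0f}, nq = {n * (1 - p):.0f}")
print("Success/Failure Condition met:", success_failure_ok(n, p))
print("10% Condition needs a population of at least", min_population_for_ten_percent(n), "people")
# Randomization cannot be checked by code; it depends on how the 200 people were chosen.
```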
- $\hat{p} = 20/200 = 0.10$
- $E(\hat{p}) = p = 0.07$
- $SD(\hat{p}) = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{(0.07)(0.93)}{200}} \approx 0.018$
- $z = \dfrac{0.10 - 0.07}{0.018} \approx 1.67$
- This result is within 2 SDs of the mean, so it is not unusually high.

68-95-99.7 Rule with Vegetarians
[Figure: Normal model centered at p, axis marked from -3σ to 3σ; about 68% of sample proportions fall within 1σ of p, 95% within 2σ, and 99.7% within 3σ.]

Sampling Distribution of a Mean
- Rolling dice simulation: 10,000 individual rolls of one die recorded.
(Figure from DeVeaux, Intro to Stats)

Sampling Distribution of a Mean
- Roll 2 dice 10,000 times and average the dice.
(Figure from DeVeaux, Intro to Stats)

Sampling Distribution of a Mean
- Roll 3 dice 10,000 times and average the dice.
(Figure from DeVeaux, Intro to Stats)

Sampling Distribution of a Mean
- Roll 5 dice 10,000 times and average the dice.
(Figure from DeVeaux, Intro to Stats)

Sampling Distribution of a Mean
- Roll 20 dice 10,000 times and average the dice.
- Once again, as the sample size increases, a Normal model appears.
(Figure from DeVeaux, Intro to Stats)

Central Limit Theorem
- The sampling distribution of any mean becomes more nearly Normal as the sample size grows.
- The larger the sample, the better the approximation will be.
- Observations need to be independent and collected with randomization.

CLT Assumptions
- Assumptions:
  - Independence: sampled values must be independent.
  - Sample size: the sample size must be large enough.
- Conditions:
  - Randomization
  - 10% Condition
  - Large enough sample

Which Normal Model to use?
- A Normal model is determined by its mean and standard deviation.

Sampling Distribution Model for a Mean
- When a random sample is drawn from any population with mean $\mu$ and standard deviation $\sigma$, its sample mean $\bar{y}$ has a sampling distribution with:
  - Mean: $\mu$
  - Standard deviation: $\dfrac{\sigma}{\sqrt{n}}$

Example: CEO compensation
- 800 CEOs:
  - Mean (in thousands) = 10,307.31
  - SD (in thousands) = 17,964.62
- Samples of size 50 were drawn; their sample means had:
  - Mean = 10,343.93
  - SD = 2,483.84
- Samples of size 100 were drawn; their sample means had:
  - Mean = 10,329.94
  - SD = 1,779.18
- According to the CLT, what should the theoretical mean and SD be?
(Example from DeVeaux, Intro to Stats)
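The CLT question can be answered with the model for a mean: the sampling distribution of $\bar{y}$ has mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. The sketch below simply applies that formula to the population values from the CEO example; the helper name `clt_model` is hypothetical, and the observed values are the simulated results quoted in the example.

```python
import math

# Population values from the CEO example (in thousands of dollars).
POP_MEAN = 10_307.31
POP_SD = 17_964.62
N_CEOS = 800

# Mean and SD of the simulated sample means, as reported in the example.
observed = {50: (10_343.93, 2_483.84), 100: (10_329.94, 1_779.18)}


def clt_model(mu: float, sigma: float, n: int) -> tuple[float, float]:
    """Mean and SD of the sampling distribution of y-bar: mu and sigma / sqrt(n)."""
    return mu, sigma / math.sqrt(n)


theory = {}
for n, (obs_mean, obs_sd) in observed.items():
    mean, sd = clt_model(POP_MEAN, POP_SD, n)
    theory[n] = (mean, sd)
    print(f"n = {n}: theoretical mean = {mean:,.2f}, SD = {sd:,.2f}; "
          f"simulated mean = {obs_mean:,.2f}, SD = {obs_sd:,.2f}; n/N = {n / N_CEOS:.1%}")
```

The theoretical mean is 10,307.31 for both sample sizes, and the theoretical SDs are about 2,540.58 (n = 50) and 1,796.46 (n = 100), close to the simulated 2,483.84 and 1,779.18. Strictly speaking, a sample of 100 is 12.5% of the 800 CEOs, so that case does not meet the 10% Condition listed under the CLT conditions.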

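The dice slides in the Sampling Distribution of a Mean sequence (1, 2, 3, 5, and 20 dice, 10,000 trials each) can be approximated with a short standard-library simulation. This is a sketch, not the code behind the DeVeaux figures; the seed and the helper `average_of_dice` are illustrative assumptions. Instead of drawing histograms it prints the mean and SD of the averages for each number of dice.

```python
import random
import statistics

random.seed(2024)  # arbitrary seed, only so the run is reproducible
TRIALS = 10_000


def average_of_dice(k: int) -> float:
    """Roll k fair six-sided dice and return their average."""
    return sum(random.randint(1, 6) for _ in range(k)) / k


summary = {}
for k in (1, 2, 3, 5, 20):
    averages = [average_of_dice(k) for _ in range(TRIALS)]
    summary[k] = (statistics.mean(averages), statistics.stdev(averages))
    print(f"{k:2d} dice: mean of averages = {summary[k][0]:.3f}, SD = {summary[k][1]:.3f}")
```

A single fair die has mean 3.5 and SD $\sqrt{35/12} \approx 1.71$, so the $\sigma/\sqrt{n}$ rule predicts an SD of about 0.38 for averages of 20 dice; the simulated SDs should fall toward that value as the number of dice grows.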