03 Sampling - John Uebersax Home Page

advertisement
Statistics 251 – Dr. Uebersax
03 Sampling
1
Student Research Project on Statistics courses:
http://goo.gl/UeoRrU
1. Populations and Samples
Today our general topic will be how to construct a sample of a population. We'll consider some
good methods, and some not-so-good methods.

Sampling gets at the essence of inferential statistics: to draw accurate generalizations
about a population using a smaller sample.

Population characteristics are called parameters. We symbolize these as Greek letters:
μ, σ, π

Sample characteristics are called sample statistics. We symbolize these as English
letters.
X , s, p
2. Representative vs. Biased Samples
We want our samples to be representative of (i.e., accurately correspond to) populations.
Two things limit the representativeness of samples:

Sampling variation (random nonrepresentiveness); this is pure random error; it
can produce under- or over-estimation, depending on the sample.

Bias (systematic nonrepresentiveness); bias is a consistent error, i.e., produces
samples that tend to have too much or too little of some characteristics.
Sampling variation is reduced by having larger samples.
Bias is reduced by using random sampling methods.
Statistics 251 – Dr. Uebersax
03 Sampling
2
3. Sample Size




Small samples are inherently variable (Example: conducting an opinion survey using
only 5 people).
The larger the sample size, the better our inferences about the population from which it
comes.
But larger samples are more expensive.
So we need to find a happy medium – not too large, not too small.
4. Random Sampling
We minimize sampling bias by using random sampling methods.
With simple random sampling (SRS), each object in the population has the same chance of
being selected. Example: place everybody's name in a hat and select n names (Iottery
method).
With stratified random sampling, we divide the population into subgroups (called strata), and
randomly sample within each subgroup.
Example: We want a random sample of 40 CalPoly students, but also want each class (Fr, So,
Jr, Sr) to be equally represented. Therefore we draw separate samples of n = 10 from each of
the four classes separately: 10 random Fr, 10 random So, 10 random Jr, 10 random Sr.
If we just selected any 40 students at random, we might get, say, only a few Freshmen
With weighted random samples, we assign cases in some groups a higher probability of
being sampled. Example: suppose we are equally interested in men's and women's opinions,
but women are under-represented by 50% in a population; we could then use lottery-style
random sampling, but place each woman's name 'in the hat' twice.
Systematic samples follow some fixed rule – for example, taking every 10th name from a list.
This method is still random, provided we begin sampling from a random location. However, if
there are patterns or periodicities in the original list, the sample might be biased.
5. Nonrandom Sampling
Nonrandom samples are almost always biased and should be avoided if possible. Sometimes,
however, they are a practical alternative to random samplings.
Voluntary sampling. Ask people to volunteer.
Convenience sampling. Use as sample whoever is convenient (e.g., students on the mall).
Statistics 251 – Dr. Uebersax
03 Sampling
6. Using Excel to Generate a Simple Random Sample
1. Open Excel
2. File > Options > Formulas
3. In "Calculation Options" field:
Check:
Uncheck:
Click: OK
Manual
Recalculate Workbook before saving
4. Enter this formula in a cell: =RANDBETWEEN(1, N), where N is your population size.
5. Copy this formula to n * 1.5 (or more) cells (where n is the sample size) below the first one
(this includes some extra random numbers.
6. Press F9 to calculate.
7. Copy values to a different column:
Highlight target range
Ctrl-C
Position cursor in first cell of destination range
Ctrl-V
When clipboard icon appears, Choose Paste Values; otherwise, on the Home tab, in the
Clipboard group, click Paste icon, then Paste Values).
3
Statistics 251 – Dr. Uebersax
03 Sampling
8. Remove duplicate values:
Highlight column that contains random number values.
Data > Remove Duplicates
Homework:
Given a population with N = 20, numbered from 1 to 20:
Use Excel to generate a random sample with n = 10.
Place results in two adjacent columns of Excel and print and bring to class to turn in.
Remember to indicate lecture number (03) on homework.
4
Download