Jan 18

Outline for Class Meeting 1 (Chapter 1, Lohr, 1/12/04)
I. The topic of sampling
A. What is it about?
B. What is different from your math stat course?
- Not iid
- Parameters are not defined as characteristics of r.v.’s
- More worry about realities (non-response, measurement error)
II. Sampling concepts
A. Observation unit or element
B. Target population (or universe)
C. Sample
D. Sampled population
E. Sampling unit
F. Sampling frame
III. What makes a sample design good?
A. Gives every member of the population a known chance of selection (known
as a probability design)
B. Is efficient
Example
It is desired to estimate the amount of time spent by an RN on non-RN duties on her
eight-hour shift in a hospital emergency room. Budget will allow data collection for only
two of the eight hours. It is decided to sample in the following way. First, select one
hour from the eight, giving every hour an equal chance of being chosen. Then select a
second hour, giving every remaining hour an equal chance of being chosen.
(a) What would be a good estimator for the total time spent on non-RN duties?
(b) Now suppose that, rather than giving every remaining hour an equal chance of being
chosen, only hours not adjacent to the first can be selected as the second sample hour.
How would you now estimate the total time spent on non-RN duties?
(c) Which of these designs is better?
IV. Sampling and non-sampling errors
A. Sampling error is the difference between a parameter value and its estimate
that is due solely to sample-to-sample variability (e.g., margin of error).
B. Non-sampling error is the difference between a parameter value and its
estimate that cannot be attributed to sample-to-sample variability (e.g.,
selection bias, measurement error).
C. Both sources of error are studied by statisticians, although the latter is
harder to measure.
V. Notation
A. The index set for a population or universe will be denoted by $U = \{1, 2,
\ldots, N\}$. The sample will be denoted by $S$, with $S \subseteq U$. The observed
(fixed) value of the characteristic of interest for unit $i$ is denoted by $y_i$.
B. When S is chosen from U using a probabilistic mechanism, we can
calculate the probability that the sample S is chosen from the population,
and we denote it by Pr[S].
C. The characteristic of interest in the population is usually $t = \sum_{i \in U} y_i$,
$\bar{y}_U = t / N$, or $p$ = proportion of universe having some attribute.
D. The first-order selection or inclusion probability is $\pi_i = \Pr[\text{unit } i \in S]$.
The second-order inclusion probability is $\pi_{ij} = \Pr[\text{units } i \text{ and } j \in S]$.
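The notation can be made concrete with the emergency-room example. The sketch below enumerates every possible two-hour sample, accumulates Pr[S] for each, and derives first- and second-order inclusion probabilities from those sample probabilities. It is only an illustration: the helper names (`sample_probs`, `inclusion_probs`) are invented here, and it assumes that hours are labeled 1 to 8 and that hours 1 and 8 do not count as adjacent.

```python
from fractions import Fraction
from itertools import combinations

N = 8  # hours in the shift, labeled 1..8 (the universe U)

def sample_probs(eligible):
    """Return {S: Pr[S]} for a two-draw design.

    The first hour is drawn uniformly from 1..N; the second is drawn
    uniformly from the hours h for which eligible(first, h) is true.
    """
    probs = {}
    for first in range(1, N + 1):
        choices = [h for h in range(1, N + 1) if eligible(first, h)]
        for second in choices:
            S = frozenset((first, second))  # order of selection is ignored
            p = Fraction(1, N) * Fraction(1, len(choices))
            probs[S] = probs.get(S, Fraction(0)) + p
    return probs

def inclusion_probs(probs):
    """First-order (pi_i) and second-order (pi_ij) inclusion probabilities."""
    units = range(1, N + 1)
    pi = {i: sum(p for S, p in probs.items() if i in S) for i in units}
    pi2 = {(i, j): probs.get(frozenset((i, j)), Fraction(0))
           for i, j in combinations(units, 2)}
    return pi, pi2

# Design (a): any remaining hour may be drawn second.
srs = sample_probs(lambda first, h: h != first)
# Design (b): only hours not adjacent to the first (assumption: 1 and 8 are not adjacent).
nonadj = sample_probs(lambda first, h: abs(h - first) > 1)

pi_a, pi2_a = inclusion_probs(srs)
pi_b, pi2_b = inclusion_probs(nonadj)

if __name__ == "__main__":
    for i in range(1, N + 1):
        print(i, pi_a[i], pi_b[i])
```

Exact fractions are used so that checks such as "the Pr[S] sum to 1" and "the first-order inclusion probabilities sum to the sample size" hold without rounding error.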
VI. Review
A. Let $\theta$ denote any characteristic of a population (such as $t$, $\bar{y}_U$, or $p$), and
let $\hat{\theta}_S$ (or $\hat{\theta}$ for short) denote any estimator of $\theta$ calculated from $S$. The
distribution of $\hat{\theta}$ over all possible samples is called the sampling
distribution of $\hat{\theta}$.
B. The expected value of $\hat{\theta}$ is $E(\hat{\theta}) = \sum_{\text{all } S} \Pr[S]\,\hat{\theta}_S$.
C. The sampling bias of $\hat{\theta}$ is $\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$. Unbiased or
asymptotically unbiased estimators are preferred.
D. The variance of $\hat{\theta}$ is $V(\hat{\theta}) = E[(\hat{\theta} - E(\hat{\theta}))^2] = \sum_{\text{all } S} \Pr[S]\,(\hat{\theta}_S - E(\hat{\theta}))^2$.
This variance is used to compare unbiased estimators.
E. The mean square error is a measure of accuracy and precision of an
estimator. It is $\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = V(\hat{\theta}) + [\mathrm{Bias}(\hat{\theta})]^2$. It can be used
to compare estimators that are biased.
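These definitions can be checked by brute force on a tiny population. The sketch below is an illustration only: the four y-values are made up, the design is assumed to be simple random sampling of n = 2 units without replacement, and the two estimators of the total t (an expansion estimator and a deliberately biased one built on the sample maximum) are chosen only to show the bias and MSE calculations.

```python
from fractions import Fraction
from itertools import combinations

y = {1: 3, 2: 1, 3: 4, 4: 2}   # hypothetical y_i values, so N = 4
N, n = len(y), 2
t = sum(y.values())             # population total, playing the role of theta

# Assumed design: every sample S of size n is equally likely.
samples = list(combinations(y, n))
pr = {S: Fraction(1, len(samples)) for S in samples}

def summarize(estimator):
    """Return (E, Bias, V, MSE) of an estimator over all possible samples."""
    est = {S: estimator(S) for S in samples}
    expected = sum(pr[S] * est[S] for S in samples)
    bias = expected - t
    variance = sum(pr[S] * (est[S] - expected) ** 2 for S in samples)
    mse = sum(pr[S] * (est[S] - t) ** 2 for S in samples)
    return expected, bias, variance, mse

def expansion(S):
    """N times the sample mean."""
    return Fraction(N, n) * sum(y[i] for i in S)

def max_based(S):
    """N times the sample maximum (deliberately biased, for contrast)."""
    return N * max(y[i] for i in S)

if __name__ == "__main__":
    for name, estimator in [("expansion", expansion), ("max-based", max_based)]:
        expected, bias, variance, mse = summarize(estimator)
        # MSE computed directly should equal variance plus squared bias.
        assert mse == variance + bias ** 2
        print(name, expected, bias, variance, mse)
```

Because the sums run over all possible samples with exact fractions, the identity MSE = V + Bias^2 holds exactly rather than approximately, which is not the case for a Monte Carlo simulation.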
Example (cont'd)
What are the inclusion probabilities for the design above?
Do you think your suggested estimators are unbiased?