Stat 411 Lecture Notes on Non-Sampling Errors
Your book has a short section on non-sampling errors. I will be going into much more
detail about some of the topics, and skipping others – but regardless, I strongly suggest
you read sections 3.3 – 3.6 in the book. Pay special attention to section 3.6, which gives
a checklist for the planning stages, and will be particularly useful when writing your final
paper.
-----------------------------------------------------------------------------------------------------------
Non-sampling errors can be generally defined as any source of bias or error in the
estimation of a population characteristic in which the uncertainty about the resulting
estimate is NOT due to the fact that we’re sampling. You can think of them as errors for
which increasing the sample size will not aid us in our estimation.
There are two main types of non-sampling errors that we’ll talk about:
Non-Response Errors – not all selected elements yield their information, which
usually means that the population of interest is not the population from which the sample
is drawn
Measurement Errors – measurements taken on selected elements are wrong,
known with error, or not accurate enough
----------------------------------------------------------------------------------------------------------
Let’s consider non-response first. This is a problem usually associated with surveys or
interviews – any situation in which the human element is involved. People can and will
refuse information for a wide variety of reasons – they could be busy, uninterested,
suspicious of the surveyor’s intentions, afraid they won’t be anonymous, or simply
uncooperative. The problem with non-response is that it changes our sampling frame – if
some elements will not give us their information, then effectively we are sampling from
the population of potential responders, not the population of interest. For example, let:
N = total population size, and μ = population mean
N1 = total potential responders, and μ1 = population mean of responders
N2 = total potential non-responders, and μ2 = population mean of non-responders
Suppose we conduct an SRS from this population, with estimation via the usual sample
mean (which is unbiased under SRS when all folks respond). Is the sample mean
unbiased when there is non-response? No, because all of our data is drawn from the
population of responders, and thus what we are really estimating is μ1, not μ. The bias in
this case can be shown to be (N2 / N)*(μ1 - μ2).
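The bias expression follows in a few lines. This is a sketch that writes μ1 and μ2 for the population means of responders and non-responders, and assumes the people we actually hear from behave like an SRS from the responder group. The symbol \bar{y}_1 (the sample mean of the respondents) is notation introduced here for illustration.

```latex
\begin{align*}
\mu &= \frac{N_1 \mu_1 + N_2 \mu_2}{N}, \qquad N = N_1 + N_2 \\
E[\bar{y}_1] &= \mu_1 \\
\text{Bias} &= E[\bar{y}_1] - \mu
  = \mu_1 - \frac{N_1 \mu_1 + N_2 \mu_2}{N}
  = \frac{(N - N_1)\,\mu_1 - N_2 \mu_2}{N}
  = \frac{N_2}{N}\,(\mu_1 - \mu_2)
\end{align*}
```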
You can think of this situation as a stratified sample where the population is broken into
two strata, and we only have data from one stratum. Remember that the simple estimator
used on data from a stratified sample is biased for μ; the same thing applies here. Some
of you might be wondering: can’t we think of this as a two-stage sample where we
choose m=1 of the M=2 strata, then take an SRS within that group? Not quite. The
reason is that we are not randomly choosing the group that we take in the first stage;
we are forced to take the group of responders. IF there were an equal chance of getting
either group, THEN we could use a two-stage estimator.
Notice that if μ1 = μ2, in other words, if the populations of responders and non-responders
are the same, then μ1 = μ, and we’re out of the woods: we can do everything
in the same manner as we have all along. Evaluating whether or not the responders and
non-responders are the same involves making an assumption, and that assumption is
more or less reasonable depending on each specific situation.
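The bias formula can be checked numerically on a toy population. This is a minimal sketch: the population values below are invented for illustration, and `nonresponse_bias` is a hypothetical helper name, not something from the book.

```python
from statistics import mean

# Hypothetical population values, invented for illustration only.
responders = [10, 12, 14, 16]   # N1 = 4 potential responders
non_responders = [20, 24]       # N2 = 2 potential non-responders


def nonresponse_bias(resp, nonresp):
    """(N2 / N) * (mean of responders - mean of non-responders)."""
    n1, n2 = len(resp), len(nonresp)
    return (n2 / (n1 + n2)) * (mean(resp) - mean(nonresp))


population = responders + non_responders
direct = mean(responders) - mean(population)           # responder mean minus true mean
formula = nonresponse_bias(responders, non_responders)
print(direct, formula)

# When the two groups have the same mean, the bias vanishes.
same_mean = nonresponse_bias([10, 12, 14, 16], [13, 13])
print(same_mean)
```

The two printed quantities on the first line should agree, and the last one should be zero, which is the "out of the woods" case.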
So what if we can’t reasonably assume that the groups of responders and non-responders
are similar, or if we prefer not to let our analysis ride on a subjective assessment? There
are some alternatives.
The most obvious (but practically speaking, usually the hardest and/or most expensive)
method of reducing non-response bias is to convert non-responders into responders.
Recall the equation for non-response bias: (N2 / N)*(μ1 - μ2), where μ1 and μ2 are the
population means of responders and non-responders. One way to reduce the absolute value
of this quantity is to reduce N2/N, i.e., reduce the proportion of non-responders in the
population. The ways to do this are numerous. Here is a medium-sized list, with short
discussions of pros and cons. Some are specific, some are general, some are practical
and some are psychological. They appear in no particular order.
Ways to Convert Non-Responders Into Responders
1. If you are conducting a telephone or face-to-face interview, make sure you
call/visit at times when the person to be interviewed is likely to be home. For the
average working Joe, this means sometime in the evening after 6pm. But don’t
call too late either, or you may incur non-response because of a sleepy and
annoyed individual. Sometime between 6 and 8 is best.
2. If you intend to send a mail survey, confirm that the people you wish to survey
still live at the address you have on file - registries of this sort become obsolete
quickly (20% of American families move each year). If a particular individual
does not respond, you may want to send a representative to the address to find out
if they are there, or perhaps to find out to where they have moved. If you want to
sample whoever is currently living in the address you’ve selected, label the
envelope, for example, “Mr. and Mrs. Smith or current resident.”
3. For mailed surveys in particular, studies have shown that using attractive, high
quality, official-looking envelopes and letterhead can improve response
significantly. Include a carefully typed cover letter explaining your intentions,
and guaranteeing their confidentiality. Get a big-wig from your company or
organization to sign it (personally, if possible). Always send materials through
first-class mail, and include a return envelope with first-class postage.
4. Keep surveys and interviews as short as possible. As a general rule, the more
questions you ask, the less likely you are to get accurate (or any) information.
5. Use the guilt angle whenever possible (but do it implicitly, don’t beg). What I
mean by this is simply to increase the amount and quality of personal contact with
your population. Psychologically speaking, for most people it’s easy to throw
away a mailed survey, considerably harder to hang up on an interviewer, and
harder yet to walk away. Therefore, choose a face-to-face interview over a phone
interview, and choose a phone interview over a mailed survey, whenever it is
practical to do so.
6. Publicizing or advertising your survey often helps with non-response. This lets
people know they’re not the only ones being surveyed and helps with credibility.
Use endorsements by celebrities, important individuals, or respected institutions if
you are able.
7. Offer an incentive. Money is by far the best, because it has the most universal
appeal. Be careful when using other incentives, because you do not want to elicit
responses from some specific subgroup of the population who happens to want or
like what you’re offering. Whether to offer the incentive up-front or upon return
of the survey is basically a toss-up in terms of effectiveness, but the former will
be considerably more expensive.
In addition to the above, there is one more method that requires a bit more attention,
called ‘double sampling.’ At the core, it is really just a two-stage sample. In the first
stage, try to elicit responses through a cheap and easy method, such as a mailed survey.
In the second stage, go after a random sample of the non-responders from stage 1 with
the big guns – telephone or face-to-face interviewing. This is a fairly well studied
method, with suggested estimators and such, but I’ll go through the details in class.
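As a preview, here is a minimal sketch of one standard estimator for this design (usually credited to Hansen and Hurwitz); it may not be the exact form covered in class. It assumes everyone in the follow-up subsample responds, and the function name and data are hypothetical.

```python
from statistics import mean


def double_sampling_mean(stage1_responses, n_nonresponders, followup_responses):
    """Weighted mean: stage-1 responders count for themselves, and the
    follow-up subsample stands in for ALL stage-1 non-responders."""
    n1 = len(stage1_responses)
    n2 = n_nonresponders
    total = n1 * mean(stage1_responses) + n2 * mean(followup_responses)
    return total / (n1 + n2)


# Hypothetical data: 6 people answered the mailed survey, 4 did not,
# and 2 of those 4 were then interviewed by phone.
mail = [10, 12, 11, 13, 12, 14]
followup = [18, 20]

estimate = double_sampling_mean(mail, 4, followup)
naive = mean(mail + followup)   # ignores that the 2 follow-ups represent 4 people
print(estimate, naive)
```

The weighting matters because the follow-up interviews represent the whole non-responder group, not just themselves.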
Clearly a lot of effort has been put into figuring out how to get people to respond. But it
is a sad fact that even after we’ve done everything in our power to get people to respond,
there will still almost surely be some missing values in our data set. Next I’ll talk about
how to deal with these missing values.
The context for the next bit will be to assume that we’ve coerced a potential participant to
give us answers to at least some of the questions we asked. For example, suppose we get
the following results from a survey of the class (dashes (-) indicate missing values):
Subject   Height (in)   Shoe Size   Weight (lb)
1         72            9           150
2         63            -           -
3         74            10          175
4         65            6           -
We can assume that height is a known auxiliary variable.
Well, probably the easiest thing to do is simply delete the records with any missing
values, but this is generally considered a bad idea. Deletion of this sort greatly reduces
the sample size (in our example, it cuts it in half), and worse yet, the non-responders
might have something in common (in our case, they tend to be shorter), that could bias
the estimate.
The next most intuitive solution would be to replace the missing values with the mean
value of the existing data. [For future reference, any method by which we substitute
possible values for the missing ones is called imputation.] If we did this, the completed
set would look like this (imputed values marked with *):
Subject   Height (in)   Shoe Size   Weight (lb)
1         72            9           150
2         63            8.33*       162.5*
3         74            10          175
4         65            6           162.5*
This is slightly better than deletion, but still has some inherent problems. We still have
the problem that the non-responders could be similar, in which case the mean of the
remaining values could be a far cry from the mean of the missing values (in our case, it is
pretty unlikely that everyone who is slightly over 5ft tall will weigh 162.5 lbs.) Also,
since the missing values are all replaced by the same value, the estimated variance will be
significantly reduced compared to the real thing.
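Mean imputation on the class data takes only a few lines. This is a small sketch: `impute_mean` is a hypothetical helper, and missing values are represented as `None`.

```python
from statistics import mean, variance


def impute_mean(values):
    """Replace each missing entry (None) with the mean of the observed entries."""
    fill = mean([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


shoe = [9, None, 10, 6]
weight = [150, None, 175, None]

shoe_filled = impute_mean(shoe)
weight_filled = impute_mean(weight)
print([round(v, 2) for v in shoe_filled])
print(weight_filled)

# Sample variance of the weights before and after filling in the gaps.
var_observed = variance([w for w in weight if w is not None])
var_filled = variance(weight_filled)
print(var_observed, var_filled)
```

The second variance is smaller than the first because every imputed value sits exactly at the mean and adds nothing to the spread.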
To circumvent the above problems, we could use the known auxiliaries and some of the
existing values of the other variables to perform a linear regression and impute the
values. I did this below:
Subject   Height (in)   Shoe Size   Weight (lb)
1         72            9           150
2         63            5.09*       37.5*
3         74            10          175
4         65            6           62.5*
(* marks an imputed value)
You can see this was met with mixed success - the shoe size looks reasonable, but the
weights are much too small. Seemingly this would work a bit better if the data set were
larger (specifically, if we had some actual weights for people with heights in the 64 inch
range). Also, it may not be reasonable to fit a straight line to the relationship between
height and weight.
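The imputed weights can be reproduced with a simple least-squares line. This sketch assumes weight is regressed on height alone, using only the two subjects with known weights; the notes do not say exactly which predictors were used, and the helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def impute_by_regression(heights, values):
    """Fit on the complete pairs, then predict each missing value (None)."""
    pairs = [(h, v) for h, v in zip(heights, values) if v is not None]
    a, b = fit_line([h for h, _ in pairs], [v for _, v in pairs])
    return [a + b * h if v is None else v for h, v in zip(heights, values)]


height = [72, 63, 74, 65]
weight = [150, None, 175, None]

weight_filled = impute_by_regression(height, weight)
print(weight_filled)
```

With only two complete points the line passes through both exactly, so the predictions for the shorter subjects are pure extrapolation, which is why they come out so small.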
An alternative to regression is the ‘Hot Deck’ method. In this procedure, the data file is
sorted in a meaningful way based on auxiliary variables, and then the missing values are
simply filled in with the corresponding previous value. In this way, the auxiliaries are
used somewhat implicitly, and therefore the computational effort is reduced, as are the
occasionally unreasonable results from a rigorous regression. This is the preferred
method of the US Census Bureau – not sure if that’s a plus or a minus. The method is a
little cumbersome to write out more specifically, so I’ll do an example in class.
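In the meantime, here is a rough sketch of the idea on the class data. It makes two assumptions that are mine rather than part of any official procedure: the file is sorted by height from tallest to shortest (so the first record happens to be complete), and a missing value is filled with the most recent observed value above it. The function name `hot_deck` is hypothetical.

```python
def hot_deck(records, sort_key, fields):
    """Sort records by an auxiliary variable, then fill each missing field
    (None) with the most recent observed value earlier in the sorted file."""
    ordered = sorted(records, key=lambda r: r[sort_key], reverse=True)
    last_seen = {}
    filled = []
    for rec in ordered:
        new = dict(rec)
        for f in fields:
            if new[f] is None:
                new[f] = last_seen.get(f)   # stays None if nothing came before
            else:
                last_seen[f] = new[f]
        filled.append(new)
    return filled


survey = [
    {"subject": 1, "height": 72, "shoe": 9, "weight": 150},
    {"subject": 2, "height": 63, "shoe": None, "weight": None},
    {"subject": 3, "height": 74, "shoe": 10, "weight": 175},
    {"subject": 4, "height": 65, "shoe": 6, "weight": None},
]

filled_rows = hot_deck(survey, "height", ["shoe", "weight"])
for row in filled_rows:
    print(row)
```

Each missing value is borrowed from the nearest taller subject with data, so height is used without fitting any model.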
All the methods we’ve discussed so far have attempted to create a single value for each
missing value. The final method seeks to impute multiple values for each missing value,
and then calculates estimates of the population characteristics for every possible
arrangement of the missing values. (Again, I’ll do an example in class.) By far the best
thing about this method is that it allows us to see how the estimates of population
characteristics would change depending on how we impute the values. If the estimates
are relatively stable regardless of the imputed values, we can be confident in our results,
but if the estimates vary wildly depending on the imputed values, we are less certain. By
far the worst thing about this method is that it takes a ton of computing power, especially
when the number of missing values and/or the number of possible values to impute per
missing value is large. Multiple imputation has been around for a while, but has come
into vogue only recently, simply because we now have computers that are fast enough to
make it reasonable.
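The description above can be sketched by brute force on the class weights. The candidate values are hypothetical (made up for illustration), and the code simply enumerates every arrangement as described in these notes; it is not the full procedure from Rubin's book.

```python
from itertools import product
from statistics import mean

weight = [150, None, 175, None]

# Hypothetical candidate values for each missing weight (not from the notes).
candidates = [120, 140, 160]

missing_idx = [i for i, w in enumerate(weight) if w is None]

estimates = []
for combo in product(candidates, repeat=len(missing_idx)):
    completed = list(weight)
    for i, value in zip(missing_idx, combo):
        completed[i] = value
    estimates.append(mean(completed))

# How many arrangements there were, and how far the estimated mean moves.
print(len(estimates), min(estimates), max(estimates))
```

The number of arrangements is (number of candidates) raised to the (number of missing values), which is why the computing cost grows so quickly.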
Now, let me completely shift gears to talk about measurement error. The assumption
here is that we have complete information, but that the values may not be exactly right.
The simplest form of this is when we have instruments with a minimum detection level,
such as a ruler that only has marks at the centimeter, or a scale that only measures to the
gram. Then the value we get from the instrument will be only approximate or within a
range. We could also imagine errors in measurement resulting from outright erroneous
equipment or human error in recording or summarizing the values. Models that adjust for
these kinds of errors exist, but are extremely complex, so I will not even attempt to cover
them here. We can, however, briefly discuss the implications of measurement error.
In the case where measurements are wrong, rather than known with error (say, if our
scale consistently gives the weight of an object as 1 gram heavier than it should be), then
the only effect on the estimates is a bias in the direction of the error. If we could
determine the magnitude of the error (say, get a properly working scale), we could
calculate the bias in our estimate and correct for it. Generally, though, it is next to
impossible to determine the amount of bias. Not much we can do about that.
When measurements are known with error, the resulting estimates based on our standard
estimation methods could be biased, have incorrect variance, or both. As a general rule,
the variance based on our standard calculations will be too small, because we are not
including the variance of the measurement errors. Intuitively, uncertainty about the
measured value leads to additional uncertainty about the estimate, which means a larger
variance of the estimate.
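A small simulation illustrates both cases. It is a sketch under assumptions of my own: true weights are drawn from a normal distribution with mean 100 g and standard deviation 5 g, random reading errors are normal with standard deviation 5 g, and each sample has 25 items.

```python
import random
from statistics import mean, variance

random.seed(411)

# Case 1: a scale that always reads 1 gram heavy shifts the mean by 1 gram.
true_weights = [random.gauss(100, 5) for _ in range(25)]
offset_readings = [w + 1 for w in true_weights]
bias = mean(offset_readings) - mean(true_weights)


# Case 2: random reading error makes the sample mean itself more variable.
def sample_mean(noise_sd, n=25):
    """Mean of n readings, each a true value plus a random reading error."""
    return mean(random.gauss(100, 5) + random.gauss(0, noise_sd) for _ in range(n))


means_exact = [sample_mean(0) for _ in range(2000)]
means_noisy = [sample_mean(5) for _ in range(2000)]

var_exact = variance(means_exact)
var_noisy = variance(means_noisy)
print(round(bias, 6), var_exact, var_noisy)
```

The constant offset moves the estimate without changing its spread, while the random error leaves it centered but roughly doubles the variance of the sample mean in this setup.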
In surveys or interviews of humans, the ‘measurement instrument’ is the individual being
questioned. We can ‘calibrate’ the responses (and thereby decrease our measurement
error) by changing the way we phrase or present the questions. This will be the subject of
the remainder of my little rant about measurement errors.
Studies have shown that the way a question is asked has a huge impact on the answer.
Massive texts have been written on how to ask questions in an unambiguous fashion (at
the end of these notes, I’ll offer some references). A reasonably complete coverage of
this topic would be enough material to fill an entire semester (and the university,
probably the psychology or sociology department, may actually offer such a class).
Therefore, I’ll give you just a flavor.
Overwhelmingly, my advice to you is to use common sense. Most of you will be able to
tell when a question is poorly worded or not specific enough. The problem with this is
that, as the question creator, your verdict on the ambiguity of a question is clouded by the
fact that you know what you’re trying to ask. Easy solution – give the survey to a test
group, and see if they answer as you expect them to. If not, revise the question (perhaps
with recommendations from the test group), and try again. Repeat until you achieve the
desired results.
The final topic (hooray!) I want to discuss about non-sampling errors is what is called
randomized response. This is a method that can be used to encourage honesty when
sensitive questions are being asked. I will go through an example of this in class as well.
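To give a flavor, here is a sketch of one well-known version, Warner's design, which may not be the one used in class. Each respondent privately uses a random device: with probability p they answer "Do you have the trait?", otherwise they answer "Do you NOT have the trait?". The interviewer never learns which question was answered. The function names and the numbers below are hypothetical.

```python
import random


def warner_estimate(prop_yes, p):
    """Estimate the share with the sensitive trait from the share of 'yes'
    answers. Uses P(yes) = p*pi + (1 - p)*(1 - pi); requires p != 0.5."""
    return (prop_yes - (1 - p)) / (2 * p - 1)


def simulate(true_share, p, n, rng):
    """Share of 'yes' answers among n truthful respondents."""
    yes = 0
    for _ in range(n):
        has_trait = rng.random() < true_share
        direct_question = rng.random() < p
        # Truthful 'yes': trait + direct question, or no trait + reversed question.
        if has_trait == direct_question:
            yes += 1
    return yes / n


rng = random.Random(411)
observed = simulate(true_share=0.2, p=0.7, n=10000, rng=rng)
estimate = warner_estimate(observed, 0.7)
print(observed, estimate)
```

No individual answer reveals anything definite about the person, yet the population share can still be recovered because p is known.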
References for Non-Sampling Errors
“Nonsampling Error in Surveys”, Judy Lessler & William Kalsbeek, 1992
“The Phantom Respondents”, John Brehm, 1993
“Mail and Telephone Surveys: The Total Design Method”, Don Dillman, 1978
“Sample Design in Business Research”, W. Edwards Deming, 1960
“Multiple Imputation for Nonresponse in Surveys”, Don Rubin, 1987