Department of Urban Studies and Planning
Massachusetts Institute of Technology
11.220 Quantitative Reasoning and Statistical Methods for Planning I
Spring 1998
Homework Set #3 Solutions
[Total = 62 points]
Sampling, Surveys, Opinion Polling
Question 1
The City Planning Commission in your city has an index card file of all addresses in the
city. Each card contains one address and a list of all of the families that live at that
address.
[4]
(a) Outline how you would select a simple random sample of addresses from this card file. Be brief but complete. Several sentences should suffice.
The City Planning Commission has an index card file of addresses. To select a simple random sample, it numbers each of the cards from 1 to "N," where "N" is the total size of the population of addresses. Knowing the number of digits in "N," the commission can read groups of that many digits from the random number tables. Whenever such a number falls between 1 and "N" (numbers outside that range, and repeats, are simply skipped), the commission pulls the corresponding card out of the index file. The selection continues until "n" cards (i.e., the sample size) have been pulled out.
An alternative procedure is to use systematic sampling. First, divide the population size, "N," by the sample size, "n," and round down to a whole number (call this number "X"). Then, go to the random number table and select a number between 1 and X for the first case, and choose every Xth card after that. Notice that the starting point must be chosen at random, not arbitrarily; otherwise the beginning or the end of the index card file could be under- or over-represented. (Strictly speaking, a systematic sample is not a simple random sample, but it is usually treated as comparable when the order of the cards is unrelated to the characteristics being studied.)
A third alternative procedure for drawing the sample is to mix all of the cards together in a
large box. Mix them well and then select "n" cards. Unfortunately, someone will have to
refile all the cards after the selection is completed. Good luck if you're the chosen one!
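For concreteness, here is a minimal sketch of the first two procedures in code, assuming the numbered card file can be represented as a Python list (the population and sample sizes are invented):

    import random

    # Sketch of the two selection procedures described above, assuming the
    # cards have been numbered 1..N.  N and n are illustrative values only.
    N = 5_000                       # total addresses in the card file
    n = 200                         # desired sample size
    cards = list(range(1, N + 1))

    # (1) Simple random sample: every set of n cards is equally likely.
    srs = random.sample(cards, n)

    # (2) Systematic sample: random start between 1 and X, then every Xth card.
    X = N // n                      # skip interval, rounded down
    start = random.randint(1, X)
    systematic = cards[start - 1::X][:n]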
[3]
(b) Once you have taken a simple random sample of addresses from the card file, you take all the families on each of the cards you selected. Is the resulting sample of families a simple random sample of all families in this city? Explain. If not, indicate what type of family is overrepresented or underrepresented in the sample.
No, the resulting sample of families is not a simple random sample of all the families in the city. It is a cluster sample: when an address is selected, all of the families at that address are included together. (In probability terms, once a card is drawn, the conditional probability of each family on that card being in the sample becomes 1.) Note that every family does have the same overall chance of selection, namely n/N, the chance that its address is drawn, so no type of family is over- or under-represented in expectation. What disqualifies this as a simple random sample is that families sharing an address come in or stay out together, so not all possible samples of families are equally likely.
[3]
(c) Once you have taken a simple random sample of addresses from the card file, you select one family at random from each of the sampled addresses. Is the resulting sample of families a simple random sample of families living in this city? Explain. If not, indicate what type of family is overrepresented or underrepresented in the sample.
No, the resulting sample of families is not a simple random sample of families living in the city. Once a family from a multi-unit dwelling is chosen, the other families at that address cannot be chosen (in probability terms, their conditional probability of selection drops to 0). A family at an address shared by k families has overall selection probability (n/N)(1/k), while a family in a single-family dwelling has probability n/N. So each family does not have an equal chance of being selected, and we do not have a simple random sample. Using this procedure, families living in multi-unit dwellings are under-represented; conversely, families living in single-unit dwellings are over-represented.
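The contrast between parts (b) and (c) can be checked with a small simulation. This is only a sketch with invented counts: 800 single-family addresses and 200 four-family addresses, with 100 addresses drawn per trial.

    import random

    # Toy check of per-family inclusion probabilities for parts (b) and (c).
    random.seed(0)
    addresses = [1] * 800 + [4] * 200      # families living at each address
    TRIALS, n, N = 20_000, 100, 1_000

    hits_b = [0, 0]   # family-inclusions [single-unit, multi-unit], scheme (b)
    hits_c = [0, 0]   # same counts under scheme (c)

    for _ in range(TRIALS):
        for a in random.sample(range(N), n):
            k = addresses[a]
            idx = 0 if k == 1 else 1
            hits_b[idx] += k               # (b): all k families come in
            hits_c[idx] += 1               # (c): exactly one family comes in

    # (b): a family is in whenever its address is, so both rates ~ n/N = 0.10.
    # (c): a family at a k-family address is in with chance (n/N)/k, so the
    #      rate for families at four-family addresses is ~ 0.025.
    # (There happen to be 800 families of each type in this toy city.)
    print("scheme (b):", hits_b[0] / (800 * TRIALS), hits_b[1] / (800 * TRIALS))
    print("scheme (c):", hits_c[0] / (800 * TRIALS), hits_c[1] / (800 * TRIALS))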
Question 2
A survey is carried out by the planning department to determine the distribution of
household size in a certain city. The department draws a simple random sample of 1,000
households, but after several visits the interviewers find people at home in only 853 of the
sample households. Rather than face such a high nonresponse rate, the planners draw a
second batch of households and use the first 147 completed interviews in this second
batch to bring the sample up to its goal of 1,000 households. They count 3,087 people in
these 1,000 households and estimate the average (mean) household size in the city to be
about 3.1 persons.
[4]
(a) Is this estimate of the mean household size in the city likely to be too low, too high, or just about right? Explain.
The estimate of mean household size in the city, 3.1 persons, is likely to be too high. The 147 substituted interviews are simply the first ones completed in the second batch, and those are more likely to come from larger households: the more people in a household, the better the chance of finding someone at home when the interviewer calls.
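A toy simulation of the substitution scheme makes the direction of the bias visible. Everything here is invented, including the assumption that the chance of finding someone at home rises with household size:

    import random

    random.seed(11220)

    # Invented city: household sizes drawn from an arbitrary distribution.
    CITY = [random.choice([1, 1, 2, 2, 3, 4, 5, 6]) for _ in range(100_000)]
    TRUE_MEAN = sum(CITY) / len(CITY)

    def someone_home(size):
        # Assumed: larger households are easier to catch at home.
        return random.random() < min(0.95, 0.35 + 0.15 * size)

    first_batch = random.sample(CITY, 1_000)
    done = [s for s in first_batch if someone_home(s)]

    # Top up with the *first* completed interviews from a second batch,
    # exactly as the planners did.
    second_batch = random.sample(CITY, 5_000)
    extra = [s for s in second_batch if someone_home(s)]
    done += extra[: 1_000 - len(done)]

    print(f"true mean household size : {TRUE_MEAN:.2f}")
    print(f"survey's estimate        : {sum(done) / len(done):.2f}")  # too high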
Question 3
Attached to this homework set is the complete survey questionnaire that was used by the
Commission to Study Racial and Ethnic Bias in the Courts of Massachusetts to solicit
attorneys’ views on racial and ethnic bias in the courts. (You have already seen a portion
of this questionnaire in an earlier homework set.)
[6]
(a) Critique this questionnaire by giving three explicit criticisms of the questionnaire as a device to study the twin issues of racial and ethnic bias in the courts. You might criticize the structure of the questionnaire, the wording of individual questions, the categories that are offered as answers to a particular question, or any other aspect of the questionnaire that you find problematic.
The answer provided here is taken directly from An Analysis of the Report, "Racial and
Ethnic Bias in the Massachusetts Court System: A Survey of Attorneys and Judges,"
submitted to the Commission to Study Racial and Ethnic Bias in the Courts, Supreme
Judicial Court of the Commonwealth of Massachusetts (Schuster, 1994). Students were only
expected to provide three criticisms of the Attorney Survey in the homework set.
• It is often unclear whose bias is being measured by a particular question. Is it intended
to measure the bias of the actor in the legal system about whom the question is being asked,
or is it intended to measure the bias of the attorney of whom the question is being asked? If
the latter plays even a minor role here, it is hard to know how to interpret the results. An
attorney who is doing the best to represent the interests of his or her client may do well to
claim bias even if he or she does not necessarily feel that it is evident.
• Questions 16-18 on page 3 illustrate a different problem. The wording of these questions includes two premises rather than one, which makes it hard to know exactly what is being measured and how the respondent is
expected to respond. The first premise is that “minority attorneys are addressed by first
names or familiar terms.” The second is that “non-minorities are addressed by surnames
or titles." In answering the question, is the respondent expected to judge the frequency of
the first premise, the second, or the combination? But the combination that the question
appears to ask about is only one possible combination: non-minority surnames and minority
first names. What about the possibility that both groups are addressed by first names, but
that minority attorneys are more frequently addressed in this way? How would an attorney
who believed this respond to the frequency question?
This question would have been better as an agreement/disagreement question: “To what
extent do you agree with the statement: ‘Minority attorneys are more often addressed by
their first names or in familiar terms than are non-minority attorneys.’” Then, it would be
much easier to interpret the results.
Even so, there is another problem with this question. In which direction is the bias about
which the Supreme Judicial Court should be concerned? Perhaps the way in which bias
works here is that the court system is very casual and open to non-minority attorneys,
addressing them in familiar terms, whereas it is more formal and more forbidding to
minority attorneys, addressing them by surnames or titles? Wouldn’t this be unacceptable
bias also? The question seems to assume that only a difference in one
direction is bias.
It would have been even better to have separated each of these questions into two, assessing
how often minority attorneys are addressed in familiar, first-name ways and how often non-minority attorneys are addressed in these ways. Then, these two answers could be
compared for each attorney-respondent to ascertain what percentage of attorneys believes
that minorities are more often addressed in this way.
• Questions 19-21 display the same problem of double premises.
• Some of the questions that are posed as frequency questions would be better posed as
agreement/disagreement questions because they are soliciting attorneys’ opinions. For
example, questions 28-31 would have been better and more revealing if they had been
posed as assertions measuring the degree of agreement with each assertion. Another way to
think about this is to ask, what does it mean that an attorney feels that “minority attorneys’
statements appear to be given less credibility sometimes?” It would have been far better to
state "Minority attorneys' statements are given less credibility" and measure degree of
agreement/disagreement.
• Question 33 is unlikely to elicit an honest response.
• Question 37 is another question that would have been better as an
agreement/disagreement question, particularly because of the ambiguity of the phrase
“same access.”
• It is interesting to note that the section at the bottom of Page 5 begins with the phrase
“based on your observations or experience.” The unspoken assumption is that the prior
section was about the reporting of actual facts in the frequencies, but of course the answers
there are also, in the final analysis, based on observations and experience. The temptation
to try to measure relative frequency in a document of this sort is great, but it can cause
lots of problems as these results are filtered through various levels of extracted documents,
reports, and press accounts. Perhaps most of these questions should have been posed as
agreement/disagreement questions.
• Page 6. Here the questionnaire turns to a series of agreement/disagreement questions.
Generally, the wording in these sections of both questionnaires is much clearer. The
respondent knows exactly what he or she is being asked to respond to. The main problem
here is in the number of loopholes that are offered to the respondent to get out of answering
these questions. There are three ways to avoid giving an opinion: (1) you may circle
“undecided”; (2) you may circle “no basis for opinion,” which is also designated by the
enigmatic N/A, which I suppose means not applicable—though that is different from “no
basis for opinion”; or (3) you may choose to simply not answer the question at all.
The interpretation problem here is knowing what each of these loopholes means and how to
interpret it. In the text of the report the authors often combine “undecided” and “no basis
for opinion” and report a single percentage. Somehow that does not seem quite right.
Indeed, at the very end of the text the authors worry about this a little bit and do a quick
analysis on pages 71-73. They shrug the problem off by suggesting that it does not
complicate conclusions too much. But this seems to me to be too little too late.
The role of the “no basis for opinion” option should be to filter out attorneys who have not
had experience with particular types of trials. In other words, the desire is to ask each of
the survey questions of those who have had experience of a particular type. As the authors
themselves suggest, the information to do this filtering is already contained in the
questionnaire, and, indeed, the authors have used the answers to question 11 to filter out
those attorneys who have spent no time in Massachusetts courts. (This is noted in passing
on page 11 of the main text, but deserves much greater prominence. The reader will never
understand the import of the deletion of these attorneys without a clearer statement of what
has been done.) This selection, however, is not done for the later filter questions, and there
the report still includes attorneys who have arguably not had experience in cases of a
particular type. The same statement may be made of the analysis of the attorney survey as
well. Unfortunately, as the authors’ analysis shows, some number of attorneys who had had
experience of various types still said they had “no basis for opinion.” A large loophole was
left open when respondents were not forced a bit more by the design of the questionnaire to
take a stand on these questions, and very high non-response/non-opinion/undecided rates
resulted.
Recommendation: This issue could be minimized by reanalyzing the data and rigorously applying the filter questions that are available in later sections. These further analyses might also omit those who said they had "no basis for an opinion." (A sketch of such a reanalysis appears after this list of criticisms.)
To be absolutely clear in the first section, the various results might be more accurately
presented: “Of those attorneys with trial experience in the Massachusetts courts, X
percent said Y.” In later sections, if the analysis is adjusted in this way with an additional
filter, the corresponding phrases might read, for example: “Of those attorneys with trial
experience in the Massachusetts courts and with experience in criminal cases, X percent
said Y.”
• I have no idea what question 70 means. Don't both groups of defendants fear being found guilty? How does it advance our knowledge of bias?
• In question 76 the phrase "I have noticed that" is ill-advised. It draws the
respondent’s attention to himself/herself. Are we measuring whether or not a person
noticed or whether or not he or she believes that minority employees are overly sensitive?
• Questions 81 and 82 contain two premises. I would have left off the first phrase in both
cases. Is this what was being measured? In any event, question 12 offers a rough proxy for
the first phrase anyway, and later questions give a good measure for the percentage of cases
of particular types involving minority litigants.
• The wording of question 84 is unhelpful because we don't know whether disagreement
means that the respondent thinks that minorities have more opportunity or less. Bias can
happen in both directions, though the questionnaire seems to assume that it can happen in
only one.
• Questions 95 and 96 are poorly worded. They are asking for the respondent's opinions
on someone else’s views. This would be thrown out in court—the witness is being asked to
speculate about someone else’s motives. Preferable would have been the following: “A
criminal case is more ‘winnable’ when the victim is white than when the victim is a
minority." This would solicit the respondent's own opinion directly.
• Question 143 is an odd question. We would only expect a lawyer to use an interpreter
when one was necessary. Would anyone use an interpreter usually or always? Of course
not.
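As suggested in the recommendation above, here is a sketch of the filtered reanalysis. The file name and column names are hypothetical stand-ins; the survey's actual codebook would supply the real variable names:

    import pandas as pd

    # Hypothetical reanalysis: apply the filter questions before tabulating.
    df = pd.read_csv("attorney_survey.csv")        # invented file name

    # Keep only attorneys with time in the Massachusetts courts (question 11)
    # and with experience in criminal cases (column name invented).
    experienced = df[(df["q11_time_in_ma_courts"] > 0)
                     & (df["criminal_case_experience"] == 1)]

    # Drop the "loophole" answers before computing percentages.
    answered = experienced[~experienced["q28"].isin(
        ["undecided", "no basis for opinion"])]

    pct = 100 * (answered["q28"] == "often").mean()
    print(f"Of those attorneys with trial experience in the Massachusetts "
          f"courts and with experience in criminal cases, {pct:.0f} percent "
          f"said 'often'.")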
I have also attached a discussion of the sampling procedure that was used by the original consultants, as well as my mathematical adjustments to that
procedure. I have given you far more than is necessary to answer part (b) of this question,
but I thought it would be useful to show you how complicated the issues of sampling,
weighting, and non-response can be in an actual survey setting. I trust that I wrote this
section clearly enough to explain the underlying issues.
[3]
(b) Why did the researchers attempt a census of minority attorneys while only surveying a one-in-seven sample of white attorneys? What is this form of sampling called?
The Commission wanted to ensure that the final sample would include a sufficient number of minority attorneys, so that comparisons could be made between them and non-minority attorneys. Because minorities make up a relatively small share of the attorney population in Massachusetts, one could not be sure that a random sample of the BBO's list would contain a sufficient number of minority attorneys for analysis and comparison. Hence the researchers attempted a census of minority attorneys.
This form of sampling is called stratified sampling (more precisely, disproportionate stratified sampling, since the two strata were sampled at different rates). One stratum was sampled with a systematic sample and the other was subjected to a census.
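A sketch of the design in code, with invented list sizes (the actual BBO counts are not reproduced here):

    import random

    # Stratified design: census of one stratum, one-in-seven systematic
    # sample of the other.  The lists and their sizes are illustrative.
    minority_attorneys = [f"M{i}" for i in range(900)]
    white_attorneys = [f"W{i}" for i in range(20_000)]

    sample = list(minority_attorneys)          # census: take everyone

    start = random.randint(1, 7)               # random start between 1 and 7
    sample += white_attorneys[start - 1::7]    # then every 7th attorney

    # Because the strata were sampled at different rates, population-level
    # estimates must weight each sampled white attorney by 7 and each
    # minority attorney by 1.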
Question 4
The Code of Ethics of the American Association for Public Opinion Research requires
public opinion pollsters who are its members to disclose the answers to six questions
concerning the design of their surveys and their implementation.
[12]
(a) These six questions are reproduced below in italics. For each question, write one or two sentences indicating why an answer to that question is important to your ability to evaluate the survey on which the pollster is reporting. Be as specific as possible; more specific answers will receive higher credit.
• What was the population?
We need to know the population so that we can understand who the pollster is
talking about in his/her analysis. It is important to know both the target and
parent populations so that we know to whom the pollster is generalizing the sample results.
• How was the sample selected?
We need to know how the sample was selected so that we can decide if it was
done in a statistically valid fashion and if it is representative of the population
in a statistical sense. If a probabilistic sampling method was used, we can
then measure the size of the error when generalizing the results to the
population. We can also check to see if there was any bias in the selection of
the sample. If a non-probabilistic method was used (e.g., convenience or judgment sampling), we need to be more cautious in accepting the
generalization of results to the population.
• What was the sample size?
[Note: We will soon learn the mathematical relevance of sample size. We do
not expect a fully mathematical answer at this point.]
We need to know the sample size so that we can determine whether the sample is large enough to keep the error in generalizing to the population acceptably small. (A rough numerical sketch appears after this list.)
• How were the subjects contacted?
This information is important in being able to understand the possible extent
of non-response bias involved in conducting the survey. A mail-in
questionnaire typically elicits a lower response rate than a telephone or
personal survey. However, a personal or telephone interview may produce greater response bias: the interviewer can prompt certain types of responses through tone of voice or through the dynamics between interviewer and interviewee (for example, race effects, or a respondent lying rather than appearing ignorant by declining to answer).
• When was the survey conducted?
It is important to know when the survey was conducted in order to know
whether or not there was any bias in the responses caused by external time-specific factors. For example, an exit poll when people are voting for a
candidate may not result in a truthful answer if a person does not want to
reveal their preference to the interviewer.
• What were the exact questions asked?
It is important to know the exact questions asked so that we can tell whether
there was any bias in the wording of the questions.
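As a preview of the mathematical point deferred in the sample-size answer above, and assuming a simple random sample and a proportion near 50 percent, the conservative 95 percent margin of error is roughly 1/sqrt(n), so quadrupling the sample size only halves the error:

    from math import sqrt

    # Conservative 95% margin of error for a proportion under SRS: ~1/sqrt(n).
    for n in (100, 400, 1_600, 2_500):
        print(f"n = {n:5d}: margin of error ~ +/- {100 / sqrt(n):.1f} "
              "percentage points")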
[2]
(b) In sampling, what is the difference between the "target population" and the "parent population," and why is this distinction important?
The parent population is the actual population from which the actual sample is
chosen while the target population is the entire population that is being studied.
This distinction is important because the results from the sample can be
generalized to the parent population and not to the target population. Only if the
parent and target populations are identical can the results be generalized to the
target population.
For example, the target population for the survey in Question 3 was all attorneys practicing in the Commonwealth of Massachusetts. On the other hand,
the parent population was lawyers who had passed the bar in Massachusetts, who
were active at the time of the survey and whose business address was in the State
of Massachusetts. The results of the survey are only generalizable to this subset
of attorneys and not to all attorneys residing in Massachusetts (or all attorneys
who have passed the Massachusetts Bar).
Probability
Question 5
[7]
Do Exercise #5.11, page 297, Moore (fourth edition).
(a) We need to know what counts as “poverty”. The government definition depends on
household income relative to the number of people in the household.
(b) Table total = 39,263,000 (entries are in thousands).
(c) 3755/39263 = 0.096 = 9.6%
(d) 10876/39263 = 0.277 = 27.7%
(e) 2939/26226 = 0.112 = 11.2%
(f) 5125/15727 = 0.326 = 32.6%
(g) No - the table does not show the total number of people who are 65 and older, only
the number who are below poverty.
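A quick arithmetic check of parts (c) through (f), using the table entries quoted above (counts in thousands):

    # Verify the proportions reported in (c)-(f).
    TOTAL = 39_263
    for part, num, den in [("c", 3_755, TOTAL), ("d", 10_876, TOTAL),
                           ("e", 2_939, 26_226), ("f", 5_125, 15_727)]:
        print(f"({part}) {num}/{den} = {num / den:.3f}")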
In addition to the questions listed in Moore, please do the following:
[3]
(h) Construct a tree diagram that accurately depicts the probabilities implied in the table. Clearly label all the nodes and branches. Calculate all of the raw, conditional and joint probabilities that are necessary to complete the diagram.
[3]
(i) Construct a tree diagram that would include all Americans both above and below poverty level. Clearly label all the nodes, branches and probabilities. (You will not have sufficient information to calculate all the probabilities. This exercise should clearly reveal the answer to part (g).)
Question 6
[12]
Do Exercise 4.115, pages 242-243 of Weiss (fourth edition).
Let D = the respondent says drug abuse is the nation's top problem
T = the respondent is a teenager
AD = the respondent is an adult
P(D | T) = 0.32
P(D | AD) = 0.27
P(T) = 0.332
P(AD) = 0.668
(a) P(D) = P(T and D) + P(AD and D)
= P(T) x P(D | T) + P(AD) x P(D | AD)
= 0.332 x 0.32 + 0.668 x 0.27 = 0.287
(b) P(T) = 0.332
(c) P(T | D) = P(T and D)/P(D) = P(T) x P(D | T)/P(D) = (0.332 x 0.32)/0.287 = 0.371
(d) The interpretation of each item above is as follows:
(i) 28.7% of all respondents say that drug abuse is the nation’s top problem
(ii) 33.2% of all respondents were teenagers
(iii) 37.1% of those respondents who say that drug abuse is the nation’s top problem
were teenagers.
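As a cross-check, a short script reproducing the computation with the figures given in the exercise:

    # Law of total probability and Bayes' rule with the given figures.
    p_T, p_AD = 0.332, 0.668
    p_D_given_T, p_D_given_AD = 0.32, 0.27

    p_D = p_T * p_D_given_T + p_AD * p_D_given_AD    # part (a): 0.287
    p_T_given_D = p_T * p_D_given_T / p_D            # part (c): 0.371

    print(f"P(D)     = {p_D:.3f}")
    print(f"P(T | D) = {p_T_given_D:.3f}")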