Department of Urban Studies and Planning
Massachusetts Institute of Technology
11.220 Quantitative Reasoning and Statistical Methods for Planning I
Spring 1998

Homework Set #3 Solutions [Total = 62 points]
Sampling, Surveys, Opinion Polling

Question 1

The City Planning Commission in your city has an index card file of all addresses in the city. Each card contains one address and a list of all of the families that live at that address.

[4] (a) Outline how you would select a simple random sample of addresses from this card file. Be brief but complete. Several sentences should suffice.

To select a simple random sample, the Commission numbers each of the cards from 1 to N, where N is the total size of the population of addresses. Knowing the number of digits in N, the Commission can read numbers with that many digits from a random number table. For each random number drawn from the table (discarding numbers greater than N and repeats), the Commission pulls the corresponding card out of the index file. The selection continues until n cards (the sample size) have been pulled. (A code sketch of this procedure and the systematic alternative appears at the end of this question.)

An alternative procedure is systematic sampling. First, divide the population size N by the sample size n and round to a whole number; call this number X. Then use the random number table to select a number between 1 and X as the first case, and take every Xth card after that. Note that the starting point must be chosen at random, not arbitrarily: it is important not to under-represent or over-represent the beginning or the end of the card file.

A third alternative is to mix all of the cards together in a large box, mix them well, and then select n cards. Unfortunately, someone will have to refile all the cards after the selection is completed. Good luck if you're the chosen one!

[3] (b) Once you have taken a simple random sample of addresses from the card file, you take all the families on each of the cards you selected. Is the resulting sample of families a simple random sample of all families in this city? Explain. If not, indicate what type of family is overrepresented or underrepresented in the sample.

No, the resulting sample of families is not a simple random sample of all the families in the city. When an address is selected into the sample, every family at that address is included; the distinction matters only when the selected address is a multi-family unit. (In probability terms, the selection probability of each family on a selected card becomes 1.) Consequently, each family in the population does not have an equal chance of being selected; in short, it is not a simple random sample. Families in multi-unit dwellings will be over-represented.

[3] (c) Once you have taken a simple random sample of addresses from the card file, you select one family at random from each of the sampled addresses. Is the resulting sample of families a simple random sample of families living in this city? Explain. If not, indicate what type of family is overrepresented or underrepresented in the sample.

No, the resulting sample of families is not a simple random sample of families living in the city. Once one family at a multi-unit dwelling is chosen, the other families at that address cannot be chosen (in probability terms, their selection probability drops to 0). So again, each family does not have an equal chance of being selected, and we do not have a simple random sample. Under this procedure, families living in multi-unit dwellings would be under-represented, and families living in single-family units would be over-represented.
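For concreteness, here is a minimal sketch in Python of the two probability-based procedures from part (a). The card file is represented as a hypothetical list of addresses, and random.sample and a random systematic start stand in for the random number table; the numbers are illustrative only.

```python
import random

# Hypothetical card file: one entry per address (a stand-in for the index cards).
card_file = [f"Address {i}" for i in range(1, 5001)]  # N = 5000 addresses
n = 100                                               # desired sample size

# Simple random sample: every set of n cards is equally likely.
srs = random.sample(card_file, n)

# Systematic sample: X = round(N / n); random start between 1 and X,
# then every Xth card thereafter.
N = len(card_file)
X = round(N / n)
start = random.randint(1, X)   # a random, not arbitrary, starting point
systematic = card_file[start - 1::X]

print(len(srs), len(systematic))
```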
Question 2

A survey is carried out by the planning department to determine the distribution of household size in a certain city. The department draws a simple random sample of 1,000 households, but after several visits the interviewers find people at home in only 853 of the sample households. Rather than face such a high nonresponse rate, the planners draw a second batch of households and use the first 147 completed interviews in this second batch to bring the sample up to its goal of 1,000 households. They count 3,087 people in these 1,000 households and estimate the average (mean) household size in the city to be about 3.1 persons.

[4] (a) Is this estimate of the mean household size in the city likely to be too low, too high, or just about right? Explain.

The estimate of mean household size, 3.1 persons, is likely to be too high. The 147 completed interviews from the second batch over-represent larger households, because the chance of finding at least one household member at home rises with the number of people in the household. Smaller households, which are harder to catch at home, are under-represented in the final sample of 1,000.
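A small simulation makes the direction of this bias concrete. This is a sketch under an assumed response model, not part of the original exercise: each household member is home independently with probability p, so larger households are more likely to yield a completed interview.

```python
import random

# Assumed population of household sizes (the weights are illustrative only).
sizes = [1, 2, 3, 4, 5, 6]
weights = [0.25, 0.30, 0.20, 0.15, 0.07, 0.03]
population = random.choices(sizes, weights=weights, k=100_000)

p = 0.4  # assumed chance that any one member is home at a given visit

# Keep only households where at least one member is home,
# mimicking "use completed interviews only".
responders = [s for s in population if random.random() < 1 - (1 - p) ** s]

true_mean = sum(population) / len(population)
est_mean = sum(responders) / len(responders)
print(f"true mean {true_mean:.2f} vs completed-interview mean {est_mean:.2f}")
```

The completed-interview mean comes out noticeably above the true mean, which is the direction of bias argued in part (a).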
Question 3

Attached to this homework set is the complete survey questionnaire that was used by the Commission to Study Racial and Ethnic Bias in the Courts of Massachusetts to solicit attorneys' views on racial and ethnic bias in the courts. (You have already seen a portion of this questionnaire in an earlier homework set.)

[6] (a) Critique this questionnaire by giving three explicit criticisms of the questionnaire as a device to study the twin issues of racial and ethnic bias in the courts. You might criticize the structure of the questionnaire, the wording of individual questions, the categories that are offered as answers to a particular question, or any other aspect of the questionnaire that you find problematic.

The answer provided here is taken directly from An Analysis of the Report, "Racial and Ethnic Bias in the Massachusetts Court System: A Survey of Attorneys and Judges," submitted to the Commission to Study Racial and Ethnic Bias in the Courts, Supreme Judicial Court of the Commonwealth of Massachusetts (Schuster, 1994). Students were expected to provide only three criticisms of the Attorney Survey.

• It is often unclear whose bias a particular question measures. Is it intended to measure the bias of the actor in the legal system about whom the question is being asked, or the bias of the attorney of whom the question is being asked? If the latter plays even a minor role, it is hard to know how to interpret the results. An attorney doing his or her best to represent a client's interests may do well to claim bias even without necessarily feeling that it is evident.

• Questions 16-18 on page 3 illustrate a different problem. Each question is worded with two premises rather than one, so it is hard to know exactly what is being measured and how the respondent is expected to respond. The first premise is that "minority attorneys are addressed by first names or familiar terms." The second is that "non-minorities are addressed by surnames or titles." In answering the question, is the respondent expected to judge the frequency of the first premise, the second, or the combination? And the combination that the question appears to ask about is only one possible combination: non-minority surnames and minority first names. What about the possibility that both groups are addressed by first names, but that minority attorneys are more frequently addressed in this way? How would an attorney who believed this respond to the frequency question? This question would have been better as an agreement/disagreement question: "To what extent do you agree with the statement: 'Minority attorneys are more often addressed by their first names or in familiar terms than are non-minority attorneys.'" Then it would be much easier to interpret the results.

Even so, there is another problem with this question. In which direction is the bias about which the Supreme Judicial Court should be concerned? Perhaps bias works here in the opposite direction: the court system is casual and open with non-minority attorneys, addressing them in familiar terms, while it is more formal and forbidding toward minority attorneys, addressing them by surnames or titles. Wouldn't that be unacceptable bias also? This questionnaire seems to assume that only a difference in one direction is bias. It would have been better still to separate each of these questions into two, asking how often minority attorneys are addressed in familiar, first-name terms and how often non-minority attorneys are. The two answers could then be compared for each attorney-respondent to ascertain what percentage of attorneys believes that minorities are more often addressed in this way.

• Questions 19-21 display the same problem of double premises.

• Some of the questions posed as frequency questions would be better posed as agreement/disagreement questions because they are soliciting attorneys' opinions. For example, questions 28-31 would have been more revealing if posed as assertions measuring the degree of agreement. Another way to think about this is to ask what it means that an attorney feels that "minority attorneys' statements appear to be given less credibility sometimes." It would have been far better to state "Minority attorneys' opinions are given less credibility" and measure the degree of agreement or disagreement.

• Question 33 is unlikely to elicit an honest response.

• Question 37 is another question that would have been better as an agreement/disagreement question, particularly because of the ambiguity of the phrase "same access."

• It is interesting to note that the section at the bottom of page 5 begins with the phrase "based on your observations or experience." The unspoken assumption is that the prior section was about the reporting of actual facts in the frequencies, but of course the answers there are also, in the final analysis, based on observations and experience. The temptation to try to measure relative frequency in a document of this sort is great, but it can cause many problems as the results are filtered through various levels of extracted documents, reports, and press accounts. Perhaps most of these questions should have been posed as agreement/disagreement questions.

• Page 6. Here the questionnaire turns to a series of agreement/disagreement questions. Generally, the wording in these sections of both questionnaires is much clearer: the respondent knows exactly what he or she is being asked to respond to.
The main problem here is the number of loopholes offered to the respondent for getting out of answering these questions. There are three ways to avoid giving an opinion: (1) circle "undecided"; (2) circle "no basis for opinion," which is also designated by the enigmatic N/A (presumably "not applicable," though that is different from "no basis for opinion"); or (3) simply not answer the question at all. The interpretation problem is knowing what each of these loopholes means and how to treat it. In the text of the report the authors often combine "undecided" and "no basis for opinion" and report a single percentage. Somehow that does not seem quite right. Indeed, at the very end of the text the authors worry about this a little and do a quick analysis on pages 71-73. They shrug the problem off by suggesting that it does not complicate their conclusions too much, but this seems to me too little, too late.

The role of the "no basis for opinion" option should be to filter out attorneys who have not had experience with particular types of trials; in other words, the desire is to ask each of the survey questions only of those who have had experience of a particular type. As the authors themselves suggest, the information needed to do this filtering is already contained in the questionnaire, and, indeed, the authors have used the answers to question 11 to filter out those attorneys who have spent no time in Massachusetts courts. (This is noted in passing on page 11 of the main text, but it deserves much greater prominence; the reader will never understand the import of the deletion of these attorneys without a clearer statement of what has been done.) This filtering, however, is not done for the later filter questions, and there the report still includes attorneys who have arguably not had experience in cases of a particular type. The same may be said of the analysis of the attorney survey as well. Unfortunately, as the authors' own analysis shows, some attorneys who had had experience of various types still said they had "no basis for opinion." A large loophole was left open when the design of the questionnaire did not press respondents a bit harder to take a stand on these questions, and very high non-response/non-opinion/undecided rates resulted.

Recommendation: This issue could be minimized by reanalyzing the data and rigorously applying the filter questions that are available in later sections. These further analyses might also omit those who said they had "no basis for an opinion." To be absolutely clear in the first section, the various results might be presented more precisely: "Of those attorneys with trial experience in the Massachusetts courts, X percent said Y." In later sections, if the analysis is adjusted with an additional filter, the corresponding phrases might read, for example: "Of those attorneys with trial experience in the Massachusetts courts and with experience in criminal cases, X percent said Y."

• I have no idea what question 70 means. Don't both groups of defendants fear being found guilty? How does the question advance our knowledge of bias?

• In question 76 the phrase "I have noticed that" is ill-advised. It draws the respondent's attention to himself or herself. Are we measuring whether a person noticed, or whether he or she believes that minority employees are overly sensitive?

• Questions 81 and 82 contain two premises. I would have left off the first phrase in both cases.
Was the combination of the two premises what was being measured? In any event, question 12 offers a rough proxy for the first phrase anyway, and later questions give a good measure of the percentage of cases of particular types involving minority litigants.

• The wording of question 84 is unhelpful because we do not know whether disagreement means that the respondent thinks minorities have more opportunity or less. Bias can run in both directions, though the questionnaire seems to assume that it can run in only one.

• Questions 95 and 96 are poorly worded: they ask for the respondent's opinion of someone else's views. This would be thrown out in court; the witness is being asked to speculate about someone else's motives. Preferable would have been the following: "A criminal case is more 'winnable' when the victim is white than when the victim is a minority." That would measure the respondent's own opinion directly.

• Question 143 is an odd question. We would expect a lawyer to use an interpreter only when one was necessary. Would anyone use an interpreter "usually" or "always"? Of course not.

I have also attached a description of my discussion of the sampling procedure that was used by the original consultants, as well as my mathematical adjustments to that procedure. I have given you far more than is necessary to answer part (b) of this question, but I thought it would be useful to show you how complicated the issues of sampling, weighting, and non-response can be in an actual survey setting. I trust that I wrote this section clearly enough to explain the underlying issues.

[3] (b) Why did the researchers attempt a census of minority attorneys while only surveying a one-in-seven sample of white attorneys? What is this form of sampling called?

The Commission wanted to ensure that the final sample included a sufficient number of minority attorneys, so that comparisons could be made between them and non-minority attorneys. Because there are relatively few minorities among the attorney population in Massachusetts, one could not be sure that a random sample of the BBO's list would contain enough minority attorneys for analysis and comparison. Hence the researchers attempted a census of minority attorneys. This form of sampling is called stratified sampling: one stratum was covered by a systematic sample and the other by a census. (A code sketch of this design follows.)
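As an illustration of the design described in part (b), here is a minimal sketch in Python. The attorney frame, the minority flag, and the one-in-seven rate are stand-ins for the actual BBO list; this is a sketch of the technique, not the consultants' procedure.

```python
import random

# Hypothetical frame: (attorney_id, is_minority) pairs standing in for the BBO list.
frame = [(i, random.random() < 0.05) for i in range(1, 20_001)]

# Stratify the frame.
minority = [a for a in frame if a[1]]
non_minority = [a for a in frame if not a[1]]

# Stratum 1: census of minority attorneys (everyone is taken).
minority_sample = minority

# Stratum 2: one-in-seven systematic sample of non-minority attorneys,
# with a random start between 1 and 7.
start = random.randint(1, 7)
non_minority_sample = non_minority[start - 1::7]

print(len(minority_sample), "minority (census);",
      len(non_minority_sample), "non-minority (1-in-7)")
```

Because the two strata are sampled at different rates, any estimate for all attorneys combined must weight the strata accordingly (each sampled non-minority attorney represents roughly seven on the list), which is why the weighting discussion in the attached description matters.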
Question 4

The Code of Ethics of the American Association for Public Opinion Research requires public opinion pollsters who are its members to disclose the answers to six questions concerning the design of their surveys and their implementation.

[12] (a) These six questions are reproduced below in italics. For each question, write one or two sentences indicating why an answer to that question is important to your ability to evaluate the survey on which the pollster is reporting. Be as specific as possible; more specific answers will receive higher credit.

• What was the population? We need to know the population so that we can understand whom the pollster is talking about in the analysis. It is important to know both the target and parent populations so that we know to whom the pollster is generalizing the sample results.

• How was the sample selected? We need to know how the sample was selected so that we can decide whether it was done in a statistically valid fashion and whether it is representative of the population in a statistical sense. If a probabilistic sampling method was used, we can measure the size of the error when generalizing the results to the population, and we can check for bias in the selection of the sample. If a non-probabilistic method was used (e.g., convenience or judgment sampling), we need to be more cautious in generalizing the results to the population.

• What was the sample size? [Note: We will soon learn the mathematical relevance of sample size. We do not expect a fully mathematical answer at this point.] We need to know the sample size so that we can determine whether the sample is large enough to sufficiently limit the error involved in making generalizations about the population.

• How were the subjects contacted? This information is important for understanding the possible extent of non-response bias in the survey. A mail-in questionnaire typically elicits a lower response rate than a telephone or personal survey. However, a personal or telephone interview may produce greater response bias: the interviewer may prompt certain types of responses through tone of voice or through the dynamics between interviewer and interviewee (for example, race-of-interviewer effects, or an interviewee lying rather than appearing ignorant by declining to answer a question).

• When was the survey conducted? It is important to know when the survey was conducted in order to know whether the responses were biased by external, time-specific factors. For example, an exit poll taken while people are voting may not elicit a truthful answer if a person does not want to reveal his or her preference to the interviewer.

• What were the exact questions asked? It is important to know the exact questions asked so that we can tell whether there was any bias in the wording of the questions.

[2] (b) In sampling, what is the difference between the "target population" and the "parent population," and why is this distinction important?

The target population is the entire population under study, while the parent population is the population from which the sample is actually drawn. The distinction matters because the sample results can be generalized only to the parent population, not to the target population; only if the parent and target populations are identical can the results be generalized to the target population. For example, the target population for the survey in Question 3 was all attorneys practicing in the Commonwealth of Massachusetts, whereas the parent population was lawyers who had passed the bar in Massachusetts, were active at the time of the survey, and had a business address in the State of Massachusetts. The results of the survey are generalizable only to this subset of attorneys, not to all attorneys residing in Massachusetts (or all attorneys who have passed the Massachusetts bar).

Probability

Question 5

[7] Do Exercise #5.11, page 297, Moore (fourth edition).

(a) We need to know what counts as "poverty." The government definition depends on household income relative to the number of people in the household.

(b) Table total = 39,263,000 (table entries are in thousands).

(c) 3755/39263 = 0.096 = 9.6%

(d) 10876/39263 = 0.277 = 27.7%

(e) 2939/26226 = 0.112 = 11.2%

(f) 5125/15727 = 0.326 = 32.6%

(g) No: the table does not show the total number of people who are 65 and older, only the number who are below the poverty level.
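A few lines of Python verify the arithmetic in parts (c)-(f). Moore's table itself is not reproduced in these solutions, so the subgroup counts below carry no labels; they are simply the counts (in thousands) used above.

```python
# Checking the Question 5 proportions (counts in thousands, as cited above).
total = 39_263

print(f"(c) {3_755 / total:.3f}")    # 0.096
print(f"(d) {10_876 / total:.3f}")   # 0.277
print(f"(e) {2_939 / 26_226:.3f}")   # 0.112, conditional on a subgroup of 26,226
print(f"(f) {5_125 / 15_727:.3f}")   # 0.326, conditional on a subgroup of 15,727
```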
In addition to the questions listed in Moore, please do the following:

[3] (h) Construct a tree diagram that accurately depicts the probabilities implied in the table. Clearly label all the nodes and branches. Calculate all of the raw, conditional and joint probabilities that are necessary to complete the diagram.

[3] (i) Construct a tree diagram that would include all Americans both above and below the poverty level. Clearly label all the nodes, branches and probabilities. (You will not have sufficient information to calculate all the probabilities. This exercise should clearly reveal the answer to part (g).)

Question 6

[12] Do Exercise 4.115, pages 242-243 of Weiss (fourth edition).

Let D = the respondent says drug abuse is the nation's top problem, T = the respondent is a teen, and AD = the respondent is an adult. We are given:

P(D|T) = 0.32, P(D|AD) = 0.27, P(T) = 0.332, P(AD) = 0.668

(a) By the law of total probability,
P(D) = P(T and D) + P(AD and D)
     = P(T) x P(D|T) + P(AD) x P(D|AD)
     = 0.332 x 0.32 + 0.668 x 0.27
     = 0.287

(b) P(T) = 0.332

(c) By Bayes' rule,
P(T|D) = P(T and D)/P(D) = P(T) x P(D|T)/P(D) = 0.332 x 0.32/0.287 = 0.371

(d) The interpretation of each item above is as follows:
(i) 28.7% of all respondents say that drug abuse is the nation's top problem.
(ii) 33.2% of all respondents were teenagers.
(iii) 37.1% of the respondents who say that drug abuse is the nation's top problem were teenagers.
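A few lines of Python confirm the total-probability and Bayes'-rule arithmetic in Question 6; the inputs are exactly the probabilities given above.

```python
# Given probabilities from Question 6.
p_T, p_AD = 0.332, 0.668                  # P(teen), P(adult)
p_D_given_T, p_D_given_AD = 0.32, 0.27    # P(says drug abuse | group)

# (a) Law of total probability: P(D) = P(T)P(D|T) + P(AD)P(D|AD).
p_D = p_T * p_D_given_T + p_AD * p_D_given_AD

# (c) Bayes' rule: P(T|D) = P(T)P(D|T) / P(D).
p_T_given_D = p_T * p_D_given_T / p_D

print(f"P(D)   = {p_D:.3f}")          # 0.287
print(f"P(T|D) = {p_T_given_D:.3f}")  # 0.371
```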