Unit 2 Notes
2300
Jenna G. Tichon

Unit 2 Part 1

Objectives: By the end of this class the student should be able to:
- define basic sampling terms
- explain common concerns when taking samples
- identify sources of sampling and non-sampling error
- suggest remedies to reduce non-sampling error

2.1.1 Basic Sampling Terms

Element: An object on which a measurement is taken.
Population: A collection of elements about which we wish to make an inference.
Sampling Unit: Nonoverlapping collections of elements from the population that cover the entire population.
(Sampling) Frame: A list of sampling units.
Sample: A collection of sampling units drawn from a frame or frames.

Note that a unit and an element may or may not be the same thing. If your elements are puppies and you're sampling individual puppies at pet shelters, your units and elements are the same.

QUESTION: What could your sampling units be instead so that your elements and sampling units would not be the same?

Suppose I wanted to survey adults in the City of Winnipeg about how often they wear their masks in public places. Our elements would be adults and our population would be adults in the City of Winnipeg.

QUESTION: What might we use for sampling units?
QUESTION: What might we use for a sampling frame?

2.1.2 Basic Considerations for Sampling

Sample Size
We want to estimate a population parameter θ. Our best guess is a sample statistic θ̂. We realize every sample is different, but ideally we'd like to make sure our estimate θ̂ is within B of θ. We need to live with some error in our lives or we'd need gigantic sample sizes, so we also want

P(|θ − θ̂| < B) = 1 − α, or equivalently P(|θ − θ̂| > B) = α,

i.e. we only "want" a big error 100(α)% of the time (the Type 1 error rate). What sample size will we need for that? (A small worked sketch appears at the end of this section.)

Sampling Method
How are our elements spread out throughout our sampling frame? How easy is it for me to access them?
- Easy to sample, no groupings of common elements? Maybe a simple random sample.
- Are there defined subgroups? Should we stratify?
- Are our sampling units very geographically spread apart? Multistage?
- Are there defined groups where there's no big difference between groups, but inside the groups it is diverse? Clusters?
- Do I have an easily accessible list where the order is more or less random? Maybe systematic.

How much money do I have available?
We must always remember that sampling costs time and money. The "best" answer to every sampling question is to survey the whole population, but we can't, so we are not concerned with "best" so much as best within practical, time, and monetary constraints. An impractical answer is as useless as a blatantly wrong answer.

Am I genuinely selecting my sample randomly?

AT HOME: Read the Gallup statement at the end of section 2.3. As you're reading the statement, what year does it sound like it was taken from? When you're done reading, note the year in the citation and think about how those methods may be out of date or inappropriate today.

Here are some ways that modern large opinion polling companies do their sampling:
- Angus Reid (http://angusreid.org/how-we-poll-ari/)
- A list of blog topics (https://news.gallup.com/topic/methodologyblog.aspx) by Gallup on modern survey methods at their company
- In particular, an article by Gallup on the state of telephone surveys (https://news.gallup.com/opinion/methodology/225143/listening-statetelephone-surveys.aspx)
- Pew Research Center (https://www.pewresearch.org/methods/u-s-survey-research/our-survey-methodology-in-detail/)
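Returning to the sample-size question from the start of this section, here is a minimal sketch (not from the notes or the textbook) of the standard normal-approximation calculation for estimating a population mean from a simple random sample: choose the smallest n so that P(|θ − θ̂| > B) ≈ α, which gives n ≈ (z_{α/2} σ / B)². The guessed σ, the bound B, and α below are made-up values for illustration only, and σ would in practice come from a pilot study or prior data.

```python
# Minimal sketch: sample size so that P(|x_bar - mu| > B) is roughly alpha,
# for a simple random sample and a guessed population standard deviation sigma.
# Uses the large-sample normal approximation n = (z_{alpha/2} * sigma / B)^2.
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, B, alpha=0.05):
    """Smallest n keeping the chance of an error bigger than B near alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    return ceil((z * sigma / B) ** 2)

# Hypothetical numbers: guessed sigma = 15 minutes, want the estimate within
# B = 2 minutes, willing to miss by more than B only 5% of the time.
print(sample_size_for_mean(sigma=15, B=2, alpha=0.05))  # about 217
```

The formulas differ for other parameters and other designs (stratified, cluster, systematic), but the idea is the same: fix B and α first, then solve for the smallest n that achieves them.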
2.1.3 Errors of Nonobservation vs. Errors of Observation

Errors of nonobservation are related to our sample making up only part of the target population, and errors of observation are related to what is recorded about our sampling units being inaccurate.

Errors of Nonobservation

Sampling Error: The distance between the recorded statistic and the population parameter due to only collecting a sample of the population. (E.g. our statistic changes between each sample merely because each sample is different, not because the parameter is changing.)
Undercoverage: When a sampling frame does not include the entire target population. (There can also be issues with the sampling frame containing units not in the target population.)
Nonresponse: When you cannot collect measurements on selected units in your sample.

Our sampling error is something we have to live with as the price of being statisticians. Assuming we have 100% ideal conditions/compliance/measurement/sampling frame/etc… (ha!) we can at least control it by setting α.

Issues with coverage are hard to correct after the fact, as there was a reason those units were not included in the original sampling frame in the first place. Responsibly, you should report what your sampling frame was and how it compares to your target population. E.g.:
- U of M alumni vs. the alumni organization's mailing list
- Eligible voters in Winnipeg vs. people on last year's registered voters list
- Households in Winnipeg vs. houses listed in the telephone book

Often those missing are missing for reasons that may make them important and unique parts of your population to survey. In particular, at-risk or low-income populations can be marginalized from participation in opinion polling.

We can broadly classify nonresponse into three causes:
- An inability to physically reach a sampling unit. E.g. no internet connection, no phone line, no permanent address.
- An inability of the sampling unit to give the correct response. E.g. they may not have the appropriate data available to them, such as a person not being able to say how much they've paid in GST over the past 3 months.
- A person may refuse to answer the survey.

QUESTION: What are some reasons a person may refuse to answer a survey?

Errors of Observation

We can broadly group errors of observation as being due to:
- Interviewers: Tone, age, gender, physical appearance, and demeanour of an interviewer can all affect how truthful people will be, intentionally or unintentionally. (E.g. not wanting to tell a woman they don't support changes to parental leave, vs. being influenced by a perceived shameful tone in the way an interviewer reads a question.)
- Respondents: Respondents might not understand questions, may not seek clarification, may be embarrassed (or fearful) to answer truthfully, may exaggerate, may make up answers so as not to appear uninformed, or may confuse units of measurement.
- Measurement Instrument: Confusion around what the unit of measurement is or how something is defined. E.g. Does "employed" mean full time? Does "your children" include adult children? Step-children? Would you like me to quantify my commute time in minutes or hours?
- Method of Data Collection: Accuracy can be affected by conducting personal interviews vs. telephone interviews vs. self-administered questionnaires vs. direct observation.

QUESTION: I gave a question several times to my STAT 1150 students asking how many keys they had on the keyring with their house key. What do you think were some of the problems students had when deciding how to answer it?
2.1.4 Reducing Error

There are many ways research companies and researchers attempt to reduce errors in their surveys:
- Callbacks: Making follow-up calls or sending reminder surveys (by mail or email) can help response rates. Follow-up calls should vary by time of day and day of the week to catch people on different schedules.
- Rewards and Incentives: Surveys can offer monetary incentives for participating or put respondents into draws for a potential reward. Members of the panels that large survey companies select from may earn points towards gift cards or other rewards.
- Interviewer Training: Interviewers should have opportunities to practice asking questions under watchful eyes that can suggest improvements in intonation, pronunciation, or demeanour that may get more truthful answers.
- Data Checks: Data can be cross-referenced (e.g. age to year of birth), and obvious "wrong" answers can be eliminated or corrected by follow-ups if possible.
- Questionnaire Construction: In the next lecture we will look at how questions can be constructed to get honest and truthful answers from respondents and to keep people from giving inaccurate answers simply because they did not understand the questions.

2.1.5 Summary

- A well-thought-out survey considers both how to pick sampling units and how to question them.
- We will always have sampling error that we can't control beyond fixing α, but we should try to fix non-sampling error.
- We can have errors both in getting our sample and in getting our answers.
- There are techniques that can be employed to help minimize non-sampling error.

2.1.6 References for Reading:

- Textbook sections: 2.1 - 2.4
- Slides for a talk on incentives in surveys (https://iriss.stanford.edu/sites/g/files/sbiybj6196/f/singer_slides.pdf)
- An academic paper on whether incentives degrade data quality (https://scholarworks.iu.edu/dspace/bitstream/handle/2022/23761/Does%20use%20of%20survey%20incentives%20degrade%20data%20quality sequence=1&isAllowed=y). Longer read, fair warning.

2.1.7 Practice Problems:

Give some thought to textbook problems 2.1 to 2.7. Feel free to share ideas and thoughts on the forums for this class.

Unit 2 Part 2

Objectives: By the end of this class the student should be able to:
- design a questionnaire
- word surveys to receive accurate and unbiased results
- plan the stages involved in designing a survey

When designing a questionnaire there are a lot of things that may affect people's answers. Unintentionally, or let's hope not intentionally, answers can be swayed one way or another by the way the questions are worded or the survey is designed. Let's look at some of the things that influence a survey:

2.2.1 Question Ordering

When a question asks respondents to choose between many options, there can be primacy and recency effects.

Sanjeev and Balyan, 2014: A primacy effect occurs when some respondents remember choices that appear first in a given list and are therefore more likely to select these response options. It can also happen when an agreeable choice is read from a list, because respondents may select it and move on, without reading through the entire response list for a question. A recency effect, on the other hand, occurs when some respondents are more likely to remember the last choices of a list, and are therefore more likely to select a choice from the final part of a response list. This effect is much more pronounced when a response list has too many options or the scaling is wide.

Randomizing the option order amongst all participants is a way to combat this (see the sketch below).
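As a concrete illustration of that remedy, here is a small sketch (not from the notes; the question options and respondent count are invented) that gives each respondent an independently shuffled ordering of a list of response options, so that no single option systematically benefits from appearing first or last.

```python
# Small sketch: present response options in a fresh random order for each
# respondent so primacy/recency effects average out across the sample.
# The option list below is hypothetical.
import random

options = ["Television", "Radio", "Newspapers", "Social media", "Podcasts"]

def randomized_options(options, rng=random):
    """Return an independently shuffled copy of the option list."""
    shuffled = options[:]      # copy so the master list keeps its order
    rng.shuffle(shuffled)
    return shuffled

for respondent_id in range(3):   # three hypothetical respondents
    print(respondent_id, randomized_options(options))
```

For ordinal response scales, where a full shuffle would scramble the scale, the reversal approach described next (presenting the scale best-to-worst for some respondents and worst-to-best for others) plays the same role.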
For ordinal questions you can reverse the order (worst to best vs. best to worst) for some respondents.

Similar questions can have an effect based on ordering, particularly if one goes from more general to more specific or vice versa. These are called context effects. The text gives an example of people being asked if they were happy in their marriage and if they were happy with their life in general. When it was asked life then marriage, 52% responded they were very happy with life in general. When it was asked marriage then life, 38% responded they were very happy with life in general. The theory is that people felt so happy thinking about their marriages specifically that life in general seemed less great in comparison.

Pew Research, n.d.: Another experiment embedded in a December 2008 Pew Research poll also resulted in a contrast effect. When people were asked "All in all, are you satisfied or dissatisfied with the way things are going in this country today?" immediately after having been asked "Do you approve or disapprove of the way George W. Bush is handling his job as president?"; 88% said they were dissatisfied, compared with only 78% without the context of the prior question.

Magelssen et al., 2016: Assimilation effects entail that question order reduces differences in responses between adjacent questions; in contrast effects, question order increases differences. Question order effects occur when the thoughts and feelings triggered by a question carry over to the next question, thereby influencing the response.

Another example: a person who is given a long list of questions about crimes might respond differently to a question about whether they've been a victim of crime, as the list primes them and gives them opportunities to remember things that have happened to them in the past.

Having questions written out or restated can help reduce the issues with question ordering or long lists of choices by making sure respondents refocus on the given question.

2.2.2 Open vs. Closed Questions

Closed questions have a predetermined set of answers or a finite numerical answer (e.g. age, or a pain rating from 1 to 10). Open questions allow people to answer however they would like.

Pros and cons of each: closed questions are easier to analyze, but it is harder to capture all possible choices and they are subject to effects from question order. Respondents are more likely to suggest an "other" answer with an open-ended question, which also implies some categories may get "over-chosen" in a closed list. (Pew Research, n.d.)

Sometimes a pre-survey is used to find the most common options for the real survey, to help capture what the public will actually answer as opposed to what the surveyors think they might answer. (Has Family Feud not taught us anything about the things people will suggest?)

2.2.3 Response Options

A forced choice question makes a respondent select a yes or a no, one option or the other. Laur and Kennedy (2019) state the research is inconclusive on which format is more accurate in general, but research shows fairly consistently that rates of agreement are higher with forced choice. In certain situations, however, it seems highly likely that people are more accurate with forced choice rather than "select all that apply" type questions. E.g. someone is unlikely to report they've suffered from addiction when they haven't (no benefit), but someone may report they haven't suffered from addiction when they have (to not embarrass themselves), so the reporting method with higher results is likely more accurate.

They give an example of victimization rate questions. People were either asked whether they were victimized by something in particular (e.g.
job loss) in a yes/no format for six things, or the person had to select from a list everything that applied. The rates were higher in the forced-choice format.

2.2.4 Wording of Questions

Leading questions include extra information that purposefully influences people in a particular direction.

Pew Research, n.d.: An example of a wording difference that had a significant impact on responses comes from a January 2003 Pew Research Center survey. When people were asked whether they would "favor or oppose taking military action in Iraq to end Saddam Hussein's rule," 68% said they favored military action while 25% said they opposed military action. However, when asked whether they would "favor or oppose taking military action in Iraq to end Saddam Hussein's rule even if it meant that U.S. forces might suffer thousands of casualties," responses were dramatically different; only 43% said they favored military action, while 48% said they opposed it. The introduction of U.S. casualties altered the context of the question and influenced whether people favored or opposed military action in Iraq.

Magelssen et al., 2016: For instance, in a classic study carried out in the USA, 23 % of the public agreed that too little was being spent on "welfare", whereas 64 % agreed that too little was being spent on "assistance to the poor" [9]. The two terms were intended to describe the same policy, yet evidently evoked different judgments in the minds of respondents.

Magelssen et al., 2016: An Australian study investigated patients' views on AD [assisted dying] with the aid of face-to-face interviews in which all respondents were asked a set of questions describing AD in different ways [15]. The study demonstrated that question wording impacted on answers. In particular, to the question "Do you support the idea of euthanasia?" 79 % answered yes; 70 % answered yes to "Do you believe that a doctor should be able to assist a patient to die?"; and only 34 % gave an affirmative answer to "Do you believe that a doctor should be able to deliberately bring about a patient's death?".

It is generally good to give someone two options in the wording rather than a straight "Do you favour…?" The text suggests, for example, "Do you favor or oppose the use of capital punishment?" over "Do you favor capital punishment?". Asking "Do you agree with…?" may make the interviewee feel like the interviewer thinks the statement is agreeable and make them more likely to respond with yes.

Only one question should be asked at a time. A question that addresses two ideas is called a double-barrelled question. E.g. "Do you believe the IB program helped promote thinking about global issues and multiculturalism?"

Don't use double negatives: e.g. avoid "Do you favour or oppose not allowing teenage drivers to drive alone after midnight?"

Recall measurement instruments from last class? You should be clear in writing out questions. "How much do you work a week?" could be better phrased as "On average, how many hours a week are you paid for work?" For in-person interviews, a prop might be helpful to demonstrate height or volume. Hospitals often give pain scales with descriptors for each number when asking questions of patients.

2.2.5 Planning a Survey

The text suggests the following series of steps as a checklist for a good questionnaire project:
1. Statement of objectives.
2. Target population.
3. The frame.
4. Sample design.
5. Method of measurement.
6. Measurement instrument.
7. Selection and training of fieldworkers.
8. The pretest.
9. Organization of fieldwork.
10. Organization of data management.
11. Data analysis.
I would also add a step about summarizing and presenting your analysis and conclusions!

2.2.6 Summary

- People are really easily influenced. Use this for good, not evil.
- Consider the wording of questions, what options you give for answers, and the order of your questions.
- When reading the results of an organization's survey, always find out the question actually asked.
- There are lots of steps to consider before giving a questionnaire.

2.2.7 References for Reading:

- Sections 2.5 and 2.6 of the text
- Pew Research Center. (n.d.). Questionnaire Design. Pew Research Center. https://www.pewresearch.org/methods/u-s-survey-research/questionnaire-design/
- Lasorsa, D. (2003). Question-Order Effects in Surveys: The Case of Political Interest, News Attention, and Knowledge. Journalism & Mass Communication Quarterly, 80(3), 499–512. https://doi.org/10.1177/107769900308000302
- Magelssen, M., Supphellen, M., Nortvedt, P., & Materstvedt, L. (2016). Attitudes towards assisted dying are influenced by question wording and order: A survey experiment. BMC Medical Ethics, 17(1), 24. https://doi.org/10.1186/s12910-016-0107-3
- Sanjeev, M., & Balyan, P. (2014). Response Order Effects in Online Surveys: An Empirical Investigation. International Journal of Online Marketing (IJOM), 4(2), 28–44. https://doi.org/10.4018/ijom.2014040103
- Laur, A., & Kennedy, C. (2019, May 9). When Online Survey Respondents Only 'Select Some That Apply'. Pew Research Center. https://www.pewresearch.org/methods/2019/05/09/when-online-survey-respondents-only-select-some-that-apply

2.2.8 Practice Problems:

Consider your answers to, and reflect on, questions 2.16 to 2.22 in the text.