Surveys
  FPP Chapter 19

General Idea
  [Diagram: Population and its Parameter; Sample and its Statistic; Inference runs from the sample statistic to the population parameter]

Some new vocabulary
  Population
  Sample
  Parameter
  Statistic
  Inference
  Bias
  Non-response bias
  Response bias
  Simple random sample
  Convenience sampling
  Frame coverage bias
  Judgment sampling
  Voluntary sampling
  Probably others that I’ve missed

Plan of Study
  1. Issues in questionnaire design
  2. Methods for selecting units to survey
  3. Administration of surveys

Challenges to writing good questions
  1. Defining objectives and specifying the kind of answers needed to meet the objectives of the question
  2. Ensuring all respondents have a shared, common understanding of the question
  3. Ensuring people are asked questions to which they know the answers
  4. Asking questions respondents are able to answer in the terms required by the question
  5. Asking questions respondents are willing to answer accurately
  6. Asking questions that don’t lead the respondent to a certain answer

Steps to running a survey
  1. Establish the target population
  2. Obtain a sampling frame (this can be very difficult)
  3. Select a sample
  4. Obtain data from the sampled units

Misspecifying target population
  1994 Democratic gubernatorial primary in Arizona
  All polls predicted Eddie Basha would trail the front-runner by at least 9 points
  Result of election: Basha won
  Target population used in polls: registered voters who had voted in previous primaries

Surveys that use census as sampling frame
  The U.S. census is often used as the frame for federal and social surveys
  The target population here is folks living in the U.S.
  The U.S. census misses some people. Can you think of any examples?
  Samples taken from this frame are non-representative even before sampling begins

Selecting samples
  Units sampled should be representative of the target population
  How do we ensure this?
  Select a subset of units from the frame at random
  The most common method is to obtain a “simple random sample”
  If the random sample is large enough, its characteristics should mirror those of the population in the frame

Obtaining survey data
  Remember the following when designing a survey
  It is imperative that the purpose of the survey is stated clearly
  Confidentiality should be promised and kept
  At ISU, a review group verifies that surveys meet confidentiality requirements
  The method for asking questions should be the same for all sampled units

Unreliable methods of selecting samples
  What follows are examples of how NOT to select a sample
  Convenience sampling: picking units that are easy to measure
  Judgment sampling: picking units you judge as representative of the population
  Voluntary response sampling: picking units who respond voluntarily
  What are some examples of each?

Additional potential pitfalls
  Non-response bias: units that do not respond differ from those that do. These folks will be underrepresented.
  Frame coverage bias: the frame doesn’t include all of the target population
  Can we think of some examples?

Example of voluntary response survey
  Nightline call-in poll: Ted Koppel asked people to call his show to express their opinion on whether the United Nations should continue to have its headquarters in New York
  186,000 people called in, with 67% saying no
  Independent random sample: 72% said yes

Examples of problematic survey designs
  Shere Hite’s book, Women and Love: A Cultural Revolution in Progress (1987), claims:
    84% of women “not satisfied emotionally with their relationships” (pg. 804)
    95% of women “report forms of emotional and psychological harassment from men with whom they are in love relationships” (pg. 810)
    70% of women “married five or more years are having sex outside of their marriages” (pg. 856)

Hite’s survey
  To whom did she send a survey?
  100,000 questionnaires mailed to professional women’s groups, counseling centers, church societies, and senior citizens’ centers
  Her target population was women. What was her actual population?

Hite’s survey
  What did the survey look like?
  127 essay questions on the questionnaire
  4.5% of the questionnaires were returned
  What was not taken into account?

Hite’s survey
  How did she ask the questions?
  Questions use vague words like “love”. People have different interpretations of such words.
  Questions were leading:
    “Does your husband/lover treat you as an equal? Or are there times when he seems to you as an inferior? Leave you out of decisions? Act superior?” (pg. 795)

Another problematic survey design
  The article “Abortion Rights Groups Surveying Voters’ Views”, by Jack Coffman, appeared in the December 26, 1989 issue of the St. Paul Pioneer Press Dispatch.

Problems with Minnesota survey

Random sampling comment 1
  Say you collect data on units using a method other than a random sample, and you know these data are not representative of the population of interest. Then you take a random sample from these collected data.
  Then this random sample is representative of the population, right?
  Wrongo!!
  Large random samples are representative of the population in the frame. Effectively, this method uses the unrepresentative collected data as the frame.
  By randomly sampling from an unrepresentative sample, you just get a smaller unrepresentative sample.

Random sampling comment 2
  Say you obtain data that are representative of the target population. Should you take a random sample from these collected data?
  This question arises when researchers use data collected by others, for example in a Stat 101 project.
  No! If you have a representative sample, use it.
  Sub-sampling just reduces the amount of data you work with.

Random sampling comment 3
  A census is a measurement of outcomes for all units in the population.
  For example, the U.S.
  Government conducts a census of the population every 10 years to apportion seats in the House of Representatives. It also takes censuses of agriculture and business.
  Why do a survey instead of a census?
  Surveys are cheaper
    They require contacting far fewer people
  Survey results can be obtained more quickly
    Same reason as above
    This is important because we want to make policy decisions based on current answers, not answers that are months or years old
  Surveys can be more accurate
    Fewer people to contact means fewer problems with interviewer effects and non-response bias
  Upshot: less data of high quality is better than more data of poor quality

Random sampling comment 4
  Most major surveys are not simple random samples
  They involve multiple stages of random selection
    e.g., randomly pick 100 cities. From these cities randomly pick 500 households, then randomly pick 1 person from each household.
  Data collected this way are NOT representative of the population.
  However, because units are selected randomly, statisticians can account for the non-representation.
  This is done by assigning a weight to each observation that reflects how many units it represents in the population
  A good question to ask here would be: where do the weights come from?
  Generally, when analyzing data from surveys that are not simple random samples, it is wise to contact a professional statistician
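
One common answer to where the weights come from, offered here as a minimal sketch rather than the method any particular survey uses: weight each sampled person by the inverse of their chance of being selected. The Python simulation below assumes a simplified two-stage design (a simple random sample of households, then one person per household). The frame, the household sizes, and the names design_weight and simulate are hypothetical, chosen only for illustration.

```python
import random


def design_weight(households_in_frame, households_sampled, household_size):
    """Number of people in the population that one sampled person represents.

    Assumed design: stage 1 is a simple random sample of households,
    stage 2 picks one person at random within each sampled household.
    Selection chance = (households_sampled / households_in_frame) * (1 / household_size),
    and the weight is the inverse of that chance.
    """
    return (households_in_frame / households_sampled) * household_size


def simulate(num_households=20000, households_sampled=2000, seed=0):
    """Compare unweighted and weighted estimates on a made-up frame."""
    rng = random.Random(seed)

    # Hypothetical frame: every household has between 1 and 6 people.
    sizes = [rng.randint(1, 6) for _ in range(num_households)]

    # Outcome for each person: the size of the household they live in.
    # True mean over all PEOPLE (a household of size s contributes s people).
    true_mean = sum(s * s for s in sizes) / sum(sizes)

    # Stage 1: simple random sample of households.
    sampled_sizes = rng.sample(sizes, households_sampled)

    # Stage 2: one person per household. Everyone in a household reports the
    # same outcome here, so which person is picked does not change the value.
    outcomes = sampled_sizes

    # Unweighted mean treats every sampled person as equally representative.
    unweighted = sum(outcomes) / len(outcomes)

    # Weighted mean uses the inverse selection probabilities.
    weights = [
        design_weight(num_households, households_sampled, s) for s in sampled_sizes
    ]
    weighted = sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)

    return true_mean, unweighted, weighted


if __name__ == "__main__":
    true_mean, unweighted, weighted = simulate()
    print(f"true mean over people: {true_mean:.3f}")
    print(f"unweighted estimate:   {unweighted:.3f}")
    print(f"weighted estimate:     {weighted:.3f}")
```

In this made-up setup, people in large households are less likely to be picked, so the unweighted mean comes out too low, while the weighted mean should land close to the population value. Real survey weights usually also include adjustments for non-response and frame coverage, which this sketch ignores.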