Chapter 4: Sample Surveys and Experiments

Planning studies through designed surveys and experiments is one of the four major themes of AP Statistics. The overall goal of Chapter 4 is for you to learn good data collection strategies: how to design a survey by randomly selecting participants, and how to design a sound experiment by randomly assigning treatments to subjects. Random selection allows generalization from a sample to a population; random assignment allows you to establish cause and effect.

4.1 Why Take Samples, and How Not To

The conclusions you get from exploratory data analysis (Chapters 2-3) apply only to the specific data you are examining. Often you want to answer questions about a large group of individuals. To get sound answers you must produce data in a way that is designed to answer your questions.

Basic Terminology

Population – the entire group of individuals that you want information about.

Population size – the number of units in the population.

Sample – the part of the population you actually examine in order to gather information.

Sampling Frame – the list of units in the population from which you select the sample.

Census – a special case of sampling, where information is collected from every unit in the entire population.

Parameter – a numeric quantity (usually unknown) used to describe a characteristic of the population. Usually represented by Greek letters.

Statistic – a numeric quantity computed from a sample of data, used to estimate a parameter. Usually represented by Latin letters.

Hint: population goes with parameter, and sample goes with statistic (p with p, s with s).

Bias: A Potential Problem with Survey Data

When talking about sampling, the terms “nonrepresentative” and “biased” have different meanings. A sampling method is biased if the statistics it produces are, on average, larger or smaller than the population parameter being estimated. Nonrepresentative refers to the sample itself: a nonrepresentative sample does not look like a small version of the population.
A biased sampling method tends to result in nonrepresentative samples, but not always.

Fathom Activity 4.1a: Time in the Hospital

There are two ways in which you can get biased results:

1. Sample Selection Bias (Sample Bias)

Sample selection bias arises from the method used to select the sample. In all types of sample selection bias, the method of taking the sample tends to result in statistics that are systematically too high or too low on average.

Size Bias: using a method that gives larger units a bigger chance of being in the sample. Our activity “Time in the Hospital” is an example of this.

Voluntary Response Bias: occurs when sample members are self-selected volunteers. An example would be call-in radio shows that solicit audience participation in surveys on controversial topics (abortion, affirmative action, gun control, etc.). The resulting sample tends to overrepresent individuals who have strong opinions.

Convenience Sampling: a sample in which the units chosen are those that are easy to include. Example: interviewing people at a shopping mall as they walk by. This is easy because you simply choose the people who pass you as you stand there, without any random mechanism. Some passersby may ignore you, so who ends up in the sample can also depend on what you are surveying.

Judgment Sampling: the sample is selected based on judgment. This is usually an extension of convenience sampling. For example, a researcher may decide to draw the entire sample from one "representative" city, even though the population includes all cities. When using this method, the researcher must be confident that the chosen sample is truly representative of the entire population.

Inadequate Sampling Frame: the sampling frame (list of the population) is not a complete list of the population you want to sample from.

2. Response Bias

Even well-chosen samples can result in contaminated data.
Response bias arises from the method of collecting the response rather than from the method of selecting the sample itself. The main sources of this bias are:

Nonresponse bias: arises when people selected for the sample do not respond to the survey, since these people may differ from those who do respond. For example, a sample survey on attitudes toward parenting may miss all working mothers selected for the sample if people are contacted during working hours.

Questionnaire bias: arises from how you ask the questions (tone of voice, appearance of interviewer, order of questions, etc.) and is often the result of leading questions, such as “Do you favor the new tax proposal?” rather than the more balanced “Do you favor or oppose the new tax proposal?” (People often say “yes” rather than offer a true opinion.)

Incorrect response: results from intentional lying or inaccurate measuring devices. It often occurs in situations that ask for recall, such as “How many times did you eat vegetables this week?” (People often guess on the more acceptable high side.)
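
The idea that a biased method gives statistics that are too high or too low "on average" can be checked with a small simulation. The sketch below is only an illustration of nonresponse bias under made-up assumptions: the population, the group labels ("works_days", "home_days"), the score distributions, and the function names are all hypothetical and are not taken from the hospital activity or from any real survey. It draws many random samples, once with everyone responding and once with one group never reached, and compares the average of each statistic with the parameter.

```python
import random
import statistics

random.seed(1)  # fixed seed so repeated runs give the same numbers

# Hypothetical population: (group, attitude score) pairs.
# Group sizes and score distributions are invented for illustration.
population = (
    [("works_days", random.gauss(60, 10)) for _ in range(4000)]
    + [("home_days", random.gauss(40, 10)) for _ in range(6000)]
)

# Parameter: the true population mean. We know it here only because
# we built the population ourselves; in practice it is unknown.
parameter = statistics.mean(score for _, score in population)


def srs_mean(n):
    """Statistic from a simple random sample in which everyone responds."""
    sample = random.sample(population, n)
    return statistics.mean(score for _, score in sample)


def nonresponse_mean(n):
    """Same random selection, but people who work days are never reached."""
    sample = random.sample(population, n)
    responders = [score for group, score in sample if group == "home_days"]
    # With n = 100 an empty responder list is practically impossible here.
    return statistics.mean(responders)


def average_statistic(method, n=100, repetitions=2000):
    """Average of the statistic over many repeated samples."""
    return statistics.mean(method(n) for _ in range(repetitions))


avg_srs = average_statistic(srs_mean)
avg_nonresponse = average_statistic(nonresponse_mean)

print(f"parameter (population mean):       {parameter:.1f}")
print(f"average statistic, full response:  {avg_srs:.1f}")
print(f"average statistic, nonresponse:    {avg_nonresponse:.1f}")
```

In this setup the full-response average should land close to the parameter, while the nonresponse average should sit well below it, because the group that is never reached was given higher scores. Individual samples from either method can still be off in either direction; bias is a statement about the long-run average, not about any one sample.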