CHAPTER 8 – PRODUCING DATA: SAMPLING

TOPICS COVERED (sections are numbered as in the e-book)
Any topic listed in this document and not covered in class must be studied “On Your Own” (OYO).

Section 8.1 – POPULATION VERSUS SAMPLE (pg. 202)
Population
Sample
Inference: we look at samples to discover characteristics of populations
Sampling design
o We need a representative sample, so planning a sample survey is very important!
   - What population do we want to describe?
   - What do we want to measure? (variables)
   - Design the method you are going to use for sampling

Section 8.2 – HOW TO SAMPLE BADLY (pg. 204)
Biased samples. Bias systematically favors certain outcomes: results are generally higher or generally lower than the corresponding population value. A biased sample does not measure what we want to measure, so it does not make sense to use its results to perform inference.
Examples of bad samples. All of those listed below are non-probability samples:
o Convenience sample
o Voluntary response
o Mall-intercept (over-represents the middle class, retired people, and teenagers; under-represents the poor, the homeless, people in hospitals, etc.)
o Quotas: interviewers select individuals to fill quotas; for example, 50% female and 50% male, or 25% each of white, Hispanic, African American, and other
Non-probability samples: the probability of selecting each individual cannot be determined. The interviewer uses his or her own preference in selecting individuals.

Section 8.3 – SIMPLE RANDOM SAMPLES (pg. 205)
A simple random sample is a probability sample. Ways to select one:
o Use the applet on page 206 of the e-book
o Use Table B to select a simple random sample
o Use the calculator (MATH; arrow right to PRB; select 5:randInt)
Some bias may occur in probability samples, but it is less than in non-probability samples.

Section 8.4 – INFERENCE ABOUT THE POPULATION (pg. 209)
Inference
Margin of error

Section 8.5 – OTHER SAMPLING DESIGNS (pg. 210) – Listen to STATS PORTAL
Simple random sample (the population consists of individuals that are not grouped)
Stratified sample (the individuals of the population are grouped according to some characteristic, and we select randomly from each group)
Multistage sample (the individuals are in groups; we select some groups at random, and then select individuals at random from each selected group)

Section 8.6 – CAUTIONS ABOUT SAMPLE SURVEYS (pg. 212)
Sources of bias
o Under-coverage
   - The homeless, people who are always busy, people who are in hospitals or motels
   - In a land-line phone survey, people who have only cell phones
o Non-response
   - Big cities and minorities have a large non-response rate
   - 1990 Census: 1.15% non-response; 2000 Census: 1.8% non-response
o Response bias
   - Respondent bias: "Did you cheat in exams?" "Did you vote?" Some people lie when asked about illegal or unpopular behavior.
   - We may answer the way we think the interviewer wants us to answer instead of according to our beliefs. Example: a woman interviewing a man about his attitudes toward domestic violence or feminism-related topics.
   - Questions about the past (respondents may not remember clearly)
   - The race and gender of the interviewer may influence the way we answer certain questions
   - Interviewer bias: the interviewer influences the response in a systematic way (on purpose, subconsciously, or through ignorance), for example by giving subtle clues with body language or tone of voice. There may be prejudice on the part of the interviewer. It is very important to train interviewers well.
o Question wording
   - Confusing, ambiguous, or loaded questions (for example, use of double negatives)
o Order of questions (see Example 8.9 in the book)
Training and close supervision of interviewers, to avoid variation among them, are critical for preventing bias from the way questions are worded and asked.

Section 8.7 – THE IMPACT OF TECHNOLOGY (pg. 214)
Problems with random digit dialing (RDD) surveys
o The number of cell-phone-only households is increasing
o Caller ID screening

SUMMARY
A sample survey selects a sample from the population of all individuals about which we desire information. We base conclusions about the population on data from the sample. It is important to specify exactly what population you are interested in and what variables you will measure.
The design of a sample describes the method used to select the sample from the population. Random sampling designs use chance to select a sample.
The basic random sampling design is a simple random sample (SRS). An SRS gives every possible sample of a given size the same chance to be chosen. Choose an SRS by labeling the members of the population and using random digits to select the sample. Software can automate this process.
To choose a stratified random sample, classify the population into strata, groups of individuals that are similar in some way that is important to the response. Then choose a separate SRS from each stratum.
Failure to use random sampling often results in bias, or systematic errors in the way the sample represents the population. Voluntary response samples, in which the respondents choose themselves, are particularly prone to large bias.
In human populations, even random samples can suffer from bias due to undercoverage or nonresponse, from response bias, or from misleading results due to poorly worded questions. Sample surveys must deal expertly with these potential problems in addition to using a random sampling design.
Most national sample surveys are carried out by telephone, using random digit dialing to choose residential telephone numbers at random. Call screening is increasing nonresponse to such surveys, and the rise of cell-phone-only households is increasing undercoverage.
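
The two selection procedures described in the summary (label the members of the population and draw an SRS; draw a separate SRS from each stratum) can be sketched in Python. This is only a minimal illustration, not the e-book's applet or Table B. The function names `simple_random_sample` and `stratified_random_sample`, the population of 30 labeled individuals, and the two strata are hypothetical and chosen just for the example.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Choose an SRS of size n: every possible sample of size n is equally likely."""
    rng = random.Random(seed)  # a seed only makes the illustration repeatable
    # sample() draws without replacement, so no individual is chosen twice.
    return rng.sample(list(population), n)


def stratified_random_sample(strata, sizes, seed=None):
    """Choose a separate SRS from each stratum.

    strata: dict mapping stratum name -> list of individuals in that stratum
    sizes:  dict mapping stratum name -> sample size wanted from that stratum
    """
    rng = random.Random(seed)
    return {name: rng.sample(list(members), sizes[name])
            for name, members in strata.items()}


# Hypothetical population: 30 individuals labeled 01..30,
# the same kind of labels used with a table of random digits.
labels = [f"{i:02d}" for i in range(1, 31)]
srs = simple_random_sample(labels, 5, seed=1)

# Hypothetical strata: the first 20 labels form one group, the last 10 another.
strata = {"group_a": labels[:20], "group_b": labels[20:]}
stratified = stratified_random_sample(strata, {"group_a": 4, "group_b": 2}, seed=1)

print("SRS:", srs)
print("Stratified sample:", stratified)
```

Because `sample` draws without replacement, a label can never appear twice in the same sample, which matches how repeated labels are skipped when reading a table of random digits. Removing the `seed` argument gives a different sample on each run.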