Ch 4 - Designing Studies

I can identify the population and sample in a survey.

Population - the entire group of individuals about which we want information.
Sample - the part of the population from which we actually collect information.

Sample Survey
1st - determine what population we want to describe
2nd - determine exactly what we want to measure (define our variables)

The student government at a high school surveys 100 of the students at the school to get their opinions about a change to the bell schedule. What’s the population? Sample? What was being studied?

An archaeological dig turns up large numbers of pottery shards, broken stone tools, and other artifacts. Students working on the project classify each artifact and assign it a number. The counts in different categories are important for understanding the site, so the project director chooses 2% of the artifacts at random and checks the students’ work. Identify the population and sample.

I can understand two types of bias in sampling.

Bias - when the design of a study systematically favors certain outcomes
Convenience Sample - choosing individuals who are easiest to reach
Voluntary Response - when the sample chooses itself by responding to a general appeal

Why does each lead to bias?

Convenience Bias
• unrepresentative of the entire population because the answers will be influenced by where you are
• e.g. if you are surveying how people feel about the library tax and only ask people who are at the library

Voluntary Response Bias
• only people with strong opinions (in either direction) will respond
• people can respond more than once
• e.g. call-ins, write-ins, internet voting

When you identify the bias - also state IN WHICH DIRECTION!

You are on the staff of a member of Congress who is considering a bill that would provide government-sponsored insurance for nursing-home care. You report that 1128 letters have been received on the issue, of which 871 oppose the legislation.
“I’m surprised that most of my constituents oppose the bill. I thought it would be quite popular,” says the congresswoman. Are you convinced that a majority of the voters oppose the bill? How would you explain the statistical issue to the congresswoman?

In June 2008, Parade magazine posed the following question: “Should drivers be banned from using all cell phones?” Readers were encouraged to vote online at parade.com. The July 13, 2008 issue reported 2407 (85%) said “Yes” and 410 (15%) said “No.”
a) What type of sample did the Parade survey obtain?
b) Explain why this is biased. Is 85% too high or too low? Why?

HW: Day 1 on the outline

So, what’s a good method?

SRS - simple random sample - n individuals chosen from a population in such a way that every set of n individuals has an equal chance to be the sample actually selected.

Random Rectangle Activity

HW: Day 2 on the outline

How can I select an SRS of 4 students from this class?
Ideas:
• put all names in a hat, on equally sized slips of paper, and select 4 of them
• assign everyone a number and use a Random Digit Table (Table D) to select the four people

How to use Table D
1. assign every individual in the population a numerical label
2. every label must use the same number of digits (set by the size of the population)
3. start with 0 (or 00 or 000...)
4. decide what to do if you get a repeated label or a label not in the range you need
5. pick a line to start at and read consecutive groups of digits to select your sample

Day 3

Other Sampling Methods

Stratified Random Sample: when the population is first grouped based on a similarity.
Cluster Sample: when the population is divided into smaller groups that mirror the characteristics of the population.
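The three sampling methods named so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 12-student roster, the two-digit labels, the grade levels, the four mixed groups, and the function names are all hypothetical, and `random.sample` stands in for drawing names from a hat or reading Table D.

```python
import random

# Hypothetical roster: 12 students labeled 00-11, four in each grade level.
roster = [
    {"label": f"{i:02d}", "grade": grade}
    for i, grade in enumerate(["9th"] * 4 + ["10th"] * 4 + ["11th"] * 4)
]

def simple_random_sample(population, n, rng):
    """SRS: every set of n individuals is equally likely to be the sample."""
    return rng.sample(population, n)

def stratified_random_sample(population, key, n_per_stratum, rng):
    """Classify into strata by `key`, take a separate SRS in each, combine."""
    strata = {}
    for person in population:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Take an SRS of whole clusters; everyone in a chosen cluster is included."""
    chosen = rng.sample(clusters, n_clusters)
    return [person for cluster in chosen for person in cluster]

rng = random.Random(4)  # fixed seed (an arbitrary choice) so reruns match

# SRS of 4 students from the whole roster.
srs = simple_random_sample(roster, 4, rng)

# Stratified: strata are grade levels (similar individuals), 2 from each.
stratified = stratified_random_sample(roster, "grade", 2, rng)

# Cluster: four hypothetical mixed groups, each holding one student per grade,
# so each group mirrors the population; 2 whole groups are sampled.
clusters = [roster[j::4] for j in range(4)]
clustered = cluster_sample(clusters, 2, rng)

print("SRS:       ", [p["label"] for p in srs])
print("Stratified:", [(p["label"], p["grade"]) for p in stratified])
print("Cluster:   ", [(p["label"], p["grade"]) for p in clustered])
```

Note that the SRS can, by chance, pick all four students from one grade, while the stratified sample cannot. The cluster sample keeps whole groups together, which is why each group needs to look like the population.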
Other Sampling Methods

Stratified Random Sample
• first, classify the population into similar groups (strata)
• next, choose a separate SRS from each stratum
• combine all SRSs to form the full sample
• groups are homogeneous - like the math class you’re in

Cluster Sample
• first, divide the population into smaller groups (each mirrors the population)
• next, choose an SRS of the clusters
• all individuals in each chosen cluster are included in the sample
• groups are heterogeneous - like CLC groups

A manager of a beach-front hotel wants to survey guests in the hotel to estimate overall customer satisfaction. The hotel has two towers, an older one to the south and a newer one to the north. Each tower has 10 floors of standard rooms (40 rooms per floor) and 2 floors of suites (20 suites per floor). Half of the rooms in each tower face the beach, while the other half of the rooms face the street. There are a total of 880 rooms.
a) Explain how to select a simple random sample of 88 rooms.
b) Explain how to select a stratified random sample of rooms.
c) Explain how to select a cluster sample of rooms.
d) Explain why selecting 2 of the 24 different floors would not be a good way to obtain a cluster sample.

Advantages & Disadvantages

SRS
• Advantages: simple to carry out; each individual in the population has the same chance to be selected
• Disadvantage: chance of over- or under-representing groups in the sample

Stratified
• Advantages: no chance of over- or under-representing groups; each individual in the population still has the same chance of being selected
• Disadvantage: a little more complicated to execute

Clustering
• Advantages: can be convenient when groups are already “created”; each individual in the population still has the same chance of being selected
• Disadvantage: chance of over- or under-representing groups in the sample

A Sample Free Response

In response to nutrition concerns raised last year about food served in school cafeterias, the Smallville School District entered into a one-year contract with the Healthy Alternative Meals (HAM) company.
Under this contract, the company plans and prepares meals for 2,500 elementary, middle, and high school students, with a focus on good nutrition. The school administration would like to survey the students in the district to estimate the proportion of students who are satisfied with the food under this contract. Two sampling plans for selecting the students to be surveyed are under consideration by the administration. One plan is to take a simple random sample of students in the district and then survey those students. The other plan is to take a stratified random sample of students in the district and then survey those students.
(a) Describe a simple random sampling procedure that the administrators could use to select 200 students from the 2,500 students in the district.
(b) If a stratified random sampling procedure is used, give one example of an effective variable on which to stratify in this survey. Explain your reasoning.
(c) Describe one statistical advantage of using a stratified random sample over a simple random sample in the context of this study.

Answers to part A
Answers to part B
Answers to part C

What type of sampling is this?

At a party there are 30 students over age 21 and 20 students under age 21. You choose at random 3 of those over 21 and separately at random 2 of those under 21 to interview about attitudes towards alcohol. You have given every student at the party the same chance to be interviewed. What is that chance? What type of sampling procedure was this?
HINT: an SRS allows a sample to contain all of a certain “group” or none of a “group”.

On the west side of Rocky Mountain National Park, many mature pine trees are dying due to infestation by pine beetles. Scientists would like to use sampling to estimate the proportion of all pine trees in the area that have been infested. Why would an SRS not be practical? Could they just sample the pines along the road?
Suppose the sampling was carried out randomly and accurately and 35% of the pine trees sampled were infested. Can they conclude 35% of all pine trees are infested?

Day 4

Inference and what can go wrong?

Why do we sample?
• to infer about a population
• surveying an entire population takes too much time and money!

Can we trust it?
• YES - the laws of probability allow random sampling to work!
• there are margins of error to account for the variability between the sample and the population. Nothing was wrong with the procedure!

What can go wrong?

There are different types of bias that cause sampling to go wrong:

Sampling Errors
• Voluntary Response
• Convenience Sample
• Undercoverage
(we already know about voluntary and convenience)

Nonsampling Errors
• Nonresponse
• Response Bias
• Wording of Question

Undercoverage
*when some groups in the population are left out of the process when choosing the sample*
Example: if you were to go to people’s houses and survey about the unemployment rate, you would be leaving out all the homeless people and those who have jobs and are not home.

Nonresponse
*when an individual chosen for the sample can’t be contacted or refuses to participate*
WARNING: this is not voluntary response bias; these individuals were chosen to be in the sample and do not want to be.
Example: when you make a call at SIRS and they hang up on you or rudely tell you why they don’t want to participate!

Response Bias
*when the individual gives the wrong answer*
Many factors contribute to this:
• people know what the answer should be and give that
• what the interviewer looks like
• recalling past events

Wording of Questions
*confusing/leading questions that lead to a certain response*
MOST IMPORTANT INFLUENCE ON ANSWERS!
Never trust a survey unless you have seen the questions!
Examples: order of the questions, any prompts/cues given before the question

Ch 4 Project

by yourself or with a partner

You will design and conduct an experiment to investigate the effects of response bias in surveys. You can choose the topic, but you must design your experiment to answer one of the following questions:
1. Can the wording of a question create response bias?
2. Do the characteristics of the interviewer create response bias?
3. Does anonymity change the responses to sensitive questions?
4. Does manipulating the answer choices change the response?

see page 267 for what is required

I will hand out a rubric - USE IT!

Not only are you going to analyze the survey results, you will also analyze whether the way the survey was conducted biased the results.

due: October 31, 2013 (approved by Friday, October 24)

start on HW: day 4