Chapter 12 Notes
Chapter 11 Part I Answers
Chapter 11 Part II Answers

Chapter 12: Sampling Surveys

How do we gather data?
• Surveys
• Opinion polls
• Interviews
• Studies
  – Observational
    • Retrospective (past)
    • Prospective (future)
  – Experiments

1. Population
• the entire group of individuals that we want information about

2. Census
• gathering data from the entire population

3. Why would we not use a census all the time?
1) Not accurate: look at the U.S. census, which has a huge amount of error. Plus, it takes a long time to compile the data, making the data obsolete by the time we get it!
2) Very expensive: since taking a census of any population takes time, censuses are VERY costly to do!
3) Perhaps impossible: suppose you wanted to know the average weight of the white-tail deer population in Texas. Would it be feasible to do a census?
4) Destructive sampling: if measuring an individual destroys it, a census would destroy the entire population. Examples:
• breaking strength of soda bottles
• lifetime of flashlight batteries
• safety ratings for cars

4. Sample
• a part of the population that we actually examine in order to gather information
• use the sample to generalize to the population

Why Do We Sample Anyway?
HOW GOOD ARE CURRENT INSPECTION SYSTEMS?
Printed below is a story that can be used to demonstrate the effectiveness of 100% inspection. Assume that the letter “G” or “g” is a defective product caused by the Gremlin, and that you are the inspector. Allow yourself about 3 minutes to count all the G’s and g’s. Place your total in the box at the bottom of the story.
3 minutes: Total number found: ____
2 minutes: Total number found: ____
1 minute: Total number found: ____
15 seconds: Total number found: ____
TIME! Total number found: ____
Why do we sample? Take a census of all the “G”s and “g”s that appear in the story. The actual count is… 83.

5. Sampling design
• the method used to choose the sample from the population

6. Sampling frame
• a list of every individual in the population

7. Simple Random Sample (SRS)
• consists of n individuals from the population chosen in such a way that
  – every individual has an equal chance of being selected, and
  – every set of n individuals has an equal chance of being selected
Example: Suppose we were to take an SRS of 50 SHS students. Put each student’s name in a hat, then randomly select 50 names from the hat. Each student has the same chance to be selected! Not only that, but every possible group of 50 students has the same chance to be selected. Therefore, it has to be possible for all 50 students to be seniors in order for it to be an SRS!

Jelly Blubber Activity
• Marine biologists have just discovered a new variety of jellyfish called the Jelly Blubber.
• We are to study a colony of Jelly Blubbers and determine their average length (measured horizontally in centimeters).
• First, you will have 5 seconds to choose a Jelly Blubber that you think is of average length, then measure its length and report your results. Why is this not an appropriate sampling method?
• Next, choose 5 Jelly Blubbers that you think are a representative sample of the colony. Measure each one and calculate the mean length. Why is this not an appropriate sampling method?
• Finally, take a Simple Random Sample (SRS) of 5 Jelly Blubbers by generating 5 random numbers from 1 to 100. Measure each of the 5 randomly chosen Jelly Blubbers and find the mean length for your SRS. Why is this sampling method better than just selecting 5 on your own?

What are the results of a census of the Jelly Blubber colony? The average size of all 100 members of the colony is 18.64 cm.
Mean 18.64 cm; standard deviation 13.08 cm; median 13 cm; IQR = 23.5 cm

8. Stratified random sample
• the population is divided into homogeneous groups called strata
• an SRS is pulled from each stratum
Homogeneous groups are groups that are alike based upon some characteristic of the group members.
Example: Suppose we were to take a stratified random sample of 50 SHS students. Since students are already divided by grade level, grade level can be our strata. Then randomly select some seniors, juniors, sophomores, and freshmen. How many from each grade depends on that grade's proportion of the population.
If a high school is 20% seniors, 20% juniors, 30% sophomores, and 30% freshmen, then a 50-student sample should include 0.20 × 50 = 10 seniors, 10 juniors, 0.30 × 50 = 15 sophomores, and 15 freshmen. (Use an SRS within each stratum.)
ON YOUR OWN: If a high school is 10% seniors, 20% juniors, 40% sophomores, and 30% freshmen, then a 30-student sample should include...
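The SRS and stratified designs above can be sketched in Python. This is a minimal illustration, not part of the original activity: the function names (`simple_random_sample`, `proportional_allocation`, `stratified_sample`) and the school roster are hypothetical, and the allocation step assumes each stratum's share of n is a whole number, as in the 20/20/30/30 example.

```python
import random


def simple_random_sample(population, n, rng=random):
    """Draw an SRS of size n without replacement.

    random.sample gives every individual, and every possible group of n
    individuals, the same chance of being chosen.
    """
    return rng.sample(list(population), n)


def proportional_allocation(stratum_sizes, n):
    """Number to draw from each stratum, proportional to its share of the population.

    Assumes each share of n is a whole number (true for 20/20/30/30 with n = 50);
    other cases need a rounding rule, which this sketch leaves out.
    """
    total = sum(stratum_sizes.values())
    return {name: n * size // total for name, size in stratum_sizes.items()}


def stratified_sample(strata, n, rng=random):
    """Stratified random sample: a separate SRS inside each stratum.

    strata maps a stratum name (e.g. a grade level) to the list of its members.
    """
    sizes = {name: len(members) for name, members in strata.items()}
    counts = proportional_allocation(sizes, n)
    return {name: simple_random_sample(strata[name], counts[name], rng)
            for name in strata}


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so the run can be repeated

    # Jelly Blubbers: 5 different random labels from 1 to 100
    print(sorted(simple_random_sample(range(1, 101), 5, rng)))

    # Hypothetical school of 2000 students, 20% / 20% / 30% / 30% by grade
    school = {
        "seniors": [f"senior{i}" for i in range(400)],
        "juniors": [f"junior{i}" for i in range(400)],
        "sophomores": [f"sophomore{i}" for i in range(600)],
        "freshmen": [f"freshman{i}" for i in range(600)],
    }
    sample = stratified_sample(school, 50, rng)
    print({grade: len(names) for grade, names in sample.items()})
```

The second printed line is {'seniors': 10, 'juniors': 10, 'sophomores': 15, 'freshmen': 15}, the same allocation as the hand calculation. Because `random.sample` draws without replacement, no Jelly Blubber can be measured twice in one SRS.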
9. Systematic random sample
• select the sample by following a systematic approach
• randomly select where to begin
Example: Suppose we want to do a systematic random sample of SHS students.
- NUMBER a list of students (the sampling frame).
- CALCULATE the grouping size: there are approximately 2000 students and we need a sample of 50, so 2000/50 = 40.
- USE the grouping size: select a number between 1 and 40 at random. That student is the first student chosen; then choose every 40th student from there.
What if it doesn't divide evenly? Say there are 2011 students: 2011/50 = 40 remainder 11. Choose your starting place by randomly selecting a number between 1 and 51 (40 + 11) instead of between 1 and 40. Even a start of 51 leaves room for the remaining 49 picks, since 51 + 49 × 40 = 2011. From there, choose every 40th student from your sampling frame.
ON YOUR OWN: You want to gather a sample from 1505 students systematically. Your sample size needs to be 30. What do you do?

10. Cluster sample
• based upon heterogeneous groups (clusters), each of which is representative of the population
• randomly pick a cluster or clusters
• survey every individual in the chosen cluster(s)
Example: Suppose we want to do a cluster sample of SHS students. One way to do this would be to randomly select classrooms during 2nd period and survey every student in those rooms. (Taking an SRS of the students within the chosen rooms instead would make it a multistage sample.)
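A minimal Python sketch of the systematic and cluster designs, assuming the sampling frame is an ordered list. The function names and the classroom data are hypothetical. The start rule "1 to N - k(n - 1)" reproduces both slide cases (1 to 40 for 2000 students, 1 to 51 for 2011). The cluster function keeps everyone in each chosen cluster, the usual textbook definition; drawing an SRS within the chosen clusters would make it a multistage design.

```python
import random


def systematic_sample(frame, n, rng=random):
    """Systematic random sample of size n from an ordered sampling frame.

    k = len(frame) // n is the grouping size. The random start is a position
    from 1 to N - k*(n - 1), the largest start that still leaves room for n
    picks spaced k apart:
      N = 2000, n = 50 -> k = 40, start chosen from 1..40
      N = 2011, n = 50 -> k = 40, start chosen from 1..51
    """
    N = len(frame)
    k = N // n
    last_start = N - k * (n - 1)
    start = rng.randint(1, last_start)  # 1-based position of the first pick
    return [frame[start - 1 + i * k] for i in range(n)]


def cluster_sample(clusters, how_many, rng=random):
    """Cluster sample: choose whole clusters at random and keep everyone in them.

    clusters maps a cluster name (e.g. a 2nd-period classroom) to its members.
    """
    chosen = rng.sample(sorted(clusters), how_many)
    return {name: list(clusters[name]) for name in chosen}


if __name__ == "__main__":
    rng = random.Random(7)
    frame = list(range(1, 2012))  # students numbered 1..2011
    picks = systematic_sample(frame, 50, rng)
    print(picks[0], picks[1] - picks[0], len(picks))  # start, spacing, count

    rooms = {"room101": ["Ana", "Ben"], "room102": ["Cy", "Dee"], "room103": ["Eli"]}
    print(cluster_sample(rooms, 2, rng))
```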
11. Multistage sample
• select successively smaller groups within the population in stages
• an SRS is used at each stage
Example: To use a multistage approach to sampling SHS students, we could first divide 2nd period classes by level (AP, Honors, Regular, etc.) and randomly select 4 second-period classes from each level. Then we could randomly select 5 students from each of those classes. The selection process is done in stages!

12. Identify the sampling design
a) The Educational Testing Service (ETS) needed a sample of colleges. ETS first divided all colleges into groups of similar types (small public, small private, etc.). Then they randomly selected 3 colleges from each group.
Answer: stratified random sample
b) A county commissioner wants to survey people in her district to determine their opinions on a particular law up for adoption. She decides to randomly select blocks in her district and then survey all who live on those blocks.
Answer: cluster sampling
c) A local restaurant manager wants to survey customers about the service they receive. Each night the manager randomly chooses a number between 1 and 10. He then gives a survey to that customer, and to every 10th customer after them, to fill out before they leave.
Answer: systematic random sampling

13. Suppose your population consisted of these 20 people:
1) Aidan  2) Bob  3) Chico  4) Doug  5) Edward
6) Fred  7) Gloria  8) Hannah  9) Israel  10) Jung
11) Kathy  12) Lori  13) Matthew  14) Nan  15) Opus
16) Paul  17) Shawnie  18) Tracy  19) Uncle Sam  20) Vernon
Use the random digits in Row 1 of the random digit table to select a sample of five of these people. We will need to use double-digit random numbers (01 to 20), ignoring 00, any number greater than 20, and any number that repeats. Start with Row 1 and read across. Stop when five people are selected.
So my sample would consist of: Aidan, Edward, Matthew, Opus, and Tracy.
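The table-reading rule in #13 can be checked with a short Python function. This is a sketch with a hypothetical name, `select_from_digits`: it treats a string of digits as a row of the random digit table, reads two digits at a time, and skips 00, numbers above 20, and repeats. The digit string in the usage lines is made up for illustration.

```python
PEOPLE = ["Aidan", "Bob", "Chico", "Doug", "Edward", "Fred", "Gloria",
          "Hannah", "Israel", "Jung", "Kathy", "Lori", "Matthew", "Nan",
          "Opus", "Paul", "Shawnie", "Tracy", "Uncle Sam", "Vernon"]


def select_from_digits(digits, labels, n):
    """Pick n labels by reading random digits two at a time.

    Labels are numbered 01..len(labels). Pairs that are 00, larger than
    len(labels), or already used are ignored. Returns fewer than n labels
    if the digits run out first.
    """
    clean = [ch for ch in digits if ch.isdigit()]  # drop spaces between groups
    used = []
    for i in range(0, len(clean) - 1, 2):
        number = int(clean[i] + clean[i + 1])
        if 1 <= number <= len(labels) and number not in used:
            used.append(number)
            if len(used) == n:
                break
    return [labels[number - 1] for number in used]


if __name__ == "__main__":
    made_up_digits = "73051 94205 11208 803"  # hypothetical, for illustration only
    print(select_from_digits(made_up_digits, PEOPLE, 5))
    # ['Edward', 'Uncle Sam', 'Kathy', 'Vernon', 'Chico']
```

Reading the made-up digits gives the pairs 73, 05, 19, 42, 05, 11, 20, 88, 03: three are ignored as too large, the second 05 is a repeat, and the other five name the sample.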
14. Bias
• ERROR: anything that causes the data to be wrong! It might be attributed to the researchers, the respondents, or the sampling method.
• favors certain outcomes

15. Sources of bias
• things that can cause bias in your sample
• you cannot do anything with bad data

16. Voluntary response
• people choose to respond
• usually only people with very strong opinions respond
• produces biased results (self-selection)
Examples: surveys in magazines that ask readers to mail in the survey; call-in shows; American Idol voting.
Remember, the way to identify voluntary response is that the respondents select themselves to participate in the survey!

17. Convenience sampling
• ask people who are easy to ask
• produces biased results
Examples: stopping friendly-looking people in the mall to survey them; surveys left on tables at restaurants. It is a convenient method, and the data obtained by a convenience sample will be biased. However, this method is often used for surveys and results reported in newspapers and magazines!

18. Undercoverage
• some groups of the population are left out of the sampling process
Example: Suppose you take a sample by randomly selecting names from the phone book. Some groups will not have the opportunity of being selected!
• people with unlisted phone numbers (usually high-income families)
• people without phone numbers (usually low-income families)
• people with ONLY cell phones (usually young adults)

19. Nonresponse
• occurs when an individual chosen for the sample can't be contacted or refuses to cooperate
• telephone surveys: 70% nonresponse
People are chosen by the researchers BUT refuse to participate; they are NOT self-selected! This is often confused with voluntary response.
Because of huge telemarketing efforts in the past few years, telephone surveys have a MAJOR problem with nonresponse!
One way to help the problem of nonresponse is to make follow-up contact with the people who are not home when you first contact them.

20. Response bias
• occurs when anything in the survey design influences the response
  – the interviewer can be the cause
  – the survey's wording can be the cause; wording must be neutral
Example: Suppose we wanted to survey high school students on drug abuse and we used a uniformed police officer to interview each student in our sample. Would we get honest answers?

21. Source of bias?
a) Before the presidential election of 1936, in which FDR ran against Republican Alf Landon, the magazine Literary Digest predicted that Landon would win the election in a 3-to-2 victory, based on a survey of 10 million people. George Gallup surveyed only 50,000 people and predicted that Roosevelt would win. The Digest's survey came from magazine subscribers, car owners, telephone directories, etc.
Answer: undercoverage. Since the Digest's survey came from car owners, magazine subscribers, etc., the people selected were mostly from high-income families and thus mostly Republican! (Other answers are possible.)
b) Suppose that you want to estimate the total amount of money spent by students on textbooks each semester at Rice. You collect register receipts for students as they leave the bookstore during lunch one day.
Answer: convenience sampling (an easy way to collect data) or undercoverage (students who buy books from online bookstores are not included).
c) To find the average value of a home in Friendswood, one averages the prices of homes that are listed for sale with a realtor.
Answer: undercoverage. This leaves out homes that are not for sale and homes that are listed with different realtors. (Other answers are possible.)
22. Page 289 #2
A question posted on the Lycos Web site on 18 June 2000 asked visitors to the site to say whether they thought that marijuana should be legally available for medicinal purposes. Identify the following items (if possible). If you can't tell, then say so; this often happens when we read about a survey.
a) The population: all U.S. adults
b) The population parameter of interest: the proportion who feel marijuana should be legalized for medicinal purposes
c) The sampling frame: none given; potentially all people with access to the Web site
d) The sample: those visiting the Web site who responded
e) The sampling method, including whether or not randomization was employed: voluntary response (no randomization employed)
f) Any potential sources of bias you can detect and any problems you see in generalizing to the population of interest: voluntary response bias. Only visitors who chose to answer are included, so people with strong opinions are likely overrepresented, and no randomization was employed. Visitors to the Lycos site may also not represent all U.S. adults.

Random Rectangles
Population parameter: μ = 7.5

Chapter 11 Part I
5 & 6. Explain why each of the following simulations fails to model the real situation properly.
a) Use a random integer 0 through 9 to represent the number of heads that appear when 9 coins are tossed.
b) A basketball player takes a foul shot. Look at a random digit, using an odd digit to represent a good shot and an even digit to represent a miss.
c) Use five random digits from 1 through 13 to represent the denominations of the cards in a poker hand.
d) Use random numbers 2 through 12 to represent the sum of the faces when two dice are rolled.
e) Use a random integer 0 through 5 to represent the number of boys in a family of 5 children.
f) Simulate a baseball player's performance at bat by letting 0 = an out, 1 = a single, 2 = a double, 3 = a triple, and 4 = a home run.
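Two of the flaws above, (a) and (d), show up clearly when the flawed model is run next to one that simulates each coin or die separately. This Python sketch uses hypothetical function names and fixed seeds. For two dice the true chances are 1/6 for a sum of 7 and 1/36 for a sum of 2, while the flawed model gives every sum 1/11; for 9 coins the true chance of 0 heads is 1/512, not 1/10.

```python
import random
from collections import Counter


def heads_in_nine_tosses(rng):
    """Correct model for (a): toss each of the 9 coins separately."""
    return sum(rng.random() < 0.5 for _ in range(9))


def flawed_heads(rng):
    """Flawed model from (a): every count 0-9 equally likely."""
    return rng.randint(0, 9)


def sum_of_two_dice(rng):
    """Correct model for (d): roll each die, then add the faces."""
    return rng.randint(1, 6) + rng.randint(1, 6)


def flawed_dice_sum(rng):
    """Flawed model from (d): every sum 2-12 equally likely."""
    return rng.randint(2, 12)


def relative_frequencies(model, runs, rng):
    """Run a model many times; return {outcome: fraction of runs}."""
    counts = Counter(model(rng) for _ in range(runs))
    return {value: counts[value] / runs for value in sorted(counts)}


if __name__ == "__main__":
    rng = random.Random(5)
    good = relative_frequencies(sum_of_two_dice, 60_000, rng)
    bad = relative_frequencies(flawed_dice_sum, 60_000, rng)
    print("sum of 7:", round(good[7], 3), "vs flawed", round(bad[7], 3))
    print("sum of 2:", round(good[2], 3), "vs flawed", round(bad[2], 3))
    coins = relative_frequencies(heads_in_nine_tosses, 60_000, rng)
    print("0 heads:", round(coins.get(0, 0.0), 4), "vs flawed 0.1")
```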
9. You're pretty sure that your candidate for class president has about 55% of the votes in the entire school. But you're worried that only 100 students will show up to vote. How often will the underdog (the one with 45% support) win? To find out, you set up a simulation.
a) Describe how you will simulate a component and its outcomes.
b) Describe how you will simulate a trial.
c) Describe the response variable.
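One way to carry out the design for #9, sketched in Python under stated assumptions: the component is one voter, a random number below 0.45 counts as a vote for the underdog, a trial is 100 voters, and the response variable is whether the underdog gets more than 50 votes. Counting a 50-50 tie as "not a win" is an assumption of this sketch, and the function names are hypothetical.

```python
import random


def underdog_wins(rng, voters=100, underdog_support=0.45):
    """One trial of the election.

    Component: one voter; a random number below 0.45 is a vote for the underdog.
    Trial: 100 voters. Response: did the underdog get more than half the votes?
    A 50-50 tie is counted as "not a win" here, which is an assumption.
    """
    underdog_votes = sum(rng.random() < underdog_support for _ in range(voters))
    return underdog_votes > voters - underdog_votes


def estimate_upset_rate(trials, rng):
    """Fraction of simulated elections that the underdog wins."""
    return sum(underdog_wins(rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    rng = random.Random(9)
    print("estimated chance the underdog wins:", estimate_upset_rate(10_000, rng))
```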
10. When drawing five cards randomly from a deck, which is more likely, two pairs or three of a kind? A pair is exactly two of the same denomination. (Don't count three 8's as a pair; that's 3 of a kind. And don't count 4 of the same kind as two pair; that's four of a kind, a very special hand.) How could you simulate 5-card hands? Be careful: once you've picked the 8 of spades for a hand, you can't get it again until the next hand.
a) Describe how you will simulate a component and its outcomes.
b) Describe how you will simulate a trial.
c) Describe the response variable.
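A Python sketch of a simulation for #10, assuming a standard 52-card deck numbered 0 to 51 (denomination = card % 13). `random.sample` deals 5 distinct cards, which handles the warning that the 8 of spades cannot appear twice in one hand. The function names are hypothetical. For comparison, the exact probabilities are about 0.048 for two pair and 0.021 for three of a kind.

```python
import random
from collections import Counter

DECK = list(range(52))  # card c has denomination c % 13 and suit c // 13


def classify(hand):
    """Name the hand by how many cards share each denomination."""
    pattern = sorted(Counter(card % 13 for card in hand).values())
    if pattern == [1, 2, 2]:
        return "two pair"
    if pattern == [1, 1, 3]:
        return "three of a kind"  # a full house would be [2, 3]
    return "other"


def simulate_hands(trials, rng):
    """Deal 5 cards WITHOUT replacement each trial; tally the two hand types."""
    tally = Counter(classify(rng.sample(DECK, 5)) for _ in range(trials))
    return tally["two pair"], tally["three of a kind"]


if __name__ == "__main__":
    rng = random.Random(12)
    two_pair, trips = simulate_hands(100_000, rng)
    print("two pair:", two_pair, " three of a kind:", trips)
```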
11. Suppose a cereal manufacturer puts pictures of famous athletes on cards in boxes of cereal in the hope of boosting sales. The manufacturer announces that 20% of the boxes contain a picture of Tiger Woods, 30% a picture of Lance Armstrong, and the rest a picture of Serena Williams. Suppose you buy five boxes of cereal. Estimate the probability that you end up with a complete set of the pictures. Your simulation should use at least 10 runs.
A component is… checking one box of cereal for the picture inside.
Outcomes: I'll look at a one-digit random number.
• Let 0-1 represent a box with… Tiger Woods (T)
• Let 2-4 represent a box with… Lance Armstrong (A)
• Let 5-9 represent a box with… Serena Williams (W)
Each trial consists of… checking 5 boxes, which is represented by 5 digits.
The response variable is… whether or not the 5 boxes had at least one of each athlete (a.k.a. "a complete set").
Conclusion: According to our simulation, the probability that you end up with a complete set of pictures after checking 5 boxes is _________. However, it should be noted that only 10 trials were run.

Chapter 11 Part I #1: Table of random numbers
Trial   Digits   Outcomes              Complete set?
1       78545    7-W 8-W 5-W 4-A 5-W   NO
2       49201    4-A 9-W 2-A 0-T 1-T   YES
3       05329    0-T 5-W 3-A 2-A 9-W   YES
4       14182    1-T 4-A 1-T 8-W 2-A   YES
5       10971    1-T 0-T 9-W 7-W 1-T   NO
6       90472    9-W 0-T 4-A 7-W 2-A   YES
7       44682    4-A 4-A 6-W 8-W 2-A   NO
8       39304    3-A 9-W 3-A 0-T 4-A   YES
9       19819    1-T 9-W 8-W 1-T 9-W   NO
10      55799    5-W 5-W 7-W 9-W 9-W   NO
5 of 10 trials gave a complete set: 50% CHANCE OF COMPLETE SET

Chapter 11 Part I #2: Table of random numbers
Trial   Digits   Outcomes              Complete set?
1       72749    7-W 2-A 7-W 4-A 9-W   NO
2       13347    1-T 3-A 3-A 4-A 7-W   YES
3       65030    6-W 5-W 0-T 3-A 0-T   YES
4       26128    2-A 6-W 1-T 2-A 8-W   YES
5       49067    4-A 9-W 0-T 6-W 7-W   YES
6       27904    2-A 7-W 9-W 0-T 4-A   YES
7       49953    4-A 9-W 9-W 5-W 3-A   NO
8       74674    7-W 4-A 6-W 7-W 4-A   NO
9       94617    9-W 4-A 6-W 1-T 7-W   YES
10      13317    1-T 3-A 3-A 1-T 7-W   YES
7 of 10 trials gave a complete set: 70% CHANCE OF COMPLETE SET
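The by-hand tables for #11 can be checked, and extended to many more runs, with a short Python sketch that uses the same digit assignment (0-1 Tiger Woods, 2-4 Lance Armstrong, 5-9 Serena Williams). The function names are hypothetical. With many runs the estimate settles near 0.507, the exact value from inclusion-exclusion: 1 - 0.8^5 - 0.7^5 - 0.5^5 + 0.5^5 + 0.3^5 + 0.2^5.

```python
import random


def picture(digit):
    """Map one random digit to a picture: 0-1 Tiger (T), 2-4 Armstrong (A), 5-9 Williams (W)."""
    d = int(digit)
    if d <= 1:
        return "T"
    if d <= 4:
        return "A"
    return "W"


def complete_set(trial_digits):
    """Response variable: do these 5 boxes contain at least one of each picture?"""
    return {picture(d) for d in trial_digits} == {"T", "A", "W"}


def score_table(row):
    """Fraction of complete sets in a row of 5-digit trials, e.g. '78545 49201 ...'."""
    trials = row.split()
    return sum(complete_set(t) for t in trials) / len(trials)


def simulate(trials, rng, boxes=5):
    """Same design, with computer-generated digits instead of a printed table."""
    hits = 0
    for _ in range(trials):
        digits = [rng.randrange(10) for _ in range(boxes)]
        hits += complete_set(digits)
    return hits / trials


if __name__ == "__main__":
    table_2 = "72749 13347 65030 26128 49067 27904 49953 74674 94617 13317"
    print(score_table(table_2))                 # 0.7, matching table #2
    print(simulate(100_000, random.Random(3)))  # long-run estimate
```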
Chapter 11 Part I #3: Table of random numbers
Trial   Digits   Outcomes              Complete set?
1       11071    1-T 1-T 0-T 7-W 1-T   NO
2       44430    4-A 4-A 4-A 3-A 0-T   NO
3       94664    9-W 4-A 6-W 6-W 4-A   NO
4       91294    9-W 1-T 2-A 9-W 4-A   YES
5       35163    3-A 5-W 1-T 6-W 3-A   YES
6       05494    0-T 5-W 4-A 9-W 4-A   YES
7       32882    3-A 2-A 8-W 8-W 2-A   NO
8       23904    2-A 3-A 9-W 0-T 4-A   YES
9       41340    4-A 1-T 3-A 4-A 0-T   NO
10      61185    6-W 1-T 1-T 8-W 5-W   NO
4 of 10 trials gave a complete set: 40% CHANCE OF COMPLETE SET

Chapter 11 Part I #4: Table of random numbers
Trial   Digits   Outcomes              Complete set?
1       42831    4-A 2-A 8-W 3-A 1-T   YES
2       95113    9-W 5-W 1-T 1-T 3-A   YES
3       43511    4-A 3-A 5-W 1-T 1-T   YES
4       42082    4-A 2-A 0-T 8-W 2-A   YES
5       15140    1-T 5-W 1-T 4-A 0-T   YES
6       34733    3-A 4-A 7-W 3-A 3-A   NO
7       68076    6-W 8-W 0-T 7-W 6-W   NO
8       18292    1-T 8-W 2-A 9-W 2-A   YES
9       69486    6-W 9-W 4-A 8-W 6-W   NO
10      80468    8-W 0-T 4-A 6-W 8-W   YES
7 of 10 trials gave a complete set: 70% CHANCE OF COMPLETE SET

Chapter 11 Part II
12. Suppose a cereal manufacturer puts pictures of famous athletes on cards in boxes of cereal in the hope of boosting sales. The manufacturer announces that 20% of the boxes contain a picture of Tiger Woods, 30% a picture of Lance Armstrong, and the rest a picture of Serena Williams. Suppose you really want the Tiger Woods picture. How many boxes of cereal do you need to buy to be pretty sure of getting at least one? Your simulation should use at least 10 runs.
14. A friend of yours got all 6 questions right on a multiple choice quiz, but now claims to have guessed blindly on every question. If each question offered 4 possible answers, do you believe her? Explain, basing your argument on a simulation involving 10 runs. (Make sure that you remember to define your simulation first. That means give the component, outcomes, trial, and response variable first. Then run 10 trials, analyze your response variable, and write your conclusion.) Use the following table for your simulation.
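Sketches for the two Part II problems above, in Python with hypothetical function names. For #12, reading "pretty sure" as 90% is an assumption; under it the exact answer is 11 boxes, because 0.8^10 ≈ 0.107 is still above 0.10 while 0.8^11 ≈ 0.086 is not. For #14, the sketch treats a 0 from `randrange(4)` as the correct answer; the chance that blind guessing gets all 6 right is (1/4)^6 = 1/4096, so 10 runs will almost always show no perfect quizzes.

```python
import math
import random


def boxes_until_tiger(rng, p_tiger=0.20):
    """One trial for #12: open boxes until a Tiger Woods card turns up."""
    boxes = 1
    while rng.random() >= p_tiger:  # not Tiger, so open another box
        boxes += 1
    return boxes


def boxes_for_confidence(trials, rng, confidence=0.90):
    """Smallest box count that was enough in at least `confidence` of the trials.

    Reading "pretty sure" as 90% is an assumption, not part of the problem.
    """
    counts = sorted(boxes_until_tiger(rng) for _ in range(trials))
    return counts[math.ceil(confidence * trials) - 1]


def all_correct_by_guessing(rng, questions=6, choices=4):
    """One trial for #14: does blind guessing get every question right?

    Assumption: on each question, 0 from randrange(4) stands for the right answer.
    """
    return all(rng.randrange(choices) == 0 for _ in range(questions))


if __name__ == "__main__":
    rng = random.Random(21)
    print("boxes to be 90% sure:", boxes_for_confidence(20_000, rng))
    perfect = sum(all_correct_by_guessing(rng) for _ in range(10_000))
    print("perfect quizzes in 10,000 guessed quizzes:", perfect)
```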
19. You are about to take the road test for your driver's license. You hear that only 34% of candidates pass the test the first time, but the percentage rises to 72% on subsequent retests. Estimate the average number of tests drivers take in order to get a license. Your simulation should use 10 runs.
25. Many couples want to have both a boy and a girl. If they decide to continue to have children until they have one child of each gender, what would the average family size be? Assume that boys and girls are equally likely. (Make sure that you remember to define your simulation first. That means give the component, outcomes, trial, and response variable first. Then run 10 trials, analyze your response variable, and write your conclusion.) Use the following table for your simulation.