AP STATISTICS
Ch. 12: Sample Surveys

Vocabulary:

POPULATION - All experimental units that you want to draw a conclusion about. Does not necessarily have to be a large group.

SAMPLING FRAME - List of individuals from which the sample is drawn. Not always the population of interest.
Ex: phone book, registered voter list, list of tax returns, school roster, etc.

SAMPLE - Small group from the population that you use in your experiment/study/survey. Hopefully representative of the population.

PARAMETER - Describes a population. Often unknown. Fixed value.
Ex: population mean

STATISTIC - Describes a sample of the population. Changes from sample to sample. We use the statistics from repeated samples to estimate the value of the parameter.
Ex: sample mean

VALUE         PARAMETER    STATISTIC
Mean          μ            x̄
Std. Dev.     σ            s_x
Proportion    p            p̂

A sample is said to be representative (or unbiased) if its statistics accurately reflect the population parameters.

EXAMPLE 1: A polling agency takes a sample of 1500 American citizens from a list of tax returns and asks them if they are lactose intolerant. 12% say yes. This is interesting, since it has been shown that 15% of the population is lactose intolerant.
12% = statistic
15% = parameter
Population? Sampling frame? Sample?
Population: All American citizens
Sampling Frame: List of tax returns
Sample: 1500 American citizens
Parameter of Interest: The percent of American citizens who are lactose intolerant

EXAMPLE 2: A random sample of 1000 people who signed a card saying they intended to quit smoking were contacted a year after they signed the card. It turned out that 210 (21%) of the sampled individuals had not smoked over the past six months.
21% = statistic
Population = All smokers
Sampling frame = People who signed a card saying they intended to quit smoking
Sample = 1000 people who signed the card
Parameter of interest = Percent of people who had not smoked over the past six months (quit smoking)

EXAMPLE 3: On Tuesday, the bottles of tomato ketchup filled in a plant were supposed to contain an average of 14 ounces of ketchup. Quality control inspectors sampled 50 bottles at random from the day’s production. These bottles contained an average of 13.8 ounces of ketchup.
14 = parameter
13.8 = statistic
Population? Sample? Sampling frame?
Population: All ketchup bottles
Sampling Frame: Ketchup bottles from one day’s production
Sample: 50 bottles
Parameter: The average number of ounces of ketchup in the bottles

EXAMPLE 4: A researcher wants to find out which of two pain relievers works better. He takes 100 volunteers and randomly gives half of them medicine #1 and the other half medicine #2. 17% of people taking medicine #1 report improvement in their pain and 20% of people taking medicine #2 report improvement in their pain.
17% = statistic
20% = statistic
Population? Sampling frame? Sample?
Population: All people
Sampling Frame: All volunteers
Sample: 100 volunteers, 50 taking medicine #1 and 50 taking medicine #2
Parameter: Percent of people who showed improvement in their pain

BIAS VS. VARIABILITY

BIAS - Repeated measurements that consistently miss the population parameter in the same direction. Relates to accuracy.

VARIABILITY - How much the results differ from sample to sample. Relates to reliability: low variability means consistent measurements (it doesn’t matter if they are accurate or not).

To reduce bias: use random sampling
To reduce variability: use larger samples

SAMPLING VARIABILITY - Different samples give different results (even if they are from the same population).
Different size samples give us different results.
Bigger samples are better!
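The idea that different samples give different results, and that bigger samples vary less, can be checked with a short simulation. What follows is only a minimal sketch in Python: it assumes a true proportion of 0.15 (borrowed from Example 1), and the function names and the choice of 1000 repeated samples are made up for illustration.

```python
import random
import statistics

def sample_proportion(p, n, rng):
    """Draw one random sample of size n and return its sample proportion (p-hat)."""
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n

def simulate_p_hats(p, n, num_samples, seed):
    """Return the p-hat values from num_samples independent samples of size n."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [sample_proportion(p, n, rng) for _ in range(num_samples)]

TRUE_P = 0.15  # assumed parameter, taken from Example 1

small = simulate_p_hats(TRUE_P, n=100, num_samples=1000, seed=0)
large = simulate_p_hats(TRUE_P, n=1000, num_samples=1000, seed=1)

for label, p_hats in (("n = 100", small), ("n = 1000", large)):
    center = statistics.mean(p_hats)   # where the statistics are centered
    spread = statistics.stdev(p_hats)  # how much they vary from sample to sample
    print(f"{label}: center = {center:.3f}, spread = {spread:.3f}")
```

Both sets of statistics should be centered near the parameter (random sampling keeps bias low), while the spread for samples of 1000 should be noticeably smaller than for samples of 100.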
SAMPLING DISTRIBUTION - The distribution of a statistic that we get if we take lots of samples of the same size and make a histogram of the results.
[Figure: histogram of sample statistics with the true parameter marked]

Larger samples yield smaller variability:
[Figure: histograms for lots of samples of size 100 and lots of samples of size 1000, each with the true parameter marked]

Bias vs. Variability:
[Figure: four histograms, each with the true parameter marked: high bias, low variability; low bias, high variability; high bias, high variability; low bias, low variability]

BIAS VS. VARIABILITY EXAMPLES:
* bias relates to accuracy
* variability relates to reliability
[Figure: four panels labeled high bias, low variability; low bias, high variability; high bias, high variability; low bias, low variability]

UNBIASED ESTIMATOR - A statistic whose sampling distribution (histogram) is centered at the true parameter.

SAMPLING DESIGNS

GOOD SAMPLING DESIGNS

1) SIMPLE RANDOM SAMPLE (SRS) - Every experimental unit has the same chance of being picked for the sample, and every possible sample of the same size has the same chance of being picked.
- give every subject in the population a number
- use a table of random digits (TRD) and read across to select your sample
- ignore repeats

Example: Take an SRS of 5 from the following list. Start at line 31 in the table.
01 Smith       07 Jones       13 Holloway
02 DeNizzo     08 David       14 Adams
03 Schaefer    09 Gray        15 Capito
04 Meyers      10 Gingrich    16 Card
05 Dietrich    11 Moreland    17 Hall
06 Walsh       12 Whitter     18 Jordan
Sample:

Example: Take an SRS of 4 from the following list of math teachers. Start at line 18.
01 McGlone     02 Szarko       03 Stotler     04 Timmins     05 Gemgnani    06 Lorenz
07 McCuen      08 Bellavance   09 Kelly       10 Arden       11 O’Brien     12 Lake
13 Wilson      14 Woodring     15 Wheeles     16 McNelis     17 Robinson    18 Bainbridge
Sample:

2) STRATIFIED RANDOM SAMPLE (not SRS)
- Divide the population into groups with something in common (called STRATA)
  o Example: gender, age, etc.
- Take a separate SRS in each stratum and combine these to make the full sample
  o The SRS from each stratum can sometimes be a % of that stratum

Example: We want to take an accurate sample of CB South students. There are 540 sophomores, 585 juniors, and 530 seniors. Take a stratified sample.
3) SYSTEMATIC RANDOM SAMPLE - The first experimental unit is selected at random, and each additional experimental unit is selected at a predetermined interval.
Examples:
- Surveying every 5th person that comes through the back door at CB South
- Selecting a random person and then every 10th person on a list

4) CLUSTER SAMPLE - The population is broken down into groups. One (or more) groups are chosen at random, and all members of the chosen group(s) are taken as the sample.
Example: The CB South population is broken down into sophomores, juniors, and seniors. We then sample ALL of the members in one grade level.

5) MULTI-STAGE SAMPLE - Used for large populations; the sample is narrowed down in stages.
Example: sampling the population of the USA:
- Counties (pick 50)
- Zip codes
- Two streets
- 3 houses
CB South Example:
- Block
- 10 classes
- 5 students

BIASED SAMPLING METHODS:

1) VOLUNTARY RESPONSE SAMPLES - The sample chooses itself by responding to a general appeal.
Ex: call-in, write-in, etc.

2) CONVENIENCE SAMPLES - Selecting the individuals that are easiest to reach/contact.

TYPES OF BIAS IN A SAMPLE:

UNDERCOVERAGE - Sampling in a way that leaves out a certain portion of the population that should be in your sample.
Ex: telephone polls, registered voter list, etc.

NONRESPONSE - Bias introduced when a large fraction of those sampled do not respond. (You were selected, and you chose not to respond.)
Ex: don’t answer the phone or hang up, don’t mail back a questionnaire, refuse to answer questions

RESPONSE BIAS - Anything in the survey design that influences responses.
Ex:
- respondents lying
- responses given to try to please the interviewer
- unwillingness to reveal personal facts or information
- leading or confusing questions

VOLUNTARY RESPONSE BIAS - Some people are eager to volunteer when they have a strong opinion on the matter, so strong opinions end up overrepresented.
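Going back to the good sampling designs, four of them (SRS, stratified, systematic, cluster) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the roster is hypothetical and built from the CB South class sizes in the notes, the sample size of 100 and the interval of 10 are arbitrary choices, and a random number generator stands in for the table of random digits.

```python
import random

rng = random.Random(12)  # fixed seed so the sketch is repeatable

# Hypothetical roster: (student_id, grade) pairs using the class sizes from the notes.
GRADE_SIZES = {"sophomore": 540, "junior": 585, "senior": 530}
roster = []
for grade, size in GRADE_SIZES.items():
    for _ in range(size):
        roster.append((len(roster) + 1, grade))

def simple_random_sample(units, n):
    """SRS: every possible sample of size n is equally likely; no repeats."""
    return rng.sample(units, n)

def stratified_sample(units, n):
    """Take a separate SRS within each grade, sized in proportion to the grade."""
    sample = []
    for grade in GRADE_SIZES:
        stratum = [u for u in units if u[1] == grade]
        # Proportional allocation; rounding may not always add up to exactly n.
        k = round(n * len(stratum) / len(units))
        sample.extend(rng.sample(stratum, k))
    return sample

def systematic_sample(units, interval):
    """Pick a random starting unit, then every interval-th unit after it."""
    start = rng.randrange(interval)
    return units[start::interval]

def cluster_sample(units):
    """Pick one grade (cluster) at random and take every student in it."""
    chosen = rng.choice(list(GRADE_SIZES))
    return [u for u in units if u[1] == chosen]

srs = simple_random_sample(roster, 100)
strat = stratified_sample(roster, 100)
syst = systematic_sample(roster, 10)
clus = cluster_sample(roster)

for name, s in (("SRS", srs), ("stratified", strat), ("systematic", syst), ("cluster", clus)):
    counts = {g: sum(1 for u in s if u[1] == g) for g in GRADE_SIZES}
    print(f"{name:>10}: size {len(s):4d}, by grade {counts}")
```

With an assumed total sample of 100 out of 1655 students, proportional allocation works out to 33 sophomores, 35 juniors, and 32 seniors. The cluster sample contains a single grade, which is why it only represents the school well if the grades are similar to each other.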