Chapter 4 Section 4.1 Check Your Understanding, page 211: 1. The company inspector is using a convenience sample. This could lead him to think that the oranges are of better quality than they really are, if the farmer puts the best oranges on top. 2. Nightline was using a voluntary response sample. Only those who feel particularly strongly about the issue are likely to respond. In this case, those who are happy that the United Nations has its headquarters in the US already have what they want and so are less likely to worry about responding to the question. Check Your Understanding, page 219: 1. It might be difficult to give a survey to an SRS of 200 fans because you would have to identify 200 different seats, go to those seats in the arena and find the people who are sitting there. This means going to 200 different locations throughout the arena, which would take time. There is also the problem that people are not always in their seats throughout the game and not all seats will necessarily be occupied in any given game. 2. For a stratified sample, it would be better to take the lettered rows as the strata because each lettered row is the same distance from the court and so would contain only seats with the same (or nearly the same) ticket price. This means that all people in any given stratum would have paid roughly the same amount for their tickets. 3. For a cluster sample, it would be better to take the numbered sections as the clusters because they include all different seat prices. Each section contains seats with many different ticket prices so the people in a section would mirror the characteristics of the population as a whole. Check Your Understanding, page 224: 1. (a) Using the telephone directory as a sampling frame is an example of a sampling error. This will result in undercoverage because those who are not listed in the phone book (those who do not have a phone or have only a cell phone) do not have the opportunity to be chosen. 
(b) If the person cannot be contacted, this is an example of a nonsampling error. This did not occur because of the way the sample was chosen, but rather was an effect of the way the survey was administered. (c) If you choose to interview people walking past you on a sidewalk, this is a sampling error. Who you find will depend on where (in what neighborhood, etc.) you are standing. This has to do with how you choose your sample. 2. This question will result in fewer people suggesting that we should ban disposable diapers by making it sound like they are not a problem in the landfill. The author of the question highlights several other items that take up more space in the landfill, which makes it look like disposable diapers are really not a problem. Exercises, page 226: 4.1 The population is (all) local businesses. The sample is the 73 businesses that return the Questionnaire. 4.2 The population is all the artifacts discovered at the dig. The sample is those artifacts (2% of the population) that are chosen for inspection. 4.3 The population is the 1000 envelopes stuffed during a given hour. The sample is the 40 envelopes selected. Chapter 4: Designing Studies 83 4.4 The population is all 45,000 people who made credit card purchases. The sample is the 137 people who returned the survey form. 4.5 Only persons with a strong opinion on the subject, strong enough that they are willing to spend the time and money, will respond to this advertisement. 4.6 Letters to legislators are an example of a voluntary response sample—the proportion of letters opposed to the insurance should not be assumed to be a fair representation of the attitudes of the congresswoman’s constituents. Only those who have very strong opinions will write in. 4.7 This is a voluntary response sample. It is biased in favor of those who feel most strongly about the issue being surveyed. 4.8 (a) A voluntary response sample. (b) It is biased toward readers who feel most strongly about the issue. 
85% is probably higher than the true percent because it is likely that readers who feel most strongly about this issue have in some way been involved in an accident caused by cell phone use while driving. 4.9 (a) A convenience sample. (b) It is unlikely that the first 100 students to arrive at school are representative of the student population in general. The reported 7.2 hours is probably higher than the true value, since you might expect that the students who arrive first are those who got a good night’s sleep the night before. Students who received less sleep the night before are probably more likely to run late the next morning. 4.10 This is a convenience sample. It is easy to find lots of people in a mall. However, it is likely to give a higher percentage for the unemployment figure because the unemployed have more time to be at the mall than those who are employed. 4.11 (a) Number the 40 students from 01 to 40 alphabetically. Go to the random number table and pick a starting point. Record two-digit numbers, skipping any that aren’t between 01 and 40 or are repeats, until you have 5 unique numbers between 01 and 40. (b) Starting at line 107 we read off the following numbers: 82 (ignore) 73 (ignore) 95 (ignore) 78 (ignore) 90 (ignore) 20 80 (ignore) 74 (ignore) 75 (ignore) 11 81 (ignore) 67 (ignore) 65 (ignore) 53 (ignore) 00 (ignore) 94 (ignore) 38 31 48 (ignore) 93 (ignore) 60 (ignore) 94 (ignore) 07. So we have picked: Johnson (20), Drasin (11), Washburn (38), Rider (31), and Calloway (07). 4.12 (a) Number the 33 complexes from 01 to 33 alphabetically. Go to the random number table and pick a starting point. Record two-digit numbers, skipping any that aren’t between 01 and 33 or are repeats, until you have 3 unique numbers between 01 and 33. (b) Starting at line 117 we read off the following numbers: 38 (ignore) 16 79 (ignore) 85 (ignore) 32 62 (ignore) 18. So we have picked: Fairington (16), Waterford Court (32) and Fowler (18). 4.13 (a) Number the plots from 0001 to 1410. 
Go to the random number table and pick a starting point. Record four-digit numbers, skipping any that aren’t between 0001 and 1410 or are repeats, until you have 141 unique numbers between 0001 and 1410. (b) Starting at line 131 we read off the following numbers: 0500 7166 3281 1941 4873 0419 7855 7645 1959 6565 6873 2552 5984 2920 8796 4316 5937 3931 6859 7150 4574 0418 (ignore all numbers greater than 1410). So the first three plots in our sample are plots 0500, 0419 and 0418. 4.14 (a) Number the gravestones from 00001 to 55914. Go to the random number table and pick a starting point. Record 5-digit numbers, skipping any that aren’t between 00001 and 55914 or are repeats, until you have 395 unique numbers between 00001 and 55914. 84 The Practice of Statistics for AP*, 4/e (b) Starting at line 127 we read off the following numbers: 43909 99477 25330 64359 40085 (ignore all numbers greater than 55914). So the first three gravestones picked are those numbered 43909, 25330 and 40085. 4.15 If you always begin at the same place, then the results would not be random. You would end up using the same sample in every case. 4.16 (a) False: if it were true, then after looking at 39 digits, we would know whether or not the 40th digit was a 0. (b) True: there are 100 pairs of digits 00 through 99, and all are equally likely. (c) False: 0000 is just as likely as any other string of four digits. 4.17 (a) Assuming none of the phones can be shipped until after the inspection, inspecting a random sample of 20 phones could hold up the shipping process. Additionally, in order to obtain a random sample, the phones must be numbered in some way. Keeping track of the ordering of 1000 phones may be difficult. (b) It is possible that the quality of the phones produced changes over the course of the day so that the last phones manufactured are not representative of the day’s production. (c) This is not an SRS because each sample of 20 phones does not have the same probability of being selected. 
In fact, the 20 phones that are sampled will be the 50th, 100th, …, 1000th; the others have no chance of being sampled. 4.18 (a) To obtain an SRS, every tree would need to have an equal chance of being included in the sample. It is not practical to even identify every tree in the park. (b) This sampling method is biased because these trees are unlikely to be representative of the population. Trees along the main road are more likely to be damaged by cars and people and may be more susceptible to infestation. (c) The scientists can be confident that the actual percentage of pine trees in the area that are infected by the pine beetle is near 35% although there is always some error associated with using sampling to estimate population parameters. 4.19 Assign 01 to 30 to the students (in alphabetical order). Starting on line 123 gives 08-Ghosh, 15-Jones, 07-Fisher, and 27-Shaw. Assigning 0–9 to the faculty members gives 1-Besicovitch and 0-Andrews. 4.20 Label the 500 midsize accounts from 001 to 500, and the 4400 small accounts from 0001 to 4400. Starting at line 115, the first five accounts in each stratum are 417, 494, 322, 247, and 097 for the midsize group, then 3698, 1452, 2605, 2480, and 3716 for the small group. 4.21 (a) Use the three types of seats (sideline, corner and end zone) as the three strata since ticket prices will be similar within each stratum but different between the three strata. (b) It might be easier to obtain a cluster sample because a stratified random sample will still likely choose seats all over the stadium, which would make it very time consuming to get to everyone. A cluster sample would be easier to obtain, because there would be many people sitting all together who would be part of the sample. In this case one would use the section numbers for the clusters. 4.22 (a) Using a stratified random sample would assure the manager that he got opinions from each type of room. 
Use each type of view as a stratum and randomly pick 60 guests from each stratum. (b) A cluster sample would be a simpler option because someone could just slip the forms under a specific pattern of doors. All rooms ending in a specific number (for example, all rooms ending in 16) would form a cluster. Presumably these are all stacked above each other on the 30 floors. The manager should just pick two random numbers that represent rooms on the water side and two random numbers that represent rooms on the golf course side and then survey all 30 rooms (one per floor) that end in each of those numbers. 4.23 It is not an SRS, because some samples of size 250 have no chance of being selected (e.g., a sample containing 250 women). 4.24 The chance of being interviewed is 3/30 for students over age 21 and 2/20 for students under age 21. This is 1/10 in both cases. It is not an SRS because not all combinations of students have an equal chance of being interviewed. For instance, groups of 5 students all over age 21 have no chance of being interviewed. 4.25 (a) This is cluster sampling. (b) Answers will vary. Label each block from 01 through 65. Beginning at line 142, record two-digit numbers, skipping any that aren’t between 01 and 65 or are repeats. The 5 identified blocks are 02, 32, 26, 34, and 08. The statistical applet selected blocks 10, 20, 45, 36, and 60. 4.26 (a) Split the 200 addresses into 5 groups of 40 each. Looking for 2-digit numbers from 01 to 40, the table gives 35 so the systematic random sample consists of 35, 75, 115, 155, and 195. (b) Every address has a 1-in-40 chance of being selected, but not every subset has an equal chance of being picked; for example, 01, 02, 03, 04, and 05 cannot be selected by this method. 4.27 Households without telephones or with unlisted numbers are omitted from this frame. 
Such households would likely be made up of poor individuals (who cannot afford a phone), those who choose not to have phones, and those who do not wish to have their phone number published. If the variable being measured tends to have different values for those excluded by this sampling method, then our sample result would be off in a particular direction from the truth about the population of households. 4.28 This will miss those who do not have telephones. This means that we will likely be underrepresenting the poor in our sample. 4.29 (a) You are sampling only from the lower priced ticket holders. (b) This is a sampling error. The sampling frame differs from the population of interest (undercoverage). 4.30 (a) Nonsampling error. People may lie in response to questions about past drug use. It is not an error due to the act of taking a sample; rather, it is a response error. (b) Nonsampling error. This is an example of a processing error. (c) Sampling error. This will suffer from the same forms of bias as any voluntary response survey. 4.31 (a) The response rate was 5,029/45,956 = 0.1094, so the nonresponse rate is 1 − 0.1094 = 0.8906 or 89.1%. (b) It is likely that the high amount of nonresponse gave the researchers a lower mean number of miles driven because those who drive more are at home less to answer the phone. 4.32 The higher no-answer rate was probably in the second period, when families are likely to be vacationing or spending time outdoors. A high rate of nonresponse makes sample results less reliable because you don’t know how these individuals would have responded. It is very risky to assume that they would have responded exactly the same way as those individuals who did respond. 4.33 More than 171 respondents have run red lights. We would not expect very many people to claim they have run red lights when they have not, but some people will deny running red lights when they have. 
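The arithmetic in 4.31 (a) can be checked with a short Python sketch. The function name nonresponse_rate is made up here for illustration; the counts 5,029 (responses) and 45,956 (people contacted) are the ones used in the solution.

```python
def nonresponse_rate(responses: int, contacted: int) -> float:
    """Return the fraction of the contacted sample that did not respond."""
    if contacted <= 0:
        raise ValueError("contacted must be a positive count")
    response_rate = responses / contacted
    return 1 - response_rate


# Counts taken from the solution to 4.31 (a).
rate = nonresponse_rate(5029, 45956)
print(f"response rate    = {1 - rate:.4f}")
print(f"nonresponse rate = {rate:.4f}")
```

Rounded to four decimal places, this reproduces the response rate of 0.1094 and the nonresponse rate of 0.8906 (about 89.1%).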
4.34 People likely claim to wear their seat belts because they know they should; they are embarrassed or ashamed to say that they do not always wear seat belts. Such bias is likely in most surveys about seat belt use (and similar topics). 4.35 (a) The wording is clear, but the question is slanted in favor of warning labels. (b) The question is clear, but it is clearly slanted in favor of national health insurance by asserting it would reduce administrative costs. (c) The wording is too technical for many people to understand, and for those who do understand the question, it is slanted because it suggests reasons why one should support recycling. It could be rewritten to something like: “Do you support economic incentives to promote recycling?” 4.36 (a) The question is clear, but the two options presented are too extreme; no middle position on gun control is allowed. Many students may suggest that this question is likely to elicit more responses against gun control (that is, more people will choose 2). (b) The question is so complicated that it isn’t clear. It is also slanted; the phrasing of this question will tend to make people respond in favor of a nuclear freeze. Only one side of the issue is presented. 4.37 c 4.38 d 4.39 d 4.40 c 4.41 e 4.42 c 4.43 The predicted sleep debt for a 5-day school week, based on the least-squares regression equation, is 2.23 + 3.17(5) = 18.08 hours, a little more than 3 hours greater than what was found in the research study. Based on their collected data, the students have reason to be skeptical of the research study’s reported results. 4.44 (a) The 95th percentile is the amount of bandwidth below which 95 percent of all 5-minute measurements fall. (b) The method using the 98th percentile would cost the company more because it would suggest a higher usage of bandwidth by the company. Section 4.2 Check Your Understanding, page 233: 1. 
This was an experiment because a treatment (brightness of screen) was imposed on the laptops. 2. This was an observational study. Students were not assigned to a particular number of meals to eat with their family per week. 3. The explanatory variable is the number of meals per week eaten with their family and the response variable is probably their GPA (or some other measure of their grades). 4. This is an observational study and there may well be lurking variables that are actually influencing the response variable. For instance, families that eat more meals together may also be families where the parents show more interest in their children’s education and therefore help them to do better in school. Check Your Understanding, page 240: 1. 2. Using an alphabetical list of the students, assign each student a number between 01 and 29. Pick a line of the random number table and read off two-digit numbers, skipping repeats, until you have 15 different numbers between 01 and 29. These students belong in the treatment group where students will meet in small groups. The other students will view the videos alone. 3. The purpose of the control group is to have a group to compare to. Presumably the students have been evaluating their own performances by themselves before. If you incorporate such a group into your experiment, you can evaluate if the group work is actually better. Check Your Understanding, page 244: 1. No, this experiment did not take the placebo effect into account. It is possible that women who “thought” they were getting an ultrasound would have different reactions to pregnancy than those who knew that they hadn’t received an ultrasound. 2. This experiment was not double-blind. While the people weighing the babies at birth may not have known whether that particular mother had an ultrasound or not, the mothers did know whether they had had an ultrasound or not. 
This means that the mothers may have affected the outcome since they knew whether they had received the treatment or not. 3. An improved design would have been one in which all mothers were treated as if they had an ultrasound, but for some mothers the ultrasound machine just wasn’t turned on (but this fact would not be obvious to the woman). This means that the ultrasound would have to have been done in such a way so that the woman could not see the screen. Exercises, page 253: 4.45 (a) This was an observational study because no treatment was imposed on the mothers. The researchers simply asked them to report both their chocolate consumption and their babies’ temperament. (b) The explanatory variable is the mother’s chocolate consumption and the response variable is the baby’s temperament. (c) No, this study is an observational study so we cannot make a conclusion of cause and effect. There could be a lurking variable that is actually causing the difference in temperament. 4.46 (a) This was an observational study because no treatment was imposed on the children. The researchers simply followed them through their 6th year in school, asking adults to rate their behavior at several times along the way. (b) The explanatory variable was the amount of time in child care from birth to age four-and-a-half. The response variable was the adult ratings of their behavior. (c) No, this study is an observational study so we cannot make a conclusion of cause and effect. There could be a lurking variable that is actually causing the difference in adult ratings of their behavior. 4.47 (a) This was an experiment because students were randomly assigned to the different teaching methods. (b) Since this was an experiment with proper randomization, the teacher can conclude that using the computer animation appears to result in higher increases in test scores. 4.48 (a) This is an example of an observational study. 
The researchers did not assign people to either use or not use cell phones. (b) No, this study is an observational study so we cannot make a conclusion of cause and effect. 4.49 One possible lurking variable would be private versus public schools. Private schools tend to have smaller classes, and private school students might tend to earn higher scores. There might be something else about the private schools, however, that leads to that success other than the small class sizes. So final success could be dependent on either of these two variables. 4.50 One possible lurking variable is level of academic motivation. Those who drink may have less academic motivation leading to lower grades. So if students do not do well, we are not sure if it is because of the alcohol itself or if it is simply a matter of lower level of academic motivation. 4.51 Experimental units: pine seedlings. Explanatory variable: Light intensity. Treatments: full light, 25% light and 5% light. Response variable: dry weight at the end of the study. 4.52 Subjects: The students living in the selected dormitory. Explanatory variable: The rate structure. Treatments: Paying one flat rate, or paying peak/off-peak rates. Response variables: The amount and time of use and total network use. 4.53 Subjects: the individuals who were called. Explanatory variables: (1) information provided by interviewer; (2) whether caller offered survey results. Treatments: (1) giving name/no survey results; (2) identifying university/no survey results; (3) giving name and university/no survey results; (4) giving name/offer to send survey results; (5) identifying university/offer to send survey results; (6) giving name and university/offer to send survey results. Response variable: whether or not the interview was completed. 4.54 Experimental units: middle schools. Explanatory variables: whether physical activity program was offered and whether nutrition program was offered. 
Treatments: (1) activity intervention; (2) nutrition intervention; (3) both interventions; (4) neither intervention. Response variables: physical activity and lunchtime consumption of fat. 4.55 Experimental units: fabric specimens. Explanatory variables: (1) roller type; (2) dyeing cycle time; (3) temperature. Treatments: (1) metal, 30 minutes, 150 degrees; (2) natural, 30 minutes, 150 degrees; (3) metal, 40 minutes, 150 degrees; (4) natural, 40 minutes, 150 degrees; (5) metal, 30 minutes, 175 degrees; (6) natural, 30 minutes, 175 degrees; (7) metal, 40 minutes, 175 degrees; (8) natural, 40 minutes, 175 degrees. Response variable: a quality measurement. 4.56 Subjects: students. Explanatory variables: (1) Step height; (2) metronome pace. Treatments: (1) 5.75 inches, 14 steps/minute; (2) 5.75 inches, 21 steps/minute; (3) 5.75 inches, 28 steps/minute; (4) 11.5 inches, 14 steps/minute; (5) 11.5 inches, 21 steps/minute; (6) 11.5 inches, 28 steps/minute. Response variable: increase in heart rate. 4.57 There was no control group for comparison purposes. We don’t know if this was a placebo effect or if the flavonol actually affected the blood flow. 4.58 There was no control group for comparison purposes this year. Over a year, many things can change: the state of the economy, hiring costs (due to an increasing minimum wage or the cost of employee benefits), etc. In order to draw conclusions, we would need to make the $500 bonus offer to some people and not to others during the same time frame, and compare the two groups. 4.59 (a) Write all names on slips of paper, put them in a container and mix thoroughly. Pull one slip out and note the name on it. That person gets assigned treatment 1. Pull another name out and assign that person to treatment 2. The third person gets assigned treatment 3. Keep rotating through the treatments until all have been assigned. (b) Assign the students numbers between 001 and 120. 
Pick a spot on Table D and read off the first 40 numbers between 001 and 120, skipping any that aren’t between 001 and 120 or are repeats. These are assigned to treatment 1. The next 40 numbers read are assigned to treatment 2. The remaining are assigned to treatment 3. (c) Assign the students numbers between 001 and 120. Using the RandInt function on the calculator, and ignoring all repeats, assign the first 40 numbers chosen to treatment 1, the next 40 to treatment 2, and so on. 4.60 (a) Write all names on slips of paper, put them in a container and mix thoroughly. Pull one slip out and note the name on it. That person gets assigned treatment 1. Pull another name out and assign that person to treatment 2. The third person gets assigned treatment 3. Keep rotating through the treatments until all have been assigned. (b) Assign the students numbers between 001 and 150. Pick a spot on Table D and read off the first 25 numbers between 001 and 150, skipping any that aren’t between 001 and 150 or are repeats. These are assigned to treatment 1. The next 25 numbers read are assigned to treatment 2. Keep doing this until all people have been assigned to one of the 6 treatments. (c) Assign the students numbers between 001 and 150. Using the RandInt function on the calculator, and ignoring all repeats, assign the first 25 numbers chosen to treatment 1, the next 25 to treatment 2, and so on. 4.61 (a) This type of design is called a completely randomized design. The outline is given below: (b) Write the names of the patients on 36 identical slips of paper, put them in a hat, and mix them well. Draw out 9 slips. The corresponding patients will be in Group 1. Draw out 9 more slips. Those patients will be in Group 2. The next 9 slips drawn will be the patients in Group 3, and the remaining 9 patients will be assigned to Group 4. 4.62 (a) This is a completely randomized design. The outline is given below: (b) Assign the plots the labels 01 through 18. 
Write the labels on 18 identical slips of paper, put them in a hat, and mix them well. Draw out 6 slips. The corresponding plots will be in Group 1. Draw out 6 more slips. These plots will be in Group 2, and the remaining 6 plots will be in Group 3. 4.63 (a) Expense, condition of the patient, etc. In a serious case, when the patient has little chance of surviving, a doctor might choose not to recommend surgery; it might be seen as an unnecessary measure, bringing expense and a hospital stay with little benefit to the patient. (b) Randomly assign the patients to two groups of 150 each. One group will receive the traditional surgery and the other group will receive the new method of treatment. At the end of the study, measure how many patients survived. 4.64 (a) Comparing this year to last year would not be a good idea because there may be lurking variables that have changed over time. (b) Randomly divide the 120 rural schools into two groups. In one group, offer the teacher better pay for good attendance. In the other group, do nothing. At the end of the study period, compare the attendance of the teachers. 4.65 (a) The principle of experimental design that is violated here is random assignment. If players are allowed to choose which treatment they get, those who choose one particular treatment over the other may be different in a fundamental way. For example, maybe the weaker players will be more likely to choose the new method and the stronger players would stick with weight lifting. (b) The response variable of the number of push-ups that a player can do could be part of what the coach should measure, but this only measures one kind of upper-body strength and he should probably combine this with other measures as well. 
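The slips-in-a-hat assignment described in 4.61 (b) and 4.62 (b) can be sketched in Python. This is only an illustration under stated assumptions: the function name completely_randomized and the seed value are hypothetical, and Python's pseudorandom shuffle stands in for physically mixing the slips.

```python
import random


def completely_randomized(units, n_groups, seed=None):
    """Shuffle the units (mix the hat) and deal them into equal-sized groups."""
    if n_groups <= 0 or len(units) % n_groups != 0:
        raise ValueError("units must divide evenly into the groups")
    rng = random.Random(seed)      # seed is optional; used here for repeatability
    shuffled = list(units)         # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)          # equivalent to mixing the slips well
    size = len(shuffled) // n_groups
    # The first `size` slips drawn form Group 1, the next `size` Group 2, etc.
    return [sorted(shuffled[i * size:(i + 1) * size]) for i in range(n_groups)]


# 4.62 (b): 18 plots labeled 01 through 18, three groups of 6.
plots = [f"{i:02d}" for i in range(1, 19)]
groups = completely_randomized(plots, 3, seed=1)
for number, members in enumerate(groups, start=1):
    print(f"Group {number}: {members}")
```

The same function would cover 4.64 (b) by passing 120 school labels and n_groups=2. Every unit lands in exactly one group, and each possible split into equal-sized groups is equally likely, which is what the hat method is meant to achieve.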
4.66 In a controlled scientific study, the effects of factors other than the nonphysical treatment (e.g., the placebo effect, differences in the prior health of the subjects) can be eliminated or accounted for, so that the differences in improvement observed between the subjects can be attributed to the differences in treatments. 4.67 (a) First we need to control for the effects of lurking variables and to use at least two groups for comparison purposes. Next we need random assignment to help create roughly equivalent groups before the treatments are administered. Finally, we need replication to ensure that a difference in response between the two groups is due to the treatments and not chance variation. (b) There were two groups in the study: one in which the children were assigned to an intensive preschool program and one in which they weren’t. All of the children were given nutritional supplements and help from social workers. The children were assigned at random to the two groups, and there were a total of 111 children in the experiment. 4.68 The researcher should use Plan B. If he uses Plan A and discovers that the plants which had the weed killer X applied did better, he will not know if this is because of weed killer X or because these were the healthier plants to begin with. 4.69 (a) The placebo was the harmless leaf. (b) The results support the idea of a placebo effect because the subjects developed rashes on the arm exposed to the placebo (a harmless leaf) simply because they thought they were being exposed to the active treatment (a poison ivy leaf). 4.70 (a) If only the new drug is administered, and the subjects are then interviewed, their responses will not be useful, because there will be nothing to compare them to: How much “pain relief” does one expect to experience? Also, the placebo effect may lead some subjects to report a decrease in pain even if the new drug is ineffective. 
(b) The subjects should certainly not know what drug they are getting; a patient told that she is receiving a placebo, for example, will probably not experience any pain relief. 4.71 Because the experimenter knew which subjects had learned the meditation techniques, he (or she) may have had some expectations about the outcome of the experiment: if the experimenter believed that meditation was beneficial, he may subconsciously rate that group as being less anxious. 4.72 “Double-blind” means that the treatment (testosterone or placebo) assigned to a subject was unknown to both the subject and those responsible for assessing the effectiveness of that treatment. “Randomized” means that patients were randomly assigned to receive either the testosterone supplement or a placebo. “Placebo-controlled” means that some of the subjects were given placebos. Even though these possess no medical properties, some subjects may show improvement or benefits just as a result of participating in the experiment; the placebos allow those doing the study to observe this effect. 4.73 (a) Control: The effects of lurking variables on the response, whether the woman became pregnant, were controlled by controlling the manner in which the treatments were applied: half of the women received acupuncture treatment 25 minutes before embryo transfer and again 25 minutes after the transfer, the other half lay still for 25 minutes after the transfer. Random assignment: Randomly assigning the women to the two treatments should eliminate any systematic bias in assigning the subjects and should also balance out the effects of any lurking variables across the two treatment groups. Replication: Eighty women were assigned to each treatment. These groups are large enough to ensure that differences in the pregnancy rates of the two groups are due to the treatments themselves and not to chance variation in the random assignment. 
(b) The difference in the percent of women who received acupuncture and became pregnant and those who lay still and became pregnant was large enough to conclude that the difference was most likely due to the treatments rather than to chance. (c) We should be cautious about drawing conclusions based on the results of one study. The way this study was designed, it’s possible that the observed difference is due in part to the placebo effect since the women were aware of which treatment they received. If possible, another study should be done in which the control group received a fake acupuncture treatment. 4.74 (a) Researchers randomly assigned participants to diets to make sure that the two groups are as similar as possible before the treatments are administered. (b) The difference in weight loss seen was large enough to conclude that the difference was most likely due to the treatments rather than to chance. (c) Even though the low-carb dieters lost 2 kg more over the year than the low-fat group, this difference was small enough that it could be due just to chance variation in the random assignment, and not to the treatments themselves. 4.75 (a) Write “yawn” on 14 slips of paper and “no yawn” on 36 slips of paper. Mix the slips and draw out 16 of them. These will be the people subjected to the treatment with no yawn seed. The remainder will be in the yawn-seed treatment group. (b) We would conclude that yawning is not contagious. In our 50 random re-assignments, 10 yawns out of 34 people was not at all unusual. 4.76 (a) Dotplots for both groups are shown below. The differences for the patients in the active group follow a distribution with a gap between 1 and 4. We might even stretch a bit and say that the distribution is roughly symmetric with a mean of about 5. The differences ranged from 0 to 10. The distribution of the differences for the patients in the inactive group is skewed to the right with a center slightly above 1. 
Many patients reported no change in their pain ratings (a difference of 0) and the largest difference was 5. This means that those in the active group had a higher mean difference (5) than those in the inactive group (1) and had a distribution with much more variability (range = 10 for the active group and range = 5 for the inactive group). (b) The average difference for the active group is 5.241 and the average difference for the inactive group is 1.095. The difference in these two means (active – inactive) is 4.146. (c) Write each patient’s name and difference in pain score on an index card, shuffle them up and deal them into two piles with 29 in one pile and the rest in the other. The 29 in the first pile will be considered the active group. (d) The Fathom dotplot shows that none of the 50 simulated differences was larger than 4.146. Thus, a difference of 4.146 would be extremely unlikely if both types of magnets provided the same level of relief. We would conclude that the active magnets probably do provide relief for polio patients. 4.77 (a) The blocks are the different diagnoses (e.g. asthma). Within any given diagnosis, we are looking for differences in patients’ health and satisfaction with medical care between doctors and nurse practitioners. (b) A completely randomized design would assign patients to two treatment groups (doctors and nurses) without regard to their diagnosis. This ignores differences between patients with asthma, diabetes, and high blood pressure, which would probably result in a great deal of variability in measures of health and satisfaction in both groups. That would make it harder to compare the effectiveness of nurses and doctors. Blocking will control for the variability in subjects’ responses due to their diagnosis.
This will allow researchers to look separately at health and satisfaction for patients with each of the three diagnoses, as well as to better assess the relative effectiveness of nurses and doctors. 4.78 (a) The blocks are the sexes. The cancer reacts differently to treatments in men and women so we want to eliminate sex as a lurking variable. We want to test all three types of treatments in both men and women. (b) If we used a completely randomized design, we could end up with a treatment that is given much more frequently to one of the two sexes. Then we will not know if any differences we see between that treatment and the others are due to the treatment itself, or the fact that cancer reacts differently in women than men. (c) If the researchers had only 800 men and no women, we would not have to have a block design. We could just randomly assign the treatments to the subjects. Unfortunately, we would only be able to make conclusions about how the treatments work in men. 4.79 (a) The difference in soil fertility among the plots is a potential lurking variable. A completely randomized design could assign one of the varieties of corn to more fertile plots just by chance. If those plots produced extremely high yields, we wouldn’t know if it was due to the corn variety or to soil fertility. Blocking will allow researchers to control for the variability in yield due to soil fertility. (b) The researchers should use the rows as the blocks. All plots in the same row have the same amount of fertility and so are as similar as possible. (c) Let the digits 1-5 correspond to the five corn varieties A-E. Begin with, say, line 110, and assign the letters to the rows from west to east (left to right). Use a new line (111, 112, 113, 114, and 115) for each row. For example, for Block 1, we obtain 3, 4, 4, 4, 1, 3, 3, 2 corresponding to varieties C, D, A, B and lastly E being planted in the first row from west to east (ignoring non-bolded numbers). 
The remaining rows are assigned using this same process. The results of this assignment are shown in the table below.
Block 1: C D A B E
Block 2: A D E C B
Block 3: E C D A B
Block 4: B E D C A
Block 5: D E A C B
Block 6: A D C B E
4.80 (a) A randomized block design would be better in this case to control for the lurking variable of initial weight. A completely randomized design could assign several of the more overweight subjects to the same weight-loss treatment. If these subjects lost much more weight than subjects receiving the other treatments, researchers wouldn’t know if it was due to the treatment or to subjects’ initial weights. (b) The blocks should be based on how overweight the subjects are so that the subjects within a block are as similar as possible. If we block on last name, there will be potentially many differences between the people in a block. (c) Ordered by increasing weight, the five blocks are (1) Williams-22, Deng-24, Hernandez-25, and Moses-25; (2) Santiago-27, Kendall-28, Mann-28, and Smith-29; (3) Brunk-30, Obrach-30, Rodriguez-30, and Loren-32; (4) Jackson-33, Stall-33, Brown-34, and Cruz-34; (5) Birnbaum-35, Tran-35, Nevesky-39, and Wilansky-42. The exact randomization will vary with the starting line in Table D. Different methods are possible; perhaps the simplest is to number the subjects from 1 to 4 within each block, then assign the members of block 1 to a weight-loss treatment, then assign block 2, etc. For example, starting on line 133, we assign 4-Moses to treatment A, 1-Williams to B, and 3-Hernandez to C (so that 2-Deng gets treatment D), then carry on for block 2, etc. 4.81 (a) If all rats from litter 1 were fed diet A and we found diet A to be better, we would not know if this was because of the diet itself, or because that rat strain was different from the other rat strain. This is what it means for the strain and the diet to be confounded.
We cannot separate out the effects of one variable from the effects of the other. (b) A better design would be a randomized block design with the strains as the blocks. In this case, each diet would be given to some rats of each strain. 4.82 (a) Every instructor has their own teaching style. If we assign two instructors to teach using standard technology and two to use multimedia technology, and we find a difference between the two sets of sections, we will not know if the difference is due to the technology or to the instructor. (b) A better design would be to use the instructors as blocks since each instructor teaches two sections. In one randomly chosen section they would use standard technology and in the other they would use multimedia. 4.83 (a) This is a completely randomized design. (b) Have the students be the blocks. Have each student drive the simulator twice – once with a hands-free cell phone and once without, in random order. (c) So that we are not measuring an order effect – that the students are better at the driving simulator the second time no matter what the treatment is. 4.84 (a) This was a matched pairs design because each volunteer got each treatment in a random order. (b) The investigators chose this type of design over a completely randomized design because they recognized that each individual would have different characteristics to their blood vessels. This way they can directly compare the blood flow with and without the bittersweet chocolate under the same conditions (in the same body). (c) It is important to randomize the order of the treatments for each subject so that we are not measuring a time effect. We want to make sure that any effect we see is due to the chocolate and not to the time at which we measured it. 4.85 (a) (b) All subjects will perform the task twice, once in each temperature condition. Randomly choose which temperature each subject works in first by flipping a coin.
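The coin-flip randomization of treatment order described for matched pairs designs (as in 4.83 through 4.85) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `assign_orders`, the condition labels, and the subject labels are hypothetical and do not come from the exercises.

```python
import random

def assign_orders(subjects, conditions=("condition_1", "condition_2"), seed=None):
    """Return {subject: (first, second)}, with the order set by a fair coin flip."""
    rng = random.Random(seed)  # the seed is optional and only aids reproducibility
    orders = {}
    for subject in subjects:
        first, second = conditions
        if rng.random() < 0.5:  # treat this outcome as "tails": swap the order
            first, second = second, first
        orders[subject] = (first, second)
    return orders

# Hypothetical subject labels, for illustration only.
for subject, order in assign_orders(["subject_1", "subject_2", "subject_3"]).items():
    print(subject, "->", order)
```

Every subject still receives both conditions; only the order is left to chance, which is what guards against an order effect.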
4.86 (a) A figure with 6 circular areas is shown below. Randomly assign three of the circles to be treated with additional CO2 and the other three circles to be left untreated. At the end of the study, compare tree growth in the treated and untreated areas. Table D was used to select 3 areas for the treatment, starting at line 104. The first 4 digits are: 5 2 7 1. We cannot use the 7 because it is more than 6. Therefore, we would treat areas 5, 2 and 1. (b) A figure with 3 pairs of circular areas is shown below. For each pair, we randomly assign one of the two areas to receive additional CO2 and the other to be left untreated. Compare tree growth for the treated and untreated area in each pair. A coin was flipped for each pair. If the coin landed heads then the top area was treated and the bottom area was left untreated. If the coin landed tails then the top area was left untreated and the bottom area was treated. 4.87 (a) This experiment confounds gender with deodorant. If the students find a difference between the two groups, they will not know if it is a gender difference or due to the deodorant. (b) A better design would be a matched pairs design. In this case each student would have one armpit randomly assigned to receive deodorant A and the other deodorant B. Have each student rate the difference between their own armpits at the end of the day. 4.88 (a) This experiment confounds the alarm setting (either set or not) with weekend days and week days. It is likely that Justin goes to bed at different times on the weekend than he does on the weekday and this may have an effect on his average wake-up time. (b) A better design would be a randomized block design with the weekend days being one block and the week days being the other block. Justin would randomly assign one weekend day to set his alarm and the other day for having no alarm, and do likewise for the week days.
This allows us to make sure the days in which the alarm is set are similar to the days in which it is not set. 4.89 Take the 50 volunteers and randomly assign them to one of two equal groups (25 volunteers each). Now randomly select one of the two groups. In this group, give men razor A to use. Give razor B to the men in the other group. Measure how close the razor shaves. On the next morning, give the men the other razor and measure how close the razor shaves. Analyze the difference in closeness. 4.90 Take the 30 students and divide randomly into two equal groups (15 students each). Now randomly select one of the two groups. In this group, have music playing while a story is read. In the other group, read a story without music in the background. Test the students for recall. Now reverse the treatments. For those students who had music with the first story, read to them with no background noise and use background music with the others. Test the students for recall. Analyze the difference in recall for the individual students. 4.91 c 4.92 a 4.93 b 4.94 d 4.95 c 4.96 d 4.97 c 4.98 b 4.99 (a) Since we know the weights of seeds of a variety of winged bean are approximately Normal, we can use the Normal model to find the percent of seeds that weigh more than 500 mg. First, we standardize 500 mg: $z = \frac{x - \mu}{\sigma} = \frac{500 - 525}{110} = \frac{-25}{110} = -0.23$. Using Table A, we find the proportion of the standard Normal curve that lies to the left of z = −0.23 to be 0.4090, which means that 1 – 0.4090 = 0.5910 lies to the right of z = −0.23. Thus, 59.1% of seeds weigh more than 500 mg. (b) We need to find the z-score with 10% (or 0.10) to its left. The value z = −1.28 has proportion 0.1003 to its left, which is the closest proportion to 0.10.
Now, we need to find the value of x for the seed weights that gives us z = −1.28: $-1.28 = \frac{x - 525}{110}$, so $-1.28(110) = x - 525$, which gives $x = 525 - 1.28(110) = 384.2$. If we discard the lightest 10% of these seeds, the smallest weight among the remaining seeds is 384.2 mg. 4.100 The scatterplot of IQs for Twin A and Twin B is given below. We see that there is a reasonably strong (r = 0.91) linear relationship between the IQs of the twins. 4.101 If we subtract Twin B – Twin A, and look at a dotplot of the differences, we see the graph given below. Since all but one of the differences are positive, this suggests that in most cases Twin B (the one living in the higher income homes) tends to have a higher IQ. This is confirmed by computing the mean and standard deviation. The mean difference is 5.83 and the standard deviation is 3.93. This says that the twin living in the higher-income home typically has an IQ that is 5.83 points higher than that of the corresponding twin, and that the differences for individual sets of twins typically vary by about 3.93 points from that mean of 5.83. Section 4.3 Exercises, page 269: 4.102 If the study involves random sampling, then we can make inferences about the population from which we sampled. If the study involves random assignment we can make inferences about cause and effect. 4.103 Since this study involved random assignment to the treatments (foster care or institutional care), we can infer cause and effect. Therefore we can conclude that living in foster care in Romania is better than living in an institution. 4.104 Since this study involved random assignment to the treatments (freezer or room temperature), we can infer cause and effect. Therefore we can conclude that storing batteries in the freezer leads to a higher average charge for batteries produced by this company. Also, since the batteries were randomly chosen, we can generalize to the whole population of batteries.
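Returning briefly to the Normal calculations in 4.99, the two answers can be checked with technology instead of Table A. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are chosen here for illustration. Because Table A rounds z to two decimal places, the software values differ slightly from the table-based answers of 0.5910 and 384.2 mg.

```python
from statistics import NormalDist

# Seed weights modeled as Normal with mean 525 mg and standard deviation 110 mg.
seed_weights = NormalDist(mu=525, sigma=110)

# (a) Proportion of seeds weighing more than 500 mg.
prop_above_500 = 1 - seed_weights.cdf(500)

# (b) Weight with 10% of the distribution to its left (the 10th percentile).
cutoff_10th = seed_weights.inv_cdf(0.10)

print(f"Proportion above 500 mg: {prop_above_500:.3f}")  # about 0.590
print(f"10th percentile weight: {cutoff_10th:.1f} mg")    # about 384.0
```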
4.105 Since this study did not involve random assignment to a treatment we cannot infer cause and effect. Also, since the individuals were not randomly chosen, we cannot generalize to a larger population. 4.106 Since this study involved a random sample, we can make an inference about the population. It appears that those who attend religious services regularly have a lower risk of dying younger. But we cannot infer cause and effect. We do not know that attending religious services is the reason for this lower risk. 4.107 Daytime running lights may be effective because they catch the attention of other drivers. As they become more common, they may be less effective at catching the attention of other drivers because people may simply get used to them. We also need to pay attention to how much reduction there is from using daytime running lights. If it’s only a very small amount, the cost of installing them may not be justified. 4.108 The psychologist should not generalize to a team of employees that spend months developing a new product that never works right and is abandoned. She has “put together” a team of students. This suggests that there was no randomization involved. But regardless of that, students are likely to be in a different place in their lives than employees who are on the job for at least several months and likely much longer. Also, the disappointment associated with losing games during an evening is not likely to be equivalent to the disappointment felt after months of hard work. 4.109 Answers will vary. Possible answers include: (a) Many people would consider pricking a finger to be of minimal risk. (b) Fewer people would consider drawing blood from an arm to be of minimal risk. (c) It is unlikely that very many people would consider inserting a tube into the arm that remains there to be of minimal risk. 4.110 Answers will vary.
Possible answers include: (a) A non-scientist will be more likely to consider the subjects as people and not be blinded by the scientific results that might be discovered. (b) You might consider at least two outside members. A member of the clergy might be chosen because we would expect them to help lead the committee in ethical and moral discussions. You might also choose a patient advocate to speak for the subjects involved. 4.111 Answers will vary. Possible answers include: (a) Many would consider this to be an appropriate use of collecting data without participants’ knowledge because the data is, in effect, anonymous. (b) Many would consider this to be appropriate because the meetings are public and the psychologist is not misleading the participants. (c) Most would consider this to be inappropriate because the psychologist is misleading the other participants and attending private meetings. 4.112 Answers will vary. One possible answer is: Any collection of data on minors should be made with parental consent only. This allows the parents to be aware of what is being asked of their children and they can decide if the subject matter is appropriate for their children. 4.113 The responses to the GSS are confidential. The person taking the survey knows who is answering the questions (they were chosen in some random fashion), but will not share the results of individuals with anyone else. 4.114 This describes the anonymous screening. The patient never gives their name, but rather is just assigned a number. No one at the clinic can put the results together with a name because the name was never given. 4.115 In this case the subjects were not able to give informed consent. They did not know what was happening to them and they were not old enough to understand the ramifications in any event. 4.116 Answers will vary. One possible answer is: Yes, providing these potentially life-changing services to some but not all seniors in the study is unethical. 
We can’t withhold important services from some people. 4.117 From the given two-way table of response by gender, find and compare the conditional distributions of response for men alone and women alone. These values are in the table below. To find the conditional distributions, divide each entry in the table by its column total.
Response   Strongly Agree   Agree   Neither   Disagree   Strongly Disagree
Male       14.7%            52.3%   16.9%     11.8%      4.3%
Female     9.3%             38.8%   21.9%     19.3%      10.7%
We also present the same conditional distributions in the bar chart below. From the table and the bar chart we see that men are more likely to view animal testing as justified if it might save human lives: over two-thirds of men agree or strongly agree with this statement, compared to slightly less than half of the women. The percentages who disagree or strongly disagree tell a similar story: 16% of men versus 30% of women. 4.118 The mean is not resistant to outliers. We are told that Cisco Systems stock went up 60,600%. This is clearly an outlier and will greatly influence the mean. Since the outlier is very positive, this will make the mean much higher than the median. In fact, the outlier has such a big influence that it even changes the sign from negative to positive. Chapter Review Exercises (page 271) R4.1 (a) The population is Ontario residents; the sample is the 61,239 people interviewed. (b) The sample size is very large, so if there were large numbers of both sexes in the sample (a safe assumption, since we are told this is a “random sample”), these two numbers should be fairly accurate reflections of the values for the whole population. R4.2 Answers will vary. One possible answer is: (a) Announce in the daily bulletin that there is a survey concerning student parking available in the main office for students who want to respond. Since voluntary surveys are generally responded to only by those who feel strongly about the issue, these results will likely be biased.
(b) Personally interview a group of students having lunch in the center quad. Convenience samples are not generally representative of the population leading to biased results. R4.3 (a) Alphabetically associate each name with a two-digit number: Agarwal = 01, Andrews = 02, …, Wilson = 25. Move from left to right reading pairs of digits until you find three different pairs between 01 and 25. (b) Using the numbers given, choose 17 52 17 80 09 46 23. These numbers correspond to Musselman, Fuhrmann and Smith. R4.4 A stratified random sample would probably be best here; one could select 50 faculty members from each type of institution. If a large proportion of faculty in your state work at a particular class of institution, it may be useful to stratify unevenly. If, for example, about 50% teach at Class I institutions, you may want half your sample to come from Class I institutions. A simple random sample might miss faculty from one particular type of institution, especially if there are not many faculty at that type of institution. A cluster sample might introduce bias. If the clusters are taken to be different schools, faculty at one school may have a different opinion than faculty at other schools depending on their particular student body. R4.5 (a) A potential source of bias related to the question wording is that people may not remember how many movies they watched in a movie theater in the past year. It might help the polling organization to shorten the amount of time that they ask about, perhaps 3 or 6 months. (b) A potential source of bias not related to the question wording is that the poll contacted people through “residential phone numbers.” Since more and more people (especially younger adults) are using only a cellular phone (and do not have a residential phone), the poll omitted these people from the sampling frame. These same people might be more likely to watch movies in a movie theater.
The polling organization should include cell phone numbers in their list of possible numbers to call. R4.6 (a) The data were collected after the anesthesia was administered. Hospital records were used to “observe” the death rates, rather than imposing different anesthetics. (b) One possible confounding variable could be type of surgery. If one anesthesia is used more often with a type of surgery that has a higher death rate anyway, we wouldn’t know if the death rate was higher because of the anesthesia type or the surgery type. R4.7 The experimental units are the potatoes used in the experiment. The explanatory variables are storage time and time from slicing until cooking. There are six treatments: (1) fresh picked and cooked immediately, (2) fresh picked and cooked after an hour, (3) stored at room temperature and cooked immediately, (4) stored at room temperature and cooked after an hour, (5) stored in refrigerator and cooked immediately, (6) stored in refrigerator and cooked after an hour. The response variables are ratings of color and flavor. R4.8 Assign each person a number and using a random number table or technology, choose half to be in one group and the other half to be in the second group. When a person from the first group is called, use the first description. When a person in the second group is called, use the second description. Record their responses. See the figure below. R4.9 (a) The design accounted for the placebo effect by giving some patients a treatment that should have no effect at all, but looks, tastes and feels like the St.-John’s-wort. These people think that they are being treated but, in fact they are not. So if they get better, this would be a case of the placebo effect. (b) The study should be double-blind. The subjects should not know which treatment they are getting so that the researchers can measure how much placebo effect there is. 
But the researchers should also be blinded so that they cannot influence how they measure the results. R4.10 (a) This is a randomized block design. Blocking helps control for the variability in responses due to people’s running habits. (b) We use randomization to make sure that the two groups of people in each block are as similar as possible before the treatments are administered. (c) A difference in rate of infection may have been due to the effects of the treatments, or it may simply have been due to random chance. Saying that the placebo rate of 68% is “significantly more” than the Vitamin C rate of 33% means that the observed difference is too large to have occurred by chance alone. In other words, Vitamin C appears to have played a role in lowering the infection rate of runners. R4.11 (a) Randomly assign 15 students to Group 1 (easy mazes) and the other 15 to Group 2 (hard mazes). Compare the time estimates of Group 1 with those of Group 2. (b) Each student does the activity twice, once with the easy mazes, and once with the hard mazes. Randomly decide (for each student) which set of mazes is used first. Compare each student’s “easy” and “hard” time estimate (for example, by looking at each “hard” minus “easy” difference). (c) The matched pairs design would be more likely to detect a difference because it controls for the variability between subjects. R4.12 (a) This does not meet the requirements of informed consent because the subjects did not know the nature of the experiment before they agreed to participate. (b) All individual data should be kept confidential and the experiment should go before an institutional review board before being implemented. (c) This would allow for inference about cause-and-effect if the students are randomly assigned to the two treatments. AP Statistics Practice Test (page 274) T4.1 c. A census is defined to be measuring all individuals in the population. T4.2 e.
Ignore numbers that are larger than 816 or are duplicate numbers. T4.3 d. In order to infer cause and effect, we must run a well-designed experiment. This was an observational study. T4.4 c. This is the definition of a Simple Random Sample. T4.5 b. By randomly assigning treatments we are attempting to make the different groups look as similar as possible so that we can reduce the likelihood of a confounding variable. T4.6 b. It is very difficult to show cause and effect using observational studies. It is much easier in an experiment where the researcher has control over how the treatments are applied. T4.7 d. By stratifying we can control how many people we survey in each of the different kinds of areas. T4.8 d. Bias in the responses means that you are getting responses that are systematically different from the truth. T4.9 d. This is a completely randomized design because you randomly assign subjects to one of the four groups. There are two factors: Length of ad (30 seconds or 60 seconds) and Repeat (1 time or 3 times). T4.10 b. In a matched pairs design, the two observations in the pair should be as similar as possible. So use a subjective method for pairing the plots. Once the pairs are chosen, then randomly assign the two treatments to the two plots in the pair. T4.11 d. The teachers who responded likely feel more strongly about the issue and shouldn’t be considered to be representative of the entire population of teachers under consideration. T4.12 (a) The experimental units are the acacia trees. The treatments are placing either active beehives, empty beehives or nothing in the trees. The response variable is the damage caused by elephants to the trees. (b) Randomly assign 24 of the acacia trees to have active beehives placed in them, 24 randomly to have empty beehives placed in them and the remaining 24 to remain empty.
To do this, assign the trees numbers from 01 to 72 and use a random number table to pick 24 2-digit numbers in this range. Those trees will get the active beehives. The trees associated with the next 24 2-digit numbers will get the empty beehives and the remaining 24 trees will remain empty. Compare the damage caused by elephants to the trees with active beehives, those with empty beehives and those with no beehives. T4.13 (a) It is not a simple random sample because not all samples were possible. For instance, given their method, they could not have had all respondents from the east coast. (b) One adult was chosen at random to control for lurking variables. Perhaps household members who generally answer the phone have a different opinion than those who don’t generally answer the phone. (c) There was undercoverage in this survey. Those who do not have telephones, or those who have only cell phones were not part of the sampling frame. So their opinions would not have been measured. Since cell-phone-only users tend to be younger, the results of the survey may not accurately reflect the entire population’s opinion. T4.14 (a) Each of the 11 individuals will be a block in a matched pairs design. Each participant will take the caffeine tablets on one of the two-day sessions and the placebo on the other. The order in which they take the caffeine or the placebo is decided randomly. The tapping test is administered at the end of each two-day trial. The results to be compared are the differences between the caffeine and placebo scores on the tapping test. The blocking was done to control for individual differences in dexterity. (b) The order was randomized to control for any possible influence of the order in which the treatments were administered on the subject’s tapping speed. (c) It is possible to carry out this experiment in a double-blind manner.
This means that neither the subjects nor the people who come in contact with them during the experiment (including those who record the number of taps) had knowledge of the order in which the caffeine or placebo was administered.
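The completely randomized assignment described in T4.12 (b), where 72 numbered trees are split into three groups of 24, can also be carried out with technology instead of a random number table. The sketch below is one way to do it under stated assumptions: the function name `assign_trees` and the group labels are hypothetical, and shuffling the labels stands in for reading 2-digit numbers from the table.

```python
import random

def assign_trees(n_trees=72, group_size=24, seed=None):
    """Shuffle tree labels 1..n_trees and split them into three treatment groups."""
    rng = random.Random(seed)  # the seed is optional and only aids reproducibility
    labels = list(range(1, n_trees + 1))
    rng.shuffle(labels)  # every ordering of the labels is equally likely
    return {
        "active beehive": sorted(labels[:group_size]),
        "empty beehive": sorted(labels[group_size:2 * group_size]),
        "no beehive": sorted(labels[2 * group_size:]),
    }

groups = assign_trees()
for treatment, trees in groups.items():
    print(treatment, len(trees), trees[:5], "...")
```

Taking the first 24 shuffled labels, then the next 24, mirrors the table-based procedure: each tree is equally likely to land in any of the three groups.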