2/17/08
International Statistical Literacy Competition of the ISLP
http://www.stat.auckland.ac.nz/~iase/islp/competition

Training package 2

This document contains more activities to start training students for the international statistical literacy competition, together with some articles and some short lessons. You and your students may choose the activities best suited to the students' current level. The questions here are not the “typical” competition questions; they are building blocks that will help students learn to answer the more comprehensive ones. You can also do them here and there, in an idle moment in your class, for fun. Of course, keep doing whatever you are already doing in your data analysis and chance curriculum.

As you know, most of these materials were prepared by others and posted on the ISLP web site. We selected them to convey the building blocks of the competition and also to promote the resources of different countries.

1.- Activity: Sandwich Problem (Warm-Up)
http://www.doe.virginia.gov/VDOE/Instruction/Elem_M/mprobstat.html

Format: Large Group
Objectives: Participants will develop an appreciation for graphical representations of data and the need for statistics.
Materials: Sandwich Problem Narrative Activity Sheet and Sandwich Problem Graph Activity Sheet
Time Required: 10 minutes

Directions:
1. Without informing the participants, break them into two groups (front of the audience versus back of the audience).
2. Distribute the two sandwich problem activity sheets FACE DOWN: the graphical representation of the sandwich data to one group, and the narrative version of the data to the other group.
3. Tell the groups that this is a test on the sandwich data and that you are going to keep track of the people who raise their hand first to answer the questions. Then ask them to turn over their papers and respond to the following questions.
Keep track of those who answer first, expecting that those with the graph will respond first. Ask the following three questions, calling on the first person who raises a hand to answer each one.
   1. Which sandwich was preferred by more people than any other sandwich?
   2. Which sandwich types were preferred by only two people?
   3. Which sandwich type did Oliver prefer?
4. After pointing out that one part of the room was doing better than the other in answering the questions, show the entire group a copy of both types of data. This is a good place to begin the discussion of why statistics is important in this “information age.” Distribute the extra copies of the activity sheets so that each participant has a copy of both the graph and the narrative sandwich problem activity sheets.

Virginia Department of Education, Sandwich Problem

The Lunch Bunch’s Favorites
Laura had peanut butter and jelly. Kenny had plain jelly. Oliver also had plain jelly. Katie and David had plain peanut butter. Oh, I forgot to mention that Steven, Isabel, and Sam also had peanut butter and jelly. Kristen had peanut butter and fluff. Mariko had plain fluff while Sally and Ty had jelly and fluff.

[Graph: The Lunch Bunch’s Favorites. Vertical axis: Number Who Preferred Sandwich. Horizontal axis: Sandwich Types (Plain Jelly, Plain Fluff, Peanut Butter & Jelly, Plain Peanut Butter, Peanut Butter & Fluff, Jelly & Fluff). Each student's name is stacked above his or her preferred sandwich type.]

2.- Activity: Sixth Grade Mystery Data
http://www.doe.virginia.gov/VDOE/Instruction/Elem_M/mprobstat.html

Format: Small Groups; Whole Group
Objectives: Participants will develop an understanding of the relationship between the question and the analysis of the data.
Materials: Copies of Sixth Grade Mystery Data, Copies of Questions, Copies of Graphs A, B, and C
Time Required: 30 minutes

Directions:
1. Divide the participants into small groups of four to five.
Give each group a copy of the Sixth Grade Mystery Data and a copy of the questions to be answered. Tell them they have 15 minutes to answer the questions and to discuss their solutions.
2. After the small groups have completed the task, have the entire group share their solutions and how they arrived at those solutions. Focus the discussion on the relationship of the question to the data.
3. Discuss Graph A (Ice Cream Preferences). Have participants share questions about this graph that could be asked of K-2 students.
4. Discuss Graph B (Number of Cavities). Have participants share questions about this graph that could be asked of students in grades 3-5.
5. Discuss Graph C (Relationship of Height to Age). Have participants share questions about this graph that could be asked of students in grades 6-8.

Sixth Grade Mystery Data
Look at the graphs on the next pages. Each graph shows something about a classroom of sixth graders.
1. Which of the five graphs do you think shows:
   a. The number of cavities the sixth graders have?
   b. The number of people in the sixth graders’ families?
   c. The ages of the sixth graders’ mothers?
   d. The heights of the sixth graders in inches?
2. Why do you think the graph you picked for d is the one that shows the heights of sixth graders? Why do you think the other graphs don’t show the sixth graders’ heights?
3. One of the graphs was not selected to answer question one above. What do you think this data display might represent? Why?
Sixth Grade Mystery Data
[Graphs 1 to 5: five dot plots in which each x marks one student. The horizontal scales are: Graph 1, small whole numbers starting at 0; Graph 2, 58 to 73; Graph 3, 46 to 58; Graph 4, 0 to 8; Graph 5, 26 to 48.]
[Graph A: Ice Cream Preferences. Picture graph with the categories Vanilla, Chocolate, Mint Chip, Cookies & Cream, and Other; one symbol = 1 student.]
[Graph B: Number of Cavities. X marks stacked above the values 0 to 6.]
[Graph C: Relationship of Height to Age. Height in Inches (scale 20 to 75) against Age in Years (scale 7 to 20).]

3.- Activity: Graph Detective
http://www.doe.virginia.gov/VDOE/Instruction/Elem_M/mprobstat.html

Format: Pairs or small groups
Objectives: Participants will apply the skills they learned from the previous activities to analyze graphs for missing attributes.
Materials: Set of graphs with missing information
Time Required: 20 minutes

Directions:
1. To wrap up the session, give each group a set of three graphs and conclusions and ask them to discuss whether each conclusion is accurate and what factors about the graph may have led to inaccurate conclusions.

Key factors in the misleading graphs include the following:
Graph 1: Missing years. A trend cannot be seen by examining only two years. USA Today uses a lot of these graphs to compare two years. This graph compares unemployment in 1992 and 1998. Although 1998 may be lower than 1992, we should not assume that 1999 will be lower than 1998 or that the trend between 1992 and 1998 was downward. We need to see the other years to determine whether there is a trend over time or whether 1998 is a fluctuation.
Graph 2: Broken scales. Graphs often imply larger differences than are true because the scale is broken.
In this graph, because the scale is broken, it appears that there were twice as many births in July as there were in June.
Graph 3: Size distortions. Picture graphs using objects to demonstrate change can be misleading because the size of the objects may not truly represent the relative numerical value. This graph shows the relative earnings of men and women. Women earn approximately 70¢ for every dollar that men earn; yet the graph implies that men earn nearly three times as much as women because of the relative size of the three-dimensional bars.

Would You Draw the Same Conclusion?

[Graph 1: Unemployment Rates (1992 - 1998). Unemployment Rate (scale 0 to 10) shown for the years 1992 and 1998 only.]
Conclusion: Unemployment rates have fallen steadily since 1992 and will continue to fall in the near future.
Group’s Conclusion:

[Graph 2: Number of Births per Month. Births (Thousands) on a broken scale from 260 to 340, by Month (axis labelled Jan, Mar, May, Jul, Sep, Nov).]
Conclusion: The number of births doubled between June and July.
Group’s Conclusion:

[Graph 3: Earning Power of Men versus Women. Two three-dimensional dollar-sign bars: Women, 70 cents; Men, 100 cents.]
Conclusion: Men make almost three times as much as women. The bar for men has nearly three times as much volume.
Group’s Conclusion:

4.- Activity: What’s In the Bag?

Format: Pairs
Objectives: Participants will conduct simple probability experiments to predict outcomes.
Materials: Paper bags, color tiles, recording sheet
Time Required: 20 minutes

Directions:
1. Organize participants into pairs.
2. Give each pair a paper bag with 10 color tiles inside (7 blue and 3 red).
3. Pairs will pull out one tile (without looking into the bag) and record the color on their recording sheet. The tile should be returned to the bag. Each pair will pull out tiles following this process a total of ten times.
4. As pairs finish, have them record their results on a class graph at the front of the room.
5. When the class data is complete, have pairs look at the total number of blue and red tiles pulled and then make their prediction about the number of blue and red tiles in the bag.
6. After everyone has had a chance to predict, discuss the predictions and the reasons for them.
7. Have pairs look into their bags and record the actual results.
8. Discuss why their predictions may have differed from the actual number. What was helpful in making their predictions?
9. Repeat using bags with four colors of tiles. Discuss differences noted.

What’s In the Bag?
http://www.doe.virginia.gov/VDOE/Instruction/Elem_M/mprobstat.html

Pick one tile from the paper bag. Record the color on the table below. Put the tile back into the bag. Choose another tile. Repeat this process 9 more times.

   Blue          Red
   ________      ________

My prediction: There are ________ blue tiles. There are ________ red tiles.
Actual results: blue tiles ________  red tiles ________

Let’s try again with four colors!

   Blue          Red           Yellow        Green
   ________      ________      ________      ________

My prediction:                        Actual results:
There are ________ blue tiles.        blue tiles ________
There are ________ red tiles.         red tiles ________
There are ________ yellow tiles.      yellow tiles ________
There are ________ green tiles.       green tiles ________

Tree Diagrams
You are trying to decide which pizza to order for dinner. Your choices for crust are: regular, thin, and deep dish. You only want one topping and will choose either pepperoni or sausage. Construct a tree diagram to show the possibilities you have from which to choose one crust with one topping. How would your sample space change if you added bacon as a third topping choice?

5.- Activity: The Real Meal Deal

Format: Pairs
Objective: Participants will use a tree diagram and the Fundamental Counting Principle to determine the sample space of an event.
Materials: Real Meal Restaurant Menu, chart paper for tree diagrams
Time Required: 20 minutes

Background: The Fundamental Counting Principle is a method for finding the number of ways that two or more events can occur by multiplying the number of ways that each event can occur. The Principle states that, if successive choices are made, then the total number of choices is the product of the number of choices at each stage. For example, if you have 3 shirts and 2 pairs of jeans, then you have a total of 6 different outfits to wear. Each shirt may be worn with each pair of jeans, so there are 3 shirts times 2 pairs of jeans for a total of 6 outfits. A tree diagram is a visual way to see all of the outcomes.

   SHIRTS         JEANS           OUTFITS (OUTCOMES)
   plaid shirt    blue denims     plaid shirt, blue denims
                  black denims    plaid shirt, black denims
   red shirt      blue denims     red shirt, blue denims
                  black denims    red shirt, black denims
   blue shirt     blue denims     blue shirt, blue denims
                  black denims    blue shirt, black denims

Directions:
1. Based on the menu of the Real Meal Restaurant, participants will use the Fundamental Counting Principle to determine the number of different meals that can be served.
2. Based on customer wishes, participants will determine and display the choices using a tree diagram.

REAL MEAL RESTAURANT
SANDWICHES: Ham and Turkey Club Rachael on Rye Sliced BBQ Pork Hamburger Deli Cold Cut Special BLT
FRENCH FRIES: small medium
SALADS: Garden Salad Chef Salad Cobb Salad
DRESSINGS: Ranch French Creamy Italian
BEVERAGES: Soft Drinks: Coke Pepsi Sprite Tea: Coffee: Milk: regular small large medium large medium medium large large low-fat

1. How many possible meals can be served at the Real Meal, choosing only one item from each category?
2. How many choices are there if a customer wants the following:
   a. a soft drink, sandwich, and fries? Display the choices with a tree diagram.
   b. a sandwich, fries, and milk? Display the choices with a tree diagram.
   c. salad with dressing and tea?
Display the choices with a tree diagram.
   d. a sandwich, salad with dressing, and coffee? Display the choices with a tree diagram.

6.- The JellyBlubber Colony

Objective: This activity introduces the Simple Random Sample (SRS) to students, and shows why this process helps to get an unbiased sample statistic. Relying on our perceptions can often be deceiving. In this exercise students are asked to determine the average length of a jellyblubber (a recently discovered marine species) using a variety of techniques. The student will learn that, of the methods tried, a Simple Random Sample (SRS) gives the most trustworthy estimate of this parameter, and that intuition can be deceptive.
Materials: One ‘The JellyBlubber Colony’ worksheet and one ruler per student.
Time: 1 period
Instructions: Pass out the worksheet upside down. Ask students not to look at the sheet until they are instructed to do so. Tell the students a story about the recently discovered colony of jellyblubbers, a new marine species, and that our task is to try to determine the average length (measured horizontally) of a blubber.

Allow the students to look at the Colony for five seconds. They will then estimate the average length of a blubber. The teacher plots the students’ guesses as a dotplot, then leads the entire class in a discussion of the dataset.

The students are now told to choose a representative sample of 10 blubbers. Once they have made their choice, they measure the length of each blubber and calculate the mean length. The teacher plots these values on a new dotplot, followed by a whole-class discussion of the dataset.

Now each student takes an SRS of 10 blubbers, as follows. Each blubber is numbered from 1 to 100. The students generate 10 random numbers in the range 1 to 100 from a random number table and calculate the mean length of those ten blubbers. The teacher plots these means on a third dotplot. Each dotplot must have the same scale for comparison purposes.
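The SRS step described above can be previewed with a short simulation. The following Python sketch is only an illustration under stated assumptions: it uses the 100 lengths from the worksheet table, and the function names, the fixed seed, the number of repetitions, and the use of length-proportional picking as a stand-in for a biased selection method are all hypothetical choices, not part of the original activity.

```python
import random
import statistics

# Lengths of blubbers 1..100, copied from the worksheet table.
LENGTHS = [
    9, 5, 9, 33, 22, 5, 10, 40, 20, 10,      # blubbers 1-10
    12, 5, 8, 41, 5, 32, 5, 10, 21, 20,      # 11-20
    34, 5, 32, 5, 9, 40, 5, 49, 9, 41,       # 21-30
    5, 20, 43, 7, 20, 10, 5, 14, 15, 10,     # 31-40
    41, 5, 17, 15, 40, 5, 30, 8, 5, 40,      # 41-50
    35, 37, 9, 25, 5, 10, 9, 45, 40, 8,      # 51-60
    20, 25, 10, 8, 37, 8, 20, 13, 34, 42,    # 61-70
    40, 40, 40, 30, 20, 7, 5, 25, 17, 8,     # 71-80
    8, 5, 13, 42, 10, 5, 10, 27, 30, 10,     # 81-90
    42, 6, 10, 25, 7, 40, 8, 5, 40, 20,      # 91-100
]


def srs_mean(rng, k=10):
    """Mean length of a simple random sample of k distinct blubbers."""
    return statistics.mean(rng.sample(LENGTHS, k))


def size_biased_mean(rng, k=10):
    """Mean of k blubbers picked with probability proportional to length.

    This is an assumed model of a biased selection method (longer
    blubbers are easier to hit); the same blubber may be picked twice.
    """
    return statistics.mean(rng.choices(LENGTHS, weights=LENGTHS, k=k))


def average_of_means(sampler, repetitions=2000, seed=1):
    """Average the sample mean over many repeated samples."""
    rng = random.Random(seed)
    return statistics.mean(sampler(rng) for _ in range(repetitions))


population_mean = statistics.mean(LENGTHS)
srs_average = average_of_means(srs_mean)
biased_average = average_of_means(size_biased_mean)

print("population mean:         ", population_mean)
print("average of SRS means:    ", round(srs_average, 2))
print("average of biased means: ", round(biased_average, 2))
```

Under these assumptions the SRS means centre on the population mean, while the length-proportional picks centre well above it. Plotting the individual sample means, rather than only their averages, would give the dotplots the activity asks for.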
The class discusses the differences among the distributions: location, spread, outliers, etc. The actual average length of a blubber is 19.4 cm. Which method gave the best estimate? How accurate was it? How much spread was there around the correct value?

Discussion: A student decides to generate a random sample by closing her eyes and pointing at the sheet of blubbers randomly. She chooses the blubber to which her finger is closest. Comment on this method of generating an SRS.

Extension: A similar exercise can be conducted by putting a number of pieces of string of varying lengths into a bag and having students pull out a ‘random sample’ of lengths of string. Since a longer piece is more likely to be selected than a shorter one, the sample generated in this fashion is likely to give a biased result, one that is too large.

References: Statistics: Concepts and Controversies, 4th Edition, David S. Moore

The JellyBlubber Colony

   Blubber #    Length (in order of blubber number)
     1 - 10     9, 5, 9, 33, 22, 5, 10, 40, 20, 10
    11 - 20     12, 5, 8, 41, 5, 32, 5, 10, 21, 20
    21 - 30     34, 5, 32, 5, 9, 40, 5, 49, 9, 41
    31 - 40     5, 20, 43, 7, 20, 10, 5, 14, 15, 10
    41 - 50     41, 5, 17, 15, 40, 5, 30, 8, 5, 40
    51 - 60     35, 37, 9, 25, 5, 10, 9, 45, 40, 8
    61 - 70     20, 25, 10, 8, 37, 8, 20, 13, 34, 42
    71 - 80     40, 40, 40, 30, 20, 7, 5, 25, 17, 8
    81 - 90     8, 5, 13, 42, 10, 5, 10, 27, 30, 10
    91 - 100    42, 6, 10, 25, 7, 40, 8, 5, 40, 20

[Blank dotplot axes for recording the class results, each scaled from 0 to 40 in steps of 5.]

[Picture: The JellyBlubber Colony, a drawing of the numbered blubbers.]

Virginia Department of Education, Real Meal Restaurant Activity Sheet

7.- HEIGHTS OF SINGERS IN THE NY CHORAL SOCIETY IN 1979
http://lib.stat.cmu.edu/DASL/Stories/SingerHeights.html

Description: The data set below contains the heights, self-reported to the nearest inch, of singers in the New York Choral Society in 1979. Voice parts in order from highest pitch to lowest pitch are Soprano, Alto, Tenor, Bass. The first two are female voices and the last two are male voices. The original dataset included two divisions for each voice part. The dataset here reports only soprano 1, alto 1, tenor 1, and bass 1 from the original dataset.

Number of cases: 39

Variable Names:
1. Soprano: Heights of sopranos (in inches)
2. Alto: Heights of altos (in inches)
3. Tenor: Heights of tenors (in inches)
4. Bass: Heights of basses (in inches)

The Data:
Soprano Alto 64 65 69 62 62 72 66 68 71 65 67 66 60 67 76 61 63 74 65 67 71 66 66 66 65 63 68 63 72 67 67 62 70 65 61 65 62 66 72 65 64 70 68 60 68 65 61 73 63 66 66 65 66 68 62 66 67 65 62 64 66 70
Tenor 72 70 72 69 73 71 72 68 68 71 66 68 71 73 73 70 68 70 75 68 71
Bass 62 65 63 65 66 65 62 65 66 65 61 65 66 65 62 65 64 63 65 69 61 66 65 61 63 64 67 66 68 70 74 70 75 75 69 72 71 70 71 68 70 75 72 66 72 70 69

One can examine how height varies across voice range, or make comparisons of sopranos and altos and separate comparisons of tenors and basses. There is some evidence of the shortest singers reporting greater heights, possibly to avoid standing in the front row in a concert.
Activity suggestions
Ask students to construct an appropriate graph to compare the heights of the different groups.
Image: Side-by-side boxplots, one for each voice part, provide a graphical display of these data. This could be one way to summarize them. An alternative is to summarize with histograms.

8.- FIRE SEASON STATISTICS
Give the students the handout given separately in Appendix A.

USDA Forest Service
Fire Season Statistics

Objective
To extrapolate information and further questions for investigation from fire season statistics.

Materials for each team
* copy of "Fire Season Statistics" student handout
* calculator

Procedure
1. Having information about previous fire seasons can help land managers look for areas they may need to monitor in coming years. In this activity, students will look at data regarding wildland fire totals for the year 2000 as reported by the National Interagency Fire Center.
2. Organize students into groups and provide each group with a copy of the "Fire Season Statistics" student handout and a calculator.
3. Have each group discuss the data as it is currently presented. What information is conveyed? What general conclusions can students draw? What, if any, patterns do they see? How might the data be reconfigured to illustrate different aspects of the data set? (One avenue of inquiry is suggested in the questions section on the student handout.)
4. Have groups decide how to present the information in a meaningful way. Students might consider tables, bar graphs, pie graphs, or some other way to represent the data. Based on what they find in the data, what kind of campaign would they design to reduce wildland fires?
5. What additional information would students want in this data set? What points would they like clarified?
6. As an extension, have students look at and compile weekly situation reports for each month published online by the National Interagency Coordination Center.
How many fires and acres burned were there each month? Where did the largest fires occur? How do each month's totals compare to the prescribed fire totals for that month? Find the reports at: http://www.cidi.org/wildfire/ Or, better yet, compile statistics from South Africa.

Activity Answer
A first step could be to collapse the data into total fires and total acres. Students' data analysis will differ depending upon what they choose to highlight. Students will likely have a number of additional questions prompted by the data set, such as:
* What is each agency's jurisdiction? Is there any overlap in fires reported?
* What were the data collection strategies? Were they the same for all agencies?
* What systems were in place to ensure data reliability?
* Specifically, what kind of fires are included in each of the categories?
* What kind of fires are included under each of the other human-caused fire categories?
* Why was state/private data not included for prescribed fires? Is that data available elsewhere?

9.- Newspaper article
Give your students a sheet with the newspaper article and the questions given after its text.

Teen boys forget whatever it was
http://ink.news.com.au/mercury/mathguys/articles/1997/970421a1.htm
By FRANK ZELLER in London

TEENAGE girls can concentrate more than three times longer than boys, according to a new study on academic performance in British high schools.

While the average boy will become distracted from an independent task after only four minutes, girls have an attention span of 13 minutes, researchers have found.

The nine-minute gap helps explain why girls now outperform boys in virtually all high school examinations in Britain.

The researchers found that most girls happily worked on their own when asked to do so by their teachers, while boys tended to become distracted within minutes.
But boys achieved much better results where the teacher took a leading role and actively involved them in the learning process, according to the book to be published in the United Kingdom this week, Can Boys Do Better?

One of the authors, former headmaster Peter Downes, warned the decline in male academic performance had to be halted with a new approach to teaching.

"Everybody has come to terms with the fact that boys are doing less well than they should," he said. "If we pretend the problem is not there, we are in danger of producing generations of disaffected young males for whom the education process is debilitating."

A changing job market was increasingly challenging boys to learn "feminine skills" as traditional manual jobs were replaced by more technology.

"The need for men to be strong and physical - the hunter and provider - is greatly reduced," Mr Downes said. "Although the male still has his biological function, we have got to feminise the male, in a way, to give him a new role in the world of the future."

Questions for learners:
1. Summarise the results reported in this article and speculate on their significance for students and teachers.
2. Speculate on how the researchers might have calculated the average concentration time.
3. What other factors influence examination results for girls and boys?

Teacher
This article could cause debate between girls and boys in a mathematics class (or a social science class). Discussion could focus on how the concentration time was calculated and on the "cause-effect" claim in the third paragraph. There are obviously many factors that influence both concentration time and examination results. Students should discuss these and then decide how significant this article is.

It is important that students see the distinction between an observational study and an experiment, and between a bad and a good experiment. The article summarizes an observational study, from which statements of cause and effect cannot be made.
One may observe a relation, but this does not imply causation. Besides distraction being more prevalent among boys, many other factors could influence both how easily boys are distracted and how well they perform. The same goes for girls. This article is a good opportunity to discuss these issues with students.

10.- Back to 2007 Community Survey (Statistics South Africa, interactive data)
http://www.statssa.gov.za/

1.- Is the unemployment rate higher among women or among men in KwaZulu-Natal?
2.- Does ethnicity have any relation with labour market status? Show numerically and graphically.
3.- Does gender have any relation with labour market status? Show numerically and graphically.
4.- If you had to compare the labour market status of KwaZulu-Natal and the other provinces of South Africa, how would you present the numbers of the Community Survey table given above?

11.- Maps of South Africa (Statistics South Africa, Geography)

[Maps: distribution of the population of South Africa according to selected characteristics.]

The maps above show the distribution of the population of South Africa according to some characteristics. Do you see any relations among the three variables described? Explain what you are basing your conclusions on. Which graphs are you using to come up with a conclusion?

12.- The problem of False Positives
http://www.skeptics.org.nz/SK:VIEWARTICLE:1001.7019:waDeptTOC.1, A1177

Mass screening programmes have generated considerable controversy in this country. But these programmes have inherent limitations, which need to be better understood.

In 1996 the Skeptical Inquirer published an article by John Allen Paulos on health statistics. Among other things this dealt with screening programmes. Evaluating these requires some knowledge of conditional probabilities, which are notoriously difficult for humans to understand. Paulos presented his statistics in the form of a table; a modified version of this is shown in the table below.
                   Have the condition   Do not have the condition      Totals
   Test Positive                  990                       9,990      10,980
   Test Negative                   10                     989,010     989,020
   Totals                       1,000                     999,000   1,000,000
Table 1

Of the million people screened, one thousand (0.1%) will have the condition. Of these, 1% (10) will falsely test negative and 99% (990) will correctly test positive. So far it looks good, but 1% of those who do not have the condition (9,990) also test positive, so that the total number who test positive is 10,980. Remember that this is a very accurate test. So what is the chance that a random person who is told by their doctor that s/he has tested positive actually has the condition? The answer is 990/10,980, or 9%. In this hypothetical case the test is 99% accurate, a much higher accuracy rate than any practical test available for mass screening. Yet over 90% of those who test positive have been diagnosed incorrectly.

In the real world (where tests must be cheap and easy to run) a very good test might achieve 10% false negatives and positives. To some extent the total percentage of false results is fixed, but screening programmes wish to reduce the number of false negatives to the absolute minimum; in some countries they could be sued for failing to detect the condition. This can only be done by increasing the chance of false positives or inventing a better test. Any practical test is likely to have its results swamped with false positives.

Consider a more practical example where the base rate is the same as previously, but there are 10% false negatives and positives, i.e. the test is 90% accurate. Again 1 million people are tested (see Table 2 below).

                   Have the condition   Do not have the condition      Totals
   Test Positive                  900                      99,900     100,800
   Test Negative                  100                     899,100     899,200
   Totals                       1,000                     999,000   1,000,000
Table 2. Base rate is 0.1%. Level of false positives=10%; level of false negatives=10%

This time the total number testing positive is 100,800. But nearly one hundred thousand of them do not have the condition.
The chance that any person who tested positive actually has the condition is 900/100,800, or a little under 1%. This time, although the test is right for 90% of the people tested, 99% of those who test positive have been diagnosed incorrectly.

In both these cases the incidence of the condition in the original population was 0.1%. In the first example the screened population testing positive had an incidence two orders of magnitude higher than the original population, but this was unrealistic. In the second example those testing positive in the screened population had an incidence one order of magnitude higher than the general population.

This is what a good mass screening test can do: raise the incidence of the condition by one order of magnitude above the general population. However, any person who tests positive is unlikely to have the condition, and all who test positive must now be further investigated with a better test.

So screening programmes should not be aimed at the general population, unless the condition has a very high incidence. Targeted screening does not often improve the accuracy of the tests, but it aims at a sub-population with a higher incidence of the condition. For example, screening for breast cancer (a relatively common condition anyway) is aimed at a particular age group.

Humans find it very difficult to assess screening, and doctors (unless specifically trained) are little better than the rest of the population. It has been shown fairly convincingly that data are most readily understood when presented in tables as above. For example, the data in Table 3 were presented to doctors in the UK. Suppose they had a patient who screened positive; what was the probability that that person actually had the condition? When presented with the raw data, 95% of them gave an answer that was an order of magnitude too large.
When shown the table (modified here for consistency with previous examples) about half correctly assessed the probability of a positive test indicating the presence of the disease.

                   Have the condition   Do not have the condition      Totals
   Test Positive                8,000                      99,000     107,000
   Test Negative                2,000                     891,000     893,000
   Totals                      10,000                     990,000   1,000,000
Table 3. Base rate is 1%. False negative rate=20%; False positive rate=10%

This time the total number who test positive is 107,000. But nearly one hundred thousand of them do not have the condition. The chance that any person who tested positive actually has the condition is 8,000/107,000, or about 7.5%. Now remember that nearly half the UK doctors, even when shown this table, could not deduce the correct result.

If your doctor suggests you should have a screening test, how good is this advice? Patients are supposed to be supplied with information so that they can make an informed decision. Anybody who presents for a screening test in NZ may find it impossible to do this. My wife attempted to get the data on breast screening from our local group. She had to explain the meaning of “false negative”, “false positive” and “base rate”. The last is a particularly slippery concept. From UK figures the chance of a 40-year-old woman developing breast cancer by the age of 60 is nearly 4% (this is the commonest form of cancer in women). However, when a sample of women in the 40-60 age group are screened, the number who should test positive is only about 0.2%. Only when they are screened each year will the total of correct positives approach 4%.

The number of false positives (again using overseas figures) is about 20 times the number of correct positives, so a woman in a screening programme for 20 years will have a very good chance of at least one positive result, but a fairly low probability of actually having breast cancer. I do not think NZ women are well prepared for this.
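The counts in Tables 1 to 3 follow directly from the base rate and the two error rates, so the arithmetic can be checked with a few lines of code. The Python sketch below is a minimal illustration, not part of the original article; the function names and parameter names are hypothetical, and counts are rounded to whole people.

```python
def screening_table(population, base_rate, false_negative_rate, false_positive_rate):
    """Return (true positives, false positives, false negatives, true negatives)."""
    have = round(population * base_rate)            # people with the condition
    have_not = population - have                    # people without it
    false_negatives = round(have * false_negative_rate)
    true_positives = have - false_negatives
    false_positives = round(have_not * false_positive_rate)
    true_negatives = have_not - false_positives
    return true_positives, false_positives, false_negatives, true_negatives


def chance_of_condition_given_positive(population, base_rate,
                                       false_negative_rate, false_positive_rate):
    """Share of all positive tests that belong to people who have the condition."""
    tp, fp, _, _ = screening_table(population, base_rate,
                                   false_negative_rate, false_positive_rate)
    return tp / (tp + fp)


# The three scenarios described in the text: (base rate, false neg., false pos.)
scenarios = {
    "Table 1": (0.001, 0.01, 0.01),
    "Table 2": (0.001, 0.10, 0.10),
    "Table 3": (0.01, 0.20, 0.10),
}

for name, (base, fn_rate, fp_rate) in scenarios.items():
    tp, fp, fn, tn = screening_table(1_000_000, base, fn_rate, fp_rate)
    share = chance_of_condition_given_positive(1_000_000, base, fn_rate, fp_rate)
    print(f"{name}: {tp + fp:,} test positive, {tp:,} of them have the condition "
          f"({share:.1%})")
```

Run as written, this should reproduce the totals quoted in the text (10,980, 100,800 and 107,000 positive tests) and the corresponding chances of roughly 9%, a little under 1%, and about 7.5%. Changing the base rate while holding the error rates fixed shows why targeting a higher-incidence group matters more than small gains in test accuracy.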
The Nelson group eventually claimed that the statistics my wife wanted on NZ breast cancer screening did not seem to be available. But, they added, “we (the local lab) have never had a false negative.” From the recent experience of a close friend, who developed a malignancy a few months after a screening test, we know this to be untrue. What they meant was that they had never seen a target and failed to diagnose it correctly as a possible malignancy requiring biopsy. This may have been true but it is no way to collect statistics.

Screening for breast cancer is generally aimed at the older age group. In the US a frequently quoted figure is that a woman now has a one in eight chance of developing breast cancer, which is higher than in the past. This figure is correct but it is a lifetime incidence risk; the reason it has risen is that on average women are living longer. The (breast cancer) mortality risk for women in the US is one in 28. A large number who develop the condition do so very late in life and die of some other condition before the breast cancer proves fatal.

Common Condition

Breast cancer is a relatively common condition and would appear well suited for a screening programme. The evaluation of early programmes seemed to show they offered considerable benefit in reducing the risk of death. However, later programmes showed less benefit. In fact, as techniques improved, screening apparently became less effective. This caused some alarm, and a study published in 1999 by the Nordic Cochrane Centre in Copenhagen looked at programmes worldwide and attempted to better match screened populations with control groups. The authors claimed that women in screening programmes had no better chance of survival than unscreened populations. The reaction of those running screening programmes (including those in NZ) was to ignore this finding and advise their clients to do the same.
If there are doubts as to the efficacy of screening for breast cancer, there must be greater doubts about screening for other cancers in women, for other cancers are rarer. Any other screening programme should be very closely targeted. Unfortunately the risk factors for a disease may make targeting difficult. In New Zealand we have seen cases where people outside the target group have asked to be admitted into the screening programme, so they also “can enjoy the benefits”. Better education is needed.

Late-onset diabetes is more common among Polynesians than among New Zealanders in general, and Polynesians have very sensibly accepted that this is true. Testing Polynesians over a certain age for diabetes makes sense, particularly as a test is quick, cheap and easy to apply. Testing only those over a certain body mass would be even more sensible but may get into problems of political correctness.

Cervical cancer is quite rare so it is a poor candidate for a mass screening programme aimed at a large percentage of the female population. The initial screening is fast and cheap. If the targeted group has an incidence that is one order of magnitude higher than the general population, then the targeting is as good as most tests. Screening the whole female population for cervical cancer is a very dubious use of resources.

My wife and I were the only non-locals travelling on a bus in Fiji when we heard a radio interview urging "all women" to have cervical screening done regularly. The remarkably detailed description of the test caused incredible embarrassment to the Fijian and Indian passengers; we had the greatest difficulty in concealing our amusement at the reaction. The process was subsidised by an overseas charity. In Fiji, where personal hygiene standards are very high, and (outside Suva) promiscuity rates pretty low, and where most people pay for nearly all health procedures, this seemed an incredibly poor use of international aid.
Assessment Impossible

Screening for cervical cancer has been in place in NZ for some time. Unfortunately we cannot assess the efficacy of the programme because proper records are not available. An attempt at an assessment was defeated by a provision of the Privacy Act.

The recent case of a Gisborne lab was really a complaint that there were too many false negatives coming from a particular source. However, this was complicated by a general assumption among the public and media that it is possible to eliminate false negatives. It should be realised that reducing false negatives can only be achieved by increasing the percentage of false positives. As can be seen from the data above, it is false positives that bedevil screening programmes. Efforts to sue labs for false negatives are likely to doom any screening programme. To some extent this has happened in the US, with many labs refusing to conduct breast X-ray examinations, as the legal risks from the inevitable false negatives are horrendous.

Large sums are being spent in NZ on screening programmes; taxation provides the funds. Those running the programmes are convinced of their benefits, but it is legitimate to ask questions. Is this spending justified?

Some Post-Scripts:

January 15 2000 New Scientist P3: Ole Olsen & Peter Gøtzsche of the Nordic Cochrane Centre in Copenhagen published the original meta-analysis of seven clinical trials in 2000. The resulting storm of protest, particularly from cancer charities, caused them to take another look. They have now reached the same conclusion: mammograms do not reduce breast cancer deaths and are unwarranted.

October 2001: In recent TV interviews some people concerned with breast cancer screening in NZ were asked to comment on this meta-analysis. Once again the NZ commentators stated firmly that they were certain that screening programmes in NZ "had saved lives" but suggested no evidence to support their view.
March 23 2002 New Scientist P6: The International Agency for Research on Cancer (IARC), funded by the WHO, claims to have reviewed all the available evidence. They conclude that screening women below the age of 50 is not worthwhile. However, screening women aged 50-69 every two years reduces the risk of dying of breast cancer by 35%. According to New Scientist, the figures from Britain are that of 1000 women aged 50, 20 will get breast cancer by the age of 60 (2%); of these, six will die. Screening every two years would cut the number of deaths to four. [It is obvious that these are calculations, not the result of a controlled study!] The IARC states that organised programmes of manual breast examination do not bring survival benefits (they call for more studies on these). If NZ has similar rates then screening programmes aimed at 50-60 year old women should save approximately 50 lives per annum.

Teacher’s notes: This article helps discuss the issues of false positives in screening tests. You may also bring up the problem of screening for testosterone and other performance-enhancing drugs that athletes take. A separate article, with another nice table and graph, is also provided (perhaps for more advanced students). But most importantly, the tables presented in this article can be used to play with different types of probability: the probability of A given B, the probability of A and B, and the probability of A by itself (or the probability of B by itself). Conditional probability (the probability of A given B) is often misunderstood in the media and in many reports, and it is important that learners are aware of that.
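As one possible classroom exercise, the three kinds of probability mentioned in the notes can be read straight off the counts in Table 3. The sketch below is only an illustration: the dictionary layout and the helper function `prob` are hypothetical choices made for this example, and the counts are the four inner cells of Table 3 (8,000, 99,000, 2,000 and 891,000).

```python
# Counts from Table 3 (population of 1,000,000; base rate 1%).
counts = {
    ("positive", "condition"): 8_000,
    ("positive", "no condition"): 99_000,
    ("negative", "condition"): 2_000,
    ("negative", "no condition"): 891_000,
}
total = sum(counts.values())


def prob(test=None, status=None):
    """Fraction of the population matching a test result and/or a true status."""
    matching = sum(
        n
        for (t, s), n in counts.items()
        if (test is None or t == test) and (status is None or s == status)
    )
    return matching / total


# Probability of A by itself, and of B by itself.
p_condition = prob(status="condition")
p_positive = prob(test="positive")

# Probability of A and B together.
p_both = prob(test="positive", status="condition")

# Conditional probabilities: note that the two directions are very different.
p_condition_given_positive = p_both / p_positive
p_positive_given_condition = p_both / p_condition

print(f"P(condition)            = {p_condition:.3f}")
print(f"P(positive)             = {p_positive:.3f}")
print(f"P(condition and positive) = {p_both:.3f}")
print(f"P(condition | positive) = {p_condition_given_positive:.3f}")
print(f"P(positive | condition) = {p_positive_given_condition:.3f}")
```

The last two lines make the point of the article: the test picks up 80% of those who have the condition, yet only about 7.5% of those who test positive actually have it. Confusing these two conditional probabilities is the mistake learners should watch for.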