2/17/08
International Statistical Literacy Competition of the ISLP
http://www.stat.auckland.ac.nz/~iase/islp/competition
Training package 2
This document contains more activities to start training students for the international statistical literacy
competition, along with some articles and some short lessons. You and/or your students may choose the activities
that best suit your students' current level. The questions here are not the
“typical” competition questions; rather, they are building blocks that will help students train to answer the more
comprehensive ones. You can also do them here and there, in an idle moment in your
class, for fun. Of course, keep doing whatever you are doing in your data analysis and chance
curriculum.
As you know, most of these resources were prepared by others and posted on the ISLP
web site. We select them to convey the building blocks of the competition and also to promote
resources from different countries.
1.- Activity:
Sandwich Problem (Warm-Up)
http://www.doe.virginia.gov/VDOE/Instruction/Elem_M/mprobstat.html
Format:
Large Group
Objectives:
Participants will develop an appreciation for graphical representations of data
and the need for statistics.
Materials:
Sandwich Problem Narrative Activity Sheet, and Sandwich Problem Graph
Activity Sheet
Time Required: 10 minutes
Directions:
1. Without informing the participants, break them into two groups (front of the audience versus
back of the audience).
2. Distribute the two sandwich problem activity sheets FACE DOWN; distribute the graphical
representation of the sandwich data to one group; and the narrative version of the data to
the other group.
3. Tell the groups that this is a test on the sandwich data and that you are going to keep track
of the people who raise their hand first to answer the questions. Then ask them to turn
over their papers and respond to the following questions. Keep track of those who answer
first, expecting that those with the graphical answer will respond first. Ask the following
three questions. Call on the first person who raises a hand to answer the question.
1. What sandwich was preferred more by people than any other sandwich?
2. What sandwich types were preferred by only two people?
3. What sandwich type did Oliver prefer?
4. After reinforcing that one part of the room was doing better than the others in answering the
questions, show the entire group a copy of both types of data. This is a good place to
begin the discussion of why statistics is important in this “information age.”
Distribute the extra copies of activity sheets so that each participant has a copy of both the
graph and narrative sandwich problem activity sheets.
Virginia Department of Education
The Lunch Bunch’s Favorites
Laura had peanut butter and jelly. Kenny had
plain jelly. Oliver also had plain jelly. Katie and
David had plain peanut butter. Oh, I forgot to
mention that Steven, Isabel, and Sam also had
peanut butter and jelly. Kristen had peanut
butter and fluff. Mariko had plain fluff while
Sally and Ty had jelly and fluff.
The Lunch Bunch’s Favorites (graph)
Vertical axis: Number Who Preferred Sandwich. Horizontal axis: Sandwich Types.
Plain Jelly: Kenny, Oliver
Plain Fluff: Mariko
Peanut Butter & Jelly: Laura, Steven, Isabel, Sam
Plain Peanut Butter: Katie, David
Peanut Butter & Fluff: Kristen
Jelly & Fluff: Sally, Ty
2. Activity:
Sixth Grade Mystery Data
http://www.doe.virginia.gov/VDOE/Instruction/Elem_M/mprobstat.html
Format:
Small Groups; Whole Group
Objectives:
Participants will develop an understanding of the relationship between the
question and the analysis of the data.
Materials:
Copies of Sixth Grade Mystery Data, Copies of Questions, Copies of Graphs
A, B, and C
Time Required: 30 minutes
Directions:
1. Divide the participants into small groups of four to five. Give each group a copy of the Sixth
Grade Mystery Data and a copy of the questions to be answered. Tell them they have 15
minutes to answer the questions and to discuss their solutions.
2. After the small groups have completed the task, have the entire group share their solutions
and how they arrived at those solutions. Focus the discussion on the relationship of the
question to the data.
3. Discuss Graph A (Ice Cream Preferences). Have participants share questions about this
graph that could be asked of K-2 students.
4. Discuss Graph B (Number of Cavities). Have participants share questions about this graph
that could be asked of students in grades 3-5.
5. Discuss Graph C (Relationship of Height to Age). Have participants share questions about
this graph that could be asked of students in grades 6-8.
Sixth Grade Mystery Data
Look at the graphs on the next pages. Each graph shows
something about a classroom of sixth graders.
1. Which of the five graphs do you think shows:
a. The number of cavities the sixth graders have?
b. The number of people in the sixth graders’ families?
c. The ages of the sixth graders’ mothers?
d. The heights of the sixth graders in inches?
2. Why do you think the graph you picked for d is the
one that shows the heights of sixth graders? Why
do you think the other graphs don’t show the sixth
graders’ heights?
3. One of the graphs was not selected to answer
question one above. What do you think this data
display might represent? Why?
Sixth Grade Mystery Data
[Graphs 1 to 5: five unlabeled line plots, each x marking one student. The plots are not reproduced here.]
Graph A
Ice Cream Preferences
[Pictograph. Categories: Vanilla, Chocolate, Mint Chip, Cookies & Cream, Other. Key: one symbol = 1 student.]
Graph B
Number of Cavities
[Line plot. Each X marks one student; the horizontal axis runs from 0 to 6.]
Graph C
Relationship of Height to Age
[Scatterplot. Vertical axis: Height in Inches (20 to 75). Horizontal axis: Age in Years (7 to 20).]
3. Activity:
Graph Detective
http://www.doe.virginia.gov/VDOE/Instruction/Elem_M/mprobstat.html
Format:
Pairs or small groups
Objectives:
Participants will apply the skills they learned from the previous activities to
analyze graphs for missing attributes.
Materials:
Set of graphs with missing information
Time Required: 20 minutes
Directions:
1. To wrap up the session, give each group a set of three graphs and
conclusions and ask them to discuss whether each conclusion is
accurate and what factors about the graph may have led to
inaccurate conclusions. Key factors in the misleading graphs
include the following:
Graph 1: Missing years – A trend cannot be seen by examining only two years. USA
Today uses a lot of these graphs to compare two years. This graph compares
unemployment in 1992 and 1998. Although the 1998 rate may be lower than the 1992 rate, we should
not assume that 1999 will be lower than 1998 or that the trend between 1992 and 1998
was downward. We need to see the other years to determine whether there is a trend over
time or whether 1998 is a fluctuation.
Graph 2: Broken scales – Graphs often imply larger differences than actually exist because
the scale is broken. In this graph, because the scale is broken, it appears that there
were twice as many births in July as there were in June.
Graph 3: Size distortions – Picture graphs using objects to demonstrate change can be
misleading because the size of the objects may not truly represent the relative
numerical value. This graph shows the relative earnings of men and women. Women
earn approximately 70¢ for every dollar that men earn; yet, the graph implies that men
earn nearly three times as much as women because of the relative size of the three-dimensional bars.
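The arithmetic behind the broken-scale and size-distortion effects can be checked with a short sketch. The monthly birth figures below are hypothetical (the worksheet's exact values are not given in the text); only the axis start of 260 and the 70¢ versus 100¢ earnings come from the activity. The volume calculation assumes the three-dimensional bar is enlarged in all three dimensions by the same factor, which is one plausible reading of the graph, not a confirmed description of it.

```python
def apparent_ratio(a, b, axis_start):
    """Ratio of drawn bar heights when the axis starts at axis_start instead of 0."""
    return (b - axis_start) / (a - axis_start)


def volume_ratio(value_ratio):
    """Volume ratio of two similar 3-D objects whose linear size is scaled by value_ratio."""
    return value_ratio ** 3


# Hypothetical births in thousands; chosen only to illustrate the effect.
june, july = 280.0, 300.0
axis_start = 260.0  # lowest label on the worksheet's vertical axis

true_births_ratio = july / june
drawn_births_ratio = apparent_ratio(june, july, axis_start)

# Earnings from the activity: 100 cents for men versus 70 cents for women.
true_earnings_ratio = 100 / 70
drawn_earnings_ratio = volume_ratio(true_earnings_ratio)

print(f"Births: true ratio {true_births_ratio:.2f}, drawn ratio {drawn_births_ratio:.2f}")
print(f"Earnings: true ratio {true_earnings_ratio:.2f}, volume ratio {drawn_earnings_ratio:.2f}")
```

With these illustrative numbers the drawn bars differ by a factor of two although the values differ by about 7%, and a 1.43 to 1 earnings ratio becomes a volume ratio of roughly 2.9 to 1.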
Would You Draw the Same Conclusion?
Graph 1: Unemployment Rates (1992 - 1998)
[Graph. Vertical axis: Unemployment Rate (0 to 10). Horizontal axis: Year, showing only 1992 and 1998.]
Conclusion: Unemployment rates have fallen
steadily since 1992 and will continue to fall in the
near future.
Group’s Conclusion:
Graph 2: Number of Births per Month
[Graph. Vertical axis: Births (Thousands), running from 260 to 340. Horizontal axis: Month (Jan, Mar, May, Jul, Sep, Nov).]
Conclusion: The number of births doubled between
June and July.
Group’s Conclusion:
Graph 3: Earning Power of Men versus Women
[Picture graph with three-dimensional bars. Women: 70 cents. Men: 100 cents.]
Conclusion: Men make almost three times as much as
women. The bar for men has nearly three times as much
volume.
Group’s Conclusion:
4. Activity:
What’s In the Bag?
Format:
Pairs
Objectives:
Participants will conduct simple probability experiments to predict outcomes.
Materials:
Paper bags, color tiles, recording sheet
Time Required: 20 minutes
Directions:
1. Organize participants into pairs.
2. Give each pair a paper bag with 10 color tiles inside (7 blue and 3 red).
3. Pairs will pull out one tile (without looking into the bag) and record the color on their
recording sheet. The tile should be returned to the bag. Each pair will pull out tiles
following this process a total of ten times.
4. As pairs finish, have them record their results on a class graph at the front of the room.
5. When the class data is complete, have pairs look at the total number of blue and red tiles
pulled and then make their prediction about the number of blue and red tiles in the bag.
6. After everyone has had a chance to predict, discuss the predictions and reasons why.
7. Have pairs look into their bags and record the actual results.
8. Discuss why their predictions may have differed from the actual number. What was helpful
in making their predictions?
9. Repeat using bags with four colors of tiles. Discuss differences noted.
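For teachers who want to preview how much the results vary from pair to pair, the experiment can be simulated. This is a minimal sketch: the bag contents (7 blue, 3 red) and the ten draws with replacement come from the directions, while the number of pairs in the class, the seed, and the function names are assumptions made for illustration.

```python
import random


def draw_with_replacement(bag, n_draws, rng):
    """Draw n_draws tiles, returning each tile to the bag before the next draw."""
    counts = {}
    for _ in range(n_draws):
        tile = rng.choice(bag)
        counts[tile] = counts.get(tile, 0) + 1
    return counts


def estimate_contents(counts, bag_size):
    """Scale observed proportions to the bag size (rounded counts need not sum to bag_size)."""
    total = sum(counts.values())
    return {color: round(bag_size * n / total) for color, n in counts.items()}


rng = random.Random(2008)          # fixed seed so the run is repeatable
bag = ["blue"] * 7 + ["red"] * 3   # contents given in the directions

one_pair = draw_with_replacement(bag, 10, rng)

n_pairs = 15                       # hypothetical class size
class_counts = {"blue": 0, "red": 0}
for _ in range(n_pairs):
    for color, n in draw_with_replacement(bag, 10, rng).items():
        class_counts[color] += n

print("One pair:", one_pair, "->", estimate_contents(one_pair, 10))
print("Whole class:", class_counts, "->", estimate_contents(class_counts, 10))
```

A single pair's ten draws can easily suggest a 5/5 or 9/1 split, whereas the pooled class data usually lands much closer to 7 blue and 3 red, which is the point of step 5.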
What’s In the Bag?
http://www.doe.virginia.gov/VDOE/Instruction/Elem_M/mprobstat.html
Pick one tile from the paper bag. Record the
color on the table below. Put the tile back into
the bag. Choose another tile. Repeat this
process 9 more times.
[Recording table with two columns: Blue, Red]
My prediction:
There are ________ blue tiles.
There are _________ red tiles.
Actual results:
blue tiles ____________
red tiles _____________
Let’s try again with four colors!
[Recording table with four columns: Blue, Red, Yellow, Green]
My prediction:
There are ________ blue tiles.
There are _________ red tiles.
There are _________ yellow tiles.
There are _________ green tiles.
Actual results:
blue tiles ____________
red tiles _____________
yellow tiles ___________
green tiles ___________
Tree Diagrams
You are trying to decide which pizza to order for
dinner. Your choices for crust are: regular,
thin, and deep dish. You only want one topping
and will either choose pepperoni or sausage.
Construct a tree diagram to show the
possibilities you have from which to choose
one crust with one topping.
How would your sample space change if you
added bacon as a third topping choice?
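One way to check a hand-drawn tree diagram is to list the sample space by machine. The sketch below uses the crusts and toppings named in the problem; the function names are illustrative only.

```python
from itertools import product


def sample_space(crusts, toppings):
    """Every (crust, topping) pair: one leaf of the tree diagram per pair."""
    return list(product(crusts, toppings))


def print_tree(crusts, toppings):
    """Print the tree as indented text: first-level branches are crusts."""
    for crust in crusts:
        print(crust)
        for topping in toppings:
            print(f"    {topping}  ->  ({crust}, {topping})")


crusts = ["regular", "thin", "deep dish"]
toppings = ["pepperoni", "sausage"]

print_tree(crusts, toppings)
print("Outcomes with two toppings:", len(sample_space(crusts, toppings)))
print("Outcomes with bacon added:", len(sample_space(crusts, toppings + ["bacon"])))
```

Adding a third topping adds one leaf under each of the three crusts, so the sample space grows from 6 outcomes to 9.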
5. Activity:
The Real Meal Deal
Format:
Pairs
Objective:
Participants will use a tree diagram and the Fundamental Counting Principle
to determine the sample space of an event.
Materials:
Real Meal Restaurant Menu, chart paper for tree diagrams
Time Required: 20 minutes
Background:
The Fundamental Counting Principle is a method for finding the number of
ways that two or more events can occur by multiplying the number of ways
that each event can occur. The Principle states that, if successive choices
are made, then the total number of choices is the product of the number of
choices at each stage.
For example, if you have 3 shirts and 2 pairs of jeans, then you have a total of 6
different outfits to wear. Each shirt may be worn with each pair of jeans.
There are 3 shirts times 2 pairs of jeans for a total of 6 outfits.
A tree diagram is a visual way to see all of the outcomes.
SHIRTS         JEANS           OUTFITS (OUTCOMES)
plaid shirt    blue denims     plaid shirt, blue denims
               black denims    plaid shirt, black denims
red shirt      blue denims     red shirt, blue denims
               black denims    red shirt, black denims
blue shirt     blue denims     blue shirt, blue denims
               black denims    blue shirt, black denims
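The Fundamental Counting Principle can be checked against a full listing of outcomes. In this sketch the shirts and jeans are the ones from the example above; `count_by_principle` is a hypothetical helper name, and the same check works for any number of stages.

```python
from itertools import product
from math import prod


def count_by_principle(*stages):
    """Multiply the number of choices available at each stage."""
    return prod(len(stage) for stage in stages)


shirts = ["plaid shirt", "red shirt", "blue shirt"]
jeans = ["blue denims", "black denims"]

outfits = list(product(shirts, jeans))   # the leaves of the tree diagram
predicted = count_by_principle(shirts, jeans)

print(predicted, "outfits predicted;", len(outfits), "outfits listed")
for shirt, pair in outfits:
    print(f"{shirt}, {pair}")
```

The product of the stage sizes and the length of the enumerated list agree, which is what the Principle asserts.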
Directions:
1. Based on the menu of the Real Meal Restaurant, participants will use the Fundamental
Counting Principle to determine the number of different meals that can be served.
2. Based on customer wishes, participants will determine and display the choices using a tree
diagram.
REAL MEAL RESTAURANT
SANDWICHES:
Ham and Turkey Club
Rachael on Rye
Sliced BBQ Pork
Hamburger
Deli Cold Cut Special
BLT
FRENCH FRIES
SALADS:
Garden Salad
Chef Salad
Cobb Salad
DRESSINGS: Ranch, French, Creamy Italian
BEVERAGES:
Soft Drinks: Coke, Pepsi, Sprite
Tea
Coffee
Milk: regular, low-fat
[The original menu lists size options (small, medium, large) for the French fries and the beverages; which sizes belong to which item is not recoverable from this copy.]
1. How many possible meals can be served at the Real Meal, choosing only one item from
each category?
2. How many choices are there if a customer wants the following:
a. a soft drink, sandwich, and fries? Display the choices with a tree diagram.
b. a sandwich, fries, and milk? Display the choices with a tree diagram.
c. salad with dressing and tea? Display the choices with a tree diagram.
d. a sandwich, salad with dressing, and coffee? Display the choices with a tree
diagram.
6. The JellyBlubber Colony
Objective:
This activity introduces the Simple Random Sample (SRS) to students, and
shows why this process helps to get an unbiased sample statistic.
Relying on our perceptions can be deceptive. In this exercise students
are asked to determine the average length of a jellyblubber (a recently
discovered marine species) using a variety of techniques. The student will
learn that, of the methods tried, a Simple Random Sample (SRS) gives the most
reliable estimate of this parameter, and that intuition can be deceptive.
Materials:
One ‘The JellyBlubber Colony’ worksheet and one ruler per student.
Time:
1 period
Instructions:
Pass out the worksheet upside down. Ask students not to look at the sheet
until they are instructed.
Tell the students a story about the recently discovered colony of
jellyblubbers, a new marine species, and that our task is to try to determine
the average length (measured horizontally) of a blubber.
Allow the students to look at the Colony for five seconds. They will then
estimate the average length of a blubber. The teacher plots the students’
guesses as a dotplot, then leads the entire class in a discussion of the dataset.
The student is now told to choose a representative sample of 10 blubbers.
Once they have made their choice, they measure the length of each blubber
and calculate the mean length. The teacher plots these values on a new
dotplot, followed by a whole class discussion of the dataset.
Now the student takes an SRS of 10 blubbers, as follows. Each blubber is
numbered from 1 to 100. They generate 10 random numbers from a random
number table in the range 1 to 100. They calculate the mean length of
those ten blubbers. The teacher plots these means on a third dotplot. Each
dotplot must have the same scale for comparison purposes.
The class discusses the difference in the distributions - location, spread,
outliers, etc.
The actual average length of a blubber is 19.4 cm. Which method gave the
best estimate? How accurate was it? How much spread was there around
the correct value?
Discussion:
A student decides to generate a random sample by closing her eyes and
pointing at the sheet of blubbers randomly. She chooses the blubber to which
her finger is closest. Comment on this method of generating an SRS.
Extension:
A similar exercise can be conducted by putting a number of pieces of string
of varying lengths into a bag and having students pull out a ‘random
sample’ of lengths of string. Since a longer piece is more likely to be
selected than a shorter one, the sample generated in this fashion is likely to
give a biased result - one that is too large.
References:
Statistics: Concepts and Controversies, 4th Edition, David S. Moore
The JellyBlubber Colony
Blubber #  Length    Blubber #  Length    Blubber #  Length    Blubber #  Length
    1         9         26        40         51        35         76         7
    2         5         27         5         52        37         77         5
    3         9         28        49         53         9         78        25
    4        33         29         9         54        25         79        17
    5        22         30        41         55         5         80         8
    6         5         31         5         56        10         81         8
    7        10         32        20         57         9         82         5
    8        40         33        43         58        45         83        13
    9        20         34         7         59        40         84        42
   10        10         35        20         60         8         85        10
   11        12         36        10         61        20         86         5
   12         5         37         5         62        25         87        10
   13         8         38        14         63        10         88        27
   14        41         39        15         64         8         89        30
   15         5         40        10         65        37         90        10
   16        32         41        41         66         8         91        42
   17         5         42         5         67        20         92         6
   18        10         43        17         68        13         93        10
   19        21         44        15         69        34         94        25
   20        20         45        40         70        42         95         7
   21        34         46         5         71        40         96        40
   22         5         47        30         72        40         97         8
   23        32         48         8         73        40         98         5
   24         5         49         5         74        30         99        40
   25         9         50        40         75        20        100        20
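The SRS step of the activity can be previewed on a computer before class. The sketch below uses the 100 lengths from the worksheet table; the seed, the number of repeated samples, and the function name `srs_mean` are assumptions made for illustration. It draws blubber numbers without replacement, which is what a simple random sample requires, rather than reading them from a printed random number table.

```python
import random
from statistics import mean

# Lengths of blubbers 1 to 100, in worksheet order (ten per row).
LENGTHS = [
    9, 5, 9, 33, 22, 5, 10, 40, 20, 10,
    12, 5, 8, 41, 5, 32, 5, 10, 21, 20,
    34, 5, 32, 5, 9, 40, 5, 49, 9, 41,
    5, 20, 43, 7, 20, 10, 5, 14, 15, 10,
    41, 5, 17, 15, 40, 5, 30, 8, 5, 40,
    35, 37, 9, 25, 5, 10, 9, 45, 40, 8,
    20, 25, 10, 8, 37, 8, 20, 13, 34, 42,
    40, 40, 40, 30, 20, 7, 5, 25, 17, 8,
    8, 5, 13, 42, 10, 5, 10, 27, 30, 10,
    42, 6, 10, 25, 7, 40, 8, 5, 40, 20,
]


def srs_mean(lengths, n, rng):
    """Mean length of a simple random sample of n distinct blubbers."""
    blubber_numbers = rng.sample(range(1, len(lengths) + 1), n)
    return mean(lengths[number - 1] for number in blubber_numbers)


rng = random.Random(19)  # fixed seed so the run is repeatable
population_mean = mean(LENGTHS)

# Each simulated "student" takes one SRS of 10 blubbers.
sample_means = [srs_mean(LENGTHS, 10, rng) for _ in range(2000)]

print(f"Population mean: {population_mean:.2f}")
print(f"Average of {len(sample_means)} SRS means: {mean(sample_means):.2f}")
print(f"Smallest and largest SRS mean: {min(sample_means):.1f}, {max(sample_means):.1f}")
```

Individual SRS means still vary noticeably, but they scatter around the population mean rather than sitting consistently above it, which is the sense in which the method is unbiased.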
The JellyBlubber Colony
[Worksheet picture: the 100 numbered jellyblubbers drawn against horizontal scales marked from 0 to 40 in steps of 5. The picture is not reproduced here.]
7. HEIGHTS OF SINGERS IN THE NY CHORAL SOCIETY IN 1979
Description:
http://lib.stat.cmu.edu/DASL/Stories/SingerHeights.html
Heights of singers in the NY Choral Society in 1979. The data
set below contains self-reported, to the nearest inch, heights
of singers in the New York Choral Society in 1979. Voice parts
in order from highest pitch to lowest pitch are Soprano, Alto,
Tenor, Bass. The first two are female voices and the last two
are male voices. The original dataset included two divisions for
each voice part. This dataset here reports only soprano 1, alto
1, tenor 1, and bass 1 from the original dataset.
Number of cases: 39
Variable Names:
1. Soprano: Heights of sopranos (in inches)
2. Alto: Heights of altos (in inches)
3. Tenor: Heights of tenors (in inches)
4. Bass: Heights of basses (in inches)
The Data:
Soprano:
64 62 66 65 60 61 65 66 65 63 67 65 62 65 68 65 63 65
62 65 66 62 65 63 65 66 65 62 65 66 65 61 65 66 65 62
Alto:
65 62 68 67 67 63 67 66 63 72 62 61 66 64 60 61 66 66
66 62 70 65 64 63 65 69 61 66 65 61 63 64 67 66 68
Tenor:
69 72 71 66 76 74 71 66 68 67 70 65 72 70 68 73 66 68
67 64
Bass:
72 70 72 69 73 71 72 68 68 71 66 68 71 73 73 70 68 70
75 68 71 70 74 70 75 75 69 72 71 70 71 68 70 75 72 66
72 70 69
One can examine how height varies across voice range, or make
comparisons of sopranos and altos and separate comparisons of
tenors and basses. There is some evidence of the shortest
singers reporting greater heights, possibly to avoid standing in
the front row in a concert.
Activity suggestions
Ask students to construct an appropriate graph to compare the
heights of the different groups.
Image:
Side-by-side boxplots of the voice parts provide a graphical
display of these data and are one way to summarize them. An
alternative is to summarize the data with histograms.
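The numbers behind a side-by-side boxplot are the five-number summaries of each group. The sketch below computes them with Python's standard library; the assignment of heights to voice parts follows the data listing in this activity, and the choice of the "inclusive" quartile method is an assumption (textbooks and software differ slightly in how they define quartiles).

```python
from statistics import quantiles

HEIGHTS = {
    "Soprano": [64, 62, 66, 65, 60, 61, 65, 66, 65, 63, 67, 65, 62, 65, 68, 65, 63, 65,
                62, 65, 66, 62, 65, 63, 65, 66, 65, 62, 65, 66, 65, 61, 65, 66, 65, 62],
    "Alto": [65, 62, 68, 67, 67, 63, 67, 66, 63, 72, 62, 61, 66, 64, 60, 61, 66, 66,
             66, 62, 70, 65, 64, 63, 65, 69, 61, 66, 65, 61, 63, 64, 67, 66, 68],
    "Tenor": [69, 72, 71, 66, 76, 74, 71, 66, 68, 67, 70, 65, 72, 70, 68, 73, 66, 68,
              67, 64],
    "Bass": [72, 70, 72, 69, 73, 71, 72, 68, 68, 71, 66, 68, 71, 73, 73, 70, 68, 70,
             75, 68, 71, 70, 74, 70, 75, 75, 69, 72, 71, 70, 71, 68, 70, 75, 72, 66,
             72, 70, 69],
}


def five_number_summary(values):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


for part, heights in HEIGHTS.items():
    low, q1, med, q3, high = five_number_summary(heights)
    print(f"{part:8s} n={len(heights):2d}  min={low}  Q1={q1}  median={med}  Q3={q3}  max={high}")
```

Comparing the medians and quartiles row by row gives the same reading a boxplot would: the two male voice parts sit several inches above the two female voice parts.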
8.-FIRE SEASON STATISTICS
Give the students the handout given separately in Appendix A.
USDA Forest Service
Fire Season Statistics
Objective
To extrapolate information and further questions for
investigation from fire season statistics.
Materials for each team
* copy of "Fire Season Statistics" student handout (PDF or
HTML)
* calculator
Procedure
1. Having information about previous fire seasons can help
land managers look for areas they may need to monitor in coming
years. In this activity, students will look at data regarding
wildland fire totals for the year 2000 as reported by the
National Interagency Fire Center.
2. Organize students into groups and provide each group with a
copy of the "Fire Season Statistics" student handouts and a
calculator.
3. Have each group discuss the data as it is currently
presented. What information is conveyed? What general
conclusions can students draw? What, if any, patterns do they
see? How might the data be reconfigured to illustrate different
aspects of the data set? (One avenue of inquiry is suggested in
the questions section on the student handout.)
4. Have groups decide how to present the information in a
meaningful way. Students might consider tables, bar graphs, pie
graphs, or some other way to represent the data. Based on what
they find in the data, what kind of campaign would they design
to reduce wildland fires?
5. What additional information would students want in this
data set? What points would they like clarified?
6. As an extension, have students look at and compile weekly
situation reports for each month published online by the
National Interagency Coordination Center. How many fires and
acres burned were there each month? Where did the largest fires
occur? How do each month's totals compare to the prescribed fire
totals for that month? Find the reports at:
http://www.cidi.org/wildfire/ Or better yet compile statistics
from South Africa.
Activity Answer
A first step could be to collapse the data into total fires and
total acres. Students' data analysis will differ depending upon
what they choose to highlight.
Students will likely have a number of additional questions
prompted by the data set, such as:
* What is each agency's jurisdiction? Is there any overlap
in fires reported?
* What were the data collection strategies? Were they the
same for all agencies?
* What systems were in place to ensure data reliability?
* Specifically, what kind of fires are included in each of
the categories?
* What kind of fires are included under each of the other
human-caused fire categories?
* Why was state/private data not included for prescribed
fires? Is that data available elsewhere?
9.-Newspaper article
Give your student a sheet with the newspaper article and the
questions given after its text.
Teen boys forget whatever it was
http://ink.news.com.au/mercury/mathguys/articles/1997/970421a1.htm
By FRANK ZELLER
in London
TEENAGE girls can concentrate more than three times longer than
boys, according to a new study on academic performance in
British high schools.
While the average boy will become distracted from an independent
task after only four minutes, girls have an attention span of 13
minutes, researchers have found.
The nine-minute gap helps explain why girls now outperform boys
in virtually all high school examinations in Britain.
The researchers found that most girls happily worked on their
own when asked to do so by their teachers, while boys tended to
become distracted within minutes.
But boys achieved much better results where the teacher took a
leading role and actively involved them in the learning process,
according to the book Can Boys Do Better?, to be published in the
United Kingdom this week.
One of the authors, former headmaster Peter Downes, warned the
decline in male academic performance had to be halted with a new
approach to teaching.
"Everybody has come to terms with the fact that boys are doing
less well than they should," he said.
"If we pretend the problem is not there, we are in danger of
producing generations of disaffected young males for whom the
education process is debilitating."
A changing job market was increasingly challenging boys to learn
"feminine skills" as traditional manual jobs were replaced by
more technology.
"The need for men to be strong and physical - the hunter and
provider - is greatly reduced," Mr Downes said.
"Although the male still has his biological function, we have
got to feminise the male, in a way, to give him a new role in
the world of the future."
Questions for learners:
1. Summarise the results reported in this article and speculate
on their significance for students and teachers.
2. Speculate on how the researchers might have calculated the
average concentration time.
3. What other factors influence examination results for girls
and boys?
Teacher
This article could cause debate between girls and boys in a
mathematics class (or a social science class). Discussion could
focus on how the concentration time was calculated and the
"cause-effect" claim in the third paragraph. There are obviously
many factors that influence both concentration time and
examination results. Students should discuss these and then
decide how significant this article is.
It is important that students see the distinction between an
observational study and an experiment, and between a bad and a
good experiment. The article summarizes an observational
study, from which statements of cause and effect cannot be
made. One may observe a relation, but this does not imply
causation. Besides boys being more easily distracted, many
other factors could influence both distraction and examination
performance. The same goes for girls. This article is a good
opportunity to discuss these issues with students.
10.-Back to 2007 Community Survey (Statistics South Africa, interactive data)
http://www.statssa.gov.za/
1.- Is the unemployment rate higher among women or among men in
KwaZulu-Natal?
2.- Does ethnicity have any relation with labour market status?
Show numerically and graphically.
3.- Does gender have any relation with labour market status?
Show numerically and graphically.
4.- If you had to compare the labour market status of KwaZulu-Natal
and other provinces of South Africa, how would you present
the numbers of the Community Survey table given above?
11.- Maps of South Africa (Statistics South Africa, Geography)
The maps above show distribution of the population of South
Africa according to some characteristics. Do you see any
relations among the three variables described? Explain what you
are basing your conclusions on. Which graphs are you using to
come up with a conclusion?
12.- The problem of False Positives
http://www.skeptics.org.nz/SK:VIEWARTICLE:1001.7019:waDeptTOC.1,A1177
Mass screening programmes have generated considerable
controversy in this country. But these programmes have inherent
limitations, which need to be better understood.
In 1996 the Skeptical Inquirer published an article by John
Allen Paulos on health statistics. Among other things this dealt
with screening programmes. Evaluating these requires some
knowledge of conditional probabilities, which are notoriously
difficult for humans to understand.
Paulos presented his statistics in the form of a table; a
modified version of this is shown in the table below.
                 Have the condition   Do not have the condition      Totals
Test Positive                   990                       9,990      10,980
Test Negative                    10                     989,010     989,020
Totals                        1,000                     999,000   1,000,000
Table 1
Of the million people screened, one thousand (0.1%) will have
the condition. Of these, 1% (10) will falsely test negative and
99% (990) will correctly test positive. So far it looks good,
but 1% of those who do not have the condition also test
positive, so that the total number who test positive is 10,980.
Remember that this is a very accurate test. So what are the odds
that a random person who is told by their doctor that s/he has
tested positive actually has the condition? The answer is
990/10,980 or 9%.
In this hypothetical case the test is 99% accurate, a much
higher accuracy rate than any practical test available for mass
screening. Yet over 90% of those who test positive have been
diagnosed incorrectly.
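The same 9% figure follows from Bayes' rule applied to the rates in this first example. The notation here is introduced only for illustration: C means "has the condition" and + means "tests positive", with a base rate of 0.001, a 99% chance of a positive test for those with the condition, and a 1% chance of a positive test for those without it.

```latex
\[
P(C \mid +)
  = \frac{P(+ \mid C)\,P(C)}{P(+ \mid C)\,P(C) + P(+ \mid \bar{C})\,P(\bar{C})}
  = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.01 \times 0.999}
  = \frac{0.00099}{0.01098}
  \approx 0.09
\]
```

Multiplying numerator and denominator by one million gives the table's 990/10,980.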
In the real world (where tests must be cheap and easy to run) a
very good test might achieve 10% false negatives and positives.
To some extent the total percentage of false results is fixed,
but screening programmes wish to reduce the number of false
negatives to the absolute minimum; in some countries they could
be sued for failing to detect the condition. This can only be
done by increasing the chance of false positives or inventing a
better test. Any practical test is likely to have its results
swamped with false positives.
Consider a more practical example where the base rate is the
same as previously, but there are 10% false negatives and
positives, ie the test is 90% accurate. Again 1 million people
are tested (see Table 2 below).
                 Have the condition   Do not have the condition      Totals
Test Positive                   900                      99,900     100,800
Test Negative                   100                     899,100     899,200
Totals                        1,000                     999,000   1,000,000
Table 2. Base rate is 0.1%. Level of false positives=10%; level
of false negatives=10%
This time the total number testing positive is 100,800. But
nearly one hundred thousand of them do not have the condition.
The odds that any person who tested positive actually has the
condition are 900/100,800, or a little under 1%. This time,
although 90% of all those tested have been correctly diagnosed, over 99%
of those who test positive have been diagnosed incorrectly.
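Readers who want to rebuild the screening tables for other rates can do so with a few lines of arithmetic. The sketch below is an illustration, not the article's own method: the function name and the dictionary keys are invented here, and it simply applies the stated base rate, false negative rate, and false positive rate to a screened population.

```python
def screening_table(population, base_rate, false_negative_rate, false_positive_rate):
    """Fill in the four cells of a screening table and the chance that a positive is real."""
    have = round(population * base_rate)
    have_not = population - have
    false_neg = round(have * false_negative_rate)
    true_pos = have - false_neg
    false_pos = round(have_not * false_positive_rate)
    true_neg = have_not - false_pos
    return {
        "true_pos": true_pos,
        "false_pos": false_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
        "total_pos": true_pos + false_pos,
        "total_neg": false_neg + true_neg,
        "p_condition_given_positive": true_pos / (true_pos + false_pos),
    }


# First example: base rate 0.1%, 1% false negatives, 1% false positives.
first = screening_table(1_000_000, 0.001, 0.01, 0.01)
# Second example: base rate 0.1%, 10% false negatives, 10% false positives.
second = screening_table(1_000_000, 0.001, 0.10, 0.10)

for label, table in (("First example", first), ("Second example", second)):
    print(label, table["total_pos"], "positives,",
          f"{table['p_condition_given_positive']:.1%} of them have the condition")
```

Changing only `base_rate` shows why targeting a sub-population with a higher incidence improves the value of a positive result even when the test itself is no better.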
In both these cases the incidence of the condition in the
original population was 0.1%. In the first example the screened
population testing positive had an incidence two orders of
magnitude higher than the original population, but this was
unrealistic. In the second example those testing positive in the
screened population had an incidence one order of magnitude
higher than the general population.
This is what a good mass screening test can do – to raise the
incidence of the condition by one order of magnitude above the
general population. However any person who tests positive is
unlikely to have the condition and all who test positive must
now be further investigated with a better test.
So screening programmes should not be aimed at the general
population, unless the condition has a very high incidence.
Targeted screening does not often improve the accuracy of the
tests, but it aims at a sub-population with a higher incidence
of the condition. For example, screening for breast cancer (a
relatively common condition anyway) is aimed at a particular age
group.
Humans find it very difficult to assess screening, and doctors
(unless specifically trained) are little better than the rest of
the population. It has been shown fairly convincingly that data
are most readily understood when presented in tables as above.
For example the data in Table 3 was presented to doctors in the
UK. Suppose they had a patient who screened positive; what was
the probability that that person actually had the condition?
When presented with the raw data, 95% of them gave an answer
that was an order of magnitude too large. When shown the table
(modified here for consistency with previous examples) about
half correctly assessed the probability of a positive test
indicating the presence of the disease.
                 Have the condition   Do not have the condition      Totals
Test Positive                 8,000                      99,000     107,000
Test Negative                 2,000                     891,000     893,000
Totals                       10,000                     990,000   1,000,000
Table 3. Base rate is 1%. False negative rate=20%; False
positive rate=10%
This time the total number who test positive is 107,000. But
nearly one hundred thousand of them do not have the condition.
The odds that any person who tested positive actually has the
condition are 8,000/107,000 or about 7.5%. Now remember that
nearly half the UK doctors, even when shown this table, could not
deduce the correct result. If your doctor suggests you should
have a screening test, how good is this advice?
Patients are supposed to be supplied with information so that
they can make an informed decision. Anybody who presents for a
screening test in NZ may find it impossible to do this. My wife
attempted to get the data on breast screening from our local
group. She had to explain the meaning of “false negative”,
“false positive” and “base rate”. The last is a particularly
slippery concept. From UK figures the chances of a 40-year-old
woman developing breast cancer by the age of 60 is nearly 4%
(this is the commonest form of cancer in women). However, when a
43
sample of women in the 40-60 age group are screened, the number
who should test positive is only about 0.2%. Only when they are
screened each year, will the total of correct positives approach
4%.
The number of false positives (again using overseas figures) is
about 20 times the number of correct positives, so a woman in a
screening programme for 20 years will have a very good chance of
at least one positive result, but a fairly low probability of
actually having breast cancer. I do not think NZ women are well
prepared for this.
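A rough Python sketch shows why at least one positive result
becomes likely over 20 years. It rests on assumptions that are
not in the article: one screen per year, a false positive rate
per screen of 20 times the 0.2% correct-positive figure (that
is, 4%), and independence between screens. Real screening
results for the same woman are unlikely to be fully independent,
so treat the result as an illustration only.

```python
def prob_at_least_one_false_positive(per_screen_rate, n_screens):
    """Chance of one or more false positives, assuming independent screens."""
    return 1 - (1 - per_screen_rate) ** n_screens


correct_positive_share = 0.002   # about 0.2% per screen, from the article
false_positive_ratio = 20        # false positives per correct positive
per_screen_false_positive = correct_positive_share * false_positive_ratio

# Assumption: one screen a year for 20 years.
p_any = prob_at_least_one_false_positive(per_screen_false_positive, 20)
print(f"Chance of at least one false positive: {p_any:.0%}")
```

Under these assumptions the chance comes out a little over one
half, while the chance of actually developing the cancer over
the same period stays near the 4% quoted above.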
The Nelson group eventually claimed that the statistics my wife
wanted on NZ breast cancer screening did not seem to be
available. But, they added, “we (the local lab) have never had a
false negative.” From the recent experience of a close friend,
who developed a malignancy a few months after a screening test,
we know this to be untrue. What they meant was that they had
never seen a target and failed to diagnose it correctly as a
possible malignancy requiring biopsy. This may have been true
but it is no way to collect statistics.
Screening for breast cancer is generally aimed at the older age
group. In the US a frequently quoted figure is that a woman now
has a one in eight chance of developing breast cancer, which is
higher than in the past. This figure is correct but it is a
lifetime incidence risk; the reason it has risen is that on
average women are living longer. The (breast cancer) mortality
risk for women in the US is one in 28. A large number who
develop the condition do so very late in life and die of some
other condition before the breast cancer proves fatal.
Common Condition
Breast cancer is a relatively common condition and would appear
well suited for a screening programme. The evaluation of early
programmes seemed to show they offered considerable benefit in
reducing the risk of death. However later programmes showed less
benefit. In fact as techniques improved, screening apparently
became less effective. This caused some alarm and a study
published in 1999 by the Nordic Cochrane Centre in Copenhagen
looked at programmes world wide, and attempted to better match
screened populations with control groups. The authors claimed
that women in screening programmes had no better chance of
survival than unscreened populations. The reactions of those
running screening programmes (including those in NZ) were to
ignore this finding and advise their clients to do the same.
If there are doubts as to the efficacy of screening for breast
cancer, there must be greater doubts about screening for other
cancers in women, for other cancers are rarer. Any other
screening programme should be very closely targeted.
Unfortunately the risk factors for a disease may make targeting
difficult. In New Zealand we have seen cases where people
outside the target group have asked to be admitted into the
screening programme, so they also “can enjoy the benefits”.
Better education is needed.
Late-onset diabetes is more common among Polynesians than among
New Zealanders in general, and Polynesians have very sensibly
accepted that this is true. Testing Polynesians over a certain
age for diabetes makes sense, particularly as a test is quick,
cheap and easy to apply. Testing only those over a certain body
mass would be even more sensible but may get into problems of
political correctness.
Cervical cancer is quite rare so it is a poor candidate for a
mass screening programme aimed at a large percentage of the
female population. The initial screening is fast and cheap. If
the targeted group has an incidence that is one order of
magnitude higher than the general population, then the targeting
is as good as most tests. Screening the whole female population
for cervical cancer is a very dubious use of resources.
My wife and I were the only non-locals travelling on a bus in
Fiji when we heard a radio interview urging "all women" to have
cervical screening done regularly. The remarkably detailed
description of the test caused incredible embarrassment to the
Fijian and Indian passengers; we had the greatest difficulty in
concealing our amusement at the reaction. The process was
subsidised by an overseas charity. In Fiji, where personal
hygiene standards are very high, and (outside Suva) promiscuity
rates pretty low, and where most people pay for nearly all
health procedures, this seemed an incredibly poor use of
international aid.
Assessment Impossible
Screening for cervical cancer has been in place in NZ for some
time. Unfortunately we cannot assess the efficacy of the
programme because proper records are not available. An attempt
at an assessment was defeated by a provision of the Privacy Act.
The recent case of a Gisborne lab was really a complaint that
there were too many false negatives coming from a particular
source. However this was complicated by a general assumption
among the public and media that it is possible to eliminate
false negatives. It should be realised that reducing false
negatives can only be achieved by increasing the percentage of
false positives. As can be seen from the data above, it is false
positives that bedevil screening programmes.
Efforts to sue labs for false negatives are likely to doom any
screening programme. To some extent this has happened in the US
with many labs refusing to conduct breast X-ray examinations, as
the legal risks from the inevitable false negatives are
horrendous.
Large sums are being spent in NZ on screening programmes;
taxation provides the funds. Those running the programmes are
convinced of their benefits, but it is legitimate to ask
questions. Is this spending justified?
Some Post-Scripts:
January 15 2000 New Scientist P3: Ole Olsen & Peter Gøtzsche of
the Nordic Cochrane Centre in Copenhagen published the original
meta-analysis of seven clinical trials in 2000. The resulting
storm of protest, particularly from cancer charities, caused
them to take another look. They have now reached the same
conclusion: mammograms do not reduce breast cancer deaths and
are unwarranted.
October 2001: In recent TV interviews some people concerned with
breast cancer screening in NZ were asked to comment on this
meta-analysis. Once again the NZ commentators stated firmly that
they were certain that screening programmes in NZ "had saved
lives" but suggested no evidence to support their view.
March 23 2002 New Scientist P6: The International Agency for
Research on Cancer (IARC) funded by the WHO claims to have
reviewed all the available evidence. They conclude that
screening women below the age of 50 is not worthwhile. However,
screening women aged from 50-69 every two years reduces the risk
of dying of breast cancer by 35%.
According to New Scientist, the figures from Britain are that of
1000 women aged 50, 20 will get breast cancer by the age of 60
(2%); of these six will die. Screening every two years would cut
the death rate to four. [It is obvious that these are
calculations, not the result of a controlled study!]
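The British figures can be turned into relative and absolute
terms with a few lines of Python. The sketch below uses only the
numbers quoted above (6 deaths per 1000 women without screening,
4 with screening). The "number needed to screen" is a standard
summary measure that the article itself does not mention; it is
added here as an illustration.

```python
cohort = 1000            # women aged 50, followed to age 60
deaths_unscreened = 6    # expected breast cancer deaths without screening
deaths_screened = 4      # expected deaths with screening every two years

deaths_avoided = deaths_unscreened - deaths_screened

# Relative risk reduction: the cut in deaths as a share of the original deaths.
relative_risk_reduction = deaths_avoided / deaths_unscreened

# Absolute risk reduction: the cut in deaths as a share of all women screened.
absolute_risk_reduction = deaths_avoided / cohort

# Women who must be in the programme for ten years to avoid one death.
number_needed_to_screen = cohort / deaths_avoided

print(f"Relative risk reduction: {relative_risk_reduction:.0%}")
print(f"Absolute risk reduction: {absolute_risk_reduction:.1%}")
print(f"Number needed to screen: {number_needed_to_screen:.0f}")
```

The relative reduction of about one third is in line with the
35% quoted by the IARC, yet the same change is only two deaths
per thousand women when stated in absolute terms.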
The IARC states that organised programmes of manual breast
examination do not bring survival benefits (they call for more
studies on these). If NZ has similar rates then screening
programmes aimed at 50-60 year old women should save
approximately 50 lives per annum.
Teacher’s notes:
This article helps in discussing the issue of false positives in
screening tests. You may also bring up the problem of screening
for testosterone and other performance-enhancing drugs that
athletes take. A separate article is provided with
another nice table and graph (perhaps for more advanced
students).
But most importantly, the tables presented in this article
can be used to explore different types of probability:
the probability of A given B, the probability of A and B, and the
probability of A by itself (or of B by itself). Conditional
probability (the probability of A given B) is often misunderstood
in the media and in many reports, and it is important that
learners are aware of that.
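For classes with access to a computer, the three types of
probability can be read off the counts in Table 3 with a short
Python sketch. Here A stands for "has the condition" and B for
"tests positive"; the variable names are illustrative only.

```python
# Counts from Table 3 (A = has the condition, B = tests positive).
a_and_b = 8_000            # has the condition and tests positive
not_a_and_b = 99_000       # does not have the condition but tests positive
a_and_not_b = 2_000        # has the condition but tests negative
not_a_and_not_b = 891_000  # does not have the condition and tests negative

total = a_and_b + not_a_and_b + a_and_not_b + not_a_and_not_b

p_a = (a_and_b + a_and_not_b) / total          # probability of A by itself
p_b = (a_and_b + not_a_and_b) / total          # probability of B by itself
p_a_and_b = a_and_b / total                    # probability of A and B
p_a_given_b = a_and_b / (a_and_b + not_a_and_b)  # probability of A given B
p_b_given_a = a_and_b / (a_and_b + a_and_not_b)  # probability of B given A

print(f"P(A)         = {p_a:.3f}")
print(f"P(B)         = {p_b:.3f}")
print(f"P(A and B)   = {p_a_and_b:.3f}")
print(f"P(A given B) = {p_a_given_b:.3f}")
print(f"P(B given A) = {p_b_given_a:.3f}")
```

Comparing the last two lines makes a useful discussion point:
the probability of a positive test given the condition (0.8) is
very different from the probability of the condition given a
positive test (about 0.075), and confusing the two is a common
mistake.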