An Elephant Never Forgets – Effective
Analogies for Teaching Statistical Modeling
Michael A. Martin
Research School of Finance, Actuarial Studies and Applied Statistics, Australian National
University, Canberra, ACT 0200, AUSTRALIA
Abstract
Analogies are useful and potent tools for introducing new topics in statistics to
students. Martin (2003, 2008) considered the case for teaching with analogies in
introductory statistics courses, and also gave many examples of particular analogies that had been successfully used to make difficult statistical concepts more accessible to students. In this article, we explore more deeply analogies for statistical
concepts from more advanced topics such as regression modeling and high-dimensional data.
Introduction
Many students approach their statistics classes with trepidation, perhaps because
many of the concepts they encounter seem so foreign. Yet, despite a lexicon
steeped in jargon and technical expressions, much statistical thinking has its basis
in ideas with which most students will already be familiar – the trick for statistics
educators, it seems, is to bridge that gap between existing, familiar ideas and new,
forbidding ones. Analogy is an effective tool for bridging this gap, with some particularly evocative uses including the alignment between statistical hypothesis
testing and the process of a criminal trial (Feinberg, 1971; Bangdiwala, 1989;
Brewer, 1989; among many others), and the idea of a sample mean as a “balance
point” for the data (Moore, 1997, p.262 as well as on the cover of the text). These
famous examples leverage the key features of analogical thinking:

• access – the relevant source idea must be retrieved from memory
• mapping – alignment between elements (both objects and relationships) in the source and the target must be identified
• evaluation and adaptation – the appropriateness of the mappings needs to be assessed and adapted where necessary to account for critical features of the target
• learning – the target is understood, and new knowledge and relevant items and relationships are added to memory. The “transfer” from old to new domains is completed and the new situation can be accessed without reference to the source domain.
These elements are described in detail in the monograph by Holyoak and Thagard
(1995), which presents a comprehensive, modern overview of analogical
thinking. Martin (2003) explored the use of analogies in teaching statistics and offered many examples of analogies that had been effectively used in his statistics
classes, including the legal analogy for hypothesis testing and the balance point
analogy for the average. Martin later presented this work at the OZCOTS 2008
conference (Martin, 2008). In the original 2003 paper and in the OZCOTS presentation, Martin focused on analogies and examples useful in a first course in statistics – the critical time when students first encounter our “mysterious” discipline.
In this article, we consider examples and analogies specific to statistical concepts
from more advanced topics from regression modeling and high-dimensional data
analysis. We explore in more detail the mappings – for both items and relationships – that exist between the source and target ideas, critique the strengths and
weaknesses of the analogies, and offer some new ideas that have been found useful in describing these more advanced topics.
In describing and critiquing the examples below, we utilize the “teaching-with-analogies” framework developed by Glynn (1991) (see also Glynn and Law, 1993;
Glynn, 1994, 1995, 1996; and Glynn, Duit and Thiele, 1995, for further discussion
and refinements). This framework identified six steps: introduce target concept;
access source analog; identify relevant features of both source and target; map elements and relationships between source and target; assess breakdowns of the
analogy; adaptation and conclusion. These six steps essentially give form to the four
features (access, mapping, evaluation, learning) listed above, and allow the construction of powerful analogs for thinking and learning.
This article is designed to be read in combination with the earlier article by Martin
(2003), in which a formal argument is made supporting the use of analogies in
teaching statistics, so the focus of this article is principally descriptive. The focus
of that paper was largely on analogies for teaching a first course in statistics, while
this article gives more consideration to and provides more detail for topics covered
in a later course on statistical modeling.
Analogies for Describing Regression Modeling
Martin (2003) introduced several analogies useful in the context of describing regression models. We explore some of these examples in greater detail here, including a couple of analogies not included in the 2003 article or the OZCOTS presentation (Martin, 2008).
Analogy 1: Signal-to-noise and F-ratios
Most students become familiar with hypothesis testing by considering tests for
means and proportions, and so come to associate testing with the location-scale
structure of Z and t tests. Similarly, in regression contexts, tests for coefficients
also work in this familiar way. So, when F tests are introduced, the immediate reaction is that this test is somehow different, as it is now based on a ratio rather
than a scaled difference. Worse still, that ratio is “tampered with” through degrees
of freedom adjustments! To motivate the use of a ratio-based test statistic, the
analogous concept of a “signal-to-noise” ratio is a useful one. Almost without exception, students now use Wi-Fi technology every day, so the idea that a signal emanates from a central server and is degraded by noise as the wireless device moves further from the source is a very familiar one. Most devices
measure “signal strength” using bars – a rudimentary graphical display. The idea
of a signal-to-noise ratio is thus a natural one, as is the further idea that, as the signal-to-noise ratio drops, the ability of the receiver to satisfactorily recover the true
signal drops with it. In this analogy, the correspondence between objects is strong
(signal/model; noise/error), and a key relationship (the use of a multiplicative
measure of distance) also holds. As a result, the analogy has strong appeal and
good memorability. On the other hand, there are some unmapped elements: in the
Wi-Fi example, the notion of distance from the server is not represented in the target domain, and the role played by degrees of freedom in the F test has no direct
map back to the source domain. As a result, the map is incomplete, but good
enough to serve to motivate further discussion.
The way that degrees of freedom enter the definition of the F test statistic is often difficult for students to understand. To elucidate this idea, one approach
that has been successful is the notion of “a fair fight”. In comparing the signal
with the noise, we wish to make this comparison as “fair” as possible, but the numerator in the F statistic is based on only one piece of information (the location of
the line), while the denominator is based, essentially, on n – 2 pieces of information (this having been established when degrees of freedom were discussed),
and so in order to make the comparison “fair” we must scale each of the numerator and denominator by the number of pieces of information on which each is
based. This argument seems to resonate with some students, though the idea attempts to use knowledge about degrees of freedom that may be too “new” for students to readily access initially. This problem leads to inevitable questions: why is
the line based on only one piece of information? Why n – 2? Why isn’t the regular
ratio (unadjusted by degrees of freedom) good enough? These are tough questions
– and the analogy is not strong enough to provide accessible answers. Of course,
the questions have reasonable answers, but the answers lie outside the map implied by the analogy. The double-edged sword of analogies remains that while
they can produce in students that “eureka” moment, when the map is incomplete
they can instead produce frustration.
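To make the signal-to-noise ratio and the “fair fight” concrete, the sketch below fits a simple regression to simulated data (plain NumPy; the coefficients, noise level and seed are invented purely for illustration) and assembles the F statistic by scaling the model and error sums of squares by the pieces of information behind each.

```python
# A minimal sketch of the F statistic as a degrees-of-freedom-scaled
# "signal-to-noise" ratio. All data here are simulated for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, n)
y = 2.0 + 0.8 * x + rng.normal(0, 2, n)       # true signal plus noise

X = np.column_stack([np.ones(n), x])          # design matrix: intercept and slope
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta_hat

ss_model = np.sum((fitted - y.mean()) ** 2)   # "signal": variation explained by the line
ss_error = np.sum((y - fitted) ** 2)          # "noise": variation left over

# The "fair fight": scale each sum of squares by the information behind it.
ms_model = ss_model / 1                       # the fitted line adds one piece of information
ms_error = ss_error / (n - 2)                 # the residuals carry n - 2 pieces of information
F = ms_model / ms_error
print(f"signal-to-noise ratio (F statistic) = {F:.2f}")
```

The unscaled ratio of sums of squares would favour whichever side happened to be built from more pieces of information; dividing each by its degrees of freedom is what makes the comparison fair.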
Analogy 2: The undiscovered island and partitioning variability with sequential sums of squares
Martin (2003) introduced the analogy of the “undiscovered island” to explain how
the order in which variables are fit in a model changes their sequential sums of
squares. Here, the analogy is explored more deeply, with a view to more clearly
incorporating the notion of multicollinearity and its effect on the sequential breakdown of explained variation in the analysis of variance. The analogy describes an
uninhabited, unexplored island in the days of the great exploration of the oceans
by colonial powers. The source idea is that the exploration and the claiming of territory depended critically on which explorer arrived first. So, as explorers arrive at
the island one after another, they are only able to explore and claim territory that
has not already been claimed. Further, some parts of the island are impenetrable
jungle, so some territory cannot be explored (remembering that the colonial powers did not have access to Google Earth!). Mapped objects (source/target) exist in
both domains (explorers/covariates; explored territory/explained variation; impenetrable jungle/unexplained variation; sequence of arrival of explorers/sequence of
fit in model). The map is fairly strong, and the story sufficiently engaging that students can readily transfer the idea from the source domain to the target domain.
Further, other notions such as multicollinearity and marginal explanatory power
can also be integrated into the analogy with strongly mapped elements. Panels A
and B, below, show how two great explorers coming from the same direction can
each look “marginal” if they happen to arrive after the other one. This situation is
an analog of two “good” variables that are roughly collinear – i.e. they are both
carrying much the same information – so, the order in which they are fit determines which of them seems most important in terms of explaining variation in the
response. Panel C shows the situation when variables (explorers) are roughly independent (coming from completely different directions) – in this case, the order
of fit (arrival) doesn’t matter as the way the variation is explained (island is partitioned) does not change. In either case, the total amount of explored island (variation explained) is the same irrespective of the order of arrival (order of fit), so the
fitted model itself does not change, only the way that the territory has been divided up (explained variation has been partitioned).
Panel A: X1 and X2 come from roughly the same direction (collinear); X1 arrives first (gains the largest sequential sum of squares), leaving little for X2.

Panel B: X1 and X2 come from roughly the same direction (collinear); X2 arrives first (gains the largest sequential sum of squares), leaving little for X1.

Panel C: X1 and X2 come from opposite sides (roughly independent); the order of arrival does not materially affect the territory claimed (sums of squares).
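A small numerical sketch of the island (plain NumPy on simulated, roughly collinear covariates; the variable names, coefficients and seed are invented for illustration) tells the panels’ story in sums of squares: the sequential claims swap when the order of arrival swaps, but the total explained variation does not.

```python
# Sequential sums of squares depend on the order in which collinear
# "explorers" (covariates) arrive; the total territory claimed does not.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.3, size=n)        # arrives from much the same direction as x1
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)

def explained_ss(y, *covariates):
    """Explained sum of squares for an intercept plus the given covariates."""
    X = np.column_stack([np.ones(len(y)), *covariates])
    fitted = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((fitted - y.mean()) ** 2)

total = explained_ss(y, x1, x2)
# x1 lands first: it claims most of the island, leaving x2 only the increment.
print("x1 then x2:", explained_ss(y, x1), total - explained_ss(y, x1))
# x2 lands first: the claims reverse, but the total explored is unchanged.
print("x2 then x1:", explained_ss(y, x2), total - explained_ss(y, x2))
print("total explained either way:", total)
```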
Analogy 3: Symptoms versus disease and leverage versus influence
Perhaps the most frustrating experience when teaching regression is that of having
students confuse high leverage with influence. Over many years, this one concept
seems to have been the hardest of all to reliably communicate in my regression
courses. Why this should be is hard to pinpoint, as the distinctions could hardly
have been made clearer, with many facts and examples used as evidence of the
difference between the two concepts. For instance, leverage is a function only of
the covariates, not the response, so it simply cannot be the same as influence
which must, necessarily, involve some consideration of the response variable. Yet
in almost every assessment item, students routinely (and cheerfully) declared
points with high leverage to be influential. For many years, despite my intense efforts to clarify this issue, the confusion between these two concepts continued –
until I began using a simple but powerful analogy that has radically addressed this
confusion. In the analogy, high influence is aligned with a disease, while high leverage is aligned with a symptom of that disease. The analogy is strong because
along with the strong map between objects in the analogy, there is also a strong
map of structural relationships in the source domain to the target domain. In general, in human health, symptoms are fairly readily detected, just as leverage is easily calculated. Disease, on the other hand, may be hard to directly detect, and so it
is often the case with high influence. Diseases are often signaled by symptoms,
just as high influence is often signaled by high leverage, but as everybody knows,
a sneeze may be a sign of a cold, but to actually detect the virus causing a cold
would require a visit to a laboratory, and, in any event, not every sneeze is associated with a cold. In this way, the tendency I noticed for students to declare a point
as influential in many ways resembled the tendency for people to declare that they
had a cold when, in fact, they were simply sneezing! Even more complex concepts
such as masking are well accommodated within the analogy – the presence of a
disease may well be masked by the presence of additional symptoms beyond those
classically associated with the disease. Since I began using this analogy, the tendency for points to be routinely assessed as having high influence simply because
they have high leverage has dropped markedly. The realization that symptoms and diseases are associated but not synonymous is part of what people commonly understand, and this realization has now been transplanted to the context of leverage and influence. Even more powerfully, the context of both the source domain
and the target domain is diagnostic, the former medical and the latter statistical,
making this a very appealing analogy.
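The distinction can also be shown numerically. The sketch below (plain NumPy, simulated data with an invented design and seed) constructs a point with high leverage whose y value follows the trend – the symptom without the disease – and then gives the same point a discordant y value so that the symptom is accompanied by the disease. Influence is judged here simply by how much the slope moves when the point is deleted.

```python
# High leverage (symptom) versus influence (disease) in simple regression.
import numpy as np

rng = np.random.default_rng(2)
n = 30
x = np.append(rng.uniform(0, 10, n - 1), 25.0)   # one point far out in x: a loud "symptom"
y = 1.0 + 0.5 * x + rng.normal(0, 1, n)          # but its y value still follows the trend

X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T             # hat matrix: a function of x only, not y
leverage = np.diag(H)

def slope(x, y):
    X = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# "Disease" check: does deleting the high-leverage point actually move the fit?
print(f"leverage of last point: {leverage[-1]:.2f}")
print(f"slope with / without it: {slope(x, y):.3f} / {slope(x[:-1], y[:-1]):.3f}")

# Now give the same point a wildly discordant y value: symptom plus disease.
y_sick = y.copy()
y_sick[-1] += 15.0
print(f"slope with / without the discordant point: "
      f"{slope(x, y_sick):.3f} / {slope(x[:-1], y_sick[:-1]):.3f}")
```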
Analogy 4: Competition between sporting teams and combining p-values
A very common sight within journals in just about any field is a large table with
columns labeled “variable”, “estimate”, “SE”, “t-value” and “p-value” under
which sits row after row of figures, typically festooned down the right-hand edge
with an array of daggers and stars representing significance at 10%, 5% and 1%
for each of the listed variables. This table is typically the result of a model fitting
exercise, the ultimate intention being to make a judgment – and choice – of which
explanatory variables are important in describing the response variable, and in
many instances, this choice is made by simply retaining those variables “starred”
in the table and removing the others as extraneous. What this exercise does not
explicitly take into account, of course, is that the multiple tests on which this aggregate judgment is made cannot be simply combined to produce this outcome because each individual test is marginal – that is, each test is based on an assumption
that is incompatible with every other test. At first blush, most students find this
situation utterly confusing: if the first line in the table suggests that $\beta_1$ is plausibly
zero, and the second line in the table suggests that $\beta_2$ is plausibly zero, why can’t I
just take both of the corresponding variables out of the model? The answer lies, of
course, in considering the actual models being compared in each of the tests being
conducted (this discussion leaving aside for the moment the vexed problem of
multiple testing). For simplicity, suppose there are only two covariates, X1 and X2.
The test summarized by the first line (corresponding to X1 and $\beta_1$) in the table reflects a comparison between the two models
$$E(Y_i) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$$
and
$$E(Y_i) = \beta_0 + \beta_2 X_{2i},$$
while the test summarized by the second line in the table (corresponding to X2 and $\beta_2$) reflects a completely different comparison, between
$$E(Y_i) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$$
and
$$E(Y_i) = \beta_0 + \beta_1 X_{1i}.$$
Meanwhile, the proposed action suggested by combining the two tests is that of
removing both variables from the model, which amounts to making a comparison
between yet another pair of models, between
$$E(Y_i) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$$
and
$$E(Y_i) = \beta_0;$$
that is, between the full model and a model featuring only an intercept. You cannot combine the tests summarized by the first two lines – the separate tests for $\beta_1$ and $\beta_2$ – as the underlying comparisons are inherently incompatible. Further,
you certainly cannot infer the result of the final comparison you wish to make by
considering the first two tests. By now, most students’ heads are spinning. Compare what model with what model? How can you say $\beta_1$ is plausibly zero and $\beta_2$ is
plausibly zero, but they are not both plausibly zero? Huh?? One of the difficulties
rests with the way in which null hypotheses are expressed, typically only explicitly referencing one parameter under an implicit assumption that all other parameters are present. But students often interpret these statements as absolute statements and ignore the implicit assumptions and underlying models, leading to the
misunderstanding related above.
Here, a simple analogy can help. Think of each of the competing models in the
above description as sporting teams engaged in a round-robin contest. Call the
teams Sydney ($E(Y_i) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$), Melbourne ($E(Y_i) = \beta_0 + \beta_2 X_{2i}$), Brisbane ($E(Y_i) = \beta_0 + \beta_1 X_{1i}$) and Canberra ($E(Y_i) = \beta_0$). Then run the student
through the following sequence of games for Sydney:
Game 1, Melbourne plays Sydney, and Melbourne wins
Game 2, Brisbane plays Sydney, and Brisbane wins
Game 3, Sydney plays Canberra, and Sydney wins.
Then ask: is there anything about this set of results that is inherently contradictory? The answer is invariably no – sets of results like this are commonplace in sports.
Even if stronger teams always beat weaker teams, this set of results is completely
unsurprising, indeed expected, if Melbourne and Brisbane are strong teams, Sydney is a medium-ranking team, and Canberra is weak. But this system of games is
analogous to the sequence of tests described above, a series of hypothesis tests for
which students typically assume that the results of the first two tests imply the result for the third. The strong identification between teams and models is a useful
mapping, and the key to understanding in the target domain rests in the realization
that the set of results in the source domain is also unsurprising because the three
contests (tests) are not as related as the null hypothesis statements make them
seem.
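The round-robin can also be played numerically. In the sketch below (simulated data; the near-collinearity, coefficients and seed are invented for illustration, and SciPy is assumed available for the F distribution), each “game” is a partial F test between nested models: Melbourne and Brisbane can both beat Sydney while Sydney still comfortably beats Canberra.

```python
# Partial F tests as the round-robin between the four team/models.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)          # nearly collinear with x1
y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

ones = np.ones((n, 1))
sydney    = np.column_stack([ones, x1, x2])      # full model
melbourne = np.column_stack([ones, x2])          # drop X1
brisbane  = np.column_stack([ones, x1])          # drop X2
canberra  = ones                                 # intercept only

def rss(X):
    fitted = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((y - fitted) ** 2)

def game(small, big):
    """p-value for the partial F test comparing a reduced model to a larger nested one."""
    df_num = big.shape[1] - small.shape[1]
    df_den = n - big.shape[1]
    F = ((rss(small) - rss(big)) / df_num) / (rss(big) / df_den)
    return stats.f.sf(F, df_num, df_den)

print("Melbourne vs Sydney (test for beta_1):", game(melbourne, sydney))
print("Brisbane  vs Sydney (test for beta_2):", game(brisbane, sydney))
print("Canberra  vs Sydney (both at once):   ", game(canberra, sydney))
# With x1 and x2 this collinear, the first two p-values are typically large
# (Melbourne and Brisbane "win"), while the third is essentially zero.
```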
Analogy 5: Choosing the “right” meal from the menu and model selection
Model selection is a process many students find difficult to understand, particularly when there are a large number of covariates. Having been warned off selecting
combinations of variables based on large tables of marginal p-values, they know
they cannot proceed that way, and in the presence of many covariates, the sheer
number of available models is formidable. Automatic methods such as stepwise
procedures are a seductive alternative, but remembering the algorithm has proven
difficult for many students (particularly when the process is completely automatic
– “black box” – in software). To motivate the algorithm, the following analogy
has proven useful. Imagine a restaurant with a large, diverse menu. Obviously you
want the “optimal” meal. So, begin by selecting from the menu the food you most
like. Having eaten that morsel, you gaze again at the menu, at the next step choosing the food you like next best, provided, of course, that it goes well with what you
have already eaten. The process continues until either there is nothing on the menu
that complements what you have already eaten or you are full. This process is like
the forward selection method of model selection – each step is conditional on the
previous step, and the process cannot step backwards (since you eat the courses as
you progress).
In the model selection process, the same sequence is followed, with variables chosen at each step depending on their contribution to the model given what has already been added. The meal (model) is built one item at a time until the contributions from additional menu items (variables) have diminished below some
acceptable threshold. Refinements such as moving to a forward stepwise procedure that incorporates successive add-delete variable phases can be accommodated
by the analogy by simply removing the requirement that courses are eaten as they
arrive at the table – instead, the order is built sequentially with menu items added
– and potentially deleted – as their suitability is assessed in the context of what
else has already been ordered at the preceding step. The analogy is very simple but
in my experience students find it very motivating. The experience of ordering food
and thinking of pleasant combinations of food is both a common experience and,
generally, a positive and pleasant one. These factors, plus the strength of the maps
between objects and relationships between the source and target, create a positive
environment for understanding the new algorithm, and my experience has been
that this analogy is a particularly effective way to describe stepwise regression
procedures.
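A minimal sketch of the menu algorithm is given below, under invented data and an arbitrary stopping threshold; it implements greedy forward selection only, with no backward deletion step.

```python
# Forward selection as ordering from the menu: at each step add the dish
# (variable) that best complements what is already in the model, and stop
# when nothing left improves the fit enough. Data simulated for illustration.
import numpy as np

def rss(y, X):
    fitted = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((y - fitted) ** 2)

def forward_select(y, candidates, min_improvement=0.02):
    """Greedy forward selection; candidates maps a name to a covariate column."""
    chosen, current = [], np.ones((len(y), 1))              # start with the intercept only
    remaining = dict(candidates)
    while remaining:
        # Which remaining dish best complements what has already been ordered?
        scores = {name: rss(y, np.column_stack([current, col]))
                  for name, col in remaining.items()}
        best = min(scores, key=scores.get)
        if (rss(y, current) - scores[best]) / rss(y, current) < min_improvement:
            break                                           # nothing left goes with the meal
        chosen.append(best)
        current = np.column_stack([current, remaining.pop(best)])
    return chosen

rng = np.random.default_rng(4)
n = 200
X = rng.normal(size=(n, 4))
y = 3.0 * X[:, 0] + 1.5 * X[:, 2] + rng.normal(size=n)      # only x0 and x2 matter
print(forward_select(y, {f"x{j}": X[:, j] for j in range(4)}))  # typically ['x0', 'x2']
```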
Analogy 6: The blind men and the elephant and understanding high-dimensional data
Visualization is an incredibly useful tool in statistical modeling. Every student of
statistical modeling has to have seen Anscombe’s quartet (Anscombe, 1973), the
collection of four data sets that all yield identical numerical regression output but
which could scarcely be more different when plotted. This powerful example immediately convinces all students of the wisdom of visualizing data, but visualization is a seriously difficult problem when data is high-dimensional. Explaining
why visualization in high dimensions is so problematic can be difficult – many
graphical displays, for instance scatterplot matrices and trellis displays, offer a
glimpse at high-dimensional data, but the truth behind the data can remain well
hidden.
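For instructors who want the quartet on screen, the sketch below assumes seaborn's bundled copy of Anscombe's data is available; the fitted slopes, intercepts and R² are essentially identical across the four data sets, and only the plot gives the game away.

```python
# Anscombe's quartet: identical regression summaries, four very different plots.
# Assumes seaborn ships its copy of the quartet under the name "anscombe".
import numpy as np
import seaborn as sns

anscombe = sns.load_dataset("anscombe")            # columns: dataset, x, y
for name, group in anscombe.groupby("dataset"):
    slope, intercept = np.polyfit(group["x"], group["y"], 1)
    r2 = np.corrcoef(group["x"], group["y"])[0, 1] ** 2
    print(f"dataset {name}: slope={slope:.2f}, intercept={intercept:.2f}, R^2={r2:.2f}")

# Only the plot reveals the differences: a clean line, a curve, an
# outlier-driven fit, and a single high-leverage point.
sns.lmplot(data=anscombe, x="x", y="y", col="dataset", ci=None, height=3)
```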
One approach to demonstrating this truism is to carefully construct a multivariate
data set that effectively defeats all attempts to discover its real structure by looking in the obvious directions. This approach can work well, but it has a considerable downside – it casts the teacher as an illusionist, a trickster, even a huckster.
Yet here an analogy – the brilliant fable of the blind men and the elephant – illustrates the situation wonderfully. The history of this story is long, and it has been
used to teach a wide range of lessons, from the need for effective communication
to the idea of tolerance for those who have different perspectives. Perhaps the
best-known rendering of the tale is the poem by John Godfrey Saxe (1816-1887),
a work now in the public domain as Saxe has been deceased for over a century:
It was six men of Indostan,
To learning much inclined,
Who went to see the Elephant
(Though all of them were blind),
That each by observation
Might satisfy his mind.
The First approach’d the Elephant,
And happening to fall
Against his broad and sturdy side,
At once began to bawl:
“God bless me! but the Elephant
Is very like a wall!”
The Second, feeling of the tusk,
Cried, “Ho! what have we here?
So very round and smooth and sharp?
To me ’tis mighty clear,
This wonder of an Elephant
Is very like a spear!”
The Third approach’d the animal,
And happening to take
The squirming trunk within his hands,
Thus boldly up and spake:
“I see,” quoth he “the Elephant
Is very like a snake!”
The Fourth reached out an eager hand,
And felt about the knee:
“What most this wondrous beast is like
Is mighty plain,” quoth he,
’Tis clear enough the Elephant
Is very like a tree!”
The Fifth, who chanced to touch the ear,
Said “E’en the blindest man
Can tell what this resembles most;
Deny the fact who can,
This marvel of an Elephant
Is very like a fan!”
The Sixth no sooner had begun
About the beast to grope,
Then, seizing on the swinging tail
That fell within his scope,
“I see,” quoth he, “the Elephant
Is very like a rope!”
And so these men of Indostan
Disputed loud and long,
Each in his own opinion
Exceeding stiff and strong,
Though each was partly in the right,
And all were in the wrong!
MORAL
So, oft in theologic wars
The disputants, I ween,
Rail on in utter ignorance
Of what each other mean;
And prate about an Elephant
Not one of them has seen!
Apart from the teacher having the delightful experience of reciting a poem in a
statistics class (so you already have everyone’s attention), the reward is that the final line of the poem states exactly the critical problem with high-dimensional data
– it simply cannot be seen, at least not in the low-dimensional space in which humans live. The time to consider the scatterplot matrix for that trick data set is right
after the poem has been read. Despite the enormous advantages conferred by
the use of small multiples allowing so many directions in the data to be assessed at
once, the students realize very quickly that they are no better off than the committee of blind men standing before the elephant. It is then that, as a class, the journey
to understand high-dimensional data begins, acknowledging that we all begin with
the same basic problem – we are all essentially blind.
It is interesting to note also that visualization of data and relationships within data
– a basic tool for statisticians – is itself a classic example of analogical thinking,
one that is so embedded that it is now a completely automatic process. Statistical
graphics all embed a very simple metaphor – the size of a visual element (e.g.
length, area, angle) must be proportional to the number it represents. As long as
this metaphor is satisfied – and, remarkably, this rule is broken very frequently –
then the simple analogy allows our visual comparisons of size to transfer seamlessly and quickly to an understanding of the difference between the underlying
numbers. The metaphor is extraordinarily powerful, and the effects when the metaphor fails can be catastrophic. Edward Tufte even has a name for the effect when
the metaphor breaks – he calls it the “lie factor” (Tufte, 2001, p.57). The effect on
decision-making when graphics misrepresent the numbers they are supposed to
communicate further demonstrates the power of analogical thinking – when the relationship map behind the analogy fails, the whole house of cards can come tumbling down.
Analogies are a potent bridge between what is familiar and comfortable and what
is new, uncharted territory. Analogical structure – mappings from the old to the
new, along with the preservation of critical relationship maps – can be used to acquire new knowledge, and thus explore new vistas. Once the new knowledge is
transferred from the source to the target domain, it becomes itself accessible.
Analogies are also evocative, so their use helps students remember concepts far better than rote memorization of formulas ever could – as the folklore says of elephants, they never forget¹.
As a postscript, it is also prudent to remind students of that other lesson from the
fable of the blind men and the elephant: the value of considering differing perspectives. In that vein, I close with the following tale…
Six blind elephants gathered together and the discussion turned to what humans
were like. After a gentle discussion (elephants dislike heated argument), it was decided that they should each feel a human and then they could meet again to discuss their findings. After a careful examination of a human, the first blind elephant
returned to the group. One by one the elephants went and made their own assessments, and when the group assembled again, the first blind elephant announced
that she had determined what humans were like. A brief discussion ensued, with
each elephant describing its findings. The verdict was unanimous. Humans are
flat.
¹ http://www.scientificamerican.com/article.cfm?id=elephants-never-forget

References
Anscombe, F. J. (1973), Graphs in statistical analysis, The American Statistician, 27(1), pp. 17-21.
Bangdiwala, S. I. (1989), The teaching of the concepts of statistical tests of hypotheses to non-statisticians, Journal of Applied Statistics, 16, pp. 355-361.
Brewer, J. K. (1989), Analogies and parables in the teaching of statistics, Teaching Statistics, 11, pp. 21-23.
Feinberg, W. E. (1971), Teaching the type I and II errors: the judicial process, The American Statistician, 25, pp. 30-32.
Glynn, S. M. (1991), Explaining science concepts: a teaching-with-analogies model, in The Psychology of Learning Science, eds. S. M. Glynn, R. H. Yeany, and B. K. Britton, Hillsdale, NJ: Lawrence Erlbaum Associates, pp. 219-240.
Glynn, S. M. (1994), Teaching science with analogies: a strategy for teachers and textbook authors, Research Report No. 15, Athens, GA: University of Georgia and College Park, MD: University of Maryland, National Reading Research Center.
Glynn, S. M. (1995), Conceptual bridges: using analogies to explain scientific concepts, The Science Teacher, December 1995, pp. 24-27.
Glynn, S. M. (1996), Teaching with analogies: building on the science textbook, The Reading Teacher, 49(6), pp. 490-492.
Glynn, S. M., Duit, R., and Thiele, R. (1995), Teaching science with analogies: a strategy for transferring knowledge, in Learning Science in the Schools: Research Reforming Practice, eds. S. M. Glynn and R. Duit, Mahwah, NJ: Lawrence Erlbaum Associates, pp. 247-273.
Glynn, S. M., and Law, M. (1993), Teaching science with analogies: building on the book [Video], Athens, GA: University of Georgia and College Park, MD: University of Maryland, National Reading Research Center.
Holyoak, K. J., and Thagard, P. (1995), Mental Leaps: Analogy in Creative Thought, Cambridge, MA: MIT Press.
Martin, M. A. (2003), It's like, you know – the use of analogies and heuristics in teaching introductory statistics, Journal of Statistics Education, 11(2), online: http://www.amstat.org/publications/jse/v11n2/martin.html. See also the Letter to the Editor (http://www.amstat.org/publications/jse/v11n3/lesser_letter.html) and the response (http://www.amstat.org/publications/jse/v11n3/martin_letter_response.html).
Martin, M. A. (2008), What lies beneath: inventing new wheels from old, OZCOTS 2008, Proceedings of the 6th Australian Conference on Teaching Statistics, eds. H. L. MacGillivray and M. A. Martin, pp. 35-52.
Moore, D. S. (1997), Statistics: Concepts and Controversies (4th ed.), New York: W. H. Freeman and Co.
Saxe, J. G., Six Men of Indostan. The poem is in the public domain and the work is no longer in copyright.
Tufte, E. R. (2001), The Visual Display of Quantitative Information, Cheshire, CT: Graphics Press.