A Statistician's Apology
JEROME CORNFIELD*
*Jerome Cornfield is chairman, Department of Statistics, George Washington University, and director, Biostatistics Center, 7979 Old Georgetown Road, Bethesda, Md. 20014. This article is the Presidential Address delivered at the 134th Annual Meeting of the American Statistical Association, August 27, 1974, St. Louis, Mo. Preparation of this article was partially supported by National Institutes of Health Grant HL15191.
Statistics is viewed as an activity using a variety of formulations to give incomplete, but revealing,
descriptions of phenomena too complex to be completely modeled. Examples are given to stress the depth and
variety of such activities and the intellectual satisfactions to be derived from them.
1. CRITIC'S COMPLAINT
As may be apparent from my title, which is taken from G.H. Hardy's little masterpiece
[17], this address is on a subject covered by numerous predecessors on this occasion: what
is statistics and where is it going? When one tries, spider-like, to spin such a thread
out of his viscera, he must say, first, according to Dean Acheson,
"What do I know, or think I know, from my own experience and not by literary osmosis?" An honest answer
would be, "Not much; and I am not too sure of most of it."
In that spirit, I shall start by considering a nonstatistical critic, perhaps a
colleague on a statistics search committee, who is struck by what he calls the dependent
character of statistics. Every other subject in the arts and sciences curriculum, he
explains, appears self-contained, but statistics does not. Although it uses mathematics,
its criteria of excellence are not those of mathematics but rather impact on other
subjects. Similarly, while other subjects are concerned with accumulating knowledge
about the external world, statistics appears concerned only with methods of accumulating
knowledge but not with what is accumulated. I don't deny, he says, that such activity
can be useful (here he shudders), but why would anyone wish to do it? At this point he
begins to wonder whether statisticians have any proper place in a faculty of arts and
sciences, but being too tactful to say so, he tells, instead, one of those stories about
lies, damned lies and statistics. He ends by quoting from the decalogue of Auden's Phi
Beta Kappa poem for 1946 in which academics are warned
"...
Thou shalt not sit/With statisticians nor commit/a social science."
Such an attitude is not uncommon. Most statisticians have wondered at one time or
another about their choice of profession. Is it merely the result of a random walk or
are there major elements of rationality to it? Would one, in later years, in full
knowledge of one's interests and abilities, agree that the choice was correct? I must
admit that in my more Walter Mitty-like moments I have contemplated the cancer cures
I might have discovered had I gone into one of the biomedical sciences, and similar
marvels for other fields. But advancing years bring realism, and I am convinced that
the outcome of the random walk which made me a statistician was inevitable. The only
state that was an absorbing one was statistics, and I am pleased that the passage time
was so brief. To say why this is so, I must talk about both statistics and myself.
2. DIVERSITY OF INTERESTS
Considering myself first, I point to an affinity with the quantum theorist Pauli, between
whom and all forms of laboratory equipment, it has been said, a deep antagonism subsisted.
He had only to appear between the door-jambs of any physics laboratory in Europe for
some piece of glassware to shatter. Similarly, in my undergraduate class in analytic
chemistry, no matter what solution I was given to analyze, even if only triply distilled
water, I was sure to find sodium, a literal product, I was later told, of the sweat of
my brow. In biology, I retraced the path of James Thurber, the result of whose first
effort to draw what he saw in a microscope was, it will be remembered, a picture of his
own eye. Observational science was clearly not for me, unless I intended to publish
exclusively in the Journal of Irreproducible Results, the famed journal which first
reported the experiment in which one-third of the mice responded to treatment, one-third
did not, and the other third was eaten by a cat.
Although I enjoyed mathematics, I enjoyed many other subjects just as much, and a
part-time mathematician ends by being no mathematician at all. This interest in many
phenomena and the sense that mathematics by itself is intellectually confining (as Edmund Burke said of the law, it sharpens the mind by narrowing it) is characteristic of many statisticians. Francis Galton, for example, was active in geography, astronomy, meteorology and, of course, genetics, to mention only a few. I have always thought that
his investigation on the efficacy of prayer, in which he, among other things, compared
the shipwreck rate for vessels carrying and not carrying missionaries, exhibits the
quintessence of statistics [13]. Charles Babbage, the first chairman of the Statistical
Section of the British Association for the Advancement of Science and a leading spirit
in the founding of the Royal Statistical Society, was another from the same mold. He
is perhaps the earliest example of a statistician lost to the field because of the greater
attractions of computation. His autobiography [1] lists an awe-inspiring number of other subjects in which he was interested, and no doubt the loss to each of them because of his youthful reaction to erroneous astronomical tables ("I wish to God these calculations had been executed by steam") was equally lamentable.
A wide diversity of interests is neither necessary nor sufficient for excellence in
a statistician, but a positive association does exist. Statisticians thus tend to be
hybrids, and it was my own hybrid qualities, together with a disinclination to continue
the drift into what Henry Adams called the mental indolence of history, that made the
outcome of my random walk inevitable.
3. COMPLEX SYSTEMS AND STATISTICS
Coming now to statistics, it clearly embraces a spectrum of activities
indistinguishable from pure mathematics at one end and from substantive involvement with
a particular subject matter area at the other. It is not easy to identify the common
element that binds these activities together. Quantification is clearly a key feature
but is hardly unique to statistics. The distinction between statistical and
nonstatistical quantification is elusive, but it seems to depend essentially on the
complexity of the system being quantified; the more complex the system, the greater the
need for statistical description.
Once, many people believed that the necessity for statistical description in a system
was due solely to the primitive state of its development. Eventually, it was felt, biology
would have its Newton, then psychology, then economics. But since Newtonian-like
description has turned out to be inadequate even for physics, few believe this anymore.
Some phenomena are inherently incapable of simple description. A number of years ago
I gave as an example of such a class of phenomena the geography of the North American
continent [9]. I was not suggesting that the Lewis and Clark expedition would have benefited from the presence of a statistician if its objectives had been quantitative (it might very well have), but only that many scientific fields that are now descriptive
and statistical may remain so for the indefinite future.
4. LIFE SCIENCES AND STATISTICS
In the life sciences, the somewhat naive 19th-century expectations, as expressed most forcibly perhaps by the great French physiologist Claude Bernard, have not been
realized. Bernard loathed statistics as inconsistent with the development of scientific
medicine. In his Introduction to the Study of Experimental Medicine [2, pp. 137 ff.]
he wrote:
"A great surgeon performs operations for stones by a single method; later he makes a statistical summary
of deaths and recoveries, and he concludes from these statistics that the mortality law for this operation
is two out of five. Well, I say that this ratio means literally nothing scientifically and gives no certainty
in performing the next operation. What really should be done, instead of gathering facts empirically,
is to study them more accurately, each in its special determinism ... by statistics, we get a conjecture
of greater or less probability about a given case, but never any certainty, never any absolute determinism
... only basing itself on experimental determinism can medicine become a true science, i.e., a sure science
... but indeterminacy knows no laws: laws exist only in experimental determinism, and without laws there
can be no science."
It is ironic that more than a century later, scientific understanding of the
biochemical factors governing the formation of gallstones has progressed to the point
that their dissolution by chemical means alone has become a possibility [22], but one
whose investigation, even now, must include a modern clinical trial with randomized
allocation of patients and statistical analysis of results. In the view of many,
Bernard's vision of a complete nonstatistical description of medical phenomena will not
be realized in the foreseeable future. Sir William Osler was at least as close to the
truth when he said that medicine will become a science when doctors learn to count.
Although the exclusively reductionist philosophy of Bernard still dominates
preclinical teaching in most medical schools, many important biomedical problems not
amenable, in the present state of knowledge, to this form of attack have been and are
being successfully investigated. In his presidential address to the American Association
for Cancer Research [21], Michael Shimkin discussed one of the best known of them, smoking and health, and ended by suggesting that a Nobel Prize for accomplishments on this problem be awarded and shared by Richard Doll and Bradford Hill, Cuyler Hammond and Daniel Horn, and Ernest Wynder, three of whom are members of either the ASA or the
Royal Statistical Society.
A less well-known biomedical activity, in which the impact of statistics is also
becoming considerable, is the evaluation of food additives. Even the most traditional
of toxicologists has been compelled to admit that an experiment on n laboratory animals,
none of whom showed an adverse reaction to an additive, does not establish, even for
that species of animal, the absolute safety of that additive for any finite n, much less
that the risk is less than 1 in n [18]. A more fundamental problem than either of these,
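A standard line of arithmetic makes the point concrete. If the true risk of an adverse reaction is p, the chance that none of n animals reacts is (1 - p)^n; at p = 1/n this is about e^{-1}, roughly 0.37, so a completely clean experiment is entirely compatible with a risk of 1 in n. The familiar one-sided 95 percent upper confidence limit from zero adverse reactions in n animals is

\[
p_U = 1 - 0.05^{1/n} \approx \frac{3}{n},
\]

about three times the risk that such an experiment might naively be thought to exclude.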
that of brain function, is, according to one investigator, unlikely to be resolved by
deterministic methods since its primary mode of function is believed by many to be
probabilistic [4]. For none of these problems, nor for many others that could be listed,
is a reductionist approach, let alone solution, anywhere in sight, and if the problems
are to be attacked at all, they must be attacked statistically.
This, of course, does not mean they must be attacked by statisticians. Osler was
questioning the unwillingness and not the inability of physicians to count, and many of them have shown that, when they become involved in a problem that requires counting, they can do a pretty good job of it. No one, not even our hypothetical critic,
questions that statisticians have a useful auxiliary role to play in such investigations.
But the heart of his question, and of my inquiry, is why any person of spirit, of ambition, of high intellectual standards should take any pride or receive any real stimulation and satisfaction from serving an auxiliary role on someone else's problem. To prepare
the ground for an answer, I shall first consider two examples taken from my own
experience.
5. EXAMPLE: TOXICITY OF AMINO ACIDS
The first example concerns a series of investigations on the toxicity of the essential
amino acids undertaken by the Laboratory of Biochemistry of the National Cancer Institute
[15]. They were motivated by the finding that synthetic mixtures of pure amino acids
could have adverse effects when administered to patients following surgery. An initial
series of experiments had determined the dosage-mortality curve in rats for each of the
ten amino acids. The next step, the investigation of mixtures, threatened to founder
at conception because there were over 1,000 possible mixtures of the ten amino acids
that could be formed, and to study so many mixtures was clearly beyond the bounds of
practicability.
At the time these experiments were done, several methods for measuring joint effects
had been proposed. Their common feature was that they did not depend on detailed
biochemical understanding of joint modes of action. The primitive but not very
well-defined notion to be quantified was that combinations may have effects that are
not equal to the sum of the parts and that knowing this may lead to better understanding
of modes of action. J. H. Gaddum, the British pharmacologist who was one of the inventors
of the probit method, discussed the quantification of the primitive notion in his
pharmacology text with a simplicity and generality that contrasted with most of the other
discussions then available [12]. He started with the idea that if drugs A and B were
the same drug, even if in different dilutions, their joint effects were by definition
additive. That definition implied that if amounts x_A and x_B of drugs A and B individually led to the same response, e.g., 50 percent mortality, then they were additive at that level of response if and only if the mixture consisting of some fraction, p, of x_A and a fraction (1 - p) of x_B led to that same level of response for all p between zero and unity.
This concept of additivity is different from that customarily used in statistics as may
be seen by noting that, in the customary concept, if responses are additive, then log
responses are not; but with Gaddum's concept, if drugs are additive with respect to a
response, they are additive with respect to all one-to-one functions of that response.
I shall refer to his concept of additivity as dosewise additivity, to distinguish it from the customary statistical concept, which is one of response-wise additivity.
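In symbols (a restatement rather than Gaddum's own notation), let x_A and x_B be the doses at which A and B, given alone, each produce the chosen level of response. The two drugs are dosewise additive at that level if a mixture containing amounts d_A and d_B is iso-effective with either drug alone whenever

\[
\frac{d_A}{x_A} + \frac{d_B}{x_B} = 1,
\]

with d_A = p x_A and d_B = (1 - p) x_B for some p between zero and one. Since this condition involves only the doses and the response level itself, it is unchanged by any one-to-one transformation of the response scale, which is the invariance just noted.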
An alternative, then, to investigating the 1,023 possible mixtures of amino acids
was to start with the concept of dosewise additivity. If the ten amino acids were in
fact dosewise additive in their effects on mortality, further investigation of mixtures
might be unrewarding, while large departures from additivity in either direction might
provide insights into possible joint modes of action. My biochemical colleagues were
initially quite unimpressed with this proposal. But since the dosage-mortality curves
for the individual amino acids were so steep that one-half an LD99.9 (the dose estimated
to be lethal for 99.9 percent of the animals) would elicit no more than a one to three
percent mortality, they did agree to test it in a small experiment. The experiment
consisted of using a mixture of one-half the LD99.9's of each of two amino acids. If
response-wise additivity held, this should have led to a two to six percent mortality,
but dosewise additivity would have led to 99.9 percent mortality. Although they didn't
say so at the time, the biochemists regarded this experiment as worth the investment
of just two rats. But they did call me the next day to report that both rats had died,
and then added in an awe-stricken tone that would have gratified Merlin himself, "and
so fast."
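The arithmetic behind the two predictions is worth displaying. With x_A and x_B now the individual LD99.9's, the mixture contained d_A = x_A/2 and d_B = x_B/2, so that

\[
\frac{d_A}{x_A} + \frac{d_B}{x_B} = \tfrac{1}{2} + \tfrac{1}{2} = 1;
\]

under dosewise additivity the mixture is therefore iso-effective with a full LD99.9 and should kill 99.9 percent of the animals, while under response-wise additivity the expected mortality is only the sum of the two individual responses to half an LD99.9, roughly two to six percent.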
Once the usefulness of the model of dosewise additivity as a standard against which
to appraise actual experimental results was accepted, my role became that of passive
observer. But here the real biochemical fun began. The biochemists started with a single
experiment in which a mixture of the ten amino acids, each at a dosage of one-tenth their
individual LD50, was administered to a group of rats. On the hypothesis of dosewise
additivity, 50 percent should have died; in fact, none did. Clearly, the different amino
acids were not augmenting each other in the same way that each increment of a single
amino acid augments the preceding amounts of that amino acid. A 50 percent kill was
obtained only when the amount of each ingredient in the mixture was increased by about
70 percent.
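In the same notation, with x_i^{(50)} the individual LD50's, the bookkeeping of this experiment is simply

\[
\sum_{i=1}^{10} \frac{d_i}{x_i^{(50)}} = 10 \times \tfrac{1}{10} = 1
\]

for the original mixture, which under dosewise additivity should have produced the 50 percent kill that in fact required raising each d_i to about 0.17 x_i^{(50)}, i.e., a dose-fraction total of about 1.7.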
To investigate further the nature of this joint effect, ten separate mixtures were
prepared, each containing nine of the amino acids at a dosage of one-ninth × 1.7 of their individual LD50's. At this point a wholly unexpected result emerged. For all but one
of the ten mixtures, the departures from additivity were of about the same magnitude
as for the mixture containing all ten. But one of the mixtures, the one lacking
L-arginine, was much more toxic than the other nine, suggesting that, despite its
toxicity when given alone, in a mixture L-arginine was protective. A direct experiment
confirmed this. The hypothesis was now advanced that the toxicity of each of the amino
acids was due to the formation and accumulation of ammonia and that the protective effect
of L-arginine was a result of its ability, previously demonstrated in isolated biochemical systems, to speed up the metabolism of ammonia. A later series of experiments fully confirmed this hypothesis.
A major advance in the understanding of the metabolism of amino acids was thus
crucially dependent on a wholly abstract concern with appropriate ways to quantify the
joint effects of two or more substances.
6. EXAMPLE: THE ELECTROCARDIOGRAM
The other example is taken from ongoing work in the computerized interpretation of
the electrocardiogram, or ECG [8]. It is convenient to consider computerization as
involving two nonoverlapping tasks. The first is measurement of the relevant variables,
or in the specialized terminology of the field, wave recognition. The second is the
combination of the variables to achieve an interpretation, or as it is sometimes called,
a diagnosis. The function of the wave recognition program is to summarize the electrical
signal which shows voltage as a function of time for each of a number of leads. There
are characteristic features to this signal which take the form of maxima and minima and
which can be related to various phases of the heart cycle. These features are referred
to as waves and the wave recognition program locates the beginning and end of each wave.
From these, a series of variables is defined and their values computed. The signal has
thus been transformed into a vector of variables.
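As a toy illustration of this step, and nothing more (the actual wave recognition programs work on several leads at once and locate the onset and end of each wave, not merely the R peaks), a single-lead sketch in Python, assuming scipy and with invented thresholds, might reduce the signal to a sequence of R-peak times:

```python
import numpy as np
from scipy.signal import find_peaks

def r_peak_times(ecg, fs, min_height=0.5, refractory_s=0.25):
    """Locate R peaks as prominent local maxima separated by at least a
    refractory period, and return their times in seconds.  The height and
    spacing thresholds are illustrative, not clinical."""
    peaks, _ = find_peaks(np.asarray(ecg, dtype=float),
                          height=min_height,
                          distance=int(refractory_s * fs))
    return peaks / fs

# The intervals fed to later stages are then just successive differences:
# rr = np.diff(r_peak_times(ecg, fs))
```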
The wave recognition procedures of the different groups that have developed
computerized interpretations all agree in principle, although there are important
differences in detail, and hence, in accuracy. The interpretation phase of the different
programs does involve an important difference of principle, however. Traditional reading
of the ECG by a cardiologist involves the use of certain decision points. Thus, the ECG
is traditionally considered "consistent with an infarct" if it manifests one or more
special characteristics, such as a Q wave of 0.03 sec. or more in certain leads, but
normal if it manifests none of these characteristics. Most groups have simply taken such
rules and written them into their programs. Headed by Dr. Hubert Pipberger, the Veterans
Administration group, with which I have been involved, has elected, instead, to proceed
statistically by collecting ECG's on a large number of individuals of known diagnostic
status (normals, those having had heart attacks, etc.) and basing the rules on the data by using standard multivariate procedures rather than on a priori decision points. This
statistical approach distinguishes the Veterans Administration program from all the
others now available.
At this point, I shall consider just a secondary aspect of the general program, the
detection of arrhythmias, and a restricted class of arrhythmias at that: those which
manifest themselves in disturbances of the RR rhythm. The time between the R peaks of
the ECG for two successive heart beats is termed an RR interval, and the problem was
to base a diagnosis on the lengths of the successive RR intervals. From a formal
statistical point of view, each patient is characterized by a sequence of values t_1, t_2, ..., t_n, which can be considered a sample from a multivariate distribution. If the
distribution is multivariate normal and stationary, the sufficient statistics for any
sequence of intervals are given by the mean interval, the standard deviation and the
successive serial correlations. Thus, the RR arrhythmias should be classifiable in terms
of these purely statistical attributes.
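A minimal sketch of these purely statistical attributes, assuming Python with numpy (the function and the illustrative rhythms are hypothetical and are not the VA program), might read:

```python
import numpy as np

def rr_features(rr, max_lag=3):
    """Summarize a sequence of RR intervals (in seconds) by the statistics
    named in the text: the mean interval, the standard deviation, and the
    first few serial (auto)correlations."""
    rr = np.asarray(rr, dtype=float)
    mean, sd = rr.mean(), rr.std(ddof=1)
    centered = rr - mean
    denom = np.sum(centered ** 2)
    serial = [float(np.sum(centered[:-k] * centered[k:]) / denom)
              for k in range(1, max_lag + 1)]
    return {"mean_rr": mean, "sd_rr": sd, "serial_corr": serial}

# A fast but steady rhythm (low mean, small sd) versus a grossly irregular one.
regular = 0.60 + 0.01 * np.random.randn(200)
irregular = 0.80 + 0.20 * np.random.randn(200)
print(rr_features(regular))
print(rr_features(irregular))
```

The successive differences of the peak times from the earlier sketch would serve as the rr argument, and the resulting feature vectors are what the multivariate classification operates on.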
Even before looking at any data, this formal scheme seemed to provide an appropriate
quantification for this class of arrhythmias. High and low mean values for a patient
correspond to what are called tachycardias and bradycardias. High standard deviations characterize departures from normal rhythm, while the different types of departures (the atrial fibrillations and the premature ventricular contractions, which must be distinguished because of their different prognoses) could be distinguished by their
differing serial correlations. A statistician familiar only with the purely formal
aspects of stationary time series and multivariate normal distributions and entirely
innocent of cardiac physiology, as I am, is nevertheless led to an appropriate
quantification. When these ideas are tested by classifying ECG's from patients with known
rhythmic disturbances, about 85 percent are correctly classified [16]. Because of the
simplicity of the scheme, a small special purpose computer, suitable for simultaneous
bedside monitoring of eight to ten patients in a coronary care unit, seems feasible and
is under active investigation.
7. KEY ROLE OF INTERACTIONS
Having placed this much emphasis on the examples, I am a little embarrassed at the
possibility that they will have failed to show that there is stimulation and satisfaction in working on someone else's problem. Compared with, say, the revolution in
molecular biology and the breaking of the genetic code, the activities described are
admittedly pretty small potatoes. Nevertheless, both the scientific problems and the
statistical ideas involved are nontrivial and the essential dependence of the final product on the blending of the two is characteristic. The role of statistics was central,
not auxiliary. The same results could not have been attained without statistics simply
by collecting more data. Important though efficient design and minimization of the
required number of observations are, they are by no means all that statistics has to
contribute, and both examples highlight this. They also seem to refute the common
statement, "If you need statistics, do another experiment," if yet another refutation
were needed.
The variety of mathematical ideas and applications in science and human affairs that
flows from the primitive notion of quantifying characteristics of populations as opposed
to explaining each individual event in its own "special determinism" would come as a
surprise to William Petty, John Graunt, Edmund Halley and the other founding fathers.
The steps from them to the idea of a population of measurements, described by theoretical
multivariate distributions, and then to the theory of optimal decisions based on samples
from these distributions (all of which were needed to lay the foundations for the ECG application) resulted from the efforts of some of the best mathematical talents of our
era. With these developments at our disposal, we are, astonishingly, at the heart of
the problem of computerizing the interpretation of the ECG. For if we take as our goal
minimum misclassification error or minimum cost of misclassification, statistical
procedures of the type used by the Veterans Administration group can be shown to be
preferable to computerization of the traditional cardiologic procedures [10]. A
statistician who finds himself at one of the many symposia devoted to this problem is
prepared to address the central and not the peripheral issues involved. And in that rapid
shift from theory to practice, so characteristic of statistics, he can also sketch out
the design of a collaborative study comparing the performance of the different programs,
which could confirm or deny this claim [19].
I must, however, not exaggerate. Statistics, although necessary for this problem,
is certainly not sufficient. The problem requires clinical engineering and many other
skills as well. A test relates to the entire package and not just the statistical
component, and this is true for most statistical applications. It will be remembered
that a well-designed, randomized test of the effectiveness of cloud seeding foundered on the fact that hunters used as targets the receptacles intended for the collection and measurement of the amount of rainfall, thus fatally impairing the variables needed
for evaluation.
It is for this as well as for other reasons that no one has ever claimed that statistics
was the queen of the sciences, a claim which, even for mathematics, was dependent on
a somewhat restricted view of what constituted the sciences. I have been groping for
a more appropriate noun than "queen," something less austere and authoritarian, more
democratic and more nearly totally involved, and, while it may not be the mot juste,
the best alternative that has occurred to me is "bedfellow." "Statistics, bedfellow of the sciences" may not be the banner under which we would choose to march in the next
academic procession, but it is as close to the mark as I can come.
8. EXAMPLES OF INTERACTIONS
How can a person so engaged find the time to maintain and develop his or her grasp
of statistical theory, without which one can scarcely talk about the stimulation and
satisfaction that comes from successful work on someone else's problem? The usual answer
is to distinguish between consultation and research and to insist that time must be
reserved for the latter. This answer comprehends only part of the truth, however, because
it overlooks the complementary relation between the two. Application requires
understanding, and the search for understanding often leads to, and cannot be
distinguished from, research.
The true joy is to see the breadth of application and breadth of understanding grow together, with the unplanned fallout (the pure gravy, so to speak) being the new research finding. It sometimes turns out not to be so new, but, aside from the embarrassing light this casts on one's scholarship, that should not be regarded as more than a minor misfortune.
It is hard to give examples of these interactions which don't become unduly technical
or too long, but some of the issues involved in sequential experimentation may give a
glimpse of the process. Many years ago in trying to construct a sequential screening
test which would require fewer observations than the standard six mouse screen then being
used by an NIH colleague, I was both crushed and baffled to find that, using the sequential
t test, more than six observations were required to come to a decision no matter what
the first six showed. This was crushing because the increase in efficiency promised could
not be realized, and baffling because inability to reach a decision after six
observations, no matter how adverse to the null hypothesis they might be, seemed not
only inefficient, but downright ridiculous.
Much fruitless effort was spent in the following years (long after the problems at which the six mouse screen had been directed had been solved) trying to understand the difficulty; but only after I realized that the sequential test, unlike the fixed sample
size one, was choosing between two alternatives was the problem clarified. Observations
really adverse to the null hypothesis were almost equally adverse to the alternative
and, hence, because of the analytic machinery of the sequential test, considered
indicative of a large standard deviation, even if all the observations were identical.
In light of this large standard deviation, the test naturally called for more data before
coming to a decision. But if a composite rather than a simple alternative with respect
to the mean were considered, or, as I now prefer to say it, if prior probabilities were
assigned to the entire range of alternatives with respect to the mean, rejection of the
null hypothesis even after two sufficiently adverse observations was possible, and the
puzzle disappeared [6].
The problem was intimately connected with the role of stopping rules in the
interpretation of data, a question of great importance for the clinical trials with which I was becoming involved. From this line of investigation, I concluded, as had others who had previously considered the problem, that in choosing among two or more hypotheses, stopping rules were largely irrelevant to interpretation and, in
particular, that it made no difference whether the experiment was a fixed sample size
or a sequential one. Data is data, just as in the old story, "pigs is pigs."
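The formal reason, stated for the simplest case of independent observations, is that a stopping rule which depends only on the data already observed enters the probability of the observed sequence as a factor free of the unknown parameter:

\[
p_\theta(x_1, \ldots, x_n, \text{ stop at } n)
  = \Big[\prod_{i=1}^{n} f_\theta(x_i)\Big]\, c(x_1, \ldots, x_n),
\]

so that likelihood ratios between hypotheses, and posterior probabilities computed from them, are the same whether the sample size was fixed in advance or determined sequentially.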
But the problem does not end there. As Piet Hein puts it, "Any problem worthy of
attack / Proves its worth by hitting back."
It is not always easy or even possible to cast all hypothesis testing problems in
the form of a choice between alternatives. Isolated hypotheses, for which alternatives
cannot be specified, also arise, and for these the stopping rule does matter. Thus, the
original form of the VA algorithm for the computerized interpretation of the ECG computed
the posterior probability that an individual fell in one of a number of predesignated
diagnostic entities, and this probability provided the basis for a choice between these
alternative hypotheses. But a difficulty was posed by the occasional patient who did
not fall into any of the diagnostic entities built into the algorithm, which then tended,
in desperation, to spread the posterior probabilities over all the entities that were
built in.
To avoid this, we might have tried to incorporate more entities into the algorithm,
but even in principle this appeared impossible. Nobody knows them all. What seemed
clearly required, and is now being added to the algorithm, is a classical tail area test
of the hypothesis that a patient falls in one of these predesignated diagnostic entities,
to be used as a preliminary screen. If this hypothesis is rejected, the posterior
probabilities, being inappropriate, would not be computed. But this commits one to a
dualism in which stopping rules sometimes matter and sometimes don't, which is
philosophically distasteful, but which seemingly cannot be avoided. There is perhaps
comfort in the story about the physicist, who, in a similar dilemma, considered light
a particle on Mondays, Wednesdays, and Fridays, a wave on Tuesdays, Thursdays, and
Saturdays and who, on Sunday, prayed.
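A minimal sketch of this dualism, assuming Python with numpy and scipy and representing each diagnostic entity, purely for illustration, by a multivariate normal with known mean and covariance (this is not the VA algorithm itself, and every name and threshold here is invented), might look like:

```python
import numpy as np
from scipy.stats import chi2

def classify(x, classes, priors, alpha=0.01):
    """Two-stage scheme: a preliminary tail-area screen, then (only if the
    screen is passed) posterior probabilities over the built-in entities."""
    x = np.asarray(x, dtype=float)
    d2, logdets = [], []
    for mean, cov in classes:
        diff = x - np.asarray(mean, dtype=float)
        d2.append(float(diff @ np.linalg.solve(cov, diff)))   # Mahalanobis^2
        logdets.append(np.linalg.slogdet(cov)[1])
    # Stage 1: if the patient is improbably far from every entity, decline to
    # compute posteriors rather than spread them, in desperation, over
    # entities that do not apply.
    if min(d2) > chi2.ppf(1.0 - alpha, df=x.size):
        return None  # outside all predesignated entities
    # Stage 2: posterior probabilities from the normal densities and the
    # prior (frequency) probabilities of the entities.
    loglik = np.array([-0.5 * (d + ld) for d, ld in zip(d2, logdets)])
    w = np.exp(loglik - loglik.max()) * np.asarray(priors, dtype=float)
    return w / w.sum()
```

With the means, covariances and prior frequencies estimated from ECG's of known diagnostic status, the returned vector plays the part of the posterior probabilities described above, and a None return is the screen's verdict that none of the built-in entities applies.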
However sketchy, this illustration of the interaction between theory and practice
should indicate that the usual dichotomy is a false one, detrimental to both theory and
practice. Critical attention to application exposes shortcomings and limitations in existing theory which, when corrected, lead to new theory, new applications and a new
orthodoxy in a loop which may be never-ending, and whose convergence properties are in
any event unknown.
9. NONINFERENTIAL ASPECTS OF STATISTICS
This discussion has the disadvantage of giving more emphasis than I would wish to
the inferential and, in particular, to the hypothesis testing aspects of statistics.
Statistical theory, instruction and practice have tended to suffer from overemphasis
on hypothesis testing. Its major attraction in some applications appears to be that it
is admirably suited for the allaying of insecurities which might better have been left
unallayed, or, as Tufte puts it, for sanctification. I recall one unhappy situation in
which an investigator castigated the statistician whose computer-based P-values had not
alerted him to the possibility that his data interpretation had overlooked some
well-known, not merely hypothetical, hidden variables. Both investigator and
statistician might have been better off without the P-values and with a more quantitative
orientation to the actual problem at hand.
The concept of dosewise additivity of the first example provides another
noninferential example of quantification in statistics. Of course, multivariate
normality implies linear relations so that, even if additivity were not a constant theme
in most branches of science, an interest in it is a natural consequence of a
statistician's other interests. But it is doubtful whether the concept of dosewise additivity would ever have been developed abstractly, as a natural consequence of other statistical ideas, had it not been for its importance in interpreting joint effects in certain kinds of medical
experimentation.
There are many other problems in quantification, noninferential in character, in which statisticians could nevertheless profitably be more active. The wave recognition aspects of the ECG interpretation, referred to earlier, provide one such example. Many of the
computer-based image processing techniques, e.g., for chromosomes and for trans-axial
reconstruction of brain lesions, provide others. Engineers and computer scientists have
been more heavily involved in such problems than statisticians, largely, I am afraid,
because they are in closer contact with the real problems and more willing to pursue
nontraditional lines of inquiry. Besides computer-based applications, many problems
in modeling biochemical, physiological or clinical processes are mathematically
interesting and substantively rewarding. I was involved in a few such applications [20,
11, 7], as were other statisticians, but as a profession we could do more. If we don't,
others will.
10. NECESSITY FOR OUTSIDE IDEAS
It is much more respectable than it once was to acknowledge that statistical theory
cannot fruitfully stand by itself. There was a time when many statisticians would have
applauded George Gamow's account [14, p. 34] of David Hilbert's opening speech at the
Joint Congress of Pure and Applied Mathematics, at which Hilbert was asked to help break
down the hostility that existed between the two groups. As Gamow tells it, Hilbert began
by saying,
"We are often told that pure and applied mathematics are hostile to each other. This is not
true. Pure and applied mathematics are not hostile to each other. Pure and applied mathematics
have never been hostile to each other. Pure and applied mathematics will never be hostile to
each other. Pure and applied mathematics cannot be hostile to each other because, in fact,
there is absolutely nothing in common between them."
But I think that today most of us would be more sympathetic with Von Neumann's account
[23, p. 2063]:
"As a mathematical discipline travels far from its empirical source, or still more, if it is
a second and third generation only indirectly inspired by ideas coming from 'reality', it is
beset with very grave dangers. It becomes more and more purely aestheticizing, more and more
purely l'art pour l'art. This need not be bad, if the field is surrounded by correlated subjects,
which still have closer empirical connections, or if the discipline is under the influence of men with an exceptionally well-developed taste. But there is a grave danger that the subject will develop along the line of least resistance, that the stream, so far from its source, will separate into a multitude of insignificant branches, and that the discipline will become a disorganized mass of details and complexities. In other words, at a great distance from its empirical source, or after much 'abstract' inbreeding, a mathematical subject is in danger
of degeneration. At the inception the style is usually classical; when it shows signs of
becoming baroque, then the danger signal is up.... Whenever this stage is reached, the only
remedy seems to me to be the rejuvenating return to the source: the reinjection of more or
less directly empirical ideas. I am convinced that this was a necessary condition to conserve
the freshness and the vitality of the subject and that this will remain equally true in the
future."
A rejuvenating return to the source will involve more, however, than sitting in on
a few consultations or leafing through a subject matter article or two in the hope that
they will suggest interesting mathematical problems. A definite expenditure of time and
intellectual energy in one or more subject matter areas is required, not because they
will necessarily generate mathematical problems but because the areas are of intrinsic
interest. This is well understood and would hardly be worth repeating if there were not
some corollaries that are sometimes overlooked.
The first is that the certainties of pure theory are not attainable in empirical
science or the world of affairs. More subtle or at least different critical faculties
are needed to appraise scientific evidence than are needed for evaluating a formal
mathematical proof. For example, in his Double Helix [24], Watson speaks scornfully of
those contemporaries who were unable to see that Avery's experiment made virtually
inevitable the idea that the nucleic acids rather than the proteins were the source of
the genetic material. However it may have seemed at the time, the subsequent course of
events has confirmed Watson's judgment. Ability to see where evidence is pointing before
the last piece is in, and, contrariwise, ability to see the numerous directions in which
it is pointing from only the first piece, require as much cultivation by the statistician
interested in the field as by the specialists, perhaps more.
Taking an example a little closer to mathematics, the well-known set theorist Zermelo
was also interested in physics and, in fact, translated Willard Gibbs' work on
statistical mechanics into German. He was so overwhelmed with its logical deficiencies,
however, that his remaining contributions to statistical mechanics consisted of attacks
on Gibbs. It remained for other physicists, who, although also interested in logical
purity, were equally interested in the subject matter, to take the constructive step
of modifying Gibbs' formulation to eliminate the deficiencies [3].
In statistical applications one can detect this same search for purity, as manifested
in a reluctance to make assumptions, even though they can lead to important
simplifications when approximately true, or in an unwillingness to admit prior
probabilities even when, as in the diagnosis problem, they have a well-defined frequency
interpretation and are required for optimum properties. It also shows up in a certain
type of statistical criticism of scientific results, in which pointing to a potential
weakness is considered equivalent to demolition. Bross's proposal [5] that some effort to demonstrate the reality as well as the potentiality of the weakness be required does not seem to have
dimmed this quest for purity, at least as manifested in some recent statistical
criticisms.
A second corollary is that any but a nominal movement towards applications will have
profound implications for graduate education. Students need a thorough grounding in
theory, but they need a perhaps equally thorough supervised exposure to the world of
applications, and this latter exposure will compete with time now devoted to theory.
Everyone will have his own choice of dispensable subjects, but the major problem is
how to provide the applications. All too often we have behaved like a hypothetical medical
school which turned out physicians with no internship or residency training, and in some
cases with no training beyond the pre-clinical basic science courses. The better students
would no doubt overcome this deficiency in time, but the patients could scarcely be
expected to welcome their initial ministrations. A formal statistical internship with
a commitment of time comparable to that of the medical intern is a desirable step in
the direction of applications. But we need the equivalent of the teaching hospital as
a source of problems as well, and in most cases this will require arrangements with
government and/or industry. The ASA, I am pleased to report, has taken some significant
steps in this direction; but a great deal more remains to be done, not only by ASA, but
by everyone interested.
11. THE APOLOGY
But I have wandered a bit from my main theme, the apology. I have no new ideas to add to what I have already said, and if my hypothetical critic, like some of my real ones,
is still unconvinced, he must remain so. As Justice Holmes said, "You cannot argue a
man into liking a glass of beer."
But I do want to amplify slightly my earlier expression of pleasure at having spent
so little time in finding out that I was destined to be a statistician and not any of
the other things that my friends and relatives thought I might be. I came to Washington
during the Great Depression, taking a job in the Bureau of Labor Statistics, at the
princely salary of $1,368 per year, not per month. It didn't take long, however, to become
interested in a variety of statistical problems embedded in the social and economic
problems of the time. The Department of Agriculture was then publishing what it called
an adequate diet at minimum cost, but I doubted, in the arrogance of youth, that anyone
involved knew how to find a minimum. It soon turned out that neither did I. The
minimization of a linear function subject to a set of linear constraints was not covered
in my calculus text and was beyond my powers to develop, although it is now made clear in texts for freshmen. But there were plenty of other problems to keep one occupied,
some within my powers. Nobody knew how many unemployed there were, and sampling seemed
the way to find out. Learning, developing and applying sampling theory was my first really
exciting post-school intellectual experience. Statistics had me hooked, and the monkey
has been on my back ever since.
In later years, it came as a pleasant surprise to find out that other people were
watching appreciatively. It would have been less than human not to take pleasure in the
consultantships, memberships in review and advisory committees and, most of all, in the
very great honor you have done me by electing me President of the American Statistical
Association. But, of course, these are simply the visible symbols.
Whitehead [25, p. 225], at one point, speaks of
"... the
human mind in action, with its ferment of vague obviousness, of hypothetical formulation, of renewed
insight, of discovery of relevant detail, of partial understanding, of final conclusion, with its
disclosure of deeper problems as yet unsolved."
I can perhaps avoid presumption and still suggest that the portrait I have tried to
sketch is not wholly unlike Whitehead's image of a mind in action, and that the
statistician's apology and justification is neither more nor less than that the practice
of his profession requires exactly that.
REFERENCES
[1] Babbage, C., "Passages from the Life of a Philosopher," reprinted in part in Morrison, P.
and Morrison, E., eds., Charles Babbage and His Calculating Engines, New York: Dover
Publications, Inc., 1961.
[2] Bernard, C., An Introduction to the Study of Experimental Medicine, New York: The MacMillan
Co., 1927.
[3] Born, M., Natural Philosophy of Cause and Chance, Oxford: Clarendon Press, 1949.
[4] Brazier, M.A.B., "Analysis of Brain Waves," Scientific American, 206 (June 1962), 142-53.
[5] Bross, I.D.J., "Statistical Criticism," Cancer, 13, No. 2 (March-April 1960), 394-400.
[6] Cornfield, J., "A Bayesian Test of Some Classical Hypotheses- With Applications to Sequential
Clinical Trials," Journal of the American Statistical Association, 61 (September 1966), 577-94.
[7] Cornfield, J., Steinfeld, J. and Greenhouse, S., "Models for the Interpretation of Experiments Using Tracer
Compounds," Biometrics, 16, No. 2 (June 1960), 212-34.
[8] Cornfield, J., Dunn, R.A., Batchlor, C.D. and Pipberger, H.V., "Multigroup Diagnosis of
Electrocardiograms," Computers and Biomedical Research, 6, No. 1 (February 1973), 97-120.
[9] , "Principles of Research," American Journal of Mental Deficiency, 64, No. 2 (September 1959),
240-52.
[10] ,"Statistical Classification Methods," in Jacquez, J.A., ed., Computer Diagnosis and
Diagnostic Methods, Springfield, Ill.: Charles C Thomas, 1972, 355-73.
[11] Folk, J.E., et al., "The Kinetics of Carboxypeptidase B Activity, III. Effect of Alcohol on the Peptidase and Esterase Activities: Kinetic Models," Journal of Biological Chemistry,
237, No. 10 (October 1962), 3105-9.
[12] Gaddum, J.H., Pharmacology, London: Oxford University Press, 1953.
[13] Galton, F., Inquiries into Human Faculty and Its Development, London: MacMillan and Co.,
Ltd., 1883.
[14] Gamow, G., One, Two, Three ... Infinity, New York: New American Library, 1953.
[15] Gullino, P., et al., "Studies in the Metabolism of Amino Acids, Individually and in Mixtures,
and the Protective Effect of L-Arginine," Archives of Biochemistry and Biophysics, 64, No.
2 (October 1956), 319-32.
[16] Haisty, W.K., Jr., et al., "Discriminant Function Analysis of RR Intervals: An Algorithm for On-Line Arrhythmia Diagnosis," Computers and Biomedical Research, 5, No. 3 (June 1972),
247-55.
[17] Hardy, G.H., A Mathematician's Apology, London: Cambridge University Press, 1940.
[18] Panel on Carcinogenesis of the Advisory Committee on Protocols for Safety Evaluation of the
Food and Drug Administration, Toxicology and Applied Pharmacology, 20, 1971, 419-38.
[19] Pipberger, H.V. and Cornfield, J., "What ECG Computer Program to Choose for Clinical
Application," Circulation, 47 (May 1973), 918-20.
[20] Scow, R.O. and Cornfield, J., "Quantitative Relations Between the Oral and Intravenous
Glucose Tolerance Curves," American Journal of Physiology, 179, No. 3 (December 1954), 435-8.
[21] Shimkin, M., "Upon Man and Beast: Adventures in Cancer Epidemiology, Presidential Address,"
Cancer Research, 37, No. 7 (July 1974), 1525-35.
[22] Thistle, J.L. and Hofmann, A.F., "Efficacy and Specificity of Chenodeoxycholic Acid Therapy
for Dissolving Gallstones," New England Journal of Medicine, 289, No. 13 (September 27, 1973),
655-60.
[23] Von Neumann, J., "The Mathematician," reprinted in J.R. Newman, ed., The World of Mathematics,
Vol. 4, New York: Simon and Schuster, 1956, 2053-63.
[24] Watson, J.D., The Double Helix, New York: Atheneum, 1968.
[25] Whitehead, A.N., "Harvard, the Future," Science and Philosophy, New York: The Wisdom Library,
1948.
Journal of the American Statistical Association, March 1975, Volume 70, Number 349, pp. 7-14 (Presidential Address).