Data And Phenomena: A Distinction Reconsidered

BRUCE GLYMOUR
ABSTRACT. Bogen and Woodward (1988) advance a distinction between data and phenomena. Roughly, the former are the observations reported by experimental scientists, the
latter are objective, stable features of the world to which scientists infer based on patterns in
reliable data. While phenomena are explained by theories, data are not, and so the empirical
basis for an inference to a theory consists in claims about phenomena. McAllister (1997)
has recently offered a critique of their version of this distinction, offering in its place a
version on which phenomena are theory laden, and hence on which the empirical support
for inferences to theories is also, unavoidably, theory laden. In this commentary I argue
that both McAllister and Bogen and Woodward are mistaken in thinking that the distinction is
necessary, and that the empirical support for inferences to theories is not necessarily theory
laden in the way McAllister’s account entails it is.
Bogen and Woodward in their (1988) are concerned to address three problems about the nature of scientific evidence. (1) Observations are taken to
provide reason to accept theories because they provide evidence for those
theories. But theories do not in general explain any particular observational
datum. How then can any particular datum or set of such be evidence for a
theory, and if observations cannot be evidence for theories, how can they
provide reason for accepting a theory? Say that a datum or set of data counts
as reason to accept or reject some theory provided it bears some particular
relation to the theory, and call that relation the evidential relation. What,
Bogen and Woodward ask, is the evidential relation? (2) While some data
are counted as evidence for or against theories, other data are not even
potential candidates for evidential status. Although these data appear to
bear the same logical relationship to relevant theories as do other data,
they are simply ignored as artifacts of the experimental design, as so much
experimental noise. What property or properties of a datum or a data set,
then, make it a candidate for evidential status, or, as I shall say, what are the
qualifying properties a datum or data set must exhibit if it is to be a candidate for evidential status? (3) Are the evidential relation and the qualifying
properties such that scientific evidence must necessarily be theory laden,
i.e. be infected by the theories employed by the scientists collecting data,
or inferring from data to theories?
Erkenntnis 52: 29–37, 2000.
© 2000 Kluwer Academic Publishers. Printed in the Netherlands.
Bogen and Woodward seek to answer these questions by distinguishing between data and phenomena. The two sorts of entities are held to
differ in both their ontological and epistemological status. The two differ
epistemically because phenomena are, while data are not, explained by
theories. Hence phenomena can be direct evidence for theories, while data,
at most, can be only indirect evidence for theories. Further, knowledge
of data is non-inferentially justified, while knowledge of phenomena is
inferentially justified. To say that a scientist is wrong about the data she
reports is necessarily to say that she did not in fact see what she claims
to have seen, while to say that a scientist is wrong about the phenomena
she reports need only be to say that she has drawn incorrect inferences
from what she indisputably did see. The two differ ontologically in that
phenomena are stable, repeatable features of the natural world, while data
are not. Phenomena are ineliminable, bedrock elements of the furniture
of the world. Particular observations, however, are one-time occurrences
that result from accidental collocations of, and hence causal interactions
between, a veritable legion of causes and conditions.
Using the distinction, Bogen and Woodward claim that scientific inference works as follows. Data are gathered, and sorted as to their reliability.
From patterns in reliable data, scientists infer to the existence of phenomena, which either are or are not explained by theories. When one can
infer from reliable data to a phenomenon or its absence, the claim that the
phenomenon exists, or does not, counts as evidence for or against relevant
scientific theories. Reliability, consequently, is the qualifying property that
data must have if they are to be candidates for evidential status, though
Bogen and Woodward do not say, precisely, what it is for a datum or data
set to be reliable. The evidential relation between theory and data, insofar
as there is one, is then composed of some inferential relation between data
and phenomena (data are evidence for phenomena) and another between
phenomena and theory (phenomena are evidence for theories). Again, Bogen and Woodward do not say what sorts of inferential relations count here.
Thus, data are, at best, indirect evidence for a theory.
Bogen and Woodward then further argue that phenomena are not simply
theory-laden observations. Of course, if ‘observation’ or ‘perception’ is
defined loosely enough, then phenomena will so count. But, according to
Bogen and Woodward, any definition of these terms on which phenomena
do so count is simply too loose to be informative. If observations are taken
to be produced by distal causes which are, causally, relatively close to human perceptual experiences, e.g. blips on screens or graph paper, clicks
of a Geiger counter, flashes on photographs, the position of pointers on
meters and so on, then claims about what phenomena exist are not simply
theory laden observations. Rather they are claims justified by inferences
from reliable data, i.e. reliable observations of blips, clicks, flashes and
positions.
Such inferences can of course be more or less reliable, more or less
cogent, quite independently of the reliability of the data on which they are
based. And one way in which these inferences can go awry is that they
assume one or another background theory which happens to be false.
While this makes inferences to phenomenal claims theory dependent, it
does not make claims about the existence of phenomena necessarily theory
laden in the relevant sense, for the sense in which inferences go awry here
depends on the possibility of the inferred phenomenal claims being false
even though the investigator believes them to be true. The existence or
non-existence of any particular phenomenon cannot, then, be determined
simply by what the investigator believes about the phenomenon. Phenomena are therefore objectively real, in some strong sense of those terms,
though of course we may be mistaken about the claim that any particular
phenomenon does, or does not, exist.
In a recent paper McAllister (1997) claims against Bogen and Woodward that while the distinction between data and phenomena is both useful
and cogent, (a) Bogen and Woodward have not managed to draw that distinction in the right way, and (b) on the correct distinction, phenomena
are themselves investigator relative features of the world. His case is this.
Every data set can be understood as being composed of some signal, i.e.
stable pattern, and a certain level of noise. But for any given (non-zero)
level of noise, a data set can be divided into signal and noise in infinitely
many ways, so any data set you please will exhibit an infinite number of patterns.¹
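The point made in note 1, that even a finite data set exhibits infinitely many patterns with zero noise, can be made concrete. The sketch below is a minimal illustration, assuming that a "pattern" is represented as a polynomial curve through (x, y) data points; the functions `lagrange` and `pattern` are hypothetical names, not anything drawn from McAllister. Adding any multiple c of the product of the factors `x - xi` to the interpolating polynomial leaves the fit at every observed point unchanged, so each value of c yields a distinct zero-noise pattern, and a fortiori each also fits within any positive noise tolerance.

```python
from fractions import Fraction


def lagrange(points, x):
    """Value at x of the lowest-degree polynomial through all (xi, yi) points."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


def pattern(points, c):
    """A curve f_c(x) = p(x) + c * prod(x - xi) through every data point.

    The added term vanishes at each observed xi, so every choice of c fits
    the data exactly (zero residual) while disagreeing away from the data.
    """
    def f(x):
        bump = Fraction(1)
        for xi, _ in points:
            bump *= x - xi
        return lagrange(points, x) + c * bump
    return f


data = [(0, 1), (1, 3), (2, 2), (3, 5)]  # hypothetical observations
for c in (0, 1, -7):
    f = pattern(data, c)
    print(c, all(f(x) == y for x, y in data), f(4))
```

Exact rational arithmetic (`Fraction`) is used so that "fits the data exactly" is checked without floating-point rounding.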
McAllister then claims that no objective facts about the empirical evidence constrain which of the infinite patterns exhibited by a data set one
ought to take to be phenomena. Since phenomena are supposed to be basic
ontological features of the world, they cannot be infinite in number, so not
all patterns can count as phenomena. The scientist must therefore privilege
some but not all of the patterns as phenomena, and the choice of which to
so privilege is unconstrained by the evidence.
It then follows from McAllister’s account that the choice about which
patterns to recognize as phenomena can only be made by the investigator on subjective grounds, grounds that presumably cannot but include
her prior theoretical commitments. For McAllister, one important such
commitment is the level of noise the investigator is willing to tolerate, but
this choice alone will not be enough to identify the phenomena among the
plethora of patterns, since for any noise level there is an infinity of patterns exhibited by the data. Thus, phenomena can be identified only against
some commitment over and above one’s choice about how much noise to
tolerate. Phenomena, therefore, cannot be objective features of the world:
while their status qua pattern in the data is clearly objective, they are distinguished from other patterns as phenomena only by investigator relative
commitments, commitments that are not themselves subject to objective
criticism, since the recognition of any pattern whatsoever as a phenomenon
requires that similarly non-empirical commitments be employed.
McAllister apparently thinks the distinction between data and phenomena is an essential element in any clear philosophic account of scientific
practice. And on the understanding of phenomena he defends, such entities are investigator relative, rather than objective, features of the world.
Hence, on his account, the empirical support for any inference to a theory
is necessarily theory laden.
I am unconvinced. While I think McAllister has recognized a serious
flaw in the distinction advanced by Bogen and Woodward, and that their
account simply does not work very well, I argue that no such distinction
between data and phenomena is needed, and that the distinction which
is needed, and is already well established in the relevant literature, does
not entail the sort of relativism required by McAllister’s version of the
distinction between data and phenomena. Perhaps the place to begin is by
making somewhat more precise the notion of ‘pattern’ employed both by
Bogen and Woodward and by McAllister.
Consider a standard, though certainly not the only, method for discovering causal relations among variables (cf. Spirtes et al. 1993). One
constructs a sample of data by measuring, in observational or experimental
contexts, the joint distribution of values among these variables. One treats
the sample as a sample from a population of data with a particular statistical structure, given by a probability density function on joint values
for the variables. This function entails certain conditional independence
relations among the variables. One then performs a double inference. One
first calculates various sample statistics, and infers from the values of these
sample statistics to a model of the population of data. From the population
model and the conditional independencies entailed by it, one infers a causal
model, or class of such, that accounts for the conditional independence
relations in the population model. While not all statistical inferences have
exactly this form (not all, for example, are inferences to causal structure),
the distinction between sample and population structure is essential, and
in particular statistical inferences always move from a claim about sample
statistics to the inferred proposition, whatever this may be or be about.
So while I do not know exactly what is meant by ‘pattern’ in Bogen and
Woodward’s work, I take it their various accounts are meant to be general,
and hence that by ‘pattern’ Bogen and Woodward, and McAllister too,
ought to mean something that at least includes the values of sample statistics.
The Bogen and Woodward version of the distinction between data and
phenomena relies heavily on supposed differences in the epistemic status
of data and phenomena. Claims about data are not subject to certain kinds
of epistemological challenges that claims about phenomena are, but claims
about phenomena are explicable in ways that claims about data are not,
according to the distinction advanced by Bogen and Woodward. This supposed difference is illusory: certain entities have both the epistemically
foundational status of data and are susceptible of explanation by theory in
just the way phenomena are.
If we are certain, at least in the relevant sense, of the observations comprising a data set, then the mean value of a variable in the data, or its variance, the shape of the distribution, correlations between variable values,
and so on, are no less certain. So sample statistics have the same epistemic
status as the observation reports comprising the data in the sample. But
it is precisely statistical features of this sort that are explained
by scientific theories. While a correlation between variables A and B is
no guarantee that there is a causal relation between the two, such a causal
relation does explain an observed correlation between the two variables.
Moreover, it is just such statistical features of distributions of data that
are repeatable, and indeed it is these that one expects to recover if one
repeats an experiment whose results one seeks to verify. Hence, data and
phenomena are, minimally, not exclusive of one another. At the very least,
then, the distinction Bogen and Woodward draw is not as sharp as it ought
to be.
Nothing I have said so far challenges either McAllister’s critique of
Bogen and Woodward, or his preferred account of the distinction between
data and phenomena. But unlike McAllister and Bogen and Woodward,
I am not convinced that any such distinction is necessary, nor that the
empirical warrant for inferences to theories is necessarily theory laden in
the way that McAllister’s account suggests it must be.
Suppose the scientific inferences of interest are statistical inferences
with the structure suggested above. We can either take the distinction between
data and phenomena to correspond exactly to the distinction between sample
and population structure, or not. Suppose we take the distinction between
data and phenomena to involve something over and above the distinction
between sample and population structure. Then statistical inference procedures, and methodological justifications for them, will not require the
distinction between data and phenomena, and hence the distinction will
be unnecessary. Suppose we deny that the distinction between data and
phenomena involves something over and above the distinction between
sample and population structure, i.e. we take the statistical structure of
data samples, i.e. sample statistics, to correspond to data and population
structure, i.e. population parameters, density functions and conditional independencies, to correspond to phenomena. Then the distinction between
data and phenomena simply gives a new name to a distinction which
is already deeply embedded in the literature on statistical inference. The
terminological reform is unnecessary and in some respects misleading,
and hence should be avoided. Moreover, since on some statistical inference
procedures, e.g. Bayesian scoring procedures, one infers directly from sample
structure to theory, the distinction between data and phenomena will not
play any essential role in these sorts of inferences or the justification of
these inferential methods.
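To see how a scoring procedure can move from sample structure straight to a causal hypothesis, with no intermediate step answering to "phenomena", consider the sketch below. It uses the Bayesian Information Criterion (BIC), a large-sample approximation to a Bayesian score, as a stand-in for the Bayesian scoring procedures mentioned; the two linear Gaussian candidate models, labelled `independent` and `X->Y`, are hypothetical. Each score is computed from the sample alone.

```python
import math
import statistics


def gaussian_loglik(residuals):
    """Maximised Gaussian log-likelihood, using the MLE of the variance."""
    n = len(residuals)
    variance = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2 * math.pi * variance) + 1)


def bic_scores(xs, ys):
    """BIC (log-likelihood minus (k/2) log n) for two hypothetical models."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    ll_x = gaussian_loglik([x - mx for x in xs])
    # Model "independent": no edge; Y has only its own mean and variance (k = 4).
    ll_indep = ll_x + gaussian_loglik([y - my for y in ys])
    # Model "X->Y": Y is a linear function of X plus noise (k = 5).
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ll_edge = ll_x + gaussian_loglik(
        [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    )
    penalty = 0.5 * math.log(n)
    return {"independent": ll_indep - 4 * penalty, "X->Y": ll_edge - 5 * penalty}


def best_model(xs, ys):
    scores = bic_scores(xs, ys)
    return max(scores, key=scores.get)


xs = list(range(-50, 50))
ys = [2 * x + (0.5 if x % 2 else -0.5) for x in xs]  # hypothetical sample
print(best_model(xs, ys))
```

Because X → Y and Y → X are Markov equivalent, they receive the same score under this criterion; the sketch includes only one of them.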
This leaves only the question of whether statistical methods are necessarily subjectivist in the sense suggested by McAllister. Some methods
clearly are, e.g. Bayesian methods when employed with a subjectivist account of probability. Others, however, are not. If we assume, with Bogen
and Woodward and McAllister, that the observation reports comprising the
data are not themselves theory laden in the relevant sense, then inferences
from the data to causal structure can be theory dependent in two ways.
First, the inferences may rely on causal assumptions about which variables
are causally connected, about the functional form of the connection (are the
equations linear or quadratic?), and about the level of noise in the data (is
the noise generated by probabilistic or indeterministic dependencies, or by
some unmeasured deterministic cause?). So, for example, one might well
assume that the value of variable 2 at t2 cannot exert a causal influence on
the value of variable 1 at t1 if t1 is prior to t2. Not all such assumptions are
so innocuous, and even innocuous assumptions may be mistaken. Hence
inferences employing such assumptions are theory dependent.
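A temporal-order assumption of the kind just mentioned is typically supplied to a search procedure as background knowledge that removes candidate causal connections before the data are consulted. A minimal sketch, assuming each variable is tagged with a hypothetical measurement time; the variable names are illustrative only. As the text notes, such an assumption is fallible: if it is mistaken, the edge it forbids is excluded however the data come out.

```python
from itertools import permutations


def allowed_edges(measured_at):
    """Candidate (cause, effect) pairs consistent with temporal order.

    `measured_at` maps each variable to the time it is measured. Background
    assumption: nothing causes a variable measured strictly earlier.
    Variables measured at the same time may be linked in either direction.
    """
    return {
        (cause, effect)
        for cause, effect in permutations(measured_at, 2)
        if measured_at[cause] <= measured_at[effect]
    }


timing = {"variable1": 1, "variable2": 2, "variable3": 2}  # hypothetical times
print(sorted(allowed_edges(timing)))
```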
But this sort of theory dependence is essentially dissimilar from that
required by McAllister’s version of the distinction between data and phenomena. There is, on his account, no objective reason to prefer recognizing
one pattern among many as a phenomenon, and for this reason there is no
possibility of offering a cogent, objective criticism of any given choice
about which patterns are to be taken to correspond to phenomena. On
his account such choices are arbitrary because they simply can be nothing other than arbitrary. Clearly, however, there can be cogent reasons,
founded on objective empirical or conceptual resources, for objecting to
the set of causal assumptions that underwrite any particular inference from
data to causal structure (cf. Spirtes et al. 1993). Hence the assumptions
about physical theory that underwrite an inference from data to causal
structure need not be, and often are not, arbitrary. Assumptions about physical theory on which statistical inferences depend are subject to cogent
critique, whereas the assumptions that underwrite an inference from data
to phenomena, on McAllister’s account, are not.
Second, the causal structures to which one infers on the basis of a given
set of data depend on the statistical methods one adopts. Not all methods
are appropriate in all contexts, and there is an ongoing controversy about
which methods are best used in which contexts (see, for example, Hellman
1997a and 1997b; Kelly et al. 1997; Korb and Wallace 1997; Spirtes et
al. 1993). A claim about the appropriateness of a given method in a given
context is itself a theoretical claim, and so the choice of statistical method
is itself a theoretical assumption of a sort. Hence there is this second way
in which inferences to causal structure are theory dependent. But again,
this theory dependence is quite different from the theory ladenness under
which claims about phenomena suffer on McAllister’s account. First, the
theories in question here are in general not physical theories, but rather
mathematical theories. On some Quinean or Millian accounts, the difference in epistemological warrant for empirical and mathematical theory is
illusory. On others, however, it is not. But second, even if one does not
think there is some essential difference in the epistemological warrant we
can have for mathematical as opposed to physical theories, the statistical
methods we adopt are subject to criticism in a way that inferences to phenomena are not. There are well-defined theories of reliability, on which the
reliability of the various methods is assessable (cf. Kelly 1996). If one
adopts, in a given context, a method which is unreliable in a given sense,
then one cannot consistently also endorse that notion of reliability. Those who both
adopt the method and endorse the notion of reliability are committed to an
inconsistency, and hence subject to cogent, objectively grounded criticism,
as inferences to phenomena, on McAllister’s account, are not.
The naive ontological distinctions between (1) observations of events,
(2) the causes, conditions, and properties that produce the observed events,
and (3) the natural kinds to which such events belong, are certainly cogent.
More, the distinctions are essential if one is to clearly delimit the epistemological difficulties scientists confront. So too the conceptual distinctions
between the variable values comprising a data set, the sample statistics
characteristic of the data set, and population parameters characteristic of
a population of data are essential for describing and justifying various
methods for scientific inference. What is not necessary is a sophisticated
distinction between data and phenomena on which these concepts play
some essential role in describing the structure of scientific inferences, or
in justifying inferences with that structure.
The work of Bogen and Woodward is interesting and for various historical reasons important (it was, for example, among the first work by serious
analytic philosophers that was attentive to the intricacies of experimental
science). But I think that McAllister is right in claiming that the central distinction between data and phenomena offered there is inadequate, though
he and I have slightly different reasons for thinking this. Unlike McAllister,
however, I do not see why the distinction need play any important, much
less indispensable, role in our philosophic account of scientific practice.
No such distinction is necessary because the inference from observation
to causal theory or classification is invariably statistical. The description
and justification of statistical inferences requires the above mentioned statistical concepts, but not those of data and phenomena. Further, if data
points are not theory laden, then neither are sample statistics, and the
latter are both theoretically explicable and replicable. Given a set of the
former, the latter are constructible, and given the latter, non-theory laden
inferences to both population models and causal models are possible. As a
consequence, I do not see why the empirical basis for inferences to theories
(phenomena for McAllister, sample statistics and/or population models for
me) need be essentially subjectivist or theory-laden in the way McAllister’s
account entails they must be.
NOTES
∗ Thanks are owed to James McAllister for many helpful comments on previous versions
of this paper.
1 Indeed, any finite data set is guaranteed to exhibit an infinite number of patterns even
with zero noise. McAllister is not unaware of this familiar point; he simply takes some data
sets, e.g. the continuous pen trace left by a seismograph, to be of infinite size. McAllister
is exactly right that any such data set exhibits exactly one pattern with zero noise, but I
think this is not a relevant data set. One does not often infer from any single such trace,
but rather from a finite set of such traces. Unless the traces agree exactly about the values
of each variable at each time, no deterministic relationship between measured variables is
exhibited with zero noise, since for each pattern there will be at least one data point on
at least one trace that is inconsistent with the pattern. The difficulty can be resolved in
either of two ways. One can describe a deterministic pattern between measured variables
and at least one unmeasured variable, provided the value of this variable differed during
the experimental runs recorded by inconsistent traces in the data set. In this case, the data
set exhibits an infinity of patterns with zero noise, since the data set includes values of the
measured variables for only a finite set of values for the unmeasured variable. Differently,
one can solve the difficulty by taking the relationship between measured variables to be
essentially probabilistic. In this case no set of observations is logically inconsistent with
any pattern whatsoever, and so again an infinity of patterns is exhibited with zero noise.
REFERENCES
Bogen, J. and J. Woodward: 1988, ‘Saving the Phenomena’, Philosophical Review 97,
303–352.
Hellman, G.: 1997a, ‘Bayes and Beyond’, Philosophy of Science 64, 191–221.
Hellman, G.: 1997b, ‘Responses to Maher, and to Kelly, Schulte and Juhl’, Philosophy of
Science 64, 317–322.
Kelly, K.: 1996, The Logic of Reliable Inquiry, Oxford University Press, Oxford.
Kelly, K., O. Schulte and C. Juhl: 1997, ‘Learning Theory and the Philosophy of
Science’, Philosophy of Science 64, 306–316.
Korb, K. and C. Wallace: 1997, ‘In Search of the Philosopher’s Stone: Remarks on
Humphreys and Freedman’s Critique of Causal Discovery’, British Journal for the
Philosophy of Science 48, 543–553.
McAllister, J.: 1997, ‘Phenomena and Patterns in Data Sets’, Erkenntnis 47, 217–228.
Spirtes, P., C. Glymour and R. Scheines: 1993, Causation, Prediction and Search,
Springer-Verlag, New York.
Department of Philosophy
Kansas State University
Manhattan, KS 66506
U.S.A.