An example of a poster - CSSCR

Through a Filter, Darkly: Signal, Interference, and Noise in Demographic
Temporal Frequency Analysis
William A. Brown, University of Washington, Department of Anthropology
Introduction
Our ability to archaeologically identify changes in past population
sizes is relevant to diverse research topics, including the identification of
colonization processes; causality in sociopolitical and economic change
and cultural transmission; and determinants of ecological sustainability.
Over the last four decades,1,10,14,18 this need has increasingly been met by
using temporal frequency distributions (tfds) – time series data describing
temporal variation in the abundance of archaeological deposits – as proxy
census records. All else being equal, larger populations discard a greater
abundance of materials than smaller ones, implying that tfds should
register temporal variation in population size.
Yet, even as interest in temporal frequency analysis (TFA), especially
its paleodemographic application, grows, archaeologists are also
becoming increasingly aware of the limitations constraining this
approach. Heated disputes have recently flared up surrounding
overextensions of the approach, though the potential for such abuses was
already anticipated over two decades ago.7,10 Factors confounding
straightforward demographic interpretations of tfds can be divided
between those that introduce interference into the signal, systematically
violating the assumption of proportionality between the population size
curve and the archaeological record:
• diachronic change in the per capita deposition rate;
• time-transgressive taphonomic bias;14,15
• research bias, resulting from the application of particular survey
methods and research agendas;
and those that introduce noise:
• random variability around the deposition rate;
• random variability in deposit survival/destruction;
• survey sampling error;
• sufficiency of sample hygiene criteria.
Figure 1. spd (gray) plotting the temporal
distribution of 166 archaeological site
occupations from the Kodiak Archipelago
in the Gulf of Alaska. The rug plot below
marks the median age estimate for each
individually dated site occupation. Vertical
dashed lines demarcate the timespan 850-750 cal BP/A.D. 1100-1200, a century
purportedly marked by a dearth of
archaeological deposits.6
The problem
While many sources of interference and noise (I/N) are generic to
demographic TFA, a handful are specific to TFA supported by radiocarbon
(14C) age estimation. Instrument measurement error and secular variation in
the concentration of atmospheric 14C impose unique computational demands
on tfd construction, these being met by 14C age calibration and the summed
probability distribution (spd; see figure 1), a class of tfd in which the
probability density of each probabilistic age estimate in the sample is
summed at each location along the timeline: $\mathrm{spd}(t) = \sum_{i=1}^{n} f_i(t)$.
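The summation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each age estimate is treated as a normal density on the calendar timeline (real calibrated densities are irregular), and the three estimates, the grid, and the function names are hypothetical.

```python
import math

def normal_pdf(t, mean, sd):
    """Density of a normal distribution at t."""
    z = (t - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def spd(grid, estimates):
    """Sum the density f_i(t) of every age estimate at each grid point t.

    `estimates` is a list of (mean, sd) pairs. Treating each estimate as
    normal is an assumption made for illustration only.
    """
    return [sum(normal_pdf(t, m, s) for m, s in estimates) for t in grid]

# Hypothetical age estimates (cal BP) evaluated on a 5-year grid.
STEP = 5
estimates = [(3000, 40), (3050, 60), (4200, 50)]
grid = list(range(2000, 5001, STEP))
curve = spd(grid, estimates)

# Each density integrates to 1, so the unscaled spd integrates to about n.
area = sum(curve) * STEP
```

Because the spd is an unscaled sum, its area grows with sample size; dividing by n gives a curve that can be compared directly with a probability density.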
Though working with spds is best practice in 14C-supported TFA, this
does not resolve all demographic confounders arising from 14C age
estimation. On the contrary, 14C-based spds frequently exhibit anomalous
and extreme peak-and-trough structures arising from several non-demographic transformations:
1. sampling error, introducing structural disparities between the
underlying TFD and the sample tfd (figure 2);3,18
2. the dispersive mapping of calendric ages into 14C ages;2
3. additional dispersion resulting from instrument measurement error;2,7
4. disparities in the expression of calendric age estimates resulting from
the mapping of probabilistic 14C age estimates through calibration
curves characterized by varying slopes (figure 3 lower panel, and
figure 4)2,5,7,8,11,12,13,16,17,18.
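The fourth transformation can be illustrated with a toy example. The sketch below assumes a hypothetical piecewise-linear calibration curve (not IntCal13), ignores uncertainty in the curve itself, and calibrates two invented 14C estimates of identical precision by weighting each calendar year by the normal likelihood of the measurement given the curve value. The estimate landing on the shallow stretch spreads widely on the calendar axis, while the one landing on the steep stretch is compressed.

```python
import math

def toy_curve(t):
    """Hypothetical calibration curve (NOT IntCal13).

    Slope 0.3 for calendar ages 0-1000, slope 1.7 for 1000-2000.
    """
    if t <= 1000:
        return 0.3 * t
    return 300.0 + 1.7 * (t - 1000)

def calibrate(m, s, grid):
    """Normalized calendar-age probabilities for a 14C estimate m +/- s."""
    w = [math.exp(-0.5 * ((m - toy_curve(t)) / s) ** 2) for t in grid]
    total = sum(w)
    return [x / total for x in w]

def calendar_sd(probs, grid):
    """Standard deviation of the calibrated estimate on the calendar axis."""
    mean = sum(p * t for p, t in zip(probs, grid))
    var = sum(p * (t - mean) ** 2 for p, t in zip(probs, grid))
    return math.sqrt(var)

grid = list(range(0, 2001))  # one-year calendar grid

# Two invented estimates with the same 14C precision (+/- 30).
sd_shallow = calendar_sd(calibrate(150.0, 30.0, grid), grid)   # slope 0.3
sd_steep = calendar_sd(calibrate(1150.0, 30.0, grid), grid)    # slope 1.7
```

On this toy curve the calendar spread is roughly the 14C precision divided by the local slope, which is why steep stretches concentrate probability into peaks and shallow stretches flatten it.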
These sources of non-demographic I/N create equifinality between real
demographic structures and artificial ones, and more than once such
structures have been prematurely identified as demographic by the unwary.
In response, a handful of simulation experiments have been conducted to
expose the influence of one,3 two,7,18 or three2 of these factors on tfd,3,7
including spd,2,18 morphology. Experimental results have variously led
these researchers to cautionary tales2 and to the prescription of best-practice
protocols to mitigate them.7,18
However, the sequential interaction of all four factors remains to be
explored. In addition, a preoccupation with the structuring influence of
calibration interference has overshadowed our understanding of that of the
other three.2,7,18 Finally, the robustness of interpretation of these simulation
experiments has been limited by unfavorably small simulation sizes, usually
involving either the characterization of a single simulation run2,3,7 or the
comparison of two.2 The most ambitious simulation experiment to date,
Williams’,18 involves statistical comparisons of thirty iterations per
experiment – still a small simulation by most counts. The insightfulness of
Williams’ experiments is additionally limited by the fact that they involve
resampling, disallowing comparison between each simulation run and the
unknown underlying TFD.
Figure 2. Sample tfd describing the
temporal distribution of 200 observations,
randomly sampled from a truncated
exponential TFD (blue line). The TFD is
proportional to a population undergoing
constant growth between 4000 and 1000 cal
BP, with a doubling time of 1000 years.
The tfd is represented in three ways: as a
histogram, rug plot, and kernel density
estimate (red line).
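A sample like the one in figure 2 can be drawn by inverse-transform sampling. The following Python sketch is an assumption about how such a sample might be generated, not the author's code; the function name and the seed are hypothetical. Ages are in cal BP, so the density rises toward the younger end of the interval.

```python
import math
import random

def sample_truncated_exponential(n, youngest, oldest, doubling_time, rng):
    """Draw n calendar ages (cal BP) from a TFD proportional to a population
    growing exponentially toward the present, truncated to [youngest, oldest]."""
    r = math.log(2.0) / doubling_time   # growth rate per year
    a = math.exp(-r * youngest)         # relative density at the young end
    b = math.exp(-r * oldest)           # relative density at the old end
    ages = []
    for _ in range(n):
        u = rng.random()
        # Invert the CDF F(age) = (a - exp(-r * age)) / (a - b).
        ages.append(-math.log(a - u * (a - b)) / r)
    return ages

rng = random.Random(42)
sample = sample_truncated_exponential(200, 1000, 4000, 1000, rng)

# With a 1000-year doubling time, the youngest millennium should hold
# about 4/7 of a large sample (weights 4 : 2 : 1 across the three millennia).
big = sample_truncated_exponential(20000, 1000, 4000, 1000, rng)
share_youngest = sum(1 for x in big if x < 2000) / len(big)
```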
Figure 3. Upper panel: refraction-dispersion paths for ten simulated estimates
of a single calendric age (12000 cal BP),
each mapped through IntCal13,9 with an
instrument measurement offset following a
normal distribution specified as N 𝜇 =
Figure 4. Simulation of 10,000 randomly
generated 14C age measurements, specified
as in figure 3, upper panel. The sample
distribution is expected to approximate a
normal distribution specified as N 𝜇 =
Figure 5. Two hypothetical 14C age
estimates with identical precisions on the
14C age axis (vertical): 475±30 rcyBP (blue
curve) and 875±30 rcyBP (red curve).
IntCal13 is shown in the main panel (solid
black lines),9 along with an ideal 1:1
relationship between calendric and 14C time.
The different mapping relationships
between the two 14C estimates and the
calibration curve have led to very different
expressions of these two age estimates on
the calendric timeline.
Methods
Because the first three transformations operate stochastically (the latter two entailing a propagation of error; figure 3 upper panel and figure 4),
the challenge of exploring the sequential influence of all four on spd morphology recommends a Monte Carlo (MC) simulation approach,
involving a large simulation size (e.g., 1000 runs) and statistical evaluations thereof. Parameters to consider in such an approach should include (a)
the influence of sample size on sampling error (process 1); (b) TFD location and shape (process 2); (c) degree of and variation in instrument
measurement error (process 3); (d) selection of calibration curve (processes 2 and 4); and (e) calibration algorithm (process 4).17
The MC experiments presented here were implemented in R using code written by the author. Results are preliminary, focusing primarily on
the influence of sample size on non-demographic I/N in spds. Six sample sizes approximating characteristic archaeological samples were analyzed
(n=30, 50, 100, 200, 500, and 1000). The underlying TFD was the uniform distribution, owing to this distribution’s featureless morphology and
thus the ease of detecting non-demographic structures in spds. The TFD was spread over the interval 8000-1000 cal BP. Simulated offset from
instrument measurement error followed a normal distribution specified as $N(\mu = 0, \sigma = 50)$. Only one calibration curve was considered,
IntCal13.9 My calibration algorithm reproduces the output of OxCal and CALIB (though not CalPal17). For each sample size, 1000 simulation
runs were generated, and each set of 1000 runs was summarized according to five percentiles, determined for each five-year interval along the
timeline (t): minimum, 2.5%, median, 97.5%, and maximum. Critical tests measuring the coherence between simulated spds and the underlying
TFD will be conducted in the future. The present analysis is limited to the exploration of the patterning observed in the percentiles, both over time
and between sample sizes.
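The overall shape of such an experiment can be sketched as follows. This is a Python illustration rather than the R code used for the poster, and it rests on several assumptions: the calibration curve is a hypothetical wiggly function (not IntCal13) with no curve uncertainty, the grid step is 25 years rather than five, each spd is divided by n so it can be compared with a probability density, and only 50 runs are made to keep the pure-Python example fast. All function names are invented.

```python
import math
import random

STEP = 25
GRID = list(range(500, 8501, STEP))  # calendar timeline, cal BP

def toy_curve(t):
    """Hypothetical calibration curve (NOT IntCal13): 1:1 line plus wiggles."""
    return t + 100.0 * math.sin(t / 150.0)

CURVE_ON_GRID = [toy_curve(t) for t in GRID]

def calibrate(m, s):
    """Calendar-age density on GRID for a 14C measurement m +/- s."""
    w = [math.exp(-0.5 * ((m - c) / s) ** 2) for c in CURVE_ON_GRID]
    total = sum(w) * STEP
    return [x / total for x in w]

def simulate_spd(n, rng, sigma=50.0):
    """One run: sample ages, map to 14C time, add error, calibrate, sum."""
    spd = [0.0] * len(GRID)
    for _ in range(n):
        t_true = rng.uniform(1000, 8000)               # process 1: sampling
        m = toy_curve(t_true) + rng.gauss(0.0, sigma)  # processes 2 and 3
        for j, d in enumerate(calibrate(m, sigma)):    # process 4
            spd[j] += d / n                            # scaled to integrate to 1
    return spd

def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def envelope(n, runs, seed):
    """Per-grid-point (min, 2.5%, median, 97.5%, max) across simulated spds."""
    rng = random.Random(seed)
    spds = [simulate_spd(n, rng) for _ in range(runs)]
    rows = []
    for j in range(len(GRID)):
        vals = sorted(run[j] for run in spds)
        rows.append(tuple(quantile(vals, q)
                          for q in (0.0, 0.025, 0.5, 0.975, 1.0)))
    return rows

def mean_band_width(rows):
    """Average width of the mid-95% range along the timeline."""
    return sum(r[3] - r[1] for r in rows) / len(rows)

small = envelope(n=25, runs=50, seed=1)
large = envelope(n=100, runs=50, seed=1)
```

Comparing `mean_band_width(small)` with `mean_band_width(large)` gives a one-number summary of how the mid-95% range contracts as sample size grows; plotting the five columns of each envelope against `GRID` gives a picture in the style of figure 6.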
Figure 6. Results of six
MC simulation
experiments, summarizing
variability between 1000
simulated spds. Variability
is depicted as the mid-95%
range (pink shaded area)
and min-max range (dashed
red lines). The median
(solid red line) and
underlying TFD (solid blue
line) are also shown. All
parameters except sample
size are held constant
between simulations. Each
simulation considers a
different sample size: n=30,
50, 100, 200, 500, and
1000.
Results and discussion
Two distinct patterns emerge from the comparison of simulations between sample sizes (figure 6):
1. The magnitude of variance between runs decreases as sample size increases, converging toward the underlying TFD as expected. However,
for small and even moderate sample sizes, the minimum and 2.5% boundaries are at or slightly above 0, while the median falls below the
TFD. Medians are still occasionally noticeably below the TFD at n=100 but are negligibly different by n=200. The lower 2.5% boundary
similarly begins to lift away from 0 by n=200, while the minimum boundary has only done so by n=500. Future MC experiments could
further elucidate the sample sizes necessary to achieve these “liftoffs,” but even at this coarse grain, the results bear out the cautionary tale that small
to moderate samples should be expected to generate artificial depressions and gaps in the record with greater frequency than desired. By
extension, these will then be counterbalanced by anomalous peaks, which can be quite severe.
2. While past discussions of calibration interference have asserted that steep calibration curve slopes generate extreme peaks and that low
slopes and plateaus depress them,2,7,18 the results of the simulations shown in figure 6 suggest a more nuanced understanding of the
problem. Here, certain intervals along the timeline are characterized by less variability both above and below the model expectation than
others. Furthermore, while the magnitude of variance at these locations contracts as sample size increases, their location persists between
sample sizes, suggesting that their locationality is an artifact of interference, specifically of the calibration curve slope, whereas their
direction and magnitude are artifacts of noise. What this implies for paleodemographic practice is that, if two researchers were to
independently investigate the occupation history of an identical study region, they should expect to reach consensus more quickly for those
intervals characterized by low calibration curve slopes than for those characterized by high. By implication, when only one sample spd is
available for a given region and the size of this sample is small, the volatility characterizing steep-sloped stretches of the calibration curve
should convince us to suspend judgment regarding any extreme spd structure falling within these regions, peak and trough alike.
Possible solutions to these problems include:
1. Increase sample size, either by conducting extensive field surveys to identify a greater number of datable archaeological deposits, or by
combing museum collections to identify undated but datable deposits (or both).
2. When calculating growth rate estimates from spds,4 avoid comparison between volatile intervals; identify those intervals characterized by
the lowest inter-run variance and compare spd values within or between these intervals.
3. Keeping in mind that spds are sample distributions, subject them to kernel density estimation, an estimation technique in inferential
statistics intended to approximate the probability density function underlying a sample distribution. Kernel density estimation involves the
application of moving, distance-weighted averages to sample distributions. Defining appropriate distance-weighting functions that are
ideally suited for TFA awaits further simulation investigation. To date, the leading proposal is that of Williams,18 who recommends the use
of 500- to 800-year moving averages. His recommendation focuses on the neutralization of calibration interference, though it should also
act to mitigate the effect of sampling error. Future work should explore kernel shapes aside from the rectangular kernel implicit in
Williams’ prescription, as well as optimization algorithms for kernel bandwidth that are responsive to both sample size and features of the
sample distribution.
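The moving average in the third solution amounts to smoothing with a rectangular kernel. A minimal Python sketch follows, assuming a centered window that is simply truncated at the ends of the timeline (an edge-handling choice made here for illustration); the function name and the toy input series are hypothetical.

```python
def moving_average(values, step, window):
    """Centered rectangular-kernel smoothing of a series on a regular grid.

    `step` is the grid spacing in years and `window` the kernel width in
    years. The window is truncated at the edges of the series.
    """
    half = int(round(window / step / 2))
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

# Toy series on a 5-year grid: flat except for one artificial spike.
values = [1.0] * 200
values[100] = 11.0

# A 500-year window spans 101 grid points, so the spike is spread thin.
smoothed = moving_average(values, step=5, window=500)
```

Swapping the equal weights for distance-dependent ones (for example triangular or Gaussian weights) would turn the same loop into a kernel smoother with a different kernel shape.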
Future MC experiments will explore the additional influence of variable instrument measurement error on inter-spd variability. They will also
explore TFDs with more complex structures than the featureless uniform distribution. The latter will be especially important for our ability to
distinguish between real demographic structures and I/N structures.
Sources cited:
[1] Ammerman, Albert J., L. L. Cavalli-Sforza, and Diane K. Wagener. (1976). Toward the Estimation of Population Growth in Old World Prehistory. In Demographic Anthropology: Quantitative Approaches, edited by Ezra B. W. Zubrow, pp. 27-61. University of New Mexico Press, Albuquerque.
[2] Bamforth, Douglas B., and Brigid Grund. (2012). Radiocarbon Calibration Curves, Summed Probability Distribution, and Early Paleoindian Population Trends in North America. Journal of Archaeological Science 39: 1768-1774.
[3] Bartlein, Patrick J., Mary E. Edwards, Sarah L. Shafer, and Edward D. Barker Jr. (1995). Calibration of Radiocarbon Ages and the Interpretation of Paleoenvironmental Records. Quaternary Research 44: 417-424.
[4] Collard, Mark, Kevan Edinborough, Stephen Shennan, and Mark G. Thomas. (2010). Radiocarbon Evidence Indicates that Migrants Introduced Farming to Britain. Journal of Archaeological Science 37: 866-870.
[5] Guilderson, Tom P., Paula J. Reimer, and Tom A. Brown. (2005). The Boon and Bane of Radiocarbon Dating. Science 307: 362-364.
[6] Maschner, Herbert, Bruce Finney, James Jordan, Nicole Misarti, Amber Tews, and Barrett Knudsen. (2009). Did the North Pacific Ecosystem Collapse in AD 1250? In The Northern World AD 900-1400, edited by Herbert Maschner, Owen Mason, and Robert McGhee, pp. 33-57. The University of Utah Press, Salt Lake City.
[7] McFadgen, B. G., F. B. Knox, and T. R. L. Cole. (1994). Radiocarbon Calibration Curve Variations and Their Implications for the Interpretation of New Zealand Prehistory. Radiocarbon 36(2): 221-236.
[8] Pazdur, M. F., and D. J. Michczynska. (1989). Improvement of the Procedure for Probabilistic Calibration of Radiocarbon. Radiocarbon 31(3): 824-832.
[9] Reimer, P. J., E. Bard, A. Bayliss, J. W. Beck, P. G. Blackwell, C. Bronk Ramsey, C. E. Buck, H. Cheng, R. L. Edwards, M. Friedrich, P. M. Grootes, T. P. Guilderson, H. Haflidason, I. Hajdas, C. Hatté, T. J. Heaton, D. L. Hoffmann, A. G. Hogg, K. A. Hughen, K. F. Kaiser, B. Kromer, S. W. Manning, M. Niu, R. W. Reimer, D. A. Richards, E. M. Scott, J. R. Southon, R. A. Staff, C. S. M. Turney, and J. van der Plicht. (2013). IntCal13 and Marine13 Radiocarbon Age Calibration Curves 0–50,000 Years cal BP. Radiocarbon 55(4): 1869-1887.
[10] Rick, John W. (1987). Dates as Data: An Examination of the Peruvian Preceramic Radiocarbon Record. American Antiquity 52(1): 55-73.
[11] Steier, P., W. Rom, and S. Puchegger. (2001). New Methods and Critical Aspects in Bayesian Mathematics for 14C Calibration. Radiocarbon 43(2A): 373-380.
[12] Stuiver, Minze, and Paula J. Reimer. (1989). Histograms Obtained from Computerized Radiocarbon Age Calibration. Radiocarbon 31(3): 817-823.
[13] Stuiver, Minze, and Paula J. Reimer. (1993). Extended 14C Data Base and Revised CALIB 3.0 14C Age Calibration Program. Radiocarbon 35(1): 215-230.
[14] Surovell, Todd A., and P. Jeffrey Brantingham. (2007). A Note on the Use of Temporal Frequency Distributions in Studies of Prehistoric Demography. Journal of Archaeological Science 34: 1868-1877.
[15] Surovell, Todd A., Judson Byrd Finley, Geoffrey M. Smith, P. Jeffrey Brantingham, and Robert Kelly. (2009). Correcting Temporal Frequency Distributions for Taphonomic Bias. Journal of Archaeological Science 36: 1715-1724.
[16] Weninger, Bernhard. (1986). High-Precision Calibration of Archaeological Radiocarbon Dates. Acta Interdisciplinaria Archaeol 4: 11-53.
[17] Weninger, Bernhard, Kevan Edinborough, Lee Clare, and Olaf Jöris. (2011). Concepts of Probability in Radiocarbon Analysis. Documenta Praehistorica 38: 1-20.
[18] Williams, Alan N. (2012). The Use of Summed Radiocarbon Probability Distributions in Archaeology: A Review of Methods. Journal of Archaeological Science 39: 578-589.