Through a Filter, Darkly: Signal, Interference, and Noise in Demographic Temporal Frequency Analysis

William A. Brown, University of Washington, Department of Anthropology

Introduction

Our ability to archaeologically identify changes in past population sizes is relevant to diverse research topics, including the identification of colonization processes; causality in sociopolitical and economic change and cultural transmission; and determinants of ecological sustainability. Over the last four decades [1,10,14,18], this need has increasingly been met by using temporal frequency distributions (tfds), time series data describing temporal variation in the abundance of archaeological deposits, as proxy census records. All else being equal, larger populations discard a greater abundance of materials than smaller ones, implying that tfds should register temporal variation in population size.

Yet even as interest in temporal frequency analysis (TFA), especially its paleodemographic application, grows, archaeologists are becoming increasingly aware of the limitations constraining this approach. Heated disputes have recently flared up surrounding overextensions of the approach, though the potential for such abuses was already anticipated over two decades ago [7,10].

Factors confounding straightforward demographic interpretations of tfds can be divided between those that introduce interference into the signal, systematically violating the assumption of proportionality between the population size curve and the archaeological record:
• diachronic change in the per capita deposition rate;
• time-transgressive taphonomic bias [14,15];
• research bias, resulting from the application of particular survey methods and research agendas;
and those that introduce noise:
• random variability around the deposition rate;
• random variability in deposit survival/destruction;
• survey sampling error;
• sufficiency of sample hygiene criteria.

Figure 1. spd (gray) plotting the temporal distribution of 166 archaeological site occupations from the Kodiak Archipelago in the Gulf of Alaska. The rug plot below marks the median age estimate for each individually dated site occupation. Vertical dashed lines demarcate the timespan 850-750 cal BP/A.D. 1100-1200, a century purportedly marked by a dearth of archaeological deposits [6].

The problem

While many sources of interference and noise (I/N) are generic to demographic TFA, a handful are specific to TFA supported by radiocarbon (14C) age estimation. Instrument measurement error and secular variation in the concentration of atmospheric 14C impose unique computational demands on tfd construction. These demands are met by 14C age calibration and the summed probability distribution (spd; see figure 1), a class of tfd in which the probability densities of all n probabilistic age estimates in the sample are summed at each location t along the timeline:

$\mathrm{spd}(t) = \sum_{i=1}^{n} f_i(t)$

where $f_i(t)$ is the probability density of the i-th age estimate at t.

Though working with spds is best practice in 14C-supported TFA, this does not resolve all demographic confounders arising from 14C age estimation. On the contrary, 14C-based spds frequently exhibit anomalous and extreme peak-and-trough structures arising from several non-demographic transformations:
1. sampling error, introducing structural disparities between the underlying TFD and the sample tfd (figure 2) [3,18];
2. the dispersive mapping of calendric ages into 14C ages [2];
3. additional dispersion resulting from instrument measurement error [2,7];
4. disparities in the expression of calendric age estimates resulting from the mapping of probabilistic 14C age estimates through calibration curves characterized by varying slopes (figure 3 lower panel, and figure 4) [2,5,7,8,11,12,13,16,17,18].

These sources of non-demographic I/N create equifinality between real demographic structures and artificial ones, and more than once such structures have been prematurely identified as demographic by the unwary.
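The summation that defines the spd can be sketched in a few lines. This is a minimal illustration in Python, not the author's R implementation: the three age estimates are invented, and plain normal densities stand in for real calibrated densities, which are generally irregular. The function names `normal_pdf` and `summed_probability` are hypothetical.

```python
import math

def normal_pdf(t, mean, sd):
    """Normal density at t; a stand-in for a calibrated age density f_i(t)."""
    return math.exp(-0.5 * ((t - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def summed_probability(grid, densities):
    """Evaluate spd(t) = sum_i f_i(t) at every t in grid."""
    return [sum(f(t) for f in densities) for t in grid]

# Hypothetical sample of three age estimates (mean, sd) in cal BP.
# These values are illustrative only and do not come from any dataset.
estimates = [(3000, 40), (3050, 60), (5200, 50)]
densities = [lambda t, m=m, s=s: normal_pdf(t, m, s) for m, s in estimates]

step = 5  # grid resolution in years (assumed)
grid = list(range(1000, 8001, step))
spd = summed_probability(grid, densities)

# Each density integrates to about 1, so the spd integrates to about n.
area = sum(spd) * step
peak_age = grid[spd.index(max(spd))]
print(f"area under spd: {area:.3f}; tallest peak near {peak_age} cal BP")
```

Because the spd is a plain sum, two overlapping estimates stack into a single tall peak even though they represent only two observations. This is one reason peak heights alone are a weak guide to the underlying TFD.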
In response, a handful of simulation experiments have been conducted to expose the influence of one [3], two [7,18], or three [2] of these factors on tfd [3,7], including spd [2,18], morphology. Experimental results have variously led these researchers to cautionary tales [2] and to the prescription of best-practice protocols to mitigate these effects [7,18]. However, the sequential interaction of all four factors remains to be explored. In addition, a preoccupation with the structuring influence of calibration interference has overshadowed our understanding of that of the other three [2,7,18]. Finally, the robustness of interpretation of these simulation experiments has been limited by unfavorably small simulation sizes, usually involving either the characterization of a single simulation run [2,3,7] or the comparison of two [2]. The most ambitious simulation experiment to date, that of Williams [18], involves statistical comparisons of thirty iterations per experiment, still a small simulation by most counts. The insightfulness of Williams' experiments is additionally limited by the fact that they involve resampling, which disallows comparison between each simulation run and the unknown underlying TFD.

Figure 2. Sample tfd describing the temporal distribution of 200 observations, randomly sampled from a truncated exponential TFD (blue line). The TFD is proportional to a population undergoing constant growth between 4000 and 1000 cal BP, with a doubling time of 1000 years. The tfd is represented in three ways: as a histogram, rug plot, and kernel density estimate (red line).

Figure 4. Simulation of 10,000 randomly generated 14C age measurements, specified as in figure 3, upper panel. The sample distribution is expected to approximate a normal distribution specified as N(μ = …

Figure 3. Upper panel: refraction-dispersion paths for ten simulated estimates of a single calendric age (12000 cal BP), each mapped through IntCal13 [9], with an instrument measurement offset following a normal distribution specified as N(μ = …

Figure 5. Two hypothetical 14C age estimates with identical precisions on the 14C age axis (vertical): 475±30 rcyBP (blue curve) and 875±30 rcyBP (red curve). IntCal13 is shown in the main panel (solid black lines) [9], along with an ideal 1:1 relationship between calendric and 14C time. The different mapping relationships between the two 14C estimates and the calibration curve have led to very different expressions of these two age estimates on the calendric timeline.

Methods

Because the first three transformations operate stochastically (the latter two entailing a propagation of error; figure 3 upper panel and figure 4), exploring the sequential influence of all four on spd morphology calls for a Monte Carlo (MC) simulation approach, involving a large simulation size (e.g., 1000 runs) and statistical evaluation of the results. Parameters to consider in such an approach should include:
(a) the influence of sample size on sampling error (process 1);
(b) TFD location and shape (process 2);
(c) degree of and variation in instrument measurement error (process 3);
(d) selection of calibration curve (processes 2 and 4); and
(e) calibration algorithm (process 4) [17].

The MC experiments presented here were implemented in R using code written by the author. Results are preliminary, focusing primarily on the influence of sample size on non-demographic I/N in spds. Six sample sizes approximating characteristic archaeological samples were analyzed (n=30, 50, 100, 200, 500, and 1000). The underlying TFD was the uniform distribution, chosen because its featureless morphology makes non-demographic structures in spds easy to detect. The TFD was spread over the interval 8000-1000 cal BP.
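The sampling step of this design (process 1) can be sketched in isolation as follows. This is an illustrative Python sketch, not the author's R code. It draws calendar ages from the uniform TFD for the six sample sizes, counts them in bins, and reports how much the bin counts vary from run to run. The 500-year bin width, the reduced run count, the random seed, and the use of a coefficient of variation as the summary are all assumptions made for the example. Calibration and measurement error are deliberately left out.

```python
import random
import statistics

T_MIN, T_MAX = 1000, 8000                   # uniform TFD interval in cal BP (from the text)
SAMPLE_SIZES = [30, 50, 100, 200, 500, 1000]
BIN_WIDTH = 500                             # assumed bin width, illustration only
N_RUNS = 200                                # reduced from 1000 to keep the sketch fast

def binned_sample(n, rng):
    """Draw n calendar ages from the uniform TFD and count them per bin."""
    n_bins = (T_MAX - T_MIN) // BIN_WIDTH
    counts = [0] * n_bins
    for _ in range(n):
        age = rng.uniform(T_MIN, T_MAX)
        # Clamp so an age exactly equal to T_MAX falls in the last bin.
        idx = min(int((age - T_MIN) // BIN_WIDTH), n_bins - 1)
        counts[idx] += 1
    return counts

def relative_spread(n, rng):
    """Mean across bins of the run-to-run coefficient of variation of bin counts."""
    runs = [binned_sample(n, rng) for _ in range(N_RUNS)]
    cvs = []
    for per_bin in zip(*runs):              # one tuple of N_RUNS counts per bin
        cvs.append(statistics.pstdev(per_bin) / statistics.mean(per_bin))
    return statistics.mean(cvs)

rng = random.Random(42)                     # fixed seed so the sketch is repeatable
spread = {n: relative_spread(n, rng) for n in SAMPLE_SIZES}
for n in SAMPLE_SIZES:
    print(f"n={n:5d}  relative spread of bin counts: {spread[n]:.3f}")
```

Under these assumptions the relative spread shrinks roughly in proportion to one over the square root of n. Even with a perfectly flat TFD, a small sample therefore produces apparent peaks and troughs before any radiocarbon-specific process is applied.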
Simulated offset from instrument measurement error followed a normal distribution specified as N(μ = 0, σ = 50). Only one calibration curve was considered, IntCal13 [9]. My calibration algorithm reproduces the output of OxCal and CALIB (though not CalPal [17]). For each sample size, 1000 simulation runs were generated, and each set of 1000 runs was summarized according to five percentiles, determined for each five-year interval along the timeline (t): minimum, 2.5%, median, 97.5%, and maximum. Critical tests measuring the coherence between simulated spds and the underlying TFD will be conducted in the future. The present analysis is limited to exploring the patterning observed in the percentiles, both over time and between sample sizes.

Figure 6. Results of six MC simulation experiments, summarizing variability between 1000 simulated spds. Variability is depicted as the mid-95% range (pink shaded area) and min-max range (dashed red lines). The median (solid red line) and underlying TFD (solid blue line) are also shown. All parameters except sample size are held constant between simulations. Each simulation considers a different sample size: n=30, 50, 100, 200, 500, and 1000.

Results and discussion

Two distinct patterns emerge from the comparison of simulations between sample sizes (figure 6):

1. The magnitude of variance between runs decreases as sample size increases, converging toward the underlying TFD as expected. However, for small and even moderate sample sizes, the minimum and 2.5% boundaries are at or slightly above 0, while the median falls below the TFD. Medians are still occasionally noticeably below the TFD at n=100 but are negligibly different by n=200. The lower 2.5% boundary similarly begins to lift away from 0 by n=200, while the minimum boundary has only done so by n=500. Future MC experiments could further elucidate the sample sizes necessary to achieve these "liftoffs," but even at this coarse grain the cautionary tale bears out: small to moderate samples should be expected to generate artificial depressions and gaps in the record more often than desired. By extension, these will then be counterbalanced by anomalous peaks, which can be quite severe.

2. While past discussions of calibration interference have asserted that steep calibration curve slopes generate extreme peaks and that low slopes and plateaus depress them [2,7,18], the results of the simulations shown in figure 6 suggest a more nuanced understanding of the problem. Here, certain intervals along the timeline are characterized by less variability, both above and below the model expectation, than others. Furthermore, while the magnitude of variance at these locations contracts as sample size increases, their location persists between sample sizes. This suggests that their locationality is an artifact of interference, specifically of the calibration curve slope, whereas their direction and magnitude are artifacts of noise.

What this implies for paleodemographic practice is that, if two researchers were to independently investigate the occupation history of the same study region, they should expect to reach consensus more quickly for those intervals characterized by low calibration curve slopes than for those characterized by high slopes. By implication, when only one sample spd is available for a given region and the size of this sample is small, the volatility characterizing steep-sloped stretches of the calibration curve should convince us to suspend judgment regarding any extreme spd structure falling within these regions, peak and trough alike.

Possible solutions to these problems include:

1. Increase sample size, either by conducting extensive field surveys to identify a greater number of datable archaeological deposits, or by combing museum collections to identify undated but datable deposits (or both).

2. When calculating growth rate estimates from spds [4], avoid comparison between volatile intervals; identify those intervals characterized by the lowest inter-run variance and compare spd values within or between these intervals.

3. Keeping in mind that spds are sample distributions, subject them to kernel density estimation, a technique in inferential statistics intended to approximate the probability density function underlying a sample distribution. Kernel density estimation involves the application of moving, distance-weighted averages to sample distributions. Defining distance-weighting functions well suited to TFA awaits further simulation investigation. To date, the leading proposal is that of Williams [18], who recommends the use of 500- to 800-year moving averages. His recommendation focuses on the neutralization of calibration interference, though it should also act to mitigate the effect of sampling error. Future work should explore kernel shapes aside from the rectangular kernel implicit in Williams' prescription, as well as optimization algorithms for kernel bandwidth that are responsive to both sample size and features of the sample distribution.

Future MC experiments will explore the additional influence of variable instrument measurement error on inter-spd variability. They will also explore TFDs with more complex structures than the featureless uniform distribution. The latter will be especially important for our ability to distinguish between real demographic structures and I/N structures.

Sources cited:

[1] Ammerman, Albert J., L. L. Cavalli-Sforza, and Diane K. Wagener. (1976). Toward the Estimation of Population Growth in Old World Prehistory.
In Demographic Anthropology: Quantitative Approaches, edited by Ezra B. W. Zubrow, pp. 27-61. University of New Mexico Press, Albuquerque.
[2] Bamforth, Douglas B., and Brigid Grund. (2012). Radiocarbon Calibration Curves, Summed Probability Distributions, and Early Paleoindian Population Trends in North America. Journal of Archaeological Science 39: 1768-1774.
[3] Bartlein, Patrick J., Mary E. Edwards, Sarah L. Shafer, and Edward D. Barker Jr. (1995). Calibration of Radiocarbon Ages and the Interpretation of Paleoenvironmental Records. Quaternary Research 44: 417-424.
[4] Collard, Mark, Kevan Edinborough, Stephen Shennan, and Mark G. Thomas. (2010). Radiocarbon Evidence Indicates that Migrants Introduced Farming to Britain. Journal of Archaeological Science 37: 866-870.
[5] Guilderson, Tom P., Paula J. Reimer, and Tom A. Brown. (2005). The Boon and Bane of Radiocarbon Dating. Science 307: 362-364.
[6] Maschner, Herbert, Bruce Finney, James Jordan, Nicole Misarti, Amber Tews, and Barrett Knudsen. (2009). Did the North Pacific Ecosystem Collapse in AD 1250? In The Northern World AD 900-1400, edited by Herbert Maschner, Owen Mason, and Robert McGhee, pp. 33-57. The University of Utah Press, Salt Lake City.
[7] McFadgen, B. G., F. B. Knox, and T. R. L. Cole. (1994). Radiocarbon Calibration Curve Variations and Their Implications for the Interpretation of New Zealand Prehistory. Radiocarbon 36(2): 221-236.
[8] Pazdur, M. F., and D. J. Michczynska. (1989). Improvement of the Procedure for Probabilistic Calibration of Radiocarbon. Radiocarbon 31(3): 824-832.
[9] Reimer, P. J., E. Bard, A. Bayliss, J. W. Beck, P. G. Blackwell, C. Bronk Ramsey, C. E. Buck, H. Cheng, R. L. Edwards, M. Friedrich, P. M. Grootes, T. P. Guilderson, H. Haflidason, I. Hajdas, C. Hatté, T. J. Heaton, D. L. Hoffmann, A. G. Hogg, K. A. Hughen, K. F. Kaiser, B. Kromer, S. W. Manning, M. Niu, R. W. Reimer, D. A. Richards, E. M. Scott, J. R. Southon, R. A. Staff, C. S. M. Turney, and J. van der Plicht. (2013). IntCal13 and Marine13 Radiocarbon Age Calibration Curves 0–50,000 Years cal BP. Radiocarbon 55(4): 1869-1887.
[10] Rick, John W. (1987). Dates as Data: An Examination of the Peruvian Preceramic Radiocarbon Record. American Antiquity 52(1): 55-73.
[11] Steier, P., W. Rom, and S. Puchegger. (2001). New Methods and Critical Aspects in Bayesian Mathematics for 14C Calibration. Radiocarbon 43(2A): 373-380.
[12] Stuiver, Minze, and Paula J. Reimer. (1989). Histograms Obtained from Computerized Radiocarbon Age Calibration. Radiocarbon 31(3): 817-823.
[13] Stuiver, Minze, and Paula J. Reimer. (1993). Extended 14C Data Base and Revised CALIB 3.0 14C Age Calibration Program. Radiocarbon 35(1): 215-230.
[14] Surovell, Todd A., and P. Jeffrey Brantingham. (2007). A Note on the Use of Temporal Frequency Distributions in Studies of Prehistoric Demography. Journal of Archaeological Science 34: 1868-1877.
[15] Surovell, Todd A., Judson Byrd Finley, Geoffrey M. Smith, P. Jeffrey Brantingham, and Robert Kelly. (2009). Correcting Temporal Frequency Distributions for Taphonomic Bias. Journal of Archaeological Science 36: 1715-1724.
[16] Weninger, Bernhard. (1986). High-Precision Calibration of Archaeological Radiocarbon Dates. Acta Interdisciplinaria Archaeol 4: 11-53.
[17] Weninger, Bernhard, Kevan Edinborough, Lee Clare, and Olaf Jöris. (2011). Concepts of Probability in Radiocarbon Analysis. Documenta Praehistorica 38: 1-20.
[18] Williams, Alan N. (2012). The Use of Summed Radiocarbon Probability Distributions in Archaeology: A Review of Methods. Journal of Archaeological Science 39: 578-589.