Supplement: Population Size Predicts Technological Complexity in Oceania

Michelle A. Kline* and Robert Boyd*
* Department of Anthropology, University of California, Los Angeles, CA 90095

Sampling from Electronic Human Relations Area Files (eHRAF)

We limited our data set to ethnographies found in the eHRAF (2009) because of the database's strict criteria for inclusion. The HRAF database contains published books and articles as well as unpublished dissertations. Authors include anthropologists and other scholarly researchers, as well as missionaries and explorers. Most publications are ethnographies based on fieldwork in situ, but the collection also includes ethnohistories and publications resulting from research in museums. The publications included in our sample were overwhelmingly ethnographic accounts. For details of how we controlled for potential variation in ethnographic coverage, see the section on control variables below. For further details on the eHRAF collection, see its online guides (World Cultures Database, 2009).

Our sample contains societies from near and far Oceania, which together span Micronesia, Melanesia, and Polynesia.

Figure S1. Geographical locations of the ten societies in our sample. Geographical distance between societies is not always indicative of rates of contact. Map from Google Maps (2008).

Collection of Tool Data

We collected data in multiple stages, which we explain in detail here.

1. Collection of ethnographic excerpts from the electronic Human Relations Area Files database, and calculation of the total number of tools

We selected the societies included in this sample by limiting ourselves to (a) societies in Oceania, as defined by eHRAF, and (b) societies for which it is possible to obtain a population size estimate. We did not use groups located on mainland Papua New Guinea or mainland Australia, so that our sample would include only relatively isolated groups for which we could estimate rates of contact.
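The inclusion rule above can be expressed as a simple filter. The sketch below is purely illustrative: the `Society` record and its fields are hypothetical stand-ins for information that was in practice gathered by hand from eHRAF and from population sources.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Society:
    name: str
    ehraf_region: str            # region as classified by eHRAF (hypothetical field)
    population: Optional[int]    # None when no population estimate can be obtained
    on_excluded_mainland: bool   # True for groups on mainland New Guinea or mainland Australia


def in_sample(s: Society) -> bool:
    """Oceania per eHRAF, has a population estimate, and not on an excluded mainland."""
    return (
        s.ehraf_region == "Oceania"
        and s.population is not None
        and not s.on_excluded_mainland
    )


def select_sample(candidates):
    # Keep candidates in their original order.
    return [s for s in candidates if in_sample(s)]
```

For example, a hypothetical Oceanic group with no obtainable population estimate would be dropped, as would one located on mainland New Guinea.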
After examining the organization of eHRAF indexing codes, we used the advanced search function to retrieve all paragraphs indexed by eHRAF as having content on “fishing,” “marine hunting,” or “fishing gear.” For each society, the coder then copied all retrieved paragraphs into a Word document for further coding. The coder read through each document while using an Excel spreadsheet to assemble a list of all technologies mentioned in these excerpts (whether in passing or as the focus), together with a record of the paragraph(s) in which each technology is mentioned. These spreadsheets formed a separate technology index for each society in our sample.

After completing the index, the coder reviewed the notes made during indexing and double-checked that each indexed tool was in fact an independent tool. A tool was granted this status if any of the following was true: it was described as performing a unique purpose, it was described as having a unique structure or production technique, or informants gave it a unique name. Tool names sometimes differed between ethnographers, and in these cases the coder gave priority to the structural and functional descriptions of tools. The coder indexed every technology mentioned in the collected excerpts, even those that were not expressly marine foraging technologies. After verifying that each tool was an independent tool type, we removed technologies that were not directly involved in procuring marine resources (for instance, canoe houses or cooking pots). We also removed technologies such as canoes because, although often mentioned in fishing-related excerpts, they were not reliably sampled by the eHRAF codes we used.
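The indexing rules above can be sketched in simplified form. This assumes each indexed mention has already been summarized into a hypothetical `IndexEntry` record; in the study these were manual judgments made while reading the excerpts. Here two mentions count as the same tool type only when purpose, structure, and informant name all match. The ethnographer's own label is ignored, reflecting the priority given to structural and functional descriptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexEntry:
    label: str                     # name used by the ethnographer (may vary by author)
    purpose: str                   # described function
    structure: str                 # described structure / production technique
    informant_name: Optional[str]  # name given by informants, if reported
    procures_marine: bool          # directly involved in procuring marine resources
    reliably_sampled: bool = True  # False for items such as canoes


def tool_types(entries):
    """Collapse mentions into independent tool types, then drop excluded items."""
    seen = {}
    for e in entries:
        # A mention is a new tool type if it differs in purpose, structure,
        # or informant name; the first mention of each type is kept.
        key = (e.purpose, e.structure, e.informant_name)
        seen.setdefault(key, e)
    # Remove non-marine items (e.g. cooking pots) and unreliably sampled ones (e.g. canoes).
    return [e for e in seen.values() if e.procures_marine and e.reliably_sampled]


def total_number_of_tools(entries):
    return len(tool_types(entries))
```

Keying on informant names rather than author labels is a simplification. Cases where one source reports an informant name and another does not would still need a human decision.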
We retained tools or tool parts that were used in marine foraging but seemed to be primarily decorative or supernatural. From the producers' perspective they are nonetheless part of the tool kit, so they too are subject to the dynamics of cultural evolution. The resulting list of tools constitutes our “total number of tools” data. In the next step, we gauged the complexity of each tool by revisiting the ethnographic excerpts using the spreadsheet index created here.

2. Coding of Techno-units

We defined techno-units just as Oswalt (1976) did: “an integrated, physically distinct, and unique structural configuration that contributes to the form of a finished artefact” (p. 38). The coder used the index and any notes created in phase one to revisit the paragraphs on each tool and to estimate the number of techno-units composing it, separately for each tool in each group. Techno-unit counts are based on verbal descriptions, illustrations, and photographs from the eHRAF. Our study therefore differs from Oswalt's (1976), since he had access to museum collections and was able to handle some actual specimens. Because we were limited to ethnographies, there was sometimes not enough information available on a given technology to rate its techno-units. We do not include such tools in our techno-unit measures.

Many of the same tools were present in several groups in our sample, with varying amounts of techno-unit information in the ethnographic excerpts collected for each group. In these cases, we took the average of all the estimates we were able to generate. We then used this mean as the techno-unit value for that tool across all groups, regardless of whether techno-unit information was available for that specific group. The proportion of tools without techno-unit ratings does not predict a group's mean number of techno-units (ß = .142, p = .696, R2 = .0201).
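The pooling rule for techno-units can be sketched as follows. The per-group ratings are hypothetical, with `None` marking a tool that could not be rated from the ethnographies. Each tool's value is the mean of all available estimates across groups, and that value is applied to every group that has the tool. Tools never rated in any group drop out of the techno-unit measures.

```python
from collections import defaultdict
from statistics import mean


def pooled_techno_units(ratings):
    """ratings: {group: {tool: techno-unit estimate, or None if unratable}}."""
    estimates = defaultdict(list)
    for tools in ratings.values():
        for tool, tu in tools.items():
            if tu is not None:
                estimates[tool].append(tu)
    # One value per tool: the mean of every estimate available in any group.
    return {tool: mean(vals) for tool, vals in estimates.items()}


def group_mean_techno_units(ratings):
    """Mean techno-units per tool for each group, using the pooled tool values."""
    pooled = pooled_techno_units(ratings)
    result = {}
    for group, tools in ratings.items():
        # Pooled values apply even where this group's own rating was missing;
        # tools with no rating in any group are left out.
        values = [pooled[t] for t in tools if t in pooled]
        result[group] = mean(values) if values else None
    return result


ratings = {
    "A": {"net": 7, "hook": 2, "charm": None},
    "B": {"net": 5, "hook": None, "lure": 3},
}
print(group_mean_techno_units(ratings))
```

In this toy example, group B's hook has no rating of its own but receives the pooled value from group A. The charm is never rated, so it is excluded.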
See Figure 1 for an example of techno-unit coding.

Figure 1. A sink-net used by Santa Cruz Islanders, made up of 7 techno-units. (From Speizer 1958, p. 71, Fig. 29; in eHRAF 2008.)

Calculation of Control Variables

Most of the control variables we used are self-explanatory. Here we discuss the following control variables: ethnographic coverage, effective temperature (ET), vulnerability to catastrophic storms, threat of drought, and relative importance of fishing.

1. Ethnographic Coverage

One alternative explanation of our result is that it is an artifact of the way ethnographers collected data across the societies in our sample. On this account, larger populations might draw more ethnographers, or ethnographers might write more about them, so that more information is available on their tool kits. To control for this possibility, we used the following data on the eHRAF collection: (1) number of publications, (2) number of authors, and (3) number of pages published on a particular society. None of these variables predicts tool kit variation when included in a regression with log-transformed population size. This holds both for the number of tools and for tool complexity as measured by mean techno-units (Tables S1 and S2). Nor does any of these variables predict the number of tools in regression analyses on its own (see below). In an Akaike model selection analysis, the strongest of these models (number of tools regressed on number of publications) ranks 15th of 23 and so is not a preferred model. Likewise, none of these variables predicts tool complexity in regression analyses. In the Akaike analysis, the model using number of publications is the least preferred of 23 models as a predictor of tool complexity.

Table S1.
Rows give the standardized regression coefficients and significance values for regressions in which the dependent variable is the logarithm of the total number of tools. The independent variables are log-transformed population size and a log-transformed measure of ethnographic coverage. Sig is the asymptotic significance and BS sig the bootstrap significance.

Model  IV               ß        Sig     BS sig  R2
1      Population        0.733   0.014   0.058   0.6853
       # Publications    0.205   0.396   0.581
2      Population        0.744   0.023   0.460   0.6593
       # Authors         0.120   0.653   0.856
3      Population        0.809   0.014   0.023   0.6486
       # Pages          -0.008   0.975   0.979

Table S2. Rows give the standardized regression coefficients and significance values for regressions in which the dependent variable is the logarithm of the mean number of techno-units per tool, per society. The independent variables are log-transformed population size and a log-transformed measure of ethnographic coverage.

Model  IV               ß        Sig     BS sig  R2
1      Population        0.721   0.039   0.154   0.5008
       # Publications   -0.044   0.883   0.936
2      Population        0.814   0.030   0.072   0.5324
       # Authors        -0.212   0.503   0.546
3      Population        0.717   0.047   0.124   0.4995
       # Pages          -0.023   0.939   0.956

2. Effective Temperature

Effective temperature (ET) is a measure of ecosystem abundance based on the amount of solar energy available in a particular location, and it has been used previously for this purpose (Bailey 1960, Binford 2001, Collard et al. 2005). It is calculated with the following formula, where MWM is the mean temperature in degrees centigrade during the warmest month and MCM is the mean temperature in degrees centigrade during the coldest month:

ET = (18 * MWM - 10 * MCM) / (MWM - MCM + 8)

We also used latitude as another proxy for available solar energy. We did not use variation in growing season length, since all groups in our sample had year-round growing seasons.

3. Vulnerability to Catastrophic Storms

We also collected data on the threat of cyclones and tropical storms, since these influence the risk of resource failure in the Pacific.
Cyclones may tear the roofs from houses, uproot crops, and prevent fishing and marine collecting for days at a time. To estimate the threat of cyclones and tropical storms for each group, we used cyclone path maps for the Pacific region available from Australian Severe Weather (2009). Each map covers one cyclone season, and we used maps from the 1998-1999 to 2008-2009 seasons. We used recent data because historical data were not available for all groups in our sample and did not seem to be as reliable. For every cyclone path coming within approximately 50 miles of a target group's island, we recorded two values. These were the maximum wind speed of the entire storm and the maximum actual wind speed of the storm at the point nearest the island. From these data we calculated for each group:

(a) the total number of storms across all seasons from 1998 to 2009,
(b) the sum of the maximum wind speeds of all storms,
(c) the mean wind speed across all storms, and
(d) the maximum wind speed of any storm.

None of these measures has an effect on a group's total number of tools or on its average tool complexity (in techno-units).

4. Threat of Drought

Another major source of risk of resource failure is drought. We controlled for this using weather data from Weatherbase (2009) and the National Weather Service for Hawaii (2009). These data include (a) number of rainy days per year, (b) total annual rainfall, (c) mean annual rainfall, and (d) standard deviation in rainfall per year. None of these measures has an effect on a group's total number of tools or on its average tool complexity (in techno-units).

5. Importance of Fishing

To measure how much fishing contributes to the subsistence of each group in our sample, we obtained data from Ember (2008) on the subsistence “types” of each group. These data were themselves compiled by Ember for use in the Human Relations Area Files database.
Some of Ember's sources specified the contribution of fishing to the group's diet as a percentage rounded to the nearest ten, while others provided only rough estimates. As a result, we were unable to obtain interval data on the importance of fishing. Instead, we used the best available data for all groups to rank the groups by how much fishing contributed to their subsistence. We allowed for ties, so that groups with equal percentages of their subsistence coming from fishing received the same rank. Table S3 gives the data provided by Ember and our conversion of these partially interval data to ordinal data.

Table S3. Data on the importance of fishing for the subsistence of groups in our sample, in two forms: raw percentages obtained from Ember (2008), and a conversion of those percentages into ordinal data, with 1 being least important and 7 most important.

Culture          % Fishing (Ember)    Fishing Rank
Malekula         Min 56, Max 85       6
Tikopia          20                   2
Santa Cruz       Min 56, Max 85       6
Yap              40                   4
Fiji             50                   5
Trobriand Isl.   10                   1
Chuuk            40                   4
Manus            90                   7
Tonga            30                   3
Hawaii           40                   4

We analyzed the data on reliance on fishing for subsistence in three forms. First, we converted Ember's percentage data into ordinal data, which we call fishing rank, and we present these analyses in the paper. To test the robustness of these results, we also analyzed the data in two other forms. One used the lowest estimates of fishing's contribution to subsistence for Malekula and Santa Cruz, and the other used the highest estimates. The results support our conclusions based on the ordinal data: fishing importance does not predict tool kit breadth or complexity.

Table S4. The first two models give the standardized regression coefficients and significance values for regressions in which the dependent variable is the logarithm of the total number of tools. The independent variable is the percent contribution of fishing to subsistence, using (1) the low-end estimates for Malekula and Santa Cruz and (2) the high-end estimates for both groups.
The last two models combine each of those independent variables in turn with the logarithm of population size in a multiple regression. The coefficients for both measures of the importance of fishing are small and opposite in sign to the predicted direction. Neither effect is significant according to asymptotic or bootstrap significance values. The coefficients for population size are large and mostly significant, a result largely supported by the bootstrap values.

Model  IV                      ß        Sig     BS sig  R2
1      % fishing (low est.)    -0.065   0.859   0.898   0.0042
2      % fishing (high est.)   -0.298   0.402   0.499   0.0890
3      Population               0.806   0.008   0.012   0.6546
       % fishing (low est.)    -0.078   0.737   0.833
4      Population               0.777   0.010   0.190   0.6699
       % fishing (high est.)   -0.149   0.523   0.677

Table S5. The first two models give the standardized regression coefficients and significance values for regressions in which the dependent variable is the logarithm of the mean number of techno-units per tool. The independent variable is the percent contribution of fishing to subsistence, using (1) the low-end estimates for Malekula and Santa Cruz and (2) the high-end estimates for both groups. The last two models combine each of those independent variables in turn with the logarithm of population size in a multiple regression. The coefficients for both measures of the importance of fishing are small, and neither effect is significant according to asymptotic or bootstrap significance values. The coefficients for population size are large and significant, a result mostly supported by the bootstrap values.

Model  IV                      ß        Sig     BS sig  R2
1      % fishing (low est.)     0.260   0.468   0.587   0.0677
2      % fishing (high est.)   -0.0593  0.871   0.886   0.0035
3      Population               0.702   0.026   0.049   0.5610
       % fishing (low est.)     0.249   0.354   0.498
4      Population               0.722   0.032   0.101   0.5052
       % fishing (high est.)    0.080   0.777   0.823

Checking the Robustness of Results

1. Bootstrap Resampling Analyses

We checked the robustness of the regression analyses for our original sample by using bootstrap resampling to calculate the significance of the regression coefficients.

Table S6. Each row gives the standardized regression coefficient and significance values for a regression in which the dependent variable is the logarithm of the number of tool types and the independent variable is the one listed. The coefficients for the control variables are smaller than that for population size, and none is close to significant by either asymptotic or bootstrap values. The AICc value for a regression with only the constant is 0.63.

IV                                        ß        Sig     BS sig  R2      AICc    AICc weight
Population                                 0.805   0.005   0.001   0.649   -2.41   .03997
Mean rainfall/yr.                         -0.474   0.166   0.119   0.225   -1.62   .02691
Publications                               0.464   0.176   0.213   0.216   -1.61   .02677
Standard dev. rain/yr.                    -0.442   0.201   0.325   0.195   -1.58   .02641
Sum of max wind speeds for all cyclones    0.360   0.306   0.588   0.130   -1.51   .02541
Effective temperature                     -0.344   0.331   0.315   0.118   -1.49   .02523
Latitude                                   0.270   0.450   0.450   0.073   -1.44   .02461
Total cyclones                             0.241   0.502   0.610   0.058   -1.43   .02441
Fish genera                                0.192   0.594   0.713   0.037   -1.40   .02415
Mean rainy days/yr.                       -0.153   0.673   0.777   0.023   -1.39   .02398
Mean maximum cyclone wind speed           -0.128   0.724   0.724   0.017   -1.38   .02389
Importance of fishing                     -0.104   0.773   0.824   0.011   -1.38   .02383

Table S7. Each model gives the standardized regression coefficients and significance values for a multiple regression in which the dependent variable is the logarithm of the number of tool types. The independent variables are the logarithm of population size and one of the alternative variables. The coefficients for population size are large and mostly significant. The coefficients for the control variables are smaller, and none is close to significant by either asymptotic or bootstrap values. The AICc value for a regression with only the constant is -2.91.

Model  IV                                      ß        Sig     BS sig  R2      AICc    AICc weight
1      Population                               1.045   0.002   0.015   0.762   -3.39   .06513
       Fish genera                             -0.413   0.110   0.383
2      Population                               0.988   0.017   0.028   0.677   -3.08   .05592
       Mean rainfall/yr.                        0.249   0.455   0.528
3      Population                               0.824   0.006   0.020   0.691   -3.12   .05711
       Mean max cyclone wind speed             -0.206   0.361   0.409
4      Population                               0.733   0.014   0.015   0.685   -3.11   .05663
       Publications                             0.205   0.396   0.550
5      Population                               0.724   0.069   0.012   0.652   -3.01   .05381
       Importance of fishing                    0.008   0.979   0.873
6      Population                               0.871   0.007   0.012   0.674   -3.07   .05570
       Mean # rainy days/yr.                    0.175   0.476   0.694
7      Population                               0.918   0.012   0.020   0.671   -3.06   .05539
       Sum max wind speeds for all cyclones    -0.188   0.510   0.793
8      Population                               0.858   0.010   0.024   0.661   -3.03   .05454
       Total # cyclones                        -0.123   0.628   0.786
9      Population                               0.783   0.010   0.130   0.661   -3.03   .05451
       Latitude                                 0.635   0.112   0.844
10     Population                               0.792   0.006   0.054   0.716   -3.21   .05956
       Contact                                  0.259   0.239   0.376
11     Population                               0.844   0.019   0.142   0.651   -3.00   .05378
       SD rainfall per year                     0.065   0.822   0.881
12     Population                               0.819   0.014   0.023   0.649   -3.00   .05362
       Effective temperature                    0.030   0.908   0.935

Table S8. Each row gives the standardized regression coefficient and significance values for a regression in which the dependent variable is the logarithm of the average number of techno-units per tool and the independent variable is the one listed. The coefficient for population size is large and significant. The coefficients for the control variables are smaller, and none is close to significant by either asymptotic or bootstrap values. The AICc value for a regression with only the constant is -2.91.

IV                                        ß        Sig     BS sig  R2      AICc    AICc weight
Population                                 0.706   0.022   0.018   0.499   -3.60   .03827
Standard dev. rain/yr.                    -0.629   0.051   0.124   0.396   -3.42   .03486
Fish genera                                0.495   0.146   0.200   0.245   -3.19   .03116
Mean rainfall/yr.                         -0.390   0.265   0.170   0.152   -3.08   .02942
Mean rainy days/yr.                       -0.348   0.324   0.340   0.121   -3.04   .02890
Sum of max wind speeds for all cyclones    0.292   0.413   0.610   0.085   -3.00   .02831
Total cyclones                             0.202   0.576   0.678   0.041   -2.95   .02765
Effective temperature                     -0.163   0.652   0.655   0.027   -2.94   .02745
Mean maximum cyclone wind speed           -0.136   0.708   0.745   0.019   -2.93   .02734
Importance of fishing                      0.084   0.818   0.833   0.007   -2.92   .02717
Latitude                                   0.022   0.952   0.961   0.001   -2.91   .02709
Publications                               0.212   0.557   0.599   0.045   -1.65   .01439

Table S9. Each model gives the standardized regression coefficients and significance values for a multiple regression in which the dependent variable is the logarithm of the average number of techno-units per tool. The independent variables are the logarithm of population size and one of the alternative variables. The coefficients for population size are large and mostly significant. The coefficients for the control variables are smaller, and none is close to significant. Significance values based on bootstrap analysis are larger but show a similar pattern.
Model  IV                                      ß        Sig     BS sig  R2      AICc    AICc weight
1      Population                               0.722   0.039   0.136   0.500   -4.19   .05137
       Publications                            -0.044   0.883   0.920
2      Population                               0.632   0.093   0.207   0.510   -4.21   .05186
       # Fish genera                            0.128   0.705   0.831
3      Population                               0.798   0.029   0.045   0.531   -4.25   .05301
       Effective temperature                    0.201   0.511   0.567
4      Population                               0.514   0.143   0.305   0.565   -4.33   .05504
       SD rainfall per year                    -0.321   0.337   0.557
5      Population                               0.732   0.030   0.279   0.515   -4.22   .05209
       Latitude                                -0.127   0.652   0.845
6      Population                               0.727   0.026   0.027   0.541   -4.27   .05355
       Mean max cyclone wind speed             -0.205   0.453   0.518
7      Population                               0.757   0.036   0.100   0.511   -4.21   .05189
       Total # cyclones                        -0.120   0.694   0.790
8      Population                               0.828   0.038   0.104   0.526   -4.24   .05270
       Sum max wind speeds for all cyclones    -0.203   0.551   0.791
9      Population                               0.670   0.052   0.086   0.507   -4.20   .05171
       Mean # rainy days per year              -0.096   0.747   0.840
10     Population                               0.907   0.048   0.188   0.534   -4.26   .05317
       Mean rainfall per year                   0.274   0.494   0.661
11     Population                               0.715   0.030   0.258   0.520   -4.23   .05238
       Contact                                 -0.144   0.600   0.746
12     Population                               0.702   0.033   0.049   0.513   -4.22   .05215
       Importance of fishing                   -0.103   0.710   0.784

2. Coding Reliability

To check the accuracy of our data, we obtained second measures of the number of tools for half of our sample. We did this by using Oswalt's (1976) data on the Chuuk and by having research assistants recode four randomly selected groups in our sample. The research assistants were given very little training. They were told the same rules of thumb for deciding what constitutes a “tool” that are provided in the paper (see the section on collection of tool data above). We did this only for the number of tools and not for techno-units for two reasons. Our criteria for what “counts” as a techno-unit intentionally differ from Oswalt's (1976), and both training coders on techno-units and the coding itself are time-intensive.

Table S10. Comparison of first- and second-coder ratings for half of our sample (n = 5).
Culture       # Tools (1st coder)   # Tools (2nd coder)
Chuuk         40                    33*
Tikopia       22                    29
Manus         28                    30
Trobriands    19                    23
Santa Cruz    24                    24

*Data for the Chuuk are taken from Oswalt (1976, Trukese), counting only marine foraging tools.

Table S11. Each model gives the standardized regression coefficients and significance values for a regression in which the dependent variable is the logarithm of the number of tool types. The independent variables are the logarithm of population size and, in the second model, contact, coded as a dummy variable. The results here replicate our main findings obtained with first-coder ratings, suggesting that coder accuracy is not influencing our results.

Model  IV            ß       Sig     BS sig  R2
1      Population    0.794   0.006   0.005   0.6303
2      Population    0.782   0.008   0.077   0.6795
       Contact       0.222   0.335   0.475

3. Sample Representativeness

We do not mix data sources in our primary data set because of differences between our coding scheme and Oswalt's. Here, however, we reanalyze our data set including two groups from Oswalt's study, the Tiwi and the Pukapuka. These groups were not available in the electronic HRAF at the time of the study. Using Oswalt's data, we counted the number of marine foraging tools: 30 for the Pukapuka and 4 for the Tiwi. We reanalyzed our original data after adding these two groups (n = 12), and the results from this expanded data set support our initial findings.

Table S12. Each model gives the standardized regression coefficients and significance values for a regression in which the dependent variable is the logarithm of the number of tool types. The independent variables are the logarithm of population size and, in the second model, contact, coded as a dummy variable. The results replicate our main findings using the original data set plus two groups from Oswalt (1976), suggesting that our results are robust to the composition of the sample.
Model  IV            ß       Sig     BS sig  R2
1      Population    0.600   0.039   0.045   0.3598
2      Population    0.543   0.066   0.261   0.4218
       Contact       0.255   0.352   0.422

References

Australian Severe Weather. 2009. South Pacific Tropical Cyclones: JWTC Data. Accessed 1 Dec 2009. http://www.australiasevereweather.com/cyclones/index.html
Ember, C. 2008. Classification of the Major Forms of Subsistence (for the cultures in eHRAF World Cultures). Accessed 15 Jan 2010. www.yale.edu/hraf
National Weather Service. 2009. NWS: Severe Weather. Accessed 1 Dec 2009. http://www.nws.noaa.gov/
Weatherbase. 2009. Weatherbase: Oceania. Accessed 1 Dec 2009. http://weatherbase.com/
World Cultures Ethnography Database. 2008. Human Relations Area Files, Inc. http://ehrafWorldCultures.yale.edu