STATISTICAL ANALYSES OF THE DISTRIBUTION OF CERAMICS AT KOM EL-HISN

Robert J. Wenke

As noted in the introduction to this chapter, the primary objectives in our analyses of the Kom el-Hisn ceramics are to derive a relative seriation suitable for inference of a relative chronology, so that we can address various kinds of change over time at Kom el-Hisn; and also to define spatial associations of ceramics and other artifacts in order to analyze various kinds of functional variability of the Kom el-Hisn community, both as it existed at any one time and as it changed over time. Some of the statistical and other problems associated with these objectives have already been discussed in this chapter, but it is worth repeating that the constant re-use of occupational debris for building materials--particularly in the form of the numerous pottery sherds found in most mudbricks--created at Kom el-Hisn an archaeological site in which there are repeated violations of the "law of superposition," in the sense that the ceramic contents of a particular volume of excavated materials cannot necessarily be assumed to be older than the materials on which they lie. Other disturbances and problems mean that we shall have to use large samples and conservative analytical techniques to search for general patterns in the spatial distribution of artifacts at this site.

Stratigraphic and other difficulties aside, a more fundamental concern involves the analytical units constructed from these ceramics and other artifacts. When confronted with a collection of Egyptian pottery sherds or stone tools, most analysts have classified or grouped these objects on the basis of principles and categories that have been in use for many decades. Not only are these traditional taxonomic systems well-established in Egyptian archaeology, they have repeatedly proved their usefulness in the standard archaeological tasks of description, relative dating, and functional analysis.
Petrie's pioneering efforts at seriating ceramics (1899) and Tixier's systematization of lithics (1971) remain extremely influential in Egyptian artifact analyses. Both Petrie's and Tixier's systems were created by arranging artifacts into groups of objects that seemed similar to one another and dissimilar to other groups on the basis of gross characteristics of size, shape, and "style." To a great extent, virtually all schemes for categorization and analysis of Egyptian artifacts, even the most recent (e.g., Bourriau 1981, Arnold and Bietak in press, Jacquet-Gordon ed. 1986), are logically similar to Petrie's and Tixier's methods.

Despite the widespread and productive use of these and other traditional methods of Egyptian artifact typology, the procedures and assumptions on which they are based remain extremely controversial. Indeed, numerous scholars have argued that the kinds of classificatory and typological systems traditionally and currently applied to Egyptian artifacts are, at best, incomplete (e.g., Read 1982, Whallon 1982), and at worst, incorrectly and inefficiently formulated for their intended purposes (Dunnell 1971, 1986). These negative evaluations are based on theoretical and methodological considerations that appear only rarely in the Egyptian archaeological literature (but see Kemp 1977, Adams 1963, Hassan 1980, Close 1977, 1980a, 1980b, Wendorf and Schild 1980: 8-11). Instead, the emphasis in recent studies of Egyptian artifacts seems to be on: (1) chemical analyses of artifacts (e.g., neutron activation analysis [David ed. 1986]); (2) increasingly precise and technical descriptions of the composition of ceramic wares (Arnold and Bietak in press); and (3) applications of multivariate statistical techniques to tabulations of traditional Egyptian ceramic and lithic types (Wenke, Long, and Buck n.d., Close 1980a). These technical studies and other approaches are useful.
But they do not in and of themselves meet the criticisms directed at traditional methods of Egyptian artifact classification and typology. To understand why the kinds of typologies applied to Egyptian artifacts have been judged inadequate, it is necessary to consider the objectives of analyses and also the criteria by which one approach can be said to be better than another. Most archaeologists have in common some basic research objectives, in that the use of artifacts to infer chronologies and to reconstruct the activities of ancient peoples may be said to be common to all students of ancient material culture. But in the final analysis, many anthropologically-trained archaeologists differ from some of the scholars working in Egypt in that the former retain the aspiration of making archaeology a scientific discipline--at least in the sense of a discipline that can explain history. Currently, there is widespread disenchantment with the 1960s-era hopes of making archaeology a formal predictive science based on the model of physics (e.g., Salmon 1982, Dunnell 1984, Hodder 1986), but there also appears to be general agreement that, whatever kind of discipline archaeology can become, significant improvement in its explanatory power must be based on reconsiderations of how archaeologists formulate and use artifact-based analytical units. Rejection of traditional anthropological and archaeological typologies has been widespread. Generally, many specialists in problems of archaeological classification assume that archaeology cannot progress substantially until several issues of method and theory with regard to classification are resolved (Spaulding 1982, Dunnell 1986, Read 1982, Wenke 1981, 1987); indeed, the issue of unit formation has become the center around which debates about the possibilities of historical and cultural analysis revolve. 
Some anthropologically-inclined archaeologists, in particular, entertain ideas of scientific unit formation that are fundamentally at odds with the idea--which is the norm in Egyptian studies--that archaeological analysis is in essence a historical and humanistic enterprise in which one may use scientific methods (e.g., radiocarbon dating) but one's goal is fundamentally not scientific--one's goal is the description of a historical process, and the explanation of that process is in terms of events, personalities, etc., and the common-sense kinds of interpretations of historical processes (Hawkes 1968). This is to be contrasted with the kinds of explanations of history envisioned by anthropologically-trained archaeologists, such as Binford (1983), Flannery (1972), Watson, Redman, and LeBlanc (1986), Dunnell (1984), and Salmon (1982).

It seems evident that archaeologists of many disparate theoretical persuasions have in common some basic notions of using artifacts for these descriptive and comparative purposes, and for inferring relative chronologies and reconstructing trade patterns, social arrangements, etc. Given this, most archaeologists must address two questions: (1) how do we construct archaeological analytical units, and (2) how can we evaluate these units' usefulness?

With regard to the first question, one's assumptions and theoretical framework obviously determine the kinds of units one constructs. In archaeology we have only a few primitive theoretical notions to direct us in the creation of analytical units, and the questions to which these units have been applied have been relatively simple. In devising artifact types and categories, most archaeologists have simply been trying to describe their finds, or have been trying to measure aspects of stylistic and functional variability--two concepts with no universally accepted definition.
With regard to stylistic variability, archaeologists have been particularly concerned with relative chronologies and inferences about cultural interaction. Archaeologists assume that people have always made some artifacts with characteristics that are not related to the function of these objects and that, thus, these characteristics can be considered "style"; further, to be considered stylistic, variability must be distributed through time continuously and unimodally, in the sense that a stylistic artifact attribute or complex of attributes is defined as those that are invented in a given area, begin to be used by increasingly more people, reach a peak of popularity, and then die out. Since the distribution of artifact styles is affected by distance (e.g., styles can reach their ultimate point of dispersion long after they have died out at their point of origin), methods of relative seriation for purposes of inferring chronologies require that the units to be seriated come from a small enough area that spatial variability is not a factor. Summaries of the finer points of the seriation method can be found in Ford (1954), Dunnell (1970), and Marquardt (1978).

Seriation and related issues of artifact classification and typology continue to be actively debated, in part because many archaeologists have concluded that some of the most important problems of historical analysis will only be resolved at a regional scale--that is, on the basis of archaeological surveys and analyses of many sites in large areas, as opposed to research focussed on single sites. This has made comparisons between sites a particularly important analytical step, and such comparisons have often taken the form of establishing the relative chronology of occupations at many different sites.
Also, relative seriations have become increasingly important in analyses of surface collections (Johnson 1972, Adams 1981, Kemp 1977, Wenke 1987) and in dealing with materials excavated at sites where stratigraphy is complex or so obscured that it is no certain guide to sequence through time--a common situation in Egyptian archaeology (Wenke 1986, Kemp 1977, Hassan 1984, Hoffman 1984).

The use of artifact styles to infer community and societal patterns and interactions is based, obviously, on the spatial distribution of stylistic variability. We may find, for example, that a particular kind of red-slipped bowl is common in a "rich" tomb of a 5th Dynasty noble but rare at a rural site of this same period--implying, perhaps, differences of rank and wealth. Within the confines of a single site, patterns in the distribution of pottery styles have been taken as indications of social differentiation (e.g., Hoffman 1982), and there is a long tradition in Egypt of using various wares as evidence of interregional and international commodity exchange (e.g., Bourriau 1981:121-39).

Regarding functional variability, archaeologists traditionally have used analogy and inference to link "functional" artifact types to ancient activities. When we find apparently smoke-smudged "cooking wares" in profusion in midden deposits near mud-brick buildings but rarely or never in tomb assemblages, we infer their use as cooking utensils. Pottery vessels have to meet certain conditions of permeability, thermodynamic expansion, etc. to perform various functions, and variables presumably reflecting these characteristics can be formulated and measured. There is also a large literature on the mathematical problems of determining and measuring the spatial clustering and patterns of co-occurrence of functional artifact types (e.g., Carr 1984), and on the use of high-power microscopy to identify edge-damage characteristics in lithics (e.g., Vaughn 1985, Shipman 1986).
With regard to both stylistic and functional units of archaeological variability, the criteria by which we judge the efficacy of our units are derived from these simple ideas about the distribution through time and space of stylistic and functional variability. To judge, for example, whether one method or another is better for purposes of inferring a relative chronological seriation, we have as performance criteria only their relative fit to the seriation model, or their relative agreement with independent criteria (e.g., dendrochronological evidence); in the case of a functional analysis, we generally evaluate the adequacy of our units in terms of whether they seem to be in agreement with other categories of evidence: whether or not lithic blades with "sickle sheen," for example, appear in association with floral remains and other artifacts and features indicative of an agricultural economy.

In contemporary archaeology there are four influential and, to some extent, opposing schools of thought on archaeological artifact arrangement: (1) intuitive, "traditional" methods of unit formation; (2) statistical approaches to nominal-level attribute analysis; (3) multivariate statistical analyses of metric data as a means of grouping and classifying artifacts; and (4) paradigmatic classification. It is beyond the scope of this paper to review these several approaches in detail, and as yet the Kom el-Hisn ceramics have been analyzed only in terms of the traditional typological methods. But we hope to analyze the whole corpus of Kom el-Hisn ceramics using each of these methods and then to test the different units created against our theoretical expectations and against the basic assumptions of the seriation method. Thus, it is relevant to outline each of these approaches.
(1) Traditional Typologies

The artifact classifications and typologies commonly used in Egyptian studies have been employed for various purposes, including: (1) description, often simply for the purpose of comparing artifacts found at different sites (e.g., Bourriau 1981); (2) studies of stylistic variation, especially for purposes of relative seriation (e.g., Petrie 1899, Kemp 1977, Wenke 1984:34-38), or for inferring cultural interactions (Close 1980a); and (3) studies of physicochemical composition, usually for the purpose of reconstructing trade patterns or identifying manufacturing sites (e.g., David ed. 1986).

Methods

A common preliminary step in pursuing these research objectives has been the simple grouping of objects by their obvious attributes of size, shape, and decoration. That is, most analysts sort lithics or pot sherds into groups of objects that look alike, and they do so on the basis of a simultaneous and complex visual consideration of many different characteristics of shape, size, etc. Common examples of such categories in Egyptian studies are "Epipaleolithic backed bladelet" and "Predynastic black-top ware." Studies (e.g., Berlin 1968) reveal that the mental processes by which such artifact arrangements are made--to the extent that they can be verbalized--involve a shifting hierarchy of criteria, so that variously weighted combinations of shape, size, decoration, etc. are used to place objects in groups. So subtle and complex are the mental processes involved in such taxonomies that attempts to duplicate them with multivariate statistical analyses and high-speed computers have not been particularly successful (Doran and Hodson 1975). These kinds of intuitive groupings of Egyptian artifacts have recently been combined with more precise and detailed descriptive procedures.
Arnold and Bietak (in press), Nordstrom (1972), Bourriau (1981), Adams (1962) and others, for example, have provided lists of characteristics on the basis of which pottery may be grouped into wares, forms, styles, etc., by reference to such variables as type of clays and silts used, Munsell color-chart values, size and type of tempering particles, ratios of size measurements, etc.

In analyzing the Kom el-Hisn ceramics, we have begun by sorting most of the "diagnostics" (i.e., rims, bases, handles, spouts, and decorated sherds) into the types illustrated in Figures x*yy. There is considerable variability within some of these types (e.g., Figure x), but these types seem quite consistent in that they appear in substantial numbers in different areas of the site and are sufficiently distinct that different analysts, working independently, reliably identify them. As noted in the first section of this chapter, some of these types have been reported at various other Old Kingdom sites, and some vessels are virtually identical to examples from tombs of 5th and 6th Dynasty nobles at Saqqara and Giza.

Given the widespread recognition of these traditional types, their ease of application, and the common language of comparison they provide, what, if anything, is wrong with them as analytical units? Although our analyses are in their early stages, these Kom el-Hisn types appear to be reasonable reflections of changing styles over time (as inferred from stratigraphy); and their spatial distributions fit some of our ideas about the functional composition of the site, as well. But, as noted above, such types must be assumed to have various limitations. Perhaps the most important of these is that they do not allow reliable comparability between Kom el-Hisn materials and those found elsewhere.
Because they are based on unspecified procedures of combining and differentially weighting different variables, the pottery from two different sites cannot be precisely compared nor can the similarities and differences be precisely tabulated and expressed. Also, if we intended to use these intuitive types for chronological or functional seriations, we would have to suppose that the groups into which we have arranged the Kom el-Hisn ceramics reflect a mixture of stylistic and functional variability--at least until we have tested these groups against the expectations of our chronological and functional models. Vessels like those in Type 31 (Figure xx) may be found in a certain frequency in a given level of occupation in part because they were used for certain economic functions and in part because that particular style of bowl was at a certain point in its "popularity" trajectory in time. In such a case, the most precise indicator of chronology may be some complex combination of lip angle and radius, rather than simple counts of the melange of variability embodied in Group 2 (Figure 2). In short, these groupings are imprecise, in that the exact considerations that went into their construction can never be entirely verbalized or expressed in precise measurement procedures. Moreover, the research objectives that determined the creation of these units were simply the assumption that by sorting these objects into groups that looked alike, descriptive categories and units useful for seriation would be produced. 
In summary, the perceived faults with traditional methods of categorizing Egyptian materials include these elements: (1) Egyptian classifications and typologies have been established without a clear expression of the objectives of the research for which they have been constructed, or the research objectives specified are inadequate; (2) they are usually based on blends of size, shape, and decoration--in other words, of both stylistic and functional variability, and thus they must be assumed to be less than optimal for measuring either style or function; (3) these typologies are summations of considerable variability, and the variability within the types may be particularly important for precise analyses; (4) because of these assorted limitations, traditional Egyptian types do not allow effective comparisons between assemblages from different areas; and (5) because these traditional units are usually based entirely on physical groups and observed objects, taxonomies of Egyptian artifacts are entirely bound to specific data sets and thus are not suitable for conversion to the kinds of scientific units with which some scholars still hope to build a scientific archaeology.

Because of all these limitations on intuitive methods of traditional type formation, archaeologists in the last two decades have continually reassessed the basis on which they make these intuitive groupings and have sought better ways in which to categorize the archaeological record. In our work at Kom el-Hisn we have just begun the process of applying alternative methods of artifact categorization, and only our preliminary descriptive typology has been applied to enough of the corpus of ceramics that we can analyze the distribution of these types statistically. The frequencies of the most numerous and "stable" types (in the sense of reliability of identification) are given by excavation SU in Table x.
Almost all of the ceramics from our first two seasons have been tabulated in terms of our typology, but only the ceramics from the units listed in Table have been sufficiently studied that they can be analyzed statistically. Our intention is to combine the ceramics from the anticipated third season with those from the first two in a "final" typology, so that in our final analyses we can analyze the whole corpus in terms both of the final typology and the alternative methods of artifact arrangement described below.

Figures x-y illustrate the most common and stable types so far defined, and in pp. of this chapter some of these have been related to finds from other sites. Considering both the illustrations in Figures x-y and their distribution by excavation unit, as well as the additional illustrations in Figures xx-*yy, it is apparent that most of these ceramics are from the kinds of vessels one would expect from an Old Kingdom agricultural settlement. The number of fine-wares, particularly the relatively high-fired, red-slipped bowls like those of Types 31a-d, is perhaps somewhat unexpected, in that these vessels were also commonly included in tombs of nobles at Giza and Saqqara. At Kom el-Hisn these vessels seem to be the common utensils of everyday life--though they are rare or absent in the best preserved areas of mud-brick architecture (e.g., 1202S-1070E - 1213S-1074E, Figure x). The trays and plates illustrated as Types 3a-d and Type 13, too, although called "offering trays" by some, are so common at Kom el-Hisn that they must be presumed to be objects of everyday domestic use.

In general, the types we have defined seem to be distributed throughout the site, in the sense that few or none of these types can be said to come mainly from specific strata or areas of the site. Thus, none of these types--as they are now defined--appears to be a good "index fossil" that marks a distinct time period or social class.
The distribution of these types is certainly not random, however. Some combinations of types are much more likely to be found in association than others, and these associations may mark functional, chronological, or other depositional patterning. As noted previously, the determination of spatial associations is an enormously complex statistical problem. Here, too, only when we have much larger samples from Kom el-Hisn and a much refined system of artifact categorization can we expect to determine with considerable precision the associations between these kinds of ceramics and other artifacts. Despite all the qualifications imposed on our analyses by limited sample sizes, a primitive typology, and substantial redeposition of materials, we can at least take as a working assumption the notion that the kinds of ceramic artifacts used together at about the same time and for related purposes will tend to be found together if the site is excavated by cultural stratigraphy.

Our preliminary attempt at analyzing patterns of spatial association of ceramic types involves some complex statistical procedures, but the underlying assumptions and principles are quite simple. We began by tabulating the frequency of the 37 different types (and some combinations of types) that were represented by at least 15 individual sherds of that type. We then formed a data matrix comprising all the basic excavation units (SUs, see Chapter II) that had at least 10 sherds identifiable as to type in them (thereby eliminating those SUs that represent brick walls, small areas of soil discoloration, and other volumes of occupational debris that cannot be assumed to reflect in their ceramics specific patterns of use).
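The two thresholds just described can be illustrated with a minimal Python sketch. The counts, unit labels, type labels, and the function name build_matrix are invented for illustration and are not the Kom el-Hisn data. The sketch also assumes that the 10-sherd threshold counts every typed sherd in a unit, which is one reading of the text.

```python
from collections import defaultdict

# Hypothetical sherd counts keyed by (excavation unit, type); not real data.
counts = {
    ("SU-A", "T1"): 12, ("SU-A", "T2"): 4,
    ("SU-B", "T1"): 6,  ("SU-B", "T3"): 2,
    ("SU-C", "T2"): 11, ("SU-C", "T3"): 1,
    ("SU-D", "T1"): 3,
}

MIN_PER_TYPE = 15  # a type must be represented by at least 15 sherds
MIN_PER_UNIT = 10  # a unit must hold at least 10 typed sherds


def build_matrix(counts, min_per_type=MIN_PER_TYPE, min_per_unit=MIN_PER_UNIT):
    """Return (units, types, matrix) after applying both thresholds."""
    type_totals = defaultdict(int)
    unit_totals = defaultdict(int)
    for (unit, typ), n in counts.items():
        type_totals[typ] += n
        # Assumption: every typed sherd counts toward the unit threshold.
        unit_totals[unit] += n

    types = sorted(t for t, n in type_totals.items() if n >= min_per_type)
    units = sorted(u for u, n in unit_totals.items() if n >= min_per_unit)

    # Rows are excavation units, columns are retained types.
    matrix = [[counts.get((u, t), 0) for t in types] for u in units]
    return units, types, matrix


units, types, matrix = build_matrix(counts)
```

With these invented counts, type T3 and units SU-B and SU-D fall below the thresholds and drop out of the matrix.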
The next analytical step was to calculate a coefficient of similarity that expresses how similar every pair of types is in their spatial distribution: that is, to calculate a number such that this number is high when two types are often found together in some excavation units and are also both absent in other excavation units, but this number is low when one of these types is frequently found where the other is not. Such a similarity coefficient can be computed in many different ways, using for example the actual frequency of occurrence or just the presence and absence of occurrence. We took a conservative approach in which we converted the actual frequency of each type in each excavation unit to simple presence or absence, thereby losing some information but reducing--it is hoped--the effect on these coefficients of the different sizes of excavation units, sampling error, etc. We used Sokal and Sneath's Similarity Measure 1 (SPSSX 1986: 739), which gives double weight to cases in which both types are present or absent in a given excavation unit, compared to cases in which one type is present and the other is not.

After having computed a matrix of SS1 coefficients expressing the similarity of occurrence of all possible pairs of pottery types, we subjected this matrix to a non-metric multidimensional scaling analysis (or MDS analysis). The mathematical basis of MDS is beyond the scope of this report, but MDS has been extensively used in archaeological analyses (Kendall 1969, LeBlanc 1975, Wenke 1975-76, Drennan 1976). Multidimensional scaling is a method whereby artifacts or assemblages are measured on their characteristics (e.g., size and shape variables of objects, frequencies of artifact types in excavation units), then a measure of similarity is computed among the set of objects or assemblages.
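As an illustration of the presence/absence coefficient described above, the following Python sketch computes Sokal and Sneath's first measure in its usual form, 2(a + d) / (2(a + d) + b + c), where a counts units with both types present, d counts units with both absent, and b and c count the two kinds of mismatch. The frequencies and function names are hypothetical; this is a sketch of the coefficient, not a reproduction of the SPSSX routine.

```python
def to_presence(frequencies):
    """Convert sherd counts per excavation unit to 0/1 presence flags."""
    return [1 if n > 0 else 0 for n in frequencies]


def sokal_sneath_1(x, y):
    """Similarity of two presence/absence vectors, one entry per unit.

    Joint presences and joint absences (matches) receive double weight
    relative to mismatches.
    """
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    matches = sum(1 for p, q in zip(x, y) if bool(p) == bool(q))  # a + d
    mismatches = len(x) - matches                                 # b + c
    return 2 * matches / (2 * matches + mismatches)


# Hypothetical counts of two types across five excavation units.
type_a = to_presence([4, 1, 0, 0, 7])
type_b = to_presence([2, 0, 0, 0, 3])
similarity = sokal_sneath_1(type_a, type_b)  # 4 matches, 1 mismatch -> 8/9
```

The coefficient is 1.0 for two types with identical distributions and 0.0 for two types that never agree in any unit.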
With MDS one tries to find the fewest number of dimensions in which the proximities of these points can be expressed, while maintaining the distances (or the "ranks" of these distances) between these points as measured by their similarity coefficients. The usual example is a table of driving distances between cities, say New York, San Francisco, and 10 or 12 others. From a table of the driving distances between these cities--which is analogous to the matrix of coefficients of similarity among the 37 pottery types--an MDS computer program can plot the location of each city in such a way that the information contained in the matrix of distances is precisely expressed in terms of the distance of these cities from each other as located as points on a two-dimensional plot--in other words, on a standard map. In this example there are two main dimensions of variability--longitude and latitude. In archaeological applications the main dimensions of variability usually sought are change over time or some functional dimension: for mathematical reasons, if excavation units or some other unit of analysis exactly fit the "battle-ship" shaped curve of a perfect chronological seriation, when analyzed with MDS they can be plotted as a horse-shoe shape in a two-dimensional space. And the sequence through time of these units can be read around the arc of the horse-shoe in such a way that, with adequate data, the distances between the points on this horse-shoe can be exactly translated into differences of years (LeBlanc 1975; Drennan 1976).

The results of the MDS analysis of the 37 Kom el-Hisn pottery types are presented in Figure x, with the statistics normally used to interpret such data. So many cautions and qualifications attend these data that the sequence in Figure x cannot be interpreted necessarily as a chronological or a functional sequence.
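The map analogy used above can be made concrete with a small Python sketch that measures how faithfully a low-dimensional arrangement of points reproduces a table of distances. The measure below is a simplified, metric badness-of-fit figure of the kind usually called stress; it is not the rank-based calculation performed by a non-metric MDS program, and the coordinates and names are invented for illustration.

```python
import math
from itertools import combinations


def pairwise(points):
    """Euclidean distance for every pair of points, keyed by index pair."""
    return {
        (i, j): math.dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }


def stress(target, config):
    """Normalized misfit between target distances and a configuration.

    0.0 means the configuration reproduces every target distance exactly.
    Simplification: raw distances are compared, not their rank order.
    """
    fitted = pairwise(config)
    num = sum((target[k] - fitted[k]) ** 2 for k in target)
    den = sum(target[k] ** 2 for k in target)
    return math.sqrt(num / den)


# Four hypothetical "cities" at the corners of a 3 x 4 rectangle.
true_map = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (0.0, 4.0)]
target = pairwise(true_map)

stress_2d = stress(target, true_map)                   # two dimensions suffice
stress_1d = stress(target, [(x,) for x, _ in true_map])  # one dimension does not
```

A two-dimensional plot recovers the invented distance table exactly, whereas squeezing the same points onto a single axis leaves a substantial misfit. A high misfit for a two-dimensional solution would likewise suggest that more than two dimensions of variability are present.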
In fact there is some evidence in these data that one primary dimension of variability is simply the relative frequency of these ceramics, and statistics associated with this analysis (stress and RMS) indicate that the variability among these types cannot be expressed with considerable precision in a space of only two dimensions. Nonetheless, the pattern illustrated in Figure x can be taken as a working hypothesis about the patterns in which these ceramics co-occur, and we will investigate the significance, if any, of these groupings in our future excavations and our reanalyses of these data.

In our future analyses of the Kom el-Hisn ceramics we shall apply several specific methods of artifact categorization, and it is appropriate here to explain briefly these alternative methods and some of our preliminary results in applying them.

(2) Statistical Analyses of Nominal-level Attribute Associations

One of the most influential attempts to replace or supplement traditional typologies is that promulgated by Albert Spaulding (1953, 1982) and applied by him and others (e.g., Sackett 1982) in the form of statistical analyses of artifact attributes.

Research Objectives

Spaulding has consistently noted the utility of traditional methods of artifact classification and taxonomy for purposes of seriation and functional analysis (1976), but he considers the ultimate goal of archaeological analysis to require the construction of archaeological units of a kind quite different from traditional units: "Presumably the primary task of archaeology is to discover and describe whatever structure (or order or pattern or predictability) there may be in the data of archaeology. The data of archaeology consist of artifacts and other evidences of past human activity together with observations on the circumstances in which they were found" (1982:1).
Spaulding argues that we should attempt to construct units that reflect behavior: "A good type is a material reflection of more or less discrete culturally patterned segmentation of human activities. This segmentation may be connected with the physical requirements of kinds of tasks . . . or it may be a stylistic reflection of social patterning . . . or it may reflect some combination of physical requirements and stylistic habits. In any case, the good type is a summary expression or index of the jointedness of cultural nature, of the distinctive kinds of activities performed by the participants in cultural systems. In fact, I suppose that understanding a cultural system means identifying these distinctive kinds of activities and exploring their interrelationships with the aid of ethnographic analogy, provenience data, chronological information, environmental reconstruction, and anything else that seems potentially relevant" (1983:19).

Methods

Spaulding suggests that we can search for this patterning in various ways, but that any scientific and powerful analysis of the cultural principles that produced the archaeological record must eventually focus on nominal variables: that is, variables that have mutually exclusive states, such as "long" and "short," and "shell-tempered" and "sand-tempered." Nominal variables are to be distinguished from ordinal variables, such as an ordering of pottery vessels from largest to smallest, or ratio and interval levels of measurements, which imply an exact degree of difference between two measurements (e.g., a lithic 4.2 cm long is twice as long as one 2.1 cm long). Spaulding argues that we should focus on nominal variables because, "If I can distinguish readily between long and short projectile points in some groups, so could the makers and users of the points. And in attempting to infer why this distinction was made, I search for non-random relationships between the attributes short and long and other variables" (1982:6).
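A search for non-random relationships between two nominal attributes can be sketched in Python as a chi-square test on a two-by-two table of counts. The counts below are invented, the attribute labels simply reuse the examples in the text, and the function name is hypothetical; the sketch illustrates the general idea of attribute association and not any particular published analysis.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2 x 2 table of counts.

    table[i][j] is the number of artifacts in state i of one nominal
    attribute and state j of the other.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one artifact")

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if the two attributes were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Hypothetical counts: rows are long / short, columns are
# shell-tempered / sand-tempered.
observed = [[30, 10],
            [10, 30]]
chi2 = chi_square_2x2(observed)

# With one degree of freedom, values above about 3.84 are conventionally
# taken as significant at the 0.05 level.
non_random = chi2 > 3.84
```

In this invented table the expected count in every cell is 20, so the statistic is 20.0 and the association would be judged non-random.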
In his later papers Spaulding (1982) has used a rather complex form of statistical analysis--log-linear and hierarchical log-linear models--to investigate complex combinations of attributes, but these are extensions, not modifications, of his basic method. Spaulding's approach remains a central issue in contemporary debates on artifact arrangement. The major criticisms of his approach have been that: (1) its over-all objective is the reconstruction of ancient behavior, and the potential--or even possibility--of such reconstructions in higher-level analyses of culture has not been demonstrated (Dunnell 1971); (2) the chi-square statistical method used to establish attribute co-occurrence is inappropriate (though this criticism has been blunted by Spaulding's adoption of log-linear models), and by compressing all variability into nominal categories, significant variability is lost or obscured (Doran and Hodson 1975); and (3) Spaulding's approach is focused exclusively on attribute-clustering, whereas the most productive archaeological units possibly are to be formed by object-clustering (Doran and Hodson 1975; Cowgill 1982).

(3) Multivariate Statistical Methods of Archaeological Classification and Typology

During the past two decades the use of computer-based multivariate statistical techniques to group and classify artifacts has become very popular. Many scholars (Dunnell 1971, Whallon 1972, 1982, Spaulding 1977, Christensen and Read 1977, Vierra 1982) have pointed out the problems associated with some of these techniques, but these methods remain an active area of research. It is beyond the scope of this article to summarize the many abstruse mathematical points that underlie these and other statistical methods. The MDS analysis presented in Figure x is an example of a multivariate statistical approach. In terms of artifact analyses, the most widely used multivariate statistical techniques are various forms of cluster analysis and principal components analysis.
They have been used together (Doran and Hodson 1975), but there is considerable controversy about their relationship and the ultimate utility of either. Some of those who advocate multivariate clustering of archaeological data do so in part on the assumption that it is possible--even probable--that the significant patterning in large archaeological assemblages is of such a complexity and on such a scale that it will only be identified using complex mathematical analyses (Doran and Hodson 1975, Hodson 1982). The human mind can arrange groups of objects in categories of similarity and dissimilarity with great virtuosity, but no one can make reliable comparisons between thousands of precise measurements on tens of thousands of objects. Multivariate statistical clustering and other methods can be applied to attributes, objects, or assemblages--in other words to any numbers derived from analyses of the archaeological record at various scales. Cluster analysis as applied to artifacts generally has three stages: (1) a collection of objects is measured on a large set of interval or ratio level variables; (2) on the basis of these measurements a single number--usually a similarity coefficient--is calculated that expresses the similarity of each object to every other object in the collection; and (3) on the basis of a computerized method, all the objects are grouped into sub-groups (or clusters) of objects that are similar to each other and different from members of other sub-groups. An example of cluster analysis as applied to the Kom el-Hisn ceramics is presented in Figures 5-9. These sherds were measured on 12 variables, such as radius, maximum thickness, the angle formed by the long axes of the neck and body, etc. These measurements were then standardized so that they had comparable means and standard deviations. Then the "distance" between each sherd and every other sherd was calculated, using a common statistical measure (euclidean distance).
Finally, the sherds were rearranged on the basis of euclidean distances such that--insofar as the program could do so--each sherd was placed next to the sherds to which it is most similar, in a dendrogram form, as illustrated in Figure 6. Note that there is some similarity between the "intuitive" groups produced in Figure 2 and these computer-generated groups in Figure 6. This form of clustering (Figure 6) is just one of many alternative methods in which different coefficients, methods of forming groups, variable scalings, etc., could have been used. These forms of cluster analysis have been applied to assemblages of Egyptian materials, but not to measurements of objects. Close (1980a), for example, clustered Terminal Eastern Saharan Paleolithic and Neolithic sites on the basis of stylistic attributes of stone tools. Lubell, Sheppard, and Jackes (1984) grouped Epipaleolithic sites in the Maghreb on the basis of their relative frequencies of tool types. I have used these same methods to cluster pottery types for purposes of chronological seriation for Late Period pottery (Wenke 1984). By using these techniques on assemblages rather than artifact attributes one escapes some of the limitations of the method, but most of these comparisons are based on units and frequencies of units that must be assumed to mix stylistic and functional variability. Also, as is discussed below, Read has pointed out (1982) that most clustering analyses are based on the questionable assumption that every variable has equal importance in the creation of groups. Read notes that there is really no reason why this should be true in most archaeological analyses, but he is basing this suggestion on the idea that it is the cognitive categories of artifact makers and sorters that are the ultimate criteria.
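The three stages of cluster analysis described above (measurement, standardization and distance calculation, and grouping) can be sketched in a few lines. This is a minimal illustration, not the program used for Figure 6: the six sherds and their three measurements are hypothetical, and single-linkage merging is an assumed choice among the many possible methods of forming groups. Because every variable is converted to z-scores before distances are computed, each one contributes equally to the result, which is the assumption Read questions.

```python
import math


def standardize(rows):
    """Convert each column to z-scores (mean 0, standard deviation 1)."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
           for c, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def euclidean(a, b):
    """Euclidean distance between two equal-length measurement vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical sherds: (rim radius in cm, maximum thickness in cm, neck angle in degrees)
sherds = [
    (8.0, 0.6, 95.0),
    (8.5, 0.7, 100.0),
    (7.8, 0.6, 98.0),
    (15.0, 1.2, 130.0),
    (16.0, 1.3, 135.0),
    (14.5, 1.1, 128.0),
]

z_scores = standardize(sherds)
groups = single_linkage(z_scores, 2)
print(groups)
```

Adding a seventh sherd changes the means and standard deviations, and therefore every z-score and distance, so the groups are tied to the particular collection that was measured.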
In part to avoid the problem in cluster analysis of each variable having equal weight in defining groupings, some archaeologists have turned to data-reduction and data-summarization techniques of multivariate analysis. Read (1982), in fact, argues that multivariate analyses of artifact attributes in many cases should include as a preliminary stage the use of principal components analysis (hereafter PCA), and many computerized methods of cluster analysis offer the option of clustering on the basis of statistical summaries of variables rather than the original variables taken individually. PCA is a method of determining the extent to which the variables comprising a data set are measuring the same general components or dimensions. As an example of PCA, consider Figures 7-8. The 25 Kom el-Hisn sherds have been analyzed using principal components analysis (note 2), the implicit assumption being that the 17 measurements made on these sherds are really measurements of some smaller set of underlying components or dimensions, such as general size and shape. Mathematically, PCA involves calculating a measure of how closely two variables co-vary. We can calculate, for example, the extent to which the radius co-varies with the maximum thickness of the neck. In PCA these measures of covariation are manipulated and summarized using matrix algebra. If, for example, these 17 measurements of pot sherds are mainly measuring over-all size of the vessel, PCA will show us in the form of "factor loadings" precisely the extent to which each of our variables is measuring this composite sense of "size." But if there are two dimensions of variability being measured by these variables--size and shape--it will not be possible to reduce the variability in the correlation matrix to a single dimension. PCA involves calculating how many significant dimensions of variability exist in a data set and how each of the variables is related to these dimensions.
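As a rough illustration of the case in which all measurements are mainly measuring "size," the sketch below builds a correlation matrix from three hypothetical variables and extracts only the first principal component by power iteration. The data are invented, and power iteration is a simplification chosen to keep the example dependency-free; a full PCA would extract all components with a standard eigenvalue routine.

```python
import math


def correlation_matrix(rows):
    """Pearson correlations between every pair of columns."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    devs = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(d * d for d in dv)) for dv in devs]
    k = len(columns)
    return [[sum(a * b for a, b in zip(devs[i], devs[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def first_component(matrix, iterations=200):
    """Largest eigenvalue and its unit eigenvector, found by power iteration."""
    k = len(matrix)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(k))
                     for i in range(k))
    return eigenvalue, v


# Hypothetical sherds: (rim radius cm, maximum thickness cm, preserved height cm)
sherds = [
    (8.0, 0.6, 12.0),
    (10.0, 0.7, 15.5),
    (12.0, 0.9, 17.0),
    (14.0, 1.0, 21.0),
    (16.0, 1.2, 23.5),
]

corr = correlation_matrix(sherds)
eigenvalue, vector = first_component(corr)

# Share of the total standardized variance carried by the first component
proportion = eigenvalue / len(corr)
# Loadings: correlation of each original variable with the component
loadings = [x * math.sqrt(eigenvalue) for x in vector]

print(f"first component explains {proportion:.1%} of the variance")
print("loadings:", [round(x, 3) for x in loadings])
```

Here all three loadings are high and of the same sign, so a single "size" component summarizes the data; if a second, uncorrelated dimension such as shape were present, the first component would account for a much smaller share.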
The PCA of the 25 Kom el-Hisn sherds indicates that--based on the measurements made on these sherds--there are at least seven underlying dimensions of variability, seven "components," on which these sherds differ significantly and independently. "Independently" is important here in that PCA finds components of variability that are uncorrelated--"orthogonal," statistically speaking. Extended discussions of the methods of PCA are available (e.g., Tabachnik and Fidell [1986]), but for our purposes the principal question is, how can these techniques be used to classify and categorize artifacts, assemblages, etc.? Read (1982) has suggested that we could use PCA and cluster analysis in combination. Each of our 25 sherds, for example, could be given a score on each of these seven components, and that score used in seriation. Indeed, LeBlanc (1975) applied a similar approach to ceramics from the American Southwest and found that PCA could identify clusters of attributes that proved to be excellent units for inferring chronological seriations--at least as tested with dendrochronological and stratigraphic evidence. Even if one does not use the groupings provided by PCA, various scholars (Whallon 1982, Read 1982) have argued that PCA can be used to identify those variables that are critical in terms of forming groups of artifacts for purposes of chronological seriation, functional analysis, or simply description. And the principal components extracted in PCA can be used in histogram form to provide the nominal categories used in the kinds of analyses Spaulding employs. A PCA of the 37 Kom el-Hisn ceramic types is presented in Figure x. In many cases the criterion used to evaluate multivariate statistical methods of artifact grouping has simply been how accurately they were able to duplicate the results of intuitive groupings (Doran and Hodson 1975: ). In such cases one may legitimately wonder why, then, one should employ such complex methods.
Hodson (1982) and others have claimed that such groupings are more precise because they involve the same measurements, but this really does not address the problem of comparability. To compare the Kom el-Hisn ceramics to those from the Old Kingdom site at Bhuto, for example, we could measure samples from both areas on these same variables and then apply these same kinds of multivariate statistical procedures to each, but in the end we would get groups of sherds and groups of variables that would be slightly changed each time a new specimen was included in the analysis. Certainly, these groups would offer some measure of comparison between these sites, but is this the best such measure? As discussed below, at least some archaeologists think not. Also, some apparently important kinds of variability in artifacts are difficult to measure with the quantitative variables central to multivariate statistical methods of artifact grouping. This problem is particularly severe with regard to shape (Read 1982). Whallon (1982), working with Swiss Neolithic pottery, found that eleven different measurements of dimensions of these pots failed to produce adequate measurement of shape--all were mainly measures of size. He found that shape could only be measured with his variables if they were converted to ratios and other composite measurements. Similarly, Read (1982) found that shape in lithic artifacts could only be measured by a complex summary mathematical expression. Generally, in those cases in which cluster analysis, PCA, and MDS do seem to have worked well--principally in problems of relative seriation--it is not at all clear that different methods would necessarily have given inferior results.

(4) Paradigmatic Dimensional Classification

The fourth and last method of arrangement that I will consider here is one whose basics were established by Rouse (1960) and by Dunnell (1971, 1978, 1986).
This approach is quite different from the others, although there are points of convergence that may be particularly relevant to the issues discussed here. Dunnell's version (1971) of this approach is the most explicit, yet it is complex and defies easy summation; it has also been rather controversial (Benfer 1975, Doran and Hodson 1975, Spaulding 1972). However, Dunnell's method is like Spaulding's, in that although few archaeologists have adopted it directly, many have established their own approaches in reference to issues raised by Dunnell (e.g., Vierra 1982, Read 1982, Voorrips 1982). Dunnell's objective is to use artifacts to create analytical units that are maximally useful in the standard pursuits of relative seriation and functional analysis, but he is also concerned with creating units that have a specific "scientific" character and utility. His notion of science is "a systematic study deriving from a logical system which results in the ordering of phenomena to which it is applied in such a manner as to make them ahistorical and capable of explanation. . ." (1971: 199-200). In his view, explanations will derive from the articulation of analytical units and laws--in other words, from theory, which he defines as "a system of units (classes) and relationships (laws) between units that provides the basis for the explanation of the phenomena" (1971: 200).

Methods

Dunnell makes a major distinction that determines the whole structure of the rest of his method: this is the difference between groups and classes (see also Rouse 1960). Groups are real collections of phenomena, such as the 25 Kom el-Hisn sherds described above.
One can construct groups by simply sorting similar objects into sets, by multivariate statistical methods, and by various other methods; the way in which these groups are formed is not relevant to their definition as a "group"--they are groups because such arrangements are sortings of physical objects and are thus bound to the set of phenomena that comprise them: in the case of the Kom el-Hisn ceramics, for example, somewhat different groups would be formed by any of the above procedures if other sherds were added to the sample. Classes, in contrast, have no objective existence; they are definitional. One can construct classes relevant to artifacts by selecting a number of dimensions, such as length, or color, or type of material, and then breaking these dimensions into segments, or modes. For the 25 Kom el-Hisn sherds we can produce classes by choosing as dimensions such variables as color, temper, and radius, breaking each of these into segments, or modes, and then intersecting the dimensions to produce the paradigm illustrated in Figure 10. Such classes need not describe any particular artifacts, but once formed they can be used to tabulate the frequency of combinations of attribute states. The contents of an excavation unit, surface collection, or whatever, thus can be described by tallying the combinations of numbers representing the intersections of the various dimensions that have been divided into modes (Figure 10). These class frequencies can be tested against the seriation model, applied to functional analyses, or manipulated by multivariate statistical means for various purposes. A complete discussion of the many complexities of Dunnell's approach is beyond the scope of this paper, but the general point of relevance here is that he sees classes, not groups, as the most useful and powerful analytical units in seriation and other analyses.
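The construction of a paradigm by intersecting dimensions can be sketched as follows. The three dimensions follow the example in the text (color, temper, and radius), but the particular modes, the 10 cm cut-point, and the three sherds are hypothetical and are not taken from Figure 10.

```python
from collections import Counter
from itertools import product

# Hypothetical dimensions, each divided into mutually exclusive modes
DIMENSIONS = {
    "color": ["red", "brown"],
    "temper": ["sand", "chaff"],
    "radius": ["under 10 cm", "10 cm or more"],
}

# The paradigm: every intersection of one mode from each dimension.
# The classes are defined here before any sherd is examined.
CLASSES = list(product(*DIMENSIONS.values()))


def classify(sherd):
    """Assign one sherd to its class (a tuple of one mode per dimension)."""
    radius_mode = "under 10 cm" if sherd["radius_cm"] < 10 else "10 cm or more"
    return (sherd["color"], sherd["temper"], radius_mode)


def tally(sherds):
    """Frequency of every class, including classes with no members."""
    counts = Counter(classify(s) for s in sherds)
    return {cls: counts.get(cls, 0) for cls in CLASSES}


# Hypothetical contents of one excavation unit
unit = [
    {"color": "red", "temper": "sand", "radius_cm": 8.0},
    {"color": "red", "temper": "sand", "radius_cm": 9.5},
    {"color": "brown", "temper": "chaff", "radius_cm": 14.0},
]

frequencies = tally(unit)
for cls, n in frequencies.items():
    print(cls, n)
```

Because the eight classes are fixed by definition, adding sherds from another unit or another site changes only the counts, never the classes themselves, which is what makes the tallies directly comparable.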
A form of paradigmatic analysis has been applied to Egyptian ceramics (Kroeper, personal communication; Arnold and Bietak in press). In these analyses, ceramics are scored on many different dimensions of variability, but unlike Dunnell's version of this method, these dimensions have been ordered hierarchically. Wares, shapes, methods of forming, fabrics, and kinds of decoration are ordered from most important to least, based on assumptions about how people make pottery. From Dunnell's point of view, such a hierarchy is wrong, in that the only method of determining that one variable is more important than another is by comparison to models explaining the distribution through time and space. Also, there is nothing gained by such a hierarchy when assemblages are being compared, since a non-hierarchical paradigm identifies the same patterns of similarity.

Evaluation

Unlike grouping procedures, such as cluster analysis, paradigmatic classifications do not necessarily change if one considers new data; thus--assuming the measurements are made precisely--objects from any number of places can be meaningfully compared with each other. The intuitive typology of the Kom el-Hisn sherds presented in Figure 2 may be useful in analyzing style and function, in the sense that subsequent excavations and analyses may show that these groups belong to different chronological periods, or that these groups co-occur with other kinds of artifacts, or animal bones, or architecture. From Dunnell's point of view, however, such intuitive groups--or even their statistical distillates, such as PCA component scores or cluster analysis arrangements--would be expected to be less than optimal for purposes of analyzing stylistic or functional variability.
He argues that for these purposes, one should instead: (1) select dimensions on the basis of what one is trying to study, whether style or function, and of some explanatory model of how this variability should be distributed through time and space; (2) construct a multidimensional paradigm for these objects; and (3) count the frequencies of these attribute combinations and compare the variation in frequency of occurrence of these classes across space or in relation to specific kinds of faunal remains. If, for example, we were particularly interested in stylistic variability, we might consider that rather subtle variations in the angle of the neck to the rim (Figure 5) might vary more directly with the passage of time than might, say, radius, which may be tied to the vessel's function. We could select other dimensions of variability likely to be time-dependent, and then arrange our analysis in the form of a count of frequencies of intersecting sets of these dimensions. These frequencies could then be tested against stratigraphy, absolute dating (e.g., association with inscribed sealings), etc. As noted, the advantages of such an approach are several: it makes it possible to compare the Kom el-Hisn pottery directly to pottery from, say, Hierakonpolis, in an extremely precise way--in fact with a precision limited only by the ability of the respective analysts to make the same measurements. Such comparisons cannot be made with the same exactitude using the groups produced by cluster analysis or the other statistical approaches, because in each case the groupings change with each new sherd and each new variable, and--more important--in the case of traditional types, one would be trying to compare units that were constructed without an explicit explanation of how they were produced.
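One simple way to test class frequencies against stratigraphy is to ask whether the relative frequency of each class rises and then falls only once through a stratified sequence, as the seriation model expects of stylistic classes. The sketch below assumes that reading of the model; the four levels and their counts are hypothetical, and a real test would also need to allow for sampling error and the stratigraphic mixing discussed earlier.

```python
def percentages(counts):
    """Convert the class counts of one level to percentages of that level."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]


def is_unimodal(series):
    """True if the values never rise again once they have started to fall."""
    falling = False
    for previous, current in zip(series, series[1:]):
        if current < previous:
            falling = True
        elif current > previous and falling:
            return False
    return True


# Hypothetical counts of three classes in four levels, lowest level first
levels = [
    [40, 10, 0],
    [25, 20, 5],
    [10, 25, 15],
    [5, 10, 35],
]

# Percentages by level, then regrouped so each row follows one class through time
by_level = [percentages(level) for level in levels]
by_class = [list(column) for column in zip(*by_level)]

fits = [is_unimodal(series) for series in by_class]
print(fits)
```

A class whose percentages rise, fall, and rise again would fail the check, suggesting either that its dimensions track function rather than style or that the deposits are disturbed.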
Critics of Dunnell's method have suggested that it offers little increase in analytical power over intuitive typologies in relation to the much greater time required, that it ignores the "natural" groupings evident in the archaeological record, that it is inefficient in selecting the best dimensions for a given form of analysis (in comparison to PCA, for example), and that dimensional paradigms are incomplete, in that they do not identify multivariate interactions between attribute states (Benfer 1975, Spaulding 1972, Read 1982, Doran and Hodson 1975). As an example of the time-costs of the paradigmatic method, note that to construct a paradigm that would distinguish the groups pictured in Figure 2, and then to distinguish these groups from the many other "types" of ceramic artifacts found at Kom el-Hisn, would require a dimensional paradigm with many dimensions and modes. In comparison, the traditional intuitive types are quickly identified and sorted, and they are in any case reflections of "primitive" or unstated paradigms, in the sense that they can be easily reduced to dimensional paradigms and the counts of dimensional intersections may not vary much from type counts in some collections of artifacts. Nonetheless, dimensional paradigms can be applied quite efficiently and rapidly once the paradigm is constructed. In any case, the important point is that we shall never really know for any given data set whether or not a paradigmatic classification produces better units for seriation, etc. than other approaches, unless performance tests are made using appropriate archaeological data and models.

Summary and Conclusions

Given these different methods of classification and typology, with their perceived strengths and weaknesses, what general lessons or conclusions can we draw?
With regard to research objectives, it seems only sensible that, in the absence of definitive testing, archaeologists evaluate methods of classification and typology at least in part in terms of the possibility that archaeology can someday become a powerful scientific discipline. As noted above, even those who reject this possibility cannot ignore the criticism described here of traditional Egyptian artifact categories, even for simple purposes of description, comparison, and seriation. To some extent, the development of a more powerful explanatory form of archaeology will probably be built on simple improvements in constructing units of artifact variability for relative seriations and functional analyses. With regard to methods of classification and typology, it is important to recognize that the variant approaches described here have not been adequately tested, so we have no adequate basis for saying one approach usually works better than another. In the case of the Kom el-Hisn ceramics, for example, an adequate test of just the "fit" of the analytical units defined by these various methods would require that these tens of thousands of ceramics be analyzed in terms of these different methods, and then the different units compared to the seriation model or to stratigraphic evidence and other measures of chronology. In the end, the "best" method of unit formation, in such an experiment, would simply be the one that made the most "sense" in some composite conception, based on all these different lines of evidence. Although no such large scale tests of different methods have been done, to my knowledge, if the evaluations presented here are valid, certain conclusions follow. The intuitive traditional typologies of Egyptian artifacts, for example, can be expected to have a role as simple descriptive devices, but they may have few virtues other than ease of application.
If I report, for example, that our excavations at Kom el-Hisn revealed groups of red-slipped bowls and jars in proportions like those in Figure 2, the person who wishes to compare our ceramics with those from Giza, or Bhuto, or some other site can only look at drawings of representative sherds of our groups and come to some approximate idea of how similar these drawn specimens and group frequencies are to those at these other sites. Similarly, computerized cluster analysis in which the defining criteria are equally-weighted quantitative variables would seem to offer few improvements on the intuitive groupings, except perhaps in the comparisons of assemblages, as opposed to artifacts. Even here there is doubt about the utility of cluster analysis and similar approaches. Spaulding's approach is interesting, but the units it produces do not seem ideally suited to archaeological chores like relative seriation (Whallon 1972). Here too, however, no definitive testing has been done, and the development of effective log-linear models has made it at least possible to determine if the complex interactions of nominal level variables provide useful analytical units. It is not at all clear, however, that Spaulding is correct in asserting that the artificer's conceptualization of artifact variability can be monitored by our artifact categorizations. In the first place, we shall never know what these conceptualizations were; in the second place, not all distinctions that may be important would be evident to the ancient maker and user of artifacts. If we were to study agricultural economies, for example, the appearance of a stone tool assemblage used increasingly for harvesting of cereal stems may make itself evident only or principally in "luster" patterns on edges, such that one can only measure them microscopically.
Currently in anthropologically-oriented archaeology the major philosophical division in studies of artifact categorization involves the relative analytical priority of multivariate statistical procedures and dimensional paradigms. The merits and limitations of both these approaches are widely recognized. Read (1982), for example, has argued that all the methods of artifact analysis described here can be--and should be--combined in most analytical frameworks. The sequence he suggests is to begin with some form of cluster analysis, in which principal components analysis has been used to deal with redundant and irrelevant variables. Using the groups formed by cluster analysis and associated statistics one should then construct a paradigmatic classification. Finally, Read suggests, a Spaulding-style chi-square and log-linear analysis should be done to identify significant co-occurrences of attributes. A crucial decision in all these processes of artifact categorization is how we select the dimensions that we eventually use in our analyses. In traditional methods one selects obvious variability in size, shape, and decoration, without specifying exactly what combinations and weightings of these three dimensions are to be applied to every object--and generally with considerable variation from object to object and observer to observer in how these criteria are applied. In Spaulding's approach one does statistical analyses of these obvious dimensions of variability and pursues in the analysis only those combinations that are statistically significant in their patterns of co-occurrence. In the multivariate statistical methods, we measure as many variables as we can think of or have time for, and either use these measurements directly or distill them with PCA, MDS, etc.
In a paradigmatic approach, the question of how one selects dimensions of variability is particularly crucial, since there are strict practical limits to how many precise measurements one can make of multi-modal dimensions. Again the critical question for all these methods of artifact arrangement is, how do we determine that we have concentrated on the appropriate dimensions of variability and that we have measured these dimensions adequately? If we try, for example, to analyze stone tool function by considering the physics of percussion and abrasion of brittle solids, how can we have confidence that we are considering every relevant dimension or even the most important dimensions? Our understanding of the physics of the process of edge alteration is incomplete. Moreover, we cannot hope to reconstruct the functional considerations an individual might have had in mind when he made tools: he may have wanted a particular kind of edge for shaping arrows, but he might also have wanted a certain over-all weight to apply that edge with a certain force. So how can we know the relevant dimensions? And even if we have confidence that we can pick out the 15 or 20 most likely dimensions, what about the possibility that the most effective unit of comparison between two assemblages in terms of their use is some abstruse and complicated mathematical function involving three or four modes? There is also the problem of the information lost by breaking up a continuous variable into modal classes. How can we determine the proper scale at which to divide our dimensions? It is not evident that we can solve these problems by using cluster analysis or principal components analysis to identify dimensions or variables that have non-random distributions across microenvironments or significant co-occurrences with other artifact or faunal types.
To do this allows a particular group of artifacts to determine what variables we use, and this could be misleading because accidental or other kinds of associations not causally connected with the phenomena we are trying to analyze will always be found. These same considerations of how one chooses variables to measure also apply in the case of the selection of variables for stylistic analyses. It is the essence of stylistic variability that it is random, in the sense that no one can predict whether this slight turn of lip angle, or that subtle color, will be the element that changes unimodally through time. So, in trying to isolate potential stylistic variables, one might sort sherds in piles from different sites, or from different levels, and see if there is anything evident that distinguishes them. These dimensions then could be used to construct the paradigm that will separate them in a chronologically meaningful manner. Alternatively, we could measure a great many variables, expressing them also as ratios, combinations, and transformed values, and then use principal components analysis or other multivariate statistical techniques to define attributes or combinations of attributes to test against the seriation method. One cannot conclude that, because seriations must address conceptualized categories--significant differences apparent to the artisan--the identification of chronologically significant variability may not involve ratios or other measurements mathematically more complex than "red" or "black," or "long" or "short."
Considering all these issues, it seems reasonable that anyone analyzing Egyptian artifacts should: (1) define as precisely as possible the objectives for which a given arrangement is to be made; (2) state explicitly why the attributes being measured are reasonable links to the kinds of objectives specified in the first step; (3) try alternate forms of grouping and classification, and check to see if in fact one or another is more in line with expectations, based on corroborative information, such as stratigraphy or documentary evidence; and (4) publish fully exact measurements, counts, etc., so that eventually we can determine what kinds of arrangements best serve the needs of the analyst for a given purpose. The increased time required to do these analyses may require major changes in the ways resources for field work--especially time--are allocated, but the alternative seems to be the continued limitation of Egyptian archaeology to a simple, descriptive exercise. We hope to follow our own recommendations in our future analyses of the Kom el-Hisn ceramics, once we have increased the diversity of our ceramics samples and--most important--related the different strata of the areas excavated.