Networks of Variables Modeling (NoVA) with DEf for Cross-Cultural Research

Douglas R. White, Malcolm M. Dow, Anthon Eff?
Sept 6 2013

This is a preprint of a Chapter whose final and definitive form will be published in the Wiley Companion to Cross-Cultural Research. Please refer to the 2014 version when published.

ABSTRACT. Cross-Cultural Research (CCR) has a wealth of new canonical databases. One is an ensemble of the Standard Cross-Cultural Sample that includes principal components used for imputation of missing data, controls for autocorrelation using spatial and linguistic proximities, a fully documented codebook,1 and the new analytical R software “Dow-Eff functions” (DEf). The DEf/SCCS ensemble comprises a Standard Traditional Cultures Sample that offers new results and can correct previously flawed research results, given imputation, solutions to Galton’s problem, and checks for exogeneity (that is, checks that error terms are uncorrelated with independent variables). Toolset/dataset ensembles comparable to DEf/SCCS now exist not only for the Ethnographic Atlas but also for Binford’s Forager Dataframes and other new observational, survey and longitudinal datasets. They have massive advantages that hold both for new research and for restudies of published findings. The research tools in DEf/SCCS include R scripts that are fast, using OLS in two stages; free, like the open access CCR databases; and packaged to simplify data entry.
The software comes as R scripts for PCs and Macs that are of great use in teaching: they are easy to use and learn, amenable to classes, online courses and intensive seminars, and have proven their worth in many prior studies by Dow, Eff, their students, and others who have taught with these tools.2 To enhance the use of DEf/SCCS and similar tools/database ensembles in a simpler form, the Complex Social Science (CoSSci)3,4 Supercomputer Gateway was designed to make DEf usage in large classes and intensive seminars even easier, with entry windows for dependent and independent variables, autocorrelation options, and strength of imputations. Both the PC and Gateway methods return *.csv results to the user and provide hints such as “totry” variable lists. Finished programs (DEf testing) can be uploaded and saved as sharable *.csv files at the CoSSci Gateway. Successful models can be saved as part of a CCR documentation project (or removed on request). A user can rerun a sharable Gateway file or view an expanded NoVA in the archive that contains a specific DEf model or set of variables. This chapter shows how to (1) learn from a negative example (a free online book by Reiss 1986) how to think about the logical and theoretical ordering of variables in a NoVA network, how to detect a mistaken hypothesis and show where a primitive NoVA model went wrong, and why not to use correlations alone or the usual regression analysis to build NoVA diagrams; (2) do DEf regressions (enhanced with tests for exogeneity and corrected for autocorrelation); and (3) go further to combine related models into valid NoVA models or Path Analysis (with new adjustments for exogeneity). While (1) concerns gender issues of inequality, beliefs and role behaviors, approached incorrectly, (3) provides DEf, “Structural” and “Integral” NoVA studies on these topics using the data of DEf/SCCS and a useful way to do NoVA diagrams.
1 The SCCS codebook and codebook index are at: http://eclectic.ss.uci.edu/~drwhite/courses/SC-CCodes.htm and http://eclectic.ss.uci.edu/~drwhite/courses/stdsvars.html.
2 For completed models in R script format: http://intersci.ss.uci.edu/wiki/index.php/Dow_&_Eff_Functions_1.
3 For use in class: http://socscigate.oit.uci.edu & http://intersci.ss.uci.edu/wiki/index.php/Classroom_R_scripts.
4

Purpose. The purpose of this chapter is to explore the benefits of extending DEf two-stage least squares (2SLS) network lag modeling to Networks of Variables Analysis (NoVA). Instead of a commercial package, DEf runs models in R and on a supercomputer gateway (CoSSci) for open access research and courseware. Dow-Eff functions (DEf) extend the means to learn from cross-cultural research and other survey databases with multiple imputation of missing data (using auxiliary factor-analyzed component variables derived from fully coded variables), controls for autocorrelation (Galton’s problem), and tests for robustness of statistical inferences. The use of DEf to construct NoVA models opens further foundational possibilities, such as Bayesian analysis of goodness-of-fit, systemfit of networks of variables, path analysis of observed variables, and changes in observed time panels of sample units (Henningsen and Hamann 2007:24), using a data frame from the R plm package (Croissant and Millo 2007, 2008). The specific DEf R package used in this Chapter extends the Standard Cross-Cultural Sample database (SCCS, Murdock and White 1969, White 2009) to an evolved SCCS that includes the Dow-Eff functions and accompanying dataset extensions needed for new kinds of analysis (the Standard Traditional Cultures Sample or STCS database). Other variants of DEf with commonly used cross-cultural databases are available.
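To make the two-stage idea concrete, the following is a minimal sketch in base R on simulated data; it is not the DEf code itself. It assumes a single row-normalized proximity matrix and uses the network lags of the predictors as instruments for the lagged dependent variable. All object names (W, x1, x2, rho) and all numeric settings are hypothetical, and the imputation and diagnostic steps that DEf provides are omitted.

```r
# Minimal sketch of a 2SLS network-lag regression on simulated data.
# All objects are hypothetical; none come from the DEf scripts or the SCCS.
set.seed(1)
n <- 186                                   # number of societies (SCCS-sized)

# Hypothetical proximity matrix: weights decay with distance between random
# locations; zero diagonal; each row rescaled to sum to 1.
coords <- cbind(runif(n), runif(n))
W <- exp(-as.matrix(dist(coords)) / 0.1)
diag(W) <- 0
W <- W / rowSums(W)

# Simulate y = rho * W y + 1 + 0.8 x1 - 0.5 x2 + e
x1 <- rnorm(n)
x2 <- rnorm(n)
rho <- 0.4
y <- as.numeric(solve(diag(n) - rho * W, 1 + 0.8 * x1 - 0.5 * x2 + rnorm(n)))

# Network lags: Wy is endogenous; W x1 and W x2 serve as instruments.
Wy  <- as.numeric(W %*% y)
Wx1 <- as.numeric(W %*% x1)
Wx2 <- as.numeric(W %*% x2)

# Stage 1: regress the lagged dependent variable on predictors and instruments.
stage1 <- lm(Wy ~ x1 + x2 + Wx1 + Wx2)
Wy_hat <- fitted(stage1)

# Stage 2: replace Wy with its stage-1 fitted values.
# The standard errors from summary(stage2) are not corrected 2SLS standard
# errors; only the coefficients are of interest in this sketch.
stage2 <- lm(y ~ Wy_hat + x1 + x2)
print(coef(stage2))
```

The coefficient on Wy_hat plays the role of the autocorrelation effect; with real data one proximity matrix per type of relation (for example distance and language) would be built and lagged in the same way.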
The novelty of the NoVA approach illustrated for CCR in this chapter is that it provides the advantages of considering networks of variables as a linked series of contexts that lead to refinement of theoretic insights and testing of theory. Estimates of models with different dependent variables can allow creation of a graph in which nodes are variables and directed edges go from proposed independent variables to a dependent variable. A NoVA approach thus provides a way of organizing and displaying estimation results, and can evolve into path analysis as an estimation procedure for systems of variables, including sources of measurement bias. DEf output can enable adjustment for endogeneity among the error terms and variables of linked DEf equations. Given theoretical precepts that can provide justification for complexity in the relationships and varying contexts among variables, networks of variables have direct and indirect effects that can clarify understanding of how variables are nested in different systemic relationships. The NoVA approach to survey data applies not only to future research but also to the thousands of past CCR studies using SCCS or other datasets. While judged as valuable, many of these cross-cultural studies failed to inspire confidence among ethnographers and readers from other fields. Often this is because Galton’s problem of the nonindependence of human cultures was ignored, amplifying the perception of spurious correlations when the interactions of direct and indirect effects are complex. If a statistical model is flawed it is often very easy for ethnographers familiar with a range of different societies to identify the flaws in cross-cultural research and react by rejecting CCR in its entirety. 
Consequently, ethnographers today tend to see CCR as chasing after correlated traits that have little to do with what is learned by systematic fieldwork, where observation and participation of sufficient intensiveness and scope allow us to refine what constitutes important dimensions and processes of social organization, beliefs, and systematic or interacting processes that simple correlations of traits cannot identify. The problematics of success or failure in this approach are exemplified here in a series of reanalyses starting with a classic in flawed analyses (“Learning from Reiss...”), a dissection of Reiss’s Diagram 1, and new theorizing and modeling practices that illustrate how to avoid problems in CCR. Because many other researchers have focused on gender equality, a full NoVA approach is exemplified.

Studies and restudies of the sort exemplified here subscribe to a fairly standard view that “The core of the process of science is generating testable explanations, and the methods and approaches to generating knowledge are shared publicly so that they can be evaluated by the community of scientists” (Carpi and Egger 2011). The focus on advanced NoVA modeling is intended to illustrate how to use R to construct advanced regression models that reach their maximum potential given existing STCS/SCCS data and the DEf functions. The methods described here build on Chapters 1-3 of this book, and focus on multiple interrelated models brought together through NoVA at multiple levels, both of larger overview and of greater detail. Once these models are stabilized through replication or modifications by other authors, the four models of gender equality variables studied, along with others, will be entered into the CoSSci Supercomputer Gateway site as a way to generate a database of finished studies.
Replicability of findings and cross-validation of results are key objectives facilitated by open access databases and widely accessible free software, both of which can be closely mentored by a CCR research community. Additional software at the CoSSci model-registration site can be developed and used to test whether additional NoVA models can be generated from prior models.

Overview of Cross-Cultural Research (CCR) and Networks of Variables

A highly productive way to approach cross-cultural research is in terms of networks of observed variables that measure interactive ethnographically situated aspects of networks of interlinked local contexts and larger cultures. Networks of observed variables are studied in most fields of science and are fundamental to scientific practices in other fields. In bioinformatics, for example, seemingly distant from anthropology, there is a similar underlying complexity, often expressed mathematically: “In Markov models and Bayesian network [BN] models ... where for each position a fixed subset of the remaining positions [observations of varying units in genetic material] is used to model dependencies” whereas variable-order [VO] “Bayesian network (VOBN) models ... generalize the position weight matrix ... according to the context of the subsets observed.” Here, the Markovian property assumes that local interactive processes are relatively invariant in time, non-Markovian processes are modeled as more complex history-driven alternatives, and context denotes that the localized processes also vary. The Dow-Eff functions (DEf) for CCR defined in Chapter 1 are based on somewhat similar ideas, altered to fit the field of anthropology.
The misuse of correlations and proper use of NoVA to identify contextual variables: A Cautionary Example of Networks of Variables

The first half of the Chapter analyzes the errors of a misguided effort to explain “Causes of Female Inferiority” through a Network of Variables Approach (NoVA), one that does not take context into account in a constructive manner. Although offered by the distinguished sexologist Reiss (1978), it gives a largely incomprehensible account of the sources of beliefs in gender equality versus subordination of women. He undertakes this effort in Chapter 4 (“The Power Filters: Gender Roles”) of his on-line book, Journey into Sexuality (1986; see also 1983). The book is the result of several years of searching through the studies of sexuality in developing societies, including the data of the SCCS. His Chapter develops a model that can be crosschecked and tested with seven SCCS variables (v3, v51, v54, v270, v570, v626 and v664) using the SCCS codebook (2012) in correspondence with variables in Reiss’s Diagram 4.1, shown below as Diagram 1.

Diagram 1: Reiss’s Proposed NoVA for Causes of female gender inequality. The SCCS variables are “Belief in Female Equality” (v626) and “Mother-Infant Involvement Minimal” (v51). These are reversed by Reiss to “Belief in Female as Inferior” and “Mother-Infant Involvement,” the latter with R=-0.30 and p<0.001 with “Father-Child Involvement.”

This diagram reproduces Reiss’s Diagram 4.1, which he explains in his Chapter 4: “Now return to the connection of mother-infant involvement to machismo and the belief in female inferiority. It seems apparent that, given the pathway leading to mother-infant involvement, the reason for this variable relating to machismo and to the belief in female inferiority is that this measure of mother-infant nurturance is in part an indirect index of a male-dominated society.
In such a society the mother will nurture the infant and will likely socialize the child in values and behaviors that will maintain the dominance of males. In this sense the mother-infant nurturance will emphasize machismo attitudes and promote a belief in female inferiority. Of course, not all societies that stress mother-infant nurturance are high on male dominance. Here, as elsewhere, the parts of a society can be put together in many ways. However, on the average, our statistical results indicate that such a custom of mother-infant involvement reflects male dominance in a society.” (italics added)

A source of confusion in this diagram derives from Reiss’s decision regarding labeling of variables (noted in his Appendix A): “All variables have been arranged so that the categories are coded from low on the variable to high on the variable when applying my label for the variable.” This is neither necessary nor recommended. He reversed the label and ordering of v626 from SCCS (1=Yes, 2=No such belief) to the following entry in Table 1.

Table 1: Reiss’s reverse definitions of variables

R626 IS THERE A CLEARLY STATED BELIEF THAT WOMEN ARE GENERALLY INFERIOR TO MEN? (This is Reiss’s reversal of the original SCCS v626 variable)
  1. No such belief   66   71.0
  2. Yes              27   29.0

R51 TO WHAT EXTENT ARE CARETAKERS OF INFANTS NON-MATERNAL OR MATERNAL? (SCCS R51 NON-MATERNAL RELATIONSHIPS, INFANCY; Barry & Schlegel 1980:171-179 13a) (This is Reiss’s reversal of the original SCCS v51 variable)

R cross-tab table(7-sccsA$v51,3-sccsA$v626) with variables reversed. Columns: Belief that Women are Generally Inferior to Men (reversed from SCCS), 1 = No such belief, 2 = Yes.

  R51     1     2
  1       0     1*    Mother minimal except for nursing
  2       2*    0     Mother minor but significant
  3       5*    0     Mother < ½ care, i.e., Mother has helpers
  4      27*    8     Principally Mother, others important roles
  5      20    14*    Principally Mother, others minor roles
  6       1     2*    Almost Exclusively Mother

  Fisher Exact Tests, 1,2-tailed (as printed beside rows 3-5): 0 8, p < .02; 8 35 24 28; 16 21, p < .03

For instructors or students to trace out the mistakes of past studies, replicate their findings, or make improvements, short R commands are occasionally interpolated here within the text, as done in Table 1 for SCCS v51, NON-MATERNAL RELATIONSHIPS, INFANCY, with Reiss renaming his R51 as “Mother/Infant Involvement” (low Nonmaternal care), without specifying the identities of alternative caretakers. Thus codes 1-6 in Table 1 are reversed. R51 is positively correlated with R626, calculated by corr.test(7-sccsA$v51,3-sccsA$v626), giving r=0.31 and significance p=.01. R51 in his diagram is negatively correlated with V54 “Father infant involvement,” and corr.test(7-sccsA$v51, sccsA$v54) gives r=-.30 and p < .001. There is nothing wrong with Reiss’s correlations. Reiss’s crosstabs and correlations of mothers’ and fathers’ roles with their offspring raise the following question: Did he forget about the possibility that a father might be a mother’s helper, as in row 4, column 4 of Table 1?
That is a clear possibility supported by the crosstab in Table 2:

Table 2: Cross-tab table(7-sccsA$v51,sccsA$v54); corr.test(7-sccsA$v51,sccsA$v54) r=-.30, p < .001. Columns are v54: 1=Father not involved, 5=Father helps with children.

  R51     1     2     3     4     5
  1       0     0     0     1     0    Mother minimal except for nursing
  2       0     0     2     0     0    Mother minor but significant
  3       0     0     3     5     2    Mother < ½ care, i.e., Mother has helpers
  4       0     6    14    34     5    Principally Mother, other important roles; father a helper
  5       4     8    22    27     4    Principally Mother, others minor roles; father a helper
  6       0     1     0     1     0    Almost Exclusively Mother, no help from father

In Table 2, the more the father helps in childcare, the more his wife is able, logically, to shift her efforts to him. Reiss made the mistake of asserting that the arrows in his diagram show that “male groups lower the likelihood that fathers will be involved in the care of their newborn [a logical possibility], and that in turn [less fathering] increases the likelihood that mothers will be involved in the care of the newborn.” That is an incorrect reading of his Diagram (in a correct reading of the diagram, Low Father Involvement -> Low Mo Involvement). Table 2, however, shows a probable egalitarian parental division of labor (a hypothesis that is more likely if there is further supportive evidence). These data, then, with closer examination, conflict with Reiss’s statement that “The diagram indicates that the more the mother nurtures infants [v51]..., the stronger the belief in female inferiority.” Table 2 and the diagram itself (ignoring Reiss’s confusion) reflect that the less the mother nurtures infants [v51], up to half their care but not more, the more the father is active with children. This pattern is congruent with large male fraternal interest groups (v570) having a negative correlation with father’s involvement (corr.test(sccsA$v570,sccsA$v54)), probably because high levels of polygyny in fraternal interest groups usually leave wives and co-wives in charge of infants.
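The reversed-code cross-tabs and correlations quoted for Tables 1 and 2 can be sketched on toy data as follows. This is only an illustration under stated assumptions: the two variables are simulated stand-ins (v51_toy, v54_toy are hypothetical names, not SCCS columns), and base R's cor.test is used here in place of the corr.test call quoted in the text.

```r
# Toy illustration of reversing an ordinal code; the data are simulated,
# not taken from the SCCS.
set.seed(2)
n <- 150
v51_toy <- sample(1:6, n, replace = TRUE)     # hypothetical 6-point code
# Hypothetical 5-point code loosely related to v51_toy
v54_toy <- pmin(5, pmax(1, round(0.4 * v51_toy + rnorm(n) + 1.5)))

r51_toy <- 7 - v51_toy        # reversal: 1<->6, 2<->5, 3<->4

# Cross-tab in the same form as table(7-sccsA$v51, sccsA$v54)
xt <- table(R51 = r51_toy, v54 = v54_toy)
print(xt)

# Reversal flips the sign of the correlation and leaves its size unchanged.
r_original <- cor(v51_toy, v54_toy)
r_reversed <- cor(r51_toy, v54_toy)
print(c(original = r_original, reversed = r_reversed))

# Significance test for the reversed coding (base R's cor.test).
print(cor.test(r51_toy, v54_toy))
```

Because reversal only changes the sign, nothing is gained analytically by recoding; the cost is that every label, table, and diagram then has to be read against the codebook in the opposite direction.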
Reiss ignores the social contexts of the variables he investigates, misinterprets his own diagram, and fails to realize that correlations are not causation. He also fails to recognize that if v570 correlates negatively with father’s involvement with infants, then his involvement with infants might correlate with monogamy and a higher degree of help for his wife in care of infants.

Each of these logical defects in Reiss’s arguments represents a mistake that is widely found in cross-cultural studies: inferring causality from correlations; cherry-picking the most significant correlations to formulate hypotheses; inferring time sequences from correlations. For Reiss, a better use of a NoVA model, with attention to additional variables that might provide a context for interpretation of SCCS variables, might have led him to more accurate conclusions. Instead, he was overzealous in drawing empirical conclusions from a slender thread of correlational findings. His description of how he constructed his Diagram is flawed. He claims that “The lines in this diagram represent all the significant correlations among variables found in the statistical analysis of the data,” whereas what he has done is to cherry-pick opportunistic correlations that support a limited model and a bizarre theory, one not supported by the contextual evidence that he would have uncovered had he constructed Table 2. We should regard with great suspicion his assertion that “The order of the variables represents ... a time sequence of events starting with the advent of agriculture and ending with a belief in female inferiority.” Finally, his notion that a relatively minor variable (mother-infant bonding, given many other variables of a similar nature or larger scope) could be the “cause” of a major variable like beliefs in women’s gender inequality represents an imaginative leap that is patently absurd.
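One way to see why a bare correlation cannot carry a causal arrow is to simulate a context variable that drives two others. The sketch below uses made-up variables (x, y, z are hypothetical and stand for no SCCS codes): x and y are strongly correlated, yet x has no effect on y once z enters an ordinary multiple regression.

```r
# Simulated "third factor" example; all variables are hypothetical.
set.seed(3)
n <- 186
z <- rnorm(n)            # unmeasured context variable
x <- z + rnorm(n)        # x depends on z
y <- z + rnorm(n)        # y depends on z only, not on x

# Bivariate view: x and y look related.
bivar <- cor.test(x, y)
print(bivar$estimate)
print(bivar$p.value)

# Multiple regression view: controlling for z, the x coefficient shrinks
# toward zero while z carries the effect.
fit <- lm(y ~ x + z)
print(summary(fit)$coefficients)
```

Drawing an arrow from x to y on the strength of the first result alone would repeat the error described above; the regression at least exposes the dependence on context, though it does not by itself address nonindependence of cases or missing data.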
Without trying to infer causality, as Reiss does, NoVA analysis that adds more contextual variables would be helpful in dispelling Reiss’s “pet hypotheses.” For example, corr.test(sccsA$v53,sccsA$v626) is positive, and shows increases in father’s care for infants concomitant with increasing beliefs in female gender inferiority (p=.05), although corr.test(sccsA$v54,sccsA$v626) is positive but not significant (p=.12) for concomitance with father’s care for older children. It is hard to “build” a theory based on correlations and unsupported inferences such as these, although such is the fare of many cross-cultural studies. For any given correlation there may be many “third factor” effects; when these are controlled against alternatives, even in as simple a model as ordinary multiple regression, the correlation may be found to be spurious. Random error and missing data may also produce spurious correlations. Galton’s problem of nonindependence of cases may produce spurious correlations. DEf regression gives the possibility of correcting for Galton’s problem (using an STCS enhanced-SCCS database instead of raw data) and for missing data. Heteroscedastic errors in DEf regression may be a source of spurious correlations, and so can endogenous correlations between model variables and error terms. Reiss did not get help from statisticians or ethnographers on his CCR project and apparently did not read many ethnographies or prior studies; his conclusions and explanations illustrate the kinds of mistakes in analysis and interpretation easily made in a course or student project without access to STCS and DEf. It is unfortunate that Reiss’s mistakes in labeling and understanding SCCS variables led him to a strongly misogynist conclusion that beliefs in female gender inferiority are promoted by the presence of mothers who fail to provide full care for their infants in allowing help from others (assuming the helpers would always be women!)
and that this misunderstood variable indexes a male-dominated society in which mothers are complicit to the extent of socializing their children in machismo attitudes! To make matters worse, Reiss argued that, among the explanations he could think of, scientific method supports the simplest explanations as likely to be the best, on the pretext that Occam’s razor is a valid scientific principle from which to choose among alternative possible explanations:5 “I believe the simpler explanation is that such relationships are part of the overall pattern that makes up a male-dominated society. I do not believe that the involvement of the mother in infant care, per se, is what is causally important. The key is that mother-infant involvement is one way of freeing males for more powerful and prestigious positions in society. Therefore, in male-dominated societies that type of female gender role will be encouraged. At the same time that type of society will encourage the mother to pass along that society's male-dominant values even if they deprecate the female gender role.” [This is askew even in Diagram 1, where the involvement is less, not more.] The model in Diagram 1 is used by Reiss to justify an incredibly mistaken cross-cultural conclusion arising from mistakes in his research processes. Reiss (1960:136-143) describes his Wheel Theory of Love, a theory which motivated sociological research by Reiss and associates from 1960 over two decades (Borland 1975), as “pretty much universal” in terms of four stages of love: (1) rapport; (2) self-revelation; (3) mutual dependency; and (4) personality need fulfillment. Like spokes on a wheel, the cycle can turn or repeat many times. His review of ethnographic examples (p21-30) cites standard sources like Ford and Beach (1953), Murdock (1954) and uncontroversial ethnographies.
It is only partly clear how, out of cross-cultural data for the SCCS acquired from HRAF, Reiss could draw out a new theory that, as opposed to his Wheel theory and ethnographic summaries in 1960, was so strongly misogynistic with respect to non-Western peoples. The clue is that Raoul Naroll replaced Clellan Ford as President of HRAF in 1973 and Reiss was probably influenced by Naroll’s views in the early 1980s that “correlation is causation” (personal communication with White at UC Irvine). The methodology exhibited in his Diagram 4.1 (Diagram 1 above) is one where the highest correlations among his variables of interest are shown as if they were directed causal links with a concocted causal order. This approach has long been abandoned. Reiss’s study, then, is a good example of how not to do cross-cultural research. But although the book has 61 Google citations and three reviews, including Frayser and Whitby (1995; Frayser is a CCR specialist in sexuality), no one has noticed the panoply of Reiss’s errors or used the actual data of the SCCS to discover them. CCR needs criticism of studies based on purely correlational evidence if it is to survive as a scientific field. For this reason Reiss’s book and his diagram are excellent online “starters” for students to learn about 1) errors that are easy to make as a CCR beginner, 2) how to test for replication, 3) how to search for context variables that can clarify the interpretations of relationships among variables, 4) how to use DEf R scripts after developing theoretical ideas to test, and possibly, 5) at a higher level of formulating hypotheses and theory, using NoVA to help in testing models and identifying context or third factor effects.

Learning from Reiss’s mistaken study: What to do and not to do

SCCS databases, including the SCCS that was misused by Reiss, have long been available online to use with SPSS (or the alternative free software PSPP).
Only in the past few years has it become possible to analyze these data with the STCS extensions and DEf software. The STCS project encourages authors to retest their earlier models and results, to use DEf functions to reanalyze the studies of other researchers, and to use DEf in new studies and in its extended form in new NoVA approaches.

5 Theory develops by two opposites: simplicity and conservation, observation and correction, not wild hypotheses about overall pattern from shreds of data (Quine 1974:137). Although radical simplification may be needed for new theory, Quine’s relative empiricism is “don’t venture farther from [the] evidence than you need to.” Diagram 5 offers evidence, not fantasy.

The original SCCS codebook (2012) is online, and there is no reason to reverse the direction of variables from the SCCS codebook, as did Reiss, to make things “less complicated.” Had Reiss examined cross-tabs of v54 (and v53) Father/child (like Father/infant) closeness, which show negative correlation with v51, he would have found that they correlate with v615 Husband-Wife Deference (r=-0.19, p=0.10) and v752 (separate) Husband-Wife Eating Arrangements (r=-0.32, p<0.001), negatively, and positively with v750 Sleeps in the same bed with his wife (r=-0.24, p=0.03), each of which measures intimate closeness. These results support the idea that fathers’ caring for children is linked to intimacy with the wife, with husbands helping their wives in care for infants (v51) and supporting beliefs in female equality. All other SCCS husband-wife variables (v747, v754, v755, and v973; also v749) that do not reflect intimate closeness have nonsignificant correlations with v54 and v53. All these results go strongly against Reiss’s hypotheses about the cause of beliefs in female inferiority. Networks of variables analysis (NoVA) with direct and indirect paths of potential influences is not the only way to refute Reiss’s theory and his Diagram 1, as the discussion above shows.
But a key feature of a statistically valid NoVA model is that DEf and other regression models eliminate what might be spurious variables, that is, variables that are not predictive given the independent variables that contribute to total variance and that pass tests of exogeneity and other criteria. A NoVA graph may make some variables endogenous if error terms of constituent DEf models are correlated with the independent variables of other equations. Provided that there is some exogenous variance to start the system, there are corrections for the endogeneities that allow statistical inferences to be made about direct and indirect effects (Wikipedia:Seemingly unrelated regressions, SUR). An SUR R script (Henningsen and Hamann 2007), for example, is available to make these corrections.

Development of Theory related to Gender Variables: Male Dominance, Female Power, the Plow, and Complexity

The development of possible models about gender inequalities requires thinking through theories about sociocultural processes as (ideally) the first step in developing a NoVA model. Thinking about Reiss’s Diagram 1 as a “Proposed NoVA for Causes of female inequality,” for example, would be based on the argument that agriculture is a main driver of three processes: Class stratification (v270), Type of kin group (e.g., v570), and extent of Mother’s involvement in child care (e.g., v51, v52). For Reiss, mistakenly, these feed indirectly into “Beliefs in Female Inferiority.” And a single measure for agriculture (v3) is overly simplistic. Further, if there were “root variables” early in time that led to beliefs about gender inequalities, multiple variables need to be considered as relevant outcomes. One hypothesis deriving from anthropological sources that has empirical support in cross-tab correlations, significance tests and maps is that agriculture without the plow (NoPlowAgric=(1-sccsA$plow)*sccsA$v3) supports Beliefs in Women's Equality with men (p<0.0006) and Female Creation Figures.
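As a small worked check, the sketch below applies base R's fisher.test to the plow-by-v626 counts reported in the cross-tab that accompanies Table 3 (15, 12, 59, 7), and then shows how the agriculture-by-plow term is built from a 0/1 plow dummy. The plow and v3 vectors are made-up toy values rather than SCCS data, and PlowAgric is a hypothetical name for the complementary term.

```r
# Counts copied from the plow-by-v626 cross-tab reported with Table 3
# (rows: v626 codes 1 and 2; columns: plow absent = 0, present = 1).
tab <- matrix(c(15, 12,
                59,  7),
              nrow = 2, byrow = TRUE,
              dimnames = list(v626 = c("1", "2"), plow = c("0", "1")))
ft <- fisher.test(tab)          # two-sided Fisher exact test
print(tab)
print(ft$p.value)

# Building agriculture terms conditional on the plow.
# 'plow' and 'v3' are toy values for illustration only.
plow <- c(0, 0, 1, 1, 0)        # 1 = plow present
v3   <- c(2, 5, 4, 6, 0)        # agriculture score
NoPlowAgric <- (1 - plow) * v3  # agriculture where the plow is absent
PlowAgric   <- plow * v3        # agriculture where the plow is present (hypothetical name)
print(cbind(plow, v3, NoPlowAgric, PlowAgric))
```

The two constructed terms always sum to v3, so entering both in a regression splits the agriculture effect by plow status instead of adding a new source of variance.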
But there is no effect of “NoPlowAgric” on mothers' or fathers' roles. (Maps of the SCCS geographic distributions for these variables are shown at http://intersci.ss.uci.edu/wiki/index.php/SaveGGplots.) Conversely, agriculture with the plow (PlowAgric=sccsA$plow*sccsA$v3) correlates with (but is not a predominant effect on) Beliefs in Female Inferiority (v626), a greater role of Fathers with their infants and children (v53-54), and a slight increase in “Machismo,” i.e., Male Toughness (v664). Plowing requires physical control over a large domesticated animal while driving the blade into the soil, which makes it almost universally a masculine activity, and tends to index male dominance in agriculture (Alesina, Giuliano, and Nunn 2013, Braudel 1988, Burton and Reitz 1981, Pryor 1985). Pryor’s (1985) study of the plow (SCCS v1123-1131) shows that, although the plow was present by 2000 BC in Sumer and Egypt, barriers to adoption are severe. Return to effort is insufficient for adoption in societies with low population density. Boserup (1965, 1981) shows that “where the population density is very low, slash-and-burn agriculture and similar techniques which feature very long fallow periods have much higher labor productivity [than those] which include use of the plow. However, as the population density increases, there is a shortening of the fallow period, a decline in labor productivity, intensification in the type of agriculture practiced, and a replacement of the previous ground-breaking instruments by the plow.” (Pryor 1985:730). “The need for a further change of tool arises when the fallow, owing to too frequent cultivation ... gets still more grassy with less trees and bushes. The best method for clearing of land under long fallow, the burning of the natural vegetation, is inefficient when the natural vegetation is grass. This is so because ... these roots are exceedingly difficult to remove by means of hoeing. Thus, the use of a plow becomes indispensible” (Boserup 1965:24).
See map: http://intersci.ss.uci.edu/wiki/index.php/SaveGGplots#Plow. Population growth is not exogenous in creating a transition to plowing, however: Pryor notes that tropical climates are not suited to wheat production and cold climates are unsuited for rice. Of Murdock’s (1967) 892 Atlas societies, 109 of the Mediterranean and East Eurasian societies had plows, while only 12 of Murdock’s 674 “peripheral,” mostly tribal and peasant societies had plow agriculture at the time of ethnographic study. Further, tree crops and maize, which are land-intensive and require only small holes for planting of seeds, do not require plowing, while (SCCS v1127) land-extensive wheat, buckwheat, barley, wet rice, rye, teff, and industrial crops are “plow-positive” as opposed to “plow-negative” millet, sorghum, maize, dry rice, and root/tree crops. Thus a regression of Plow = Population + PPcrops, or, in R, lm(sccsA$plow ~ sccsA$v1122 + sccsA$v1127), gives an Adjusted R-squared of 0.4438. A better NoVA (without DEf) than Reiss’s Diagram 1 might involve these variables and other factors (exploring these variables is a possible modeling exercise for students):

  Population (v1122) --- PPCrops (v1127) --> Plow (v243) --> Beliefs in Female Inferiority (v626)
  (v664) “Machismo,” i.e., Male Toughness --> Greater Father’s role/infants and children (v53)
  Agriculture (v3) --> Greater Mother’s role/infants & children (v51-52)
  Female Creation Figures (v676)
  lm(sccsA$v664 ~ sccsA$v53) p<.001, no other effect

Diagram 2: Initial consideration of a NoVA diagram from first causes

Table 3 shows correlations and significance tests for many of the variables above and others that might help define hypotheses for an exploratory NoVA concerning gender inequalities. As a student or new researcher, one needs to do such tests to become familiar with the topic and variables chosen for study, including study of the codebook so as to become familiar with details of the variables available for study.
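Exploratory tables of this kind can be produced with a few lines of base R. The helper below is a hypothetical sketch (cor_sig_table is not the function hosted on the wiki, and the toy data frame is simulated); it returns one matrix of correlations and one of p-values, using pairwise complete cases as cor.test does by default.

```r
# Hypothetical helper: pairwise correlations and p-values for a data frame
# of numeric variables. Not the wiki function referred to in the chapter.
cor_sig_table <- function(df) {
  k <- ncol(df)
  r <- matrix(NA_real_, k, k, dimnames = list(names(df), names(df)))
  p <- r
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      if (i == j) next                      # leave the diagonal as NA
      ct <- cor.test(df[[i]], df[[j]])      # drops incomplete pairs itself
      r[i, j] <- ct$estimate
      p[i, j] <- ct$p.value
    }
  }
  list(r = round(r, 2), p = signif(p, 2))
}

# Simulated toy data with some missing values; column names are made up.
set.seed(4)
toy <- data.frame(a = rnorm(50), b = rnorm(50))
toy$c <- toy$a + rnorm(50, sd = 0.5)
toy$b[c(3, 17)] <- NA

res <- cor_sig_table(toy)
print(res$r)
print(res$p)
```

With SCCS data the same call would be made on a data frame holding the chosen variables; as the chapter stresses, such a table is a starting point for forming hypotheses, not a basis for drawing arrows.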
For example, half of Whyte’s variables (v625 and v626, manliness/machismo and beliefs in female inferiority), the two variables most associated with the child care roles of Mother/infant and Father/infant/child, are significantly correlated with Complexity (v158.1) and also with each other (R=.32, p<0.001), while Sanday’s variables are also linked to Mother/infant and Father/infant/child. They are independent of Complexity, and also have the highest correlations overall and with princomp, the principal component. They were included by Reiss, based on correlations, for his Diagram 1. Among these variables, only v626 is significantly correlated with the plow (R=-.38), Complexity (p<0.001), and total population (v1122, p<0.001). v626 is also correlated with Equality with husband (v621, R=.39, p<0.001). Care for older children (v52) tends to shift to nonmaternal care and has no significant correlations with other variables. Overall, it seems that social complexity (v158.1) might not be associated with gender inequality except for v626, male/female equality, where plowing also has a major effect.

Table 3: Correlations, significance and principal components for Gender Inequality variables, Plow, and Complexity. Sorted by study (Sanday 1981, Whyte 1978, Murdock 1967, Barry and Paxson 1971): correlations among these variables are generally high, but are even higher for variables within the same study. Light shading shows significance at p<0.10, dark shading at p<0.16. *NPC is the NoPlowCplx variable, with Plow set to zero, else Cplx (v158.1). It has no significant correlations except for v664 (r=.15, p<0.04).
Plow correlates with v626, v51, v52 and total population, v1122 (r=.57, p<0.000000001); v1122 correlates with v626, female equality (r=-.40, p<0.00001), likely in association with plow and/or complex machinery usually operated by males.6

Variables (Signif\R): ♂Toughness (V664), ♀♂Creation (V676), ♀Equality (V626), ♀NoHuDom (V621), ♀Property (V628), No♂Macho (V625), ♂Plow (Plow243), NpC*
Authors: Sanday 1981, Whyte 1978, Murdock 2
Princomp V664 -0.28 -0.40 0.33 (0.17) n.s. 0.37 -.14 -------- 0.31 -.19 -.16 -.31 -0.14 -.16
V676 V626 V621 V628 V625 Plow243 V51 0.37 V52 n.s. V53 0.86 V54 0.87 V369 .-V158.1 V1122 V64 pop p=.01 -------- -.40 -.17 -.07 0.30 -.03 p=.16 p=.32 p=.02 p=.34 p<.001 p=.31 p=.63 p=.04 -------p<.001 p=.09 p<.001 0.39 -------p=.04 p<.001 0.18 0.26 -------p=.04 0.32 0.42 0.23 -------- -.38 -.06 -.06 0.08 p=.11 p=.76 p<.001 p=.66 p=.56 p=.47 -------- .21 .31 p=.001 p=.07 p=.04 p=.006 .70 p=.001 p=.01 p<.001 p=.01 .75 p=.05 .12 p=.03 .12 .40 .59 .12 .98 .86 p=.03 .93 .35 .79 p=.003 .41 p=.005 p=.01 =.06 p=.05 p=.02 .69 .83 .-- p=.84 p=.91 p<.001 p=.33 p=.34 p=.08 3-17 p=.90 p=.37 p<.001 p=.77 p=.71 p=.36 p<.001 p=1.0 p=.38 p=.08 p=.50 p=.26 p=.08 8-6

An R function that computes correlation and significance tables is at http://intersci.ss.uci.edu/wiki/index.php/R_correlation_matrix
54% of Sanday/Whyte variables are significant at p<0.10, 66% at p<0.16
46% of Barry & Paxson Parental variables are significant at p<0.10, 57% at p<0.12

table(sccsA$plow,round(sccsA$v158.1/10,0))
       1   2   3   4   5
   0  14  63  62  13   1
   1   0   0   6  10  17

Plow v243, Equality v626 cross-tab: table(sccsA$plow,sccsA$v626)
       0   1
   1  15  12
   2  59   7
Plow Fisher exact dichotomy p < 0.0006 (2 tailed)

6 Not all relations among variables are pairwise, but can involve three or more variables. For example, because Plow, accounting for 15% of the SCCS sample of 186, is associated with complexity (r=.67, p<.001), 3-way Fisher exact significance tests may be needed in conjunction with other variables included in Table 3.

All of
the findings in Table 3, however, based on correlation coefficients and significance tests with imputed values for missing data, are suggestive but not definitive statistical tests: none reflects correlations adjusted for Galton’s problem, so the estimates are likely to be inefficient, often above or below the values they take when autocorrelation is taken into account. The principal components (also based on fully imputed variables) show a single factor structure that accounts for 37% of the total variance, p<1.-42, among the six gender and four parental (v51-v54) variables, with signs that reflect female(+) vs. male(-) bias except for v51. The 19% variance of the second factor is not significant, with heavy loadings for v52 and lesser ones for v628 and v621; v664 loads independently with little percentage variance. Whether v621 loads on the main component depends on whether v626 is included in the 1stPCA (yes->no; no->yes). The 1stPCA, however, is not significantly correlated with Plow, population density (v64), or the general complexity measure (v158.1 as the sum of variables v149 to v158). Only one of the complexity variables is significant, at p=0.04: v157 (at the low end of the political integration scale), which correlates with the female bias end of the 1stPCA at r=-0.13. The opposite end of v157, at 3-4 political levels of jurisdiction above the local community, is correlated with male bias on the principal component for gender biases. Total population (v1122) is also correlated with 1stPCA male bias at r=-0.19 (p=0.01).

Other variables that correlate with the gender bias 1stPCA include three of the 15 sexual practice variables of Broude and Green (1976): talk about sex (v159 r=-0.13 p=.08) with male bias; double standard extramarital sex (v169 r=0.15 p=0.04) with male bias (the opposite being single standard pro-female bias); and male sexual aggressiveness (v175 r=-0.19 p=.04) with male bias (the opposite being male diffidence with pro-female bias).
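The two summaries used for Table 3, a correlation matrix with pairwise significance levels and the share of variance carried by each principal component, can be sketched in base R as follows. This is a minimal sketch on synthetic data: the data frame `toy` and the helper `cor_with_p` are hypothetical names introduced here, and the helper is not the function posted at the wiki address cited with Table 3. Like the values in Table 3, these are unadjusted for autocorrelation.

```r
# Synthetic stand-in data with one shared factor; NOT SCCS values.
set.seed(2)
n <- 186
common <- rnorm(n)
toy <- data.frame(
  a = common + rnorm(n),
  b = common + rnorm(n),
  c = -common + rnorm(n),
  d = rnorm(n)            # unrelated to the shared factor
)

# Hypothetical helper: Pearson r and its p-value for every pair of columns.
cor_with_p <- function(df) {
  vars <- names(df)
  k <- length(vars)
  r <- diag(1, k)
  dimnames(r) <- list(vars, vars)
  p <- matrix(NA_real_, k, k, dimnames = list(vars, vars))
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      if (i != j) {
        ct <- cor.test(df[[i]], df[[j]])   # Pearson correlation by default
        r[i, j] <- unname(ct$estimate)
        p[i, j] <- ct$p.value
      }
    }
  }
  list(r = r, p = p)
}

tab <- cor_with_p(toy)
print(round(tab$r, 2))
print(signif(tab$p, 2))

# Share of total variance carried by each principal component
pc <- prcomp(toy, scale. = TRUE)
var_share <- pc$sdev^2 / sum(pc$sdev^2)
print(round(var_share, 2))
```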
These results together resemble the known historical role of male power, especially in capital cities, exercised in the extramarital sexual exploitation of women. Although rape is not correlated with the 1stPCA, it seems that sexual behavior needs to be investigated in terms of the norms of sexual aggressiveness and the political power of men. Given that the 1stPCA accounts for about 37% of the gender bias variances for seven of the ten variables in Table 3, and 15% of the Broude-Green sexual behavior variables, with relatively low regression coefficients, this points to a small but consistent pattern of sexual inequality among many aspects of gender differentiation, as attested by Whyte (1978). It is likely, then, that NoVA models of gender differentiation will give a more nuanced view of relations among elements of gender differentiation than principal components or the dimensions of scaling models. It could be said that political power differentials skew the great diversity of gender differentials seen in ethnographic studies towards inequalities favoring males.

NoVA Recommendations for students and teachers

Advice for students: Don’t be so quick, as Reiss was, to take only a few selected variables to form a network of variables. Multiple independent variables at different distances from the core dependent variables can give contexts for interpretation. Paths of indirect and direct variables may hint at context in a way that would not occur if you treated them simply in terms of separate regressions. Sanday (1981) did extensive analysis in her book, working independently from Whyte and coding her own data for gender roles (e.g., v676 and v664). Out of the diverse samples coded within SCCS for gender roles, her results seemed to get the most out of a single subsample. Sanday’s study was highly coherent, probably because she had read and coded the ethnographies herself rather than relying on coding by others.
Students don’t have to code their own sample to do CCR, but some ethnographic reading can be useful, e.g., from eHRAF (2012), especially through the list of its SCCS societies (White 2013). One technique is to pick pairs of ethnographies that contrast at opposing poles of a variable of interest. The R command table(sccsA$v626, sccsA$v243>1), for example, will give a crosstab; table(sccsA$v626) and table(sccsA$v243>1) will show the numbers of societies with the different characteristics indicated by the code numbers, and the SCCS codebook will tell what those numbers mean. The key to modeling, however, is to think out the theory behind the phenomena you want to study, based on readings and discussions of cross-cultural and ethnographic studies, from the furthest independent variables back, as in the rough sketch of Diagram 2. This is “backwards” in terms of modeling predictors of the dependent variable, but “forwards” in terms of conceptual modeling. As will be seen, however, there is also a component of Bayesian thinking about the order of direct and indirect effects on dependent variables as modeled by networks of DEf regressions, which will lead to Diagrams 5 and 6 in the NoVA models exemplified here.

The construction of a NoVA diagram is not an easy matter. It is easiest in classes in which a series of student projects are indexed to dependent variables linked to independent variables that are sometimes shared, and where an independent variable of one DEf model is occasionally linked to the dependent variable of another model, as shown in Diagram 4. Here solid arrows represent a positive and significant regression coefficient from an independent “downward” to a dependent variable; dashed arrows represent effects with a negative coefficient.
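Returning to the crosstab technique recommended above for choosing contrasting cases, the following minimal sketch builds a 2x2 table from the counts printed in the Plow (v243) by v626 cross-tab that accompanies Table 3 and applies a two-tailed Fisher exact test. The row and column labels follow the printed layout; which margin corresponds to Plow is an assumption here, and the SCCS data frame sccsA itself is not loaded.

```r
# Counts copied from the cross-tab printed with Table 3.
# Assumption: rows are the v626 codes (1, 2), columns are plow absent/present (0, 1).
xt <- as.table(matrix(c(15, 12,
                        59,  7),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(v626 = c("1", "2"),
                                      plow = c("0", "1"))))
print(addmargins(xt))        # cell counts with row and column totals

ft <- fisher.test(xt)        # two-sided by default
print(ft$p.value)
```

With a live data frame, the same test would be applied to the output of table() directly, and the codebook would be consulted to read the category numbers.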
Here, for example, v660 polygyny as a dependent variable is linked positively to the independent variable v578, fraternal interest groups, which in turn has the negative independent variables v205-Fishing and v663-Sanday’s (1981) Female Power Guttman Scale, and also the positive independent variable v1742-Selection of political officials at the lower level of hierarchy up through the hierarchy to the top.

Diagram 4. http://intersci.ss.uci.edu/wiki/index.php/Indep/Depvar_list_-_All

Diagram 4 was made with Pajek (Batagelj and Mrvar 2003) software, arranging the nodes in a directed asymmetric graph. Diagrams of this sort from classroom projects showed a rather dense configuration of links in regression models around the Moral Gods variable, studied by Brown and Eff (2010) and Snarey (1996), which prompted further interest by White, Oztan, Gosti, Wagner and Snarey (2011) in a small NoVA model slated for restudy (with Christian Brown) in a later chapter. For small NoVA diagrams, researchers and students can use DAGitty (see Appendix and Diagrams 5 and 6) with instructions on “launch” at http://www.dagitty.net/. Students are advised to use the SCCS variables without renaming them or reordering their ordinal categories (as Reiss did in some cases). If necessary, one can flip all the names for greater clarity and the regression coefficients will remain the same. The coefficients for X --> Y and MaxX+1-X --> MaxY+1-Y will be the same, so that no confusion arises if all variables are flipped.7

NoVA analysis may not suit individual student projects well: the models are conceptually complex even if the component DEf models (two or three at most) are easy to compute with the use of “totry” direction, even from the very first computation of a trial model. They are ideal for a researcher with expertise in the field, whose goal includes a remapping of what is known or expected in the field using a much higher grade of analysis such as DEf network-lagged modeling with imputation of missing data.
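The claim above about flipped ordinal codes is easy to verify numerically. The sketch below, on synthetic ordinal data (the variables x and y are hypothetical, not SCCS codes), reverses both variables as Max+1-value and compares slopes: reversing both leaves the slope unchanged, while reversing only one of them changes its sign. Only the intercept differs between the original and the doubly reversed model.

```r
# Synthetic ordinal codes; NOT SCCS values.
set.seed(5)
x <- sample(1:5, 186, replace = TRUE)                 # hypothetical 5-point code
y <- pmin(pmax(round(0.6 * x + rnorm(186)), 1), 4)    # hypothetical 4-point code

# Reverse the category order: Max + 1 - value
x_rev <- max(x) + 1 - x
y_rev <- max(y) + 1 - y

b_orig <- unname(coef(lm(y ~ x))["x"])              # X --> Y
b_both <- unname(coef(lm(y_rev ~ x_rev))["x_rev"])  # both variables flipped
b_one  <- unname(coef(lm(y ~ x_rev))["x_rev"])      # only X flipped

print(c(original = b_orig, both_flipped = b_both, one_flipped = b_one))
```

This is why flipping all names together is harmless, while flipping only some of them requires reading the affected arrows with the opposite sign.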
Ideally, in this case, the theoretical models are well known and extensive empirical studies have been done. But even in the simplest form, one can start with an interest in 2 or 3 dependent variables that might be related, given “totry” DEf guidance from the R screen that can guide the modeling in a straightforward manner.

7 I.e., when the signs of the variables are reversed (from X --> Y to –X --> –Y), the slope and direction of a linear regression (X on Y or –X on –Y) remain unchanged; only the labels are reversed, from X to –X and Y to –Y.

For Reiss’s research on gender roles, including his NoVA in Diagram 1, there was insufficient prior understanding of the field of study, and Reiss capitalized on shocking interpretations of correlational anomalies (v51 with v626). As a basis for a new analysis of v626 (Female equality versus Beliefs in female inferiority), Table 3 examined the larger context of variables related to gender by examining correlation matrices and principal components. Table 4 provided initial hypotheses about a DEf model with an interesting and complex set of independent variables that would seem to relate to a number of other topics or models concerning gender inequalities and parenting. The goal from here forward is simply to add several other clearly related variables according to the preliminary analyses shown in Table 3.

Testing Alternative Hypotheses that may invalidate Reiss’s Theory

Condensed views of NoVA relationships that are relevant to testing hypotheses about gender inequalities and that contradict Reiss’s interpretive “theory” of Diagram 1 are shown in Diagrams 4 and 5, based on DEf analyses of four dependent variables. Choosing appropriate variables for alternative explanations requires some initial hypotheses about the central topic. This chapter began with v626, measuring whether or not there are Beliefs in Female Gender Inequality (coded 1=Yes or 2=No), a variable more appropriately named “Female Equality” in line with the SCCS codebook.
Contra Reiss (1978), one hypothesis is that variable v62 helps to explain v51, renamed “Mother’s helpers” from Table 2, based on the hypothesis that the “helper” relevant to v51 “Non-maternal care of infants” is the husband rather than other women. Another hypothesis to test is that other measures of male-female equality will help to explain the “Female Equality” variable v626. The four dependent variables (Dvs) that reflect aspects of Female Equality are the lower nodes in Diagrams 4 and 5. The upper nodes are the predictors of the Dvs in the DEf regression results of Tables 4-7. Diagram 4 shows predictors at the point of analysis where earlier modeling had begun using “totry” suggestions for independent variables predicting the four Dvs (Tables not shown). Diagram 5 shows predictors after the initial models had converged to the outcomes in Tables 4-7. In each case the Dvs tend to have many more separate predictors than ones in common, which is expected. Diagram 4 shows four IVs (independent variables) in common (v53 Fa care for infant, v1122 log10 total population, v162 Foreplay, and the Bio.5 warmer temperature variable). Diagram 5 shows six IVs in common (v53, v154, v245, v625, v1257, the filariae parasite, and v1122, which is negative for Dv626 and positive for Dv621). Overall, these results support hypotheses opposed to those of Reiss, giving a new understanding of mothers’ diminished time devoted to their infants, which depends on the husband’s help with infants (v53) and older children (v54) in the context of more egalitarian husband-wife relationships in simple agricultural societies (v921, v207, i.e., lacking the plow, which turns agriculture into a male pursuit). In both Diagrams parental measures (v369-v52-v53 in 4 and v52-v53-v54 in 5) emerge as predictors of v51 and often of one or more “equality” Dvs as well, supporting hypotheses totally contrary to those of Reiss.
Predictors of Dv676, Female Gender Origin Symbols, include 3-4 indicators of matrilineal societies with gathering, women’s contribution to subsistence, food storage and settlement fixity, and no male dominance. Gender Equality v626 tends to reflect a double origin, in low complexity societies (no writing or advanced land transport) versus urban societies with money, i.e., older and more modern forms of equality, as was shown in the analysis of the first principal component of gender equality variables in Table 3. In both diagrams, v626, Gender Equality, predicts both Hu-Wife Equality (Dv621) and Dv51, supporting the “Mother’s helper” hypothesis contra Reiss. There is support here for the hypothesis that, as the level of precision of the three “female equality” and v51 “mother’s helper” DEf models tends to increase toward “totry” convergence (and higher Rsquareds) between the outcomes in Diagrams 4 and 5, there is greater support for the interpretation of the anomalous v51: that mothers give more of infant care over to the husband in more egalitarian and/or monogamous societies in which husbands help their wives.

Diagram 4: An initial NoVA showing independent variables for the four lower Dvs.

Diagram 5: A Multifocal “Integral NoVA” helping to explain v51 “Mother’s helpers” as predicted contexts of Husband-Wife Equality and v676 No Beliefs in Female Equality

In these NoVA diagrams all arrows are significant effects in 2-stage DEf regression, and no arrows were found to be reciprocal except for the dubious v51 <--> v626 Dvs, where it is extremely unlikely that v51 --> v626 is a valid predictor. These “context variables” for Dv51 give a very different view of family life than Reiss’s misogynistic interpretation of Dv51 on the basis of correlations alone, although the v621 measure of equality is a less than significant regression variable for Dv51.
Diagram 5 also shows that each of the four dependent variables (v621, v626, v676, and v51) has its separate independent variables, although many of these variables have common themes related to the two major kinds of regression effects: (1) those mostly about belief systems, origin myths, population, and nuclear family structure (mostly pertaining to nodes in green) that show effects on male-female and husband-wife equality, and (2) those about how couples balance childcare and sexual intimacy, and how mothers’ and fathers’ foci in childcare are rebalanced when the couple is more egalitarian rather than authoritarian. DEf results for Diagram 5 are shown in Tables 4-7. The purpose of the comparison between Diagrams 4 and 5, with accompanying tables only for the latter, is to suggest that every “finished” DEf is still considered only a good approximation. Because of missing data imputation every model will give slightly different results when rerun, especially for variables with more missing observations. Thus it is appropriate, as between Diagrams 4 and 5, to show the evolution of one or more models in a single NoVA diagram, which can show changes in a very compact form.

Table 4: DEf-based model for NoVA predictors of Beliefs in Female Equality/Inferiority (SCCS Variables: Those in red fonts report male biasing effects against male-female equality)
DepVar_v626 Beliefs in Female Equality Descriptors Variable of (Intercept) NA Wy R =0.92 Bio.11 NA Warm V1257 V149 Filariae Scale 1 Writing V152 V154 Scale 3 Urbaniz. Scale 4 Land Trans. V155 V51sq R2=0.60 Coef. Hausman test Pval stdcoef Pvalue Star VIF 2.284 -0.132 NA -0.063 0.000 0.470 *** NA 1.253 0.95 0.18 0.001 -0.117 0.288 -0.257 0.010 0.012 ** ** 2.329 1.912 0.92 0.87 -0.079 0.070 -0.261 0.214 0.016 0.023 ** ** 2.305 1.725 0.39 0.92 Scale 5 Money -0.110 0.081 -0.297 0.266 0.007 0.018 *** ** 2.352 2.201 0.62 0.40 V625 MaleAggression Low Machismo 0.018 0.211 0.217 0.361 0.010 0.000 ** *** 1.302 1.343 0.90 0.49 V64 (Corr.with Plow) Pop. 
Density V676 Creation Stories -0.132 -0.121 -0.580 -0.201 0.000 0.035 *** ** 2.472 1.256 0.74 0.95 Log10Tot.Pop. (neg.) (neg.) n.s.0.145 2 Male v1122 @Dv626.1 RESET test. H0: model has correct functional form Fstat Wald test. H0: appropriate variables dropped Breusch-Pagan test. H0: residuals homoskedastic Shapiro-Wilkes test. H0: residuals normal star df pvalue star 14.496 1.729 596 23 0 0.201 0.631 0.96 261 128 0.428 0.329 *** Table 5: Wy (p=n.s.) and Predictors of v51 Mother’s helpers with Infant DepVar_v51 Mothers helped Descriptor of Variable (Intercept) @Dv51.5 NA 0.402 Wy NA 0.331 -0.390 0.122 -0.152 0.047 0.015 -0.154 0.194 -0.201 0.221 0.141 R2=0.73 R2=0.37 stdcoef VIF Hausman test ** ** 1.072 1.176 0.664 0.914 0.006 0.002 *** *** 1.500 1.474 0.379 0.505 0.224 0.001 *** 1.281 0.799 -0.209 -0.242 0.000 *** 1.143 0.703 0.208 0.209 0.004 *** 1.104 0.261 0.363 0.220 0.003 *** 1.167 0.863 0.010 0.141 0.095 * 1.190 0.841 0.003 *** Coef. NA pvalue Star 0.494 NA v2001=1 Deep Islam v1257 Filariae v1258 v921 Spirochetes Scale 6- Land Transport Female Caretakers Early Boy Lo Care in Early Childhood Are Inferior to Men Importance Trade Agricultural Potential 1 v245 Milking When these are added On Agriculture Father in Infancy Scale 5- Technol. Specialization But either Hausman signif v154 v369 Sex of Parental Care v52 Non-maternal v626 No Belief That Women ... v819* See Table 8 v207 Dependence v53 v153 @Dv51.5 Role of ... 0.037 0.170 Or these Or 0.584 n.s. pvalue star n.s. VIF *v819 does not pass the Bonferonni test for group tests No effect of DQC variables v are 1.103 Autoc Fstat df RESET test. H0: model has correct functional form 4.107 315.8 0.044 Wald test. H0: appropriate variables dropped 1.905 76.1 0.172 Breusch-Pagan test. H0: residuals homoskedastic 2.458 580.3 0.117 16 ** Shapiro-Wilkes test. 
H0: residuals normal 1.170 154.4 0.281 R script at: http://intersci.ss.uci.edu/wiki/index.php/Dv51.5 Table 6: Wy & Predictors of v676 Female-Neutral-Male Gender Origin Symbolism coef DepVar_v676 Male Desc R2=.62 stdcoef (Intercept) NA 2.547 NA Wy R2=.85 NA 0.134 0.073 Bio.5 Temp. Warm -0.003 -0.205 V203 Dependence On Gathering 0.072 0.146 V150 Scale 2 Fixity 0.088 0.181 V21 Storage of Food Surplus -0.208 -0.204 V53 Role of Father Infancy -0.248 -0.292 V670=v663+v669 No Male Dom. 0.211 0.213 V673 Sex of Creative Agent 0.214 0.510 V826 Female Contr. to Subsistence -0.010 -0.196 @Dv676.5 old No effect of DQC variables (good at VM) RESET test. H0: model has correct functional form Wald test. H0: appropriate variables dropped Breusch-Pagan test. H0: residuals homoskedastic Shapiro-Wilkes test. H0: residuals normal pvalue 0 0.404 0.003 0.011 0.014 0.003 0.001 0.002 0.000 0.008 Fstat star *** *** ** ** *** *** *** *** *** df Hausman test exogenous NA 1.292 1.144 1.660 1.696 1.243 1.356 1.090 1.279 1.097 Star VIF NA 1.193 1.159 1.740 1.623 1.233 1.342 1.116 1.177 1.066 pvalue 2.835 3.277 2.642 399 73 245 0.093 0.074 0.105 0.777 204 0.379 * * Table 6: Wy & Predictors of v676 Female-Neutral-Male Gender Origin Symbolism coef stdcoef Desc R2=.37 (Intercept) NA 2.334 NA Wy Netwk Lag 0.460 0.230 bio.5 Bio.5 Hot -0.217 -0.279 fishgath fishgath 0.038 0.309 v150 Fixity of Resid. 0.113 0.230 v205 (not 203) Dep.on Fishing -0.141 -0.315 v21 Food Storage -0.244 -0.240 v53 Role of Father -0.292 -0.339 v670 NoMaleDom 0.151 0.155 v826 FemaleContr -0.015 -0.310 @Dv676.5 new No effect of DQC variables (good at VM) RESET test. H0: model has correct functional form Wald test. H0: appropriate variables dropped Breusch-Pagan test. H0: residuals homoskedastic Shapiro-Wilkes test. H0: residuals normal Hausman test. H0: Wy is exogenous Sargan test. 
H0: residuals uncorrelated with instruments DepVar_v676 Male VIF relimp pval hcpval NA 1.197 1.352 1.805 1.469 1.760 1.233 1.351 1.053 1.102 Fstat 0.9989 0.1035 1.4443 6.141 0.659 0.637 NA 0.096 0.026 0.023 0.010 0.028 0.035 0.084 0.037 0.068 df 6 84 325247 131872 20 410 0.000 0.013 0.013 0.004 0.017 0.004 0.007 0.000 0.064 0.000 pvalue 0.357 0.7485 0.2294 0.0132 0.423 0.424 0.000 0.006 0.012 0.003 0.007 0.009 0.003 0.000 0.066 0.000 Star ** Table 7: Wy & Predictors of v621 Equality of Husband and Wife (No Hu. Dominance) R2=0.53 DepVar_v621 desc (Intercept) NA Wy NA Fieldwork by Female Fieldworker fluent Total Population R2=0.89 Femalefieldwkr Langfieldwk v1122 v log10 coef 1.65 stdcoef NA pvalue 0.027 -0.766 -0.112 0.124 0.310 0.258 0.004 -0.357 -0.217 0.163 0.450 star VIF Hausman ** NA NA 1.383 0.715 *** 1.138 0.575 0.009 *** 1.297 0.310 0.000 *** 2.676 0.836 17 v53 Role of v54 Role of v626 v68 v817 @Dv621.1 Father, Infancy Father, Early Childhood Female Equality Form of Family Imptnc Hunting -0.352 -0.569 0.001 *** 3.038 0.601 0.272 0.490 0.001 *** 2.803 0.845 0.348 0.315 0.003 *** 2.205 0.827 -0.048 -0.303 0.000 *** 1.237 0.293 0.010 0.332 0.000 *** 1.639 0.947 DQC variables added (Divale 1976), little change Fstat df Pvalue RESET test. H0: model has correct functional form 8.073 58.5 0.015 Wald test. H0: appropriate variables dropped 0.927 31.7 0.343 Breusch-Pagan test. H0: residuals homoskedastic 0.565 144.6 0.453 Shapiro-Wilkes test. H0: residuals normal 0.056 238.4 0.813 Star ** Convergence in DEf Modeling A DEf model for a dependent variable should be as complete as possible, not just with a few chosen variables, but also with many independent variables and high Rsquared, many statistically significant independent variables (pvalue <.10 or .05 or less), and with VIFs (variance inflation factors) below 5 or 10. 
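The variance inflation factor named among these criteria has a simple definition that can be checked by hand: for each predictor, regress it on the other predictors and compute 1/(1 - R-squared). The sketch below uses synthetic data and a hypothetical helper, `vif_by_hand`; it is not the routine DEf itself uses to report VIFs in its csv output.

```r
# Synthetic stand-in data; NOT SCCS values.
set.seed(3)
n <- 186
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)    # deliberately correlated with x1
x3 <- rnorm(n)                         # unrelated to x1 and x2
y  <- 1 + 0.5 * x1 - 0.4 * x2 + 0.3 * x3 + rnorm(n)
toy <- data.frame(y, x1, x2, x3)

# Hypothetical helper: VIF_j = 1 / (1 - R2_j), where R2_j comes from
# regressing predictor j on all the other predictors.
vif_by_hand <- function(data, predictors) {
  sapply(predictors, function(v) {
    others <- setdiff(predictors, v)
    r2 <- summary(lm(reformulate(others, response = v), data = data))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_by_hand(toy, c("x1", "x2", "x3"))
print(round(vifs, 2))
print(names(vifs)[vifs > 5])   # predictors exceeding the stricter threshold, if any
```

A VIF near 1 means a predictor is nearly uncorrelated with the others; overlapping predictors push each other's VIFs upward.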
DEf has features that allow the modeler to achieve convergence in adding variables to a model, as well as a robust modeling strategy that is facilitated by analysis of each file given as csv file output or as R script exit functions h[1]-h[10] (2012:[11]). The csv file and the “didwell” h[9] or “totry” h[8] functions help to improve the model. The list of “totry” variables is calculated from an (“add1”) function that evaluates not only all the variables in the evm list (imputed for missing data) but all those in the full dataset except those created by the user to define new variables by combinations of others. The high numbers of independent variables for the models in Tables 4-7 (ten on average) reflect systematic use of the “totry” option in successive iterations. As a model improves to include a broader range of significant regression effects, additional variables are often able to pick up residual variance in the model due to other effects. One possibility, as a DEf regression model converges to a broader array of independent variables and becomes more robust in the respects listed above, is that various sources of autocorrelation may be “explained away” by independent variables that have spatial and environmental clustering, distributions that follow language phylogenies, or other sources of non-independence among cases in the sample. Of the four Dv variables in Diagram 4 chosen as measures of gender equality, three (v621, v626, and v51, excluding v676) include results of “totry” DEf options in which the effect of autocorrelation in the model has converged to non-significance (Dv626 is marked A_Dv626 in Diagram 4 to indicate significant autocorrelation).

There are useful rules of practice to follow in creating successful DEf models. Modeling proceeds in a series of steps in which successful improvements often open the possibility of further results attuned to the “totry” suggestions.
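To give a sense of what an “add1”-style screening does, the sketch below uses base R's add1() on an ordinary lm fit with synthetic data. This is only an analogy under stated assumptions: the variable names are hypothetical, and DEf's own “totry” list is produced inside its two-stage, imputation-based procedure, which is not reproduced here.

```r
# Synthetic stand-in data; NOT SCCS values.
set.seed(11)
n <- 186
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
toy$y <- 1 + 0.6 * toy$x1 + 0.5 * toy$x3 + rnorm(n)   # x3 matters; x2 and x4 do not

# Current model contains only x1; screen x2, x3, x4 one at a time.
base_fit <- lm(y ~ x1, data = toy)
screen <- add1(base_fit, scope = ~ . + x2 + x3 + x4, test = "F")
print(screen)

# Candidates whose single-term addition is significant at p < 0.10
p <- screen[["Pr(>F)"]]
names(p) <- rownames(screen)
to_try <- names(p)[!is.na(p) & p < 0.10]
print(to_try)
```

Each candidate is tested as a single addition to the current model, so the list should be recomputed after every change to the model, which is the iterative use described above.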
If the objective is to look for convergent results it is often best to examine results via the “h[]” functions, starting with h[5], which returns model $diagnostics. If the h[5] Breusch-Pagan test of homoscedasticity (H0:) is non-significant, h[4] will show the appropriate $ModelRobust output; if not, then h[3] will show the $Rmodel adjusted for heteroscedasticity. These give the coef, pvalue, and VIF for each variable name and number. Rsquareds and other statistics are given by h[6]-h[7]. Most important for modifying the model are the h[8] and h[9] outputs of the lists of “totry” and “didwell” variables. They should be compared to the output of the riv (restricted independent variables). One possible objective for convergent modeling is to "match" the riv list with “totry” variables (h[8]) while retaining the “didwell” (h[9]) variables and any of the constructed variables likely to be significant or key to hypothesis testing. The DEf R software can draw "totry" variables from those in the evm list of variables, which includes those in the riv, the iv (having additional variables) and others added to evm by the user. (In the online DEf options, these are listed in three separate windows that cumulate variables.) The "totry" variables, however, actually come from the entire dataset and from squares of those in the inclusive evm list. Some "totry" vars will not conform to the model and can be dropped or later reentered. But even with all "totry" vars in a current model there may be other significant variables that fit the model and should be retained along with the “totry” variables. As a practical matter, if variables are added to riv from “totry” the researcher should make sure they are also entered in the iv<- and evm<- lists. In DEf R scripts all riv variables must also be in iv, and all iv variables in evm. In the online windows for entering variables each variable must be in riv, iv, or evm alone; those in riv are automatically copied into the iv list, and those in iv into the evm list.
In either approach, the user must take care to locate riv variables properly and name them correctly, separated by commas, contained in quote marks, often with "v…" leading the variable name, e.g. "v777" and not "777". Any variable with the suffix “sq” is a square of a primary numeric variable. When variable v277sq is listed, for example, v277 must also be specified in evm (2012: automatic if 6 or more values; 3 or more in 2013). In an R script this requires two statements, for example:

    sccsA$v51sq=sccsA$v51^2
    addesc("v51sq", "v51sqName")

Otherwise each is simply named "sccsA$v51" and "sccsA$v51sq" in distinct evm, iv and riv lists. In the online windows of CoSSci, an “sq” prefix creates a squared variable, e.g., sqv277, so long as the counterpart v277 is in a separate window. In this way “sq” dependent or independent variables may be entered, or ones that appear in h[8] “ToTry” may be entered as independent variables. When encountering an error the user must search for a given variable number to verify that it is present in the proper riv, iv, or evm sequences. Matching the functions that write riv and the totry list on your screen will help identify whether the model has executed correctly. The csv output shows whether high VIFs (variance inflation factors) are present, as may occur with one or more of the 10 individual complexity variables v149-158 and their average value, measured in variable v158.1. In general, pairs of variables that overlap in meaning must be avoided. Variables in a Guttman scale and the Guttman scale itself cannot occur in the same model, and similarly where one variable is a composite of another (e.g., v675 and v674 used to jointly define v676). In using R scripts it is advisable to begin any new model from a successful DEf script template, such as a template already in use.
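The bookkeeping rules above (riv within iv, iv within evm, quoted names such as "v777" rather than "777", and a base variable present for every “sq” variable) can be checked before a script is run. The following is a minimal sketch; the helper `check_lists` and the example lists are hypothetical and are not part of the DEf scripts.

```r
# Hypothetical variable lists, with two deliberate mistakes in riv.
evm <- c("v51", "v51sq", "v53", "v154", "v626", "v1122")
iv  <- c("v51", "v51sq", "v53", "v626")
riv <- c("v51sq", "v53", "v277sq", "777")

# Hypothetical helper: report violations of the nesting and naming rules.
check_lists <- function(riv, iv, evm) {
  all_names <- unique(c(riv, iv, evm))
  squares <- grep("sq$", unique(c(riv, iv)), value = TRUE)
  list(
    riv_missing_from_iv  = setdiff(riv, iv),
    iv_missing_from_evm  = setdiff(iv, evm),
    numeric_only_names   = grep("^[0-9]+$", all_names, value = TRUE),
    squares_without_base = squares[!(sub("sq$", "", squares) %in% evm)]
  )
}

problems <- check_lists(riv, iv, evm)
print(problems)
```

Here the report flags "v277sq" and "777" as absent from iv, "777" as lacking its leading "v", and "v277sq" as having no base variable v277 in evm.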
Using the DEf network lag model in NoVA 8

8 Still to be answered: It is possible that the network lag model single equation regression has never been used previously in a path analysis requiring several interrelated equations: this reflects the current status of statistical advances.

Garson (2012:5) notes that “path analysis is an extension of the regression model,” so there is no inherent reason that this should not apply to the network lag model single equation regression, i.e., two-stage least squares (2sls) DEf. Basically, the Wy-hat = a + WXb variable that is used in the 2sls estimation would need to be acknowledged in the path diagram for each endogenous variable, since these would be used in each required regression of all endogenous variables in the path diagram. What is explicitly required is that the various correlated error terms between all endogenous variables in a NoVA are corrected by appropriate equations (Wikipedia: Seemingly unrelated regressions), an extension of standard regression with the corrections required by path analysis. In addition, some Xs used in the WX terms for earlier endogenous variables may also appear in later equations for endogenous variables, so that independence of errors may be violated and the corresponding estimates may be biased and inconsistent. SUR methods (Henningsen and Hamann 2007) can address these problems, including the question of whether the WX terms in the model induce a set of new “spurious” correlations between endogenous variables, in addition to any that result from the proposed ordering of relations between variables in the path diagram or from endogeneities induced by nonindependent error terms of each dependent variable.
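The role of the Wy-hat = a + WXb term can be made concrete with a hand-rolled two-stage least squares sketch in base R. Everything here is an assumption for illustration: the proximity matrix W is a simple ring of neighbours rather than the distance and language matrices of DEf, the data are simulated, the first stage regresses Wy on both X and WX (one common choice), and no missing-data imputation is done. The second-stage standard errors printed by lm() are not the correct 2sls standard errors, so only the coefficients are shown.

```r
# Simulated network-lag data; NOT SCCS values.
set.seed(42)
n <- 120

# Hypothetical proximity matrix W: each case's two ring neighbours, rows sum to 1
W <- matrix(0, n, n)
for (i in seq_len(n)) {
  W[i, if (i == 1) n else i - 1] <- 0.5
  W[i, if (i == n) 1 else i + 1] <- 0.5
}

X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
rho  <- 0.4                     # true network-lag effect in the simulation
beta <- c(1, -0.5)
# y = rho * W y + X beta + e, solved for y
y <- as.vector(solve(diag(n) - rho * W, X %*% beta + rnorm(n)))

Wy <- as.vector(W %*% y)        # endogenous network-lag term
WX <- W %*% X                   # instruments: neighbours' X values
colnames(WX) <- c("Wx1", "Wx2")

stage1 <- lm(Wy ~ X + WX)       # stage 1: predict Wy from exogenous terms
Wy_hat <- fitted(stage1)
stage2 <- lm(y ~ Wy_hat + X)    # stage 2: use Wy-hat in place of Wy
print(round(coef(stage2), 3))
```

In a NoVA with several endogenous variables, each would have its own Wy-hat constructed this way, which is the reason the text suggests acknowledging these terms in the path diagram.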
For evaluating which variables in Tables 4-7 might be non-significant due to repeated analyses, Holm (1979) provides a sequentially rejective multiple significance procedure for testing a series of statistics jointly, such as the significance of the 27 regression coefficients (18 in Diagram 4, others not shown) derived from DEf results in the Tables used for Diagram 4. These can be extrapolated to Tables 4-7 with only one variable (6: v819, p=.095) to drop. All of the 18 coefficients significant at p<0.01 test as significant (100%) in these group significance tests. An additional five test as significant at p<0.05 (91%, e.g., those at p=.014 and below). Another five pass as significant at p<0.10 (85%, e.g., those at p=.05 and below). The least significant variables in Diagram 4 (0.10>p>0.05) are v154 Land->v621, v673 Sex of Creator Figure->v676, and v162 Foreplay->v51; not on the Diagram is NoPlowCplx->v626. There is thus strong evidence that 15 of 18 arrows in Diagram 6 are highly significant both in DEf results and in the NoVA structure.

Table 8: Holm-Bonferroni tests (Wikipedia: Holm–Bonferroni_method#Example)

Definitions: a = set sig. test level; m = no. at this level; i = no. to pass test; p = prob. where passed; b = a/(1+m-i)-p (#neg)

Tables       a     m    i    p      b              Percent passing test
Tables 4-7   .01   17   17   .008   a/(1+m-i)-p    100%
Tables 4-7   .05   22   20   .014   a/(1+m-i)-p    91%
Tables 4-7   .10   27   23   .051   a/(1+m-i)-p    85%

Given the statistical structures implied by these findings, only Land Transport in Diagram 4 (p=0.097) and v819 (p=0.093) from Diagram 5 should be dropped. All other variables in Tables 4-7 are significant at p<.004.

DEf, NoVA, Path analysis and SEM

It is useful to define an ordered succession of five modeling approaches: 1) DEf models, 2) exploratory analysis of multiple DEf models, 3) path analysis of multiple DEf models with simple observed variables.
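Returning briefly to the Holm (1979) procedure behind Table 8: the step-down rule compares the i-th smallest of m p-values with a/(1+m-i) and stops at the first failure. The sketch below, in base R, writes that rule out by hand and checks it against R's built-in p.adjust(). The p-values are hypothetical and are not the coefficients of Tables 4-7; the helper name holm_reject() is likewise introduced only for illustration.

```r
# Step-down Holm procedure: TRUE marks hypotheses rejected at level alpha.
holm_reject <- function(p, alpha = 0.05) {
  m <- length(p)
  ord <- order(p)                              # smallest p-value first
  thresholds <- alpha / (m + 1 - seq_len(m))   # a/(1+m-i) for i = 1..m
  passed <- p[ord] <= thresholds
  # Stop at the first p-value that fails its threshold.
  n_reject <- if (all(passed)) m else which(!passed)[1] - 1
  reject <- logical(m)
  reject[ord[seq_len(n_reject)]] <- TRUE
  reject
}

# Hypothetical p-values, for illustration only.
p <- c(0.001, 0.008, 0.014, 0.051, 0.095)

holm_reject(p, alpha = 0.05)          # TRUE TRUE TRUE FALSE FALSE
p.adjust(p, method = "holm") <= 0.05  # same decisions from base R
```

With these five values the thresholds are .05/5, .05/4, .05/3, .05/2 and .05/1, so the first three pass and the procedure stops at .051 > .025.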
The focus here is on 2) and 3), NoVA for networks of variables analysis with simple observed variables, and the further requirements for path analysis. Path analysis (Garson 2012) may be viewed as an extension of multiple regression models (NoVAs), with a warning that some variables may become endogenous because paths interconnect (e.g., in a directed asymmetric graph). Even if variables in each component regression model are exogenous, i.e., uncorrelated with their error term, their respective error terms may be intercorrelated. In these seemingly unrelated regressions (Wikipedia: Seemingly unrelated regressions), or SUR, formulas for restoring exogeneity are required. These adjustments may be solved by simultaneous equations, as in the R systemfit package (Henningsen and Hamann 2007), for which Wright’s (1921, 1923, 1934) simultaneous equation models (SEM) provided a prototype. Without SEM, however, as noted by Garson (2012:5): “path analysis can be accomplished as a series of multiple linear regressions, one for each endogenous variable. This method yielded standardized regression coefficients (beta weights) and a R-squared goodness-of-fit for each endogenous variable, but did not yield an overall goodness-of-fit for the model,” as does SEM. Garson notes that while “SEM typically centers on latent variables, it is possible to model simple observed variables. [Further:] When only observed variables are included in the model, the researcher is conducting a path analysis.” Results will still be tentative because R-squared goodness-of-fit is a relative concept that cannot establish that a model is “correct” (Garson 2012:5). The MCMCpack R library package has the capability of comparing alternative models for best fit. Again, this can be applied to fully imputed DEf output data but could be run with data that is imputed for missing values.
The MCMCregress() function is analogous to the linear regression lm() function. The procedure followed is to take a finished DEf model, create a series of variants M1 to Mlast that each drop one variable, and run MCMCregress(M1) to MCMCregress(Mlast) in a series. The function BF <- BayesFactor() is then used for this series of comparisons (M1, ... , Mlast), and summary(BF) calculates comparative goodness-of-fit for the alternative regression models, dropping one variable at a time. This stepwise approach to measuring type II error offers goodness-of-fit evaluation for the rich cumulative inventory of simple observed variables in cross-cultural research. Like SEM, systemfit (Henningsen and Hamann 2007) also has the capability of comparing covariance matrices of alternative models for best fit. The options of systemfit include not only SUR path analysis but also combinations with time series panel data and instrumental variables. These models can be run with Eff’s function “aa” that recovers the average of imputations for independent variables, X, in equation 1. Taking the Est(Wy) or "Wy-hat" estimate from first-stage analysis,

$$Wy = \alpha_0 + \sum_{i=1}^{n} \alpha_i (W X_i) + \varepsilon \qquad \text{Equation (1)}$$

$$\mathrm{Est}(Wy) = \alpha_0 + \sum_{i=1}^{n} \alpha_i (W X_i) \qquad \text{Equation (2)}$$

and the second-stage DEf linear equation:

$$y = \beta_0\,\mathrm{Est}(Wy) + \beta_1 + \sum_{i=2}^{n+1} \beta_i X_{i-1} + \varepsilon \qquad \text{Equation (3)}$$

In the 2013 release of DEF2, “Data used are output (so that Wyhat and Wy...)” are now available, including each component XW of the autocorrelation predictors. The SUR procedure, whose mathematical validity is attested by Garson, can thus be solved with the systemfit package for multiple equations such as (3), provided they are mathematically soluble. Equations of type (3) provide models for each equation in a Network of Variables Analysis (NoVA) that can be jointly solved with SUR to arrive at the full exogeneity of a path analysis.
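Why a joint (SUR) solution may be needed can be seen by fitting two component equations separately and correlating their residuals. The sketch below uses simulated data in base R; the variables, coefficients and the shared unobserved component are assumptions made for illustration, and the check is a diagnostic only, not the SUR estimator itself.

```r
set.seed(1)
n <- 186   # the number of societies in the SCCS; the data are simulated

x1 <- rnorm(n)
x2 <- rnorm(n)
shared <- rnorm(n)   # unobserved influence common to both dependent variables

# Two equations, each with an exogenous predictor, whose errors
# nonetheless overlap through the shared component.
y1 <- 1 + 0.5 * x1 + shared + rnorm(n)
y2 <- 2 - 0.3 * x2 + shared + rnorm(n)

fit1 <- lm(y1 ~ x1)
fit2 <- lm(y2 ~ x2)

# Cross-equation residual correlation and its significance test.
res_cor  <- cor(resid(fit1), resid(fit2))
res_test <- cor.test(resid(fit1), resid(fit2))
res_cor
res_test$p.value
```

A clearly nonzero residual correlation of this kind is the situation in which equations of type (3) would be passed to a joint estimator such as the SUR method of the systemfit package, rather than interpreted one at a time.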
In Equation (3), β0 is simply the regression coefficient for Est(Wy) in the second-stage DEf regression, and X is the set (aa) of imputed independent variables.

Given the rich inventory of simple observed variables in CCR, SEM is, surprisingly, less viable than path analysis. This is confirmed in the Amos SPSS program website article (http://spss.wikia.com/wiki/SEM_%28structural_equation_modeling%29_-_Amos):

“Warning: It may seem odd to begin with a warning, but the popular misuse and misinterpretation of Structural Equation Modeling is so widespread that users of this wiki should be aware of some of the issues involved before they begin. While this warning is overly brief, you can follow-up these issues and more in the Further Reading section of this article. A number of these issues also apply to Confirmatory Factor Analysis. While Structural Equation Modeling has been popular in recent years to test the degree of fit between a proposed structural model and the emergent structure of the data, the perceived superiority of the technique is waning. Aside from the fact that the results of Structural Equation Modeling are often poorly reported, the conclusions drawn do not typically grasp the limitations of the technique. The most obvious, and some ways the most critical issue is that of incorrectly inferring a particular configuration of causal relationships from correlational data. This mistake can be illustrated with the simplest of all structural examples – that of 2 variables (variable A and B). If we ignore the additional complexity of latent structure, the number of possible causal structures is 4. Clearly, the number of possible models grows exponentially as the number of variables grows. In this example, the 4 possible causal models in this example are: A causes B; B causes A; A and B cause each other; finally, A and B are unrelated. If A and B are indeed significantly correlated, it is likely that the first 3 models will be supported by significant fit statistics.
If this is the case, what has been proven? Which of the 3 supported models is the correct model? What makes matters worse is that we have not even conclusively ruled out the last model. It is still possible that the correlation between A and B was spurious. To reinforce a maxim that most people know, but fail to apply to Structural Equation Modeling – you cannot determine causation from correlation. Yet in most cases, researchers only test one or two models out of all the myriad of potential models, poorly report their results, then proclaim confirmation of their model (implying the exclusion of all other possible models). So what is the value of Structural Equation Modeling? If large correlational datasets are already available, and a large range of plausible models are assessed, the results can be valuable in conceiving an experimental study that can test the proposed causal relationships.”

Goals for DEf and NoVA

A primary objective of the Companion is to show how to use DEf for survey data, including cross-cultural analysis, and to provide examples of new findings with DEf. Uses and examples of NoVA are a secondary but no less important objective. Both are guided by the concept of developing and testing theory in various stages of research that begin with extensive study and reading of prior studies to develop hypotheses based on theory and theoretical insights. (For students, this may take the first weeks of a class.) Exploratory use of DEf is typical of the next stages of research. This involves exploratory analysis on choice of dependent and independent variables, the latter divided into a larger “unrestricted” set of potential independent variables and a “restricted” set used in a particular regression. DEf results will help to identify those variables that are significant, those that have low variance inflation (high VIF means that two or more “independent” variables are collinear or already correlated), and those that pass Hausman tests for exogeneity.
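The variance inflation factor mentioned here has a simple definition: for each independent variable, regress it on the others and take 1/(1 - R-squared). The sketch below computes it in base R on simulated data. The variables a, b and composite are hypothetical; composite is built as an average of the other two, mimicking the case of individual complexity variables entered together with their average. The function vif() is written here for illustration and is not the routine used inside DEf.

```r
# VIF for each column of a data frame of independent variables.
vif <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

set.seed(4)
n <- 186
a <- rnorm(n)
b <- rnorm(n)

# A composite that is nearly the average of a and b overlaps with both.
X <- data.frame(a = a, b = b, composite = (a + b) / 2 + rnorm(n, sd = 0.1))
vif(X)   # all three values are large, signalling collinearity
```

Dropping either the composite or its components from the restricted set brings the remaining VIFs back toward 1, which is the practical reason for not mixing a scale with its constituent variables.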
Summary features of the model given by DEf include R2, tests on presence of nonlinear rather than linear effects, normality and homoscedasticity in the residuals, and whether coefficients of excluded variables are nonsignificant. Models with larger R2, however, are not necessarily the best, so MCMCpack regression comparison scripts like MCMCregress may prove useful.

NoVA models are more complex than single dependent variable DEf in that they link DEf results into networks of related variables. This can be done with or without tests for exogeneity between the error terms of component DEf models. These tests do not alter a network of variables but are used to correct for endogeneity (a variety of Galton’s problem) among variables in the constituent DEf models.

Linking different NoVA diagrams with shared Variables

*+* (=Eff) The Companion editors are skeptical in general about inferring causation from synchronic cross-cultural data even in the DEf (Dow-Eff functions) approach or NoVA with DEf. There is much more structure in these graphs, however, than in factor analysis, scaling models and correspondence analysis. They allow hypotheses to be more clearly focused about what may be temporal processes, and they may allow competing hypotheses to be brought to the fore even in the manner in which “multifocal” mappings of results pose alternatives. In this chapter “causal” has been a word largely used by Reiss for his Diagram 1. A nice example of causality, however, might be provided in the example about the relationship between inheritance rules and post-marital residence, and how a direction of causality was found for a specific society (the Chuukese) by examining detailed genealogies and residence choices. For ethnology, one society at a time, using diachronic data, may be one way causality can be established (Vayda and Walters 2001), or through a whole database with diachronic data (Turchin et al. 2012).
On the other hand, visualization methods like NoVA can help form good hypotheses about causality. This reflects views common to the Companion editors. The Appendix (Causal Graphs) shows the state of the art in evaluating causal processes in medical experiments with databases of contexts, treatments and outcomes. The DAGitty software illustrated, however, can also be used for drawing NoVA diagrams although they may contain no causal inferences but simply DEf regression results. NoVA diagrams are not recommended for regression coefficients other than DEf because of the lack of controls for autocorrelation. *+*The Companion editors do concur in an interest in accumulating the results of interrelated NoVA models containing clusters of DEf models estimated on the CoSSci Supercomputer Gateway and their usefulness in online-course sites. These clusters may give insight into structural/functional or other relationships and provide a kind of metaanalysis, given higher-order statistical tests for path analysis. A logical next step would be to try to apply meta-analysis methods—R packages meta, rmeta, and psychometrics. Further new approaches can be very valuable, and NoVA representations of CCR modeling could be valuable for future research insights. NoVA is a useful visualization technique that can help frame research agendas in cross-cultural research. But it requires good data (Handwerker 2011) as inputs and good models, and it would provoke controversy to describe such diagrams as depicting causal relations. *+*That said, NoVA is in the end dependent for its value on the estimations that are fed into it. A poorly specified model will provide biased estimators and a misleading NoVA graph. It is important to think carefully about the ethnographic context and relationships among the variables when constructing each separate model with a given dependent variable and how such models fit together. 
The model diagnostics provide some good feedback as to whether or not the model is well specified, but that is not enough. For example, both omitted variables and/or controlling for intermediate/alternative outcomes cause estimated coefficients to be biased, and neither of these problems has a clear diagnostic test. Nevertheless, one can imagine that models produced on the portal can be selected based on the diagnostics, and a researcher can then select from this smaller set, so that the results for the best models can be used in NoVA.

A key feature of signed NoVA Diagrams (like Diagram 4) is that maximal sign-consistent clusters are defined by a positive product of signs along statistically significant DEf paths within the cluster. A consistent cluster for Diagram 4 is defined by v51 (Mother Uninvolved), v53 & v54 (Father Distant), v621 (Husband Dominant), v626 (Beliefs in Female Inferiority), v628 (Women do not control their own property), v826 (Female Contribution to Subsistence), v664 (Machismo), and v676 (Male Origin Symbols). Hunting large game does not fit this pattern in Diagram 4 because of its positive relationship with v628 (Women’s control of their property) but significant negative relationship (r=-.18, p < .05) with v51 (Mother’s helpers). Nor does large game hunting correlate positively with husbands who treat their wife with equality, eat meals with her regularly, or regularly sleep in the same bed (variables v615, v752, and v750).

Conclusion

DEf (Dow-Eff functions) are transformative for cross-cultural research but *+*(=Eff) the editors have no wish to oversell NoVA. It requires good models as inputs. It is a useful visualization technique that can help frame research agendas in cross-cultural research but it would provoke controversy to describe Diagrams 3 or 4, for example, as depicting causal relations.
NoVA studies with “augmented” datasets (e.g., STCS/SCCS), however, are capable not only of rejecting models but replacing them with improved understandings of theoretical and empirical issues tested at various levels of comparison or various additional context variables. A principal objective of DEf, NoVA, and other approaches covered in this chapter and the Wiley Companion book is to help improve the quality of existing and future CCR by examining major and minor comparative studies when they are supplemented by DEf solutions to missing data and Galton’s problem and further advanced approaches to modeling. Further use and improvement of NoVA and other models that use DEf methods facilitate restudies of any of the thousands of publications that use SCCS or other open databases. Ethnographic studies themselves cannot be so easily replicated as can cross-cultural research but both will grow in importance as more is gleaned through innovative methods of data collection and analysis. Ethnographers should distrust CCR studies when they clearly contradict what can be learned directly in ethnographic field studies. The middle section of this Chapter has explored in detail a previous study (Reiss 1986) that exemplifies this problem. Reiss claimed an evolutionary “causal” ordering of variables in what was clearly an anomalous attempt to explain multisequential “origins” of Beliefs in Female Inequality. The “theory” conjured up ethnographic settings unsupported by known ethnographic cases or theoretical understandings, and the author may have read little of the primary ethnography and have little knowledge of prior work with the SCCS comparative dataset. His work was guided by imagining ways that correlations among variables in the SCCS might be interpreted.
Rather like a student taking on a project in a CCR class, an exploration of the modeling problems related to the actual contexts of Reiss’s work (agriculture, plow, gender roles, gender of parenting, etc.) was undertaken as a NoVA project to understand alternatives to Reiss’s theory and modeling. It started with a review of the literature on these topics and a review of variables in the SCCS codebook, followed by use of standard exploratory tools in R. This provided preliminary knowledge about the variables that could be employed to ground exploratory research on ethnographic domains within which sets of variables were related. These explorations used empirical correlations, significance tests, cross-tabs, and regressions but without imputing missing data or controls for autocorrelation, leaving open the questions of whether some correlations were spurious (inflated) or underestimated. To explore and design an expanded Networks of Variables Analysis (NoVA) approach, a preliminary theoretical model was outlined. The approach was to (1) start not with a dependent variable but antecedent variables, early in time, such as the rise of agriculture, (2) include understandings that had been omitted by Reiss, such as the importance of the plow in agriculture and its effect on gender relations, (3) consider a broader set of variables dealing with gender relations and gender issues, and (4) consider variables to help interpret the contexts of relationships among variables in a statistical model.
The approach to the modeling was then expanded to exemplify (5) a pragmatic focus on establishing robust and relevant contexts for theoretical inferences, (6) much more accurate regression results when combined with the Dow-Eff (DEf) statistics that control for autocorrelation and estimate missing data with probabilistic inferences, and (7) a much broader set of variables for use in multiple regressions in building a NoVA model capable of pinpointing context effects that help in testing hypotheses and testing results. The results showed how context variables could aid in making interpretations from statistical data. This example of a research process aims at far better outcomes from CCR than those that rely on correlations alone. The complete replacement of Reiss’s (1986) correlation-based model, for example, resulted in one that is far more sensitive to ethnographic context and relational inferences about gender issues. The inclusion of direct or indirect regression effects through NoVA models provides conceptualization of relevant variables that help to explicate and support the usefulness of tailoring comparative analysis toward inclusion of context variables that contribute to specific insights into how structural/functional relationships might conceivably operate, rather than relying only on an investigator’s imagination. The modeling strategy employed in this new NoVA project helped to focus on one of Reiss’s “causal” variables (v51, Non-Maternal Relationships in Infancy), which he had thought to involve female “Mother’s helpers” in infant care, and showed clearly that these “female helpers” were extremely unlikely contributors to beliefs in “Female Inferiority,” as Reiss had inferred. As opposed to what Reiss had only assumed, “others providing care” for the mother’s offspring, the NoVA approach that generated Diagram 4 showed clearly that something other than female weakness was at work in the role of fathers.
That is, the focus of Reiss’s modeling in his Diagram 1 was restricted to interpreting a context of mothers giving less care to their infants even while fathers were closely engaged with their children at a later age. Father/Child Closeness (v54) was thus misinterpreted by Reiss as part of a male superiority/female inferiority complex associated with lesser infant care by mother (and thought by Reiss to imply weak maternal care generally). Opening the larger context of a NoVA study supplied the ground to identify Reiss’s inferences as invalid. The more detailed NoVA modeling that supplies more of the ethnographic context along with other variables gave a new image of relations of husbands and wives across the full range of variability in the 186 SCCS society sample. Tables 2, 3B, 4 and Diagrams 3 and 4 showed that it was not other women but fathers who were likely to be “Mother’s helpers.” Fathers, in addition to being “close” to their children, were shown in the context of the NoVA diagram to tend, statistically, to treat their wife with equality, eat meals with her regularly, and regularly sleep in the same bed, as measured by explicit coded variables v615, v752, and v750 (defined in the SCCS codebook). One could infer with little doubt that husbands were very likely to be helping their wives with infant care, negating the core of Reiss’s assumptions. This finding led to exploration, in the fuller NoVA study, of the relation of v621, “No Husband Dominance,” as a predictor of both “Father/Child Closeness” and “Belief in Female Equality” in DEf regression analysis with controls for Galton’s problem of over- or under-estimation of regression effects. Reiss’s mistaken inferences, then, could not be clearer, and alterations of his model could hardly be more robust statistically.
Further, the SCCS Guttman scale measure of levels of deference of wife to husband, SCCS Variable v615, examined in the NoVA regressions, was not examined in Reiss’s original study, even though it and the “Belief in Inferiority” variable were coded by Whyte (1978). This was a major oversight. The NoVA study found that it predicted a lack of Father’s Closeness that, given context variables, pointed directly to Father as among the “Mother’s helpers.” That is, equality of Husband and Wife proved to be associated with Father’s closeness to their children. Thus, Father would normally be a “Mother’s helper” with infants when there is equality of husband and wife. The measure of equality was a DEf predictor of “Belief in Equality of Women.” The point of NoVA modeling is not to extend indefinitely the study of indirect effects (like v621 and others) for their own sake but to examine and identify the contexts of mutuality among multiple variables that predict central dependent variables, such as origins of “Belief in Female Inferiority” versus “Equality”, v621. As an alternative to Reiss’s correlational approach, which is more like interpreting patterns in a Rorschach test, NoVA diagrams can give clarity about links among variables. For example, “Father Closeness to Child”, once understood as linked to “Mother’s helpers,” “Closeness of Husband and Wife” and “Lack of Husband Dominance”, leads to further inferences about egalitarian family systems in this cluster of variables. In this example, a new NoVA approach to a previously published model resulted in the need to relabel Reiss’s independent variable (simply for clarity), remove others that lacked statistical significance given controls for autocorrelation, and add a number of variables that contributed to understanding a new and very different model that shows coherent prosociality in child training, gender roles, and beliefs in egalitarian family systems with close husband/wife/infant/child families.
Instead of an improperly tested and poorly conceptualized misogynistic theory of a conspiracy of women aligning in “patriarchies” that devalue women with women’s consent and an assumption of widespread lack of caring by women for their own children – thought to accompany Beliefs in Female Inferiority – the broader and better conceptualized NoVA model gives radically different results. Hopefully, some of the approaches exemplified here will become part of those kinds of research that help to establish more trust and convergence of CCR with ethnographic knowledge. What will undoubtedly occur, beyond the NoVA example in Diagrams, is that as databases begin to be analyzed with STCS types of data and tests for reliability, robustness, exogeneity of variables (and hopefully, cross-validation or replication with other datasets), better models will begin to be developed in the CCR literature. What is opened up with the combination of DEf and NoVA approaches is the realization that every past as well as future CCR study with accessible databases is open to review, retesting, revision, and replacement of models.

The importance of relational contexts in cross-cultural studies is a key element in responding to the postmodern critique that CCR deals only with features taken out of context. Networks of Variables, expanding on constituent DEf regression models, help to establish relational interpretations and context.9

9 The same is true for understanding Galton’s problem as study of the effects of spatial networks, of common histories and processes of interdependence between societies.

In NoVA Diagram 4, for example, the variables and arrows of the diagram show v621 “Husband’s Dominance” as a probable direct cause of “Beliefs in Female Inequality” and v51 as both a direct and mediated cause, along with v54. Logical and intuitive expressions of relationships among variables in the model match much of what we know about specific types of societies.
These contextualized relationships contrast with the basis for Reiss’s interpretations, grounded on very limited data from which he argued incorrectly that “lack of help from others” in Mother’s care of her infants was a debility and weakness complicit with patriarchy. To the contrary, the more careful NoVA investigation of linkages among SCCS variables provide strong evidence pointing to help from Fathers in egalitarian family systems as part of a prosocial family organizational pattern that supports belief in Female Equality. This chapter goes into ethnographic detail as reflected by more intensive use of coded cross-cultural data. It does not argue that CCR forms a discipline with assumptions that separate it from other disciplines. Rather, like the transcience concept of disciplinary integration (e.g., Krakauer 2011), NoVA is defined as an approach not by the object of study (given features and names and intrinsic “features” as in postmodernist critiques of CCR) but by identifying analogous structures and dynamics, recognizing that all living organisms have evolved rules for interpreting signals and their meanings and deploy strategies based on context and networks of meaningful communication. Biologists, like anthropologists, linguists, and many others, recognize that variables that are linked through context do affect meanings and likewise our understanding of meanings compared across different cultures. Theories of interpretation of “how meaning and behavior interact” change with new insights gained from studies based on both observation and the emergence of new theories of interpretation. The most useful approach to CCR is not based on blindly “comparing” a collection of traits across cultures but on recognition and analysis of complex relationships within and between cultures and their environments. 
The title of Binford’s opus on foragers begins, appropriately: “Constructing frames of reference....” CCR need not be condemned to “butterfly collecting,” notwithstanding Edmund Leach’s view of decades ago.10 Advances in comparative research benefit, possibly massively, from complex statistical approaches shared with other sciences and from insights on how to progress in analyses as in the example studied here. DEf combined with NoVA modeling can be shared broadly with anthropologists, sociologists and others in the human and environmental disciplines.

Simple insights emerged in the studies undertaken as examples of NoVA constructions in this chapter. What seemed like difficult choices about what independent variables to include in successive DEf models, how they should be linked, the stability of DEf results and how to ground modeling in basic theoretical understandings gave way to practical techniques (follow “to try”) and to full attention to the understanding of constructs (“follow the Bayesian order of things”).

10 Leach’s (1976) Radcliffe-Brown lecture gave homage (Pinney 2001) in “a famously barbed attack on Radcliffe-Brown's butterfly collecting ("at best defective, at worst fraudulent") which elaborates a critique he had been developing since the late 1950s. The bombast is wearisome, but the analysis of structural-functionalism's reliance on impossibly bounded cultural isolates remains vividly fresh and resonant with anthropology's ongoing struggle to live inside the fragments of the prison house of culture that it has created.”

Postscript: Rapid discovery

Turchin, Whitehouse, Francois, Slingerland, Collard (2012:285-286) are consonant with the views expressed here about new developments in Cross-Cultural Research: “Some years ago Randall Collins (1994) pointed out that the natural sciences are typically characterized by rapid discovery of new phenomena and a high degree of consensus among the practitioners, once the research front has moved away.
For example, biologists generally agree that Darwin’s version of the evolutionary theory has decisively won over that of Lamarck. The social sciences, by contrast, exhibit low levels of consensus even on core issues. Each new generation of scholars is as likely to reject the ideas of their predecessors as to endorse them. Collins argued that rapid discovery and high consensus are related: “high consensus results because there is higher social prestige in moving ahead to new research discoveries than by continuing to dispute the interpretation of older discoveries.” Although Collins was skeptical about the ability of social science to break out of this mold and transform itself into a rapid-discovery science, we think that he was unduly pessimistic. Consider the anthropological database called the Standard Cross-Cultural Sample, or SCCS (Murdock and White 1969). The SCCS codes 186 cultures for a great variety of social, economic, and political variables. The introduction of this database was a truly transformative event in cross-cultural research. It held out the prospect of transforming cultural anthropology into rapid-discovery science characterized not by cyclic development (in which each new generation rejects the insights of their elders) but by a cumulative growth of knowledge. The SCCS made knowledge accumulation possible in at least two ways. First, although Murdock and White initially coded only a few dozen variables, over the last four decades other researchers added hundreds of additional variables. 
The total count currently approaches 2000 variables.”

The construction of Diagram 5 in the “Integral NoVA” sections of this chapter illustrates a modeling approach with potential for rapid discovery: following the “totry” and “didwell” evaluations given by DEf leads to models that reach equilibrium for “best R-squared” and “best significance,” although these equilibria may be only local and goodness-of-fit will require testing with Bayesian statistics.

Appendix (Graph Drawing for Science 2.0).

As of February 2013, graph-drawing for a subset of meaningful links in a NoVA directed acyclic graph (DAG) is provided by the DAGitty http://www.dagitty.net/ web program (Textor, Hardt, and Knüppel 2011) with instructions at “launch” and “How to....” (Diagram 5). The on-line program can distinguish potential causal paths from biasing paths (as in Diagram 6), as in diagrams of contexts of exposures to agents, exposures, and outcomes in medical experiments. Diagram 5 shows the opening dynamic DAGitty screen with instructions for making a DAG. Once a DAGitty graph is constructed, as in Diagram 6, it can be saved with screen shots. The examples at http://intersci.ss.uci.edu/wiki/index.php/DAGitty#From_DAGitty show a variety of Mac screen shots saved as *.png files that can be inserted into a Word document or wiki.

Diagram 5: The DAGitty graph-construction page, with instructions

Diagram 6: V626 Graph constructed by DAGitty graph-construction

A key notion for directed acyclic graphs is the moral graph (Wikipedia: Moral graph), formed by connecting nodes that have a common child and then making all edges in the graph undirected to form the moralized counterpart. Equivalently, a moral graph of a directed acyclic graph G is an undirected graph in which each node of the original G is now connected to its Markov blanket. The name stems from the fact that, in a moral graph, two nodes that have a common child are required to be married by sharing an edge.
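The moralization step described above can be written directly on an adjacency matrix. The following base R sketch uses a hypothetical four-node graph (a -> c, b -> c, c -> d) that is not one of the chapter's diagrams; the function name moralize() is introduced for illustration and is not part of DAGitty.

```r
# Adjacency matrix of a small DAG: A[i, j] = 1 means an arrow from i to j.
nodes <- c("a", "b", "c", "d")
A <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
A["a", "c"] <- 1
A["b", "c"] <- 1
A["c", "d"] <- 1

moralize <- function(A) {
  co_parents <- A %*% t(A)          # entry [i, k] counts children shared by i and k
  M <- (A + t(A) + co_parents) > 0  # drop direction and add "marriage" edges
  diag(M) <- FALSE                  # no self-loops
  M
}

M <- moralize(A)
M["a", "b"]   # TRUE: a and b share the child c, so they are "married"
M["a", "d"]   # FALSE: a and d share no child and have no arrow between them
```

The result is a symmetric logical matrix; here it contains the three original links plus the one added edge between a and b.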
ACKNOWLEDGEMENTS

Thanks to Halbert White and Judea Pearl for the impetus to begin this project, to Malcolm Dow and Anthon Eff for providing an R platform on which to build, to Kenneth Koput for elements of its final statistical completion, and to Scott White and members of our SFI working group for help in experimentation. The project would not have been possible without support of the Santa Fe Institute and its faculty and postdocs through a series of Workshops for project members, meeting at SFI on Mar. 26 - Apr. 1, 2012, Jun. 16 - July 2, 2011, and Aug. 20 - Sep. 5, 2010. Thanks to all the members of the SFI-sponsored workshops: Peter Turchin, John Snarey, Henry Wright, Scott White, B. Tolga Oztan, Giorgio Gosti, Elliott Wagner, and Ren Feng. Thanks to the Leipzig Max Planck Institute for Mathematics in the Sciences and its Director Jürgen Jost, Professor Nihat Ay, and their Institute members for support of our working group meeting (Jun. 16 - July 2, 2011), including Tolga Oztan, Giorgio Gosti, and Ren Feng. Starting in 2012 the project received great impetus from Suresh Marru, who gave initial guidance in Science Gateways; from Robert Sinkovits and Nancy Wilkins-Diehr at UCSD’s San Diego Supercomputer Center; from Thomas Uram at Argonne Labs, for the programming to establish the CoSSci Supercomputer Gateway; and from John Saska and Francisco Lopez at UCI’s OIT, for implementation of a Galaxy Server on which to install the CoSSci Gateway.

REFERENCES (many are now obsolete and will go into another chapter)

Alesina, Alberto, Paola Giuliano, and Nathan Nunn. 2012. On the Origins of Gender Roles: Women and the Plough. Forthcoming in The Quarterly Journal of Economics, 2013. http://www.econ.northwestern.edu/seminars/Nemmers11/Giuliano.pdf
Barry, Herbert, III, and Leonora M. Paxson. 1971. Infancy and Early Childhood: Cross-Cultural Codes 2. Ethnology 10:466-508. Reprinted in Herbert Barry III and Alice Schlegel, Eds. 1980. Cross-Cultural Samples and Codes. Pittsburgh: University of Pittsburgh Press.
Barry, Herbert, III, and Alice Schlegel. 1980. Cross-Cultural Samples and Codes. Pittsburgh: University of Pittsburgh Press.
Batagelj, Vladimir, and Andrej Mrvar. 2003. Pajek - Analysis and Visualization of Large Networks. In M. Jünger and P. Mutzel (Eds.), Graph Drawing Software, pp. 77-103. Berlin: Springer.
Bernard, H. Russell (Ed.). 1998. Handbook of Methods in Cultural Anthropology. Walnut Creek, CA: Rowman & Littlefield.
Binford, Lewis R. 2001. Constructing Frames of Reference: An Analytical Method for Archaeological Theory Building Using Hunter-Gatherer and Environmental Data Sets. University of California Press.
Borland, Dolores M. 1975. An Alternative Model of the Wheel Theory. The Family Coordinator 24(3): 289-292. http://www.jstor.org/stable/583179.
Braudel, Fernand. 1998. Mediterranean in the Ancient World. London: Allen Lane.
Broude, Gwen, and Sarah J. Greene. 1976. Ethnology 15:409-429.
Brown, Christian, and E. Anthon Eff. 2010. The State and the Supernatural: Support for Prosocial Behavior. Structure and Dynamics: eJournal of Anthropological and Related Sciences 4(1) art 1. (eJournal).
Burton, Michael L., and Karl Reitz. 1981. The Plow, Female Contribution to Agricultural Subsistence and Polygyny. Cross-Cultural Research 16(3 & 4): 275-304.
Carpi, Anthony, and Anne Egger. 2011. The Process of Science, Revised Edition. Process of Science Series.
Coltrane, Scott. 1992. The Micropolitics of Gender in Nonindustrial Societies. Gender & Society 6(1): 86-107. URL http://gas.sagepub.com/content/6/1/86.short
Croissant, Yves, and Giovanni Millo. 2007. plm: Linear Models for Panel Data. R Package. URL http://CRAN.R-project.org. http://cran.r-project.org/web/packages/plm/plm.pdf
Croissant, Yves, and Giovanni Millo. 2008. Panel Data Econometrics in R: The plm Package. Journal of Statistical Software 27(2):1-43. http://www.jstatsoft.org/v27/i02/paper
Divale, William. 1976. Female Status and Cultural Evolution: A Study in Cultural Evolution. Cross-Cultural Research 11:169-212.
Dow, Malcolm M., and E. Anthon Eff. 2009. Multiple Imputation of Missing Data in Cross-Cultural Samples. Cross-Cultural Research 43(3): 206-229. (eJournal).
Dow, Malcolm M., and E. Anthon Eff. 2013. Basic Concepts in Cross-Cultural Research. Chapter 1, Wiley Companion to Cross-Cultural Research. Douglas R. White, E. Anthon Eff, Malcolm M. Dow, and J. Patrick Gray, Eds. Blackwell Wiley Press.
Eff, Anthon, and Malcolm M. Dow. 2009. How to Deal with Missing Data and Galton's Problem in Cross-Cultural Survey Research: A Primer for R. Structure and Dynamics: eJournal of Anthropological and Related Sciences 3(2): 223-252. (eJournal).
eHRAF. 2013. Human Relations Area Files: Cultural information and research. eHRAF World Cultures. http://www.yale.edu/hraf/collections
Ember, Melvin. 1971. An Empirical Test of Galton’s Problem. Ethnology 10: 98-106.
Ember, Carol R., and Melvin Ember. 1998. Cross-Cultural Research. Chapter 17, pp. 595-687, in H. Russell Bernard (Ed.), Handbook of Methods in Cultural Anthropology. Walnut Creek, CA: Rowman & Littlefield.
--------- 2001. Cross-Cultural Research Methods. Walnut Creek, CA: AltaMira Press.
Frayser, Suzanne G., and Thomas J. Whitby. 1995 (2nd edition). Studies in Human Sexuality: A Selected Guide. Libraries Unlimited, Inc.
Garson, David. 2012. Path Analysis. Statistical Associates Blue Book Series. Asheboro, NC: Statistical Associates Publishing.
Geertz, Clifford. 1983. From the Native's Point of View: On the Nature of Anthropological Knowledge. Chapter 3, Local Knowledge: Further Essays in Interpretive Anthropology. New York: Basic Books.
Goodenough, Ward H. 1970. Description and Comparison in Cultural Anthropology. The Lewis Henry Morgan Lectures. Alfred. Harris.
Handwerker, W. Penn. 2011. How to Collect Data that Warrant Analysis. Pp. 117-130, Chapter 7, in David B. Kronenfeld, Giovanni Bennardo, Victor de Munck, and Michael Fischer, Eds. Blackwell's Companion to Cognitive Anthropology. Oxford: Wiley-Blackwell.
Harris, Marvin. 1976. History and Significance of the Emic/Etic Distinction. Annual Review of Anthropology 5: 329-350.
Henningsen, Arne, and Jeff D. Hamann. 2007. systemfit: A Package for Estimating Systems of Simultaneous Equations in R. Journal of Statistical Software 23(4): 1-40. URL http://www.jstatsoft.org/v23/i04/ (eJournal).
Holm, Sture. 1979. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6(2): 65–70. http://www.ams.org/mathscinet-getitem?mr=538597.
Hupka, Ralph B., and James M. Ryan. 1990. The cultural contribution to jealousy: Cross-cultural aggression in sexual jealousy situations. Cross-Cultural Research 24: 51-71.
Kemper, Robert V., and A. Peterson Royce (Eds.). 2006. Chronicling Cultures: Long Term Field Research in Anthropology. Walnut Creek, Lanham, N.Y.: AltaMira Press.
Korotayev, Andrey, and Victor de Munck. 2003. “Galton’s Asset” and “Flower’s Problem”: Cultural Networks and Cultural Units in Cross-Cultural Research. American Anthropologist 105: 353–358.
Krakauer, David. 2011. Transcience Disciplines and the Advance of Plenary Knowledge. SFI Bulletin volume 25.
Kronenfeld, David B., Giovanni Bennardo, Victor de Munck, and Michael Fischer, Eds. 2011. Blackwell's Companion to Cognitive Anthropology. Oxford: Wiley-Blackwell.
Leach, Edmund. 1976. The British Academy Radcliffe-Brown Lecture: “Social Anthropology: a Natural Science of Society?” London: Proceedings of the British Academy 62.
Leaf, Murray. 2007. Empirical formalism. Structure and Dynamics 2(1):804-824. (eJournal).
--------- 2009. Human Organizations and Social Theory: Pragmatism, Pluralism, and Adaptation. University of Illinois Press.
Levinson, David. 1991-1996. Encyclopedia of World Cultures. Berkshire Encyclopedia of World Cultures. Vols. 1-10. Boston, MA: G. K. Hall. Note: Levinson's Encyclopedia of World Cultures Volume 10:78-322 has in the Volumes 1-9 of Col22 thousands of coded variables for the societies in Col23. URL http://eclectic.ss.uci.edu/~drwhite/worldcul/Sccs33B.htm
Loftin, Colin. 1972. Galton's problem as spatial autocorrelation: Comments on Ember's empirical test. Ethnology 11:425–35.
Martin, Andrew D., Kevin M. Quinn, and Jong Hee Park. 2011. MCMCpack: Markov Chain Monte Carlo in R. Journal of Statistical Software 42(9): 1-21. (eJournal).
Murdock, G. P., and Diana O. Morrow. 1970. Subsistence Economy and Supportive Practices: Cross-Cultural Codes 1. Ethnology 9:302-330. Reprinted in Herbert Barry III and Alice Schlegel, Eds. 1980. Cross-Cultural Samples and Codes. Pittsburgh: University of Pittsburgh Press.
Murdock, George P., and Douglas R. White. 1969. Standard Cross-Cultural Sample. Ethnology 8(4):329-369. URL http://www.jstor.org/stable/3772907 (new edition 2008).
Naroll, Raoul. 1964. On Ethnic Unit Classification. Current Anthropology 5(4): 283-312.
Paige, Karen Ericksen, and Jeffery M. Paige. 1982. The Politics of Reproductive Ritual. Berkeley:
Pearl, Judea. 2009 (1st edition 2000). Causality: Models, Reasoning, and Inference. Cambridge: Cambridge University Press.
Pinney, Christopher. 2010. The Essential Edmund Leach. Times Higher Education 7 December 2001.
Pryor, Frederic. 1985. The Invention of the Plow. Comparative Studies in Society and History 27(4): 727-743.
Pryor, Frederic L. 2005a. Rethinking Economic Systems: A Study of Agricultural Societies. Cross-Cultural Research 39(3): 252–292. URL http://ccr.sagepub.com/cgi/content/abstract/39/3/252.
Pryor, Frederic L. 2005b. Economic Systems of Foraging, Agriculture, and Industrial Societies. New York: Cambridge University Press. University of California Press.
Quine, Willard V. 1973. The Roots of Reference. La Salle, IL: Open Court Publishing.
Reiss, Ira L. 1960. Premarital Sexual Standards in America. The Free Press.
Reiss, Ira L. 1983. "Trouble in Paradise: The Current Status of Sexual Science," The Journal of Sex Research 18(2): 97-113.
Reiss, Ira L. 1986. Chapter 4, “The Power Filters: Gender Roles,” in Journey into Sexuality. URL http://www2.hu-berlin.de/sexology/Reiss3/html/ch4.htm.
Sanday, Peggy R. 1981. Female Power and Male Dominance: On the Origins of Sexual Inequality. Cambridge: Cambridge University Press.
SCCS Codebook. 2012 (periodically updated, multiple authors). URL http://eclectic.ss.uci.edu/~drwhite/courses/SC-C-Codes.htm.
Skyhorse, Patricia. 2003. Residence on Romanum Revisited. Ph.D. Dissertation. Social Networks, University of California, Irvine.
--------- 1998. Adoption as a Strategy on a Chuukese Atoll. The History of the Family 3(4): 429-439.
Snarey, John R. 1996. The natural environment's impact upon religious ethics: a cross-cultural study. Journal for the Scientific Study of Religion 35(2): 85-96.
Textor, Johannes, Juliane Hardt, and Sven Knüppel. 2011. DAGitty: A Graphical Tool for Analyzing Causal Diagrams. Epidemiology 22(5):745. (eJournal).
Turchin, Peter. 2005. Dynamical Feedbacks between Population Growth and Sociopolitical Instability in Agrarian States. Structure and Dynamics 1(1) (eJournal). http://escholarship.org/uc/search?entity=imbs_socdyn_sdeas;volume=1;issue=1
Turchin, Peter, Harvey Whitehouse, Pieter Francois, Edward Slingerland, and Mark Collard. 2012. A Historical Database of Sociocultural Evolution. Cliodynamics 3(2) (eJournal).
Vayda, Andrew P., and Bradley B. Walters, Eds. 2001. Causal Explanations for Social Scientists: A Reader. Lanham, Maryland: AltaMira Press.
White, Douglas R. 1989. Focused Ethnographic Bibliography for the Standard Cross-Cultural Sample. Cross-Cultural Research 23(1-4): 1-145. URL http://ccr.sagepub.com/content/23/1-4/1.abstract URL http://eclectic.ss.uci.edu/~drwhite/worldcul/SCCSbib.pdf (early edition).
White, Douglas R. 2007. Standard Cross-Cultural Sample. International Encyclopedia of the Social Sciences, 2nd edition. Vol "S":88-95. New York: Macmillan Reference. URL http://intersci.ss.uci.edu/wiki/pub/IntlEncyStdCross-CulturalSample.pdf.
White, Douglas R. 2013. List of SCCS societies and eHraf/Hraf Files. (Including those of the Levinson Encyclopedia of World Cultures). URL http://eclectic.ss.uci.edu/~drwhite/worldcul/Sccs33B.htm accessed at: URL http://eclectic.ss.uci.edu/~drwhite/courses/SC-C-Codes.htm.
White, Douglas R., B. Tolga Oztan, Giorgio Gosti, Elliott Wagner, and John Snarey. 2013. Discovery of Hidden Variables for the Evolution of Ethical Religions. Unpublished. URL http://intersci.ss.uci.edu/wiki/pdf/2sLS-isReOrg10a.pdf (early version)
White, Douglas R., et al. 1985-2013. The multi-authored codebook for the SCCS and the datafiles for SCCS have been available free in the online publication World Cultures at URL http://eclectic.ss.uci.edu/~drwhite/worldcul/world.htm and at URL http://eclectic.ss.uci.edu/~drwhite/courses/SC-C-Codes.htm.
Whyte, Martin K. 1978. The Status of Women in Preindustrial Society. Princeton, NJ: Princeton University Press.
Wright, Sewall. 1921. Correlation and causation. J. Agricultural Research 20: 557–585.
Wright, Sewall. 1923. The theory of path coefficients: A reply to Niles’ criticism. Genetics 8: 239–255.
Wright, Sewall. 1934. The method of path coefficients. Annals of Mathematical Statistics 5: 161–215.