Multilevel networks and world ethnography

Doug White and UC team
UCI Human Complexity Seminar, 1:30, Fri, Sept 24, 2010
UCI grads and undergrads can sign up for 1.33 credits: SOC SCI 240A SEM A (72100)
Project team: UCI_imbs + UCSD_econ + UCLA_cs
Scott D. White, UCI, One Spot
B. Tolga Oztan, MBS
Ren Feng, Xi’an Jiaotong Univ.
Karim Chalak, BU Econ
Halbert White, UCSD
Doug White, MBS
Assist from Judea Pearl, UCLA
Tony Eff, MTSU Econ
Comparing ethnographic data
• The early, best-described ethnographies give us our best chance of understanding the evolution of societies and cultures.
• We now have large samples, N = 186, 390, 1400, ….
• And many variables: V = 2000+ (SCCS), 500+ (foragers), 150 (EthnoAtlas), ….
• The problem is that correlations between variables have no meaning by themselves.
• Historical interactions have huge effects: splitting, branching, merging, borrowing, migrating, colonizing, conquering. We still have, among these ethnographic cases, living societies like the Hadza, with the genetic stock of humanity’s common ancestors of 150,000 years ago, or the San, with the next split of 120,000 years ago, etc.
• It’s not a matter of separating historical network interactions and regional similarities from functional and causal relations (Harold Driver’s struggle with Kroeber).
• It’s that the statistics for doing so have been too weak to support these inferences, and the concept of inference itself has been too weak.
Inference, Statistics, Causality
• Statistical inference has to do with replicability, or change, given changing conditions. In survey data, it’s hard to get replication because of changes in:
– composition of the sample: location, composition
– peer effects and interactions within the sample
– time period
• So results will vary with different (e.g., cross-cultural) samples.
• Looking for invariance is like Norm Schofield’s question, “What are the causal variables for how people vote,” other than the names and parties of the candidates? I.e., no proper nouns in the variables for causality.
• When peer effects operate in the sample, significance levels are exaggerated (type 1 error). Cross-cultural studies are full of type 1 error.
• Random sampling (the Ember and Bernard solution) does not solve this problem, although means and correlations may be estimated correctly.
• But correlations will vary widely with changes in the sample or with time. Replication is thwarted by finding differences between replications when none exist (type 2 error).
• A stronger concept of inference would deal with causation, not correlation.
• This began with structural equation models (SEM; Sewall Wright 1921, 1935) and continues with Judea Pearl et al.’s extensions of graphical methods.
Where we are now: *Rccs*
• The new program package *Rccs* takes into account network effect matrices (distance, language, etc.)
• Computes regression coefficients
• R code for 2SLS implemented for classroom use (see PDFs attached to the intersci wiki talk)
• Computes causal graph models (in development)
• E.g., computes total effects (adding indirect paths)
• Chalak and H. White extension: reciprocal effects, x-->y plus y-->x-->y, with the latter counted as an indirect path
Solving Galton’s problem with 2SLS: 2-Stage OLS
2-stage ordinary least squares regression with peer effects
• Stage 1 OLS: calculate the “instruments” for the peer effects
• Stage 2 OLS: include the “instruments” with the independent variables
[Diagram: instruments I and IW I (peer effects); independent variables X1, X2, X3; Y dependent variable]
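The two stages above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Rccs code: a single independent variable x stands in for X1-X3, W is one row-normalized peer-effect matrix, the instruments are [1, x, Wx], and the helper names (solve, ols, lag, two_stage_ls) and the ring-shaped W in the usage lines are hypothetical. Only point estimates are returned. Valid 2SLS standard errors need the correction mentioned later in the slides.

```python
# Minimal 2SLS sketch for a peer-effects ("network lag") regression.
# Assumed model (illustration only, not the Rccs/Rsccs code):
#     y = rho * (W y) + beta * x + error
# W is row-normalized; W y is endogenous, so stage 1 replaces it with its
# OLS fit on the instruments [1, x, W x].


def solve(a, b):
    """Solve the square system a v = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * v[c] for c in range(r + 1, n))
        v[r] = (m[r][n] - tail) / m[r][r]
    return v


def ols(columns, y):
    """OLS coefficients for the regressor columns via the normal equations X'X b = X'y."""
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in columns]
    return solve(xtx, xty)


def lag(w, v):
    """Peer-effect lag W v: each case's weighted average over its peers."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def two_stage_ls(w, x, y):
    """Return [intercept, rho, beta] point estimates (no 2SLS standard errors)."""
    ones = [1.0] * len(y)
    wx, wy = lag(w, x), lag(w, y)
    # Stage 1: regress the endogenous W y on the instruments [1, x, W x].
    g = ols([ones, x, wx], wy)
    wy_hat = [g[0] + g[1] * xi + g[2] * wxi for xi, wxi in zip(x, wx)]
    # Stage 2: regress y on [1, fitted W y, x].
    return ols([ones, wy_hat, x], y)


# Usage with hypothetical data: 12 cases on a ring, each with two equally weighted neighbours.
n = 12
w = [[0.5 if abs(i - j) in (1, n - 1) else 0.0 for j in range(n)] for i in range(n)]
x = [float((7 * i) % 5 + i % 3) for i in range(n)]
y = [0.0] * n
for _ in range(200):  # iterate y = 0.4 * W y + 2.0 * x to its fixed point (noise-free)
    y = [0.4 * wyi + 2.0 * xi for wyi, xi in zip(lag(w, y), x)]
coef = two_stage_ls(w, x, y)
print([round(c, 6) for c in coef])
```

Because the usage data are generated without noise, the stage 1 residual is orthogonal to every stage 2 regressor. The estimates therefore recover rho = 0.4 and beta = 2.0 up to rounding error. With real data, the instruments only need to be correlated with Wy and uncorrelated with the error term.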
Causal graph, Pearl’s regression method
“Say for three variables you are trying to estimate the direct
effect c of X on Z given an indirect effect of Y. The causal
diagram model gives you a license to do it by the regression
method, where, for example (for reference on the pdf)
[Diagram: X → Y (path a), Y → Z (path b), X → Z (path c)]
$$c = \frac{E(y \mid x, z) - E(y \mid x', z)}{x - x'} \qquad (1)$$
Controlling for the change from x to x′, E(y|x, z) and E(y|x′, z) are the changes in variable Z due to unit changes in X controlling for Y.” (email from Pearl; see Pearl 2000:151, 368; Chalak and White 2010). Because x, z in E(y|x, z) form a joint distribution, eqn (1) means that the change x → x′ changes y, which, through the x-y-z path considered as a joint distribution, changes z. From this it follows, given the single-door criterion (Pearl 2000:150), that c + a·b = r_xy.z, the coefficient for the total effect of X on Z.
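For the linear case, the decomposition of the total effect into c + a·b can be checked numerically. The sketch below assumes a linear structural model with X → Y (a), Y → Z (b), and X → Z (c), and no confounders. The coefficient values a = 0.5, b = 0.8, c = 0.3 and the helper names are hypothetical. Here a, b, and c are estimated as ordinary regression slopes, which is a simplification of the graphical criteria cited above.

```python
import random


def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    return sxy / sxx


def two_slopes(x, y, z):
    """OLS slopes (c, b) of z on x and y together (with intercept)."""
    n = len(x)
    mx, my, mz = sum(x) / n, sum(y) / n, sum(z) / n
    xc = [p - mx for p in x]
    yc = [p - my for p in y]
    zc = [p - mz for p in z]
    sxx = sum(p * p for p in xc)
    syy = sum(p * p for p in yc)
    sxy = sum(p * q for p, q in zip(xc, yc))
    sxz = sum(p * q for p, q in zip(xc, zc))
    syz = sum(p * q for p, q in zip(yc, zc))
    det = sxx * syy - sxy ** 2
    return (syy * sxz - sxy * syz) / det, (sxx * syz - sxy * sxz) / det


# Hypothetical linear structural model: X -> Y (a), Y -> Z (b), X -> Z (c), no confounders.
a, b, c = 0.5, 0.8, 0.3
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(20000)]
y = [a * xi + rng.gauss(0, 1) for xi in x]
z = [c * xi + b * yi + rng.gauss(0, 1) for xi, yi in zip(x, y)]

a_hat = slope(x, y)                 # X -> Y
c_hat, b_hat = two_slopes(x, y, z)  # direct X -> Z and Y -> Z, each controlling for the other
total = slope(x, z)                 # Z on X alone: the total effect
print(round(total, 3), round(c_hat + a_hat * b_hat, 3))  # equal up to floating-point error
```

In any sample, the slope of Z on X alone equals c_hat + a_hat·b_hat exactly; this is ordinary regression algebra. Reading that number as the *causal* total effect is what requires the graph assumptions, and here it lands near c + a·b = 0.7.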
Comparing causality in ethnographic data
• R package *Rsccs* takes into account any number of peer effects as instruments in the previous equations, which allows further causal analysis
• Regressions change with time periods
• Correlate total effects to X → Y (time-lagged correlations)
• Regressions that yield causal results can be identified by Pearl’s (2000) single-door, back-door, and front-door causal graph criteria. Some graphical structures may require that some potential confounders be blocked, and that direct, indirect, and total effects be computed from regression coefficients without those confounders.
Language families as an instrument for measuring peer effects
[Figure: language tree; multilevel effect]
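One way to turn a language tree into a peer-effect matrix can be sketched as follows. The weighting rule is a hypothetical illustration, not the *Rccs* specification. Each case has a classification path (family, branch, subgroup), and two cases receive a raw weight equal to the number of leading levels they share, so closer relatives count more. Rows are then normalized to sum to 1. The labels F1, B1, S1, etc. are placeholders, not real language groups.

```python
def language_weights(paths):
    """Row-normalized peer-effect matrix from language classifications.

    paths[i] is a tuple of classification levels, e.g. (family, branch, subgroup).
    Hypothetical multilevel rule: the raw weight between cases i and j is the
    number of leading levels they share; the diagonal is zero. A case with no
    relatives in the sample keeps a row of zeros.
    """
    n = len(paths)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = 0
            for level_i, level_j in zip(paths[i], paths[j]):
                if level_i != level_j:
                    break
                shared += 1
            w[i][j] = float(shared)
        total = sum(w[i])
        if total > 0:
            w[i] = [v / total for v in w[i]]
    return w


# Hypothetical classifications (placeholders, not real language families).
paths = [("F1", "B1", "S1"), ("F1", "B1", "S2"), ("F1", "B2", "S3"), ("F2", "B3", "S4")]
for row in language_weights(paths):
    print([round(v, 3) for v in row])
```

The first case puts weight 2/3 on its same-branch relative and 1/3 on its same-family relative. The isolate in family F2 gets no language peers.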
Spatial distances as an instrument for measuring peer effects
[Figure: Standard Cross-Cultural Sample map (SCCS Wikipedia maps by Tony Eff), with Afro-Eurasia drawn to a slightly smaller scale; multilevel effect: spatial]
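A distance-based peer-effect matrix can be sketched in the same way. The assumptions are labeled in the code: great-circle (haversine) distances between case coordinates, inverse-squared-distance weights, a 1 km floor for co-located cases, and row normalization. These are illustrative choices, not necessarily those used by *Rccs*, and the coordinates are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    h = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


def distance_weights(coords, power=2.0, min_km=1.0):
    """Row-normalized inverse-distance peer matrix (zero diagonal).

    The inverse-distance form, the exponent, and the 1 km floor for co-located
    cases are illustrative assumptions, not the Rccs specification.
    """
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = haversine_km(*coords[i], *coords[j])
                w[i][j] = 1.0 / max(d, min_km) ** power
        total = sum(w[i])
        if total > 0:
            w[i] = [v / total for v in w[i]]
    return w


# Hypothetical coordinates (latitude, longitude) for three cases.
coords = [(0.0, 0.0), (0.0, 1.0), (0.0, 10.0)]
w = distance_weights(coords)
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 2))
print([round(v, 3) for v in w[0]])
```

A neighbour ten times closer gets one hundred times the weight under the squared-distance assumption. The exponent controls how local the spatial peer effect is.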
World-system peer effects of exchange as instruments
[Figure: folded image of the world system: Core, Semi-periphery 1, Semi-periphery 2, Periphery 1, Periphery 2; multilevel effects: world system]
A structurally endogamous kinship network core of a Turkish nomad clan (White and Johansen 2005: 379; 76-79)
[Figure: multilevel effects: internal networks; up and down effects]
13 linked regressions out of 2000+ SCCS variables
http://eclectic.ss.uci.edu/~drwhite/courses/SCCCodes.htm
Nodes are variables in regression analyses of variables from the Standard Cross-Cultural Sample of 186 societies (SCCS).
Lines represent the effects of independent variables; they point down to 13 dependent variables in successive colored layers.
Black lines are positive effects and red lines negative effects, from regression results.
Node colors show depth in a causal hierarchy, with net effects estimated as causal graphs (Pearl 2000).
At level 4 the Evil eye dependent variable has a triangular relationship with money and milked domestic animals.
The regressions control for peer effects of spatial transmission (distance) and cultural transmission (language phylogeny),
incorporated as Instrumental Variables in a second-stage regression, with the IVs estimated in a first-stage regression.
Node sizes reflect the significance of spatial transmission peer effects. Language effects are sometimes negative.
Paired visual comparison of spatial distributions
v1189 Belief in evil eye
v238 Moral gods == 4
v238 HIGH GODS (number of societies per code):
18 . = Missing data
68 1 = Absent or not reported
47 2 = Present but not active in human affairs
13 3 = Present and active in human affairs but not supportive of human morality
40 4 = Present, active, and specifically supportive of human morality
NOTE the circum-Mediterranean overlap with Evil eye (previous slide)
Paired visual comparison of spatial distributions
v1189 Belief in evil eye (dichotomy)
Large nodes red; small nodes orange
v155 True money == 5
v155 SCALE 7: MONEY (here, an independent variable; number of societies per code):
77 1 = None
14 2 = Domestically usable articles
43 3 = Alien currency
27 4 = Elementary forms
25 5 = True money
NOTE the circum-Mediterranean overlap with Evil eye (previous slide)
Paired visual comparison of spatial distributions
v1189 Belief in evil eye
v272 Caste stratification
v272 CASTE STRATIFICATION (ENDOGAMY) (two cases have secondary castes; number of societies per code):
5 . = Missing data
(154) 0 = Absent or insignificant (omitted from map)
17 1 = Despised occupational group(s)
3 2 = Ethnic stratification
7 3 = Complex
NOTE the circum-Mediterranean overlap with Evil eye (previous slide)
Paired visual comparison of spatial distributions
v1189 Belief in evil eye
v245 Milked animals
NOTE the circum-Mediterranean overlap with Evil eye (previous slide)
v1189 Belief in evil eye
R2 = 0.513; N = 186; 10 imputations; standard errors and R2 adjusted for two-stage least squares. Language nonsignificant (p > .33). No effect of Islam or Christianity.
Diagnostics: some nonlinear relationships; no additional variables; error terms homoskedastic but not normally distributed; no cultural lag in error terms; no spatial lag in error terms.
v155 Money
R2 = 0.490; N = 186; 10 imputations; standard errors and R2 adjusted for two-stage least squares. Distance (p < .00002) and language significant (p < .003).
Diagnostics: no nonlinear relationships; some additional variables; error terms homoskedastic and normally distributed; no cultural lag in error terms; no spatial lag in error terms.
v238 Moral gods
R2 = 0.504; N = 186; 10 imputations; standard errors and R2 adjusted for two-stage least squares. Distance significant (p < .00001); language insignificant (p > .15).
Diagnostics: no nonlinear relationships; no additional variables; error terms approximately homoskedastic and not normally distributed; no cultural lag in error terms; no spatial lag in error terms.
Transmission effects (Galton’s problem): spatial and cultural peer effects
Spatial transmission (distance):
  Money: coef .960, p = .0000009
  Moral gods: coef .824, p = .0000014
  Evil eye: coef .767, p = .000002
Cultural transmission (language):
  Money: coef -.988, p = .002
  Moral gods: coef -.672, p > 0.14
  Evil eye: coef -.228, p > 0.36
Negative peer effects for language indicate that, for each of these dependent variables, there is a tendency (strong for Money, weak for the other two variables) NOT to result from cultural tradition but from innovation that differentiates the societies with Money, Moral gods, and Evil eye from the norms of their respective language families. This tendency is nearly significant (p < 0.15) for societies with Moral gods.
Excluding peer effects: causal graph with multiple triangles of regression coefficients; numbers are the regression coefficients
[Figure: nodes A Milking animals (v245), B Money (v155), C Evil eye (v1189), D Moral gods (v238), E Caststrat LGd (caste stratification); edge coefficients -0.393, 0.484, 0.102 (p < 0.14), 0.294, 0.792, 0.430, 0.597, 0.664, 1.372]
Causal graph total effects and regression slopes
Independent variable → Dependent variable: net effects (direct + indirect) = total causal graph effect
Money → Evil eye: 0.597 = 0.597
Moral gods → Evil eye: 0.294 + (0.102*0.597) = 0.355
Milking → Evil eye: 0.664 + (-0.393*0.597) + (0.484*0.104*0.597) = 0.744
Moral gods → Money: 0.102 = 0.102
Milking → Money: -0.393 + (0.484*0.102) = -0.344
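The direct-plus-indirect sums in the table follow Sewall Wright's path-tracing rule: multiply the coefficients along each directed path and add over paths. The small sketch below assumes that the six edge coefficients are read off the table's direct terms. This is an assumed reading of the graph, not a re-estimation. It reproduces the rows for Money → Evil eye, Moral gods → Evil eye, Moral gods → Money, and Milking → Money.

```python
# Edge coefficients taken from the direct-effect terms in the table above
# (an assumed reading of the causal graph; names shortened).
EDGES = {
    ("Milking", "Money"): -0.393,
    ("Milking", "Moral gods"): 0.484,
    ("Milking", "Evil eye"): 0.664,
    ("Moral gods", "Money"): 0.102,
    ("Moral gods", "Evil eye"): 0.294,
    ("Money", "Evil eye"): 0.597,
}


def total_effect(edges, source, target):
    """Sum, over all directed paths from source to target, of the product of the
    path coefficients (Wright's path-tracing rule for a linear acyclic model).
    Assumes the graph has no cycles; a cycle would recurse forever."""
    if source == target:
        return 1.0
    return sum(coef * total_effect(edges, head, target)
               for (tail, head), coef in edges.items() if tail == source)


for src, dst in [("Money", "Evil eye"), ("Moral gods", "Evil eye"),
                 ("Moral gods", "Money"), ("Milking", "Money")]:
    print(f"{src} -> {dst}: {total_effect(EDGES, src, dst):.3f}")
# prints 0.597, 0.355, 0.102 and -0.344 for the four pairs
```

Enumerating paths recursively is fine for graphs of this size. Larger causal graphs would call for a topological-order computation instead.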
THESE CAUSALITIES A-E-D-B-C ARE TRANSITIVE, all significant or nearly so, and completely ordered, but the arrow from A to B is NEGATIVE.
A Milked domestic animals
E Caste stratification
D Moral gods (to money only p < .15)
B Money
C Evil eye
A 2-slide example for two time periods is next, if time allows (package *Rccs* applies to time series and includes multiplicative interactions as well)
Causal analysis: Transformation predictions from the Indian Jajmani system to the market system
Data source: Martin Orans (1968). Maximizing in Jajmaniland: A Model of Caste Relations. American Anthropologist 70(5): 875–897.
Peer effect regression, time 1. Correct time 2 predictions match causal inferences (temporal predictions about changes are even stronger).
[Figure: regression panels with R2 = .672, R2 = .623, R2 = .747; p = .067, .055, .05, .067]
Causal graphs may incorporate multiplicative or interaction effects, which Martin Orans uses in his 1968 article. These are diagrammed as follows:
[Figure: Orans’s interaction diagrams linking power concentration, isolation, ritual-secular correlation, and the Jajmani system]
None of these models were significant, however, compared with the simple linear additive effects that we tested and found significant (Ren Feng, T. Oztan, D. White).
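Orans's multiplicative effects correspond to product terms in a regression. The minimal sketch below assumes both predictors are dichotomized, with a hypothetical 0/1 coding of, e.g., high power concentration and high isolation. The rows are made-up illustrative cases, not Orans's data. In the saturated 2×2 model, the interaction coefficient b3 equals the difference in differences of the four cell means.

```python
from statistics import mean


def interaction_effects(rows):
    """Coefficients (b0, b1, b2, b3) of y = b0 + b1*x1 + b2*x2 + b3*x1*x2 for binary x1, x2.

    With both predictors coded 0/1 this model is saturated (four parameters,
    four cells), so OLS reproduces the four cell means and b3 is their
    difference in differences. Every cell must contain at least one case.
    """
    cells = {(p, q): [] for p in (0, 1) for q in (0, 1)}
    for x1, x2, y in rows:
        cells[(x1, x2)].append(y)
    m = {cell: mean(ys) for cell, ys in cells.items()}
    b0 = m[(0, 0)]
    b1 = m[(1, 0)] - m[(0, 0)]
    b2 = m[(0, 1)] - m[(0, 0)]
    b3 = m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)]
    return b0, b1, b2, b3


# Hypothetical cases: (power concentration high?, isolation high?, outcome score).
rows = [(0, 0, 1), (0, 0, 3), (1, 0, 4), (0, 1, 2), (1, 1, 9), (1, 1, 7)]
print(interaction_effects(rows))  # b3 = 4: the joint effect exceeds the sum of the separate effects
```

Comparing such a model against the purely additive one (b3 fixed at 0) is the kind of test described above. With continuous predictors, the product term is simply added as an extra regression column.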
Further slides, if time allows, show kinds of analysis different from that of *Rccs*
Other kinds of cross-cultural data structures and analyses:
• Statistical entailment analyses: society sets for variables tend to form chains of sets.
• Galois duality lattices (concept lattices): society sets for variables tend to form chains of sets and intersections, with an opposite ordering of the sets of variables, which also tend to form chains of sets.
[Figure: variable sets VS1, VS2, VS3, VS4 and society sets A, B, C, D]
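The chain structure that entailment analysis looks for can be sketched with plain set inclusion. In this hypothetical illustration, not White's exact procedure, feature a entails feature b when every society with a also has b, optionally allowing a few exceptions for the statistical version. Dropping the pairs implied by transitivity then leaves the edges of the entailment diagram. The society sets below are made up to form a chain like VS1-VS4.

```python
def entailments(society_sets, max_exceptions=0):
    """Pairs (a, b) where feature a entails feature b: every society with a also has b,
    allowing up to max_exceptions counterexamples (0 gives exact set inclusion).
    Features present in no society are skipped."""
    pairs = []
    for a, sa in society_sets.items():
        for b, sb in society_sets.items():
            if a != b and sa and len(sa - sb) <= max_exceptions:
                pairs.append((a, b))
    return sorted(pairs)


def cover_pairs(pairs):
    """Drop pairs implied by transitivity, keeping the edges of the entailment diagram.
    Assumes no two features have identical society sets (mutual entailment)."""
    s = set(pairs)
    nodes = {v for pair in pairs for v in pair}
    return sorted((a, b) for a, b in s
                  if not any((a, c) in s and (c, b) in s for c in nodes - {a, b}))


# Hypothetical society sets forming a chain, as in the VS1-VS4 figure.
features = {
    "VS1": {"s1"},
    "VS2": {"s1", "s2"},
    "VS3": {"s1", "s2", "s3"},
    "VS4": {"s1", "s2", "s3", "s4"},
}
print(cover_pairs(entailments(features)))  # [('VS1', 'VS2'), ('VS2', 'VS3'), ('VS3', 'VS4')]
```

With max_exceptions above 0 the relation need not be transitive, so the reduced diagram should be read as a summary rather than a strict order.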
Intrasocietal network structure overlaid on genealogies
For each society, these structures will define new variables such as:
1) sidedness: reciprocal marriage between opposite sides
2) structurally endogamous groups
3) a census of marriage types compared against random simulation
4) the distribution of structural features over generations
Multilevel analysis, e.g., regional or world-system effects on local societies.
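Item 2 can be made concrete under one assumption: that a structurally endogamous group is identified with a biconnected component (bicomponent) of the genealogical network. A bicomponent is a maximal set in which any two members lie on a common cycle, so the group is knit together by repeated intermarriage. The graph form is also an assumption here: nodes stand for couples or individuals, and edges stand for parent-child or marriage links. The sketch uses the standard Hopcroft-Tarjan algorithm, not the authors' software (Pajek), and the example graph is hypothetical.

```python
def bicomponents(adj):
    """Biconnected components (as node sets) of an undirected simple graph.

    adj maps each node to an iterable of neighbours. Recursive depth-first
    search with an edge stack (Hopcroft-Tarjan); single-edge components
    (bridges) are returned too and can be filtered out by size. Very large
    genealogies may need an iterative version or a higher recursion limit.
    """
    index, low = {}, {}
    edge_stack, comps = [], []

    def dfs(u, parent):
        index[u] = low[u] = len(index)
        for v in adj[u]:
            if v == parent:
                continue
            if v not in index:  # tree edge
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= index[u]:  # u separates v's subtree: pop one component
                    comp = set()
                    while True:
                        edge = edge_stack.pop()
                        comp.update(edge)
                        if edge == (u, v):
                            break
                    comps.append(comp)
            elif index[v] < index[u]:  # back edge to an ancestor
                edge_stack.append((u, v))
                low[u] = min(low[u], index[v])

    for node in adj:
        if node not in index:
            dfs(node, None)
    return comps


# Hypothetical graph: two cycles joined by a single link.
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
cores = [sorted(comp) for comp in bicomponents(adj) if len(comp) > 2]
print(sorted(cores))  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

The single link c-d comes back as a two-node component and is filtered out. Only the two cycles count as candidate endogamous cores.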
Fig. 3: An exact world entailment digraph for the sexual division of labor
[Figure panels: Late Task A and Early Task B, each split into Female and Male]
Fig. 3: An exact world ethnographic lattice of kin avoidances has a four-dimensional partial ordering of distributions: 1) parents of husband and wife (opposite/same sex, within circles), 2) siblings and siblings-in-law of husband and wife (opposite/same sex, in parallelograms), 3) opposite-sex siblings, parents’ siblings, and parallel cousins (White 1995). Lower types of avoidance entail the features of upper ones in perfect inclusion relations, found by statistical entailment analysis (White 1999b). For the 250 societies, the names attached to each node show which societies have each subset of avoidance relations.
Table 1
[Table 1: case samples by analysis tool or task. Rows (number of cases and dataset): 80 KinSources (1); 400 foragers (2) (Binford & Boehm); 85 World-system (3); 1294 Atlas (4); 186 SCCS (5); 28 1945-1965 (6); 30 Post-1965 (7); 28 (SCCS); 308 (eSCCS); country data. Columns: Pajek, Repast Simulation, Cohesion, Peer Effects, ArcGIS.com, New Codes, New Ethnogr. Cases. Cells are marked X.]
1 http://kinsource.net/kinsrc/bin/view/KinSources archives kinship network data contributed by anthropologists. Only three KS ethnographies remain to be converted from paper-based genealogies to e-networks for analysis with Pajek, but others will be added.
2,5 Binford’s (2001) Constructing Frames of Reference forager database has been spreadsheeted by Boehm and Hill. Non-foragers from the SCCS will be analyzed separately. Extensive testing of “peer effects” methods has established their validity.
3 Smith and White (1992) have postwar world-system (WS) commodity-flow time series at 5-year intervals; capital and migration flows will be added.
4 Murdock’s Ethnographic Atlas (EA) in SPSS format has been supplemented by newly authored installments 30-31.
5 Murdock and White’s (1969) Standard Cross-Cultural Sample dataset on 186 societies, in SPSS and R formats, has coded data contributions from 80+ different authors on 2008+ variables. Citations to SCCS are now 95+/year and growing.
6 109 missing codes for 28 SCCS variables (1006-1115) will be coded for 28 SCCS societies on the world-system impact variables partially coded in White and Burton’s (1985-1988) NSF 8507685-funded research on “World-Systems and Ethnological Theory.”
7 To bring the SCCS up to date with post-1965 societies, 30 well-described post-1965 ethnographic cases will be added to an (expanded) eSCCS and coded for EA variables and the CDC Cultural Diversity Codebook of 180 SCCS variables.
8 Given that the SCC Sample was published in 1969, the eSCCS additions to the sample will bring it up to date temporally. This will allow study of world-system impacts on 37 well-described ethnographic cases in the contemporary post-war period.
Fig. 1.A. Gmap of 100+ recent trouble-spot study cases from Cultural Survival (2010): Gmaps extend to networks at the global level, with clicks drilling into cases at the local level. Live: http://bit.ly/c1funC
Fig. 1.B. This Google map tracks cases of swine flu in 2009. Types of cases are color coded, fatal cases have no dot, and clicking a region gives a more detailed map of cases within the region.
Similarly, Wolf (1982) drills down into several hundred ethnographic data points to analyze how commodity exchange affected indigenous societies in the 1500-1980 period of overseas conquest and modern world-systems.
Interactive maps allow drilling down from a network at one level (the network spread of disease is not shown here) by clicking a node to see a more detailed map or a network within that node. The upper-level nodes can be societies, with their organizational networks reached by clicking a given node.