Fundamental Building Blocks of Social Structure

Honoring Peter Killworth’s
contribution to social network
theory
Southampton, Sept. 28, 2006
The network scale-up team





Peter D. Killworth (SOC)
Christopher McCarty (U Florida)
Gene A. Shelley (Georgia State U)
Eugene Johnsen (UC-Santa Barbara)
H. Russell Bernard (U Florida)
Some background: “I’ll have a go at that”
(Scripps, 1972).



I asked everyone on a ship to rank-order their
interactions with all the others.
I came to the physics department’s coffee
break and asked, “Anybody here want to know
the social structure of a vessel that gets all
your data?”
The ocean-going physicists in the room knew
they weren't supposed to talk to people like
me and didn't even look up.


Peter hadn’t gotten the memo about
social scientists and said he thought it
might be fun.
And that’s what it’s been, for 34 years
and 40-odd papers later …
How to get at the structure of these data?
“Let’s try this …”



Peter applied an algorithm from F.S.
Acton’s (then) recent book “Numerical
Methods that (Usually) Work” ...
The algorithm had been developed to
solve a traffic problem: how to get
from point A to point B fastest,
irrespective of the number of red lights
on the path.
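The slide doesn't say which of Acton's algorithms Peter used, so the following illustrates only the traffic problem itself and is not a reconstruction of his method. It swaps in Dijkstra's shortest-path algorithm, which minimizes total travel time rather than the number of stops. The street grid and the function name `fastest_route` are hypothetical.

```python
import heapq

def fastest_route(graph, start, goal):
    """Return (total_time, path) for the quickest route from start to goal.

    graph maps each node to a list of (neighbor, travel_time) pairs.
    Edges are weighted by time, so a route with more stops (red lights)
    can still win if its total time is lower.
    """
    queue = [(0, start, [start])]   # (time so far, node, path taken)
    best = {}                       # settled nodes -> shortest known time
    while queue:
        time, node, path = heapq.heappop(queue)
        if node in best:
            continue                # already reached more quickly
        best[node] = time
        if node == goal:
            return time, path
        for neighbor, cost in graph.get(node, []):
            if neighbor not in best:
                heapq.heappush(queue, (time + cost, neighbor, path + [neighbor]))
    return float("inf"), []         # goal unreachable

# Hypothetical grid: A->B directly is slow; A->C->D->B has more stops but is faster.
streets = {
    "A": [("B", 10), ("C", 2)],
    "C": [("D", 2)],
    "D": [("B", 2)],
}
print(fastest_route(streets, "A", "B"))  # (6, ['A', 'C', 'D', 'B'])
```

The route through C and D passes more lights but still wins, which is the point of weighting by time rather than by hops.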
Visualizing the messy result.
The prison studies




We combined numerical methods with
ethnography.
The cliques always made sense, until one day
…
Three numerically tied inmates whose
connections made no apparent sense:
different crimes, North and South, rural and
urban, Black and White.
Finally, finally an artifact ….
Peter: “This is too easy.”


We discovered that physicists don’t
apply their models to social structure
and anthropologists don’t test the error
bounds of their instruments.
We were half-way on this one, so we
started the accuracy studies.
How to study accuracy?

We studied groups whose real communication
could be unobtrusively monitored and whose
members we could ask questions like: "So, in
the last [day], [week], [month], who did you
talk to in this group?"





Deaf people on TTYs
Ham radio operators in a local network
An early e-mail group
An office
A fraternity
Half of what people tell you is incorrect


People don’t recall behaviors that did
occur and recall behaviors that didn't
occur.
People aren’t lying. They’re just terrible
behaviorscopes.
Extending (or redefining) the problem



We asked: are the instruments for gathering
data about human behavior producing
accurate measurements of human behavior?
Others used our data and asked: what do
those instruments produce a valid
measurement of?
Answer: If you ask people who they interact
with, people retrieve who they usually
interact with and report who they ought to
interact with, given everything they already
know about their place in the social structure.
Next, the small world…


Milgram’s famous small-world experiment told
us that there are 5.5 links between any two
white people in the U.S. and exactly one
more link between any white and any black
person in the U.S.
But these numbers do not tell us anything
about the structure of the society.
Peter: “Let’s find out how the SW
actually operates”

Show people a list of SW targets,
complete with information about
location, occupation, hobbies, and
organizations.


Ask people to tell us their first link in a
small-world experiment.
Repeat 500 times and analyze the
information needed by people to make
their choice of a first link.
The reverse small world
experiments



We ran six of these experiments in the U.S.,
in Micronesia, and in Mexico.
Things that people in the U.S. find useful for
the task (name, location, occupation,
hobbies, organizations) are the same things
that people in other cultures need to know to
place someone in their network.
For both of us, the cross-cultural regularity
discovered in this series of experiments is
among the most exciting results of our work.


We created a similarity matrix between
targets: how many people used the
same choice for a given pair of targets?
A 2-D multidimensional scaling (MDS) of this
matrix shows the enduring influence of
Gerhard Mercator on schooling.
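A minimal sketch of the two steps on this slide, assuming the responses sit in a respondents-by-targets array of first-link choices (`choices`, hypothetical). Similarity is the count of respondents who named the same first link for a pair of targets. The slide doesn't say which MDS variant was used; this swaps in classical (Torgerson) MDS and assumes dissimilarity = maximum count minus count. Both are illustrative choices.

```python
import numpy as np

def target_similarity(choices):
    """choices[r, t] = the first link respondent r picked for target t.

    Returns S where S[a, b] counts respondents who named the same
    first link for targets a and b.
    """
    choices = np.asarray(choices)
    # Compare each respondent's choice for every pair of targets at once.
    same = choices[:, :, None] == choices[:, None, :]
    return same.sum(axis=0)

def classical_mds(similarity, dims=2):
    """Classical (Torgerson) MDS on dissimilarities derived from counts."""
    s = np.asarray(similarity, dtype=float)
    d = s.max() - s                      # similarity counts -> dissimilarities
    np.fill_diagonal(d, 0.0)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    b = -0.5 * j @ (d ** 2) @ j          # double-centered squared distances
    vals, vecs = np.linalg.eigh(b)       # symmetric eigendecomposition
    order = np.argsort(vals)[::-1][:dims]
    scale = np.sqrt(np.clip(vals[order], 0, None))
    return vecs[:, order] * scale        # one row of coordinates per target
```

With `choices = [[1, 1, 2], [3, 3, 3], [4, 5, 5], [6, 6, 7]]` (four hypothetical respondents, three targets), `target_similarity` returns `[[4, 3, 1], [3, 4, 2], [1, 2, 4]]`, and `classical_mds` turns that into one 2-D point per target.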
Finding the distribution of c


Our real objective, though, is to
understand the basic components of
social structure.
One quantity that seems important is
the number of people whom people
know.

We call this c
Network size … “It’s just one number”

From the first, Peter pushed us all to
learn more about the basic quanta:



How does network size vary, within and
across cultures?
What does the distribution look like?
Our first estimate, in 1978, for average
network size in the U.S. was 250.
Peter: “You have to start somewhere.”


And what was that 250?
It was the number of people whom the
residents of Morgantown, West Virginia, who sat
through this grueling 8-hour experiment
could call on as first links if Milgram had
shown up and asked them to participate in a
small-world experiment.
Deriving c from an assumption


Let t be the size of a population, and let e be
the size of some subpopulation within it.
We assume that the fractional size
p = e/t
of that subpopulation also applies to any
individual’s network, other things being equal.
That is, everyone’s network in a society
reflects the distribution of subpopulations in
that society.
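Written out in our own notation (c_i is individual i's network size and m_ij is how many people i knows in subpopulation j), the assumption says the expected count is just the subpopulation's share of i's network. Summing the expected counts over L known subpopulations and solving for c_i gives an estimator of the same form as the scale-up formula. This is a moment-style sketch, not the maximum likelihood derivation itself.

```latex
E[m_{ij}] = c_i \, \frac{e_j}{t}
\quad\Longrightarrow\quad
\sum_{j=1}^{L} m_{ij} \approx \frac{c_i}{t} \sum_{j=1}^{L} e_j
\quad\Longrightarrow\quad
c_i \approx t \, \frac{\sum_{j=1}^{L} m_{ij}}{\sum_{j=1}^{L} e_j}
```

For example, with c_i = 290 and p = .01, respondent i would be expected to know about 2.9 people in that subpopulation.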
The scale-up method to estimate c

To test this, we ask a representative
sample of people to tell us how many
people they know in many
subpopulations whose sizes are known:

e.g., diabetics, gun dealers, postal workers,
women named Nicole, men named Michael
People answer accurately

Now, assuming that people can and do
answer our question accurately,
a maximum likelihood estimate of
an individual’s network size is:

$$c_i = t \cdot \frac{\sum_{j=1}^{L} m_{ij}}{\sum_{j=1}^{L} e_j}$$
where there are L known subpopulations. (Here i is
the individual, who knows m_ij people in subpopulation j.)
Network size is (the sum of all the people you say
you know in some subpopulations of known size,
divided by the total size of those subpopulations)
times the population within which the subpopulations
are embedded.
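The sentence above translates directly into code. A minimal sketch; the function name and the example numbers are hypothetical, not survey data.

```python
def scale_up_network_size(known_counts, subpop_sizes, total_population):
    """Scale-up estimate of one respondent's network size c_i.

    known_counts[j]  = m_ij, how many people the respondent knows in subpopulation j
    subpop_sizes[j]  = e_j, the known size of subpopulation j
    total_population = t, the population the subpopulations belong to
    """
    if len(known_counts) != len(subpop_sizes):
        raise ValueError("need one reported count per subpopulation")
    # (people known in the groups / total size of the groups) * population
    return total_population * sum(known_counts) / sum(subpop_sizes)

# Hypothetical respondent: knows 3, 1 and 1 people in three groups of known size.
print(scale_up_network_size([3, 1, 1], [3_000_000, 1_200_000, 600_000], 300_000_000))  # 312.5
```

Pooling the counts before dividing, rather than averaging per-group ratios, means large groups carry more weight, which matches the verbal definition above.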
The estimates of c are reliable



This doesn’t deal with the big IF, but
across 7 surveys in the U.S., average
network size = 290 (sd 232, median
231).
The 290 is not an average of averages.
It’s a repeated finding.
And it’s almost certainly not an artifact
of the method.
Reliability I:


In one survey, we estimated c by asking
people how many people they know in each
of 17 relation categories – people who are in
their immediate family, people who are coworkers, people who provide a service – and
summing.
The summation method (due to Chris
McCarty) produced a mean for c of 290.
Reliability II: Change the data



We changed reported values at or above 5 to
exactly 5. The mean dropped to
206, a change of 29%.
We set values of at least 5 to a uniformly
distributed random value between 5 and 15.
We repeated this random change only for
large subpopulations (more than 1 million).
The mean increased to 402, a change of 38%,
in the opposite direction.
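A sketch of the two perturbations, assuming responses are stored as one row of counts per respondent. The slide doesn't say whether the random values were whole numbers; this version assumes integers drawn with `random.randint(5, 15)` (inclusive). The helper names and the data are hypothetical. Comparing the mean estimate before and after each change is the robustness check described above.

```python
import random

def mean_network_size(responses, subpop_sizes, t):
    """Mean scale-up estimate; each row holds one respondent's counts m_ij."""
    total_e = sum(subpop_sizes)
    estimates = [t * sum(row) / total_e for row in responses]
    return sum(estimates) / len(estimates)

def cap_at_five(responses):
    """Change every reported count of 5 or more to exactly 5."""
    return [[min(m, 5) for m in row] for row in responses]

def randomize_large(responses, subpop_sizes, rng, threshold=1_000_000):
    """In subpopulations larger than threshold, replace counts of 5 or more
    with a random whole number from 5 to 15 (inclusive); leave others as is."""
    return [
        [rng.randint(5, 15) if m >= 5 and e > threshold else m
         for m, e in zip(row, subpop_sizes)]
        for row in responses
    ]

# Hypothetical data: 3 respondents, 3 subpopulations, population of 300 million.
responses = [[2, 8, 0], [12, 1, 6], [5, 3, 20]]
sizes = [3_000_000, 500_000, 2_000_000]
t = 300_000_000
rng = random.Random(0)  # fixed seed so the run is repeatable
for label, data in [("original", responses),
                    ("capped", cap_at_five(responses)),
                    ("randomized", randomize_large(responses, sizes, rng))]:
    print(label, round(mean_network_size(data, sizes, t)))
```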
Reliability III: Survey clergy



We surveyed a national sample of 159
members of the clergy – people who
are widely thought to have large
networks.
Mean c = 598 for the scale-up method
Mean c = 948 for the summation
method
290 is not a coincidence



1. Two different methods of counting
produce the same result.
2. Changing the data produces large
changes in the results.
3. People who are widely thought to
have large networks do have large
networks.
Something is going on


This next slide shows the probability, for
two of our surveys, of knowing no one
in each of 29 populations of known size,
by the actual size of those populations.
The two distributions track, except for
the expected offset.
The distribution of c

Here is the graph of the distribution of
network size:
Reliability vs. validity



OK, it’s reliable. But if the model works, we
ought to be able to use it to estimate the size
of populations whose sizes are not known.
Create a maximum likelihood estimate for the
size of an unknown subpopulation based on
what all respondents tell us and our
estimates of their network sizes.
“Roughly speaking, inverting the previous
formula.”
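A minimal sketch of that inversion, assuming the pooled form common in the scale-up literature: add up how many members of the unknown group all respondents report knowing, divide by the sum of their estimated network sizes, and scale by t. The function name and numbers are hypothetical.

```python
def scale_up_subpop_size(known_counts, network_sizes, total_population):
    """Estimate the size e_u of a subpopulation whose size is unknown.

    known_counts[i]  = m_iu, how many members of the group respondent i knows
    network_sizes[i] = c_i, that respondent's estimated network size
    total_population = t
    """
    if len(known_counts) != len(network_sizes):
        raise ValueError("need one count and one network size per respondent")
    # Share of all reported contacts who are in the group, scaled to the population.
    return total_population * sum(known_counts) / sum(network_sizes)

# Four hypothetical respondents with network sizes estimated as above.
print(scale_up_subpop_size([0, 1, 0, 2], [250, 300, 200, 450], 300_000_000))  # 750000.0
```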
Can we predict what we know?


Test this by ‘predicting’ the size of 29
populations of known size.
The overall result is encouraging:
r = .79 … but note the outliers
Over- and under-estimation




The two largest populations are people
who have a twin brother or sister and
diabetics.
These are highly underestimated.
Without these two outliers, the
correlation rises from r = .79 to r = .94
“No cheating …”
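A sketch of the comparison on this slide: compute Pearson's r between known and predicted sizes, then again with the two largest populations dropped. The data are made up to mimic the pattern (two large, underestimated groups), so the values won't match the .79 and .94 above. The slide also doesn't say whether r was computed on raw or log-transformed sizes; this sketch assumes raw values.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def r_without(xs, ys, drop):
    """Recompute r after removing the points whose indices are in drop."""
    keep = [i for i in range(len(xs)) if i not in drop]
    return pearson_r([xs[i] for i in keep], [ys[i] for i in keep])

# Hypothetical sizes (millions): the last two groups are large and underestimated.
known = [1, 2, 3, 4, 5, 20, 25]
predicted = [1.1, 2.2, 2.8, 4.1, 5.2, 6.0, 7.0]
print(round(pearson_r(known, predicted), 2))
print(round(r_without(known, predicted, drop={5, 6}), 2))
```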
Stigma vs. not newsworthy


Being a twin or a diabetic is neither
stigmatizing nor newsworthy.
From Gene Shelley’s work, we know that
personal information about close co-workers
or business associates can take a decade or
more to be transmitted ... and in the case of
being a twin or a diabetic, may never be
transmitted.
Another encouraging result


Charles Kadushin ran a national survey to
estimate the prevalence of crimes in 14 cities,
large and small, across the U.S.
He asked 17,000 people to report the number
of people they knew who had been victims of
six kinds of crime and the number of people
they knew who used heroin regularly.

Here are the estimates for the number
of heroin users in each of the 14 cities,
along with the estimates from the UCR.


The fact that we track well with official
estimates means only that we have a
much, much less expensive way to get
at these estimates – not that the
estimates are correct.
And estimates of other crimes in those
14 cities did not track so well.
Reliability, validity, and accuracy

So, while definitely reliable and perhaps
valid, our estimate of network size (and
its distribution) is not sufficiently
accurate.
Compromising assumptions


1. Transmission effects: Everyone
knows everything about everyone they
know.
2. Barrier effects: Everyone in the
population has an equal chance of
knowing someone in any subpopulation.
Correlation between the mean number of Native Americans
known and the percent of the state population that is Native
American is 0.58, p = 0.0001.
Network social barriers





Race (Blacks may know more diabetics than
Whites do.)
Gender (Men may know more gun dealers
than women do.)
Even first names are associated with the
barrier effect.
We address the barrier effect by using a
random, nationally representative sample of
respondents.
However, using the method on specific
populations may still lead to incorrect
estimates.
The transmission effect

We asked people things about people
they knew … and then called up those
people to see how much people really
do know about their network members.
Some things are easy to get right



99% know their alters’ marital status.
People know how many children 89% of
their alters have.
98% know the employment status of
their alters.
Some things are harder to know


People say they know the state in which
70% of their alters were born, but only
57% of the reports (ego’s and alter’s)
agree on this.
People don’t know the number of
siblings their alters have 52% of the
time.
Some people withdraw


Gene Shelley found that people who are
HIV+ withdraw from their network in
order to limit the number of people who
know their HIV status.
Eugene Johnsen confirmed this by
showing that HIV+ people have, on
average, networks one-third the size
of the global average.
A theory of transmission bias

Take another look at the comparison of
the data from clergy and others:


It’s likely that you know at least one
Christopher (the probability of knowing NO
Christophers is close to zero).
Twins are likely to be underreported


Peter said: Assume that people report
correctly what they know but that what
they know is incorrect.
What would happen to the jaggedy
curve if people responded honestly to
correct information instead of honestly
to incorrect information?
How to adjust the x-axis rather
than the y-axis in the diagram?



Suppose that widows don’t tell half the
people they know that they are widows.
The .013 on the x-axis remains the
same, but the fraction that people would
be responding to would be .013/2.
To make the x-axis the effective size of
that population, we slide it to the left
while the y-axis remains the same.
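A minimal sketch of sliding the x-axis, assuming the binomial model implied by the equal-chance assumption: someone whose c contacts are a random draw knows nobody in a group that is a fraction p of the population with probability (1 - p)^c. Using a single c = 290 (the survey mean) for everyone is a simplification, and the helper names are hypothetical.

```python
def prob_know_none(p, c):
    """Chance that a random network of c people contains nobody from a
    group making up fraction p of the population (binomial model)."""
    return (1 - p) ** c

def effective_fraction(p, share_told):
    """Fraction of the population that contacts can recognize as group
    members when only share_told of a member's contacts are told."""
    return p * share_told

actual = 0.013                               # widows, as plotted on the x-axis
effective = effective_fraction(actual, 0.5)  # widows tell only half their contacts
c = 290                                      # mean network size from the surveys
print(f"p = {actual}: P(know none) = {prob_know_none(actual, c):.3f}")
print(f"p = {effective}: P(know none) = {prob_know_none(effective, c):.3f}")
```

Halving the effective size makes knowing no widows several times more likely, even though the reported data on the y-axis are unchanged.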
“The jaggedy line would go”


Of course, we have no idea what the
transmission error might be.
We do know that if the numbers remain
the same on the y-axis and we make up
the effective sizes on the x-axis, the
jaggedy line would go.


Peter did this analytically and computed
the predicted distribution of c.
The next slide shows that we may be
on the right track:
Peter’s (highly) unusual place in the
social sciences



Number of articles: 154
  In Social Science journals: 43
Total number of citations: 3194
  In Social Science journals: 456 (14%)
  In non-Social Science journals: 2738 (86%)
1981-2005 Overall Standard Category Baseline

Code Category description           Citations   Sources  Uncited   Mean
AGD  Agricultural Sciences            3026292    405216   110302   7.47
BID  Biology & Biochemistry          30591140   1289467   144133  23.72
CHD  Chemistry                       25653067   2160543   406171  11.87
CLD  Clinical Medicine               55909778   3637855   675163  15.37
CSD  Computer Science                  881307    162037    62040   5.44
EVD  Ecology/Environment              4375528    369984    71045  11.83
ECD  Economics & Business             2037455    230987    70664   8.82
EDD  Education                         271748     64763    24699   4.20
EGD  Engineering                      6389607   1155040   405542   5.53
GED  Geosciences                      5532230    419742    90315  13.18
IMD  Immunology                       8335481    263920    19389  31.58
LAD  Law                               305586     46858    16145   6.52
MSD  Materials Science                3705565    537936   166000   6.89
MTD  Mathematics                      1820446    303592    90453   6.00
MCD  Microbiology                     7234059    362533    37267  19.95
MBD  Molecular Biology & Genetics    15207323    418987    37618  36.30
OTD  Multidisciplinary                2557670    279654    80925   9.15
NED  Neurosciences & Behavior        15246539    581369    51923  26.23
PMD  Pharmacology                     5444675    384245    50634  14.17
PHD  Physics                         20291158   1820458   407636  11.15
PLD  Plant & Animal Science          10260550   1042930   212907   9.84
PSD  Psychology/Psychiatry            6277431    439095    83162  14.30
SSD  Social Sciences, general         3015451    484739   152641   6.22
ASD  Space Science                    3469032    181566    23561  19.11

Mean = citations per source.
• http://garfield.library.upenn.edu/histcomp/killworth-pd_citing/
(http://tinyurl.com/nmhdc)
• http://garfield.library.upenn.edu/histcomp/killworth-pd_auth/ (http://tinyurl.com/ppr82)