Phonological neighbors in a small
world: What can graph theory tell
us about word learning?
Michael S. Vitevitch
Department of Psychology
University of Kansas
NIH-NIDCD R03 DC 04259
NIH-NIDCD R01 DC 006472
1
Graph Theory
• Graphically represent complex systems
– Graph or Network
• Vertices or Nodes
• Edges or Links/Connections
– Examples of systems
• Diamond (crystals)
• WWW
• Power grid
• Interstate highway system
2
Ordered Graph
3
Random Graph
4
Graph Theory
• Between ordered and random graphs are
small-world graphs
– Small path length (Six Degrees of Separation)
– High clustering coefficient (relative to random)
• “Probability” of my two friends being friends with
each other.
• 0 to 1
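The "friends of friends" idea can be written compactly. The notation below is an assumption made for illustration (the slides do not give a formula): k_i is the number of neighbors of node i, e_i is the number of links that exist among those neighbors, and n is the number of nodes. This is the usual Watts & Strogatz style definition, shown as a sketch.

```latex
C_i = \frac{2\, e_i}{k_i\,(k_i - 1)}, \qquad
C = \frac{1}{n} \sum_{i=1}^{n} C_i, \qquad 0 \le C \le 1
```

C_i is 1 when every pair of a node's neighbors is linked and 0 when none are.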
5
Movie Actors
(Watts & Strogatz, 1998)
• Large and complex system
– 225,226 actors
• Internet Movie Database circa 1998
• Node = Actor
• Connection = Co-starred in a movie
6
Movie Actors
• Despite large size, the network of actors
exhibits small-world behavior
– Average of 3 links between any two actors
– Clustering coefficient (random) = .00027
– Clustering coefficient (actors) = .79
• Over 2000 times larger
7
Graph Theory
• Some small-world graphs also are scale-free.
– Degree = number of links per node
• These systems exhibit interesting characteristics
– Efficient processing
– Development/growth
– Robustness of the system to attack/failure
8
Graph Theory
• A randomly connected network has a bell-shaped degree distribution.
– Most nodes have the average number of links
– Few nodes have more links than average
– Few nodes have fewer links than average
• Scale = stereotypical node (characterized by mean)
9
Graph Theory
• In a network with a scale-free degree
distribution there is no stereotypical node.
– The degree-distribution follows a power-law
– Many nodes have few connections
– Few nodes have many connections
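As a sketch, the power-law form can be written as follows, where P(k) is the proportion of nodes with degree k. The exponent symbol γ is the conventional choice and is assumed here.

```latex
P(k) \propto k^{-\gamma}
\quad\Longrightarrow\quad
\log P(k) = -\gamma \,\log k + \text{const}
```

On log-log axes such a distribution falls on a straight line with slope -γ.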
10
Scale-free network
The power-law degree distribution of a scale-free network emerges as a result of:
• Growth
– New nodes are added to the system over time.
• Preferential Attachment
– New nodes tend to form links with nodes that are highly
connected.
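The two ingredients above can be illustrated with a small simulation. This is a minimal sketch of a generic growth-plus-preferential-attachment process, not the procedure used in the analysis reported here. The function name `grow_network` and its parameters are hypothetical.

```python
import random


def grow_network(n_nodes, links_per_node=1, seed=0):
    """Grow a network one node at a time (hypothetical helper).

    Each new node links to existing nodes chosen with probability
    proportional to their current degree (preferential attachment).
    """
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}   # start from two linked nodes
    edges = [(0, 1)]
    # A node appears in `stubs` once per link it has, so a uniform
    # draw from `stubs` selects nodes in proportion to their degree.
    stubs = [0, 1]
    for new in range(2, n_nodes):
        targets = set()
        while len(targets) < min(links_per_node, new):
            targets.add(rng.choice(stubs))
        degree[new] = 0
        for target in sorted(targets):
            edges.append((new, target))
            degree[new] += 1
            degree[target] += 1
            stubs.extend([new, target])
    return degree, edges


degree, edges = grow_network(500)
print("most connected node has", max(degree.values()), "links")
```

Running this with larger `n_nodes` tends to produce a few highly connected hubs and many nodes with a single link, which is the qualitative pattern described on the preceding slides.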
11
Implications of graph structure
• The structure of a network constrains the
processes that operate in the system and influences
how well the system withstands damage.
12
Viewing the mental lexicon as a graph
• Does the mental lexicon have a small-world structure?
• Does the mental lexicon have a scale-free
structure?
• How does the structure of the mental
lexicon influence various processes?
13
The mental lexicon as a graph
• Nodes = 19,340 word-forms in database
– Nusbaum, Pisoni & Davis (1984)
• Links = phonologically related
– One phoneme metric (Luce & Pisoni, 1998)
14
Links in the mental lexicon
• One phoneme metric
– cat has as neighbors
• scat, at, hat, cut, can, etc.
– dog is NOT a phonological neighbor of cat
• Steyvers & Tenenbaum (2003)
• Ferrer i Cancho & Solé (2001)
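The one-phoneme metric can be sketched in code: two words are linked if one can be turned into the other by substituting, adding, or deleting a single phoneme. The transcriptions below are hypothetical, simplified stand-ins (one string per phoneme), and `is_neighbor` and `build_graph` are illustrative names, not part of any tool cited here.

```python
from itertools import combinations


def is_neighbor(a, b):
    """True if phoneme sequences a and b differ by exactly one
    substitution, addition, or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ.
        return sum(x != y for x, y in zip(a, b)) == 1
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    # Different length: dropping one phoneme from the longer
    # sequence must yield the shorter one.
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))


# Hypothetical toy transcriptions, for illustration only.
LEXICON = {
    "cat": ("k", "ae", "t"),
    "scat": ("s", "k", "ae", "t"),
    "at": ("ae", "t"),
    "hat": ("h", "ae", "t"),
    "cut": ("k", "uh", "t"),
    "can": ("k", "ae", "n"),
    "dog": ("d", "ao", "g"),
}


def build_graph(lexicon):
    """Return an undirected graph as {word: set of neighbor words}."""
    graph = {word: set() for word in lexicon}
    for w1, w2 in combinations(lexicon, 2):
        if is_neighbor(lexicon[w1], lexicon[w2]):
            graph[w1].add(w2)
            graph[w2].add(w1)
    return graph


GRAPH = build_graph(LEXICON)
print(sorted(GRAPH["cat"]))
```

In this toy lexicon, cat is linked to scat, at, hat, cut, and can, while dog has no links.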
15
The mental lexicon as a graph
• Small-world network
– Relatively small path-length
– Relatively high clustering coefficient
• Scale-free topology
– Power-law degree distribution
• Growth
• Preferential attachment
16
The mental lexicon as a graph
Pajek (Batagelj & Mrvar, 2004)
• Program for analysis and visualization of
large networks
17
Graph of Adult Lexicon
18
The mental lexicon as a graph
Network Characteristic    Lexical Network    10 Random Networks
n                         19,340             19,340
<k>                       3.23               3.23
Average path length       6.05               8.44 (.04)
D                         29                 19 (.816)
C                         .045               .000162 (.000047)
19
Path-length
Average distance between two nodes
– Example path: cat-mat-mass-mouse
– Average path length = 6.05
20
Diameter
Longest path length
– D = 29
– connect & rehearsal
• connect, collect, elect, affect, effect, infect, insect, inset,
insert, inert, inurn, epergne, spurn, spin, sin, sieve, live,
liver, lever, leva, leaven, heaven, haven, raven, riven, rivet,
revert, reverse, rehearse, rehearsal
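Average path length and diameter can both be obtained from breadth-first search. The sketch below is a generic implementation and makes one assumption the slides do not address: pairs of words with no connecting path are simply left out of the average. The helper names are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest path length (in links) from source to every
    node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def path_stats(graph):
    """Return (average path length, diameter) over connected pairs.

    Assumption: unreachable pairs are ignored rather than counted
    as infinitely far apart.
    """
    total, pairs, diameter = 0, 0, 0
    for source in graph:
        for target, d in bfs_distances(graph, source).items():
            if target != source:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    average = total / pairs if pairs else 0.0
    return average, diameter


# Toy graph built from the example path cat-mat-mass-mouse.
CHAIN = {
    "cat": {"mat"},
    "mat": {"cat", "mass"},
    "mass": {"mat", "mouse"},
    "mouse": {"mass"},
}

average, diameter = path_stats(CHAIN)
print(round(average, 2), diameter)
```

For this four-word chain the longest shortest path is cat to mouse (3 links), and the average over all pairs is 10/6.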
21
Clustering Coefficient
A small-world network has a larger clustering
coefficient (by orders of magnitude) than a
random network
– C = .045
• Over 250 times greater
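The ratio behind this comparison follows from the values in the table of network characteristics. The second line is a rough sanity check that assumes the random networks place links independently, in which case the expected clustering coefficient is about the mean degree divided by the number of nodes. How the random networks were generated is not stated here.

```latex
\frac{C_{\text{lexicon}}}{C_{\text{random}}}
  = \frac{0.045}{0.000162} \approx 278
\\[4pt]
C_{\text{random}} \approx \frac{\langle k \rangle}{n}
  = \frac{3.23}{19{,}340} \approx 0.000167
```

The estimate of about .000167 is close to the .000162 reported for the 10 random networks.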
22
The mental lexicon is a
small-world network
• Relatively short path length
• Large clustering coefficient
23
Does the mental lexicon have a
scale-free topology?
• A degree distribution that follows a power law.
– Growth
– Preferential attachment
24
[Figure: log P(k) plotted against log k]
25
[Figure: degree distribution, P(k) (0 to 12,500) plotted against k (0 to 40)]
26
[Figure: degree distribution on log-log axes, log P(k) (1E+00 to 1E+05) plotted against log k (1E+00 to 1E+02)]
27
Power-law degree distribution
28
Power-law degree distribution
Degree distribution for the lexicon:
γ = 1.96
(approaching 2 < γ < 3)
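One simple way to estimate a degree exponent is to fit a straight line to the degree distribution on log-log axes and take the negative of the slope. The slides do not say how the value of 1.96 was obtained, so the sketch below is only an illustration of that generic approach. Least-squares fits on log-log data are known to be sensitive to noise in the tail. `estimate_exponent` is a hypothetical name.

```python
import math
from collections import Counter


def estimate_exponent(degrees):
    """Estimate a power-law exponent from a list of node degrees
    by ordinary least squares on log10 P(k) versus log10 k.
    Nodes with degree 0 are skipped because log(0) is undefined."""
    counts = Counter(k for k in degrees if k > 0)
    if len(counts) < 2:
        raise ValueError("need at least two distinct degrees")
    n = sum(counts.values())
    xs = [math.log10(k) for k in counts]
    ys = [math.log10(c / n) for c in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope  # the exponent is the negative of the slope


# Synthetic degrees whose counts fall off exactly as k ** -2.
degrees = [1] * 1000 + [2] * 250 + [5] * 40 + [10] * 10
print(round(estimate_exponent(degrees), 2))
```

The synthetic data are constructed so that the counts are exactly proportional to k to the power -2, which lets the fit be checked against a known answer.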
29
Growth & Preferential Attachment
• Growth
– Children (and adults) learn new words
– New words are added to the language over time
• Preferential attachment
30
Preferential Attachment
• Words that are added to the lexicon early in
life should have more links than words that
are added to the lexicon later in life.
– Storkel (2004)
• Relationship between AoA and Density
31
Preferential Attachment
• Phonological neighborhoods should
become “denser” over time.
– Charles-Luce and Luce (1990, 1995)
• Analyzed words in the lexicons of adults and of 5- and 7-year-old children.
• Neighborhoods for words in the adult lexicon were denser than the
neighborhoods for those same words in the lexicons of
5- and 7-year-olds.
32
Preferential Attachment
• Words with denser neighborhoods
should be easier to learn/acquire.
– Storkel (2001, 2003)
• Pre-school age children learned novel words that had
common sound sequences/dense neighborhoods more
rapidly than novel words that had rare sound
sequences/sparse neighborhoods.
• Adults, too (Storkel, Armbrüster & Hogan, submitted)
33
Advantage of this structure
Topological robustness
• Damage does not result in
catastrophic failure
34
Topological robustness
• Damage tends to affect less connected nodes.
• Hubs maintain integrity of the whole system.
– Even if a hub is damaged, the presence of other hubs will
absorb the extra processing load.
– Only if every node has been damaged will a scale-free network
catastrophically fail.
35
Topological robustness in the
mental lexicon
• Speech production errors occur more for words with
sparse than dense neighborhoods.
– Vitevitch (1997; 2002) and Vitevitch & Sommers (2003)
• Same pattern for errors in patients with aphasia
– Gordon & Dell (2001)
• More errors in STM for words with sparse than
dense neighborhoods.
– Roodenrys et al. (2002)
36
Mental lexicon as a scale-free network
• The present analysis suggests the lexicon has
a scale-free topology.
• Evidence from several areas is consistent with
predictions derived from a scale-free lexicon.
37
Do the characteristics of
graph theory have any
psychological reality?
38
Psychological reality of graph theory
• k (degree) = Neighborhood Density
– Luce & Pisoni (1998)
– Vitevitch (2002)
• Clustering Coefficient
– Probability of two neighbors of a word
being neighbors with each other.
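For a single word, this probability can be computed by counting how many pairs of its neighbors are themselves neighbors. The sketch below uses a hypothetical toy graph, and `clustering_coefficient` is an illustrative name. Returning 0 for words with fewer than two neighbors is a convention assumed here, not something stated in the slides.

```python
from itertools import combinations


def clustering_coefficient(graph, word):
    """Proportion of pairs of `word`'s neighbors that are
    also neighbors of each other."""
    neighbors = graph[word]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: no pairs to compare
    linked_pairs = sum(1 for a, b in combinations(neighbors, 2)
                       if b in graph[a])
    possible_pairs = k * (k - 1) / 2
    return linked_pairs / possible_pairs


# Hypothetical toy graph: {word: set of phonological neighbors}.
TOY = {
    "cat": {"hat", "mat", "cut"},
    "hat": {"cat", "mat"},
    "mat": {"cat", "hat"},
    "cut": {"cat"},
}

for word in TOY:
    print(word, round(clustering_coefficient(TOY, word), 2))
```

Here cat has three neighbors but only one of the three possible neighbor pairs (hat and mat) is linked, so its clustering coefficient is 1/3. Degree and clustering coefficient are therefore separate properties of a word.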
39
Clustering Coefficient Experiment
• Auditory Lexical Decision Task (n = 57)
• Words varying in clustering coefficient
– Frequency
– Familiarity
– Neighborhood Density
– NHF
– Phonotactic Probability
– Onsets
40
hive
wise
41
Clustering Coefficient Experiment
Low CC (hive)     944 ms    93%
High CC (wise)    919 ms    93%
42
Clustering Coefficient Experiment
In spoken word recognition
• k (degree), neighborhood density
– Words with sparse neighborhoods are responded to more
quickly than words with dense neighborhoods.
• Clustering Coefficient
– Words with high CC are responded to more
quickly than words with low CC.
43
When does a scale-free
lexicon emerge?
44
When does a scale-free lexicon emerge?
• Traditional benchmark for “vocabulary
spurt” is 50 words (about 18 mo.)
– (e.g., Goldfield & Reznick, 1996; Mervis & Bertrand, 1995).
• Various mechanisms have been proposed
for the vocabulary spurt
– (e.g., Golinkoff et al. 2000; Nazzi & Bertoncini, 2003).
45
MacArthur Communicative
Development Inventory (CDI)
Estimate words known by 16- to 30-month-old children
– Age for a word = the earliest age at which 50% of the children
knew that word.
– 16, 18, 19, and 30 months of age
46
Network Statistics
                       16 mo.         18 mo.         19 mo.         30 mo.
n                      24             38             78             490
<k>                    .42            .39            .47            1.31
Average path length    1.00 [1.33]    1.11 [1.30]    1.65 [1.79]    6.25 [11.05]
D                      1 [2]          2 [2]          3 [4]          17 [29]
C                      .042 [0]       .026 [0]       .034 [0]       .089 [0]
Degree exponent (γ)    1.63           1.76           2.41           2.09
47
Emergence of a scale-free lexicon
“Vocabulary spurt” is often observed:
– 18 to 19 months of age
– 50 words
– Signals reorganization in the lexicon.
48
Emergence of a scale-free lexicon
A scale-free network emerged at the
same age/developmental milestone.
• This may lead to highly efficient word
learning and language processing.
49
Emergence of a scale-free lexicon
Variability in age/vocabulary size associated
with this developmental milestone may be
due to different initial starting states.
– The first few sound patterns that are learned
may play a large role in determining how
easily subsequent words are acquired.
• Mandel, Jusczyk & Pisoni (1995)
50
Limitations of this analysis
• Minimal scale-free model
– Binary links
• Weighted links for other similarity relationships
– Embedded words, morphology, phonographic
– Undirected links
• Directed links for asymmetric relationships
– cat-catfish versus catfish-cat
51
Limitations of this analysis
• Minimal scale-free model
– Preferential attachment & fitness function
• Frequency and Recency of usage
• Age
52
Advantages of this approach
• Connects research of complex cognitive systems
to research of other (biological, technological,
social) complex systems.
– Speech is (not so) special
53
Advantages of this approach
• Common framework for theories of language
evolution, development, adult processing, aging, and
disordered populations (topological robustness).
54
Advantages of this approach
• Perception versus Production
• Degree versus Clustering Coefficient
• Cross-linguistic analyses
55
Acknowledgements
• NIH-NIDCD R03 DC 04259 & R01 DC 006472
• Members of the Child Language Doctoral Program
• Members of the Spoken Language Laboratory
• Holly Storkel
• John Colombo
• Ed Auer
56