Phonological neighbors in a small world: What can graph theory tell us about word learning?
Michael S. Vitevitch
Department of Psychology, University of Kansas
NIH-NIDCD R03 DC 04259
NIH-NIDCD R01 DC 006472

Graph Theory
• Graphically represent complex systems
  – Graph or Network
    • Vertices or Nodes
    • Edges or Links/Connections
  – Examples of systems
    • Diamond (crystals)
    • WWW
    • Power grid
    • Interstate highway system

Ordered Graph
[Figure: example of an ordered graph]

Random Graph
[Figure: example of a randomly connected graph]

Graph Theory
• Between ordered and random graphs are small-world graphs
  – Small path length ("Six Degrees of Separation")
  – High clustering coefficient (relative to random)
    • The "probability" that two of my friends are also friends with each other
    • Ranges from 0 to 1

Movie Actors (Watts & Strogatz, 1998)
• Large and complex system
  – 225,226 actors
    • Internet Movie Database, circa 1998
• Node = Actor
• Connection = Co-starred in a movie

Movie Actors
• Despite its large size, the network of actors exhibits small-world behavior
  – Average of 3 links between any two actors
  – Clustering coefficient (random) = .00027
  – Clustering coefficient (actors) = .79
    • Over 2,000 times larger

Graph Theory
• Some small-world graphs are also scale-free
  – Degree = number of links per node
• These systems exhibit interesting characteristics
  – Efficient processing
  – Development/growth
  – Robustness of the system to attack/failure

Graph Theory
• A randomly connected network has a bell-shaped degree distribution
  – Most nodes have the average number of links
  – Few nodes have more links than average
  – Few nodes have fewer links than average
• Scale = a stereotypical node (characterized by the mean)

Graph Theory
• In a network with a scale-free degree distribution there is no stereotypical node
  – The degree distribution follows a power law
  – Many nodes have few connections
  – Few nodes have many connections

Scale-free network
The power-law degree distribution of a scale-free network emerges as a result of:
• Growth
  – New nodes are added to the system over time.
• Preferential Attachment
  – New nodes tend to form links with nodes that are already highly connected.

Implications of graph structure
• The structure of a network constrains the processes that operate in the system and influences how well the system withstands damage.

Viewing the mental lexicon as a graph
• Does the mental lexicon have a small-world structure?
• Does the mental lexicon have a scale-free structure?
• How does the structure of the mental lexicon influence various processes?

The mental lexicon as a graph
• Nodes = 19,340 word-forms in the database
  – Nusbaum, Pisoni & Davis (1984)
• Links = phonologically related
  – One-phoneme metric (Luce & Pisoni, 1998)

Links in the mental lexicon
• One-phoneme metric (a computational sketch follows below)
  – cat has as neighbors
    • scat, at, hat, cut, can, etc.
  – dog is NOT a phonological neighbor of cat
• Steyvers & Tenenbaum (2003)
• Ferrer i Cancho & Solé (2001)
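A minimal sketch of how the one-phoneme metric could be computed over a toy lexicon: a word counts as a neighbor if it can be formed by substituting, adding, or deleting a single phoneme. The phonemic transcriptions and helper names below are illustrative placeholders, not the transcriptions from the Nusbaum, Pisoni & Davis database.

```python
# Sketch: one-phoneme metric (Luce & Pisoni, 1998) over a toy lexicon.
# Words are represented as tuples of phoneme symbols; the transcriptions
# here are rough stand-ins, not the database's actual transcriptions.

def one_phoneme_apart(p, q):
    """True if q can be formed from p by substituting, adding,
    or deleting exactly one phoneme."""
    if len(p) == len(q):
        return sum(a != b for a, b in zip(p, q)) == 1
    if len(p) > len(q):          # make p the shorter form
        p, q = q, p
    if len(q) - len(p) != 1:
        return False
    # q matches p with one extra phoneme inserted somewhere
    return any(q[:i] + q[i + 1:] == p for i in range(len(q)))

toy_lexicon = {
    "cat":  ("k", "ae", "t"),
    "scat": ("s", "k", "ae", "t"),
    "at":   ("ae", "t"),
    "hat":  ("h", "ae", "t"),
    "cut":  ("k", "ah", "t"),
    "can":  ("k", "ae", "n"),
    "dog":  ("d", "aa", "g"),
}

def neighbors(word):
    target = toy_lexicon[word]
    return [w for w, p in toy_lexicon.items()
            if w != word and one_phoneme_apart(target, p)]

print(neighbors("cat"))   # ['scat', 'at', 'hat', 'cut', 'can'] -- dog is excluded
```

The number of neighbors a word has under this metric is its degree, which later slides treat as neighborhood density.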
The mental lexicon as a graph
• Small-world network
  – Relatively small path length
  – Relatively high clustering coefficient
• Scale-free topology
  – Power-law degree distribution
    • Growth
    • Preferential attachment

The mental lexicon as a graph
Pajek (Batagelj & Mrvar, 2004)
• A program for the analysis and visualization of large networks

Graph of Adult Lexicon
[Figure: network visualization of the adult lexicon]

The mental lexicon as a graph

Network Characteristic    Lexical Network      10 Random Networks
n                         19,340               19,340
<k>                       3.23                 3.23
Average path length       6.05                 8.44 (.04)
D (diameter)              29                   19 (.816)
C                         .045                 .000162 (.000047)

Path-length
• Average distance between two nodes
  – e.g., cat-mat-mass-mouse
  – Average path length = 6.05

Diameter
• Longest path length
  – D = 29
  – connect & rehearsal
    • connect, collect, elect, affect, effect, infect, insect, inset, insert, inert, inurn, epergne, spurn, spin, sin, sieve, live, liver, lever, leva, leaven, heaven, haven, raven, riven, rivet, revert, reverse, rehearse, rehearsal

Clustering Coefficient
• A small-world network has a larger clustering coefficient (by orders of magnitude) than a random network
  – C = .045
    • Over 250 times greater than the random networks

The mental lexicon is a small-world network
• Relatively short path length
• Large clustering coefficient

Does the mental lexicon have a scale-free topology?
• A degree distribution that follows a power law
  – Growth
  – Preferential attachment

[Figure: log P(k) versus log k]

[Figure: degree distribution of the lexicon, P(k) versus k]

[Figure: degree distribution of the lexicon on log-log axes, log P(k) versus log k]

Power-law degree distribution
[Figure]

Power-law degree distribution
• Degree distribution for the lexicon: γ = 1.96 (approaching the range 2 < γ < 3)

Growth & Preferential Attachment
• Growth
  – Children (and adults) learn new words
  – New words are added to the language over time
• Preferential attachment

Preferential Attachment
• Words that are added to the lexicon early in life should have more links than words that are added to the lexicon later in life.
  – Storkel (2004)
    • Relationship between AoA and neighborhood density

Preferential Attachment
• Phonological neighborhoods should become "denser" over time.
  – Charles-Luce and Luce (1990, 1995)
    • Analyzed words in adults and in 5- and 7-year-old children.
    • Neighborhoods for words in the adult lexicon were denser than the neighborhoods for those same words in the 5- and 7-year-old lexicons.

Preferential Attachment
• Words with denser neighborhoods should be easier to learn/acquire.
  – Storkel (2001, 2003)
    • Preschool-age children learned novel words with common sound sequences/dense neighborhoods more rapidly than novel words with rare sound sequences/sparse neighborhoods.
    • Adults show the same pattern (Storkel, Armbrüster & Hogan, submitted).

Advantage of this structure
Topological robustness
• Damage does not result in catastrophic failure

Topological robustness
• Damage tends to affect less connected nodes.
• Hubs maintain the integrity of the whole system (see the simulation sketch below).
  – Even if a hub is damaged, the presence of other hubs will absorb the extra processing load.
  – Only if every node has been damaged will a scale-free network catastrophically fail.

Topological robustness in the mental lexicon
• Speech production errors occur more often for words with sparse than with dense neighborhoods.
  – Vitevitch (1997; 2002); Vitevitch & Sommers (2003)
• The same pattern holds for errors in patients with aphasia.
  – Gordon & Dell (2001)
• More errors occur in short-term memory for words with sparse than with dense neighborhoods.
  – Roodenrys et al. (2002)
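The robustness claim can be illustrated with a rough, generic simulation rather than the lexical network itself: grow a scale-free toy graph with networkx, then compare how much of its largest connected component survives when nodes are removed at random versus hubs-first. The 2,000-node size and other parameters are arbitrary assumptions for illustration only.

```python
# Sketch: robustness of a scale-free network to random failure versus a
# targeted attack on hubs. Toy parameters only; not the actual lexical network.
import random
import networkx as nx

def largest_component_fraction(G, original_n):
    if G.number_of_nodes() == 0:
        return 0.0
    giant = max(nx.connected_components(G), key=len)
    return len(giant) / original_n

def remove_and_measure(G, order, steps=5):
    """Remove nodes in the given order, reporting the giant-component
    fraction after each block of removals."""
    G = G.copy()
    n = G.number_of_nodes()
    block = len(order) // steps
    results = []
    for i in range(steps):
        for node in order[i * block:(i + 1) * block]:
            G.remove_node(node)
        results.append(round(largest_component_fraction(G, n), 3))
    return results

random.seed(1)
G = nx.barabasi_albert_graph(2000, 2, seed=1)     # scale-free toy network

random_order = list(G.nodes())
random.shuffle(random_order)
hub_order = sorted(G.nodes(), key=lambda v: G.degree(v), reverse=True)

# Remove half of the nodes in five blocks of 10% each.
print("random failure:  ", remove_and_measure(G, random_order[:1000]))
print("hub-first attack:", remove_and_measure(G, hub_order[:1000]))
```

Under random removal the giant component shrinks gradually, whereas removing the highest-degree nodes first fragments it much faster; that contrast is the sense in which hubs carry the system.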
Mental lexicon as a scale-free network
• The present analysis suggests the lexicon has a scale-free topology.
• Evidence from several areas is consistent with predictions derived from a scale-free lexicon.

Do the characteristics of graph theory have any psychological reality?

Psychological reality of graph theory
• k (degree) = Neighborhood Density
  – Luce & Pisoni (1998)
  – Vitevitch (2002)
• Clustering Coefficient
  – The probability that two neighbors of a word are also neighbors of each other.

Clustering Coefficient Experiment
• Auditory Lexical Decision Task (n = 57)
• Words varying in clustering coefficient
  – Frequency
  – Familiarity
  – Neighborhood Density
  – NHF
  – Phonotactic Probability
  – Onsets

[Figure: example stimuli hive and wise]

Clustering Coefficient Experiment

            Low CC (hive)    High CC (wise)
RT          944 ms           919 ms
Accuracy    93%              93%

Clustering Coefficient Experiment
In spoken word recognition:
• k (degree), neighborhood density
  – Words with sparse neighborhoods are responded to more quickly than words with dense neighborhoods.
• Clustering Coefficient
  – Words with high CC are responded to more quickly than words with low CC.

When does a scale-free lexicon emerge?

When does a scale-free lexicon emerge?
• The traditional benchmark for the "vocabulary spurt" is 50 words (about 18 months)
  – (e.g., Goldfield & Reznick, 1996; Mervis & Bertrand, 1995)
• Various mechanisms have been proposed for the vocabulary spurt
  – (e.g., Golinkoff et al., 2000; Nazzi & Bertoncini, 2003)

MacArthur Communicative Development Inventory (CDI)
• Estimates of known words in 16- to 30-month-old children
  – A word was included at the earliest age at which 50% of the children knew it.
  – 16, 18, 19, and 30 months of age

Network Statistics

                        16 mo.        18 mo.        19 mo.        30 mo.
n                       24            38            78            490
<k>                     .42           .39           .47           1.31
Average path length     1.00 [1.33]   1.11 [1.30]   1.65 [1.79]   6.25 [11.05]
D (diameter)            1 [2]         2 [2]         3 [4]         17 [29]
C                       .042 [0]      .026 [0]      .034 [0]      .089 [0]
γ                       1.63          1.76          2.41          2.09

Emergence of a scale-free lexicon
• The "vocabulary spurt" is often observed:
  – At 18 to 19 months of age
  – At about 50 words
  – It is taken to signal a reorganization of the lexicon.

Emergence of a scale-free lexicon
• A scale-free network emerged at the same age/developmental milestone.
  – This may lead to highly efficient word learning and language processing.

Emergence of a scale-free lexicon
• Variability in the age/vocabulary size associated with this developmental milestone may be due to different initial starting states.
  – The first few sound patterns that are learned may play a large role in determining how easily subsequent words are acquired.
    • Mandel, Jusczyk & Pisoni (1995)

Limitations of this analysis
• Minimal scale-free model
  – Binary links
    • Weighted links could capture other similarity relationships
      – Embedded words, morphology, phonographic similarity
  – Undirected links
    • Directed links could capture asymmetric relationships
      – cat-catfish versus catfish-cat

Limitations of this analysis
• Minimal scale-free model
  – Preferential attachment & a fitness function (a toy growth model is sketched below)
    • Frequency and recency of usage
    • Age

Advantages of this approach
• Connects research on complex cognitive systems to research on other (biological, technological, social) complex systems.
  – Speech is (not so) special

Advantages of this approach
• A common framework for theories of language evolution, development, adult processing, aging, and disordered populations (topological robustness).
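To make the growth and preferential-attachment mechanism concrete, including the fitness idea raised under limitations, here is a hypothetical growth sketch. The fitness function (a stand-in for factors such as frequency or recency of usage), the seed graph, and the network size are illustrative assumptions; this is not the model fitted to the CDI or adult data.

```python
# Sketch: growth by preferential attachment, optionally biased by a
# per-node "fitness". Illustrative only; not the model applied to the lexicon.
import random
import networkx as nx

def grow_network(n_final, m=2, fitness=None, seed=0):
    """Start from a small seed graph and add nodes one at a time.
    Each new node links to m existing nodes, chosen with probability
    proportional to degree * fitness."""
    rng = random.Random(seed)
    G = nx.complete_graph(m + 1)              # small fully connected seed
    fitness = fitness or (lambda node: 1.0)   # uniform fitness by default
    for new in range(m + 1, n_final):
        existing = list(G.nodes())
        weights = [G.degree(v) * fitness(v) for v in existing]
        targets = set()
        while len(targets) < m:
            targets.add(rng.choices(existing, weights=weights)[0])
        G.add_node(new)
        for t in targets:
            G.add_edge(new, t)
    return G

G = grow_network(2000)
degrees = sorted((d for _, d in G.degree()), reverse=True)
print("max degree:", degrees[0], " median degree:", degrees[len(degrees) // 2])
# A few very high-degree hubs alongside many low-degree nodes is the
# signature of a power-law (scale-free) degree distribution.
```

Adding a fitness term lets factors other than degree alone, such as how frequently or recently a word is used, bias which existing words attract new neighbors as the lexicon grows.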
Advantages of this approach
• Perception versus Production
• Degree versus Clustering Coefficient
• Cross-linguistic analyses

Acknowledgements
• NIH-NIDCD R03 DC 04259 & R01 DC 006472
• Members of the Child Language Doctoral Program
• Members of the Spoken Language Laboratory
• Holly Storkel
• John Colombo
• Ed Auer