24.964 Fall 2004: Modeling phonological learning
A. Albright, 7 Oct, 2004

Class 5: Refined statistical models for phonotactic probability

(1) (Virtually) no restrictions on initial CV sequences:

|     | [i]  | [I]  | [e]  | [E]  | [æ] | [u]  | [U]  | [o]  | [O]  | [2]  | [a] | [aI] | [aU] | [OI]  | [ju] |
|-----|------|------|------|------|-----|------|------|------|------|------|-----|------|------|-------|------|
| /p/ | peel | pick | pale | pen  | pan | pool | put  | poke | Paul | puff | pot | pine | pout | poise | puke |
| /t/ | teal | tick | tale | ten  | tan | tool | took | toke | tall | tough | tot | tine | tout | toys  | —    |
| /k/ | keel | kick | kale | Ken  | can | cool | cook | coke | call | cuff | cot | kine | cow  | coin  | cute |

(2) Relatively more restrictions on VC combinations:

|     | [i]  | [I]  | [e]  | [E]  | [æ] | [u]  | [U]  | [o]  | [O]    | [2]  | [a] | [aI]  | [aU] | [OI]     | [ju]  |
|-----|------|------|------|------|-----|------|------|------|--------|------|-----|-------|------|----------|-------|
| /p/ | leap | lip  | rape | pep  | rap | coop | —    | soap | —      | cup  | top | ripe  | —    | —        | —     |
| /t/ | neat | lit  | rate | pet  | rat | coot | put  | coat | taught | cut  | tot | right | bout | (a)droit | butte |
| /k/ | leek | lick | rake | peck | rack | kook | book | soak | walk  | tuck | lock | like | —    | —        | puke  |

And compare also voiced:

|     | [i]    | [I]  | [e]   | [E]  | [æ] | [u]  | [U]   | [o]   | [O]  | [2]  | [a] | [aI]  | [aU] | [OI] | [ju]  |
|-----|--------|------|-------|------|-----|------|-------|-------|------|------|-----|-------|------|------|-------|
| /b/ | grebe  | bib  | babe  | Deb  | tab | tube | —     | robe  | daub | rub  | cob | bribe | —    | —    | cube  |
| /d/ | lead   | bid  | fade  | bed  | tad | food | could | road  | laud | bud  | cod | ride  | loud | void | feud  |
| /g/ | league | big  | vague | beg  | tag | —    | —     | rogue | log  | rug  | cog | —     | —    | —    | fugue |
(3) CV co-occurrence for voiced stops:

|     | [i]  | [I]  | [e]  | [E]  | [æ] | [u]  | [U]  | [o]  | [O]  | [2]  | [a] | [aI] | [aU]  | [OI]     | [ju]    |
|-----|------|------|------|------|-----|------|------|------|------|------|-----|------|-------|----------|---------|
| /b/ | beep | bin  | bait | bet  | back | boon | book | boat | ball | bun | bot | buy  | bout  | boy      | butte   |
| /d/ | deep | din  | date | deck | Dan | dune | —    | dote | doll | done | dot | dine | doubt | doi(ly)  | —       |
| /g/ | geek | gill | gait | get  | gap | goon | good | goat | gall | gun  | got | guy  | gout  | goi(ter) | (ar)gue |

And after sonorants:

|     | [i]   | [I]  | [e]  | [E]   | [æ] | [u]   | [U]    | [o]  | [O]    | [2]   | [a]   | [aI]  | [aU]   | [OI]  | [ju]    |
|-----|-------|------|------|-------|-----|-------|--------|------|--------|-------|-------|-------|--------|-------|---------|
| /m/ | meat  | mitt | mate | met   | mat | moot  | Muslim | moat | moss   | mutt  | mock  | mine  | mouse  | moist | —       |
| /n/ | neat  | nip  | Nate | net   | nap | newt  | nook   | note | naught | nut   | knock | nine  | now    | noise | —       |
| /N/ | —     | —    | —    | —     | —   | —     | —      | —    | —      | —     | —     | —     | —      | —     | —       |
| /l/ | leap  | lip  | late | let   | lap | lute  | look   | lope | log    | luck  | lock  | line  | lout   | loin  | —       |
| /r/ | reap  | rip  | rate | wreck | rap | route | rook   | rope | Ross   | rut   | rock  | rhyme | route  | Roy   | —       |
| /w/ | weep  | whip | wait | wet   | wax | woo   | wood   | woke | walk   | what  | wand  | whine | wound  | —     | —       |
| /j/ | yeast | yip  | yay  | yet   | yak | you   | Europe | yoke | yawn   | young | yard  | —     | (yowl) | —     | (yoink) |

(4) Kessler & Treiman (1997)

Pearson's χ²: tests whether relative frequencies of events match predicted (theoretical) frequencies

• In this case: is the observed onset/coda asymmetry significantly different from the predicted (equal) distribution?

| [k]   | Observed | Predicted |
|-------|----------|-----------|
| Onset | 148      | 181       |
| Coda  | 214      | 181       |

(5) Calculation of χ²:

    χ² = Σ (Observed − Expected)² / Expected

So for the [k] example:

    χ² = (148 − 181)²/181 + (214 − 181)²/181 = 2 × 33²/181 = 12.033

(Incidentally: for most uses, Fisher's Exact Test is actually a more honest test)
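The goodness-of-fit computation in (5) is small enough to check directly. A minimal Python sketch (the function and variable names are mine, not from the handout; `scipy.stats.chisquare` would give the same statistic):

```python
# Pearson's chi-squared goodness-of-fit test for the onset/coda
# asymmetry of [k], following the handout's example in (4)-(5).

def chi_squared(observed, expected):
    """Sum of (Observed - Expected)^2 / Expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# [k] counts: 148 onsets, 214 codas; under the null hypothesis of an
# equal split, both expected counts are (148 + 214) / 2 = 181.
observed = [148, 214]
expected = [181, 181]

chi2 = chi_squared(observed, expected)
print(round(chi2, 3))  # 12.033
```

With 1 degree of freedom this comfortably exceeds the 0.05 critical value (3.84), matching the handout's conclusion that the asymmetry is significant.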
(6) Nosofsky's GCM:

    Similarity of i to existing items j = Σ_j e^(−D·d(i,j))

Where
• d(i,j) = "psychological distance" between i and j
• D is a parameter (set to 1 or 2)
• e = 2.718281828…

(7) Bailey and Hahn (2001): Adapting the GCM for neighborhood effects

• Similarity of items d(i,j) intuitively related to how many differences they have
  – How many of their phonemes differ (cat, cap > cat, tap)
  – How important those differences are (cat, cap > cat, cup)
• Use a string edit distance algorithm to calculate how many modifications are needed to transform one word into the other
• Use the method devised by Broe (1993), Frisch (1996), and Frisch, Broe and Pierrehumbert (1997) to weight the relative cost of different modifications based on the similarity of the segments involved
• Also, want to let token frequency play a role, but in a complex way: not only are low-frequency words less important, but very high-frequency words are also ignored
  – Implementation: add a quadratic weighting term, to allow greater influence of mid-range items (parabola-shaped function)

    Similarity of i = Σ_j (A·f_j² + B·f_j + C) · e^(−D·d(i,j))
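A sketch of (6) and (7) in Python, with two simplifications flagged in the comments: plain unit-cost Levenshtein distance stands in for the Frisch-style similarity-weighted edit distance, and the quadratic coefficients A, B, C are illustrative placeholders, not the fitted values from Bailey & Hahn:

```python
import math

def edit_distance(a, b):
    """Plain Levenshtein distance over phoneme strings (unit costs).
    Bailey & Hahn weight each operation by segmental similarity
    (Broe/Frisch); unit costs are a simplification."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # deletion
                         cur[j - 1] + 1,     # insertion
                         prev[j - 1] + cost) # substitution
        prev = cur
    return prev[n]

def gcm_similarity(item, lexicon, D=1):
    """Nosofsky's GCM (6): sum of e^(-D * d(i,j)) over known items j."""
    return sum(math.exp(-D * edit_distance(item, j)) for j in lexicon)

def bailey_hahn_similarity(item, lexicon, A=-0.01, B=0.2, C=1.0, D=1):
    """Bailey & Hahn (7): each neighbor's contribution is scaled by a
    quadratic (parabola-shaped) function of its token frequency f_j,
    so mid-frequency neighbors count most.  lexicon maps word -> f_j;
    A, B, C here are illustrative, chosen so the weight peaks at
    mid-range frequencies."""
    return sum((A * f ** 2 + B * f + C) * math.exp(-D * edit_distance(item, j))
               for j, f in lexicon.items())
```

Note how the sketch reproduces the intuitions in (7): `edit_distance("cat", "cap")` is 1 while `edit_distance("cat", "tap")` is 2, so *cap* contributes more to the similarity of *cat* than *tap* does.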