24.964 Fall 2004
Modeling phonological learning
A. Albright
7 Oct, 2004
Class 5: Refined statistical models for phonotactic probability
(1) (Virtually) no restrictions on initial CV sequences:
Vowel  [i]   [I]   [e]   [E]  [æ]  [u]   [U]   [o]   [O]   [2]    [a]  [aI]  [aU]  [OI]   [ju]
/p/    peel  pick  pale  pen  pan  pool  put   poke  Paul  puff   pot  pine  pout  poise  puke
/t/    teal  tick  tale  ten  tan  tool  took  toke  tall  tough  tot  tine  tout  toys   —
/k/    keel  kick  kale  Ken  can  cool  cook  coke  call  cuff   cot  kine  cow   coin   cute
(2) Relatively more restrictions on VC combinations:
Vowel  [i]   [I]   [e]   [E]   [æ]   [u]   [U]   [o]   [O]     [2]   [a]   [aI]   [aU]  [OI]      [ju]
/p/    leap  lip   rape  pep   rap   coop  —     soap  —       cup   top   ripe   —     —         —
/t/    neat  lit   rate  pet   rat   coot  put   coat  taught  cut   tot   right  bout  (a)droit  butte
/k/    leek  lick  rake  peck  rack  kook  book  soak  walk    tuck  lock  like   —     —         puke
And compare also voiced:
Vowel  [i]     [I]  [e]    [E]  [æ]  [u]   [U]    [o]    [O]   [2]  [a]  [aI]   [aU]  [OI]  [ju]
/b/    grebe   bib  babe   Deb  tab  tube  —      robe   daub  rub  cob  bribe  —     —     cube
/d/    lead    bid  fade   bed  tad  food  could  road   laud  bud  cod  ride   loud  void  feud
/g/    league  big  vague  beg  tag  —     —      rogue  log   rug  cog  —      —     —     fugue
(3) CV co-occurrence for voiced stops
Vowel  [i]   [I]   [e]   [E]   [æ]   [u]   [U]   [o]   [O]   [2]   [a]  [aI]  [aU]   [OI]      [ju]
/b/    beep  bin   bait  bet   back  boon  book  boat  ball  bun   bot  buy   bout   boy       butte
/d/    deep  din   date  deck  Dan   dune  —     dote  doll  done  dot  dine  doubt  doi(ly)   —
/g/    geek  gill  gait  get   gap   goon  good  goat  gall  gun   got  guy   gout   goi(ter)  (ar)gue
And after sonorants:
Vowel  [i]    [I]   [e]   [E]    [æ]  [u]    [U]     [o]   [O]     [2]    [a]    [aI]   [aU]    [OI]
/m/    meat   mitt  mate  met    mat  moot   Muslim  moat  moss    mutt   mock   mine   mouse   moist
/n/    neat   nip   Nate  net    nap  newt   nook    note  naught  nut    knock  nine   now     noise
/N/    —      —     —     —      —    —      —       —     —       —      —      —      —       —
/l/    leap   lip   late  let    lap  lute   look    lope  log     luck   lock   line   lout    loin
/r/    reap   rip   rate  wreck  rap  route  rook    rope  Ross    rut    rock   rhyme  route   Roy
/w/    weep   whip  wait  wet    wax  woo    wood    woke  walk    what   wand   whine  wound
/j/    yeast  yip   yay   yet    yak  you    Europe  yoke  yawn    young  yard   —      (yowl)  — (yoink)
(4) Kessler & Treiman (1997)
Pearson's χ2: tests whether the relative frequencies of events match predicted (theoretical) frequencies
• In this case: is the observed onset/coda asymmetry significantly different from the predicted (equal) distribution?
[k]      Observed   Predicted
Onset    148        181
Coda     214        181
(5) Calculation of χ2:

\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}

So for the [k] example:

\frac{(148 - 181)^2}{181} + \frac{(214 - 181)^2}{181} = 2 \times \frac{33^2}{181} = 12.033
(Incidentally: for most uses, Fisher’s Exact Test is actually a more honest test)
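To make the arithmetic above concrete, here is a minimal sketch in Python (not part of the original handout); the counts are the [k] onset/coda figures from (4), and scipy.stats.chisquare is used only to double-check the hand calculation:

```python
from scipy.stats import chisquare

observed = [148, 214]   # [k] in onset vs. coda position
expected = [181, 181]   # predicted equal split of the 362 tokens

# Pearson's chi-squared by hand: sum over cells of (O - E)^2 / E
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)             # ~12.033

# The same statistic (plus a p-value) from scipy
stat, p = chisquare(observed, f_exp=expected)
print(stat, p)          # ~12.033, p < 0.001
```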
(6) Nosofsky's GCM:

Similarity of i to existing items j = \sum_j e^{-D \cdot d_{i,j}}

Where
• d_{i,j} = "psychological distance" between i and j
• D is a parameter (set to 1 or 2)
• e = 2.718281828
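A minimal sketch of how (6) would be computed, assuming the distances d_{i,j} have already been obtained by some other means; the distance values below are invented purely for illustration:

```python
import math

def gcm_similarity(distances, D=1.0):
    """Summed similarity of item i to a set of stored items j,
    given the psychological distance d_ij to each: sum_j e^(-D * d_ij)."""
    return sum(math.exp(-D * d) for d in distances)

# Hypothetical distances from one nonce word to three lexical neighbors
print(gcm_similarity([0.5, 1.2, 3.0], D=1.0))
```

Larger values of D make similarity fall off more sharply with distance, so only very close neighbors contribute appreciably to the sum.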
(7) Bailey and Hahn (2001): Adapting the GCM for neighborhood effects
• Similarity of items d_{i,j} is intuitively related to how many differences they have
  – How many of their phonemes differ (cat, cap > cat, tap)
  – How important those differences are (cat, cap > cat, cup)
• Use a string edit distance algorithm to calculate how many modifications are needed to transform one word into the other
• Use the method devised by Broe (1993), Frisch (1996), and Frisch, Broe and Pierrehumbert (1997) to weight the relative cost of different modifications based on the similarity of the segments involved
• Also, we want to let token frequency play a role, but in a complex way: not only are low-frequency words less important, but very high-frequency words are also ignored
  – Implementation: add a quadratic weighting term, to allow greater influence of mid-range items (parabola-shaped function)
Similarity of i = \sum_j (A f_j^2 + B f_j + C) \cdot e^{-D \cdot d_{i,j}}
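A rough sketch of the whole formula in (7), with two simplifications: a plain (unweighted) Levenshtein distance stands in for the similarity-weighted edit costs of Broe/Frisch et al., and the lexicon, frequencies, and parameter values are invented for illustration:

```python
import math

def edit_distance(w1, w2):
    """Unweighted Levenshtein distance: number of insertions, deletions,
    and substitutions needed to turn w1 into w2."""
    m, n = len(w1), len(w2)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if w1[i - 1] == w2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def similarity(item, lexicon, A=-0.05, B=0.5, C=1.0, D=1.0):
    """sum_j (A*f_j^2 + B*f_j + C) * e^(-D * d_ij), where f_j is the (log)
    token frequency of word j.  With A < 0 the quadratic term peaks at
    mid-range frequencies, down-weighting both rare and very frequent words."""
    return sum((A * f ** 2 + B * f + C) * math.exp(-D * edit_distance(item, w))
               for w, f in lexicon)

# Toy lexicon of (word, log-frequency) pairs -- values made up for illustration
lexicon = [("kaet", 3.0), ("kaep", 2.0), ("kVp", 1.5)]
print(similarity("taep", lexicon))
```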