Modeling phonological variation

Day 4: Classic OT


• Although we’ve seen most of the ingredients of OT, there’s one more big thing you need to know to be able to read OT papers and listen to OT talks:
• Constraints interact through strict ranking instead of through weighting
• Analogy: alphabetical order

Constraints
– HaveEarly1stLetter
– HaveEarly2ndLetter
– HaveEarly3rdLetter
– HaveEarly4thLetter
– HaveEarly5thLetter
– ...
Harmonic grammar

• Cabana wins because it does much better on the less-important constraints

              1st     2nd     3rd     4th     5th     harmony
              w=5     w=4     w=3     w=2     w=1
  banana      -1              -13             -13     -57
  azalea              -25             -11     -4      -126
  azote               -25     -14     -19     -4      -184
  cabana      -2              -1              -13     -26
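A minimal Python sketch (mine, not from the slides) of the weighted-sum evaluation in this tableau; each violation count is just a letter's distance from 'a', and the weights are the ones shown above.

```python
# Weighted-sum ("harmonic grammar") evaluation of the alphabetical-order analogy.
WEIGHTS = [5, 4, 3, 2, 1]  # HaveEarly1stLetter ... HaveEarly5thLetter

def harmony(word):
    """Negative weighted sum of violations over the first five letters."""
    return -sum(w * (ord(ch) - ord("a")) for w, ch in zip(WEIGHTS, word))

candidates = ["banana", "azalea", "azote", "cabana"]
for cand in candidates:
    print(cand, harmony(cand))        # banana -57, azalea -126, azote -184, cabana -26

print("winner:", max(candidates, key=harmony))   # cabana
```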
Classic Optimality Theory

• Strict ranking: all the candidates that aren’t the best on the top constraint are eliminated
  – “!” means “eliminated here”
  – Shading on the rest of a row indicates it doesn’t matter how well or poorly the candidate does on subsequent constraints

              1st     2nd     3rd     4th     5th
  banana      1!              13              13
  azalea              25              11      4
  azote               25      14!     19      4
  cabana      2!              1               13
Classic Optimality Theory

• Repeat the elimination for subsequent constraints
• Here, the two remaining candidates tie (both are the best), so we move to the next constraint
• Winner(s) = the candidates that remain

              1st     2nd     3rd     4th     5th
  banana      1!              13              13
☞ azalea              25              11      4
  azote               25      14!     19      4
  cabana      2!              1               13
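The same comparison can be sketched in a few lines (my own illustration, not from the slides): under a total ranking, comparing violation profiles lexicographically is exactly the eliminate-and-move-on procedure described above.

```python
# Strict-ranking (Classic OT) evaluation for the same words.
def profile(word):
    """One violation count per constraint, in ranked order (1st letter first)."""
    return tuple(ord(ch) - ord("a") for ch in word[:5])

candidates = ["banana", "azalea", "azote", "cabana"]

# Lexicographic comparison: the highest-ranked constraint on which candidates
# differ decides, no matter how large the differences further down are.
print(min(candidates, key=profile))   # azalea
```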
Example tableaux: find the winner

[Four practice tableaux followed here, each with candidates (a)–(c) and constraints C1–C4; the violation marks did not survive extraction.]
“Harmonically bounded” candidates

• A fancy term for candidates that can’t win under any ranking
• Simple harmonic bounding: why can’t (c) win under any ranking?

        C2     C3     C4
  a.    *
  b.    *      *
  c.    **     *      *
“Harmonically bounded” candidates

• Joint harmonic bounding: why can’t (c) win under any ranking?

        C1     C2
  a.    **
  b.           **
  c.    *      *
Why this matters for variation

• “Multi-site” variation: more than one place in the word can vary
• Which candidates can win under some ranking? (A brute-force check appears in the sketch below.)

  /akitamiso/        *i     Max-V         /akitamiso/        Max-V    *i
  a. [akitamiso]     **                   a. [akitamiso]              **
  b. [aktamiso]      *      *             b. [aktamiso]      *        *
  c. [akitamso]      *      *             c. [akitamso]      *        *
  d. [aktamso]              **            d. [aktamso]       **
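A brute-force answer to the question above (my own sketch, using the violation counts from the tableaux): enumerate both rankings and see who wins under each.

```python
# Which candidates can win under some strict ranking of {*i, Max-V}?
from itertools import permutations

violations = {                 # counts: (*i, Max-V)
    "akitamiso": (2, 0),
    "aktamiso":  (1, 1),
    "akitamso":  (1, 1),
    "aktamso":   (0, 2),
}
constraints = ["*i", "Max-V"]

for order in permutations(range(len(constraints))):
    # Winner = lexicographically smallest violation profile, read in ranking order.
    winner = min(violations, key=lambda c: tuple(violations[c][i] for i in order))
    print(" >> ".join(constraints[i] for i in order), "->", winner)

# *i >> Max-V  -> aktamso
# Max-V >> *i  -> akitamiso
# (b) and (c) never win: they are jointly harmonically bounded by (a) and (d).
```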
Why this matters for variation

• Even if the ranking is allowed to vary, candidates like (b) and (c) can never occur

  /akitamiso/        *i     Max-V         /akitamiso/        Max-V    *i
  a. [akitamiso]     **                 ☞ a. [akitamiso]              **
  b. [aktamiso]      *      *             b. [aktamiso]      *        *
  c. [akitamso]      *      *             c. [akitamso]      *        *
☞ d. [aktamso]              **            d. [aktamso]       **
How about in MaxEnt?

• Can (b) and (c) ever occur?

  /akitamiso/        *i     Max-V
  a. [akitamiso]     **
  b. [aktamiso]      *      *
  c. [akitamso]      *      *
  d. [aktamso]              **
How about in Noisy Harmonic Grammar?

• Suppose the two constraints have the same weight (a small simulation follows below)

  /akitamiso/        *i     Max-V
                     w=1    w=1
  a. [akitamiso]     **
  b. [aktamiso]      *      *
  c. [akitamso]      *      *
  d. [aktamso]              **
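A small simulation of how Noisy HG treats this tableau (my own sketch; the Gaussian noise level is an assumption, since the slide only fixes the weights). Noise is added to each weight at every evaluation, and the weighted-sum winner is recorded.

```python
# Noisy HG sampling for the tableau above.
import random
from collections import Counter

violations = {                 # counts: (*i, Max-V)
    "akitamiso": (2, 0),
    "aktamiso":  (1, 1),
    "akitamso":  (1, 1),
    "aktamso":   (0, 2),
}
BASE_WEIGHTS = (1.0, 1.0)      # equal weights, as on the slide
NOISE_SD = 1.0                 # assumed evaluation-noise standard deviation

wins = Counter()
for _ in range(100_000):
    w = [bw + random.gauss(0, NOISE_SD) for bw in BASE_WEIGHTS]
    harmony = {c: -(v[0] * w[0] + v[1] * w[1]) for c, v in violations.items()}
    wins[max(harmony, key=harmony.get)] += 1

print(wins)
# Roughly half akitamiso, half aktamso. (b) and (c) essentially never win:
# their harmony is always exactly midway between (a)'s and (d)'s, so they can
# tie but never strictly beat both.
```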
Special case in Noisy HG

  /apataka/         *aCa     Ident(lo)    harmony    wins (or ties) if
                    w=a      w=b
  a. [apataka]      ***                   -3a        a < ½b
  b. [epataka]      **       *            -2a-b
  c. [apetaka]      *        *            -a-b       a < b < 2a
  d. [apateka]      *        *            -a-b       a < b < 2a
  e. [apatake]      **       *            -2a-b
  f. [epateka]               **           -2b        b < a
  g. [epatake]      *        **           -a-2b
  h. [apetake]               **           -2b        b < a
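A hedged simulation of this special case (my own sketch; the base weights a = 1.0, b = 1.5 and the noise level are illustrative choices satisfying a < b < 2a): the single-change candidates (c) and (d), which cannot win under any strict ranking of these two constraints, now win most of the time in Noisy HG.

```python
# Noisy HG sampling for the /apataka/ candidate set above.
# Assumptions (not from the slide): base weights a = 1.0, b = 1.5 and
# Gaussian evaluation noise with sd 0.25. Exact ties are broken at random.
import random
from collections import Counter

violations = {                 # counts: (*aCa, Ident(lo))
    "apataka": (3, 0), "epataka": (2, 1), "apetaka": (1, 1), "apateka": (1, 1),
    "apatake": (2, 1), "epateka": (0, 2), "epatake": (1, 2), "apetake": (0, 2),
}
BASE_A, BASE_B = 1.0, 1.5
NOISE_SD = 0.25
N = 100_000

wins = Counter()
for _ in range(N):
    a = BASE_A + random.gauss(0, NOISE_SD)
    b = BASE_B + random.gauss(0, NOISE_SD)
    harmony = {c: -(v[0] * a + v[1] * b) for c, v in violations.items()}
    best = max(harmony.values())
    # [apetaka]/[apateka] and [epateka]/[apetake] have identical profiles,
    # so they tie exactly; pick one of the tied winners at random.
    wins[random.choice([c for c, h in harmony.items() if h == best])] += 1

for cand, n in wins.most_common():
    print(cand, round(n / N, 3))
# Expect [apetaka] and [apateka] to win most often, [apataka] when the sampled
# weights satisfy a < b/2, and [epateka]/[apetake] when b < a.
```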
Summary for harmonic bounding

• In OT, harmonically bounded candidates can never win under any ranking
  – This means that applying a change to one part of a word but not another is impossible
• In MaxEnt, all candidates have some probability of winning.
• In Noisy HG, harmonically bounded candidates can win only in special cases.
• See Jesney 2007 for a nice discussion of harmonic bounding in weighted models.
Is it good or bad that (b) and (c) can’t win in OT?

  /akitamiso/        *i     Max-V
  a. [akitamiso]     **
  b. [aktamiso]      *      *
  c. [akitamso]      *      *
  d. [aktamso]              **

• In my opinion, probably bad, because there are several cases where candidates like (b) and (c) do win...
French optional schwa deletion

• There’s a long literature on this. See Riggle & Wilson 2005, Kaplan 2011, and Kimper 2011 for references.

  La queue de ce renard      no deletion
  La queue d’ ce renard      some deletion
  La queue de c’ renard      some deletion
  La queue de ce r’nard      some deletion
  La queue d’ ce r’nard      as much deletion as possible, without violating *CCC
Pima plural marking

• Munro & Riggle 2004; Uto-Aztecan language of Mexico, about 650 speakers [Lewis 2009].
• Infixing reduplication marks plural.
• In compounds, any combination of members can reduplicate, as long as at least one does:
  Singular: [ʔus-kàlit-váinom], lit. tree-car-knife, ‘wagon-knife’
  Plural options:
    ʔuʔus-kàklit-vápainom ‘wagon-knives’
    ʔuʔus-kàklit-váinom
    ʔuʔus-kàlit-vápainom
    ʔus-kàklit-vápainom
    ʔuʔus-kàlit-váinom
    ʔus-kàklit-váinom
    ʔus-kàlit-vápainom
Simplest theory of variation in OT: Anttila’s partial ranking (Anttila 1997)

• Some constraints’ rankings are fixed; others vary
• I’m using the red line here to indicate varying ranking

  /θɪk/          Max-C    Ident(place)    *θ     Ident(cont)    *Dental
☞ a. [θɪk]                                *                     *
☞ b. [t̪ɪk]                                       *              *
  c. [ɪk]        *!
  d. [sɪk]                *!
Anttilan partial ranking

  Max-C >> Ident(place) >> { *θ , Ident(continuant) } >> *Dental

  (*θ and Ident(continuant) are mutually unranked; this is the varying ranking)
Linearization

• In order to generate a form, the constraints have to be put into a linear order
• Each linear order consistent with the grammar’s partial order is equally probable

  grammar:                 Max-C >> Ident(place) >> { *θ , Id(cont) } >> *Dental

  linearization 1 (50%):   Max-C >> Ident(place) >> *θ >> Ident(cont) >> *Dental   ☞ [t̪ɪk]
  linearization 2 (50%):   Max-C >> Ident(place) >> Ident(cont) >> *θ >> *Dental   ☞ [θɪk]
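A sketch (mine, not from the slides) that enumerates the linearizations of this partial order and derives the 50%–50% prediction for /θɪk/, using the violation profiles from the tableau above.

```python
# Enumerate the linear orders consistent with the Anttilan partial order and
# evaluate /θɪk/ under each; every consistent linearization is equally probable.
from itertools import permutations
from collections import Counter

constraints = ["Max-C", "Ident(place)", "*θ", "Ident(cont)", "*Dental"]

violations = {                       # counts, in the order of `constraints`
    "θɪk":  (0, 0, 1, 0, 1),
    "t̪ɪk": (0, 0, 0, 1, 1),
    "ɪk":   (1, 0, 0, 0, 0),
    "sɪk":  (0, 1, 0, 0, 0),
}

# Fixed rankings; *θ and Ident(cont) are mutually unranked.
fixed = [("Max-C", "Ident(place)"), ("Ident(place)", "*θ"),
         ("Ident(place)", "Ident(cont)"), ("*θ", "*Dental"), ("Ident(cont)", "*Dental")]

def consistent(order):
    return all(order.index(hi) < order.index(lo) for hi, lo in fixed)

linearizations = [o for o in permutations(constraints) if consistent(o)]
outputs = Counter()
for order in linearizations:
    idx = [constraints.index(c) for c in order]
    winner = min(violations, key=lambda c: tuple(violations[c][i] for i in idx))
    outputs[winner] += 1

for form, n in outputs.items():
    print(form, n / len(linearizations))   # θɪk 0.5, t̪ɪk 0.5
```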
Properties of this theory

• No learning algorithm, unfortunately
• Makes strong predictions about variation numbers:
  – If there are 2 constraints, what are the possible Anttilan grammars?
  – What variation pattern does each one predict?
Finnish example (Anttila 1997)

• The genitive suffix has two forms
  – “strong”: -iden/-iten (with additional changes)
  – “weak”: -(j)en
  (data from p. 3)
Factors affecting variation

• Anttila shows that the choice is governed by...
  – avoiding sequences of heavies or lights (*HH, *LL)
  – avoiding high vowels in heavy syllables (*H/I) or low vowels in light syllables (*L/A)
Anttila’s grammar (p. 21)
(Without going through the whole analysis)
Sample of the results (p. 23)
Day 4 summary

• We’ve seen Classic OT, and a simple way to capture variation in that theory
• But there’s no learning algorithm available for this theory, so its usefulness is limited
• Also, its predictions may be too restrictive
  – E.g., if there are 2 constraints, the candidates must be distributed 100%–0%, 50%–50%, or 0%–100%
Next time (our final day)

• A theory of variation in OT that permits finer-grained predictions, and has a learning algorithm
• Ways to deal with lexical variation
Day 4 references

Anttila, A. (1997). Deriving variation from grammar. In F. Hinskens, R. van Hout, & W. L. Wetzels (Eds.), Variation, Change, and Phonological Theory (pp. 35–68). Amsterdam: John Benjamins.
Jesney, K. (2007). The locus of variation in weighted constraint grammars. Presented at the Workshop on Variation, Gradience and Frequency in Phonology, Stanford University.
Kaplan, A. F. (2011). Variation through markedness suppression. Phonology, 28(3), 331–370. doi:10.1017/S0952675711000200
Kimper, W. A. (2011). Locality and globality in phonological variation. Natural Language & Linguistic Theory, 29(2), 423–465. doi:10.1007/s11049-011-9129-1
Lewis, M. P. (Ed.). (2009). Ethnologue: Languages of the World (16th ed.). Dallas, TX: SIL International.
Munro, P., & Riggle, J. (2004). Productivity and lexicalization in Pima compounds. In Proceedings of BLS.
Riggle, J., & Wilson, C. (2005). Local optionality. In L. Bateman & C. Ussery (Eds.), NELS 35.
Day 5: Before we start

• Last time I promised to show you numbers for multi-site variation in MaxEnt
• If the weights are equal:

  /akitamiso/        *i     Max-V    e^harmony    prob.
                     w=1    w=1
  a. [akitamiso]     **               e^-2         0.25
  b. [aktamiso]      *      *         e^-2         0.25
  c. [akitamso]      *      *         e^-2         0.25
  d. [aktamso]              **        e^-2         0.25
Day 5: Before we start

• As the weights move apart, the “compromise” candidates (b) and (c) remain more frequent than the full-deletion candidate (d)

  /akitamiso/        *i     Max-V    e^harmony         prob.
                     w=1    w=2
  a. [akitamiso]     **               e^-2 = 0.14       0.53
  b. [aktamiso]      *      *         e^-3 = 0.05       0.20
  c. [akitamso]      *      *         e^-3 = 0.05       0.20
  d. [aktamso]              **        e^-4 = 0.02       0.07
                                      sum ≈ 0.25
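These numbers can be reproduced directly (my own sketch; the weights are the ones in the two tables above):

```python
# MaxEnt: P(candidate) = exp(harmony) / Z, where harmony = -(weighted violations).
import math

violations = {                 # counts: (*i, Max-V)
    "akitamiso": (2, 0),
    "aktamiso":  (1, 1),
    "akitamso":  (1, 1),
    "aktamso":   (0, 2),
}

def maxent_probs(w_i, w_maxv):
    scores = {c: math.exp(-(v[0] * w_i + v[1] * w_maxv))
              for c, v in violations.items()}
    z = sum(scores.values())
    return {c: round(s / z, 2) for c, s in scores.items()}

print(maxent_probs(1, 1))   # all four candidates at 0.25
print(maxent_probs(1, 2))   # about 0.53, 0.20, 0.20, 0.07
```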
Stochastic OT

• Today we’ll see a richer model of variation in Classic (strict-ranking) OT.
• But first, we need to discuss the concept of a probability distribution
What is a probability distribution?

• It’s a function from possible outcomes (of some random variable) to probabilities.
• A simple example: flipping a fair coin

  which side lands up    probability
  heads                  0.5
  tails                  0.5
Rolling 2 dice

  sum of 2 dice                           probability
  2  (1+1)                                1/36
  3  (1+2, 2+1)                           2/36
  4  (1+3, 2+2, 3+1)                      3/36
  5  (1+4, 2+3, 3+2, 4+1)                 4/36
  6  (1+5, 2+4, 3+3, 4+2, 5+1)            5/36
  7  (1+6, 2+5, 3+4, 4+3, 5+2, 6+1)       6/36
  8  (2+6, 3+5, 4+4, 5+3, 6+2)            5/36
  9  (3+6, 4+5, 5+4, 6+3)                 4/36
  10 (4+6, 5+5, 6+4)                      3/36
  11 (5+6, 6+5)                           2/36
  12 (6+6)                                1/36
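The same distribution can be computed by enumerating the 36 equally likely outcomes (a small sketch of mine, just to make "function from outcomes to probabilities" concrete):

```python
# Distribution of the sum of two fair dice, by brute-force enumeration.
from collections import Counter
from fractions import Fraction

counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
for total in sorted(counts):
    print(total, Fraction(counts[total], 36))   # 2 -> 1/36, ..., 7 -> 1/6, ..., 12 -> 1/36
```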
Probability distributions over grammars

• One way to think about within-speaker variation is that, at each moment, the speaker has multiple grammars to choose between.
• This idea is often invoked in syntactic variation (e.g., Yang 2010)
  – E.g., SVO order vs. verb-second order
Probability distributions over Classic OT grammars

• We could have a theory that allows any probability distribution:
  – Max-C >> *θ >> Ident(continuant): 0.10 ([t̪ɪn])
  – Max-C >> Ident(continuant) >> *θ: 0.50 ([θɪn])
  – *θ >> Max-C >> Ident(continuant): 0.05 ([t̪ɪn])
  – *θ >> Ident(continuant) >> Max-C: 0.20 ([ɪn])
  – Ident(continuant) >> Max-C >> *θ: 0.05 ([θɪn])
  – Ident(continuant) >> *θ >> Max-C: 0 ([ɪn])
• The child has to learn a number for each ranking (except one)
Probability distributions over Classic OT grammars

• But I haven’t seen any proposal like that in phonology
• Instead, the probability distributions are usually constrained somehow
Anttilan partial ranking as a probability distribution over Classic OT grammars

  Id(place) >> { *θ , Id(cont) }    means

  Id(place) >> *θ >> Id(cont): 50%
  Id(place) >> Id(cont) >> *θ: 50%
  *θ >> Id(place) >> Id(cont): 0%
  *θ >> Id(cont) >> Id(place): 0%
  Id(cont) >> *θ >> Id(place): 0%
  Id(cont) >> Id(place) >> *θ: 0%
A less-restrictive theory: Stochastic OT

• Early version of the idea from Hayes & MacEachern 1998
  – p. 43
• Each constraint is associated with a range, and those ranges also have fringes, indicated by “?” or “??”
Stochastic OT

• Each time you want to generate an output, choose one point from each constraint’s range, then use a total ranking according to those points.
• This approach defines (though without precise quantification) a probability distribution over constraint rankings.
Making it quantitative

• Boersma 1997: the first theory to quantify ranking preference.
• In the grammar, each constraint has a “ranking value”:

  *θ             101
  Ident(cont)    99

• Every time a person speaks, they add a little noise to each of these numbers
  – then rank the constraints according to the new numbers.
• ⇒ Go to demo [Day5_StochOT_Materials.xls]
• Once again, this defines a probability distribution over constraint rankings
• An Anttilan grammar is a special case of a Stochastic OT grammar
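A sketch (mine, not from the slides) of how such a grammar generates outputs; the evaluation noise of 2 is the value used by Boersma & Hayes 2001, assumed here for illustration.

```python
# Stochastic OT sampling: add Gaussian noise (assumed sd 2) to each ranking
# value, rank by the noisy values, and estimate how often each output of
# /θɪk/ wins.
import random
from collections import Counter

ranking_values = {"*θ": 101.0, "Ident(cont)": 99.0}

def sampled_ranking():
    noisy = {c: v + random.gauss(0, 2.0) for c, v in ranking_values.items()}
    return sorted(noisy, key=noisy.get, reverse=True)   # highest value ranked first

def winner(ranking):
    # [θɪk] violates *θ; [t̪ɪk] violates Ident(cont); the higher-ranked of the
    # two constraints decides.
    return "t̪ɪk" if ranking[0] == "*θ" else "θɪk"

print(Counter(winner(sampled_ranking()) for _ in range(100_000)))
# Roughly 76% [t̪ɪk] vs. 24% [θɪk] for a 2-point difference in ranking values.
```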
Boersma’s Gradual Learning Algorithm for Stochastic OT

1. Start out with both constraints’ ranking values at 100.
2. You hear an adult say something, e.g., /θɪk/ → [θɪk].
3. You use your current ranking values to produce an output. Suppose it’s /θɪk/ → [t̪ɪk].
4. Your grammar produced the wrong result! (If the result was right, repeat from Step 2.)
5. Constraints that [θɪk] (the adult’s form) violates are ranked too high; constraints that [t̪ɪk] (your form) violates are ranked too low.
6. So, demote the former and promote the latter, by some fixed amount (say 0.33 points).

  /θɪk/                                *θ                       Ident(cont)
  the adult said this:    [θɪk]        *  (demote to 99.67)
  your grammar
  produced this:          [t̪ɪk]                                 *  (promote to 100.33)
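One update step from the table above, written out (my own sketch; the 0.33-point plasticity is the value suggested on the slide):

```python
# One error-driven GLA update: demote constraints violated (more) by the adult
# form, promote constraints violated (more) by the learner's wrong output.
PLASTICITY = 0.33

violations = {
    "θɪk":  {"*θ": 1, "Ident(cont)": 0},
    "t̪ɪk": {"*θ": 0, "Ident(cont)": 1},
}

def gla_update(values, adult_form, learner_form):
    if learner_form == adult_form:            # no error, no change
        return values
    updated = dict(values)
    for c in values:
        diff = violations[learner_form][c] - violations[adult_form][c]
        # diff > 0: the constraint favors the adult form   -> promote it
        # diff < 0: the constraint favors the wrong output -> demote it
        updated[c] += PLASTICITY * diff
    return updated

print(gla_update({"*θ": 100.0, "Ident(cont)": 100.0}, "θɪk", "t̪ɪk"))
# {'*θ': 99.67, 'Ident(cont)': 100.33}
```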
Gradual Learning Algorithm

• demo (same Excel file, different worksheet)
Problems with the GLA for Stochastic OT

• Unlike with MaxEnt grammars, the space is not convex: there’s no guarantee that there isn’t a better set of ranking values far away from the current ones
• And in any case, the GLA isn’t a “hill-climbing” algorithm. It doesn’t have a function it’s trying to optimize, just a procedure for changing in response to data
Problems with the GLA for Stochastic OT

• Pater 2008: constructed cases where some constraints never stop getting promoted (or demoted)
  – This means the grammar isn’t even converging to a wrong solution; it’s not converging at all!
• I’ve experienced this in applying the algorithm myself
Still, in many cases Stochastic OT works well

• E.g., Boersma & Hayes 2001
  – Variation in Ilokano reduplication and metathesis
  – Variation in English light/dark /l/
  – Variation in Finnish genitives (as we saw last time)
Type variation

• All the theories of variation we’ve used so far predict token variation
  – In this case, every theory wrongly predicts that both words vary

  /mão+s/    Ident(round)    *ãos         /pão+s/    Ident(round)    *ãos
  mãos                       *            pãos                       *
  mães       *                            pães       *
Indexed constraints

• Pater 2009, Becker 2009
• Some constraints apply only to certain words

  /mão+s/TypeA    Ident(round)TypeA    *ãos    Ident(round)TypeB
☞ mãos                                 *
  mães            *!

  /pão+s/TypeB    Ident(round)TypeA    *ãos    Ident(round)TypeB
  pãos                                 *!
☞ pães                                         *
Indexed constraints

• If the grammar is itself variable, we can have some words whose behavior is variable (Huback 2011 example)

  /sidadão+s/TypeC    Ident(round)TypeC    *ãos
                      weight: 100          weight: 98
  sidadãos                                 *
  sidadães            *
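To make the variable behavior concrete, here is a sketch of mine that reads the two values above as Stochastic OT ranking values with evaluation noise of 2; that reading is an assumption, since the slide only labels them "weight".

```python
# Estimate how often each plural of /sidadão+s/ (indexed to TypeC) is produced
# when Ident(round)TypeC (100) and *ãos (98) are sampled with Gaussian noise.
import random
from collections import Counter

values = {"Ident(round)TypeC": 100.0, "*ãos": 98.0}

def sample_output():
    noisy = {c: v + random.gauss(0, 2.0) for c, v in values.items()}
    # [sidadãos] violates *ãos; [sidadães] violates Ident(round)TypeC.
    return "sidadãos" if noisy["Ident(round)TypeC"] > noisy["*ãos"] else "sidadães"

print(Counter(sample_output() for _ in range(100_000)))
# Roughly 76% sidadãos, 24% sidadães: a word-specific, variable pattern.
```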
Where to go from here: R and regression

• Download R
  – www.r-project.org
• Download Harald Baayen’s book Analyzing Linguistic Data: A Practical Introduction to Statistics Using R
  – www.ualberta.ca/~baayen/publications/baayenCUPstats.pdf
• Work through the analyses in the book
  – Baayen gives all the R commands and lets you download the data sets, so you can do the analyses in the book as you read about them
Where to go: Optimality Theory

• Read John McCarthy’s book Doing Optimality Theory: Applying Theory to Data
  – A practical guide for actually doing OT
• If you enjoy that, read John McCarthy’s book Optimality Theory: A Thematic Guide
  – Goes into more theoretical depth
• There is a book in Portuguese, João Costa’s 2001 Gramática, conflitos e violações. Introdução à Teoria da Optimidade
• Download OTSoft
  – www.linguistics.ucla.edu/people/hayes/otsoft
  – If you give it the candidates, constraints, and violations, it will tell you the ranking
Where to go: Stochastic OT and the Gradual Learning Algorithm

• Read Boersma & Hayes’s 2001 article “Empirical tests of the Gradual Learning Algorithm”
• Download the data sets for the article and play with them in OTSoft
  – www.fon.hum.uva.nl/paul/gla, under part 3
  – Try different GLA options
  – Try learning algorithms other than the GLA
Where to go: Harmonic Grammar and Noisy HG

• Unfortunately, I don’t know of any friendly introductions to these
• Download OT-Help and try the examples
  – people.umass.edu/othelp/
  – The OT-Help manual might be the easiest-to-read summary of Harmonic Grammar that exists!
  – Try the sample files
Where to go: MaxEnt

• The original proposal to use MaxEnt for phonology was Goldwater & Johnson 2003, but it’s difficult to read
• Andy Martin’s 2007 UCLA dissertation has an easier-to-read introduction (chapter 4)
  – www.linguistics.ucla.edu/general/Dissertations/Martin_dissertationUCLA2007.pdf
• You could try using OTSoft to fit a MaxEnt model to the Boersma/Hayes data
Where to go: MaxEnt’s Gaussian prior

• To use the prior (bias against changing weights from default), download the MaxEnt Grammar Tool
  – www.linguistics.ucla.edu/people/hayes/MaxentGrammarTool
  – In addition to the usual OTSoft input file, you need to make a file with mu and sigma^2 for each constraint (there is a sample file)
• Good examples to read of using the prior
  – Chapter 4 of Andy Martin’s dissertation
  – Hayes & White 2013 article, “Phonological naturalness and phonotactic learning”
    www.linguistics.ucla.edu/people/grads/jwhite/documents/HayesWhitePhonologicalNaturalnessAndPhonotacticLearning.pdf
Where to go: lexical variation

• Becker’s 2009 UMass dissertation, “Phonological Trends in the Lexicon: The Role of Constraints”, develops the lexical-indexing approach
  – www.phonologist.org/papers/becker_dissertation.pdf
• Hayes & Londe’s 2006 paper “Stochastic phonological knowledge: the case of Hungarian vowel harmony” uses another approach (Zuraw’s UseListed)
  – www.linguistics.ucla.edu/people/hayes/HungarianVH
Thanks for attending!

• Stay in touch: kie@ucla.edu
• Working on a phonology project (with or without variation)? I’d be interested to read it.
Day 5 references

Becker, M. (2009). Phonological trends in the lexicon: the role of constraints (Ph.D. dissertation). University of Massachusetts Amherst.
Boersma, P. (1997). How we learn variation, optionality, and probability. Proceedings of the Institute of Phonetic Sciences of the University of Amsterdam, 21, 43–58.
Boersma, P., & Hayes, B. (2001). Empirical tests of the Gradual Learning Algorithm. Linguistic Inquiry, 32, 45–86.
Goldwater, S., & Johnson, M. (2003). Learning OT constraint rankings using a maximum entropy model. In J. Spenader, A. Eriksson, & Ö. Dahl (Eds.), Proceedings of the Stockholm Workshop on Variation within Optimality Theory (pp. 111–120). Stockholm: Stockholm University.
Hayes, B., & Londe, Z. C. (2006). Stochastic phonological knowledge: The case of Hungarian vowel harmony. Phonology, 23(1), 59–104. doi:10.1017/S0952675706000765
Hayes, B., & MacEachern, M. (1998). Quatrain form in English folk verse. Language, 74, 473–507.
Hayes, B., & White, J. (2013). Phonological naturalness and phonotactic learning. Linguistic Inquiry, 44(1), 45–75. doi:10.1162/LING_a_00119
Huback, A. P. (2011). Irregular plurals in Brazilian Portuguese: An exemplar model approach. Language Variation and Change, 23(2), 245–256. doi:10.1017/S0954394511000068
Martin, A. (2007). The evolving lexicon (Ph.D. dissertation). University of California, Los Angeles.
Pater, J. (2008). Gradual learning and convergence. Linguistic Inquiry.
Pater, J. (2009). Morpheme-specific phonology: constraint indexation and inconsistency resolution. In S. Parker (Ed.), Phonological argumentation: essays on evidence and motivation. Equinox.
Yang, C. (2010). Three factors in language variation. Lingua, 120(5), 1160–1177. doi:10.1016/j.lingua.2008.09.015