The Practical Value of Statistics for
Sentence Generation:
The Perspective of the Nitrogen System
Irene Langkilde-Geary
How well do statistical n-grams
make linguistic decisions?
Subject-Verb Agreement
I am    2797
I are     47
I is      14
Singular vs Plural
their trust 28
their trusts 8
Article-Noun Agreement
a trust 394 an trust 0 the trust 1355
a trusts 2 an trusts 0 the trusts 115
Word Choice
reliance 567
reliances 0
trust 6100
trusts 1083
More Examples
Relative pronoun
visitor who 9 visitors who 20
visitor which 0 visitors which 0
visitor that 9 visitors that 14
Singular vs Plural
visitor 575 visitors 1083
Verb Tense
admire 212 admired 211
admires 107
Preposition
in Japan 5413 to Japan 1196
came to 2443
came in 1498
came into 244
arrived in 544
arrived to 35
arrived into 0
came to Japan 7 arrived to Japan 0
came into Japan 1 arrived into Japan 0
came in Japan 0 arrived in Japan 4
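All of these decisions reduce to comparing corpus counts. A minimal sketch of that procedure (illustrative Python, not Nitrogen's actual code; the counts are the ones shown above):

    # Corpus counts from the slides above.
    COUNTS = {
        "i am": 2797, "i are": 47, "i is": 14,
        "came in japan": 0, "arrived in japan": 4,
        "came to japan": 7, "arrived to japan": 0,
    }

    def choose(*alternatives):
        """Prefer the alternative with the highest corpus count."""
        return max(alternatives, key=lambda p: COUNTS.get(p.lower(), 0))

    print(choose("I am", "I are", "I is"))               # -> I am
    print(choose("came in Japan", "arrived in Japan"))   # -> arrived in Japan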
How can we get a computer to
learn by “reading”?
Nitrogen takes a two-step approach
1. Enumerate all possible expressions
2. Rank them in order of probabilistic
likelihood
Why two steps? They are independent.
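Schematically, the pipeline is two composable functions; a minimal sketch with hypothetical names (Nitrogen's enumerator really produces a packed lattice, not a flat list):

    import itertools

    def enumerate_expressions(slots):
        """Step 1: enumerate all possible expressions. Here that is the
        cross product of per-slot word alternatives; Nitrogen actually
        builds a packed lattice rather than a flat list."""
        return [" ".join(words) for words in itertools.product(*slots)]

    def rank(candidates, score):
        """Step 2: order candidates by probabilistic likelihood, best first."""
        return sorted(candidates, key=score, reverse=True)

    slots = [["visitors"], ["who"], ["came", "arrived"], ["in", "to"], ["Japan"]]
    candidates = enumerate_expressions(slots)   # 4 candidate sentences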
Assigning probabilities
• N-gram model
  Formula for bigrams (see the sketch after this list):
  P(S) = P(w1 | START) * P(w2 | w1) * ... * P(wn | wn-1)
• Probabilistic syntax (current work)
– A variant of probabilistic parsing models
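A sketch of the bigram score in code, computed in log space to avoid underflow (the add-one smoothing is my illustrative choice; the slides do not specify Nitrogen's smoothing):

    import math

    def make_bigram_prob(bigram_counts, unigram_counts, vocab_size):
        """Estimate P(cur | prev) from corpus counts, add-one smoothed."""
        def prob(prev, cur):
            return ((bigram_counts.get((prev, cur), 0) + 1)
                    / (unigram_counts.get(prev, 0) + vocab_size))
        return prob

    def sentence_log_prob(sentence, prob):
        """log P(S): sum of log P(w_i | w_(i-1)), padded with START."""
        words = ["<start>"] + sentence.lower().split()
        return sum(math.log(prob(prev, cur))
                   for prev, cur in zip(words, words[1:]))

Sorting the enumerated candidates by this score yields rankings like the top three on the next slide.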
Sample Results of Bigram model
Random path: (out of a set of 11,664,000 semantically-related sentences)
Visitant which came into the place where it will be Japanese has admired that
there was Mount Fuji.
Top three:
Visitors who came in Japan admire Mount Fuji .
Visitors who came in Japan admires Mount Fuji .
Visitors who arrived in Japan admire Mount Fuji .
Strengths
• Reflects the reality that 55% of dependencies (Stolcke et al. 1997) are
  binary and hold between adjacent words
• Embeds linear ordering constraints
Limitations of Bigram model

Example                                              Reason
Visitors come in Japan.                              A three-way dependency
He planned increase in sales.                        Part-of-speech ambiguity
A tourist who admire Mt. Fuji...                     Long-distance dependency
A dog eat/eats bone.                                 Previously unseen n-grams
I cannot sell their trust.                           Nonsensical head-arg relationship
The methods must be modified to the circumstances.   Improper subcat structure
Representation
of enumerated possibilities
(Easily on the order of 10^15 to 10^32 or more)
Issues
Candidate structures:
• List
• Lattice
• Forest
Trade-offs:
• space/time constraints
• redundancy
• localization of dependencies
• non-uniform weights of dependencies
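A lattice is the usual middle ground: shared substrings are stored once, so 10^15+ sentences never have to be listed explicitly. A toy sketch (hypothetical encoding, not Nitrogen's actual structure):

    # Toy word lattice: state -> list of (word, next_state) arcs.
    LATTICE = {
        0: [("visitors", 1)],
        1: [("who", 2)],
        2: [("came", 3), ("arrived", 3)],
        3: [("in", 4), ("to", 4)],
        4: [("Japan", 5)],
    }

    def count_paths(state, end=5):
        """How many distinct sentences the lattice packs."""
        if state == end:
            return 1
        return sum(count_paths(nxt, end) for _, nxt in LATTICE.get(state, []))

    print(count_paths(0))   # 4 sentences stored in only 6 arcs

Because bigram dependencies hold only between adjacent arcs, the best path can be found by dynamic programming without unpacking the lattice; long-distance and non-uniformly weighted dependencies are precisely what strain this representation, as the list above notes.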
[Plot: Number of phrases versus size (in bytes) for 15 sample inputs]
[Plot: Number of phrases versus time (in seconds) for 15 sample inputs]
Generating from Templates and
Meaning-based Inputs
INPUT → ( <label> <feature> VALUE )
VALUE → INPUT -OR- <label>
Labels are defined in:
1. input
2. user-defined lexicon
3. WordNet-based lexicon
(~ 100,000 concepts)
Example Input:
(a1 :template (a2 / "eat"
                  :agent YOU
                  :patient a3)
    :filler (a3 / |poulet|))
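Read as data, an INPUT is a label plus feature/value pairs whose values are either nested INPUTs or labels. A sketch of the example above as a Python structure (the shape is illustrative; Nitrogen's actual inputs are Lisp-style s-expressions):

    # (a1 :template (a2 / "eat" :agent YOU :patient a3)
    #     :filler   (a3 / |poulet|))
    example_input = ("a1", {
        ":template": ("a2", {"/": '"eat"', ":agent": "YOU", ":patient": "a3"}),
        ":filler":   ("a3", {"/": "|poulet|"}),
    })

The label a3 used as a value is resolved against the input itself first, then the user-defined lexicon, then the WordNet-based lexicon, per the lookup order above.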
Mapping Rules
1. Recast one input to another
   (implicitly providing varying levels of abstraction)
2. Assign linear order to constituents
3. Add missing info to under-specified inputs
Matching Algorithm
• Rule order determines priority. Generally:
  – Recasting < linear ordering < under-specification
  – High (more semantic) level of abstraction < low (more syntactic)
  – Distant position (adjuncts) from head < near (complements)
  – Basic properties < specialized
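A minimal sketch of the control loop this priority scheme implies (the (match, rewrite) rule representation is my assumption; the slide specifies only that list order encodes priority):

    def apply_first_matching_rule(node, rules):
        """Rules are (match, rewrite) pairs kept in priority order:
        recasting rules first, then linear ordering, then
        under-specification. The first rule whose pattern matches
        rewrites the node; otherwise the node passes through."""
        for matches, rewrite in rules:
            if matches(node):
                return rewrite(node)
        return node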
Recasting
(a1 :venue <venue>
    :cuisine <cuisine>)
→ (a2 / |serve|
      :agent <venue>
      :patient <cuisine>)
-OR-
→ (a2 / |have the quality of being|
      :domain (a3 / "food type"
                  :possessed-by <venue>)
      :range (b1 / |cuisine|))
Recasting
(a1 :venue <venue>
    :region <region>)
(a2 / |serve|
    :agent <venue>
    :patient <cuisine>)
→ (a3 / |serve|
      :voice active
      :subject <venue>
      :object <cuisine>)
-OR-
→ (a3 / |serve|
      :voice passive
      :subject <cuisine>
      :adjunct (b1 / <venue>
                   :anchor |BY|))
Linear ordering
(a3 / |serve|
    :voice active
    :subject <venue>
    :object <cuisine>)
→ <venue>
  (a4 / |serve|
      :voice active
      :object <cuisine>)
Under-specification
(a4 / |serve|)
→ (a6 / |serve|
      :cat noun)
-OR-
→ (a5 / |serve|
      :cat verb)
Under-specification
(a4 / |serve|)
→ (a5 / |serve|
      :cat verb)
→ (a5 / |serve|
      :cat verb
      :tense present)
-OR-
→ (a5 / |serve|
      :cat verb
      :tense past)
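Under-specification rules fan a bare input out into every consistent completion, one feature at a time; iterated over features like :cat and :tense, this is a major source of the enormous candidate sets quoted earlier. A toy sketch:

    def expand(node, feature, values):
        """Fan an under-specified node out into one variant per value."""
        return [{**node, feature: v} for v in values]

    serve = {"/": "|serve|"}
    by_cat = expand(serve, ":cat", ["noun", "verb"])
    verbs = [n for n in by_cat if n[":cat"] == "verb"]
    by_tense = [t for n in verbs for t in expand(n, ":tense", ["present", "past"])]
    print(by_tense)
    # [{'/': '|serve|', ':cat': 'verb', ':tense': 'present'},
    #  {'/': '|serve|', ':cat': 'verb', ':tense': 'past'}]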
Core features currently recognized by
Nitrogen
Syntactic relations
:subject :object :dative :compl :pred :adjunct :anchor :pronoun :op :modal
:taxis :aspect :voice :article
Functional relations
:logical-sbj :logical-obj :logical-dat :obliq1 :obliq2 :obliq3 :obliq2-of :obliq3-of :obliq1-of :attr :generalized-possesion :generalized-possesion-inverse
Semantic/Systemic Relations
:agent :patient :domain :domain-of :condition :consequence :reason
:compared-to :quant :purpose :exemplifier :spatial-locating :temporal-locating
:temporal-locating-of :during :destination :means :manner :role :role-of-agent
:source :role-of-patient :inclusive :accompanier :sans :time :name :ord
Dependency relations
:arg1 :arg2 :arg3 :arg1-of :arg2-of :arg3-of
Properties used by Nitrogen
:cat [nn, vv, jj, rb, etc.]
:polarity [+, -]
:number [sing, plural]
:tense [past, present]
:person [1s 2s 3s 1p 2p 3p s p all]
:mood [indicative, pres-part, past-part, infinitive, to-inf, imper]
How many grammar rules needed for
English?
Sentence → Constituent+
Constituent → Constituent+ OR Leaf
Leaf → Punc* FunctionWord* ContentWord FunctionWord* Punc*
FunctionWord → ``and'' OR ``or'' OR ``to'' OR ``on'' OR ``is''
               OR ``been'' OR ``the'' OR ...
ContentWord → Inflection(RootWord, Morph)
RootWord → ``dog'' OR ``eat'' OR ``red'' OR ...
Morph → none OR plural OR third-person-singular ...
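The point of the slide is how few rule schemas are needed; nearly all knowledge sits in the function-word list, the root-word lexicon, and Inflection. A toy transcription in Python (tiny lexicon, regular inflection only, punctuation omitted; all illustrative):

    FUNCTION_WORDS = {"and", "or", "to", "on", "is", "been", "the"}
    ROOT_WORDS = {"dog", "eat", "red"}

    def inflect(root, morph):
        """Toy Inflection(RootWord, Morph): regular English forms only."""
        if morph in ("plural", "third-person-singular"):
            return root + "s"
        return root   # morph == "none"

    # A Leaf is Punc* FunctionWord* ContentWord FunctionWord* Punc*:
    leaf = ["the", inflect("dog", "plural")]
    print(" ".join(leaf))   # -> the dogs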
Computational Complexity
(x^2 / A^2) + (y^2 / B^2) = 1
[Figure: ellipse plotted against X and Y axes]
Advantages of a statistical approach
for a symbolic generation module
• Shifts focus from “grammatical” to “possible”
• Significantly simplifies knowledge bases
• Broadens coverage
• Potentially improves quality of output
• Dramatically reduces information demands on client
• Greatly increases robustness