
Automated Identification of Preposition Errors
Joel Tetreault
Educational Testing Service
ECOLT
October 29, 2010
Outline
• Computational Linguistics (CL) and Natural
Language Processing (NLP)
• NLP at ETS (automated scoring)
• Automated Preposition Error Detection
Linguistics
[Figure: “D’oh!” / “Homer talked to Marge”]
Computational Linguistics
Want computers to
understand language
[Figure: “D’oh!” / “Homer talked to Marge”, shown with a scrambled rendering: “9omer2 ta3lked 4 6arge1”]
Computational Linguistics vs. NLP
• Computational Linguistics (CL):
– Computers understanding language
– Modeling how people communicate
• Natural Language Processing (NLP):
– Applications on the computer side
– Natural: refers to languages spoken by people
(English, Swahili) vs. artificial languages (C++)
– Take CL theories and implement them into tools
• CL and NLP often conflated
Computational Linguistics Space
CL
• Computer Science: learning algorithms
• Linguistics: formal grammars
• Psychology: human processing modeling
Computational Linguistics Space
Artificial Intelligence
CL
• Perfect speech recognition
• Perfect language understanding
• Perfect speech synthesis
• Perfect discourse modeling
• Intention Recognition
• World Knowledge
• (Vision)
Intelligent Machines
Real World Applications of NLP
• Spelling and Grammar correction/detection
– MSWord, e-rater
• Machine Translation
– Google and Bing Translate
• Opinion Mining
– Extract sentiment of demographic from blogs and
social media
• Speech Recognition and Synthesis
• Automatic Document Summarization
NLP at ETS: Motivation
• Millions of GRE and TOEFL tests taken each
year
• Tests move to more natural assessment
– Fewer multiple choice questions
– Tests have essay component
• Problem:
– Thousands of raters required
– Costly and time-consuming
NLP at ETS
• Use NLP techniques to automatically score
essays (e-rater)
• Other scoring tools which use NLP:
– Criterion: online writing feedback
– SpeechRater: automatic speaking assessment
– C-Rater: content scoring of short answers
– Plagiarism Detection
E-rater (Automated Essay Scoring)
• First deployed in 1999 for GMAT Writing
Assessment
• Operational for the GRE and TOEFL as well as
a collection of smaller assessments
• System Performance (5 point essay scale):
– E-rater/Human agreement: 75% exact, 98% exact or adjacent
(within 1 point)
– Comparable to agreement between two human raters
E-rater (Automated Essay Scoring)
• Massive collection of 50+ weighted features
organized into 5+ high level features
• Each feature is represented by a module:
– Simple: collection of manual rules and/or regular
expressions
– More complex: a statistical NLP system is behind
the feature
• Combined using linear regression
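As a rough illustration of the last bullet, a linear model adds up weighted feature values. The feature names, weights, and intercept below are hypothetical placeholders, not e-rater's actual values; in practice the weights would be fit by regression against human scores.

```python
from typing import Dict


def combine_features(values: Dict[str, float],
                     weights: Dict[str, float],
                     intercept: float) -> float:
    """Return the intercept plus the weighted sum of the feature values."""
    return intercept + sum(weights[name] * value for name, value in values.items())


# Hypothetical feature values and weights, for illustration only.
example_values = {"grammar": 2.0, "usage": 4.0}
example_weights = {"grammar": 0.5, "usage": 0.25}
example_score = combine_features(example_values, example_weights, intercept=1.0)
```

The point of the sketch is only that each module contributes one number and the final score is a weighted sum of those numbers.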
E-rater Features
Grammar
• Sentence fragments, garbled words
• Subject-Verb Agreement: the motel are …
• Verb form: They are need to distinguish …
• Pronoun Errors: Them are my reasons …
Usage
• Incorrect article/preposition
• Confused Word: affect vs. effect
• Faulty Comparison: It is more big
• Double negatives: He don’t have no candy.
E-rater Features
Mechanics
• Spelling
• Punctuation
• Capitalization
• Missing hyphens, apostrophes
Style
• Sentence length, word repetition
• Passives
Discourse
• Discourse sequences
• RST & Syntactic structures (contrast,
elaboration, antithesis, etc.)
How to Game the System 
• Word Salad Detector
“Quick The the over brown dogs fox.
Jumped. Lazy”
“Skfhdorla;sf[e’skas as,fr’r;/.,fkrasa”
• Unusually Short / Off-Topic Essays
“I don’t know how to explain this
question because I took a nap. Sorry.”
“I THINK EVERYONE SHOULD BE ABLE
TO WEAR WHATEVER THE HELL THEY
WANT TO WEAR.”
NLP for English Language Learners
• Increasing need for tools for instruction in
English as a Second Language (ESL)
– 300 million ESL learners in China alone
– 10% of US students learn English as a second
language
– Teachers now burdened with teaching classes
with wildly varying levels of English fluency
– Assessments for EFL Teacher Proficiency
NLP for English Language Learners
• Other Interest:
– Microsoft Research (ESL Assistant)
– Publishing/Assessment Companies (Cambridge, Oxford,
Pearson)
– Universities
Objective
• Research Goal: develop NLP tools to
automatically provide feedback to ESL
learners about grammatical errors
• Preposition Error Detection
– Selection Error (“They arrived to the town.”)
– Extraneous Use (“They came to outside.”)
– Omitted (“He is fond this book.”)
Motivation
• Preposition usage is one of the most difficult
aspects of English for non-native speakers
– [Dalgish ’85] – 18% of sentences from ESL essays
contain a preposition error
– Our data: 8-10% of all prepositions in TOEFL
essays are used incorrectly
Why are prepositions hard to master?
• Prepositions are problematic because they
can perform so many complex roles
– Preposition choice in an adjunct is constrained by
its object (“on Friday”, “at noon”)
– Prepositions are used to mark the arguments of a
predicate (“fond of beer.”)
– Phrasal Verbs (“give in to their demands.”)
• “give in” → “acquiesce, surrender”
Why are prepositions hard to master?
• Multiple prepositions can appear in the same
context:
“When the plant is horizontal, the force of the gravity causes the
sap to move __ the underside of the stem.”
Choice    Source
to        Writer
on        System
toward    Rater 1
onto      Rater 2
Preposition Error Detection
• In NLP: computer system learns from lots and
lots of data
• Training Phase: Create a “model” of the
problem area
– Face detection
– Credit Card Usage
– Translating from Chinese to English
• Testing Phase: Use model to classify new
cases
Baseball Feature Example
• Predict the outcome of the baseball game
• Look at all the games where both teams
played each other:
• For each game (event), use features:
– Win/loss records before game
– Home field advantage
– Players’ prior performance
• Train learning algorithm
Baseball Feature Example
Event   Winner        Location      Prior Isotopes Win Streak  Prior Capital City Win Streak
Game 1  Isotopes      Springfield   0                          3
Game 2  Capital City  Springfield   4                          0
Game 3  Capital City  Capital City  2                          0
Game 4  Isotopes      Springfield   2                          1
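The training and testing phases can be sketched on the four games above. The slides do not say which learning algorithm is used, so a simple perceptron stands in here as an assumption; the feature encoding and the new game at the end are also hypothetical. With only four events the learned weights mean nothing beyond illustration.

```python
from typing import List, Tuple

# Rows from the table: (winner, location, Isotopes streak, Capital City streak).
GAMES: List[Tuple[str, str, int, int]] = [
    ("Isotopes", "Springfield", 0, 3),
    ("Capital City", "Springfield", 4, 0),
    ("Capital City", "Capital City", 2, 0),
    ("Isotopes", "Springfield", 2, 1),
]


def featurize(location: str, isotopes_streak: int, capital_streak: int) -> List[float]:
    """Encode one game as numbers; the first entry is a constant bias term."""
    played_in_springfield = 1.0 if location == "Springfield" else 0.0
    return [1.0, played_in_springfield, float(isotopes_streak), float(capital_streak)]


def predict(weights: List[float], features: List[float]) -> str:
    """Testing phase: a positive weighted sum means an Isotopes win."""
    total = sum(w * f for w, f in zip(weights, features))
    return "Isotopes" if total > 0 else "Capital City"


def train(games: List[Tuple[str, str, int, int]], epochs: int = 20) -> List[float]:
    """Training phase: nudge the weights whenever a game is misclassified."""
    weights = [0.0, 0.0, 0.0, 0.0]
    for _ in range(epochs):
        for winner, location, iso_streak, cap_streak in games:
            features = featurize(location, iso_streak, cap_streak)
            if predict(weights, features) != winner:
                sign = 1.0 if winner == "Isotopes" else -1.0
                weights = [w + sign * f for w, f in zip(weights, features)]
    return weights


model = train(GAMES)
# Hypothetical unseen game: played in Springfield, streaks of 1 and 2.
prediction = predict(model, featurize("Springfield", 1, 2))
```

The same two-step pattern (derive features per event, then train and apply a model) carries over to prepositions, where each event is one preposition in context.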
Building a Model of Preposition Usage
• Prepositions are influenced by:
– Words in the local context, and how they interact
with each other (lexical)
– Syntactic structure of context
– Semantic interpretation
• Get computer to understand correct usage:
– Encode these influences as “features”
– Train computer algorithm on millions of examples
of correct usage with the associated features
Deriving the Features
• Derived using NLP tools
• Tokenizing
– “He is fond of beer . ”
• Part-of-Speech Tagging
– “He_PRP is_BE fond_VB of_PREP beer_NN ._.”
• Chunking / Parsing
– “{NP He_PRP} {VP is_BE fond_VB} of_PREP {NP beer_NN} ._.”
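A minimal sketch of the first two steps, assuming a regular-expression tokenizer and a tiny hand-written lexicon in place of a real statistical tagger. The lexicon, the "UNK" fallback tag, and the function names are hypothetical; the tag names follow the example above. Chunking and parsing are not covered.

```python
import re
from typing import Dict, List, Tuple

# Toy lexicon covering only the example sentence; a real tagger is trained on data.
TOY_LEXICON: Dict[str, str] = {
    "he": "PRP", "is": "BE", "fond": "VB", "of": "PREP", "beer": "NN", ".": ".",
}


def tokenize(text: str) -> List[str]:
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Look each token up in the toy lexicon; unknown words get 'UNK'."""
    return [(token, TOY_LEXICON.get(token.lower(), "UNK")) for token in tokens]


def render(tagged: List[Tuple[str, str]]) -> str:
    """Format tagged tokens in the word_TAG style used on the slide."""
    return " ".join(f"{word}_{pos}" for word, pos in tagged)


tokens = tokenize("He is fond of beer.")
tagged_line = render(tag(tokens))
```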
Feature Overview
• System uses a minimum of 25 features
– Lexical, syntactic, semantic sources
– Head words before and after preposition
– Words in the local context (+/- 2 words)
– Part of Speech (POS) of words above
– Combination Features
– Parse Features
Preposition Feature Example
1. He is fond of beer.
2. The train will arrive at the Springfield Station.
3. The car with the broken wheel is in the shop.
Event   Prep  Prior Verb  Prior Noun  Following Word  POS of Following Word
Prep 1  of    fond        <none>      beer            NN
Prep 2  at    arrive      <none>      the             Det
Prep 3  with  <none>      car         the             Det
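The rows of the table can be reproduced with a small sketch, assuming the sentence is already tagged. The rule used here (look only at the word directly before the preposition) is a simplification chosen for illustration, and the tags for sentences 2 and 3 are assumptions; the full system uses head words and parse features.

```python
from typing import Dict, List, Tuple

Tagged = List[Tuple[str, str]]
NONE = "<none>"


def preposition_features(tagged: Tagged, index: int) -> Dict[str, str]:
    """Build the table's features for the preposition at position `index`."""
    prep, pos = tagged[index]
    if pos != "PREP":
        raise ValueError(f"token {prep!r} is not tagged as a preposition")
    prev_word, prev_pos = tagged[index - 1] if index > 0 else (NONE, NONE)
    has_next = index + 1 < len(tagged)
    next_word, next_pos = tagged[index + 1] if has_next else (NONE, NONE)
    return {
        "prep": prep,
        "prior_verb": prev_word if prev_pos == "VB" else NONE,
        "prior_noun": prev_word if prev_pos == "NN" else NONE,
        "following_word": next_word,
        "following_pos": next_pos,
    }


# Tags follow the slide's simplified tag names; those for 2 and 3 are assumed.
SENT_1 = [("He", "PRP"), ("is", "BE"), ("fond", "VB"), ("of", "PREP"),
          ("beer", "NN"), (".", ".")]
SENT_2 = [("The", "Det"), ("train", "NN"), ("will", "MD"), ("arrive", "VB"),
          ("at", "PREP"), ("the", "Det"), ("Springfield", "NN"),
          ("Station", "NN"), (".", ".")]
SENT_3 = [("The", "Det"), ("car", "NN"), ("with", "PREP"), ("the", "Det"),
          ("broken", "JJ"), ("wheel", "NN"), ("is", "BE"), ("in", "PREP"),
          ("the", "Det"), ("shop", "NN"), (".", ".")]

rows = [
    preposition_features(SENT_1, 3),
    preposition_features(SENT_2, 4),
    preposition_features(SENT_3, 2),
]
```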
Flagging Errors
• Train learning algorithm on millions of events
→ develop model (classifier)
• Testing (flagging errors)
– Derive features
– Replace writer’s preposition with all other
prepositions, classifier outputs score for each
preposition
– Compare top scoring preposition to score of
writer’s preposition
Thresholds
“He is fond with beer” → FLAG AS ERROR
[Figure: bar chart of classifier scores (0 to 100) for of, in, at, by, with]
Thresholds
“My sister usually gets home by 3:00” → FLAG AS OK
[Figure: bar chart of classifier scores (0 to 60) for of, in, around, by, with]
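The comparison behind the two threshold examples can be sketched as follows, assuming the rule is "flag when the top-scoring preposition beats the writer's preposition by more than a threshold". The exact rule, the threshold value, and all scores below are hypothetical; the bar heights in the charts are not recoverable from the slides.

```python
from typing import Dict, Tuple


def check_preposition(scores: Dict[str, float],
                      writer_prep: str,
                      threshold: float) -> Tuple[bool, str]:
    """Return (flag_as_error, top_scoring_preposition)."""
    best_prep = max(scores, key=lambda prep: scores[prep])
    gap = scores[best_prep] - scores.get(writer_prep, 0.0)
    return gap > threshold, best_prep


# Hypothetical classifier scores on a 0 to 100 scale.
fond_scores = {"of": 90.0, "in": 4.0, "at": 3.0, "by": 2.0, "with": 1.0}
home_scores = {"of": 5.0, "in": 10.0, "around": 45.0, "by": 35.0, "with": 5.0}

fond_result = check_preposition(fond_scores, writer_prep="with", threshold=50.0)
home_result = check_preposition(home_scores, writer_prep="by", threshold=50.0)
```

In the first case the classifier strongly prefers "of" over the writer's "with", so the usage is flagged. In the second, "around" scores only slightly higher than "by", so the writer's choice is left alone; a high threshold trades missed errors for fewer false alarms.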
Performance
• Evaluation corpus of 5600 TOEFL essays (8200
prepositions)
– Each preposition manually annotated
• Recall = 0.19; Precision = 0.84
– About 1/5 of errors are flagged
– 84% of flagged prepositions are indeed errors
• Precision is favored over recall to reduce false positives
• State of the Art performance
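For reference, precision and recall can be computed from counts of flagged and missed errors. The counts below are hypothetical, chosen only so the results land near the reported figures; the slides do not give the underlying counts.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of flagged prepositions that really are errors."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of real errors that the system flagged."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical counts: 45 prepositions flagged, 38 of them real errors,
# and 162 real errors that were never flagged.
example_precision = precision(true_positives=38, false_positives=7)
example_recall = recall(true_positives=38, false_negatives=162)
```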
Conclusions
• Presented an overview of:
– NLP
– NLP at ETS
– One feature (Prepositions) in e-rater
• Future Directions
– Use of large scale corpora (WWW)
– L1-specific models
– Train on error-annotated data
Plugs
• ETS/NLP Publications:
– http://ets.org/research/erater.html
• 5th Workshop on Innovative Use of NLP for
Educational Applications (NAACL-10)
– http://www.cs.rochester.edu/u/tetreaul/naacl-bea5.html
Plugs
• “Automated Grammatical Error Detection for
Language Learners”
– Leacock et al., 2010
– Synthesis Series
Thanks!
Joel Tetreault: JTetreault@ets.org