A Report on the First Native Language Identification Shared Task
Joel Tetreault (Nuance Communications)
Daniel Blanchard (Educational Testing Service)
Aoife Cahill (Educational Testing Service)
Native Language Identification (NLI): the task of automatically identifying a speaker’s first language based solely on the speaker’s writing in another language
Applications:
◦ Authorship profiling (Estival et al., 2007)
◦ Education: more targeted feedback to language learners (Leacock et al., 2010)
No risk no fun I agree the statement "Successful people try new things and take risk".In my mind it is so, to. When you thing you like do new stuff you need a liddelbit the kick. That is the big point what I need. For exsample I like to go to a big city like New York. I was never in this town I dont no from the city. But I like go to the city. Thats fun I stay every time for proplems. I need eat a hood offer my head. The ather side I can go dow. I dont gat waht I need…Next exsample the wall street you put money in funds, well you this make a good job. Dont for get the risk look like lose money.
German
For example, if you take a look at an ordinary school, you have different teachers for every subject. Your calculus teacher is different than your literature teacher. Each teacher must specialize in a specific subject in order to convey suffiecient and proper information to the students. However, that doesn't mean that the teacher is narrow-minded and has a limited perspective in life because to specialize in one subject doesn't hinder you or stop you from exploring other subjects.
Arabic
1. Lots of work in NLI, but it has been hard to compare different approaches:
◦ ICLEv2 (Granger et al., 2009): the de facto train/test data is small and has NLI-unfriendly idiosyncrasies
2. No consensus on evaluation:
◦ Which L1s / how many L1s?
◦ Train/test splits?
◦ Best features?
Goal: unify the community and help the field progress
Provide a larger, more NLI-friendly corpus that improves upon ICLEv2
Common evaluation framework
◦ Everyone evaluates using same train/dev/test splits and same L1s
Corpus and scripts to be made public to further promote the field
Prior Work
Data
Shared Task Overview
Results
NLI Shared Task in the Future
Treat NLI as a classification task (see the sketch after the list below)
Koppel et al. (2005): POS n-grams, content and function words, spelling and grammatical errors
Syntactic features (Wong and Dras, 2011)
Tree Substitution Grammars (Swanson and Charniak, 2012)
Adaptor Grammars (Wong et al., 2012)
Data Size Effects (Brooke and Hirst, 2012)
Word n-grams (Bykh and Meurers, 2012)
LMs and Ensemble Classifiers (Tetreault et al., 2012)
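For concreteness, a minimal sketch of NLI as a text classification task, assuming scikit-learn: word n-gram features feeding a linear SVM. The toy essays and labels are illustrative placeholders, not TOEFL11 data or any published system.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Toy stand-ins for (essay text, gold L1 label) training pairs
    train_essays = [
        "I was never in this town , I dont no from the city .",
        "Each teacher must specialize in a specific subject .",
    ]
    train_l1s = ["German", "Arabic"]

    clf = make_pipeline(
        TfidfVectorizer(analyzer="word", ngram_range=(1, 2)),  # word uni-/bigrams
        LinearSVC(),  # linear SVMs dominated the actual shared task
    )
    clf.fit(train_essays, train_l1s)
    print(clf.predict(["I dont no the answer ."]))  # -> a predicted L1 label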
12,100 essays from the ETS Test of English as a Foreign Language (TOEFL)
11 L1s:
◦ Arabic, Chinese, French, German, Hindi, Italian, Japanese, Korean, Spanish, Telugu, Turkish
◦ 900 train / 100 dev / 100 test essays per L1
Sampled for equal representation of L1s across topics as much as possible
Includes a 3-tier proficiency level (low/medium/high)
Public release via LDC this summer?
1. Closed-Training: 11-way classification task using only TOEFL11-TRAIN and DEV
2. Open-Training-1: use of any amount or type of training data, excluding TOEFL11
3. Open-Training-2: use of any amount or type of training data combined with TOEFL11
* All sub-tasks use TOEFL11-TEST for the final evaluation set
Each team allowed to submit up to 5 different systems per task
Teams submitted a CSV file for each system to the NLI organizers
An evaluation script automatically compared each prediction file to the gold standard and created a performance report and contingency tables
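A minimal sketch of the kind of evaluation script described above. The file format here is an assumption (a headerless two-column CSV of "essay_id,L1_label"), not necessarily the organizers’ actual format.

    import csv
    from collections import Counter

    def load_labels(path):
        # essay_id -> L1 label; assumes headerless "essay_id,label" rows
        with open(path, newline="") as f:
            return dict(csv.reader(f))

    def evaluate(pred_path, gold_path):
        pred = load_labels(pred_path)
        gold = load_labels(gold_path)
        # contingency (confusion) counts: (gold L1, predicted L1) -> count
        confusion = Counter()
        for essay_id, gold_l1 in gold.items():
            confusion[(gold_l1, pred.get(essay_id, "MISSING"))] += 1
        correct = sum(n for (g, p), n in confusion.items() if g == p)
        return correct / sum(confusion.values()), confusion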
Participating teams (29): Bobicev, Chonger, CMU-Haifa, Cologne-Nijmegen, CoRAL Lab @ UAB, CUNI (Charles University), Cywu, Dartmouth, Eurac, HAUTCS, ItaliaNLP, Jarvis, Kyle et al., LIMSI, LTRC IIIT Hyderabad, Michigan, MITRE “Carnie”, MQ, NAIST, NRC, Oslo NLI, Toronto, Tuebingen, Ualberta, UKP, Unibuc, UNT, UTD, VTEX
Sub-task    # Teams Competing    # Submissions
Closed      29                   116
Open-1      3                    13
Open-2      4                    15
See Table 3 of the report for full results
No statistically significant differences among the top 5 teams
Team Name        Abbreviation    Overall Accuracy
Jarvis           JAR             0.836
Oslo NLI         OSL             0.834
Unibuc           BUC             0.827
MITRE “Carnie”   CAR             0.826
Tuebingen        TUE             0.822
Challenge: finding new data to cover each L1
Corpus    Description
ICLE      All L1s except ARA, HIN, TEL
FCE       All L1s except ARA, HIN, TEL
ICNALE    CHI, JPN, KOR essays only
Lang8     All L1s, but mostly Asian L1s
Data sources for HIN & TEL:
◦ ICNALE Pakistani essays used for HIN (TUE team)
◦ Bilingual blogs (TOR & TUE teams)
Machine Learning
◦ SVMs were overwhelmingly the most popular approach
◦ 4 teams also tried ensemble classifiers
◦ String kernels (BUC) using character-level n-grams
Features
◦ N-grams: word, POS, character; function words
◦ Syntactic features: dependencies, TSGs, CFG productions, adaptor grammars
◦ Spelling features
4 of the top 5 teams used n-grams of order at least 4; some went up to 9-grams
2 of the top 10 teams used syntactic features (see the combined-features sketch below)
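A minimal sketch, again assuming scikit-learn, of combining several of the feature types above (word and character n-grams) for a single linear SVM via FeatureUnion; the data and n-gram orders are illustrative, not any team’s actual system.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import FeatureUnion, make_pipeline
    from sklearn.svm import LinearSVC

    features = FeatureUnion([
        ("word_ngrams", CountVectorizer(analyzer="word", ngram_range=(1, 2))),
        # character n-grams up to order 4 here; top teams went as high as 9
        ("char_ngrams", CountVectorizer(analyzer="char", ngram_range=(2, 4))),
    ])
    clf = make_pipeline(features, LinearSVC())
    clf.fit(["dont no from the city", "must specialize in a subject"],
            ["German", "Arabic"])  # toy placeholders for TOEFL11 essays/labels
    print(clf.predict(["i dont no"]))  # -> a predicted L1 label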
Ideas to expand the scope of the task
◦ Use a new set of TOEFL essays for test
◦ Expand genres: blogs? Tweets?
◦ Increase the number of L1s
◦ Try a different L2
ItaliaNLP – preparing an Italian NLI corpus with CNR Pisa
Also a corpus of Finnish with L1 labels (Turku University)
◦ Add Slavic languages
Logistics
◦ Hold another shared task in 2014? Or 2015?
◦ Merge with PAN Shared Task?
Tell us your thoughts!
Derrick Higgins (ETS)
ETS TOEFL
Patrick Houghton (ETS)
BEA8 Organizers
All the NLI Participants!
nlisharedtask2013@gmail.com
http://www.nlisharedtask2013.org/