
A Report on the First Native Language Identification Shared Task

Joel Tetreault (Nuance Communications)

Daniel Blanchard (Educational Testing Service)

Aoife Cahill (Educational Testing Service)

Native Language Identification

Task of automatically identifying a speaker’s first language based solely on the speaker’s writing in another language

Applications:

◦ Authorship profiling (Estival et al., 2007)

◦ Education: more targeted feedback to language learners (Leacock et al., 2010)

Sample Essay 1

No risk no fun I agree the statement "Successful people try new things and take risk".In my mind it is so, to. When you thing you like do new stuff you need a liddelbit the kick. That is the big point what I need. For exsample I like to go to a big city like New York. I was never in this town I dont no from the city. But I like go to the city. Thats fun I stay every time for proplems. I need eat a hood offer my head. The ather side I can go dow. I dont gat waht I need…Next exsample the wall street you put money in funds, well you this make a good job. Dont for get the risk look like lose money.

German

Sample Essay 2

For example, if you take a look at an ordinary school, you have different teachers for every subject. Your calculus teacher is different than your literature teacher. Each teacher must specialize in a specific subject in order to convey suffiecient and proper information to the students. However, that doesn't mean that the teacher is narrow-minded and has a limited perspective in life because to specialize in one subject doesn't hinder you or stop you from exploring other subjects.

Arabic

Motivation

1. Lots of work in NLI, but it has been hard to compare different approaches:

ICLEv2 (Granger et al., 2009): the de facto train/test data is small and has NLI-unfriendly idiosyncrasies

2. No consensus on evaluation:

Which L1s / how many L1s?

Train/test splits?

Best features?

Contributions

Goal: unify the community and help the field progress

Provide a larger, more NLI-friendly corpus that improves upon ICLEv2

Common evaluation framework

◦ Everyone evaluates using same train/dev/test splits and same L1s

Corpus and scripts to be made public to further promote the field

Outline

Prior Work

Data

Shared Task Overview

Results

NLI Shared Task in the Future

Prior Work

Treat NLI as a classification task

Koppel et al. (2005): POS n-grams, content and function words, spelling and grammatical errors

Syntactic features (Wong and Dras, 2011)

Tree Substitution Grammars (Swanson and Charniak, 2012)

Adaptor Grammars (Wong et al., 2012)

Data Size Effects (Brooke and Hirst, 2012)

Word n-grams (Bykh and Meurers, 2012)

LMs and Ensemble Classifiers (Tetreault et al., 2012)

Data: TOEFL11 Corpus

12,100 essays from the ETS Test of English as a Foreign Language (TOEFL)

11 L1s:

◦ Arabic, Chinese, French, German, Hindi, Italian, Japanese, Korean, Spanish, Telugu, Turkish

◦ 900 train / 100 dev / 100 test essays per L1

Sampled for equal representation of L1s across topics as much as possible

Includes a 3-tier proficiency level for each essay

Public release via LDC this summer?

Shared Task Description: 3 Sub-tasks

1. Closed-Training: 11-way classification task using only TOEFL11-TRAIN and DEV

2. Open-Training-1: use of any amount or type of training data, excluding TOEFL11

3. Open-Training-2: use of any amount or type of training data, combined with TOEFL11

* All sub-tasks use TOEFL11-TEST for the final evaluation set

Shared Task Description

Each team allowed to submit up to 5 different systems per task

Teams submitted a CSV file for each system to NLI Organizers

An evaluation script automatically compares each prediction file to the gold standard and creates a performance report and contingency tables (a rough sketch follows)
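As a rough illustration, a minimal evaluation script could look like the sketch below. The two-column "essay_id,L1" CSV format and the function name are assumptions for the sketch; the organizers' actual scripts and file formats may differ.

```python
# Sketch of a simple NLI evaluation script. Assumes each CSV has one
# "essay_id,L1" row per essay; the real submission format may differ.
import csv
from collections import Counter, defaultdict

def evaluate(pred_path, gold_path):
    with open(gold_path, newline="") as f:
        gold = dict(csv.reader(f))    # essay_id -> gold L1
    with open(pred_path, newline="") as f:
        pred = dict(csv.reader(f))    # essay_id -> predicted L1
    confusion = defaultdict(Counter)  # gold L1 -> Counter of predicted L1s
    correct = 0
    for essay_id, gold_l1 in gold.items():
        pred_l1 = pred.get(essay_id, "MISSING")
        confusion[gold_l1][pred_l1] += 1
        correct += pred_l1 == gold_l1
    return correct / len(gold), confusion
```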

29 Teams

Bobicev, Chonger, CMU-Haifa, Cologne-Nijmegen, CoRAL Lab @ UAB, CUNI (Charles University), Cywu, Dartmouth, Eurac, HAUTCS, ItaliaNLP, Jarvis, Kyle et al., LIMSI, LTRC IIIT Hyderabad, Michigan, MITRE “Carnie”, MQ, NAIST, NRC, Oslo NLI, Toronto, Tuebingen, Ualberta, UKP, Unibuc, UNT, UTD, VTEX

RESULTS

Sub-Task Participation Statistics

Sub-task | # Teams Competing | # Submissions
Closed   | 29                | 116
Open-1   | 3                 | 13
Open-2   | 4                 | 15

Closed Sub-Task

See Table 3 of the report for full results

No statistically significant differences among the top 5 teams

Team Name      | Abbreviation | Overall Accuracy
Jarvis         | JAR          | 0.836
Oslo NLI       | OSL          | 0.834
Unibuc         | BUC          | 0.827
MITRE “Carnie” | CAR          | 0.826
Tuebingen      | TUE          | 0.822

Open Sub-tasks

Challenge: finding new data to cover each L1

Corpus | Description
ICLE   | All L1s except ARA, HIN, TEL
FCE    | All L1s except ARA, HIN, TEL
ICNALE | CHI, JPN, KOR essays only
Lang8  | All L1s, but mostly Asian L1s

Data sources for HIN & TEL:

◦ ICNALE Pakistani essays → HIN (TUE team)

◦ Bilingual blogs (TOR & TUE teams)

Discussion of Approaches

Machine Learning

◦ SVMs overwhelmingly the most popular approach (see the sketch below)

◦ 4 teams also tried ensemble classifiers

◦ String kernels (BUC) using character-level n-grams
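For concreteness, here is a minimal sketch of the dominant recipe: a linear SVM over character n-gram tf-idf features, written with scikit-learn. The function name and hyperparameters are illustrative, not any team's actual settings.

```python
# Minimal sketch of the dominant approach: a linear SVM over character
# n-gram tf-idf features. Hyperparameters are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def train_nli_svm(train_texts, train_labels):
    """Fit an 11-way L1 classifier from essay strings and L1 labels."""
    model = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 4),
                        sublinear_tf=True, min_df=2),
        LinearSVC(C=1.0),
    )
    model.fit(train_texts, train_labels)
    return model
```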

Discussion of Approaches

Features

◦ N-grams: word, POS, character, function word

◦ Syntactic features: dependencies, TSGs, CF productions, adaptor grammars

◦ Spelling Features

4 of the top 5 teams used n-grams of order 4 or higher; some went up to 9-grams

2 of the top 10 teams used syntactic features (a sketch of combining feature types follows)
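Teams typically combined several feature types in one model. Below is a hedged sketch of how word and high-order character n-grams might be concatenated; the ranges echo the orders reported above, and POS or syntactic features would additionally require a tagger/parser, which is omitted here.

```python
# Sketch of combining word and character n-gram features, as many teams
# did; the character range goes up to 9-grams as reported above. POS and
# syntactic features would need a tagger/parser and are omitted.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.svm import LinearSVC

features = FeatureUnion([
    ("word_ngrams", CountVectorizer(analyzer="word",
                                    ngram_range=(1, 2), binary=True)),
    ("char_ngrams", CountVectorizer(analyzer="char",
                                    ngram_range=(2, 9), min_df=5)),
])
nli_model = make_pipeline(features, LinearSVC())
```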

Future of NLI Shared Task

Ideas to expand scope of task

◦ Use a new set of TOEFL essays for test

◦ Expand genres: blogs? Tweets?

◦ Vary the number of L1s

◦ Try different L2s

 ItaliaNLP – preparing Italian NLI corpus with CNR Pisa

 Also a corpus of Finnish with L1 labels (Turku University)

◦ Add Slavic languages

Logistics

◦ Hold another shared task in 2014? Or 2015?

◦ Merge with PAN Shared Task?

Tell us your thoughts!

Acknowledgments

Derrick Higgins (ETS)

ETS TOEFL

Patrick Houghton (ETS)

BEA8 Organizers

All the NLI Participants!

Questions?

nlisharedtask2013@gmail.com

http://www.nlisharedtask2013.org/
