Secretary report

The North American
Computational Linguistics
Olympiad (NACLO)
Modified version of the
presentation given by Lori Levin
and Dragomir Radev in June 2008
Outline
• Background and History
• Pedagogical goals and high school outreach
– Audience participation – sample problems
• Organization
• Making a NACLO problem book
• Running the contest and scoring
• Preparing for the ILO
• Outcomes
• Plans
• Discussion of problem ideas
Background and History
What is NACLO?
• A high school contest in linguistics,
computational linguistics, and language
technologies.
• No prerequisites
– Does not require knowledge of specific
human languages, advanced math, or
computer programming
• http://www.naclo.cs.cmu.edu
Goal
• To increase participation and diversity in
linguistics, language technologies, and other
language-related careers by introducing
linguistics and computational linguistics to high
school students.
• Easy problems – everyone has fun; everyone
learns something. Students will be encouraged
to study linguistics, languages, and
computational linguistics.
• Hard problems – talent search for our next
generation of colleagues.
• Unlike the ILO and the national linguistics olympiads of other
countries, NACLO puts additional focus on computational and formal problems
Sponsors
• US National Science Foundation
• Google
• NAACL
• Cambridge University Press
• M*Modal
• Vivisimo
• Just Systems Evans Research
• Leonard Gelfand center for outreach, CMU
• Powerset
• Donations from individuals
History of Linguistics Olympiads
• Similar to the 11 other “science olympiads” – some of
which are extremely popular, e.g., in math (90 countries),
physics, informatics (programming), biology, philosophy,
etc.
• Moscow: 1960s onward
• Bulgaria: 1980s onward
• A number of other countries
• Linguistic Challenge
– Thomas Payne, Eugene, Oregon, 1998-2000
• International Linguistics Olympiad:
– 2009: the 7th ILO
Pedagogical Goals and High
School Outreach
Contacting high schools
• Different procedure in each city
• Materials available
What we want high school students to learn
• Linguistics:
– English has rules that you are not aware of.
• Fan-blooming-tastic
• *Fantas-bloomin-tic
– There is methodology for discovering the rules.
– You can use the methodology to discover rules in
languages that you don’t speak.
• Computer Science
– Computer science is no more about computers than astronomy
is about telescopes (attributed to Dijkstra).
• Algorithmic thinking
• Abstraction in representing a problem space
• Search and reduction of the problem space
• Evaluation of the solution
• Etc.
Fish Story
Aymara is a South American language
spoken by more than 2 million people in
the area around Lake Titicaca, which, at
12,507 feet above sea level, is the highest
navigable lake in the world. Among the
speakers of Aymara are the Uros, a fishing
people who live on artificial islands,
woven from reeds, that float on the
surface of Lake Titicaca.
[Pictures a, b, c, d: the four catches]
Below, four fishermen describe their catch. Who caught what?
___ 1. “Mä challwalla mä hach’a challwampiwa challwataxa.”
___ 2. “Paya hach’a challwawa challwataxa.”
___ 3. “Mä hach’a challwa kimsa challwallampiwa challwataxa.”
___ 4. “Mä hach’a challwawa challwataxa.”
Also, watch out! One of the fishermen is lying.
[Pictures a, b, c, d: the four catches]
Below, four fishermen describe their catch. Who caught what?
___ 1. “Mä challwa+lla mä hach’a challwa+mpi+wa challwataxa.”
___ 2. “Paya hach’a challwa+wa challwataxa.”
___ 3. “Mä hach’a challwa kimsa challwa+lla+mpi+wa challwataxa.”
___ 4. “Mä hach’a challwa+wa challwataxa.”
Also, watch out! One of the fishermen is lying.
Fish Story Solution
• challwataxa: I caught
• -wa: accusative
• -mpi: and
• maa: one
• Hach’a: big
• -lla: small
• kimsa: three

• 1, d
• 2, b
• 3, c
• 4, a
(lying about size)
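The glosses above are enough to read each claim mechanically. The following is a minimal Python sketch, not part of the original problem, that counts the fish reported in each segmented sentence. It takes "mä" (listed as "maa"), "kimsa", "hach’a" and "-lla" from the solution list. The readings "challwa" = fish and "paya" = two are not on the slide and are assumptions inferred from the puzzle. The function name count_fish is hypothetical.

```python
# Number words: "mä" and "kimsa" come from the solution list above;
# "paya" = two is an assumption inferred from the puzzle.
NUMBERS = {"mä": 1, "paya": 2, "kimsa": 3}


def count_fish(sentence):
    """Return how many big and small fish a segmented sentence reports."""
    counts = {}
    number, big = 1, False
    # Normalize case and the curly apostrophe before tokenizing.
    for token in sentence.lower().replace("’", "'").split():
        # Strip quotation marks and the final period, then split off suffixes.
        root, *suffixes = token.strip("“”\".").split("+")
        if root in NUMBERS:
            number = NUMBERS[root]
        elif root == "hach'a":        # "big"
            big = True
        elif root == "challwa":       # assumed to mean "fish"
            if "lla" in suffixes:     # -lla: "small"
                size = "small"
            elif big:
                size = "big"
            else:
                size = "unmarked"
            counts[size] = counts.get(size, 0) + number
            number, big = 1, False    # reset for the next noun phrase
    return counts


claims = [
    "Mä challwa+lla mä hach’a challwa+mpi+wa challwataxa.",
    "Paya hach’a challwa+wa challwataxa.",
    "Mä hach’a challwa kimsa challwa+lla+mpi+wa challwataxa.",
    "Mä hach’a challwa+wa challwataxa.",
]
for index, claim in enumerate(claims, start=1):
    print(index, count_fish(claim))
```

Comparing each count with the pictures gives the matching, and the one claim that disagrees with its picture identifies the fisherman who is lying.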
Organization
• Co-chairs: Lori Levin and Thomas Payne
• Program chair and head coach: Dragomir Radev
• Sponsorship chair: James Pustejovsky
• Web: Eugene Fink, Ida Mayer, Justin Brown
• High school liaison: Amy Troyani
• Publicity and outreach chair:
• Local chairs of host sites
• Many volunteers at each site
Making a NACLO Problem Book
Problem committee
• 2007: Emily Bender, John Blatz, Ivan Derzhanski, Jason
Eisner, Eugene Fink, Boris Iomdin, Mahesh Joshi,
Anagha Kulkarni, Will Lewis, Patrick Littell, Ruslan
Mitkov, Thomas Payne, James Pustejovsky, Roy
Tromble, and Dragomir Radev (chair).
• 2008: Emily Bender, Eric Breck, Lauren Collister,
Eugene Fink, Adam Hesterberg, Joshua Katz, Stacy
Kurnikova, Lori Levin, Will Lewis, Patrick Littell, David
Mortensen, Barbara Partee, Thomas Payne, James
Pustejovsky, Richard Sproat, Todor Tchervenkov, and
Dragomir Radev (chair).
• 2009: + Harold Somers, Xiaojin Zhu, Bozhidar
Bozhanov, Kate Spriggs, others…
Problem submissions
• Call for problems is issued several months
before the contest.
• Anyone may submit a problem.
• 2007 (more than 30 submissions)
– Some were reserved for practice
• 2008 (more than 40 submissions)
• 2009 (15 so far + 15 ideas)
Making a problem
• Submit an idea to the problem committee
• Draft the problem
• Review by the problem committee
• Test on high school and college students
• Refine and re-submit
2007
A. English (Molistic): clustering, semantics
B. English (Encyclopedia): information retrieval
C. Ancient Greek: grammar
D. Hmong: writing system, grammar
E. English: verb morphology
F. English: spelling correction, OCR
G. Huishu: phonology
H. English (Garden Path): psycholinguistics, sentence processing,
2008 (A-E Open; F-L Invitational)
A. Apinaye (Brazil): grammar
B. Hindi: word sense disambiguation, alignment
C. Ilocano (Philippines):
D. Swedish and Norwegian: translation, alignment
E. Aymara (South America): grammar
F. Japanese: phonology, semantics
G. Manam Pile (Papua New Guinea): cultural concepts
H. English: stemming
I. Rotokas (Bougainville Island): automata, phonology
J. Irish (Place names): semantics,
K. Mayan: calendar systems
L. English: spectrograms
Running a contest and scoring
Contest procedures
• Email contest book to the host sites
– Each site prints and copies
• On the contest day:
– Different start time for each time zone
– Questions can be answered only by the jury
by email
Scoring
• One person is in charge of scoring each
problem.
– Alone or with a team of scorers
• Scoring rubric
– Components of the solution and how many
points to assign to each component
– Practice (correct solution): about 40%
– Theory (explanation): about 60%
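As an illustration of the rubric split, here is a minimal Python sketch of how one problem's score could be combined from its two parts. The 40/60 weights are the approximate figures above. The function name, its parameters, and the idea of grading each part as a fraction between 0 and 1 are assumptions made for the example, not the actual NACLO scoring procedure.

```python
def problem_score(practice_fraction, theory_fraction, max_points,
                  practice_weight=0.4, theory_weight=0.6):
    """Combine the practice and theory parts into points for one problem.

    Each fraction is the share of that part the scorer judged correct (0 to 1).
    The default weights follow the approximate 40% / 60% split.
    """
    for fraction in (practice_fraction, theory_fraction):
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fractions must be between 0 and 1")
    weighted = practice_weight * practice_fraction + theory_weight * theory_fraction
    return max_points * weighted


# A fully correct answer with a half-complete explanation on a 10-point problem.
print(round(problem_score(1.0, 0.5, max_points=10), 2))
```

Keeping the weights as parameters lets each problem's scorer adjust the split, since the percentages are only approximate.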
Preparing for the ILO
Preparation for the ILO
• Online and offline training
• Live meetings
• Lectures
Outcomes
Diversity
• About half of the participants in NACLO were girls in
2007 and 2008. In 2007, 25 out of the top 50 students
were female.
• The two US teams that went to the ILO in 2007 included
three girls, out of eight total team members (two teams
of four). The 2008 teams include only one girl.
• Some random statistics: (a) of the top 20 students in
2008, 14 are from public schools, (b) 26 states, 3
Canadian provinces, and the District of Columbia were
represented in 2008
• Canada participated for the first time in 2008 (about 20
students from Toronto, a handful from Ottawa and one
from Vancouver). Three students did really well at the
2008 Open (one ranked second and two tied for 13th)
but were not in the top 20 at the Invitational.
Outcomes and lessons learned
• Tremendous interest among students
• A number of clubs started
• A large amount of positive feedback
• Australian contest
• Success at the ILO
• A huge team effort
NACLO 2007
• 195 participants
• 3 university sites
– CMU
– Cornell
– Brandeis
• 20 high school sites
2007 winners
1. Rachel Zax, Ithaca, NY
2. Ryan Musa, Ithaca, NY
3. Adam Hesterberg, Seattle, WA
4. Jeffrey Lim, Arlington, MA
5. (tie) Rebecca Jacobs, Encino, CA
5. (tie) Michael Gottlieb, Tarrytown, NY
7. (tie) Mitha Nandagopalan, San Jose, CA
7. (tie) Josh Falk, Pittsburgh, PA
Alternate. Anna Tchetchetkine, San Jose, CA
ILO 2007
• Held in Russia (St. Petersburg)
• Two rounds: team and individual
• Problems
– Turkish/Tatar
– Braille
– Ndom (Papua New Guinea)
– Movima (Bolivia)
– Georgian (Caucasus)
– Hawaiian
ILO 2007
• Team contest (tied for first place):
– USA Team 2: Rebecca Jacobs, Anna
Tchetchetkine, Josh Falk, and Michael
Gottlieb
– Rebecca and Josh are on the 2008 team
• Individual contest (highest score):
– Adam Hesterberg
– Now at Princeton
NACLO 2008
• 763 participants
• Top 115 were invited to the second round
• 13 university sites
– CMU, Cornell, Brandeis
– Penn, Columbia, Michigan, Wisconsin, Illinois
– Oregon, MTSU, SJSU
– Ottawa, Toronto
• 65 high school sites
2008 NACLO winners
1. Guy Tabachnick, New York, NY
2. Jeffrey Lim, Arlington, MA
3. Josh Falk, Pittsburgh, PA
4. Anand Natarajan, San Jose, CA
5. Jae-Kyu Lee, Andover, MA
6. Rebecca Jacobs, Encino, CA
7. Hanzhi Zhu, Shrewsbury, MA
8. Morris Alper, San Jose, CA
ILO 2008
• In Bulgaria
• August 4-8, 2008
• Problems in:
– Drehu, Cemuhí, Micmac, Old Norse, Chinese
dialects
ILO 2008
• Many medals: two team golds
• 1 individual gold: Hanzhi Zhu
• 2 individual silvers: Morris Alper and
Anand Natarajan
• 3 individual bronzes: Guy Tabachnick,
Rebecca Jacobs, and Jeffrey Lim
NACLO 2009
• 21 university sites signed up (new ones in
Seattle, Vancouver, Dallas, Memphis,
Washington, Baltimore, Lethbridge, Great
Falls, Mankato, Princeton).
Future ILOs
• 2009 in Poland
• 2010 in Sweden?
• 2011 in the US?
Plans
• SGER (NSF Small Grant for Exploratory Research) for improving
computational problem types.
• Automated scoring
• Reach out to the endangered languages
community (note that the ILO avoids very
common languages but we still have a long tail
of 6,000+ languages to work with)
• Interactive on-line problems
• Fundraising
• Become a non-profit organization
• Staying in touch with our students
Acknowledgments
We want to thank everyone who helped turn NACLO into a successful event.
Specifically, Amy Troyani from Taylor Allderdice High School in Pittsburgh, Mary
Jo Bensasi of CMU, all problem writers and graders (which include the PC listed
above as well as Rahel Ringger and Julia Workman) and all local contest
organizers (James Pustejovsky, Lillian Lee, Claire Cardie, Mitch Marcus, Kathy
McKeown, Barry Schiffman, Lori Levin, Catherine Arnott Smith, Richard Sproat,
Roxana Girju, Steve Abney, Sally Thomason, Aleka Blackwell, Roula Svorou,
Thomas Payne, Stan Szpakowicz, Diana Inkpen, Elaine Gold). James
Pustejovsky was also the sponsorship chair, with help from Paula Chesley. Ankit
Srivastava, Ronnie Sim and Willie Costello co-wrote some of the problems with
members of the PC. Eugene Fink helped with the solutions booklets, Justin
Brown worked on the web site, and Adam Hesterberg was an invaluable member
of the team throughout.
Other people who deserve our gratitude include Cheryl Hickey, Alina Johnson,
Patti Kardia, Josh Cannon, Christina Hunt, Jennifer Wofford, and Cindy
Robinson. Finally, NACLO couldn’t have happened without the leadership and
funding provided by NSF and Tanya Korelsky in particular as well as the
generous sponsorship from Google, Cambridge University Press, and the North
American Chapter of the ACL (NAACL) and our other sponsors.
The authors of this paper are also thankful to Martha Palmer for giving us
feedback on an earlier draft.
NACLO was partially funded by the National Science Foundation under grant IIS
0633871, "Planning Workshop for a Computational Linguistics Olympiad."
Join us in preparing NACLO
2009!