The North American Computational Linguistics Olympiad (NACLO)

Modified version of the presentation given by Lori Levin and Dragomir Radev in June 2008

Outline
• Background and History
• Pedagogical goals and high school outreach
  – Audience participation – sample problems
• Organization
• Making a NACLO problem book
• Running the contest and scoring
• Preparing for the ILO
• Outcomes
• Plans
• Discussion of problem ideas

Background and History

What is NACLO
• A high school contest in linguistics, computational linguistics, and language technologies.
• No prerequisites
  – Does not require knowledge of specific human languages, advanced math, or computer programming
• http://www.naclo.cs.cmu.edu

Goal
• To increase participation and diversity in linguistics, language technologies, and other language-related careers by introducing linguistics and computational linguistics to high school students.
• Easy problems – everyone has fun; everyone learns something. Students will be encouraged to study linguistics, languages, and computational linguistics.
• Hard problems – talent search for our next generation of colleagues.
• Unlike the ILO and other national LOs, additional focus on computational/formal problems

Sponsors
• US National Science Foundation
• Google
• NAACL
• Cambridge University Press
• M*Modal
• Vivisimo
• Just Systems Evans Research
• Leonard Gelfand center for outreach, CMU
• Powerset
• Donations from individuals

History of Linguistics Olympiads
• Similar to the 11 other “science olympiads” – some of which are extremely popular, e.g., in math (90 countries), physics, informatics (programming), biology, philosophy, etc.
• Moscow: since the 1960s
• Bulgaria: since the 1980s
• A number of other countries
• Linguistic Challenge – Thomas Payne, Eugene, Oregon, 1998-2000
• International Linguistics Olympiad:
  – 2009: the 7th ILO

Pedagogical Goals and High School Outreach

Contacting high schools
• Different procedure in each city
• Materials available

What we want high school students to learn
• Linguistics:
  – English has rules that you are not aware of.
    • Fan-blooming-tastic
    • *Fantas-blooming-tic
  – There is a methodology for discovering the rules.
  – You can use the methodology to discover rules in languages that you don’t speak.
• Computer Science
  – (Dijkstra) is no more about computers than astronomy is about telescopes.
    • Algorithmic thinking
    • Abstraction in representing a problem space
    • Search and reduction of the problem space
    • Evaluation of the solution
    • Etc.

Fish Story
Aymara is a South American language spoken by more than 2 million people in the area around Lake Titicaca, which, at 12,507 feet above sea level, is the highest navigable lake in the world. Among the speakers of Aymara are the Uros, a fishing people who live on artificial islands, woven from reeds, that float on the surface of Lake Titicaca.

[Pictures a, b, c, d]

Below, four fishermen describe their catch. Who caught what?
___ 1. “Mä challwalla mä hach’a challwampiwa challwataxa.”
___ 2. “Paya hach’a challwawa challwataxa.”
___ 3. “Mä hach’a challwa kimsa challwallampiwa challwataxa.”
___ 4. “Mä hach’a challwawa challwataxa.”
Also, watch out! One of the fishermen is lying.

[Pictures a, b, c, d]

Below, four fishermen describe their catch. Who caught what?
___ 1. “Mä challwa+lla mä hach’a challwa+mpi+wa challwataxa.”
___ 2. “Paya hach’a challwa+wa challwataxa.”
___ 3. “Mä hach’a challwa kimsa challwa+lla+mpi+wa challwataxa.”
___ 4. “Mä hach’a challwa+wa challwataxa.”
Also, watch out! One of the fishermen is lying.
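
One way to start on the statements with "+" boundaries above is to tabulate which morphemes occur in which statements. The following is a minimal Python sketch and not part of the original presentation: the function names `morphemes` and `distribution` are hypothetical, and the only data used is the four statements exactly as segmented above.

```python
from collections import defaultdict

# The four statements as segmented above; "+" marks a morpheme boundary.
STATEMENTS = {
    1: "Mä challwa+lla mä hach’a challwa+mpi+wa challwataxa.",
    2: "Paya hach’a challwa+wa challwataxa.",
    3: "Mä hach’a challwa kimsa challwa+lla+mpi+wa challwataxa.",
    4: "Mä hach’a challwa+wa challwataxa.",
}


def morphemes(sentence):
    """Split a "+"-segmented sentence into lowercase morphemes.

    Suffixes (everything after the first "+" in a word) get a leading hyphen.
    """
    result = []
    for word in sentence.rstrip(".").split():
        stem, *suffixes = word.lower().split("+")
        result.append(stem)
        result.extend("-" + suffix for suffix in suffixes)
    return result


def distribution(statements):
    """Map each morpheme to the sorted list of statement numbers it occurs in."""
    seen = defaultdict(set)
    for number, sentence in statements.items():
        for morpheme in morphemes(sentence):
            seen[morpheme].add(number)
    return {morpheme: sorted(numbers) for morpheme, numbers in seen.items()}


if __name__ == "__main__":
    for morpheme, numbers in sorted(distribution(STATEMENTS).items()):
        print(f"{morpheme:12} {numbers}")
```

Morphemes that occur in all four statements cannot tell the catches apart, so the contrast has to come from those that occur in only some of them, for example -lla and -mpi (statements 1 and 3) or kimsa (statement 3 only).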
Fish Story Solution
• challwataxa: I caught
• -wa: accusative
• -mpi: and
• mä (maa): one
• hach’a: big
• -lla: small
• kimsa: three

• 1, d
• 2, b
• 3, c
• 4, a (lying about size)

Organization
• Co-chairs: Lori Levin and Thomas Payne
• Program chair and head coach: Dragomir Radev
• Sponsorship chair: James Pustejovsky
• Web: Eugene Fink, Ida Mayer, Justin Brown
• High school liaison: Amy Troyani
• Publicity and outreach chair:
• Local chairs of host sites
• Many volunteers at each site

Making a NACLO Problem Book

Problem committee
• 2007: Emily Bender, John Blatz, Ivan Derzhanski, Jason Eisner, Eugene Fink, Boris Iomdin, Mahesh Joshi, Anagha Kulkarni, Will Lewis, Patrick Littell, Ruslan Mitkov, Thomas Payne, James Pustejovsky, Roy Tromble, and Dragomir Radev (chair).
• 2008: Emily Bender, Eric Breck, Lauren Collister, Eugene Fink, Adam Hesterberg, Joshua Katz, Stacy Kurnikova, Lori Levin, Will Lewis, Patrick Littell, David Mortensen, Barbara Partee, Thomas Payne, James Pustejovsky, Richard Sproat, Todor Tchervenkov, and Dragomir Radev (chair).
• 2009: + Harold Somers, Xiaojin Zhu, Bozhidar Bozhanov, Kate Spriggs, others…

Problem submissions
• Call for problems is issued several months before the contest.
• Anyone may submit a problem.
• 2007 (more than 30 submissions)
  – Some were reserved for practice
• 2008 (more than 40 submissions)
• 2009 (15 so far + 15 ideas)

Making a problem
• Submit an idea to the problem committee
• Draft the problem
• Review by the problem committee
• Test on high school and college students
• Refine and re-submit

2007
A. English (Molistic): clustering, semantics
B. English (Encyclopedia): information retrieval
C. Ancient Greek: grammar
D. Hmong: writing system, grammar
E. English: verb morphology
F. English: spelling correction, OCR
G. Huishu: phonology
H. English (Garden Path): psycholinguistics, sentence processing

2008 (A-E Open; F-L Invitational)
A. Apinaye (Brazil): grammar
B. Hindi: word sense disambiguation, alignment
C.
Ilocano (Philippines):
D. Swedish and Norwegian: translation, alignment
E. Aymara (South America): grammar
F. Japanese: phonology, semantics
G. Manam Pile (Papua New Guinea): cultural concepts
H. English: stemming
I. Rotokas (Bougainville Island): automata, phonology
J. Irish (Place names): semantics
K. Mayan: calendar systems
L. English: spectrograms

Running a contest and scoring

Contest procedures
• Email contest book to the host sites
  – Each site prints and copies
• On the contest day:
  – Different start time for each time zone
  – Questions can be answered only by the jury, by email

Scoring
• One person is in charge of scoring each problem.
  – Alone or with a team of scorers
• Scoring rubric
  – Components of the solution and how many points to assign to each component
  – Practice (correct solution): about 40%
  – Theory (explanation): about 60%

Preparing for the ILO

Preparation for the ILO
• Online and offline training
• Live meetings
• Lectures

Outcomes

Diversity
• About half of the participants in NACLO were girls in 2007 and 2008. In 2007, 25 out of the top 50 students were female.
• The two US teams that went to the ILO in 2007 included three girls, out of eight total team members (two teams of four). The 2008 teams include only one girl.
• Some random statistics: (a) of the top 20 students in 2008, 14 are from public schools, (b) 26 states, 3 Canadian provinces, and the District of Columbia were represented in 2008
• Canada participated for the first time in 2008 (about 20 students from Toronto, a handful from Ottawa and one from Vancouver). Two students did really well at the 2008 Open (one ranked second and two tied for 13th) but were not in the top 20 at the Invitational.

Outcomes and lessons learned
• Tremendous interest among students
• A number of clubs started
• A large amount of positive feedback
• Australian contest
• Success at the ILO
• A huge team effort

NACLO 2007
• 195 participants
• 3 university sites
  – CMU
  – Cornell
  – Brandeis
• 20 high school sites

2007 winners
1. Rachel Zax, Ithaca, NY
2. Ryan Musa, Ithaca, NY
3. Adam Hesterberg, Seattle, WA
4. Jeffrey Lim, Arlington, MA
5. (tie) Rebecca Jacobs, Encino, CA
5. (tie) Michael Gottlieb, Tarrytown, NY
7. (tie) Mitha Nandagopalan, San Jose, CA
7. (tie) Josh Falk, Pittsburgh, PA
Alternate. Anna Tchetchetkine, San Jose, CA

ILO 2007
• Held in Russia (St. Petersburg)
• Two rounds: team and individual
• Problems
  – Turkish/Tatar
  – Braille
  – Ndom (Papua New Guinea)
  – Movima (Bolivia)
  – Georgian (Caucasus)
  – Hawaiian
• Team contest (tied for first place):
  – USA Team 2: Rebecca Jacobs, Anna Tchetchetkine, Josh Falk, and Michael Gottlieb
  – Rebecca and Josh are on the 2008 team
• Individual contest (highest score):
  – Adam Hesterberg
  – Now at Princeton

NACLO 2008
• 763 participants
• Top 115 were invited to the second round
• 13 university sites
  – CMU, Cornell, Brandeis
  – Penn, Columbia, Michigan, Wisconsin, Illinois
  – Oregon, MTSU, SJSU
  – Ottawa, Toronto
• 65 high school sites

2008 NACLO winners
1. Guy Tabachnick, New York, NY
2. Jeffrey Lim, Arlington, MA
3. Josh Falk, Pittsburgh, PA
4. Anand Natarajan, San Jose, CA
5. Jae-Kyu Lee, Andover, MA
6. Rebecca Jacobs, Encino, CA
7. Hanzhi Zhu, Shrewsbury, MA
8.
Morris Alper, San Jose, CA

ILO 2008
• In Bulgaria
• August 4-8, 2008
• Problems in:
  – Drehu, Cemuhí, Micmac, Old Norse, Chinese dialects
• Many medals: two team golds
• 1 individual gold: Hanzhi Zhu
• 2 individual silvers: Morris Alper and Anand Natarajan
• 3 individual bronzes: Guy Tabachnick, Rebecca Jacobs, and Jeffrey Lim

NACLO 2009
• 21 university sites signed up (new ones in Seattle, Vancouver, Dallas, Memphis, Washington, Baltimore, Lethbridge, Great Falls, Mankato, Princeton).

Future ILOs
• 2009 in Poland
• 2010 in Sweden?
• 2011 in the US?

Plans
• SGER for improving computational problem types.
• Automated scoring
• Reach out to the endangered languages community (note that the ILO avoids very common languages but we still have a long tail of 6,000+ languages to work with)
• Interactive on-line problems
• Fundraising
• Become a non-profit organization
• Staying in touch with our students

Acknowledgments
We want to thank everyone who helped turn NACLO into a successful event. Specifically, Amy Troyani from Taylor Allderdice High School in Pittsburgh, Mary Jo Bensasi of CMU, all problem writers and graders (which include the PC listed above as well as Rahel Ringger and Julia Workman) and all local contest organizers (James Pustejovsky, Lillian Lee, Claire Cardie, Mitch Marcus, Kathy McKeown, Barry Schiffman, Lori Levin, Catherine Arnott Smith, Richard Sproat, Roxana Girju, Steve Abney, Sally Thomason, Aleka Blackwell, Roula Svorou, Thomas Payne, Stan Szpakowicz, Diana Inkpen, Elaine Gold). James Pustejovsky was also the sponsorship chair, with help from Paula Chesley. Ankit Srivastava, Ronnie Sim and Willie Costello co-wrote some of the problems with members of the PC. Eugene Fink helped with the solutions booklets, Justin Brown worked on the web site, and Adam Hesterberg was an invaluable member of the team throughout.
Other people who deserve our gratitude include Cheryl Hickey, Alina Johnson, Patti Kardia, Josh Cannon, Christina Hunt, Jennifer Wofford, and Cindy Robinson. Finally, NACLO couldn’t have happened without the leadership and funding provided by NSF, and Tanya Korelsky in particular, as well as the generous sponsorship from Google, Cambridge University Press, the North American Chapter of the ACL (NAACL), and our other sponsors. The authors of this paper are also thankful to Martha Palmer for giving us feedback on an earlier draft. NACLO was partially funded by the National Science Foundation under grant IIS 0633871, “Planning Workshop for a Computational Linguistics Olympiad.”

Join us in preparing NACLO 2009!