LING 388: Language and Computers
Sandiway Fong
Lecture 1

Administrivia
• Where and when
  – AME S314, Tuesdays/Thursdays 3:30-4:45PM
• No Class
  – November 11th (Veterans Day)
  – November 27th (Thanksgiving)
• Office Hours
  – catch me after class, or
  – drop by my office (or make an appointment)
  – Location: Douglass 311
• TA:
  – Benjamin Martin, bamartin@email.arizona.edu

Administrivia
• Email
  – sandiway@email.arizona.edu
• Homepage
  – http://dingo.sbs.arizona.edu/~sandiway
  – or Google me by my first name, “sandiway”
• Lecture slides:
  – available on the homepage during and after each class (may be updated)
  – in both PowerPoint (.pptx) and Adobe PDF formats
    • .pptx slides may contain animation

Administrivia
• Tips on how to take this class
  – No required textbook
  – Lecture slides contain everything you need to know in order to do the homeworks
    • To understand the slides, you need to attend classes to “grok” the concepts
  – Unclear on something?
    • You are encouraged to ask questions in or after class
    • Ask while the question is still fresh in your mind
    • Review the lecture video (NEW this year)
  – Practice
    • You can’t get good at computers just by reading a text
    • This is a hands-on class: try the exercises

Administrivia
• Course Objectives
  – Theoretical
    • Introduction to natural language processing techniques
  – Practical
    • Be able to write a natural language grammar that runs on a computer
    • Get an idea of what’s hard and what’s easy to do on a computer
• Outcome: by the end of the course, you will have built a small machine translation engine

Administrivia
• This semester, we will explore two parallel tracks
  – one: roll our own, i.e. learn how to write grammar rules
  – two: learn to use software available for language analysis

Grammar Rules
• Rules of Syntax:
  – S → NP VP
  – S = sentence
  – NP = noun phrase
  – VP = verb phrase
• Example:
• VP → V NP

Grammar Rules
• Rules of Syntax:
  – S → NP VP
  – VP → V NP
  – NP → D N
  – S = sentence
  – NP = noun phrase
  – VP = verb phrase
  – V = verb
  – N = noun
  – D = determiner
• Example:

Grammar Rules
• It can get much more complicated (with modern theories), but the essential idea is the same …

Grammar Rules
• Rules:
  – S → NP VP
  – VP → V NP
  – NP → D N
• On a computer: the Prolog language
  – http://www.learnprolognow.org

Grammar-based Translator
• Example…
  – English grammar
  – Japanese grammar
  – “glue”

Grammar-based Translator
• Idioms (language-specific):
  – John kicked the bucket
  – gomasuri (ごますり)

Grammar-based Translator
• gomasuri (ごますり)
  – taroo-ga    sensei-ni     goma-o       sutta
    taroo-nom   teacher-dat   sesame-acc   ground
    “John flattered the teacher”
  – taroo-ga    Hanako-ni     goma-o       sutta
    taroo-nom   Hanako-dat    sesame-acc   ground
    “John flattered Mary”

Grammar-based Translator
• What about state-of-the-art systems?

Administrivia
• Laboratory Exercises
  – Some lectures will be laboratory sessions
  – We will do exercises in class
    • use your own laptop
  – Homework questions will be handed out in these sessions
  – Homework questions are designed to extend the exercises done in the lab

Administrivia
• Grading
  – 6-7 homeworks
  – Mandatory and Extra Credit Questions:
    • extra credit questions may be applied to the current homework
    • they may also bump you up a grade if you are borderline at the end of the semester
  – Homeworks are typically due one week after they are handed out
  – Homeworks must be submitted to me by email (by midnight)
  – Example:
    • a homework given out on Tuesday will be due the next Tuesday at midnight
• Ethics
  – You may discuss the homeworks with your classmates
  – However, you must do the work and write it up independently
  – Sources must be acknowledged (students, webpages)
  – UA Code of Academic Integrity
    • http://deanofstudents.arizona.edu/codeofacademicintegrity

Administrivia
• Late Policy
  – All homeworks are mandatory
  – points are deducted if a homework is handed in late
  – If you know you’re going to be late or have an upcoming emergency, let me know ahead of time
• Homework tips
  – Homeworks are based on lab exercises
    • I don’t take attendance, but practice is essential to understanding
  – Nightmare strategy: wait until the evening the homework is due, scratch your head over the lecture notes, have tons of questions, and start panicking
    • your computer crashes, the net goes down …

Natural Language Processing (NLP) = Human Language Technology (HLT) = Computational Linguistics
• Research Question:
  – What methods can we use to process natural languages on a computer?
• Intersects with:
  – Computer science (CS)
  – Mathematics/Statistics
  – Artificial intelligence (AI)
  – Linguistic theory
  – Psychology: psycholinguistics
    • e.g. the human sentence processor

Applications
• Information retrieval
  – information is stored and accessed using language (keywords etc.)
  – document classification (email, news)
• Machine translation
  – babelfish (now Bing)
    • http://babelfish.altavista.com/
  – Google
    • http://translate.google.com
• Language Comprehension
  – document summarization
  – Jeopardy (quiz show)
• Speech
  – automated 800 toll-free directory (800 555 1212)
  – cellphones (hands-free dialing)
  – car navigation (voice-synthesized directions)

Applications
• The technology is still under development, in its infancy …
  – computers can’t really understand language (yet)
    • see Google webpage translation
    • at least it’s free!
  – even if we are willing to pay...
    • machine translation has been worked on since shortly after World War II (1950s) and is still not perfected today
  – why?
    • what are the properties of human languages that make it hard?

Natural Language Properties
• Which ones are going to be difficult for computers to deal with?
• Grammar (rules for putting words together into sentences)
  – How many rules are there?
    • 100, 1000, 10000, more …
  – Are portions learned or innate?
  – Do we have all the rules written down somewhere?
• Lexicon (dictionary)
  – How many words do we need to know?
    • 1000, 10000, 100000 …

Computers vs. Humans
• Knowledge of language
  – Computers are way faster than humans
    • They kill us at arithmetic and chess
  – But human beings are so good at language that we often take our ability for granted
    • Processed without conscious thought
    • Do pretty complex things

Examples
• Ambiguity
  – Where can I see the bus stop?
  – stop: a verb, or part of the noun-noun compound bus stop?
  – Context (discourse or situation)
• Let’s explore what computer software can do

Examples
• Available online
  – http://nlp.stanford.edu:8080/parser/

Examples
• Changes in interpretation
  – John is too stubborn to talk to
  – John is too stubborn to talk to Bill

Examples
• Stanford parser:
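The Grammar Rules slides give three rules (S → NP VP, VP → V NP, NP → D N) and say they will be run in Prolog. As a rough illustration of what “a grammar that runs on a computer” means, here is a minimal Python sketch of a top-down, left-to-right parser with backtracking for those same three rules. Prolog also executes grammar rules top-down with backtracking, but this is not the course’s Prolog code. The words in LEXICON are hypothetical, chosen only for illustration.

```python
# Minimal top-down parser for the lecture's toy grammar:
#   S -> NP VP,  VP -> V NP,  NP -> D N
# Trees are nested tuples: (category, child, child, ...).

GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["D", "N"]],
}

# Hypothetical lexicon: which words belong to each word category.
LEXICON = {
    "D": {"the", "a"},
    "N": {"man", "dog", "teacher"},
    "V": {"saw", "chased"},
}


def parse(symbol, words, i):
    """Yield (tree, next_index) for every way `symbol` can start at words[i]."""
    if symbol in LEXICON:
        # Word category: consume one word if it is listed under this category.
        if i < len(words) and words[i] in LEXICON[symbol]:
            yield (symbol, words[i]), i + 1
        return
    # Phrase category: try each rule for it, in order (backtracking on failure).
    for rhs in GRAMMAR.get(symbol, []):
        yield from expand(rhs, words, i, [symbol])


def expand(rhs, words, i, tree):
    """Match the right-hand-side symbols left to right, threading the index."""
    if not rhs:
        yield tuple(tree), i
        return
    for subtree, j in parse(rhs[0], words, i):
        yield from expand(rhs[1:], words, j, tree + [subtree])


def parses(sentence):
    """Return every S tree that covers the whole sentence."""
    words = sentence.split()
    return [tree for tree, j in parse("S", words, 0) if j == len(words)]


print(parses("the man saw a dog"))
print(parses("man the saw a dog"))  # word order violates NP -> D N: []
```

An empty list means the sentence is not generated by the grammar. Adding a rule or a word only requires editing GRAMMAR or LEXICON, which is the point of keeping the rules separate from the parsing procedure.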
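The Grammar-based Translator slides pair an English grammar with a Japanese grammar plus some “glue,” and note that idioms such as John kicked the bucket need language-specific treatment. The following is a minimal Python sketch of that division of labor under heavy simplifying assumptions. Only sentences of the shape D N V D N (the S → NP VP, VP → V NP, NP → D N rules) are handled. The romanized bilingual word lists and the idiom table are hypothetical. The “glue” is reduced to reordering English SVO into Japanese SOV and attaching the particles -ga (nominative) and -o (accusative) seen in the gomasuri glosses. This is not the engine built in the course.

```python
# Toy grammar-based English -> Japanese translator (a sketch only).
# Assumes sentences of the form D N V D N, i.e. the lecture's rules
# S -> NP VP, VP -> V NP, NP -> D N, plus a whole-VP idiom table.

# Hypothetical bilingual lexicon (Japanese in romanization).
NOUNS = {"man": "otoko", "dog": "inu", "teacher": "sensei"}
VERBS = {"saw": "mita", "chased": "oikaketa"}
DETERMINERS = {"the", "a"}  # Japanese has no articles, so these are dropped

# Hypothetical idiom table: an English VP translated as a single unit.
IDIOMS = {("kicked", "the", "bucket"): "shinda"}  # shinda = "died"


def parse_np(words):
    """NP -> D N: return the English noun, or None if `words` is not an NP."""
    if len(words) == 2 and words[0] in DETERMINERS and words[1] in NOUNS:
        return words[1]
    return None


def translate(sentence):
    """Translate one English sentence, or return None if it does not parse."""
    words = tuple(sentence.lower().split())
    subj = parse_np(words[:2])        # S -> NP VP: the subject NP comes first
    if subj is None:
        return None
    vp = words[2:]
    if vp in IDIOMS:                  # check idioms before word-by-word rules
        return f"{NOUNS[subj]}-ga {IDIOMS[vp]}"
    if len(vp) != 3 or vp[0] not in VERBS:  # VP -> V NP
        return None
    obj = parse_np(vp[1:])
    if obj is None:
        return None
    # "Glue": English SVO order becomes Japanese SOV order, with the
    # nominative particle -ga on the subject and accusative -o on the object.
    return f"{NOUNS[subj]}-ga {NOUNS[obj]}-o {VERBS[vp[0]]}"


print(translate("The man saw a dog"))          # otoko-ga inu-o mita
print(translate("the man kicked the bucket"))  # otoko-ga shinda
```

Checking the idiom table before the compositional rules is one simple design choice. It makes kicked the bucket come out as shinda (“died”) instead of a literal rendering, but it only catches idioms whose exact wording is listed, and it would not recognize variants such as a different tense.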
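The bus stop example becomes concrete with a parser that returns every parse instead of only the first. The minimal Python sketch below assumes a toy grammar and lexicon, which are hypothetical and not what the Stanford parser uses. In this grammar, stop is listed both as a noun and as a verb, an NP may be a noun-noun compound (NP → D N N), and a perception verb such as see may take an NP followed by a bare VP (VP → V NP VP). Under those assumptions, the phrase see the bus stop receives exactly two trees. Picking one of them is the job that context (discourse or situation) does for a human listener.

```python
# Enumerating all parses of an ambiguous phrase with a tiny CFG.
# Grammar and lexicon are hypothetical simplifications for illustration.

GRAMMAR = {
    "VP": [["V", "NP"],         # see [the bus stop]
           ["V", "NP", "VP"],   # see [the bus] [stop]: perception verb + bare VP
           ["V"]],              # stop
    "NP": [["D", "N"],          # the bus
           ["D", "N", "N"]],    # the bus stop: noun-noun compound
}

# "stop" appears under both N and V: this is the source of the ambiguity.
LEXICON = {
    "D": {"the"},
    "N": {"bus", "stop"},
    "V": {"see", "stop"},
}


def parse(symbol, words, i):
    """Yield (tree, next_index) for every way `symbol` can start at words[i]."""
    if symbol in LEXICON:
        if i < len(words) and words[i] in LEXICON[symbol]:
            yield (symbol, words[i]), i + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(rhs, words, i, [symbol])


def expand(rhs, words, i, tree):
    """Match the right-hand-side symbols left to right, threading the index."""
    if not rhs:
        yield tuple(tree), i
        return
    for subtree, j in parse(rhs[0], words, i):
        yield from expand(rhs[1:], words, j, tree + [subtree])


def all_parses(phrase, start="VP"):
    """Return every tree rooted in `start` that covers the whole phrase."""
    words = phrase.split()
    return [tree for tree, j in parse(start, words, 0) if j == len(words)]


for tree in all_parses("see the bus stop"):
    print(tree)
```

The first tree treats bus stop as a compound noun (see the bus stop = look at the place where buses stop). The second treats stop as a verb (see the bus come to a halt). Removing "stop" from either the N or the V entry leaves a single parse, which shows that the ambiguity here is lexical.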