CSE 5525: Speech and Language Processing
Instructor: Alan Ritter

Administrative Details
• Course Webpage
  – http://aritter.github.io/courses/5525_fall16.html
• Piazza (discussion and resources)
  – https://piazza.com/osu/spring2016/5525/home
• Carmen (homework submission and grades)
  – https://carmen.osu.edu/d2l/home/11684583

Course Details
• Book
  – Jurafsky and Martin (2nd edition)
  – Will also use selections from the 3rd edition (unpublished)
  – Some other readings as well
• Prerequisites / Assumptions
  – Basic Probability
  – Basic Linear Algebra
  – Python or ability to learn Python quickly
  – Linux/Unix (for Windows users: https://www.cygwin.com/)
  – Numpy/scipy

What to Expect
• Lots of math and programming
• A bit of Linguistics
• Computing Resources:
  – Experiments could take hours to run depending on the efficiency of your code. We recommend you start early.
• Questions?

The Role of NLP in Artificial Intelligence
• Goal: give computers the ability to think (and speak)
• Long history in computer science
• Turing Test (Alan Turing 1950)
• Loebner Prize
  – Little interest from NLP community…
  – Very simple programs can fool some judges some of the time

The Chinese Room (Searle 1980)

NLP in Science Fiction

Today’s NLP Applications
• Speech Interfaces
• Machine Translation
• Search Engines
• Information Extraction
• Summarization
• …

NLP Today: Speech Interfaces

Speech Interfaces
• Automatic Speech Recognition
  – Audio in, text out
  – SOTA: 0.3% error for digit strings, 5% dictation, 50%+ for TV

Natural Language Understanding
• Convert words into a semantic representation that can be used to query a database
  – Example: “Where can I find Indian food in Clintonville?”
  – Slots: cuisine (Indian food), location (Clintonville)
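
The Natural Language Understanding step above (words in, database query out) can be sketched in a few lines of Python. This is only a toy illustration, not the method used by any real speech interface: the gazetteers `CUISINES` and `LOCATIONS`, the `restaurants` table name, and both function names are hypothetical, and slot values are normalized to the lowercase gazetteer entry rather than the full span in the utterance.

```python
import re

# Hypothetical toy gazetteers; a real system would learn these from data.
CUISINES = {"indian", "thai", "mexican", "italian"}
LOCATIONS = {"clintonville", "downtown"}


def extract_slots(utterance):
    """Map an utterance to a {slot: value} dict by whole-word gazetteer lookup."""
    text = utterance.lower()
    slots = {}
    for cuisine in CUISINES:
        if re.search(r"\b" + re.escape(cuisine) + r"\b", text):
            slots["cuisine"] = cuisine
    for location in LOCATIONS:
        if re.search(r"\b" + re.escape(location) + r"\b", text):
            slots["location"] = location
    return slots


def to_query(slots):
    """Render the slots as a parameterized SQL-style query (table name is made up)."""
    keys = sorted(slots)
    sql = "SELECT name FROM restaurants"
    if keys:
        sql += " WHERE " + " AND ".join(f"{k} = ?" for k in keys)
    return sql, [slots[k] for k in keys]


slots = extract_slots("Where can I find Indian food in Clintonville?")
sql, params = to_query(slots)
print(slots)
print(sql, params)
```

Keyword lookup like this breaks as soon as the wording varies ("somewhere near Clintonville that does curry"), which is one reason the course focuses on statistical methods.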
Speech Interfaces
• Text to Speech
  – Text in, audio out

Question Answering

Summarization
• Example: user reviews

Information Extraction
“Yess! Yess! Its official Nintendo announced today that they Will release the Nintendo 3DS in north America march 27 for $250”

PRODUCT RELEASE
COMPANY    PRODUCT     DATE       PRICE   REGION
Nintendo   3DS         March 27   $250    North America

Information Extraction
“Samsung Galaxy S5 Coming to All Major U.S. Carriers Beginning April 11th”
• State of the art is maybe 80%; for single easy fields: 90%+
• Redundancy helps a lot!
• Much of human knowledge is waiting to be harvested from the Web!

PRODUCT RELEASE
COMPANY    PRODUCT     DATE       PRICE   REGION
Samsung    Galaxy S5   April 11   ?       U.S.
Nintendo   3DS         March 27   $250    North America

NLP as a Field (and subfields)
• Tasks (each has its own datasets and evaluations)
  – Syntactic Parsing
  – Summarization
  – Machine Translation
  – Information Extraction
  – …
• Methods
  – Heuristic / Rule-based Approaches (dominated until the 1990s statistical revolution)
  – Probabilistic Graphical Models
  – Neural Networks
  – …

Q: Why is NLP hard?
• Why is it easy for computers to parse Python, but not English?

A: Ambiguity!

Ambiguity Example: Some Funny News Headlines
• Milk Drinkers Turn to Powder
• Shouting Match Ends Teachers' Hearing
• Town OKs Animal Rule
• Aging Expert Joins University Faculty
• British Left Waffles on Falkland Islands
• Local High School Dropouts Cut in Half

Layers of Linguistic Annotation
• Semantics / Discourse:
  – Talks-Event
  – Between: Yemen’s warring parties
  – Mediator: UN
  – Date: 1/14/2016
• Syntax (Constituents): [parse tree figure]
• Syntax (Parts of Speech):
  – NNP VBZ NNP VBG NNS TO VB NN IN NNP CD .
  – UN wants Yemen's warring parties to resume talks on Jan 14 .
• Morphology (stemming): war+ing
• Words (Tokenization): UN wants Yemen's warring parties to resume talks on Jan 14.
• UTF8: UN wants Yemen's warring parties to resume talks on Jan 14.

Goals for This Class
• By the end of the class you should:
  – Be able to build basic NLP tools (translate algorithms into code)
  – Be able to read (and re-implement) current research papers in NLP
  – (hopefully) Be able to notice gaps and propose novel solutions.

Homework 1
• Probability Refresher
• Should be fairly trivial
• Due at the beginning of class on Friday
  – Hand in paper copy
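
Looking back at the "Layers of Linguistic Annotation" slide, the lowest layers can be sketched in Python as follows. This is a rough illustration under simplifying assumptions: the regular expression and the `split_ing` helper are hypothetical and hand-written for this one sentence, and the part-of-speech tags are copied from the slide rather than predicted by a tagger.

```python
import re

# The raw character layer: the sentence as a plain (UTF-8) string.
SENTENCE = "UN wants Yemen's warring parties to resume talks on Jan 14."

# Part-of-speech tags as given on the slide, one per token.
SLIDE_TAGS = "NNP VBZ NNP VBG NNS TO VB NN IN NNP CD .".split()


def tokenize(text):
    """Words layer: keep words (with an attached 's) and split off punctuation."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)


def split_ing(token):
    """Morphology layer: naively split an -ing form, e.g. warring -> war+ing."""
    if token.endswith("ing") and len(token) > 5:
        stem = token[:-3]
        # Undo consonant doubling (warr -> war); crude, wrong for e.g. "calling".
        if len(stem) > 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
            stem = stem[:-1]
        return stem + "+ing"
    return token


tokens = tokenize(SENTENCE)
tagged = list(zip(tokens, SLIDE_TAGS))
for token, tag in tagged:
    print(f"{token:10s} {tag:4s} {split_ing(token)}")
```

Each layer builds on the one below it: the tags only line up with the words because the tokenizer keeps "Yemen's" as a single token and splits off the final period, matching the twelve tags on the slide.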