Computational Linguistics INTroduction
Lecture 1: Computers and Language

Course Information
  Course website: http://staff.um.edu.mt/mros1/lin2160
  Lecturers: mike.rosner@um.edu.mt, ray.fabri@um.edu.mt
  Book: Jurafsky & Martin, Speech and Language Processing, Prentice Hall 2009, ISBN 978-0-13-504196-3
  Natural Language Toolkit (NLTK): http://www.nltk.org/

Feb 2010 -- MR    CLINT - Lecture 1

CL: Two Main Disciplines
  Language and computers: where LINGUISTICS and COMP SCI meet

Language and Computers includes ...
  Natural Language Processing (NLP) / Human Language Technology
    Computational models of language analysis, interpretation, and generation.
    Syntax/semantics interface
    Emphasis on large-scale performance
    Example 1: Google search
    Example 2: speech technology
  Computational Linguistics
    Emphasis on mechanised linguistic theories.
    Grew out of early Machine Translation efforts

Linguistics
  Phonetics: the study of speech sounds
  Phonology: the study of sound systems
  Morphology: the study of word structure
  Syntax: the study of sentence structure
  Semantics: the study of meaning
  Pragmatics: the study of language use

Noam Chomsky
  Noam Chomsky's work in the 1950s radically changed linguistics, making syntax central.
  Chomsky has been the dominant figure in linguistics ever since.
  Chomsky invented the generative approach to grammar.

Generative Grammar: Some Key Points
  Theory of grammar includes mathematical definition of what a grammar is.
  A language is a (possibly infinite) set of sentences.
  But a grammar is finite.
  Grammar generates all and only sentences of a language.
  Undergeneration
  Overgeneration
  [source: Sag & Wasow]

Generative Power of a Grammar
  Undergeneration (G ⊂ L): only but not all
  Overgeneration (L ⊂ G): all but not only
  G = L: all and only

Formal Grammar
  Grammar is a set of rewrite rules
  Rules have the form LHS → RHS
    LHS can be rewritten as RHS
    LHS & RHS are sequences made of words or symbols
  Lexicon specifies words and their categories
    Category → word
    Category can be rewritten as word

A Simple Grammar/Lexicon
  grammar:
    S → NP VP
    NP → N
    VP → V NP
  lexicon:
    V → kicks
    N → John
    N → Bill
  tree for "John kicks Bill":
    [S [NP [N John]] [VP [V kicks] [NP [N Bill]]]]

Formal v. Natural Languages
  Formal Languages
    Arithmetic: 3290 1 1010101
    Logic: ∀x man(x) → mortal(x)
    URL: http://www.cs.um.edu.mt
  Natural Languages
    English: John saw the dog
    German: Johann hat den Hund gesehen
    Maltese: Ġianni ra kelb

Some Points of Similarity
  Sentences are sequences of words (or symbols).
  Rules determine which sequences are valid sentences.
  Sentences have a definite structure.
  Sentence structure is systematically related to meaning.

Structure Affects Meaning
  I shot an elephant in my trousers

Points of Difference
  Formal Languages
    The grammar defines the language
    Restricted application
    Non-ambiguous
  Natural Languages
    The language defines the grammar
    Universal application
    Highly ambiguous

Ambiguity
  Morphological Ambiguity: en-large-ment
  Lexical Ambiguity: Iraqi Head Seeks Arms
  Syntactic Ambiguity: small animals and children laugh
  Semantic Ambiguity: every girl loves a sailor
  Pragmatic Ambiguity: can you pass the salt?
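The syntactic ambiguity of "small animals and children laugh" can be made concrete with a toy parser. The sketch below is illustrative only: the grammar rules, the lexicon, and the function name `parses` are assumptions introduced here and are not part of the lecture. It enumerates every parse tree that this toy grammar assigns to a sentence.

```python
from functools import lru_cache

# Hypothetical toy grammar and lexicon, chosen only to expose the ambiguity.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("NP", "Conj", "NP"), ("Adj", "NP"), ("N",)],
    "VP": [("V",)],
}
LEXICON = {
    "small": {"Adj"},
    "animals": {"N"},
    "children": {"N"},
    "and": {"Conj"},
    "laugh": {"V"},
}


def parses(words, start="S"):
    """Return all bracketed parse trees of `words` rooted in `start`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def span(cat, i, j):
        # All trees of category `cat` covering words[i:j].
        trees = []
        if j - i == 1 and cat in LEXICON.get(words[i], ()):
            trees.append(f"({cat} {words[i]})")
        for rhs in GRAMMAR.get(cat, ()):
            for kids in sequence(rhs, i, j):
                trees.append(f"({cat} {' '.join(kids)})")
        return tuple(trees)

    def sequence(rhs, i, j):
        # All ways to cover words[i:j] with the symbols in `rhs`,
        # each symbol spanning at least one word.
        if len(rhs) == 1:
            return [(tree,) for tree in span(rhs[0], i, j)]
        results = []
        for k in range(i + 1, j - len(rhs) + 2):
            for left in span(rhs[0], i, k):
                for rest in sequence(rhs[1:], k, j):
                    results.append((left,) + rest)
        return results

    return list(span(start, 0, len(words)))


trees = parses("small animals and children laugh".split())
for tree in trees:
    print(tree)
print(len(trees))  # 2
```

Under these assumed rules the sentence receives two trees: in one, "small" modifies only "animals"; in the other, it modifies the whole conjunction "animals and children".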
The management of ambiguity is central to the success of CL.

I made her duck
  I cooked a duck for her
  I cooked a duck belonging to her
  I created a duck for her
  I created a duck that now belongs to her
  I caused her to lower her head
  I turned her into a duck

Computer Science
  The study of basic concepts
    Information
    Data
    Algorithm
    Program
  The application of these concepts to practical tasks.
  Implementation of computational models from other fields (meteorology, ..., linguistics)

Information, Data, Algorithm, Program: Information
  Information is a theoretical concept invented by Shannon in 1948 to measure uncertainty.
  The units of this measure are called bits.
    Length – metres
    Weight – kilos
    Information – bits
  1 bit is the amount of uncertainty inherent in a situation with exactly two equally likely possible outcomes.
  Example: for breakfast I will have coffee or I will have tea (nothing else). When I tell you that I will have tea, I have conveyed one bit of information.
  The greater the number of possible outcomes, the more bits of information are involved in the statement that indicates the actual outcome.

Information, Data, Algorithm, Program: Data
  A formalized representation of facts or concepts suitable for communication, interpretation, or processing by people or automated means.
  Example: a telephone directory
  Unlike information, which is abstract, data is concrete.
  Data has a certain level of structure. In the telephone directory, for example, we have the structure of a list of entries, each of which has a name, an address, and a number.

Information, Data, Algorithm, Program: Algorithm
  A completely defined procedure for the solution of a given problem in a finite number of steps
  Designed for a well-defined task.
  Finite description length.
  Guaranteed to terminate.
  Abstract

Algorithm for Chocolate Cake

Program to Add X and Y
  1. Read X and Y (e.g. X = 2, Y = 3)
  2. Subtract 1 from X
  3. Add 1 to Y
  4. X = 0? If no, go back to step 2. If yes, output Y.

Computer Program
  A set of instructions, written in a specific programming language, which a computer follows in processing data, performing an operation, or solving a logical problem.
  Concrete
  A program can implement an algorithm.
  More than one program may implement the same algorithm.
  Not all programs express good algorithms!

Instructions vs. Execution Steps
  1. Read X
  2. Read Y
  3. X = X-1
  4. Y = Y+1
  5. If X = 0 then Print(Y) else goto 3
  How many instructions?
  How many execution steps?

Algorithms and Linguistics
  Do linguistic theories in the abstract make sense?
  Linguistic theory explains linguistic knowledge in the form of
    grammar rules
    theories about grammar rules
  But performance involves processing issues.

Computational Linguistics – Issues
  How are a grammar and a lexicon represented?
  How is the structure of a given sentence actually discovered?
  How can we actually generate a sentence to express a particular intended meaning?
  How can linguistic theory be made concrete enough to test algorithmically?
  Can an artificial system learn a language with limited exposure to grammatical sentences?

Computers and Language: Twin Goals
  Scientific Goal: contribute to Linguistics by adding a computational dimension.
  Technological Goal: develop machinery capable of handling human language that can support "language engineering".

Computers and Language: Tools & Resources
  Grammar Formalisms, e.g.
    Definite Clause Grammars
  Parsing Algorithms: sentence → structure
  Generation Algorithms: structure → sentence
  Statistical Methods
  Linguistic Corpora

Computers and Language: Applications
  Information Retrieval/Extraction
  Document Classification
  Question Answering
  Style and Spell Checking
  Multimodal Interaction
  Machine Translation

LECTURES
  1 Overview
  2 Chomsky Hierarchy
  3 Chomsky Hierarchy
  4 Chomsky Hierarchy
  5 Computational Syntax
  6 Agreement & Subcategorisation
  7 Computational Syntax
  8 Computational Syntax
  9 Corpora, Tools and Techniques
  10 Morphology
  11 Computational Morphology
  12 Computational Morphology
  13 Computational Morphology
  14 Revision
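Returning to the "Instructions vs. Execution Steps" slide, the five-instruction program can be sketched in Python to make the distinction concrete. This is a minimal sketch under stated assumptions, not the lecturers' own code: the function name `add_by_counting` and the step counter are hypothetical, each executed instruction is counted as one step, the final instruction is taken to print Y (as in the "Program to Add X and Y" flowchart), and X is assumed to be a positive integer because the loop as written would otherwise never reach X = 0.

```python
def add_by_counting(x, y):
    """Add x to y by repeated decrement/increment.

    Returns (result, steps), where steps counts executed instructions.
    """
    if x < 1:
        # Assumption: the loop only terminates for positive integer x.
        raise ValueError("x must be a positive integer")
    steps = 2          # instructions 1 and 2: Read X, Read Y
    while True:
        x -= 1         # instruction 3: X = X-1
        y += 1         # instruction 4: Y = Y+1
        steps += 3     # instructions 3 and 4, plus the test in 5
        if x == 0:     # instruction 5: print and stop, else go to 3
            return y, steps


result, steps = add_by_counting(2, 3)
print(result, steps)  # 5 8
```

Under this way of counting, the program text always has five instructions, while the number of execution steps depends on the input: 2 + 3·X, which gives 8 for X = 2.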