A very short introduction to Natural Language Generation
Kees van Deemter
Computing Science
University of Aberdeen

Language Technology
Speech -> Speech Recognition -> Text -> Natural Language Understanding -> Meaning
Meaning -> Natural Language Generation -> Text -> Speech Synthesis -> Speech

First: NLG from a practical perspective
Goal: Use computers to express information in human-accessible form
Input: some non-linguistic representation of information (e.g., tables in a database, logical formulas, Java code, ...)
Output: documents, reports, explanations, help messages, ... in some human language (Chinese, English, Dutch)
Knowledge sources required: knowledge of the language and of the domain; possibly of the intended audience as well

Example System: FoG
Function: Produces textual weather reports in English and French
Input: Graphical/numerical weather depiction
User: Environment Canada (Canadian Weather Service)
Developer: CoGenTex [Kittredge, Goldberg and Driedger 1994]
Status: Fielded, in operational use since 1992

FoG: Input [figure]
FoG: Output [figure]

Example System: Dial Your Disc (DYD)
Function: Context-sensitive descriptions of Mozart’s instrumental music
Input: Music database + history of interaction
Target user: Music industry, customers for music-on-demand
Developer: Philips Electronics (Nat Lab – IPO, Eindhoven; 1993-6) [Van Deemter & Odijk 1995]
Status: Methods reused in GOALGETTER and other systems

Example System: Dial Your Disc (DYD)
User composes a home-made CD.
Speech interface tells the system what type of music the user would like to add to the CD, e.g., “I’d like some piano music”, “I’m interested in solo performances”.
“piano”, “solo”
System chooses one composition with solo piano.
The music starts.
After a while, a text is spoken.
The second time a piano sonata is selected, the following text might be generated:

Example System: Dial Your Disc (DYD)
Example of approximate output, in its most elaborate form:
“The following+ composition+, from which you are going to hear a fragment+ of part three+, was written+ by Mozart in the beginning+ of seventeen+ seventy+ five+, in Munich+. The work is also+ a sonata+ in f+, like the preceding+ composition, but now+ for piano+. The KV+ number of this work is K. two+ eight+ zero+. This sonata+ consists of three+ parts+: allegro assai+, adagio+, and presto+. The presto lasts two+ minutes+ forty+ five+ seconds+. This presto is located on track six+ of first+ CD+ of volume seventeen+. The piano+ is played by Mitsuko Uchida+. The recording+ of the sonata+ was made+ in the Henry Wood+ Hall in London+, England, in the eighties+. The quality+ of its recording is DDD+. The following+ is a fragment+ of the third+ part+.”
[A fragment follows]
Each “+” marks a pitch accent on the preceding word.

When to use NLG?
When there are many potential documents to be written, differing according to the context (user, situation, language)
When there are some general principles behind document design

Why is NLG hard?
NLG involves many choices, e.g. which content to include, what order to say it in, what words to use.
Linguistics does not yet provide us with a ready-made, precise theory about how to make such choices to produce coherent text.

Why does choice matter?
The Serbian Prime Minister, Zoran Djindjic, has been assassinated in the capital, Belgrade. The pro-reform, pro-Western leader was shot in the stomach and in the back outside government offices at around 1300 (1200 GMT), and died of his wounds in hospital.
(BBC news, UK edition, 12/3/03)

Tasks and Architecture in NLG (Reiter 1994)
Document Planning: Content Determination, Document Structuring
Microplanning: Aggregation, Lexicalisation, Generation of Referring Expressions
Surface Realisation: Linguistic Realisation, Physical Realisation

Second perspective: NLG as a branch of linguistics
NLG systems map ideas to words
Surely, this is linguistic territory!
If linguists cannot say how the different stories about James Sportler differ, then who can?
An NLG program might be seen as a model of language production (in terms of its output; the human production process may be very different)

NLG is the smaller twin brother of NL Understanding
NLG poses deep theoretical problems about language and communication
NLG has great potential for applications
This course: Generation of Referring Expressions
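
As a rough illustration of the task list under "Tasks and Architecture in NLG", the three modules can be sketched as a chain of small functions. This is only a toy sketch, not the architecture of FoG, DYD, or any system described by Reiter: the record fields, the predicate templates, and all function names are hypothetical, and the values are borrowed from the DYD example above. Physical realisation is reduced here to joining sentences into one string; a real system would handle layout or speech.

```python
from dataclasses import dataclass

# Hypothetical input record; field names are illustrative assumptions.
RECORD = {
    "composer": "Mozart",
    "genre": "sonata",
    "instrument": "piano",
    "kv_number": "K. 280",
    "performer": "Mitsuko Uchida",
    "track": 6,
}


@dataclass
class Message:
    """One fact selected for expression: an attribute and its value."""
    attribute: str
    value: str


# --- Document planning ---
def determine_content(record, wanted):
    """Content determination: keep only the facts worth telling."""
    return [Message(k, str(v)) for k, v in record.items() if k in wanted]


def structure_document(messages, order):
    """Document structuring: arrange the selected facts in a given order."""
    return sorted(messages, key=lambda m: order.index(m.attribute))


# --- Microplanning ---
# Hypothetical wording for each attribute (an assumption, not from the slides).
PREDICATES = {
    "composer": "was written by {value}",
    "performer": "is played by {value}",
    "kv_number": "has the KV number {value}",
}


def lexicalise(message):
    """Lexicalisation: choose words for one fact."""
    return PREDICATES[message.attribute].format(value=message.value)


def aggregate(predicates, size=2):
    """Aggregation: merge up to `size` predicates into one sentence body."""
    return [" and ".join(predicates[i:i + size])
            for i in range(0, len(predicates), size)]


def refer(genre, first_mention):
    """Referring expressions: vary the description after the first mention."""
    return f"the following {genre}" if first_mention else f"this {genre}"


# --- Surface realisation ---
def realise_sentence(subject, predicate):
    """Linguistic realisation: capitalise and punctuate one sentence."""
    text = f"{subject} {predicate}"
    return text[0].upper() + text[1:] + "."


def realise_text(sentences):
    """Physical realisation (simplified): join sentences into plain text."""
    return " ".join(sentences)


def generate(record, wanted=("composer", "performer", "kv_number")):
    """Run the three modules in sequence on one record."""
    messages = structure_document(determine_content(record, wanted), list(wanted))
    bodies = aggregate([lexicalise(m) for m in messages])
    sentences = [
        realise_sentence(refer(record["genre"], first_mention=(i == 0)), body)
        for i, body in enumerate(bodies)
    ]
    return realise_text(sentences)


print(generate(RECORD))
```

Each function corresponds to one task on the slide, so a single choice (for example the aggregation size or the wording in PREDICATES) can be changed in isolation to see how it alters the resulting text.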