Languages: Natural and Formal
CC 2007, 2011 attribution R.B. Allen

Language
• Definition
  – In math and computer science:
    • A lexicon and rules for combining terms from the lexicon
  – In common use:
    • Structured verbal interaction between people
    • Any structured interaction, such as “The Language of Film”
• Are computer languages a model for human natural language?

Wide Variability among Natural Languages
• Sentence structure
  – SVO (Subject-Verb-Object): English, Chinese
  – VSO (Verb-Subject-Object): Gaelic/Celtic
  – SOV (Subject-Object-Verb): Hindi, Japanese, Hopi
• Written
  – Ideographic (Chinese)
  – Syllabic (Thai)
  – Alphabetic (English)
• Spoken
  – Tonal (Chinese)
  – Non-tonal (English)

Layers of Natural Language
• Words
  – Morphology, Orthography, Phonetics, Phonology
• Syntax
  – Phrase and sentence structure based on parts of speech
• Semantics
  – Literal meaning
• Pragmatics/Discourse
  – Uses beyond the literal meaning

Grammars
• Grammars are most often associated with modeling syntax, though semantic grammars are also possible. In the broadest sense, grammars are rules for languages.
• The most commonly used formal grammars are “context-free”. That is, the way a symbol is rewritten does not depend on the context in which it appears.
• The grammars used for natural language syntax are usually “constituent grammars”. That is, they identify the relationships among the components (constituents) of a phrase.
• Grammars taught in grade school are “prescriptive” grammars. Grammars in the formal analysis of language are “descriptive” and usually “generative”.
• Grammars are usually defined by rules, but statistical transition networks are also used to model the structure of language.

Modeling Natural Language Syntax with Grammars
• Rewrite (or production) rules (phrase-structure grammar)
• A very simple example of rewrite rules:
    S → NP+VP
    NP → N, Adj+N
    VP → V, V+NP
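To make the rewrite-rule slide concrete, here is a minimal Python sketch. It reads the three rules as S → NP VP, NP → N or Adj N, and VP → V or V NP. The four-word lexicon and the function names `expand` and `is_grammatical` are illustrative assumptions, not part of the lecture. The sketch lists every part-of-speech sequence the rules can generate and checks a sentence against that list, which only works because this toy grammar has no recursion.

```python
from itertools import product

# Rewrite rules from the slide: each symbol maps to its alternative expansions.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["N"], ["Adj", "N"]],
    "VP": [["V"], ["V", "NP"]],
}

# Hypothetical lexicon (an assumption, not from the slides): one tag per word.
LEXICON = {"dogs": "N", "cats": "N", "big": "Adj", "chase": "V"}


def expand(symbol):
    """Return every part-of-speech sequence derivable from `symbol`."""
    if symbol not in RULES:  # a part of speech such as N, V, or Adj
        return [[symbol]]
    sequences = []
    for alternative in RULES[symbol]:
        # Expand each symbol on the right-hand side, then combine the choices.
        parts = [expand(child) for child in alternative]
        for combo in product(*parts):
            sequences.append([tag for part in combo for tag in part])
    return sequences


def is_grammatical(sentence):
    """Check whether the sentence's tag sequence can be derived from S."""
    tags = [LEXICON[word] for word in sentence.split()]
    return tags in expand("S")


if __name__ == "__main__":
    for tags in expand("S"):
        print(" ".join(tags))
    print(is_grammatical("dogs chase big cats"))  # True
    print(is_grammatical("chase dogs"))           # False
```

The grammar generates six tag sequences, from "N V" up to "Adj N V Adj N". A grammar with recursive rules would generate infinitely many, so it would need a real parsing algorithm instead of enumeration.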

Parsing
• Can we identify the grammatical structure of a given statement?
• Parsing is the basis of syntax checking for computer program compilers.
• A parse tree is the structure of a given statement, given
  – a lexicon with parts of speech
  – a grammar
• A very simple sample parse tree is shown at the right. It has a Verb Phrase with a Direct Object. This Direct Object is itself a Noun Phrase.
  [Figure: parse tree with node labels S, NP, VP, NP, N, V, Adj, N]
• Difficulties: garden path sentences
  – “The man who hunts ducks out on weekends”
• Many algorithms have been developed for parsing.

Psycholinguistics
• How do people process and learn language?
• Chomsky’s claims for formal (discrete) grammars:
  – All natural languages are context-free
  – Children have grammatical rules wired in:
    • “I goed to the store.”
  – Competence vs. performance
    • People know what is grammatically correct even if they make errors.
• Transformational grammars describe rules for rearranging structure, such as forming a question from a declarative sentence.
• An alternative to discrete (formal) grammars is statistical (approximate) grammars. These can be learned by association.

Modeling Syntax with Statistical Models
• While most grammars are rule-based representations, a statistical representation of language may capture structure more flexibly.
• In particular, Markov models can describe the transitions between different parts of speech. For instance, nouns are often followed by verbs, but adjectives are rarely followed by verbs.

Words
• What exactly is a word? (This also matters for the design of search engines.)
  – Sail-boat, Pennsylvania, 555-1212, F-16
• Definitions of words
  – Why aren’t the definitions of words in dictionaries all the same?
  – Are exact definitions of words possible?
• Across time, across groups
  – How do words evolve in meaning?
    • Sometimes by radial categories (that is, often by metaphor)
• What is the relationship between concepts and words?

Beyond Traditional Dictionaries: WordNet and FrameNet
• WordNet http://wordnet.princeton.edu/
  – Shows hierarchical relationships for dictionary terms. Very loosely, this can be thought of as an ontology.
• FrameNet http://framenet.icsi.berkeley.edu/
  – Shows the elements usually associated with a concept.
  – For verbs, shows the relationships among concepts. For instance, “to give” implies that there is a gift, a gifter, and a giftee.

Semantics
• Very different surface structures can have similar semantics.
• The semantics of natural language is often judged by the meanings of the components and the relationships among them. Subjective and contextualized meaning is treated as pragmatics, which we will discuss later.
• The semantics of statements in a computer programming language (i.e., a program) can be determined from the program's behavior.

Representing Semantics
• Semantic grammars
  – Even with different surface structures, can we develop a standard representation for the meaning?
• Interlingua
  – A common representation for meaning across languages. This could be useful for translation.

Pragmatics: Social Uses of Language
• Pragmatics extends the literal semantics to consider other ways language is used (from R. Jakobson).
  – Referential
    • Conveys information about some real phenomenon
    • This is what we think of as normal language use
  – Expressive
    • Describes feelings of the speaker
  – Conative
    • Attempts to elicit some behavior from the addressee
  – Phatic
    • Builds a relationship between the parties in a conversation
  – Meta-lingual
    • Refers to the language itself
  – Poetic
    • Focuses on the text independent of reference

Discourse
• Sentences form macro-structures or superstructures of meaning.
  This includes structured language such as argumentation, negotiation, news, narrative, and explanations.
• What are the components (elements) and structure of discourse? For instance, messages can be structured to make them clear for listeners.
  – Given-New: Bill (a person you know) went to the store (is in a new location)
  – Theme-Rheme: When in Rome (theme), do as the Romans do (rheme)

Argumentation
• Toulmin has proposed a general structure for arguments.
  [Figure: argument diagram with the labels Grounds, Claim, Evidence, Rebuttal]
• Many complex verbal interactions are structured:
  – Legal arguments
  – Design rationale
  – Negotiations

Explanations and Causation
• An explanation addresses one of two types of phenomena:
  – Causal antecedents
    • How do we explain the American Civil War?
  – Sub-processes
    • How does a gasoline engine work?
• The background of the person receiving the explanation needs to be considered.

Stories and Narrative
• (Goals + Events + Resolution) + Characters
• Many stories seem highly structured
  – Some stories seem so structured that they have been described by “story grammars”. This is most notably true of Russian fairy tales.
• Many stories also reflect familiar human quandaries
  – “Romeo and Juliet”
• Interactive and dynamic narrative (useful in games)
  – Could we become a player in an interactive “Romeo and Juliet”?

Conversation
• Conversation adds a social and interactive component to language
• Conversational norms (maxims)
  – Truthful, informative, relevant, clear
  – But these are routinely violated, e.g., in shaggy dog stories.
• Managing conversations
  – Opening / Closing
  – Turn taking

How Close to Passing the Turing Test?
• Chatterbots
• IBM “Watson” plays Jeopardy.

Natural Language Processing (NLP)
• We will revisit natural language in a few weeks when we look at the use of natural language in information systems.

Formal Languages
• Programming languages
  – High-level languages (e.g., C++) are built to simplify the use of low-level machine language
  – Debugging tools typically check syntax but not semantics
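
The gap between checking syntax and checking semantics can be illustrated with a short Python sketch. The two source strings and the helper names `passes_syntax_check` and `behaves_correctly` are hypothetical, chosen only for this illustration. A parser rejects the program with the missing parenthesis but accepts the program that computes the wrong answer. That second error only shows up when the program's behavior is compared with what was intended.

```python
import ast

# Syntactically valid, but the meaning is wrong: this hypothetical `average`
# divides by 2 instead of by the number of values.
BUGGY_SOURCE = """
def average(values):
    return sum(values) / 2
"""

# Syntactically invalid: the closing parenthesis of the parameter list is missing.
BROKEN_SOURCE = "def average(values:\n    return sum(values) / len(values)\n"


def passes_syntax_check(source):
    """Return True if `source` parses as Python (a purely syntactic test)."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def behaves_correctly(source):
    """Run the code and compare its behavior with the intended meaning."""
    namespace = {}
    exec(source, namespace)  # acceptable here: the source is our own string
    return namespace["average"]([1, 2, 3]) == 2


if __name__ == "__main__":
    print(passes_syntax_check(BROKEN_SOURCE))  # False: caught by the parser
    print(passes_syntax_check(BUGGY_SOURCE))   # True: the syntax is fine
    print(behaves_correctly(BUGGY_SOURCE))     # False: the behavior is wrong
```

The semantic check relies on a human-supplied expectation (the average of 1, 2, 3 is 2), which is one reason tools find this kind of error harder to catch than a syntax error.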