Natural Languages © 2007

Language

• Definition of Language

– In math and computer science:

• A lexicon & rules for combining terms from the lexicon

– In common use:

• Structured verbal interaction between people

• Any structured interaction such as “The Language of Film”

• Are computer languages a model for human natural language?

Wide Variability among Natural Languages

• Sentence Structure

– SVO (Subject-Verb-Object) (English, Chinese)

– VSO (Gaelic/Celtic)

– SOV (Hindi, Japanese, Hopi)

• Written

– Ideographic (Chinese)

– Syllabic (Thai)

– Alphabetic (English)

• Spoken

– Tonal (Chinese)

– Non-tonal (English)

Layers of Natural Language

• Words

– Morphology, Orthography, Phonetics, Phonology

– Words are categorized into parts of speech

• Syntax

– Phrase and sentence structure based on parts of speech

• Semantics

– Literal meaning

• Pragmatics/Discourse

– Uses beyond the literal meaning

Grammars

• Grammars are most often associated with modeling syntax, though semantic grammars are also possible. In the broadest sense, grammars are rules for languages.

• The most widely used formal grammars are “context-free”. That is, the way a constituent is rewritten does not depend on the context around it.

• The grammars used for syntax are usually “constituent grammars”. That is, they identify the relationships among the components (constituents) of the phrase.

• Grammars taught in grade school are “prescriptive” grammars. Grammars in the formal analysis of language are “descriptive” and usually “generative”.

• Grammars are usually defined by rules, but statistical transition networks are also used to model the structure of language.

Modeling Natural Language Syntax with Grammars

• Rewrite (or production) rules (phrase-structure grammar)

• A very simple example of rewrite rules

S → NP + VP

NP → N, Adj + N

VP → V, V + NP
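The rewrite rules above can be run "forwards" to generate sentences. The following is a minimal Python sketch of that idea, not a definitive implementation. The grammar mirrors the three rules on the slide; the lexicon (`LEXICON`) and the helper name `generate` are assumptions made up for illustration.

```python
import random

# Rewrite rules from the slide: each left-hand side maps to its alternatives.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"], ["Adj", "N"]],
    "VP": [["V"], ["V", "NP"]],
}

# Hypothetical lexicon (not from the slides): words for each part of speech.
LEXICON = {
    "N": ["dogs", "cats", "birds"],
    "Adj": ["big", "small"],
    "V": ["chase", "see", "sleep"],
}

def generate(symbol, rng):
    """Expand a symbol top-down into a list of words."""
    if symbol in LEXICON:                     # part of speech: pick a word
        return [rng.choice(LEXICON[symbol])]
    expansion = rng.choice(GRAMMAR[symbol])   # phrase: pick one rewrite rule
    words = []
    for child in expansion:
        words.extend(generate(child, rng))
    return words

rng = random.Random(0)
for _ in range(3):
    print(" ".join(generate("S", rng)))
```

Every generated sentence has between two words (N V) and five words (Adj N V Adj N), because the rules contain no recursion. The output is syntactically well-formed but not necessarily meaningful, since a grammar like this models syntax only.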

Parsing

• Can we identify the grammatical structure of a given statement?

• Parsing is the basis of syntax checking for computer program compilers.

• A parse tree is the structure of a given statement given

– a lexicon with parts-of-speech

– a grammar

• A very simple sample parse tree is shown at the right. This has a Verb Phrase with a Direct Object. This Direct Object is itself a Noun Phrase.

[Figure: parse tree. S branches into NP and VP; the VP contains a V and an NP; the leaves are Adj, N, V, Adj, N.]

• Difficulties: Garden path sentences

– “The man who hunts ducks out on weekends”

• Many algorithms have been developed for parsing.
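To make the idea of a parse tree concrete, here is a small recursive-descent parser with backtracking, written as a sketch under stated assumptions. It uses the simple S/NP/VP rewrite rules from the earlier slide; the lexicon, the example sentence, and the function names (`parse`, `parse_symbol`, `parse_sequence`) are hypothetical and chosen only for illustration. It is one of many possible parsing algorithms, not a recommended one.

```python
# Rewrite rules (longer alternatives first so they are tried first).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Adj", "N"], ["N"]],
    "VP": [["V", "NP"], ["V"]],
}

# Hypothetical lexicon with one part of speech per word.
LEXICON = {"big": "Adj", "small": "Adj", "dogs": "N", "cats": "N", "chase": "V"}

def parse_symbol(symbol, words, start):
    """Yield (tree, next_position) for every way symbol matches words[start:]."""
    if symbol not in GRAMMAR:  # part of speech: must match the next word's tag
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            yield (symbol, words[start]), start + 1
        return
    for rule in GRAMMAR[symbol]:
        for children, end in parse_sequence(rule, words, start):
            yield (symbol, *children), end

def parse_sequence(symbols, words, start):
    """Yield (list_of_trees, next_position) for a sequence of symbols."""
    if not symbols:
        yield [], start
        return
    for first, mid in parse_symbol(symbols[0], words, start):
        for rest, end in parse_sequence(symbols[1:], words, mid):
            yield [first] + rest, end

def parse(sentence):
    """Return the first parse tree covering the whole sentence, or None."""
    words = sentence.lower().split()
    for tree, end in parse_symbol("S", words, 0):
        if end == len(words):
            return tree
    return None

print(parse("big dogs chase small cats"))
```

The tree is returned as nested tuples whose first element is the node label. A word order the grammar does not allow, such as "chase dogs", yields `None`. This sketch assumes each word has exactly one part of speech; garden path sentences are hard precisely because that assumption fails (for example, "ducks" can be a noun or a verb).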

Psycholinguistics

• What do we know about how people process and learn language?

• Are all languages context-free?

• Language learning

– Children sometimes seem to over-apply rules. “I goed to the store”

• Competence vs. performance

• Transformational grammars are a model that allows re-arrangement of structure.

Modeling Syntax with Statistical Models

• While most grammars are a rule-based representation, a statistical representation of language may capture structure more flexibly.

• In particular, Markov models can describe the transitions between different parts of speech. For instance, Nouns are often followed by Verbs, but Adjectives are rarely followed by Verbs.
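The transition idea can be sketched in a few lines of Python by counting which part of speech follows which. This is an illustrative first-order Markov estimate only: the tag sequences in `TAGGED_SENTENCES` are invented for this example, and the function name `transition_probabilities` is an assumption rather than anything from the slides.

```python
from collections import Counter, defaultdict

# Hypothetical part-of-speech sequences, invented for illustration.
TAGGED_SENTENCES = [
    ["Adj", "N", "V", "Adj", "N"],
    ["N", "V", "N"],
    ["Adj", "Adj", "N", "V"],
    ["N", "V", "Adj", "N"],
]

def transition_probabilities(sequences):
    """Estimate P(next tag | current tag) from adjacent tag pairs."""
    counts = defaultdict(Counter)
    for tags in sequences:
        for current, following in zip(tags, tags[1:]):
            counts[current][following] += 1
    return {
        current: {tag: n / sum(nexts.values()) for tag, n in nexts.items()}
        for current, nexts in counts.items()
    }

probs = transition_probabilities(TAGGED_SENTENCES)
print("P(V | N)   =", probs["N"].get("V", 0.0))
print("P(V | Adj) =", probs["Adj"].get("V", 0.0))
```

In this toy data a Noun is always followed by a Verb and an Adjective never is, which exaggerates the tendency described above. Estimates from a real tagged corpus would be less extreme and would normally be smoothed so that unseen transitions do not get probability zero.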

Words

• What exactly is a word?

– Sail-boat, Pennsylvania, 555-1212, F-16

• Definitions of words

– Why aren’t the definitions of words in dictionaries all the same?

– Are exact definitions of words possible?

• Across time, across groups

– Words evolve in meaning

• Sometimes by radial categories (that is, often by metaphor)

• What is the relationship between concepts and words?

Tools beyond Traditional Dictionaries: WordNet and FrameNet

• WordNet http://wordnet.princeton.edu/

– Shows hierarchical relationships for dictionary terms. Very loosely, this can be thought of as an ontology.

• FrameNet http://framenet.icsi.berkeley.edu/

– Verbs show the relationship among concepts. For instance, “to give” implies that there is a gift, a gifter, and a giftee.
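The two kinds of structure described above can be mimicked with small hand-built data structures. The Python sketch below does not use the actual WordNet or FrameNet data or their APIs: the hypernym entries, the `Frame` class, and the role names (taken from the “to give” example) are all assumptions for illustration.

```python
from dataclasses import dataclass

# Hand-made "is a kind of" links, in the spirit of a WordNet hierarchy.
# These entries are illustrative and are not copied from WordNet.
HYPERNYM = {"poodle": "dog", "dog": "mammal", "mammal": "animal"}

def hypernym_chain(term):
    """Follow the hierarchy upward from a term to its most general ancestor."""
    chain = [term]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

@dataclass
class Frame:
    """A verb plus the roles it implies, in the spirit of a FrameNet frame."""
    verb: str
    roles: tuple

    def fill(self, **fillers):
        """Bind each role to a filler; every role must be supplied."""
        missing = [r for r in self.roles if r not in fillers]
        if missing:
            raise ValueError(f"unfilled roles: {missing}")
        return {"verb": self.verb, **{r: fillers[r] for r in self.roles}}

# Hypothetical frame for "to give", using the role names from the slide.
GIVE = Frame(verb="give", roles=("gifter", "gift", "giftee"))

print(hypernym_chain("poodle"))
print(GIVE.fill(gifter="Mary", gift="a book", giftee="John"))
```

The hierarchy answers questions such as "is a poodle a kind of animal?", while the frame records which participants a verb expects, so a missing role can be detected.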

Semantics

• Very different statements can have similar semantics.

• The semantics of statements in a computer programming language (i.e., a program) can be determined from its behavior.

• The semantics of natural language is often judged by the meaning and relationship of the components. Subjective and contextualized meaning is considered pragmatics, which we will discuss later.

Representing Semantics

• Semantic grammar

– Even with different surface structure, can we develop a standard representation for the meaning?

• Interlingua

– A common mediator for meaning across languages. This could be useful for translation.
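As a rough illustration of a standard representation for meaning, the Python sketch below maps an active and a passive sentence to the same predicate-argument structure. It is a toy under strong assumptions: the two regular-expression patterns, the role labels `agent` and `patient`, and the function name `meaning` are invented here, and the patterns only handle single-word names with regular "-ed" verbs.

```python
import re

# Two surface patterns (assumptions for this sketch), one per word order.
ACTIVE = re.compile(r"^(?P<agent>\w+) (?P<verb>\w+ed) (?P<patient>\w+)$")
PASSIVE = re.compile(r"^(?P<patient>\w+) was (?P<verb>\w+ed) by (?P<agent>\w+)$")

def meaning(sentence):
    """Map a sentence to a surface-independent structure, or None if no pattern fits."""
    text = sentence.strip().rstrip(".")
    for pattern in (PASSIVE, ACTIVE):  # try the more specific pattern first
        match = pattern.match(text)
        if match:
            return {
                "predicate": match.group("verb"),
                "agent": match.group("agent"),
                "patient": match.group("patient"),
            }
    return None

print(meaning("Rex chased Tom."))
print(meaning("Tom was chased by Rex."))
```

Both sentences produce the same dictionary, which is the property a semantic grammar or an interlingua aims for. A practical system would need far richer analysis than pattern matching.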

Pragmatics: Social Uses of Language

Referential

• Conveys information about some real phenomenon

• This is what we think about as normal language use

Expressive

• describes feelings of the speaker

Conative

• attempts to elicit some behavior from the addressee

Phatic

• builds a relationship between both parties in a conversation

Meta-lingual

• self-references

Poetic

• focuses on the text independent of reference

(from R. Jakobson)

Discourse

• Sentences form macro-structures or superstructures of meaning. This includes structured language such as argumentation, negotiation, news, narrative, and explanations.

• What are the components (elements) and structure of discourse? For instance, messages can be structured to make them clear for listeners.

• Given-New

Bill (a person you know) went to the store (is in a new location)

• Theme-Rheme

When in Rome (theme), do as the Romans do (rheme)

Argumentation

• Toulmin has proposed a general structure for arguments

[Figure: Toulmin argument diagram with components Grounds, Claim, Evidence, Rebuttal]

• There are a lot of complex structured verbal interactions

– Legal arguments

– Design rationale

– Negotiations

Explanations and Causation

• What an explanation consists of

– Two types of phenomena being explained

• Causal antecedents

– How do we explain the American Civil War?

• Sub-processes

– How does a gasoline engine work?

– The background of the person receiving the explanation needs to be considered.

Stories and Narrative

• (Goals + Events + Resolution) + Characters

• Many stories seem highly structured

– Some stories seem so structured that they have been described with “story grammars”. This is most notably true of Russian fairy tales.

• Many stories also reflect familiar human quandaries

– “Romeo and Juliet”

• Interactive and dynamic narrative (useful in games)

– Could we become a player in an interactive “Romeo and Juliet”?

Conversation

• Conversation adds a social and interactive component to language

• Conversational norms (Maxims)

– Truthful, informative, relevant, clear

– But these are routinely violated

• Managing conversations

– Opening / Closing

– Turn taking

• In Native American councils, the person holding the talking stick controlled the floor
