ppt - Robert B. Allen

Languages:
Natural and Formal
CC 2007, 2011 attribution R.B. Allen
Language
• Definition
– In math and computer science:
• A lexicon & rules for combining terms from
the lexicon
– In common use:
• Structured verbal interaction between people
• Any structured interaction such as “The
Language of Film”
• Are computer languages a model for
human natural language?
Wide Variability among
Natural Languages
• Sentence Structure
– SVO (Subject-Verb-Object) (English, Chinese)
– VSO (Gaelic/Celtic)
– SOV (Hindi, Japanese, Hopi)
• Written
– Ideographic (Chinese),
– Syllabic (Thai),
– Alphabetic (English)
• Spoken
– Tonal (Chinese)
– Non-tonal (English)
Layers of Natural Language
• Words
– Morphology, Orthography, Phonetics, Phonology
• Syntax
– Phrase and sentence structure based on parts of speech
• Semantics
– Literal meaning
• Pragmatics/Discourse
– Uses beyond the literal meaning
Grammars
• Grammars are most often associated with modeling
syntax though semantic grammars are also possible.
In the broadest sense, grammars are rules for
languages
• The most commonly used formal grammars are “context-free”.
That is, how a symbol is rewritten does not depend on its context.
• The grammars used for natural language syntax are
usually “constituent grammars”. That is, they identify
the relationships among the components (constituents)
of a phrase.
• Grammars taught in grade school are “prescriptive”
grammars. Grammars in the formal analysis of
language are “descriptive” and usually “generative”.
• Grammars are usually defined by rules, but statistical
transition networks are also used to model the
structure of language.
Modeling Natural Language
Syntax with Grammars
• Rewrite (or production) rules (phrase-structure grammar)
• A very simple example of rewrite rules:
S → NP+VP
NP → N, Adj+N
VP → V, V+NP
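The rewrite rules above can be run as a small generator. This is only a sketch: the rules are the three from the slide, while the lexicon (`dogs`, `chase`, and so on) and the function names `expand` and `generate` are assumptions added for illustration.

```python
import random

# Rewrite rules from the slide: each symbol maps to its alternative expansions.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["N"], ["Adj", "N"]],
    "VP": [["V"], ["V", "NP"]],
}

# Hypothetical lexicon, added only for illustration.
LEXICON = {
    "N": ["dogs", "cats"],
    "Adj": ["big", "small"],
    "V": ["chase", "see"],
}

def expand(symbol, rng):
    """Rewrite a symbol repeatedly until only words remain."""
    if symbol in LEXICON:
        return [rng.choice(LEXICON[symbol])]
    words = []
    for child in rng.choice(RULES[symbol]):
        words.extend(expand(child, rng))
    return words

def generate(seed=None):
    """Generate one sentence by expanding the start symbol S."""
    rng = random.Random(seed)
    return " ".join(expand("S", rng))

print(generate())
```

Because every NP and VP alternative is chosen at random, repeated calls produce sentences of two to five words, all of which the grammar accepts.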
Parsing
• Can we identify the grammatical structure of a given statement?
• Parsing is the basis of syntax checking for computer program
compilers.
• A parse tree is the structure of a given statement given
– a lexicon with parts-of-speech
– a grammar
• A very simple sample parse tree has a Verb Phrase with a
Direct Object. This Direct Object is itself a Noun Phrase.
[Figure: parse tree. S branches to NP and VP; the first NP to
Adj and N; VP to V and NP; the second NP to Adj and N.]
• Difficulties: Garden path sentences
– “The man who hunts ducks out on weekends”
• Many algorithms have been developed for parsing.
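One of the simplest parsing approaches is top-down parsing with backtracking. The sketch below applies it to the small S / NP / VP grammar from the rewrite-rule slide. The lexicon, the tuple-based tree format, and the function names are assumptions made for illustration, not a method given in the slides.

```python
# Grammar from the rewrite-rule slide (longer alternatives are tried first).
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Adj", "N"], ["N"]],
    "VP": [["V", "NP"], ["V"]],
}

# Hypothetical lexicon with parts of speech, added only for illustration.
LEXICON = {"big": "Adj", "small": "Adj", "dogs": "N", "cats": "N", "chase": "V"}

def parse(symbol, words, start):
    """Yield (tree, next_position) for each way symbol can match from start."""
    if symbol not in RULES:
        # A part of speech must match the next word in the lexicon.
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            yield (symbol, words[start]), start + 1
        return
    for rhs in RULES[symbol]:
        yield from match_sequence(symbol, rhs, words, start, [])

def match_sequence(symbol, rhs, words, pos, children):
    """Match the symbols in rhs one after another, collecting subtrees."""
    if not rhs:
        yield (symbol, *children), pos
        return
    for child, nxt in parse(rhs[0], words, pos):
        yield from match_sequence(symbol, rhs[1:], words, nxt, children + [child])

def parse_sentence(sentence):
    """Return the first parse tree covering the whole sentence, or None."""
    words = sentence.lower().split()
    for tree, end in parse("S", words, 0):
        if end == len(words):
            return tree
    return None

print(parse_sentence("big dogs chase small cats"))
print(parse_sentence("chase dogs"))
```

The first call returns a nested tuple with the same shape as the sample parse tree (a Verb Phrase whose Direct Object is a Noun Phrase); the second returns `None` because no S can be built. This toy grammar has no left-recursive rules, which is what keeps the top-down search from looping.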
Psycholinguistics
• How do people process and learn language?
• Chomsky’s claims for formal (discrete)
grammars:
• All natural languages are context free
• Children have grammatical rules wired in:
– “I goed to the store.”
• Competence vs. performance
• People know what is grammatically correct even if they make
errors.
• Transformational grammars describe rules for rearranging
structure, such as forming a question from a declarative sentence.
• An alternative to discrete (formal) grammars is
statistical (approximate) grammars. These can
be learned by association.
Modeling Syntax with
Statistical Models
• While most grammars are a rule-based representation,
a statistical representation of language may capture
structure more flexibly.
• In particular, Markov models can describe the
transitions between different parts of speech. For
instance, Nouns are often followed by Verbs but
Adjectives are rarely followed by Verbs.
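A first-order Markov model of this kind can be estimated by counting which part of speech follows which. The sketch below assumes a tiny hand-tagged corpus (`TAGGED`) invented for illustration; real estimates would need far more data.

```python
from collections import Counter, defaultdict

# Hypothetical part-of-speech sequences, invented only for illustration.
TAGGED = [
    ["Adj", "N", "V", "Adj", "N"],
    ["N", "V", "N"],
    ["Adj", "N", "V"],
    ["N", "V", "Adj", "N"],
]

def transition_probabilities(sequences):
    """Estimate P(next tag | previous tag) from adjacent tag pairs."""
    counts = defaultdict(Counter)
    for tags in sequences:
        for prev, nxt in zip(tags, tags[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: n / sum(following.values()) for nxt, n in following.items()}
        for prev, following in counts.items()
    }

probs = transition_probabilities(TAGGED)
print("P(V | N)   =", probs["N"].get("V", 0.0))
print("P(V | Adj) =", probs["Adj"].get("V", 0.0))
```

In this toy corpus Nouns are always followed by Verbs and Adjectives never are, which mirrors the pattern described above in exaggerated form.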
Words
• What exactly is a word? (also matters for the
design of search engines)
– Sail-boat, Pennsylvania, 555-1212, F-16
• Definitions of words
– Why aren’t the definitions of words in dictionaries
all the same?
– Are exact definitions of words possible?
• Across time, across groups
– How do words evolve in meaning?
• Sometimes by radial categories (that is, often by
metaphor)
• What is the relationship between concepts and
words?
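The first question on this slide, what counts as a word, has a direct practical form in search engines: the tokenizer has to pick a policy. The sketch below compares two simple policies on the slide's examples; both policies are assumptions chosen for illustration, not how any particular search engine works.

```python
import re

text = "Sail-boat Pennsylvania 555-1212 F-16"

# Policy A: a word is any run of non-space characters.
whitespace_tokens = text.split()

# Policy B: a word is a run of letters or digits, so hyphens split words.
alnum_tokens = re.findall(r"[A-Za-z0-9]+", text)

print(whitespace_tokens)
print(alnum_tokens)
```

Policy A keeps "F-16" and "555-1212" whole, while Policy B breaks them into pieces that no longer identify the aircraft or the phone number. Neither choice is right for every query.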
Beyond Traditional Dictionaries:
WordNet and FrameNet
• WordNet http://wordnet.princeton.edu/
– Shows hierarchical relationships for dictionary
terms. Very loosely, this can be thought of as an
ontology.
• FrameNet http://framenet.icsi.berkeley.edu/
– Shows the elements usually associated with a
concept.
– For verbs, shows the relationships among concepts.
For instance, “to give” implies that there is a gift, a
gifter, and a giftee.
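The two ideas can be pictured as data structures: a hierarchy of "is a kind of" links, and a frame with named roles. The sketch below is a toy model only. The terms in `HYPERNYM` and the `GivingFrame` class are invented for illustration and are not taken from WordNet or FrameNet, nor do they use either project's API.

```python
from dataclasses import dataclass

# Hypothetical "is a kind of" links, invented for illustration.
HYPERNYM = {"poodle": "dog", "dog": "mammal", "mammal": "animal"}

def hypernym_chain(term):
    """Follow the hierarchy upward from a term to its most general ancestor."""
    chain = [term]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

@dataclass
class GivingFrame:
    """Roles implied by 'to give', using the slide's own role names."""
    gifter: str
    gift: str
    giftee: str

print(hypernym_chain("poodle"))
print(GivingFrame(gifter="Ann", gift="a book", giftee="Bill"))
```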
Semantics
• Very different surface structures can have
similar semantics.
• The semantics of natural language is often
judged by the meaning and relationship of the
components. Subjective and contextualized
meaning is considered pragmatics, which
we will discuss later.
• The semantics of statements in a computer
programming language (i.e., a program) can
be determined from its behavior.
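As a small illustration of both points, the two functions below have very different surface structure but the same behavior on the inputs tried. The function names and sample inputs are assumptions made for the example.

```python
def total_loop(numbers):
    """Add the numbers one at a time with an explicit loop."""
    result = 0
    for n in numbers:
        result += n
    return result

def total_builtin(numbers):
    """Add the numbers with the built-in sum function."""
    return sum(numbers)

samples = [[], [1, 2, 3], [-5, 5], [10]]
same_behavior = all(total_loop(s) == total_builtin(s) for s in samples)
print(same_behavior)
```

Agreement on a handful of inputs is evidence of equivalent semantics, not a proof of it.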
Representing Semantics
• Semantic grammars
– Even with different surface structures, can
we develop a standard representation for
the meaning?
• Interlingua
– A common representation for meaning
across languages. This could be useful for
translation.
Pragmatics:
Social Uses of Language
• Pragmatics extends the literal semantics to consider
other ways language is used.
– Referential
• Conveys information about some real phenomenon
• This is what we think about as normal language use
– Expressive
• describes feelings of the speaker
– Conative
• attempts to elicit some behavior from the addressee
– Phatic
• builds a relationship between both parties in a conversation
– Meta-lingual
• self-references
– Poetic
• focuses on the text independent of reference
from R. Jakobson
Discourse
• Sentences form macro-structures or superstructures of meaning. This includes
structured language such as argumentation,
negotiation, news, narrative, and
explanations.
• What are the components (elements) and
structure of discourse? For instance,
messages can be structured to make them
clear for listeners.
• Given-New
Bill (a person you know) went to the store (is in a new location)
• Theme-Rheme
When in Rome (theme), do as the Romans do (rheme)
Argumentation
• Toulmin proposed a general structure for
arguments:
– Grounds
– Claim
– Evidence
– Rebuttal
• There are a lot of complex structured verbal
interactions
– Legal arguments
– Design rationale
– Negotiations
Explanations and Causation
• An explanation consists of
– Two types of phenomena being explained
• Causal antecedents
– How do we explain the American Civil War?
• Sub-processes
– How does a gasoline engine work?
– Background for the person receiving the
explanation needs to be considered.
Stories and Narrative
• (Goals + Events + Resolution) + Characters
• Many stories seem highly structured
– Some stories seem so structured that they have
been described as “story grammars”. This is most
notably true of Russian Fairy Tales
• Many stories also reflect familiar human
quandaries
– “Romeo and Juliet”
• Interactive and dynamic narrative (useful in
games)
– Could we become a player in an interactive
“Romeo and Juliet”?
Conversation
• Conversation adds a social and
interactive component to language
• Conversational norms (Maxims)
• Truthful, informative, relevant, clear
• But these are routinely violated.
• e.g. shaggy dog stories.
• Managing conversations
– Opening / Closing
– Turn taking
How close to Passing the Turing Test?
Chatterbots
IBM “Watson” plays Jeopardy.
Natural Language
Processing (NLP)
We will revisit natural language in a
few weeks when we look at the use
of natural language in information
systems.
Formal Languages
• Programming languages
• High-level languages (e.g., C++) are
built to simplify the use of low-level
machine language
• Debugging tools typically check syntax
but not semantics
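The gap between syntax checking and semantic checking can be seen with Python's standard `ast` module, which parses source code without running it. In this sketch the helper `is_valid_syntax` and the two `mean` snippets are assumptions made for illustration.

```python
import ast

def is_valid_syntax(source):
    """Return True if the source parses as Python, False otherwise."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

# Missing colon after the parameter list: a syntax error.
broken = "def mean(xs) return sum(xs) / len(xs)"

# Parses cleanly, but the divisor is wrong: a semantic error.
wrong = "def mean(xs):\n    return sum(xs) / (len(xs) + 1)\n"

print(is_valid_syntax(broken))
print(is_valid_syntax(wrong))

# Running the second snippet shows the error the syntax check cannot see.
namespace = {}
exec(wrong, namespace)
print(namespace["mean"]([2, 4]))  # 2.0, although the true mean is 3.0
```

The syntax check rejects only the first snippet. The second passes and still computes the wrong answer, which is only discovered by running it and comparing against the intended behavior.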