Contextual Vocabulary Acquisition:
From Algorithm to Curriculum
William J. Rapaport* and Michael W. Kibby**
(*)Department of Computer Science & Engineering,
Department of Philosophy, Department of Linguistics,
and Center for Cognitive Science
(**)Department of Learning & Instruction
and Center for Literacy & Reading Instruction
http://www.cse.buffalo.edu/~rapaport/CVA/
Contextual Vocabulary Acquisition
• Active, conscious acquisition of a meaning for a word,
as it occurs in a text, by reasoning from “context”
• CVA = what you do when:
  – You’re reading
  – You come to an unfamiliar word
  – It’s important for understanding the passage
  – No one’s around to ask
  – The dictionary doesn’t help:
    • No dictionary
    • Too lazy to look it up :-)
    • Word not in dictionary
    • Definition of no use
      – Too hard (& you’d need to do CVA on the definition!)
      – Not relevant to the context
• So, you “figure out” a meaning for the word “from context”
– “figure out” = infer (compute) a hypothesis about
what the word might mean in that text
– “context” = ??
What Does ‘Brachet’ Mean?
(From Malory’s Morte D’Arthur [page # in brackets])
1. There came a white hart running into the hall with a white brachet next to
   him, and thirty couples of black hounds came running after them. [66]
2. As the hart went by the sideboard, the white brachet bit him. [66]
3. The knight arose, took up the brachet and rode away with the brachet. [66]
4. A lady came in and cried aloud to King Arthur,
   “Sire, the brachet is mine”. [66]
10. There was the white brachet which bayed at him fast. [72]
18. The hart lay dead; a brachet was biting on his throat,
    and other hounds came behind. [86]
What Is the “Context” for CVA?
• “context” ≠ textual context
– surrounding words; “co-text” of word
• “context” = wide context =
– “internalized” co-text …
• ≈ reader’s interpretive mental model of textual “co-text”
– … “integrated” with reader’s prior knowledge…
  • “world” knowledge
  • language knowledge
  • previous hypotheses about word’s meaning
  • but not including external sources (dictionary, humans)
– … via belief revision
• infer new beliefs from internalized co-text + prior knowledge
• remove inconsistent beliefs
⇒ “Context” for CVA is in reader’s mind, not in the text
[Figure, built up over several slides: prior-knowledge propositions PK1–PK4 in the reader’s mind; text sentences T1, T2, T3 are internalized as I(T1), I(T2), I(T3) into a belief-revised (“B-R”) integrated KB, where inference yields new propositions P5, P6, P7.]
Note: All “contextual” reasoning is done in this “context”:
the B-R Integrated KB (the reader’s mind)
Meaning of “Meaning”
• “the meaning of a word” vs. “a meaning for a word”
– “the” ⇒ single, correct meaning
– “of ” ⇒ meaning belongs to word
– “a”  ⇒ many possible meanings
  • depending on textual context, reader’s prior knowledge, etc.
– “for” ⇒ reader hypothesizes meaning from “context”, & gives it to word
• “The meaning of things lies not in themselves
but in our attitudes toward them.”
– Antoine de Saint-Exupéry, Wisdom of the Sands (1948)
• “Words don’t have meaning; they’re cues to meaning!”
“Words might be better understood as operators, entities that
operate directly on mental states in what can be formally
understood as a dynamical system.”
– Jeffrey L. Elman, “On Words and Dinosaur Bones: Where Is Meaning?” (2007)
• “We cannot locate meaning in the text…;
[figuring out meaning is an] active, dynamic process…,
existing only in interactive behaviors
of cultural, social, biological, and physical environment-systems.”
– William J. Clancey, “Scientific Antecedents of Situated Cognition”
(forthcoming)
CVA & Vocabulary Instruction
• People do “incidental” (unconscious) CVA
  – Possibly best explanation of how we learn vocabulary
    • Given # of words a high-school grad knows (~45K),
    • & # of years to learn them (~18) ⇒ ~2.5K words/year
    • But only ~10% taught in 12 school years
• Students are taught “deliberate” (conscious) CVA
  in order to improve their vocabulary
  – But not taught well
Why not use a dictionary?
Because:
• People are lazy (!)
• Dictionaries are not always available
• Dictionaries are always incomplete
• Dictionary definitions are not always useful
– ‘chaste’ =df clean, spotless / “new dishes are chaste”
– ‘college’ =df a body of clergy living together and
supported by a foundation
• Most words are learned via incidental CVA,
not via dictionaries
• Most importantly:
– Dictionary definitions are just more contexts!
State of the Art: Computational Linguistics
• Information extraction systems
• Autonomous intelligent agents
• There can be no complete lexicon
• Such systems/agents shouldn’t have to
stop to ask questions
State of the Art: Computational Linguistics
• Granger 1977: “Foul-Up”
  – Based on Schank’s theory of “scripts” (schema theory)
  – Our system not restricted to scripts
• Zernik 1987: self-extending phrasal lexicon
  – Uses a human informant
  – Our system is really “self-extending”
• Hastings 1994: “Camille”
  – Maps unknown word to known concept in ontology
  – Our system can learn new concepts
• Word-Sense Disambiguation:
  – Given an ambiguous word & a list of all its meanings,
    determine the “correct” meaning
    • Multiple-choice test :-)
  – Our system: given a new word, compute its meaning
    • Essay question :-)
State of the Art: Vocabulary Learning (I)
• Elshout-Mohr & van Daalen-Kapteijns 1981, 1987:
– Application of Winston’s AI “arch” learning theory
– (Good) reader’s model of new word = frame
• Attribute slots, default values
• Revision by updating slots & values
– Poor readers update by replacing entire frames
– But: EM & vDK used:
• Made-up words
• Carefully constructed contexts
– Presented in a specific order
Elshout-Mohr & van Daalen-Kapteijns
Experiments with neologisms in 5 artificial contexts
1. When you are used to a view it is depressing when you live in a room
   with kolpers.
   – Superordinate information
2. At home he had to work by artificial light because of those kolpers.
3. During a heat wave, people want kolpers, so sun-blind sales increase.
   – Contexts showing 2 differences from the superordinate
4. I was afraid the room might have kolpers, but plenty of sunlight came
   into it.
5. This house has kolpers all summer until the leaves fall out.
   – Contexts showing 2 counterexamples due to the 2 differences
State of the Art: Psychology
• Johnson-Laird 1987:
– Word understanding ≠ definition
– Definitions aren’t stored
– “During the Renaissance, Bernini cast a
bronze of a mastiff eating truffles.”
State of the Art: Psychology
• Sternberg et al. 1983,1987:
– Cues to look for (= slots for frame):
  • Spatiotemporal cues
  • Value cues
  • Properties
  • Functions
  • Cause/enablement information
  • Class memberships
  • Synonyms/antonyms
– To acquire new words from context:
• Distinguish relevant/irrelevant information
• Selectively combine relevant information
• Compare this information with previous beliefs
Sternberg
• The couple there on the blind date
was not enjoying the festivities in
the least. An acapnotic, he
disliked her smoking; and when he
removed his hat, she, who
preferred “ageless” men, eyed his
increasing phalacrosis and
grimaced.
Overview of CVA Project
1. From Algorithm…
   • Implemented computational theory of how to
     figure out (compute) a meaning for an unfamiliar word
     from “wide context”
2. …to Curriculum
   • Convert the algorithms to an improved, teachable curriculum
1. Computational CVA
• Implemented in SNePS
(Shapiro 1979; Shapiro & Rapaport 1992)
– Intensional, propositional semantic-network
knowledge-representation, reasoning, & acting system
• “intensional”:
– can represent fictional objects
• “propositional”:
– can represent sentences in a text
• “semantic network”:
– labeled, directed graph with nodes linked by arcs
– indexed by node:
» from any node, can describe rest of network
– Serves as model of the reader (“Cassie”)
1. Computational CVA (cont’d)
• KB: SNePS representation of reader’s prior knowledge
• I/P: SNePS representation of word in its co-text
• Processing (“simulates”/“models”/is?! reading):
– Uses logical inference, generalized inheritance, belief revision
to reason about text integrated with reader’s prior knowledge
– N & V definition algorithms deductively search this
“belief-revised, integrated” KB (the wide context)
for slot fillers for definition frame…
• O/P: Definition frame
– slots (features): classes, structure, actions, properties, etc.
– fillers (values): info gleaned from context (= integrated KB)
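The slot-and-filler output described above can be sketched in Python. This is a minimal illustration, not the SNePS implementation; the class and slot names are hypothetical, chosen to mirror the definition frames shown on the following slides.

```python
from dataclasses import dataclass, field

@dataclass
class DefinitionFrame:
    """A toy definition frame: slots are features, fillers are values
    gleaned from the belief-revised, integrated KB (the wide context)."""
    word: str
    class_inclusions: list = field(default_factory=list)
    possible_actions: list = field(default_factory=list)
    possible_properties: list = field(default_factory=list)
    possibly_similar_items: list = field(default_factory=list)

    def merge(self, slot, fillers):
        """Add new fillers to a slot, skipping duplicates."""
        current = getattr(self, slot)
        for f in fillers:
            if f not in current:
                current.append(f)

# After reading the first two brachet sentences:
frame = DefinitionFrame("brachet")
frame.merge("class_inclusions", ["phys obj"])
frame.merge("possible_properties", ["white"])
```

As later passages are read, `merge` is called again with new fillers, so the frame accumulates evidence without duplicating it.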
Cassie learns what “brachet” means:
Background info about: harts, animals, King Arthur, etc.
No info about: brachets
Input: formal-language (SNePS) version of simplified English
A hart runs into King Arthur’s hall.
• In the story, B12 is a hart.
• In the story, B13 is a hall.
• In the story, B13 is King Arthur’s.
• In the story, B12 runs into B13.
A white brachet is next to the hart.
• In the story, B14 is a brachet.
• In the story, B14 has the property “white”.
• Therefore, brachets are physical objects.
(deduced while reading;
PK: Cassie believes that only physical objects have color)
--> (defineNoun "brachet")
Definition of brachet:
Class Inclusions: phys obj,
Possible Properties: white,
Possibly Similar Items:
animal, mammal, deer, horse,
pony, dog,
I.e., a brachet is a physical object that can be white
and that might be like an animal, mammal, deer,
horse, pony, or dog
A hart runs into King Arthur’s hall.
A white brachet is next to the hart.
The brachet bites the hart’s buttock.
[PK: Only animals bite]
--> (defineNoun "brachet")
Definition of brachet:
Class Inclusions: animal,
Possible Actions: bite buttock,
Possible Properties: white,
Possibly Similar Items: mammal, pony,
A hart runs into King Arthur’s hall.
A white brachet is next to the hart.
The brachet bites the hart’s buttock.
The knight picks up the brachet.
The knight carries the brachet.
[PK: Only small things can be picked up/carried]
--> (defineNoun "brachet")
Definition of brachet:
Class Inclusions: animal,
Possible Actions: bite buttock,
Possible Properties: small, white,
Possibly Similar Items: mammal, pony,
A hart runs into King Arthur’s hall.
A white brachet is next to the hart.
The brachet bites the hart’s buttock.
The knight picks up the brachet.
The knight carries the brachet.
The lady says that she wants the brachet.
[PK: Only valuable things are wanted]
--> (defineNoun "brachet")
Definition of brachet:
Class Inclusions: animal,
Possible Actions: bite buttock,
Possible Properties: valuable, small,
white,
Possibly Similar Items: mammal, pony,
A hart runs into King Arthur’s hall.
A white brachet is next to the hart.
The brachet bites the hart’s buttock.
The knight picks up the brachet.
The knight carries the brachet.
The lady says that she wants the brachet.
The brachet bays at Sir Tor.
[PK: Only hunting dogs bay]
--> (defineNoun "brachet")
Definition of brachet:
Class Inclusions: hound, dog,
Possible Actions: bite buttock, bay, hunt,
Possible Properties: valuable, small, white,
I.e. A brachet is a hound (a kind of dog) that can bite, bay, and hunt,
and that may be valuable, small, and white.
General Comments
• Cassie’s behavior ≈ human protocols
• Cassie’s definition ≈ OED’s definition:
  – A brachet is “a kind of hound which hunts by scent”
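The incremental refinement shown in the brachet runs can be re-created in miniature. This is a toy sketch, not Cassie/SNePS: each prior-knowledge rule maps a fact observed about the word to a slot update, and the hypothesized definition sharpens as the passages are read in order. The trigger strings and slot names are assumptions.

```python
PK_RULES = [
    # (fact observed in text, slot to update, filler to add)
    ("has color",  "class",      "phys obj"),   # only physical objects have color
    ("bites",      "class",      "animal"),     # only animals bite
    ("is carried", "properties", "small"),      # only small things can be carried
    ("is wanted",  "properties", "valuable"),   # only valuable things are wanted
    ("bays",       "class",      "hound"),      # only hunting dogs bay
]

def refine(facts):
    """Read facts in text order, firing PK rules to update the definition."""
    definition = {"class": [], "properties": []}
    for fact in facts:
        for trigger, slot, filler in PK_RULES:
            if fact == trigger and filler not in definition[slot]:
                definition[slot].append(filler)
    return definition

d = refine(["has color", "bites", "is carried", "is wanted", "bays"])
# the class hypothesis narrows from "phys obj" toward "hound"
```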
The Algorithms
1. Generate initial hypothesis by “syntactic manipulation”
   • Algebra: Solve an equation for unknown value X
   • Syntax: “Solve” a sentence for unknown word X
     – “A white brachet (X) is next to the hart”
     – ⇒ X (a brachet) is something that is next to the hart and
       that can be white.
   • I.e., “define” node X in terms of immediately connected nodes
2. Deductively search wide context to update hypothesis
   • I.e., “define” word X in terms of some (but not all) other connected nodes
3. Return definition frame.
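Step 1 can be sketched concretely. Assuming the sentence has already been parsed into subject–relation–object triples (a simplification of a SNePS network), the initial hypothesis for X is just the set of nodes immediately connected to X:

```python
# Tiny stand-in for the semantic network built from the text.
triples = [
    ("brachet", "property", "white"),
    ("brachet", "next-to", "hart"),
    ("hart", "ran-into", "hall"),
]

def initial_hypothesis(x, network):
    """'Solve' the network for X: collect X's immediate neighbors."""
    hypothesis = []
    for subj, rel, obj in network:
        if subj == x:
            hypothesis.append((rel, obj))
        elif obj == x:
            hypothesis.append((rel, subj))
    return hypothesis

h = initial_hypothesis("brachet", triples)
# h: X is something that can be white and is next to the hart
```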
Noun Algorithm
• Generate initial hypothesis by syntactic manipulation
• Then find or infer from wide context:
  – Basic-level class memberships (e.g., “dog”, rather than “animal”)
    • else most-specific-level class memberships
    • else names of individuals
  – Properties of Xs (else, of individual Xs) (e.g., size, color, …)
  – Structure of Xs (else …) (part-whole, physical structure…)
  – Acts that Xs perform (else …) or that can be done to/with Xs
  – Agents that do things to/with Xs
  – … or to whom things can be done with Xs
  – … or that own Xs
  – Possible synonyms, antonyms
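The preference ordering for class memberships can be sketched as a fallback chain. This is an illustrative simplification: the set of basic-level categories and the flat list of found classes are assumptions, not the SNePS representation.

```python
# Hypothetical inventory of basic-level categories (e.g., "dog", not "animal").
BASIC_LEVEL = {"dog", "horse", "deer"}

def choose_classes(found_classes, individuals):
    """Prefer basic-level classes; else report whatever classes were found;
    else fall back to names of individuals."""
    basic = [c for c in found_classes if c in BASIC_LEVEL]
    if basic:
        return basic
    if found_classes:
        return found_classes
    return individuals

preferred = choose_classes(["animal", "dog"], [])
```

Here `preferred` keeps only `"dog"`, since a basic-level class is available.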
Verb Algorithm
• Generate initial hypothesis by syntactic manipulation
• Then find or infer from wide context:
– Class membership (e.g., Conceptual Dependency)
  • What kind of act is X-ing? (e.g., walking is a kind of moving)
  • What kinds of acts are X-ings? (e.g., sauntering is a kind of walking)
– Properties/manners of X-ing (e.g., moving by foot, slow walking)
– Transitivity/subcategorization information
  • Return class membership of agent, object, indirect object, instrument
– Possible synonyms, antonyms
– Causes & effects
• [Also: preliminary work on adjective/adverb algorithm]
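The transitivity/subcategorization step can be sketched as follows. This is a toy illustration: the mini-KB of class memberships and the triple format for verb occurrences are assumptions.

```python
# Hypothetical class memberships from the reader's prior knowledge.
ISA = {"King Lot": "person", "King Arthur": "person", "sword": "weapon"}

def subcategorize(occurrences):
    """For each (agent, verb, object) occurrence of the unknown verb,
    record the classes of its agents and objects."""
    return {
        "agent classes": sorted({ISA[a] for a, _, _ in occurrences}),
        "object classes": sorted({ISA[o] for _, _, o in occurrences}),
    }

info = subcategorize([("King Lot", "smite", "King Arthur")])
# smiting, so far, is something a person does to a person
```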
Belief Revision
• To revise definitions of words used inconsistently
  with current meaning hypothesis
• SNeBR (ATMS; Martins & Shapiro 1988; Johnson 2006):
  – If inference leads to a contradiction, then:
    1. SNeBR asks user to remove culprit(s)
    2. & automatically removes consequences inferred from culprit
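The automatic removal of consequences can be sketched with a toy justification structure in the spirit of SNeBR (not its actual implementation): each derived belief records the beliefs it was inferred from, so retracting a culprit also retracts everything that depended on it.

```python
# Hypothetical justifications: derived belief -> set of supporting beliefs.
justifications = {
    "Arthur is dead": {"smite means kill by hitting", "Lot smote Arthur"},
    "Arthur's heirs inherit": {"Arthur is dead"},
}

def retract(culprit, beliefs, justs):
    """Remove the culprit and, transitively, every belief inferred from it."""
    removed = {culprit}
    changed = True
    while changed:
        changed = False
        for belief, support in justs.items():
            if belief in beliefs and support & removed:
                beliefs.discard(belief)
                removed.add(belief)
                changed = True
    beliefs.discard(culprit)
    return beliefs

beliefs = {"smite means kill by hitting", "Lot smote Arthur",
           "Arthur is dead", "Arthur's heirs inherit"}
retract("smite means kill by hitting", beliefs, justifications)
```

Only the text-derived belief survives; the faulty rule and its consequences are gone.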
Revision & Expansion
• Removal & revision being automated via SNePSwD by ranking all propositions
  with kn_cat (ordered from most certain to least certain):
  – intrinsic: info re: language; fundamental background info
    (“before” is transitive)
  – story: info in text
    (“King Lot rode to town”)
  – life: background info w/o variables or inference
    (“dogs are animals”)
  – story-comp: info inferred from text (King Lot is a king, rode on a horse)
  – life-rule.1: everyday commonsense background info
    (BearsLiveYoung(x) ⇒ Mammal(x))
  – life-rule.2: specialized background info
    (x smites y ⇒ x kills y by hitting y)
  – questionable: already-revised life-rule.2; not part of input
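Culprit selection under this entrenchment ordering can be sketched directly: when a contradiction arises, prefer to remove the proposition whose kn_cat is least certain. A minimal sketch, assuming conflicting propositions are tagged with their kn_cat:

```python
# kn_cat categories, ordered from most certain to least certain.
KN_CAT_RANK = ["intrinsic", "story", "life", "story-comp",
               "life-rule.1", "life-rule.2", "questionable"]

def pick_culprit(conflicting):
    """conflicting: list of (proposition, kn_cat) pairs in a contradiction.
    Return the least-entrenched proposition as the culprit to remove."""
    return max(conflicting, key=lambda p: KN_CAT_RANK.index(p[1]))[0]

culprit = pick_culprit([
    ("King Arthur drew Excalibur", "story"),
    ("x smites y -> x kills y by hitting y", "life-rule.2"),
])
# the specialized rule loses to the information in the text
```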
Belief Revision: ‘smite’
• Misunderstood word:
– Initially believe that ‘smite’ means:
“kill by hitting”
• Read “King Lot smote down King Arthur”
– Infer that King Arthur is dead
• Then read: “King Arthur drew his sword Excalibur”
– Contradiction!
– Weaken definition to: “hit and possibly kill”
• Then read more passages in which smiting does not lead to killing
– Hypothesize that ‘smite’ means “hit”
Belief Revision: ‘to dress’
• Well-entrenched word…
– Believe ‘to dress’ means “to put clothes on”
– Commonsense belief:
• Spears don’t wear clothing
• … used in new sense:
– Read “King Claudius dressed his spear”
• Infer that spear wears clothing
• Contradiction!
• Modify definition to:
“to put clothes on OR to do something else”
• Read “King Arthur dressed his troops before battle”
– Infer that ‘dress’ means:
“to put clothes on OR to prepare for battle”
• Eventually: Induce more general definition:
– “to prepare” (for the day, for battle, for eating…)
A Computational Theory of CVA
1. A word does not have a unique meaning.
2. A word does not have a “correct” meaning.
   a) Author’s intended meaning for word doesn’t need to be known by reader
      in order for reader to understand word in context
   b) Even familiar/well-known words can acquire new meanings in new contexts.
   c) Neologisms are usually learned only from context
3. Every co-text can give some clue to a meaning for a word.
   • Generate initial hypothesis via syntactic/algebraic manipulation
4. But co-text must be integrated with reader’s prior knowledge
   a) Large co-text + large PK ⇒ more clues
   b) Lots of occurrences of word allow asymptotic approach to
      stable meaning hypothesis
5. CVA is computable
   a) CVA is “open-ended” hypothesis generation
      • CVA ≠ guess missing word (“cloze”);
        ⇒ CVA ≠ word-sense disambiguation
   b) Some words are easier to compute meanings for than others (N < V < Adj/Adv)
6. CVA can improve general reading comprehension (through active reasoning)
7. CVA can & should be taught in schools
2. From Algorithm to Curriculum
• State of the art in classroom CVA:
  – Mauser 1984: “context” = definition!
  – Clarke & Nation 1980: a “strategy” (algorithm?):
    1. Determine part of speech of word
    2. Look at grammatical context
       • Who does what to whom?
    3. Look at surrounding textual context
       • Search for clues (as we do)
    4. Guess the word; check your guess
CVA: From Algorithm to Curriculum
•
“guess the word”
=
“then a miracle occurs”
• Surely, computer scientists
can “be more explicit”!
• And so should teachers!
From Algorithm to Curriculum (cont’d)
• We have explicit, rule-based (symbolic) AI theory of CVA
 Teachable!
• Goal:
– Not:
teach people to “think like computers”
– But:
explicate computable & teachable methods
to hypothesize word meanings from context
• AI as computational psychology:
– Devise computer programs that faithfully simulate
(human) cognition
– Can tell us something about (human) mind
• Joint work with Michael Kibby (UB Reading Clinic)
– We are teaching a machine to see if what we learn in
  teaching it can help us teach students better
“Contextual Semantic Investigation” (CSI):
A Curriculum Outline
1. Teacher models CSI
2. Teacher models CSI with student participation
3. Students model CSI with teacher assistance
4. Students do CSI in small groups
5. Students do CSI on their own
CSI: The Basic Algorithm
I. Become aware of word X
   & of need to understand X
II. Repeat:
    A. Generate hypothesis H about X’s meaning
    B. Test H
    until H is a plausible meaning for X
    in the current “wide” context
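The generate-and-test loop above can be sketched as code. A minimal illustration, assuming the candidate hypotheses come from the generation steps on the following slides; the `makes_sense` check here is a crude stand-in for the reader’s judgment against the wide context.

```python
def makes_sense(sentence):
    """Stand-in for judging plausibility against the wide context:
    here, just check the sentence against what we already believe."""
    return "dog bayed" in sentence or "hound bayed" in sentence

def csi_loop(sentence, x, candidates):
    """Step II: repeat generate-and-test until some H is plausible."""
    for h in candidates:                            # IIA: generate H
        if makes_sense(sentence.replace(x, h)):     # IIB: test H by substitution
            return h
    return None                                     # no plausible H yet

best = csi_loop("the brachet bayed at him", "brachet", ["table", "dog"])
```

The substitution test (IIB) is literally `sentence.replace(x, h)`: replace every occurrence of X by H and see whether the result still makes sense.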
IIB. Test H
1. Replace all occurrences of X in the sentence by H
2. If Sentence(X := H) makes sense,
   then proceed with reading;
   else generate new H
IIA. Generate H
1. Make an “intuitive” guess H
2. If H fails or you can’t guess, then do in any order:
   a) if you have read X before
      & if you (vaguely) recall its meaning,
      then test that earlier meaning
   b) if you can generate a meaning from X’s morphology,
      then test that meaning
   c) if you can make an “educated guess” (next slide),
      then test it
IIA2c: Make an “Educated Guess”
i.   Re-read X’s sentence slowly & actively
ii.  Determine X’s part of speech
iii. Summarize entire text so far
iv.  Activate your PK about the topic
v.   Make inferences from text + PK
vi.  Generate H based on all this
IIA3: If all previous steps fail,
then do CVA
a) “Solve for X”
b) Search context for clues
c) Create H
IIA3a: Solve for X
i.  Syntactically manipulate X’s sentence
    so that X is the subject
ii. Generate a list of possible synonyms
    (as “hypotheses in waiting”)
IIA3b: Search context for clues
i. If X is a noun,
   then search context for clues about X’s…
   • class membership
   • properties
   • structure
   • acts
   • agents
   • comparisons
   • contrasts
IIA3b: Search context for clues
ii. If X is a verb,
    then search context for clues about X’s…
    • class membership
      – what kind of act Xing is
      – what kinds of acts are Xings
    • properties of Xing (e.g., manner)
    • transitivity
      – look for agents and objects of Xing
    • comparisons & contrasts
IIA3b: Search context for clues
iii. If X is an adjective or adverb,
     then search context for clues about X’s…
     • class membership
       – is it a color adjective, a size adjective, a shape adjective, etc.?
     • contrasts
       – is it an opposite or complement of something else mentioned?
     • parallels
       – is it one of several otherwise similar modifiers in the sentence?
IIA3c: Create H
• Aristotelian definitions:
– What kind of thing is X?
– How does it differ from other things of that kind?
• Schwartz & Raphael definition “map”
– What is X?
– What is it like?
– What are some examples?
• Express (“important” parts of) definition frame
in a single sentence
– Cf. Collins COBUILD
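The final step, expressing the “important” parts of the frame in a single sentence, can be sketched as a small verbalizer. A toy illustration with hypothetical slot names, following the Aristotelian shape (what kind of thing, how it differs) and the brachet definition shown earlier:

```python
def verbalize(word, frame):
    """Render the important slots of a definition frame as one sentence."""
    parts = [f"A {word} is a {frame['class'][0]}"]        # kind of thing
    if frame.get("actions"):
        parts.append("that can " + ", ".join(frame["actions"]))
    if frame.get("properties"):
        parts.append("and that may be " + ", ".join(frame["properties"]))
    return " ".join(parts) + "."

sentence = verbalize("brachet", {
    "class": ["hound"],
    "actions": ["bite", "bay", "hunt"],
    "properties": ["valuable", "small", "white"],
})
```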
Computation & Philosophy
• Computational philosophy =
– Application of computational (i.e., algorithmic) solutions
to philosophical problems
• Philosophical computation =
– Application of philosophy to CS problems
CVA as Philosophical Computation
• Origin of project:
– Rapaport, “How to Make the World Fit Our Language” (1981)
• (Intensional) theory of a word’s meaning for a person
as the set of contexts in which person has heard or seen word.
• Could that notion be made precise?
• Semantic-network theory offered a computational tool
• Developed into Karen Ehrlich’s CS Ph.D. dissertation (1995)
• Later, learned that computational linguists,
reading educators, L2 educators, psychologists,…
were all interested in this
– A really interdisciplinary cognitive-science problem
CVA as Computational Philosophy
1. CVA & holistic semantic theories:
   – Semantic networks:
     • “Meaning” of a node is its location in the entire network
   – Holism:
     • Meaning of a word is its relationships to all other words in the language
   – Problems (Fodor & Lepore):
     • No 2 people ever share a belief
     • No 2 people ever mean the same thing
     • No 1 person ever means the same thing at different times
     • No one can ever change his/her mind
     • Nothing can be contradicted
     • Nothing can be translated
   – CVA offers principled way to restrict “entire network”
     to a useful subnetwork
     • That subnetwork can be shared across people, individuals, languages, …
     • Can also account for language/concept change
       – Via “dynamic”/“incremental” semantics
CVA as Computational Philosophy & Philosophical Computation (cont’d)
2. CVA and the Chinese Room
   – Searle’s CR argument from semantics:
     1. Computer programs are purely syntactic
     2. Cognition is semantic
     3. Syntax alone does not suffice for semantics
     ∴ No purely syntactic computer program can exhibit semantic cognition
   – How would Searle-in-the-Room figure out the meaning of an
     unknown squiggle?
     • By CVA techniques!
“Syntactic Semantics” (Rapaport 1985ff)
• Syntax does suffice for the kind of semantics needed for NLU in the CR
  – All input—linguistic, perceptual, etc.—is encoded in a single network
    (or: in a single, real neural network: the brain!)
  – Relations—including semantic ones—among nodes of such a network
    are manipulated syntactically
    » Hence computationally (CVA helps make this precise)
CVA as Cognitive Science
• AI
  – knowledge representation
  – reasoning
  – natural-language understanding
  – acting(?)
• Philosophy
• Linguistics
• Psychology
• Reading
• Education