An Investigation into Computational Recognition of Children’s Jokes

Julia M. Taylor and Lawrence J. Mazlack
Applied Artificial Intelligence Laboratory
University of Cincinnati
Cincinnati, Ohio
tayloj8@email.uc.edu and mazlack@uc.edu
Overview
The purpose of this paper is to describe an investigation
into an ontology-based computational recognition of children’s jokes. While humor has been studied for centuries,
computational humor has received very little attention.
This may in part be due to the difficulty of the task: at the
very least, it requires formal methods for humor generation/recognition and “being able to produce/interpret natural language, being capable of subtle and flexible inferences, and having a vast store of knowledge about the real
world.” [Ritchie, 2004] There are some humor generators
(see Ritchie [2004] for a review) and a handful of humor
recognizers. Yet, “If computers are ever going to communicate naturally and effectively with humans, they must be
able to use humor.” [Binsted, 2006]
We are interested in recognition, not generation of humor. Recognition of all verbally expressed humor is an
overly broad task. To narrow the task, only jokes for young
children are considered. The reduction of the domain size
to young children’s jokes is expected to decrease the complexity and sophistication of the language to be analyzed.
This in turn decreases the knowledge that needs to be captured for text interpretation.
To further narrow the domain, only two-sentence-long
jokes that are based on phonological ambiguity will be
considered. The first sentence points to a typical situation,
or a script; the second sentence provides a “reasonable”
explanation or an answer to the first sentence by utilizing
similar sounding words related to the described situation.
The second sentence may or may not follow the same
situation as the first sentence. If it does not, a different
situation or script is found to accommodate the
explanation.
According to Script-based Semantic Theory of Humor
(SSTH) [Raskin, 1985], a text is a joke if it is fully or partially compatible with two scripts that overlap and oppose.
In the subset of jokes that has been chosen for this project,
the first sentence provides the first script. The second
sentence usually contains an utterance that is incompatible
with the first script. A word (or utterance) that sounds
similar to a word in the second sentence, together with the
rest of the words of the second sentence, triggers the second script. This second script has an overlap with the first
script, yet their effects are different. In this paper, the difference in effects is defined as script opposition of SSTH.
It should be noted that since the jokes are taken from
children’s books, the situations that are described may be
atypical for adult literature and normal situations. For example, abstract or nonliving entities act or are described as
living. And living things gain extra benefits such as animals talking. For this reason, the standard scripts are
modified to accommodate the children’s world.
Concepts, Relationships, Scripts
Background knowledge and data meaning (semantics) can
be provided by ontologies. Ontologies provide the capability to represent objects, concepts and other entities that
exist in an area of interest as well as the relationships that
hold among them. Since we are only interested in children’s jokes for this project, the knowledge in the ontology
is mostly collected from a children’s dictionary and a collection of children’s texts. The ontology is created manually, and description logics are used for knowledge representation.
The concepts are derived from the words in a children’s dictionary containing approximately 2500 entries. The concept
hierarchy is modeled on WordNet [Fellbaum, 1998].
Each noun in the dictionary is an instance of a concept in
the concept hierarchy. Each concept describes common
properties of a collection of instances. An instance belongs
to each concept to a certain degree (membership value).
The degree of membership assigned to the relationship is
based on fuzzy logic [Zadeh, 1965].
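The graded membership described above can be pictured with a small lookup structure. The following is a minimal sketch, not the project's implementation: the class name FuzzyOntology, its methods, the concept names, and all degrees are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class FuzzyOntology:
    """Toy store of instance-to-concept membership degrees in [0, 1]."""

    def __init__(self):
        # concept name -> {instance word: degree of membership}
        self._degree = defaultdict(dict)

    def add(self, instance, concept, degree):
        if not 0.0 <= degree <= 1.0:
            raise ValueError("membership degree must lie in [0, 1]")
        self._degree[concept][instance] = degree

    def membership(self, instance, concept):
        # Unknown pairs are treated as non-members (degree 0.0).
        return self._degree.get(concept, {}).get(instance, 0.0)

    def instances(self, concept, at_least=0.0):
        # Instances whose degree reaches the given threshold (an alpha-cut).
        members = self._degree.get(concept, {})
        return sorted(i for i, d in members.items() if d >= at_least)


# Hypothetical entries; the degrees are invented for this example.
onto = FuzzyOntology()
onto.add("dog", "ANIMAL", 1.0)
onto.add("dog", "PET", 0.9)
onto.add("rock", "PET", 0.1)
```

With these entries, onto.instances("PET", at_least=0.5) keeps only the strongly attached instance, which is the kind of "high membership" test used later when words are matched to scripts.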
Semantic relationships between concepts are added from
a collection of children’s texts and definitions in a children’s dictionary as well. These relationships are represented as roles in description logic. The chosen texts are of
the same genre as the jokes to be recognized. In other
words, if the jokes are about animals, the texts are about
animals as well.
Copyright © 2007, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
The scripts required to recognize jokes are constructed
from ontological concepts and relationships. The scripts
are defined as graphs, with the concepts representing vertices, and relationships representing edges. The scripts are
divided into condition and effect. Just as each instance
(word) has a membership value in a concept, it also has a
membership value in a script. A word belongs to a script if
it has a high membership value in a concept, and this concept is part of the script. If a parent concept is part of a
script, all of its child concepts are considered to be part of the
script as well.
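The script definition above can be sketched as a small graph structure. This is an illustration under stated assumptions, not the authors' representation: the Script class, the helper names, the toy "feeding" script, and the choice of the maximum concept membership as the script membership are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Script:
    """A script as a graph: edges are (concept, relationship, concept) triples."""
    name: str
    condition: set = field(default_factory=set)
    effect: set = field(default_factory=set)

    def concepts(self):
        # Vertices are all concepts that appear at either end of an edge.
        return {c for (a, _, b) in self.condition | self.effect for c in (a, b)}


def expand_with_children(concepts, parent_of):
    """Add every concept that has an ancestor among the script's concepts.

    parent_of maps child concept -> parent concept and is assumed to be a tree.
    """
    expanded = set(concepts)
    for child in parent_of:
        node = child
        while node in parent_of:
            node = parent_of[node]
            if node in concepts:
                expanded.add(child)
                break
    return expanded


def script_membership(word, script, membership, parent_of):
    # Assumption: a word's degree in a script is its best degree in any
    # concept of the script, child concepts included.
    concepts = expand_with_children(script.concepts(), parent_of)
    return max((membership.get((word, c), 0.0) for c in concepts), default=0.0)


# Hypothetical toy data.
parent_of = {"DOG": "ANIMAL", "PUPPY": "DOG"}
feeding = Script(
    name="feeding",
    condition={("ANIMAL", "is", "HUNGRY")},
    effect={("ANIMAL", "eats", "FOOD")},
)
membership = {("puppy", "PUPPY"): 0.9}
```

Here "puppy" reaches the feeding script only through the hierarchy (PUPPY, DOG, ANIMAL), which mirrors the rule that child concepts of a script's concept count as part of the script.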
To find concepts that are absent from a selected script but may be triggered by wordplay (see Joke Recognition), a set difference is computed between the set of all concepts of
one of the selected scripts, s1, and the set of its found concepts. All instances of the concepts
in the set difference are compared with the instances of the
concepts of each selected script si. Whenever an instance i
has a high membership value in a concept in the set difference and a low membership value in a concept in one of
the scripts si, i is compared to the other words in the text. If
there is a word w that sounds similar to i, it is assumed that
the pair (w, i) is what the joke is based on. Furthermore,
since i is an instance of a concept of s1, and w is an instance
of a concept of si, s1 and si are the two scripts that overlap
and oppose.
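The search described above can be sketched as follows. This is a minimal illustration, assuming numeric thresholds for "high" and "low" membership (the paper gives none) and a caller-supplied similarity predicate. The function name, the concept names, the membership degrees, and the example joke are hypothetical.

```python
HIGH = 0.7  # assumed threshold for a "high" membership value
LOW = 0.3   # assumed threshold for a "low" membership value


def find_wordplay_pair(s1_all, s1_found, si_all, membership, text_words, sounds_similar):
    """Return (w, i): w occurs in the text, i is an instance of an absent concept of s1."""
    absent = s1_all - s1_found  # the set difference
    for (inst, concept), degree in sorted(membership.items()):
        if concept not in absent or degree < HIGH:
            continue
        # i must also be attached, but only weakly, to some concept of si.
        weak_in_si = any(0.0 < membership.get((inst, c), 0.0) <= LOW for c in si_all)
        if not weak_in_si:
            continue
        for w in text_words:
            if w != inst and sounds_similar(w, inst):
                return (w, inst)
    return None


# Toy data for "Why is six afraid of seven? Because seven ate nine."
counting_all = {"SEVEN", "EIGHT", "NINE"}   # concepts of script s1
counting_found = {"SEVEN", "NINE"}          # concepts of s1 found in the text
eating_all = {"EAT", "FOOD"}                # concepts of script si
membership = {
    ("eight", "EIGHT"): 1.0,
    ("eight", "EAT"): 0.2,   # low degree, added for phonological reasons
    ("ate", "EAT"): 0.9,
}
text_words = "why is six afraid of seven because seven ate nine".split()
SIMILAR = {frozenset(("ate", "eight"))}


def sounds_similar(a, b):
    return frozenset((a, b)) in SIMILAR


pair = find_wordplay_pair(counting_all, counting_found, eating_all,
                          membership, text_words, sounds_similar)
```

In this toy run the absent concept EIGHT supplies the instance "eight", which sounds like the text word "ate", so the counting and eating scripts are taken as the pair that overlaps and opposes.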
Similar Sounding Words
We are interested in recognition of jokes that are based on
similar sounding words or homophones. To recognize
such jokes, the pronunciation of all words in each joke is
found.
The Carnegie Mellon University (CMU) Pronouncing Dictionary
[Weide, 1994] is used to find the pronunciations of words in
the joke. It is also used to find other words that are instances of concepts in the ontology and have pronunciations similar to those of words in the joke. Pronunciation similarity is determined using a cost table by Hempelmann
[2003].
Initial forms of the words that have highly similar
pronunciation are added as instances of concepts with a low
degree of membership in that concept. This means that the
membership function is not only semantic in nature, but
also has phonological weight. For example, if i1 is an instance of concept a, and the pronunciation of some word i2 is
close to the pronunciation of i1, then i2 can be added as an instance of concept a with a medium degree of membership.
The degree of membership of i2 is computed based on the
degree of membership of i1 and the degree of similarity of pronunciation between i1 and i2.
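One way to picture this computation is a weighted edit distance over phoneme sequences, followed by a combination rule. The sketch below rests on several assumptions: the substitution costs are invented and are not Hempelmann's table, the phoneme sequences are written by hand in ARPAbet style with stress omitted rather than read from the CMU dictionary, and the damped product used for the new membership degree is only one possible rule, since the paper gives no formula.

```python
def phonetic_cost(a, b, sub_cost, indel=1.0):
    """Weighted edit distance between two phoneme sequences."""
    prev = [j * indel for j in range(len(b) + 1)]
    for i, pa in enumerate(a, start=1):
        cur = [i * indel]
        for j, pb in enumerate(b, start=1):
            # Identical phonemes cost nothing; unlisted pairs cost 1.0.
            sub = 0.0 if pa == pb else sub_cost.get(frozenset((pa, pb)), 1.0)
            cur.append(min(prev[j] + indel,       # delete pa
                           cur[j - 1] + indel,    # insert pb
                           prev[j - 1] + sub))    # substitute pa with pb
        prev = cur
    return prev[-1]


def similarity(a, b, sub_cost):
    """Map the cost to [0, 1]; 1.0 means identical pronunciation."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return max(0.0, 1.0 - phonetic_cost(a, b, sub_cost) / longest)


def derived_membership(mu_i1, sim, damping=0.5):
    # Assumed rule: scale i1's degree by the similarity and a damping factor,
    # so the phonologically added instance i2 stays below i1.
    return damping * mu_i1 * sim


# Hypothetical cost table and hand-written pronunciations.
TOY_COSTS = {frozenset(("B", "P")): 0.2}   # voiced/voiceless pair, cheap to swap
bear = ["B", "EH", "R"]
pear = ["P", "EH", "R"]

sim = similarity(bear, pear, TOY_COSTS)
mu_pear = derived_membership(0.9, sim)
```

Under these assumptions a single cheap substitution keeps the similarity high, and the damping factor keeps the derived degree of the added instance well below that of the original instance.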
Joke Recognition
In this project, a text is considered a joke iff:
• The text contains two scripts, s1 and s2, that overlap and
oppose.
• If (w1, w2) is a pair of similar sounding words, then w1 is
an instance of a concept of s1, and w2 is an instance of a
concept of s2. In other words, the transition from w1 to
w2 triggers the switch from s1 to s2.
If the two scripts are not found, or the similar sounding words
do not fit the scripts, the text is not considered a joke.
The success of joke recognition depends on finding the
two scripts that optimally fit the text. To do so, all concepts that contain at least one instance (word) of a joke will
be considered. The entire set of these concepts will be
used for heuristic script selection. The heuristic is based
on the salience of concepts in a given script. All selected
scripts may have similar conditions but different effects. The selection of scripts is manually verified.
The scripts that are chosen as potential candidates for
“joke scripts” are unlikely to contain all concepts that are
usually present in the “perfect” scripts. The presence of
some concepts is not required for joke recognition. Other
“absent” concepts may be triggered by wordplay.
Epilogue
The project aims at computational recognition of children’s
jokes that are based on phonological similarity of words.
A text is considered a joke when two scripts are found
that contain most of the concepts in the text, and the scripts
overlap in script condition and oppose in effect. The
similar sounding words have to belong to two different
scripts.
It is expected that, once the project is complete, it can be
used to help children and second-language learners master
the language. Additionally, it can help in the evaluation of
the language performance of native speakers as well as second-language learners.
Once the recognizer is able to discern these types of
jokes, it should be possible to expand the domain to non-children’s jokes that are similar in form but require more
knowledge. With a successful natural language understanding system, it could then be possible to expand the
recognizer to other types of humor.
References
Binsted, K., Bergen, B., Coulson, S., Nijholt, A., Stock, O.,
Strapparava, C., Ritchie, G., Manurung, R., Pain, H.,
Waller, A., and O'Mara, D. 2006. Computational Humor.
IEEE Intelligent Systems (special sub-issue) 21: 59-69.
Fellbaum, C. 1998. WordNet: An Electronic Lexical Database. MIT Press.
Hempelmann, C. 2003. Paronomasic Puns: Target Recoverability Towards Automatic Generation. Doctoral dissertation, Purdue University, Indiana.
Raskin, V. 1985. Semantic Mechanisms of Humor.
Dordrecht: Reidel.
Ritchie, G. 2004. The Linguistic Analysis of Jokes.
London and New York: Routledge.
Weide, R. 1994. CMU Pronouncing Dictionary.
http://www.speech.cs.cmu.edu/cgi-bin/cmudict
Zadeh, L. 1965. Fuzzy Sets. Information and Control.
8:338-353.