An Investigation into Computational Recognition of Children’s Jokes

Julia M. Taylor and Lawrence J. Mazlack
Applied Artificial Intelligence Laboratory
University of Cincinnati
Cincinnati, Ohio
tayloj8@email.uc.edu and mazlack@uc.edu

Copyright © 2007, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Overview

The purpose of this paper is to describe an investigation into an ontology-based computational recognition of children’s jokes. While humor has been studied for centuries, computational humor has received very little attention. This may in part be due to the difficulty of the task: at the very least, it requires formal methods for humor generation/recognition and “being able to produce/interpret natural language, being capable of subtle and flexible inferences, and having a vast store of knowledge about the real world” [Ritchie, 2004]. There are some humor generators (see Ritchie [2004] for a review) and a handful of humor recognizers. Yet, “If computers are ever going to communicate naturally and effectively with humans, they must be able to use humor” [Binsted, 2006].

We are interested in recognition, not generation, of humor. Recognition of all verbally expressed humor is an overly broad task. To narrow the task, only jokes for young children are considered. The reduction of the domain size to young children’s jokes is expected to decrease the complexity and sophistication of the language to be analyzed. This in turn decreases the knowledge that needs to be captured for text interpretation. To further narrow the domain, only two-sentence-long jokes that are based on phonological ambiguity will be considered.

The first sentence points to a typical situation, or a script; the second sentence provides a “reasonable” explanation or an answer to the first sentence by utilizing similar sounding words related to the described situation. The second sentence may or may not follow the same situation as the first sentence. If it does not, a different situation or script is found to accommodate the explanation.

According to the Script-based Semantic Theory of Humor (SSTH) [Raskin, 1985], a text is a joke if it is fully or partially compatible with two scripts that overlap and oppose. In the subset of jokes that has been chosen for this project, the first sentence provides the first script. The second sentence usually contains an utterance that is incompatible with the first script. A word (or utterance) that sounds similar to a word in the second sentence, together with the rest of the words of the second sentence, triggers the second script. This second script has an overlap with the first script, yet their effects are different. In this paper, the difference in effects is defined as the script opposition of SSTH.

It should be noted that since the jokes are taken from children’s books, the situations that are described may be atypical for adult literature and normal situations. For example, abstract or nonliving entities act or are described as living, and living things gain extra abilities, such as animals talking. For this reason, the standard scripts are modified to accommodate the children’s world.

Concepts, Relationships, Scripts

Background knowledge and data meaning (semantics) can be provided by ontologies. Ontologies provide the capability to represent objects, concepts and other entities that exist in an area of interest as well as the relationships that hold among them. Since we are only interested in children’s jokes for this project, the knowledge in the ontology is mostly collected from a children’s dictionary and a collection of children’s texts.

The ontology is manually created, and description logics are used for knowledge representation. The concepts are derived from words in a children’s dictionary containing approximately 2500 entries. The concept hierarchy is modeled on WordNet [Fellbaum, 1998]. Each noun in the dictionary is an instance of a concept in the concept hierarchy. Each concept describes common properties of a collection of instances. An instance belongs to each concept to a certain degree (membership value). The degree of membership assigned to the relationship is based on fuzzy logic [Zadeh, 1965].

Semantic relationships between concepts are added from a collection of children’s texts and from definitions in a children’s dictionary. These relationships are represented as roles in description logic. The chosen texts are of the same genre as the jokes to be recognized. In other words, if the jokes are about animals, the texts are about animals as well.

The scripts required to recognize jokes are constructed from ontological concepts and relationships. The scripts are defined as graphs, with the concepts representing vertices and the relationships representing edges. The scripts are divided into condition and effect. Just as each instance (word) has a membership value of a concept, it also has a membership value of a script. A word belongs to a script if it has a high membership value of a concept, and this concept is part of the script. If a parent concept is a part of a script, all child concepts will be considered to be a part of the script as well.

Joke Recognition

In this project, a text is considered a joke iff:
- The text contains two scripts, s1 and s2, that overlap and oppose.
- If (w1, w2) is a pair of similar sounding words, then w1 is an instance of a concept of s1, and w2 is an instance of a concept of s2. In other words, the transition from w1 to w2 triggers the switch from s1 to s2.

If both scripts are not found, or the similar sounding words do not fit the scripts, a text will not be considered a joke.

The success of joke recognition depends on finding the two scripts that optimally fit the text. To do so, all concepts that contain at least one instance (word) of a joke will be considered. The entire set of these concepts will be used for heuristic script selection. The heuristic is based on the salience of concepts in a given script. All selected scripts may have a similar condition but different effects. The selection of scripts is manually verified.

The scripts that are chosen as potential candidates for “joke scripts” are unlikely to contain all concepts that are usually present in the “perfect” scripts. The presence of some concepts is not required for joke recognition. Other “absent” concepts may be triggered by wordplay. To find these concepts, a set difference is computed between the set of all concepts and the set of found concepts of one of the selected scripts, s1. All instances of the concepts in the set difference are compared with the instances of the concepts of each selected script si. Whenever an instance i has a high membership value of a concept in the set difference and a low membership value of a concept in one of the scripts si, i is compared to other words in the text. If there is a word w that sounds similar to i, it is assumed that the pair (w, i) is what the joke is based on. Furthermore, since i is an instance of a concept of s1, and w is an instance of a concept of si, s1 and si are the two scripts that overlap and oppose.

Similar Sounding Words

We are interested in recognition of jokes that are based on similar sounding words or homophones. To recognize such jokes, the pronunciation of all words in each joke is found. The Carnegie Mellon University Pronouncing Dictionary [Weide, 1994] is used to find pronunciations of words in the joke. It is also used to find other words that are instances of concepts in the ontology and have pronunciations similar to words in the joke. The pronunciation similarity is determined using a cost table by Hempelmann [2003].

Initial forms of the words that have highly similar pronunciation are added as instances of concepts with low degree of membership to this concept. This means that the membership function is not only semantic in nature, but also has phonological weight. For example, if i1 is an instance of concept a, and the pronunciation of some word i2 is close to the pronunciation of i1, then i2 can be added as an instance to concept a with medium degree of membership. The degree of membership of i2 is computed based on the degree of membership of i1 and the degree of similarity of pronunciation between i1 and i2.

Epilogue

The project aims at computational recognition of children’s jokes that are based on phonological similarity of words. The texts are considered jokes when two scripts are found that contain most of the concepts in the text, and the scripts overlap in script condition and oppose in effect. The similar sounding words have to belong to two different scripts.

It is expected that, once the project is complete, it can be used to help children and second-language learners master the language. Additionally, it can help in the evaluation of language performance of native speakers as well as second-language learners.

Once the recognizer is able to discern these types of jokes, it should be possible to expand the domain to non-children’s jokes that are similar in form but require more knowledge. With a successful natural language understanding system, it could then be possible to expand the recognizer to other types of humor.

References

Binsted, K., Bergen, B., Coulson, S., Nijholt, A., Stock, O., Strapparava, C., Ritchie, G., Manurung, R., Pain, H., Waller, A., and O'Mara, D. 2006. Computational Humor. IEEE Intelligent Systems (special sub-issue) 21: 59-69.
Fellbaum, C. 1998. WordNet: An Electronic Lexical Database. MIT Press.
Hempelmann, C. 2003. Paronomasic Puns: Target Recoverability Towards Automatic Generation. Doctoral dissertation, Purdue University, Indiana.
Raskin, V. 1985. Semantic Mechanisms of Humor. Dordrecht: Reidel.
Ritchie, G. 2004. The Linguistic Analysis of Jokes. London and New York: Routledge.
Weide, R. 1994. CMU Pronouncing Dictionary. http://www.speech.cs.cmu.edu/cgi-bin/cmudict
Zadeh, L. 1965. Fuzzy Sets. Information and Control 8: 338-353.
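
As an illustration of the membership computation described under Similar Sounding Words, the following is a minimal sketch, not the implementation used in the project. It makes two assumptions that are not in the paper: a plain unit-cost edit distance over phoneme sequences stands in for the Hempelmann [2003] cost table, and the membership of i2 is taken to be the product of the membership of i1 and the pronunciation similarity, which is only one possible way to combine the two quantities. The function names and the hand-entered phoneme lists are hypothetical.

```python
def phoneme_distance(a, b):
    """Levenshtein distance between two phoneme sequences.

    Assumption: unit insertion, deletion and substitution costs are used
    here as a stand-in for the Hempelmann [2003] cost table.
    """
    previous = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        current = [i]
        for j, pb in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if pa == pb else 1)
            current.append(min(previous[j] + 1,      # delete pa
                               current[j - 1] + 1,   # insert pb
                               substitution))        # substitute or match
        previous = current
    return previous[-1]


def pronunciation_similarity(a, b):
    """Similarity in [0, 1]; 1.0 means identical phoneme sequences."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - phoneme_distance(a, b) / longest


def derived_membership(membership_i1, similarity):
    """Membership of i2 in concept a, derived from i1.

    Assumption: the product is an illustrative combination rule; the
    paper does not state the formula.
    """
    return membership_i1 * similarity


# Hand-entered, illustrative phoneme sequences (not looked up at run time).
i1_phonemes = ["B", "EH", "R"]   # hypothetical instance i1 of concept a
i2_phonemes = ["B", "EH", "R"]   # hypothetical similar sounding word i2
sim = pronunciation_similarity(i1_phonemes, i2_phonemes)
print(derived_membership(0.9, sim))
```

With a weighted cost table in place of the unit costs, only `phoneme_distance` would need to change; the other two functions depend on it solely through the returned distance.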
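
The two conditions listed under Joke Recognition can also be sketched in code. This is a simplified illustration under stated assumptions, not the project's recognizer: scripts are reduced to sets of concept names instead of graphs, "overlap" is taken to mean that the two conditions share at least one concept, "oppose" is taken to mean that the effects differ, and a fixed threshold of 0.5 decides what counts as a high membership value. The class, the function names, the threshold and the toy data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Script:
    """A script split into condition and effect (concept names only)."""
    name: str
    condition: frozenset
    effect: frozenset

    def concepts(self):
        return self.condition | self.effect


def overlap_and_oppose(s1, s2):
    # Assumption: overlap = shared condition concept, oppose = different effect.
    return bool(s1.condition & s2.condition) and s1.effect != s2.effect


def is_instance_of_script(word, script, membership, threshold=0.5):
    # membership maps (word, concept) to a fuzzy degree in [0, 1].
    return any(membership.get((word, concept), 0.0) >= threshold
               for concept in script.concepts())


def is_joke(s1, s2, pair, membership, threshold=0.5):
    """Check both conditions for a similar sounding pair (w1, w2)."""
    w1, w2 = pair
    return (overlap_and_oppose(s1, s2)
            and is_instance_of_script(w1, s1, membership, threshold)
            and is_instance_of_script(w2, s2, membership, threshold))


# Toy data invented for illustration; it does not come from the paper.
s1 = Script("meeting an animal", frozenset({"walk_in_woods"}), frozenset({"animal"}))
s2 = Script("being uncovered", frozenset({"walk_in_woods"}), frozenset({"uncovered"}))
membership = {("bear", "animal"): 0.9, ("bare", "uncovered"): 0.8}

print(is_joke(s1, s2, ("bear", "bare"), membership))
print(is_joke(s1, s2, ("bare", "bear"), membership))
```

The second call swaps the pair, so w1 no longer belongs to a concept of s1 and the text is rejected, which mirrors the requirement that the similar sounding words fit the scripts.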
