729G11 Artificiell intelligens II
IDA - Institutionen för datavetenskap
Gustav Nygren, Gusny960
2012-09-16

That's what she said: Double entendre identification

Summary

This paper addresses automatic humor identification by describing and evaluating one approach to a certain subproblem called the that's-what-she-said problem (TWSS). The approach, called DEviaNT, focuses on euphemisms as metaphors for sexually explicit nouns and on the basic structure of TWSS jokes. How and why euphemisms are used in DEviaNT to detect metaphors in natural language is described alongside the rest of the agent's properties. I go into detail about the training of DEviaNT, compare it to other agents, and end with a conclusion and discussion. The benchmark agents used only n-gram features derived entirely from corpora. DEviaNT was found to be 12% better than any other agent at precision-recall of possible TWSS jokes in non-sexually explicit texts.

Contents

Summary
Background
Introduction
Overview of the approach
The DEviaNT approach
    Word class analysis
        Noun sexiness
        Adjective sexiness
        Verb sexiness
    Features
    Learning algorithm
Data sets
Performance and Results
Case study
    Exercises
Conclusion
Discussion
Final words
Works Cited

Background

Natural language is used as a way of representing cognitive phenomena in a shared way. The understanding of natural language is a considerable part of cognitive science and one of the toughest challenges in the field of artificial intelligence. Besides the fact that many words have different meanings depending on context, there also exist what we call figures of speech, or metaphors. These describe something other than what they literally represent, and they have a way of making the language more colorful. Did you see that I just made a metaphor right there? A language can't have colors, yet you understand my point. A metaphor is a word or a phrase that in a specific context does not have the meaning of its lexical definition, like the language-and-color example. This is done by using the terminology of one domain (colors and artistry) to describe something in a different domain (understanding and nuancing natural language). I have studied a paper where the authors have constructed an agent for recognizing a certain type of metaphor, namely the that's-what-she-said joke (TWSS). A TWSS joke is made when someone says a sentence that can be interpreted in a sexual way and someone else answers: "That's what she said!" It is often called a double entendre, from the French verb entendre, meaning to hear, or sometimes adianoeta, from Greek. It is important that the sentence which forms the base of the TWSS joke is not by default a sexually explicit sentence or a sentence which is purposely erotic. Otherwise, saying TWSS would just point out the obvious, and you would make a fool out of yourself.
Example: While reviewing some notes I said without thinking about it, "It's not that it's hard, it just all came in at once." (twssstories.com)

From what we learned above about metaphors, the original sentence is not sexual in its literal interpretation, but it has an analogical mapping to a more sexual domain. This means that the words could be interpreted as sexual when taking their non-literal meanings. The short explanation: the recently uttered words could have been said by a woman in a sexual context, and the joke is made by pointing that out.

Introduction

This article is a review of the article That's what she said: Double entendre identification, written by Kiddon and Brun (2011). Kiddon and Brun have designed a TWSS recognition agent to detect places in normal text where inserting the TWSS would complete the joke. This is a subproblem of automatic humor understanding, which in turn is part of natural language understanding. Understanding natural language is interesting since it requires profound semantic and cultural understanding. One could of course question why it is important for science to research this; I personally find it hilarious that some people put large amounts of knowledge and computing into researching juvenile behavior. I just had to go deep into understanding all of its perks, its advantages and its contribution to the total field of artificial intelligence.

Overview of the approach

Kiddon and Brun (2011) have designed a TWSS recognition agent that relies heavily on two basic patterns of speech. The first distinguishing characteristic is the use of nouns that are euphemisms for sexually explicit nouns. A euphemism is by definition: a mild or indirect word or expression for one too harsh or blunt when referring to something unpleasant or embarrassing (www.google.com). An example could be disciplined for beaten up, or passed away for died.
The second distinguishing characteristic of the TWSS joke is the structure of short sentences common in the erotic domain. For example: [subject] put [object] in [object] or I could eat [object] all day (Kiddon & Brun, 2011). Kiddon and Brun (2011) chose to call their agent based on these features DEviaNT, for Double Entendre via Noun Transfer, and their study shows that DEviaNT has 12% better precision at recall than any other TWSS classifier. Kiddon and Brun (2011) argue that this is a new approach to TWSS detection. Earlier agents have been trained to identify metaphors by the unlikeliness of the words being used: if a very uncommon word is used in a certain context, it is more likely to be a metaphor (Shutova, 2010).

The DEviaNT approach

To design DEviaNT, Kiddon and Brun used one corpus for sexual content and one for non-sexual content. The corpus with sexual content was an unparsed erotica corpus of 1.5M sentences from www.txtfiles.com/sex/Erotica. They parsed this corpus with the Stanford parser and tidied up the tags a bit to make them more generic. By tidying up I mean that they lumped groups of tags together to make the computation and programming of DEviaNT easier. The pre-parsed Brown standard corpus of 57K sentences was used for the non-sexual content, and it was also made a bit more generic by letting some tags include all their possible sub-tags. For example, the tags NN, NN$, NN+BEZ etc. all get the tag NN; see Table 1. How to draw the line between sexual and non-sexual content was never discussed; each sentence used as one or the other was simply assumed not to belong to both sets. The authors Kiddon and Brun (2011) derived a set of 76 sexually explicit nouns, SN, divided into 9 different categories, from frequent use in texts with sexual contexts.
61 of these made up a set of nouns which were likely targets for euphemisms, SN⁺. Also important for DEviaNT is a set of approximately 98 body-part nouns. To determine whether a non-sexually explicit noun could be used as a euphemism for a sexually explicit noun, they focused on adjectives frequently used to modify sexually explicit nouns. The non-sexually explicit nouns that were also frequently modified by these adjectives were set to be possible euphemisms.

Table 1 – word class tags used by DEviaNT

Tag   Explanation               Example
NNP   Proper nouns              London, Sarah, Microsoft
CD    Numbers                   3, 6, 0
SN    Sexually explicit nouns   Penis
BP    Body parts                Leg, hand, lip
NN    Remaining nouns           Door, dog, ditch

Word class analysis

Noun sexiness

The adjective count of each noun n is a real-valued vector over all the adjectives that could possibly modify n in a proper sentence, counted over both the Brown corpus and the erotic corpus. To determine the erotic undertones of a noun, its noun sexiness NS(n), the similarity of a noun n ∉ SN⁺ is calculated to each of the nouns in SN⁺, and NS(n) is the maximum of these similarities. The similarity is found by calculating the cosine similarity between the nouns' adjective-count vectors. This is done by taking the Euclidean dot product between the vectors and then calculating the cosine of the angle between them. Geometrically, this measures to what extent the two vectors point in the same direction.

cos θ = (A · B) / (‖A‖ ‖B‖) = Σᵢ AᵢBᵢ / (√(Σᵢ Aᵢ²) √(Σᵢ Bᵢ²))

This formula could theoretically give a value between -1 and 1, but in terms of information retrieval the value will never be negative; it will range from 0 to 1. The reason is that the often-used tf-idf weight function only produces non-negative figures.
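As a toy illustration, the cosine-similarity computation above can be sketched as follows; the adjective vocabulary and the count vectors are hypothetical stand-ins, not values from the paper:

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (A . B) / (||A|| * ||B||) for two count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a noun with no adjective counts has no direction
    return dot / (norm_a * norm_b)

# Hypothetical counts over the adjectives [hot, wet, wooden]:
candidate = [4, 2, 1]   # a noun n not in SN+, e.g. "rod"
sexy_noun = [5, 3, 0]   # some noun in SN+
print(cosine_similarity(candidate, sexy_noun))
```

NS(n) would then be the maximum of such similarities over all nouns in SN⁺.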
The tf-idf weight function gives less frequent nouns more relative importance by weighting them with larger constants than more frequent ones (www.tfidf.com). The tf-idf weight is calculated with respect to the frequency of n, the size of the document and the total number of documents. According to Wikipedia there are different versions of this formula; how and why they differ will not be discussed in this paper, since the underlying idea is the same. When evaluating the noun sexiness of each noun in the training data, they found a large variance between the nouns. To make a distinction, they set a cut-off value to use as a threshold for uncommon nouns: the focal value NS(n) = 10⁻⁷ was assigned to nouns not common or not sexy enough to make the cut. Examples of very sexy non-sexually explicit nouns were meat and rod.

Adjective sexiness

Adjective sexiness, AS(a), defines the probability of the adjective a modifying a sexy noun n ∈ SN⁺. AS(a) is determined by calculating the relative frequency of a in sentences s in the erotic corpus which also contain at least one sexy noun n ∈ SN⁺. Adjectives with relatively high AS(a) are for example hot and wet. Whether this is an accurate way of pinpointing the sexiness of an adjective could be argued, but that is the way Kiddon and Brun defined it.

Verb sexiness

Let SE = the set of sentences in the erotic corpus containing a noun n ∈ SN, and SB = the set of all sentences in the Brown corpus. A sentence s contains the verb v inside a verb phrase that is bounded by a noun or one of the pronouns {I, you, it, me}; if no such boundary exists, the phrase is represented by the verb v itself. Verb sexiness, VS(v), is defined as the approximated probability that the phrase around v exists in a sentence s which in turn is part of an erotic or non-erotic context, calculated with respect to SE and SB respectively.
P(v ∈ s | s ∈ SE)  and  P(v ∈ s | s ∈ SB)

The uneven sizes of the corpora make it important to normalize the probability of s existing in each of the corpora, giving the following equality:

P(s ∈ SE) = P(s ∈ SB)

VS(v) is the probability that the described action is, metaphorically, an action in an erotic context. By Bayes' theorem:

VS(v) = P(s ∈ SE | v ∈ s) = P(v ∈ s | s ∈ SE) P(s ∈ SE) / P(v ∈ s)

Features

To not bore my readers too much I won't go too deep into the feature details of DEviaNT, but the detection of euphemisms is based on whether the sentence s contains a sexy noun SN, a body part BP, very unsexy nouns, and on the average sexiness of all the other nouns NN that could possibly be euphemisms. To detect the structure of TWSS jokes, DEviaNT uses a two-part approach. The first part is based on whether the sentence s contains verbs or verb phrases not found in the erotic training data, the average sexiness of the verbs and adjectives in the sentence, and the existence of any unfitting adjectives, the ones that got the focal value of 10⁻⁷. The importance of the second part of the structure detection module is intuitively tricky to understand, but readers will later find out that its efficiency is way beyond imagination. This second part focuses on punctuation, counts of word classes and the existence of a noun or a pronoun that could be the subject of the sentence. The approximation is that the first noun of the sentence will be the subject. This second part is what they call the basic-structure version of DEviaNT, which was, as you will later see, also benchmarked against other top n-gram models.

Learning algorithm

The DEviaNT approach uses an SVM (Support Vector Machine) classifier from the WEKA machine learning package as its vessel for real-world testing.
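Returning to the verb-sexiness estimate defined above, the Bayes step can be sketched numerically; the conditional frequencies here are made up for illustration, not taken from the corpora:

```python
def verb_sexiness(p_v_given_se, p_v_given_sb):
    """VS(v) = P(s in SE | v in s), under the corpus-size normalization
    that sets equal priors P(s in SE) = P(s in SB) = 1/2."""
    prior = 0.5
    # P(v in s) by the law of total probability over the two corpora:
    p_v = p_v_given_se * prior + p_v_given_sb * prior
    if p_v == 0.0:
        return 0.0  # verb phrase never observed in either corpus
    return p_v_given_se * prior / p_v

# Hypothetical verb phrase seen in 3% of erotic sentences and 1% of Brown sentences:
print(verb_sexiness(0.03, 0.01))  # 0.75: the phrase leans strongly erotic
```

With equal priors the expression reduces to P(v|SE) / (P(v|SE) + P(v|SB)), which is why only the two relative frequencies are needed.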
SVMs are programs made to make binary classifications of n-dimensional vectors, and the WEKA package is an easy over-the-counter turn-key application which also makes it possible to do precision-recall analysis. The precision-recall analysis is important because of the nature of the TWSS joke. Since few people will notice when you miss an opportunity to make a TWSS joke, the cost of a false negative is low. A false positive, on the other hand, is considered tacky and stupid. A false positive is when you try to make the joke by saying that's what she said! at the wrong instant, when the domain transfer is not possible. DEviaNT uses a metaclassifier called MetaCost to make the trade-off of tolerating more false negatives in order to boost precision, by making false positives one hundred times more costly than false negatives. The MetaCost algorithm is wrapped around the key classifier to make the metaclassification. This means that it is easily removed, and it is compatible with all classifiers, since it is not affected by the classifier itself, only by its results. The MetaCost metaclassifier uses bagging (bootstrap aggregating) to make the classifier cost sensitive. Bootstrapping is itself a metaphor. It is basically a statistical method made to fine-tune the results of a study without making more observations. The general idea of bootstrapping is that by generating new sets of data points in a vector space from each of the originally observed data points, with respect to the variance of their distribution, you can get a richer, and still valid, set of results. A richer set of results makes more actions, like MetaCost classification, possible. It is like synthesizing eggs to make new hens, or like rescuing yourself from a swamp by pulling yourself up by your own bootstraps.
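A minimal sketch of the bootstrap-resampling idea that bagging builds on, using a tiny made-up sample; this illustrates the general statistical trick, not MetaCost's internals:

```python
import random

def bootstrap_means(data, n_resamples=1000, seed=0):
    """Draw resamples with replacement and collect the mean of each one."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        resample = [rng.choice(data) for _ in data]  # same size, with replacement
        means.append(sum(resample) / len(resample))
    return means

observations = [2.1, 2.4, 1.9, 2.8, 2.2]
means = bootstrap_means(observations)
# The spread of these resampled means estimates the uncertainty of the
# original mean without collecting any new observations.
print(min(means), max(means))
```

Roughly speaking, MetaCost trains the underlying classifier on resamples like these to estimate class probabilities, then relabels the training data according to the 100:1 cost matrix before retraining.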
SVMs are able to classify data points of any dimension by using a kernel trick. A kernel trick is a mathematical formula for evaluating datasets in higher dimensions. With the kernel trick it is usually possible to find a linear boundary of demarcation between positive and negative data points in a dimension one or more steps higher than the original one.

Data sets

When training DEviaNT to identify TWSS jokes, Kiddon and Brun made sure to make the positive and the negative samples very similar to each other, making the distinction between them hard for DEviaNT. The negative training examples, 50% of the set, were taken from the internet and included funny sentences, risqué sentences and famous historical quotes. None of these were evaluated as negative by humans; they were assumed negative, since classifying what an erotic context is and is not was not part of this study. The testing was made with about 20,000 parsed sentences, of which 99% were negative examples.

Performance

DEviaNT uses the above-mentioned characteristics of euphemisms and the structural similarities between the two domains, sexual and non-sexual context, to identify possible TWSS jokes. There are other agents made to identify metaphors, most of them using an n-gram model derived strictly from the training data. Kiddon and Brun (2011) argue that DEviaNT can compete "where it matters the most", meaning high precision at recall. Exactly where the trade-off between false positives and false negatives is optimal is a discussion too subjective for this paper to take a stand on. DEviaNT was compared to seven other classifiers, six of them being SVM models and Naïve Bayes models and one being a stripped version of DEviaNT. The stripped version of DEviaNT used only the last part of the features, the basic structure of the TWSS jokes. The benchmark classifiers had features derived automatically and entirely from corpora. They include thousands of features, none of which were tinkered with by human hands.
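For contrast, the kind of unigram and bigram features the benchmark classifiers were built on can be sketched as below; the example sentence and the bag-of-n-grams representation are illustrative, not the exact feature set used in the study:

```python
def ngram_features(sentence, n_values=(1, 2)):
    """Count unigrams and bigrams in a sentence, the raw corpus-derived
    features typical of the benchmark models."""
    tokens = sentence.lower().split()
    features = {}
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            features[gram] = features.get(gram, 0) + 1
    return features

print(ngram_features("it just all came in at once"))
```

Every distinct unigram and bigram in the corpus becomes one dimension of the feature vector, which is how such models end up with thousands of features without any human tuning.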
The study tested classifiers trained on unigram and bigram features, with or without the MetaCost add-on. There is nothing startling about DEviaNT being superior in their own study, but the stripped version of DEviaNT, the basic structure, was surprisingly enough second best at precision-recall. As noted earlier this makes little intuitive sense to me; it is satisfying to know that there is always more to understand. They tested adding unigram features to DEviaNT to see if this improved performance, but it did not. This is probably because of the large number of false positives that the unigram-featured models return. As for the Naïve Bayes model, the naïve part of it might have something to do with the low precision-recall.

Figure 1 – Precision-recall curves for DEviaNT and baseline competitors. Kiddon & Brun 2011

Results

DEviaNT returned 28 TWSSs, of which about 72% were true positives. The best competitor returned 130 TWSSs with almost 60% precision. This makes DEviaNT 12% better than the second-best competitor with regard to precision-recall. Precision matters more than the total number of recalls because of the nature of the TWSS joke, as discussed above. The authors seem pleased with the results given the relatively low percentage of positive examples in the testing data. They predict that on 50/50 testing data the precision would be about 99%.

Case study

Visiting the private part of www.holgerspexet.se, there is a list of memorable quotes said by members of Holgerspexet through the years. Taking the top one of the list, I find "ooh, I'm so satisfied…" This is a quote from a meeting where a mutual agreement was reached regarding the choice of a specific song being played in the show, and apparently the decision pleased the person concerned.
This could absolutely be a TWSS moment, for a number of reasons. Firstly, there is a subject at the beginning of the sentence. Secondly, the sentence is short and made as a claim of something affecting the subject. Thirdly, the word satisfied is uncommon in the context of decisions being made. Lastly, the word satisfied is very likely to be a euphemism, in this case in the context of sexual exhaustion.

Exercises

Are the following sentences possible TWSS or not?
1. My sister said that she could be a bit late this afternoon.
2. When I was a newlywed we had sex every single day for about a month.
3. I'm just going to lay back while you drive me.

Conclusion

The field of natural language is big and difficult to understand artificially, since there are too many parameters to address and too many exceptions to any rule. The subject of artificial humor is an especially difficult part of this because of the nature of humor. A classical approach to humor is that you start your joke by leading your listeners down the wrong path for as long as you can before you make an unexpected twist to your story. The twist is often made with a paradigm shift between two different understandings of the same statement. The shift could be a shift of subject or intent; in this article they have focused on the shift between the literal and non-literal understanding of a single statement. It is not possible to generalize the DEviaNT approach to all kinds of jokes, since it is specialized on a single type of joke. One of the main parts of DEviaNT is the use of euphemisms. Euphemisms are used in our culture when discussing subjects too risqué to be comfortably explicit. Exactly where the line goes for an expression to be too much is very subjective and affected by culture. DEviaNT has these definitions hard-coded into the system, which not only makes it applicable in a single culture; it also makes it incapable of following gradual changes in humor preference.
Discussion

The upside of this article is the fact that DEviaNT actually performs better at precision-recall than the benchmark models. This does not only tell us that euphemisms are used in natural language and that some humor is based on them. The article also makes computation of euphemisms possible by stating a formal definition of what a euphemism could be. I am not completely satisfied with some of Kiddon and Brun's definitions of word sexiness, especially the adjective sexiness definition. The adjective sexiness calculations are derived entirely from statistical data; I would have wanted them to make an evaluation from a human point of view. Using a kind of unigram approach here makes DEviaNT only partly better than the benchmark models; it could definitely become better still. The publication of this article is by no means an obvious step closer to world peace, but it is one small step toward understanding artificial humor, a subject still largely unexplored. As a student of artificial intelligence, this has not given me enough answers relative to all the new questions I have gotten along the way. I have become wiser by realizing how little both I and mankind actually know about ourselves, and how fatuous computers still are. After making an effort to thoroughly understand this work made in the name of science and research, I am still not sure in what way it makes us smarter as a species, except for the fact that we now have statistical evidence that some jokes are made by violating unspoken social rules. In this special case the social rule is: do not offend people with explicitly risqué statements. I am not sure whether the fact that advanced mathematics like algebra and regression analysis is used in this way pleases me or annoys me.
I mean that it is good that highly educated people still have a sense of humor, but all the money spent on this research could have been spent in a better way. But then again, research doesn't work that way. Nobody knows exactly what needs to be studied or how to find the next breakthrough. Maybe this will someday come in handy when the line between man and machine is about to vanish. The total field of cognitive science is big and still young.

Final words

This has been a very interesting project. It took me quite some time to narrow my focus down enough to finally find this subject. It is not easy, and it isn't meant to be, and just like all such experiences, it is worth it in the end. I have read a lot of different articles, and in spite of the narrow approach of the assignment I have gotten a much broader view of the total field of artificial intelligence. I have probably studied and learned more than I would have if the curriculum had been set from the beginning, like in many other courses. This is partly because it is interesting and fun, and partly because you as a person do what gets measured; if you don't know what will be measured, you need to cover more possible areas.

Works Cited

Books
Anton, H. (2005). Elementary linear algebra. Wiley.

Articles
Kiddon, C., & Brun, Y. (2011). That's what she said: Double entendre identification. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics.
Shutova, E. (2010). Automatic metaphor interpretation as a paraphrasing task. Proceedings of Human Language Technologies: NAACL 2010.

Web pages
http://www.google.com
http://nlp.stanford.edu:8080/parser/
http://www.tfidf.com