729G11
Artificiell intelligens II
IDA - Institutionen för datavetenskap
Gustav Nygren
Gusny960
2012-09-16
That’s what she said:
Double entendre identification
Summary
This paper addresses automatic humor identification by describing and evaluating one approach to a particular subproblem, the "that's what she said" problem (TWSS). The approach, called DEviaNT, focuses on euphemisms as metaphors for sexually explicit nouns and on the basic structure of TWSS jokes. How and why euphemisms are used in DEviaNT to detect metaphors in natural language is described alongside the rest of the agent's properties. I go into detail about the training of DEviaNT, compare it to other agents, and end with a conclusion and discussion. The benchmark agents used only n-gram features derived entirely from corpora. DEviaNT was found to be 12% more precise than any other agent at identifying possible TWSS jokes in non-sexually explicit text.
Contents
Summary
Background
Introduction
Overview of the approach
The DEviaNT approach
    Word class analysis
        Noun sexiness
        Adjective sexiness
        Verb sexiness
    Features
    Learning algorithm
Data sets
Performance
Results
Case study
    Exercises
Conclusion
Discussion
Final words
Works Cited
Background
Natural language is used as a way of representing cognitive phenomena in a shared form. Understanding natural language is a considerable part of cognitive science and one of the toughest challenges in the field of artificial intelligence. Besides the fact that many words have different meanings depending on context, there are also what we call figures of speech, or metaphors. These describe something other than what they literally represent, and they have a way of making the language more colorful. Did you see that I just made a metaphor right there? A language cannot have colors, yet you understand my point. A metaphor is a word or a phrase that, in a specific context, does not carry the meaning of its lexical definition, like the language-and-color example. This is done by using the terminology of one domain (colors and artistry) to describe something in a different domain (understanding and nuancing natural language).
I have studied a paper in which the authors construct an agent for recognizing a certain type of metaphor, namely the "that's what she said" joke (TWSS). A TWSS joke is made when someone says a sentence that can be interpreted in a sexual way and someone else answers: "That's what she said!" It is often called a double entendre, from the French verb entendre, meaning to hear, or sometimes an adianoeta, from Greek. It is important that the sentence forming the base of the TWSS joke is not by default sexually explicit or purposely erotic. Otherwise, saying TWSS would just point out the obvious, and you would make a fool of yourself.
Example: While reviewing some notes I said without thinking about it, "It's not that it's hard, it
just all came in at once." (twssstories.com)
From what we learned above about metaphors, the original sentence is not sexual in its literal interpretation, but it has an analogical mapping to a more sexual domain. This means that the words could be interpreted as sexual when taking their non-literal meanings.
The short explanation: The recently uttered words could have been said by a woman in a
sexual context and the joke is made by pointing that out.
Introduction
This article is a review of the article That's what she said: Double entendre identification, written by Kiddon and Brun (2011). Kiddon and Brun have designed a TWSS recognition agent that detects places in normal text where inserting a TWSS would complete the joke. This is a subproblem of artificial humor, which in turn is part of natural language understanding. Understanding natural language is interesting since it requires profound semantic and cultural understanding. One could also ask why it is important for science to research this; I personally find it hilarious that some people put large amounts of knowledge and computing into researching juvenile behavior. I just had to go deep into understanding all of its quirks, its advantages, and its contribution to the wider field of artificial intelligence.
Overview of the approach
Kiddon and Brun (2011) have designed a TWSS recognition agent that relies heavily on two
basic patterns of speech. The first distinguishing characteristic is the use of nouns that are
euphemisms for sexually explicit nouns.
A euphemism is by definition a mild or indirect word or expression substituted for one considered too harsh or blunt when referring to something unpleasant or embarrassing (www.google.com). An example could be disciplined for beaten up, or passed away for died.
The second distinguishing characteristic of the TWSS joke is the structure of short sentences common in the erotic domain, for example: [subject] put [object] in [object], or I could eat [object] all day (Kiddon & Brun, 2011).
Kiddon and Brun (2011) chose to call their agent, based on these features, DEviaNT, for Double Entendre via Noun Transfer, and their study shows that DEviaNT achieves 12% higher precision than any other TWSS classifier.
Kiddon and Brun (2011) argue that this is a new approach to TWSS detection. Earlier agents
have been trained to identify metaphors by the unlikeliness of the words being used. This
means that if a very uncommon word is used in a certain context it is more likely to be a
metaphor (Shutova, 2010).
The DEviaNT approach
To design DEviaNT, Kiddon and Brun used one corpus for sexual content and one for non-sexual content. For the sexual content they used an unparsed erotic corpus of 1.5M sentences from www.txtfiles.com/sex/Erotica. They parsed this corpus with the Stanford parser and tidied up the tags a bit to make them more generic; by tidying up I mean they lumped groups of tags together to make the computation and programming of DEviaNT easier. The pre-parsed Brown standard corpus, with 57K sentences, was used for the non-sexual content and was also made more generic by letting some tags include all their possible sub-tags. For example, the tags NN, NN$, NN+BEZ, etc. all get the tag NN; see Table 1. How to classify content as sexual or non-sexual was never discussed, so the sentences used as one or the other were simply assumed not to belong to both sets.
Kiddon and Brun (2011) derived a set SN of 76 sexually explicit nouns, divided into 9 categories, from frequent use in texts with sexual contexts. 61 of these made up a set SN⁻ of nouns that are likely targets for euphemisms. Also important for DEviaNT is a set of approximately 98 body-part nouns. To determine whether a non-sexually explicit noun could be used as a euphemism for a sexually explicit noun, they focused on the adjectives frequently used to modify sexually explicit nouns. Non-sexually explicit nouns that were also frequently modified by these adjectives were taken to be possible euphemisms.
Table 1 – word class tags used by DEviaNT

Tag | Explanation             | Example
NNP | Proper nouns            | London, Sarah, Microsoft
CD  | Numbers                 | 3, 6, 0
SN  | Sexually explicit nouns | penis
BP  | Body parts              | leg, hand, lip
NN  | Remaining nouns         | door, dog, ditch
Word class analysis
Noun sexiness
The adjective count vector of a noun n records, for every adjective that could modify n in a proper sentence, its counts in both the Brown corpus and the erotic corpus. To determine the erotic undertones of a noun, its noun sexiness NS(n), the similarity of a noun n ∉ SN⁻ to each of the nouns in SN⁻ is calculated, and NS(n) is set to the maximum of these values. The similarity is found by calculating the cosine similarity between the nouns' adjective counts: the Euclidean dot product between the vectors is used to compute the cosine of the angle between them. The geometric interpretation is the degree to which the vectors point in the same direction.
cos θ = (A · B) / (‖A‖ ‖B‖) = Σᵢ AᵢBᵢ / ( √(Σᵢ Aᵢ²) · √(Σᵢ Bᵢ²) )
This formula can theoretically give a value between -1 and 1, but in information retrieval the value is never negative; it ranges from 0 to 1. The reason is that the often-used tf-idf weight function only produces non-negative figures. The tf-idf weight function gives less frequent nouns more relative importance by weighting them with larger constants than more frequent ones (www.tfidf.com). The tf-idf value is calculated with respect to the frequency of n, the size of the document, and the total number of documents. According to Wikipedia there are different versions of this formula; how and why they differ will not be discussed in this paper, since their underlying idea is the same.
When evaluating the noun sexiness of each noun in the training data, they found a large variance between the nouns. To make a distinction they set a cut-off value to use as a threshold for uncommon nouns: the focal value NS(n) = 10⁻⁷ was assigned to nouns not common or not sexy enough to make the cut.
Examples of very sexy non-sexually explicit nouns were meat and rod.
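As a small sketch of the similarity computation above (this is my own illustration, not the authors' code; the adjective vocabulary and the counts are invented), cosine similarity between two adjective-count vectors can be computed like this:

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a noun never modified by any adjective has no direction
    return dot / (norm_a * norm_b)

# Hypothetical counts over the same adjective vocabulary, e.g. [hot, wet, big].
counts_rod = [4, 1, 7]
counts_door = [0, 0, 2]
print(round(cosine_similarity(counts_rod, counts_door), 3))  # → 0.862
```

Identical vectors give 1.0 and vectors sharing no adjectives give 0.0, which matches the 0-to-1 range discussed above for non-negative count data.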
Adjective sexiness
Adjective sexiness, AS(a), is the probability of the adjective a modifying a sexy noun n ∈ SN⁻. AS(a) is determined by calculating the relative frequency of a in sentences s in the erotic corpus that also contain at least one sexy noun n ∈ SN⁻. Adjectives with relatively high AS(a) are, for example, hot and wet. Whether this is an accurate way of pinpointing the sexiness of an adjective can be debated, but that is the way Kiddon and Brun defined it.
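A toy sketch of this definition (my own, under the simplifying assumption that "relative frequency" means the fraction of sexy-noun sentences containing the adjective; the nouns, sentences, and counts below are all invented):

```python
# Stand-ins for the real euphemism-target set SN-; illustrative only.
SEXY_NOUNS = {"rod", "meat"}

# A miniature invented "erotic corpus".
erotic_sentences = [
    "the hot rod was wet",
    "a hot meal by the door",
    "wet meat on the table",
]

def adjective_sexiness(adjective, sentences):
    # Keep only sentences that contain at least one sexy noun.
    relevant = [s.split() for s in sentences]
    relevant = [toks for toks in relevant if SEXY_NOUNS & set(toks)]
    if not relevant:
        return 0.0
    hits = sum(1 for toks in relevant if adjective in toks)
    return hits / len(relevant)

print(adjective_sexiness("hot", erotic_sentences))  # → 0.5
```

Here "hot" appears in one of the two sentences that contain a sexy noun, giving AS(hot) = 0.5, while "wet" appears in both, giving 1.0.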
Verb sexiness
Let

SE = the set of sentences in the erotic corpus containing a noun n ∈ SN.
SB = the set of all sentences in the Brown corpus.

A sentence s contains the verb v inside a verb phrase v that is bounded by a noun or by one of {I, you, it, me}; if none of these is present, the phrase is taken to be the verb v itself.

Verb sexiness, VS(v), is defined from the approximated probabilities of v occurring in a sentence s from an erotic or a non-erotic context, calculated with respect to SE and SB respectively:

P(v ∈ s | s ∈ SE),   P(v ∈ s | s ∈ SB)

The uneven sizes of the corpora make it important to normalize the probability of s belonging to each corpus, so the two priors are set equal:

P(s ∈ SE) = P(s ∈ SB)

VS(v) is then the probability that the described action is, metaphorically, an action in an erotic context:

VS(v) = P(s ∈ SE | v ∈ s) = [Bayes' theorem] = P(v ∈ s | s ∈ SE) · P(s ∈ SE) / P(v ∈ s)
Features
To not bore my readers too much I will not go too deep into the feature details of DEviaNT, but the detection of euphemisms is based on whether the sentence s contains a sexy noun SN, a body part BP, or very unsexy nouns, and on the average sexiness of all the other nouns NN that could possibly be euphemisms.
To detect the structure of TWSS jokes, DEviaNT uses a two-part approach. The first part is based on whether the sentence s contains verbs or verb phrases not found in the erotic training data, the average sexiness of the verbs and adjectives in the sentence, and the existence of any unfitting adjectives, i.e. the ones that received the focal value of 10⁻⁷.
The importance of the second part of the structure detection module is not intuitively obvious, but as readers will later find out, it is surprisingly effective. This second part focuses on punctuation, counts of word classes, and the
existence of a noun or a pronoun that could be the subject of the sentence. The approximation is that the first noun of the sentence is the subject. This second part is what they call the basic-structure version of DEviaNT, which was, as you will later see, also benchmarked against other top n-gram models.
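As a rough sketch of what such basic-structure features might look like (the tag names follow Table 1, but the feature names, the helper, and the example sentence are my own invention, not the authors' actual feature set), assuming sentences arrive as (token, tag) pairs from a parser:

```python
def structure_features(tagged):
    """Extract a few toy structure features from a tagged sentence."""
    noun_tags = {"NN", "NNP", "SN", "BP"}
    tags = [t for _, t in tagged]
    # Approximation from the text: the first noun of the sentence is the subject.
    subject = next((tok for tok, t in tagged if t in noun_tags), None)
    return {
        "n_tokens": len(tagged),
        "noun_counts": {t: tags.count(t) for t in noun_tags if t in tags},
        "subject_guess": subject,
        "ends_with_punct": tagged[-1][0] in {".", "!", "?"},
    }

example = [("The", "DT"), ("door", "NN"), ("opened", "VB"), (".", ".")]
print(structure_features(example))
```

Even features this shallow encode the short-declarative-sentence shape that TWSS jokes favor.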
Learning algorithm
The DEviaNT approach uses an SVM (Support Vector Machine) classifier from the WEKA machine learning package as its vehicle for real-world testing. SVMs are programs made for binary classification of n-dimensional vectors, and the WEKA package is an easy, off-the-shelf application which also makes it possible to do precision-recall analysis. The precision-recall analysis is important because of the nature of the TWSS joke. Since few people will notice when you miss an opportunity to make a TWSS joke, the cost of a false negative is low. A false positive, on the other hand, is considered tacky and stupid: it is when you try to make the joke by saying "that's what she said!" at the wrong moment, when the domain transfer is not possible. DEviaNT uses a metaclassifier called MetaCost to make the trade-off of tolerating more false negatives in order to boost precision, by making false positives one hundred times more costly than false negatives. The MetaCost algorithm is wrapped around the base classifier to perform the metaclassification. This means that it is easily removed, and it is compatible with all classifiers, since it is not affected by the classifier itself, only by its results.
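The effect of that 100:1 cost ratio can be sketched with a simple expected-cost decision rule (my own minimal illustration of the idea, not the actual MetaCost algorithm, which re-labels training data via bagging):

```python
# False positives (a badly placed joke) cost 100x a false negative
# (a missed opportunity), as in the DEviaNT setup.
COST_FP, COST_FN = 100.0, 1.0

def should_say_twss(p_positive):
    """Say the joke only when the expected cost of speaking is lower."""
    expected_cost_if_say = (1.0 - p_positive) * COST_FP
    expected_cost_if_silent = p_positive * COST_FN
    return expected_cost_if_say < expected_cost_if_silent

print(should_say_twss(0.95))   # → False: still not confident enough
print(should_say_twss(0.999))  # → True
```

With these costs the break-even point is p > 100/101 ≈ 0.99, which is why the classifier only fires on near-certain candidates.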
Figure – boot strapping illustration. Source: http://calameda.wordpress.com/

The MetaCost metaclassifier uses bagging (boot strap aggregating) to make the classifier cost sensitive. Boot strapping is itself a metaphor: it is basically a statistical method made to fine-tune the results of a study without making more observations. The general idea of boot strapping is that by generating new sets of data points in a vector space from each of the originally observed data points, with respect to the variance of their distribution, you can get a richer, and still valid, set of results. A richer set of results makes more actions, like
MetaCost classification, possible. It is like synthesizing eggs to make new hens, or like rescuing yourself from a swamp by pulling yourself up by your own boot straps.
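The boot strapping idea described above can be sketched in a few lines (my own illustration with invented data; real bagging resamples training sets for an ensemble of classifiers, but the resample-with-replacement core is the same):

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible
observed = [0.2, 0.5, 0.4, 0.9, 0.3]  # an invented, tiny set of observations

def bootstrap_means(data, n_resamples=1000):
    """Resample with replacement to see how a statistic (the mean) varies."""
    means = []
    for _ in range(n_resamples):
        resample = [random.choice(data) for _ in data]
        means.append(sum(resample) / len(resample))
    return means

means = bootstrap_means(observed)
print(f"{len(means)} resampled means, spread {max(means) - min(means):.2f}")
```

Without collecting a single new observation, we get a whole distribution of plausible means instead of one number, which is exactly the "richer, and still valid, set of results" the text describes.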
SVMs are able to classify data points in any dimension by using the kernel trick, a mathematical device for evaluating data sets in higher dimensions. Thanks to the kernel trick it is usually possible to find a linear boundary between positive and negative data points in a dimension one or more steps higher than the original one.
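A toy illustration of that idea (mine, not from the paper): these four one-dimensional points cannot be separated by any single threshold, but after lifting each point x to (x, x²) a straight line separates the classes. A real kernel computes the dot products in the lifted space implicitly, without ever building the lifted points.

```python
# 1-D data: no single threshold on x separates "pos" from "neg".
points = [(-3, "neg"), (-1, "pos"), (1, "pos"), (3, "neg")]

# Explicit lift to 2-D: x -> (x, x**2). Now the line x2 = 5 separates them.
lifted = [((x, x * x), label) for x, label in points]

def classify(x):
    # The linear rule "x2 < 5" from the lifted space, expressed on raw input.
    return "pos" if x * x < 5 else "neg"

print(all(classify(x) == label for x, label in points))  # → True
```

The separating rule is linear in the lifted space even though it looks quadratic in the original one, which is the whole point of the trick.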
Data sets
When training DEviaNT to identify TWSS jokes, Kiddon and Brun made the positive and negative samples very similar to each other, making the distinction between them hard for DEviaNT. The negative training examples, 50% of the set, were taken from the internet and included funny sentences, risqué sentences, and famous historical quotes. None of these were evaluated as negative by humans; they were assumed negative, since classifying what is and is not an erotic context was not part of this study.
The testing was done with about 20,000 parsed sentences, of which 99% were negative examples.
Performance
DEviaNT uses the above-mentioned characteristics of euphemisms, and the structural similarities between the sexual and the non-sexual domain, to identify possible TWSS jokes. There are other agents made to identify metaphors, most of them using an n-gram model derived strictly from the training data. Kiddon and Brun (2011) argue that DEviaNT can compete "where it matters the most", meaning at high precision. Exactly where the trade-off between false positives and false negatives is optimal is too subjective a question for this paper to take a stand on.
DEviaNT was compared to seven other classifiers: six SVM and Naïve Bayes models, and one stripped version of DEviaNT. The stripped version used only the last group of features, the basic structure of the TWSS jokes. The benchmark classifiers had features derived entirely and automatically from corpora; they include thousands of features, none of which were tuned by hand. The study tested classifiers trained on unigram and bigram features, with or without the MetaCost add-on.
There is nothing startling about DEviaNT being superior in its authors' own study, but the stripped version of DEviaNT, the basic structure, was surprisingly enough second best at precision-recall. As noted earlier this makes little sense to me; it is satisfying to know that there is always more to understand. They tested adding unigram features to DEviaNT to see if this improved performance, but it did not, probably because of the large number of false positives that the unigram-featured models return. As for the Naïve Bayes model, the naïve part of it might have something to do with the low precision.
Figure 1 – Precision-recall curves for DEviaNT and baseline competitors. Kiddon & Brun 2011
Results
DEviaNT returned 28 TWSS candidates, of which about 72% were true positives. The best competitor returned 130 candidates with almost 60% accuracy. This makes DEviaNT 12% better than the second-best competitor with regard to precision. Precision matters more than the total number of recalled jokes because of the nature of the TWSS joke, as discussed above. The authors seem pleased with the results, given the relatively low percentage of positive examples in the testing data. They predict that on a 50/50 test set the precision would be about 99%.
Case study
Visiting the private part of www.holgerspexet.se, there is a list of memorable quotes said by members of Holgerspexet through the years. Taking the top one of the list, I find "ooh, I'm so satisfied…". This is a quote from a meeting where mutual agreement was reached regarding the choice of a specific song to be played in the show, and apparently the decision pleased the person concerned. This could absolutely be a TWSS moment, for a number of reasons. Firstly, there is a subject at the beginning of the sentence. Secondly, the sentence is short and made as a claim about something affecting the subject. Thirdly, the word satisfied is uncommon in the context of decisions being made. Lastly, the word satisfied is very likely to be a euphemism, in this case in the context of sexual exhaustion.
Exercises
Are the following sentences possible TWSS or not?
1. My sister said that she could be a bit late this afternoon.
2. When I was a newlywed we had sex every single day for about a month.
3. I'm just going to lay back while you drive me.
Conclusion
The field of natural language is big and difficult for machines to understand, since there are too many parameters to address and too many exceptions to any rule. The subject of artificial humor is a part of this that is especially difficult because of the nature of humor. A classical approach to humor is to start a joke by leading the listeners down the wrong path for as long as you can before making an unexpected twist to the story. The twist is often made with a shift in paradigm between two different understandings of the same statement. The shift could be a shift of subject or intent; in this article the focus is on the shift between the literal and non-literal understanding of a single statement.
It is not possible to generalize the DEviaNT approach to all kinds of jokes, since it is specialized for a single type of joke. One of the main parts of DEviaNT is the use of euphemisms. Euphemisms are used in our culture when discussing subjects too risqué to address comfortably in explicit terms. Exactly where the line goes for an expression to be too much is very subjective and affected by culture. DEviaNT has these definitions hard-coded into the system, which not only makes it applicable in a single culture; it also makes it incapable of following gradual change in humor preferences.
Discussion
The upside of this article is the fact that DEviaNT actually performs better at precision-recall than the benchmark models. This not only tells us that euphemisms are used in natural language and that some humor is based on them; the article also makes computation over euphemisms possible by stating a formal definition of what a euphemism could be. I am not completely satisfied with some of Kiddon and Brun's definitions of word sexiness, especially the adjective sexiness definition. The adjective sexiness calculations are derived entirely from statistical data; I would have wanted them to also make an evaluation from a human point of view. By using a kind of unigram approach here, DEviaNT is only partly better than the benchmark models; it could definitely become better still.
The publication of this article is by no means an obvious step closer to world peace, but it is one small step toward understanding artificial humor, a subject still largely unexplored. As a student of artificial intelligence, this has not given me enough answers relative to all the new questions I have gotten along the way. I have become wiser by realizing how little both I and mankind actually know about ourselves, and how fatuous computers still are.
After making an effort to thoroughly understand this work made in the name of science and research, I am still not sure in what way it makes us smarter as a species, except that we now have statistical evidence that some jokes are made by violating unspoken social rules. In this special case the social rule is: do not offend people with explicitly risqué statements. I am not sure whether the fact that advanced mathematics like algebra and regression analysis is used in this way pleases or annoys me. I mean that it is good that highly educated people still have a sense of humor, but all the money spent on this research could have been spent in a better way. But then again, research does not work that way; nobody knows exactly what needs to be studied or how to find the next breakthrough. Maybe this will someday come in handy when the line between man and machine is about to vanish. The total field of cognitive science is big and still young.
Final words
This has been a very interesting project. It took me quite some time to narrow my focus down enough to finally find this subject. It is not easy, and is not meant to be, and just like all such experiences, it is worth it in the end. I have read a lot of different articles, and in spite of the narrow approach of the assignment I have gotten a much broader view of the total field of artificial intelligence. I have probably studied and learned more than I would have if the curriculum had been set from the beginning, as in many other courses. This is partly because it is interesting and fun, and partly because you do what gets measured: if you do not know what will be measured, you need to cover more possible areas.
Works Cited
Books
Anton, H. (2005). Elementary linear algebra. Wiley.
Articles
Kiddon, C., & Brun, Y. (2011). That's what she said: Double entendre identification. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL).
Shutova, E. (2010). Automatic metaphor interpretation as a paraphrasing task. Proceedings of Human Language Technologies: NAACL 2010.
Web pages
http://www.google.com
http://nlp.stanford.edu:8080/parser/
http://www.tfidf.com