>> Michel Galley: So … let’s get started. So as a former student of Kathy McKeown, it’s my immense
pleasure to welcome her to Microsoft Research. So … there is no need, of course, to introduce Kathy,
but she’s done a lot of work in generation, summarization, and question answering. So she’s a professor
at Columbia University and the director of the newly founded Institute for Data Sciences and Engineering.
And so today, she’ll be talking about natural language applications from fact to fiction. Kathy?
>> Kathleen McKeown: Okay, thanks Michel. So I’m going … [coughs] sorry. I have a cold, so I hope I
won’t cough too much. I’m going to be talking today about work that’s been done in my group over
about a ten-year period, where we started looking at data that was primarily about the world—fact—
and we have moved sort of full circle and have recently been looking at work that is fiction. So I’ll be
talking today about the work of some of my current students, who I have here. I want to acknowledge
them upfront, but you’ll see occasionally they’ll pop up at different points during the talk. It also builds
on the work of my past students, who are situated at various places around the country and actually,
around the world.
So I’m going to be talking about work that we’ve done with data that falls along a continuum from fact
to fiction. So at the top, you see texts—kinds of texts that are more factual in nature: they refer to
things that either happened in the world or they report on fact as we know from scientific experiment—
and as we move down the continuum, we come to data which is more subjective in nature. We’re likely
to get things like opinions expressed; it’s likely to be a bit more informal; and when we get to the
bottom of the continuum, then we have data that’s actually not about the real world at all in any way
and has to do more with what people have written. So one could argue about whether scientific journal
articles or news are more factual, but in order to align the continuum with the order in which we’ve
done the work, I’m going to put news first. And if we look at the different kinds of genres that we have
here, one of the things that we can see is that the language of genres is remarkably different. I’m going
to be using a lot of examples from references to Hurricane Sandy, which for us in New York, was a pretty
big deal and we still continue to talk about it.
If you look at all that comes down, you can see that it’s quite different, and even from the language
that’s used, you can probably identify which genre is which, but let’s just try it as we go through. So you
can call out whether it’s social media, scientific journal, news, or novel. So the first one: what do people
think? [murmurs] Social media. The second one? [murmurs] Scientific paper. The third one? [murmurs]
News, yes. And the fourth? [murmurs] Novels. ‘Kay? Now this, just in overview before we get into the
talk itself, is a word cloud of the kind of work that we’ve done in my group, and so you can see
summarization is there right front-and-center—that’s been a lot of the focus of our work—this comes …
this word cloud was created from the abstracts of my group’s papers over the last … I think it was about
fifteen years. In addition to summarization, we can see some other things, like generation; we’ve done
some multilingual work, so translation shows up; a little bit of work with patient … I’m not sure why
speech is so large—we … I guess that’s all due to Michel Galley, here in front.
So our vision is to be able to generate presentations that connect events across these
different sources. So to be able to connect information about events, opinions about those events,
personal accounts about the same events and their impact on the world. We’d like to be able to link to
supporting science, and ultimately to link to fiction—that’s a little bit more in the future. So let’s start
with news, and as you all know, there’s been a big effort on news within the natural language
community—a lot of work within the Linguistic Data Consortium—on collecting the news, annotating it,
making it available to research groups to use for their research, and for that reason, a lot of the initial
progress, both in my own group and in other groups around the world, was on news. Certainly in my
group, a lot of the work that we’ve done has been on summarization, and shown here is a page out of
our Newsblaster system, which was developed … gee, almost fifteen years ago—a little bit less than
that—we made it go live in 2001. And this … here we are looking to generate summaries of multiple
articles about the same event. So here you see one on Hurricane Sandy, and as we saw, this is where
the initial example came from. “Hurricane Sandy churned about two hundred and ninety miles off the
mid-Atlantic coast Sunday night …” and so forth.
Now the first question you may ask is: why is summarization hard? Well, it seems to require both
interpretation and generation of text. Furthermore, it seems to require doing that in unrestricted
domain; we don’t know ahead of time what topic we’re gonna get text on and what kind of domain
information we need. We need to be able to handle those documents, though, robustly, and thus, it
seems that we need to be able to operate without full semantic interpretation. And in the
summarization field, this has led many researchers to use what is called sentence selection, where
sentences are selected out of the input documents on the basis of salience or importance, and then the
sentences are strung together to form the summary. And this is a way to get a system working quickly.
It also, though, can lead to some problems, where sentences placed side-by-side may create some
misconceptions, or they may have missing information. For example, who is being referred to in the
article may be very clear, but when you put it in the summary, it’s not. So our approach at Columbia has
been to use sentence selection, but then to edit the selected sentences. And the kinds of things that we
worked on have been to correct references to people that are infelicitous. So the first time you have a
reference to a person, you’d like to have a full reference so that we can understand who the person is,
and if we continue referring to that person throughout the summary, then we might have a pronoun. So
this was work done by Ani Nenkova.
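As a rough illustration of that reference-editing idea (not the actual Nenkova system), here is a minimal Python sketch that keeps the full, descriptive reference at a person’s first mention in the summary and shortens later mentions; the names and the single shortening rule are invented for the example.

```python
# Minimal sketch of reference editing: first mention gets the full reference,
# later mentions get a shortened form. Names and the rule are illustrative.
import re

def rewrite_references(sentences, full_ref, short_ref):
    """E.g. full_ref='President Barack Obama', short_ref='Obama'."""
    seen = False
    pattern = re.compile(re.escape(full_ref) + "|" + re.escape(short_ref))
    out = []
    for sent in sentences:
        def repl(_match):
            nonlocal seen
            if not seen:
                seen = True
                return full_ref   # first mention: full, informative reference
            return short_ref      # subsequent mentions: shortened form
        out.append(pattern.sub(repl, sent))
    return out

summary = ["Obama toured the flooded neighborhoods on Tuesday.",
           "President Barack Obama promised federal aid."]
print(rewrite_references(summary, "President Barack Obama", "Obama"))
```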
Compression is still a big topic, and perhaps even a bigger topic now, because when we extract
sentences—especially from news—they can be quite long, and so what we’d like to do is to be able to
remove extraneous material from those sentences to make them shorter so we have a concise
summary. And particularly, if we think now about doing summarization on mobile devices, we might
like something quite short—just a single sentence—to appear there. This is work that we’ve … an area
that we’ve worked in for quite a while from Hongyan Jing in 2000, from Michel Galley in his dissertation,
and it’s something that we’re continuing to work on now. We … given that a lot of the work that we’ve
done has been done in a multilingual environment, and we’ve had to generate either summaries or
answers from translated texts which are often disfluent—as a matter of fact, the first … yes … the first
time we began working in this area, and we saw what we were going to have to generate answers from,
we were like, “are you kidding?” But of course, things have gotten a lot better; nonetheless, when we
do summarization or question answering from translated text, we have a lot more context from the task
that was not available at the time of translation, and we use that information to make fluent sentences
from disfluent translated ones. And then, we’ve been doing work on generating new sentences by
selecting phrases from the input sentences through fusion. Now this is walking a fine line: it’s easy to
make a good sentence bad. If you simply extract a sentence, at least somebody has written it. There has been quite a bit of
work though now in the field of text to text generation, and I won’t go through it, but you can see that a
lot of different people have been looking at this problem.
In our current work, which is the work of Kapil Thadani, we’re modeling text transformation as a
structured prediction problem. So the input is one or more sentences with parses, and the output is a
single sentence with a parse. And we’re doing it with multi-view structured prediction, so we can take
different kinds of information into account at the same time—simultaneously—as we re-word the
sentence. So we’re doing joint inference over word choice—constraints about word choice—using
information from n-gram ordering and dependency structure. We do this in a supervised fashion; so we
start with a data set for compression, and you can see here one of the very long sentences is in input,
and then in output, we have removed all of the information in white—so, “flying towards Europe
yesterday” which may not be as important for this summary as just the fact that air force fighters were
scrambling to intercept a Libyan airliner. So the framework that has been developed can be used for
different kinds of text to text generation tasks. It can be used for sentence compression: here the input
is a single sentence, and the output will be a shorter sentence with salient information. And we use as
data the kind of data that I showed in the previous slide—so we have a lot of cases where we have a
longer sentence and shorter sentences, and we’re using a baseline from Clarke and Lapata in 2008,
who’ve done a lot of work. And you can see here the two different perspectives on the sentence: the n-grams, which are shown in yellow—so this shows you the pairs of words which are likely to occur
together—and dependency structure—so we take dependency relations between pairs of words and
these also form constraints on the output.
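To make the joint-inference idea concrete, here is a hedged sketch in Python using the PuLP ILP library: binary variables decide which tokens to keep, the objective sums per-token salience, and constraints enforce a length budget and that a word is kept only if its dependency head is kept. The salience scores and the tiny parse are invented, and this is not Thadani’s actual model.

```python
# Hedged sketch of joint inference for compression: keep/drop decisions over
# tokens, maximizing made-up salience scores under dependency and length
# constraints. Requires the PuLP library (pip install pulp).
import pulp

tokens = ["Fighters", "flying", "toward", "Europe", "scrambled",
          "to", "intercept", "a", "Libyan", "airliner"]
salience = [0.9, 0.2, 0.1, 0.1, 0.8, 0.6, 0.8, 0.5, 0.7, 0.9]
# head[i] = index of token i's syntactic head (None for the root); illustrative parse.
head = [4, 0, 1, 2, None, 6, 4, 9, 9, 6]
max_len = 7

prob = pulp.LpProblem("compression", pulp.LpMaximize)
keep = [pulp.LpVariable(f"keep_{i}", cat="Binary") for i in range(len(tokens))]

prob += pulp.lpSum(salience[i] * keep[i] for i in range(len(tokens)))  # objective
prob += pulp.lpSum(keep) <= max_len                                    # length budget
for i, h in enumerate(head):
    if h is not None:
        prob += keep[i] <= keep[h]   # a dependent needs its head in the output

prob.solve(pulp.PULP_CBC_CMD(msg=0))
print(" ".join(t for t, k in zip(tokens, keep) if k.value() == 1))
```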
We can take the same framework and use it for sentence fusion, simply by changing the data set and by
changing some of the features that we use. So here, the input is multiple sentences, and the output will
be a sentence with common information from both. So it’s a way of taking multiple sentences and
finding what is most salient, because it has been repeated across sentences. We use here a data set
that is created from summarization evaluations—I don’t know if people are aware of the pyramid data,
which has been used for evaluating summaries. But here, we have a case where we have many different
summaries of the same data, and it’s been coded down to the phrase level of what is common across
human summaries, and therefore should appear in an output. And we use that for the … for training our
fusion. And then we have changed the features in some way, so we have some fusion-specific
features—for example, repetition may be one of them. And you can see an example here where we
take a part of the first sentence, another part of the second sentence, go back to the first, and end up at
the second to get a sentence like: “Six years later, independent booksellers’ market share was seventeen
percent.”
We’re also using sentence fusion for machine translation, and here we sit at the end of a pipeline, where
different systems have done translations—we’ve been doing this in a joint project with Martha Palmer,
Kevin Knight, Dan Gildea, and Nianwen Xue, but I’m showing it here with translations that are available
off the web, and in fact, we do often experiment with that. So you have a Chinese sentence at the top,
the reference sentence so you can see what should have been translated, and then you see three
different translations by online engines: Google, Bing, and Systran. And no one of them is perfect, but
nonetheless, there are phrases in each of them that are good, and so our goal is to be able to fuse these
three sentences and to take phrases which can be used and improve the final output. So we do two
types of sentence-level fusion. One is what’s called a sentence-level combination, and that’s where we
look at the output, and we want to choose the sentence that is best. For that, we have developed a
structural language model that we can use on the translations—this is a supertag language model—and
so it can capture sort of long-distance dependencies between the phrases, and we use this to rank the
different outputs and choose the one that is best … followed by this. Now, that serves as our … what
we’re calling the backbone sentence, and now we use phrase-level combination, given this best
sentence, to pick the phrases from the different other translations that we have. And we use a variety
of different kinds of feature-based scoring functions; we do look at consensus in the different
translations on phrases; we have syntactic indicators on whether a phrase works well; and then we have
information about the semantic role labels between the source and the target translation. So if we go
back to this combination, we’re first going to choose one of them as the best using the language model,
and this would be the Systran one, and now we’ll choose phrases from each of them to decide how to
put it together. And we use a paraphrase lattice, which shows the paraphrases between the different
systems—I’m showing just this part of it, where we’ve got translations from each of them—and we’re
going to choose the top as the best sort of n-gram as we go through, and as we go through the second
part of it, “is unlikely to”—we’ll choose this, because it gives us the best … the syntactic features show it
to be the best in that case.
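A minimal sketch of the backbone-selection step follows. The talk uses a supertag-based structural language model to rank the candidate translations; here a simple n-gram consensus score stands in for it, just to show the shape of the pipeline, and the candidate strings are placeholders.

```python
# Backbone selection for system combination, with n-gram consensus standing in
# for the supertag language model used in the actual work.
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def consensus_score(candidate, others, n=2):
    """Fraction of the candidate's n-grams supported by the other systems."""
    cand = ngrams(candidate.split(), n)
    if not cand:
        return 0.0
    support = Counter(g for o in others for g in ngrams(o.split(), n))
    return sum(1 for g in cand if support[g] > 0) / len(cand)

candidates = [            # placeholder outputs from three systems
    "the meeting is unlikely to be held this week",
    "meeting unlikely to hold in this week",
    "the meeting is unlikely to hold this week",
]
scores = [consensus_score(c, [o for o in candidates if o is not c])
          for c in candidates]
backbone = candidates[max(range(len(candidates)), key=scores.__getitem__)]
print(backbone)   # phrase-level combination would then edit this backbone
```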
So what have we learned from this work? With monolingual compression, we get a five percent
increase in n-gram recall when we use joint inference with dependency relations over previous
baselines. In the case of multilingual fusion, we get an increase of one BLEU point for combined
MT output. Of course, things can go wrong. As I said, we’re not relying on how people wrote the input,
and in fact, one of the things that often goes wrong is when we have multiple people from the same
family who appear in the news. So this example is from way back when—when Newsblaster was first
picked up by the press and was appearing in the press—and of course, the journalists wanted to find
everything that was wrong with it. And on the death of the Queen Mother in England, Newsblaster had
Queen Elizabeth attending her own funeral when we re-wrote the references.
‘Kay, so let’s turn now to scientific journal articles. We’re dealing with a number of different kinds of
articles, some are from Nature—they tend to be mostly scientific—we have a number from Elsevier as
well, but it’s full texts. The first thing that you might ask is: how are articles different? We saw that the
language of the genre was different, and we can see right away, just from the beginning page of it, that
we have some structured information that we didn’t have before: we have titles, authors, and
abstracts—as shown here below—we have citations, typically; and of course the language is different.
So one of the first things we can see that we can get out of a set of scientific articles that we can’t get
elsewhere is a citation network. So if we’re working with this group, we can start at the bottom and see
that in fact from Yarowsky, quite a bit of work came out of that same genre. This is all citing back to
paper … other papers which eventually cite Yarowsky on the same topic. If we look in at the text, we
can see that we can get things out of the language of the text which is not available otherwise. So in the
project that we’re working on with scientific articles—Simone Teufel from Cambridge is working with us—
we can identify, for each sentence in the article, the purpose of that sentence in relation to the overall
article and in relation to how publications go. So here is the aim of the article—one of the points or
contributions that it’s trying to make. Given that there are citations within an article, we can also
discover the text around that citation and what kind of sentiment it has towards it: whether it’s aligning
itself with it, whether it’s positive, negative, or whether it’s simply … it’s objective. And so one of the
problems, of course, is identifying what scope of text around the citation refers to it, but also sentiment.
So here, this is a negative one: “we argue that this approach misses biologically important phenomena,”
and the citation would be in there where the dot, dot, dots are.
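A small illustrative sketch of building the citation network with networkx; the paper identifiers and edges below are invented, and real edges would come from parsing the references in the full-text articles.

```python
# Toy citation network. In-degree is a crude prominence signal; the project
# pairs graph features like this with indicators drawn from the full text.
import networkx as nx

citations = [                       # (citing paper, cited paper) -- invented
    ("paperA", "Yarowsky1995"),
    ("paperB", "Yarowsky1995"),
    ("paperC", "paperA"),
    ("paperC", "paperB"),
]
G = nx.DiGraph()
G.add_edges_from(citations)

print(sorted(G.in_degree(), key=lambda x: -x[1]))
```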
So in this project, we’re working on prediction of scientific impact. Our input is a term to represent a
concept in the field, and … or a document. We extract indicators from the full texts of documents that
are related to the term. So we first generate a set of documents that all have to do with a particular
concept, and then our goal is to predict the prominence of either the term or that particular scientific
article. And one of the kinds of features that we use is time series, and you can see here time series for
two different terms: “climate change” and “climate model.” You can see that “climate change” shows a
burst in this particular feature later than “climate model,” and continues to climb while “climate model”
goes down. So that’s just one of the many clues that we use for this, although our work is primarily
focused on indicators from full texts. As part of this project, we’re generati … we want to be able to
explain why the system came up with its prediction, and we do this in sort of two parts: one is a
summary which defines what the technical concept was that corresponds to the term, and then we also
do a justification of the prediction using the output from machine learning—so that’s an original
generation task with … where our input is basically the features or attributes that were used and how.
So this gives you two examples in “what is tissue engineering?”—this is, sort of, one of the terms that
we got as input—and this is done by doing query-focused summarization—over a very large number of
articles. We will typically have at least a thousand in our input, and from that, we
have to go down to a single sentence which will tell us what the concept is. For tissue engineering, this
gives you an idea of what our output looks like. This is a very small portion of it—it’s actually relatively
long, and we’re continuing to work on this. Here we have: “we predict a prominence of point seven
three.” We give information about the most prominent indicators, and then we provide some
description about each one of them, so “overall the sentiment in the set of documents is
overwhelmingly objective,” which …
>>: Sorry, so … question … so yeah. So if you were to compare the result that you got from
summarization of—you know—of this lot, compared with the Wikipedia summ … you know, kind of
active. Is there any preference for subjective … people to look for them [indiscernible] supposed to do is
summarize …
>> Kathleen McKeown: Yeah, we did not … when we run the system, we do not have access to … we
only have access to the scientific documents. But we could use that as a sort of ground truth for how
well it’s done. We actually do use Wikipedia to construct an ontology of the concepts that we’ll have
and we use that in the …
>>: After [indiscernible] these examples, I probably would think this is as good as Wikipedia.
>> Kathleen McKeown: Uh huh. Well, we haven’t measured it that way, but it would be a good way to
do it. I’ll just say a word about our summarization approach here. It’s an unsupervised approach; we
begin by selecting non-subjective sentences from the text, and we use the argumentative labels to do
that—so background sentences. We then rank by similarity and centrality in a similarity graph, so
sentences appear at the nodes, and they’ll be connected to sentences that are more lexically related,
and then we choose the most central of those, and then we re-rank the top candidates by using
definition-specific heuristics.
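A rough sketch of the centrality step, in the spirit of LexRank, assuming the subjective sentences have already been filtered out using the argumentative labels; the sentences and the crude definition heuristic below are illustrative.

```python
# Centrality ranking over a lexical similarity graph, then a simple
# definition-oriented re-ranking of the top candidates.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

background = [
    "Tissue engineering combines cells and scaffolds to restore tissue function.",
    "The field of tissue engineering emerged from biomaterials research.",
    "Scaffold porosity was varied across experimental conditions.",
    "Tissue engineering is the use of engineered materials to repair tissue.",
]

sim = cosine_similarity(TfidfVectorizer().fit_transform(background))
G = nx.from_numpy_array(sim)
centrality = nx.pagerank(G)
ranked = sorted(range(len(background)), key=lambda i: -centrality[i])

# Definition-specific re-ranking, crudely approximated by preferring copular
# "X is ..." sentences among the most central candidates.
top = ranked[:3]
definition = next((background[i] for i in top if " is " in background[i]),
                  background[ranked[0]])
print(definition)
```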
So what have we learned? Well, different forms of summarization are definitely needed when we
change genres. We want information about terms and justifications, and while I didn’t talk about it, we
have looked at this in medical scientific articles as well, where we needed work that was tailored to the
reader, whether a physician or a patient. We have information we can exploit which we did not have in
news: the structure of the article itself and the networks—so we have explicit networks based on
citations, and sentiment toward cited work plays a role.
So I’m gonna turn now to our work on online discussion forums, and this would be the kind of data that
we might have available to answer questions. Twitter is one source, but we’re also looking at online
discussion forums, where we would have a bit more information. So how is online
discussion different from the kinds of genres we looked at so far? First, it provides an unedited
perspective from the everyday person, so the language is going to be more informal. It’s often in the
form of dialogue, so we will have some back-and-forth about what people are saying. It contains a lot of
opinions, viewpoints, emotion. And of course, the language of social media is not the same as the
language of news.
We’re looking at being able to answer questions about events, so this would be—and we’re looking at,
sort of, these open-ended questions—this would be a case of what we also refer to as query-focused
summarization. So here’s one case. This was in the course of developing the system. One answer that
the system generated at one point in time—we don’t always get such nice answers.
>>: So this is actual system output as well?
>> Kathleen McKeown: This is. But it’s …
>>: That’s actually pretty cool. It’s almost like novel-style writing, you know? Like something literary …
>> Kathleen McKeown: Yes, but this is one where it chose a larger chunk. So a lot of it comes from the
same chunk, and I can’t claim that all system output looks like this. Here, what is the effect of Hurricane
Sandy on New York City? And you can see: “It’s dark. There’s minor price gouging. There are
restaurants selling hot food through their bay windows. The police are doing an amazing concern … job
with traffic concerns. Many stores have set up recharging stations.” That was actually kind of
interesting when you were there, because when you walk through that dark part, it looked like there
were fires and people were all huddled around them, and when you got close, you saw that it was
actually electrical outlets with … the glowing was from the phones. One of the things that is hard
here is that very often there’s no word overlap between the question and the answer. So we had
“Hurricane Sandy,” “New York City,” “effect”—none of that appears in the response, so we need to be
able to do some inference about how things are related.
So our approach has been to start with a small amount of manually-annotated seed data, where we
have query-sentence pairs—where the answers were … we used Amazon Mechanical Turk to get
sentences from the documents that were actually related—we then augmented this with nine years of
unlabeled data from Newsblaster, and we made the assumption that the summary headline was a query
about the event, and the summary was approximately an answer. Of course, it would contain many
sentences that are relevant to that query, and a small number of irrelevant sentences. And then we also
looked for query-answer pairs on sites like Yahoo Answers or Quora. Then we developed this semi-supervised method that used multiple simple classifiers, for example: keyword overlap, named entity
overlap, and we experimented with different kinds of semantic relatedness. We also have expanded
that earlier approach to use more unlabeled data that we can get on the web, and so we’ve gone to
features that we draw from DBpedia. So we have over one point eight billion facts that we extracted …
that were extracted from Wikipedia info boxes. These have been encoded in the semantic web, and you
can see we have there sort of triples here, and this lets us see relationships, for example, between
Hurricane Sandy and location, which helps us to determine—in some cases—relevance. So our goal
here, in moving forward, is to be able to decompose articles or online discussion into a main event—this
initial impact of the hurricane—and sub-events—what happened afterwards: the Manhattan blackout,
Breezy Point fire, public transit outage.
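A minimal sketch of the “multiple simple classifiers” idea for judging whether a sentence is relevant to an open-ended query; the overlap features and thresholds are illustrative, not the trained system.

```python
# Two weak relevance signals (keyword overlap, named-entity overlap) combined
# by a simple vote. A real system would learn the combination and add
# semantic-relatedness features, e.g. drawn from DBpedia triples.
def keyword_overlap(query, sentence):
    q = set(query.lower().split())
    s = set(sentence.lower().split())
    return len(q & s) / max(len(q), 1)

def entity_overlap(query_entities, sentence_entities):
    q, s = set(query_entities), set(sentence_entities)
    return len(q & s) / max(len(q), 1)

def is_relevant(query, sentence, q_ents, s_ents,
                kw_threshold=0.2, ent_threshold=0.5):
    votes = [keyword_overlap(query, sentence) >= kw_threshold,
             entity_overlap(q_ents, s_ents) >= ent_threshold]
    return sum(votes) >= 1

query = "What is the effect of Hurricane Sandy on New York City?"
sentence = "Lower Manhattan is dark and many stores set up recharging stations."
print(is_relevant(query, sentence,
                  q_ents=["Hurricane Sandy", "New York City"],
                  s_ents=["Lower Manhattan"]))
```

Run on the Sandy example from the talk, both overlap signals miss the relevant sentence and the function prints False, which is exactly the inference gap that the DBpedia-style relatedness features are meant to close.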
Of course, you may ask, if we’re constructing answers like this from online discussion, when
can we assume that an individual post—or pieces of it—is reliable enough to be able to answer a
question. One factor in this is influence, and that’s something else that we’re looking at in the context
of online discussion. So the research question is: we want to be able to detect online influencers; what
conversational features are important towards that task, and how can we identify situational influence
that is made apparent by the conversation, not by the links between who follows whom? So we
don’t particularly want to identify Justin Bieber, for example. So in our work, an influencer is somebody
whose opinions or ideas profoundly affect the listener or the reader. We’re doing some of this work in
discussion forums like Wikipedia discussions—so these are online discussions that take place between
Wikipedia editors, and there’s a lot of sort of back-and-forth about how editing should
take place. So here we can see we have a conversation: we have a person who makes an
initial post, a responder who replies to that, and so forth. And the discussion here is about
whether Ahmadinejad was lying about having served in the Iran-Iraq war, and he’s recommending
working a certain piece of information into Wikipedia. The woman replies, “That’s a very weak source.
I’d like to ignore it,” and the original poster agrees: “Thanks, I guess we’ll have to wait and see.” So here
the influencer is the woman.
We’re doing this with cascaded machine learning, so we have a number of features at the right, some of
which are fairly complex, and we need to learn them themselves, and then those pass up into a machine
learner for influence. So if we look at a couple of examples for dialogue patterns—let’s say we have a
structure of posts like we see on the left—our pattern—here, the feature irrelevance—would tell us that
posters … posts that have no replies, and which therefore seem irrelevant, are coming from people who
are less likely to be influencers—that would be our intuition, and it’s one feature—whereas someone
who takes the initiative or incites a lot of response would be more likely to be the influencer. Now,
agreement and disagreement is another factor in that, and this is something that we also worked on
fairly early with Michel Galley, and we’re continuing to look at. It’s hard; you can see here with
disagreement: “That’s a very weak source. I’d ignore it”—at no point in time does the speaker say no.
So we have developed a machine learning component which will look at the kinds of
features that we need in order to determine whether we’ve got agreement or
disagreement—sentiment plays a major role in that.
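An illustrative sketch of two of the dialogue-pattern features just mentioned (posts that draw no replies, and replies incited), computed over a toy thread; the real system cascades learned components such as agreement and disagreement into a final influence classifier.

```python
# Per-author dialogue-pattern features from thread structure. The thread
# itself is invented.
from collections import defaultdict

# (post_id, author, parent_post_id or None)
posts = [
    (1, "A", None),   # A starts the discussion
    (2, "B", 1),      # B replies to A
    (3, "A", 2),
    (4, "C", None),   # C's post draws no replies
]

replies = defaultdict(int)
for _, _, parent in posts:
    if parent is not None:
        replies[parent] += 1

features = defaultdict(lambda: {"posts": 0, "no_reply_posts": 0, "replies_received": 0})
for pid, author, _ in posts:
    f = features[author]
    f["posts"] += 1
    f["replies_received"] += replies[pid]   # responses the post incited
    if replies[pid] == 0:
        f["no_reply_posts"] += 1            # the "irrelevance" signal

for author, f in features.items():
    print(author, f)
```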
So what have we learned here? In our work so far, we do significantly better than a baseline, but
detecting influencers is really hard. If we look at F-measure, we … we’re still … we have a long way to
go. We can gain intuition about language use in social contexts, and so for Wikipedia, we found that
agreement is more useful than the dialogue patterns, but in some of the other online discussion that
we’ve looked at—some blogs—it’s the other way around. And we can validate some hypotheses we
have about which conversational features are more important for different genres. Yes?
>>: Are you ignoring the up-votes and down-votes in various discussion media, where viewers can give a
posting a plus- or a minus-vote? ‘Cause that’d be an easy one.
>> Kathleen McKeown: Yes. In the … in this particular case—the data that we’ve looked at—yes, we did
ignore that. We have gone on to look at … each of the forums is different in the kind of information
that you can get—right now, we’re looking at CreateDebate, which
is sort of interesting because in the data, you have pros and cons, and so from that, we can easily see
agreement and disagreement, and we can gather a lot of data about it. So follow-on commenters are
typically …
Okay. So going forward, the question that I showed is a kind of question that you might normally get
when you’re looking at the news. If we were looking at other kinds of questions that we might want to
pose when we’re looking at online discussion forums, they’re of a different nature—so what is the
reaction? And if we look into the blogs, we can see things like emotion plays a role: “I’m still speechless
at the widespread damage.” How do people expect Sandy to impact the election—here, we have a lot
of opinionated information: “I can only imagine how this will make the nightmare of voting even worse.”
And then, experiential questions, so it gives us the opportunity to look at a lot of different kinds of
language analysis than we do when we are looking at news.
So I’ll move now to our work on personal narrative. The Autobiography of Malcolm X would be an
example of the kind of thing, but not so long. We’re not looking at novels here, we’re looking at short,
online … But we’re looking at things that provide a coherent telling of a story; they have a component
that is particularly compelling—almost shocking—in terms of what happens, so it’s grabbing your
attention. Unlike the online discussion forums, we typically have a monologue, but like the online
discussion forums, we have informal language. So here is an example of what we would get, and we can
sort of chunk it into different areas. So in the front, we have some information that orients us as to
what was going on: “We were sitting down to a late-night dinner on Monday night, when the storm was
supposed to hit.” We then have sort of a sequence of events that is happening, and we end up with this
very compelling, almost shocking element to the story: “He went upstairs to get a tool, and in those few
seconds, ocean waves broke the steel door lock, and flooded the basement six feet high in minutes.”
So our goal in this work is to be able to look at these kind of narratives that occur online and predict
when a narrative is interesting. When will it go viral? Or when should it be selected as part of a larger
story? And we’re approaching the work initially by analyzing the narratives through William Labov’s
theory of narrative. And this theory proposes different structures that should be present in a narrative
which will be interesting. We expect to find background information; we expect to find a series of these
complicating actions; and most importantly, we expect to find this reportable event—this sort of really
compelling, shocking statement about what happened. I’m not gonna say much about how we’ve done;
we’re still at very early stages, but we have developed a supervised approach to be able to identify a
structure that is consistent with Labov’s theory, and we’re getting about seventy point four F-measure.
You’ll note that the kind of labels that we’re putting on structure—I don’t … if you’re familiar with the
discourse relations that come from the Penn Discourse Treebank—it’s different because we’re not
looking at relations between sentences, rather we’re labelling blocks, but we do use the relations from
the Penn Discourse Treebank to help us in that process.
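A toy sketch of supervised block labeling in the spirit of the Labov analysis, using scikit-learn; the training snippets and labels are invented, and the real system also draws on Penn Discourse Treebank relations as features.

```python
# Classify narrative blocks as orientation, complicating action, or the most
# reportable event. Training data here is invented and far too small to be
# meaningful; it only illustrates the setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_blocks = [
    "We were sitting down to a late-night dinner on Monday night.",
    "The wind picked up and water started coming in under the door.",
    "He went upstairs to get a tool.",
    "In those few seconds, ocean waves broke the steel door lock and flooded the basement.",
    "It was a quiet evening before the storm was supposed to hit.",
    "The waves tore the dock apart in front of us.",
]
labels = ["orientation", "complicating_action", "complicating_action",
          "most_reportable_event", "orientation", "most_reportable_event"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(train_blocks, labels)
print(clf.predict(["Waves broke through the door and flooded the basement in minutes."]))
```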
‘Kay, so let me close by looking at our work on narrative. And here, we have looked at a corpus of
novels that come from the nineteenth century—we chose them because they’re available online
through Google Books. If you look at the language of novels, again, you’ll see it’s
quite different. We do have a lot of conversation with different people speaking, and often talking to
each other. So here: “‘What is the matter?’ I cried. ‘A wreck! Close by!’ I sprung out of bed, and asked,
‘What wreck?’ ‘A schooner from Spain or Portugal, loaded with fruit and wine.’” Hey, so we’re using a
corpus of novels from the nineteenth century, and we’re working together with faculty from
comparative literature. And we looked at whether we could use the analysis of novels in order to decide
what theories of … literary theories are of interest—we discussed what collections we could work with
in order to provide evidence for or against them—and there has been quite a bit of work in the
comparative literature on these various theories, but using what is called a close read—one or two
novels in a lot of detail. So what we wanted to do here was use what is called a distant read—look at a
lot of novels to make our conclusion.
So we were looking at whether we could provide evidence for or against literary theory; we did this by
using social network extraction from literature, and as I said, this corpus of nineteenth-century British
literature, where the network was based on the conversations that happen in the quoted speech. So we want
to have a method where we identify who talks to whom, and then extract features from the graph to
evaluate hypotheses about literary theory. So each node would correspond to a character; a link will
come between them if they’re talking to each other. If we looked in literary theory, the kinds of things
that this particular person in comparative literature was interested in was a hypothesis that had been
developed that said that as the novel moved from rural settings, with a very small number of characters,
to the cities, with a very large number of characters, then the network tends to be less connected. And
you can see this in quotes from various people: Franco Moretti, who said, “At ten or twenty characters,
it’s possible to include distant or … and openly hostile groups;” Terry Eagleton, who says, “In a large
community, most of our encounters consist of seeing rather than speaking, glimpsing each other as
objects rather than conversing.” So we want to look at whether we can show empirically that
conversational networks with fewer people are more closely connected.
So to construct the network, the nodes represent people who said something, and we did this
work—published at AAAI—on quote attribution, which we could do with eighty-three percent accuracy. So the
idea is: can you identify for a particular quote in a novel who said it? Even that is not trivial. And our
edges are people who are talking to each other, and here we use quote adjacency as a heuristic for
detecting conversations. So … I should mention that even goes to the point—often you have in novels
where you have conversation where it alternates between one person and the next—you have no
identification of who’s speaking, but you’ll have a quote, quote, quote, and so that was something that
we could pick up. And then we set the edge weight to the share of the detected conversations, so we
sort of have a … information on how much talking they did. We’re able to identify these links with very
high precision, but only fifty percent recall, so that’s something that could still use some more work.
Nonetheless, as you’ll see, this should work against us in what we aim to provide.
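A hedged sketch of the network construction: speakers become nodes, adjacent attributed quotes by different speakers add a weighted edge (the adjacency heuristic), and the weights are normalized to each pair’s share of detected exchanges. The quote attributions below are illustrative.

```python
# Conversational network from attributed quotes, via the adjacency heuristic,
# plus the graph statistics used to test the literary hypotheses.
import networkx as nx

# Attributed quotes in document order: (speaker, quote). Attributions invented.
quotes = [
    ("David", "What is the matter?"),
    ("Ham", "A wreck! Close by!"),
    ("David", "What wreck?"),
    ("Ham", "A schooner from Spain or Portugal, loaded with fruit and wine."),
    ("Peggotty", "Heaven help us all!"),   # invented, to give a third node
]

G = nx.Graph()
adjacent_pairs = 0
for (s1, _), (s2, _) in zip(quotes, quotes[1:]):
    if s1 != s2:
        adjacent_pairs += 1
        weight = G.get_edge_data(s1, s2, default={"weight": 0})["weight"]
        G.add_edge(s1, s2, weight=weight + 1)

# Normalize edge weights to each pair's share of the detected exchanges.
for _, _, data in G.edges(data=True):
    data["weight"] /= adjacent_pairs

print("density:", nx.density(G))
print("triangles:", sum(nx.triangles(G).values()) // 3)
```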
When we look at the network size, as the number of named characters increases, given our hypothesis,
we would expect to find same or less total speech. And in fact, we found that, but with a weak “yes”:
the number—normalized number—of quotes was flat. We would expect to find a less lopsided
distribution of quotes among speakers, and yes we did find that: the share of quotes by the top three
speakers decreases. As the number of named characters increases, we would expect to find lower
density, if our hypothesis is true—that is, each person would have fewer conversational partners as a
percentage of the population—and we did not find this. We found that larger networks are more
connected. We would expect to find the same number of cliques or fewer—these smaller groups that actually
converse with one another—and we did not find this—in fact, we found that the three-clique rate
increased and larger networks form cliques more often. As the number of speakers increases, we
would—so that was with named characters, now let’s look at just the people who are speaking; perhaps
that would give us the information we were looking for—so as the number of speakers increases, we
would also expect to find less overall dialogue—this is this glimpsing rather than speaking—and we find
that not to be the case. Larger networks are more talkative. And we would also expect to find lower
density, and again, we find that not to be the case: in larger networks, speakers know more of their
neighbors.
We did find an alternative explanation in the data that we were looking at, and that was text
perspective, which dominates the network’s shape. So in third-person tellings, as opposed to first-person, we did find significant increases in the normalized number of quotes, the average degree—that
is, the number of people each person was speaking to—the graph density—that is, the
percentage of the population that we saw them speaking to—and in the rate of three-cliques, and this was with no significant difference in the number of characters or speakers. Our
hypothesis here is that, when we have a first-person narrator, they’re not privy to the other characters’
conversations with each other; they see things only from their own viewpoint. And we can see this in
the graph, so this is a network from a third-person narrative novel, Jane Austen, Persuasion, and we can
see it’s—well, Anne is talking to a lot of people. When we go to what is called a close third—so it’s told
in the third-person, but from the perspective of one character—we can see the shape changes:
Robert dominates, and most of the links are between Robert and other people, we’re not seeing who
else they’re talking to. And when we go to the first-person narrative, we get a dramatic change, so
everything is seen through the “I.”
So what have we learned here? Well, we’ve learned that high-precision conversational networks can be
extracted from literature, and I think that while the natural language community has avoided looking at novels,
the time is right now to do more exploratory analysis of fiction. And we’re beginning by combining the
work that we’ve done by looking at the social function of characters and see how that plays a role. ‘Kay,
so to conclude, we’ve looked at a lot of different sources of data and what we can do with them. Our
goal is actually to do some integration—I’m not sure yet how novels fit in, but we’d like to bring the rest
in, and we’re at early stages with personal narrative.
So I’d like to show you now a mock-up of where we want to go with this. Let’s see if I can … [pause] of
course, things that work when you start … [pause] I may have to just do it like this. [pause] I can see I
don’t have … well I had this all set up, but … we may just have to go through it by hand. Here is a sort of
timeline of Hurricane Sandy, where we have the introductory information about Hurricane Sandy
approaching, and then we move through to provide information along a time and space, which will tell
us about what hap … what has been happening. So here, we have the personal narrative, the
compelling event. We then move along, Tuesday eight PM, where New York City has become dark,
again drawing from social media. A little bit later, a conversation about postponing the vote. And finally, a
month later, where people are beginning to work on the impact of it, and … so drawing from scientific
articles. ‘Kay? So to conclude, that’s our goal, and I’m, at this point, just ending with showing you a
picture of our research group. So thank you. [applause] So … I don’t know if there are any questions.
Yes?
>>: I’d be really curious what kind of reaction you got from the literary theorists, because they can be
sometimes a little detached from reality. So I wonder how they react to—you know—empirical
[indiscernible]
>> Kathleen McKeown: Well, I was … we were working with the chairman of the comparative literature
department at Columbia, and he was really interested in this. In fact, he did not worry that our evidence
came out against the theory. He found that very interesting, and I have to say the evidence is only
against—it doesn’t disprove it, it just provides some information that suggests that it may not be as true
as people thought. The field of comparative literature is moving towards doing more empirical analysis
of texts; Franco Moretti at Stanford—who provided one of that … those quotes—that is what he does.
They tend to use less sophisticated tools than you have in natural language processing, so—you know—I
think there’s a lot of room for interaction. And since we did that work, the department at Columbia
hired a faculty member in computational English, and I thought—you know—how cool to have
Columbia, who is so conservative, hire in that area. And he’s a person who actually does use fairly
sophisticated tools—he does topic modelling and, you know, various other kinds of things. Yeah?
>>: So there—I mean, just for confirmation—there is—I don’t know—nineteenth century, what we’re
studying, right? So I’m thinking we show author writes in the second person: “you did that,” you … I
mean, basically puts you there. I mean, nothing of that would … if you looked at different authors, the
styles … how much more of a difference there is between—I know—authors by themselves versus the
type of novel: with lots of characters, fewer characters …
>> Kathleen McKeown: There’s a whole lot you can do. I mean, so you could look … we talked about a
number of the different things that we could look at. For example, we talked—at one point before
starting—about whether we would look at difference between genres, and of course, when you’re
working with someone in comparative literature, they have a lot more nuanced view of the genres than
I do. But there’s mystery novels and whether … but we didn’t have enough data on each kind of genre
to do that, and that was one of the reasons—you know, you need to be able to get a large enough
corpus that is available online. We had about sixty thousand novels from that time period.
>>: It’s a very interesting presentation. One of the things that I wanted to ask you—for your insight
about—is: literature is very structured in the way that it … and very little sarcasm and very much
normalized. A lot of the—sort of—the social postings today—like Twitter or stuff like that—there’s
facetiousness, a lot of sarcasm, and a lot of negativity. So I want to ask you for your thoughts about how you
think this type of approach may work in—sort of—the newer social media types of language.
>> Kathleen McKeown: Well, we are … I mean, we are working with social media, and different portions
of the work that I talked about, we use different techniques. So what we’re doing with novels, we’re not
doing with online discussion forum. In our work with online discussion, sentiment analysis plays a big
role. For example, whether somebody has influence can depend in part on positive or
negative reaction to what they’ve said. There is work at Columbia going on on detection of sarcasm;
we—my group—is not yet using it, but it’s being done by Weiwei Guo in … with Mona Diab and—there’s
another researcher, Smaranda Muresan—who are looking at that and working together as a team.
So it’s something that we could eventually fold in. I think sarcasm is hard to detect, but of course, it
negates whatever sentiment—you know—the person has expressed, so it’s important.
>>: Did the other … I’m sorry to follow a question like that. Do you have any insights about loss of
context when the—sort of—the message is so short? Because a lot of the things—you mention these
things—depend on knowing all the stuff around it. When you have tweets that are so short and no
context around it, it sometimes can be very difficult to determine, like, what are they even talking
about? Do you have any …
>> Kathleen McKeown: Yes. So I have to say: that’s why I stress a little bit that we are looking more at
online discussion forums, where the context is longer. So in our work on influencer detection, we found
a number of very nice sites on political discussion. Even in our work on disaster, we find some. But with
… we have also done work on Twitter, and within Twitter, we’ve sort of focused on finding threads
where there is conversation. So you can see some back-and-forth as opposed to individual posts,
because otherwise, I do agree with you—like, I’m … obviously, I’m not of the younger generation, and
I’m not always sure of the value of Twitter. Although clearly in the context of disaster, it is important
when—you know—things first happen, and sort of to see the progression of events. Yeah?
>>: How do you—on the subject of influencers—how do you distinguish influencers from the trolls you also in general have in discussions?
>> Kathleen McKeown: Ah. That’s a good question—we don’t right now. I … and I can’t give you a good
answer about how we would do it. We’re having a hard enough time with influencers, so we’re just
assuming for the moment that that doesn’t exist.
>>: Been thinking about the research that just came out talking about trolls and how they’re aligned
with the dark tetrad of psychological attributes.
>>: Well … I was actually wondering whether analyzing the text to identify, that might actually …
>>: Yeah, so …
>>: Identify trolls for the future …
>>: That’s a question that I’ve been asking inside the science community—and so has she—about how
we might be able to identify and sideline the influencing trolls by recognizing: “gee, these are the kinds
of psychological models, and then what kind of language and behavior aligns with those?” I think that’s
a long time in the future.
I did have a question about the influence model.
>> Kathleen McKeown: Mmhmm?
>>: I was curious: the work you do, is it normally in online discussion forums within one forum?
Because there is—when you evaluate the influence and the authority of someone—if they come from
Reddit, I evaluate them very differently to if they come from somewhere else. So I’m curious about …
>> Kathleen McKeown: So, we are looking across different kinds of discussion forums. I agree they can
be quite different. We had earlier started out with live journal blogs, mainly because they had a lot of
metadata available about the poster, so we could get more information. We then moved to Wikipedia
discussion forums—those are very different. We’re now looking at online political discussion forums,
and we have worked some with Twitter. And part of what we’re doing is a bit of domain adaptation, so
with our different features, which are learned, we can … as we … we test, as we move from one genre to
another, how well it … how well we do. For example, do we need to retrain on the new genre? Can we
use a combination of training material from both? And so forth. But we haven’t looked specifically at
what you’re raising now, which is where the person’s natural habitat is. Yeah?
>>: In your summary—the Libyan airliner—what percent of your readers would read “air force” and
assume it’s US Air Force?
>> Kathleen McKeown: Why did we drop that?
>>: Right.
>> Kathleen McKeown: So in other words, why did we drop “US Air Force?”
>>: Italian, pardon.
>> Kathleen McKeown: Um … you know, I can’t give you a logical rationale as to why. It’s learned, so it
happens in the data that we’re looking at; when people did summaries, they dropped “US Air Force.”
>>: They dropped “US Air Force,” but did not drop “Italian Air Force.”
>> Kathleen McKeown: Yeah. So it may have been assuming a US readership. I think all the people
doing the data—the majority of the people … this came from data also that was done by people … must
have been from the US. Yeah?
>>: So the analysis that you show on the literature, constructing the network of the people who populate it—what kind of practical application do you have for this kind of construction?
>> Kathleen McKeown: We don’t.
>>: No? It might help to the … for the reading of novels—you know, big novel … people get confused,
you know [indiscernible]
>> Kathleen McKeown: Yeah. I—sort of—I—you know—I don’t do patents as much as I should, if at all.
And I know it’s very different from being in a company. We actually end up putting a lot of our work
online for free. I have worked with our patent office, and it was enough to make me decide no.
[laughter] I don’t have to do it, I’m not going to.
>>: You know, sometimes compression works, so … you stood … you started from the dot bay—is that
right—from the Pyramid evaluation. So have you looked at what would be needed if you’re moving to a
different language or a different domain than news. So what would you need now from news—
annotated—to have similar quality, because that kind of data it won’t graft—right—because you have to
have all these systems running—you know—and human evaluators getting into the loop, and it’s a big
effort to have—sort of—data to use …
>> Kathleen McKeown: I … right. I think we need to move more toward semi-supervised and
unsupervised approaches. Either that or also using data from the web that we can find that can serve.
So our work on making use of all of the Newsblaster summaries to serve as question-answer pairs, which
is—you know—it’s not as accurate as if you did it with human annotation, but it’s huge. And so—you
know—that can compensate for some of the errors that are there. Now, for … we have thought a lot
about how to find good data that we don’t have to sit and annotate to get; nothing is perfect. So we
went with the Pyramid data, which actually is quite nice. We thought about and have looked at using
simple Wikipedia, where you go from longer to shorter sentences—which was partly okay, but wasn’t
clean data, like we were having to do a lot of work in figuring out which were good pairs and which were
not. In previous work—this is medical—where we’ve gone from—you know—Reuters posts, news
releases—when a journal article is released—and so there you have a pair with a short lay version of the
new … of the journal article. It’s helpful—but again, not perfect, because the data’s sparse. So I think
that would … that’s the direction that we’re going in, is: how can we look to find the kind of data that we
need as opposed to annotate it? Okay? Thank you very much. [applause]