
>> Michael Gamon: It's my pleasure to introduce Souneil Park, who is here today from Korea via Bellevue. He's in computer science at KAIST in South Korea, received his BS in computer science from Soongsil University in Korea and his MS in computer science at KAIST, and just finished his Ph.D.

And he's done a lot of work on media and news. And today he's going to talk to us about detecting media bias.

>> Souneil Park: Thank you. Thank you very much for having me here. I'm

Souneil from KAIST, Korea. I'm very glad to talk about my Ph.D. work, which was about understanding the different views of news media and exposing readers to diverse viewpoints.

What I'm going to talk about today is the overall context and the big picture of this work for the first five minutes; then I'm going to focus on two papers which I believe are relevant to this group.

Although a lot of the material in this work is about dealing with traditional news media, I believe the results and ideas provide many implications and could be extended to social media. So I hope you all enjoy the talk; if you have any questions, feel free to interrupt.

This project started by taking a critical view of news media, focusing on how even the same reality can be delivered differently depending on the news producer.

This has a great impact on readers' cognition. Let's think about an example together. This is a pretty old example, but I have to use it; it's a really symbolic one.

There are two pictures of a summit. When you first see these two pictures, you probably think they just capture different moments.

Two different moments out of hundreds of other pictures taken that day. But when I saw these two pictures on the front pages of the news media, it was really funny to me. To give you some context: the previous president of Korea was considered a very liberal person, and an important issue that divided liberals and conservatives in Korea was the attitude towards the U.S., whether to be more independent or to try to maintain a more friendly relationship.

And the newspapers on the left are very conservative news media that consistently try to emphasize that there is a problem in the relationship with the United States and that this president is causing many conflicts.

And it's an amazing coincidence that they all used the same picture. They picked this picture out of thousands of others, the picture which shows a fist and the stern face of President Bush. Now, do these two pictures seem to communicate the same feeling? They're not just about different moments; they're about different impressions. They probably selected this on purpose, in order to shape readers' cognition in the same way.

Our project aimed to design a news consumption space where people can discover these different views of news media and have a more comprehensive understanding of reality, under the belief that having access to diversity and considering it would ultimately be more positive for the community.

And the reason I think this area can generate many research issues is that the news industry has its own well-developed process that has evolved over a long time.

They have their own system, their own culture. So there's a lot to look at.

Also, regarding this issue of bias, research in mass communication says this is a structural, persistent problem of the media industry rather than a problem of specific journalists. I'll explain by quoting one experienced journalist, who described the news production process itself as a continuous evaluation process. As he said, during news gathering they choose what events and topics to cover.

During news writing, they choose the style of coverage. During news editing, they choose the style of presentation: which article should go to the top, which photo to select.

For each decision, making a fair choice is never easy, because they're always influenced by various factors: their own political and ideological views, external forces, owners, advertisers.

So a news article can simply be seen as a document written at a desk, but if we look at this entire process, we can see what kinds of activities happen before the actual writing. And this generates many research issues.

For example, a lot of work on news article analysis, opinion mining, and sentiment analysis narrows its view down to the writing style: the expression of opinion or sentiment in text. What this process tells us is that a viewpoint can also be expressed before writing, in the selection of topic or agenda. We commonly see different media pushing different agendas. And it can also be expressed after writing, in the way they create the presentation.

So the layout of contents. This stage has many factors beyond writing: location, space, the combination of stories in the layout. So taking into account the structural view of the news production process can much widen the scope of the research.

And our basic idea was to look at this problem from a reader's perspective. We thought it's possible to support readers in overcoming a specific medium's view, and this would mitigate the effects of bias to some extent.

And we thought an effective way to do this is to help readers always recognize the existence of different views and compare them.

So we tried to develop automatic news browsing tools that expose readers to diversity and change their daily news browsing behavior. And I think this is a really practical approach. There could be other approaches if we take a wider view across

multiple disciplines. For example, we can think about the efforts of journalists at the news production stage.

They try to observe journalism ethics and standards. They try alternative reporting formats: point-counterpoint, roundtable discussions.

But these efforts require significant change in the news production process, and success depends on individual journalists.

And after production, there are some researchers and experts who try to manually monitor and diagnose bias in content. There are some limited trials trying to measure or correct bias.

But the point I want to make is that increasing diversity would be a practical way compared to these approaches, because approaches like measurement and correction inherently involve difficult or vague questions, like defining what bias is or what unbiased news is.

It's a very difficult question, hard to answer with a clear definition because of its subjective nature. But it's still possible to say there is a difference between viewpoints, and to just reveal the viewpoints without judging whether a specific news item is biased.

By doing so we can avoid these difficult questions. And I think this also fits well with our goal. Our goal is not to assume an unbiased state and pursue it, but to encourage readers to have access to diversity and to consider it.

Does that make sense? All right. And before I finish the discussion of this big picture, I want to provide an overview of this research space and talk about the parts that my work covered. I think there are about four main dimensions that characterize this research space. First, the domain of article you deal with, because its characteristics are of course going to be different.

And second, the framing method. There are many ways viewpoints are expressed in news articles. Third, the development of techniques for news article analysis. And fourth, the design and evaluation of the user interface.

And my work first started by studying the most common type of article that we face in ordinary news browsing sessions, which is straight news, and the closely related framing method, which is fact selection. We then tried to develop a technique that can capture diversity for straight news articles, considering the different fact selections, and then studied the impact and user experience. And we later extended this work to deal with news articles in a specific domain that many people have interest in.

And I think a lot of future work can be made along each of these dimensions. For example, this research could be extended by covering another domain of article, like interviews or editorials, or by studying a different type of framing method, like photo selection or space allocation.

So I hope this work can be considered initial work that tried to open a big research space. Now, I'm going to narrow down to the actual work, which consists of three parts. The first part is about the system we developed for straight news articles, which serves as a basic framework implementing our approach, and the later parts are component technologies we made for more advanced news article analysis.

And in this talk I'm mainly going to talk about the first two parts. The system we developed -- I'm just going to use an example to help you make sense of this system. Imagine that your government announced a new property tax plan, and let's assume the news articles you get to see about this tax plan are like this.

This is an actual list of news articles captured from Google News Korea and you can see a lot of articles talking about the tax increase.

When readers are only exposed to these articles, they may instantly feel that the tax is going to be increased for their houses. But this is NewsCube. The articles that talk about the tax increase are clustered into a certain area, and the system provides an overview, a classified view of articles that provide different information about this plan.

We can find an article that covers the argument of the policy designer, who says this is the normalization of property tax and explains how the collected tax is going to be used for suburban areas. And there was also an article that says the plan doesn't affect houses in most districts at all.

And these different articles provide different viewpoints and information about this tax plan. And we also later observed that being aware of these different stories leads to different perceptions about this plan.

So the system was designed to encourage readers to browse through these different stories. In many news services, news articles are typically organized based on events. So when a user visits, they select and read one article on the topic, and then they move on to another topic.

It would be very rare, but even if the user wants to read different stories related to the same event, as Michael mentioned, they would have to expand the event and click a small link, and then they would find a lot of articles in a simple list interface. They'd have to go through it and try to find the different stories.

We tried to change this interaction sequence, so that whenever the user clicks an event, the system provides an overview of the different stories, and even when the user moves on to a specific article, the interface tries to keep awareness of the different articles that the user didn't access.

We conceptualized this way of news browsing as aspect-level news browsing, which means it helps readers browse through different aspects of the selected event that are not covered in a single story.

So the goal of the system was to lay a basic framework that can be used to test whether diversity can be achieved through automatic news browsing interfaces.

Although this is a single design goal, there was a wide variety of issues we had to deal with in development, the breadth of issues you get to deal with when you build an automatic system. So in the first part of this work we focused on characterizing these issues and finding a feasible solution for each one, rather than going deep and trying to find an optimal solution for a specific problem.

And among those, I'm mainly going to talk about three issues that we dealt with in this study. First, how the concept of diversity should be interpreted for straight news articles; second, how the classification method should capture this diversity; and third, whether this presentation of diversity had an impact on user experience.

Now, a fundamental issue is how this diversity should be interpreted for straight news, which is: what should the criteria be when we judge the difference between articles?

This is a very important but difficult question, because there can be many properties related to viewpoints, for example, the nuance of specific words, tone, style of writing. So we tried to find a solution that provides a basis, considering the characteristics of our data. Straight news is typically a form of news that mainly intends to deliver facts to inform readers and doesn't include much journalist commentary or opinion.

We tried to interpret this diversity from an information selection view, which means a different viewpoint would lead to a different selection of information.

As in these sample articles, although they cover the same event, the central theme articulated in the story, which we call the aspect, is different among articles. And this also fits well with a widely quoted definition of news framing in the mass communication literature, which says different selection of aspects is an important way of expressing viewpoints.

Well, we could consider other factors like tone or style of writing, but these factors are inherently subjective to some extent and have subtleties, so they're hard to interpret clearly. So we first focused on this different selection of aspects.

And the reason I believe this interpretation provides a basis is not just that it's related to a mass communication theory; we also verified the effectiveness of this interpretation experimentally.

We tried to see if people agree when they interpret diversity in this way, which means: if we give them this criterion, do people interpret it in the same manner?

We did two experiments. In the first one, we asked people to interpret the articulated aspect for a sample of articles. We asked them to select the core sentences that seem to describe the articulated aspect.

So if they interpreted it similarly, they would select the same set of sentences. And the second experiment was to see the agreement in classification, the agreement when they judged the similarity between articles.

So we gave them a set of news articles covering the same event, asked them to perform aspect-level classification, and then observed the similarity between the classification results.

And in both experiments there was a high level of agreement, which means interpreting diversity in this way is an effective approach. So the goal of the classification method should be to create a classification result that is similar to a manually created classification result agreed upon among different people.

And the second issue is how we should build this classification method. It is itself a big research problem; there's a lot of work on document indexing, feature extraction, and document classification.

We tried to build unique solutions specific to our data, because we deal with a specific type of document: straight news articles. We built them based on close observation of news reporters and news outlets and discovered some useful heuristics that can be combined with conventional NLP techniques. I'm just going to briefly summarize the key ideas here.

We observed how journalists write stories to understand how they put the key information in news articles. And we found this rule called the inverted pyramid style of writing, which says they put the key information first and support it in diminishing order of importance in the main text.

So this tells us where the important terms are likely to be located, and how important they are, based on how much they're supported in the main text.
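To make the heuristic concrete, here is a minimal sketch of position-based term weighting in the spirit of the inverted pyramid rule. It is my own illustration, not the paper's implementation; the decay rate and the whitespace tokenization are assumptions.

```python
from collections import Counter

def inverted_pyramid_weights(sentences, decay=0.9):
    """Weight terms by position, assuming the inverted pyramid style:
    earlier sentences carry the key information, and a term gains
    extra weight each time later text supports (repeats) it."""
    weights = Counter()
    for i, sentence in enumerate(sentences):
        position_weight = decay ** i  # earlier sentences count more
        for term in sentence.lower().split():
            weights[term] += position_weight
    return weights
```

A term that appears in the lead sentence and is then repeated in the body ends up with the highest weight, matching the intuition that the lead states the aspect and the body supports it.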

And for classification, we used this framing cycle. When multiple news outlets cover the same event, there's a certain pattern. This example shows the news articles that cover the same event: the number of articles published over time, where the color represents the aspect covered in each article.

As you can see, at the start of the event most of the news articles cover the same stories, and the covered aspects diverge over time. This is closely related to the news gathering behavior of reporters.

At the start of an event they tend to rely on identical news sources, such as press releases and news agency reports. And later, depending on their editing direction, they reach out to different sources, different stakeholders, experts, and as they do, the covered aspects diverge.

When we see this framing cycle from a classification view, we can see a long-tail-like distribution in the classification result: there's a big article group, which includes the majority of articles covering the early-stage aspect, and there are several small groups as a result of this divergence.

Well, I'm not going to explain the details of this algorithm.

But knowing this kind of distribution of the classification result in advance can support the classification task significantly.

I think an analogy can be made to latent semantic indexing. LSI tries to use the latent meaning behind the terms; what I tried to do is use this latent distribution of the classification result in the classification task.
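Since the talk withholds the algorithm itself, here is one plausible sketch of how a clusterer could exploit the expected long-tail shape: treat articles close to the overall centroid as the large head group (the shared early-stage coverage), then greedily form small tail clusters from the divergent remainder. The centroid heuristic and both thresholds are my assumptions, not the paper's method.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def long_tail_cluster(articles, head_threshold=0.3, tail_threshold=0.5):
    """Cluster articles on one event, assuming the framing cycle: one
    large head cluster of similar early coverage, plus several small
    divergent tail clusters."""
    vectors = TfidfVectorizer().fit_transform(articles)
    centroid = np.asarray(vectors.mean(axis=0))
    head_sims = cosine_similarity(vectors, centroid).ravel()

    # Articles close to the overall centroid form the big head group.
    head = [i for i, s in enumerate(head_sims) if s >= head_threshold]
    rest = [i for i in range(len(articles)) if i not in head]

    # Greedily group the divergent remainder into small tail clusters.
    clusters, assigned = [head], set(head)
    for i in rest:
        if i in assigned:
            continue
        group = [i]
        assigned.add(i)
        for j in rest:
            if j not in assigned and cosine_similarity(vectors[i], vectors[j])[0, 0] >= tail_threshold:
                group.append(j)
                assigned.add(j)
        clusters.append(group)
    return clusters
```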

If you have questions I can explain about the details later.

>>: Can I ask a question? I find this extremely interesting. I was wondering, is this a very general pattern that you observe across different types of news events, like, say, disasters or unexpected events versus expected events where there's a certain outcome? Or do you see differences?

>> Souneil Park: I observed this regardless of the news category, and it was quite general. But the difference I observed was with news events that evolve really quickly; this pattern is quite different for those, for example.

If the event transforms into a different event and evolves differently, then this spike shrinks and there's another big spike later. That was the difference I observed among these events.

And we also observed that applying these kinds of ideas can improve the performance. We measured the accuracy by comparing the classification result to the manually created classification result, which was obtained in the previous experiment.

Remember the experiment where we asked people to classify articles manually. We measured the similarity using the F-measure, and we observed that applying these ideas can improve the accuracy, the similarity of the classification result to the manually created classification results.
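As an illustration of this evaluation, here is one common pairwise formulation of an F-measure between an automatic clustering and a gold manual clustering. The talk doesn't specify which variant was used, so treat this as an assumption.

```python
from itertools import combinations

def clustering_f_measure(predicted, gold):
    """Pairwise F-measure between two clusterings of the same articles:
    a pair of articles counts as a true positive when both clusterings
    put them in the same group."""
    def same_cluster_pairs(clusters):
        return {pair for group in clusters for pair in combinations(sorted(group), 2)}

    pred_pairs = same_cluster_pairs(predicted)
    gold_pairs = same_cluster_pairs(gold)
    tp = len(pred_pairs & gold_pairs)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_pairs)
    recall = tp / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)

# Identical groupings give F = 1.0; disagreements lower precision/recall.
print(clustering_f_measure([[0, 1, 2], [3]], [[0, 1], [2, 3]]))  # 0.4
```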

And the last issue is the user experience study. These are crucial experiments, because they verify whether a news browsing interface can shape behavior and cognition. The experiment was twofold: we tried to see if the system encouraged readers to select and read more diverse stories, and later tried to observe whether that had an impact on readers' cognition.

Considering the behavioral aspect, we traced what articles people read when they used the system. We conducted a between-subjects study: one group used NewsCube, one used Google News, and one group used RandomCube, which has the same interface as NewsCube but with the articles classified randomly.

By comparing NewsCube with Google News, you can see if the difference in interface leads to different behavior. And by comparing NewsCube to RandomCube, you can see if a proper classification is necessary.

And we measured the number of articles read by each user and also the diversity of the read articles. To briefly summarize the result, the NewsCube users did read more articles and also more diverse stories, which indicates that if the interface more prominently displays the articles that give different information about the selected event, that definitely seems to increase the chance of a user selecting them.

And I think it's possible to speculate more about additional factors which may influence readers' article selection by comparing with another closely related work. There's a group at Michigan who did a very similar study: Sean Munson and Professor Paul Resnick studied the relationship between reader satisfaction and the amount of liberal or conservative items in a recommendation list.

And they tested some visualization techniques, like highlighting items or changing their order. And they reported somewhat contradictory results, saying reader satisfaction didn't significantly change depending on the visualization technique.

And I thought this may reveal the different motives of users, which would be another important factor. In their work they used politically explicit content: blog posts and political commentary. Participants also had a clear political preference. What the interface is asking is close to: I know you have a certain viewpoint, and here are items that express opposing viewpoints; what would you select?

But in our case we used straight news articles, which don't explicitly express political preference, and many of the participants didn't have a clear political preference.

And we also organized them based on topic. So what the interface is asking is close to: here are articles that give different information about the selected event, information that you don't know.

Although this may seem to be a subtle thing, I think it can be an important one that can lead to different interface strategies. For example, people may not be generous to content that expresses an opposing view, but they may want to read an article that gives different information they don't know about the selected event.

There could be a motive to seek opposing views, and there could be a motive to seek more context and information. And if many people show this kind of behavior, being generous to articles that give different information they don't know, the interface can try to increase diversity further by using primarily fact-based articles or organizing based on content, helping users find more relevant but new information.

And as such, I think it's going to be possible to discover additional factors and find more elaborate interface strategies. Coming back to our work, considering the cognitive aspect, we tried to observe the opinion development process more closely.

We recruited both people who have a clear political preference and those who don't. For each group, half of them used NewsCube and half used Google News, and we observed the difference between them.

And we did observe some promising points. I'm going to summarize them here.

Regardless of the existence of predetermined views, NewsCube users did show some difference from Google News users.

NewsCube users, when they used the system, instantly recognized that the news events they were reading are contentious issues that can be seen from different perspectives. Compared to that, the Google News users who didn't have a predetermined opinion tended to just follow the viewpoint of the article they read. This was the first difference. Another difference we observed in the interviews was that when NewsCube users explained their viewpoints, they brought up more detailed information compared to Google News readers, because they read more articles and became better aware of the event.

Another point was that when they explained their viewpoint, they brought up different views and commented on them, and on why they didn't follow those viewpoints.

I'm not saying that they accepted or agreed with the different viewpoints, but at least they knew them, brought them up, and commented on them.

We could see they were considering them. And I would like to conclude this first part by mentioning some interesting future research questions. Considering the behavioral aspect, I think it's going to be interesting to see if people consistently seek diversity or if they return to their original behavior in the long term.

And as I mentioned, it's going to be interesting to discover more interface factors and strategies. And considering the cognitive aspect, one interesting thing we observed was that some people were more confident about their viewpoint after being aware of different views, because they had considered them and had reasons why they picked their viewpoint.

But on the other hand, there were also people who hesitated to pick a specific position after being aware of different views. So I think it's also going to be interesting to see if there's a relation between greater exposure and paralysis or apathy.

And before I move on to the second part, if there are any questions I can answer them; if there's no question, I'll just move on.

Now, after building the system, we tried to focus more on a specific domain of articles that many people have interest in, and political news articles were certainly one important area. One motivation came from the responses in our user studies: many people expected a clear opposing categorization of news articles, like liberal versus conservative, but actually when you look at news articles it's difficult to put them in a clear opposing frame. Anyhow, people expect it, similar to the U.S.

Liberal versus conservative is a particular frame that people use to view politics.

Although this frame has many limitations, it's used in many studies: political discourse and political science studies use this frame to characterize discourse, and lots of studies on Internet content are based on this frame.

There are also many studies on characterizing people's political preferences. But the problem is that it's really difficult to interpret political orientation from news articles. Political discourse is very complex; it involves a wide variety of issues.

Politicians, parties, government, social policy, foreign policy, even religion. And the discourse is realized differently depending on each of these issues.

So let's take some example arguments about the healthcare reform debate reported in news articles. As you can see here, the text doesn't simply say "for healthcare reform" or "against healthcare reform." The articles address various topics around the reform, emphasize different facts, and articulate different arguments.

People who have knowledge about this issue can interpret whether the arguments are supportive of or negative toward the reform, but it's difficult to do that computationally. What makes it more difficult, in particular for news articles, is that the discourse changes rapidly: a new quote, a new fact, and people respond to that update.

So the discourse changes rapidly and continuously, on a daily basis or sometimes faster. Now, we can think of several directions of solutions for this problem. The first one would be, of course, an approach based on text analysis.

Actually, there are a number of works that employ this approach. They often pick some specific issues and conduct case studies on those issues.

But I thought this approach is somewhat limited for real-world scenarios. Here's how they do it: they pick an issue, use some data as training data, develop a classifier, and test it on other articles addressing the same topic.

Because the classifier is specifically trained on this specific issue, whenever a new issue appears you have to do this all over again.

It's a very costly process, especially when discourse changes as rapidly as it does in news articles. And another thing we can think of is using the metadata of news articles.

I get this question pretty frequently: can we identify the political orientation from the source information? And this is partly true. It's easier to find liberal stories in the New York Times and conservative stories in the Wall Street Journal. And we conducted a quick analysis measuring the portion of news articles published from sources which have a clear political preference.

Anyone want to guess? All right, no idea. It's about 26 percent. So for the remaining 74 percent it's difficult to directly infer the political orientation from the news source information. Well, this was done with news articles in Korea, so it might be different elsewhere. But anyhow. Also, regarding this 26 percent, the articles published from sources supposed to have a clear political preference: they don't always produce politically biased content. So we have to think about some more ideas.

And we thought there would be something valuable in the social annotations on news articles and took an approach using them, because they include the collective knowledge of people who are able to interpret the language and the issues, along with the political context that is required for political view analysis.

We specifically used the comments on news articles, actually the commenters. We thought there would be people who actively react to the political orientation of news articles and provide valuable feedback, because people tend to have strong and clear political preferences and can be characterized as either liberal or conservative.

And these political preferences are likely to be consistent over time and across issues. I'm going to show some example comments responding to an example news event, which was about the nomination of a prime minister in Korea by the Korean president. Well, the first comment, I'm going to translate it, says: the president is very smart; the nominee was considered a potential presidential candidate even by the opposition party, so the nomination looks bipartisan.

The second comment responds to something the nominee said about a government project, saying the nominee looks so naive, and if he's not naive, then he's being stupid. And the third comment responds to a story about a suspicion related to the nominee; it's basically swearing at the nominee. Anyhow, you can see the commenters understand the meaning of the nomination and express their thoughts in the comments.

>>: What was the medium? Is this a blog, or is it a site where people can leave comments?

>> Souneil Park: Yeah.

>>: Okay.

>> Souneil Park: So, excuse me. This gives a glimpse of our idea. Assume there's a person who actively leaves comments on many news articles. Let's assume she's liberal. There's a certain pattern to her behavior: when she reads a liberal article she tends to leave a positive comment, and when she reads a conservative article she leaves a negative comment.

We call this pattern the sentiment pattern: the pattern in the sentiment of comments responding to the political orientation of news articles. If this sort of pattern is quite common, this problem becomes much easier: just by identifying the sentiment of comments, we can predict the political orientation of news articles.

And identifying the sentiment of comments is likely to be easier than analyzing political orientation, because comments tend to be concise, people explicitly express sentiment, and they often include explicit words like "great," "worst," "bad."
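To illustrate why comment sentiment is the easier sub-problem, here is a minimal lexicon-based sketch. The cue lists are hypothetical, and since the study's comments are Korean, a real system would need a Korean sentiment lexicon and tokenizer.

```python
# Hypothetical English cue lists, for illustration only.
POSITIVE_CUES = {"great", "good", "smart", "right", "agree"}
NEGATIVE_CUES = {"worst", "bad", "stupid", "naive", "wrong"}

def comment_sentiment(comment):
    """Classify a short comment as 'pos', 'neg', or 'neutral' by counting
    explicit sentiment words, exploiting the observation that comments
    are concise and express sentiment explicitly."""
    tokens = comment.lower().split()
    score = sum(t in POSITIVE_CUES for t in tokens) - sum(t in NEGATIVE_CUES for t in tokens)
    return "pos" if score > 0 else "neg" if score < 0 else "neutral"

print(comment_sentiment("The president is very smart"))  # 'pos'
```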

So our study was to analyze the usefulness of this sentiment pattern from various perspectives: whether we can really observe such a pattern, how common the pattern is, and how regular it is. I think the contribution of this work is mainly in the detailed empirical investigation of commenters; follow-up work can then think about how to use this.

And I think understanding and characterizing this kind of behavioral pattern will also be useful in social media, such as tweets. These kinds of ideas can be applied to social media.

And for the dataset we used in this study, we collected all the data from [inaudible] News, which is a very popular news portal in South Korea. We first sampled two sets of news articles. The general set is made by crawling the political archives, which record major political issues.

The general set includes articles with many comments and also those with few comments. The popular set is made by collecting the 20 most popular political news stories of each day over a six-month period, and as the stories are popular, they all have many comments.

And from each set we extracted the top 50 commenters who left the most comments, and for each commenter we sampled 40 comments to trace their comment history.

And we observed this data with regard to three questions. These are the major points we need to understand to see how effective this approach can be. First, whether there are active commenters, those who leave comments on a large number of articles. If there are active commenters, it's possible to deal with a large number of news articles with a small number of commenters.

Second, whether they show a clear and consistent political preference, either liberal or conservative. Third, and most important, whether they really show this kind of regular sentiment pattern.

I'm going to summarize our observations regarding these points. We measured the number of articles covered by these top 50 commenters. As you can see here, the top 50 commenters cover most of the popular news articles. The coverage seems to be low for general news articles, but this is mainly because there are many articles that have few comments.

This red dotted line here shows the proportion of news articles that have at least five comments. So if we only consider these articles, the coverage is not low; it's actually significant.

And covering the articles with many comments is important, because news articles that are located in visible places, like the center or head of a Web page, get many comments, while the news articles that have only a few comments were located in unnoticeable places.

So the results indicate that these top 50 commenters are likely to cover most of the articles that people are likely to read. We also observed that these top 50 commenters stay active for a long time: among the commenters of the popular set, nearly 74 percent left comments for more than six months.

And during this period they also left comments frequently. We measured the time between two consecutive comments of a commenter, and it was less than three days in most cases. So with a 92 percent chance, they come back within three days and leave another comment.

The result was pretty similar for the general set, so I'm just going to skip it. And then we studied whether these commenters show a clear political preference. We observed the preference based on the positions they expressed about the issues they discussed in their comments.

For example, if the position was consistently conservative, we classified the commenter as conservative, and if the position switched at least once in our sample, we classified the commenter as a vague commenter.

And we observed that most of them do show consistency. For the general set, 54 percent were liberal and 38 percent conservative; for the popular set, 62 percent were conservative and 24 percent liberal.

>>: Gotta get going.

>> Souneil Park: Okay. This might be because -- maybe there are people who -- I'm not sure, but this is just a rumor. Anyhow, there's a story that the government hires some commenters to leave comments that support its agendas.

>>: I think that's true. [laughter].

>>: The conservatives don't think to look outside --

>> Souneil Park: Pardon?

>>: Or the conservatives only look at the things that are right in front of them. They don't think enough to go beyond that.

>>: But there are some higher [inaudible].

>>: I think that's probably not the case here. They hire sort of reviewers to review things on -- [inaudible] commercial. I think politically -- at least I'm not aware it's --

>> Souneil Park: This data was observed under the current conservative administration. And maybe you have a lot of conservatives in the popular set because maybe they target popular news articles to have access to a lot of people.

And the last and most important question was whether these commenters show a regular sentiment pattern. It's important to note that consistency of political orientation doesn't guarantee a regular sentiment pattern. Even if a person is liberal, the person may leave negative comments on any article regardless of the political orientation of the news article.

The person may always attack conservatives regardless of the content of the news articles. That's not the pattern we expect. What we expect is that they express different sentiments depending on the political orientation of the news article.

And we characterized the sentiment pattern with two types of commenting behavior: positive match and negative match. For example, if this woman is liberal, a positive match is to leave a positive comment on a liberal article, and a negative match is to leave a negative comment on a conservative article. If this woman were conservative, they would indicate the opposite.

We observed how frequently these positive matches and negative matches appear in the comment history. If they appear most of the time, that means the person shows a very regular sentiment pattern.

And for each person we measured the degree of regularity using conditional probability, and we observed that some commenters do show a pretty regular sentiment pattern. It's not universal, but there is some portion of people who show this kind of regular sentiment pattern.
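As one concrete reading of that measurement, here is a sketch that scores a commenter's regularity as the fraction of her comments that are positive or negative matches, i.e. an estimate of the conditional probability of a matching comment given the article's orientation. The paper's exact formulation may differ.

```python
def regularity(history, preference):
    """Fraction of a commenter's history matching the expected sentiment
    pattern, given her political preference. `history` is a list of
    (article_orientation, comment_sentiment) pairs, encoded here as
    'lib'/'con' and 'pos'/'neg'."""
    if not history:
        return 0.0
    matches = 0
    for orientation, sentiment in history:
        same_side = (orientation == preference)
        # positive match: positive comment on a same-side article;
        # negative match: negative comment on an opposing article.
        if (same_side and sentiment == "pos") or (not same_side and sentiment == "neg"):
            matches += 1
    return matches / len(history)

# A liberal commenter with one off-pattern comment: regularity 0.75.
history = [("lib", "pos"), ("con", "neg"), ("lib", "pos"), ("con", "pos")]
print(regularity(history, "lib"))
```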

And I'm going to talk about how we used these people who show regular behavior. We model each individual commenter as a base classifier, which takes the sentiment of a comment as input and outputs the most likely political class of the article.

It's really simple, a very simple model. And we conducted experiments to see the accuracy of this idea. We compared three methods: two versions of our idea and a typical text-based classification method. The difference between the two versions is how we analyze the sentiment of the comment. For the first version we did it manually, so this shows the upper bound of the proposed idea. And the text classification was a very typical one: TF-IDF combined with an SVM classifier.
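Here is a minimal sketch of that per-commenter base classifier: count, over the training history, how often each comment sentiment co-occurred with each article orientation, then output the most likely orientation for an observed sentiment. The class and method names are mine, following the description in the talk.

```python
from collections import defaultdict

class CommenterClassifier:
    """One commenter modeled as a base classifier: the sentiment of her
    comment is the input, the most likely article orientation the output."""

    def __init__(self):
        # counts[sentiment][orientation], estimated from the training history
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, history):
        for orientation, sentiment in history:
            self.counts[sentiment][orientation] += 1

    def predict(self, sentiment):
        candidates = self.counts.get(sentiment)
        if not candidates:
            return None  # no signal: this commenter never showed that sentiment
        return max(candidates, key=candidates.get)

clf = CommenterClassifier()
clf.train([("lib", "pos"), ("con", "neg"), ("lib", "pos")])
print(clf.predict("pos"))  # 'lib': her positive comments co-occur with liberal articles
```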

We first used 14 commenters who showed a regular sentiment pattern.

And as a result, the method achieved around 75 percent accuracy on both the general and popular sets, and both versions outperformed the text analysis method.

And we also measured the coverage of the proposed idea. Recall that we only used the small number of commenters who showed really regular behavior.

The coverage was about 31 percent for the popular set, but it was really low for the general set, which is mainly because there are many articles with few comments.

If we only consider the articles which have more than five comments, the coverage increases to 21 percent. And in the second round, we included more commenters who show a little less regular behavior, so we added 25 people and measured the performance. The accuracy drops a little as we include these less regular commenters, but it still achieves around 70 percent, and both versions still outperform the text-based analysis method.

And the coverage increases significantly: it covered almost half of the news articles. Another important experiment was about the learning curve. We analyzed the accuracy while varying the size of the training data from five samples per commenter to 20 samples per commenter.

And we observed the accuracy was quite stable regardless of the size of the training data, which means we don't need that many samples for training. This would much relieve the burden in usage scenarios.

And we also tested one simple extension of this idea, which is to use multiple commenters who commented on the same article. It basically aggregates the predictions of individual commenters. This could be more robust to irregular behavior by one commenter, because it can be complemented by the other commenters.

And this is the result, which is pretty straightforward. If we use more commenters, accuracy tends to increase, but coverage decreases, because more commenters have to have left comments on the same article.
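A sketch of that aggregation step, reusing the CommenterClassifier sketch above: every regular commenter who commented on the article votes, and the majority orientation wins. Majority voting is my assumption; the paper may weight commenters differently.

```python
from collections import Counter

def aggregate_predictions(classifiers, article_comments):
    """Predict an article's orientation from all commenters who left a
    comment on it. `classifiers` maps commenter id to a trained
    CommenterClassifier; `article_comments` maps commenter id to the
    sentiment of that commenter's comment on this article."""
    votes = Counter()
    for commenter_id, sentiment in article_comments.items():
        prediction = classifiers[commenter_id].predict(sentiment)
        if prediction is not None:
            votes[prediction] += 1  # one vote per commenter
    # Majority vote; None when no commenter gives a usable signal, which
    # is why coverage drops as we require more commenters per article.
    return votes.most_common(1)[0][0] if votes else None
```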

So I'm going to conclude this talk by discussing some of the potential future work.

This kind of study provides meaningful implications for other types of media, like Twitter or other social media.

First of all, I believe a similar pattern can also be observed in social media. But of course the way users reveal their preferences will be different, because in social media their identity is quite exposed. So the style of language may be different, people may have different motives, and they may try to recommend some content.

And they may show some different patterns. And I think this kind of behavioral pattern will also be observed in other cases, when people react to products, companies, brands, so it would be possible to extend this idea to these different domains. Also, I think more sophisticated modeling of this behavior would be possible. And we may also think about how we can use other social annotation data, like the Facebook Like button, blogs, et cetera.

And I've talked for about an hour, so this is the part I prepared. If there are any questions, I'd be glad to take them.

>>: I had a question. So kind of going back to the earlier part of your talk, I'll talk about another paper and hopefully it will come back around to yours.

>>: So there's some work from [inaudible] on cyberchondria. I'm not sure if you've seen this paper. It's about the tendency, when you're searching about medical conditions online, to very quickly think you have cancer or some totally horrible thing.

You start off searching for headache and before you know it you're reading about brain cancer and you're all paranoid or whatever.

Anyway, it's a fun paper. But I think the general point of that paper, and how it relates to the first part of your work, is that the probabilities that are conveyed to you in the search results are actually totally out of whack with reality. So when you search for headache, Google shows you lots of links about brain cancer, so based on the links it looks like I have a 50 percent chance of brain cancer, when in reality it's like one in 100,000 or something like that.

So how it relates to the first part is that you were pulling out some of these different attributes and viewpoints into different sections. So there might be a role for conveying -- like, what was the one about taxes? For things like taxes, there may be a role for conveying to the user something about the probability of an outcome, the underlying reality of the situation. Does that make sense?

So you showed the example where there's a whole bunch of articles about taxes; people might think, oh my God, there's going to be a big tax hike. Is there any way to find out the likelihood of the tax actually happening? So it's one thing to say here's the pro and here's the con, but if the pro is very likely to happen, maybe you bump that up or something. I don't know if there's a way to incorporate some notion of the underlying probability of these things actually happening.

>>: If you have a real multitude of articles -- I mean, you showed four on this cube, on the property tax situation.

But let's say you show four but you have 20 in the background, right? You could sort of make it visually clear which one is most similar, in terms of its aspect, to most of the ones in the back.

>>: Yeah, exactly, right. So when you pull out, you know, the counterexample or something, if only two out of 100 articles are about the counterpoint, I don't know how you convey that, or somehow use it to give users sort of a meta perspective.

>> Souneil Park: We could think about, for example, showing the portion of news articles that cover a certain aspect and the portion of articles that cover a different aspect. We can display that kind of information to help readers make sense of the distribution.

But, well, the NewsCube system is actually trying to increase the likelihood of displaying the minor things that are not reported in the majority of news articles, pushing people to get more variety of information.

>>: No, totally, for opinion pieces and viewpoints, absolutely. In fact, I was thinking, when we were talking about this cyberchondria stuff, that using that interface in these health contexts might be kind of useful, because the standard interface doesn't make any distinction between the thing that's highly unlikely in reality and the thing that's very likely, and that does a real disservice.

>> Souneil Park: I wanted to show you this chart, which is why I stick to this idea of increasing diversity.

It shows the newspaper market share in South Korea. These three lines show the market share of the three very conservative news media which I showed on the first slide, and the top line here shows the sum of these three media, showing how much the market is concentrated in them. While they hold about 60 percent of the market share consistently, for decades, these two lower lines show the market share of the media considered liberal.

You can see an amazingly clear imbalance. It's like having three Fox News channels in the country.

>>: And what's remaining is neutral or balanced, the remaining 30 percent or so?

>> Souneil Park: Those don't have a clear preference, but mostly, most of the newspapers in South Korea are on the conservative side.

>>: Let's say the electoral split in the population between liberal and conservative, is it sort of half and half? In government elections?

>> Souneil Park: Pretty much close to half and half.

>>: So not representing the population --

>> Souneil Park: Right.

>>: There's a similar trend with the TV channels. There are three major TV channels, so if there's any popular news item, say, take vitamin C every day, all three channels show it, and everyone in Korea basically loves it.

So that's the best --

>>: Interesting.

>>: A very well-known TV channel. I had one more question. You mentioned the inverted pyramid structure of newspapers and that it makes it easier to extract some of the aspects. Would it apply to other kinds of documents, like research papers? Because there are certainly

viewpoints there; they're very controversial sometimes, and the authors hate each other. So does it only apply to newspapers, or can it be generalized to other materials, like research papers or journals?

>> Souneil Park: There would be another kind of structure for other types of documents. And I think there actually are works that use the specific structure of research papers; a lot of them treat terms in the abstract section as more important.

>>: So as long as you find the proper structure pertaining to that content, then you can apply it in general?

>> Souneil Park: Yes, I guess. The inverted pyramid structure has been an implicit standard for a long time when journalists write stories, because it's related to how, when you write a story, you don't have enough time to edit it but you have to publish it, and when another advertisement comes in, you have to cut the article from the bottom, simply cut it. That's why --

>>: It's also probably cognitively easier, not just -- you make the main point first, and obviously we all read from the top --

>>: Yes, exactly.

>>: You want to get the most important information out first because you don't know if people are going to keep reading. That's what automatic summarization relies on: the simplest baseline is to take the first one or two sentences, and it's really hard to beat that.

>>: People do. In research papers there's a clear structure, right? There's the abstract, and then you've got an intro, here's the related work, here's the discussion.

>>: I think there are many works that try to analyze the relationships between papers based on the related work section, in the sense of how they support each other.

>>: The NewsCube website, is that public?

>>: What was the response? Did you get some media coverage? What was the traffic like?

>> Souneil Park: It was up for a while, maybe more than a year. But at that time it didn't have a big impact. I didn't get any media coverage, and only a small portion of people used it. It was like several hundred to low thousands of users.

One reason might be that to make this site have a really big impact, I had to compete with things like Internet news portals, which provide a lot of different services. The users of NewsCube would first visit and say, oh, this is a pretty new thing, it's really different, and then they'd go back to the website they usually go to.

So I didn't get a lot of traffic.

>>: All kinds of --

>> Souneil Park: Right. So I think the situation is much better these days because people on Twitter, for example, advertise their work. They get a lot of access to real people. Even if it's only for a short time, anyhow, you have much more access to many people.

>>: I was supposed to be somewhere at 3:00, but you could use Twitter to tweet out the different viewpoints on the same story --

>> Souneil Park: Right.

>>: You could have -- exactly, just take your same stories, pump them out on Twitter, and just say viewpoint number one, here's viewpoint number two. Just pushing it at people, basically. That sounds good for users: put it in front of their eyeballs so they have to run into it. That was very cool. Thanks.

>> Souneil Park: Thank you.

[applause]
