>> Scott Counts: So today we have, well, someone who really doesn't need all that much of an introduction since she's been here at MSR doing a post-doc for the last year and a half actively collaborating with folks in the room. So this is Munmun De Choudhury. And
Munmun, as I said, is doing a post-doc here right now. She's about a year and a half in.
She's really just a pioneer, I think, in the field of computational social science, bringing her background in computer science and just a really strong curiosity and interest in social science questions.
Since she's been here, she's been focusing on topics in the health domain, and I think that's mainly what we're going to hear about today, but she is interested in lots of other domains as well, and I think we'll see just a glimpse of that in her future directions slide. So please welcome
Munmun.
>>: [applause]
>> Munmun De Choudhury: Thank you, Scott. And good morning, everyone. I'm very excited to be presenting here today, and the topic of my talk is going to be how we can mine and analyze social behavior online, from different platforms, and positively impact our health and wellness.
So we are truly living in an information era today, whether it is seeking information about our favorite celebrity or a topic of interest, or just sharing information with our friends and family about small and big happenings in our lives. In fact, one in six people in the world today is an active user of Facebook.
As we constantly usher into these really information-rich environments, our online presence is in some ways becoming a part of us, so much so that it is impacting the way we frame our relationships and the manner in which we interact, act, or express emotion.
Given this huge explosion of data that we are seeing around us, there are several new challenges, and a portion of these fall to computer scientists. For example, how do we share such huge, large-scale data? How do we store it, and how do we build models and methods in order to make sense of it? On the other hand, to social scientists the web is providing a whole new lens that is enabling them to analyze people and their activities on an unprecedented scale, which was not possible before.
In the midst of these two mature disciplines, a new domain has been emerging, and that's called computational social science. As you would have imagined, computational social science combines computer science and social science into improving our understanding of people and their behavior.
In the last several years, a growing body of work has been emerging in this area, looking at various facets of human activity and behavior. For example, what kind of social and information roles do we play online? How do we socially influence each other, how do we form groups, how does network structure evolve over time, or how do we measure the strength of our ties as manifested on these platforms?
From an application perspective, researchers are constantly interested in investigating how these platforms are impacting a number of different real-world phenomena and contexts. For example, crisis response, citizen journalism, urban informatics, emotional expression during political events, or information diffusion in the context of viral marketing.
My research in this area has been at the intersection of three domains: social science, data mining, and human-computer interaction. In particular, my broad interest lies in making sense of people's behavior as expressed on different social platforms.
To give you some examples, in my PhD dissertation I looked at questions involving how people share information in the context of interpersonal communication on these networks, how they form groups as a result of such communication that happens online, and how the characteristics of shared artifacts, such as media objects like images and videos, change or evolve over time due to the communication that happens around them.
In order to answer each of these questions I often resort to data mining tools and techniques.
For example, what kind of predictive models can be built to answer those questions, and what kind of analytic insights can be gathered that can tell us more about people's behavior online?
My end goal is often to make a difference to the end user which could be a single user or a set of users. For example, improving search and relevance for the end user or to generally improve and streamline their social experience online.
So how do we characterize social behavior online, after all? My key hypothesis is that there are three core aspects to it, the first being emotion. So what are these emotions, after all? The psychology literature defines emotions as psychophysiological reflections of our state of mind in response to implicit stimuli, for example biochemical reactions in our brain, or explicit stimuli, for example our social or psychological environment.
The second core aspect of our behavior, which is also true online, is social interaction, which is the manner in which we express our emotions and feelings, share our thoughts, and form ties and build bonds with other people around us. And the third core aspect of online behavior is language. Language acts as a vehicle or medium that we often use in order to express our emotions or to form social ties with other individuals online.
So beyond elucidating core aspects of human behavior through the three things that we discussed, emotion, language, and social interaction, we can certainly begin to identify normal patterns of human behavior among people. Another key aspect that we can begin to mine and analyze from such data is anomalies in people's or a population's behavior, in other words, concerns or issues that arise in people's behavior over time.
And that sets the stage for this talk. As I mentioned earlier, I am going to be talking about how we can analyze and mine our social behavior online and how that can make a difference to the health and wellness of ourselves or of populations.
And this is going to be the structure of this talk. I'm going to be addressing the question that I posed earlier at three different scales, starting from the micro scale of the individual, going to organizations, and finally to populations, which is the macro scale.
Let us start with the individual-scale analysis and mining of human behavior, and let's see what we can learn about people's health and wellness at that scale.
A core aspect, and a very common perception, of the use of social media is that people resort to these tools in order to share their thoughts and feelings around really big global happenings around us, for example the earthquake in Haiti, or the recent elections in the U.S. Another fundamental aspect of these websites, which is actually not talked about so much, is that people use these tools in order to express really personal happenings in their lives, for example the birth of a child in the family, a move to a new place, getting a new job, or even traumatic experiences such as meeting with an accident or the death of a loved one in the family.
People use these tools to share their positive experiences because they want to share their joy with their friends, families, and audience, whereas some people may choose to share negative experiences on these tools because they're probably looking for coping mechanisms by connecting with people with similar experiences, or to receive social and emotional support from others.
In essence, what we are saying is that these social media tools are acting as a window onto understanding people's behavior around big life events. In this particular talk, I'm going to show you the example of one big life event that characterizes a lot of our lives, and that's the birth of a child. Looking at Twitter data, we are going to examine how the Twitter postings of new mothers can reveal the kinds of behavioral changes that these mothers typically undergo in the postpartum period compared to the prenatal period.
And why is this research interesting? It can lead to the design and deployment of low-cost and privacy-preserving systems and tools that new mothers can use in order to keep track of their behavior or to generally improve their postpartum health and wellness.
So getting started on the actual work, the prime challenge is: how do we automatically identify these new mothers? Which leads us to the question of how we even automatically identify birth events on Twitter. For that purpose we created an ensemble of several different key phrases which typically characterize birth announcements as found in online newspapers. And you can see some examples of that in the table on the slide.
We went to the Twitter firehose stream, looked for these key phrases in different postings, and came up with a candidate set of posts, a subset of whose authors are likely to be new mothers. Of course, we are only focusing on the female population, so we added a gender classifier on top of that in order to extract the female users. Sure.
>>: So I'm just wondering, how well does the kind of language that people use in traditional newspapers actually translate to Twitter? Because the language that we use on Twitter is very different from traditional formal language.
>> Munmun De Choudhury: Actually, the birth announcements are exactly the same, because the online newspaper announcements are usually short and they mention the weight of the baby, the height of the baby, and, you know, so-and-so was born to so-and-so on a certain date with grandparents so-and-so. It's a very structured thing, and we noticed that's true on Twitter as well.
>>: So I guess I was wondering, are you able to check, for example, for patterns on Twitter that you might be missing? Like maybe some new sort of slang or abbreviations used to refer to these events that might not be caught by that?
>> Munmun De Choudhury: I haven't. I mean, anecdotal experience will say that there isn't much slang, especially for birth events, because it's kind of a serious thing. People don't want to use slang and that kind of stuff. So we didn't support those kinds of things, but there are certain things that are Twitter-specific, for example, "it's a boy" or "it's a girl." No one really uses that in a formal birth announcement in a newspaper, but that's something we discovered from Twitter. But of course, I want to say that this is not a hundred percent coverage. There are certainly phrases that we have not captured, and that's kind of future work.
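The candidate-filtering step described above can be sketched as follows. This is a minimal Python illustration; the phrases below are hypothetical stand-ins modeled on formal birth announcements, not the actual ensemble used in the study.

```python
import re

# Hypothetical key phrases modeled on formal birth announcements plus one
# Twitter-specific phrasing mentioned in the talk; illustrative only.
BIRTH_PHRASES = [
    r"gave birth to",
    r"was born to",
    r"pounds\s+\d+\s+ounces",
    r"it'?s a (boy|girl)",
]
PATTERN = re.compile("|".join(BIRTH_PHRASES), re.IGNORECASE)

def is_candidate_birth_post(text):
    """Return True if a posting matches any birth-announcement phrase."""
    return PATTERN.search(text) is not None

posts = [
    "Our daughter was born to us last night, 7 pounds 2 ounces!",
    "It's a boy!! So happy right now",
    "Great game last night",
]
candidates = [p for p in posts if is_candidate_birth_post(p)]
```

In the pipeline from the talk, a gender classifier and crowd validation would then be applied on top of this high-recall phrase match.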
Okay. So we ran our gender classifier and came up with a set of users who were female, but of course there would be noise even in that, because there could be non-mothers who are posting about the birth of a child in the family. And therefore we adopted a crowdsourcing strategy on Amazon's Mechanical Turk. We showed the particular birth-event posting to every crowd worker, and in order to give them some context we also showed them a set of ten postings before and after that childbirth posting, so that they could figure out if it was really a mother or not.
We also showed them several other cues, such as whether there was a picture of the person on the Twitter profile and whether there was a profile description. However, we refrained from showing any real name reported on Twitter, or the Twitter user ID, for privacy reasons. This exercise gave us a set of 85 users who were validated to be new mothers.
We then went back to the Twitter firehose and collected all of their postings over a five-month period before the birth of the child and another five-month period after the birth of the child.
Given all these postings -- yeah?
>>: I'm just surprised -- 85 seems like a small number of --
>> Munmun De Choudhury: It's over a two-month period.
>>: -- Twitter is. Right, so even 85 just strikes me as many fewer mothers than I would have
imagined posting announcements on Twitter period. Do you have a sense for why that...
>> Munmun De Choudhury: That's because we wanted to focus on a really high-precision set of mothers for whom we could actually go and validate that they're truly mothers. Because this is the first work of this nature, we truly wanted to be sure that these are new mothers and not include noise, especially because we wanted to come up with an automated mechanism. We wanted to see how good this automated mechanism is if we want to expand out later.
>>: And that was 85 out of how many that were --
>> Munmun De Choudhury: Twitter firehose based on those phrases over a two-month period in 2011.
>>: How many potential candidate mothers were there [inaudible]?
>> Munmun De Choudhury: Oh, I see. I think it was around five hundred.
Okay. So how do we measure the behavior of these new mothers, given the postings that we have derived for the prenatal and postnatal periods? We defined four categories of measures, the first being activity. It had measures such as volume, which is the average normalized level of posting over given days; replies, which are a proxy for the social interaction of the mothers; questions, the number of questions that they asked; the links that they shared, which are a signal of their information-sharing behavior; and so forth.
We had another two measures of the ego network, the number of inlinks and outlinks, which correspond to the number of followers and followees on Twitter. We had four measures of emotion: positive affect and negative affect, which were derived using a popular psycholinguistic lexicon called LIWC, and two other measures, activation and dominance, which correspond to the intensity and controlling power of emotion respectively, and which were derived using the ANEW lexicon.
The fourth set of measures was around linguistic style. Linguistic styles are particularly interesting because they reflect people's behavior and the way they express language in the context of their social and psychological environment. We again used LIWC in order to derive 22 different linguistic markers that are directly found in the English language, for example pronoun use, function words, certainty, inhibition, quantifiers, and so forth.
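A toy sketch of how activity measures of this kind might be computed from raw postings. The study's exact normalization is not specified in the talk, so the definitions below are illustrative assumptions.

```python
import re

def activity_measures(posts, n_days):
    """Compute simple per-day activity measures of the kind described
    in the talk. `posts` is a list of raw tweet strings; volume is
    postings per day, the rest are fractions of postings."""
    n = max(len(posts), 1)
    volume = len(posts) / n_days
    replies = sum(1 for p in posts if p.startswith("@")) / n      # social interaction proxy
    questions = sum(1 for p in posts if "?" in p) / n             # question asking
    links = sum(1 for p in posts if re.search(r"https?://", p)) / n  # information sharing
    return {"volume": volume, "replies": replies,
            "questions": questions, "links": links}

sample = ["@amy thanks!", "Anyone know a good pediatrician?",
          "Reading this: http://example.com/article", "So tired today"]
m = activity_measures(sample, n_days=2)
```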
Let us start with our empirical study. To begin, we wanted to see, in an aggregated manner, how these mothers changed behavior in the postpartum period compared to the prenatal period. Each of the charts that you see here corresponds to a measure that we just discussed. We also wanted to get a sense of how these mothers are different from average Twitter users, and for that we defined a set of users which we call the background cohort. It's a set of 50,000 randomly sampled users from Twitter who made postings in the same time period but did not have any evidence of giving birth to a child in the time period that we studied.
So here --
>>: [inaudible] on gender, or were they anybody?
>> Munmun De Choudhury: Anybody. Yeah. So here the red line corresponds to the mothers and the green one corresponds to the background cohort. What we notice is that these mothers show differences which are statistically significant in the postpartum period compared to the prenatal period. For example, volume seems to go down, and so does positive affect; negative affect goes up; use of first-person pronouns goes up; whereas activation and dominance go down together. However, that doesn't seem to be the case with the background cohort. Yeah?
>>: So the -- the foreground cohort, the women who have just given birth, their timelines are aligned to the date of the childbirth?
>> Munmun De Choudhury: Yeah.
>>: And what is the background timeline aligned to?
>> Munmun De Choudhury: It's the same timeline that we considered for the mothers, like five months on the left of the blue line and five months on the right. For the background cohort, we looked at their postings in the same ten-month period.
>>: Oh. And so did all these women give birth on the same day? Like the timelines are aligned like before and after childbirth?
>> Munmun De Choudhury: Yeah.
>>: Okay.
>> Munmun De Choudhury: Yeah.
>>: [inaudible] background cohorts would be basically flat in all of these measures.
>> Munmun De Choudhury: I --
>>: When you look at something like volume, there's --
>> Munmun De Choudhury: Increase.
>>: The green line is definitely moving around [inaudible] --
>> Munmun De Choudhury: Yes. I would have pointed that out anyway. That's because, in general, if you look at Twitter, the volume of posting is always going up; people are always using it more. It's a general trend on Twitter. However, if you look at some of the other measures which are not dependent upon volume, they have a fairly flat trend.
>>: Okay.
>> Munmun De Choudhury: Okay. So this slide tells us that mothers in general change a lot. However, the real question is that some mothers probably change more than others, right? And being able to identify and track the behavior of those mothers can have a lot of implications for their health and wellness. For that purpose, we moved on to individual-level comparison.
These two heat maps that you see here correspond to two measures, positive affect and activation. It's an RGB scale, which means that blue is the smallest value, red is the highest, and the yellows and greens are anything in the middle. The white line corresponds to the time of childbirth: anything on the right is postpartum, anything on the left is prenatal. It turns out that our conjecture is true. In fact, for some mothers we notice that their positivity and activation both go down in the postpartum period.
We wanted to quantify these changes statistically, and see whether this small set of mothers is truly different from the rest. For that purpose we computed effect sizes using Cohen's d, and this table summarizes the number of mothers with small, medium, and large effect-size changes across the different behavioral measures that we considered.
If you notice, we have about a quarter of the mothers showing large effect-size changes in terms of activity. That is probably intuitive, because after childbirth it's a very busy time for the mothers; they are so overwhelmed with new responsibilities that they don't have enough time to go on social media and make postings.
However, the number of mothers who show large effect-size changes for emotion is relatively smaller. We were curious about this set, and we went and looked at the trends across all the different categories of measures in the study. It turns out that these 12 mothers actually show large effect-size changes across all the measures that we considered.
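The effect-size computation can be illustrated directly. Below is a small Python sketch of Cohen's d with the conventional small/medium/large thresholds (0.2, 0.5, 0.8); the prenatal and postpartum scores are made-up numbers, not the study's data.

```python
from statistics import mean, stdev

def cohens_d(pre, post):
    """Cohen's d between prenatal and postpartum samples,
    using the pooled standard deviation."""
    n1, n2 = len(pre), len(post)
    s1, s2 = stdev(pre), stdev(post)
    pooled = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(post) - mean(pre)) / pooled

def effect_bucket(d):
    """Conventional thresholds: |d| >= 0.8 large, >= 0.5 medium, >= 0.2 small."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"

# Toy daily positive-affect scores for one mother (illustrative numbers).
prenatal = [0.62, 0.58, 0.65, 0.60, 0.63]
postpartum = [0.41, 0.45, 0.38, 0.44, 0.40]
d = cohens_d(prenatal, postpartum)
```

A sharp drop like this would land in the "large" bucket, the kind of change the table on the slide counts per measure.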
What do these mothers post about? Yeah?
>>: Can I ask a question about that last table?
>> Munmun De Choudhury: Yeah.
>>: The numbers there, are they the numbers of -- the number of mothers who for instance had a small effect on activity?
>> Munmun De Choudhury: Small effect size change. If you're -- yeah.
>>: I just -- should the rows sum to 85?
>> Munmun De Choudhury: Yeah, they don't. They don't because we considered a threshold even for the small effect-size changes, so some people were not satisfying that threshold.
>>: Right.
>> Munmun De Choudhury: Yeah. Otherwise, they should sum to 85. Okay.
So what kind of postings do these mothers with large effect-size changes make? We wanted to take a look at that qualitatively, and we noticed that these mothers are actually posting about feeling lost, about loneliness; they also complain about insomnia; and they even use self-derogatory phrases such as "worst mother," "horrible monster mother," and so forth.
However, if you look at the postings that the mothers with small effect-size changes are making, that's not the case. In fact, they seem to be pretty excited about this new change in their lives, and they seem to be turning to social media platforms in order to gather information about the different questions they might have around bringing up a newborn.
We wanted to quantify these differences in the postings in a quantitative, linguistic manner, based on the language that these mothers use. What we did was extract unigrams, or single words, from all of the postings of the three cohorts: mothers with large effect-size change, mothers with small effect-size change, and the background cohort. And for each unigram we looked at the degree of change in the postpartum period compared to the prenatal period.
It turns out that more than a quarter of the unigrams for the mothers with large effect-size changes show statistically significant change in the postpartum period. That number is much lower for the mothers with small effect-size change, and actually really low for the background cohort.
So what are some of these top changing unigrams? They are reported in the table below, and here we see that for the mothers with large effect-size changes a lot of these words are actually emotional in nature. If you look carefully, the positive-emotion words seem to be going down in usage, whereas the negative-emotion ones seem to go up. However, that is not the case with the other two cohorts that we are studying.
Based on that, we finally framed a technique which we call greedy differencing analysis. The goal of this technique is to identify the span of language that makes these mothers different from the rest in terms of unigram usage. What this particular technique does is, starting from the top changing unigram for the mothers with large effect-size changes, eliminate one unigram at a time, and at every iteration compute the language distance, using a standard metric like Euclidean distance, to the mothers with small effect-size change and to the background cohort.
We had two very interesting findings in this exercise. We found that with the elimination of a little over one percent of the unigrams, the mothers with large effect-size changes become similar in their language use to the mothers with small effect-size change. And with the elimination of a little under 11 percent of the unigrams, they become similar in language use to the background cohort.
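The greedy differencing procedure can be sketched as follows, assuming each cohort is represented by a vector mapping unigram to change in usage from prenatal to postpartum. The vectors and the stopping threshold below are illustrative, not the study's values.

```python
from math import sqrt

def euclidean(a, b, keys):
    """Euclidean distance between two change vectors over `keys`."""
    return sqrt(sum((a[k] - b[k]) ** 2 for k in keys))

def greedy_differencing(target, reference, threshold):
    """Eliminate unigrams from `target` (largest absolute change first)
    until its distance to `reference` falls below `threshold`.
    Returns the fraction of unigrams eliminated."""
    keys = sorted(target, key=lambda k: abs(target[k]), reverse=True)
    removed = 0
    while keys and euclidean(target, reference, keys) >= threshold:
        keys.pop(0)          # drop the current top-changing unigram
        removed += 1
    return removed / len(target)

# Toy change vectors (illustrative, not the study's data).
large_change = {"sad": 0.9, "alone": 0.8, "baby": 0.3, "happy": -0.7, "sleep": 0.1}
small_change = {"sad": 0.1, "alone": 0.0, "baby": 0.3, "happy": 0.1, "sleep": 0.1}
frac = greedy_differencing(large_change, small_change, threshold=0.3)
```

The fraction returned plays the role of the "one percent" and "11 percent" figures from the talk: how much of the vocabulary must be removed before the cohorts look alike.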
This exercise tells us that there is a lot of discriminatory power in the language used by the mothers, as well as in the range of behavioral measures that we saw earlier, for forecasting and knowing in advance which mothers are likely to show more changes than others in the future.
And this led us to the question: what if we could predict these kinds of changes in advance? For that purpose, we came up with a prediction setup. First of all, we expanded our data collection, and then we came up with a binary classification framework, a supervised learning framework which predicts labels across two classes of mothers: mothers showing extreme changes along a certain behavioral measure beyond a certain threshold, and the set of mothers who don't. We used five-fold cross-validation for the purpose.
In particular, we represented each mother as a vector of the different behavioral measures that we saw earlier, we eliminated feature interaction and redundancy with principal component analysis, and then we chose a support vector machine classifier with a radial basis function kernel in order to predict the labels of the mothers across the two classes.
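A pipeline of this shape (standardization, PCA, RBF-kernel SVM, five-fold cross-validation) might be written with scikit-learn as below. The data are synthetic and the dimensions are illustrative assumptions, not the study's features.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# 200 synthetic "mothers" x 36 behavioral measures; class 1 ("extreme
# change") is shifted on a few measures to make the task learnable.
X = rng.normal(size=(200, 36))
y = np.array([0] * 100 + [1] * 100)
X[y == 1, :5] += 1.5

model = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),   # remove feature redundancy/interaction
    SVC(kernel="rbf"),      # radial-basis-function SVM
)
scores = cross_val_score(model, X, y, cv=5)  # five-fold cross-validation
mean_acc = scores.mean()
```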
We trained our first model using prenatal data alone. We considered three months of prenatal data before the report of childbirth of the mothers in our data set and tried to predict the directionality of change, that is, for example, whether their volume would go up or down three months into the future after childbirth.
The performance of the model is reported in the table. We notice that we are doing pretty well: we have more than 71 percent accuracy for predicting extreme-changing mothers, with reasonably high precision and recall. Across the various measures we notice that the linguistic styles perform particularly well, for example pronoun use -- yeah?
>>: So in -- in the offline world, are doctors able to predict ahead of time who is at risk for postpartum depression before they've actually given birth?
>> Munmun De Choudhury: As far as -- I mean, my reading of the clinical literature says that doctors can provide warnings, like soft warnings, if the mother had a history of depression in the past, because prepartum depression is actually the best predictor of postpartum depression.
>>: And so is that what you think you're detecting here?
>> Munmun De Choudhury: Exactly. So this is kind of capturing if there were evidence of depression vulnerability in the prenatal period.
And if you look at the other measures of emotion, we notice that we have pretty good results for negative affect and activation. The performance of this model is shown in the ROC curve that you see on the left. But as you can see, there is a lot of room for improvement, right? We wanted to explore how we could improve the performance of this model. Our conjecture was that the initial couple of weeks of behavior of these mothers right after childbirth probably have a lot of signals and cues, which we could leverage together with the prenatal data to predict what is going to happen three months from that point in time.
So we trained another model, which uses the prenatal data plus an optimal training window, depending upon the measure, of one to two weeks after the report of childbirth. And guess what? The performance of the model goes up, to up to 80 percent, as you can see from the ROC curve on the right side of the screen.
So what are the implications of this research? We have been able to identify a set of 14 to 15 percent of mothers for whom we see very extreme changes in their behavior. For example, their volume of posting goes down; negativity goes up; positive affect goes down, and so does activation, which is the intensity of emotion; and dominance, the controlling power of emotion, also goes down. The use of first-person pronouns seems to go up, whereas that of second- and third-person pronouns seems to go down.
And actually, a lot of these changes are not instantaneous; they are pretty consistent over a long period of time in the postpartum period. And we also saw how we can make use of prenatal data to predict these changes ahead of time.
If you look at the psycholinguistic literature, it will tell us that the kinds of markers of behavioral change that we observed are actually known to be associated with depression and other kinds of mood and affective disorders. The second interesting finding is that the 14 to 15 percent of mothers that we identified showing these extreme changes seem to align with the reported statistics of postpartum depression, which is an under-reported health concern found in some mothers in the United States.
This research gives us hope that possibly we can use social media tools, and the kind of behavior we mine from social media, in the development of unobtrusive diagnostic markers of behavioral disorders, particularly postpartum depression in this case, but hopefully in the future for other kinds of disorders as well. And that led us to the thought that maybe this can be extended to other affective disorders such as depression, the PTSD that a lot of troops experience when they're back from deployment, seasonal affective disorder -- of course, we live in Seattle so we have to study that -- elderly depression, teen suicide, and so forth.
I'm going to take the example of one particular mental illness which is pretty common among people across the range of mood disorders, and that's major depressive disorder. A challenge in this kind of research -- yeah?
>>: Actually, going back to your teen suicide example. So are you proposing that this method is something that could be used to just monitor the general prevalence of a condition, like through trends, or are you actually proposing -- like, teen suicide seems to imply that you wouldn't presumably just want to monitor that, but that you would actually want to intervene --
>> Munmun De Choudhury: Uh-huh.
>>: -- and prevent the activity. So do you think that that is actually feasible given the nature of Twitter and the somewhat anonymous nature of most accounts? If your method identified someone who was at high risk, how would you actually track that person down and identify them? Are there practical and ethical problems with doing that?
>> Munmun De Choudhury: Uh-huh. So actually, I'll cover most of it throughout the talk, but very quickly, there are many ways to intervene. I'll show a demo of one tool that can make that happen. So you could do interventions on two levels. One is the level of the person: you give them soft intervention cues that they can use to modify their behavior, or, for teens, you inform their parents or other responsible people who can intervene and take action. The other level is to talk to doctors and hospitals; for people who are already known to be at risk of these conditions, the doctors could intervene and help them.
>>: But how do you find who the person is from Twitter? Because most people I assume don't reveal their true contact information or identity in their Twitter accounts, so --
>> Munmun De Choudhury: So on a personal level it has to be opt-in. Someone would opt in to volunteer in this kind of a program where their Twitter feeds could be analyzed. They're public in nature anyway, the ones that we have seen here, so there isn't as much of a privacy concern. However, there is an ethical concern, and that's why I'm saying opt-in. If someone opts in and chooses to share their Twitter handle, hopefully they are okay with this kind of analysis, which is completely anonymous and automated, by the way. I can return to those questions towards the end of the talk. Okay.
So a challenge in this kind of research involving mood disorders -- and this ties a little bit to what Mary was asking about -- is how we obtain gold-standard labels or ground-truth data on people who are actually suffering from clinical depression. For that purpose, in this work we resorted to a crowdsourcing technique. We went out to Amazon's Mechanical Turk, and we asked the crowd workers to take a survey, the standard depression-evaluation psychometric survey that a lot of psychiatrists use, and then we also asked them a few questions about their depression history, if it was present.
Finally, we asked them if they would like to opt in and share their Twitter user name on the task that they were taking, and we found around a 40 percent response rate. They were made aware that this was only going to be used for research purposes in an anonymous manner. And these were the owners of public profiles on Twitter.
We obtained a set of a little under five hundred people in this manner. We went onto Twitter using their Twitter handles and collected all of their postings over a one-year period dating back from the reported onset of depression, if it was present; if it wasn't, we just went back one year from the time that they took the survey.
So what are some of the differences in behavior across these two cohorts? We summarize that in this slide. On the left side of the slide we study the diurnal patterns of postings of the depression class and the non-depression class. For the non-depression class, we notice that it's actually pretty intuitive.
As we go through the day, the volume of postings generally increases, whereas for the depression class we notice that the peak of activity actually happens late at night and early in the morning, and throughout the day the volume of activity is actually pretty low. Clinical literature tells us that 8 out of 10 people who suffer from depression also have symptoms of insomnia, and nighttime internet activity is a common feature of those kinds of individuals.
We have similar differences on a range of other measures as well. For example, the posting volume seems to go down monotonically over the one-year period before the reported onset of depression for the depression class, and so do the replies, whereas negative affect and the use of depression language, which I'll come back to very soon, seem to go up over the one-year period before the onset of depression. However, these things don't seem to be the case for the other cohort.
Let us look at some of the kinds of language that users in the depression class use. We categorized it across various themes. It seems that these users actually talk quite a bit about their symptoms, for example fatigue, imbalance, mood swings, nervousness, and so forth. They seem to be using social media as a discourse tool, in order to express their emotions, discuss coping mechanisms, or connect with other people with similar experiences.
They, in fact, have pretty detailed information about treatment as well. They seem to be talking about antidepressants, dosage levels, very precise ones like 150 milligrams or 40 milligrams, medication and psychotherapy information, and so forth.
And finally, they broadly talk about relationships and life in general, with a focus on religious involvement. Yeah.
>>: How representative a sample do you think you have? I guess I wonder, are the people who are willing to share information on Mechanical Turk about their mental health history perhaps people who are also more likely to be very public in discussing these kinds of things in a public venue like Twitter?
>> Munmun De Choudhury: That's possible. However, one thing that I should mention is that when we asked people to take the depression survey, we just mentioned that it is a general survey, not a depression survey, because we did not want to freak people out. Then, for people who scored beyond a certain threshold, which meant that they were probably experiencing depression, we showed the next page, which was a set of specific self-report questions. So they were not told up front that this was openly inviting people to share information about their depression; we did not do that. So I guess there is less of a bias, but it's possible that there is some bias.
>>: So the people, I guess I was -- because in your chart you talked about the reported date of onset of depression, which implied that [indiscernible] --
>> Munmun De Choudhury: That was asked on the second page. Yes.
>>: But so were there people who scored that they were depressed on the survey but then in the self-report part didn't self-report as having a diagnosis? Like how did you treat --
>> Munmun De Choudhury: We did not consider those data points.
>>: Right, because those seem like people who might have different behavior on Twitter --
>> Munmun De Choudhury: But there were people who chose another way of not participating, which is that they answered everything but did not provide us a Twitter handle; that's almost all of the remaining 60 percent of the population. So there were certainly people like that, and we actually paid them, because providing a handle was not a requirement for taking the survey.
>>: I guess I was just wondering if those people who tested as having depression but didn't disclose it had different types of language use on Twitter, which might be harder to detect or something?
>> Munmun De Choudhury: Right. A lot of those people didn't even share their Twitter handle, so it was difficult for us to go back and check that. But that's a very good point.
>>: Are you going to talk about what kind of language these people used before the onset of the depression? Do -- do you -- were you able to capture?
>> Munmun De Choudhury: This is before the onset. One-year period before the reported onset.
>>: Oh. So they're talking about hospitalization and drugs and --
>> Munmun De Choudhury: That's --
>>: -- antidepressants before the onset of their depression?
>> Munmun De Choudhury: Yeah, actually, some people do. Just before the onset, they are actually thinking about those kinds of treatments, which other people might be suggesting to them based on their symptoms. So --
>>: Okay.
>> Munmun De Choudhury: Yeah. Yeah?
>>: Is it onset or is it diagnosis? Is that --
>>: It's not onset.
>>: It's not onset. Onset's before they know they have depression.
>> Munmun De Choudhury: Well, sometimes this is what they said; I mean, this is the date they knew they had depression. That's probably the diagnosis. Onset might be different. Sometimes people don't even know what the onset is, so that's a little difficult to find out.
Okay. So, like we did for the new mothers, in this case we also wanted to see if we could predict ahead of time, based on these social media postings and the different behavioral measures, which people are likely to be vulnerable to depression. We came up with a number of different features based on the different behavioral measures that we had, particularly ones that would take into account the trends over the yearlong period of postings.
So we treated each behavioral measure as a time-series signal, and over that we computed the mean frequency, the variance, the momentum, which is the rate of change over time, and the entropy, which is the degree of uncertainty or [indiscernible] in the time series of the measure.
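To make the feature extraction concrete, here is a minimal sketch of how those four summary statistics could be computed for one behavioral measure, say, daily posting volume. The exact definitions of momentum and entropy used in the actual study may differ; the ones below are my own plain-vanilla assumptions.

```python
import math

def timeseries_features(series):
    """Summarize one behavioral measure (e.g. daily posting counts)
    with the four statistics mentioned in the talk."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series) / n
    # Momentum: average step-to-step rate of change over the period.
    momentum = sum(series[i + 1] - series[i] for i in range(n - 1)) / (n - 1)
    # Entropy of the normalized series, as a proxy for irregularity.
    total = sum(series)
    probs = [x / total for x in series if x > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return {"mean": mean, "variance": variance,
            "momentum": momentum, "entropy": entropy}

feats = timeseries_features([2, 4, 6, 8])
```

Each user then contributes one such feature vector per behavioral measure, and the concatenated vectors are what a classifier like the SVM in the talk would train on.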
Again, we trained a support vector machine model on that. We obtained fairly good accuracy of a little over 70 percent, and the performance of that model is shown in this ROC curve. We notice that the depression terms are performing pretty well, and so are the linguistic styles.
So, tying it all together, the initial study of the new mothers and then the latter study about major depression in individuals, I believe that this research can lead to the design and deployment of new kinds of smart intervention techniques and early warning systems, which could be given to individuals in a completely private manner, for example a smartphone application, in order to enable them to reflect on their behavior and, through subtle interventions, make a positive impact on their lives.
I'm going to quickly demo a tool along those lines called Moon Phrases. A lot of you have already seen it during TechFest last week. It looks like this, and one of the goals of Moon Phrases is to enable people to reflect on their behavior over time. So we look at the Twitter postings of a user, like we have up there, and use an analogy of the lunar phases in order to visualize the trends of their emotionality over time.
What that means is that if the moon is more illuminated on a certain day, the person was more positive. The orange moons that we see reflect that the posting volume was high, which is more than three postings, and people can even click on a certain moon and see what kind of postings they made on a certain day and evaluate what led to a positive or negative affect for that person on that day. The nice thing about this tool is that it provides a very subtle intervention mechanism in enabling people to reflect on their behavior over time, for example, to identify episodes where they were very positive or very negative.
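As a rough sketch, the visual encoding just described, illumination for positivity and orange for high-volume days, might look like the following. Only the three-posting threshold comes from the talk; the field names, the grey fallback color, and the assumption that positivity arrives as a 0-to-1 score are my own illustrative choices.

```python
def moon_for_day(positivity, num_posts):
    """Map one day of a user's Twitter activity to a Moon Phrases glyph.

    positivity: estimated fraction of positive affect for the day, in [0, 1].
    num_posts: how many postings the user made that day.
    """
    return {
        # A brighter (more illuminated) moon means a more positive day.
        "illumination": round(positivity, 2),
        # Orange marks a high-volume day (more than three postings).
        "color": "orange" if num_posts > 3 else "grey",
    }

glyph = moon_for_day(0.75, 5)
```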
With that thought, I will move on to the next part of this presentation, which is the organizational part. How -- yeah?
>>: [inaudible] I wanted to ask a question about Moon Phrases. So you mentioned that there are lots of different kinds of signals that you could detect in the Twitter stream that might be an indicator of depression, like the insomnia, for example. Have you thought about how to package together all those different signals in a way that is understandable?
>> Munmun De Choudhury: Right. I mean, we are still working out the details there, about what kinds of signals we can show, if it's a smartphone app, that will be useful to people, and what kinds of interventions we should make. Moon Phrases is a very preliminary tool that just shows emotionality and linguistic style usage, and we wanted to do a user study and see what kinds of reports we get back from people about which signals are useful. But those are great points. Eventually we want to incorporate all of that into the tool.
Moving on to the second part, which I'll go through very quickly: it's about how we can analyze and mine behavior at the level of an organization, and what that tells us about health and wellness.
So the emotional health of an organization, or its affective climate, is very important, because it's related to organizational outcomes and operational procedures.
Organizations often want to know how their employees are feeling in general around different changes in company policies, product releases, and so forth, or generally what the factors are behind their general satisfaction, success, failure, and so forth.
With that goal in mind, we leveraged an enterprise microblog, a Twitter-like tool called OfficeTalk that was adopted by people at Microsoft. We collected several hundred thousand postings over close to a year-long period and looked at the kinds of affect, positivity and negativity, that people expressed through those postings.
I'll highlight two main findings of the work. In the top picture there, we notice that across the organizational hierarchy, individuals tend to make some kind of affective accommodation; that is, they adapt or change their positivity and negativity when they are interacting with someone of a different status level. In particular, individuals with higher status seem to be more positive and less negative compared to their baseline levels of affective expression on the microblog.
We also notice that individuals show a kind of strong dissatisfaction towards after-hours work, as is shown in this plot, where we see that positive affect shows a sharp drop after the end of the typical workday. Putting it together, the implications of this work lie in the design of some kind of affective dashboard. Yeah?
>>: Could you go back one slide? I just want to interpret this slide. For the people who are posting at, you know, 7 to 9 p.m., is it that the people who are happy are not posting, or that they're still posting but now they're posting more negative than positive? Like, is it that people become more negative in the evening, or is it just that only the negative people --
>> Munmun De Choudhury: Oh, you mean, do the same people [indiscernible] -- no. These are aggregated volumes, so it's not clear if it's the same person or a different person.
But that -- that's a great point.
Okay. So Microsoft recently acquired Yammer, and in that light, it seems particularly interesting to put together something we can call an affective dashboard: a tool that organizations or HR management can use in order to explore, completely anonymously and in an aggregated fashion, the trends of emotion of the employees within the company.
There is also an [indiscernible] to make a similar kind of tool at the individual level, where it could connect to an individual's various social feeds within the workplace and tell them about distress and anxiety levels in the day-to-day course of work, and thereby help them make an impact on their health and wellness.
And finally, the third part of this talk which is going to be about how we can mine and analyze behavior at the level of the population and what it means for health and wellness in that scale.
There has been quite a bit of talk and research in that space. So, Google Flu Trends; we are all aware of it. What Google does is take into account search queries around flu and come up with a measure which shows the vulnerability of different regions of the world to flu within a short period of time into the future.
Can we do something similar along those lines, a macroscopic measure of mood disorder trends in populations? This is particularly important because affective disorders are a very, very serious challenge in public health. More than three hundred million people in the world today are affected by depression, and it is also responsible for the more than 30 thousand suicides that happen in the U.S. every year.
Along those lines, we moved to Twitter and tried first to build an understanding of how we can model the affect of large populations. We noticed an interesting culture on Twitter, which is that people use mood words as hashtags at the ends of their postings, which serves as a supervisory signal reflecting their emotion in the context of the tweet. The examples that we see here express a variety of different emotions, like grumpy, excited, lazy, and so forth. Based on this phenomenon, we went through an exercise in which we used crowdsourcing techniques and a number of inputs from the psychology literature to come up with more than two hundred moods which are shared on Twitter in this way.
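A minimal sketch of that labeling step, using a trailing mood hashtag as a free supervisory label, could look like this. The tiny mood lexicon below is a made-up fragment of the two-hundred-mood list, and the trailing-hashtag rule is a simplification of whatever matching the actual study used.

```python
import re

# Hypothetical fragment of the ~200 crowd-curated mood words.
MOOD_WORDS = {"grumpy", "excited", "lazy", "happy", "tired"}

def mood_label(tweet):
    """If the tweet ends in a known mood hashtag, return
    (text_without_hashtag, mood); otherwise return None."""
    match = re.search(r"#(\w+)\s*$", tweet)
    if match and match.group(1).lower() in MOOD_WORDS:
        return tweet[:match.start()].strip(), match.group(1).lower()
    return None

example = mood_label("ugh, another Monday morning #grumpy")
```

Pairs produced this way become (text, mood) training examples, which is the kind of distant supervision an affect classifier can then generalize from.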
So how do we represent the characteristics of human behavior at scale using these moods?
We moved to a psychometric instrument that is popular in the psychology literature, the circumplex model. The circumplex model represents emotion along two perpendicular axes, the first being the unpleasant-pleasant scale, which is the way we observe positivity and negativity, and the second the calm-activated scale, which is the intensity of an emotion. We plotted all of the two hundred moods that we obtained from Twitter on that circumplex, and we noticed a fairly equal distribution across the four quadrants.
Next, we wanted to see how we can characterize people's social behavior at scale using this kind of psychometric instrument. I will highlight the main findings of this slide. The circumplex on top shows the usage levels of the different moods, so the bigger the squares, the more used that particular mood is. We notice that it's no longer equally distributed across the four quadrants. In fact, people on Twitter seem to be expressing more negative and low-intensity moods than others, with a little bit of high usage of very high valence and very high activation moods.
We overlaid the same circumplex, this time based on people's social activity, which is the number of postings they make on Twitter, and it turns out from the circumplex below that individuals who are more socially active tend to express more positive emotion, which is shown by a greater number of larger squares on the right side of the circle than on the left.
Given these findings, can we automatically identify the affect in any arbitrary post? This gives rise to our second research question, in which we developed an affect classifier that could detect, in any arbitrary tweet, one of a variety of different affective states that a person is likely experiencing. This is the performance of the classifier, the precision-recall curve; we notice that we are doing pretty well for a number of different mood states, for example joviality, serenity, fatigue, and so forth.
Now that we know how to model the affect of large populations, how can we leverage these findings to come up with an index, just like Google Flu Trends does, which would measure the depression levels in large populations? We call this depression index the social media depression index, and it is given by the standardized difference between the volume of depression-indicative posts and the volume of posts which are not depression-indicative.
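In code, the standardized difference described here might look like the following sketch: compute per-period z-scores of each volume and subtract. The exact standardization in the published work may differ; this is one straightforward reading.

```python
import statistics

def smdi(depressive_counts, other_counts):
    """Social media depression index per time unit: the z-score of the
    depression-indicative post volume minus the z-score of the
    non-indicative post volume."""
    def zscores(xs):
        mu, sigma = statistics.mean(xs), statistics.pstdev(xs)
        return [(x - mu) / sigma for x in xs]
    return [zd - zo for zd, zo in zip(zscores(depressive_counts),
                                      zscores(other_counts))]

# Toy example: depressive volume rising while other volume falls.
index = smdi([1, 2, 3], [3, 2, 1])
```

Standardizing both volumes before differencing keeps the index comparable across regions or time windows with very different overall posting activity.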
We wanted to know how good, after all, this index is in terms of actually capturing depression levels in large populations. We obtained actual data on depression statistics from the Centers for Disease Control and the National Institute of Mental Health, and the heat map that you see on the left side of the slide shows the levels of depression in the 50 states of the U.S.
We have a similar figure on the right side; however, here the depression levels are given by our measure, the social media depression index. Visually, we see that there is quite a bit of correspondence across the two maps, and on quantifying it statistically, a least-squares regression fit gives a correlation of more than 0.5, which is pretty high.
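The quantitative check here amounts to a standard correlation between two 50-element vectors, one CDC rate and one index value per state. A sketch, with toy numbers standing in for the real state-level data:

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient, the statistic behind a
    least-squares fit like the one reported in the talk."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Toy per-state values, purely illustrative, not the real data.
cdc_rates = [6.8, 9.1, 7.4, 10.2]
index_vals = [0.1, 0.6, 0.2, 0.9]
r = pearson_r(cdc_rates, index_vals)
```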
We wanted to explore more fine-grained and nuanced attributes of population-scale depression, which are sometimes not given by CDC data. On one hand, we notice that women tend to be depressed almost twice as much as men, a fact which is also supported by CDC data. We notice that, over the course of the day, for both men and women, depression seems to be higher late at night and early in the morning than during the day.
And finally, we also observed that there is a seasonal component to depression: in cities in the U.S. with more extreme weather conditions, with more extreme variations in climate, the depression rate seems to be higher in the winter than it is during the summer.
Weaving all the pieces of work together, I want to highlight the potential that we have in using these postings and fine-grained activities, the naturalistic data that people post and share on different social media platforms, and how we can make sense of that and make an impact on people's lives at multiple scales in terms of their health and wellness. And the possibilities are not just limited to healthcare. Health and wellness are very important attributes in terms of social, political, and economic prosperity. I'm going to take the examples of three different domains where these findings and these kinds of methodologies can be extended for gaining insights into other kinds of real-world issues.
The first one is finance and economics. In a previous work of mine, we looked at the commenting patterns on blogs around tech companies and tried to use them as predictors of stock market movement for those companies, obtained from NASDAQ. Along those lines, in other more current work, we are working with a capital markets group at Microsoft on putting together what we call a social media based trading strategy, which would derive a bunch of different signals from social media and use them to make predictions of the S&P index over time. We also notice that the kinds of signals that we obtain from social media tend to be indicative of broad macroeconomic indicators, as given by the Michigan Consumer Confidence Index.
The other domain is that of politics, and we are increasingly aware of the important role played by social media during the elections in 2008, and actually more so a few months back. Along those lines, Bing put together an Elections 2012 page back in November; a piece of this tool used the affect classifier that I discussed earlier in revealing how people were feeling about the two Presidential candidates. More recently, Bing also put together the State of the Union page, which again made use of the affect classifier in revealing people's sentiments around a variety of different political issues.
And the third domain that I'm going to talk about, where these kinds of methods and insights can be useful, is journalism. Together with colleagues at [indiscernible] university, we put together a tool which we call SRSR, or Seriously Rapid Source Review. It was a tool that journalists could use in order to discover relevant and useful information around a breaking news event, which would mean, say, the tornadoes that happened in Joplin a few years back. A key aspect of this research was that we built a first-of-its-kind eyewitness classifier. What that means is that we looked at the postings and the social behavior of individuals who were involved in a certain breaking news event, and thereby forecast whether or not that person was actually on the ground there, having a firsthand experience of the event. The journalists that we talked to found that piece of information to be extremely useful and something which is not provided by the current state of the art.
As we move towards the end of this talk, I want to reflect and think about the variety of possibilities that we have in terms of making sense of behavior based on naturalistic data obtained from different social platforms today. In the future, I hope to be working towards how we can leverage these findings, tools, methods, and insights in building a healthy and sustainable society.
With the terabytes and petabytes of information that are being generated every day, I hope to be working towards how we can make the world a better place to live in by leveraging these findings, whether it is healthful living, becoming more responsible individuals socially and economically, or striving towards a more sustainable future. I believe that the breadcrumbs of information that we can derive from these platforms and people's activities have a lot to offer.
I'm going to highlight four different domains and the kinds of questions that I'm interested in answering in the future. The first one is public health. In contrast to what I've been doing before, I'm more broadly interested in the question of under what conditions in a network we can get people to adopt healthy behaviors. That leads to several other questions. For example, what are effective models of disease spread in these networks and in social media? How can we build predictive models that will tell us fine-grained information about mental health seeking behaviors, suicide, or health risk behaviors? And how do we design effective feedback techniques or persuasive technologies that we can give to people, which they can use to make a difference to their health and wellness?
The second domain is that of the environment, where I'm interested in how we can make use of people's activities online and under what conditions we can encourage people to adopt sustainable living. Within that, questions of interest include how we can mine information related to sustainability and sustainable living from social media, what models of influence we can use in order to encourage people to adopt those behaviors, and how this kind of behavior actually propagates in networks on social media.
The third domain is that of urban planning, or generally urban informatics, and within that I'm interested in exploring what IBM calls smart cities, or more generally urban neighborhoods with increased safety, decreased social isolation, and increased mobility. That gives rise to questions such as: how do we model the mobility of individuals based on the social geotemporal data that is shared online, and what are the implications of that kind of data mining for city infrastructure, for example traffic management, public transport systems, and so forth?
And finally, a domain which is very close to my heart: education, and how we can leverage our findings and insights about people's behavior on online networks to encourage kids to find education more fun. Along those lines, I'm really amazed by how kids find these massive online multiplayer games to be so sticky, and I believe that the social paradigms underlying those kinds of systems can probably be leveraged to adopt the pedagogical technique of learning through play and make education more fun for kids in the future.
Beyond all these domains, I want to bring up Hari Seldon, a psychohistorian and a prominent character in Isaac Asimov's famous Foundation series. A characteristic attribute of Hari Seldon was that he could make pretty good forecasts and predictions about large sets of people into the future.
Broadly, my future research is geared towards building a general model of behavior of this kind, where we can leverage the social, geotemporal, and other naturalistic data that is constantly being shared online and make inferences about real-world phenomena and issues. However, unlike Asimov's science fiction stories, a crystal ball which is an oracle and tells us everything about the future is probably not possible. Right? Because the math underlying these complex social systems is very sensitive to the initial conditions, and the errors we make to start with are likely to snowball and lead to widely different outcomes, so we would be looking at really long-term processes.
However, I believe that seven years from now we can think of something like a social weather forecasting model, which would take all this humongous data around people's activities, behavior, emotion, language, and so forth, around a certain specific domain, and make predictions and forecasts with a certain degree of confidence a short term into the future. I would really look forward to that happening.
Finally, I would like to acknowledge all my coauthors, collaborators, and everyone else who has supported me in various ways, at Microsoft and otherwise. And with that, I come to the end of my talk. I would be glad to take any questions.
>>: [applause]
>> Munmun De Choudhury: Yeah.
>>: So all of these things are based off of people's public Twitter data, and it seems like that's a very narrow section of the population. So how do you get beyond just the people who are not privacy-sensitive about social media -- I don't know...
>> Munmun De Choudhury: I mean, I guess things are changing; people are constantly using these kinds of tools. Twitter is public, but Facebook actually has a greater population if we are talking of representation. It will also depend upon the kinds of tools and techniques we build in order to intervene and make an impact. So at the personal level, it could be opt-in, or it could be someone who is okay with sharing, or, if the sharing process is so private that no one else sees it, people might be okay with that. That would take care of the representation issue to a certain extent, and we could go to larger populations where things are not public. And then there is the other option of looking at collective-scale trends, which can be aggregated and therefore completely anonymous, and, say, governmental agencies can make use of that.
>>: But even if you're aggregating, you're aggregating the people who choose to have public
Twitter accounts.
>> Munmun De Choudhury: No; then, if you're aggregating, you can look at other sources of data, like search, or general web use, which has way more penetration than Twitter.
>>: [inaudible] have there been sort of general studies of Twitter, sort of to classify what that looks like? I mean, who are the Twitter posters? Has there been --
>> Munmun De Choudhury: Yeah, Pew keeps doing surveys on that. It's about eight percent of the U.S. population; the ethnicity distributions are a little bit skewed relative to the actual distribution of ethnicity in the U.S. There are differences, and I think there are more women on Twitter.
>>: [inaudible]
>> Munmun De Choudhury: That's -- I think Pew does have data on that, but I'm personally not aware. But when we did the --
>>: [inaudible] blacks and Hispanics, so there might be --
>>: You just -- you mean --
>>: There are more blacks and Hispanics, quite a lot more [inaudible] population.
>>: Oh, okay.
>> Munmun De Choudhury: So it's slightly skewed; it's not a perfectly representative sample. But you can think of a modern-day Pew, right; they know the right way to sample people. They go out and sample people, and then those people are asked to opt in and share their social feeds, so it could be Twitter or something additional, and then you in some sense curb the sampling and representation issue and still get enough naturalistic data to make inferences with a fair amount of confidence.
>>: Have there been personality, like Myers Briggs [inaudible]?
>> Munmun De Choudhury: There are a bunch of personality studies --
>>: -- extroverts, introverts like to get on Twitter I suppose. I never post on Twitter, but I don't know how to generalize from that.
>> Munmun De Choudhury: No, I think there has been a bunch of personality work around social media use. I think you find all the Big Five dimensions of people there, where a certain personality dimension has a characteristic behavior. Like, I think people who are extroverted tend to like more posts or comment more on Facebook, stuff like that.
>>: Might it be interesting to do some sort of a matched study of Twitter users with similar non-Twitter users? Because, for instance, you reported that active Twitter users tend to be more positive and low-activity Twitter users tend to be more negative. Can you assume that non-Twitter users are incredibly depressed, or --
>>: [laughter]
>>: Like maybe we've got a [inaudible].
>> Munmun De Choudhury: That's right. Yeah. That's a great point. I think this year in CSCW there was a paper on that -- these are the folks from Michigan who looked at Facebook users and non-Facebook users -- so people are certainly looking at those kinds of things, though not quite from the affect angle. Yeah.
>>: I know for one of the Twitter studies you used the hashtags to [inaudible]. Have you looked at whether or not you can get some of that just from the natural language processing?
>> Munmun De Choudhury: So that's what the classifier does. It trains on those hashtag postings and generalizes. So now, if you input any arbitrary Twitter post, it will tell you the distribution or --
>>: Portion of that.
>> Munmun De Choudhury: Yeah.
>>: So, coming back to Gina's point, if you wanted to get a broader sample by plugging into something like email --
>> Munmun De Choudhury: Uh-huh.
>>: Have you thought of that at all?
>> Munmun De Choudhury: Yeah, I have. I mean, practically making that happen, unless people opt in to share their email feeds, may be a little too sensitive, I guess. And there are other kinds of data streams that can be combined for sure, like search or web browsing behavior. Most people use a search engine, so that has pretty good penetration in the population. And then we could use the social feeds in conjunction, wherever available, in order to improve the predictions and the forecasts. Yeah.
>>: So, in the offline world, people who are more positive tend to be more popular. Did you find the same thing on Twitter? Where people used more positive emotions in their tweets, did they have a larger set of followers?
>> Munmun De Choudhury: They did, but then I think the challenge is to draw the direction of causality; it's purely correlational. Because this is observational data, it's hard to say what causes what. But there is certainly a correlation. I didn't show that here, but it's in the paper. Okay. Thank you.
>>: [applause]