>> Andres Monroy-Hernandez: Welcome, everyone. Today we have Ryan Acton. He is a computational sociologist and assistant professor in the Department of Sociology at UMass Amherst, and today he is going to be talking about digital traces in online spaces. So, thanks. >> Ryan Acton: Great. Thank you. All right, so I have grossly over-prepared for this, so I have way too much material. I'm not going to be able to get through even half of it, so I apologize for that. To give you a little bit of background about who I am: I started college at Penn State as a psych and soc double major -- I actually started as a landscape architecture major, but that didn't last very long. Then at UC Irvine I got my Master's in Demographic Analysis and then my Ph.D. in Sociology, with Emma, under Carter Butts. Then, in 2010, I was hired at UMass Amherst as an assistant professor in sociology and was also invited to join the newly formed Computational Social Science Initiative, CSSI. And, sort of unrelated, this year I also joined the board of directors of a non-profit organization called Help Our Kids, based in Springfield, Massachusetts, where I serve as director of web and social media. That's a volunteer position I hold there. Okay, so a little bit about my research interests. Broadly speaking, you can break them down into three categories: first and foremost is social network analysis -- that's what I was classically trained in, so that's the emphasis of what I pay attention to; a little more generally, computational social science; and third, development and analysis of data on the R platform. More specifically, I delight in problems that let me process large-scale data from web-based sources. And on the more substantive side of my work, I've tended to revisit classical social theories with social behavioral data obtained from computer-mediated contexts.
And I'll share with you some examples of that. On top of that, I also have training in demographic methods, GIS, and survey research and interviewing methods. Now, in terms of my approach to research, I wanted to share a few things that I find very important and tend to pay attention to. To me, online social interaction is particularly fascinating because of the digital record of interaction it leaves behind, these so-called digital traces that are in the title of my talk. A lot of us have come to think about this as a high-tech form of archival research, with the Internet being a giant archive holding all of these digital traces. And of course the ability to harvest these data gives us the possibility of high statistical power. One of its benefits compared to offline, old-school archival research is a minimization of human coding errors, which is a big thing to be cognizant of when working with a lot of data. And certainly these online social interaction data allow us to answer, and even ask, questions that are not possible with classical forms of data. Because of this realm and these kinds of data, I often find myself asking: is it possible to measure the whole population and not just a sample? Quite encouragingly, sometimes the answer is yes. And is it also possible to automate or semi-automate the collection of those data? If yes, that's quite appealing to me, because we can do really cool things with it. And coming out of my dissertation research, I learned that if the tool I need doesn't exist, go ahead and create it, a line of thinking I'm sure many of you are familiar with. Now, the ongoing research projects on my plate break down into three different categories. I started my training with network analysis in the context of studying disasters.
First and foremost, I studied networks of collaboration among organizations in response to Hurricane Katrina. I was also part of the very beginning of the formation of what's called Project Heroic, studying information dissemination on Twitter; I helped build the data collection engine for that project. But foremost, my work is about measuring social networks online, and my dissertation is about methods for web-based data collection. Some software to aid with this came out of the dissertation. I've also studied some specific cases. One of them is how groups change in size on last.fm, and I'll share a little bit of that work in addition to the scrapeR software that I developed. Most recently, I've been working with a Ph.D. student at UMass studying the entailment structure of farm attributes among New England farms; I have slides on that in this talk, but I doubt we'll be able to get to them. So I apologize, but I'm happy to give you sneak peeks if needed. The third line my work is going in, which I mentioned a little earlier, is revisiting classical theories. In one case I revisit Balance Theory in the context of social behavior on Epinions.com, and I'll share some of that work. And then there is some work I've done with Emma and our advisor Carter Butts on extending brokerage, a very classic notion in social network analysis, to the dynamic case. Okay, so that's me and my background, and now I want to get into the main point of the talk. These digital traces are things I find quite fascinating, especially trying to capture them, organize them, store them, and then analyze them. And I thought this was very cute. On a blog that I frequently read, FlowingData, there was this post back in September: How you know you're in an upscale with these three different grades of gasoline.
And the cute comment beneath it says, "Analogue traces, or as they're more commonly known, dirt." This is the offline analogue to what I'm interested in in the online world. There are plenty of offline, analogue types of traces; it's quite possible to study and analyze them, but it's very messy. I like to bring it into the online world and study the markers and traces people leave behind in their behavior. To do that, I developed a software package for the R platform called scrapeR: tools for scraping data from web-based documents. It brings straightforward web-scraping capabilities to R. You can read the specifics about what it does, but it's particularly useful when trying to automate the collection of these kinds of data. The nice thing is that because R is excellent for data organization and analysis, and scrapeR brings these data directly into R, it's a one-stop solution; you can do all of this in R. A lot of people like to use Python for web scraping; this brings those capabilities into R. So here's scrapeR in action in a very simple toy case. Here's a website from CISA, the Community Involved in Sustaining Agriculture, a consortium of farms and vendors in Southern New England. It has a nice website database showing you all the different farms, farm stands, and farmers' markets. You type in a zip code and it shows you what's in your area. So here are the farm stands in Western Massachusetts: there's Amherst, and here are all the different farm stands in the area. Wouldn't it be nice to be able to scrape these data out of the page and get them directly into R? Let's say, for example, you're interested in getting the geographic latitude and longitude coordinates of these locations. Well, of course, sitting behind every website is the source code.
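To make the idea of pulling coordinates out of page source concrete, here is a minimal Python sketch of the extraction step. It is not scrapeR, which is an R package whose API the talk doesn't show. It assumes, hypothetically, that each location's coordinates sit in `data-lat` and `data-lng` attributes. The real CISA markup may look nothing like this, and the sample HTML and coordinate values are made up.

```python
from html.parser import HTMLParser


class CoordinateParser(HTMLParser):
    """Collect (lat, lng) pairs from any element carrying data-lat/data-lng attributes.

    The attribute names describe a hypothetical page layout, not the actual CISA markup.
    """

    def __init__(self):
        super().__init__()
        self.coords = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)  # attrs arrives as a list of (name, value) tuples
        if "data-lat" in attr_map and "data-lng" in attr_map:
            # Convert the attribute strings to floats so they are ready for analysis.
            self.coords.append((float(attr_map["data-lat"]), float(attr_map["data-lng"])))


def extract_coordinates(html):
    """Return every (lat, lng) pair found in an HTML string, in document order."""
    parser = CoordinateParser()
    parser.feed(html)
    parser.close()
    return parser.coords


# A made-up page fragment standing in for the downloaded source code.
sample_html = """
<ul>
  <li class="stand" data-lat="42.3732" data-lng="-72.5199">Stand A</li>
  <li class="stand" data-lat="42.3500" data-lng="-72.5300">Stand B</li>
  <li class="note">No coordinates here</li>
</ul>
"""

if __name__ == "__main__":
    for lat, lng in extract_coordinates(sample_html):
        print(lat, lng)
```

In practice the HTML would come from downloading the page, for example with `urllib.request.urlopen`. Repeating that download on a schedule is what turns a one-off scrape into the kind of automated, interval-based collection the talk describes.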
What I did was pull up the source code for that page and start highlighting the block of code where that geographic information sits. Of course you could pull this out manually if you wanted to; it would take a while. But using scrapeR -- here I am in R with a few lines of code, plus some data cleaning to tidy it up -- I'm able to pull in the page and pull out the latitude and longitude coordinates, and here I have them sitting over here. Then I'm free to do with them what I want. This is just one very specific example, but you can imagine using this for all kinds of applications where information sits in some kind of XML or HTML document that you'd like to pull directly in for analysis. Some other potential applications for scrapeR might include extracting news headlines for a content analysis, tracking the link structure of a website for a network analysis, or retrieving weather data for some kind of climate analysis. Where I think the real benefit comes is when you're able to automate this scraping so that you can repeat it over some interval. For example, if you wanted to retrieve weather data every 24 hours or capture news headlines every hour, scrapeR can help you do this. That direction is where I've found scrapeR most useful in my work. So I've prepared three different research applications to present to you; I'm probably only going to get through one and a half. The first one is one of the chapters of my dissertation, in which I studied group size dynamics at last.fm, and scrapeR was born directly out of this project. There are very classical questions in sociology about groups, group identities, and group formation, and for decades sociologists have been asking what influences the size of groups, because group size influences so many other things in terms of social behavior.
But what is it that predicts the size of groups? And how are groups changing in size over time? Sociologists have been thinking about these things for a long time. In the social network community there's been a lot of work showing that social networks can drive changes in group size, such that existing members recruit and attract new members. This has been demonstrated in numerous cases, and hopefully it's not a surprise to anybody. So examining whether social network effects drive changes in group size was appealing to me. In trying to find an appropriate case study, I wanted a context in which the cost of joining and leaving groups was minimal, since that reduces people's barriers to forming groups. I also wanted a case with opportunities for network ties to form and be used, making people aware of the actions and behaviors of others. And I was hoping -- and it turned out this was possible -- to have some comparison cases, like cross-national comparisons. So in my search for data to test some of these classical predictors, I came across last.fm. Perhaps you have heard of it. It's an online music discovery community founded in 2002. At the time of data collection there were over 21 million users in over 200 countries. Among other things, last.fm lets you listen to Internet radio and gives you music recommendations. It lets you maintain a personal online profile containing information about your listening history, your friendship links, and so forth. It also lets musicians who are holding events advertise them on last.fm, and lets other people indicate whether or not they're going to those events. These events are where I ended up focusing all of my attention in this specific paper.
So this is what the front page of last.fm looks like. You can search for any musician, album, or track and it'll give you all kinds of cool recommendations. There are tons of things you can do with this website, but I was focused on the part devoted to listing the events going on in last.fm. So if you go to the events listing page -- and I didn't do anything fancy; I just went to the Events page -- it showed me, at least as of the other day, which events were happening most immediately around the country. You can narrow it down to specific areas and so forth. This is the first of at least 500 pages of event listings on this website. Within a given event page, say if you were to click on The Flaming Lips and Tame Impala, you see a page like this telling you when the event is, where it is, how to get tickets, and the price of tickets. This part here, at least, was not available when I did this a few years ago; it would've been kind of cool to have. So there's all kinds of metadata about the event, but the part that was really cool to me was that last.fm users can declare their attendance by indicating "I'm going" or "I'm interested." When I looked at this page, at least, there were 44 last.fm-ers indicating that they were going, and you can also see the list of who was interested somewhere else. This forms the foundation of my sampling frame for collecting data on this problem. What I ended up doing was -- oh, I already explained all of this to you: what events are, yadda, yadda, yadda. I already said this. I saw this as an opportunity to examine all of the event listings on last.fm and look at how people's declared attendance at these events changes over time.
This is how I'm measuring group size formation and group size dynamics over time: by looking at attendance at these last.fm events. Going back to the theoretical directions here, there are three potential processes influencing these dynamics. I already mentioned the network recruitment process: events grow in size over time because of network recruitment, and this has been demonstrated in numerous other settings. But another way events may change in size over time is that some events might just be more attractive than others, so events for very popular musicians might have an easier time attracting people than those for lesser-known musicians. That has nothing to do with network dynamics necessarily. Another thing that came to mind was that differences in group size might be attributed to how long groups have been around. By "been around" I mean how long the event has been posted on last.fm, and I'll give more background on what that means in a minute. So here are three viable theories that might predict why and how groups change in size over time, or perhaps it's some combination of these processes. The networks process: I've already explained this a few times. Here's one group that has two members, and here are the friends of its members. And here's Group B, which has four members, and all of the friends of the members of Group B. This just illustrates the idea that a larger group potentially has more outreach into the network to recruit people than a smaller group with fewer network contacts to pull in. So we should see a proportional rate of growth, or at least something faster than a constant rate of growth, as a function of group size. The prediction for the network process, then, is that the rate of growth for a given group will increase as the group size increases. Yes? >>: You're saying that's nonlinear then? >> Ryan Acton: Yes.
>>: So, four people versus two will be more than twice as much in terms of growth? >> Ryan Acton: Yes. Yes. Very simplistically, yeah. So nonlinear. The second predictor is the differential attractiveness process: some groups are just more attractive than others. If this is what's driving fluctuations in group size, the prediction is that we would see differences in attractiveness across groups show up as differences in growth over a given time interval. The third process I explained is the longevity process. You can think of it with a rain bucket metaphor: the longer a group is there, the more time it has to collect people, so differences in group size should be dictated solely by exposure time, that is, how long it has been around. Those are three very simple, basic theories to predict how groups change in size. I can imagine there are plenty of others out there; these are just three simple ones that I chose to look at. Yes? >>: On last.fm can you see when your friend indicates that she's going to an event? >> Ryan Acton: Yes. >>: So you get that feed? >> Ryan Acton: Because you're getting your friends' activity feed, you see when they indicate they're going to an event. >>: And in your data can you see that? Like who joined first? Who is doing the... >> Ryan Acton: Yes, I have the timing of this as well. >>: Okay. >> Ryan Acton: Yes, exactly. I'm tracking -- and I'll get to that in one second -- when people sign up for these events. So perfect timing. What I did was obtain a paired comparison between events in the United States and events in Germany on last.fm. In the US, my data collection started in late October and ended in early January of 2009. There were 69 days of data collection in which I observed nearly 3,500 events and about 40,000 people declaring they were going to those events. In Germany I had a smaller data collection window.
And you can see how many events and attendees I was tracking in that time. It's cut off a little bit, but this was a large-scale data collection effort, because I was checking the website every 24 hours to see how things had changed. You can see my basic algorithm here: every 24 hours, around 2:00 AM Pacific Time, if I'm discovering an event for the first time, I grab its metadata and record all the yes and maybe declarations of users. Otherwise, if it's an event I've already been tracking, I basically update it and see how it has changed since the last time I saw it. It's important that I do this every 24 hours so that I get the longitudinal, time-series aspect of the data. I should also say that I did get permission from last.fm to collect these data through web scraping. You can think of an event as having a kind of life cycle; this is the demographer in me. An event debuts, and I can always capture an event's debut within a 24-hour period because I'm going to get to it within 24 hours. The event evolves in size over time; people can join or leave. Then the event reaches its scheduled date, which I'm calling its expiration date. So events acquire their size in one of two ways: they have a debut size, and then over time their size fluctuates through some kind of dynamic process. Yes? >>: Who creates the events? Are they automatically created based on some other calendar? >> Ryan Acton: That is a good question. I am not sure. I imagine people are saying, "I have an event coming up," and filling out a form. That's a good question; I don't know the answer. So here are all of the events that I was tracking in that window for the United States. This is my very first day of data collection, and there were a lot of events already on the website.
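The every-24-hours procedure just described can be sketched in Python: grab metadata the first time an event is seen, and record each day's declarations for every event. This is a minimal sketch, not the scraper actually used in the study. The `listing` dictionary is a hypothetical stand-in for whatever a day's scrape of the event pages returned. The `TrackedEvent`, `daily_update`, and `size_series` names are illustrative. For brevity it tracks only "going" declarations.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedEvent:
    """One event followed over time (field names are illustrative assumptions)."""
    event_id: str
    metadata: dict
    # One entry per daily visit: (day, frozenset of users declaring "going").
    history: list = field(default_factory=list)


def daily_update(tracked, day, listing):
    """Fold one daily snapshot of the listings into `tracked` (event_id -> TrackedEvent).

    `listing` is a hypothetical stand-in for what a scraper returned that day:
    event_id -> {"metadata": {...}, "going": [user ids]}.
    """
    for event_id, page in listing.items():
        if event_id not in tracked:
            # First sighting: grab the metadata once. Because the site is checked
            # every 24 hours, this captures the event's debut to within one day.
            tracked[event_id] = TrackedEvent(event_id, dict(page["metadata"]))
        # New or already tracked, store today's declarations so that each event
        # builds up a daily time series. Expired events that the site removes
        # simply stop appearing in `listing`, so their series ends.
        tracked[event_id].history.append((day, frozenset(page["going"])))
    return tracked


def size_series(event):
    """Daily group sizes for one event: [(day, number of "going" declarations), ...]."""
    return [(day, len(going)) for day, going in event.history]


if __name__ == "__main__":
    tracked = {}
    daily_update(tracked, 0, {"e1": {"metadata": {"artist": "Band X"}, "going": ["u1", "u2"]}})
    daily_update(tracked, 1, {
        "e1": {"metadata": {"artist": "Band X"}, "going": ["u1", "u2", "u3"]},
        "e2": {"metadata": {"artist": "Band Y"}, "going": ["u4"]},
    })
    print(size_series(tracked["e1"]))  # [(0, 2), (1, 3)]
    print(size_series(tracked["e2"]))  # [(1, 1)]
```

Running `daily_update` once per day, for example from a scheduler such as cron, reproduces the "check every 24 hours" design. Storing the full set of users each day, rather than just a count, keeps open the option of asking who joined or left.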
So I captured those sort of midway through and was able to track them until their end. But there were several events that I was able to capture for the very first time. Any of these white dots after the first day of data collection are events that I was able to detect in their infancy and then track as they fluctuated in size over time. And then my data collection had to stop in January. >>: And the red dots are when they happened? >> Ryan Acton: The red dots are when the events happened, yes. It seems I was able to keep tracking them for one or two days afterward, and then they were removed from the website. That's why you see these little tails: the event happens, it's available for another day or two, and then it's gone. That's why I had to do this every 24 hours, because things get pulled from the website very quickly. So that was the United States, and you get a sense of the distribution of event sizes. This is Germany. It looks quite different, but it's also on a very different scale: I was able to track at least one event with over 1,000 people, whereas in the US the largest event was around 250. That's an interesting outlier; I never really followed through to see what was going on with it. But the rest of the events are much closer to what's going on in the United States, though there are far fewer of them because I was tracking Germany over a shorter time interval. In terms of the differences between how big events are at debut and how big they are at their expiration date, here's what it looks like in the United States. Here's the distribution of event sizes at debut. I found an interesting pattern: there were no events smaller than, what is that, five or six at debut in the United States.
I never followed up on that, but I suspect there's a minimum threshold for how many people must say they're going before an event can be posted to the website. That's my hunch, because I found the same pattern in Germany. But you can see the distribution of event sizes when they start and when they expire, and the growth that happens in between is what I'm interested in trying to explain. And here it is in Germany: the debut size distribution and the final size distribution. It looks quite similar. Okay, for the sake of the limited time I have, I'm going to walk through evaluating the evidence for the network recruitment process that I was so eager to find in these data. You'll see where I'm going with this, but I'm going to focus on the first prediction I had: the rate of growth for a given group will increase as the group's size increases, the network recruitment effect. I looked at this in numerous ways, but the easiest way to convey it is this: if I compare an event's size yesterday to today, a point on the diagonal indicates no change. If it shows up in the lower triangle, the event decreased in size in that time interval. And if it shows up in the upper triangle, the event grew in size in that interval. The network recruitment prediction would expect the points to follow some kind of increasing pattern like this, some kind of geometric growth pattern over time. That's the generic prediction. Here's what the data looked like. Whoa. Sorry. Okay, United States on the left. This is with a one-day lag: for every event-day I compared the event's size from one day to the next, and did the same thing in Germany. There were 66-some thousand event-days in the US and a similar number in Germany.
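The yesterday-versus-today comparison, and the network-recruitment prediction it was meant to test, can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the analysis actually run in the study. Each event is assumed to be a plain list of daily sizes. Pairs at lag k are classified as above, on, or below the diagonal. A least-squares slope of growth on starting size then serves as one crude check of whether growth rises with size. A clearly positive slope would be consistent with network recruitment. A flat slope would fit constant inflow, as in the longevity process. The function names and the toy series are hypothetical.

```python
from collections import Counter


def lag_pairs(series, k=1):
    """Pair each daily size with the size k days later: [(size_t, size_t_plus_k), ...]."""
    return [(series[t], series[t + k]) for t in range(len(series) - k)]


def classify(before, after):
    """Where a (before, after) point falls relative to the diagonal of the scatterplot."""
    if after > before:
        return "grew"       # upper triangle
    if after < before:
        return "shrank"     # lower triangle
    return "no change"      # on the diagonal


def growth_slope(pairs):
    """Least-squares slope of growth (after - before) on starting size.

    Roughly zero: growth does not depend on size. Clearly positive: bigger groups
    grow faster, which is what network recruitment would lead you to expect.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two pairs to estimate a slope")
    xs = [before for before, _ in pairs]
    ys = [after - before for before, after in pairs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("starting sizes must vary to estimate a slope")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Made-up daily size series for two hypothetical events.
    events = {"snowballing": [2, 4, 8, 16], "steady": [5, 6, 7, 8]}
    for name, series in events.items():
        pairs = lag_pairs(series, k=1)
        print(name, Counter(classify(b, a) for b, a in pairs), growth_slope(pairs))
```

Changing `k` to 7 or 14 gives the longer-lag comparisons discussed next. In a real analysis the pairs from all events would be pooled, and the slope would need to be interpreted alongside event-level differences in attractiveness.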
And I looked at it with a one-day lag, a seven-day lag, and a 14-day lag, and it looks virtually identical across the board. This was a bit upsetting to me, because I was hoping to see much of the point mass in the upper triangle in both countries, and I was not seeing it. This suggests that events are not growing very much and that all of the activity is effectively hovering right over the line of no change. On any given day an event may grow a little or shrink a little, but we're not seeing an increasing rate of growth as groups get bigger in either country. >>: When you say events, are you talking about the real events that people are attending? >> Ryan Acton: Great question. Yes, ultimately that's what these events advertised on last.fm are. But I'm not realistically measuring those event sizes; I'm not at the event counting the attendees in the arena. I'm looking simply at this community of people on last.fm declaring their interest in going to these events. So we have to sort of abstract from the idea of being at a Bon Jovi concert to the community of last.fm people who say they want to go to the Bon Jovi concert; that's what I'm effectively measuring here. So, great point to bring up. Yes? >>: The reason I brought it up is that there is a limit to the number of people who can attend events, right, because they're real bodies attending events. So I would almost have predicted something like this [inaudible] inhibition: the more people get involved in other events, the more they're not attending this one, because they can only go to one. >> Ryan Acton: There are only so many events they can go to. Yeah. Given, though, that there's this networked community aspect on last.fm, where you can see when your friends say they're going to these events, I was still hopeful to see some of these network effects.
If I have 35 last.fm friends and they all see that I just indicated I'm going to this event, you would think, from the network recruitment perspective, that this would help draw more people in. A smaller event has fewer chances of doing that, but a much larger event has many more chances for these network recruitment effects to show up. So at least from the perspective of this prediction, that's what you would expect to see, and we're not seeing it. Did you have a question, Scott? >>: That's fine. Go ahead. >> Ryan Acton: Okay. So I cut out a ton of things, because after comparing different lags (one day, seven days, fourteen days), I'm going to say this is not a network recruitment effect; I found no support for it. What I did find support for were the other two predictions: the longevity prediction, that the longer a group is available on this website, the more opportunities it has to bring people in; and differential attractiveness, that intrinsically more attractive or popular groups have an easier time getting bigger in attendance size than smaller groups. So I did not find support for the network prediction, and I did find support for the other two. This is interesting in and of itself, even if it's a non-finding on the network side of things, because studying group size here allowed me to examine network effects without explicitly examining the social networks. All I was doing was tracking the counts of the yes's and maybe's, or the yes's and I'm-interested's, for each event every day for 60-some days. That's all I had to do for this analysis. I could have actually looked at who's tied to whom, how large these networks are, and the potential for these recruitment effects, but I didn't have to in order to test this theory.
I didn't have to actually construct the network of who's friends with whom and who's going to which event, which is quite nice; that would've taken a lot more effort. So the goal from here was to evaluate whether this finding holds for other online group data. This is an interesting case, but again, I'm studying just this microcosm of the online world associated with these events. Does it hold on other kinds of event-based websites? A natural next step that I had begun -- I haven't really done much with it in the last couple of years -- is to look at Meetup.com, which is much more explicitly about getting people in the online world to do stuff offline, meeting up in person. Meetup has a similar structure where people can indicate their attendance at events, and those data can also be scraped. I haven't moved forward with that yet. Yes? >>: Did you [inaudible] rationalize the attractiveness? >> Ryan Acton: The attractiveness prediction was... >>: I thought you measured it. >> Ryan Acton: Oh, yeah. I have that in my backup slides; I can get to that at the end. I was looking at the variance in the growth of the groups in given time intervals, but I have the actual measurement at the end. So I want to move on to my Epinions example here. So what we just saw was -- oh, yes? Yes? >>: So how did you measure the popularity of an event? >> Ryan Acton: The attractiveness? >>: Right. >> Ryan Acton: Right. That's the same question Andres just asked. I can show you; it's at the very end of my presentation in my backup slides. But let me go through this and I can get back to that, so remind me if I don't. Okay, application two of the three I have in here is re-examining Balance Theory on Epinions.com. Balance Theory has been a controversial theory in numerous fields for a long time.
It all began with the psychologist Fritz Heider in the 1940s, who was inspired by the work of the philosopher Benedict de Spinoza. Some of you are likely familiar with Heider's Balance Theory as a theory about cognitive and social dynamics, more specifically about the positive and negative relationships that exist among two or three entities, entities being people or things. In a nutshell, the idea is that like-signed relations in dyads are considered balanced; otherwise, they're imbalanced. So two people who like each other: that's a balanced relationship. One person liking the other while the other dislikes them is considered an imbalanced relationship. That's at the dyadic level. In triads there are four very famous sayings, like "A friend of a friend should be a friend," or "An enemy of an enemy should be a friend." People have likely heard these before. These are the predictions of what the signs look like in balanced configurations in triads. Interestingly enough, Heider only ever looks at transitive triads. He says cycles are not of interest, for numerous reasons having to do with cognitive perception. It's all about transitive triads, something important to keep in mind. And the fundamental key for Heider is that balance or imbalance is to be understood from each individual's perspective in the social structure. This is where I think a lot of people who work with balance get it wrong. The idea is that if you've got a triad, you have to understand how each individual, for example P or O or X, might perceive the configuration of relationships in that triad, and that there's not just one master state of balance or imbalance for the triad, which is what Structural Balance Theory is all about.
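The balance rules just described reduce to a sign check, and a minimal Python sketch can make that explicit. Relations are encoded as +1 (like) and -1 (dislike); this encoding and the function names are illustrative assumptions. A dyad is balanced when both relations share a sign. A transitive triad (P's relation to O, O's relation to X, P's relation to X) is balanced when the product of its three signs is positive, which reproduces the four sayings. To reflect Heider's point about perspective, `perceived_balance` evaluates each perceiver's own view of the triad separately, so two people can reach different verdicts. The perceivers "Ana" and "Ben" are hypothetical.

```python
from math import prod

LIKE, DISLIKE = 1, -1  # illustrative sign encoding


def dyad_balanced(a_to_b, b_to_a):
    """Dyad: balanced when both relations have the same sign."""
    return a_to_b == b_to_a


def triad_balanced(p_o, o_x, p_x):
    """Transitive triad (P->O, O->X, P->X): balanced when the sign product is positive."""
    return prod((p_o, o_x, p_x)) > 0


def perceived_balance(perceptions):
    """Heider-style check: evaluate each perceiver's own view of the triad.

    `perceptions` maps a perceiver (playing the role of P) to the (p_o, o_x, p_x)
    signs as that person perceives them, so verdicts can differ across people.
    """
    return {person: triad_balanced(*signs) for person, signs in perceptions.items()}


if __name__ == "__main__":
    print(triad_balanced(LIKE, LIKE, LIKE))        # a friend of a friend is a friend
    print(triad_balanced(DISLIKE, DISLIKE, LIKE))  # an enemy of an enemy is a friend
    print(perceived_balance({"Ana": (LIKE, LIKE, LIKE), "Ben": (LIKE, LIKE, DISLIKE)}))
```

The sign product is only a compact way to encode which configurations count as balanced. The Heiderian part is that the input comes from a particular person's perception rather than from one shared, bird's-eye record of the network.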
Structural Balance Theory was the natural extension, developed by Cartwright and Harary about ten years after Heider wrote his initial piece. They framed Heider's ideas as signed graphs, which let them leverage graph theory to evaluate the products of the signs, and they were able to reduce it down to the very simple notion that positive cycles can be said to be balanced; otherwise, they're imbalanced. So already we're moving into the world of reducing the network to cycles, and Heider is quite adamantly opposed to looking at cycles. He thinks cycles are of no interest with respect to balance. >>: Can you define cycles? >> Ryan Acton: Sure. A cycle is: I like Andres, Andres likes you, you like me. It's moving in this cyclic direction. As opposed to transitivity: I like Andres, Andres likes Emma, I like Emma. So it's not going around in a cycle; it's going that way and this way. I'm sorry, I usually have slides on this up here and I left them out for this talk. Yeah, these are terms we use a lot in networks. So for Heider, this cycle notion, the flow of the direction of liking, is not interesting. It's transitivity: if Emma is the common target of our liking, Andres likes her, and I like Andres, so I'll like Emma. That is the logic Heider was talking about. So in the world of Structural Balance Theory, balance is a property of the social structure and not of individual perceptions, a very different take from what Heider was talking about. This is mathematically and computationally appealing, because if we can treat balance as just a system of signed graphs, then we have all kinds of methods for computing these balance properties on the network relatively easily.
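As a sketch of why the structural version is so computationally convenient, here is a minimal Python check of structural balance on an undirected signed graph. It relies on the standard equivalent characterization of Cartwright and Harary's criterion. Every cycle has a positive sign product exactly when the nodes can be split into two camps, with positive ties inside camps and negative ties between them. A single breadth-first pass therefore settles the question for the whole network. The function names and the (u, v, sign) edge format are illustrative assumptions. Edge direction is ignored here, and nobody's perception enters the calculation.

```python
from collections import deque
from math import prod


def cycle_positive(signs):
    """Cartwright-Harary: a cycle is balanced when the product of its edge signs is positive."""
    return prod(signs) > 0


def structurally_balanced(nodes, signed_edges):
    """Bird's-eye check of structural balance for an undirected signed graph.

    `signed_edges` is a list of (u, v, sign) with sign in {+1, -1}. The graph is
    balanced exactly when nodes split into two camps with positive ties inside
    camps and negative ties between them (equivalently, every cycle is positive).
    """
    adjacency = {n: [] for n in nodes}
    for u, v, sign in signed_edges:
        adjacency[u].append((v, sign))
        adjacency[v].append((u, sign))

    camp = {}
    for start in nodes:
        if start in camp:
            continue  # already placed while exploring an earlier component
        camp[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v, sign in adjacency[u]:
                # Positive tie: same camp as u. Negative tie: the opposite camp.
                expected = camp[u] if sign > 0 else 1 - camp[u]
                if v not in camp:
                    camp[v] = expected
                    queue.append(v)
                elif camp[v] != expected:
                    return False  # some cycle through this edge has a negative sign product
    return True


if __name__ == "__main__":
    # "Enemy of my enemy is my friend": two negative ties and one positive one.
    print(structurally_balanced(["a", "b", "c"], [("a", "b", -1), ("b", "c", -1), ("a", "c", 1)]))
```

The single bird's-eye verdict this produces is exactly what makes the approach attractive computationally, and it is also what the Heiderian critique objects to.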
But this gets us away from the intuition Heider was trying to get us to talk about, and in fact, because this approach to balance is mathematically and computationally appealing, the majority of work in sociology and computational social science has started from Structural Balance Theory and effectively ignored what Heider was saying. So while Structural Balance Theory was inspired by what Heider came up with -- these ideas that people have differential perceptions of what's going on in the network, and that this determines how they feel about whether their social configuration is balanced or imbalanced -- Structural Balance Theory takes individuals' perceptions out of the picture completely. >>: To me it's like the fundamental difference between the social psychologist and the sociologist, recognizing the [inaudible] nature... >> Ryan Acton: Yeah. Reducing it strictly to structure, and people have no input any more. That's exactly what happens here. Yep. So as a sociologist and as a social network person, I should really be thrilled with this. I should be on board with this 100 percent. And I'm not, because I don't think this is how it works. The structural balance theorists are looking at the network from a bird's-eye view, not from an individual point of view. I'm not going to go through all of this, but for Heiderian Balance Theory, empirically there is mixed support. Lots of different work has been done, with mixed support. For Structural Balance Theory: mixed support. It depends on the context. There's been a lot of theoretical work done. There's been a lot of simulation-based work done. There are people who outright critique the integrity of Balance Theory, thinking it's just trash. And then there are other people who point to social mechanisms other than balance that can predict the same outcomes.
So there is a very colorful background to this line of work, but very few people have systematically examined the effects of heterogeneous perception in the network. Taking the Heiderian approach of factoring in individuals' perceptions seems to have been all but forgotten in the last several decades. And what really got me fired up about this was a paper that came out in 2010 by Jure Leskovec and colleagues, in which they studied balance on Epinions.com from the Structural Balance perspective, leaving individual perceptions out. And I feel like they missed the most important piece of what I'm about to show. Basically, they test Balance Theory on the Epinions data. They find support for it. Good job. Let's move on. And I argue: okay, but you haven't really studied the most important part here. So this motivated me to bring the Heiderian perspective to the study of balance on online social networks. As many of you know, many online social networks allow people to tag others as friends. It's much less common to be able to tag people as foes or enemies. Very few social networks allow you to do this. Slashdot is a very longstanding network that has allowed you to do this for a long time. Epinions happens to be another one. And there are some others out there, but on Facebook, for example, there is no outright way to say, "This person, I just don't like them." You can block them, which maybe is sort of what that means, but there's no easy equivalent to the friend relation. So this led me to the Epinions.com data, precisely because you have the ability to friend and foe people on Epinions.com. Epinions.com is a product review website started in the late nineties and currently owned by eBay. It's free to join. You can browse other people's reviews of products as you search for a new product. It helps people in their product purchasing decisions.
And there are three main types of relations between Epinions users that I'm using in this analysis. This is actually a departure from the theme of the talk, because I didn't use my scrapeR package to get these data. These are the same data that Jure Leskovec and colleagues used in the paper I just pointed out. I obtained them from Trustlet.org, a great repository for trust-based data. The data span the inception of Epinions.com in the late nineties through the 12th of August 2003, and they contain all of the relational activity among these users in that time span. They're freely available to download; anyone can play with those data. What is so cool about these data, and why I'm able to bring a Heiderian perspective to balance with them, is that the negative relation in Epinions, the foe relation, is censored from its recipients. On Epinions.com they call it trust and distrust. You can either trust a fellow Epinions.com user or distrust them. When I distrust you, you don't know about it. Only I know whom I trust and distrust. So to be the recipient of distrust on Epinions.com means that you are censored from knowing that information. From your perspective, all you know from the website is that there is no incoming tie from that person. There is obviously no trust tie coming from them, because they don't trust you, but there's no indication that they distrust you. So from your point of view, you do not know that they distrust you. From the bird's-eye point of view we have perfect knowledge of who trusts and distrusts whom, but the recipients of distrust do not know this. So this sets up a nice case to test differential perceptions of trust in the network on Epinions.com. The three primary types of social relations that I'm able to look at are article authorship, the trust-distrust relation and article evaluation. So I have these cute little cartoons to explain this.
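The censoring described above can be sketched in Python as a filter over a signed tie list. This is an illustration under a simplifying assumption, not Epinions' documented visibility rules: every trust tie is treated as visible to everyone, while a distrust tie is visible only to the user who created it. The names `perceived_ties` and `cycle_status` are hypothetical. The second function shows how a bird's-eye analysis and an ego-centric one can disagree: a cycle can be imbalanced from above while the person closing it cannot even see that the cycle exists.

```python
from math import prod

TRUST, DISTRUST = +1, -1

def perceived_ties(ties, ego):
    """Return the signed ties visible to `ego`.

    Assumed rule: trust ties are visible to everyone; a distrust tie is visible
    only to its creator, so its recipient (and everyone else) sees no tie at all.
    """
    return {
        (src, dst): sign
        for (src, dst), sign in ties.items()
        if sign == TRUST or src == ego
    }

def cycle_status(ties, cycle, ego):
    """For the directed cycle cycle[0] -> cycle[1] -> ... -> cycle[0], return
    (balanced, visible_to_ego): `balanced` applies the sign-product rule to the
    true ties; `visible_to_ego` says whether ego can see every tie in the cycle."""
    edges = list(zip(cycle, cycle[1:] + cycle[:1]))
    balanced = prod(ties[edge] for edge in edges) > 0
    seen = perceived_ties(ties, ego)
    return balanced, all(edge in seen for edge in edges)

ties = {
    ("me", "andres"): TRUST,    # the closing tie that "me" just created
    ("andres", "emma"): TRUST,
    ("emma", "me"): DISTRUST,   # censored: "me" cannot see this tie
}
print(perceived_ties(ties, "me"))  # {('me', 'andres'): 1, ('andres', 'emma'): 1}
print(cycle_status(ties, ("me", "andres", "emma"), "me"))  # (False, False)
```

From above, the cycle is imbalanced; from the perspective of "me", there is no cycle at all, only two visible trust ties.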
So on Epinions.com you can write product review articles; that's an example of the authorship relation. Person O wrote the product review Article X. That's one kind of relation on the website, and that's how many of them there are in these data. Then the trust and distrust relationship: you can trust, you can distrust, or you can choose to have neither with a person. So here are all the ways that people can be dyadically connected through trust and distrust, going from two people who have nothing to say about each other to two people who distrust each other, keeping in mind that this person knows that they distrust this person but this person is not aware of it, and this person knows that they distrust this person and this person is unaware of it. And everything in between. Yes? >>: Question about your data. So you said that users don't know about this distrust relationship, but you know. >> Ryan Acton: Yes. >>: But is it anonymous? I mean, if they had access to the distrust data then they would have found out who... >> Ryan Acton: The data were anonymous in the sense that user names were stripped out and replaced with new ID numbers. So I don't think they could reconstruct it. I don't know who these people are specifically; I just know them by some generic ID number. >>: Okay. >> Ryan Acton: Yep, exactly. So that's the trust-distrust relation. And finally there is the evaluation relation. I already pointed out that people can be the authors of product review articles, and third parties can evaluate someone else's product review article, effectively liking it or disliking it. So with this web of three kinds of relations, I'm able to examine different ways that balance or imbalance forms in this network. What I'm doing here specifically is modeling the formation of new trust and distrust ties that close dyads, transitive triads or three-cycles.
Remember, Heider says cycles are not of interest, but Structural Balance theorists say they are, so I threw them in here because some people at least think they're important. What I'm doing is evaluating the effect that each new trust or distrust tie has on the states of balance or imbalance in these configurations. So when I form a new tie to somebody in the network, be it trust or distrust, does that create a balanced situation for me? An imbalanced situation for me? Or did it do nothing to change the situation? In other words, by adding new ties to the network, am I increasing or decreasing my balance and imbalance? >>: From ego. >> Ryan Acton: From ego's perspective, correct. Yeah, because for Heider that's all that matters. It's not a structural thing any more; it's just what the individual people are perceiving. Okay, so what I used here was a multinomial choice modeling framework, which effectively has a logistic regression interpretation. What I'm looking at is: what is the probability that choice C of signed tie -- meaning I have the choice of trusting, distrusting or doing nothing with you -- is the one that's actually observed? So you're faced with a new situation. As I form a tie with you, I could've done numerous things. What's the thing I ultimately picked, and what effect is that having on the states of balance and imbalance in my local triadic and dyadic configurations? So basically, am I behaving in a way predicted by Balance Theory? Am I creating the kind of tie that's needed to generally increase my balance levels, or am I creating ties that go completely against what Balance Theory would predict? The full model results are here, but what I'm going to do is zoom in on the full model to share with you the interesting outcomes. So let me just quickly explain this table and then I'll wrap up very shortly here.
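The multinomial choice framework can be sketched as a conditional logit over ego's three options (trust, distrust, do nothing). The change statistics and coefficients below are made-up placeholders for illustration, not the talk's model terms or estimates, and `choice_probabilities` is a hypothetical helper. The last line shows the logistic-regression reading: the log-odds of trusting versus doing nothing equal beta times the difference between the two alternatives' change statistics.

```python
import math

def choice_probabilities(features, beta):
    """Conditional (multinomial) logit over the alternatives open to ego.

    P(c) = exp(beta . x_c) / sum_k exp(beta . x_k), where x_c holds the change
    statistics that choosing alternative c would produce.
    """
    utility = {c: sum(b * x for b, x in zip(beta, xs)) for c, xs in features.items()}
    top = max(utility.values())  # subtract the max before exp() to avoid overflow
    weight = {c: math.exp(u - top) for c, u in utility.items()}
    total = sum(weight.values())
    return {c: w / total for c, w in weight.items()}

# Hypothetical change statistics for one decision, assuming the other user
# already trusts ego: [balanced dyads created, imbalanced dyads created, positive tie]
features = {
    "trust": [1, 0, 1],
    "distrust": [0, 1, 0],
    "nothing": [0, 0, 0],
}
beta = [1.0, -0.5, 2.0]  # illustrative coefficients only, not estimates from the talk

probs = choice_probabilities(features, beta)
for choice, p in probs.items():
    print(f"{choice:8s} {p:.3f}")

# Logistic-regression reading: log-odds of trust vs. nothing = beta . (x_trust - x_nothing)
print(math.log(probs["trust"] / probs["nothing"]))
```

Here "do nothing" plays the role of the baseline alternative, so each coefficient can be read as a shift in log-odds relative to not forming a tie.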
So this is the term for me having created a balanced dyad. By virtue of a tie that I just created, that created a balanced dyad for me. According to Balance Theory, the expected coefficient in the model should be positive. We should see an increased likelihood that people will create balanced dyads and will shy away from creating imbalanced dyads. What do I find? I find support for the creation of balanced dyads, and the term for the avoidance of imbalanced dyads is non-significant. Meaning, some people choose to shy away from forming imbalanced dyads; some people choose to do nothing. This suggests that both strategies are likely happening here. When it comes to creating the closing tie in transitive triads, there are different ways this can happen. I can add a positive tie which will create a balanced triad. I can add a negative tie to create a balanced triad. I can add a positive tie to create an imbalanced triad. And I can add a negative tie to create an imbalanced triad. The thing to look at is whether I am seeing effects for these and not seeing effects for these. Once again we find support for the creation of balanced triads. Here we see no significant tendency for people to stay away from this scenario, and here we do find the predicted effect for using a negative tie to create an imbalanced triad. And finally this one is a little bit puzzling. Remember, Heider says cycles don't matter. Structural Balance Theory says cycles are where it's at. So will my addition of a tie create a balanced cycle, or will my addition of a tie create an imbalanced cycle? According to Structural Balance Theory, this should be preferred and this should be avoided. And I find the opposite. Yes. >>: Are you going to get into why you think you get those results there? >> Ryan Acton: Yeah. >>: Okay. >> Ryan Acton: Yes. Well, I can say it now, but I think I have it on the next slide here. >>: You're saying you're avoiding imbalance? So you actually remove a tie?
Type 1 and Type 2 imbalance, I'm just... >> Ryan Acton: So I'm looking at two different kinds of things that can be done. Does the addition of... >>: Adding and removing a tie, is that right? >> Ryan Acton: You're just adding a tie. You're not taking anything away. I'm just watching the creation of new ties in the network and seeing: are you adding a positive tie or a negative tie, and is that creating a balanced or an imbalanced triad? Because you can imagine... >>: Tie meaning I have a tie where I don't like them? Or [inaudible]? >> Ryan Acton: So a positive tie means I'm establishing a trust tie with you, and does the triad I'm involved in now become a balanced triad? It's also possible for me to add a positive tie and create an imbalanced triad, because there was already a negative tie somewhere in the triad, versus adding a negative tie to create a balanced triad or adding a negative tie to create an imbalanced triad. >>: I see. >> Ryan Acton: So it all hinges on what I'm about to do. What is the tie that I'm about to create, and what is that doing to the states of balance? This one is interesting because Heider says this doesn't matter. Structural Balance theorists say this is what should happen. I find the opposite. That leads me to suspect that cycles are not accurately capturing what's going on with balance, because cycles -- Do I have it on this slide? Let me see. Let me go through what I already have on the slide. I generally, cautiously, lend support to Heiderian Balance Theory over the predictions given by Structural Balance Theory. We do see a significant tendency to create balanced dyads and transitive triads. Establishing balance is cheap and it's desirable. Establishing imbalance, though, is expensive. It can induce cognitive dissonance. There could be potential social backlash for engaging in the creation of imbalanced triads.
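The four ways of closing a transitive triad can be tallied with a small change-statistic function. Below is a minimal Python sketch, assuming `ties` already contains only the signed ties ego can see (+1 trust, -1 distrust) and considering only triads of the form ego -> o -> target that the new tie ego -> target closes. `closing_tie_effects` is a hypothetical name, and the actual model may define its terms differently.

```python
from collections import Counter

def closing_tie_effects(ties, ego, target, sign):
    """Count the transitive triads ego -> o -> target closed by a new tie ego -> target.

    Each closed triad is labelled by the sign of the new tie ('positive'/'negative')
    and by whether it is balanced from ego's point of view, i.e. whether the new
    sign equals the product of the signs along ego -> o -> target.
    """
    kind = "positive" if sign > 0 else "negative"
    counts = Counter()
    for (src, o), first in ties.items():
        if src != ego or o == target:
            continue
        second = ties.get((o, target))
        if second is None:  # no visible o -> target tie, so no triad is closed
            continue
        state = "balanced" if first * second == sign else "imbalanced"
        counts[(kind, state)] += 1
    return counts

# p trusts o, who trusts x; p distrusts q, who also trusts x.
ties = {("p", "o"): +1, ("o", "x"): +1, ("p", "q"): -1, ("q", "x"): +1}
print(closing_tie_effects(ties, "p", "x", +1))
print(closing_tie_effects(ties, "p", "x", -1))
```

In this toy network, either choice of sign for p -> x closes one balanced and one imbalanced triad, which is the kind of mixed situation where "doing nothing" becomes an attractive option.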
And because of some of those non-significant findings, it would appear that doing nothing might be a passive-aggressive way of avoiding the creation of imbalance, because it's a lot less uncomfortable for you to just step away from it than to actively go ahead and create that imbalanced tie. I don't have on here what I think is going on with cycles. So the gist is, I'm arguing that this lends support to Heider's argument. It doesn't really change anything for Heider, because he doesn't even think this is important. But this is the opposite of what Structural Balance theorists say is happening. Heider says cycles don't matter; Structural Balance theorists say cycles are where you capture it. Cycles are problematic because -- let's see if I can get this right. I have no control over the balance in the cycle because all I'm doing is creating the -- let's see here. Andres likes Emma. Emma likes me. And now it's up to me to create the closing tie in the cycle that determines balance. But from my perspective, if Andres doesn't like me, I don't know about that. Remember the censoring of the ties? So I don't even know that that cycle exists. I don't even know that balance or imbalance is possible, because I can't see the incoming negative tie to know what the right move is to close that cycle. So by Heider's argument this is pointless, because it's about how I perceive the network. From a bird's-eye perspective this is how you could capture it, but you're taking the individuals out of the picture. So this lends support to the suggestion that there is some fuzzy business going on here. We don't really even believe that people are behaving from a cyclic perspective. >>: I think [inaudible] on the cycle. >> Ryan Acton: Yeah, they found that when they don't factor in individuals' perceptions, they find support for these -- they find the predicted result.
But when you factor in how people are actually likely to see what's going on, I see the opposite. The third example I don't have time to get to -- it's a very cool application and I'm happy to share it with you all later -- but at this point I'll open it up to questions if there is any time. >>: So the study that you had mentioned, was it on Epinions or on a different... >> Ryan Acton: Same exact data set. >>: Same exact data set. >> Ryan Acton: This one. Yep. But it's not surprising, because they are coming at it from a Structural Balance perspective. They're not thinking about individual cognition of the network. And to be fair, neither am I. I am not actually asking these people to share with me how they are perceiving the network, so I'm violating that as well. What I am doing, though, is modeling this with realistic assumptions about how people might be perceiving these ties, in a way that they're not even attempting in this paper. So I agree with their findings to the extent that they're using a Structural Balance theoretic approach. My argument is that that's not the approach to take. You cannot accurately capture balance when you're not factoring in the differential perceptions of individuals in the network. So that's the beef I have with this line of work. >>: You could run the two models together and compare the variance that each one explains. Do you have a sense for -- against each other...? >> Ryan Acton: Oh, against each other. I haven't done that. >>: [Inaudible] how is adding that perception... >> Ryan Acton: Right. No, I have not done that yet, but that is a great suggestion. Yeah. >>: So the non-significant finding at the top of the slide there slightly goes against the Heiderian... >> Ryan Acton: Yes, it does. Slightly. >>: So maybe -- So I guess I'm just curious why you think it is? But one possibility is, at least in this case, there might actually be value in having a connection to someone with an opposite opinion of you.
Like, the opposite opinion might actually be informative in some way, even if just to be interesting or to verify your side of the argument or something along those lines... >> Ryan Acton: Right. So it's safe to maintain that... >>: Right. >> Ryan Acton: ...connection. >>: Well, there just might be a value there that overrides this notion of balance. >> Ryan Acton: I totally agree with you. Totally agree with you. From the way that I'm modeling it here, though, keep in mind that I cannot know that you disagree with me -- or rather, I cannot know that you distrust me. So I don't even have that knowledge. I might know about it through other means on this website. You may have flamed me somewhere else on the website or said something awful about me. But in terms of knowing directly whether or not you distrust me, I don't even know that, which is kind of fascinating. But we have the luxury of being able to take the bird's-eye view and see what actually happened, who actually did like and dislike whom on the network, to see how people are fitting in. Yeah? >>: I have one other question about something that's kind of implicit, or that I've always thought was implicit, in Balance Theory: that there's an equal likelihood of positive and negative connections. >> Ryan Acton: The propensity to form one or the other. >>: I mean, you can have balance with positive and negative connections with an equal likelihood of connections being positive or negative. Is that true with... >> Ryan Acton: No. >>: ...networking? I'm assuming there are way more positives than negatives. >> Ryan Acton: Yes. I have a term in the model for the number of positive ties, the general propensity to form positive versus negative ties, and it's an overwhelmingly strong effect. People much prefer to be nice, effectively, than to be mean. And there has been support for this finding in numerous contexts.
From the pure theoretical point of view, they should be equally likely, but in terms of empirical observations people have found this in numerous situations. >>: Right. So actually then it almost says even more if people create balance through a negative tie. >> Ryan Acton: Yes. Because it's a much less likely behavior to be engaging in. >>: That's a stronger signal. >> Ryan Acton: Yeah. >>: Yeah. >>: So what's the advantage of marking somebody as a... >> Ryan Acton: When you mark somebody as distrusted, they -- oh, I had the slide for this -- you are less likely to see their product review articles in your feed, among a few other things. They become less salient to you. Yes? >>: And for positive, do you see their reviews more often? >> Ryan Acton: Yeah, it's sort of like following on Twitter. If I follow you now, you're part of my feed. And if I ultimately decide to change that to a distrust, it pushes you out of my salient scope. Yeah. Yeah. >>: So the non-significance might just be explained by the sheer likelihood of that happening not being that high. >> Ryan Acton: Yeah, I think so. Yeah, right, because this is a much more favorable and common behavior in the first place, and that's capturing most of what's going on. Yeah. Yeah. >>: And what about the non-significance for the Type 1? >> Ryan Acton: Yeah, I mean this is -- yeah, I'm not exactly sure what to think of this. By adding a positive tie, which we know is a more likely outcome than adding a negative tie, I'm creating an imbalanced triad. And this is suggesting that there is really no effect of -- compared to doing nothing, there's really no significant tendency to do one thing or the other. People are just as likely to do nothing as to engage in this behavior. So this might be an avoidance strategy. This might be, "I know that by liking you I can create an uncomfortable situation, so I'm just not even going to go and create that tie," is what I think might be happening.
Whereas with knowingly adding a negative tie to create an imbalanced situation, we are seeing the predicted effect. People are behaving according to the theory. So this, I think, is a cheap, easy avoidance strategy. >>: But as far as the negative tie, other people can't see that. So it's like, "I'm going to do this but... >> Ryan Acton: But it's a big secret. >>: ...personally [inaudible]..." >> Ryan Acton: Nobody else knows. From my point of view, right, everything works just fine. >>: But the positive one everyone else can see, and so maybe people don't want to go there. >> Ryan Acton: Exactly. >>: How do you know people don't know -- how do you know that people are aware that the negative connection is not displayed? Like on Twitter, for example, I can block you and... >> Ryan Acton: Yeah. >>: ...until I unblock, they don't know that the blocking actually is public. >> Ryan Acton: So Epinions builds this into their system. Their logic is that they want to prevent hurt feelings, so they do not let recipients of dislike... >>: But do they show that? Like, when you're about to block somebody, does it say, "By the way, what you're about to do is not going to be public"? >> Ryan Acton: I don't -- that's a good question. I don't know that. >>: Is there actually -- it is such a strong social norm not to provide negative feedback. >> Ryan Acton: Yes. >>: People will avoid doing it even if they're fairly confident that the person won't find out; they still won't. >> Ryan Acton: Yeah. Whereas on Epinions, this is part of their system. I imagine when you sign up you are informed somewhere in a long policy that your negative actions are not seen by others. Or perhaps when you go to block someone, it lets you know. I'm not an active Epinions user and I've never tried disliking someone, so I don't actually know. That's a great question, though: what the user is seeing before they do that. Yeah?
>>: One thought I had, just as a curiosity: I wonder if there are certain people who are much more likely to be in imbalanced situations? I don't know. The controversial people, inflamers... >> Ryan Acton: Yeah. And who don't care about... >>: Yeah. >> Ryan Acton: Yeah, I imagine. >>: The incendiary type. Or maybe it's just a person type or something. >> Ryan Acton: Yeah, and admittedly I'm not capturing that here. >>: You don't know [inaudible] the users, right? >> Ryan Acton: No, this dataset is strictly the tie formation. >>: [Inaudible] >> Ryan Acton: Nothing else about them, yeah. And from that, this is what I'm able to gather. >>: But you can see -- you could actually cluster people or something, right, and observe how balanced they are compared to... >> Ryan Acton: Sure. You could. >>: I mean, right? >>: Yeah. >>: There could be some people who are totally balanced all the time and then other people, like we were saying, who are really imbalanced. >> Ryan Acton: Right. >>: Or maybe it's just that if you have very few ties, it's easy for you to keep things balanced. If you have a ton of ties, maybe you're more likely to be imbalanced... >> Ryan Acton: Right. And you don't even care at that point because you can't even keep track. >>: Yeah, but that's something maybe you could look at. >> Ryan Acton: Totally. >>: The degree versus... >> Ryan Acton: Does the size... >>: ...the proportion... >> Ryan Acton: ...of one's local network influence it? Good point. >>: That'd be cool. >> Ryan Acton: All right. >>: Do you feel like, with either this study or the last, there are any implications for product changes? Like it sounds like [inaudible]... >> Ryan Acton: Yeah. >>: ...the positive stuff. And the negative -- so can you draw implications like, "Oh, that might not be a useful feature?" >> Ryan Acton: Yeah. >>: Or, you know, things... >> Ryan Acton: No, it's a good question.
I mean, especially with the last.fm thing, when it comes to the possibility of recruiting your friends to these events, if they would like to see that happen -- which I imagine they would; they would like to draw more attention to events -- then perhaps this suggests they could do a better job of making your friends more aware of what you're doing. That might be one potential intervention. For Epinions.com, at the end of the day this is just trust and distrust. I don't know how useful a relation this is on Epinions, other than for you to be able to block people out of your scope. I would imagine there's a more realistic environment to try this out in. >>: Probably the [inaudible] important for design [inaudible] should be aware and thinking about people's perception, in particular behavior. >> Ryan Acton: Sure. >>: Versus this bird's-eye view [inaudible]. >> Ryan Acton: Right. >>: Did you do any suggestion on the [inaudible]? Oh, if you had ties to these people, if you trusted these people, this would all be balanced. So maybe I should suggest [inaudible] good people... >> Ryan Acton: And in the event that that somehow increases revenue for Epinions, right, keeping people in balanced states might draw revenue better, whereas people being disgruntled or engaging in disgruntled sorts of behavior, yeah. Yeah, and as a non-active user of Epinions, I don't really even care what these people do at the end of the day. But it's fascinating to see that when all I know is the tie structure and nothing else about them, by and large they're adhering to most of what Heiderian balance has to say. So to the extent that this continues -- I mean, these data are from the early days of Epinions. Maybe the site has changed over the years, and things have changed with it. If we were to repeat this and continued to find this in another context, then perhaps we could use this information to figure out how to optimally pair up people to increase...
>>: Another thing, as far as implications: one thing that would be really interesting to look at is whether these balanced cycles versus the Heiderian balance systems differentially predict engagement within the system. >> Ryan Acton: Sure. >>: And how long they stay or how much they tend to grow [inaudible]... >> Ryan Acton: Or people who find themselves overwhelmingly imbalanced might just retreat from the system. Yeah, good point. >>: So then an implication would be to really help users find people they think would be balanced -- you know, like you could easily draw like... >> Ryan Acton: Suggestions. >>: ...[inaudible]. >> Ryan Acton: Yeah, these are all good points. Thank you. >>: Can you go back to the question of how you measure attractiveness? >> Ryan Acton: Yes. Go ahead and keyboard. Oh here we go, okay. Oop. So I am testing for homogeneity of the variances of group size for specific observed time intervals for these different groups. So I'm effectively looking at how varied group sizes are, as a proxy for this attractiveness measure, or for how popular these events are likely to be. And again the prediction is that, for any given time interval, groups should differ significantly in their rates of growth. It's almost a no-brainer prediction, and I find support for it, whereas I don't find support for the network recruitment prediction, which was shocking to me. I would've expected to find this anyway, and I did not find the network recruitment prediction to hold. >>: But that doesn't really tell you much about the -- like if you have Jon Bon Jovi versus some random old band. >> Ryan Acton: Yeah. >>: And you don't have any other information about this, you cannot... >> Ryan Acton: Right. I don't have any separate metrics of attractiveness other than this. >>: Just through the [inaudible]. >> Ryan Acton: Correct.
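The talk does not say which homogeneity-of-variance test was used. One common choice is Levene's test; below is a minimal stdlib-only Python sketch of its W statistic, using the median-centred (Brown-Forsythe) variant by default. The samples are toy numbers, not the Last.fm data, and `levene_w` is a hypothetical helper. W is compared against an F distribution with k - 1 and N - k degrees of freedom; if SciPy is available, `scipy.stats.levene` computes the same statistic together with a p-value.

```python
from statistics import mean, median

def levene_w(*samples, center=median):
    """Levene's W statistic for equality of variances across k samples.

    With center=median this is the Brown-Forsythe variant; pass center=mean
    for Levene's original version. Larger W means less homogeneous spread.
    Raises ZeroDivisionError if every sample is constant.
    """
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    # Absolute deviations of each observation from its own sample's center.
    z = []
    for s in samples:
        c = center(s)
        z.append([abs(y - c) for y in s])
    z_means = [mean(zi) for zi in z]
    z_grand = mean(v for zi in z for v in zi)
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Toy group sizes observed in two time intervals (made-up numbers).
print(levene_w([1, 2, 3], [1, 3, 5]))
```

A W near zero means the groups spread out about equally; a large W (relative to the F critical value) indicates that group sizes vary much more in some samples than in others.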
>>: I was actually surprised by the size of the events, and I was wondering if that might have something to do with it? Because the average size was, what, 200 or something? >> Ryan Acton: Yeah, let me... >>: But these are concerts or --? >> Ryan Acton: Concerts, festivals, various kinds of music. >>: So it's really people expressing an interest in an event to show others that they're going... >> Ryan Acton: Right. This is what I'm doing. >>: [Inaudible] RSVP'ing to the event or something like that. >> Ryan Acton: Right. Yes. >>: Well, there is the "I'm going" versus "I'm interested." >> Ryan Acton: Yeah, but still, I don't have any follow-up measure of whether you actually went. >>: You're not committing -- like, it's not even the event organizers creating these events. It's just not like... >> Ryan Acton: Yeah, in the Facebook event case it's more of an indication that you're actually going, because I know people use it to RSVP. >>: Yeah, so I wonder if there's even -- right, like how much are people aware that this person is interested in this event? >> Ryan Acton: And how much are people paying attention to it. Yeah, right. >>: And also whether there's a cap on the event size. I know these events do tend to be in small venues where only 100 would go anyway. >> Ryan Acton: But keep in mind that Last.fm is not capping it based on the venue size. Again, this is an online community forming around this offline event. >>: But I have actually seen -- [Inaudible] -- a study looking at how the explicit RSVPs go down as event size increases. [Inaudible]... >>: Yeah, that's actually a question I was going to ask. Can you go to the actual [inaudible]... >>: [Inaudible] of responsibility... >> Ryan Acton: For the network or the... >>: Yeah, the next slide. That was what I was going to ask. Yeah, right there. >> Ryan Acton: Yeah. >>: So if you zoomed in on event size being small, it slightly looks like you do get above the line. >> Ryan Acton: Right.
>>: But then it starts to go... >> Ryan Acton: And then it narrows down. >>: It's almost -- that's the question I was going to ask. It sort of looks like... >> Ryan Acton: The effect might hold in smaller groups. >>: Well, initially, and then it just kind of [inaudible]... >> Ryan Acton: As it gets bigger, people feel less compelled to RSVP. So that very well could be happening. Yeah, I mean you sort of see it narrowing. What's that? >>: That's pretty nuanced at that point. I mean it's -- yeah. >> Ryan Acton: But it's clearly nothing like what the pure network recruitment prediction would predict. It's clearly not that. Maybe on some very small scales at certain levels, but... >>: And the other thing that might play a role in this is that friends on last.fm are not all located in the same geographic area? >> Ryan Acton: Sure. >>: I assume a lot of people friend each other because they have similar... >> Ryan Acton: Similar music taste. >>: ...but they might live [inaudible]... >> Ryan Acton: Very good point. >>: ...so they cannot attend the same event. >>: Very important actually. >> Ryan Acton: Yeah. >>: I would strongly recommend doing the same analysis on either Meetup or Facebook. >> Ryan Acton: Right, where people are likely to be co-located, which makes Meetup.com the next natural step. >>: And the notifications for friends, to find out that you're at this event [inaudible]. >> Ryan Acton: Yeah. Yeah, the whole reason -- because you -- yeah, I totally agree. Yeah. >> Andres Monroy-Hernandez: Let's thank Ryan. >> Ryan Acton: Thank you. [Applause and voices]