
>> Tom McMail: Good morning. And welcome. I'm Tom McMail from the Microsoft Research Connections Group. And I'm very pleased today to welcome Dr. Eric Xing from Carnegie Mellon University, who will speak today about dynamic network analysis. Dr. Xing.

>> Eric Xing: Thank you, Tom, for having me here. I know it was very short notice, so I appreciate that you still have interest in attending this presentation.

So I'm going to talk about some of my recent research on dynamic network analysis. This is joint work with a couple of my students and post-docs in my group, called the Sailing Group, at CMU. Let me start with a quick motivating problem. Networks are now a very popular form of representing relational information in our world.

For example, we now use social networks to communicate with others, and on the social network we have data in the form of a graph connecting people; on each node we also have text and other information indicating the activities of users.

And networks are also prevalent in other domains such as biology, where you can express regulatory structures, influences and correlations in the form of a graph.

So there is a natural interest in understanding what all these graphs mean, what we can do with them, and how we can use graphs to perform useful activities.

Therefore, people are interested in performing network analysis. And what do I mean by network analysis? There are different ways of analyzing a network, starting from a simple task such as visualizing the graph. This is actually one of the most basic forms of analyzing a network. If you have a way to lay out the graph in a two-dimensional space, such as this, you can tell something about the network.

Therefore, there is a lot of research on producing a beautiful drawing of the network; this has almost become a prerequisite for your paper to be considered for Nature or Science publication these days.

But this kind of picture is not really very informative, because you lose a lot of information when you project a graph into a two-dimensional space like this.

So there are other, more quantitative ways of analyzing networks, starting by measuring some properties of the network. And here I give you a few examples of these properties. For example, starting from the last decade there has been a lot of work studying global topological measures of the network, such as the degree distribution to describe connectivity, the average path length so you can tell how fast information can propagate, and the clustering coefficient to tell you something about the groupiness of individuals in the network.

And actually counting such statistics is a nontrivial thing. And there are also more complex statistics to collect. For example, some people are interested in studying so-called local network properties, which amounts to counting the frequency of occurrence of subgraphs like these in a network. Each of these subgraphs tells you about certain activities being carried out on the network.
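
As a concrete illustration of these measures, here is a minimal sketch in Python using the networkx library; the library choice and the synthetic stand-in graph are assumptions for illustration, since no tooling is named in the talk:

    # Minimal sketch: global network statistics and a simple subgraph count.
    import networkx as nx

    G = nx.erdos_renyi_graph(n=100, p=0.05, seed=0)  # placeholder network

    degree_distribution = [d for _, d in G.degree()]          # degree distribution
    clustering = nx.average_clustering(G)                     # "groupiness"
    if nx.is_connected(G):
        avg_path_length = nx.average_shortest_path_length(G)  # how fast information spreads
    else:
        avg_path_length = float("nan")

    triangle_total = sum(nx.triangles(G).values()) // 3       # a simple local (subgraph) count
    print(clustering, avg_path_length, triangle_total)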

So this is an active area in network research, which gives you some useful information. And down the road there is more advanced and quantitative research along this line using model-based techniques.

For example, people have developed a number of models which can generate graphs with global or local properties similar to those I described before, and this gives people the impression that they may already understand the network a great deal, because they can generate a network which might be similar to the one being studied.

But of course this similarity is only defined in terms of the counting statistics; whether the graphs themselves can be called similar, people don't know quite well yet.

Also, there have been other techniques which seek to answer more ambitious questions, such as models like these which allow you to identify groups among the individuals in the network, or to project the network into a latent semantic space so you can see the group formations and things like that.

And all this has contributed a great deal to the current knowledge of the network.

However, these kinds of techniques are still primitive in the sense of satisfying user needs.

For example, if we look at all the techniques I just talked about, you can see that we now have ways to produce impressive drawings of networks. We can make some insightful statements about network properties, such as being scale-free or small-world, and these answers are cool. It makes you feel like you know a great deal about the network.

But if you show such results to a real network user or developer who wants to build utilities on the network, they are far from satisfied. For example, they ask the following questions: Can you tell me a little more detail about the network, such as, can we infer what is happening at every node and every edge in the network?

Can we make predictions? And can we run simulations on the network that are closer to the real world? It turns out that we don't have many techniques for these tasks. Most of the current analyses are good at telling big stories, and those stories are sometimes very obvious, say, that a power network and a social network are both scale-free. But beyond that you really cannot make useful statements about individual activities, link occurrences and their dynamic fates in this complex world.

Therefore, the serious detail hunters are still seeking new technologies. In my group I study some of these problems, and here I give you a few examples of how we actually define them. For example, one of the problems we are particularly interested in addressing is the so-called network tomography problem. It starts from a more aggressive inquiry into network properties at the microscopic level. For example, instead of just assigning an individual a cluster label, which is sometimes naive because people do not really belong to a single class, they have complex personalities and social roles, it might be more useful to adopt a multi-role assumption on individual activities.

And ask what kinds of roles they play in different contexts and how these roles are changing. Here I have an example from biology which shows you the need for that.

A particular gene in a biological system is rarely performing a single role.

Really, they carry multiple roles like this: this gene is involved in muscle development, body segmentation, neural development, and so on and so forth.

The instantiation of these different roles is context dependent; depending on who it is interacting with, the gene may perform different roles. So can we come up with a good system to infer these? If you replace the gene with an individual, you can ask even more sociologically interesting questions.

Secondly, there is this very crucial aspect of networks, which is dynamics. As we can see from this sequence of graphs, and I'm going to talk about this graph in a second in more detail, it shows you the way social systems are changing over time.

Like in U.S. politics, we know that senators and politicians are making friends in a dynamic way. At every time they have different liaisons. Therefore, can we capture this phenomenon and ask questions like: can we predict proclivities and antagonisms, clique formation and so on among social individuals for certain tasks, say whether we get their vote or not?

Lastly, there is an even more fundamental question, which relates to how we obtain social networks. Very often we assume that social networks are given to us. For example, we have a Twitter network and a Facebook network, but how trustworthy they are is actually an interesting question.

Very often we declare 500 people to be our friends, but if you use all that information to make predictions you will find out that it is not very reliable, because only a few of them are your true friends and most of the others are just declared friends which are not necessarily meaningful in predicting your activities.

So here I indicate the kind of complexity that this problem can involve. For example, here I have a network. It is called the Jesus network. It looks very interesting: Jesus is the center node of the graph and there are disciples around him. But do you know how this network was created? If I tell you, you'll probably want to stop using it.

This network was created just by looking, for every line in the Bible, at whether two names co-occur in the same line or not. Basically, that's the strategy for creating the network.

And you can imagine there might be more sophisticated ways of creating it. That means that network induction itself can be an interesting question to study. And there are different quality measures on network induction.

So these are the questions that I'm going to talk about in the rest of the talk. I'm going to first focus on a theory to describe the dynamic activities of networks changing over time. Then I'm going to talk about reverse engineering social networks based on nodal attributes, and finally, given the inferred networks, I'm going to talk a little bit about how to estimate tomographies from the evolving networks. So let me begin with the first problem: how to model. Because having a model really enables you to answer deep questions like making predictions on the fate of the network.

So the problem is technically laid out in the following way. We have a sequence of graphs, and we want to write down a probability distribution on these graphs, so that when some part of the whole system is missing, we can use the joint distribution to make an inference on that part.

And that brings me to actually writing down an explicit expression of this probability distribution, which takes this form. This is the joint distribution of the graphs occurring from time one all the way to time T. And, of course, this is a very complex distribution.

In the dynamic systems modeling domain, the usual practice is to break it down into a product of smaller components, each of which gives you a transition probability from one graph to another. This is called a Markovian assumption, which is very useful to make the problem more tractable and simpler. But now the question reduces to how we can parameterize this model, basically how to write down the probability of transitioning from one graph to another.
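
In symbols, the factorization being described is the standard first-order Markov form (a reconstruction, not a formula quoted from the slides):

    P(G_1, G_2, \ldots, G_T) = P(G_1) \prod_{t=2}^{T} P(G_t \mid G_{t-1})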

In fact, we don't have a ready-made tool for this yet. What is available in the literature is the following: there has been work on so-called exponential random graph models that allow you to write down the distribution of a single graph using properties collected from the graph.

For example, here it appears as an exponential family distribution where you have a dot product between a vector of coefficients and a vector of features. These features are often known as potentials and the coefficients are known as the weights.

The potentials are a way for people to summarize the graph. For example, you can count the number of edges in a graph, the number of two-stars, the number of triangles and so on and so forth, and use them to write down the distribution of the graph as a linear sum of all these potential functions.
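
Written out, the exponential random graph model being described takes the standard form (a textbook rendering rather than a slide quotation), with theta the weights and Phi(G) the vector of potentials such as edge, two-star and triangle counts:

    P(G) = \frac{1}{Z(\theta)} \exp\!\left( \theta^{\top} \Phi(G) \right)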

This model has been studied for a long time, starting from '86, and it is a mature theory; people understand that this kind of model can be useful in predicting the likelihood of a graph observed in the real world.

Here we took on this approach and did a little bit of extension. We went on and used this exponential random graph model to describe the probability of transitioning from one graph to the other, from time T minus 1 to T. And the strategy is the same.

Now we replace the potential functions over properties of a single graph with so-called temporal potentials, which are defined over network signatures across two graphs adjacent in time.

And here are some examples of these potentials. For example, you can write down a potential function called continuity, which gives you the tendency of an edge to be retained across time.

There are other properties such as reciprocity, transitivity and density, each of which reflects a certain aspect of the network's properties changing over time.

And then the way to write down the joint distribution of the transition from one graph to the other is just to take a weighted sum of all these different small properties, letting each of them play a certain role in driving the network in a certain direction.
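
Putting these pieces together, the transition probability keeps the same exponential-family shape, but with the temporal potentials Psi defined over the two adjacent graphs (again a standard rendering of what is described, with continuity, reciprocity, transitivity and density as components of Psi):

    P(G_t \mid G_{t-1}) = \frac{1}{Z(\theta, G_{t-1})} \exp\!\left( \theta^{\top} \Psi(G_t, G_{t-1}) \right)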

So that leads to a nice model, known as the temporal exponential random graph model, that can actually be useful in making certain characteristic studies of networks.

There are actually a lot of interesting theoretical properties of this model that I skip in this talk, because they mainly lead to interesting but not necessarily intuitive statistical insights. Here I show you some empirical outcomes demonstrating the utility of this model.

We have collected a sequence of networks from the U.S. Senate dataset. I'm going to tell you a little bit later how this collection of networks was created. But for now, assume that we have all these networks available to us over time. How can we write down a probability distribution, say, for how likely this sequence is to happen in the real world?

And here is our design of the potential functions, which is a very rich setting, because you can pretty much write down your own knowledge about what kinds of properties can be present in the sequence of networks, and then what learning the network amounts to is figuring out how they are going to be weighted in the probabilistic model.

So here I have examples of all these different signatures, like co-supporting: I support him, or he and I both support a third party, and so on and so forth. And these are the learned weights, obtained by putting all these features as placeholders into the model and then estimating their weights.

Now, we hope this learned model has a certain goodness of fit on real data. And here is a test of how good it is at fitting real data. This is a diagram which shows you the predicted network and the true network and how far they differ from each other.

What we did is take a subsequence of the networks from time one up to a particular time T minus one, use this subsequence to train the model by estimating all the parameters, and then use this estimated model to simulate a network at time T.

And then I'm going to look into my real dataset at the real network at time T to compare how similar my simulated network is to the real one.

And the measurement, of course, is conducted on certain interesting graph statistics such as degree distribution, diameter and so on and so forth, and this graph shows you the discrepancy between the simulated one and the real one.

The solid line here is the real one, and the boxes with the red line in the middle are the simulated ones; because we have multiple simulations, you see a spread of the resulting statistics.

So the message from this graph is that the simulated networks are really very close to the real ones, meaning that the model estimated from the real data is indeed characterizing interesting properties of the network.

And here I show you a broader panel of different statistics of the network.

Let's see. Yeah. So each of these plots shows you the graph statistics from the simulated graph and from the real graph, and you can see some of the statistics are well characterized by the learned model, such as this one; even a sharp change in the network can be captured by our trained model, meaning this model can be used to predict such sharp changes here and here.

And there are some other statistics which are hard to capture; for example, this one is oscillating, and our learned model is not as oscillating as the real one, meaning that this dynamic was not captured in the trained model.

But by and large, many of the properties are well captured by this model. So we can declare that this is a good model for making interesting predictions about network activity at a global level, say whether a particular co-sponsorship or, in this case, popularity is going to change over time on the network.

So once we have a model, certainly we want to use it for certain downstream tasks. And here you can imagine the following tasks: you can test hypotheses, say whether certain activities are taking place in a network sequence, or you can do data exploration, node classification and so on and so forth.

And one of the more interesting applications that we want to explore is that we want to use this model as a foundation for learning the network topology when we don't observe it.

So what does this problem mean? This is actually a very interesting problem that is often ignored in the field. For example, suppose that we want to monitor social networks. In the study I just finished presenting, I told you I have a sequence of social networks among my U.S. senators. But that is not actually quite a valid statement, because these politicians are not really that ready to tell you who their friends are. Therefore, you don't actually observe their network. They may even lie to you about their network.

What they cannot lie about is their activity. There are boxes of votes that they cast on certain bills, and that may actually reflect their behavior in the social world. Can we use that, for example, to infer such a movie of networks?

Could be an interesting problem.

And similar things occur in biology, where we have a system like this and we have observations of gene activities over the system; can we ask the question about the gene network that is driving these changes? That's an equivalent question to the social network question that I just asked.

Typically these questions are studied in a very simplistic setting, which actually ignores the time stamps and ignores the setup that is changing over time. For example, a typical paradigm for people who study network induction is the following: I collect a sequence of node information, I ignore the time stamps and collapse the observations as i.i.d. examples, and then I run a graph learning algorithm to infer a single graph, assuming it is dominating over the entire data sequence.
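
For concreteness, here is a minimal sketch of that naive paradigm using scikit-learn's graphical lasso on the pooled observations; the library choice, the Gaussian treatment of the node observations, and the random placeholder data are all assumptions for illustration:

    # Sketch of the "collapse all time points, learn one graph" baseline.
    import numpy as np
    from sklearn.covariance import GraphicalLassoCV

    rng = np.random.default_rng(0)
    T, N = 200, 10                      # time points, nodes
    X = rng.standard_normal((T, N))     # placeholder pooled node observations

    model = GraphicalLassoCV().fit(X)   # ignores time stamps entirely
    precision = model.precision_        # nonzero off-diagonal entries -> edges
    edges = [(i, j) for i in range(N) for j in range(i + 1, N)
             if abs(precision[i, j]) > 1e-6]
    print(edges)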

Our assumption is a little bit more aggressive. We assume that every one of these observations is coming from a time-specific model. Therefore, at every time point there is a model, and our goal is to estimate every graph and the parameters of the graph at every time point from the time series data, which is actually a very nontrivial task for the following reason:

Now you have a sequence of observations of user votes, and your goal is to estimate all these hidden graphs underlying the nodes. The problem is actually very, very difficult for the following reason: say I want to estimate a graph at a time point T star. How much data is associated with the graph at T star? Only one data point, because things can only happen once at a single time point. Therefore, you cannot really do statistics with one data point to estimate a model, let alone estimate a model at every time point.

So we were really puzzled by this problem, and we tried a bunch of approaches, beginning with a very model-based approach of the kind people have long used in models like the hidden Markov model for speech recognition, where you want to estimate a hidden sequence of states given some time series observations. Now we can treat those hidden states as the graph structure, and hopefully run a hidden Markov model-like process.

So this model was already presented just now. We can use, for example, the temporal ERGM I just talked about to function as the transition model, and I can also define some emission model using a graphical model type of formulation. It looks like we have a pretty nice platform for running inference on every graph if we assume this model is true.

But it turns out that this solution is not feasible, because in an HMM we know that the time complexity is quadratic in the size of the state space of the hidden variables. And in this case the state space of the hidden variable is the entire configuration space of a graph on N nodes, which is 2 to the power of N squared and runs to billions upon billions of possible states. Therefore, you cannot even store this state vector in a computer to conduct the computation.
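
A back-of-the-envelope count makes the infeasibility concrete (this arithmetic is an illustration, not from the talk): forward-backward inference in an HMM costs on the order of T \cdot S^2 operations for S hidden states, and here the state is a whole graph, so

    S = 2^{\binom{N}{2}} \quad (\text{or } 2^{N^2} \text{ for directed graphs}), \qquad N = 100 \;\Rightarrow\; S = 2^{4950} \approx 10^{1490}

which is far beyond anything that can be enumerated, let alone squared.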

Therefore, this algorithm actually failed at the very beginning, because we simply cannot implement and run it. So that led us to a simpler technology which we have been developing over the past few years, which really pushes on two aspects of graph inference. One is that we want to seek a simpler representation of the graph as a model, so that once we estimate the model we can automatically extract the topology of the graph from the model.

Basically, there is a one-to-one correspondence between the statistical model and the graph structure.

And the other is a notion called sparsity, because if we don't have enough data, that also suggests that we have to reduce the complexity of the model so that our inferences are still feasible.

So here we actually are resorting to a simpler procedure which we called a sparse graph estimation problem.

So here is basically the setting. Suppose that our goal is to estimate a graph, and this graph is associated with a probabilistic model. Here's the [inaudible]. We can write down the joint distribution of all the node observations. They could be the user's attributes or his particular binary actions, such as voting yes or no on a certain bill, and underlying that there may be a graph.

And this graph can actually be used to parameterize the distribution of the data in the following way, which is called a Markov random field.

In a Markov random field, every edge in the graph corresponds to a coefficient that measures the contribution of the pair of nodes connected by this edge.

And here this pair of nodes is replaced by the real-life instantiation of the values you observe; say this guy votes yes and that guy votes no, therefore you replace this with plus one and replace that with minus one. This is a Markov random field, which can be used to relate a model structure to the data you observe.
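
In the binary, vote-like case being sketched, the pairwise Markov random field can be written in the standard Ising-style form (a reconstruction, with x_i = \pm 1 the observed votes and \theta_{ij} the edge coefficients):

    P(x_1, \ldots, x_N) = \frac{1}{Z(\Theta)} \exp\!\left( \sum_{i < j} \theta_{ij} \, x_i x_j \right), \qquad (i,j) \in E \iff \theta_{ij} \neq 0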

So once you conduct a so-called maximum likelihood estimation of the model, then by simply examining whether a particular coefficient here or here is zero or not, you can automatically recover the graph structure. That's how we want to use a principled approach to go from data to graph structure.

And here is a more technical illustration of the model, which I'm going to skip, and I'm going to dive right into the algorithm.

So the algorithm is actually built on a modern technique known as sparsity. Sparsity is very useful because it doesn't make sense, for example, to assume that everybody in a network is a potential neighbor of mine.

That's simply absurd.

And secondly, it also helps the statistical inference to overcome data sparsity, because mathematically, when you have a small amount of data it is infeasible to even estimate a dense model. Therefore, to introduce sparsity, we use a technique based on a constrained optimization problem.

We are going to optimize a loss function, which describes the fit to the data, under the constraint that the total degree of every node is upper bounded by some number. This leads to a pretty difficult estimation problem, for which we have a nice way of relaxing it, and at the end of the day we get a very simple algorithm known as neighborhood selection, as illustrated in this graph.

So neighborhood selection is a sequential way of jointly estimating a graphical model using a pretty simple linear regression-based algorithm.

Say we have these eight nodes in the graph and we want to estimate the topology over them. The strategy goes as follows: let's randomly pick a particular node, and we are going to perform a regression from all the other nodes to this particular node, which basically tells us how predictive the other nodes are of this particular node of interest.

And using a technique known as lasso regression, you can actually obtain a so-called sparse estimate of the neighborhood. Therefore, only a few of these edges will have a non-zero weight, and we declare them to be the neighbors of this node.

And then we move on to the second node and perform the same linear regression, or logistic regression with a sparsity constraint, and then we estimate its neighbors.

If we go through this process over the entire collection of nodes, at the end of the day we have completed the neighborhood of every node, and that leads to a complete estimate of the whole graph structure.
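
A minimal sketch of neighborhood selection for binary, vote-like data, using L1-regularized logistic regression from scikit-learn; the placeholder data, the regularization strength, and the union rule for symmetrizing are illustrative assumptions, not the exact procedure from the papers:

    # Sketch: neighborhood selection for a binary Markov random field.
    # Regress each node on all the others with an L1 penalty; nonzero
    # coefficients are declared neighbors.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    T, N = 300, 8                          # samples, nodes
    X = rng.choice([-1, 1], size=(T, N))   # placeholder +/-1 observations (e.g. votes)

    neighbors = {}
    for i in range(N):
        others = [j for j in range(N) if j != i]
        lasso_logit = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
        lasso_logit.fit(X[:, others], X[:, i])
        w = lasso_logit.coef_.ravel()
        neighbors[i] = [others[k] for k in range(len(others)) if abs(w[k]) > 1e-6]

    # Symmetrize (here by union) to obtain the undirected graph estimate.
    edges = {tuple(sorted((i, j))) for i, nb in neighbors.items() for j in nb}
    print(edges)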

And there is a nice theoretical proof that a simple algorithm like this will actually consistently recover the true graph underlying the model, with a pretty strong statistical guarantee.

So this is a known technique that was discovered in the past few years, but it still falls short of estimating a dynamic sequence of networks. And that is our unique contribution: we are going to bring this algorithm to the dynamic world so we can now estimate a graph for every time point.

The technical extension is actually built on a very simple intuition: the network doesn't have to be modeled in a very complex way in terms of how it changes. In the real world, we probably most often encounter two scenarios. In one scenario you can imagine the network changing slowly over time; maybe at every time point it changes five percent of its edges at unknown places.

That's one particular scenario. The second scenario could be an abrupt change: say the network stays constant for a window of unknown length and then suddenly jumps to a different network.

These are the two scenarios we studied very carefully, and for each of them we came up with a simple algorithm for the latent graph recovery.

In the interest of time I'm going to show you only one of them, the first one, called the KELLER algorithm. It's actually also an L1-regularized regression problem, with a slight twist on the loss function, which I colored blue here. This is a very strange loss function that explicitly makes use of the fact that the data is a time series.

At every time point, I'm going to bring in a likelihood function that describes the predictivity of the neighbors of the node of interest, and I'm going to introduce a weight which measures the contribution of this particular data point to the time point that I'm estimating. So here is a graphical illustration.

Say I want to estimate this graph. I have only one data point here. To overcome the data sparsity, I'm going to bring in other data as well for the estimation of this model. Obviously, all those data points are not from this model, but if we assume that the model is changing smoothly over time, then maybe this data is somewhat relevant to this model, with a certain degree of strength.

So that's the weight function. Say this data point may be 90 percent likely to be coming from this model, and that data point maybe 70 percent likely.

By reweighting in this way, for this particular model we now have the entire dataset at our disposal for estimation. So at the end of the day we can pretty much run the same algorithm as we did for graph regression, by using the likelihood function I introduced before and then using this kernel window to smoothly reweight the data points.

For example, my bucket of votes is now spread across time with different weights, and then this graph actually does have multiple data points for it, with a different weighting scheme.
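
And a rough sketch of the kernel reweighting idea, reusing the per-node L1 logistic regression above with a Gaussian kernel over time as the sample weights; the bandwidth, data, and regularization are placeholders, and this is a simplified rendering rather than the exact KELLER estimator:

    # Sketch: kernel-reweighted neighborhood selection for one node at a
    # target time t_star. Every observation contributes, weighted by how
    # close its time stamp is to t_star.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    T, N = 300, 8
    X = rng.choice([-1, 1], size=(T, N))        # placeholder time series of votes
    times = np.arange(T)

    def kernel_weights(t_star, bandwidth=20.0):
        # Gaussian kernel over time: data near t_star counts more
        return np.exp(-0.5 * ((times - t_star) / bandwidth) ** 2)

    def neighbors_at(i, t_star, C=0.5):
        others = [j for j in range(N) if j != i]
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        clf.fit(X[:, others], X[:, i], sample_weight=kernel_weights(t_star))
        w = clf.coef_.ravel()
        return [others[k] for k in range(len(others)) if abs(w[k]) > 1e-6]

    print(neighbors_at(i=0, t_star=150))        # estimated neighbors of node 0 at t = 150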

That leads to a very simple algorithm which can be scaled to millions of nodes at this point. And we actually studied the theory of this algorithm, and proved a strong property: our estimator has the same convergence to the true graph as what you get in the static world.

Okay. In the interest of time I'm going to skip the presentation of the second algorithm. But it pretty much does the same thing for a different scenario, with a similar type of statistical guarantee. At the end of the day you see this graph. We want to measure the accuracy of the estimation. As usual we measure the accuracy first on simulations, because that's where we know the ground truth, and we compared our estimator with some competitors.

Well, there isn't really any competitor, because there isn't any other algorithm which performs this dynamic network estimation. So we invented a few competitors ourselves. One is called static network estimation, where we assume all the data are coming from a single model; we just estimate that one model and examine whether all the edges that occurred in the sequence of graphs are collected in that single model. And as you can see, our model quickly outperforms it once you have more and more data points.

And this one did even worse, the [inaudible] model, which is built on the so-called dynamic Bayesian network. It's a classic technology for estimating time-invariant graphs.

Again, the performance was really bad.

What's more interesting is to see how this whole thing performs on real world data. So here we go back to the senator data. I'm going to show you a cute example of how this technique can generate interesting results.

So we have the votes from a particular year in the U.S. Senate. And these are the bills they voted on, which basically describes the activities of every senator.

And our goal is to use this data to estimate a sequence of networks that is changing over time, which hopefully gives you some information about the senators' relationships.

So at a high level, I show you the resulting estimated graphs. And, by the way, in this estimation the algorithm has no knowledge about whether the senators are from a certain party affiliation, nor about their friendship information. All it had was the vote information.

Here is the graph. For visualization, I highlighted the party affiliation of the senators with color: red corresponds to Republican and blue corresponds to Democrat. And you can already see the graph start to make sense, because at a very high level you can already see a party structure in this model, showing that the different parties align with each other differently.

And also there are some interesting individuals in the middle, which hopefully tells you something different about U.S. politics, and also shows the power of our algorithm in discovering these kinds of insights. You can also see that Senator Obama at that point was at the center of the Democratic Party.

Let's take two examples of who is in the middle. Here's one example. Senator Chafee was actually a Republican, but a very liberal Republican.

So his trajectory was the following. He started his term with a lot of colleagues in his own party. After some time he started to distance himself from his own party and make friends from the opposite party, eventually getting to the point where most of his colleagues were from the other party.

Okay. This pattern was discovered by our algorithm, but it is also known to be true by just average people on the street, because we know that this senator at the end of the day crossed his party line and lost his re-election because of this break with his party affiliation.

This is the opposite story. Ben Nelson was a very conservative Democrat from Nebraska. During his early years, he seemed to be a politically neutral person, with both Democratic and Republican friends, but after a while he lost all his Democratic Party friends and turned himself toward the Republicans. And again, this matches what people know about this senator, because in his re-election he was actually getting most of his votes from voters of the opposite party to get himself reelected.

So, again, the algorithm discovered a phenomenon consistent with people's understanding of U.S. politics.

So the implication of this example is basically that an algorithm as simple as the one I just described can actually be used to discover hidden social networks that make sense in real life.

And, again, similar technology can be used for biology, where I can actually estimate a movie of evolving networks like this, describing how gene regulation is changing over time.

Again, I'm going to skip the details of how to interpret this network. Now let me drive to the last bullet of this talk. So we have this algorithm for estimating an unobservable, hidden, evolving network, and once it's discovered it's no longer hidden; we can use it as input to analyze downstream objectives such as tomographies of the underlying network.

Here we wrote a few papers that describe the techniques for doing so.

So what do we mean by tomography inference? This is typically a Facebook problem. People actually ask what I would be doing at Facebook. I'd be doing this: given a network, where the network has text, images, demography and all that kind of thing, your goal is to turn all this information into a feature vector, one for every individual, so that you can use them to make predictions, such as eCTR predictions.

And this whole thing, of course, is very high dimensional and time evolving, and the goal is to come up with a clean statistical technology to make this inference possible.

In particular, as I said, I want to cast this problem as a multi-role prediction for every node, so we can say something very sensible about people's social roles in a complex world.

To build this model, I'm now going to switch gears and embrace a Bayesian technology where we build a so-called generative model. The beauty of a generative model is that you can synthesize a model based on your knowledge of how the social world functions, and then, because this mechanism generates your data, once I give you data you can use the same machinery to reverse engineer the model.

So here I give you an example of how this model can be formed. Suppose I want to form a social network between these four individuals. I begin by already having the mixed membership vector of every individual.

Say this is myself; I have a vector of mixed membership saying that I am 40 percent academia and maybe 30 percent working in entrepreneurship. I'm also 20 percent engaged in family life. My wife wanted me to have a bigger portion of this, but there are other roles as well.

Now the other individuals have the same thing. If I start from this setup, how will I build my social network? Here is the story. Say I'm coming here today to Microsoft to see my friend Allison, and the model needs to decide whether there is a link between myself and her in this context. How am I going to estimate the probability of forming a link? Here's one scenario. Why am I coming here? I may come here to give a talk, or I may come here just to have a random chat with a friend. So in that case I'm going to sample my role from this distribution so that I can decide what role I'm playing.

She does the same thing. She can be -- battery -- before it dies, let me quickly rescue it. Okay. And she needs to do the same thing. Basically she needs to decide whether she wants to meet me to play golf, in vacation mode, or whether she is in the mode of an academician, a scholar, and wants to discuss research.

Say we have a match: I decide to sample my role to be in vacation mode and play golf, and she does the same thing. Then there is a high probability we form a link, because we are aligned in functionality.

Then tomorrow I go home and I want to see my baby. I could also be in different modes. Say I'm so busy meeting a deadline that I want to lock myself in the office and don't want to hug my baby. But if my baby wants to see his daddy, then he will sample his role to be a son while I am a professor, and the chance of a professor really spending time with a son is very small. But if I want to be in family mode, then I have a higher probability.

So this is just a story telling people how a social network can be dynamically formed. And this language can be succinctly captured by a tool called graphical models, where I start from a pair of roles between the individuals who might form a link, then I sample the context of a certain link to be formed, and then, based on the configuration of the pair of roles between me and the other party, I generate a probability of forming this edge.

So this is a well-known probabilistic formulation for doing network induction, known as the mixed membership stochastic blockmodel, and the way to make use of it is that we actually observe this data already in the real world, because we see the network, and now we want to go in reverse and estimate the hidden roles of every individual, which can be captured in this picture. So, for example, here I have the model and I'm going to estimate the theta. The theta is basically a row of numbers in a vector; they define coordinates for that individual in a social space like this, so that you can see some people are here and some people are over there, which reflects the breakdown of the different fractions of their social role in different dimensions.
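
A minimal sketch of the generative story behind the mixed membership stochastic blockmodel, in numpy; the Dirichlet parameter, the role-compatibility matrix B, and the sizes are placeholder values chosen for illustration:

    # Sketch of the MMSB generative process: each person has a mixed
    # membership over roles; for each potential link both sides sample a
    # context-specific role, and that pair of roles sets the link probability.
    import numpy as np

    rng = np.random.default_rng(0)
    N, K = 20, 4                         # individuals, roles
    alpha = np.full(K, 0.3)              # Dirichlet prior over roles (placeholder)
    B = np.full((K, K), 0.02)            # role-compatibility matrix (placeholder)
    np.fill_diagonal(B, 0.8)             # matched roles link with high probability

    theta = rng.dirichlet(alpha, size=N) # mixed membership vector per individual

    A = np.zeros((N, N), dtype=int)      # adjacency matrix
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            z_i = rng.choice(K, p=theta[i])   # role i plays toward j
            z_j = rng.choice(K, p=theta[j])   # role j plays toward i
            A[i, j] = rng.binomial(1, B[z_i, z_j])

    print(A.sum(), "links generated")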

That gives you a way to actually group individuals, and this is again a very interesting way of exploring complex social behavior. I'm going to show you in a second with a cute example. And this whole story can also be stretched into a dynamic setting, where I have not only one network but a sequence of networks, and I can use the same machinery to estimate a trajectory of the social roles changing over time in the social space.

Okay. So I have a number of interesting examples, but let me show you just one for this analysis. Again, given these graphs, you can just stare at a graph or zoom into a particular node to see how its network is changing, but it is easier to actually plot every node in the social space like this.

For example, let's see... okay. So here are the inference results coming from the social network of senators. Here I highlight two individuals. What is this? This is basically a stack, a collection of columns. Each of these columns is actually a graphical illustration of a multi-role vector. Every role corresponds to a particular color, and here I have four different roles. One corresponds to a Democratic-centric role, another is a Republican-centric role, and there are some other non-typical roles that can be played by senators.

By looking at the color breakdown you can actually tell how much of the time is aligned with Democratic behavior, how much is aligned with Republican behavior, and how such a breakdown is changing over time.

So, again, go back to this guy, for example, Senator Ben Nelson in Nebraska.

We knew he started off as a very conservative Democrat and then turned toward the Republicans, and we can actually see that in this particular diagram. Say he started with a lot of deep blue color, and that color keeps going down, down to the point of almost diminishing; this is the Democratic color. The Republican color, on the other hand, actually keeps increasing until a point where there is a sudden jump of this particular color. What does that mean? It means he suddenly became extremely Republican at a particular point.

We actually found out that this is the point where his re-election was about to take place. He very quickly changed his political stance toward the other party.

So again this graph gives you a very nice way of visualizing the social roles of every individual and their change over time. I have another story to tell here, but in the interest of time I'm going to skip it. This gives you a really holistic way of visualizing social individuals in a space. Here I have the breakdown of the social trajectories of all 100 U.S. senators in that particular year. And you can quickly tell there are some pure Republicans, there are some die-hard Democrats, and there are some other people in the middle.

And if you have this chart, I guess the lobbyists and the people who run presidential campaigns would be interested in seeing it and targeting their activities. And you can do the same with Facebook and Twitter, for friend recommendations and things like that.

So that basically brings me to the end of the talk. I talked about a few problems in dynamic social network analysis, and there are of course a lot of open problems that I am continuing to study at this point.

Especially, I am very interested in making all these inference algorithms scalable to the whole Web, so we can make large-scale inferences on multiple aspects.

And also there is a particular need for so-called social media modeling and data integration, because in social media you are dealing with not only a graph but also everything on the graph, like pictures, images, text, videos and other things. How to integrate this into coherent, consistent information is actually a challenging task. Okay. I think I'm done with the talk. This is my research group, part of which was involved in the studies I talked about.

These are the people, circled in red, who contributed to the different papers I mentioned in the talk, along with the funding agencies supporting the work. Okay. I think that's the end of the presentation.

[applause]
