AdamicTimeSeriesNetS.. - University of Michigan

Co-evolution of network structure and content
Lada Adamic
School of Information & Center for the Study of Complex Systems
University of Michigan

Outline
- Co-evolution of network structure and content
  - Can the structure of Twitter and virtual world interactions reveal something about their content?
    http://arxiv.org/abs/1107.5543
  - Can the structure of a commodity futures trading network reveal something about information flowing into the market?
    http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1361184

What is the relationship between network structure and information diffusion?
- Is information flowing over the network?
- Or is information shaping the network?

Can the shape of the network reveal properties of information?
- Big news! Giant microbes!
- Little news. How's the weather?

Related work on time-evolving graphs
- Densification over time (Leskovec et al. 2005)
- Community structure over time (Leicht et al. 2007, Mucha et al. 2010)
- Change in structure (ability to "compress" the network) signals events (Graphscope by Sun et al. 2007)
- Disease propagation & timing (Moody 2002, Liljeros 2010)
- Enron email (B. Aven, 2011)

What's different here
- We look at network dynamics at relatively short time scales and construct time series
- A range of network metrics, instead of just community structure
- Information novelty and diversity, as opposed to tracking single events or pieces of information

Can the network reveal...
- If everyone is talking about the same thing, or if there is just background chatter?
- If what they are talking about is novel?

1st context: virtual worlds
- Networks: asset transfers (gestures, landmarks) and transactions (e.g.
  rent, object purchases)
- Content: assets being transferred

Study transfers in the context of the 100 groups with the highest numbers of transfers

Second context: Twitter
- Microblogging: < 140 characters per tweet
- Network links read from tweets
  - Reply or mention: by putting the @ in front of the username
  - Retweet: repeat something someone else wrote on Twitter, preceded by the letters RT and @ in front of their username

Selecting Twitter communities to track
- http://wefollow.com/twitter/researcher
- For each "researcher", gather tweets of the accounts they follow

Highly dynamic networks
- Segmentation:
  - Twitter: every 800 tweets (median segment duration 1.5 days)
  - SecondLife: every 50 asset transfers (median segment duration 8.4 days)
[Figure: percentage of repeated edges (y-axis, 0.00 to 0.25) against number of segments elapsed (x-axis, 1 to 8), for SecondLife and Twitter]

Conductance: capturing potential for information flow
[Figure: three example graphs connecting nodes A and B, with low, medium, and high conductance]
- Temporal conductance (summed over all pairs):
$$C_{ij} = \sum_{\text{paths } i \to j} \; \prod_{\text{edges } (k,l) \text{ on path}} \frac{w_{kl}}{\deg(k)}$$
- High if pairs of nodes share edges, or many short, indirect paths
Koren, North, Volinsky, KDD, 2006

Network expectedness
- Define expectedness: the average conductance of all neighbor pairs at time t, based on the conductance of each pair at time t-1
$$X_t = \frac{1}{|E_t|} \sum_{(i,j) \in E_t} C_{ij,\,t-1}$$
[Figure: expected and unexpected network configurations]

Conductance and expectedness as a toy network evolves
- Network configuration at t=0: conductance = 4
- Possible configurations at t=1:
  - conductance = 4, expectedness = 1.5, edge Jaccard = 1
  - conductance = 4.5, expectedness = 1.3333, edge Jaccard = 0.6667
  - conductance = 6, expectedness = 0.5, edge Jaccard = 0.25

SecondLife: network structure and content
- Standard network metrics are not indicative of information properties
- Conductance and expectedness are [...]
[Figure: columns labeled overlap (t-1, t), overlap (t, t+1), Δ diversity (t-1, t), Δ diversity (t, t+1)]

Conductance & diversity of information
- High conductance brings higher
  content diversity
- Repeat network patterns bring less diversity and less novelty
- but... similarity and novelty are positively correlated (r = 0.19)

Social and transaction network of top sellers in SL

Twitter: textual diversity and novelty
- Semantic metrics (metric type, then computation method):
  - Contemporary metrics (average cosine similarity of words in tweets), computed:
    - between connected node pairs in the graph
    - between indirectly connected node pairs, i.e., non-neighbors with an undirected path of length > 1 between them
    - between isolated pairs (in different components)
  - Novelty metric (language model distance), computed between two sets of tweets associated with Twitter networks captured at different times

Twitter: network structure and information diversity
Correlation of each network metric at time T with content similarity, by type of node pair (color scale from -0.6 to 0.6):

metric                 all-pairs   unconnected   indirectly-connected   connected
# nodes(T)              -0.584       -0.632             0.305             0.030
# edges(T)              -0.537       -0.601             0.348             0.058
reciprocity(T)          -0.160       -0.179             0.176             0.128
clustering coef.(T)     -0.198       -0.240             0.181             0.030
centralization(T)       -0.121       -0.176             0.158             0.062
edge deg cor.(T)         0.027       -0.155             0.113             0.054
av. degree(T)           -0.287       -0.353             0.323             0.093
sd. degree(T)           -0.212       -0.277             0.251             0.048
WCC size(T)              0.317        0.303            -0.126             0.038
conductance(T)          -0.444       -0.506             0.369             0.121
expectedness(T)         -0.145       -0.161             0.234             0.092

Inferring Network Semantic Information
- Question: Does network structural information help to improve prediction of the characteristics of the information exchanged?
[Diagram: semantic variables and topological variables feed a kernel regression prediction model, which outputs semantic variables]

Example: Inferring the average similarity score between isolated pairs
[Figure: R² in predicting the average similarity score (ASS) between isolated nodes (y-axis, 0 to 1); curves c1: X1 = {connected}, c2: X2 = {indirectly-connected}, c3: X3 = {# nodes}, c4: X4 = {# edges}]
[Figure x-axis, variables in the order they are added: connected, indirectly-connected, # nodes, # edges, reciprocity, clustering coef., centralization, edge deg cor., av. degree, sd. degree, WCC size, conductance, expectedness]
The input variables of curve ci start from Xi and increase each time by adding the variable labeled on the x-axis.
- No need to use other textual variables (e.g. similarity between indirectly connected pairs) when sufficient topological information is available
- Reason: topological variables account for much of the pattern in the text!

Network structure and information novelty
- Greater novelty in edges corresponds to greater novelty in content shared
- For nodes that are interacting (citing or being cited):
  - Higher conductance and expectedness correlate with less information novelty
Correlation of each network metric with language model distance (color scale from -0.3 to 0.3):

metric                      LMdist_allNodes(T-1,T)   LMdist_NodesWithNeighbors(T-1,T)
# nodes(T-1,T)                       0.124                      -0.050
# edges(T-1,T)                       0.171                      -0.117
reciprocity(T-1,T)                   0.042                      -0.004
clustering coef.(T-1,T)              0.149                      -0.197
centralization(T-1,T)               -0.018                       0.038
edge deg cor.(T-1,T)                -0.111                       0.101
av. degree(T-1,T)                    0.066                      -0.044
sd. degree(T-1,T)                    0.083                      -0.119
WCC size(T-1,T)                      0.085                      -0.101
edge jaccard(T-1,T)                 -0.233                      -0.230
conductance(T-1,T)                   0.202                      -0.225
expectedness(T-1)                    0.171                      -0.143
expectedness(T)                      0.093                      -0.273

Information in trading networks
- CFTC = Commodity Futures Trading Commission
  - Stated mission: protect market users and the public from fraud, manipulation, and abusive practices
- Futures contracts started out as contracts for agricultural products, but expanded to more exotic contracts, including index futures
Collaboration with Celso Brunetti, Jeff Harris, and Andrei Kirilenko
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=

Data
- 6.3 million transactions in Aug. 2008 in the Sept.
  E-mini S&P futures contract
- Price discovery for the index occurs mostly in this contract (Hasbrouck (2003))
- Data includes: date & time, executing broker, opposite broker, buy or sell, price, quantity
- Sample in broker transaction windows of 240 transactions
[Diagram: a matching algorithm pairs an executing broker with an opposite broker from a limit order book. Example match: quantity 10, price $171.25. Orders shown: sell 20 contracts at $172.00; sell 5 contracts at $171.75; sell 10 contracts at $171.25; buy 20 contracts at $171.50; buy 30 contracts at $171.25; buy 50 contracts at $171.00]

not social, not intentional, not persistent

Financial variables
- Rate of return: last price to first price in logs (close-to-open)
- Volatility: range, i.e. the log difference between max and min price
- Duration: total period duration, i.e. the time in seconds between the start and end of each sampling period
  - Proxy for arrival of new information
- Volume: trading volume, i.e. the number of contracts traded

What can we learn from network structure? e.g. centralization?
[Figure: example networks with low and high in-centralization, with nodes of low and high outdegree and indegree]

Overview of network variables
- # nodes, # edges
- clustering coefficient, LSCC, reciprocity
- CEN = Gini(in-degree) - Gini(out-degree)
- INOUT = r(indegree of node, outdegree of same node)
- AI (asymmetric information)

Correlations between network and financial variables
- High centralization: market dominance, where a dominant trader buys from many small sellers; low duration, low volume
- Negative assortativity: large sellers sell to small buyers and vice versa; low duration, higher volume
- High av.
  degree & largest strongly connected component: no news, many buyers and sellers; high duration, high volume

Correlations between network and financial variables
- Rate of return: positive correlation with centralization
- Volatility & duration: correlated with standard deviation of degree, average degree, and the total number of edges (E)
- Volume: correlated with a few network variables, sign varies

Conclusion
- Network structure alone is revealing of the diversity and novelty of the information content being transmitted
- Results depend on the scope and relative position of the activity in the network

Future work
- Sensitivity to inclusion of non-interactive or across-community interactions
- Applying novelty & conductance metrics to financial time series
- Continuous formulation of novelty and other network metrics (because segmentation is problematic)
- Roles of individual nodes

Thanks:
- Edwin Teng, Liuling Gong, Avishay Livne
- Information network academic research center

Questions?
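
The expectedness and edge Jaccard measures from the network slides can be sketched roughly as follows. This is a minimal illustration, not the implementation behind the talk: it assumes undirected edges with unit weights, restricts the path sum to simple paths of at most three edges, and fixes one orientation per pair. All function names are hypothetical. It is not meant to reproduce the toy-network values quoted in the slides, whose underlying graphs are not given in the text.

```python
def edge_jaccard(edges_prev, edges_curr):
    """Jaccard overlap of two undirected edge sets (1.0 when both are empty)."""
    a = {frozenset(e) for e in edges_prev}
    b = {frozenset(e) for e in edges_curr}
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def pair_conductance(adj, i, j, max_len=3):
    """Sum over simple paths i -> j (at most max_len edges) of the product
    of 1/deg(k) over each edge (k, l) on the path. Unit weights assumed."""
    total = 0.0
    stack = [(i, 1.0, (i,))]  # (current node, product so far, path so far)
    while stack:
        node, prod, path = stack.pop()
        if node == j:
            total += prod
            continue
        if len(path) - 1 >= max_len:
            continue
        for nxt in adj.get(node, ()):
            if nxt not in path:
                stack.append((nxt, prod / len(adj[node]), path + (nxt,)))
    return total


def expectedness(edges_prev, edges_curr, max_len=3):
    """Mean conductance, measured on the t-1 network, of the pairs linked at t."""
    adj_prev = adjacency(edges_prev)
    # Assumption: one orientation per pair (sorted node order).
    pairs = {tuple(sorted(e)) for e in edges_curr}
    if not pairs:
        return 0.0
    return sum(pair_conductance(adj_prev, i, j, max_len) for i, j in pairs) / len(pairs)


triangle = [("A", "B"), ("B", "C"), ("A", "C")]
print(expectedness(triangle, triangle))                   # repeated pattern
print(expectedness(triangle, [("A", "D"), ("B", "D")]))   # links to a new node
print(edge_jaccard(triangle, [("A", "B"), ("B", "C")]))
```

In this example the repeated triangle scores 0.75, while edges to a node that was absent at t-1 score 0, which matches the intuition that repeated interaction patterns are "expected". The hop limit keeps the path enumeration cheap; a longer limit changes the absolute values but not that ordering here.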
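
The four financial variables defined for each sampling window could be computed along these lines. This is a sketch under assumptions: each trade is a hypothetical `(time_sec, price, quantity)` tuple, windows are taken as consecutive and non-overlapping, and a trailing partial window is dropped. The slides state only the window size of 240 transactions, so the windowing details and the function names are illustrative.

```python
import math


def split_windows(trades, size=240):
    """Split time-ordered trades into consecutive, non-overlapping windows.
    A trailing partial window is dropped (an assumption, not from the slides)."""
    return [trades[k:k + size] for k in range(0, len(trades) - size + 1, size)]


def window_variables(window):
    """Financial variables for one window of (time_sec, price, quantity) trades."""
    times = [t for t, _, _ in window]
    prices = [p for _, p, _ in window]
    return {
        # Rate of return: log of last price minus log of first price
        "return": math.log(prices[-1]) - math.log(prices[0]),
        # Volatility: log difference between max and min price (range)
        "volatility": math.log(max(prices)) - math.log(min(prices)),
        # Duration: seconds between the first and last trade in the window
        "duration": times[-1] - times[0],
        # Volume: number of contracts traded
        "volume": sum(q for _, _, q in window),
    }


# Made-up trades for illustration only
trades = [(0, 171.25, 10), (4, 171.50, 20), (9, 171.00, 5), (15, 171.25, 30)]
for w in split_windows(trades, size=2):
    print(window_variables(w))
```

Because each window holds a fixed number of transactions, duration varies from window to window, which is what lets it act as a proxy for the arrival of new information.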
