AdamicTimeSeriesNetS.. - University of Michigan

Co-evolution of network
structure and content
Lada Adamic
School of Information & Center for the Study of Complex Systems
University of Michigan
Outline
 Co-evolution of network structure and content
 Can the structure of Twitter and virtual world interactions
reveal something about their content?
 http://arxiv.org/abs/1107.5543
 Can the structure of a commodity futures trading network
reveal something about information flowing into the market?
 http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1361184
What is the relationship between network
structure and information diffusion?
Is information flowing over the
network?
Or is information shaping the
network?
Can the shape of the network reveal
properties of information?
 Big news! Giant microbes!
 Little news. How’s the weather?
Related work on time evolving graphs
 Densification over time (Leskovec et al. 2005)
 Community structure over time (Leicht et al. 2007, Mucha et al.
2010)
 Change in structure (ability to “compress” network) signals
events (Graphscope by Sun et al. 2007)
 Disease propagation & timing (Moody 2002, Liljeros 2010)
 Enron email (B. Aven, 2011)
What’s different here
 We look at network dynamics at relatively short time
scales and construct time series
 A range of network metrics, instead of just community
structure
 Information novelty and diversity as opposed to tracking
single events / pieces of information
Can the network reveal…
 If everyone is talking about the same thing, or if there is just
background chatter.
 If what they are talking about is novel?
First context: virtual worlds
 Networks: asset transfers (gestures, landmarks) and
transactions (e.g. rent, object purchases)
 Content: assets being transferred
Study transfers in the context of 100
groups with highest numbers of
transfers
Second context: Twitter Network
 Microblogging: < 140 characters per tweet
 Network links are read from tweets
   Reply or mention: putting @ in front of the username
   Retweet: repeating something someone else wrote on Twitter, preceded by the letters RT and @ in front of their username
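As a rough illustration of how links might be read from tweet text, here is a minimal sketch. The regular expressions, the lowercasing of usernames, and the function name `edges_from_tweet` are assumptions made for this example, not the procedure used in the study.

```python
import re

# Assumed conventions (not from the slides): usernames match @\w+, and a
# retweet is any tweet whose text starts with "RT @user".
MENTION = re.compile(r"@(\w+)")
RETWEET = re.compile(r"^\s*RT\s+@(\w+)", re.IGNORECASE)


def edges_from_tweet(author, text):
    """Return a list of (source, target, kind) edges read from one tweet."""
    author = author.lower()
    edges = []
    rt = RETWEET.match(text)
    retweeted = rt.group(1).lower() if rt else None
    if retweeted:
        edges.append((author, retweeted, "retweet"))
    for name in MENTION.findall(text):
        name = name.lower()
        # The retweeted user is already covered by the retweet edge.
        if name != retweeted and name != author:
            edges.append((author, name, "mention"))
    return edges


sample = [
    ("alice", "@bob have you seen the new dataset?"),
    ("carol", "RT @alice: giant microbes! cc @dave"),
]
tweet_edges = [e for author, text in sample for e in edges_from_tweet(author, text)]
print(tweet_edges)
```

Keeping the edge kind makes it possible to build the reply/mention network and the retweet network separately or together.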
Selecting Twitter communities to track
 http://wefollow.com/twitter/researcher
 For each “researcher” gather tweets of accounts they
follow
Highly dynamic networks
[Figure: percentage of repeated edges (y-axis, 0.00 to 0.25) against number of segments elapsed (x-axis, 1 to 8), for SecondLife and Twitter]
Segmentation:
 Twitter: every 800 tweets (median segment duration 1.5 days)
 SecondLife: every 50 asset transfers (median segment duration 8.4 days)
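The segmentation and the repeated-edge measurement could be sketched as follows. This assumes that "repeated" means an edge of one segment also appears in the segment a given number of steps earlier, that edges are undirected, and that a trailing partial segment is dropped; the slides do not spell out these details.

```python
def segment(events, size):
    """Split a time-ordered list of events into consecutive chunks of `size`.

    A trailing chunk with fewer than `size` events is dropped (an assumption).
    """
    return [events[i:i + size] for i in range(0, len(events) - size + 1, size)]


def edge_set(chunk):
    """Undirected edge set of one segment; each event is a (source, target) pair."""
    return {frozenset(pair) for pair in chunk if pair[0] != pair[1]}


def repeated_fraction(segments, lag):
    """Mean fraction of a segment's edges that were present `lag` segments earlier."""
    fractions = []
    for t in range(lag, len(segments)):
        now, before = edge_set(segments[t]), edge_set(segments[t - lag])
        if now:
            fractions.append(len(now & before) / len(now))
    return sum(fractions) / len(fractions) if fractions else 0.0


events = [("a", "b"), ("b", "c"), ("a", "b"), ("c", "d"), ("a", "b"), ("d", "e")]
segments = segment(events, size=2)  # the slides use 800 tweets or 50 transfers
print([repeated_fraction(segments, lag) for lag in (1, 2)])
```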
Conductance:
capturing potential for information flow
[Figure: three example networks between nodes A and B, showing low, medium, and high conductance]
$C_{ij} = \sum_{\text{paths } i \to j} \; \prod_{\text{edges } (k,l) \text{ on path}} \frac{w_{kl}}{\deg(k)}$
Temporal conductance: the pairwise conductance summed over all pairs
 High if pairs of nodes share edges, or many short, indirect paths
Koren, North, Volinsky, KDD, 2006
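A path-sum conductance of this kind (each path contributes the product of w_kl / deg(k) over its edges) can be sketched in a few lines. The sketch assumes simple paths only, a length cutoff `max_len`, deg(k) taken as the weighted degree, and a total summed over ordered pairs; none of these choices is confirmed by the slides.

```python
from collections import defaultdict


def build_graph(weighted_edges):
    """Undirected weighted adjacency map from (u, v, w) triples."""
    adj = defaultdict(dict)
    for u, v, w in weighted_edges:
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    return adj


def pair_conductance(adj, i, j, max_len=4):
    """Sum over simple paths i -> j (at most max_len edges) of prod w_kl / deg(k)."""
    deg = {k: sum(nbrs.values()) for k, nbrs in adj.items()}
    total = 0.0
    stack = [(i, 1.0, (i,))]
    while stack:
        node, prob, path = stack.pop()
        if node == j:
            total += prob
            continue
        if len(path) > max_len:  # path already has max_len edges
            continue
        for nxt, w in adj[node].items():
            if nxt not in path:
                stack.append((nxt, prob * w / deg[node], path + (nxt,)))
    return total


def total_conductance(adj, max_len=4):
    """Conductance summed over all ordered pairs of distinct nodes (assumed)."""
    nodes = list(adj)
    return sum(pair_conductance(adj, i, j, max_len)
               for i in nodes for j in nodes if i != j)


triangle = build_graph([("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0)])
print(pair_conductance(triangle, "a", "b"), total_conductance(triangle))
```

Enumerating paths grows quickly with graph size, so the cutoff matters in practice; short paths dominate the sum because each extra edge multiplies in another factor below one.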
Network expectedness
 Define expectedness:
 Average conductance of all neighbor pairs at time t,
 based on conductance of pair at time t-1
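$X_t = \frac{1}{E_t} \sum_{\text{edges } (i,j) \text{ at } t} C_{ij,t-1}$
where $E_t$ is the number of edges at time t
[Figure: example new edges labeled expected and unexpected]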
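Expectedness as defined here (the average, over the edges present at time t, of each pair's conductance at time t-1) might be computed as below. The sketch assumes conductance is stored per unordered pair and that pairs with no recorded value count as 0; both are illustrative choices.

```python
def expectedness(edges_t, conductance_prev):
    """Average previous-step conductance over the edges present at time t.

    edges_t: iterable of (i, j) pairs observed at time t.
    conductance_prev: dict mapping frozenset({i, j}) to C_ij at time t-1;
        pairs missing from the dict are treated as having conductance 0.
    """
    edges = {frozenset(e) for e in edges_t}
    if not edges:
        return 0.0
    return sum(conductance_prev.get(e, 0.0) for e in edges) / len(edges)


c_prev = {frozenset(("a", "b")): 0.75, frozenset(("a", "c")): 0.5}
familiar = expectedness([("a", "b"), ("a", "c")], c_prev)  # pairs already well connected
surprising = expectedness([("a", "b"), ("a", "d")], c_prev)  # one edge reaches a new node
print(familiar, surprising)
```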
Conductance and expectedness as a toy network evolves
 Network configuration at t=0: conductance = 4
 Possible configurations at t=1:
   conductance = 4, expectedness = 1.5, edge jaccard = 1
   conductance = 4.5, expectedness = 1.3333, edge jaccard = 0.6667
   conductance = 6, expectedness = 0.5, edge jaccard = 0.25
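The edge Jaccard value reported alongside conductance and expectedness is the overlap between consecutive edge sets. A minimal sketch, assuming undirected edges and returning 1.0 when both snapshots are empty (a convention chosen here):

```python
def edge_jaccard(edges_prev, edges_now):
    """Jaccard similarity between the edge sets of two consecutive snapshots."""
    a = {frozenset(e) for e in edges_prev}
    b = {frozenset(e) for e in edges_now}
    union = a | b
    return len(a & b) / len(union) if union else 1.0


before = [("a", "b"), ("b", "c")]
after = [("a", "b"), ("b", "c"), ("c", "a")]
print(edge_jaccard(before, before), edge_jaccard(before, after))
```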
SecondLife: network structure and
content
 Standard network metrics are not indicative of information properties
 conductance and expectedness are
[Figure: panels labeled overlap (t-1,t), overlap (t,t+1), Δ diversity (t-1,t), Δ diversity (t,t+1)]
Conductance & diversity of
information
 High conductance brings higher
content diversity
 Repeat network patterns bring less
diversity and less novelty
 but… similarity and novelty are
positively correlated (r = 0.19)
Social and transaction
network of top sellers in
SL
Twitter: textual diversity and novelty
 Semantic metrics (metric type: computation method)
 Contemporary metrics (average cosine similarity of words in tweets), computed:
   between connected node pairs in the graph
   between indirectly-connected node pairs, i.e., non-neighbors with an undirected path of length > 1 between them
   between isolated pairs (in different components)
 Novelty metric (language model distance), computed:
   between two sets of tweets associated with Twitter networks captured at different times
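The two kinds of semantic metric can be illustrated with a small sketch. Whitespace tokenization, raw word counts, and a smoothed unigram KL divergence standing in for the "language model distance" are all assumptions of this example; the slides do not specify the language model or the distance used.

```python
import math
from collections import Counter


def bag_of_words(tweets):
    """Word counts over a collection of tweets (lowercased, whitespace-split)."""
    return Counter(word for tweet in tweets for word in tweet.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def average_similarity(pairs, words_by_user):
    """Mean cosine similarity over a set of node pairs (e.g. connected pairs)."""
    sims = [cosine_similarity(words_by_user[u], words_by_user[v]) for u, v in pairs]
    return sum(sims) / len(sims) if sims else 0.0


def lm_distance(old_tweets, new_tweets, alpha=1.0):
    """KL divergence of the new unigram model from the old one (add-alpha smoothing)."""
    old, new = bag_of_words(old_tweets), bag_of_words(new_tweets)
    vocab = set(old) | set(new)
    old_total = sum(old.values()) + alpha * len(vocab)
    new_total = sum(new.values()) + alpha * len(vocab)
    dist = 0.0
    for word in vocab:
        p = (new[word] + alpha) / new_total
        q = (old[word] + alpha) / old_total
        dist += p * math.log(p / q)
    return dist


words = {"u": bag_of_words(["network data"]), "v": bag_of_words(["network data"]),
         "w": bag_of_words(["weather today"])}
print(average_similarity([("u", "v"), ("u", "w")], words))
print(lm_distance(["network data"], ["giant microbes news"]))
```

The same `average_similarity` function can be applied to connected, indirectly-connected, and isolated pairs by passing the corresponding list of pairs.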
Twitter: network structure and
information diversity
[Figure: network structure metrics at time T (rows) against content similarity by pair type (columns); scale from -0.6 to 0.6]
                      all-pairs      unconnected    indirectly-connected   connected
# nodes(T)            -0.584 ! ! !   -0.632         0.305 ! ! !            0.030 ! ! !
# edges(T)            -0.537 ! ! !   -0.601         0.348 ! ! !            0.058 ! ! !
reciprocity(T)        -0.160 !       -0.179 !       0.176 ! !              0.128 !
clustering coef.(T)   -0.198 ! !     -0.240         0.181 ! !              0.030 ! ! !
centralization(T)     -0.121 !       -0.176         0.158 ! !              0.062 ! !
edge deg cor.(T)      0.027          -0.155 ! !     0.113                  0.054 ! !
av. degree(T)         -0.287 ! ! !   -0.353         0.323 ! ! !            0.093 ! ! !
sd. degree(T)         -0.212 ! !     -0.277         0.251 ! ! !            0.048 ! !
WCC size(T)           0.317 ! ! !    0.303          -0.126 ! !             0.038 ! ! !
conductance(T)        -0.444 ! ! !   -0.506         0.369 ! ! !            0.121 ! ! !
expectedness(T)       -0.145 ! !     -0.161 !       0.234 ! ! !            0.092 ! ! !
Inferring Network Semantic
Information
 Question: Does information about network structure improve
prediction of the characteristics of the information
exchanged?
[Diagram: semantic variables and topological variables are inputs to a kernel regression prediction model, which outputs semantic variables]
Example: Inferring the average
similarity score between isolated pairs
[Figure: R^2 in predicting the ASS between isolated nodes (y-axis, 0 to 1) as input variables are added (x-axis: connected, indirectly-connected, # nodes, # edges, reciprocity, clustering coef., centralization, edge deg cor., av. degree, sd. degree, WCC size, conductance, expectedness)]
Curves:
c1: X1 = {connected}
c2: X2 = {indirectly-connected}
c3: X3 = {# nodes}
c4: X4 = {# edges}
The input variables of curve ci start from Xi
and increase each time by adding the
variable labeled on x-axis.
 No need to use other textual variables
(e.g. similarity between indirectly connected
pairs) when sufficient topological information
is available
 Reason: topological variables account for
much of the pattern in the text!
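The experiment of adding input variables one at a time and tracking R^2 can be sketched as follows. The slides say only "kernel regression"; this sketch assumes a Nadaraya-Watson estimator with a Gaussian kernel, leave-one-out evaluation, and inputs already on comparable scales. The variable names and numbers are made up for illustration.

```python
import math


def nw_predict(train_x, train_y, x, bandwidth=1.0):
    """Nadaraya-Watson estimate: kernel-weighted mean of the training targets."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(row, x)) / (2 * bandwidth ** 2))
        for row in train_x
    ]
    total = sum(weights)
    if total == 0.0:  # all weights underflowed; fall back to the plain mean
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total


def loo_r2(xs, ys, bandwidth=1.0):
    """Leave-one-out R^2 of the kernel regression."""
    preds = []
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        preds.append(nw_predict(rest_x, rest_y, xs[i], bandwidth))
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def r2_curve(columns, ys, order, bandwidth=1.0):
    """R^2 after adding each named variable in `order`, one at a time."""
    curve, used = [], []
    for name in order:
        used.append(name)
        xs = [[columns[n][i] for n in used] for i in range(len(ys))]
        curve.append((name, loo_r2(xs, ys, bandwidth)))
    return curve


# Hypothetical per-segment measurements, for illustration only.
columns = {
    "reciprocity": [0.20, 0.50, 0.30, 0.60, 0.40, 0.10, 0.70, 0.35],
    "conductance": [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80],
}
similarity = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20]
curve = r2_curve(columns, similarity, ["reciprocity", "conductance"], bandwidth=0.2)
print(curve)
```

Each curve in the figure corresponds to one choice of starting variable followed by the same incremental additions.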
Network structure and information
novelty
 Greater novelty in edges corresponds to greater novelty in content shared
 For nodes that are interacting (citing or being cited):
   Higher conductance and expectedness correlate with less information novelty
[Figure: network structure metrics (rows) against information novelty (columns); scale from -0.3 to 0.3]
                          LMdist_allNodes(T-1,T)   LMdist_NodesWithNeighbors(T-1,T)
# nodes(T-1,T)            0.124 !                  -0.050 ! !
# edges(T-1,T)            0.171 !                  -0.117 ! ! !
reciprocity(T-1,T)        0.042                    -0.004
clustering coef.(T-1,T)   0.149 !                  -0.197 ! ! !
centralization(T-1,T)     -0.018                   0.038
edge deg cor.(T-1,T)      -0.111 ! !               0.101 !
av. degree(T-1,T)         0.066                    -0.044 !
sd. degree(T-1,T)         0.083                    -0.119 ! !
WCC size(T-1,T)           0.085                    -0.101 !
edge jaccard(T-1,T)       -0.233 ! !               -0.230 ! ! !
conductance(T-1,T)        0.202 !                  -0.225 ! ! !
expectedness(T-1)         0.171 ! !                -0.143 ! !
expectedness(T)           0.093 !                  -0.273 ! ! !
Information in trading networks
 CFTC = Commodity Futures Trading Commission
 stated mission: protect market users and the public from
fraud, manipulation, and abusive practices
 futures contracts started out as contracts for agricultural
products, but expanded to more exotic contracts,
including index futures
Collaboration with Celso Brunetti, Jeff Harris, and Andrei
Kirilenko
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=
Data
 6.3 million transactions in Aug. 2008 in the Sept. E-mini S&P
futures contract
 price discovery for the index occurs mostly in this contract
(Hasbrouck (2003))
 data includes: date & time, executing broker, opposite broker,
buy or sell, price, quantity
 sample in transaction windows of 240 transactions
[Figure: a transaction links the executing broker and the opposite broker, labeled quantity: 10, price: $171.25]
[Figure: matching algorithm over a limit order book, with sell orders of 10 contracts at $171.25, 5 contracts at $171.75, and 20 contracts at $172.00, and buy orders of 20, 30, and 50 contracts at prices between $171.00 and $171.50]
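Turning the transaction records into one network per sampling window could look roughly like this. The dictionary field names, the seller-to-buyer edge direction, and the dropping of a trailing partial window are assumptions made for the sketch; the slides give only the data fields and the window size of 240.

```python
from collections import defaultdict

WINDOW = 240  # transactions per sampling window, as stated on the slide


def windows(transactions, size=WINDOW):
    """Consecutive, non-overlapping windows; a trailing partial window is dropped."""
    return [transactions[i:i + size]
            for i in range(0, len(transactions) - size + 1, size)]


def trade_network(window):
    """Directed weighted edges seller -> buyer (assumed), weight = contracts traded."""
    edges = defaultdict(int)
    for t in window:
        if t["side"] == "buy":
            seller, buyer = t["opposite"], t["executing"]
        else:
            seller, buyer = t["executing"], t["opposite"]
        edges[(seller, buyer)] += t["quantity"]
    return dict(edges)


# Hypothetical records for illustration; broker names are made up.
trades = [
    {"executing": "B1", "opposite": "B2", "side": "sell", "price": 171.25, "quantity": 10},
    {"executing": "B3", "opposite": "B1", "side": "buy", "price": 171.50, "quantity": 20},
    {"executing": "B2", "opposite": "B3", "side": "sell", "price": 171.25, "quantity": 5},
    {"executing": "B1", "opposite": "B2", "side": "sell", "price": 171.00, "quantity": 15},
]
networks = [trade_network(w) for w in windows(trades, size=2)]
print(networks)
```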
not social, not intentional, not
persistent
Financial variables
Rate of return:
Last price to first price in logs (close-to-open)
Volatility:
Range – log difference between max and min price
Duration:
Total period duration - time in seconds between the start
and end of each sampling period
Proxy for arrival of new information
Volume:
Trading volume – number of contracts traded
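The four financial variables can be computed per sampling window along these lines. The tuple layout `(time_in_seconds, price, quantity)` is an assumption for the sketch, and natural logarithms are used for the log differences.

```python
import math


def financial_variables(window):
    """window: time-ordered list of (time_in_seconds, price, quantity) tuples."""
    times = [t for t, _, _ in window]
    prices = [p for _, p, _ in window]
    return {
        # last price to first price, in logs
        "return": math.log(prices[-1]) - math.log(prices[0]),
        # range: log difference between max and min price
        "volatility": math.log(max(prices)) - math.log(min(prices)),
        # seconds between the start and end of the sampling period
        "duration": times[-1] - times[0],
        # number of contracts traded
        "volume": sum(q for _, _, q in window),
    }


example = [(0.0, 171.25, 10), (1.5, 171.50, 30), (4.0, 171.25, 5)]
stats = financial_variables(example)
print(stats)
```

Because windows hold a fixed number of transactions, a short duration means trades arrived quickly, which is why duration can serve as a proxy for the arrival of new information.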
What can we learn from network
structure?
e.g. centralization?
[Figure: example networks with low in-centralization and high in-centralization; nodes labeled low outdegree, low indegree, high outdegree, high indegree]
overview of network variables
 # nodes, # edges
 clustering coefficient, LSCC, reciprocity
 CEN = gini(in-degree) - gini(out-degree)
 INOUT = r(indegree of node, outdegree of same
node)
 AI (asymmetric information)
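CEN and INOUT from this list can be sketched as below, assuming edges point from seller to buyer, degrees count edges without weights, and r is the Pearson correlation. The Gini formula is the standard one over sorted values; the slides do not say which variant was used.

```python
import math
from collections import Counter


def gini(values):
    """Gini coefficient of a list of non-negative numbers."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    ranked = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * ranked / (n * total) - (n + 1.0) / n


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def cen_and_inout(edges):
    """Return (CEN, INOUT) for a list of directed (seller, buyer) edges."""
    indeg, outdeg, nodes = Counter(), Counter(), set()
    for seller, buyer in edges:
        outdeg[seller] += 1
        indeg[buyer] += 1
        nodes.update((seller, buyer))
    order = sorted(nodes)
    ins = [indeg[n] for n in order]
    outs = [outdeg[n] for n in order]
    return gini(ins) - gini(outs), pearson(ins, outs)


# One dominant buyer purchasing from three small sellers.
cen, inout = cen_and_inout([("s1", "b"), ("s2", "b"), ("s3", "b")])
print(cen, inout)
```

In this example in-degree is concentrated on a single node while out-degree is spread evenly, so CEN is positive and INOUT is strongly negative.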
Correlations between network
and financial variables
High Centralization: market dominance - a dominant trader buys
from many small sellers – low duration, low volume
Negative assortativity: large sellers sell to small buyers and vice
versa
– low duration, higher volume
High av. degree & largest strongly connected component:
no news - many buyers and sellers – high duration, high volume
Rate of return:
positive correlation with centralization
Volatility & duration:
correlated with standard deviation of degree, average
deg. and the total number of edges (E).
Volume:
correlated with a few network variables, sign varies.
Conclusion
 Network structure alone is revealing of the diversity and
novelty of the information content being transmitted
 Results depend on the scope and relative position of the
activity in the network
Future work
 Sensitivity to inclusion of non-interactive or across-community
interactions
 Applying novelty & conductance metrics to financial time series
 Continuous formulation of novelty and other network metrics
(because segmentation is problematic)
 Roles of individual nodes
 Thanks: Edwin Teng, Liuling Gong, Avishay Livne
 Information network academic research center
Questions?