Decentralized Online Social Network Primitives

Diffusion in (Social) networks
Rajesh Sharma
http://rajshpec.github.io/
rajesh.sharma@unibo.it
October, 2014
This presentation is based on several works, including some with:
Prof. Danilo Montesi (University of Bologna, Italy), Prof. Matteo Magnani (Uppsala University,
Sweden), Prof. Anwitaman Datta (NTU, Singapore), Prof. Mostafa Salehi (University of Tehran, Iran)
*Some slides’ content from Jure Leskovec’s course work.
Agenda
• Preliminary
– Overview of Networks
– Diffusion on Networks in Monoplex
• Models, Algorithms etc.
• Algorithm for diffusion in decentralized
settings.
• Diffusion on Networks in Multilayer Networks.
• Models, Algorithms etc.
• Conclusion & Future work.
Networks: collection of objects where some
pairs of objects are connected by links
Protein-protein
Transportation: Metro
ISP: Router etc
Human Diseases
Sexual contact
Food Web
Friendship
Recipe
Co-citation
Network Really Matters
• If you want to understand the structure of the
Web, it is hopeless without working with the
Web’s topology.
• If you want to understand the spread of diseases,
can you do it without social networks?
• If you want to understand dissemination of news
or evolution of science, it is hopeless without
considering the information networks.
Networks & Diffusion
Networks
• Human-Human Network
• Transportation Network
• Comm. Network (e.g., OSN, Internet, Mobile)
Diffusion
• Idea, Innovation
• Goods
• Virus
• Rumor
• Behavior
Effect of Diffusion in ML Networks
Internal Entity
• Diffusion process happening
in a network affecting
internal entities.
• Example:
– Influence (product, behavior
etc)
External Entity
• A diffusion process
happening in a network
affecting external entity
• Example:
– Effect of tweets on stock
prices
Diffusion Dynamics: What can be done?
A) Models:
• Decision Based Models
– Independent Contagion Model
– Threshold Model
– Questions:
• Finding influential nodes
• Detecting cascades
• Epidemic Based Models
– SIS: Susceptible-Infected-Susceptible (e.g., flu)
– SIR: Susceptible-Infected-Recovered (e.g., chicken pox)
– Question:
• Will the virus take over the network?
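To make the SIS/SIR distinction concrete, here is a minimal discrete-time sketch. It is an illustration under stated assumptions, not a model taken from the slides: the function name `simulate_epidemic`, the parameters `beta` (per-contact infection probability) and `delta` (per-step probability of leaving the infected state), and the synchronous update rule are all hypothetical choices.

```python
import random


def simulate_epidemic(adj, seeds, beta, delta, model="SIR", steps=50, rng=None):
    """Simulate SIS or SIR on a graph given as {node: set_of_neighbors}.

    beta  : assumed per-contact infection probability per step
    delta : assumed per-step probability that an infected node stops being infected
    Returns the number of infected nodes after each step (index 0 = initial state).
    """
    rng = rng or random.Random(0)
    state = {v: "S" for v in adj}
    for s in seeds:
        state[s] = "I"
    history = [sum(1 for x in state.values() if x == "I")]
    for _ in range(steps):
        new_state = dict(state)  # synchronous update: read old state, write new
        for v, st in state.items():
            if st != "I":
                continue
            for u in adj[v]:
                if state[u] == "S" and rng.random() < beta:
                    new_state[u] = "I"
            if rng.random() < delta:
                # SIS: back to susceptible; SIR: removed for good
                new_state[v] = "S" if model == "SIS" else "R"
        state = new_state
        history.append(sum(1 for x in state.values() if x == "I"))
        if history[-1] == 0:
            break  # the epidemic has died out
    return history


# Path graph 0-1-2-3 with certain infection and certain recovery.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(simulate_epidemic(path, [0], beta=1.0, delta=1.0, model="SIR"))  # [1, 1, 1, 1, 0]
```

With the same parameters, the SIS variant never dies out on this path because recovered nodes can be reinfected, which is the qualitative difference behind the question of whether the virus takes over the network.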
B) Explanatory/Empirical Analysis
• Infer the underlying spreading cascade.
• Questions
– What does diffusion look like?
– What do cascades look like?
C) Algorithms
– Influence maximization
– Outbreak detection
– etc.
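As an illustration of influence maximization, the sketch below pairs an independent-cascade-style spreading process with greedy seed selection based on Monte Carlo estimates of the expected spread. The function names, the single activation probability `p`, and the number of simulation runs are assumptions made for this example; the slides do not specify an algorithm.

```python
import random


def cascade_size(adj, seeds, p, rng):
    """One run of an independent-cascade-style process.

    Each newly active node gets a single chance to activate each inactive
    out-neighbor with probability p. Returns the final number of active nodes.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for v in frontier:
            for u in adj.get(v, ()):
                if u not in active and rng.random() < p:
                    active.add(u)
                    nxt.append(u)
        frontier = nxt
    return len(active)


def expected_spread(adj, seeds, p, runs=200, seed=0):
    """Monte Carlo estimate of the expected cascade size for a seed set."""
    rng = random.Random(seed)
    return sum(cascade_size(adj, seeds, p, rng) for _ in range(runs)) / runs


def greedy_seeds(adj, k, p, runs=200):
    """Pick k seeds, each time adding the node with the best estimated spread."""
    chosen = []
    for _ in range(k):
        candidates = [v for v in adj if v not in chosen]
        best = max(candidates,
                   key=lambda v: expected_spread(adj, chosen + [v], p, runs))
        chosen.append(best)
    return chosen


# Directed toy graph: node 0 reaches three nodes, node 4 reaches one.
toy = {0: [1, 2, 3], 1: [], 2: [], 3: [], 4: [5], 5: []}
print(greedy_seeds(toy, k=2, p=1.0, runs=5))  # [0, 4]
```

The greedy loop is expensive (one batch of simulations per candidate per round), so this form is only practical for small graphs.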
Information Dissemination: Algorithm
• Objectives
– Effective
• High precision (low spam) & recall (good coverage)
– Efficient
• Low latency, low duplication
• Challenges : Decentralized settings
– No global list, no explicit subscriptions or coordination
• Intuition
– Use social links in each hop
• Locally available (interest) information
• Less likely to be spammed
• Easier accountability
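The effectiveness and efficiency objectives above can be measured once a dissemination run is over. The helper below is a hypothetical sketch: the name `dissemination_quality` and the definition of duplication as messages sent per distinct recipient are assumptions, not definitions from the slides.

```python
def dissemination_quality(recipients, interested, messages_sent):
    """Return (precision, recall, duplication) for one dissemination run.

    recipients    : nodes that received the message
    interested    : nodes that actually wanted it
    messages_sent : total number of message transmissions
    Duplication is taken here as transmissions per distinct recipient (an assumption).
    """
    recipients, interested = set(recipients), set(interested)
    hits = recipients & interested
    precision = len(hits) / len(recipients) if recipients else 0.0
    recall = len(hits) / len(interested) if interested else 0.0
    duplication = messages_sent / len(recipients) if recipients else 0.0
    return precision, recall, duplication


# Four nodes received the message, two of the three interested nodes among them.
print(dissemination_quality({"a", "b", "c", "d"}, {"a", "b", "e"}, messages_sent=6))
```

Low precision corresponds to spam, low recall to poor coverage, and a duplication value well above 1 to wasted transmissions.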
Approach/Algorithm
• Two logically independent mechanisms/phases
– Control phase (runs in the background)
• collect neighbor nodes’ information (interest, degree)
• dissemination behavior (forwarding behavior, activeness)
– Propagation of messages using selective gossip
[4] Anwitaman Datta and Rajesh Sharma. GoDisco: Selective Gossip based Dissemination of
Information in Social Community based Overlays. ICDCN 2011 (best paper award in
Networking track).
Intuitions for designing selective gossip
• Social science principles
– Reciprocity based incentives
– Social triads to reduce duplicates
• Feedback
– Learning & adapting to neighbor interests
• Interest communities
– Naturally clustered
• But there may be isolated islands
Information agent (IA) categories
• Interest Classification :
– main Category (MC)
– subcategory (SC)
• Order of preference
– shared main category
– irrelevant but good forwarding history
– irrelevant but well connected (high degree)
Approach
• If any Relv Nbrs
– Forward to all relevant nbrs
• Duplication saving: social triad
• a & b don’t send to each other
• Not for cases like c
• What about non-relv Nbrs?
• With probability p
– Send to e (closely related)
• α, β, γ can be changed
• Feedback mechanism
• Boundary nodes
– αh + βd + γa (h: history, d: degree, a: activeness)
– c selects j
– j starts a Random Walk
[Figure: example network with labeled nodes, including a, b, c, e and j]
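The forwarding rule can be sketched as follows. This is one possible reading of the slide, not the published GoDisco algorithm: the `Neighbor` record, the function names, the default weights, and the way the rules are combined (relevant neighbors first, otherwise one non-relevant neighbor chosen by the score αh + βd + γa with probability p) are all assumptions. The `already_has` set stands in for the social-triad check that avoids sending to a neighbor known to have the message.

```python
import random
from dataclasses import dataclass


@dataclass
class Neighbor:
    name: str
    relevant: bool     # shares the message's interest category
    history: float     # h: past forwarding behavior, assumed normalized to [0, 1]
    degree: float      # d: connectivity, assumed normalized to [0, 1]
    activeness: float  # a: how active the node is, assumed normalized to [0, 1]


def score(n, alpha, beta, gamma):
    """Weighted score alpha*h + beta*d + gamma*a for a non-relevant neighbor."""
    return alpha * n.history + beta * n.degree + gamma * n.activeness


def choose_targets(neighbors, already_has, p,
                   alpha=0.4, beta=0.3, gamma=0.3, rng=None):
    """Return the names of the neighbors this node forwards the message to."""
    rng = rng or random.Random(0)
    # Social-triad style saving: skip neighbors known to have the message.
    candidates = [n for n in neighbors if n.name not in already_has]
    relevant = [n for n in candidates if n.relevant]
    if relevant:
        return [n.name for n in relevant]  # forward to all relevant neighbors
    if candidates and rng.random() < p:
        best = max(candidates, key=lambda n: score(n, alpha, beta, gamma))
        return [best.name]                 # one well-scored non-relevant hop
    return []


nbrs = [Neighbor("x", False, 0.2, 0.9, 0.1), Neighbor("y", False, 0.8, 0.2, 0.9)]
print(choose_targets(nbrs, already_has=set(), p=1.0))  # ['y']
```

A feedback mechanism would adjust α, β and γ over time; that step is left out of the sketch.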
Message Dissemination
More on Information Dissemination
• Swarm Particle Approach [2]
– Communities: multi-dimensional network (based on relations)
– Particle swarm technique: mobility (particles/agents can move)
– Orthogonal to GoDisco (as multi-dim and mobility)
• GoDisco++ [3]
– Took the best of the ICDCN 2011 and 2012 approaches.
– Social sciences plus multi-dimensional network.
[2] Rajesh Sharma and Anwitaman Datta. Decentralized information dissemination in multidimensional semantic social overlays. ICDCN 2012, Hong Kong.
[3] Rajesh Sharma and Anwitaman Datta. GoDisco++: A Gossip algorithm for information dissemination in multi-dimensional community networks. Journal of Pervasive and Mobile Computing, Oct. 2012.
Multilayer Networks
• Multiplex networks
– Every node is present in
every network.
– multiple types of
Relationships.
• Interconnected networks
– Not every node is present
in every network.
– Multiple networks.
• Model
– Diffusion modeling: cascade process
• C1: (v4,l2)
• C2 : (v4,l1)
• Diffusion network:
Aggregation of cascades
C1 and C2
[5] Mostafa Salehi, Rajesh Sharma, Moreno Marzolla, Danilo Montesi, Payam Siyari, and Matteo Magnani.
Spreading processes in Multilayer Networks. Under review at IEEE Transactions on Network Science and Engineering.
4 possibilities of diffusion in ML
• Same-node inter-layer
– Cascade switches layer but remains
on the same node
– Facebook post is shared on Twitter
• Other-node inter-layer
– Cascade continues spreading to
another node in another layer
– The spread of a disease in an
interconnected network of cities
• Other-node intra-layer
– Cascade continues spreading
through the same layer.
– Retweeting a post in Twitter
• Same-node intra-layer
– ??
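The four cases follow from two yes/no questions: does the cascade stay on the same node, and does it stay in the same layer? A minimal sketch, assuming each cascade step is recorded as a pair of (node, layer) tuples; the function name `classify_step` and the example node and layer names are hypothetical:

```python
def classify_step(src, dst):
    """Classify one cascade step between two (node, layer) pairs."""
    (u, layer_u), (v, layer_v) = src, dst
    same_node = u == v
    same_layer = layer_u == layer_v
    if same_node and not same_layer:
        return "same-node inter-layer"   # e.g. own Facebook post shared on Twitter
    if not same_node and not same_layer:
        return "other-node inter-layer"  # e.g. disease moving between cities
    if not same_node and same_layer:
        return "other-node intra-layer"  # e.g. a retweet
    return "same-node intra-layer"       # the open "??" case on the slide


print(classify_step(("v4", "facebook"), ("v4", "twitter")))  # same-node inter-layer
print(classify_step(("v1", "twitter"), ("v2", "twitter")))   # other-node intra-layer
```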
Dependent variables used in different
diffusion studies
Milgram Experiment (late 1960s)
• The navigation problem
– Small world community.
• The experiment set up
– One target (Massachusetts)
– Many originators (Nebraska)
– Acquaintance chains of letters
• Output
– Six degrees of Separation
• New version (2003) by Dodds et al.
– Multiple source and Targets
– Web based experiment
History of Diffusion (Time Line)
• 1967: Milgram, navigation in small world [1]
• 1975: Epidemic model [2]
• 1978: Granovetter, Threshold Model
• 1998: SW (Small World)
• 1999: SF (Scale Free)
• 2001: Vespignani, underlying n/w is important
• Other timeline entries (1993, 2014, 2015): Internet; AIDS impact on Swedish population; Wiki, Friendster, Myspace, FB, Blogs, Flickr, YouTube, smartphones; ??
Milgram Reloaded!
• Attempt to understand the navigation process
• Multiple networks (FB, Twitter, WhatsApp, etc.)
• Across the globe
• Multiple originators
• Multiple targets (geographically)
• Multilingual
Output: average path length, network usage, orig <--> target impact
[Figure: originators O1 to O5 and targets T1 to T6]
Milgram Reloaded!
• What data we will ask for*
– Who are you: email ID or phone no.
– Network: through what network you received it
– Who sent it to you: ID of the person
– Which networks are you going to use to move the message towards its destination?
• Web Link: http://m.web.cs.unibo.it/
• If you have comments or feedback, please contact:
– rajesh.sharma@unibo.it or rajshpec@gmail.com
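Given the data listed above (who received the message, from whom, and over which network), the two outputs of the experiment, average path length and network usage, could be computed roughly as follows. This is a hypothetical sketch: the record layout `(sender, receiver, network)`, the function name `chain_stats`, and the assumption that each participant receives the message only once are not taken from the experiment's actual design.

```python
from collections import Counter


def chain_stats(hops, targets):
    """Compute (average path length, network usage) from hop records.

    hops    : list of (sender, receiver, network) tuples
    targets : participants whose chains should be measured
    Assumes each receiver appears at most once as a receiver.
    """
    received_from = {r: s for s, r, _ in hops}
    lengths = []
    for t in targets:
        if t not in received_from:
            continue  # the message never reached this target
        node, n, seen = t, 0, {t}
        while node in received_from:
            node = received_from[node]
            n += 1
            if node in seen:
                break  # guard against inconsistent (cyclic) records
            seen.add(node)
        lengths.append(n)
    usage = Counter(net for _, _, net in hops)
    avg = sum(lengths) / len(lengths) if lengths else None
    return avg, usage


hops = [("O1", "x", "FB"), ("x", "y", "Twitter"), ("y", "T1", "FB"),
        ("O2", "T2", "WhatsApp")]
avg, usage = chain_stats(hops, ["T1", "T2", "T3"])
print(avg)             # 2.0
print(dict(usage))     # {'FB': 2, 'Twitter': 1, 'WhatsApp': 1}
```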
Reasoning about Networks
• How do we reason about networks?
– Empirical: Study network data to find
organizational principles
• How do we measure and quantify networks?
– Mathematical models: Graph theory and statistical
models
• Models allow us to understand behaviors and
distinguish surprising from expected phenomena.
– Algorithms: for analyzing graphs
• Hard computational challenges
Networks: Structure & Process
• What do we study in networks?
– Structure and evolution:
• What is the structure of a network?
• Why and how did it come to have such structure?
– Processes and dynamics:
• Networks provide the “skeleton” for the spreading of
information, behavior, diseases
• How do information and diseases spread?
Networks: Impact
• Companies: Google (382.61B), Cisco
(125.29B), Facebook (207.04B), Twitter
(25.32B), LinkedIn (28.9B)
• Predicting Epidemics : Flu
• Intelligence and fighting (cyber) terrorism:
Find the leaders/hubs of terrorist org/regimes
• Financial Impact: Recession in Europe (who is
lending to whom)
Networks: Size Matters
• Network data: Orders of magnitude
– 436-node network of email exchange at a corporate research lab [Adamic-Adar, SocNets ‘03]
– 43,553-node network of email exchange at a university [Kossinets-Watts, Science ‘06]
– 4.4-million-node network of declared friendships on a blogging community [Liben-Nowell et al., PNAS ‘05]
– 240-million-node network of communication on Microsoft Messenger [Leskovec-Horvitz, WWW ’08]
– 800-million-node Facebook network [Backstrom et al. ‘1
Group Activity
• Big data : Network (and non network) data
(mostly from web).
– Understanding and analysis
• Few Examples:
– Impact of Tweets on :
• Financial patterns.
• Reputation of Companies
– Community patterns in networks: Information
dissemination.
– GPS data : insurance fraud
Thank you !!
Questions?
Rajesh Sharma
University of Bologna
http://rajshpec.github.io/
rajesh.sharma@unibo.it
Research Group: http://sigsna.net/impact/