phd slides - Kazem Jahanbakhsh

Contact Prediction, Routing and Fast Information Spreading in Social Networks
Kazem Jahanbakhsh
Computer Science Department
University of Victoria
August 2012

Outline
• Problem Definition and the Context
• Routing in Mobile Social Settings
• Human Mobility and Contact Event
• Collecting Contact Data
• Contact Prediction
• Hidden Contact Prediction
• Fast Information Spreading
• Conclusions, Major Contributions and Future Work

Problem Definition
Message routing, human contact prediction and
fast information spreading in the context of human
social networks.

Routing in Mobile Social Settings
• Motivation: First empirical evaluation of Milgram's
experiment in mobile settings
• Social Profile: Set of social characteristics for a user:
o Affiliation, Hometown, Language, Nationality, Interests and so on
• Goal: Designing an efficient routing algorithm
• Efficiency: Minimizing message forwardings &
Maximizing the probability of message delivery
• Assumptions & Constraints:
o Message delivery in physical proximity
o Sender knows the destination's social profile

Social-Greedy Routing Algorithms
• Approach: a greedy strategy based on computing
similarities between people's social profiles
o Social-Greedy I: Sender forwards the message “m” to nodes socially
closer to destination.
o Social-Greedy II & III: Variations of Social-Greedy I.
• Our work is different from previous work because we
only make use of social profiles of people for
routing!
• Real Data: Infocom 2006 contact trace (79 people, with
a brief version of their social profiles)
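The greedy forwarding rule above can be sketched as follows. This is only an illustration under stated assumptions, not the thesis implementation: the similarity measure (number of matching profile fields), the single-copy forwarding policy, the function names and the sample profiles are all hypothetical.

```python
def social_similarity(p, q):
    """Count the profile fields on which two users agree (assumed measure)."""
    return sum(1 for field in p if field in q and p[field] == q[field])


def route(contacts, profiles, source, dest):
    """Replay time-ordered contacts (t, u, v) and forward greedily.

    Assumption: a single message copy; the carrier hands it over only to a
    node whose profile is strictly more similar to the destination's.
    Returns (path, delivered).
    """
    carrier, path = source, [source]
    for _, u, v in sorted(contacts):
        if carrier not in (u, v):
            continue  # the carrier is not part of this contact
        other = v if carrier == u else u
        if other == dest:
            path.append(other)
            return path, True
        if (social_similarity(profiles[other], profiles[dest])
                > social_similarity(profiles[carrier], profiles[dest])):
            carrier = other
            path.append(other)
    return path, False


# Hypothetical profiles and contact trace.
profiles = {
    "a": {"affiliation": "UVic", "language": "Farsi", "hometown": "Tehran"},
    "b": {"affiliation": "UVic", "language": "English", "hometown": "Victoria"},
    "c": {"affiliation": "MIT", "language": "English", "hometown": "Victoria"},
    "d": {"affiliation": "MIT", "language": "English", "hometown": "Boston"},
}
contacts = [(1, "a", "b"), (2, "b", "c"), (3, "c", "d")]
path, delivered = route(contacts, profiles, "a", "d")
```

In this toy trace each hop shares more profile fields with the destination than the previous carrier, so the message moves a, b, c and is delivered to d on the last contact.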

SDR & Cost
[Figure: Successful Delivery Ratio (%) and Total Delivery Cost versus TTL (hour); legend entries: Epidemic, Bubble Rap, Social-Greedy I, LABEL, Waiting]
Performance Results for Different Routing Schemes (TTL=9h)

Human Mobility & Contact Data
[Figure: positions of Kenny and Eric at 10:00 AM and 10:10 AM]
Contact Event: 10:00-10:10 AM

Contact Graphs
[Figure: contact graph over the nodes Kyle, Eric, Butters, Katy, Jack, Kenny and Sara]

Collecting Data from Different Social Settings

Real Data Descriptions
Dataset          Inf 05     Inf 06     MIT        Camb       Roller
Sensors          41         79         97         36         62
Length           3 days     4 days     246 days   11 days    3 hours
Scanning Time    120 sec    120 sec    300 sec    600 sec    15 sec
Ext. Nodes       206        4321       20698      11367      1050
Total Cont.      227657     28216      285512     41587      132511
Ext. Cont.       57056      5757       183135     30714      72365
Ext. Cont. %     25%        20%        64%        74%        55%

Dataset     No. of Nodes   No. of Edges
Facebook    63731          817090

Contact Prediction: Problem Definition and Assumptions

Social Information & Small-World Network Properties
• Birds of a Feather (Homophily)
• Using Social Profiles:
o Jaccard Social Similarity (Jac)
o Social Foci Similarity (Foci)
o Max Social Similarity (Max)
• Using Contact Graphs:
o Transitivity:
• Number of Common Neighbors (NCN)
o Low Diameter:
• Shortest Path (SP)
• Random Walk (RW)
• How to reconstruct?
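Three of the measures listed above can be sketched in a few lines. This is a minimal illustration, assuming social profiles are sets of attribute strings and the contact graph is a dictionary mapping each node to its set of neighbors; the function names, the sample graph and the sample profiles are hypothetical.

```python
from collections import deque


def jaccard(profile_u, profile_v):
    """Jaccard similarity: shared attributes divided by all attributes."""
    union = profile_u | profile_v
    return len(profile_u & profile_v) / len(union) if union else 0.0


def common_neighbors(graph, u, v):
    """NCN: number of nodes adjacent to both u and v."""
    return len(graph[u] & graph[v])


def shortest_path_length(graph, source, target):
    """SP: hop count found by breadth-first search, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in graph[node]:
            if nbr == target:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None


# Hypothetical contact graph and profiles.
graph = {
    "Eric": {"Kenny", "Kyle"},
    "Kenny": {"Eric", "Kyle"},
    "Kyle": {"Eric", "Kenny", "Sara"},
    "Sara": {"Kyle"},
}
sim = jaccard({"uvic", "english"}, {"uvic", "farsi"})
ncn = common_neighbors(graph, "Eric", "Kenny")
hops = shortest_path_length(graph, "Eric", "Sara")
```

A predictor built on these scores would rank candidate pairs by higher Jaccard or NCN values, or by shorter path length; that ranking convention is an assumption of this sketch.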

Contact Prediction Results
[Figure, two panels, Infocom 2006. "Performance of Contact Graph Structure": NCN, SP, RW and Random. "Performance of Social Similarities": Jaccard, Focus, MAX and Random. Axes: Correct prediction % versus Population %; annotation: decreasing social similarity]

Hidden Contact Prediction

Hidden Contact Prediction: Reconstruction Algorithm
• Methods:
o Time-Spatial Locality: NCN, Jaccard & MIN
o Contact Rates: Popularity
o Social Similarity: Foci & Jaccard
o Social Similarity-NCN: Foci-NCN
• Algorithm:
o For each L_k, compute and store quadruples (u, v, k, sim(u,v)) in L_sim
o Sort L_sim in descending order of similarity score
o Output the first Rank quadruples
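The three algorithm steps above can be sketched as follows. The slides do not define L_k, so this sketch assumes it is the list of candidate node pairs for time slot k; the function name, the `seen_by` data and the use of NCN as `sim` are hypothetical choices for illustration.

```python
def reconstruct_hidden_contacts(candidates_by_slot, sim, rank):
    """Score every candidate pair, sort by score, keep the top `rank`.

    candidates_by_slot: {k: [(u, v), ...]}, assumed to play the role of L_k.
    sim: function (u, v) -> similarity score.
    """
    l_sim = [
        (u, v, k, sim(u, v))
        for k, pairs in candidates_by_slot.items()
        for u, v in pairs
    ]
    l_sim.sort(key=lambda quad: quad[3], reverse=True)  # descending score
    return l_sim[:rank]


# Hypothetical data: which internal nodes have sighted each external node.
seen_by = {"x": {"a", "b", "c"}, "y": {"a", "b"}, "z": {"c"}}


def ncn(u, v):
    """Number of common neighbors, used here as the similarity score."""
    return len(seen_by[u] & seen_by[v])


candidates = {1: [("x", "y"), ("x", "z")], 2: [("y", "z")]}
top = reconstruct_hidden_contacts(candidates, ncn, rank=2)
```

Here `top` holds the two highest-scoring quadruples; any of the listed similarity methods could be passed in place of `ncn`.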

Hidden Contact Prediction Results
Infocom 2006

Supervised Learning Approach
• Techniques:
o Logistic Regression
o K-Nearest Neighbor (KNN)
• Extracted Features:
o Contact Graph-based (Degree, Product of degrees, NCN)
o Contact Duration
o Social Profiles
o Static Sensors
Prediction Results (Logistic Regression/KNN)
Session Type   Keynote     Lunch Break   Coffee Break
TPR            0.18/0.24   0.37/0.40     0.41/0.43
FPR            0.03/0.08   0.04/0.07     0.02/0.02
Accuracy       81%/78%     84%/81%       92%/92%
RMSE           0.42/0.40   0.39/0.36     0.26/0.24
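For readers unfamiliar with the four metrics in the table, the sketch below shows one common way to compute them from a classifier's output. It assumes binary labels (1 = contact), predicted probabilities, and a 0.5 decision threshold; computing RMSE on the probabilities is also an assumption, since the slides do not say how it was obtained. The function name and sample values are hypothetical.

```python
import math


def evaluate(y_true, y_prob, threshold=0.5):
    """Return (TPR, FPR, accuracy, RMSE) for binary labels and probabilities."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0   # true positive rate
    fpr = fp / (fp + tn) if fp + tn else 0.0   # false positive rate
    accuracy = (tp + tn) / len(y_true)
    rmse = math.sqrt(
        sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)
    )
    return tpr, fpr, accuracy, rmse


# Hypothetical labels and predicted probabilities.
tpr, fpr, accuracy, rmse = evaluate([1, 1, 0, 0, 0], [0.9, 0.2, 0.1, 0.6, 0.3])
```

A low TPR together with a low FPR and high accuracy, as in the Keynote column, is what one would expect when true contacts are rare relative to non-contacts.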

Fast Information Spreading in Social Networks
• Input: social graph G=(V,E) & a unique message for
each node
• Communication Model: synchronized
• Constraints: no global information & one contact
per round
• Termination: when every node receives all
messages
• Goal: analyzing running times of three information
spreading algorithms

Information Spreading Algorithms
• Random push-pull:
o In each round, every node randomly chooses one of its neighbors for
message exchange
• Doerr:
o In each round, every node randomly chooses one of its neighbors except
the one it has just contacted
• Censor: Hybrid strategy:
o Even rounds: each node runs random push-pull
o Odd rounds: each node chooses one of its neighbors in a sequential
manner from its Bottleneck List
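A small simulation can make the round structure of the first two algorithms concrete. The sketch below assumes a connected undirected graph stored as a dictionary of neighbor lists, with at least one neighbor per node, and exchanges messages based on the state at the start of each round. The `avoid_last` flag is a rough stand-in for the "except the one just contacted" rule; the Bottleneck List used by Censor is not modeled. All names and the sample graph are hypothetical.

```python
import random


def spread(graph, rng, avoid_last=False, max_rounds=10_000):
    """Return the number of rounds until every node holds every message."""
    known = {v: {v} for v in graph}      # each node starts with its own message
    last = {v: None for v in graph}      # partner contacted in the previous round
    n = len(graph)
    for round_no in range(1, max_rounds + 1):
        # Synchronized model: exchanges use the state at the start of the round.
        snapshot = {v: set(msgs) for v, msgs in known.items()}
        for v in sorted(graph):
            choices = sorted(graph[v])
            if avoid_last and len(choices) > 1:
                choices = [w for w in choices if w != last[v]]
            partner = rng.choice(choices)
            last[v] = partner
            merged = snapshot[v] | snapshot[partner]   # push and pull
            known[v] |= merged
            known[partner] |= merged
        if all(len(msgs) == n for msgs in known.values()):
            return round_no
    raise RuntimeError("did not finish; is the graph connected?")


# Hypothetical 3-node path graph: 0 - 1 - 2.
path_graph = {0: [1], 1: [0, 2], 2: [1]}
rounds_push_pull = spread(path_graph, random.Random(1))
rounds_avoid_last = spread(path_graph, random.Random(1), avoid_last=True)
```

On this path graph the end nodes can only contact the middle node, so both variants need exactly two rounds regardless of the seed. On larger graphs the round count depends on the random choices, so results should be averaged over several seeds.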

Empirical Results from Facebook Graph
[Figure, two panels comparing Random Push-Pull, Doerr and Censor: "Running Times on Original Facebook Graph" and "Running Times Without 1-whiskers". Labels: Total Running Time, Number of Finished Nodes (x 10^4), Round #. Inset: a 1-whisker attached to the well connected core, with nodes u, v, w, x and labels Cu, Cx]

Conclusions & Future Work
• Major Contributions:
• Social-Greedy Algorithm:
o Suitable for bootstrapping wireless devices
• Contact Prediction:
o Social Similarity methods, SP and RW outperform random
o Foci-NCN provides the best precision results
o Supervised learning is an effective technique for contact prediction
• Information spreading:
o Censor performs well for spreading information in social networks
• Future Work:
o Proposing more efficient predictors for large geographical spaces
o Final Goal: Predicting where people go next and who they will meet there!

Hidden Contacts Prediction Results
[Figure: MIT Campus. Performance Evaluation (no. of external nodes = 73): The Percentage of True Positives versus log2 Rank for NCN, Jac, Min, Pop and Rand]

Supervised Learning Results
Ranking Features
Session Type    Keynote   Lunch Break   Coffee Break
degree          4         5             5
degree          7         7             7
degree prod.    3         3             6
ncn             1         1             2
total overlap   2         2             1
social          5         6             4
ncsn            6         4             3

Examples of 1-Whiskers