Modeling Relationship Strength
in Online Social Networks
Rongjing Xiang¹, Jennifer Neville¹, Monica Rogati²
¹Purdue University, ²LinkedIn
WWW 2010
2010. 08. 13.
Summarized and Presented by Sang-il Song, IDS Lab., Seoul National University
Introduction – Social Network
• Homophily
  ◦ The tendency of individuals to associate and bond with similar others
  ◦ “Birds of a feather flock together”
  ◦ Found in many real-world and online social networks
• Research areas
  ◦ Network structure analysis
  ◦ Link prediction: “Who will be my friend?”
  ◦ Community detection
  ◦ Item recommendation
Copyright © 2010 by CEBT
Introduction
• Past work has focused on social networks with binary ties
  ◦ e.g., friends or not
• Binary indicators provide only a coarse indication of the relationship
  ◦ Pairs of individuals with strong ties (e.g., close friends) are likely to exhibit greater similarity than those with weak ties (e.g., acquaintances)
  ◦ Treating all relationships as equal increases noise and degrades model performance
• Pruning away spurious relationships and highlighting stronger relationships has improved the accuracy of such models
Related Works
• I. Kahanda and J. Neville. Using transactional information to predict link strength in online social networks. ICWSM 2009
• E. Gilbert and K. Karahalios. Predicting tie strength with social media. CHI 2009
  ◦ Binary prediction task
    – Strong ties or weak ties
  ◦ Supervised learning
    – Requires human annotation effort
  ◦ Friendship rating
  ◦ Top-friend nomination
Goal
• A model to infer relationship strength
  ◦ Based on profile similarity and interaction activity
  ◦ Automatically distinguishing strong relationships from weak ones
    – Unsupervised
  ◦ Relationship strength is represented as a continuous value
    – Full spectrum of relationship strength, from weak to strong
  ◦ Scalable approach
    – Suitable for online applications
Assumptions of the Model
• The higher the similarity, the stronger the tie
  ◦ e.g., ‘용진’ and I have many features in common, so we have a strong relationship
• Relationship strength directly impacts the nature and frequency of online interactions between a pair of users
  ◦ e.g., ‘청림’ is close to me if he chats with me often on a messenger
• Interactions are independent of one another given the relationship strength
Variables of the Model
• Profile: the data of a specific user
  ◦ $\mathbf{x}^{(i)}$: profile vector of individual $i$
  ◦ e.g., school, company, region, industry, job of the user
• Interaction: the activity between two users
  ◦ $y_t^{(ij)}$: occurrences of interaction $t$ between $i$ and $j$
  ◦ e.g., reply, retweet (on Twitter)
  ◦ e.g., tagging the person in a picture, posting on one’s wall (on Facebook)
• Relationship strength
  ◦ $z^{(ij)}$: latent relationship strength
Graphical model representation
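[Figure: graphical model in which the profile vectors $\mathbf{x}^{(i)}$ and $\mathbf{x}^{(j)}$ point to the latent $z^{(ij)}$, which in turn points to the interactions $y_1^{(ij)}, y_2^{(ij)}, \dots, y_m^{(ij)}$]
• $\mathbf{x}^{(i)}$: profile vector
• $y_t^{(ij)}$: occurrences of the interaction
• $z^{(ij)}$: latent relationship strength
• $P(z^{(ij)}, \mathbf{y}^{(ij)} \mid \mathbf{x}^{(i)}, \mathbf{x}^{(j)}) = P(z^{(ij)} \mid \mathbf{x}^{(i)}, \mathbf{x}^{(j)}) \prod_{t=1}^{m} P(y_t^{(ij)} \mid z^{(ij)})$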
Model Specification
• Inferring relationship strength from the user profiles
  ◦ Using a similarity vector
    – $\mathbf{s} = [s_1, \dots, s_n]$
    – e.g., $s_k$: 1 if $i$ and $j$ are in the same company, 0 otherwise
    – e.g., $s_l$: logarithm of the normalized count of common groups that $i$ and $j$ have joined
  ◦ Adopting the Gaussian distribution
    – $P(z^{(ij)} \mid \mathbf{x}^{(i)}, \mathbf{x}^{(j)}) = N(\mathbf{w}^T \mathbf{s}(\mathbf{x}^{(i)}, \mathbf{x}^{(j)}), v)$
    – $\mathbf{w}$ is to be estimated; the mean $\mathbf{w}^T \mathbf{s}$ is a weighted sum of similarity measures
[Figure: density $p$ plotted against $z$; the blue curve represents two similar users, the red curve two dissimilar users]
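The profile part of the model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `similarity_vector` and `gaussian_log_density` are hypothetical, only binary "same value" similarities are shown, and $v$ is treated as the variance of the Gaussian.

```python
import math

def similarity_vector(profile_i, profile_j, fields):
    """Binary similarity per field: 1.0 if both profiles share a value, else 0.0.

    Hypothetical helper; profiles are plain dicts such as {"company": "A"}.
    """
    return [
        1.0 if f in profile_i and profile_i.get(f) == profile_j.get(f) else 0.0
        for f in fields
    ]

def gaussian_log_density(z, w, s, v):
    """log N(z; w^T s, v), with v treated as the variance (an assumption)."""
    mean = sum(wk * sk for wk, sk in zip(w, s))
    return -0.5 * math.log(2.0 * math.pi * v) - (z - mean) ** 2 / (2.0 * v)

fields = ["company", "region"]
s = similarity_vector({"company": "A", "region": "X"},
                      {"company": "A", "region": "Y"}, fields)
# A strength close to the weighted similarity w^T s is more probable
# than one far away from it.
near = gaussian_log_density(1.0, [1.0, 0.5], s, 0.5)
far = gaussian_log_density(3.0, [1.0, 0.5], s, 0.5)
```

With these toy weights the pair shares only the company, so the mean strength is 1.0 and `near` is larger than `far`.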
Model Specification
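• Inferring relationship strength from interactions
  ◦ Modeling all interactions as binary variables
  ◦ Introducing auxiliary variables $\mathbf{a}_t^{(ij)}$
    – Capturing auxiliary causes of the interactions that are independent of the relationship strength
    – e.g., the total number of pictures that a user has tagged represents their intrinsic tendency to tag pictures
  ◦ Using the sigmoid function $\sigma(x) = \frac{1}{1 + e^{-x}}$
    – $P(y_t^{(ij)} = 1 \mid z^{(ij)}, \mathbf{a}_t^{(ij)}) = \frac{1}{1 + e^{-(\theta_{t,1} a_{t,1}^{(ij)} + \dots + \theta_{t,l} a_{t,l}^{(ij)} + \theta_{t,l+1} z^{(ij)} + b)}}$
    – The exponent is a weighted sum of the auxiliary variables and $z$
    – $\boldsymbol{\theta}$ is to be estimated
[Figure: graphical model fragment in which $z^{(ij)}$ and the auxiliary variables $\mathbf{a}_t^{(ij)}$ point to the interactions $y_1^{(ij)}, y_2^{(ij)}, \dots, y_m^{(ij)}$]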
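As a rough illustration of the interaction model, the probability of one interaction type can be computed as below. The function name `interaction_probability` and the argument layout (auxiliary weights and the weight on $z$ passed separately) are assumptions made for this sketch, not part of the original slides.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))

def interaction_probability(z, aux, theta_aux, theta_z, b):
    """P(y_t = 1 | z, a_t) for one interaction type t.

    aux       : auxiliary variables a_{t,1..l}
    theta_aux : their weights theta_{t,1..l}
    theta_z   : the weight theta_{t,l+1} on the relationship strength z
    b         : bias term
    """
    linear = sum(t * a for t, a in zip(theta_aux, aux)) + theta_z * z + b
    return sigmoid(linear)

# With a positive weight on z, a stronger tie makes the interaction more likely.
p_weak = interaction_probability(0.0, [0.0], [1.0], 1.0, 0.0)
p_strong = interaction_probability(1.0, [0.0], [1.0], 1.0, 0.0)
```

The auxiliary term lets a habitually active user have a high interaction probability even when $z$ is small, which is the role the slide assigns to $\mathbf{a}_t^{(ij)}$.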
Model Specification
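[Figure: full graphical model: $\mathbf{x}^{(i)}$ and $\mathbf{x}^{(j)}$ determine the similarity vector $\mathbf{s}^{(ij)}$, which points to $z^{(ij)}$; $z^{(ij)}$ and the auxiliary variables $\mathbf{a}_t^{(ij)}$ point to each interaction $y_t^{(ij)}$, $t = 1, \dots, m$]
$P(D, \mathbf{w}, \boldsymbol{\theta}) = P(D \mid \mathbf{w}, \boldsymbol{\theta}) P(\mathbf{w}) P(\boldsymbol{\theta})$
$\propto \prod_{(i,j) \in D} \left( P(z^{(ij)} \mid \mathbf{x}^{(i)}, \mathbf{x}^{(j)}) \prod_{t=1}^{m} P(y_t^{(ij)} \mid z^{(ij)}, \mathbf{a}_t^{(ij)}) \right) P(\mathbf{w}) P(\boldsymbol{\theta})$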
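To make the factorization concrete, the log of the bracketed per-pair term can be evaluated as in the following sketch. It assumes $v$ is a variance, stores each $\boldsymbol{\theta}_t$ as a list whose last entry is the weight on $z$, and leaves out the priors $P(\mathbf{w})$ and $P(\boldsymbol{\theta})$ because the slides do not specify their form. The name `pair_log_likelihood` is hypothetical.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(1 / (1 + e^{-x}))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def pair_log_likelihood(z, s, ys, auxs, w, thetas, b, v):
    """log P(z | x_i, x_j) + sum_t log P(y_t | z, a_t) for one pair (i, j).

    s      : similarity vector of the pair
    ys     : binary interaction indicators y_1..y_m
    auxs   : one list of auxiliary variables per interaction type
    thetas : one weight list per interaction type, last entry multiplies z
    """
    mean = sum(wk * sk for wk, sk in zip(w, s))
    total = -0.5 * math.log(2.0 * math.pi * v) - (z - mean) ** 2 / (2.0 * v)
    for y, a, theta in zip(ys, auxs, thetas):
        *theta_aux, theta_z = theta
        u = sum(tk * ak for tk, ak in zip(theta_aux, a)) + theta_z * z + b
        # P(y=1) = sigmoid(u) and P(y=0) = 1 - sigmoid(u) = sigmoid(-u)
        total += log_sigmoid(u) if y == 1 else log_sigmoid(-u)
    return total

profile_only = pair_log_likelihood(0.0, [1.0], [], [], [0.0], [], 0.0, 0.5)
with_one_y = pair_log_likelihood(0.0, [1.0], [1], [[0.0]], [0.0], [[1.0, 1.0]], 0.0, 0.5)
```

Summing this quantity over all pairs in $D$ and adding the log-priors would give the log of the objective up to a constant.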
Inference
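• Find the point estimates of $\mathbf{w}$, $\boldsymbol{\theta}$ and $z$ that maximize $\mathcal{L} = P(D, \mathbf{w}, \boldsymbol{\theta})$
  ◦ Using a gradient method
  ◦ Using Newton-Raphson steps for the weight updates
    – $x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$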
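One way to read the Newton-Raphson rule above is to apply it to the derivative of the log-likelihood, so that $f$ is the gradient and $f'$ the second derivative. The sketch below does this for the latent strength $z$ of a single pair with the weights held fixed. The function name, the `offsets` argument (the part of each linear term that does not depend on $z$) and the stopping rule are assumptions for illustration, not the authors' implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def newton_update_z(z, mean, v, ys, offsets, theta_z, iters=20, tol=1e-10):
    """Maximise log N(z; mean, v) + sum_t log P(y_t | z) over z.

    mean    : w^T s for this pair
    v       : Gaussian variance (assumed)
    ys      : binary interaction indicators
    offsets : per-interaction constant theta_aux . a_t + b
    theta_z : per-interaction weight on z
    """
    for _ in range(iters):
        grad = -(z - mean) / v      # derivative of the Gaussian term
        hess = -1.0 / v             # its second derivative
        for y, c, th in zip(ys, offsets, theta_z):
            p = sigmoid(c + th * z)
            grad += (y - p) * th
            hess -= th * th * p * (1.0 - p)
        step = grad / hess          # x_{n+1} = x_n - f(x_n) / f'(x_n)
        z -= step
        if abs(step) < tol:
            break
    return z

z_no_data = newton_update_z(0.0, 1.5, 0.5, [], [], [])
z_one_hit = newton_update_z(0.0, 0.0, 0.5, [1], [0.0], [1.0])
```

Without interactions the estimate falls back to the profile-based mean; an observed interaction with a positive weight on $z$ pulls the estimate above that mean. The second derivative is always negative here, so each step moves toward the single maximum.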
Experiment
• Two datasets are prepared for the experiments
  ◦ LinkedIn
    – Business-oriented social network
    – Members can search member profiles and job postings
  ◦ Facebook data
LinkedIn Dataset
• 100 seed users and their two-hop neighborhood (100,000 pairs)
• Overall similarity $\mathbf{s}^{(ij)} = [s_1^{(ij)}, \dots, s_8^{(ij)}]^T$
  ◦ $s_1$: 1 if $i$ and $j$ went to the same school, 0 otherwise
  ◦ $s_2$: 1 if $i$ and $j$ work in the same company, 0 otherwise
  ◦ $s_3$: 1 if $i$ and $j$ are in the same geographical region, 0 otherwise
  ◦ $s_4$: 1 if $i$ and $j$ are in the same industry, 0 otherwise
  ◦ $s_5$: 1 if $i$ and $j$ have the same job title, 0 otherwise
  ◦ $s_6$: 1 if $i$ and $j$ are in the same functional area, 0 otherwise
  ◦ $s_7$: logarithm of the normalized count of common groups that $i$ and $j$ have joined
  ◦ $s_8$: logarithm of the normalized count of common connections that $i$ and $j$ share
• Interaction features
  ◦ $y_1$: 1 if $i$ and $j$ have established a connection, 0 otherwise
  ◦ $y_2$: 1 if $i$ has written a recommendation for $j$, 0 otherwise
  ◦ $y_3$: 1 if $i$ has viewed $j$’s profile, 0 otherwise
  ◦ $y_4$: 1 if $i$ has included $j$ in his or her online LinkedIn address book, 0 otherwise
Evaluation (in LinkedIn Dataset)



• Estimating relationship strength with
  ◦ Job
  ◦ Functional area
  ◦ Geographical region
• Measuring how well the estimated relationship strengths perform
  ◦ Identifying feature values (same school, same company, same industry)
  ◦ Measuring the area under the ROC curve (AUC)
• Comparing relationship strength to
  ◦ Recommendation links
  ◦ Profile view links
  ◦ Address book links
  ◦ Connection links
  ◦ Interaction count
  ◦ Profile similarity
Receiver Operating Characteristic (ROC)
• TPR (true positive rate, sensitivity)
  ◦ Equivalent to hit rate, recall
  ◦ TPR = TP / P = TP / (TP + FN)
• FPR (false positive rate)
  ◦ Equivalent to fall-out
  ◦ FPR = FP / N = FP / (FP + TN)
• AUC (area under the ROC curve)
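The definitions above can be checked with a small Python sketch. The function names are hypothetical, the toy labels and scores are made up for illustration, and the AUC is computed through its pairwise interpretation (the fraction of positive/negative pairs that the score ranks correctly, ties counting one half), which equals the area under the ROC curve. Both classes are assumed to be present.

```python
def tpr_fpr(labels, scores, threshold):
    """True and false positive rates when predicting 1 for score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    return tp / (tp + fn), fp / (fp + tn)

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: label 1 means the pair shares the feature (e.g. same school),
# and the score stands in for an estimated relationship strength.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
rates = tpr_fpr(labels, scores, 0.45)
area = auc(labels, scores)
```

Sweeping the threshold over all score values traces the ROC curve; the pairwise form avoids that sweep and is adequate for small samples.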
Results on the LinkedIn dataset
Facebook dataset
• 5 public Purdue Facebook users and their three-hop neighborhood
  ◦ 4,500 nodes and 144,712 pairs
• Overall similarity $\mathbf{s}^{(ij)} = [s_1^{(ij)}, s_2^{(ij)}, s_3^{(ij)}]^T$
  ◦ Not using personal profile data
  ◦ $s_1$: logarithm of the normalized count of common networks of which $i$ and $j$ are both members
  ◦ $s_2$: logarithm of the normalized count of common groups that $i$ and $j$ have joined
  ◦ $s_3$: logarithm of the normalized count of common friends that $i$ and $j$ share
• Interactions
  ◦ $y_1$: 1 if $i$ has posted on $j$’s wall, 0 otherwise
  ◦ $y_2$: 1 if $i$ has tagged $j$ in a picture, 0 otherwise
Evaluation (in Facebook Dataset)
• Comparing the relationship strengths estimated by the model with other weighted graphs
  ◦ Friendship graph: strong/weak relationships
  ◦ Top-Friend graph: strong relationships
  ◦ Wall graph: interactions
  ◦ Picture graph: interactions
• Evaluating
  ◦ Autocorrelation improvement
  ◦ Classification improvement
Evaluation (in Facebook Dataset)


• Autocorrelation
  ◦ Statistical dependency between the values of the same attribute on related instances
  ◦ $\chi^2 = \sum_{i \in K} \sum_{j \in K} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
    – $K$ is the set of possible categorical values of the attribute
    – $O_{ij}$ is the observed occurrence of the value pair $(i, j)$ on related instances
    – $E_{ij}$ is the expected occurrence
    – The further the observed occurrences depart from the expected ones, the higher the autocorrelation
    – e.g., the geographical region attribute has higher autocorrelation than the favorite baseball team attribute in a friendship network
• Classification performance
  ◦ The Gaussian random field (GRF) model is used for classification
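The chi-square autocorrelation score above can be sketched for an unweighted graph as follows. Several choices here are assumptions rather than details from the slides: each linked pair is counted in both directions, the expected counts come from the usual independence model (row total times column total over the grand total), and the name `chi_square_autocorrelation` is hypothetical. How relationship-strength weights enter the statistic is not stated in the slides and is not modeled.

```python
from collections import Counter

def chi_square_autocorrelation(pairs, attr):
    """Chi-square statistic of attribute values across linked pairs.

    pairs : iterable of (node_a, node_b) links
    attr  : dict mapping node -> categorical attribute value
    """
    observed = Counter()
    for a, b in pairs:
        # Count every undirected link in both directions (an assumption).
        observed[(attr[a], attr[b])] += 1
        observed[(attr[b], attr[a])] += 1
    values = sorted({v for pair in observed for v in pair})
    total = sum(observed.values())
    marginal = {u: sum(observed[(u, v)] for v in values) for u in values}
    chi2 = 0.0
    for u in values:
        for v in values:
            expected = marginal[u] * marginal[v] / total
            chi2 += (observed[(u, v)] - expected) ** 2 / expected
    return chi2

region = {1: "A", 2: "A", 3: "B", 4: "B"}
homophilous = chi_square_autocorrelation([(1, 2), (3, 4)], region)
mixed = chi_square_autocorrelation([(1, 3), (2, 4), (1, 2), (3, 4)], region)
```

In the toy data, links that only join same-region users give a positive score, while links spread evenly across regions give zero.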
Autocorrelation improvement
Classification improvement
Conclusions
• A latent variable model for the task of relationship strength estimation
  ◦ The latent variable model captures the causality of the underlying social process
  ◦ Hybrid approach combining a generative model and a discriminative model
    – Does not suffer from sparsity of interactions
    – The latent variable is inferred using only the upper level of the model
    – Predicting future interactions is also possible
  ◦ Predicting new connections
• Experiments show that the estimated relationship strengths give higher autocorrelation and better classification performance
Discussions
• General model to estimate relationship strength
  ◦ Easy to apply domain-specific knowledge
    – Just define the similarity of two users and the interaction distributions
• However, the experiments are somewhat questionable
  ◦ No comparison with other state-of-the-art techniques
    – The only comparison is with the raw data
  ◦ The similarity function is too simple
    – Especially when compared with recent techniques
Thank you