PRscore: A measure of proximity between Internet Users
Nikhil Almeida, Grégoire Cachet, Romain Rigaux
Georgia Institute of Technology, Atlanta GA
Abstract:
Online social communities emerged with the start of Web 2.0. Most social networks now consist of
millions of users, and their friend links provide a way of finding a path between any two users.
With so many users it becomes difficult for a user to keep track of friends if the system does not
provide the necessary functionality. It is even more difficult to search for new friends. The system
therefore needs to be able to find friends that a user may know or may want to know. PRscore is a
measure of the strength of the friendship that may exist between two internet users. Many useful
applications can be developed using this value.
Keywords:- PRscore, Friend Recommender, social networks
Introduction:
The project started as our class project for the Advanced Internet Application class in Spring
2008. The main aim of the project was to display a user's friends from a particular location on a map of
that location. The map made it easier for the user to visualize where in the city his friends were. The
accuracy of the display depends upon the details exposed by the social network and the details provided
by the particular individual.
There is a strong possibility that the number of friends, and friends of friends, that a user is
connected to is large. This makes it hard to display all of them on one page. It is therefore necessary to
rank the friends, so that we can display a subset of the results and the user can check additional results
only if needed. So we designed a system that ranks friends on the basis of how close they are to the user.
Here the word close refers to a good friend. The user's best friends are the closest to him and are
ranked the highest. We thus calculate the Proximity score (PRscore). The user's best friends will have a
high proximity score, while a low proximity score indicates that the person is only an acquaintance.
The PRscore is calculated by treating the different features of the social network as attributes,
where each attribute has its own PRscore. This per-attribute PRscore is calculated using rules decided in
advance for each attribute. Features such as the number of mutual friends or the number of scraps can be
considered as attributes. Different social networks therefore contribute different numbers of attributes
to the calculation of the PRscore. Each attribute can be given a different priority or importance level
that determines how much effect it has on the final PRscore. In addition, attributes from different social
networks are predefined and classified according to which of them are mutually compatible.
To make the system powerful, we have designed it to be compatible with different social
networks. A user can thus search for friends that he has in different social networks in one application,
and can rank his friends from different social networks together. E.g. the system will be able to report
that a friend in one social network has a greater PRscore than a friend in another. This is made
possible by the common database that contains the attributes of the different social networks and
their compatibility lists.
Literature Review:
Different social networks already have their own recommender systems. But these systems are
built on features that are specific to their own social network, and those features are hard coded into
the system. There is no scope for adding or removing the features on which the recommendation is based.
1.) Last.fm: This is a social networking website where people can listen to and share music. Friends
are people who have similar tastes in music. This feature is specific to Last.fm but can be adapted
to suit our requirements. E.g. we can use it in other social networks and consider people with
similar hobbies, tastes in books, etc. as friends or potential friends. Here the recommender
searches through the user's browsing history to create an approximate signature for the user.
People who have similar profiles will have similar signatures and are candidates for
potential friends.
2.) LinkedIn: This social network links every profile to one or more networks belonging to a city,
country, institution, etc. The recommender system works by choosing people belonging either to
your own network or to some of your friends' networks. It then ranks these people based on the
number of friends they have in common with the user.
3.) Friend Suggestor: In this well known social network (Facebook) the feature is known as
'Friends You May Know'. It is only offered to people who are new to the network. A user can
suggest to a friend who is new to Facebook a list of people he thinks his friend might know. This
is not really a recommender system; it only propagates a list of profiles from one user to another.
Some work in this area was performed by Shuchuan Lo and Chingching Lin in their paper titled
“WMR – A graph based Algorithm for Friend Recommendation”. In this paper they model a general
social network as a web forum. They use the amount of interaction between two people as
the degree of friendship between them. Two people who have no interaction cannot be friends, and if
there is only one way communication between two people then it could be a case of harassment or spam.
The authors build a large interaction graph of the communication between the different users and use a
minimization strategy to reduce the size of the graph. From the graph they predict the people who
could be friends of a user. This can be done in cases where the friend relationships between users are
known. The drawback of this approach is that the interaction graph can be very large, so complex
algorithms are required to process it.
System Architecture:
We have tried to make the system architecture as general as possible, so that the methods used to
calculate the proximity score are uniform across networks and only a network-specific crawler has to be
written when a new social network is added to the application.
Most social networks have released APIs so that one can write applications that use their data
and add value to it. The system is designed such that the application developer only has to write a
crawler for any new social network and store the information in the desired format. The application will
automatically use that information to calculate the friend proximity score.
Having the crawler as a separate module, and having a temporary table as a layer between the social
network and the ranking algorithm, has several advantages.
1.) It provides modularity and allows more social networks to be added to the application just by
writing an appropriate crawler.
2.) It allows the database to be populated at runtime, i.e. when the user logs in to the application
he can provide the login information for the different social networks he uses. He can rank only
the friends from the networks he chooses and thus keep his profiles separate if he so desires.
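The modular crawler layer described above could be sketched as follows. This is only an illustration under stated assumptions: the names `Crawler`, `AttributeScore`, `StubOrkutCrawler` and `populate_table` are hypothetical and do not come from the original system, and the stub returns fixed values instead of calling a real social network API.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class AttributeScore:
    """One row of the intermediate table: an attribute and its PRscore (0..10)."""
    attribute_id: int
    pr_score: float


class Crawler(Protocol):
    """Hypothetical interface that every network-specific crawler would satisfy."""
    network: str

    def fetch_scores(self, user: str, friend: str) -> list[AttributeScore]:
        ...


class StubOrkutCrawler:
    """Stand-in crawler with fixed values; a real one would query the network's API."""
    network = "Orkut"

    def fetch_scores(self, user: str, friend: str) -> list[AttributeScore]:
        return [AttributeScore(attribute_id=5, pr_score=5.6),
                AttributeScore(attribute_id=6, pr_score=3.4)]


def populate_table(crawlers: list[Crawler], user: str, friend: str) -> list[AttributeScore]:
    """Collect rows from every crawler the user has chosen to enable."""
    rows: list[AttributeScore] = []
    for crawler in crawlers:
        rows.extend(crawler.fetch_scores(user, friend))
    return rows


rows = populate_table([StubOrkutCrawler()], "alice", "bob")
```

Under this sketch, supporting another network would only require one more class with a `fetch_scores` method; the ranking code that consumes `rows` would stay unchanged.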
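[Figure: System architecture. The Facebook, Orkut and LinkedIn databases are read by network-specific
info getters (the Facebook Info Getter through the RFacebook API, the Orkut Info Getter through the
OpenSocial API, and the LinkedIn Info Getter). The info getters fill the Attribute Database and Attribute
Rule Database, which feeds the Proximity Calculation Algorithm, which in turn serves the Application.]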
Database Design:
This database is the middle layer of the system architecture. When a user logs in to the system, every
network-specific crawler gathers information for its attributes and stores it in this database.
Attribute Table:
This table records the different attributes that the system uses for calculating the PRscore. It
records the social network each attribute comes from, as well as the attributes from other social
networks with which it is compatible. It has the following fields:
• AttributeID: - This is an ID field that is the primary key for the table and uniquely identifies an
attribute in the table.
• Title: - This is the name of the attribute.
• Descr.: - This is the description of the attribute.
• Compatibility Lists: - This is a multi-valued attribute that records the other attributes that are
compatible with this one.
• Social Network: - This specifies the social network the attribute comes from.
Attribute ID | Title               | Description                                                     | Compatibility Lists | Social Network
-------------+---------------------+-----------------------------------------------------------------+---------------------+---------------
1            | Common Friends      | Lists the number of common friends between two friends          | 5, 8                | Facebook
2            | Common Groups       | Lists the number of common groups subscribed to by the two      | 4                   | Facebook
3            | Common Network      | Lists the common networks the two users are on                  | -                   | Facebook
4            | Common Applications | Lists the common applications used by the two users             | -                   | Facebook
5            | Mutual Friends      | Lists the number of common friends between two friends          | 1, 8                | Orkut
6            | Common Communities  | Lists the number of common groups subscribed to by the two      | 2                   | Orkut
7            | Testimonials        | Lists whether the users have written testimonials to each other | -                   | Orkut
8            | Colleagues          | Lists the number of common friends between two friends          | 1, 5                | LinkedIn
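As a rough illustration of how the compatibility lists could be used, the sketch below stores a few rows of the attribute table in memory and checks whether two attributes from different networks may be compared. The class `Attribute`, the dictionary `ATTRIBUTES` and the function `comparable` are assumed names for this example only; the original system keeps this information in a database table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    attribute_id: int
    title: str
    network: str
    compatible_with: tuple[int, ...] = ()  # IDs of compatible attributes


# A few rows taken from the attribute table, keyed by attribute ID.
ATTRIBUTES = {
    a.attribute_id: a
    for a in [
        Attribute(1, "Common Friends", "Facebook", (5, 8)),
        Attribute(5, "Mutual Friends", "Orkut", (1, 8)),
        Attribute(7, "Testimonials", "Orkut"),
        Attribute(8, "Colleagues", "LinkedIn", (1, 5)),
    ]
}


def comparable(id_a: int, id_b: int) -> bool:
    """True if the attributes are identical or list each other as compatible."""
    if id_a == id_b:
        return True
    return (id_b in ATTRIBUTES[id_a].compatible_with
            and id_a in ATTRIBUTES[id_b].compatible_with)
```

Requiring the relation to hold in both directions is a design choice of this sketch; it guards against a compatibility list that was updated on one side only.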
PRScore table:
This table is populated by the crawler at runtime for each user. For every user the crawler fetches his
information and passes it to the respective module, which computes the PRscore for each attribute. The
result is stored in this table so that the system can compute the final PRscore from it. The table
contains the following fields:
• Attribute ID: - This indicates the attribute in consideration. It acts as the foreign key referring
to the attribute table above.
• PRscore: - This field contains the PRscore for the respective attribute.
• Level: - This field contains one of three values (High - H, Medium - M, Low - L) and
indicates the level of importance of the attribute when calculating the dynamic weights.
Attribute ID | PRscore | Level
-------------+---------+------
1            | 7.5     | H
3            | 2.8     | M
5            | 5.6     | L
6            | 3.4     | H
Calculation of weights for individual attributes:
1.) Number of Common Friends:
The number of common friends is a measure of the degree of relationship between
two people. The PRoximity (PR) score is high if the number of common friends is high.
Using the percentage of common friends directly as the PR score is not recommended,
as it would almost always lead to a low score.
Rule:- For each of the two users, the Common Friends Ratio (CFR) is the number of
common friends as a percentage of that user's total number of friends. If the CFR is
15% or more, the PR score for that user is the maximum, 10. If the CFR is between
0 and 15 percent, the PR score is scaled linearly between 0 and 10. The minimum of
the PR scores of the two individuals is taken as the common PR score.
I.e.
if CFR >= 15 then PR score = 10
else PR score = CFR / 1.5
E.g. If user A has 300 friends, user B has 100 friends and they have 30 friends in
common, then
User A PR score is
CFR = 30 / 300 = 10 %, so PR score = 10 / 1.5 = 6.67
User B PR score is
CFR = 30 / 100 = 30 %, so PR score = 10 (as 30 % >= 15 %)
Thus the common PR score is 6.67 (the minimum PR score of both users).
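The stated rule (a 15% threshold, with lower ratios scaled as CFR / 1.5) can be sketched in a few lines. The function names `ratio_score` and `common_friends_score` are assumptions made for this illustration, and the threshold is kept as a parameter so that the same helper could serve other ratio-based attributes.

```python
def ratio_score(common: int, total: int, threshold_pct: float = 15.0) -> float:
    """Scale the share of common items to 0..10, saturating at threshold_pct."""
    if total <= 0:
        return 0.0  # no items at all: treat as no evidence of proximity
    ratio_pct = 100.0 * common / total
    if ratio_pct >= threshold_pct:
        return 10.0
    # With threshold_pct = 15 this equals ratio_pct / 1.5.
    return ratio_pct * 10.0 / threshold_pct


def common_friends_score(common: int, total_a: int, total_b: int) -> float:
    """Common PR score: the minimum of the two per-user scores."""
    return min(ratio_score(common, total_a), ratio_score(common, total_b))
```

For a hypothetical pair with 200 and 50 friends and 12 friends in common, the per-user ratios are 6% and 24%, giving per-user scores of 4 and 10 and a common PR score of 4.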
2.) Number of Common Groups / Communities they belong to:
This is another measure of the strength of the relation between two users. It is an
indication of the common interests they share. They could also have common friends made through
the communities they are involved in.
Rule:- For each of the two users, the Common Communities Ratio (CCR) is the number of
common communities as a percentage of that user's total number of communities. If the
CCR is 60% or more, the PR score for that user is the maximum, 10. If the CCR is between
0 and 60 percent, the PR score is scaled linearly between 0 and 10. The minimum of the
PR scores of the two individuals is taken as the common PR score.
I.e.
if CCR >= 60 then PR score = 10
else PR score = CCR / 6
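A short worked example may make the rule concrete. The numbers are hypothetical and chosen only for illustration: user A belongs to 20 communities, user B belongs to 10, and they share 6.

```latex
\begin{align*}
\mathrm{CCR}_A &= \frac{6}{20} \times 100 = 30\% < 60\%
  &&\Rightarrow\quad \mathrm{PR}_A = \frac{30}{6} = 5 \\
\mathrm{CCR}_B &= \frac{6}{10} \times 100 = 60\% \ge 60\%
  &&\Rightarrow\quad \mathrm{PR}_B = 10 \\
\mathrm{PR}_{\text{common}} &= \min(\mathrm{PR}_A, \mathrm{PR}_B) = 5
\end{align*}
```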
3.) Number of testimonials: (Orkut specific)
The number of testimonials written is a strong indicator of the friendship between two
people. A person writes a testimonial only if he knows the other person well. This is a
three-valued PR score.
Rule:- If both people have written testimonials for each other, then the common PR score is
10. If only one of them has written a testimonial for the other, then the common PR score is 5.
It is zero if neither has written a testimonial for the other.
This is a part of the collaborative recommendation system that will be used between two people
who are already friends.
4.) Number of scraps / wall writings:
This is a part of the collaborative recommendation system. It shows the amount of
interaction between two users who are already friends. If there is interaction between two users
who are not friends, this measure counts against the strength of the relationship, as the two users
have communicated but still have not cared to add each other as friends.
The communication between the users should be two way. If it is only one way, there is a
high chance that the communication is a form of spam. If this concept is implemented without
caution, genuine communication may be wrongly discarded as spam. This happens if one of the
users always clears his scrap book or deletes postings from his wall. The solution to this problem
is to predict whether this is happening from other usage parameters of the user profile.
5.) Number of Common Applications: (Facebook specific)
The number of common applications shared by two users is an indicator of their shared
interests.
Rule:- For each of the two users, the Common Applications Ratio (CAR) is the number
of common applications as a percentage of that user's total number of applications. If
the CAR is 40% or more, the PR score for that user is the maximum, 10. If the CAR is
between 0 and 40 percent, the PR score is scaled linearly between 0 and 10. The
minimum of the PR scores of the two individuals is taken as the common PR score.
I.e.
if CAR >= 40 then PR score = 10
else PR score = CAR / 4
6.) Common Networks Connections (CNR):
The number of networks through which the two users are connected can also be used to
calculate a PR score. However, this score should always be placed in the Low (L) importance
category when generating the dynamic weights used in the final score. A CNR of 50% or more
should receive the maximum score of 10.
7.) Other features:
Other features can also be used to indicate the strength of the relationship between users.
Orkut allows a user to sort his friends into categories ranging from best friends to people he does
not know. It also allows users to rate each other as a “fan”, “cool”, “lovable”, etc. All these
interactions between friends can be used in calculating the rank of friends.
All social networks allow users to add their hobbies, favourite dishes, favourite
television shows, etc. These can be modeled as attributes. The user can then set the importance of
such an attribute to high and that of the other attributes to low. This provides a ranking of friends
based on the attribute the user chooses. He can thus find the friends who like the same TV shows or
movies that he likes.
Calculating weights dynamically:
Every attribute that is selected to calculate the proximity score is placed in one of three
categories based on the desired effect of that attribute.
1.) Highly important (H): - These attributes have a relatively high weight and affect the
proximity score the most. Attributes such as mutual friends, degree of interaction, etc.
are generally included in this category.
2.) Medium (M): - These attributes have a lower weight than the attributes in the above
category. Thus they have a smaller impact on the final result.
3.) Low (L): - These attributes have the least weight when we calculate the weighted average.
The system initially provides a default grouping for the different attributes, but the user has the
flexibility to change the groups as desired. The user can also specify the difference between the weights
of the categories. X means that the weight assigned to category (H) is X% more than the weight
assigned to category (M). Similarly, Y means that the weight assigned to category (M) is Y% more than
the weight assigned to category (L). Let w1, w2 and w3 be the weights for categories H, M and L
respectively. With these definitions we can calculate the weights as follows.
The weights of all the attributes add up to 1. With H, M and L here denoting the number of attributes
in categories H, M and L respectively, we get
Equation 1:
\[ w_1 H + w_2 M + w_3 L = 1 \]
w1 is X% more than w2:
Equation 2:
\[ w_1 = \frac{(100 + X)\, w_2}{100} \quad\Longrightarrow\quad w_2 = \frac{100\, w_1}{100 + X} \]
Similarly, w2 is Y% more than w3:
Equation 3:
\[ w_3 = \frac{100\, w_2}{100 + Y} = \frac{10000\, w_1}{(100 + X)(100 + Y)} \]
Substituting Equation 2 and Equation 3 into Equation 1, we can solve for w1:
\[ w_1 = \frac{(100 + X)(100 + Y)}{(100 + X)(100 + Y)\, H + 100\,(100 + Y)\, M + 10000\, L} \]
Thus we can find the weights that are to be assigned to each attribute.
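The weight derivation can be checked numerically with a short sketch. It assumes that the weights summed over all attributes equal 1, that w1 is X% more than w2 and that w2 is Y% more than w3; reading H, M and L as the number of attributes in each category is an interpretation, and the function and parameter names are hypothetical.

```python
def category_weights(n_high: int, n_medium: int, n_low: int,
                     x_pct: float, y_pct: float) -> tuple[float, float, float]:
    """Return (w1, w2, w3), the per-attribute weights of categories H, M and L."""
    gx = 100.0 + x_pct
    gy = 100.0 + y_pct
    denominator = gx * gy * n_high + 100.0 * gy * n_medium + 10000.0 * n_low
    if denominator == 0:
        raise ValueError("at least one attribute is required")
    w1 = gx * gy / denominator   # solved from the sum-to-one constraint
    w2 = 100.0 * w1 / gx         # w1 is x_pct percent more than w2
    w3 = 100.0 * w2 / gy         # w2 is y_pct percent more than w3
    return w1, w2, w3


w1, w2, w3 = category_weights(1, 1, 1, x_pct=100, y_pct=100)
```

With one attribute per category and X = Y = 100, each weight is twice the next one, so the three weights come out as 4/7, 2/7 and 1/7.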
Computing the final Proximity Score:
The final proximity score is the weighted average of the PRscores of all the attributes taken into
consideration. The three weights corresponding to the three categories are calculated as above. Let the
PRscores of the attributes in category H be denoted H1, H2, ...; similarly, let M1, M2, ... denote those
of category M and L1, L2, ... those of category L.
The final proximity score is
\[ P = w_1 (H_1 + H_2 + \cdots) + w_2 (M_1 + M_2 + \cdots) + w_3 (L_1 + L_2 + \cdots) \]
The higher the proximity score between two individuals, the better friends they are.
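As a sketch of the final step, the function below applies one weight per category to the rows of the PRscore table. The rows are the sample values from that table; the weights 4/11, 2/11 and 1/11 are assumed illustrative values (they correspond to H being weighted twice M and M twice L, with two H attributes, one M and one L), and the function name is hypothetical.

```python
def final_proximity_score(rows, weights):
    """Weighted sum of per-attribute PRscores.

    rows    -- iterable of (pr_score, level) pairs, level in {"H", "M", "L"}
    weights -- mapping from level to the per-attribute weight of that category
    """
    return sum(weights[level] * score for score, level in rows)


# Sample rows from the PRscore table: attributes 1, 3, 5 and 6.
rows = [(7.5, "H"), (2.8, "M"), (5.6, "L"), (3.4, "H")]
# Assumed weights; over the four attributes they add up to 1.
weights = {"H": 4 / 11, "M": 2 / 11, "L": 1 / 11}
score = final_proximity_score(rows, weights)
```

Because the weights add up to 1 over all attributes, the result stays on the same 0 to 10 scale as the individual PRscores; here it is 54.8 / 11, roughly 4.98.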
Evaluation Model:
An evaluation method is needed for checking the accuracy of the PRscore. The naive approach is a
survey.
1.) A user (who has been on a social network for quite some time) is given a list of people from his
social networks and is asked to rank them in order, starting with his best friend.
2.) The system is then asked to calculate the PRscore of each of those friends with respect to the
said user.
3.) Finally the friends are sorted on the basis of the PRscore.
In this evaluation model only the first 40% of the friends in the user's ranking are taken into
consideration. The reason is that any user can accurately rank his good friends, whereas the remaining
friends, who are just acquaintances, cannot be ranked accurately. This inability on the part of the user
could distort the evaluation results of the system. Hence the remaining 60% of the friends are left out of
the evaluation.
The user's ranking of the top 40% of friends is compared with that of the system. A rank is
considered accurate if it differs from the user's rank by at most 8% of the number of friends considered
for evaluation. This is better explained with an example.
Suppose a user has 120 friends and he is asked to rank them. We then consider only the top 40%
of the friends, here 48. A deviation of 8% is considered valid; 8% of 48 is 3.84, which is rounded to 4.
So if the rank given by the system is within +/- 4 of the rank given by the user, it is considered a hit.
Otherwise it is considered a miss.
The main evaluation will be done using the precision metric. Precision is the number of hits per
100 friends considered. In the example above, if there were 30 friends whose rankings calculated by the
system differed by at most +/- 4 from the rankings given by the user, then the precision of the system
is (30 / 48) * 100 = 62.5%.
Similarly, recall is the percentage of friends that are present in both ranking lists, i.e. the one
provided by the user and the one calculated by the system. Thus if 40 of the 48 friends provided by the
user were also present in the list of the system, then the recall is (40 / 48) * 100 = 83.3%.
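The evaluation procedure can be sketched as below. Two details are assumptions of this sketch rather than statements from the text: the counts are rounded to the nearest integer (as in the 8% of 48 example), and the system's list used for recall is taken to be the top of the system ranking with the same length as the user's list. The function name `evaluate` is hypothetical.

```python
def evaluate(user_ranking, system_ranking, top_fraction=0.40, tolerance_fraction=0.08):
    """Return (precision, recall) in percent for one surveyed user.

    Both rankings are lists of friend identifiers, best friend first.
    """
    n_top = round(len(user_ranking) * top_fraction)
    if n_top == 0:
        return 0.0, 0.0
    tolerance = round(n_top * tolerance_fraction)
    system_position = {friend: i for i, friend in enumerate(system_ranking)}
    user_top = user_ranking[:n_top]
    system_top = set(system_ranking[:n_top])
    # A hit: the system's rank is within +/- tolerance of the user's rank.
    hits = sum(
        1 for i, friend in enumerate(user_top)
        if friend in system_position and abs(system_position[friend] - i) <= tolerance
    )
    # Overlap: friends appearing in both top lists.
    overlap = sum(1 for friend in user_top if friend in system_top)
    return 100.0 * hits / n_top, 100.0 * overlap / n_top


friends = [f"friend{i}" for i in range(120)]
shifted = friends[2:] + friends[:2]  # system ranks everyone two places off
precision, recall = evaluate(friends, shifted)
```

In this synthetic case 48 friends are considered with a tolerance of 4, and the two friends moved to the end of the system ranking are the only misses.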
Applications that can be created by using the PR score:
The PRscore can be used to create various applications. Some of them are listed below.
1.) Ranking for Person search:
The PRscore can be calculated between different friends within a social network. Those
friends can then be ranked in decreasing order of their PRscore. Various game
applications can be created using the concept of finding best friends.
2.) Friends Recommender system:
It is very often the case that a user knows a person in real life but has never
come across that person's profile in an online social community. A recommender
system can be built that recommends such a friend to the user. The user can thus find
profiles of people he knows without even looking for them.
3.) Inter network friend finder:
The same concept of friend finder can be used to search for friends across social
networks. A user may have a friend on one social network and may not have added that
person's profile as a friend on another social network. He can use the tool to search a
different network for the friends that he has on one network.
A user can already view the friends of a friend within one network. But using the
system, a search application can be created that helps a user to search for a friend
(on Orkut) of a friend (on Facebook). In this way the combined network of users can
grow much larger than it currently is on the separate social networks.
4.) Customizable friend search:
In the system, a user can change the weights of the attributes by classifying the attributes
into three categories, namely High, Medium and Low. By adjusting the importance of
these attributes the user can create his own customizable profile search. He can, for
example, search for all friends with common hobbies by adjusting the importance value
associated with the hobby attribute.
Future Extensions:
As we were working on the concept of PRscore, we came across the need to match profiles
belonging to the same user on different social networks, i.e. if I have two profiles on two different
social networks (or on the same social network), the system should be able to recognize that the two
profiles belong to the same individual.
Many interesting applications could be developed if we could provide this functionality,
including applications useful for various security agencies.
A crude method is to consider two profiles as belonging to the same user if they have a
proximity value greater than a certain threshold. Various additional parameters can also be taken
into account for this. Images can be hashed based upon their content and compared to the images of
the other profile. Other image processing techniques can also be used. All of these can be used as
features in the PRscore calculation algorithm.
Conclusions:
We have developed an algorithm for calculating the friendship value between two online users.
We have provided a framework for the integration of user information across various social networks, so
that user information can be accessed in a consistent manner across many networks. We have also
designed the system such that new social networks can be used with the current system simply by
writing one separate module that is specific to the social network being added.
References:
1.) Shuchuan Lo and Chingching Lin, “WMR – A graph based algorithm for Friends
Recommendation,” Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web
Intelligence.
2.) http://marshallk.com/lastfm-anotherrecommendation-algorithm-acquired
3.) Melville, et al., “Content-boosted collaborative filtering for improved recommendations,” In
Proceedings of the 18th National Conference on Artificial Intelligence, 2002, pp. 187-192.
4.) Basu, C., Hirsh, H., and Cohen, W., “Recommendation as Classification: Using Social and
Content-based Information in Recommendation,” In Proceeding of Recommender System
Workshop, 1998, pp. 11-15.
5.) Jorge Aranda et al., “An Online Social Network-based Recommendation System.”
6.) http://www.associatedcontent.com/article/679849/facebook_launches_new_people_you_may.htm