PRscore: A Measure of Proximity Between Internet Users

Nikhil Almeida, Grégoire Cachet, Romain Rigaux
Georgia Institute of Technology, Atlanta GA

Abstract:
Online social communities emerged with the start of Web 2.0. Most social networks now have millions of users and provide a way of finding a path between two users. With so many users, it is difficult for a user to keep track of friends unless the system provides the necessary functionality, and it is even more difficult to search for new friends. The system therefore needs to be able to find people that a user may know or want to know. PRscore is a measure of the strength of the friendship that may exist between two Internet users. Many useful applications can be built on this value.

Keywords: PRscore, friend recommender, social networks

Introduction:
This work started as our class project for the Advanced Internet Application class in Spring 2008. The main aim of the project was to display a user's friends in a particular location on a map of that location, which makes it easier for the user to see where in the city his friends are. The accuracy of the display depends on the details exposed by the social network and on the details provided by each individual. A user is likely to be connected to a large number of friends and friends of friends, which makes it a problem to display them all on one page. It is therefore necessary to rank the friends, so that we can display a subset of the results and the user can move on to additional results only if needed. We designed a system that ranks friends by how close they are to the user, where "close" means a good friend: the user's best friends are the closest to him and are ranked the highest. The resulting measure is the Proximity score (PRscore).
The user's best friends will have a high proximity score, while a low proximity score indicates that a person is only an acquaintance. To calculate the PRscore, we treat the different features of a social network as attributes, each of which has its own PRscore, calculated by rules decided in advance for that attribute. Features such as the number of mutual friends or the number of scraps can serve as attributes. Different social networks thus contribute different numbers of attributes to the calculation of the PRscore. Each attribute can be given a priority, or importance level, that determines how much effect it has on the final PRscore. Attributes from different social networks are also predefined and classified into groups that are mutually compatible with each other.

To make the system powerful, we designed it to be compatible with different social networks, so a user can search for his friends from several social networks in one application and rank them together. For example, the system can report that a friend in one social network has a greater PRscore than a friend in another. This is made possible by a common database that contains the attributes of the different social networks and their compatibility lists.

Literature Review:
Several social networks already have their own recommender systems. These systems are built on features that are specific to their social network and hard-coded into the system, so there is no scope for adding or removing the features on which recommendations are based.

1.) Last.fm: This is a social networking website where people can listen to and share music. Friends are people who have similar tastes in music. This feature is specific to Last.fm but can be adapted to suit our requirements. For example:
We can use this in other social networks and consider people with similar hobbies, tastes in books, etc. as friends or potential friends. Here the recommender searches through the history of the user's browsing profile to create an approximate signature for the user. People who have similar profiles will have similar signatures and are candidates for potential friends.

2.) LinkedIn: This social network links every profile to one or more networks belonging to a city, country, institution, etc. The recommender system selects people belonging either to your own networks or to some of your friends' networks, and then ranks them by the number of friends they have in common with the user.

3.) Friend Suggester (Facebook): On this well-known social network the feature is known as 'Friends You May Know'. It is offered only to people who are new to the network: a user can suggest to a friend who is new to Facebook a list of people he thinks the friend might know. This is not really a recommender system; it only propagates a list of profiles from one user to another.

Related work in this area was done by Shuchuan Lo and Chingching Lin in their paper "WMR – A graph based algorithm for Friends Recommendation". They model the general scenario of a social network as a web forum and use the amount of interaction between two people as the degree of friendship between them. Two people who have no interaction cannot be friends, and if there is only one-way communication between two people, it could be a case of harassment or spam. The authors build a large interaction graph of the communication between users, use a minimization strategy to reduce the size of the graph, and predict from the reduced graph the people who could be friends of a user. This can be done in cases where the friend relationships between users are known.
The drawback of this approach is that the interaction graph can be very large, and complex algorithms are required to process it.

System Architecture:
We have tried to make the system architecture as general as possible, so that the methods used to calculate the proximity score are homogeneous and only a social-network-specific crawler has to be written when a new social network is added to the application. Most social networks have released APIs so that one can write applications that use their data and add value to it. The system is designed so that the application developer only has to write a crawler for a new social network and store the information in the required format; the application then automatically uses that information to calculate the friend proximity score. Having the crawler as a separate module, with a temporary table as a layer between the social network and the ranking algorithm, has several advantages.

1.) It provides modularity and allows a user to add more social networks to the application just by writing an appropriate crawler.
2.) It allows the database to be populated at runtime: when the user logs in to the application, he can provide the login information for the different social networks he uses. He can rank friends from only the networks he chooses and thus keep his profiles separate if he wishes.

[Figure: system architecture. Components shown: Facebook database, Orkut database, LinkedIn database; RFacebook API, OpenSocial API; Facebook Info Getter, Orkut Info Getter, LinkedIn Info Getter; Attribute Database and Attribute Rule Database; Proximity Calculation Algorithm; Application.]

Database Design:
This database is the middle layer of the system architecture. When a user logs in to the system, each network-specific crawler gathers the information for its attributes and stores it in this database.
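The crawler layer described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the names `AttributeRecord`, `Crawler`, `CannedCrawler` and `populate_store`, the attribute names, and the in-memory list standing in for the middle-layer database are all hypothetical, and the stand-in crawler reads canned data instead of calling a real social network API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AttributeRecord:
    """One raw attribute value for a (user, friend) pair (hypothetical format)."""
    network: str     # social network the value came from
    attribute: str   # attribute name, e.g. "common_friends" (assumed name)
    user: str
    friend: str
    value: float


class Crawler(ABC):
    """Interface that each network-specific crawler would implement."""
    network: str

    @abstractmethod
    def fetch(self, user: str) -> List[AttributeRecord]:
        """Return the attribute values gathered for one logged-in user."""


class CannedCrawler(Crawler):
    """Stand-in crawler that reads canned data instead of a real API."""

    def __init__(self, network: str, canned: Dict[str, Dict[str, Dict[str, float]]]):
        self.network = network
        self._canned = canned  # {user: {friend: {attribute: value}}}

    def fetch(self, user: str) -> List[AttributeRecord]:
        return [
            AttributeRecord(self.network, attribute, user, friend, value)
            for friend, attributes in self._canned.get(user, {}).items()
            for attribute, value in attributes.items()
        ]


def populate_store(crawlers: List[Crawler], user: str) -> List[AttributeRecord]:
    """Run every crawler for the user and collect the records in one store."""
    store: List[AttributeRecord] = []
    for crawler in crawlers:
        store.extend(crawler.fetch(user))
    return store


if __name__ == "__main__":
    data = {"alice": {"bob": {"common_friends": 12.0, "common_groups": 3.0}}}
    crawlers = [CannedCrawler("facebook", data), CannedCrawler("orkut", {})]
    for record in populate_store(crawlers, "alice"):
        print(record)
```

In this sketch, supporting a new social network only means adding another `Crawler` subclass; `populate_store` and anything that reads the store stay unchanged, which is the modularity argument made above.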
Attribute Table:
This table records the attributes that the system uses for calculating the PRscore. For each attribute it records the social network the attribute comes from, as well as the attributes from other social networks with which it is compatible. It has the following fields:
• AttributeID: An ID field that is the primary key of the table and uniquely identifies an attribute.
• Title: The name of the attribute.
• Descr.: The description of the attribute.
• Compatibility Lists: A multi-valued attribute that records the other attributes that are compatible with this one.
• Social Network: The social network the attribute comes from.

| Attribute ID | Title | Description | Compatibility Lists | Social Network |
| 1 | Common Friends | Lists the number of common friends between two friends. | 5, 8 | Facebook |
| 2 | Common Groups | Lists the number of common groups subscribed to by the two. | 4 | Facebook |
| 3 | Common Network | Lists the common networks the two users are on. | | Facebook |
| 4 | Common Applications | Lists the common applications used by the two users. | | Facebook |
| 5 | Mutual Friends | Lists the number of common friends between two friends. | 1, 8 | Orkut |
| 6 | Common Communities | Lists the number of common groups subscribed to by the two. | 2 | Orkut |
| 7 | Testimonials | Lists whether the users have written testimonials to each other. | | Orkut |
| 8 | Colleagues | Lists the number of common friends between two friends. | 1, 5 | LinkedIn |

PRScore table:
This table is populated by the crawler at runtime for each user. For every user, the crawler fetches his information and passes it to the respective module, which computes the PRscore of each attribute for that user. The result is stored in this table so that the system can compute the final PRscore from it. It contains the following attributes:
• Attribute ID: The attribute under consideration. It is a foreign key referencing the attribute table above.
• PRscore: The PRscore for the respective attribute.
• Level: One of three values (High: H, Medium: M, Low: L) indicating the level of importance of the attribute when calculating the dynamic weights.

| Attribute ID | PRscore | Level |
| 1 | 7.5 | H |
| 3 | 2.8 | M |
| 5 | 5.6 | L |
| 6 | 3.4 | H |

Calculation of weights for individual attributes:
1.) Number of Common Friends: The number of common friends is a measure of the degree of relationship between two people. The PRoximity (PR) score is high if the number of common friends is high. Using the percentage of common friends directly in the PR score computation is not recommended, because the raw percentage is usually small and would always lead to a low score.
Rule: Each user's Common Friends Ratio (CFR) is the number of common friends as a percentage of that user's total number of friends. If the CFR is 15% or more, the PR score for that user is the maximum, 10. If the CFR is between 0 and 15 percent, the PR score is scaled between 0 and 10, i.e.
if CFR >= 15 then PR score = 10 else PR score = CFR / 1.5
The minimum of the PR scores of the two individuals is taken as the common PR score.
E.g. if user A has 300 friends, user B has 100 friends, and they have 30 friends in common, then
User A: CFR = 30 / 300 = 10%, so PR score = 10 / 1.5 ≈ 6.7
User B: CFR = 30 / 100 = 30%, so PR score = 10 (as 30% >= 15%)
Thus the common PR score is 6.7 (the minimum PR score of the two users).

2.) Number of Common Groups / Communities: This is another measure of the strength of the relationship between two users. It indicates the common interests they share. They may also have made common friends through the communities they are involved in.
Rule: If the number of common communities is 60% or more of the total number of communities (Common Communities Ratio, CCR), then the PR score is the maximum for the two people. The minimum of the PR scores of the two individuals is taken as the common PR score.
If the ratio of common communities to the total number of communities is between 0 and 60 percent, the PR score is scaled between 0 and 10, i.e.
if CCR >= 60 then PR score = 10 else PR score = CCR / 6

3.) Number of testimonials (Orkut specific): The number of testimonials written is a strong indicator of the friendship between two people, since a person writes a testimonial only if he knows the other person well. This is a three-valued PR score.
Rule: If both people have written testimonials for each other, the common PR score is 10. If only one of them has written a testimonial for the other, the common PR score is 5. It is zero if neither has written a testimonial for the other. This is part of the collaborative recommendation system that is used between two people who are already friends.

4.) Number of scraps / wall writings: This is also part of the collaborative recommendation system. It shows the amount of interaction between two users who are already friends. If there is interaction between two users who are not friends, this measure counts against the strength of the relationship, as the two users have communicated but have not cared to add each other as friends. The communication between the users should be two-way. If it is one-way, there is a high chance that it is a form of spam. If this concept is implemented without caution, genuine communication may be discarded as spam. This happens if one of the users always clears his scrapbook or deletes postings from his wall. The solution is to predict whether this is happening from other usage parameters of the user's profile.

5.) Number of Common Applications (Facebook specific): The number of common applications shared by two users is an indicator of their shared interests.
Rule: If the number of common applications is 40% or more of the total number of applications (Common Applications Ratio, CAR), then the PR score is the maximum for the two people. The minimum of the PR scores of the two individuals is taken as the common PR score. If the ratio of common applications to the total number of applications is between 0 and 40 percent, the PR score is scaled between 0 and 10, i.e.
if CAR >= 40 then PR score = 10 else PR score = CAR / 4

6.) Common Network Connections (CNR): The number of networks through which the two users are connected can also be used to calculate a PR score. However, this score should always be placed in the Low (L) importance category when generating the dynamic weights used in the final score. A CNR of 50% or more should receive the maximum score of 10.

7.) Other features: Other features can also be used to indicate the strength of the relationship between users. Orkut allows a user to sort his friends into categories ranging from best friends to people he does not know. It also allows users to rate each other as a "fan", "cool", "lovable", etc. All these interactions between friends can be used in calculating the rank of friends. All social networks let users add their hobbies, favourite dishes, favourite television shows, etc. These can be modeled as attributes. The user can then set the level of one such attribute to high and the other attributes to low, which produces a ranking of friends based on the attribute the user chooses. He can thus find the friends who like the same TV shows or movies that he likes.

Calculating weights dynamically:
Every attribute that is selected to calculate the proximity score is placed in one of three categories based on the desired effect of that attribute.
1.) Highly important (H): These attributes have a relatively high weight and affect the proximity score the most. Attributes such as mutual friends and degree of interaction are generally included in this category.
2.) Medium (M): These attributes have a lower weight than the attributes in the category above, and thus a smaller impact on the final result.
3.) Low (L): These attributes have the least weight when we calculate the weighted average.

The system initially provides a default grouping for the different attributes, but the user has the flexibility to change the groups as desired. The user can also specify the difference between the weights of the categories: x means that category H has a weight at least x% more than the weight assigned to category M, and y means that category M has a weight at least y% more than the weight assigned to category L. Let w1, w2 and w3 be the weights for categories H, M and L respectively, and let H, M and L in the equations below denote the number of attributes in each category. The weights of all the attributes add up to 1, hence

Equation 1:
$$w_1 H + w_2 M + w_3 L = 1$$

Taking w1 to be x% more than w2:

Equation 2:
$$w_1 = \left(1 + \frac{x}{100}\right) w_2 \quad\Longrightarrow\quad w_2 = \frac{w_1}{1 + \frac{x}{100}}$$

Similarly, w2 is y% more than w3:

Equation 3:
$$w_3 = \frac{w_2}{1 + \frac{y}{100}} = \frac{w_1}{\left(1 + \frac{x}{100}\right)\left(1 + \frac{y}{100}\right)}$$

Substituting Equation 2 and Equation 3 into Equation 1, we find w1 as

$$w_1 = \frac{(100 + x)(100 + y)}{(100 + x)(100 + y)\,H + 100\,(100 + y)\,M + 10000\,L}$$

w2 and w3 then follow from Equations 2 and 3. Thus we can find the weights that are to be assigned to each attribute.

Computing the final Proximity Score:
The final proximity score is the weighted average of all the attributes taken into consideration. The three weights corresponding to the three categories are calculated as above. Let H1, H2, ... denote the PRscores of the attributes in category H. Similarly, M1, M2, ... denote those of the attributes in category M, and L1, L2, ... those of the attributes in category L. The final proximity score is

$$P = w_1 (H_1 + H_2 + \cdots) + w_2 (M_1 + M_2 + \cdots) + w_3 (L_1 + L_2 + \cdots)$$

The higher the proximity score between two individuals, the better friends they are.

Evaluation Model:
An evaluation system is needed to check the accuracy of the PRscore.
The naive method for this is a survey.
1.) A user (who has been on a social network for quite some time) is given a list of people from his social networks and is asked to rank them in order, starting with his best friend.
2.) The system is then asked to calculate the PRscore for each of those friends with respect to that user.
3.) The friends are then sorted by PRscore.

In this evaluation model only the top 40% of the friends are taken into consideration. The reason is that a user can rank his good friends accurately, but cannot accurately rank the remaining friends, who are just acquaintances. This inability on the part of the user could distort the overall evaluation results of the system. Hence the remaining 60% of the friends are left out of the evaluation. The user's ranking of the top 40% of friends is compared with the system's ranking. A friend's rank is considered accurate if the system's rank differs from the user's rank by at most 8% of the number of friends considered for evaluation.

This is best explained with an example. Suppose a user has 120 friends and is asked to rank them. We consider only the top 40% of the friends, which is 48. A deviation of 8% is considered valid, and 8% of 48 is 3.84, which we round to 4. So if the system's rank for a friend is within +/- 4 of the user's rank, it is considered a hit; otherwise it is considered a miss.

The main evaluation metric is precision, the number of hits per 100 friends evaluated. In the example above, if the system's rankings for 30 friends are within +/- 4 of the rankings given by the user, then the precision of the system is (30/48)*100 = 62.5%. Similarly, recall is the percentage of friends present in both ranking lists, i.e. the one provided by the user and the one calculated by the system.
Thus, if 40 of the 48 friends provided by the user are also present in the system's list, the recall is (40/48)*100 = 83.3%.

Applications that can be created by using the PR score:
The PRscore can be used to create various applications. Some of them are listed below.

1.) Ranking for person search: The PRscore can be calculated between different friends within a social network, and those friends can then be ranked in decreasing order of PRscore. Various game applications can be created around the concept of finding best friends.

2.) Friends recommender system: It is often the case that a user knows a person in real life but has never come across that person's profile in an online social community. A recommender system can be built that recommends such a friend to the user, so the user can find profiles of people he knows without even looking for them.

3.) Inter-network friend finder: The same concept can be used to search for friends across social networks. A user may have a friend on one social network without having added that person's profile as a friend on another social network. He can use the tool to search a different network for the friends he has on one network. A user can already view friends of friends within a single network. Using the system, however, a search application can be created that helps a user search for a friend (on Orkut) of a friend (on Facebook). The user's entire network can thus grow much larger than it currently is on separate social networks.

4.) Customizable friend search: In the system, a user can change the weights of the attributes by classifying them into three categories, namely High, Medium and Low. By adjusting the importance of these attributes, the user can create his own customized profile search.
He can thus search for all friends with common hobbies by adjusting the importance level associated with the hobby attribute.

Future Extensions:
While working on the concept of PRscore, we came across the need to match profiles belonging to one user on different social networks, i.e. if I have two profiles on two different social networks (or on the same social network), the system should be able to recognize that the two profiles belong to the same individual. Many interesting applications could be developed with this functionality, including applications useful to various security agencies. A crude method is to consider two profiles as belonging to the same user if they have a proximity value greater than a certain threshold. Various additional parameters can also be taken into account for this. Images can be hashed based on their content and compared with the images of the other profile. Other image processing techniques can also be used. All of these can be used as features in the PRscore calculation algorithm.

Conclusions:
We have developed an algorithm for calculating the friendship value between two online users. We have provided a framework for integrating user information across various social networks, so that user information can be accessed in a consistent manner across many networks. We have also designed the system so that new social networks can be used with the current system just by writing one separate module that is specific to the social network being added.

References:
1.) Shuchuan Lo and Chingching Lin, "WMR – A graph based algorithm for Friends Recommendation," Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence.
2.) http://marshallk.com/lastfm-anotherrecommendation-algorithm-acquired
3.) Melville, et al., "Content-boosted collaborative filtering for improved recommendations," In Proceedings of the 18th National Conference on Artificial Intelligence, 2002, pp.
187-192.
4.) Basu, C., Hirsh, H., and Cohen, W., "Recommendation as Classification: Using Social and Content-based Information in Recommendation," In Proceedings of Recommender System Workshop, 1998, pp. 11-15.
5.) Jorge Aranda, et al., "An Online Social Network-based Recommendation System."
6.) http://www.associatedcontent.com/article/679849/facebook_launches_new_people_you_may.htm