Group 23
Nikhil Almeida, Grégoire Cachet, Romain Rigaux

FriendYou
Technical report
CS8803 – GaTech Spring 2008 - FriendYou

The design and development of FriendYou is particularly rich and touches several domains: Ruby on Rails (Ruby, Active Record, Rake, GeoKit, YM4R, models, schema…), social networks (RFacebook API, Orkut, Open Social API) and the Web (HTML, JavaScript, JSON, templates, CSS, AJAX).

FriendYou was an ideal opportunity to gain experience with advanced and up-to-date technologies. It was also interesting that none of us had real knowledge of these technologies beforehand. Moreover, using Ruby / Ruby on Rails as the main language was a worthwhile experiment, even though it added many difficulties (the need for APIs supporting the language, learning it, a language that is still not mature…). Many new start-ups are using this language, and it is now a strong asset in our skill set.

FriendYou’s architecture is separated into two main, independent components:

• Data (i.e. back-end): takes care of getting the data from the social networks, cleaning it and storing it efficiently for easy retrieval and for computing the ranking of friends.
• Interface (i.e. front-end): displays an interface where queries about a location can be entered, then displays the map with the locations of friends and friends of friends. It provides more information, intuitive visualization and filtering.

Fig: Two main parts of the application

Here is a more detailed "big picture" of the architecture:

Fig: More detailed view of the two parts

Data

Data fetching

One API for each type of social network is built (e.g. one for Facebook, one for Open Social). This API retrieves the friends and their information from the social network. As we chose to use Ruby for this project, we needed Ruby APIs capable of accessing the social networks.
For each user, the list of his friends and their information (names, status, picture, addresses, networks, hobbies, list of applications used…) is retrieved.

Access to the social networks through APIs

• Facebook: we used RFacebook, a Ruby mapping of the official Facebook API.
• Orkut: Orkut follows the Open Social API (v0.7 at this time), but its implementation is not complete yet.

RFacebook

The website and the gem can be found here: http://rfacebook.rubyforge.org/

RFacebook proved to be invaluable for our application. Its documentation was clear, we did not encounter any bugs, and it is really convenient for accessing Facebook from Ruby. The Facebook documentation (http://developers.facebook.com/documentation.php) also proved to be rich and helpful.

The main problem encountered with Facebook is that friends of friends are not accessible through the API (online you can access some of them, depending on their privacy settings, but with the API you are blocked). The only solution is to save the friends of each user into the database. The drawbacks of this method are that the database grows over time, that the friends' information needs to be kept up to date, and that the friends of friends reachable through a given friend are available only if that friend has already used the application.

Orkut

Orkut does not have an official REST API yet. It will have one based on Open Social, but the implementation is in progress. Part of its JavaScript API is however available in the Orkut sandbox, where it is possible to create applications (a mix of XML and JavaScript) and add them to your profile.

However, the JavaScript API is not useful for us, as we need server-to-server communication. This will be implemented in the near future and will follow the RESTful API (http://code.google.com/apis/opensocial/docs/dataapis.html), so unfortunately we could not use it for the project.
The People Data API could have been particularly useful: http://code.google.com/apis/opensocial/docs/gdata/people/reference.html

One possible solution would have been to crawl all of Orkut and store the friend information of everyone in our database. This solution is inefficient and inflexible.

The goal of the next step is to clean and harmonize the data coming from the different social networks.

Data processing

For each friend, his information is hashed and, if the friend is already in the database, compared to the hash stored there. If the hashes do not match, the stored information is updated accordingly. Each friend of this friend is also saved in a dedicated table. This step is critical and needs to be fast, as thousands of friends are processed.

Fig: Processing the data coming from the API

Fig: Example of data that needs to be processed

For each social network, the data needs to be cleaned and some decisions have to be made. For example, the best possible address of each user needs to be detected. Taking Facebook as an example, we can imagine four sources in order of preference:

1. Current address
2. Hometown address
3. Network with a region
4. Location in the status message

Fig: Example of detection and processing of the data: addresses and geolocalization

In our case, each address is built from the available information (street and number are not available from the API, and some parts, like the zip code, are missing or wrong). We then try to convert it into longitude and latitude coordinates with Ruby GeoKit (using Google, Yahoo or GeocoderUS underneath). If the address is valid, it is kept.

Data storing

Each friend and its information are saved into a specific table. In our project we chose to use MySQL since it is fast, flexible and really mature. It is faster to fill up the database and compute the friend-of-friend relationships when the data is loaded.
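As an illustration of computing the friend-of-friend relationships at load time, here is a minimal sketch. It works on a plain in-memory hash instead of the MySQL tables, and the names friendships and friends_of_friends are hypothetical, not taken from the FriendYou code base.

```ruby
require 'set'

# friendships: hypothetical in-memory stand-in for the stored friend lists,
# mapping a user id to the array of ids of his direct friends.
def friends_of_friends(friendships, user_id)
  direct = Set.new(friendships.fetch(user_id, []))
  result = Set.new

  direct.each do |friend_id|
    # A friend who never used the application has no stored friend list,
    # so he contributes no friends of friends.
    friendships.fetch(friend_id, []).each do |candidate|
      next if candidate == user_id          # skip the user himself
      next if direct.include?(candidate)    # skip common (direct) friends
      result << candidate
    end
  end

  result
end

friendships = { 1 => [2, 3], 2 => [1, 3, 4], 3 => [1, 5], 4 => [2] }
puts friends_of_friends(friendships, 1).to_a.sort.inspect # prints [4, 5]
```

Storing the result once per user at load time means a search only has to filter rows instead of walking the friend lists again.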
As a result, the view only has to retrieve the data with filters, without computing the friends of friends during user requests. The alternative, loading all the friends of each user in memory, creates many duplicates and requires too much memory.

At the beginning, loading 200 friends took 5 minutes. After optimization it now takes about 30 seconds. The optimizations are:

• Compute the hash of the friend and compare it to his current hash if he is already in the database
• Set up a table of addresses, so that there are no duplicated addresses
• Serialize the list of friends for faster retrieval
• Update the whole friend database once at the beginning of the session, not several times

Fig: Improved data processing, loop optimization and tests

Fig: Saving the processed data into the database

Data retrieval

As the main goal of the project is to search friends of friends, we developed an API for expressing search requests. Different criteria are taken into account:

• Whether the person is a friend of the user
• Whether the person is a friend of a friend of the user
• The person's distance from the user
• Ranking criteria (as explained in the research part of the report): common friends, networks, applications, hobbies…

Fig: Extracting the information from the database

Here is a simple example in Ruby on Rails that retrieves your first 20 friends within a radius of 25 miles of the address loc (which has longitude and latitude coordinates):

    Address.find(:all,
                 :include => :friends,
                 :origin => loc,
                 :order => 'distance',
                 :within => 25,
                 :limit => 20,
                 :conditions => ["friends.id IN (?)", f.friend_ids])

This part is actually the most demanding one, since the number of candidates is high. Imagine that I have 200 friends (a typical number) and that each of my friends brings 100 friends of friends (after removing 100 common friends).
That gives 200 * 100 = 20 000 friends of friends.

Fig: Example of the exponential multiplication of friends of friends

The algorithm needs to select 20 to 40 friends out of these 20 000 and needs to be as accurate as possible.

The data then needs to be sent to the interface. Here we convert it to JSON before sending it. The fields are:

• "json_class": name of the object, e.g. "Friend"
• "uid": user id of the friend, e.g. 519893729
• "first_name": first name of the friend, e.g. "Bob"
• "last_name": last name of the friend, e.g. "Bryan"
• "address_name": name of the friend's address, e.g. "Atlanta, GA, USA"
• "lat": latitude of the address, e.g. 33.7545
• "lng": longitude of the address, e.g. -84.3897
• "fof_level": whether the person is a friend (0) or a friend of a friend (1)
• "precision": degree of precision of the address, e.g. "city", "country"
• "picture": address of the picture of the user, e.g. "http:\/\/profile.ak.facebook.com\/93729_3943.jpg"
• "distance": distance from the search point in miles, e.g. "1.00"

Example of JSON sent:

    {"json_class":"SearchQuery","friends":
      [{"address_name":"Atlanta, GA, USA","json_class":"Friend",
        "lng":-84.3897,"uid":12820237,"fof_level":0,"precision":"city",
        "picture":"http:\/\/profile.ak.facebook.com\/237_5328.jpg",
        "first_name":"Kelly","lat":33.7545,
        "last_name":"Carlson","distance":"0.00"},
       {"address_name":"Atlanta, GA, USA","json_class":"Friend",
        "lng":-84.3897,"uid":519893729,"fof_level":1,"precision":"city",
        "picture":"http:\/\/profile.ak.facebook.com\/93729_3943.jpg",
        "first_name":"Bob","lat":33.7545,
        "last_name":"Bryan","distance":"0.00"}]}

Ranking

It is important to rank friends, as the number of friends of a user is large and the screen space available for displaying them is comparatively very small. As explained above, the algorithm has to display only 20 to 40 friends out of a list of around 20 000. Hence there is a need for an accurate ranking algorithm that selects the top 40 friends who are closest to the user.
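The selection step itself can be sketched as follows, assuming every candidate already carries a numeric score. The Candidate structure, its score field and the top_candidates helper are hypothetical names used only for this illustration; how the score is obtained is a separate question.

```ruby
# Hypothetical candidate record: a user id and an already computed score.
Candidate = Struct.new(:uid, :score)

# Return the `limit` highest-scoring candidates, best first.
# Enumerable#max_by(n) avoids sorting the whole list of candidates.
def top_candidates(candidates, limit)
  candidates.max_by(limit) { |candidate| candidate.score }
end

candidates = [
  Candidate.new(1, 2.0),
  Candidate.new(2, 9.5),
  Candidate.new(3, 7.0),
  Candidate.new(4, 0.5)
]
puts top_candidates(candidates, 2).map(&:uid).inspect # prints [2, 3]
```

With around 20 000 candidates and a limit of 40, the same call applies unchanged.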
To measure this closeness we have developed a measure called PRscore. The PRscore measures the proximity between two friends. It is calculated from rules decided in advance for each attribute. Features such as the number of mutual friends or the number of scraps can be considered as attributes. Different social networks therefore contribute a different number of attributes to the calculation of the PRscore. Each attribute can be given a different priority or importance level, which determines how much effect it has on the final PRscore. Attributes from different social networks are also predefined and grouped into mutually compatible classes.

Calculation of weights for individual attributes

Number of Common Friends

The number of common friends is a measure of the degree of relationship between two people. The PRoximity (PR) score is high if the number of common friends is high. Using the percentage of common friends directly in the PR score computation is not recommended, as it would always lead to a low score.

Rule: the Common Friends Ratio (CFR) of a user is the number of common friends as a percentage of his total number of friends. If the CFR is at least 15%, the PR score of that user is maximal (10). If the CFR is between 0 and 15 percent, the PR score is scaled between 0 and 10. The minimum of the PR scores of the two individuals is taken as the common PR score.

    if CFR >= 15 then PR score = 10
    else PR score = CFR / 1.5

E.g. if user A has 300 friends, user B has 100 friends and they have 30 friends in common, then:

    User A: CFR = 30 / 300 = 10%, so PR score = 10 / 1.5 ≈ 6.7
    User B: CFR = 30 / 100 = 30%, so PR score = 10 (as 30% >= 15%)

Thus the common PR score is about 6.7.
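The rule for common friends can be written down as a short sketch. It follows the stated rule (maximum score of 10 from a CFR of 15% upwards, CFR / 1.5 below that); the method names and the handling of a user with no friends at all are assumptions made for this illustration.

```ruby
MAX_PR_SCORE  = 10.0
CFR_THRESHOLD = 15.0 # percent, as stated in the rule

# Common Friends Ratio: common friends as a percentage of one user's friends.
# Returning 0 for a user without friends is an assumption of this sketch.
def common_friends_ratio(common, total_friends)
  return 0.0 if total_friends.zero?
  common * 100.0 / total_friends
end

# PR score of one user for the "common friends" attribute.
def pr_score(common, total_friends)
  cfr = common_friends_ratio(common, total_friends)
  return MAX_PR_SCORE if cfr >= CFR_THRESHOLD
  cfr / (CFR_THRESHOLD / MAX_PR_SCORE) # same as CFR / 1.5
end

# The common PR score is the minimum of the two individual scores.
def common_pr_score(friends_a, friends_b, common)
  [pr_score(common, friends_a), pr_score(common, friends_b)].min
end

# A has 200 friends (CFR 6%, score 4.0), B has 50 friends (CFR 24%, score 10.0).
puts common_pr_score(200, 50, 12) # prints 4.0
```

Taking the minimum keeps the score low when only one of the two users has a small circle of friends, which would otherwise inflate the ratio.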
The common PR score is always the minimum of the PR scores of both users.

Similarly, the PRscore can be calculated for other attributes such as:

• Number of Common Groups / Communities they belong to
• Number of testimonials (Orkut specific)
• Number of scraps / wall writings
• Number of Common Applications (Facebook specific)
• Common Networks Connections (CNR)

Interface

The interface is similar to a Google or Yahoo! search, except that it looks for friends and friends of friends.

Getting data

The user specifies the location where he would like to find his friends and friends of friends. As explained in the "Data retrieval" section, all the information is received in JSON. The JSON is converted to JavaScript objects.

Displaying data

The JavaScript objects just retrieved are turned into:

• Markers (Google Marker) on the map
• Items in the list of friends on the right

Fig: From pure data to visual information

As a result, the map is centered and zoomed according to the precision of the request (e.g. street, city, zip, state, country). The markers (one for each friend) are put in an overlay on the map and the list of friends is populated.

Navigation

The user can then navigate on the map (zoom in/out, pan) and click on friends (on the map or in the list). A small pop-up displays more information about the friend and his relationship with the user.

Fig: Legend of the map

This part uses a lot of JavaScript in order to provide visual effects without having to reload the page.

The JavaScript API and the map exchange information during each search in order to update the map and the list. We use XMLHttpRequest calls to query the server, which returns data in JSON as seen above. Listeners on the map and on the list wait for events before updating the view.
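On the server side, the JSON response could be assembled roughly as in the sketch below. The key names are the ones listed in the "Data retrieval" section; the FriendResult structure and the two helper methods are hypothetical and use Ruby's standard json library rather than the actual FriendYou models.

```ruby
require 'json'

# Hypothetical in-memory record holding one search result.
FriendResult = Struct.new(:uid, :first_name, :last_name, :address_name,
                          :lat, :lng, :fof_level, :precision, :picture, :distance)

# Map one result to the keys expected by the interface.
def friend_to_hash(friend)
  {
    "json_class"   => "Friend",
    "uid"          => friend.uid,
    "first_name"   => friend.first_name,
    "last_name"    => friend.last_name,
    "address_name" => friend.address_name,
    "lat"          => friend.lat,
    "lng"          => friend.lng,
    "fof_level"    => friend.fof_level,   # 0 = friend, 1 = friend of a friend
    "precision"    => friend.precision,
    "picture"      => friend.picture,
    "distance"     => format("%.2f", friend.distance) # miles, as a string
  }
end

# Wrap the list of results in the "SearchQuery" envelope and serialize it.
def search_response(friends)
  payload = {
    "json_class" => "SearchQuery",
    "friends"    => friends.map { |friend| friend_to_hash(friend) }
  }
  JSON.generate(payload)
end

bob = FriendResult.new(519893729, "Bob", "Bryan", "Atlanta, GA, USA",
                       33.7545, -84.3897, 1, "city",
                       "http://profile.ak.facebook.com/93729_3943.jpg", 0.0)
puts search_response([bob])
```

The interface then only has to parse this string and walk the "friends" array to create its markers and list items.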
Fig: JavaScript/Ruby API used to render a dynamic interface

Fig: View of the friends and friends of friends in Atlanta

The list provides a convenient way to visualize the friends within a certain distance. It is sorted by distance and by friend relationship. A mouseover event on the picture of a friend highlights where he/she is on the map. Dynamic filtering is available in order to display only friends or only friends of friends. The map is also updated live.

Fig: Full example for Romain in France

Conclusion

This project was really intense and dealt with technologies that were new to us (Ruby on Rails, Gmaps, the Facebook API in JavaScript and in Ruby, geolocalization, databases, Active Record…). It was a mine of new knowledge for us. Its research background was also substantial, since the complexity of ranking friends of friends is high. We implemented several optimizations (an address table so that only one geo-lookup is done per address, and a hash of each user's content in order to detect quickly whether changes have been made).

To sum up, it was a concentrated use of the latest Internet techniques, which suits this class on advanced Internet applications very well. We are ready to apply to Internet start-ups now.

Future versions could integrate a more complex implementation of the friend ranking system, more search choices (hobbies, age…) and real data from the social networks that follow the Open Social API (whose implementation should be finished soon). Moreover, since the application works and has a useful goal, we plan to release it online and see it gain importance: the more users it has, the more friends of friends can be found!