FriendYou Technical report

Group 23
Nikhil Almeida
Grégoire Cachet
Romain Rigaux
The design and development of FriendYou is particularly rich and touches
several domains: Ruby on Rails (Ruby, Active Record, Rake, GeoKit, YM4R,
models, schema…), social networks (RFacebook API, Orkut, Open Social API)
and the Web (HTML, JavaScript, JSON, templates, CSS, AJAX).
FriendYou was an ideal opportunity to gain experience with advanced,
up-to-date technologies. It was all the more interesting because none of us
had real knowledge of these technologies beforehand.
Moreover, using Ruby and Ruby on Rails as the main language was a worthwhile
experiment, even though it added many difficulties (the need for APIs that
support the language, learning it, a language that is still not mature…).
Many new start-ups are using Ruby, and it is now a strong asset in our skill
set.
FriendYou’s architecture is separated into two main, independent
components:
• Data (i.e. back-end): takes care of getting the data from the social
networks, cleaning it and storing it efficiently for easy retrieval and
computation of the ranking of friends.
• Interface (i.e. front-end): displays an interface where queries about a
location can be entered, displays the map with the locations of friends and
friends of friends, and provides more information, intuitive visualization
and filtering.
CS8803 – GaTech Spring 2008 - FriendYou
Two main parts of the application
Here is a more detailed "big picture" of the architecture:
More detailed view of the two parts
Data
Data fetching
One API is built for each type of social network (e.g. one for Facebook, one
for Open Social). This API retrieves the friends and gets their information
from the social network. As we chose to use Ruby for this project, we needed
Ruby APIs capable of accessing the social networks.
For each user, the list of his friends and their information (names, status,
picture, addresses, networks, hobbies, list of applications used…) is
retrieved.
Access to the social networks through the APIs
• Facebook: we used RFacebook, a Ruby mapping of the official
Facebook API.
• Orkut: Orkut follows the Open Social API (v0.7 at this time), but its
implementation is not complete yet.
RFacebook
The website and the gem can be found here: http://rfacebook.rubyforge.org/
RFacebook proved to be invaluable for our application. Its documentation was
clear, we did not encounter any bugs and it is really convenient for
accessing Facebook from Ruby.
The Facebook documentation
(http://developers.facebook.com/documentation.php) also proved to be rich
and helpful.
The main problem encountered with Facebook is that friends of friends are
not accessible through the API (online you can access some of them, depending
on their privacy settings, but with the API you are blocked). The only
solution is to save the friends of each user into the database. The drawbacks
of this method are that the database grows over time, the friends'
information needs to be kept up to date, and you can access the friends of a
friend only if he has already used the application.
Orkut
Orkut does not have an official REST API yet. It will have one based on Open
Social, but the implementation is in progress.
Part of its JavaScript API is, however, available in the Orkut sandbox.
It is possible to create applications (a mix of XML and JavaScript) and add
them to your profile in the sandbox.
However, the JavaScript API is not useful for us, as we need
server-to-server communication. This will be implemented in the near future,
following the RESTful API
(http://code.google.com/apis/opensocial/docs/dataapis.html), so
unfortunately we could not use it for the project.
The People Data API could have been particularly useful:
http://code.google.com/apis/opensocial/docs/gdata/people/reference.html
One possible solution would have been to crawl all of Orkut and store the
friend information of everyone in our database. This solution is inefficient
and inflexible.
The goal of the next step is to clean and harmonize the data coming from the
different social networks.
Data processing
For each friend, the information is hashed and, if the friend is already in
the database, compared with the hash stored there. If the hashes do not
match, the stored information is updated accordingly. The friends of this
friend are also saved in a dedicated table. This step is crucial and needs to
be fast, as thousands of friends are processed.
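The hash comparison described above can be sketched as follows. This is only a minimal illustration: the in-memory `store`, the method names and the choice of SHA-1 are assumptions made for the example, not the project's actual schema or code.

```ruby
require 'digest/sha1'

# Build a stable fingerprint of a friend's profile fields.
# Keys are sorted so that field order does not change the hash.
def profile_fingerprint(profile)
  canonical = profile.sort_by { |key, _| key.to_s }
                     .map { |key, value| "#{key}=#{value}" }
                     .join("\n")
  Digest::SHA1.hexdigest(canonical)
end

# Compare the fresh fingerprint with the stored one (hypothetical store:
# uid => fingerprint) and report what had to be done.
# Returns :created, :updated or :unchanged.
def sync_friend(uid, profile, store)
  fingerprint = profile_fingerprint(profile)
  previous = store[uid]
  return :unchanged if previous == fingerprint

  store[uid] = fingerprint
  previous.nil? ? :created : :updated
end
```

Only friends whose fingerprint changed need a database write, which is what keeps this step fast when thousands of friends are processed.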
Processing the data coming from the API
Example of data that needs to be processed
For each social network, the data needs to be cleaned and some decisions
have to be made. For example, the best possible address of each user needs
to be detected. Taking Facebook as an example, we can imagine four sources,
in decreasing order of preference:
• Current address
• Hometown address
• Network with a region
• Location in the status message
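The order of preference above could be expressed as a simple fallback chain. The sketch below assumes a profile represented as a Ruby hash; the key names are hypothetical and do not come from the Facebook API.

```ruby
# Candidate address sources, in decreasing order of preference.
# The key names are illustrative assumptions.
ADDRESS_SOURCES = [
  :current_address,
  :hometown_address,
  :network_region,
  :status_location
].freeze

# Return the first non-empty candidate together with its source,
# or nil when the profile contains no usable location at all.
def best_address(profile)
  ADDRESS_SOURCES.each do |source|
    value = profile[source].to_s.strip
    return { source: source, text: value } unless value.empty?
  end
  nil
end
```

Keeping the source next to the text makes it possible to record how trustworthy the resulting address is.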
Example of detection and process of the data: addresses and geolocalization
In our case, each address is built from the available information (street
and number are not available from the API, and some parts, like the zip
code, are missing or wrong) and we then try to convert it into latitude and
longitude coordinates with Ruby GeoKit (which relies on Google, Yahoo or
GeocoderUS underneath). If the address is valid, it is kept.
Data storing
Each friend and his information are saved into a specific table. In our
project we chose MySQL since it is fast, flexible and mature.
It is faster to fill the database and compute the friend-of-friend
relationships when the data is loaded. The view then only has to retrieve
the data with filters, without having to compute the friends of friends
during users' requests.
Alternatively, all the friends of each user could be loaded in memory, but
this creates many duplicates and requires too much memory.
At the beginning, loading 200 friends took 5 minutes. After optimization, it
now takes about 30 seconds.
The optimizations are:
• Compute a hash of each friend and compare it to his current hash if he is
already in the database
• Set up a table of addresses, so that there are no duplicated addresses
• Serialize the list of friends for faster retrieval
• Update the whole friend database once at the beginning of the session
rather than several times
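As an illustration of the address-table optimization, the sketch below keeps one geocoding result per distinct address. The `AddressCache` class and the injected `geocoder` callable are hypothetical; in the real application the lookup would go through GeoKit and the results would live in the database rather than in memory.

```ruby
# Keeps one geocoding result per distinct address so that each address
# is looked up only once, however many friends share it.
class AddressCache
  attr_reader :lookups

  # geocoder: any callable taking an address string and returning
  # [lat, lng], or nil when the address cannot be resolved.
  def initialize(geocoder)
    @geocoder = geocoder
    @coordinates = {}
    @lookups = 0
  end

  def locate(address)
    # Normalize case and spacing so trivially different spellings match.
    key = address.strip.downcase.squeeze(' ')
    return @coordinates[key] if @coordinates.key?(key)

    @lookups += 1
    @coordinates[key] = @geocoder.call(address)
  end
end
```

Failed lookups are cached as well (as `nil`), so an invalid address is not sent to the geocoding service again.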

Improved data processing, loop optimization and tests
Saving the processed data into the database
Data retrieval
As the main goal of the project is to search for friends of friends, we
developed an API for expressing search requests. Different criteria are
taken into account:
• Whether the person is a friend of the user
• Whether the person is a friend of a friend of the user
• The person's distance from the user
• Ranking criteria (as explained in the research part of the report):
common friends, networks, applications, hobbies…
Extracting the information from the database
Here is a simple Ruby on Rails example that retrieves your first 20 friends
within a radius of 25 miles of the address loc (which has longitude and
latitude coordinates):
Address.find(:all, :include => :friends, :origin => loc,
:order => 'distance', :within => 25, :limit => 20,
:conditions => ["friends.id IN (?)", f.friend_ids])
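To make clear what the query above asks for, here is a plain-Ruby, in-memory equivalent: compute the distance of each friend from the origin, keep those within the radius, sort by distance and take the first results. It is only a sketch; the haversine formula, the hash layout and the method names are assumptions for the example, and the real query is delegated to the database.

```ruby
EARTH_RADIUS_MILES = 3958.8

# Great-circle distance between two points, using the haversine formula.
def haversine_miles(lat1, lng1, lat2, lng2)
  to_rad = Math::PI / 180
  dlat = (lat2 - lat1) * to_rad
  dlng = (lng2 - lng1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlng / 2)**2
  2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a))
end

# friends: array of hashes with :lat and :lng; origin: hash with :lat, :lng.
def nearest_friends(friends, origin, within: 25, limit: 20)
  friends
    .map { |f| f.merge(distance: haversine_miles(origin[:lat], origin[:lng], f[:lat], f[:lng])) }
    .select { |f| f[:distance] <= within }
    .sort_by { |f| f[:distance] }
    .first(limit)
end
```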
This part is actually the most difficult, since its complexity is high.
Indeed, imagine that I have 200 friends (which is a typical number) and that
each of my friends brings 100 new friends of friends (after removing 100
common friends).
200 * 100 = 20 000
Fig: Example of the exponential multiplication of friends of friends
The algorithm needs to select 20 to 40 friends from 20 000 and needs to be
as accurate as possible.
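The candidate set discussed above (friends of friends, without the user and without people who are already direct friends) can be sketched with sets. The `graph` hash, mapping a user id to the ids of his saved friends, is a hypothetical stand-in for the friendship table.

```ruby
require 'set'

# graph: uid => array of friend uids, as saved for each user.
# Returns the set of people reachable through a friend who are neither
# the user nor already direct friends of the user.
def friends_of_friends(graph, user)
  direct = Set.new(graph.fetch(user, []))
  candidates = Set.new
  direct.each do |friend|
    graph.fetch(friend, []).each { |fof| candidates << fof }
  end
  candidates - direct - Set[user]
end
```

Friends whose own friend list has never been saved simply contribute nothing, which mirrors the limitation that a friend must have used the application before his friends become visible.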
The data then needs to be sent to the interface. We convert it to JSON
before sending it. Each friend is described by the following fields:
• "json_class": name of the object, e.g. "Friend"
• "uid": user id of the friend, e.g. 519893729
• "first_name": first name of the friend, e.g. "Bob"
• "last_name": last name of the friend, e.g. "Bryan"
• "address_name": name of the friend's address, e.g. "Atlanta, GA, USA"
• "lat": latitude of the address, e.g. 33.7545
• "lng": longitude of the address, e.g. -84.3897
• "fof_level": whether the person is a friend (0) or a friend of a friend (1)
• "precision": degree of precision of the address, e.g. "city", "country"
• "picture": address of the picture of the user, e.g.
"http:\/\/profile.ak.facebook.com\/93729_3943.jpg"
• "distance": distance from the search point in miles, e.g. "1.00"
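The fields listed above could be serialized on the Ruby side with the standard json library. The sketch below is an assumption about how such a response might be assembled: the `Friend` struct, `as_json_hash` and `search_response` are illustrative names, not the project's actual classes.

```ruby
require 'json'

# Illustrative value object carrying the fields described above.
Friend = Struct.new(:uid, :first_name, :last_name, :address_name,
                    :lat, :lng, :fof_level, :precision, :picture, :distance) do
  # Hash representation with string keys and the "json_class" tag.
  def as_json_hash
    hash = { 'json_class' => 'Friend' }
    each_pair { |field, value| hash[field.to_s] = value }
    # The distance is sent as a string with two decimals.
    hash['distance'] = format('%.2f', distance)
    hash
  end
end

# Wrap the friends in the top-level "SearchQuery" object.
def search_response(friends)
  JSON.generate({ 'json_class' => 'SearchQuery',
                  'friends' => friends.map(&:as_json_hash) })
end
```

On the browser side the response is parsed back into JavaScript objects; the escaped form `\/` and a plain `/` are equivalent in JSON.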
Example of JSON sent:
{"json_class":"SearchQuery","friends":
[{"address_name":"Atlanta, GA, USA","json_class":"Friend",
"lng":-84.3897,"uid":12820237,"fof_level":0,"precision":"city",
"picture":"http:\/\/profile.ak.facebook.com\/237_5328.jpg",
"first_name":"Kelly","lat":33.7545,
"last_name":"Carlson","distance":"0.00"},
{"address_name":"Atlanta, GA, USA","json_class":"Friend",
"lng":-84.3897,"uid":519893729,"fof_level":1,"precision":"city",
"picture":"http:\/\/profile.ak.facebook.com\/93729_3943.jpg",
"first_name":"Bob","lat":33.7545,
"last_name":"Bryan","distance":"0.00"}]}
Ranking
It is important to rank friends, as the number of friends of a user is large
and the screen space available for visually displaying those friends is
comparatively very small. As explained above, the algorithm has to display
only 20 to 40 friends from a list of around 20 000 friends. Hence there is a
need for an accurate ranking algorithm that selects the top 40 friends who
are closest to the user.
To do this we have developed a measure called PRscore. The PRscore measures
the proximity between two friends. It is calculated from rules decided in
advance for each attribute. Features such as the number of mutual friends or
the number of scraps can be considered as attributes. Different social
networks therefore contribute different numbers of attributes to the
calculation of the PRscore. Each attribute can be given a different priority
or importance level, which determines how much effect it has on the final
PRscore. Attributes from different social networks are also predefined and
classified into groups that are mutually compatible with each other.
Calculation of weights for individual attributes
Number of Common Friends
The number of common friends is a measure of the degree of relationship
between two people. The PRoximity (PR) score is high if the number of common
friends is high. Using the percentage of common friends directly in the PR
score computation is not recommended, as it would almost always lead to a
low score.
Rule: If the number of common friends is at least 15% of a person's total
number of friends (Common Friends Ratio, CFR), then the PR score of that
person is maximum. If the ratio is between 0 and 15 percent, the PR score is
scaled between 0 and 10. The minimum of the PR scores of the two individuals
is taken as the common PR score.
I.e.
if CFR >= 15 then PR score = 10
else PR score = CFR / 1.5
E.g. if user A has 300 friends, user B has 100 friends and they have 30
friends in common, then
User A: CFR = 30 / 300 = 10%, so PR score = 10 / 1.5 ≈ 6.7
User B: CFR = 30 / 100 = 30%, so PR score = 10 (as 30% ≥ 15%)
Thus the common PR score is about 6.7 (the minimum PR score of both users).
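The common-friends rule can be written down directly. The sketch below follows the rule as stated (maximum score of 10 once the ratio reaches 15%, otherwise the ratio divided by 1.5); the constant and method names are assumptions for the example.

```ruby
MAX_PR_SCORE = 10.0
CFR_THRESHOLD = 15.0 # percent

# Common Friends Ratio: share of a person's friends that are common, in percent.
def common_friends_ratio(common, total)
  return 0.0 if total.zero?

  100.0 * common / total
end

# PR score of one person for the common-friends attribute.
def pr_score_for(common, total)
  cfr = common_friends_ratio(common, total)
  cfr >= CFR_THRESHOLD ? MAX_PR_SCORE : cfr / 1.5
end

# The common PR score is the minimum of both people's scores.
def common_pr_score(common, total_a, total_b)
  [pr_score_for(common, total_a), pr_score_for(common, total_b)].min
end
```

Taking the minimum means that a person with very few friends cannot inflate the proximity to someone with many.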
Similarly the PRscore can be calculated for other attributes such as
• Number of Common Groups / Communities they belong to
• Number of testimonials (Orkut Specific)
• Number of scraps / wall writings
• Number of Common Applications (Facebook specific.)
• Common Networks Connections (CNR)
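The report does not fix how the per-attribute scores are combined, so the following is only one possible sketch: a weighted average over the attributes available for a given pair, followed by a selection of the best candidates. The weights, attribute names and method names are all hypothetical.

```ruby
# Hypothetical importance levels for each attribute.
ATTRIBUTE_WEIGHTS = {
  common_friends: 3.0,
  common_networks: 2.0,
  common_applications: 1.0
}.freeze

# scores: attribute => PR score (0..10) for the attributes that the
# social network provides. Missing attributes are simply ignored.
def combined_pr_score(scores, weights = ATTRIBUTE_WEIGHTS)
  used = weights.select { |attribute, _| scores.key?(attribute) }
  return 0.0 if used.empty?

  weighted_sum = used.sum { |attribute, weight| weight * scores[attribute] }
  weighted_sum / used.values.sum
end

# candidates: array of hashes with a :scores entry.
# Returns the `limit` candidates with the highest combined score.
def top_candidates(candidates, limit = 40)
  candidates.sort_by { |c| -combined_pr_score(c[:scores]) }.first(limit)
end
```

Dividing by the sum of the weights actually used keeps the result on the same 0 to 10 scale even when a network lacks some attributes.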
Interface
The interface is similar to a Google or Yahoo! search, except that it looks
for friends and friends of friends.
Getting data
The user specifies where he would like to find his friends and the friends
of his friends:
As explained in the "Data retrieval" section, all the information is
received as JSON. The JSON is converted to JavaScript objects.
Displaying data
The JavaScript objects just retrieved are turned into:
• Markers (Google Marker) on the map
• Items in the list of friends on the right
From pure data to visual information
As a result, the map is centered and zoomed according to the precision of
the request (e.g. street, city, zip, state, country). The markers (one for each
friend) are put in an overlay on the map and the list of friends is populated.
Navigation
The user can then navigate on the map (zoom in/out, pan) and click on
friends (on the map or in the list). A small pop-up displays more
information about the friend and his relationship with the user.
Here is the legend of the map:
This part uses a lot of JavaScript in order to provide cool visual effects
without having to reload the page.
The JavaScript API and the map are exchanging information during each
search in order to update the map and the list. We are using
XMLHttpRequest calls to query the server. It returns data in JSON as seen
above.
Listeners on the map and on the list are waiting for events before updating
the view.
JavaScript/Ruby API in order to render a dynamic interface
Here is a view of the friends and friends of friends in Atlanta.
The list provides a nice way to visualize the friends at a certain distance. It is
sorted by distance and friend relationships.
A mouseover event on the picture of a friend highlights where he/she is on
the map.
Dynamic filtering is available in order to display friends or friends of
friends. The map is also updated live.
Here is a full example for Romain in France.
Conclusion
This project was really intense and dealt with new technologies (Ruby on
Rails, Gmaps, Facebook API in JavaScript and in Ruby, geolocalization,
databases, Active Record…). It was a mine of new knowledge for us. Its
research background was also substantial, since the complexity of ranking
friends of friends is high. We implemented several optimizations (an address
table so that each address is geocoded only once, and a hash of each user's
content in order to detect quickly whether changes have been made). To sum
up, it was a concentrate of the latest Internet techniques, which fits this
class on advanced Internet applications well. We are ready to apply to
Internet start-ups now.
Future versions could integrate a more complex implementation of the friend
ranking system, more search criteria (hobbies, age…) and real data coming
from the social networks that follow the Open Social API (whose
implementation should be finished soon).
Moreover, since the application works and serves a useful purpose, we plan
to release it online and watch it grow in importance, since the more users
it has, the more friends of friends can be found!