RECFLIX: AN ANDROID-BASED RECOMMENDATION & RETRIEVAL SYSTEM
FOR NETFLIX
A Project
Presented to the faculty of the Department of Computer Science
California State University, Sacramento
Submitted in partial satisfaction of
the requirements for the degree of
MASTER OF SCIENCE
in
Computer Science
by
Kaniz Fatema
SPRING
2012
© 2012
Kaniz Fatema
ALL RIGHTS RESERVED
RECFLIX: AN ANDROID-BASED RECOMMENDATION & RETRIEVAL SYSTEM
FOR NETFLIX
A Project
by
Kaniz Fatema
Approved by:
__________________________________, Committee Chair
Du Zhang
__________________________________, Second Reader
Bob Buckley
____________________________
Date
Student: Kaniz Fatema
I certify that this student has met the requirements for format contained in the University format
manual, and that this project is suitable for shelving in the Library and credit is to be awarded for
the project.
__________________________, Graduate Coordinator
Nikrouz Faroughi
Department of Computer Science
___________________
Date
Abstract
of
RECFLIX: AN ANDROID-BASED RECOMMENDATION & RETRIEVAL SYSTEM
FOR NETFLIX
by
Kaniz Fatema
Widespread adoption of high-speed internet connectivity has made streaming media a
reality, and a growing number of streaming devices are now available, including smart
TVs, internet-connected Blu-ray players, and Apple TV. One problem users quickly face
is navigating and searching the available titles; the current approach is far from
satisfactory. Since smartphones are now everywhere, can they help with this discovery
problem? In this project we develop an Android-based app for Netflix, a movie rental and
streaming service, to address this problem. One attractive feature of Netflix is that it
offers recommendations. However, they are tailored to a single account, which is often
shared by all members of a family. It is therefore hard to differentiate the
recommendations for the parents from those for the teenage members of a family. The app
we develop does not require a user to log in, and the recommendations are tailored to the
owner of the smartphone only. Our proposed recommendation algorithm is based on the
ratings available on Netflix and Rotten Tomatoes, but can also incorporate new data
sources. We conducted a limited user study, and based on the feedback of the subjects,
the results are promising.
_______________________, Committee Chair
Du Zhang
_______________________
Date
TABLE OF CONTENTS
List of Figures
Chapter
1. INTRODUCTION
   1.1 Problem Statement
   1.2 Solution Sketch
   1.3 Organization of the Report
2. RELATED WORK
   2.1 Related Android Apps
   2.2 Recommendation Algorithms
3. DESIGN
   3.1 User Interface Design
   3.2 Recommender Algorithms
       3.2.1 Average-Deviated Recommendation algorithm
       3.2.2 Correlation-Weighted Recommendation algorithm
   3.3 Datasets
4. IMPLEMENTATION
   4.1 Back-end
       4.1.1 Data collection from the web-services
       4.1.2 Application/Web Server and Database
   4.2 Client or Front-end
   4.3 User study
       4.3.1 Approach
       4.3.2 Result
       4.3.3 Performance Analysis
       4.3.4 Discussion
5. CONCLUSION
Bibliography
LIST OF FIGURES
Figure 3.1: Overview of the Correlation-weighted recommendation process.
Figure 3.2: Distribution of Netflix titles
Figure 3.3: Correlation between various types of ratings.
Figure 4.1: High level application components and data collection.
Figure 4.2: Entity-relationship diagram of the database tables used.
Figure 4.3: The main user interface of RecFlix
Figure 4.4: Comprehensive set of options for filtering titles
Figure 4.5: Movie description dialog appears when a user clicks on a movie image
Figure 4.6: Interview prior to RecFlix interaction
Figure 4.7: Post-interview questions
Chapter 1
INTRODUCTION
Streaming media, and streaming movies in particular, are becoming increasingly
commonplace thanks to the widespread availability of high-speed internet and of
streaming-capable devices, including computers and smart TVs. Streaming titles, sometimes
aptly called instant movies and TV shows, offer major advantages over titles available on
physical media such as DVDs. Perhaps the most notable advantage is that the user gets to
choose what she wants to watch, when she feels like it. Netflix [1] is a pioneer in
streaming movies and TV shows, in addition to its original DVDs-by-mail business. As
success draws company, several other vendors, including Amazon [2] and Hulu [3], now
offer streaming movies. We focus on Netflix in this project, as it has the largest
streaming customer base.
Smartphones are now ubiquitous and have the capabilities of a computer. People
increasingly depend on their smartphones for on-the-spot information, rather than carrying
or opening up a laptop everywhere for every small need. Naturally, answers to questions
such as which movies are available, or which movie to watch, are expected to be at their
fingertips as well.
1.1 Problem Statement
After interviewing a few Netflix streaming subscribers and going through many postings
on the internet, we have identified the following issues concerning users' information
needs about Netflix streaming titles.

• Movie information organization: No matter what the streaming device is (a smart TV, an
internet-connected DVD player, or a media player such as Roku [4] or Apple TV), the
titles that a user can browse appear to be very limited, leaving the user wanting more.
Furthermore, the search options are often limited, leaving the user with less control to
find what she needs. One user commented that he thought there were only 75 titles
available under each genre category, since Netflix shows only 75 titles in each genre. In
reality, Netflix carries 14,000-18,000 streaming titles. Clearly, the presentation of
information about the extent of available titles can be improved. One can argue that more
is not necessarily better; however, it is comforting to know that more titles are
available if the user has not yet found something interesting to watch.

• The user in control (interesting titles to watch): The point above contributes to
this issue. Netflix provides a top-10 list of recommended movies, which is rather
generic. The user needs to be in control. If she is in the mood for a classic drama, she
should be able to specify the date range and the genre to discover the top recommended
titles in that category. Once she is in control, she can also exclude unwanted
categories, such as foreign titles.

• Having to log in means all history in one bucket: Most Netflix clients and
applications require a customer to log in to the Netflix service to use them and to get
personalized content and recommendations. When the user only needs to browse available
titles and find interesting ones, this login requirement can be a disadvantage for the
following reason. A customer who has young children in the family may watch all sorts of
movies. Netflix generates recommendations based on this disparate set of titles, and may
miss the mark when only the children, or only the parents, are the viewers. An
alternative is for your mobile device to carry your movie preferences and deliver
recommendations only for you.

• Inadequate smartphone apps: In this project we focus on smartphones as the platform
of choice, and on Android as the first mobile operating system to explore. While a few
apps attempt to tackle the problems identified above, including one from Netflix itself,
they all appear to be inadequate. We compare the features of these apps in the related
work chapter of this report.
1.2 Solution Sketch
In this project we design an Android application for Netflix streaming titles that
attempts to tackle the issues identified in the last section. In particular, our design
has the following main features.

• Title information and discovery: We organize the available streaming titles in such a
way that the user can see how many she has seen and how many are still available to
explore. The user can search for titles using various filters that also guide her with
availability information. For example, the user sees how many titles each genre or
release year has relative to the total available.

• Recommendation to help find what might be interesting: We implement a recommendation
algorithm that adapts to the user’s movie preferences. As the user keeps rating new
movies, it learns more about the user and updates the recommendations. We build RECFLIX
on a well-known recommendation algorithm and modify it to accommodate an easy-to-use
system. The user’s ratings are contrasted against Netflix users’ average ratings and
Rotten Tomatoes’ [5] critics’ and members’ ratings. Our novelty here is that we keep the
approach very simple, and we look at users’ preferences at the fine-grained level of
genres and other features.

• Hassle free: We do not require the user to log in to Netflix or any other site.
Further, we do not limit what the user can see. It is the user who controls how she
wants to find titles. No login also means anybody with an Android-based device can rate
a few movies and get personalized recommendations. Therefore, a single combined history
for a family sharing one account is not an issue. A user’s rated movies are nevertheless
not lost after a session: since we keep records by Android device ID, a user can
accumulate her ratings as long as she uses the same device.
1.3 Organization of the Report
This report is organized as follows. Chapter 2 goes through some of the related work,
including one of the most cited recommendation algorithms, particularly the one we build
our approach on. It also contrasts some of the related and prominent Android apps with
our design. Chapter 3 covers our proposed design, including both the user interface and
the recommendation algorithms. It explores the datasets we consider in this project, and the
relationships between them to explain why we need all of them. Chapter 4 is about our
implementation, both server-side back-end and the Android client. It also describes the
limited user study we have conducted. Chapter 5 concludes the report with remarks on
future work.
Chapter 2
RELATED WORK
In this chapter we describe some of the most cited recommendation algorithms,
particularly the one we build our approach on. We also contrast here some of the related
and prominent Android apps with our design.
2.1 Related Android Apps
Although there are quite a few Android apps for movies, they are primarily for movies
currently in theaters, and not for Netflix or Netflix streaming titles. The app most
similar in functionality that we have found is the one created by Netflix itself.
However, this project is in part an attempt to overcome some of the limitations of that
app. The Netflix app, for example, requires a user to log in before she can do anything.
Its proprietary recommendation algorithm is generic and tied to one account. That is, all
family members sharing an account get to see the same set of recommendations. Another
limitation of the app is that the user can explore titles only in a very limited fashion,
and it is impossible for her to find out how many titles are available in total or in any
category. Here is a comment from a user of this app: “There are virtually no settings or
options or ability to refine or filter searches and browsing. The browsing layout is poor
at best.” [6] Another user’s review reads: “Limited selection of titles. Works in limited
number of phones (at launch). Poor recommendation engine.” [7] Another user commented:
“If a family shares an account, the recommendation is not great.” [8]
2.2 Recommendation Algorithms
The recommendation problem can be thought of as predicting a value, that is, how much
a user would like an item (movie in this case), based on what we know about the user and
other users or data sources. This translates fairly well to the solution space of statistical or
machine learning approaches. Not surprisingly, quite a few algorithms have been proposed
so far [9] to make recommendations, and new algorithms continue to appear.
The most commonly cited recommendation algorithm [10] [9] is sometimes referred to as the
“user-based” algorithm, and it works as follows. First it finds the similarity, for which
correlation can serve as the metric, between the target user and other users. We can
represent two users as two vectors of numbers, based on the movies they have rated in
common. Each cell in a vector is the rating on the corresponding movie. We find the
similarity between the two users, expressed as $w_{u_i,u_t}$, using the equation for the
Pearson correlation coefficient [11]:

$$w_{u_i,u_t} = \frac{\sum_{a}(R_{u_i,a}-\bar{R}_{u_i})(R_{u_t,a}-\bar{R}_{u_t})}{\sqrt{\sum_{a}(R_{u_i,a}-\bar{R}_{u_i})^2}\,\sqrt{\sum_{a}(R_{u_t,a}-\bar{R}_{u_t})^2}} \qquad (2.1)$$

where the sums run over the movies $a$ that both users have rated, $w_{u_i,u_t}$ is the
correlation between the two users, $u_t$ is the target user, $u_i$ is another user,
$R_{u_i,a}$ is $u_i$’s rating on movie $a$, $\bar{R}_{u_t}$ is the average rating of the
target user, $\bar{R}_{u_i}$ is the average rating of the other user, and $R_{u_t,a}$ is
$u_t$’s rating on movie $a$.
Since the value of the correlation lies between -1 and +1, the similarity between users
can be positive, zero, or negative.
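As a minimal sketch of equation 2.1 (not the project's actual code), the following Python function computes the Pearson correlation between two users over the movies both have rated. The dictionary representation, the function name `pearson_similarity`, the choice to take each user's mean over the co-rated movies only, and the choice to return 0.0 when the correlation is undefined are all assumptions made for illustration.

```python
from math import sqrt


def pearson_similarity(ratings_ui, ratings_ut):
    """Pearson correlation between two users over their co-rated movies.

    Each argument maps a movie identifier to that user's rating.
    Returns 0.0 when the correlation is undefined (fewer than two
    co-rated movies, or one user gave identical ratings to all of them).
    """
    common = sorted(set(ratings_ui) & set(ratings_ut))
    n = len(common)
    if n < 2:
        return 0.0
    # Assumption: means are taken over the co-rated movies only.
    mean_ui = sum(ratings_ui[a] for a in common) / n
    mean_ut = sum(ratings_ut[a] for a in common) / n
    cov = sum((ratings_ui[a] - mean_ui) * (ratings_ut[a] - mean_ut) for a in common)
    var_ui = sum((ratings_ui[a] - mean_ui) ** 2 for a in common)
    var_ut = sum((ratings_ut[a] - mean_ut) ** 2 for a in common)
    if var_ui == 0 or var_ut == 0:
        return 0.0
    return cov / sqrt(var_ui * var_ut)
```

Two users whose co-rated ratings rise and fall together obtain a value near +1, while users with opposite tastes obtain a value near -1.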
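Once we know the similarities between users, the recommended value of a target movie
is estimated by taking the other users’ ratings on the target movie, weighted by their
similarities (correlations) with the target user:

$$\hat{R}_{u_t,a_t} = \bar{R}_{u_t} + \frac{\sum_{i}(R_{u_i,a_t}-\bar{R}_{u_i})\,w_{u_i,u_t}}{\sum_{i} w_{u_i,u_t}} \qquad (2.2)$$

Here $\hat{R}_{u_t,a_t}$ is the recommended value, $u_t$ is the target user, $a_t$ is the
target movie, $u_i$ is another user, and $w$ is the correlation.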
Note that this algorithm is in fact an instance of the machine learning algorithm called
k-nearest neighbor [12]. A known limitation of this algorithm is scalability [11]: when
there are many users, say millions, as is typical of popular e-commerce sites, the
algorithm suffers from performance issues. We do not have this problem, however, since
for our initial assessment we consider only four users in total.
Chapter 3
DESIGN
In this chapter we describe the user interface of the Android app and its components. We
also point out how we kept the problem statement in mind while designing the interface.
We describe the structure of our application, including the back-end, how we utilize
various APIs provided by Netflix and Rotten Tomatoes, the technologies we have used,
and the recommendation algorithms we proposed and implemented.
3.1 User Interface Design

• No login: No login is necessary to use our application, that is, to browse and filter
titles, rate titles, and receive recommendations. However, this does not mean that the
user has to repeat the same ratings. We store the rating information in the database
under the Android device ID. In a way, therefore, the user's Android device becomes her
very own walking movie-preference keeper. In the future we plan to implement an optional
login capability so that she can transfer her preferences between devices, for example
from Android to PC.

• Rating/recommendation interface: It is common to use the same star-rating display
both to show and to collect ratings. We show at most 5 stars to display the recommended
value of the movie to the user. The user rates a movie by dragging on the star display
to express her opinion. Note that initially, when we do not have enough ratings from the
user, the star display shows the average of the three sources: the Netflix average
rating, the Rotten Tomatoes critics' score, and the Rotten Tomatoes audience score.

• Multiple levels of detail: As mentioned before, so far we have used two data sources:
Netflix and Rotten Tomatoes. Rotten Tomatoes is rich in textual information, such as
reviews. Netflix genres are also very fine-grained now. For example, the "Comedy" genre
has many finer-grained sub-genres, including "Independent Comedies", "British Comedies",
"Family Comedies", and "Sports Comedies". As a result, the genre information can become
quite voluminous. Since screen space is very limited on phones, we use a two-level
information display. On the movie-list display, which is the level 1 display, we show
brief information about the movie. By clicking on the movie image, however, the user can
see much more detail in a dialog box, including the Rotten Tomatoes critics' score and
consensus review and all the fine-level genres of the movie.

• Find titles many ways: We want the user to feel as unconstrained as possible in
discovering titles she may like. To help with this discovery process, we design a
comprehensive set of filtering tools. The user can experiment by selecting the range of
release years, the set of genres, the set of MPAA maturity ratings, and so on.
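The filtering tools described above can be sketched as a set of optional criteria applied to a list of titles. The following Python sketch is illustrative only: the `Title` and `TitleFilter` classes, their field names, and the convention that an empty criterion matches everything are assumptions, not the data model of the actual app. The function also returns the matched and total counts, in the spirit of the availability information mentioned in section 1.2.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class Title:
    # Hypothetical, simplified view of a streaming title.
    name: str
    year: int
    genres: Set[str]
    mpaa: str


@dataclass
class TitleFilter:
    # Each criterion is optional; an empty or missing one matches everything.
    year_range: Optional[Tuple[int, int]] = None  # inclusive (first, last)
    genres: Set[str] = field(default_factory=set)
    mpaa: Set[str] = field(default_factory=set)

    def matches(self, title):
        if self.year_range is not None:
            first, last = self.year_range
            if not first <= title.year <= last:
                return False
        # A title matches if it carries at least one of the selected genres.
        if self.genres and not (self.genres & title.genres):
            return False
        if self.mpaa and title.mpaa not in self.mpaa:
            return False
        return True


def filter_titles(titles, flt):
    """Return the matching titles plus (matched, total) availability counts."""
    matched = [t for t in titles if flt.matches(t)]
    return matched, (len(matched), len(titles))
```

For instance, `TitleFilter(year_range=(2005, 2007), genres={"Drama"})` keeps only dramas released in 2005-2007, and the returned counts tell the user how many titles that is out of the whole catalog.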
3.2 Recommender Algorithms
We have applied two recommendation algorithms in this study. The choice of algorithm
depends on the availability of data from Rotten Tomatoes. If the target user’s rated
movies are not in Rotten Tomatoes, we apply an algorithm we call the AVERAGE-DEVIATED
RECOMMENDATION algorithm. Otherwise we apply an algorithm we call the
CORRELATION-WEIGHTED RECOMMENDATION algorithm. In the following we describe both of
these algorithms. Most of the symbols we use to describe the algorithms are listed in
Table 3-1.
3.2.1 AVERAGE-DEVIATED RECOMMENDATION algorithm
This algorithm, explained below, compares the target user’s ratings against the
corresponding Netflix average ratings and computes the average deviation. The
recommended value of the target movie is computed by adding this average deviation to
the Netflix average rating of the target movie. Note that the target user’s average
deviation can be positive, zero, or negative depending on how she rated the movies. This
procedure is applied within various categories of movies (e.g., genre, language, year,
MPAA rating) if there are enough ratings in a category. The AVERAGE-DEVIATED
RECOMMENDATION algorithm is shown in equation 3.1.
$$\hat{R}_{u_t,a_t} = \bar{R}_{nf,a_t} + \sum_{a_i \in M_{u_t}} \frac{R_{u_t,a_i} - \bar{R}_{nf,a_i}}{|M_{u_t}|} \qquad (3.1)$$
where $u_t$ is the target user we want to compute recommendations for, $a_t$ is the
target movie, $M_{u_t}$ is the set of movies rated by the target user so far, and
$\bar{R}_{nf,a_i}$ is the average rating of Netflix members on the movie $a_i$. Therefore,
$\sum_{a_i \in M_{u_t}} \frac{R_{u_t,a_i} - \bar{R}_{nf,a_i}}{|M_{u_t}|}$ is essentially
our target user’s average deviation from the Netflix average ratings, since $|M_{u_t}|$
stands for the size of the set of movies rated by the target user.
Algorithm 1: AVERAGE-DEVIATED RECOMMENDATION
/* Returns list of movies ordered by recommended values */
Function average_deviated_rec(Ratings_ut)
    Ratings_ut = target user’s rating vector
    N = size of Ratings_ut
    Mean_deviation = 0
    Foreach movie, a, rated by the target user
        Mean_deviation = Mean_deviation + (Ratings_ut(a) - Netflix_avg(a))
    End
    Mean_deviation = Mean_deviation/N
    Predicted_ratings = Empty_list
    Foreach movie, a, not rated by the target user
        Predicted_ratings(a) = Netflix_avg(a) + Mean_deviation
    End
    Sort(Predicted_ratings)
    Return Predicted_ratings
End Function
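Algorithm 1 and equation 3.1 can be sketched in runnable form as follows. This is a minimal Python illustration under stated assumptions, not the PHP code running on the server: ratings are held in plain dictionaries, the Netflix averages are passed in as a second argument (here named `netflix_avg`), and the result is sorted with the highest predicted value first. Predictions are not clamped to the rating scale, since the text does not say whether the project does so.

```python
def average_deviated_rec(ratings_ut, netflix_avg):
    """Rank unrated movies by Netflix average plus the user's mean deviation.

    ratings_ut:  movie -> target user's rating.
    netflix_avg: movie -> Netflix members' average rating.
    Returns (movie, predicted_rating) pairs, highest prediction first.
    """
    # Only movies with a known Netflix average contribute to the deviation.
    rated = [a for a in ratings_ut if a in netflix_avg]
    if not rated:
        raise ValueError("the target user has not rated any known movie")
    mean_deviation = sum(ratings_ut[a] - netflix_avg[a] for a in rated) / len(rated)
    # Shift every unrated movie's Netflix average by the user's mean deviation.
    predicted = {
        a: avg + mean_deviation
        for a, avg in netflix_avg.items()
        if a not in ratings_ut
    }
    return sorted(predicted.items(), key=lambda item: item[1], reverse=True)
```

A user who tends to rate movies 0.75 stars above the Netflix average therefore sees every unrated movie's Netflix average raised by 0.75.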
3.2.2 CORRELATION-WEIGHTED RECOMMENDATION algorithm
When Rotten Tomatoes contains enough data on the target user’s rated movies, we use a
different recommendation algorithm. In this case we consider three users to help our
target user receive recommendations. Representing a user as a vector of movie ratings,
the three users we consider are essentially a) the Netflix average rating vector, b) the
Rotten Tomatoes critics’ scores, and c) the Rotten Tomatoes audience scores.
Table 3-1: Meaning of the symbols used in this chapter
Symbol                 Meaning
$u_t$                  Target user (the user we are computing recommendations for)
$a_t$                  Target movie
$\hat{R}_{u_t,a_t}$    Recommended/predicted value for the target user and the target movie
$\bar{R}_{u_t}$        Average rating of the target user
$w_{u_i,u_t}$          Pearson correlation coefficient between the target user and a user, $u_i$
$NF$, $RTC$, $RTU$     Netflix, Rotten Tomatoes critics, and Rotten Tomatoes audience, respectively
$M_{u_i}$              Set of movies rated by a user, $u_i$

The CORRELATION-WEIGHTED RECOMMENDATION algorithm is based on the user-based approach
described in section 2.2. The algorithm below, also illustrated in Figure 3.1, works as
follows. First it finds the similarity, in terms of the Pearson correlation coefficient,
between the target user and each of the three other users. After that, the recommended
value of a target movie is estimated by taking each of the three users’ ratings on the
target movie into consideration, weighted by their similarities (correlations) with the
target user, as expressed in equation 3.2.

$$\hat{R}_{u_t,a_t} = \bar{R}_{u_t} + \frac{\sum_{i \in \{NF,RTC,RTU\}} (R_{u_i,a_t} - \bar{R}_{u_i})\, w_{u_i,u_t}}{\sum_{i \in \{NF,RTC,RTU\}} w_{u_i,u_t}} \qquad (3.2)$$
Note that the Rotten Tomatoes rating scale is 0-100, which is different from the Netflix
rating scale. To make them compatible, we multiply the Rotten Tomatoes ratings by
5/100 = 0.05. Note further that correlation is scale invariant; therefore, we do not need
to worry about scale during the correlation computation.
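The two points above can be checked with a short Python sketch. The function names and the sample numbers are made up for illustration; only the 5/100 factor comes from the text. The sketch rescales Rotten Tomatoes scores to the five-point scale and lets one verify that the Pearson correlation is the same before and after rescaling.

```python
from math import sqrt


def rt_to_five_point(score):
    """Convert a Rotten Tomatoes score (0-100) to the five-point scale."""
    return score * 5 / 100


def pearson(xs, ys):
    """Pearson correlation of two equal-length rating lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up sample data: Netflix averages and critics' scores for four movies.
netflix = [3.5, 4.2, 2.8, 3.9]
critics = [70, 91, 40, 85]
rescaled = [rt_to_five_point(s) for s in critics]

# Multiplying one vector by a positive constant leaves the correlation
# unchanged, so the two values below agree up to floating-point rounding.
corr_raw = pearson(netflix, critics)
corr_rescaled = pearson(netflix, rescaled)
```

Rescaling therefore matters only when ratings from different sources are combined in the prediction step (equation 3.2), not when the similarities are computed.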
Algorithm 2: CORRELATION-WEIGHTED RECOMMENDATION
/* Returns list of movies ordered by recommended values */
Function correl_weighted_rec(Ratings_ut)
    Ratings_ut = target user’s rating vector
    Netflix_avg = Netflix average rating vector
    RT_critics = Rotten Tomatoes Critics’ rating vector
    RT_audience = Rotten Tomatoes Audience average rating vector
    Movies_common = movies in intersection(Ratings_ut, Netflix_avg, RT_critics, RT_audience)
    If (size_of(Movies_common) < THRESHOLD)
        Return average_deviated_rec(Ratings_ut)
    End If
    Correl = list of correlations between Ratings_ut and each of the other three
             vectors, using Movies_common only
    Predicted_ratings = Empty_list
    Avg_ut = average(Ratings_ut)
    Sum_correl = sum(Correl)
    Foreach movie, a, not rated by the target user
        Predicted_ratings(a) = Avg_ut
        Others_deviation = 0
        Foreach user, u, in {NF, RTC, RTU}
            Others_deviation += (rating(u, a) - average(u)) * Correl(u, ut)
        End
        Predicted_ratings(a) += Others_deviation / Sum_correl
    End
    Sort(Predicted_ratings)
    Return Predicted_ratings
End Function
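A runnable sketch of Algorithm 2 and equation 3.2 follows. It is an illustration in Python under several assumptions that the text does not settle: the three pseudo-users are passed as a dictionary `sources` (keys such as "NF", "RTC", "RTU") whose ratings are already on the five-point scale, `threshold` is a hypothetical stand-in for THRESHOLD, each source's average is taken over all movies it has rated, and the function returns `None` to signal that the caller should fall back to the AVERAGE-DEVIATED algorithm (also when the correlations sum to zero, a case the pseudocode does not address).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if undefined)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y) if var_x > 0 and var_y > 0 else 0.0


def correl_weighted_rec(ratings_ut, sources, threshold=3):
    """Rank unrated movies using equation 3.2.

    ratings_ut: movie -> target user's rating.
    sources:    pseudo-user name -> {movie -> rating on the same scale}.
    Returns (movie, predicted_rating) pairs, highest first, or None when
    the caller should fall back to the average-deviated algorithm.
    """
    # Movies rated by the target user and present in every source.
    common = sorted(set(ratings_ut).intersection(*sources.values()))
    if len(common) < threshold:
        return None
    target = [ratings_ut[a] for a in common]
    correl = {u: pearson(target, [r[a] for a in common]) for u, r in sources.items()}
    sum_correl = sum(correl.values())
    if sum_correl == 0:
        return None  # equation 3.2 is undefined in this case
    avg_ut = sum(ratings_ut.values()) / len(ratings_ut)
    avg_src = {u: sum(r.values()) / len(r) for u, r in sources.items()}
    # Candidates: unrated movies for which every source has a rating.
    candidates = set.intersection(*(set(r) for r in sources.values())) - set(ratings_ut)
    predicted = {}
    for a in candidates:
        deviation = sum((sources[u][a] - avg_src[u]) * correl[u] for u in sources)
        predicted[a] = avg_ut + deviation / sum_correl
    return sorted(predicted.items(), key=lambda item: item[1], reverse=True)
```

Because the weights are correlations, a source that agrees with the target user pulls the prediction toward its own opinion of the movie, while a source that is uncorrelated with her contributes almost nothing.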
The CORRELATION-WEIGHTED RECOMMENDATION algorithm described above is refined further as
follows. As our target user keeps rating movies, we check whether she has rated enough
movies in any one category/genre. If she has, we estimate predicted ratings for the
titles of that genre/category. We repeat this process for each category/genre. Finally,
we present the movies in order of recommended value, giving preference to the categories
she has rated so far.
Figure 3.1: Overview of the CORRELATION-WEIGHTED recommendation process. Four users’
rating profiles are shown as rating vectors, using the same scale of 0-5. The target user
has not yet rated the target movie (the last one, The Da Vinci Code). To compute the
recommendation for this target movie, we first compute the similarities between the
target user and the three other pseudo-users. Equation 3.2 is then used for the
recommendation.
3.3 Datasets
In this section we explore the data we utilize, that is, data from Netflix and Rotten
Tomatoes. We also explore the benefit of using both sources by showing the relationship
between them.
Figure 3.2: Distribution of Netflix titles: (a) by release year, where the highlighted
bar represents 2005-2007; (b) by average rating, where the highlighted bar represents the
rating value of 3.0.
We start by exploring the Netflix dataset [1]. Figure 3.2 shows the distribution of
Netflix titles by release year and by users’ average rating. The first point to note is
that the highest number of titles are from 2005-2007. This is because many of the TV
shows are from these years. It is also interesting to observe that, in terms of users’
average rating, almost nothing is in the absolute ‘dislike’ zone with a rating value of
1. This is possibly because users do not want to waste time on very bad movies that they
can sense early, and therefore do not report much negative feedback on movies.
Figure 3.3 shows the pairwise correlations between the Netflix average rating and the
Rotten Tomatoes critics’ and audience scores for about 3,100 movies for which all three
sources have valid data. As previously mentioned, Rotten Tomatoes does not carry all the
titles that Netflix carries; one reason is that Netflix has many foreign titles and TV
shows.
One point that is clear from Figure 3.3 is that none of the correlations is very strong.
This is good: if they were highly correlated, having all of them would add little value,
since that would be similar to having multiple copies of the same rating dataset. The
strongest correlation is between the two sources from Rotten Tomatoes. This is
interesting, given that the Rotten Tomatoes audience consists of regular users, just like
Netflix users. Perhaps the audience members who visit Rotten Tomatoes are influenced by
the ratings of the critics. Finally, note that the scores of the audience, or regular
users, are more positive than those of the critics.
Figure 3.3: Correlation between various types of ratings. About 3,100 movies are
considered in this analysis. Each dot in the scatter plots represents a movie. None of
the correlations is very strong (beyond ±0.8). Panel (a) shows the correlation between
the Netflix average rating and the Rotten Tomatoes critics’ score, panel (b) the
correlation between the Rotten Tomatoes critics’ and audience scores, and panel (c) the
correlation between the Netflix average rating and the Rotten Tomatoes audience score.
Chapter 4
IMPLEMENTATION
In this chapter we describe how we implemented our design as described in the last
chapter. We describe the client or the front-end software, the back-end or the server-side
components, the protocols or process of collecting data from various web-services, etc.
Figure 4.1 shows a high level overview and flow between various components of the
software, and with the data sources residing on the internet. We next describe them.
4.1 Back-end
In this section we give an overview of the implemented back-end.
4.1.1 Data collection from the web-services
We rely on the Netflix and Rotten Tomatoes web services for our movie and opinion
(rating) data. The collected data is then stored in MySQL relational database tables.
Netflix offers multiple ways to access its web services. Using OData [13], an HTTP- and
JSON (JavaScript Object Notation) [14]-based querying and updating protocol, is the
simplest among them, since the OData service does not require any complex authentication
steps, just the authentication key that Netflix gives to anyone who becomes a Netflix
Developer Network member [15]. We have automated Netflix data collection by implementing
parser tools in Java using the odata4j library [16], an open source Java library that
lets users create OData consumers and producers. These parsers essentially perform two
main steps: a) extract and retain only the data fields we care about from the large
amount of redundant information that Netflix provides in its OData feed, or that Rotten
Tomatoes shares in its JSON data, and b) transform the extracted data into a format that
is easily transferable into the database.
Rotten Tomatoes offers a JSON API to their developer members to retrieve some of the
data they host about movie and rating information. We created a PHP script to collect
data using their JSON API that essentially does two things: a) retrieve and parse, if
available, each of the titles that Netflix carries, and b) insert the retrieved data into our
MySQL database. Unfortunately, Rotten Tomatoes limits the maximum number of API
calls to 10,000/day for each developer account. Since we wanted to download Rotten
Tomatoes data for as many as 80,000 titles that Netflix carries, we had to run the script
for a few days. Also, as mentioned before, Rotten Tomatoes carries only a subset of the
titles that Netflix carries.
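The effect of the daily API limit can be worked out with a short Python sketch. The two constants come from the text; the function names and the assumption of one API call per title are hypothetical. Under that assumption, looking up all 80,000 titles would take at least eight daily runs, so the sketch also shows how a list of title identifiers could be split into per-day batches.

```python
from math import ceil

DAILY_CALL_LIMIT = 10_000   # Rotten Tomatoes limit per developer account
TITLES_TO_FETCH = 80_000    # upper bound on Netflix titles, from the text


def days_needed(n_titles, daily_limit=DAILY_CALL_LIMIT, calls_per_title=1):
    """Minimum number of days to look up n_titles under the daily limit.

    calls_per_title=1 is an assumption; a title that needs a search call
    followed by a detail call would double the estimate.
    """
    return ceil(n_titles * calls_per_title / daily_limit)


def daily_batches(title_ids, daily_limit=DAILY_CALL_LIMIT):
    """Yield consecutive slices of title_ids, one slice per day."""
    for start in range(0, len(title_ids), daily_limit):
        yield title_ids[start:start + daily_limit]


estimated_days = days_needed(TITLES_TO_FETCH)
```

A collection script structured this way can record the last completed batch and resume with the next one on the following day.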
Figure 4.1: High level application components and data collection. Steps a-c are the data
collection routines on the servers that collect movie data from the Netflix and Rotten
Tomatoes web services. Dashed lines represent the communication path between the Android
client and the back-end.
4.1.2 Application/Web Server and Database
After we collect movie and rating data from Netflix and Rotten Tomatoes, we load them
into various MySQL tables. We store the data in our database server, instead of fetching
it from the web on demand, because we generate recommendations on the fly as the user
interacts with the software and keeps providing new ratings. The recommendation
algorithm needs at least one pass through the entire dataset, and it would be a
considerable performance bottleneck if we attempted to collect all the data from the
internet every time the user interacts with the software.
We designed the database tables (Figure 4.2 shows the entity-relationship diagram) with
performance in mind, since response delay is undesirable. For example, to avoid
expensive join operations while a user is interacting with the system, we kept as much
data as possible in a single table, title. We also optimize the tables by rebuilding
indexes after each data refresh from Netflix/Rotten Tomatoes. Note that instead of
incrementally updating our database with only new releases, we replace the existing data
with a new batch of data every two weeks. We do this because Netflix keeps changing its
genre structure, among other pieces of information. Therefore, it is safer to replace
rather than update existing data.
Figure 4.2: Entity-relationship diagram of the database tables used.
Our application/web server is the open source Apache Server with a PHP module. The
application server is the brain of the software and performs several operations, including:
• Parse the user requests (in JSON) originating from the client
• Recognize the user, using the Android ID
• Set the filtering criteria as directed by the target user
• Decide on the appropriate recommendation algorithm and perform the computation: if the
user has not yet provided enough ratings, the average of Netflix and Rotten Tomatoes
ratings, as described in section 3.2.1; otherwise, the CORRELATION-WEIGHTED
RECOMMENDATION algorithm, as described in section 3.2.2
• Sort the movies by recommended values and decide which set of movies to send
based on which page the user is on
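The request-handling steps listed above can be sketched in Python as follows. This is an illustration under several assumptions, not the PHP code that runs on the server: the JSON field names, the page size, and the dictionary layout of a title are hypothetical, and both ratings are assumed to be on the same scale already. The threshold of 15 ratings is the one mentioned in Section 4.2. The correlation-weighted algorithm of section 3.2.2 is not reproduced here; it is passed in as a callable.

```python
import json

MIN_RATINGS = 15  # ratings required before personalized values are shown (Section 4.2)
PAGE_SIZE = 10    # hypothetical number of titles per page


def average_score(title, user_ratings):
    """Score used before enough ratings exist: mean of the two site ratings."""
    return (title["netflix"] + title["rt"]) / 2.0


def handle_request(raw_json, titles, ratings_by_user, personalized_score=None):
    """Parse a client request and return one page of ranked title names as JSON."""
    request = json.loads(raw_json)

    # Recognize the user by Android ID and look up the ratings given so far.
    user_ratings = ratings_by_user.get(request["android_id"], {})

    # Keep only the titles that match every filter the user has set.
    filters = request.get("filters", {})
    candidates = [t for t in titles
                  if all(t.get(key) == value for key, value in filters.items())]

    # Use the personalized scorer only once enough ratings have been collected.
    if personalized_score is not None and len(user_ratings) >= MIN_RATINGS:
        score = personalized_score
    else:
        score = average_score

    # Sort by score, best first, and cut out the requested page.
    ranked = sorted(candidates, key=lambda t: score(t, user_ratings), reverse=True)
    start = request.get("page", 0) * PAGE_SIZE
    return json.dumps([t["name"] for t in ranked[start:start + PAGE_SIZE]])
```

Passing the personalized scorer as a parameter keeps the parsing, filtering, and paging logic independent of whichever recommendation algorithm is in use.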
4.2 Client or Front-end
So far we have implemented our smartphone client for the Android platform only. We
started with Android because, at the time of this writing, it has the largest
market share, at least in the US [17].
The client has three major components.
1. Listing movie information: The user can view as many pages of movie information as
she wants. At this stage she sees the average rating of each movie instead of values
recommended for her. The interface allows her to rate any movie shown on any page. Once
enough ratings (15 at the moment) are collected, recommended values are
shown for the movies listed afterwards. Figure 4.3 shows the main screen of the app, which
displays a list of titles. Clicking on a picture brings up the movie details.
2. Filtering: The user can filter the list of movies by selecting various options,
including genres, languages, year, and MPAA rating. Figure 4.4 shows a snapshot of this
capability.
3. Movie details: When the user clicks a movie image, a dialog box appears with detailed
information about that movie. It also displays the Rotten Tomatoes critics' consensus, if
available. Figure 4.5 shows an example of this dialog box.
Figure 4.3: The main user interface of RecFlix
Figure 4.4: Comprehensive set of options for filtering titles
Figure 4.5: Movie description dialog appears when a user clicks on a movie image.
4.3 User study
In this section we describe the limited user study we conducted to collect feedback
from users on the existing apps and on the app we have developed. In the future, when we
deploy the app on Android Market (now named Google Play), we will be able to collect
a much richer set of reviews of our app through user feedback.
4.3.1 Approach
Our app runs on the Android emulator that comes with the Android development kit.
Therefore, this is not a complete comparison with other Android apps that run on a
phone. However, we believe the emulator mimics the functionality of a real smartphone
quite closely, as far as the features of this app are concerned.
We recruited five users who use Android phones and are members of Netflix. All of these
subjects are heavy Netflix streaming users, either through smart TVs or through some
type of streaming device. Another caveat is that these subjects are colleagues, and
therefore this is not a randomized controlled experiment. Users gave
feedback in two steps. First we interviewed them about their current experience with
Netflix streaming in general, and about the apps that they use or have seen. Then they
interacted with our app and gave a second round of feedback. Note that we deleted the
ratings each subject provided once her session was over, so that the preferences of
different subjects would not mix: we identify a user by the Android ID, and since
everybody used our PC, all subjects shared one Android ID.
4.3.2 Result
In this section we provide two sets of survey results: one collected before the subjects
experienced our software, and another collected after they had interacted with RecFlix.
Figure 4.6 shows the two questions we asked them initially. The goal of these questions
was to verify our hypothesis that the Netflix streaming service is somewhat misunderstood
in terms of how many titles it carries, and to see where improvement is required.
Figure 4.6: Interview prior to RecFlix interaction: a) Rate the following aspects of Netflix
streaming as you experience it through TV/streaming device, b) How many titles do you believe
Netflix streaming has?
4.3.3 Performance Analysis
Figure 4.7 shows the average responses in the three areas we asked about after the
subjects had tried RecFlix and the official Netflix Android app. As the figure shows,
RecFlix is no worse than the mature Android Netflix app with respect to ease of use and
recommendation quality, and the subjects found it easier to discover titles on RecFlix.
Subjects also commented that the ability to see Rotten Tomatoes content, in addition to
the usual information provided by Netflix, was helpful.
Although not shown here, we also asked the participants what was lacking in
RecFlix. The most commonly requested feature was the ability to add
titles to their Netflix queue.
Figure 4.7: Post-interview questions. RecFlix is contrasted against the official Netflix app for
Android.
4.3.4 Discussion
The limited study described above suggests that the state of the art, with or
without Android apps, leaves room for improvement. Our app, although it needs further
refinement, is a step toward meeting some of that need.
Chapter 5
CONCLUSION
In this report we described the design of an Android app for Netflix streaming titles that
offers better retrieval and search of the available titles than similar apps currently
available. We have provided a simple recommendation algorithm that
avoids requirements common elsewhere, such as having to log in. Our
recommendation approach is also more tailored to the individual smartphone user, whereas
in most cases Netflix recommendations are generated for one account shared by multiple
members who tend to watch different types of movies. A limited trial of the software with
a few users shows the promise of our approach.
Much remains to be done. In the final version of this software, we plan to let
the user add any movie shown on the interface to her Netflix queue, so that the
movies she wants to watch are readily available on the streaming player. Of course, that
would require the user to log in to her Netflix account. Once a user logs in, we
can easily retrieve the movies she has rated on Netflix and use them for recommendations.
Other online sites offer rich movie information as well. Amazon.com, for example, has
numeric ratings and textual reviews on movie titles. In the future, we can combine this
information to make better recommendations. Once we have many users of our system,
however, we would be able to accumulate enough ratings from the users in our database
alone and use these ratings to make recommendations.
Learning the preferences of a user may need some guidance. In the current version, we
simply list movies ordered by average ratings, and the user can rate as many as she can
from this list. However, prior research [18] [19] tells us that this is not the best
approach to collecting preferences, since a) the user might not have seen many of the
titles and so has no opinion about them, and b) opinions on different titles do not reveal
a user's movie taste equally: a rating on a movie that has a wide rating distribution may
tell us more about the user than a rating on a movie that everybody likes.
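As a rough illustration of point (b), the following Python sketch orders titles by how widely their existing ratings are spread, so that the most divisive titles would be offered for rating first. This is not a method from this report: variance is only one possible measure of spread, the function name and data layout are hypothetical, and point (a), whether the user has seen the title at all, is not addressed.

```python
from statistics import pvariance


def rank_by_rating_spread(ratings_by_title):
    """Order titles from widest to narrowest rating distribution."""
    # Population variance of the ratings each title has received so far;
    # titles without any ratings are skipped because no spread can be computed.
    spread = {title: pvariance(ratings)
              for title, ratings in ratings_by_title.items()
              if ratings}
    return sorted(spread, key=spread.get, reverse=True)
```

A title rated 1, 5, 1, 5 would be listed ahead of one rated 5, 5, 4, 5, since agreeing with a near-unanimous opinion says little about a particular user.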
BIBLIOGRAPHY
[1] Netflix. [Online]. www.netflix.com
[2] Amazon. [Online]. http://www.amazon.com/
[3] Hulu. [Online]. http://www.hulu.com/
[4] Roku. [Online]. http://www.roku.com/roku-channel-store
[5] Rottentomatoes. [Online]. http://www.rottentomatoes.com/
[6] Appstorehq. [Online]. http://www.appstorehq.com/netflix-android-682775/app
[7] Pcmag. [Online]. http://www.pcmag.com/article2/0,2817,2385354,00.asp
[8] Quora. [Online]. http://www.quora.com/Do-people-find-Netflixs-recommendationsalgorithm-useful-in-practice
[9] G. Adomavicius and A. Tuzhilin, "Toward the Next Generation of Recommender Systems: A
Survey of the State-of-the-Art and Possible Extensions," IEEE Trans. Knowl. Data Eng.,
vol. 17, no. 6, pp. 734-749, 2005.
[10] J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl, "An algorithmic framework
for performing collaborative filtering," in Proceedings of the 22nd annual international
ACM SIGIR conference on Research and development in information retrieval, Berkeley,
California, United States, 1999, pp. 230-237.
[11] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, "Item-based collaborative filtering
recommendation algorithms," in Proceedings of the 10th international conference on World
Wide Web, Hong Kong, 2001.
[12] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 3rd ed.
[13] Open Data Protocol (OData). [Online]. http://www.odata.org/
[14] JSON (JavaScript Object Notation). [Online]. http://json.org/
[15] Netflix Developer Network. [Online]. http://developer.netflix.com/
[16] odata4j: An OData Toolkit for Java. [Online]. http://code.google.com/p/odata4j/
[17] B. Molen. (2012, Mar.) Engadget. [Online].
http://www.engadget.com/2012/03/07/comscore-us-subscriber-count-reaches-100-millionandroid-and-i/
[18] A. M. Rashid, I. Albert, D. Cosley, S. K. Lam, S. M. McNee, J. A. Konstan, and J. Riedl,
"Getting to know you: learning new user preferences in recommender systems," in IUI, 2002,
pp. 127-134.
[19] N. Golbandi, Y. Koren, and R. Lempel, "On bootstrapping recommender systems," in
Proceedings of the 19th ACM international conference on Information and knowledge
management, Toronto, ON, Canada, 2010, pp. 1805-1808.