Artificial Intelligence
729G11
An Initial Study of Recommender Systems
Hongzhan Hu
honhu753
850325-2333
An Initial Study on Recommender Systems
Abstract
Recommender systems have emerged and become increasingly popular over the past decade, particularly in
e-commerce applications. This article introduces and discusses the different types of recommender
systems, which use implicit and explicit data collection and various algorithms to yield
recommendations. The shortcomings of these systems are also discussed. Sequential decision
processing seems to be favored and more widely used because of its commercial benefits, and mobile
applications of recommender systems extend their reach among consumers even further.
Key Words: Recommender Systems, Sequential decision process, MDP-based recommendation
Background
We often make choices without sufficient personal experience, and we rely on other people’s
recommendations in our everyday life (Resnick P. et al 1997). The earliest systems were traditional
information-filtering and information-retrieval systems, which could not really recommend anything;
they merely returned results matching the request. The Tapestry experimental mail filtering system,
developed at the Xerox Palo Alto Research Center, was introduced by Goldberg D. et al (1992). Tapestry
was designed to support both content-based and collaborative filtering. The collaborative part entailed
people helping each other by explicitly recording their reactions or annotations to the documents they
had read. This was the first real recommender system. It is worth mentioning that the same research
center is also where some of the earliest personal computers were developed.
Recommender systems assist and augment the natural process of decision making (Resnick P. et al 1997).
Today they are widely used in internet services for e-commerce, entertainment, content consumption,
and the service industry, helping users find the items they want and boosting commercial benefits for
merchants and service providers (Ricci, F. et al. 2011). Amazon.com, YouTube, Facebook, eBay, Netflix,
iTunes, IMDb and Yelp are some of the big names that come to mind when recommender systems are
mentioned. In fact, recommender systems play such an important role that Netflix started a
competition to increase the accuracy of the recommendations its system could yield. In 2009, Netflix
awarded the winning team, BellKor’s Pragmatic Chaos, a million-dollar prize for the 10% increase in
accuracy the team achieved with its recommender system, which was really a combination of numerous
types of algorithms that compensate for each other’s disadvantages with their own advantages.1
A recommender system predicts how likely a particular user is to give a certain item a high rating,
based either on the characteristics of the item or on what other people with similar taste think about
the item. Correspondingly, the systems use a content-based approach or a collaborative filtering
approach. This helps people cope with information overload.
1 http://www.netflixprize.com//community/viewtopic.php?id=1537
Services provided by recommender systems
Recommender systems are widely used by online stores because they improve user convenience and bring
benefits to the store. Schafer J. B. et al (1999) claimed that recommender systems can enhance
e-commerce sales in three ways: they turn browsers into buyers, cross-sell items suggested at the
checkout page, and increase user loyalty by making a purchase only a few clicks away or by rewarding
frequent customers with good deals.
Herlocker et al. (2004) list eleven popular tasks that a recommender system can help to carry out:
- Find some good items: A featured list of ranked items that fit the user’s requirements.
Prisjakt.nu offers this function: when a text string is entered in the search field, a dynamic search
result shows a limited list of items that match the string.
- Find all good items: A list of all the items in the item database that satisfy all the criteria set
by the user. On prisjakt.nu, hitting the Enter key after entering the search string shows the entire
list of matching items.
- Annotation in context: A list of items recommended according to the current context and the user’s
long-term preferences. For example, a certain TV series on a certain channel can be recommended
according to the user’s long-term viewing habits.
- Recommend a sequence: A list containing the item that was searched for and some related items that
do not necessarily fit the search criteria but may be interesting to the user. A typical phrase would
be: “If you like this, you may also like that”.
- Recommend a bundle: A list of related items that work together to serve the user’s purpose better.
Typically, when you buy a camera, you may consider also buying a memory card and a pouch to
complete the purpose of the camera.
- Just browsing: For users who browse without a definite purpose, the recommender system’s task is
to help the user browse items that are likely to be interesting during that specific browsing
session.
- Find credible recommender: Some users are skeptical of the recommendations yielded by the
system. The recommender system’s task is then to allow the user to test the system’s behavior.
- Improve the profile: The system can take input from the user about his/her likes and dislikes, that
is, explicit preference information. Otherwise, the system can only treat a user with an empty profile
as an average user, with preferences averaged over the other users in the system.
- Express self: Some users care little about recommendations, but it is important for them to be able
to express their opinions and beliefs about certain items. A comment section is where the system can
take such input, and the satisfaction it creates can also serve as a motivation for purchasing the
related item.
- Help others: Certain users may be even more motivated to leave a full review or rating of an item
because they believe it benefits the community. This can be hugely helpful for other potential buyers
making up their minds.
- Influence others: Certain users may be intent on influencing others, trying to convince other users
to buy or not to buy a product. Even malicious users can fall into this category.
This long list of popular tasks is a classical reference within the field. It demonstrates the huge
variety of data and knowledge that a recommender system may need to deal with; consequently, different
recommendation approaches are needed for different scenarios. This list of tasks has also been used to
evaluate recommender systems and algorithms.
Modern service providers deploy recommender systems for the following reasons: they can boost sales of
certain items, promote unpopular items that may be of high interest to the relevant customers,
increase user satisfaction, help the provider understand users’ desires better, and retain loyal
customers (Ricci, F. et al. 2011).
Data inputs and outputs in recommender systems
The input to a recommender system depends heavily on the filtering algorithm deployed (Vozalis E.
and Margaritis, K.G., 2003).
The inputs for content-based recommender systems fall into one of these categories:
1. Ratings reflect the opinions of users on the items. Most commonly, a rating is either a position on
a scale or a binary value, that is, 1 or 0, yes or no.
2. Demographic data refer to information about the users, such as gender, education, address, and a
list of personal preferences. These data require user input and can be hard to obtain unless certain
incentives are given.
3. Content data come from a textual analysis of the document that describes the item’s physical
dimensions, components, functionalities, etc. These features can be used as input to filtering
algorithms that match the item to relevant users’ profiles and eventually provide recommendations.
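
To make the three input categories concrete, the sketch below models them as plain Python records.
The class and field names (`Rating`, `Demographics`, `ItemContent`) are illustrative assumptions and
are not taken from Vozalis and Margaritis (2003).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rating:
    """A user's opinion of an item: a point on a scale or a binary value."""
    user_id: str
    item_id: str
    value: float  # e.g. 1-5 on a scale, or 0/1 for a binary rating


@dataclass
class Demographics:
    """User-supplied profile data; every field may be missing."""
    user_id: str
    gender: Optional[str] = None
    education: Optional[str] = None
    address: Optional[str] = None
    preferences: list[str] = field(default_factory=list)


@dataclass
class ItemContent:
    """Features extracted from the document that describes an item."""
    item_id: str
    features: dict[str, str] = field(default_factory=dict)


def is_binary(rating: Rating) -> bool:
    """Return True when the rating is a yes/no (1 or 0) value."""
    return rating.value in (0, 1)
```

The optional fields in `Demographics` reflect the point made above that such data are hard to obtain
and are often simply missing.
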
Data can be collected either implicitly or explicitly. Explicit data collection entails a gathering
process that requires manual input. Users need to type in their demographic data on the profile page
during registration, set ratings, write a comment about an item they have purchased or experienced,
rank a list of items, create a list of favorite items, or choose between two options. 2
Implicit data collection entails a gathering process that does not require manual input. Implicit data
include observed user behavior and derived user preferences that are gathered during the user’s
interaction with the service interface. Each human-computer interaction is recorded as a transaction.
A transaction can be, for instance, one view, one track play, or one purchase. This transaction
information, together with the analysis of the user’s historical transactions, can be used as input to
collaborative filtering algorithms. In simpler terms, implicit data collection can include:
- The user’s item viewing pattern
- Item viewing times
- A purchase history
- A wish list
- Similar likes and dislikes in the user’s social network
2 http://en.wikipedia.org/wiki/Recommender_system
Implicit data collection tends to be preferred, since most users would rather not hand over personal
data to commercial services on the web, or simply prefer not to bother typing it in. These types of
data are used in collaborative filtering recommender systems.
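
As an illustration of how recorded transactions could become input for a filtering algorithm, the
sketch below sums weighted events per user and item. The event names and the weights in
`EVENT_WEIGHTS` are hypothetical choices made for this example; the sources cited here do not
prescribe them.

```python
from collections import defaultdict

# Hypothetical weights: a purchase signals more interest than a view.
EVENT_WEIGHTS = {"view": 1.0, "wishlist": 2.0, "purchase": 5.0}


def implicit_scores(transactions):
    """Aggregate (user, item, event) transactions into preference scores.

    Returns a nested dict: scores[user][item] -> accumulated weight.
    Unknown event types are ignored.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for user, item, event in transactions:
        weight = EVENT_WEIGHTS.get(event)
        if weight is not None:
            scores[user][item] += weight
    return scores


log = [
    ("anna", "camera", "view"),
    ("anna", "camera", "view"),
    ("anna", "camera", "purchase"),
    ("anna", "tripod", "wishlist"),
    ("ben", "camera", "view"),
]
scores = implicit_scores(log)
print(dict(scores["anna"]))  # {'camera': 7.0, 'tripod': 2.0}
```

No manual input from the user is needed at any point; the scores are derived entirely from the
transaction log.
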
Vozalis E. et al. (2003) claim that the output of a recommender system can be either a prediction or a
recommendation.
- A prediction expresses the anticipated opinion of the active user about a given item. A scale from
1 to 5, for example, can be used as the reference for the anticipated rating. This type of system
output is also known as individual scoring.
- A recommendation is a list of N items that the target user is expected to like. According to
Vozalis E. et al. (2003), this list normally contains only items that the user has not yet purchased,
viewed or rated. This variant of system output is also known as Top-N recommendation or ranked scoring.
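
The two output types can be sketched as follows, assuming that predicted ratings are already
available from some filtering algorithm. The dictionary layout and the sample values are
hypothetical.

```python
def predict(predicted_ratings, user, item):
    """Prediction output: the anticipated rating of one item for one user."""
    return predicted_ratings[user][item]


def top_n(predicted_ratings, seen, user, n):
    """Recommendation output: the N unseen items with the highest predictions."""
    candidates = {
        item: score
        for item, score in predicted_ratings[user].items()
        if item not in seen.get(user, set())
    }
    ranked = sorted(candidates, key=lambda item: candidates[item], reverse=True)
    return ranked[:n]


# Hypothetical predicted ratings on a 1-5 scale.
predicted_ratings = {"anna": {"camera": 4.8, "tripod": 4.1, "pouch": 3.2, "lens": 4.5}}
seen = {"anna": {"camera"}}  # already purchased, viewed or rated

print(predict(predicted_ratings, "anna", "lens"))  # 4.5
print(top_n(predicted_ratings, seen, "anna", 2))   # ['lens', 'tripod']
```

The camera has the highest predicted rating but is left out of the Top-N list because the user
already owns it.
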
Schools of recommendation approaches
Content-based recommender systems (CBRS)
A content-based recommender system assumes that each user operates independently, and
each item is represented by its features or characteristics. Each user’s profile is
created or updated according to the user’s feedback on the desirability of the items
presented in a list of recommendations. A filtering component, which takes the user
profile and the item features into account, generates the list of recommendations. Each
time an active user gives feedback such as a rating, the system learns the user’s
preferences, translates them into desired features, and updates the user’s preference
profile for later generation of up-to-date recommendations. A detailed general
architecture of a CBRS is shown in Figure 1 by Semeraro G. (2010).
As for the advantages, a CBRS offers total user independence: it relies solely on the
active user’s own feedback to build that user’s profile, and not on data about other
users as a collaborative filtering recommender system (CFRS) does. A CBRS also has
excellent transparency. It can, in fact, list all the content features that led to a
recommendation in order to explain it. Amazon.com often recommends a list of items on
the shopping cart page based on what is in the cart and on past shopping history. If you
have bought a camera, most of the recommended items are accessories for that specific
camera. There is also no new-item problem when a CBRS is deployed, since the system
extracts the item’s content features from its data sheet, so the item can be
recommended even though no user has ever rated it.
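
The profile-learning loop and the transparency property described above can be sketched as
follows. This is a minimal illustration under stated assumptions, not the architecture of
Semeraro G. (2010): items are reduced to sets of feature keywords, ratings are on a 1-5
scale with 3 taken as neutral, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def update_profile(profile, item_features, rating, neutral=3.0):
    """Shift feature weights up for liked items and down for disliked ones.

    `rating` is on a 1-5 scale; `neutral` is the assumed midpoint.
    """
    for feature in item_features:
        profile[feature] += rating - neutral
    return profile


def score(profile, item_features):
    """Cosine similarity between the profile and a binary item feature set."""
    dot = sum(profile.get(f, 0.0) for f in item_features)
    profile_norm = math.sqrt(sum(w * w for w in profile.values()))
    item_norm = math.sqrt(len(item_features))
    if profile_norm == 0 or item_norm == 0:
        return 0.0
    return dot / (profile_norm * item_norm)


def explain(profile, item_features):
    """List the item features that contributed positively to the score."""
    return sorted(f for f in item_features if profile.get(f, 0.0) > 0)


profile = defaultdict(float)
update_profile(profile, {"camera", "canon", "dslr"}, rating=5)
update_profile(profile, {"camera", "compact"}, rating=2)

new_item = {"lens", "canon", "dslr"}  # an item that nobody has rated yet
print(round(score(profile, new_item), 2))
print(explain(profile, new_item))     # ['canon', 'dslr']
```

The new item receives a score purely from its features, without any ratings from other
users, and `explain` returns the features that produced the match.
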
Figure 1 (Semeraro G. 2010)
As for the shortcomings, a CBRS relies heavily on the quality of the items’ content
features. The content data do not always contain enough information or meaningful
features to distinguish the items the user likes from the items the user does not like.
Keywords are naturally used by humans to relate to an item, but they are unfortunately
not suitable for representing content because of polysemy, synonymy, and multi-word
concepts (Eco, U. 2007). Simply put, a recommendation may be far off topic if the
keyword the system tries to match is ambiguous, or has synonyms or abbreviations.
Apple, for instance, can be both a computer company and a fruit. A CBRS can be
overspecialized too. The recommended items may always come from the same categories,
which means there won’t be any surprises. The recommended items can also be too
obvious, and therefore useless, for users who already know a lot about one category of
items. This is the serendipity problem (McNee S.M. et al 2006). You might start by
watching one cat video on YouTube and end up spending hours on cat videos just because
the suggested video list keeps suggesting videos in the same category. This is known as
the homophily trap (Zuckerman E. 2008). A CBRS is also known to be incapable of
generating accurate predictions for complex items such as movies and music tracks, which
are not easily described with content information; this is where a CFRS has an
advantage.
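
The keyword problem can be made concrete with a small matching example. The sketch below
assumes, purely for illustration, that profiles and items are bare keyword sets compared
by Jaccard overlap; the item titles and keywords are invented.

```python
def keyword_overlap(profile_keywords, item_keywords):
    """Jaccard overlap between two keyword sets (plain string matching)."""
    a, b = set(profile_keywords), set(item_keywords)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# The user is interested in the computer company, not the fruit.
profile = {"apple", "laptop", "keyboard"}

items = {
    "MacBook review": {"apple", "laptop", "review"},
    "Apple pie recipe": {"apple", "pie", "recipe"},
    "Notebook computer guide": {"notebook", "computer", "guide"},
}

for title, keywords in items.items():
    print(title, keyword_overlap(profile, keywords))
```

The pie recipe receives a non-zero score only because of the ambiguous keyword "apple"
(polysemy), while the notebook guide scores zero although it is relevant, because
"notebook" and "laptop" are different strings (synonymy).
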
Collaborative filtering recommender systems (CFRS)
Whereas recommendations made by a CBRS are based on the individual’s own preferences,
a CFRS gives recommendations based on the preferences of other, similar users. This
works well with items that are hard to describe, such as movies and music, which a CBRS
has difficulty recommending. This is the major advantage of CFRS and the reason it is
more popular. A CFRS makes similarity decisions, and the cosine of the angle between
vectors or Pearson’s correlation is mainly used for clustering users and items (Chen
A.Y. and McLeod D. 2007).
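
A minimal sketch of the two similarity measures named above, assuming that users are
represented as equal-length rating vectors over the same items. The sample ratings are
invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson(u, v):
    """Pearson correlation: the cosine of the mean-centred vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])


# Ratings of the same four items on a 1-5 scale.
anna = [5, 4, 1, 2]
ben = [4, 5, 2, 1]
cara = [1, 2, 5, 4]

print(round(cosine(anna, ben), 2), round(pearson(anna, ben), 2))
print(round(cosine(anna, cara), 2), round(pearson(anna, cara), 2))
```

For `anna` and `cara`, whose tastes are opposite, the cosine is still clearly positive
(roughly 0.57) because all ratings are positive numbers, whereas Pearson’s correlation is
close to -1. This is one reason mean-centring is often applied to rating data.
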
Typically, a CFRS starts by using the cosine measure to determine the similarity between
each pair of items a customer has purchased, and creates a matrix of item-item
relationships. A combination of a genetic algorithm and a Naïve Bayes classifier is
used for defining item-user relationships (Ko et al., 2001). The genetic algorithm or a
nearest-neighbor algorithm forms clusters of system users with similar taste, and the
Naïve Bayes classifier defines the associations between the items. The recommender
system then makes similarity decisions by matching the clusters of users with the
clusters of items, and user profiles are formed with association rules. User-user
relationships are described with hierarchical models in the form of an index of
categories (Jung et al., 2001). In the index, or tree structure, each node contains a
cluster of users with similar preferences. A recommendation is made based on all the
correlations between users, between items, and between users and items in the CFRS
(Chen A.Y. and McLeod D. 2007).
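
The first step described above, building an item-item similarity matrix with the cosine
measure, can be sketched as follows. The sketch covers only that step plus a simple
scoring rule that is an assumption of this example; it does not implement the genetic
algorithm, the Naïve Bayes classifier or the hierarchical index. Purchases are treated as
binary, and the data are invented.

```python
import math


def item_vectors(purchases):
    """Turn {user: set(items)} into {item: set(users)} (binary item vectors)."""
    vectors = {}
    for user, items in purchases.items():
        for item in items:
            vectors.setdefault(item, set()).add(user)
    return vectors


def item_similarity(purchases):
    """Cosine similarity for every pair of items, from co-purchase counts."""
    vectors = item_vectors(purchases)
    sim = {}
    for a in vectors:
        for b in vectors:
            if a != b:
                common = len(vectors[a] & vectors[b])
                sim[(a, b)] = common / math.sqrt(len(vectors[a]) * len(vectors[b]))
    return sim


def recommend(purchases, user, n=3):
    """Rank items the user has not bought by summed similarity to owned items."""
    sim = item_similarity(purchases)
    owned = purchases[user]
    all_items = {item for items in purchases.values() for item in items}
    scores = {
        cand: sum(sim[(cand, item)] for item in owned)
        for cand in all_items - owned
    }
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return [c for c in ranked if scores[c] > 0][:n]


purchases = {
    "anna": {"camera", "memory card"},
    "ben": {"camera", "memory card", "pouch"},
    "cara": {"camera", "pouch"},
    "dan": {"novel"},
    "eve": {"camera"},
}
print(recommend(purchases, "eve"))  # ['memory card', 'pouch']
```

The novel is never recommended to `eve` because no customer bought it together with a
camera, so its similarity to everything she owns is zero.
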
As for the shortcomings, a CFRS may suffer from privacy concerns, the cold start problem,
the scalability problem, the gray sheep problem, update frequency problems, etc.
A CFRS monitors the behavior patterns of its users. Although this is done in order to
make recommendations, some users may be uncomfortable with having their browsing
histories recorded, and may use the system as unregistered users or disable the cookies
that the CFRS relies on in the browser. On the one hand, such a user cannot fully
utilize the recommender system. On the other hand, the system may not gather enough
observational information about users’ behavior patterns to provide accurate
predictions.
The cold start problem is frequently faced when the system has a new item or a new user.
A new item may not yet be rated or labeled. Such items are easily left out of the
recommendation process because they lack associations with other well-rated and labeled
items. In the case of a new user, the system may give inaccurate recommendations, once
again because of a lack of associations.
If the number of user profiles is small, or the user has unusual taste (such a user is
referred to as a gray sheep), system accuracy can be poor because similarity decisions
cannot be made. Users’ tastes are also not constant, and when a user’s taste changes,
the user’s profile in the system needs to change too. The problem is how often the
system should update its users’ profiles. Too many updates may require enormous
computational power, while too few updates fail to maintain recommendation accuracy.
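
The gray sheep and new-user cases can be illustrated with a small neighbor search. This is
a sketch under assumptions: similarity is Pearson’s correlation over co-rated items, and
the threshold `min_similarity` and the sample ratings are invented for the example.

```python
import math


def pearson_on_common(ratings_a, ratings_b):
    """Pearson correlation over the items both users rated; None if undefined."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return None
    a = [ratings_a[i] for i in common]
    b = [ratings_b[i] for i in common]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return num / den if den else None


def neighbors(ratings, user, min_similarity=0.5):
    """Users whose taste correlates with `user` at or above the threshold."""
    found = {}
    for other in ratings:
        if other == user:
            continue
        sim = pearson_on_common(ratings[user], ratings[other])
        if sim is not None and sim >= min_similarity:
            found[other] = sim
    return found


ratings = {
    "anna": {"m1": 5, "m2": 4, "m3": 1},
    "ben": {"m1": 4, "m2": 5, "m3": 2},
    "gus": {"m1": 1, "m2": 2, "m3": 5},  # unusual taste in this community
    "new": {},                           # new user with no ratings yet
}

print(list(neighbors(ratings, "anna")))  # ['ben']
print(neighbors(ratings, "gus"))         # {}
print(neighbors(ratings, "new"))         # {}
```

Here `anna` finds `ben` as a neighbor, while `gus` and the user `new` end up with an empty
neighborhood, so no collaborative recommendation can be derived for them.
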
Hybrid recommender systems
In some newer recommender systems, content-based filtering is used alongside
collaborative filtering to compensate for the latter’s shortcomings and to increase the
system’s accuracy with respect to user preferences. A system that utilizes both
techniques is called a hybrid recommender system (Balabanovic & Shoham, 1997). A hybrid
recommender system solves the problem of covering extreme cases that a pure CFRS cannot
handle (Chen A.Y. and McLeod D. 2007).
“In many ways, collaborative and content-based approaches provide
complementary capabilities. Collaborative methods are best at recommending
reasonably well-known items to users in communities of similar tastes when
sufficient user data is available but effective content information is not.
Content-based methods are best at recommending unpopular items to users with unique
tastes when sufficient other data is unavailable but effective content information is
easy to obtain.” (Mooney 2000)
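
One simple way to combine the two approaches along the lines of the quotation above is a
weighted blend with a fallback. The sketch below is an assumption made for illustration
and is not the method of Balabanovic & Shoham (1997) or Mooney (2000); the function name
`hybrid_score`, the threshold `min_ratings` and the weight `cf_weight` are hypothetical.

```python
def hybrid_score(cf_score, cb_score, n_ratings, min_ratings=5, cf_weight=0.5):
    """Blend collaborative and content-based scores for one user-item pair.

    cf_score: collaborative filtering score, or None when unavailable.
    cb_score: content-based score, or None when unavailable.
    n_ratings: number of ratings the item has received so far.
    """
    cf_usable = cf_score is not None and n_ratings >= min_ratings
    if cf_usable and cb_score is not None:
        return cf_weight * cf_score + (1 - cf_weight) * cb_score
    if cf_usable:
        return cf_score
    return cb_score  # may be None when neither source has information


# Well-known item with good content data: both sources contribute.
print(hybrid_score(cf_score=4.0, cb_score=3.0, n_ratings=120))   # 3.5
# New or unpopular item: too few ratings, fall back to content.
print(hybrid_score(cf_score=None, cb_score=3.0, n_ratings=0))    # 3.0
# Item that is hard to describe: no content score, rely on other users.
print(hybrid_score(cf_score=4.0, cb_score=None, n_ratings=120))  # 4.0
```

Each approach covers the case in which the other has too little information to work with.
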
Conclusion and development orientation
The two basic schools of recommendation approaches each have their advantages and
disadvantages. Most of the limitations of each approach can be compensated for by the
other. Much has been said about recommendation accuracy, but people often experience
the homophily trap with current recommender systems: all the users in one cluster get
the same recommendations, and they never get out of that circle of items. A good
recommender system should be able to provide positive surprises from time to time, and
also provide alternative recommendations to relieve users’ fatigue at seeing the same
items in the recommendation list.
Future recommender systems should be dynamic, and their profiles should be updatable in
real time. This, together with the synchronization of various profiles, implies a need
for a huge amount of computational power, network bandwidth, etc. Current algorithms
and techniques all have relatively high memory and computational complexity, which
leads to long processing times and data latency. Therefore, new algorithms and
techniques that can reduce memory and computational complexity, and eventually
eliminate synchronization problems, will be one of the directions of development.
References
Chen, A.Y. and McLeod, D. (2006) Collaborative Filtering for Information Recommendation Systems.
Eco, U. (2007) Sator arepo eccetera. Bompiani (in Italian).
Goldberg, D.; Nichols, D.; Oki, B. M. and Terry, D. (1992) Using collaborative filtering to weave an
information tapestry. Communications of the ACM 35(12), 61-70.
Herlocker, J.L.; Konstan, J.A.; Terveen, L.G. and Riedl, J.T. (2004) Evaluating collaborative filtering
recommender systems. ACM Transactions on Information Systems 22(1), 5-53.
Jung, J. J.; Yoon, J. S. and Jo, G. S. (2001) Collaborative information filtering by using categorized
bookmarks on the web. Web Knowledge Management and Decision Support. 14th International
Conference on Applications of Prolog. Revised papers (Lecture Notes in Artificial Intelligence
Vol. 2543). Berlin, Germany: Springer-Verlag, x+305, 237-50.
Ko, S. J. and Lee, J. H. (2001) Discovery of user preference through genetic algorithm and Bayesian
categorization for recommendation. Conceptual Modeling for New Information Systems Technologies,
ER 2001 Workshops, HUMACS, DASWIS, ECOMO, and DAMA (Lecture Notes in Computer Science
Vol. 2465). Berlin, Germany: Springer-Verlag, xvii+500, 471-84.
McNee, S.M.; Riedl, J. and Konstan, J. (2006) Being accurate is not enough: How accuracy metrics have
hurt recommender systems. In Extended Abstracts of the 2006 ACM Conference on Human Factors in
Computing Systems, pages 1-5, Canada.
Mooney, R. J. (2000) Content-based book recommending using learning for text categorization.
International Conference on Digital Libraries, Proceedings of the fifth ACM conference on Digital
libraries, San Antonio, Texas, USA, pages 195-204.
Schafer, J. B.; Konstan, J. and Riedl, J. (1999) Recommender systems in e-commerce. EC '99
Proceedings of the 1st ACM conference on Electronic commerce.
Vozalis, E. and Margaritis, K.G. (2003) Analysis of Recommender Systems Algorithms, presented at
HERCMA, Athens, Greece, 2003, pp. 732-745.
Zuckerman, E. (2008) Homophily, serendipity, xenophilia,
http://www.ethanzuckerman.com/blog/2008/04/25/homophily-serendipity-xenophilia/
Figures
Figure 1, Semeraro, G. and the SWAP group (2010). Content-based recommender systems problems,
challenges and research directions. [ONLINE] Available at: http://ls13-www.cs.unidortmund.de/homepage/ITWP2010/slides/semeraro.pdf. [Last Accessed 2012-11-16].