Artificial Intelligence 729G11 An Initial Study of Recommender Systems Hongzhan Hu honhu753 850325-2333

An Initial Study on Recommender Systems

Abstract

Recommender systems emerged and became increasingly popular over the past decade through their application in E-commerce. This article introduces and discusses the different types of recommender systems, which use implicit and explicit data collection and various algorithms to yield recommendations. The shortcomings of these systems are also discussed. Sequential decision processing seems to be favored and more widely used because of its commercial benefits, and mobile applications of recommender systems spread their use even further and deeper among consumers.

Key Words: Recommender Systems, Sequential decision process, MDP-based recommendation

Background

We often make choices without sufficient personal experience, and we rely on other people’s recommendations in our everyday life. (Resnick P. et al 1997) The earliest systems were traditional information-filtering and information-retrieval systems, which could not really recommend anything; they merely returned results matching the requests. The experimental mail-filtering system Tapestry, developed at the Xerox Palo Alto Research Center, was introduced by Goldberg D. et al (1992). Tapestry was designed to handle both content-based and collaborative filtering. The collaborative part entailed people helping each other by explicitly recording their reactions or annotations to documents they had read. This was the first real recommender system. It is worth mentioning that the same research center also pioneered early personal computers.

Recommender systems assist and augment the natural process of decision making. (Resnick P.
et al 1997) Recommender systems are now widely used by internet services in E-commerce, entertainment, content consumption, and the service industry, helping users find the items they want and boosting commercial benefits for merchants and service providers. (Ricci, F. et al. 2011) Amazon.com, YouTube, Facebook, eBay, Netflix, iTunes, IMDb and Yelp are some of the big names that come to mind when recommender systems are mentioned. In fact, recommender systems play such an important role that Netflix started a competition to increase the accuracy of the recommendations its system could yield. In 2009, Netflix awarded the winning team, BellKor’s Pragmatic Chaos, a million-dollar prize for the 10% increase in accuracy achieved by the team’s recommender system, which was really a combination of numerous types of algorithms that compensate for each other’s disadvantages with their advantages.1

A recommender system predicts how likely a particular user is to give a certain item a high rating, according to either the characteristics of the item or what other people with similar taste think about the item. Correspondingly, the systems use a content-based approach or a collaborative filtering approach. This helps people cope with information overload.

1 http://www.netflixprize.com//community/viewtopic.php?id=1537

Services provided by recommender systems

Recommender systems are widely used by online stores because they improve user convenience and store benefits. Schafer J. B. et al (1999) claimed that recommender systems can enhance E-commerce sales in three ways: they turn browsers into buyers, cross-sell items that are suggested at the checkout page, and increase users’ loyalty by making a purchase only a few clicks away or by rewarding frequent customers with good deals and the like. Herlocker et al.
(2004) list eleven popular tasks that a recommender system can help to carry out:

- Find some good items: A ranked list of featured items that fit the user’s requirements. Prisjakt.nu offers this function: when a text string is entered in the search field, the dynamic search result shows a limited list of items that match the search string.
- Find all good items: A list of all the items in the item database that satisfy all the criteria the user has set. On prisjakt.nu, hitting the Enter key after entering the search string shows the entire list of items matching the search string.
- Annotation in context: A list of items that are recommended according to the current context and the long-term user preference. A certain TV series on a certain channel can be recommended according to the user’s long-term viewing habits.
- Recommend a sequence: A list containing the item that was searched for and some related items that do not necessarily fit the search criteria but may be interesting to the user. A typical phrase would be: “If you like this, you may also like that”.
- Recommend a bundle: A list of related items that work together to serve the user’s purpose better. Typically, when you buy a camera, you may consider buying a memory card and a pouch to complete the purpose of the camera.
- Just browsing: For users who browse without a definite purpose, the recommender system’s task is to help the user browse items that are interesting to the user in that specific browsing session.
- Find credible recommender: Some users are skeptical of the recommendations yielded by the system. The recommender system’s task is then to allow the user to test the system’s behavior.
- Improve the profile: The system can take inputs from the user about his/her likes and dislikes, that is, explicit preference information.
Otherwise the system can only treat a user with empty preferences as an average user, with the average preferences of the other users in the system.
- Express self: Some users care little about recommendations, but it is important for them to be able to express their opinions and beliefs about certain items. A comment section is where the system can take such inputs, and the satisfaction it creates can also serve as a motivation for purchasing the related item.
- Help others: Certain users may be even more motivated to leave a full review or rating of an item because they believe it benefits the community. Such reviews can be hugely helpful for other potential buyers making up their minds.
- Influence others: Certain users aim explicitly to be influential, trying to convince other users to buy or not to buy the product. Even malicious users can fall into this category.

This long list of popular tasks that a recommender system can encounter is a classical reference within the field. It demonstrates the wide variety of data and knowledge that a recommender system may need to deal with, and consequently the various recommendation approaches needed for different scenarios. This list of tasks has actually been used to evaluate recommender systems and algorithms.

Modern service providers deploy recommender systems for the following reasons: they can boost sales of certain items, promote unpopular items that may be of great interest to the relevant customers, increase user satisfaction, help the provider understand the users’ desires better, and retain loyal customers. (Ricci, F. et al. 2011)

Data inputs and outputs in recommender systems

The input in a recommender system depends heavily on the filtering algorithm deployed. (Vozalis E. and Margaritis, K.,G., 2003) The inputs for content-based recommender systems fall into one of these categories:

1.
Ratings reflect the opinions of users on the items. Most commonly, ratings are either a position on a scale or a binary value, that is, 1 or 0, yes or no.
2. Demographic data refer to information about the users, such as gender, education, address, and a list of personal preferences. They require user input and can be hard to obtain unless certain incentives are given.
3. Content data come from textual analysis of the documents that describe the item’s physical dimensions, components, functionalities, etc. This feature information can be used as input to filtering algorithms that match the item to relevant users’ profiles and eventually provide recommendations.

The collection of data can happen either implicitly or explicitly. Explicit data collection entails a gathering process that requires manual input. The user needs to type in his/her demographic data on the profile page during registration, set ratings, write a comment about an item he/she has purchased or experienced, rank a list of items, create a list of favorite items, or choose between two options.2

Implicit data collection entails a gathering process that does not require manual input. Implicit data include observed user behavior and derived user preferences that are gathered during the user’s interaction with the service interface. Each human-computer interaction is recorded as a transaction. A transaction can be, for instance, one view, one track play, or one purchase. This transaction information and the analysis of the user’s historical transaction information can be used as input to collaborative filtering algorithms.
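As a minimal sketch of how such recorded transactions might be turned into input for a filtering algorithm, the following Python snippet aggregates a transaction log into per-user implicit preference scores. The event names, their weights, and the sample log are illustrative assumptions and are not taken from any of the cited systems.

```python
from collections import defaultdict

# Hypothetical weights: how strongly each kind of transaction
# is assumed to signal interest in an item.
EVENT_WEIGHTS = {"view": 1.0, "play": 2.0, "purchase": 5.0}


def implicit_scores(transactions):
    """Aggregate (user, item, event) transactions into user -> item -> score."""
    scores = defaultdict(lambda: defaultdict(float))
    for user, item, event in transactions:
        # Unknown event types contribute nothing.
        scores[user][item] += EVENT_WEIGHTS.get(event, 0.0)
    # Convert the nested defaultdicts to plain dicts for a stable result.
    return {user: dict(items) for user, items in scores.items()}


# Invented sample log of recorded interactions.
log = [
    ("anna", "camera", "view"),
    ("anna", "camera", "purchase"),
    ("anna", "pouch", "view"),
    ("ben", "song_1", "play"),
    ("ben", "song_1", "play"),
]
print(implicit_scores(log))
```

The resulting user-item scores play the same role as explicit ratings, but they are obtained without asking the user to type anything in.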
In simpler terms, implicit data collection can include:

- The user’s item viewing patterns
- Item viewing times
- A purchase history
- A wish list
- Similar likes and dislikes in the user’s social network

Implicit data collection tends to be preferred, since most users prefer not to have the trouble of entering personal data or surrendering it to commercial services on the web, or simply prefer not to bother typing it in. These types of data are used in collaborative filtering recommender systems.

2 http://en.wikipedia.org/wiki/Recommender_system

Vozalis E. et al. (2003) claim that the output of a recommender system can be either a prediction or a recommendation.

- A prediction expresses how much the active user is anticipated to like a given item. A scale from 1 to 5, for example, can be used as a reference for the anticipated level. This type of system output is also known as individual scoring.
- A recommendation is a list of N items that the target user is expected to like. According to Vozalis E. et al. (2003), this list normally contains only items that the user has not purchased, viewed or rated. This variant of system output is also known as Top-N recommendation or ranked scoring.

Schools of recommendation approaches

Content-based recommender systems (CBRS)

A content-based recommender system assumes that each user operates independently, and each item is represented by its features or characteristics. Each user’s profile is created or updated according to the user’s feedback on the desirability of the items that were presented in a list of recommendations. The filtering component, which takes account of the user profile and the item features, generates the list of recommendations.
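The matching of a user profile against item features, and the Top-N output defined earlier, can be sketched as follows. This is only an illustrative simplification: items are assumed to be described by sets of features, the user profile is a rating-weighted sum over the features of rated items, and candidates are ranked by how well their features match the profile. The catalogue, the feature names, and the mid-scale value 3 used to center the ratings are all hypothetical.

```python
# Hypothetical catalogue: each item is described by a set of content features.
ITEMS = {
    "dslr_camera": {"camera", "photo", "electronics"},
    "memory_card": {"photo", "storage", "electronics"},
    "camera_pouch": {"camera", "photo", "accessory"},
    "cookbook": {"book", "food"},
}


def build_profile(ratings, items, midpoint=3.0):
    """Sum feature weights over rated items; ratings above the
    midpoint add to a feature's weight, ratings below subtract."""
    profile = {}
    for item, rating in ratings.items():
        for feature in items[item]:
            profile[feature] = profile.get(feature, 0.0) + (rating - midpoint)
    return profile


def top_n(ratings, items, n=2):
    """Rank items the user has not rated yet by profile/feature match."""
    profile = build_profile(ratings, items)
    scored = [
        (sum(profile.get(f, 0.0) for f in features), name)
        for name, features in items.items()
        if name not in ratings  # Top-N lists exclude already rated items
    ]
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:n]]


# The user liked the camera (5 on a 1-5 scale) and disliked the cookbook (1).
print(top_n({"dslr_camera": 5, "cookbook": 1}, ITEMS))
```

Because the score is a plain sum over matching features, the features that produced a recommendation can be listed directly, which is one way to read the transparency that content-based systems are credited with.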
Each time an active user gives feedback such as a rating, the system learns the user’s preference, translates it into desired features, and updates the user’s preference profile for later generation of the most current recommendations. A detailed general architecture of CBRS is shown in Figure 1 by Semeraro G. (2010).

As for the advantages, CBRS has total user independence: it relies solely on active individual users to build their own profiles and does not rely on data about other users, as CFRS does. CBRS also has excellent transparency. It can, in fact, list all the content features that led to a recommendation as an explanation. Amazon.com often recommends a list of items on the shopping cart page based on what is in the cart and on past shopping history. If you have bought a camera, most of the recommended items are accessories for that specific camera. There is no problem with new items when CBRS is deployed, since the system extracts item content features from the data sheets, so an item can be recommended even though no user has ever rated it.

Figure 1 (Semeraro G. 2010)

As for the shortcomings, CBRS relies heavily on the quality of the content features of the items. The content data do not always contain enough information or meaningful features to distinguish the items the user likes from the items the user does not like and to match the user’s interests. Keywords are naturally used by humans to relate to an item, but they are unfortunately not suitable for representing content because of polysemy, synonymy, and multi-word concepts. (Eco,U. 2007) Simply put, a recommendation may be far off topic if the keyword the system tries to match is ambiguous, or is a synonym of or abbreviation for something else. Apple can be both a computer company and a fruit.
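The ambiguity problem can be illustrated with a naive keyword matcher. In this hypothetical sketch, a user profile and some item descriptions are reduced to keyword sets and compared with the Jaccard overlap. The keywords and item names are invented for illustration, and Jaccard overlap is just one simple matching measure chosen here.

```python
def jaccard(a, b):
    """Overlap of two keyword sets: size of intersection over size of union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Invented keyword profile of a user who reads about the computer company.
user_keywords = {"apple", "laptop", "review"}

# Invented item descriptions, already reduced to keywords.
items = {
    "laptop_test": {"apple", "laptop", "benchmark"},
    "pie_recipe": {"apple", "pie", "recipe"},
    "notebook_test": {"notebook", "benchmark", "test"},
}

for name, keywords in items.items():
    print(name, round(jaccard(user_keywords, keywords), 2))
```

The recipe receives a non-zero score purely because of the polysemous keyword "apple", while the notebook test, which uses a synonym for laptop, scores zero even though it is probably relevant.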
CBRS can be overspecialized too. The recommended items may always come from the same categories of items, which means there won’t be any surprises. The recommended items can also be too obvious and useless for users who already know a lot about one category of items. In addition, there is the serendipity problem. (McNee S.M. et al 2006) You might start by watching one cat video on YouTube and end up spending hours on cat videos just because the suggested video list keeps suggesting videos in the same category. This is known as the homophily trap. (Zuckerman E. 2008) CBRS is also known to be incapable of generating accurate predictions for complex items such as movies and music tracks, which are not easily described with content information; CFRS turns this into an advantage.

Collaborative filtering recommender systems (CFRS)

While recommendations made by CBRS are based on the individual’s preferences, CFRS gives recommendations based on the preferences of other similar users. This works well with items that are hard to describe, such as movies and music, which CBRS has difficulty recommending. This is a major advantage of CFRS, and it is the reason CFRS is more popular.

CFRSs make similarity decisions, and the cosine angle computation or Pearson’s correlation is mainly used in clustering users and items. (Chen A.Y. and McLeod D. 2007) Typically, a CFRS starts by using the cosine measure to determine the similarity between each pair of items a customer has purchased, and creates a matrix of item-item relationships. A combination of a genetic algorithm and a Naïve Bayes classifier is used for defining item-user relationships. (Ko et al., 2001) A genetic algorithm or nearest-neighbor algorithm forms clusters of system users with similar taste, and the Naïve Bayes classifier defines the association between the items.
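The first step described above, an item-item matrix built with the cosine measure, can be sketched as follows. The purchase data are invented, each item is represented as a binary vector over users, and only the Python standard library is used. This is an illustrative assumption about the representation, not the exact procedure of the cited systems.

```python
import math

# Invented purchase data: user -> set of purchased items.
PURCHASES = {
    "u1": {"camera", "memory_card"},
    "u2": {"camera", "memory_card", "pouch"},
    "u3": {"pouch"},
}


def cosine(vec_a, vec_b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an item nobody bought has no defined direction
    return dot / (norm_a * norm_b)


def item_item_matrix(purchases):
    """Return item -> item -> cosine similarity of their purchase vectors."""
    users = sorted(purchases)
    items = sorted(set().union(*purchases.values()))
    # Binary vector per item: 1 if the user bought it, else 0.
    vectors = {
        item: [1 if item in purchases[u] else 0 for u in users]
        for item in items
    }
    return {a: {b: cosine(vectors[a], vectors[b]) for b in items} for a in items}


matrix = item_item_matrix(PURCHASES)
for item, row in matrix.items():
    print(item, {other: round(sim, 2) for other, sim in row.items()})
```

In this toy data the camera and the memory card were bought by exactly the same users, so their similarity is close to 1, while the pouch shares only one buyer with the camera and ends up at about 0.5.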
The recommender system then makes similarity decisions by matching the clusters of users with the clusters of items, and user profiles are formed with association rules. User-user relationships are described with hierarchical models in the form of an index of categories. (Jung et al., 2001) In the index, or tree structure, each node contains a cluster of users with similar preferences. In CFRS, a recommendation is made based on all the correlations between users, between items, and between users and items. (Chen A.Y. and McLeod D. 2007)

As for the shortcomings, CFRS may suffer from privacy concerns, the cold start problem, the scalability problem, the gray sheep problem, the update frequency problem, etc.

CFRS monitors system users’ behavior patterns. Although this is done to make recommendations, some users may be uncomfortable with having their browsing histories recorded, and may use the system as unregistered users or disable the cookies that the CFRS depends on in the browser. On the one hand, such users may not fully utilize the recommender system. On the other hand, the system may not gather enough observational information about users’ behavior patterns to provide accurate predictions.

The cold start problem is frequently faced when the system has a new item or a new user. A new item may not yet be rated or labeled. Such items are easily left out of the recommendation process because they lack associations with other well-rated and labeled items. In the case of a new user, the system may deliver poor recommendation accuracy, once again because of the lack of associations.

If the number of user profiles is small, or if the user has unusual taste, which is referred to as a gray sheep, system accuracy can be poor because similarity decisions cannot be made.
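The cold start and gray sheep problems can be made concrete with a small user-user similarity sketch based on Pearson’s correlation, one of the similarity measures named earlier. The ratings are invented, and returning None when the correlation cannot be computed is a design assumption of this sketch rather than a convention of the cited work.

```python
import math

# Invented ratings on a 1-5 scale: user -> {item: rating}.
RATINGS = {
    "anna": {"film_a": 5, "film_b": 3, "film_c": 1},
    "ben": {"film_a": 4, "film_b": 3, "film_c": 2},
    "cleo": {"film_a": 1, "film_b": 3, "film_c": 5},
    "newcomer": {},  # cold start: no ratings yet
}


def pearson(ratings_a, ratings_b):
    """Pearson correlation over co-rated items, or None if undefined."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return None  # not enough overlap to compare the users
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a user who rates everything the same has no trend
    return cov / math.sqrt(var_x * var_y)


for other in ("ben", "cleo", "newcomer"):
    print("anna vs", other, pearson(RATINGS["anna"], RATINGS[other]))
```

Here anna and ben correlate positively and would land in the same neighborhood, cleo correlates negatively with both and resembles a gray sheep in this small community, and the newcomer has no co-rated items at all, so no similarity decision can be made.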
Users don’t always keep the same taste in items, and when a user’s taste changes, the user’s profile in the system needs to change too. The problem is how often the system should update its users’ profiles. Too many updates may require enormous computational power, while too few updates cannot maintain recommendation accuracy.

Hybrid recommender systems

In some newer recommender systems, content-based filtering is used alongside collaborative filtering to compensate for the latter’s shortcomings and to increase the system’s accuracy on user preferences. A system that utilizes both techniques is called a hybrid recommender system. (Balabanovic & Shoham, 1997) A hybrid recommender system solves the problem of covering extreme cases that a pure CFRS cannot handle. (Chen A.Y. and McLeod D. 2007)

“In many ways, collaborative and content-based approaches provide complementary capabilities. Collaborative methods are best at recommending reasonably well-known items to users in communities of similar tastes when sufficient user data is available but effective content information is not. Content-based methods are best at recommending unpopular items to users with unique tastes when sufficient other data is unavailable but effective content information is easy to obtain.” (Mooney 2000)

Conclusion and development orientation

The two basic schools of recommendation approaches each have their advantages and disadvantages. Most of the limitations of each approach can be compensated for by the other.

Much has been said about recommendation accuracy. People often experience the homophily trap with current recommender systems. That is, all the users in one cluster get the same recommendations, and they never get out of that circle of items.
A good recommender system should be able to provide positive surprises from time to time, and also to provide alternative recommendations to break the fatigue of users who keep seeing the same items in the recommendation list.

Future recommender systems should be dynamic, and their profiles should be updatable in real time. This, together with the synchronization of various profiles, implies the need for a huge amount of computational power, network bandwidth, etc. Current algorithms and techniques all have relatively high memory and computational complexity, which leads to long system processing times and data latency. Therefore, new algorithms and techniques that reduce memory and computational complexity, and eventually eliminate synchronization problems, will be one of the development orientations.

References

Chen, A.Y. and McLeod, D. (2006) Collaborative Filtering for Information Recommendation Systems.
Eco, U. (2007) Sator arepo eccetera. Bompiani (in Italian).
Goldberg, D.; Nichols, D.; Oki, B. M.; and Terry, D. (1992) Using collaborative filtering to weave an information tapestry. Commun. ACM 35, 12, 61–70.
Herlocker, J.L.; Konstan, J.A.; Terveen, L.G.; Riedl, J.T. (2004) Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems 22(1), 5–53.
Jung, J. J., Yoon, J. S., & Jo, G. S. (2001). Collaborative information filtering by using categorized bookmarks on the web. Web Knowledge Management and Decision Support. 14th International Conference on Applications of Prolog. Revised papers (Lecture Notes in Artificial Intelligence Vol. 2543). Berlin, Germany: Springer-Verlag. x+305, 237-50.
Ko, S. J., & Lee, J. H. (2001). Discovery of user preference through genetic algorithm and Bayesian categorization for recommendation.
Conceptual Modeling for New Information Systems Technologies, ER 2001 Workshops, HUMACS, DASWIS, ECOMO, and DAMA (Lecture Notes in Computer Science Vol. 2465). Berlin, Germany: Springer-Verlag. xvii+500, 471-84.
McNee, S.M., Riedl, J. and Konstan, J. (2006) Being accurate is not enough: How accuracy metrics have hurt recommender systems. In Extended Abstracts of the 2006 ACM Conference on Human Factors in Computing Systems, pages 1-5, Canada.
Mooney, R. J. (2000) Content-based book recommending using learning for text categorization. International Conference on Digital Libraries, Proceedings of the fifth ACM conference on Digital libraries, San Antonio, Texas, USA, Pages: 195–204.
Schafer, J. B.; Konstan, J. and Riedl, J. (1999) Recommender systems in e-commerce. EC '99 Proceedings of the 1st ACM conference on Electronic commerce.
Vozalis, E., Margaritis, K.,G. (2003) Analysis of Recommender Systems Algorithms, presented at HERCMA, Athens, Greece, 2003, pp. 732-745.
Zuckerman, E. (2008) Homophily, serendipity, xenophilia, http://www.ethanzuckerman.com/blog/2008/04/25/homophily-serendipity-xenophilia/

Figures

Figure 1: Semeraro, G. and the SWAP group (2010). Content-based recommender systems: problems, challenges and research directions. [ONLINE] Available at: http://ls13-www.cs.unidortmund.de/homepage/ITWP2010/slides/semeraro.pdf. [Last Accessed 2012-11-16].