Cold-start Collaborative Filtering Based on User Registration Process

Peng-yu Zhu, Zhong Yao
School of Economics and Management, Beihang University, Beijing, China (zhupengyu@sem.buaa.edu.cn)

Abstract - A key challenge in recommender system research is how to make recommendations to new users. Recently, the idea of solving the problem within the context of learning user and item profiles has been proposed. These methods construct a decision tree for an initial interview, enabling the recommender to query a user adaptively according to her prior responses. However, they overlook the new users' personal attributes. In this paper, we present CCFBURP, a method built from a two-step algorithm: in the first step we screen neighbors of the target user using her personal attributes, and in the second step we train the interview model on the dataset formed by those neighbors and the candidate items. The recommender system then predicts the target user's ratings on the candidate items. Experimental results on the MovieLens dataset demonstrate that the proposed CCFBURP algorithm significantly outperforms existing methods for cold-start recommendation.

Keywords - Collaborative filtering, Recommender systems, Cold-start problem, User registration process

I. INTRODUCTION

More and more people are inclined to purchase items of interest on the Internet, yet the boom of information about customers, products and transactions has led to the information overload problem in E-Commerce[1]. Meanwhile, in order to supply customers with various personal services, personalized recommender systems based on recommendation techniques have been widely applied; they are already considered one of the most important means of personal service on websites[2].
The developers of Tapestry (one of the first recommender systems) coined the phrase "collaborative filtering (CF)", which has been widely adopted in practice[3]. A key challenge in building an effective CF recommender system is the well-known cold-start problem: how can recommendations be provided to new users? New users are unlikely to receive good recommendations because they lack a rating or purchase history[4]. Pure collaborative filtering methods base their recommendations on community preferences (e.g., user ratings and purchase histories) and ignore user and item attributes (e.g., demographics and product descriptions)[5, 6, 7, 8, 9]. A vast number of methods have been introduced to solve the cold-start problem. Schein et al. proposed the aspect model latent variable method for cold-start recommendation, which combines both collaborative and content information in model fitting[10]. Kim and Li proposed a probabilistic model to address the cold-start problem, in which items are classified into groups and predictions are made for users considering the Gaussian distribution of user ratings[11]. On the other hand, much of the recent collaborative filtering literature has focused on factor models, for instance variants of the singular value decomposition (SVD)[12]. In code recommender systems, for example, the available data consists of calls to methods in contexts; this data is essentially binary: given a context-method pair, the system predicts whether or not the method is called within that context[13]. A natural approach to solving the cold-start problem is to elicit new users' preferences progressively through an initial interview process[14]. Specifically, when a new user arrives, the recommender presents a seed item as a question and asks the user for her opinion; based on her responses, the recommender gradually refines the characterization of the user so that it can provide more satisfactory recommendations in the future.
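As a concrete, purely hypothetical illustration of such an adaptive interview, the sketch below walks a pre-built decision tree, asking one seed item per internal node and following the branch selected by the user's answer. The Node structure, the toy tree and the scripted answers are inventions for this example, not part of any cited system.

```python
# Minimal sketch of an adaptive interview: each internal node asks about
# one seed item, and the user's answer ("like" / "dislike" / "unknown")
# selects the child node to visit; a leaf stores a learned user profile.

class Node:
    def __init__(self, item=None, children=None, profile=None):
        self.item = item                  # seed item asked at this node
        self.children = children or {}    # answer -> child Node
        self.profile = profile            # latent profile at a leaf

def interview(root, answer_fn):
    """Walk the tree, querying the user at each internal node."""
    node = root
    while node.children:
        node = node.children[answer_fn(node.item)]
    return node.profile                   # profile used for recommendation

# A toy one-question tree and a scripted user who likes item "A".
leaf_like = Node(profile=[1.0, 0.2])
leaf_other = Node(profile=[0.1, 0.9])
root = Node(item="A", children={"like": leaf_like,
                                "dislike": leaf_other,
                                "unknown": leaf_other})
print(interview(root, lambda item: "like"))   # -> [1.0, 0.2]
```

Each question asked depends on the answers so far, which is exactly the adaptivity the interview-based methods above aim for.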
Ke Zhou et al. proposed the Functional Matrix Factorization (FMF) method, based on the ideas of query responses and functional matrix factorization[15]. Experimental results have demonstrated that their method is effective for cold-start recommendation problems. A shortcoming of FMF, however, is that it overlooks the users' personal attributes, for instance their age, sex, vocation and location. In this paper, we present Cold-start Collaborative Filtering Based on User Registration Process (CCFBURP), a novel cold-start recommendation method that estimates users' preferences by collecting information during their registration process. Our proposed method exploits the correlation between users more effectively. In CCFBURP, we construct a two-step algorithm: in the first step we screen neighbors of the target user using her personal attributes, and in the second we train the interview model on the dataset formed by those neighbors and the candidate items. The recommender system then predicts the target user's ratings on the candidate items. Experimental results on the MovieLens dataset demonstrate that the proposed CCFBURP algorithm significantly outperforms existing methods for cold-start recommendation.

The remainder of this paper is organized as follows. In Section 2, matrix factorization for collaborative filtering is introduced first, followed by functional matrix factorization, which constructs the interview process of cold-start collaborative filtering by restricting the user profiles to be a function in the form of a decision tree. In Section 3, FMF is extended with users' personal attributes and the CCFBURP procedure is constructed. In Section 4, the proposed method is evaluated on the MovieLens dataset and the results are analyzed. Finally, Section 5 draws conclusions and presents several future directions.

II. FUNCTIONAL MATRIX FACTORIZATION DECISION TREE

In this section, we describe the functional matrix factorization (FMF) method for cold-start collaborative filtering, which exploits the well-known matrix factorization methods to construct the interview process. The key innovation is that the user profiles are parameterized as a function of the responses to the possible questions of the interview process, and matrix factorization is used to compute the profiles.

A. Low Rank Approximation of the Rating Matrix

Consider tabulated data organized in the observed rating matrix $R \in \mathbb{R}^{n \times m}$, which we seek to approximate by a product of two matrices, $R \approx U^T V$. So we get

$$r_{ij} \approx \sum_k u_{ki} v_{kj} = u_i^T v_j, \quad i = 1, \ldots, n, \; j = 1, \ldots, m$$

where $r_{ij}$ is the rating of item j by user i, and $u_i$ and $v_j$ denote the i-th column of U and the j-th column of V. Given the set O of known ratings, the parameters $u_i$ and $v_j$ can be estimated by fitting the training data, i.e., by solving the optimization problem

$$\min_{u_i, v_j} \sum_{(i,j) \in O} (r_{ij} - u_i^T v_j)^2$$

The problem can be solved by existing numerical optimization methods. In our implementation, we use alternating optimization for its amenability to the cold-start setting. Specifically, the optimization process performs the following two updates alternately. First, for $i = 1, 2, \ldots, n$, we minimize with respect to $u_i$ with all $v_j$ fixed:

$$u_i = \arg\min_{u_i} \sum_{(i,j) \in O} (r_{ij} - u_i^T v_j)^2$$

This is a linear regression problem with squared loss, whose closed-form solution is

$$u_i = \Big( \sum_{(i,j) \in O} v_j v_j^T \Big)^{-1} \sum_{(i,j) \in O} r_{ij} v_j$$

Similarly, for $j = 1, 2, \ldots, m$, $v_j$ has the closed-form solution

$$v_j = \Big( \sum_{(i,j) \in O} u_i u_i^T \Big)^{-1} \sum_{(i,j) \in O} r_{ij} u_i$$

B. Functional Matrix Factorization

Now we consider constructing the interview process for cold-start collaborative filtering. Assume that a new user registers at the recommendation system and nothing is known about her. To capture the preferences of the user, the system initiates several interview questions to query the responses from the user.
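Before turning to the interview construction, the two alternating updates of Section II-A can be sketched in a few lines of numpy. This is an illustration only, not the authors' implementation; the toy rating matrix is invented, and the tiny ridge term is our own addition for numerical stability.

```python
import numpy as np

def als(R, O, K=2, n_iters=200, reg=1e-8):
    """Alternating least squares for R ~ U^T V on the observed entries O."""
    n, m = R.shape
    rng = np.random.default_rng(0)
    U = rng.normal(size=(K, n))   # column u_i: profile of user i
    V = rng.normal(size=(K, m))   # column v_j: profile of item j
    I = reg * np.eye(K)           # tiny ridge term (numerical stability)
    for _ in range(n_iters):
        for i in range(n):        # u_i <- (sum v_j v_j^T)^-1 sum r_ij v_j
            js = np.flatnonzero(O[i])
            if js.size:
                Vj = V[:, js]
                U[:, i] = np.linalg.solve(Vj @ Vj.T + I, Vj @ R[i, js])
        for j in range(m):        # v_j <- (sum u_i u_i^T)^-1 sum r_ij u_i
            is_ = np.flatnonzero(O[:, j])
            if is_.size:
                Ui = U[:, is_]
                V[:, j] = np.linalg.solve(Ui @ Ui.T + I, Ui @ R[is_, j])
    return U, V

# A fully observed rank-2 toy matrix is recovered almost exactly with K=2.
R = np.array([[2., 0., 2.], [0., 3., 3.], [2., 3., 5.]])
U, V = als(R, np.ones_like(R, dtype=bool))
print(np.abs(U.T @ V - R).max())  # near zero
```

Each inner update is exactly the closed-form least-squares solution given above, restricted to the observed entries of the corresponding row or column.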
Based on the responses, the system constructs a profile for the user and provides recommendations accordingly. We propose to parameterize the user profile $u_i$ in such a way that it is tied to user i's responses in the form of a function. Assume there are P possible interview questions, and that an answer to a question takes a value in the finite set $\{-1, 1, 0\}$, representing "Dislike", "Like" and "Unknown", respectively. We then introduce a P-dimensional vector $a_i$ denoting the answers of user i to the P questions, and tie the profile to the answers by assuming $u_i = T(a_i)$, where T is a function that maps the responses $a_i$ to the user profile $u_i \in \mathbb{R}^K$. So we get $\hat{r}_{ij} = v_j^T T(a_i)$.

Our goal is to learn both T and $v_j$ from the observed ratings. To this end, substituting $u_i = T(a_i)$ into the low-rank matrix factorization model, we have the following optimization problem:

$$(T, V) = \arg\min_{T, V} \sum_{(i,j) \in O} (r_{ij} - v_j^T T(a_i))^2 + \lambda \|V\|^2 \qquad (1)$$

where $\lambda$ is a regularization parameter. This objective function can be optimized through an alternating minimization process.

1. Given $T(a)$, we can compute $v_j$ by regularized least squares regression:

$$v_j = \arg\min_{v_j} \sum_{(i,j) \in O} (r_{ij} - v_j^T T(a_i))^2 + \lambda \|v_j\|^2$$

This problem has a closed-form solution given by

$$v_j = \Big( \sum_{(i,j) \in O} T(a_i) T(a_i)^T + \lambda I \Big)^{-1} \sum_{(i,j) \in O} r_{ij} T(a_i) \qquad (2)$$

where I is the identity matrix of appropriate size.

2. Given $v_j$, we try to fit a decision tree $T(a)$ such that

$$T = \arg\min_T \sum_{(i,j) \in O} (r_{ij} - T(a_i)^T v_j)^2 \qquad (3)$$

To reduce the implementation and computational complexity, we address this problem with an efficient greedy algorithm that finds an approximate solution.

C. Decision Tree Construction

Starting from the root node, the set of users at the current node is partitioned into three disjoint subsets $R_L(p)$, $R_D(p)$ and $R_U(p)$, corresponding to the responses "Like", "Dislike" and "Unknown" to interview question p:

$$R_L(p) = \{ i \mid a_{ip} = \text{"Like"} \}$$
$$R_D(p) = \{ i \mid a_{ip} = \text{"Dislike"} \}$$
$$R_U(p) = \{ i \mid a_{ip} = \text{"Unknown"} \}$$

To find the optimal question p that leads to the best split, we minimize the following objective:

$$\min_p \sum_{i \in R_L(p), (i,j) \in O} (r_{ij} - u_L^T v_j)^2 + \sum_{i \in R_D(p), (i,j) \in O} (r_{ij} - u_D^T v_j)^2 + \sum_{i \in R_U(p), (i,j) \in O} (r_{ij} - u_U^T v_j)^2 \qquad (4)$$

where $u_L$, $u_D$ and $u_U$ are the optimal profiles for the users in the child nodes corresponding to the answers "Like", "Dislike" and "Unknown", respectively:

$$u_L = \arg\min_u \sum_{i \in R_L(p), (i,j) \in O} (r_{ij} - u^T v_j)^2$$
$$u_D = \arg\min_u \sum_{i \in R_D(p), (i,j) \in O} (r_{ij} - u^T v_j)^2$$
$$u_U = \arg\min_u \sum_{i \in R_U(p), (i,j) \in O} (r_{ij} - u^T v_j)^2$$

After the root node is constructed, its child nodes can be constructed in a similar way, recursively. Finally, a user's profile is obtained when she arrives at a leaf node of the decision tree, and the estimated rating of user i on item j can be expressed as $\hat{r}_{ij} = v_j^T u_i$.

III. COLD-START COLLABORATIVE FILTERING BASED ON USER REGISTRATION PROCESS

A. User Dissimilarity Matrix

FMF assumes that a new user who registers at the recommendation system is a complete stranger before she answers the interview questions. However, thanks to her registration process, we do know something about her personal attributes, such as age, sex, vocation and location. Generally, people with similar attributes are likely to share interests in similar things. For example, most boys aged 15~20 like The Pirates of the Caribbean and Harry Potter, while men aged 55~60 prefer Casablanca. Therefore, new users can be served more accurately by recommending the resources enjoyed by similar users, i.e., users whose personal attributes resemble theirs.

We introduce the User Dissimilarity Matrix to measure the dissimilarity between two users. Assume there are p kinds of personal attributes in the user dataset; the dissimilarity between $u_i$ and $u_j$ can be expressed as

$$d(i, j) = \frac{\sum_{f=1}^{p} \delta_{ij}^{(f)} d_{ij}^{(f)}}{\sum_{f=1}^{p} \delta_{ij}^{(f)}} \qquad (5)$$

where $x_{if}$ and $x_{jf}$ denote attribute f of $u_i$ and $u_j$. If $x_{if}$ or $x_{jf}$ is a null value, or $x_{if} = x_{jf} = 0$, then $\delta_{ij}^{(f)} = 0$; in other cases $\delta_{ij}^{(f)} = 1$. The attribute dissimilarity $d_{ij}^{(f)}$ is computed as follows.

1. When attribute f is a binary or nominal variable: if $x_{if} = x_{jf}$, then $d_{ij}^{(f)} = 0$; otherwise $d_{ij}^{(f)} = 1$. For binary variables, the dissimilarity can also be calculated by the simple matching method,

$$d(i, j) = \frac{p - m}{p}$$

where p is the number of all variables and m is the number of variables on which users i and j match.

2. When attribute f is an interval variable:

$$d_{ij}^{(f)} = \frac{|x_{if} - x_{jf}|}{\max_h x_{hf} - \min_h x_{hf}}$$

where h runs over all objects with non-null values of attribute f. For interval variables, the dissimilarity is also commonly calculated by a distance between users; the Euclidean distance is the most widely used:

$$d(i, j) = \sqrt{(x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2 + \cdots + (x_{ip} - x_{jp})^2}$$

In the user dissimilarity matrix, d(i, j) is the dissimilarity between user i and user j. Given a threshold $\theta$, if $d(i, j) \le \theta$, users i and j are regarded as neighbors of each other.

B. Cold-start Collaborative Filtering Based on User Registration Process

To construct Cold-start Collaborative Filtering Based on User Registration Process, the user dissimilarity matrix must first be built as a necessary preparation, and appropriate measures of the users' attributes must be chosen. In the MovieLens dataset, users' attributes are stored in the table u.user, a tab-separated list of user id, age, gender, occupation and zip code. We define the variables in Eq. (5) as follows:

1. $\delta_{ij}^{(f)} = 1$ for $f = 1, \ldots, 4$, since none of the users' attributes is null.
2. If the age difference between the two users is no more than 5, $d_{ij}^{(1)} = 0$; if the difference is more than 5, $d_{ij}^{(1)} = 1$.
3. If the two users' genders are the same, $d_{ij}^{(2)} = 0$; otherwise $d_{ij}^{(2)} = 1$.
4. If the two users' occupations are the same, $d_{ij}^{(3)} = 0$; otherwise $d_{ij}^{(3)} = 1$.
5. The first digit of a zip code in the USA corresponds to a group of neighboring states, and we suppose that people living in neighboring states share some kind of similarity. So if the two users' zip codes start with the same digit, $d_{ij}^{(4)} = 0$; otherwise $d_{ij}^{(4)} = 1$.

We propose a registration procedure for new users which consists of two parts. First, the new user registers an account and provides her personal information. We then compute the dissimilarities between the new user and the existing ones, and find her neighbors among the existing users whose dissimilarities with her are less than or equal to the given threshold $\theta$. These neighbors' ratings on all items constitute her modelling data set, which is denoted $R^*$. The n items that have received the most ratings in $R^*$ are then selected as the possible interview questions. Second, we train the decision tree $T(a)$ and the item profiles $v_j$ through the process described in Section 2.

C. Computational Complexity

The computational complexity of constructing the FMF decision tree of Section 2 is

$$O\Big( D \Big( \sum_i N_i^2 + LMK^3 + LM^2K^2 \Big) \Big) \qquad (6)$$

where D is the depth of the tree, $N_i$ is the number of ratings by user i, L is the number of nodes in the tree, M is the number of possible interview questions, and K is the dimension of the latent space. Among these variables, D, $N_i$, L and K can be set by the administrator. Without further restrictions, M is the number of all items, which can be a really huge number. In CCFBURP, we choose as possible interview questions the top-N items that have received the most ratings from the neighbors of the new user. The computational complexity then becomes

$$O\Big( D \Big( \sum_i N_i^2 + LNK^3 + LN^2K^2 \Big) \Big) \qquad (7)$$

N is far smaller than M under ordinary circumstances. For instance, M is 1682 in our MovieLens dataset; if we set N to 150, the second term of the computational complexity is reduced to less than 10% of its former value, and the third term to less than 1%. As the number of users increases, the effect becomes even more apparent.

IV. EXPERIMENTS

A. Data Set

The MovieLens data set has been widely used in the field of CF. It contains 3900 movies, 6040 users and around 1 million ratings; the ratings are integers ranging from 1 (bad) to 5 (good). In our experiments, we choose the reduced data set, which contains 1682 movies, 943 users and 100000 ratings, and in which each user has rated at least 20 movies. We split the users into two disjoint subsets, the training set and the test set, containing 80% and 20% of the users, respectively. We then split the items in the test set into two disjoint subsets: the answer set, containing 80% of the items, which is used to generate the user responses in the interview process, and the evaluation set, containing the remaining 20% of the items, which is used to evaluate the performance after the interview process. The data set is divided as in Figure 1, where A represents the training set, B the answer set, and C the evaluation set.

Fig. 1. Division of the data set

B. Performance Evaluation

The performance of a collaborative filtering algorithm is evaluated in terms of the widely used root mean square error (RMSE) measure, defined as

$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{(i,j)} (\hat{r}_{ij} - r_{ij})^2}$$

where N is the number of test ratings, $r_{ij}$ is the ground-truth rating of movie j by user i, and $\hat{r}_{ij}$ is the value predicted by our algorithm.

C. Results and Analysis

We compare the performance with four baseline methods, described as follows:

Mode Method (MM): in the whole training set, find the mode of the ratings for movies by all users, and predict the ratings in the test set as that mode value.

Mean Value Method (MVM): in the whole training set, compute the mean value of the ratings for movies by all users, and predict the ratings in the test set as that mean value.

Neighbor Mode Method (NMM): in the optimized training set composed of the N neighbors of the new user, find the mode of the ratings for movies by all users, and predict the ratings in the test set as that mode value.
Neighbor Mean Value Method (NMVM): in the optimized training set composed of the N neighbors of the new user, compute the mean value of the ratings for movies by all users, and predict the ratings in the test set as that mean value.

We set D = 5 and K = 20 in Eqs. (6) and (7), and we set $\theta$ = 0.5. We then consider different values N = 10, 50, 100, 150, 300 in Eq. (7), and the same values of N in NMM and NMVM, respectively. The RMSE results are reported in Table 1 and depicted in Figure 2.

TABLE I. RMSE ON MOVIELENS DATA SET FOR COLD-START USERS WITH RESPECT TO THE NUMBER OF NEIGHBORS

Method     N=10     N=50     N=100    N=150    N=300
MM         1.0451 (independent of N)
MVM        1.1548 (independent of N)
NMM        1.2547   1.2055   1.0147   0.9875   1.0895
NMVM       1.3120   1.2664   1.1348   1.0348   1.1446
FMF        0.9536 (independent of N)
CCFBURP    1.0836   0.9825   0.9457   0.9327   0.9396

Fig. 2. RMSE of MM, MVM, NMM, NMVM, FMF and CCFBURP for cold-start users on the MovieLens data set

Comparing the performance of MM, MVM and FMF, we can see that FMF is superior to the other two, just as CCFBURP is superior to NMM and NMVM. This observation illustrates that the interview processes in FMF and CCFBURP improve the accuracy of the algorithms. We then compare MM with NMM, MVM with NMVM, and FMF with CCFBURP in pairs. The former of each pair is static, while the latter responds dynamically to changes of N: the RMSE first decreases, reaches its optimum around N = 150, and thereafter increases and tends toward the result of its static counterpart. We attribute this to the fact that when N is too small, the predicted ratings are strongly influenced by the preconceptions of the selected users. As N increases, the influence of these preconceptions decreases, and the RMSE decreases. However, as N continues to increase, the selected set gets closer and closer to the whole training set, which is why the RMSE tends toward the results of the static algorithms.

V. CONCLUSION

The main focus of this paper is the cold-start problem in recommender systems.
We have presented Cold-start Collaborative Filtering Based on User Registration Process, a framework for learning latent factors for user/item profiling. The proposed CCFBURP algorithm takes the whole registration process of new users into account and uses the information gathered in that process to predict the users' preferences. Experimental results on the MovieLens dataset demonstrate that the proposed CCFBURP algorithm significantly outperforms existing methods for cold-start recommendation. For future work, we plan to investigate the influence of parameter variation. Moreover, we also plan to explore the rules for calculating the dissimilarity, which plays an important role in the first step of CCFBURP.

REFERENCES

[1] Z. Huang, W. Chung, et al., "A graph model for E-Commerce recommender systems", Journal of the American Society for Information Science and Technology, vol. 55, no. 3, pp. 259-274, 2004.
[2] K. Wei, J. Huang, and S. Fu, "A survey of E-Commerce recommender systems", in Proceedings of the International Conference on Service Systems and Service Management, pp. 1-5, 2007.
[3] D. Goldberg, D. Nichols, B. M. Oki, and D. Terry, "Using collaborative filtering to weave an information tapestry", Communications of the ACM, vol. 35, no. 12, pp. 61-70, 1992.
[4] G. Adomavicius and A. Tuzhilin, "Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions", IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734-749, 2005.
[5] J. S. Breese, D. Heckerman, and C. Kadie, "Empirical analysis of predictive algorithms for collaborative filtering", in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pp. 43-52, 1998.
[6] W. Hill, L. Stead, M. Rosenstein, and G. Furnas, "Recommending and evaluating choices in a virtual community of use", in Proceedings of the Conference on Human Factors in Computing Systems, pp. 194-201, 1995.
[7] J. A. Konstan, B. N. Miller, et al., "GroupLens: applying collaborative filtering to Usenet news", Communications of the ACM, vol. 40, no. 3, pp. 77-87, 1997.
[8] P. Resnick and H. R. Varian, "Recommender systems", Communications of the ACM, vol. 40, no. 3, pp. 56-58, 1997.
[9] U. Shardanand and P. Maes, "Social information filtering: algorithms for automating 'word of mouth'", in Proceedings of the Conference on Human Factors in Computing Systems, pp. 210-217, 1995.
[10] A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock, "Methods and metrics for cold-start recommendations", in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '02), pp. 253-260, 2002.
[11] B. M. Kim and Q. Li, "Probabilistic model estimation for collaborative filtering based on items attributes", in Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, pp. 185-191, 2004.
[12] A. Paterek, "Improving regularized singular value decomposition for collaborative filtering", in Proceedings of KDD Cup and Workshop, pp. 5-8, 2007.
[13] M. Weimer et al., "Maximum margin matrix factorization for code recommendation", in RecSys '09: Proceedings of the Third ACM Conference on Recommender Systems, pp. 309-312, 2009.
[14] N. Golbandi, Y. Koren, and R. Lempel, "On bootstrapping recommender systems", in Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp. 1805-1808, 2010.
[15] K. Zhou et al., "Functional matrix factorizations for cold-start recommendation", in SIGIR '11: Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 315-324, New York, USA, 2011.