Cold-start Collaborative Filtering Based on User Registration Process
Peng-yu Zhu, Zhong Yao
School of Economics and Management, Beihang University, Beijing, China
(zhupengyu@sem.buaa.edu.cn)
Abstract - A key challenge in recommender systems research is how to make recommendations to new users. Recently, it has been proposed to solve this problem within the context of learning user and item profiles. These methods construct a decision tree for the initial interview, enabling the recommender to query a user adaptively according to her prior responses. However, they overlook new users' personal attributes. In this paper, we present CCFBURP, a two-step method. In the first step, we screen neighbors of the target user using her personal attributes. In the second, we train the interview model on the data set consisting of those neighbors and the candidate items. The recommender system then predicts the target user's ratings for the candidate items. Experimental results on the MovieLens dataset demonstrate that the proposed CCFBURP algorithm significantly outperforms existing methods for cold-start recommendation.
Keywords - Collaborative filtering, Recommender
systems, Cold-start problem, User Registration Process
I. INTRODUCTION
More and more people purchase items of interest on the Internet, yet the boom of information relevant to customers, products and transactions has led to an information overload problem in e-commerce [1]. Meanwhile, personalized recommender systems have been widely applied to supply customers with various personalized services. They are now considered one of the most important means of providing personalized service on websites [2]. The developers of Tapestry (one of the first recommender systems) coined the phrase "collaborative filtering (CF)," which has since been widely adopted in practice [3].
A key challenge in building an effective CF recommender system is the well-known cold-start problem: how to provide recommendations to new users. New users are unlikely to receive good recommendations because they lack rating or purchase history [4]. Pure collaborative filtering methods base their recommendations on community preferences (e.g., user ratings and purchase histories) and ignore user and item attributes (e.g., demographics and product descriptions) [5, 6, 7, 8, 9].
Many methods have been introduced to solve the cold-start problem. Schein et al. proposed the aspect model, a latent variable method for cold-start recommendation that combines both collaborative and content information in model fitting [10]. Kim and Li proposed a probabilistic model to address the cold-start problem. In their model, items are classified into groups, and predictions are made for users by considering the Gaussian distribution of user ratings [11]. On the other hand, much of the recent collaborative filtering literature has focused on factor models, for instance, variants of the singular value decomposition (SVD) [12]. In code recommender systems, however, the available data consists of calls to methods in contexts. This data is essentially binary: given a context-method pair, the task is to predict whether or not the method is called within that context [13].
A natural approach to solving the cold-start problem is to elicit new users' preferences by querying their responses progressively through an initial interview process [14]. Specifically, when a new user visits, the recommender presents a seed item as a question and asks the user for her opinion. Based on her answers, the recommender gradually refines its characterization of the user so that it can provide more satisfactory recommendations in the future. Ke Zhou et al. proposed the Functional Matrix Factorization (FMF) method, which builds on query responses and functional matrix factorization [15]. Their experimental results demonstrate that the method is effective for cold-start recommendation.
A shortcoming of FMF is that it overlooks users' personal attributes, such as age, sex, occupation and location. In this paper, we present Cold-start Collaborative Filtering Based on User Registration Process (CCFBURP), a novel cold-start recommendation method that estimates users' preferences by collecting information during their registration process. Our method aims to exploit the correlation between users more effectively. In CCFBURP, we construct a two-step algorithm. In the first step, we screen neighbors of the target user using her personal attributes. In the second, we train the interview model on the data set consisting of those neighbors and the candidate items. The recommender system then predicts the target user's ratings for the candidate items. Experimental results on the MovieLens dataset demonstrate that the proposed CCFBURP algorithm significantly outperforms existing methods for cold-start recommendation.
The remainder of this paper is organized as follows. Section 2 first introduces matrix factorization for collaborative filtering. It then presents functional matrix factorization, which constructs the interview process of cold-start collaborative filtering by restricting the user profiles to be a function in the form of a decision tree. In Section 3, FMF is extended by incorporating users' personal attributes, and the procedure of CCFBURP is constructed. Section 4 evaluates the proposed method on the MovieLens dataset and analyzes the results. Finally, Section 5 concludes the paper and presents several future directions.
II. FUNCTIONAL MATRIX FACTORIZATION
DECISION TREE
In this section, we describe the functional matrix factorization (FMF) method for cold-start collaborative filtering, which uses the well-known matrix factorization methods to construct the interview process. The key innovation is to parameterize user profiles as a function of the responses to the possible questions of the interview process, and to use matrix factorization to compute the profiles.
A. Low Rank Approximation of the Rating Matrix
Consider tabulated data organized in the observed matrix $R \in \mathbb{R}^{n \times m}$, which we seek to approximate by a product of two matrices, $R \approx U^T V$. Thus
$$ r_{ij} \approx \sum_{k} u_{ki} v_{kj} = u_i^T v_j, \quad i = 1, \dots, n, \; j = 1, \dots, m $$
where $r_{ij}$ corresponds to the rating of item j by user i. Given the set O of known ratings, the parameters $u_i$ and $v_j$ can be estimated by fitting the training data. This means solving the following optimization problem:
$$ \min_{u_i, v_j} \sum_{(i,j) \in O} \left( r_{ij} - u_i^T v_j \right)^2 $$
The problem can be solved by existing numerical optimization methods. In our implementation, we use alternating optimization because it is amenable to the cold-start setting. Specifically, the optimization process alternates between the following two updates.
First, for $i = 1, 2, \dots, n$, we minimize with respect to $u_i$ with all other user profiles $u_{i'}$ ($i' \neq i$) and all $v_j$ fixed:
$$ u_i = \arg\min_{u_i} \sum_{(i,j) \in O} \left( r_{ij} - u_i^T v_j \right)^2 $$
This is a linear regression problem with squared loss, whose closed-form solution is
$$ u_i = \Big( \sum_{(i,j) \in O} v_j v_j^T \Big)^{-1} \Big( \sum_{(i,j) \in O} r_{ij} v_j \Big) $$
Similarly, for $j = 1, 2, \dots, m$, $v_j$ has the closed-form solution
$$ v_j = \Big( \sum_{(i,j) \in O} u_i u_i^T \Big)^{-1} \Big( \sum_{(i,j) \in O} r_{ij} u_i \Big) $$
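The two closed-form updates above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation. The function name `als`, the random initialization, the fixed iteration count and the tiny `ridge` term added to each K x K system are all assumptions made here. The ridge term keeps the system solvable when a user or item has fewer than K ratings. `mask` marks the observed set O.

```python
import numpy as np


def als(R, mask, K, n_iters=20, ridge=1e-8, seed=0):
    """Alternating least squares for r_ij ~ u_i^T v_j over the observed set O.

    R     : (n, m) array of ratings; entries where mask is False are ignored.
    mask  : (n, m) boolean array, True for (i, j) in O.
    ridge : tiny diagonal term (an assumption, not part of the equations above).
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    # Rows hold u_i and v_j, so this U is the transpose of the U in R ~ U^T V.
    U = rng.normal(scale=0.1, size=(n, K))
    V = rng.normal(scale=0.1, size=(m, K))
    eye = np.eye(K)
    for _ in range(n_iters):
        # u_i = (sum_j v_j v_j^T)^{-1} (sum_j r_ij v_j), with all v_j fixed
        for i in range(n):
            Vi = V[mask[i]]
            U[i] = np.linalg.solve(Vi.T @ Vi + ridge * eye, Vi.T @ R[i, mask[i]])
        # v_j = (sum_i u_i u_i^T)^{-1} (sum_i r_ij u_i), with all u_i fixed
        for j in range(m):
            Uj = U[mask[:, j]]
            V[j] = np.linalg.solve(Uj.T @ Uj + ridge * eye, Uj.T @ R[mask[:, j], j])
    return U, V
```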
B. Functional Matrix Factorization
Now we consider constructing the interview process
for cold-start collaborative filtering. Assume that a new
user registers at the recommendation system and nothing is
known about her. To capture the preferences of the user,
the system initiates several interview questions to query
the responses from the user. Based on the responses, the
system constructs a profile for the user and provides
recommendations accordingly. We propose to parameterize the user profile $u_i$ in such a way that the profile $u_i$ is tied to user i's responses in the form of a function.
Assume there are P possible interview questions, and an answer to a question takes a value in the finite set $\{-1, 1, 0\}$, representing "Dislike", "Like" and "Unknown", respectively. We then introduce a P-dimensional vector $a_i$ representing the answers of user i to the P questions. We tie the profile to the answers by assuming $u_i = T(a_i)$, where T is a function that maps the responses $a_i$ to the user profile $u_i \in \mathbb{R}^K$. Thus
$$ r_{ij} \approx v_j^T T(a_i). $$
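To make the role of T concrete, the sketch below represents T as a ternary decision tree. Each internal node holds an interview question p, and each answer (Like = 1, Dislike = -1, Unknown = 0) selects a child. Each leaf stores a profile. The class `Node`, the functions `apply_tree` and `predict`, and the toy tree are hypothetical illustrations, not code from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

LIKE, DISLIKE, UNKNOWN = 1, -1, 0  # answer encoding used in the text


@dataclass
class Node:
    profile: List[float]                    # profile u stored at this node
    question: Optional[int] = None          # interview question p (None at a leaf)
    children: Dict[int, "Node"] = field(default_factory=dict)  # answer -> child


def apply_tree(node: Node, answers: List[int]) -> List[float]:
    """Plays the role of T: follow the user's answers a_i down to a leaf profile."""
    while node.question is not None:
        node = node.children[answers[node.question]]
    return node.profile


def predict(root: Node, answers: List[int], v_j: List[float]) -> float:
    """Estimated rating r_hat_ij = v_j^T T(a_i)."""
    return sum(v * u for v, u in zip(v_j, apply_tree(root, answers)))


# Hypothetical one-question tree that asks about item p = 2
root = Node(profile=[0.0, 0.0], question=2, children={
    LIKE: Node([1.0, 0.5]), DISLIKE: Node([-1.0, 0.0]), UNKNOWN: Node([0.2, 0.2])})
print(predict(root, [UNKNOWN, UNKNOWN, LIKE], [2.0, 2.0]))  # 3.0
```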
Our goal is to learn both T and $v_j$ from the observed ratings. To this end, we substitute $u_i = T(a_i)$ into the low-rank matrix factorization model and obtain the following optimization problem:
$$ T, V = \arg\min_{T, V} \sum_{(i,j) \in O} \left( r_{ij} - v_j^T T(a_i) \right)^2 + \lambda \| V \|^2 \tag{1} $$
This objective function can be optimized through an alternating minimization process.
1. Given T(a), we can compute $v_j$ by regularized least-squares regression:
$$ v_j = \arg\min_{v_j} \sum_{(i,j) \in O} \left( r_{ij} - v_j^T T(a_i) \right)^2 + \lambda \| v_j \|^2 $$
This problem has a closed-form solution given by
$$ v_j = \Big( \sum_{(i,j) \in O} T(a_i) T(a_i)^T + \lambda I \Big)^{-1} \Big( \sum_{(i,j) \in O} r_{ij} T(a_i) \Big) \tag{2} $$
where I is the identity matrix of appropriate size.
2. Given $v_j$, we try to fit a decision tree T(a) such that
$$ T = \arg\min_{T} \sum_{(i,j) \in O} \left( r_{ij} - T(a_i)^T v_j \right)^2 \tag{3} $$
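Step 1 is an independent ridge regression for each item. Below is a minimal NumPy sketch of Eq. (2). It assumes the current tree's outputs T(a_i) have already been collected into an n x K array `profiles`, and that `mask` marks the observed set O. Both names, like `update_item_vectors` itself, are hypothetical.

```python
import numpy as np


def update_item_vectors(R, mask, profiles, lam):
    """Eq. (2): v_j = (sum_i T(a_i) T(a_i)^T + lam I)^{-1} (sum_i r_ij T(a_i)).

    R, mask  : (n, m) ratings and boolean indicator of the observed set O.
    profiles : (n, K) array whose row i is T(a_i), the profile the current
               tree assigns to user i.
    """
    n_items, K = R.shape[1], profiles.shape[1]
    V = np.zeros((n_items, K))
    for j in range(n_items):
        rated = mask[:, j]                     # users i with (i, j) in O
        P = profiles[rated]
        A = P.T @ P + lam * np.eye(K)          # sum T(a_i) T(a_i)^T + lam I
        b = P.T @ R[rated, j]                  # sum r_ij T(a_i)
        V[j] = np.linalg.solve(A, b)
    return V
```

Users who fall into the same leaf share one profile, so `P.T @ P` can be rank-deficient when a tree has fewer than K distinct leaves. In that case the $\lambda I$ term is what keeps the system solvable.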
To reduce the implementation and computational
complexity, we address this problem by proposing an
efficient greedy algorithm for finding an approximate
solution.
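One greedy step of this search can be sketched as follows. For each candidate question p, the users at the node are split into $R_L(p)$, $R_D(p)$ and $R_U(p)$, and each child gets its least-squares profile. The question with the smallest total squared error (objective (4) of the next subsection) is kept. This is an illustrative sketch with hypothetical names. It adds a tiny ridge term `lam` so that each K x K system is invertible, which objective (4) itself does not include.

```python
import numpy as np

LIKE, DISLIKE, UNKNOWN = 1, -1, 0  # answer encoding used in the text


def child_profile(R, mask, users, V, lam=1e-9):
    """Least-squares profile shared by all users of one child node (u_L, u_D or u_U).

    lam is a tiny ridge term added here (an assumption) so the K x K system is
    always solvable; objective (4) itself is unregularized.
    """
    K = V.shape[1]
    A = lam * np.eye(K)
    b = np.zeros(K)
    for i in users:
        Vi = V[mask[i]]                    # v_j for the items user i has rated
        A += Vi.T @ Vi
        b += Vi.T @ R[i, mask[i]]
    return np.linalg.solve(A, b)


def child_loss(R, mask, users, V, u):
    """Sum of (r_ij - u^T v_j)^2 over all observed ratings of the node's users."""
    return sum(float(np.sum((R[i, mask[i]] - V[mask[i]] @ u) ** 2)) for i in users)


def best_question(R, mask, answers, users, V, candidates):
    """Return (p, loss) for the candidate question p that minimizes objective (4).

    answers: (n, P) integer array with entries in {1, -1, 0}.
    """
    best_p, best_loss = None, float("inf")
    for p in candidates:
        loss = 0.0
        for value in (LIKE, DISLIKE, UNKNOWN):
            child = [i for i in users if answers[i, p] == value]  # R_L, R_D or R_U
            if child:
                loss += child_loss(R, mask, child, V, child_profile(R, mask, child, V))
        if loss < best_loss:
            best_p, best_loss = p, loss
    return best_p, best_loss
```

Applying `best_question` again inside each child node, with the users of that child, gives the recursive construction.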
C. Decision Tree Construction
Starting from the root node, the set of users at the current node is partitioned into three disjoint subsets $R_L(p)$, $R_D(p)$ and $R_U(p)$. These correspond to "Like", "Dislike" and "Unknown" responses to the interview question p:
$$ R_L(p) = \{ i \mid a_{ip} = \text{Like} \}, \quad R_D(p) = \{ i \mid a_{ip} = \text{Dislike} \}, \quad R_U(p) = \{ i \mid a_{ip} = \text{Unknown} \} $$
To find the optimal question p that leads to the best split, we minimize the following objective:
$$ \min_{p} \sum_{i \in R_L(p)} \sum_{(i,j) \in O} \left( r_{ij} - u_L^T v_j \right)^2 + \sum_{i \in R_D(p)} \sum_{(i,j) \in O} \left( r_{ij} - u_D^T v_j \right)^2 + \sum_{i \in R_U(p)} \sum_{(i,j) \in O} \left( r_{ij} - u_U^T v_j \right)^2 \tag{4} $$
where $u_L$, $u_D$ and $u_U$ are the optimal profiles for users in the child nodes corresponding to the answers "Like", "Dislike" and "Unknown", respectively:
$$ u_L = \arg\min_{u} \sum_{i \in R_L(p)} \sum_{(i,j) \in O} \left( r_{ij} - u^T v_j \right)^2, \quad u_D = \arg\min_{u} \sum_{i \in R_D(p)} \sum_{(i,j) \in O} \left( r_{ij} - u^T v_j \right)^2, \quad u_U = \arg\min_{u} \sum_{i \in R_U(p)} \sum_{(i,j) \in O} \left( r_{ij} - u^T v_j \right)^2 $$
After the root node is constructed, its child nodes can be constructed recursively in the same way. A user's profile is obtained when she arrives at a leaf node of the decision tree. The estimated rating of user i for item j can then be expressed as $\hat{r}_{ij} = v_j^T u_i$.
III. COLD-START COLLABORATIVE FILTERING
BASED ON USER REGISTRATION PROCESS
A. User Dissimilarity Matrix
FMF assumes that a new user who registers at the recommendation system is a complete stranger before she answers the interview questions. However, thanks to her registration process, we do know something about her personal attributes, such as age, sex, occupation and location. Generally, people with similar attributes are likely to share interests in similar things. For example, most boys aged 15 to 20 like The Pirates of the Caribbean and Harry Potter, while men aged 55 to 60 prefer Casablanca. Therefore, recommendations for a new user can be made more accurate by recommending the items enjoyed by similar users, i.e., users whose personal attributes are similar to hers.
We introduce the User Dissimilarity Matrix to measure the dissimilarity between two users. Assume there are p kinds of personal attributes in the user dataset. The dissimilarity between $u_i$ and $u_j$ can then be expressed as
$$ d(i,j) = \frac{\sum_{f=1}^{p} \delta_{ij}^{(f)} d_{ij}^{(f)}}{\sum_{f=1}^{p} \delta_{ij}^{(f)}} \tag{5} $$
where $x_{if}$ and $x_{jf}$ correspond to attribute f of $u_i$ and $u_j$. If $x_{if}$ or $x_{jf}$ is a null value, or $x_{if} = x_{jf} = 0$, then $\delta_{ij}^{(f)} = 0$; in all other cases, $\delta_{ij}^{(f)} = 1$. The per-attribute dissimilarity $d_{ij}^{(f)}$ depends on the type of attribute f:
1. When attribute f is a dyadic (binary) scalar or a nominal variable: if $x_{if} = x_{jf}$, then $d_{ij}^{(f)} = 0$; otherwise, $d_{ij}^{(f)} = 1$. For dyadic scalars, the dissimilarity can also be calculated through the simple matching method:
$$ d(i,j) = \frac{p - m}{p} $$
where p is the number of all variables and m is the number of variables on which users i and j match.
2. When attribute f is an interval variable:
$$ d_{ij}^{(f)} = \frac{\left| x_{if} - x_{jf} \right|}{\max_h x_{hf} - \min_h x_{hf}} $$
where h ranges over all non-null objects of attribute f.
For interval variables, the dissimilarity is usually calculated as a distance between users. The Euclidean distance is the most widely used:
$$ d(i,j) = \sqrt{ (x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2 + \cdots + (x_{ip} - x_{jp})^2 } $$
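Below is a minimal pure-Python sketch of Eq. (5). It assumes each user is a tuple of attribute values with `None` for nulls. Every attribute is tagged either `"nominal"` (which also covers dyadic attributes) or `"interval"`. The helper names and the example users are hypothetical. The sketch also includes the neighbor test $d(i,j) \le \theta$ that CCFBURP applies to the resulting dissimilarities. The MovieLens-specific rules of CCFBURP would replace the generic per-attribute computation of $d_{ij}^{(f)}$ shown here.

```python
from typing import List, Optional, Sequence, Union

Value = Optional[Union[float, str]]  # None stands for a null attribute value


def interval_ranges(users: Sequence[Sequence[Value]], kinds: Sequence[str]) -> List[Optional[float]]:
    """For each interval attribute f: max_h x_hf - min_h x_hf over non-null values."""
    ranges: List[Optional[float]] = []
    for f, kind in enumerate(kinds):
        values = [u[f] for u in users if u[f] is not None]
        if kind == "interval" and values:
            ranges.append(max(values) - min(values))
        else:
            ranges.append(None)  # nominal attribute, or no non-null values
    return ranges


def dissimilarity(xi: Sequence[Value], xj: Sequence[Value],
                  kinds: Sequence[str], ranges: Sequence[Optional[float]]) -> float:
    """Eq. (5): sum_f delta_ij^(f) * d_ij^(f) / sum_f delta_ij^(f)."""
    num = den = 0.0
    for f, (a, b) in enumerate(zip(xi, xj)):
        # delta_ij^(f) = 0 when a value is null or both values equal 0
        if a is None or b is None or (a == 0 and b == 0):
            continue
        if kinds[f] == "nominal":            # nominal or dyadic attribute
            d = 0.0 if a == b else 1.0
        else:                                # interval attribute, range-normalized
            r = ranges[f]
            d = abs(a - b) / r if r else 0.0
        num += d
        den += 1.0
    if den == 0.0:
        raise ValueError("no comparable attributes for this pair of users")
    return num / den


def neighbors(new_user, users, kinds, ranges, theta):
    """Indices of existing users u with d(new_user, u) <= theta."""
    return [k for k, u in enumerate(users)
            if dissimilarity(new_user, u, kinds, ranges) <= theta]


kinds = ["interval", "nominal", "nominal"]          # age, gender, occupation
users = [(25, "M", "engineer"), (35, "M", "artist"),
         (None, "F", "engineer"), (45, "F", "student")]
ranges = interval_ranges(users, kinds)              # [20, None, None]
print(neighbors(users[0], users[1:], kinds, ranges, theta=0.5))  # [0, 1]
```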
B. Cold-start Collaborative Filtering Based on User
Registration Process
To construct CCFBURP, the user dissimilarity matrix must first be built as a necessary preparation, and appropriate measures must be chosen for the users' attributes. In the user dissimilarity matrix, $d(i,j)$ corresponds to the dissimilarity between user i and user j. Given a threshold $\theta$, users i and j are regarded as neighbors of each other if $d(i,j) \le \theta$.
In the MovieLens dataset, users' attributes are stored in the table u.user, a tab-separated list of user id, age, gender, occupation and zip code. We index age, gender, occupation and zip code as attributes f = 1, 2, 3, 4, and define the variables in Eq. (5) as follows:
1. $\delta_{ij}^{(f)} = 1$ for all $f \in [1, 4]$, since none of the users' attributes are null.
2. If the age difference between the two users is no more than 5, $d_{ij}^{(1)} = 0$; if the difference is more than 5, $d_{ij}^{(1)} = 1$.
3. If the two users' genders are the same, $d_{ij}^{(2)} = 0$; otherwise, $d_{ij}^{(2)} = 1$.
4. If the two users' occupations are the same, $d_{ij}^{(3)} = 0$; otherwise, $d_{ij}^{(3)} = 1$.
5. The first digit of a US zip code corresponds to a group of neighboring states, and we assume that people who live in neighboring states share some similarity. Hence, if the two users' zip codes start with the same digit, $d_{ij}^{(4)} = 0$; otherwise, $d_{ij}^{(4)} = 1$.
We suggest a registration process for new users, which consists of two parts. First, the new user registers an account and provides her personal information. We then compute the dissimilarities between the new user and the existing users. Her neighbors are the existing users whose dissimilarities with her are less than or equal to the given threshold $\theta$. These neighbors' ratings on all items constitute her modeling data set, denoted $R^*$. Afterwards, the N items that have received the most ratings in $R^*$ are selected as possible interview questions. Second, we train the decision tree T(a) and $v_j$ through the procedure described in Section 2.
C. Computational Complexity
The computational complexity of constructing the decision tree of FMF in Section 2 is
$$ O\Big( D \sum_i N_i^2 + L M K^3 + L M^2 K^2 \Big) \tag{6} $$
where D is the depth of the tree, $N_i$ is the number of ratings by user i, L is the number of nodes in the tree, M is the number of possible interview questions, and K is the dimension of the latent space. Among these variables, D, $N_i$, L and K can be set by the administrator. M, however, is the number of all items without further restrictions, which can be very large. In Section 3, we choose the top-N items that have received the most ratings from the new user's neighbors as possible interview questions, so the computational complexity becomes
$$ O\Big( D \sum_i N_i^2 + L N K^3 + L N^2 K^2 \Big) \tag{7} $$
Under ordinary circumstances, N is far smaller than M. For instance, M is 1682 in our MovieLens dataset. If we set N to 150, the second term of the complexity is reduced to less than 10% of its original value ($150/1682 \approx 8.9\%$), and the third term to less than 1% ($(150/1682)^2 \approx 0.8\%$). The effect becomes more apparent as the number of users increases.
IV. EXPERIMENTS
A. Data Set
The MovieLens data set has been widely used in the field of CF. It contains 3900 movies, 6040 users and around 1 million ratings, where the ratings are integers ranging from 1 (bad) to 5 (good). In our experiments, we choose the reduced data set, which contains 1682 movies, 943 users and 100000 ratings. Each user in the reduced data set has rated at least 20 movies.
We split the users into two disjoint subsets, the training set and the test set, containing 80% and 20% of the users, respectively. We then split the items in the test set into two disjoint subsets. The answer set, containing 80% of the items, is used to generate the user responses in the interview process. The evaluation set, containing the remaining 20% of the items, is used to evaluate performance after the interview process. The data set is divided as shown in Fig. 1, where A represents the training set, B the answer set, and C the evaluation set.
[Figure: user-item rating matrix (rows: users, columns: items) divided into blocks A, B and C]
Fig. 1. Division of the data set
B. Performance Evaluation
The performance of a collaborative filtering algorithm is evaluated in terms of the widely used root mean square error (RMSE) measure, which is defined as follows:
$$ \mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{n} \sum_{j} \left( r_{ij} - \hat{r}_{ij} \right)^2 } $$
where N is the number of test ratings, the inner sum runs over the test ratings of user i, $r_{ij}$ is the ground-truth rating of movie j by user i, and $\hat{r}_{ij}$ is the value predicted by our algorithm.
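The RMSE computation is straightforward. Below is a minimal sketch, assuming the N test ratings are given as (ground truth, prediction) pairs. The function name `rmse` and this input format are choices made here for illustration.

```python
import math
from typing import Iterable, Tuple


def rmse(pairs: Iterable[Tuple[float, float]]) -> float:
    """Root mean square error over the N (r_ij, r_hat_ij) test pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("RMSE is undefined for an empty test set")
    squared = sum((r - r_hat) ** 2 for r, r_hat in pairs)
    return math.sqrt(squared / len(pairs))


print(rmse([(4, 3.0), (2, 2.0), (5, 3.0)]))  # sqrt(5/3), about 1.291
```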
C. Results and Analysis
We compare the performance with four baseline
methods described as follows:
Mode Method (MM): In the whole training set, find the
mode of ratings for movies by all users, and predict the
ratings in the test set as the mode value.
Mean Value Method (MVM): In the whole training set,
compute the mean value of ratings for movies by all users,
and predict the ratings in the test set as the mean value.
Neighbor Mode Method (NMM): In the optimized
training set which is composed of the N neighbors of the
new user, find the mode of ratings for movies by all users,
and predict the ratings in the test set as the mode value.
Neighbor Mean Value Method (NMVM): In the
optimized training set which is composed of the N
neighbors of the new user, compute the mean value of
ratings for movies by all users, and predict the ratings in
the test set as the mean value.
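All four baselines reduce to predicting a single constant for every test pair. Below is a minimal sketch. It assumes the training data is a list of (user, item, rating) triples and that the new user's N neighbors have already been found, for example with the dissimilarity threshold of Section 3. The helper names `constant_predictor` and `neighbor_ratings` and the toy data are hypothetical.

```python
from statistics import mean, mode


def constant_predictor(ratings, method):
    """Baseline that predicts one constant for every test pair.

    ratings: list of (user, item, rating) training triples.
    method:  "mode" for MM / NMM, "mean" for MVM / NMVM.
    """
    values = [r for _, _, r in ratings]
    if method == "mode":
        value = mode(values)
    elif method == "mean":
        value = mean(values)
    else:
        raise ValueError(f"unknown method: {method}")
    return lambda user, item: value


def neighbor_ratings(ratings, neighbor_ids):
    """Training triples restricted to the new user's N neighbors (NMM, NMVM)."""
    keep = set(neighbor_ids)
    return [t for t in ratings if t[0] in keep]


train = [(1, 10, 4), (1, 11, 5), (2, 10, 4), (3, 12, 1), (3, 13, 1), (3, 14, 1)]
mm = constant_predictor(train, "mode")                             # MM
nmm = constant_predictor(neighbor_ratings(train, [1, 2]), "mode")  # NMM, neighbors 1 and 2
print(mm(99, 10), nmm(99, 10))  # 1 4
```

When several ratings are equally frequent, `statistics.mode` returns the first one encountered (Python 3.8 and later). The tie-breaking rule for MM and NMM is therefore an implementation choice in this sketch.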
We set D = 5 and K = 20 in Eqs. (6) and (7), and set $\theta = 0.5$. We then consider N = 10, 50, 100, 150 and 300 in Eq. (7), and the same values of N in NMM and NMVM. The RMSE results are reported in Table I and depicted in Fig. 2.
TABLE I. RMSE ON MOVIELENS DATA SET FOR COLD-START USERS
WITH RESPECT TO THE NUMBER OF NEIGHBORS
MM
MVM
NMM
NMVM
FMF
CCFBURP
N=10
N=50
1.2547
1.3120
1.0836
1.2055
1.2664
0.9825
N=100
1.0451
1.1548
1.0147
1.1348
0.9536
0.9457
N=150
N=300
0.9875
1.0895
1.0348
1.1446
0.9327
0.9396
[Fig. 2 plot: RMSE (vertical axis, 0.8 to 1.4) against N = 10, 50, 100, 150, 300 (horizontal axis) for MM, MVM, NMM, NMVM, FMF and CCFBURP]
Fig. 2. RMSE of MM, MVM, NMM, NMVM, FMF and CCFBURP
for cold-start users on MovieLens Data Set
Comparing the performance of MM, MVM and FMF, we can see that FMF is superior to the other two. Likewise, CCFBURP is superior to NMM and NMVM. This observation illustrates that the interview processes in FMF and CCFBURP improve the accuracy of the algorithms. We then compare the methods in pairs: MM with NMM, MVM with NMVM, and FMF with CCFBURP. The former methods are static, while the latter respond dynamically to changes in N. Their RMSE first decreases, reaches its optimum around N = 150, and then increases, tending toward the results of the former methods. We attribute this to the fact that when N is too small, the predicted ratings are strongly influenced by the preconceptions of the selected users. As N increases, the influence of these preconceptions decreases, and so does the RMSE. However, as N continues to increase, it gets closer and closer to the number of all items, which is why the RMSE tends toward the results of the static algorithms.
V. CONCLUSION
The main focus of this paper is the cold-start problem in recommender systems. We have presented Cold-start Collaborative Filtering Based on User Registration Process (CCFBURP), a framework for learning latent factors for user/item profiling. The proposed CCFBURP algorithm considers the whole registration process of new users and uses the information obtained from this process to predict the users' preferences. Experimental results on the MovieLens dataset demonstrate that the proposed CCFBURP algorithm significantly outperforms existing methods for cold-start recommendation. For future work, we plan to investigate the influence of parameter variation. Moreover, we also plan to explore rules for calculating the dissimilarity, which plays an important role in the first step of CCFBURP.
REFERENCES
[1] Z. Huang, W. Chung, et al., “A graph model for E-Commerce recommender systems”, Journal of the American Society for Information Science and Technology, Vol. 55, No. 3, pp. 259-274, 2004.
[2] K. Wei, J. Huang, S. Fu, “A survey of E-Commerce recommender systems”, in Proceedings of the International Conference on Service Systems and Service Management, pp. 1-5, 2007.
[3] D. Goldberg, D. Nichols, B. M. Oki, and D. Terry, “Using collaborative filtering to weave an information tapestry”, Communications of the ACM, Vol. 35, No. 12, pp. 61-70, 1992.
[4] G. Adomavicius and A. Tuzhilin, “Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions”, IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 6, pp. 734-749, 2005.
[5] J. S. Breese, D. Heckerman, and C. Kadie, “Empirical analysis of predictive algorithms for collaborative filtering”, in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, pp. 43-52, 1998.
[6] W. Hill, L. Stead, M. Rosenstein, and G. Furnas, “Recommending and evaluating choices in a virtual community of use”, in Proceedings of the Conference on Human Factors in Computing Systems, pp. 194-201, 1995.
[7] J. A. Konstan, B. N. Miller, et al., “GroupLens: Applying collaborative filtering to Usenet news”, Communications of the ACM, Vol. 40, No. 3, pp. 77-87, 1997.
[8] P. Resnick and H. R. Varian, “Recommender systems”, Communications of the ACM, Vol. 40, No. 3, pp. 56-58, 1997.
[9] U. Shardanand and P. Maes, “Social information filtering: Algorithms for automating ‘word of mouth’”, in Proceedings of the Conference on Human Factors in Computing Systems, pp. 210-217, 1995.
[10] A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock, “Methods and metrics for cold-start recommendations”, in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’02), pp. 253-260, 2002.
[11] B. M. Kim and Q. Li, “Probabilistic model estimation for collaborative filtering based on items attributes”, in Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, pp. 185-191, 2004.
[12] A. Paterek, “Improving regularized singular value decomposition for collaborative filtering”, in Proceedings of KDD Cup and Workshop, pp. 5-8, 2007.
[13] M. Weimer et al., “Maximum Margin Matrix Factorization for Code Recommendation”, in Proceedings of the Third ACM Conference on Recommender Systems (RecSys ’09), pp. 309-312, 2009.
[14] N. Golbandi, Y. Koren, and R. Lempel, “On bootstrapping recommender systems”, in Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp. 1805-1808, 2010.
[15] K. Zhou et al., “Functional Matrix Factorizations for Cold-Start Recommendation”, in Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’11), New York, USA, pp. 315-324, 2011.