
MATEC Web of Conferences 44, 01091 (2016)
DOI: 10.1051/matecconf/20164401091
© Owned by the authors, published by EDP Sciences, 2016
ICEICE 2016

A new method to collaborative filtering recommendation based on DBN and HMM

Yong Mei Zhao1, Hai Rong Wan2, Zheng Xin Guo

1 Department of Electric and Science, College of Science, Air Force Engineering University, China
2 Department of Urban and Regional Planning, College of Urban and Environmental Sciences, Peking University, China
3 Department of Civil Engineering, Hunan University, China

Abstract: The main problems of collaborative filtering are initial rating, data sparsity and timely recommendation. A recommendation approach based on the hidden Markov model (HMM), which creates the nearest neighbour set by simulating users' web browsing behaviour, is a good way to address these problems. However, the HMM or its parameters must change constantly as customer preferences change. When a new type of data is added, the HMM can only be obtained by relearning, which hurts the real-time performance of recommendation. Therefore, a recommendation approach based on DBN and HMM is proposed. The approach improves real-time recommendation, and experiments show that it achieves high recommendation quality.

Key words: hidden Markov model (HMM), dynamic Bayesian network (DBN), collaborative filtering recommendation

This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Article available at http://www.matec-conferences.org or http://dx.doi.org/10.1051/matecconf/20164401091

1 Introduction

The main problems of collaborative filtering [1] are data sparsity, initial rating, timely recommendation and the expansion of the data space. An approach to collaborative filtering recommendation based on HMM is one good solution to these problems [2]. User behaviour changes constantly with the diversity of online consumer products, so the model or its parameters should be modified accordingly. Once a conventional recommendation model has been formed, however, its parameters cannot be changed arbitrarily. When a new type of data is added, the HMM can only be obtained by relearning, which hurts the real-time performance of user behaviour recommendation. It is therefore necessary to study how to improve the adaptive capability of the recommendation model by updating the structure of the previous model rather than rebuilding it.

The dynamic Bayesian network (DBN) is widely used in many fusion algorithms because of its modelling flexibility. We can create a recommendation model based on DBN to implement network structure learning: new features are added on top of the previous collaborative filtering recommendation technology, and all previous training sets are combined with the new samples for learning. This both saves time and optimizes the network structure, so that the recommendation model meets users' needs well.

2 A collaborative filtering prediction model based on HMM [3]

The HMM collaborative filtering model with user preferences greatly improves both the efficiency and the accuracy of recommendation results. The model is defined as follows:

$$P_{aj} = \frac{1}{n} \sum_{i=1}^{n} P_i(O \mid \lambda) \times A_{ij} \tag{1}$$

where $A_{ij}$ is the preference of user $i$ for commodity $j$; commodity $j$ is what path $j$ represents; $P_{aj}$ is the preference probability of target user $a$ for commodity $j$; and $P_i(O \mid \lambda)$ is the similarity of the nearest user $i$.

3 A collaborative filtering recommendation model based on DBN

3.1 Definition of DBN model [5,6]

The probability equation of the observation variable:

$$p(X_t, A_t \mid Y_t) = \sum_{m=1}^{M} p(M_t = m \mid Y_t)\, p(X_t, A_t \mid Y_t, M_t = m) \tag{2}$$

where $p(A_t \mid Y_t, M_t = m)$ is expressed by a simple distribution such as the Poisson distribution, and $p(X_t \mid A_t, Y_t, M_t = m)$ is expressed by a conditional Gaussian model as follows:

$$p(X_t \mid A_t, Y_t, M_t = m) = N(X_t;\ u_m + B_m A_t,\ \Sigma_m) \tag{3}$$

Equation of state:

$$\theta_t = G_t \theta_{t-1} + w_t, \qquad w_t \sim N[0, W_t] \tag{4}$$

The initial prior:

$$(\theta_0 \mid D_0) \sim N[m_0, C_0] \tag{5}$$

where $\theta_t$ is the state parameter at time $t$; $F$ is an $n \times 1$ matrix; $G_t$ is the $n \times n$ state transition matrix; $v_t$ is the observation error and $w_t$ is the state error. The two are independent random variables.

$W_{t_i}$ can hardly be determined in practice. To overcome this difficulty, a discount factor $\delta$ ($0 < \delta < 1$) is incorporated into the model. A first order polynomial model with a discount factor is called a first order polynomial discount model. Its specific implementation is as follows.

The state equation:

$$\begin{cases} \mu_{t_i} = \mu_{t_{i-1}} + (r_i / r_{i-1})\, \beta_{t_{i-1}} + w_{t_i,1} \\ \beta_{t_i} = (r_i / r_{i-1})\, \beta_{t_{i-1}} + w_{t_i,2} \end{cases} \tag{6}$$

where $w_{t_i} = (w_{t_i,1},\ w_{t_i,2})^T$, $\mu_{t_i}$ is the level of the sequence at time $t_i$, and $\beta_{t_i}$ is the growth of the sequence level from time $t_{i-1}$ to $t_i$.

The initial information:

$$\left( \begin{bmatrix} \mu_{t_0} \\ \beta_{t_0} \end{bmatrix} \,\middle|\, D_{t_0} \right) \sim N[m_{t_0}, C_{t_0}] \tag{7}$$

where

$$m_{t_0} = \begin{bmatrix} m_{t_0} \\ b_{t_0} \end{bmatrix}, \qquad C_{t_0} = \begin{bmatrix} C_{t_0,1} & C_{t_0,3} \\ C_{t_0,3} & C_{t_0,2} \end{bmatrix}$$

$v_{t_i}$ is the observation error term, which follows a normal distribution with zero mean and variance $V_{t_i}$. $D_{t_i}$ is the collection of information about the system before time $t_i$. The remaining quantities are:

$$W_{t_i} = (\delta^{-r_i} - 1)\, G_{t_i} C_{t_{i-1}} G_{t_i}^T, \quad F = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad G_{t_i} = \begin{bmatrix} 1 & r_i / r_{i-1} \\ 0 & r_i / r_{i-1} \end{bmatrix}, \quad \theta_{t_i} = \begin{pmatrix} \mu_{t_i} \\ \beta_{t_i} \end{pmatrix}$$

3.2 The model of reasoning and learning algorithm

1) Model reasoning algorithm [7]

In the process of model reasoning, we mainly calculate two parameters, $\lambda$ and $\pi$, which play the role that the forward and backward variables play in an HMM. Assuming the observation vector is $e$, they are related by the following formulas:

$$P(e) = \sum_{j} \lambda_i^j \pi_i^j, \quad \forall i \tag{8}$$

$$P(X_i = w \mid e) = \frac{\lambda_i^w \pi_i^w}{\sum_{j} \lambda_i^j \pi_i^j}, \quad \forall i \tag{9}$$

The specific reasoning algorithm is as follows (algorithm 1):

For each variable $X_i$ (post-order traversal)
    If $X_i$ is a leaf node,
        If $j$ is the value given by the observation vector, then $\lambda_i^j = 1$,
        Else $\lambda_i^j = 0$,
    Else
        If $j$ is the value given by the observation vector, $\lambda_i^j = \prod_{c \in children(X_i)} \sum_{f} \lambda_c^f\, P(X_c = f \mid X_i = j)$,
        Else $\lambda_i^j = 0$

For each variable $X_i$ (pre-order traversal)
    If $X_i$ is a root node, $\pi_i^j = P(X_i = j)$
    Else $\pi_i^j = \sum_{v \in CON(P)} P(X_i = j \mid X_P = v)\, \pi_P^v \prod_{s \in siblings(X_i)} \sum_{f} \lambda_s^f\, P(X_s = f \mid X_P = v)$
    (where $X_P$ is the father of $X_i$)

2) Model learning algorithm [8,9]

When the observation vector is incomplete and the network structure is given, the classical EM algorithm is suitable. The training process of the algorithm is (algorithm 2):

Step E: Assume the observation vector is $e$. The conditional probability of node $C_i$ is given by equation (10):

$$P(C_i = w \mid e) = \frac{\lambda_i^w \pi_i^w}{\sum_{w} \lambda_i^w \pi_i^w} \tag{10}$$

The probability of every solidification (clique) node can be calculated according to equation (10), using the algorithm derived from the connected-tree (junction tree) algorithm.

Then calculate $N_{ijk}$, the number of times variable $X_i$ appeared with $X_i = k$ and $parents(X_i) = j$. First choose one solidification node that contains both $X_i$ and its father node. Let $V_{jk}^i$ be the collection of solidification nodes which satisfy $X_i = k$ and $parents(X_i) = j$. Then

$$N_{ijk} = \sum_{w \in V_{jk}^i} P(C_i = w \mid e) \tag{11}$$

$N_{ijk}$ increases as training samples are added.

Step M: After calculating $N_{ijk}$, the conditional probability of each variable can be re-estimated as shown in formula (12):

$$P(X_i = k \mid parents(X_i) = j) = \frac{N_{ijk}}{\sum_{k} N_{ijk}} = \frac{N_{ijk}}{N_{ij}} \tag{12}$$

3.3 Collaborative filtering recommendation algorithm based on DBN (algorithm 3)

Step 1: Initialize $X_0$;
Step 2: Perform the collaborative filtering prediction model based on HMM (Model 1), training the model on the standard characteristics;
Step 3: Perform algorithm 1 and algorithm 2 and establish the DBN;
Step 4: Calculate $P(X_T \mid Y_{1:T})$ and choose the $X_T$ that maximizes $P(X_T \mid Y_{1:T})$.
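Step M of algorithm 2 (formula (12)) amounts to normalizing the expected counts N_ijk over k for each parent configuration j. A minimal Python sketch follows, assuming the counts for one variable X_i are stored in a dictionary keyed by (j, k). The function name `m_step`, the dictionary layout and the numbers are illustrative assumptions, not part of the method as described in the paper.

```python
from collections import defaultdict


def m_step(expected_counts):
    """Re-estimate P(X_i = k | parents(X_i) = j) from expected counts N_ijk.

    expected_counts maps (j, k) -> N_ijk for a single variable X_i, where j
    labels a parent configuration and k a value of X_i (hypothetical layout).
    """
    # N_ij: total expected count for each parent configuration j
    totals = defaultdict(float)
    for (j, _k), n in expected_counts.items():
        totals[j] += n

    # Formula (12): N_ijk / N_ij, skipping parent configurations never seen
    cpt = {}
    for (j, k), n in expected_counts.items():
        if totals[j] > 0:
            cpt[(j, k)] = n / totals[j]
    return cpt


# Made-up expected counts for illustration only (not data from the paper)
counts = {
    ("j0", "k0"): 3.0,
    ("j0", "k1"): 1.0,
    ("j1", "k0"): 0.5,
    ("j1", "k1"): 1.5,
}
cpt = m_step(counts)
```

Because the counts are expected values from Step E rather than integer tallies, they may be fractional. The normalization still yields a valid conditional distribution for every parent configuration that has a non-zero total.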
3.4 The update model based on DBN

Combining the previous model with the nearest neighbour similarity recommended by the HMM, the original prediction model of collaborative filtering is updated to the following formulas:

$$P_{aj} = \frac{1}{n} \sum_{i=1}^{n} S_i \times A_{ij} \tag{13}$$

$$S_i = \psi \cdot P_i(O \mid \lambda) + \zeta \cdot P_i(X_T \mid Y_{1:T}) \tag{14}$$

where $A_{ij}$ is the preference of user $i$ for commodity $j$; commodity $j$ is what path $j$ represents; $P_{aj}$ is the probability that target user $a$ likes commodity $j$; and $S_i$ is the similarity of the nearest user $i$. $\psi$ and $\zeta$ are balance indexes that define how strongly the HMM and DBN training results influence the model.

3.5 Nearest neighbour collaborative filtering method after updating

The updated collaborative filtering recommendation model consists of the following parts: data pre-processing, HMM filtration, the DBN updating model, and the collaborative filtering prediction model. Figure 1 depicts its structure.

Figure 1. The updated collaborative filtering process

In the nearest neighbour recommendation phase, calculate $P_i(O \mid \lambda)$ and $P_i(X_T \mid Y_{1:T})$ separately, and choose $X_T$ to maximize $P(X_T \mid Y_{1:T})$ and $P_i(O \mid \lambda)$. $S_i$ can be considered as the similarity between user $i$ and the target user, where $\lambda$ is the HMM parameter of the target user. The N users with the highest similarity join the nearest neighbour set.

4 Evaluation of experiment

The experimental process is:
(1) Calculate the similarity between users on the training set, and find the nearest neighbours of users in various scenarios.
(2) Predict all of the items for all users, then work out the collection of recommended results.
(3) Using the recommendation results and the real records in the test set, calculate the recommendation efficiency according to the evaluation standards.

Five parameters are used in the experiment: the access time of queries, IP address, URL, browse number and page load time. We experiment with data from five users (user1 to user5). User1 is the target user, for whom 20000 original feature vectors are randomly generated (the feature vectors have been pre-processed, so there are few obvious mistakes in the random data). Of these, 10000 are used for training the network. The format is as follows:

(2012-01-22 22:10:15, 192.168.1.23, http://www.haoting.com/, 15, 24)
(2015-06-02 12:02:15, 192.168.1.21, http://www.haoting.com/zhangjie/, 15, 122)
……

The remaining 10000 feature vectors are used as the experiment data for nearest set recommendation. As for user1, 20000 feature vectors are also randomly generated for each of the other users for the recommendation experiment.

First, train the nearest set after model updating: perform algorithm 3 and calculate the similarity between users (calculate $P_i(X_T \mid Y_{1:T})$ and choose $X_T$ to maximize $P_i(X_T \mid Y_{1:T})$). When $\psi = \zeta = 0.5$, the new similarity $S_i$ is shown in table 1.

Table 1. The similarity of users after model updating

user     State number   Similarity based on HMM   Similarity after model updating
user1    153            0.993                     0.996
user2    90             0.647                     0.641
user3    145            0.975                     0.985
user4    76             0.296                     0.287
user5    69             0.188                     0.218

The table shows $S_i$, the similarity of different users in different states. The predicted similarity between user1 and itself is 0.996. It is clear from the table that user3 is closest to target user1, so user3 can be placed in the nearest neighbour set.

Second, consider the recommendation of preferred commodities after updating the model. We find that commodity 1001 best meets the preferences of user3, and the recommendation probability for the target user is $P_{user1,1001} = 0.9894$. After adding user2 to the nearest neighbour set and calculating formula (14), we still find that commodity 1001 best meets the preferences of user3.

Figure 2. Results of experiment

The experiment result is shown in figure 2. The top curve gives the predicted preference probability of target user1 for commodity j with the nearest neighbour set (user3), while the bottom curve gives the results with the nearest neighbour set (user3, user2).

5 Conclusion

We mainly introduce a method of updating the model of collaborative filtering recommendation based on the access paths of users. By using the high integration capability of the dynamic Bayesian network, the method takes other user characteristics into consideration, such as the user's static ratings. In this way, we update the model and combine dynamic behaviour data with static characteristics of the data, and thereby obtain a more accurate recommendation.

References

1. You Wen, Ye Shui-sheng. A Survey of Collaborative Filtering Algorithm Applied in E-commerce Recommender System [J]. Computer Technology and Development, 2006, 16(9): 57-61.
2. Wang Xia, Liu Qin. Research on Application of Collaborative Filtering in Recommender Systems [J]. Applications of The Computer Systems, 2005, 4: 23-27.
3. Huang Guang-qiu, Zhao Yong-mei. Approach to collaborative filtering recommendation based on HMM [J]. Journal of Computer Applications, 2008, 28(6): 1601-1604.
4. Zhang Zi-li, Liu Wei-yi. State prediction based on the dynamic Bayesian network [J]. 2007, 29(1): 35-39.
5. Li Xiao-yi, Xu Zhao-di, Sun Xiao-wei. Study on Parameter Learning of Bayesian Network [J]. Journal of Shenyang Agricultural University, 2007-02, 38(1): 125-128.
6. Wei Fang. Discovering User Interests Based on Bayesian Network [D]. Xi'an: Xi'an University of Architecture and Technology, 2007: 25-29.
7. Liu Jie. Chinese Proper Names Recognition Based on Dynamic Bayesian Network [D]. Shanxi: Shanxi University, 2006: 11-13.
8. Sun A-li. Research on Dynamic Bayesian Network Models for Audio-Visual Speech Recognition [D]. Xi'an: Northwestern Polytechnical University, 2007: 28-40.
9. Wang Xing-bin. Research on Application of Bayesian Net in Robust Speech Recognition [D]. Henan: The PLA Information Engineering University, 2006: 17.