MATEC Web of Conferences 44, 01091 (2016)
DOI: 10.1051/matecconf/20164401091
© Owned by the authors, published by EDP Sciences, 2016
A new method for collaborative filtering recommendation based on DBN and HMM
Yong Mei Zhao1, Hai Rong Wan2, Zheng Xin Guo3
1 Department of Electric and Science, College of Science, Air Force Engineering University, China
2 Department of Urban and Regional Planning, College of Urban and Environmental Sciences, Peking University, China
3 Department of Civil Engineering, Hunan University, China
Abstract: The main problems of collaborative filtering are initial rating, data sparsity and timely recommendation. A recommendation approach based on the HMM, which builds the nearest-neighbour set by simulating users' web-browsing behaviour, is a good way to solve these problems. However, the HMM and its parameters vary constantly with customers' changing preferences, and when a new type of data joins, the HMM can only be rebuilt by relearning, which degrades the real-time performance of recommendation. Therefore a recommendation approach based on DBN and HMM is proposed. The approach improves real-time recommendation, and experiments show that it achieves high recommendation quality.
Key words: hidden Markov model (HMM), dynamic Bayesian network (DBN), collaborative filtering recommendation
1 Introduction
The main problems of collaborative filtering [1] are data sparsity, initial rating, timely recommendation and the expansion of the data space. An approach to collaborative filtering recommendation based on the HMM is one good solution to these problems [2]. User behaviour changes constantly with the diversity of online consumer products, so the model or its parameters must be modified accordingly. Once a conventional recommendation model is formed, however, its parameters cannot be changed arbitrarily; when a new type of data joins, the HMM can only be rebuilt by relearning, which degrades the real-time performance of user-behaviour recommendation. It is therefore necessary to study how to improve the adaptive capability of the recommendation model by updating its structure while reusing the previous model.
The dynamic Bayesian network (DBN), thanks to its modelling flexibility, is widely used in many fusion algorithms. We can build a recommendation model based on a DBN that implements network structure learning: new features are added on top of the previous collaborative filtering recommendation technology, and all previous training sets are combined with new samples for learning. This both saves time and optimizes the network structure, so the recommendation model meets users' needs well.
2 A collaborative filtering prediction model based on HMM [3]
The HMM collaborative filtering model with user preferences greatly improves both the efficiency and the accuracy of the recommendation results. The HMM collaborative filtering model is defined as follows:

P_{aj} = \frac{1}{n} \sum_{i=1}^{n} P_i(O|\lambda) \times A_{ij}    (1)

where A_{ij} is the preference of user i for commodity j; commodity j is what path j represents; P_{aj} is the preference probability of target user a for commodity j; and P_i(O|\lambda) is the similarity of the nearest user i.
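Equation (1) is a similarity-weighted average over the n nearest neighbours. The following Python sketch only illustrates the formula; the paper provides no code, and all names and data here are hypothetical:

import numpy as np

def predict_preference(similarities, preferences):
    """Equation (1): P_aj = (1/n) * sum_i P_i(O|lambda) * A_ij.

    similarities: shape (n,), P_i(O|lambda) for each nearest neighbour i
    preferences:  shape (n,), A_ij, neighbour i's preference for commodity j
    """
    similarities = np.asarray(similarities, dtype=float)
    preferences = np.asarray(preferences, dtype=float)
    n = len(similarities)
    return (similarities * preferences).sum() / n

# Hypothetical example: three neighbours rating commodity j
print(predict_preference([0.9, 0.7, 0.4], [1.0, 0.8, 0.2]))  # P_aj for target user a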
3 A collaborative filtering recommendation model based on DBN

3.1 Definition of the DBN model [5,6]
The probability equation of the observation variable:

p(X_t, A_t | Y_t) = \sum_{m=1}^{M} p(M_t = m | Y_t)\, p(X_t, A_t | Y_t, M_t = m)    (2)

where p(A_t | Y_t, M_t = m) is expressed by a simple distribution such as the Poisson distribution, and p(X_t | A_t, Y_t, M_t = m) is expressed by a conditional Gaussian model as follows:

p(X_t | A_t, Y_t, M_t = m) = N(X_t; u_m + B A_t, \Sigma_m)    (3)
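Equations (2) and (3) describe a mixture over M regimes in which A_t follows a simple count distribution and X_t a conditional Gaussian. The sketch below is one hypothetical reading of that density; the Poisson rates, the mean term u_m + B A_t, and all parameter values are illustrative assumptions, not the authors' code:

import numpy as np
from scipy.stats import multivariate_normal, poisson

def observation_density(x_t, a_t, regime_probs, u, B, Sigma, rates):
    """Equation (2): p(X_t, A_t | Y_t) = sum_m p(M_t=m|Y_t) p(X_t, A_t|Y_t, M_t=m).

    regime_probs: p(M_t = m | Y_t), shape (M,)
    u, Sigma:     per-regime Gaussian parameters u_m (M, d) and Sigma_m (M, d, d)
    B:            loading matrix mapping A_t into the mean, shape (d, k)
    rates:        per-regime Poisson rates, a simple model for p(A_t|Y_t, M_t=m)
    """
    total = 0.0
    for m, w in enumerate(regime_probs):
        p_a = poisson.pmf(a_t, rates[m]).prod()       # count model for A_t
        mean = u[m] + B @ np.atleast_1d(a_t)          # conditional Gaussian mean, eq. (3)
        p_x = multivariate_normal.pdf(x_t, mean=mean, cov=Sigma[m])
        total += w * p_a * p_x
    return total

# Hypothetical two-regime example with 2-D observations and a scalar count A_t
u = np.zeros((2, 2)); Sigma = np.stack([np.eye(2)] * 2); B = np.ones((2, 1))
print(observation_density(np.array([0.5, 1.0]), 1, [0.6, 0.4], u, B, Sigma, rates=[1.0, 2.0]))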
Equation of state:

\theta_t = G_t \theta_{t-1} + w_t, \quad w_t \sim N[0, W_t]    (4)

The initial prior:

(\theta_0 | D_0) \sim N[m_0, C_0]    (5)

where \theta_t is the state parameter at time t, an n×1 matrix; G_t is the n×n state transition matrix; v_t is the observation error and w_t the state noise, and they are independent. v_{t_i} follows a normal distribution with zero mean and variance V_{t_i}, and D_{t_i} is the information collected about the system before time t_i.

W_{t_i} can hardly be determined in practice; to overcome this, a discount factor \delta (0 < \delta < 1) is incorporated into the model. A first-order polynomial model with the discount factor is called a first-order polynomial discount model, and its specific implementation is as follows.

The state equation:

\begin{cases} P_{t_i} = P_{t_{i-1}} + (r_i/r_{i-1})\,\beta_{t_{i-1}} + w_{t_i,1} \\ \beta_{t_i} = (r_i/r_{i-1})\,\beta_{t_{i-1}} + w_{t_i,2} \end{cases}    (6)

where P_{t_i} is the level of the sequence at time t_i, \beta_{t_i} is the growth of the sequence level from time t_{i-1} to t_i, and w_{t_i} = (w_{t_i,1}, w_{t_i,2})^T is the state noise.

The initial information:

(P_{t_0}, \beta_{t_0} | D_{t_0}) \sim N[m_{t_0}, C_{t_0}]    (7)

where

m_{t_0} = \begin{bmatrix} m_{t_0} \\ b_{t_0} \end{bmatrix}, \quad C_{t_0} = \begin{bmatrix} C_{t_0,1} & C_{t_0,3} \\ C_{t_0,3} & C_{t_0,2} \end{bmatrix}

and

W_{t_i} = (\delta^{-r_i} - 1)\, G_{t_i} C_{t_{i-1}} G_{t_i}^T, \quad F = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad G_{t_i} = \begin{bmatrix} 1 & r_i/r_{i-1} \\ 0 & r_i/r_{i-1} \end{bmatrix}, \quad \theta_{t_i} = \begin{bmatrix} P_{t_i} \\ \beta_{t_i} \end{bmatrix}
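One prediction step of this model is easy to state in code. The sketch below assumes the standard discount-factor construction for W_{t_i} given above; the initial values are hypothetical:

import numpy as np

def evolve_state(m_prev, C_prev, r_ratio, delta, r_i=1.0):
    """One prediction step of the first-order polynomial discount model.

    State theta_ti = (P_ti, beta_ti)^T evolves by eq. (6):
        P_ti    = P_{t_{i-1}} + r_ratio * beta_{t_{i-1}} + w_1
        beta_ti = r_ratio * beta_{t_{i-1}} + w_2
    with W_ti set by the discount factor: W_ti = (delta**-r_i - 1) * G C G^T.
    """
    G = np.array([[1.0, r_ratio],
                  [0.0, r_ratio]])       # state transition matrix G_ti
    a = G @ m_prev                       # prior mean of theta_ti
    P = G @ C_prev @ G.T                 # propagated covariance
    W = (delta ** -r_i - 1.0) * P        # discount-factor evolution noise
    R = P + W                            # prior covariance of theta_ti
    return a, R

# Hypothetical initial information (eq. 7): m_t0 and C_t0
m0 = np.array([10.0, 0.5])
C0 = np.diag([1.0, 0.1])
print(evolve_state(m0, C0, r_ratio=1.2, delta=0.9))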
3.2 The model reasoning and learning algorithm
1) Model reasoning algorithm [7]
In the process of model reasoning, we mainly calculate two parameters, \lambda and \pi, which play the roles of \beta and \alpha in the HMM. Assuming the observation vector is e, \lambda and \pi satisfy:

P(e) = \sum_j \lambda_j^i \pi_j^i, \quad \text{for any node } i    (8)

P(X_i = j | e) = \frac{\lambda_j^i \pi_j^i}{\sum_j \lambda_j^i \pi_j^i}    (9)

The specific reasoning algorithm is as follows (Algorithm 1):

For each variable X_i (post-order traversal):
  If X_i is a leaf node:
    If j is the observed value of X_i, then \lambda_j^i = 1; else \lambda_j^i = 0
  Else:
    \lambda_j^i = \prod_{c \in children(X_i)} \sum_f \lambda_f^c P(X_c = f | X_i = j)

For each variable X_i (pre-order traversal):
  If X_i is a root node:
    \pi_j^i = P(X_i = j)
  Else:
    \pi_j^i = \sum_{v \in CON(P)} P(X_i = j | X_P = v)\, \pi_v^P \prod_{s \in siblings(X_i)} \sum_f \lambda_f^s P(X_s = f | X_P = v)
  (where X_P is the father of X_i)
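Algorithm 1 can be illustrated on a minimal tree. The sketch below runs the \lambda (post-order) and \pi (pre-order) passes on a two-leaf network; the CPT values and evidence are invented for illustration:

import numpy as np

# Tiny tree-structured network: root X0 with observed leaves X1, X2.
# prior[j] = P(X0=j); cpt[c][j, f] = P(Xc=f | X0=j). All values hypothetical.
prior = np.array([0.6, 0.4])
cpt = {1: np.array([[0.9, 0.1], [0.2, 0.8]]),
       2: np.array([[0.7, 0.3], [0.4, 0.6]])}
evidence = {1: 0, 2: 1}   # observed leaf values

# Upward pass (post-order): lambda messages
lam = {c: np.eye(2)[evidence[c]] for c in (1, 2)}   # leaf: 1 at observed value, else 0
lam[0] = np.prod([cpt[c] @ lam[c] for c in (1, 2)], axis=0)

# Downward pass (pre-order): pi messages
pi = {0: prior}
for c in (1, 2):
    sib = [s for s in (1, 2) if s != c]
    # parent's belief excluding child c's own lambda contribution
    parent_msg = pi[0] * np.prod([cpt[s] @ lam[s] for s in sib], axis=0)
    pi[c] = cpt[c].T @ parent_msg

# Eq. (8): P(e) is the same at every node; eq. (9): posterior at the root
p_e = float(lam[0] @ pi[0])
posterior_root = lam[0] * pi[0] / p_e
print(p_e, posterior_root)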
2) Model learning algorithm [8,9]
When the observation vector is incomplete for a given network structure, the classical EM algorithm is suitable. The training process is as follows (Algorithm 2).

Step E: Assume the observation vector is e; the conditional probability of node C_i is given by equation (10):

P(C_i = w | e) = \frac{\lambda_w^i \pi_w^i}{\sum_w \lambda_w^i \pi_w^i}    (10)

The probability of every solidification (clique) node can be calculated according to equation (10), using the derived algorithm based on the connected-tree (junction tree) algorithm. Then calculate N_{ijk}, the number of times each X_i appears with X_i = k and parents(X_i) = j: first choose a solidification node that contains both X_i and its father node, and let V_{jk}^i be the collection of solidification nodes that satisfy X_i = k and parents(X_i) = j; then

N_{ijk} = \sum_{w \in V_{jk}^i} P(C_i = w | e)    (11)

N_{ijk} grows as training samples are added.

Step M: After calculating N_{ijk}, the conditional probability of each variable is re-estimated by formula (12):

P(X_i = k | parents(X_i) = j) = \frac{N_{ijk}}{\sum_k N_{ijk}} = \frac{N_{ijk}}{N_{ij}}    (12)
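A minimal sketch of this Step E / Step M cycle of equations (10)-(12), assuming the per-sample posteriors P(C_i = w | e) have already been obtained from the reasoning pass; the sample posteriors below are hypothetical:

import numpy as np
from collections import defaultdict

def em_update(samples, posteriors, k_states, j_states):
    """Accumulate expected counts N_ijk (eq. 11) and re-estimate
    P(X_i = k | parents(X_i) = j) (eq. 12).

    posteriors: posteriors[s][(j, k)] = P(X_i=k, parents=j | e_s) from the E step
    """
    N = defaultdict(float)
    for s, _ in enumerate(samples):          # E step: N_ijk grows with each sample
        for (j, k), p in posteriors[s].items():
            N[(j, k)] += p
    cpt = np.zeros((j_states, k_states))     # M step: eq. (12) normalisation
    for (j, k), n in N.items():
        cpt[j, k] = n
    cpt /= cpt.sum(axis=1, keepdims=True)    # N_ijk / N_ij
    return cpt

# Hypothetical posteriors for two training cases of a binary node with a binary parent
post = [{(0, 0): 0.7, (0, 1): 0.3}, {(0, 0): 0.2, (1, 1): 0.8}]
print(em_update(["e1", "e2"], post, k_states=2, j_states=2))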
3.3 Collaborative filtering recommendation algorithm based on DBN (Algorithm 3)
Step 1: Initialize X_0;
Step 2: Run the collaborative filtering prediction model based on HMM (Model 1), training the model on the standard characteristics;
Step 3: Run Algorithm 1 and Algorithm 2, and establish the DBN;
Step 4: Calculate P(X_T | Y_{1:T}) and choose the X_T that maximizes P(X_T | Y_{1:T}).
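The control flow of Algorithm 3 can be summarised in a short runnable toy; every component here (the HMM filter, the DBN posterior, the candidate states) is a stand-in, not the authors' implementation:

def algorithm_3(y_seq, candidate_states, hmm_similarity, dbn_posterior):
    x = 0                                         # Step 1: initialise X_0
    s_hmm = hmm_similarity(y_seq)                 # Step 2: Model 1 (HMM) filtering
    # Step 3 would run Algorithms 1 and 2 to establish the DBN; here
    # dbn_posterior stands in for the resulting inference routine.
    # Step 4: choose the X_T maximising P(X_T | Y_{1:T})
    x = max(candidate_states, key=lambda s: dbn_posterior(s, y_seq))
    return x, s_hmm

# Toy stand-ins, purely illustrative
y = [1, 0, 1, 1]
print(algorithm_3(y, candidate_states=[0, 1],
                  hmm_similarity=lambda seq: sum(seq) / len(seq),
                  dbn_posterior=lambda s, seq: 0.8 if s == sum(seq) % 2 else 0.2))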
3.4 The update model based on DBN
Combining the previous model with the nearest-neighbour similarity recommended by the HMM, the original collaborative filtering forecast model is updated to:

P_{aj} = \frac{1}{n} \sum_{i=1}^{n} S_i \times A_{ij}    (13)

S_i = \psi \cdot P_i(O|\lambda) + \zeta \cdot P_i(X_T | Y_{1:T})    (14)

where A_{ij} is the preference of user i for commodity j; commodity j is what path j represents; P_{aj} is the probability that target user a likes commodity j; S_i is the similarity of the nearest user i; and \psi and \zeta are balance indexes that define how the HMM and DBN training results influence the model.

3.5 Nearest-neighbour collaborative filtering method after updating
The updated collaborative filtering recommendation model consists of data pre-processing, HMM filtration, the DBN updating model and the collaborative filtering prediction model. Figure 1 depicts its structure.
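Equation (14) is read here as a weighted combination of the HMM and DBN similarities, consistent with the balanced setting \psi = \zeta = 0.5 used in the experiment below. A hypothetical Python sketch:

import numpy as np

def updated_similarity(p_hmm, p_dbn, psi=0.5, zeta=0.5):
    """Equation (14): S_i = psi * P_i(O|lambda) + zeta * P_i(X_T|Y_{1:T}).
    psi and zeta are the balance indexes weighting the HMM and DBN results."""
    return psi * np.asarray(p_hmm) + zeta * np.asarray(p_dbn)

def updated_prediction(S, A_j):
    """Equation (13): P_aj = (1/n) * sum_i S_i * A_ij."""
    S, A_j = np.asarray(S), np.asarray(A_j)
    return (S * A_j).sum() / len(S)

# Hypothetical neighbours: HMM similarities, DBN similarities, preferences for commodity j
S = updated_similarity([0.993, 0.975], [0.996, 0.985])
print(updated_prediction(S, [0.9, 1.0]))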
Figure 1. The updated collaborative filtering process
In the nearest-neighbour recommendation phase, P_i(O|\lambda) and P_i(X_T|Y_{1:T}) are calculated separately, and X_T is chosen to maximize P(X_T|Y_{1:T}), where \lambda is the HMM parameter for the target user. S_i can then be taken as the similarity between user i and the target user, and the N most similar users join the nearest-neighbour set.
The experimental process is: (1) calculate the similarity between users on the training set, and find the nearest neighbours of the users in various scenarios; (2) predict all items for all users, then derive the collection of recommended results; (3) using the recommendation results and the real records in the test set, calculate the recommendation efficiency according to the evaluation standards.
4 Evaluation of experiment
Five parameters are used in the experiment: query access time, IP address, URL, browse number and page load time. The experiment uses the data of five users (user1 to user5). User1 is a target user with 20000 randomly generated feature vectors (the feature vectors have been pre-treated, so there are few obvious mistakes in the random data), 10000 of which are used for training the network. The format is as follows: (2012-01-22 22:10:15, 192.168.1.23, http://www.haoting.com/, 15, 24) (2015-06-02 12:02:15, 192.168.1.21, http://www.haoting.com/zhangjie/, 15, 122) … The remaining 10000 feature vectors are used as the experiment data for the nearest-set recommendation. Like user1, each of the other users also randomly generates 20000 feature vectors for the recommendation experiment.
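The record layout can be handled with a few lines of Python; the parser below infers the field order (access time, IP address, URL, browse number, page load time) from the examples above and is only an illustration:

from datetime import datetime

def parse_record(record):
    """Parse one browsing feature vector of the form used in the experiment:
    (access time, IP address, URL, browse number, page load time)."""
    time_s, ip, url, browses, load_ms = (f.strip() for f in record.split(","))
    return {
        "time": datetime.strptime(time_s, "%Y-%m-%d %H:%M:%S"),
        "ip": ip,
        "url": url,
        "browse_number": int(browses),
        "page_load_time": int(load_ms),
    }

print(parse_record("2012-01-22 22:10:15, 192.168.1.23, http://www.haoting.com/, 15, 24"))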
First, the nearest set is trained after model updating: run Algorithm 3 and calculate the similarity between users (calculate P_i(X_T|Y_{1:T}) and choose the X_T that maximizes it). When \psi = \zeta = 0.5, the new similarity S_i is shown in Table 1.
Table 1. The similarity of users after model updating

user     State number    Similarity based on HMM    Similarity after model updating
user1    153             0.993                      0.996
user2    90              0.647                      0.641
user3    145             0.975                      0.985
user4    76              0.296                      0.287
user5    69              0.188                      0.218
Table 1 shows S_i, the similarity of the different users in their different states. The predicted similarity between user1 and itself is 0.996. It is clear from the table that user3 is closest to target user1 and can therefore be placed in the nearest-neighbour set.
Second, the recommendation of user-preference commodities after updating the model: we find that commodity 1001 best matches the preferences of user3, and the recommendation probability for the target user is P_{user1,1001} = 0.9894. After adding user2 to the nearest-neighbour set and evaluating formula (14), commodity 1001 still best matches the preferences of user3.
Figure 2. Results of experiment
The experiment result is shown in Figure 2. The top curve gives the forecast preference probability of target user1 for commodity j with the nearest-neighbour set (user3), while the bottom curve gives the result with the nearest neighbours (user3, user2).
5 Conclusion
We have mainly introduced a method of updating the collaborative filtering recommendation model based on users' access paths, which takes the user's other characteristics into consideration, such as the user's static ratings, through the high integration capability of the dynamic Bayesian network. In this way we update the model, combine dynamic behaviour data with the static characteristics of the data, and thereby obtain a more accurate recommendation.
References
1. You Wen, Ye Shui-sheng. A Survey of Collaborative Filtering Algorithm Applied in E-commerce Recommender System [J]. Computer Technology and Development, 2006, 16(9): 57-61.
2. Wang Xia, Liu Qin. Research on Application of Collaborative Filtering in Recommender Systems [J]. Applications of The Computer Systems, 2005, 4: 23-27.
3. Huang Guang-qiu, Zhao Yong-mei. Approach to collaborative filtering recommendation based on HMM [J]. Journal of Computer Applications, 2008, 28(6): 1601-1604.
4. Zhang Zi-li, Liu Wei-yi. State prediction based on the dynamic Bayesian network [J]. 2007, 29(1): 35-39.
5. Li Xiao-yi, Xu Zhao-di, Sun Xiao-wei. Study on Parameter Learning of Bayesian Network [J]. Journal of Shenyang Agricultural University, 2007, 38(1): 125-128.
6. Wei Fang. Discovering User Interests Based on Bayesian Network [D]. Xi'an: Xi'an University of Architecture and Technology, 2007: 25-29.
7. Liu Jie. Chinese Proper Names Recognition Based on Dynamic Bayesian Network [D]. Shanxi: Shanxi University, 2006: 11-13.
8. Sun A-li. Research on Dynamic Bayesian Network Models for Audio-Visual Speech Recognition [D]. Xi'an: Northwestern Polytechnical University, 2007: 28-40.
9. Wang Xing-bin. Research on Application of Bayesian Net in Robust Speech Recognition [D]. Henan: The PLA Information Engineering University, 2006: 17.