A Collaborative Filtering Algorithm Based
on Attribution Theory
Mao DeLei1 (corresponding author), Tang Yan1, and Liu Bing1,2
1 Southwest University, Chongqing 400715, China
mdelei@163.com
2 Dazhou Vocational and Technical College, Dazhou 635001, Sichuan, China
Abstract. Collaborative filtering algorithms make recommendations by analyzing user preference data to predict a user's preference for an item, and they usually take the user's ratings as the preference data. However, in real scenarios there is a bias between a user's preferences and the user's ratings, so treating ratings as preferences can lower recommendation accuracy. To address this problem, this paper proposes a user preference extraction method based on attribution theory, which computes user preferences by analyzing users' rating behavior. Preference similarity and rating similarity are then combined to compensate for the bias between user ratings and user preferences in collaborative filtering. Experiments on the widely used MovieLens-1M dataset show that the proposed algorithm outperforms existing collaborative filtering algorithms.
Keywords: Attribution theory · Recommender system · User preference · Collaborative filtering
1 Introduction
The personalized recommendation system is a type of business intelligence platform based on massive data mining, which provides personalized information services and decision support to users through recommendation algorithms [1]. The collaborative filtering algorithm was first proposed by Goldberg et al. in 1992 [2] and is currently one of the most widely used personalized recommendation algorithms. It finds users with similar interests by analyzing user preference data and recommends items to a user based on the preferences of similar users. The most important source of user preference data is users' ratings of items, which raises two issues: (1) incomplete rating data make user preference data sparse; (2) user ratings may be biased relative to user preferences. These problems limit the recommendation accuracy of collaborative filtering, and the algorithm also still suffers from the cold-start problem [3].
Researchers have proposed various methods to improve different aspects of collaborative filtering. Son [4] presented a method that fuses fuzzy similarity based on user features with user similarity based on historical ratings. Joshi et al. [5] proposed an asynchronous stochastic gradient collaborative filtering algorithm based on matrix factorization, in which the rating matrix is distributed across different machines and the parameters are updated over the network by stochastic gradient optimization; they also introduced a similarity-based regularization method. Kim and Choi [6] proposed a Bayesian binomial mixture model for collaborative filtering that models missing data through three factors related to users, items, and rating values. Hofmann [7] drew on latent semantic indexing (LSI) from information retrieval and used a subspace instead of the original data set to make locating neighbors more feasible. These studies improve the handling of the cold-start problem and the accuracy of the algorithm in diverse ways. However, none of them analyzes or corrects the bias between user ratings and user preferences. This paper therefore analyzes this bias and compensates for it in order to improve the accuracy of the algorithm.

© Springer International Publishing AG, part of Springer Nature 2018
Y. Tan et al. (Eds.): ICSI 2018, LNCS 10942, pp. 295–304, 2018.
https://doi.org/10.1007/978-3-319-93818-9_28
The bias between user ratings and user preferences means that the same rating can reflect different degrees of preference. This bias exists because, according to attribution theory, people's behavior is influenced by consistency, distinctiveness, social expectations, positive and negative preferences, and other factors; as a result, the same rating may reflect different preferences for different users. For example: (1) users A and B have different rating habits: A habitually gives low ratings, whereas B habitually gives high ratings; (2) A is more objective and still gives an objective rating even to an item he or she does not like, whereas B gives low ratings to any item he or she dislikes; (3) A gives high ratings because of his or her own behavioral tendencies, whereas B does not. In each of these situations, the same rating from A and from B reflects different preferences.
Because of the bias between user ratings and user preferences, we propose a collaborative filtering algorithm based on attribution theory (AT-CF). First, users' rating behavior is analyzed with attribution theory to extract user preferences. This analysis involves three kinds of information, namely consensus, distinctiveness, and positive or negative preference, which together are used to quantify user preferences. Then, preference similarity and rating similarity are combined to compensate for the bias between user ratings and user preferences in collaborative filtering. The improved algorithm (AT-CF) is compared with traditional CF, HU-FCF [4], SGO-based regularization [5], and BM/CPT-v [6]. The results demonstrate that AT-CF gives better recommendations, with a smaller mean absolute error (MAE).
2 Related Theories
2.1 Collaborative Filtering Algorithm
The collaborative filtering algorithm was proposed by Goldberg [2] in 1992 to build the Tapestry recommendation system, and it remains the most widely used personalized recommendation technique. The algorithm analyzes user preference data and mines the similarity between users in order to predict a user's preferences and make recommendations. Common measures of user similarity include the adjusted cosine similarity, the Jaccard coefficient, and the Pearson correlation coefficient. The basic idea of collaborative filtering is outlined in Fig. 1.
Fig. 1. Method of collaborative filtering algorithm
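To make the procedure in Fig. 1 concrete, here is a minimal Python sketch of user-based collaborative filtering: compute user-user similarity on co-rated items, keep the k most similar users who rated the target item, and predict a mean-centred weighted average of their ratings. It is an illustration under assumptions, not the paper's code: the nested-dict rating layout, the hypothetical helper names (`cosine_sim`, `predict`), plain cosine on co-rated items, and the mean-centred prediction rule are all choices made for this example.

```python
import math


def mean_rating(user_ratings):
    """Average of one user's ratings (a dict item -> rating)."""
    return sum(user_ratings.values()) / len(user_ratings)


def cosine_sim(ratings, u, v):
    """Cosine similarity between users u and v over their co-rated items."""
    common = ratings[u].keys() & ratings[v].keys()
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(ratings, u, item, k=5, sim=cosine_sim):
    """Predict u's rating of item from the k most similar users who rated it."""
    candidates = [(sim(ratings, u, v), v) for v in ratings
                  if v != u and item in ratings[v]]
    neighbors = sorted(candidates, key=lambda t: t[0], reverse=True)[:k]
    den = sum(abs(s) for s, _ in neighbors)
    if den == 0:
        return mean_rating(ratings[u])  # no usable neighbours: fall back to u's mean
    num = sum(s * (ratings[v][item] - mean_rating(ratings[v])) for s, v in neighbors)
    return mean_rating(ratings[u]) + num / den


ratings = {
    "alice": {"m1": 5, "m2": 3},
    "bob": {"m1": 4, "m2": 2, "m3": 4},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
}
print(round(predict(ratings, "alice", "m3", k=1), 3))  # prints 4.667
```

Mean-centring lets a neighbour who rates everything high still contribute information about relative preference, which is the same concern about rating habits that motivates this paper.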
Because of its high practical and research value, collaborative filtering has long been an active research topic. It is a major component of the recommender systems of well-known websites such as Amazon.com, Taobao.com, and YouTube. In academia, research on and improvements to collaborative filtering algorithms are regularly published in recommender-system venues such as the ACM Conference on Recommender Systems (RecSys), ACM SIGIR, AI Communications, and IEEE Intelligent Systems.
Contemporary research on collaborative filtering has proposed many new methods and ideas that continue to improve the algorithm. However, this work ignores the possible bias between user ratings and user preferences. This paper therefore analyzes this bias to improve the accuracy of the algorithm.
2.2 Attribution Theory
As subjects of cognitive activity, people behave in ways that correspond to the specific environment they are in, and attribution theory is a method for analyzing this behavioral process [10]. Jones and K.E. Davis proposed an attribution method called correspondent inference, which introduces the concept of societal expectation. According to this theory, it is difficult to infer a person's true attitude when his or her behavior conforms to societal expectations, whereas behavior that departs from them is more revealing. Therefore, positive and negative user preferences can be inferred from societal expectations.
Kelley's three-dimensional attribution theory holds that attributing a person's behavior always involves three factors: (1) the actor; (2) the objective stimulus; (3) the relationship or environment. Attribution to one of these factors depends on three kinds of behavioral information: (1) Consensus: whether other people react to the same stimulus in the same way as the actor. (2) Distinctiveness: whether the actor responds to similar stimuli in the same way. (3) Consistency: whether the actor responds to the same stimulus in the same way across environments and over time, i.e., whether the actor's behavior is stable and durable.
Kelley held that these three kinds of information constitute a three-dimensional covariation framework. Based on them, a person's behavior can be attributed to the actor, the objective stimulus, or the environment.
Because attribution theory can infer the true attitude behind a person's behavior, the bias between user ratings and user preferences can be compensated for by analyzing users' rating behavior. Here, rating behavior is attributed to the actor, i.e., the user's preference is taken to drive the rating. We therefore quantify user preferences through consensus, distinctiveness, and societal expectations (positive and negative preferences).
3 Collaborative Filtering Algorithm Based on Attribution Theory
To address the bias between user ratings and user preferences, this paper proposes a collaborative filtering algorithm based on attribution theory (AT-CF). AT-CF analyzes three kinds of information about user rating behavior: consensus, distinctiveness, and positive or negative preference. These three kinds of information are computed and combined to quantify user preferences so that the user's real preferences are better reflected. Weighted fusion of rating similarity and preference similarity then compensates for the bias between user ratings and user preferences. Finally, the nearest-neighbor set of the target user is computed and used to predict the target user's ratings of unrated items.
3.1 Extracting User Preferences with Attribution Theory
To extract user preferences, attribution theory is used to analyze user rating behavior, which involves three kinds of behavioral information: consensus, distinctiveness, and positive or negative preference.
Consensus
Consensus [10] refers to the deviation between a user's rating of an item and other users' ratings of the same item. The greater the deviation, the lower the consensus, and low consensus may reflect the user's own preference for the item. Consensus is measured by the root mean square error (RMSE) of these deviations: the larger the RMSE, the lower the consensus, so a larger value of formula (1) indicates lower consensus. Consensus is calculated by formula (1):

$$\mathrm{Consensus}(u,i) = \sqrt{\frac{\sum_{v \in V_{ui}} (r_{ui} - r_{vi})^2}{n}} \tag{1}$$

In formula (1), $V_{ui}$ is the set of other users who have rated the same item $i$ as user $u$, $r_{ui}$ and $r_{vi}$ are the ratings of item $i$ by users $u$ and $v$, and $n$ is the size of $V_{ui}$.
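Formula (1) can be sketched directly in Python. The nested-dict rating layout and the name `consensus` are assumptions for illustration, as is returning 0.0 when no other user has rated the item, a case the formula leaves undefined (n = 0).

```python
import math


def consensus(ratings, u, i):
    """Formula (1): RMS deviation between u's rating of item i and the ratings
    of i by every other user who rated it. Larger values mean LOWER consensus."""
    r_ui = ratings[u][i]
    others = [ratings[v][i] for v in ratings if v != u and i in ratings[v]]
    if not others:
        return 0.0  # assumption: V_ui is empty, so no deviation can be measured
    return math.sqrt(sum((r_ui - r_vi) ** 2 for r_vi in others) / len(others))


ratings = {"u": {"i": 5}, "v": {"i": 3}, "w": {"i": 5}}
print(consensus(ratings, "u", "i"))  # sqrt((2**2 + 0**2) / 2) = sqrt(2)
```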
Distinctiveness
Distinctiveness [10] refers to the deviation of a user's ratings across similar items. The smaller the deviation of the user's ratings on similar items, the lower the distinctiveness, which can reflect the user's preference toward the item. As with consensus, the root mean square error is used to compute distinctiveness, as shown in formula (2):

$$\mathrm{Distinctiveness}(u,i) = \sqrt{\frac{\sum_{C_i \in I_{\mathrm{sim}}(i)} (r_{ui} - r_{uC_i})^2}{n}} \tag{2}$$

In formula (2), $I_{\mathrm{sim}}(i)$ is the set of items similar to item $i$, $r_{uC_i}$ is user $u$'s rating of the similar item $C_i$, and $n$ is the number of similar items.
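A corresponding sketch of formula (2). The text does not say what happens when user u has not rated some items in I_sim(i); this sketch assumes those items are skipped and that n counts only the similar items u has rated. The list `similar_items` is passed in by the caller, and the function name is hypothetical.

```python
import math


def distinctiveness(ratings, u, i, similar_items):
    """Formula (2): RMS deviation between u's rating of item i and u's ratings
    of the items in I_sim(i). Similar items u has not rated are skipped (assumption)."""
    r_ui = ratings[u][i]
    rated = [c for c in similar_items if c != i and c in ratings[u]]
    if not rated:
        return 0.0  # assumption: nothing to compare against
    return math.sqrt(sum((r_ui - ratings[u][c]) ** 2 for c in rated) / len(rated))


ratings = {"u": {"i": 4, "a": 2, "b": 4}}
print(distinctiveness(ratings, "u", "i", ["a", "b", "z"]))  # "z" is unrated, so skipped
```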
For the experimental data in this paper, each movie's genres are encoded as a category feature vector, and these vectors form a matrix Category in which each row represents an item and each column a category label. The category similarity between items is calculated with formula (3), and the N items most similar to the target item are taken as the set of similar items $I_{\mathrm{sim}}(i)$ in formula (2).
$$\mathrm{Category} = \begin{pmatrix} 100000001000000100 \\ 000000001000000001 \\ 000100000000001000 \\ \vdots \\ 010000000000000000 \end{pmatrix}$$

$$\mathrm{Similarity}(I_a, I_b) = \frac{1}{5}\, \vec{I}_a \cdot \vec{I}_b \tag{3}$$
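The genre encoding and formula (3) can be sketched as follows, using the first three rows of the Category matrix as example data. The item IDs, the dict layout, and the function names are assumptions. Because the 1/5 factor is a positive constant, it rescales the scores but does not change which N items rank as most similar.

```python
def category_similarity(vec_a, vec_b):
    """Formula (3): (1/5) times the dot product of two 0/1 genre vectors."""
    return sum(x * y for x, y in zip(vec_a, vec_b)) / 5


def top_n_similar(category, i, n):
    """Return the n items whose genre vectors are most similar to item i,
    i.e. a candidate I_sim(i) for formula (2)."""
    scored = [(category_similarity(category[i], vec), j)
              for j, vec in category.items() if j != i]
    scored.sort(key=lambda t: t[0], reverse=True)  # stable sort: ties keep input order
    return [j for _, j in scored[:n]]


rows = ["100000001000000100", "000000001000000001", "000100000000001000"]
category = {f"item{k + 1}": [int(c) for c in row] for k, row in enumerate(rows)}
print(category_similarity(category["item1"], category["item2"]))  # one shared genre -> 0.2
print(top_n_similar(category, "item1", 1))  # ['item2']
```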
Positive or Negative Preferences
The societal expectation of an item is used to distinguish positive from negative preferences: the difference between a user's rating and the item's societal expectation indicates the user's preference for the item. The societal expectation score of an item is calculated as a weighted average, as shown in formula (4), and the positive or negative preference is given by formula (5):
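$$\mathrm{ERate}(i) = \frac{\sum_{j=1}^{5} n_j \, j}{n} \tag{4}$$

$$\mathrm{Preference}(u,i) = r_{ui} - \mathrm{ERate}(i) \tag{5}$$

Here $n_j$ is the number of ratings of item $i$ equal to $j$, and $n$ is the total number of ratings of item $i$.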
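Formulas (4) and (5) in Python, as a sketch. It assumes integer ratings on the 1 to 5 scale used by MovieLens-1M, so n_j is simply the count of ratings equal to j; under that assumption ERate(i) equals the item's mean rating. The function names are hypothetical.

```python
from collections import Counter


def expected_rating(ratings, i):
    """Formula (4): ERate(i) = sum_j n_j * j / n over rating levels j = 1..5."""
    counts = Counter(ratings[v][i] for v in ratings if i in ratings[v])
    n = sum(counts.values())
    if n == 0:
        raise ValueError(f"item {i!r} has no ratings")
    return sum(counts[j] * j for j in range(1, 6)) / n


def preference(ratings, u, i):
    """Formula (5): positive when u rates i above its societal expectation."""
    return ratings[u][i] - expected_rating(ratings, i)


ratings = {"a": {"i": 5}, "b": {"i": 3}, "c": {"i": 4}}
print(expected_rating(ratings, "i"))  # (3 + 4 + 5) / 3 = 4.0
print(preference(ratings, "a", "i"))  # 5 - 4.0 = 1.0
```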
User Preference Extraction
In summary, attribution theory extracts three kinds of information about user rating behavior: positive or negative preference, consensus, and distinctiveness. The attribution theory user preference (ATP) is computed by formula (6):
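$$\mathrm{ATP} = \begin{cases} \mathrm{Preference} + \mathrm{Consensus} + \mathrm{Distinctiveness}, & \mathrm{Preference} > 0 \\ \mathrm{Preference} - \mathrm{Consensus} - \mathrm{Distinctiveness}, & \mathrm{Preference} < 0 \end{cases} \tag{6}$$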
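Formula (6) keeps the sign of Preference and pushes its magnitude further from zero by the consensus and distinctiveness terms. A minimal sketch follows; formula (6) does not define the case Preference = 0, and returning 0.0 there is an assumption of this example, as is the function name.

```python
def atp(pref, cons, dist):
    """Formula (6): attribution theory user preference (ATP)."""
    if pref > 0:
        return pref + cons + dist
    if pref < 0:
        return pref - cons - dist
    return 0.0  # assumption: Preference == 0 is not covered by formula (6)


print(atp(1.0, 0.5, 0.25))   # 1.75
print(atp(-1.0, 0.5, 0.25))  # -1.75
```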
3.2 Similarity Fusion
Although user ratings are biased relative to user preferences, the two are still positively correlated. Therefore, preference similarity cannot replace rating similarity; instead, preference similarity and rating similarity are fused to compensate for the bias between user ratings and user preferences.
Rating similarity is calculated with the Pearson similarity [11]. Because Pearson similarity centers each user's ratings on that user's mean rating, it can offset differences caused by different rating habits. It is given by formula (7):
$$\mathrm{Sim}_{\mathrm{Rating}}(u,v) = \frac{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)(r_{vi} - \bar{r}_v)}{\sqrt{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in I_{uv}} (r_{vi} - \bar{r}_v)^2}} \tag{7}$$
Preference similarity is computed on the user preferences extracted by attribution theory, using the adjusted cosine [11]. The adjusted cosine takes into account both the direction of user preference (positive or negative) and its degree, as shown in formula (8):
$$\mathrm{Sim}_{\mathrm{ATP}}(u,v) = \frac{\sum_{i \in I_{uv}} (r_{ui} - \bar{r}_u)(r_{vi} - \bar{r}_v)}{\sqrt{\sum_{i \in I_u} (r_{ui} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in I_v} (r_{vi} - \bar{r}_v)^2}} \tag{8}$$
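In formulas (7) and (8), $I_{uv}$ is the set of items rated by both users, $I_u$ and $I_v$ are the sets of items rated by users $u$ and $v$, $r_{ui}$ and $r_{vi}$ are the users' ratings of item $i$, and $\bar{r}_u$ and $\bar{r}_v$ are the average ratings of users $u$ and $v$. In formula (8), these quantities are computed on the ATP preference values rather than on raw ratings.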
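Formulas (7) and (8) differ only in their denominators: (7) normalizes over the co-rated items I_uv, while (8) normalizes over all items each user has rated (I_u, I_v). The sketch below uses hypothetical names and assumes that each user mean is taken over all of that user's items and that `atp_sim` is fed a dict of ATP values from formula (6) rather than raw ratings.

```python
import math


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson_sim(data, u, v):
    """Formula (7): every sum runs over the co-rated items I_uv."""
    common = data[u].keys() & data[v].keys()
    if not common:
        return 0.0
    mu, mv = _mean(data[u].values()), _mean(data[v].values())
    num = sum((data[u][i] - mu) * (data[v][i] - mv) for i in common)
    du = math.sqrt(sum((data[u][i] - mu) ** 2 for i in common))
    dv = math.sqrt(sum((data[v][i] - mv) ** 2 for i in common))
    return num / (du * dv) if du and dv else 0.0


def atp_sim(data, u, v):
    """Formula (8): numerator over I_uv, each norm over the user's own items."""
    common = data[u].keys() & data[v].keys()
    if not common:
        return 0.0
    mu, mv = _mean(data[u].values()), _mean(data[v].values())
    num = sum((data[u][i] - mu) * (data[v][i] - mv) for i in common)
    du = math.sqrt(sum((x - mu) ** 2 for x in data[u].values()))
    dv = math.sqrt(sum((x - mv) ** 2 for x in data[v].values()))
    return num / (du * dv) if du and dv else 0.0


data = {"u": {"a": 1, "b": 3}, "v": {"a": 1, "b": 3, "c": 5}}
print(round(pearson_sim(data, "u", "v"), 4))  # 0.7071
print(round(atp_sim(data, "u", "v"), 4))      # 0.5
```

Normalizing over all of a user's items, as formula (8) does, penalizes pairs whose overlap covers only a small part of either user's history, which is why the second value is lower here.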
When rating similarity and preference similarity are fused, parameters a and b control the weights of the two similarities, and the fused similarity is given by formula (9):
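$$\mathrm{Sim}(u,v) = a\,\mathrm{Sim}_{\mathrm{ATP}}(u,v) + b\,\mathrm{Sim}_{\mathrm{Rating}}(u,v) - (1 - a - b)\,\mathrm{Sim}_{\mathrm{ATP}}(u,v)\,\mathrm{Sim}_{\mathrm{Rating}}(u,v) \tag{9}$$

3.3 AT-CF Algorithm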
The pseudocode of the collaborative filtering algorithm based on attribution theory (AT-CF) is shown in Table 1.
Table 1. Pseudocode of the AT-CF algorithm
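The body of Table 1 did not survive in this copy. As a hedged stand-in, and not a reconstruction of the authors' pseudocode, the sketch below covers the final steps described in Section 3: fuse the two similarities with formula (9), keep the k nearest neighbours, and predict the target user's rating. The callable interface, the function names, and the mean-centred weighted-average prediction rule are assumptions; the defaults a = 0.1 and b = 0.6 are the values reported as best in Sect. 4.3.

```python
def fused_similarity(sim_atp, sim_rating, a=0.1, b=0.6):
    """Formula (9): a*SimATP + b*SimRating - (1 - a - b)*SimATP*SimRating."""
    return a * sim_atp + b * sim_rating - (1 - a - b) * sim_atp * sim_rating


def at_cf_predict(ratings, u, item, sim_atp_fn, sim_rating_fn, k=5, a=0.1, b=0.6):
    """Predict u's rating of item from the k neighbours with the highest fused
    similarity among users who rated item (hypothetical prediction rule)."""
    def mean(user):
        return sum(ratings[user].values()) / len(ratings[user])

    candidates = [
        (fused_similarity(sim_atp_fn(u, v), sim_rating_fn(u, v), a, b), v)
        for v in ratings if v != u and item in ratings[v]
    ]
    neighbors = sorted(candidates, key=lambda t: t[0], reverse=True)[:k]
    den = sum(abs(s) for s, _ in neighbors)
    if den == 0:
        return mean(u)  # no usable neighbours: fall back to u's mean rating
    num = sum(s * (ratings[v][item] - mean(v)) for s, v in neighbors)
    return mean(u) + num / den


ratings = {"u": {"x": 4, "y": 2}, "v": {"x": 5, "y": 3, "z": 5}, "w": {"x": 1, "z": 1}}
toy_sim = {"v": 0.9, "w": 0.1}  # placeholder values standing in for formulas (7) and (8)
pred = at_cf_predict(ratings, "u", "z", lambda u, v: toy_sim[v], lambda u, v: toy_sim[v], k=1)
print(round(pred, 3))  # 3.667
```

Passing the two similarity functions in as callables keeps the fusion and prediction step independent of how formulas (7) and (8) are computed.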
4 Experiment
4.1 Dataset
MovieLens-1M [12], provided by the GroupLens project team, contains ratings of 3952 items by 6040 users. It is widely used to evaluate recommender systems, and its sparsity is as high as 95.8%. The dataset is split into a training set (80%) and a test set (20%). Since the dataset is extremely sparse and is divided into separate training and test sets, overfitting is not considered a problem here. In the experiments, the training set is used to predict the ratings of unrated items, and the predicted ratings are compared with the actual ratings in the test set.
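The sparsity figure and the 80/20 split can be checked with a short sketch. MovieLens-1M contains 1,000,209 ratings; with 6040 users and 3952 items this gives a sparsity of about 95.8%. The paper does not say how the split was drawn, so the uniformly random per-rating split with a fixed seed below is an assumption, as are the function names.

```python
import random


def sparsity(num_ratings, num_users, num_items):
    """Fraction of the user-item matrix that has no rating."""
    return 1.0 - num_ratings / (num_users * num_items)


def train_test_split(rating_triples, test_fraction=0.2, seed=42):
    """Shuffle (user, item, rating) triples and hold out test_fraction of them."""
    rows = list(rating_triples)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[: len(rows) - n_test], rows[len(rows) - n_test:]


print(f"{sparsity(1_000_209, 6040, 3952):.1%}")  # 95.8%
```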
4.2 Evaluation Criteria
MAE (mean absolute error) is the average of the absolute prediction errors and reflects the actual size of the prediction error; the smaller the MAE, the more accurate the algorithm. It is calculated by formula (10):
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| rp_i - r_i \right| \tag{10}$$
In formula (10), $rp_i$ is the predicted rating for item $i$, $r_i$ is the true rating of item $i$, and $N$ is the number of predicted ratings in the test set.
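Formula (10) as a short Python function, for illustration; the name `mae` and the list-based interface are assumptions.

```python
def mae(predicted, actual):
    """Formula (10): mean absolute error over N paired predictions."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(predicted)


print(mae([4.0, 3.0, 5.0], [5.0, 3.0, 4.0]))  # (1 + 0 + 1) / 3
```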
4.3 Experimental Results
Comparing AT-CF with Traditional CF
To verify the proposed AT-CF algorithm, we compare it with traditional collaborative filtering algorithms using the adjusted cosine similarity and the Pearson similarity. The number of nearest neighbors is set to 5, 10, 15, 20, 25, 30, 35, 40, 45, and 50, and the recommendation performance of AT-CF and traditional collaborative filtering is compared for each value. The results are shown in Fig. 2.
Fig. 2. MAE value of AT-CF and the traditional CF
Compared with the collaborative filtering algorithm using adjusted cosine similarity, AT-CF improves accuracy by 3%; compared with the collaborative filtering algorithm based on Pearson similarity, it improves accuracy by 1.5%.
Comparing AT-CF with Existing CF
To evaluate how much AT-CF improves on existing work, we compare it with HU-FCF by Son [4] (2014), SGO-based regularization by Joshi et al. [5] (2016), and BM/CPT-v by Kim and Choi [6] (2014). The results are shown in Fig. 3.
Influence of a and b
Parameters a and b in formula (9) are the weights of the preference similarity and the rating similarity, respectively, when the two are fused. Figure 4 shows the results for 5 nearest neighbors with a and b each varied over [0.1, 1]. For other numbers of nearest neighbors, the algorithm follows the same trend as in Fig. 4.
Fig. 3. MAE of AT-CF and existing CF algorithms
Fig. 4. Influence of a and b
As Fig. 4 shows, for each value of a, the MAE decreases and the results improve as b increases. The MAE is smallest, and the results best, when a = 0.1 and b = 0.6; the remaining experiments are therefore carried out with these parameter values.
Experiments show that, with similarity fusion parameters a = 0.1 and b = 0.6, AT-CF improves accuracy by 4%–5% in terms of MAE. This indicates that AT-CF, by using attribution theory to analyze consensus, distinctiveness, and positive and negative preferences, can quantify user preferences better, and that weighting rating similarity against preference similarity can compensate for the bias between user ratings and user preferences.
5 Conclusion
This paper proposes a collaborative filtering algorithm based on attribution theory. We analyze user rating behavior with three kinds of behavioral information: positive and negative preferences, consensus, and distinctiveness. Combining rating similarity with preference similarity can, to some extent, compensate for the bias between user ratings and user preferences. Experimental results on the MovieLens dataset show that AT-CF gives better recommendations, improving recommendation accuracy by 4%–5%.
Because Kelley's attribution theory overemphasizes logic and is therefore somewhat idealized, Kelley supplemented it with several principles, such as the principles of reinforcement and compensation, and future work needs to consider factors such as the principle of enhancement. Moreover, the relevant behavioral information is often unavailable, so future work must also take into account additional information about the actors involved and their environment.
References
1. Lu, J., Wu, D., Mao, M., Wang, W., Zhang, G.: Recommender system application
developments: a survey. Decis. Support Syst. 74, 12–32 (2015)
2. Goldberg, D., Nichols, D., Oki, B.M., et al.: Using collaborative filtering to weave an
information tapestry. Commun. ACM 35(12), 61–70 (1992)
3. Son, L.H.: Dealing with the new user cold-start problem in recommender systems: a
comparative review. Inf. Syst. 58(5), 87–104 (2016)
4. Son, L.H.: HU-FCF: a hybrid user-based fuzzy collaborative filtering method in
recommender system. Expert Syst. Appl. 41(5), 6861–6870 (2014)
5. Joshi, B., Iutzeler, F., Amini, M.R.: Asynchronous distributed matrix factorization with
similar user and item based regularization. In: ACM Conference on Recommender Systems,
pp. 75–78 (2016)
6. Kim, Y.D., Choi, S.: Bayesian binomial mixture model for collaborative prediction with
non-random missing data. In: ACM Conference on Recommender Systems, pp. 201–208
(2014)
7. Hofmann, T.: Latent semantic models for collaborative filtering. ACM Trans. Inf. Syst.
(TOIS) 22(1), 89–115 (2004)
8. Zou, C., Zhang, D., Wan, J., Hassan, M.M., Lloret, J.: Using concept lattice for
personalized recommendation system design. IEEE Syst. J. 11(1), 305–314 (2017)
9. Patra, B.K., Launonen, R., Ollikainen, V., Nandi, S.: A new similarity measure using
Bhattacharyya coefficient for collaborative filtering in sparse data. Knowl.-Based Syst. 82(7),
163–177 (2015)
10. Kelley, H.H.: Attribution theory in social psychology. Nebr. Symp. Motiv. 15(6), 192–238
(1967)
11. Sarwar, B., Karypis, G., Konstan, J., Riedl, J.: Item-based collaborative filtering
recommendation algorithms. In: International Conference on World Wide Web, vol. 4(1),
pp. 285–295 (2001)
12. Harper, F.M., Konstan, J.A.: The MovieLens datasets: history and context. ACM Trans.
Interact. Intell. Syst. 5(4), 19 (2015)