International Journal of Application or Innovation in Engineering & Management... Web Site: www.ijaiem.org Email: , Volume 2, Issue 9, September 2013

advertisement
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 9, September 2013
ISSN 2319 - 4847
Straight Through Grouping for Recommender
System
Najma HAMZAOUI, Wafae BAIDA, Abdelfettah SEDQUI, Abdelouahid LYHYAOUI
Abdelmalek Essaâdi University, LTI Lab, ENSA of Tangier, Tangier, Morocco
PO Box: 1818, Tangier Principal
Abstract
In this paper, a new classification approach in recommender system is presented. First, we propose a co-classification of a data
set to identify natural groups of similar objects. Then, we perform with a mechanism for the extraction process of these groups.
The results are remarkable, especially for large scales data that requires more pre-treatment organizational evolutionary
resources. This innovative method seems to give efficient and accurate grouping and recommendations.
Keywords: Collaborative Filtering, Group Technology, Recommender System, Co-Clustering
1. INTRODUCTION
With the huge amount of information flowing through the Web, it is increasingly difficult to find the information needed
quickly and efficiently. However, with the advent of recommender systems (RS) during the 90 years [1, 2, 3], reducing
information overload has become easy. Indeed, it is a system that aims to help users find interesting items, provide
relevant information that meets their satisfaction and their real needs through a process of collecting, filtering and
recommendation of the information. The RS are usually classified into three categories [4], content-based, collaborative
filtering, and hybrid approach. Content-based methods are based on the characteristics of items to generate
recommendations. On the other hand, CF is considered the technique of the most successful recommendation. Indeed, it
is the most used in recommender systems for e-commerce. This technique allows recommending an item to a user based
on the user profiles that are closest to him. In a recommendation system based on the CF data are presented in matrix,
whose rows and columns are respectively based on a set of users and a set of items. Each user evaluates a set of elements
by assigning certain values. The hybrid approach generates recommendations by combining the two approaches to
content-based and collaborative filtering. To avoid the weaknesses of collaborative filtering techniques, research has
focused on classification techniques based on models, in order to be more accurate and efficient. Thus, on the basis of user
notes (ratings), these techniques include users (or items) forming clusters. This approach gives a new way to identify the
neighborhood to make the recommendation, without using the entire database[2,3,7,8], it has produced several methods,
like hierarchical clustering, [ 6 , 9,10,11,12,13,14,15]. Another class of model, which is the evidence in collaborative
filtering in recent years, is the matrix factorization [29, 30, 31, 32, 33, 34, 35]. Most of these methods can be grouped
according to monofiltering and bifiltering approaches.
In this article, we adapt and apply the Bond Energy Algorithm (BEA) to perform a simultaneous co-clustering of the
matrix. This allows us to find a natural clusters corresponding to the different components communities, and without a
priori constraints on the number of class. Our work is structured as follows: Section 2 cites and summarizes the work
related to recommender systems. In section 3, it describes the BEA algorithm, its application for co-classification and
proposed for the extraction of classes representing natural communities’ solution. Finally, we conclude in Section 4 by the
presentation of results and prospects.
2. RELATED WORK
The memory-based algorithms have shortcomings and are not suitable for a large system. For this, to achieve better
prediction performance, researchers have proposed several approaches based on model-based CF [6]. The techniques of
model-based, address better the problem of scalability in dealing with groups of examples, rather than the whole database.
In a recommendation system, we find users who share the same tastes and interests. Consequently, we can combine to
form a community. Many approaches address this problem in the literature provides several methods. Most of these
methods can be grouped into two categories: monofiltering [17,18,19,20] or bifiltering (co-clustering). Despite the
problem of scarcity, the biggest challenge is scalability FC. Many researchers have shown that the use of the technique of
co-clustering is more robust to solve this problem, and it is a viable way to increase the scalability while maintaining a
good quality of recommendation [17,19, 23] . The co-clustering involves both the grouping of users and items
simultaneously. In [21], the algorithm used the simultaneous partitioning. The authors of [22, 23] used as a method of coclustering, but introducing an analysis of the duality between users and items. They propose a system based on coclustering and a new similarity measure algorithm. Thus, when the database is large, it is more appropriate to use the
Volume 2, Issue 9, September 2013
Page 1
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 9, September 2013
ISSN 2319 - 4847
ClustKnn method presented in [24, 25, 26]. The authors first compressed data by constructing a model of efficient
clustering; the recommendations are generated using an effective approach based on the nearest neighbors. A summary of
the FC-based clustering can be found in [5, 27]. A recent class of successful models of collaborative filtering is based on
matrix factorization. Many methods have shown that the use of factorization methods for co-clustering gives a better
result, as is the case of SVD, NMF, Tri-NMF, PMF, Non linear MFP MFP Bayesian methods, and NPCA [29, 30, 31, 32,
33, 34, 35].
3. THE PROPOSED APPROACH
The size of networks requires the need to find methods can make them easy to manage. This requirement involves finding
ways to structure it as groups with common characteristics. Nowadays, data using cross-classification or co-clustering
comes from bioinformatics, text mining, but also the industry. In the industrial sector, the co-clustering is the name of the
Group Technology (GT). It is a concept based on the identification and exploitation of the similarities and the similarity
between the products and processes of design and manufacturing in order to rationalize production and reduce
manufacturing costs [37]. In this sense, we present an algorithm named co-classification Bond Energy Algorithm (BEA),
which is suitable for co-clustering of users and items. It is derived from the industrial world and generalizes coclassification algorithms working on GT. It is an algorithmic approach proposed by [28].
Methods of co-clustering process all rows and all columns in a table of data simultaneously seeking to obtain
homogeneous blocks.
The data representation for the traditional collaborative filtering specially for item-based CF is based on the construction
of an N×M item-user matrix U, showed in table I.
in the
row and column of the matrix means the rating value
for item i of user j.
Table I. Binding energy of an element with its four nearest neighbors in the incidence matrix A
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
…
For a recommendation system, the use of TG, and specifically BEA results in the formation of communities. This
algorithm is based on the rearrangement of rows and columns to reformulate the matrix useful as a matrix of blocks.
Indexes the rows and columns of these blocks represent the users and items of community members that are characterized
by a strong similarity. Each block represents a strong association between users and items. Only a few methods have
focused on complete extraction, specially, for large data, something that involves the search for a new method to perform
an accurate and automatic extraction after the formation of natural blocks by BEA.
We can summarize the process of our Straight Through Grouping method (STGM) for Recommender System as follows:
 Set together those who are similar by applying BEA
 Detect, extract communities.
3.1 The propose of BEA
The purpose of BEA is to achieve a co-classification of a sparse matrix to identify groups of objects by making
permutations of rows and columns of the incidence matrix. It also seeks to display and discover the associations and
interrelationships between the groups with each other. It is based on the connection between an element of the incidence
matrix A and the four nearest neighbors as illustrated in Table I. According to [28], these bonds can be considered as
energy. Taking into account the frontier energy calculated, the permutation of rows and columns is made at the end to
collect the elements of the matrix to create the group with maximum energy. The permutation is based on the value of
coefficient of maximum energy (ME) as follows:
(1)
Generally, the measure of effectiveness (ME) of a matrix A is the sum of its bond strength, where the bond strength
between two nearest neighboring elements is defined as their product. The ME, is then given by:
Volume 2, Issue 9, September 2013
Page 2
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 9, September 2013
ISSN 2319 - 4847
With convention
) and A is a non-negative matrix of dimension NxM. ME maximization
is taken over all N! M! Possible matrices, which can be obtained, from the input matrix by permutations of the rows and
columns. This problem, as shown in [29], is reduced to two separate optimizations, one for rows and one for columns.
Since the problems are equivalent, only the maximizing the sum of the bond lines or maximizing the sum of the bond
columns should be discussed.
A algorithm, which exploits the nearest-neighbor feature, and is believed to be much faster and just as satisfactory (in the
sense of achieving near optimal arrangements), has been developed and used successfully to determine array orderings
corresponding to local optima of the ME. The algorithm is as follows.
Table II. BEA ALGORITHM
1. Place one of the columns arbitrarily. Set i = 1.
2. Try placing individually each of the remaining N-i columns in each of the i +1 possible positions (to the left and right of
the
i columns already placed), and compute each column's contribution to the ME. Place the column that gives the largest
incremental contribution to the ME in its best location. Increment i by 1 and repeat until i = N.
3. When all the columns have been placed, repeat the procedure on the rows. (The row placement is unnecessary;
however, if
the input array is symmetric, since the final row and column orderings will be identical.)
This algorithm has several important characteristics: It is finite and swift. The algorithm will always reduce an input
array to pure block form and the final groupings and relations, however, have been found to be insensitive to the initial
row (column) selected and their associated MEs have been found to be numerically close [29].
The algorithm will provide from the input matrix, a matrix of output as pure non-intersecting blocks or blocks
checkerboard forms. In the case of the checkerboard form, blocks of non-zero matrix elements on the main diagonal
represent pure groups, but the off-diagonal blocks indicate the relationships between the groups.
3.2 Extraction of communities:
BEA reforms the incidence matrix in the form of blocks with similar values, but this algorithm, don’t provide an
automatic extraction of these blocks. Our proposed solution for the automatic detection blocks follows the following steps:
After applying the BEA incidence dimension matrix (NxM), we propose to make a second reorganization based on the
weight calculated
for each line (user) and
of each column (item).
Avec
We consider that this weight is associated with each object. That it is the user or item. The basic building block of our
method is an algorithm for partitioning that will be used on all users and all items which are represented by their weight.
The usefulness of this weight appears when you decide to arranged them in descending order. Indeed, sorting of these
weights of rows and columns transform the result matrix from BEA, in the diagonal form, show homogeneity of members
of the same class (users or items) and will project the whole weight users on an axis and that of the items on another axis,
so as to differentiate between the classes of users and items.
Fig. 2a. Users sorted by the computed weights
Volume 2, Issue 9, September 2013
Page 3
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 9, September 2013
ISSN 2319 - 4847
Fig. 2b. Items sorted by the computed weights
By applying the approach on a portion of the MovieLens data [38], we found the results shown in fig.2.a and fig.2.b
These results show the existence of groups of users / items with similar weight, hence belonging to the same community.
for a user / item given, we find a remarkable variance between their weight, which explains the heterogeneity between the
famous users / items. Therefore, these two elements can represent the boundaries of each community.
Then, the proposed approach tries to apply a solution to co-clustering matrix incidence of users and items, and search for
partitions of users and items, based on the weight difference. Calculating the distance between the weight of users / items,
is expressed by the difference between two successively ordered elements. Based on this, we can easily detect the
boundaries of each class of users or items,
Fig.3.a: Distance between the weights of users
Fig. 3.b: Distance between the weights of items
By this way, we just make a graphical presentation as shown in fig.3.a and fig.3.b for all distances calculated for users
and for items. By observing the results presented, we can determine community members of users and items by choosing
a threshold and projecting on the axis of the users and on the axis of items.
when we project the difference weight of the items on a dimension, classes of items are separated. For any pair of weight
value of consecutive items separated by the greatest distance can be cut up and so on. We choose the best divisions. This
method produces a set of clusters defined by intervals includes the items or users of one dimension.
Volume 2, Issue 9, September 2013
Page 4
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 9, September 2013
ISSN 2319 - 4847
This method is based on the output matrix of the BEA algorithm. It calculates a utility function for each line (items) and
performs a descending sort on these values. The same procedure is applied to the columns (users) by calculating a utility
and doing a sort on the values of this function. The values of each utility function (rows and columns) used to calculate
the differences between successive values. This gives rise to two respective curves 3.a and 3.b. It performs a data partition
as the clusters do not overlap as shown in 3.a for users and 3.b for items. This method is quite effective, while these
divisions are easily accepted because the fact of the important separation between clusters
4. CONCLUSION
This article attempts to address the need to improve the recommendation system using new techniques of clustering
algorithm based on BEA, from Technology Group, used in the industrial field. It is very responsive and accurate for better
co-clustering of large data systems. The proposed solution STGM is considered an extension of the algorithm BEA,
especially in the extraction and determination of the number of classes by a transformation and a projection of the matrix
resulting from BEA. Indeed, based on the blocks provided by the BEA algorithm, we used a measure function to find the
classes of users and items. We thus obtained two corresponding projections for sorted users, and resources to determine
classes. The method has been successfully tested on a portion of the database MovieLens. We are currently evaluating this
method in comparison to others to apply it to various industrial problems in recommender system.
REFERENCES
[1] W.Hill, L.Stead, M.Rosenstein, and G. Furnas, ’Recommending and Evaluating Choices in a Virtual Community of
Use,’ Proc. Conf.Human factors in Computing Systems, 1995.
[2] P. Resnick, N. Iakovou, M. Sushak, P. Bergstrom, and J. Riedl, `Grouplens: An Open Architecture for collaborative
filtering of Netnews', Proc. of CSCW '94, (1994).,27,72,97.
[3] U. Shardanand and P.Maes, Social Information Filtering: Algorithms for Automating World of Mouth, proc. Conf.
Human Factors in Comuting Systems, 1995.
[4] M. Balabanovic and T. Shoham, Fab:content-based, Collaborative Recommendation, Comm. ACM, Vol. 40, n 3, pp,
66-72,1997.
[5] R. Burke, Hybrid Recommender Systems: Survey and Experiments, User Model. User-Adapted Interaction 12 (4)
(2002) 331–370
[6] J.S.Breese , D. Heckerman, and C. Kadie, Empirical Analysis of Predictive Algorithms for Collaborative Flitering ,
Proc.14 th Conf. Uncertaintly in Artificial Intelligence, July 1998.
[7] J. Delgado and N.Ishii, Memory- Based Weighted majority Prediction for Recommender Systems, proc.ACM SIGIR
99 Workshop Recommender Systems: Algorithms and Evaluation, 1999.
[8] A. Nakamura and N.Abe, Collaborative Filtering Using Weighted Majority Prediction Algorithms, Proc.15 th int’l
Conf machine learning, 1998.
[9] D.Billsus and m.pazzani, learning Collaborative Information Filters, Proc. Int’l Conf. Machine Learning, 1998.
[10] L.Getoor and M.Sahami, Using Probabilistic Relational Models for Collaborative Filtering, Proc. Workshop Web
Usage Analysis and User Profiling (WEBKDD’99), Aug.1999.
[11] K.Goldberg, T.Roeder, D.Gupta, and C.Perkins, Eigentaste :A Consatant Time Collaborative Filtering Algorithm,
Information Retrieval J., vol.4, n0.2, pp.133-151,July 2001.
[12] T. Hofman, Collaborative Filtering via Gaussian Probabilistic Latent Semantic Analysis, Proc. 26th Ann. Int’l ACM
SIGIR Conf., 2003.
[13] B. Marlin, Modeling User Rating Profiles for Collaborative Filtering, Proc.17th Ann. Conf. Neural Information
Processing systems (NIPS’ 03), 2003.
[14] D. Pavlov and D.Pennock, A Maximum Entropy Approach to Collaborative Filtering in Dynamic, Sparse, HighDimensional Domains, Proc. 16th Ann. Conf.
Neural information Processing Systems(NIPS’02), 2002.
[15] L.H. Hungar and D.P. Foster , Clustering Methods for Collaborative Filtering, Proc. Recommender Systems, Paper
from 1998 Workshop, Technical Report WS-98-08 1998.
[16] G. Adomavicius, A. Tuzhilin, Toward the next generation of recommender systems: a survey of the state-of-the-art
and possible extentions , IEEE Trans Knowledge Data Eng. 17 (6) (2005) 734–749.
[17] S. H. S. Chee, J. Han, and K. Wang, “RecTree: an efficient collaborative filtering method,” in Proceedings of the 3rd
International Conference on Data Warehousing and Knowledge Discovery, pp. 141–151, 2001.
[18] M. O'Connor and J. Herlocker, “Clustering items for collaborative filtering,” in Proceedings of the ACM SIGIR
Workshop on Recommender Systems (SIGIR '99), 1999.
[19] B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl, “Recommender systems for large-scale E-commerce: scalable
neighborhood formation using clustering,” in Proceedings of the 5th International Conference on Computer and
Information Technology (ICCIT '02), December 2002.
Volume 2, Issue 9, September 2013
Page 5
International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org Email: editor@ijaiem.org, editorijaiem@gmail.com
Volume 2, Issue 9, September 2013
ISSN 2319 - 4847
[20] G.-R. Xue, C. Lin, Q. Yang, et al., “Scalable collaborative filtering using cluster-based smoothing,” in Proceedings
of the ACM SIGIR Conference, pp. 114–121, Salvador, Brazil, 2005.
[21] George, T., & Merugu, S. A scalable collaborative filtering framework based on co-clustering. In Proceedings of the
IEEE ICDM conference 2005
[22] Panagiotis Symeonidis, Alexandros Nanopoulos, Apostolos Papadopoulos, Yannis Manolopoulos, Nearest-Biclusters
Collaborative Filtering, WEBKDD 2006
[23] Panagiotis Symeonidis, Alexandros Nanopoulos, Apostolos N. Papadopoulos, Yannis Manolopoulos.Nearestbiclusters collaboartive filtering based on constant and coherent values.Inf Retrieval 2007 DOI 10.1007/s10791-0079038-4
[24] J. Kelleher and D. Bridge. Rectree centroid: An accurate, scalable collaborative recommender. In Procs. of the
Fourteenth Irish conference on artificial Intelligence and Cognitive Science, pages 89-94, 2003
[25] Rashid, A.M.; Lam, S.K.; Karypis, G.; Riedl, J.; ClustKNN: A HighlyScalable Hybrid Model- & Memory-Based CF
Algorithm. WEBKDD 2006.
[26] A. M. Rashid,, S. K. Lam, A. LaPitz, G. Karypis,and J. Riedl. “Towards a Scalable kNN CF Algorithm:Exploring
Effective Applications of Clustering”.LNCS vol 4811/2007: Advances in Web Mining and Web Usage Analysis, 147166, Springer.
[27] RuLong Zhu, SongJie Gong, “Analyzing of Collaborative Filtering Using Clustering Technology” in Procs of ISECS
International Colloquium on Computing, Communication, Control, and Management, 2009.
[28] W.T. MCCORMICK, S. B. DEUTSCH, J. J. MARTIN, AND P. J. SCHWEITZER, "Identification of Data Structures
and Relationships by Matrix reorderingTechniques," Research Paper P-512, Institute for Defense Analyses,
Arlington, Va., December 1969.
[29] William T. McCormick Jr., Paul J. Schweitzer and Thomas W. White « Problem Decomposition and Data
Reorganization by a Clustering Technique» Operations Research, Vol. 20, No. 5 (Sep. - Oct., 1972), pp. 993-1009
[30] D. Billsus and M.J. Pazzani learning collaborative information fillters. In Proceedings of the Fifteenth International
Conference
on Machine Learning, pages {46,54}, 1998.
[31] R. Salakhutdinov and A. Mnih. Probabilistic matrix factorization. In Advances in neural information processing
Systems 2008
[32] N. D. Lawrence and R. Urtasun. Non-linear matrix factorization with gaussian processes. In Proceedings of the 26th
Annual International Conference on Machine Learning, 2009
[33] R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization using markov chain monte carlo. In Proc.
of the International Conference on Machine Learning, 2008
[34] K. Yu, S. Zhu, J. Lafferty, and Y.Gong. Fast nonparametric matrix factorization for Large Scale Collaborative
filtering. In Proc. of the international ACM SIGIR conference on Research and development in information retrieval,
2009.
[35] LEE D. D., SEUNG H. S., « Learning the parts of objects by non-negative matrix factorization », Nature, vol. 401,
1999, p. 788-791.
[36] J. Yoo, S. Choi “Orthogonal nonnegative matrix tri-factorization for co-clustering: Multiplicative updates on Stiefel
manifolds”,
Information Processing and Management 46 (2010)
[37] Walter Jean-Luc ; Mutel Bernard (Directeur de thèse) ; Université de Mulhouse FRANCE, (Université de
soutenance) » Study of group technology implementation. Application in the Holeg international firm”
[38] B. Miller, I. Albert, S. Lam, J. Konstan, J. Rield, Movielens unplugged: experiences with an occasionally connected
recommender systems, in:Proc. Internat. Conf. Intelligent User Interfaces, 2002, pp. 263–266.
Volume 2, Issue 9, September 2013
Page 6
Download