International Journal of Engineering Trends and Technology (IJETT) – Volume 10 Number 12 - Apr 2014
#1 Student, Dept. of Computer Science & Engineering.
*2 Assistant Professor, Dept. of Information Science & Engineering.
Channabasaveshwara Institute of Technology, Gubbi (T), Tumkur (D).
VTU, Belgaum, Karnataka, India.
Abstract
Recommender systems are becoming increasingly essential, and many users now rely on them, because the content generated on the web is increasingly free-form and less structured.
Recommendations on the web cover many kinds of items, including movies, music, and books, as well as query suggestions and tag recommendations. The framework proposed here does not depend on the type of data source, because these sources can all be represented as various types of web graphs. We aim to provide a generalized framework for mining web graphs for recommendations: 1) we first propose a novel diffusion method that propagates similarities between different nodes and generates recommendations; 2) we then demonstrate how different recommendation problems can be generalized into this graph diffusion framework. The proposed method can be used in many recommendation tasks, including query suggestion, tag recommendation, and image recommendation.
Keywords
Web graphs, Recommendations, query suggestion, novel diffusion
I. INTRODUCTION
Web mining is the use of data mining techniques to automatically discover and extract information from web documents and services. Web mining is broadly classified into three types: web structure mining, web usage mining, and web content mining. Web content mining scans and mines the graphs, text, and pictures of web pages to determine how relevant the content is to a query. It involves four steps: collecting content from the web, parsing usable data out of formatted data, analysing that data, and converting the results of the analysis into useful artefacts such as reports or search indexes. Because information on the web is free-form and poorly structured, mining useful information from it is becoming increasingly difficult.
In order to satisfy web users' needs and to improve their experience in web applications, recommender systems are widely deployed.
Many recommender systems are based on collaborative filtering [5][6][7][8], a method that automatically predicts the active user's interests by collecting rating information from other, similar users, on the assumption that the active user will prefer the items those similar users preferred. Collaborative filtering is widely used in commercial systems, for example for product recommendation at Amazon and movie recommendation at Netflix.
Collaborative filtering has its own advantages: 1) it can filter items based on quality and taste, and 2) it can provide recommendations that the user did not anticipate but that turn out to be good.
II. PROBLEM STATEMENTS
Recommender systems face several challenges:
It is very difficult to provide latent, semantically relevant results for the query submitted by the user.
Each user has their own interests and requirements, which makes it difficult to provide personalization.
It is inefficient to design a separate recommendation algorithm for each recommendation task.
Maintaining the user rating matrix is time-consuming, and in many situations ratings are not available at all, which makes predicting user interest difficult.
To overcome these challenges, a general framework is needed that can be used for different recommendation tasks, such as image recommendation and query suggestion. This framework is based on the heat diffusion method and can be applied to both directed and undirected graphs.
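To make the collaborative filtering idea from the Introduction concrete, a minimal user-based sketch follows. It assumes cosine similarity over co-rated items and a similarity-weighted average of the neighbours' ratings; these choices, the function names, and the toy ratings are illustrative assumptions and are not taken from this paper. It also shows the rating-matrix dependence noted in the problem statements: when no similar user has rated an item, no prediction can be made.

```python
from __future__ import annotations

from math import sqrt


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two users, computed over the items both have rated."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm = sqrt(sum(a[i] ** 2 for i in common)) * sqrt(sum(b[i] ** 2 for i in common))
    return dot / norm if norm else 0.0


def predict(ratings: dict[str, dict[str, float]], user: str, item: str,
            k: int = 2) -> float | None:
    """Predict user's rating of item from the k most similar users who rated it."""
    neighbours = sorted(
        ((cosine(ratings[user], r), r[item])
         for other, r in ratings.items() if other != user and item in r),
        reverse=True,
    )[:k]
    weights = [(s, rating) for s, rating in neighbours if s > 0]
    total = sum(s for s, _ in weights)
    if total == 0:
        return None  # nobody similar has rated the item: no prediction possible
    return sum(s * rating for s, rating in weights) / total


# Hypothetical toy rating matrix: user -> {item: rating}.
ratings = {
    "alice": {"m1": 5, "m2": 3},
    "bob": {"m1": 5, "m2": 3, "m3": 4},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
}
print(predict(ratings, "alice", "m3"))  # between 2 and 4, pulled towards bob's 4
print(predict(ratings, "alice", "m9"))  # None
```

Because alice agrees perfectly with bob on the items they share, bob's rating dominates the prediction; carol still contributes with a smaller weight.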
III. METHODOLOGY
The system architecture is shown in Fig 1.
Fig 1: system architecture (components: User, Search engine, Recommendation engine, Graph construction, Heat diffusion, Data set (query-URL, image); numbered arrows 1 to 7)
ISSN: 2231-5381 http://www.ijettjournal.org
1. The user submits a query to the search engine.
2. The search engine extracts the relevant data, which can be an image data set or a query-URL data set.
3. A graph is constructed for the extracted data set.
4. The heat diffusion model is applied to calculate heat values.
5. The query-URL pairs with their heat values are submitted to the recommendation engine.
6. The top-K recommendations are sent to the search engine.
7. The search engine sends the top suggestions back to the user.
The system flow is shown in Fig 2. It has four modules:
Data collection
Query processing
Graph construction
Performance analysis
Fig 2: system flow
A. Data Collection
The graph for query suggestion is constructed from the AOL search engine's click-through data [1]. Click-through data [10] records web users' activities, and therefore reflects users' interests and the latent semantic relationships between users and the queries they submit, as well as between queries and the web documents that were clicked.
Each line of the click-through data contains a user (u), a query (q) issued by that user, a URL (l) on which the user clicked, the rank (r) of that URL, and the time (t) at which the query was submitted. The click-through data can therefore be represented as a set of quintuples (u, q, l, r, t). This raw data, as recorded by the search engine, contains a lot of noise that degrades the effectiveness of the query suggestion algorithm, so it must be filtered to keep only queries that are repeated, well structured, and in English.
B. Query Processing
A query must be processed before it is submitted for search. Query suggestion [9] is closely related to query substitution and query expansion [3], which extend the original query with additional search terms so that the scope of the search is narrowed.
Data cleaning is an important process in data mining that improves data quality. Here it is used to identify the relevant data and to remove irrelevant data from the query log.
The query processing method mainly removes unformatted and duplicate data. The resulting data set contains the query id, rank, URL, and time, and is ready for search optimization.
C. Graph Construction by Diffusion
Heat diffusion is a physical phenomenon: in any medium, heat flows from positions of high temperature to positions of low temperature. In this paper, heat diffusion is used to model how similarity information propagates on web graphs. Because the web is difficult to represent with a regular geometry, we study heat flow on a graph instead. Diffusion on different kinds of graphs is described below [4].
1) Diffusion on Undirected Graphs:
Consider an undirected graph $G = (V, E)$, where $V$ is the vertex set and $E = \{(v_i, v_j) \mid \text{there is an edge between } v_i \text{ and } v_j\}$ is the set of all edges. Each edge is regarded as a pipe that connects nodes $v_i$ and $v_j$. The value $f_i(t)$ denotes the heat at node $v_i$ at time $t$, starting from an initial distribution of heat $f_i(0)$ at time zero, and $f(t)$ denotes the vector consisting of the $f_i(t)$. The diffusion is formulated as
$$\frac{f_i(t + \delta t) - f_i(t)}{\delta t} = \alpha \sum_{j : (v_i, v_j) \in E} \bigl(f_j(t) - f_i(t)\bigr) \qquad (1)$$
where $E$ is the set of edges. In matrix form this is
$$\frac{f(t + \delta t) - f(t)}{\delta t} = \alpha (H - D) f(t) \qquad (2)$$
where $H$ is the adjacency matrix ($H_{ij} = 1$ if $(v_i, v_j) \in E$ and $0$ otherwise) and $D$ is the diagonal degree matrix, $D_{ii} = \sum_j H_{ij}$.
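Equations (1) and (2) can be simulated directly. The sketch below is an explicit Euler integration, assuming H is the adjacency matrix and D the diagonal degree matrix of the undirected graph, and assuming a step size small enough for the iteration to be stable; the function name, node indexing, step size, and toy path graph are illustrative choices. As δt tends to zero the update approaches the closed form f(t) = e^{α(H−D)t} f(0); Euler steps are used here only to stay dependency-free.

```python
def diffuse_undirected(edges, n, f0, alpha=1.0, t_end=1.0, dt=0.001):
    """Approximate f(t_end) for df/dt = alpha * (H - D) f by explicit Euler steps.

    edges: undirected pairs (i, j) over nodes 0..n-1; f0: initial heat f(0).
    """
    neighbours = [[] for _ in range(n)]
    for i, j in edges:          # an undirected edge is a pipe in both directions
        neighbours[i].append(j)
        neighbours[j].append(i)
    f = list(f0)
    for _ in range(round(t_end / dt)):
        # Equation (1): node i changes by alpha * dt * sum_j (f_j - f_i).
        f = [fi + alpha * dt * sum(f[j] - fi for j in neighbours[i])
             for i, fi in enumerate(f)]
    return f


# Path graph 0 - 1 - 2 with all heat initially at node 0.
heat = diffuse_undirected([(0, 1), (1, 2)], 3, [1.0, 0.0, 0.0])
print([round(h, 3) for h in heat])  # heat spreads away from node 0; the total stays 1
```

Because every edge moves heat symmetrically between its two endpoints, the total heat is conserved, and heat always flows from the hotter node to the colder one, matching the physical picture above.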
2) Diffusion on Directed Graphs:
The above model of heat diffusion is designed for undirected graphs, but in many cases web graphs are directed, particularly in online recommender systems and knowledge-sharing sites. Every user on a knowledge-sharing site commonly has a trust list, and the users in that trust list can strongly influence the user.
These relationships are directed: user x may be in the trust list of user y, while user y is not in the trust list of user x. The strength of trust also varies; for example, user u_i may trust user u_j with a trust score of 1 but trust user u_k with a score of only 0.2. Different weights are therefore associated with these relations.
Consider a directed graph $G = \{V, E, W\}$, where $V = \{v_1, v_2, \ldots, v_n\}$ is the vertex set; $W = \{w_{ij}\}$, where $w_{ij}$ is the probability that edge $(v_i, v_j)$ exists or the weight associated with this edge; and $E = \{(v_i, v_j) \mid \text{there is an edge from } v_i \text{ to } v_j \text{ and } w_{ij} > 0\}$ is the set of all edges.
At the same time, node $v_i$ diffuses an amount of heat $DH(i, t, \Delta t)$ to its successor nodes. We assume that:
1. The heat $DH(i, t, \Delta t)$ is proportional to the time period $\Delta t$.
2. The heat $DH(i, t, \Delta t)$ is proportional to the heat at node $v_i$.
3. Every node has the same ability to diffuse heat.
4. The heat $DH(i, t, \Delta t)$ is proportional to the weight assigned to the edge between node $v_i$ and each of its successor nodes.
3) Graph Construction:
For the query-URL bipartite graph, consider an undirected bipartite graph $B_{ql} = (V_{ql}, E_{ql})$, where $V_{ql} = Q \cup L$, $Q = \{q_1, q_2, \ldots, q_n\}$, $L = \{l_1, l_2, \ldots, l_p\}$, and $E_{ql} = \{(q_i, l_j) \mid \text{there is an edge from } q_i \text{ to } l_j\}$ is the set of all edges. The edge $(q_j, l_k)$ exists if and only if a user $u_i$ clicked URL $l_k$ after issuing query $q_j$.
Fig 3: Graph constructions for query suggestion.
(a) Query-URL bipartite graph. (b) Converted query-URL bipartite graph.
The values on the edges of the graph give the number of times a URL was clicked for a query. The bipartite graph derived from the click-through data cannot be applied directly to the diffusion process, because it is undirected and cannot accurately describe the relationships between queries and URLs [2]. We therefore convert it into the graph of Fig 3(b), in which every undirected edge of the original graph becomes two directed edges. The weight on a directed query-URL edge is normalized by the number of times the query was issued, while the weight on a directed URL-query edge is normalized by the number of times the URL was clicked.
4) Query Suggestion Algorithm:
Once the graph is constructed, the query suggestion algorithm can be modelled as follows.
1: The converted bipartite graph $G = (V^{+} \cup V^{*}, E)$ consists of a query set $V^{+}$ and a URL set $V^{*}$. Its two kinds of directed edges are weighted using the method introduced in the previous section.
2: Given a query $q$ in $V^{+}$, a sub-graph is constructed by depth-first search in $G$. The search stops when the number of queries is larger than a predefined number.
3: As analysed in Section D, set $\alpha = 1$ and, without loss of generality, set the initial heat value of query $q$ to $f_q(0) = 1$. Start the diffusion process using $f(1) = e^{\alpha R} f(0)$.
4: Output the top-K queries with the largest values in vector $f(1)$ as the suggestions.
D. Result Analysis
This section presents the results of the query suggestion algorithm.
1) Impact of Parameter α:
The parameter α plays an important role in the proposed method: it controls the speed at which heat propagates on the graph.
Fig 4: Impact of α
We can observe from Fig 4 that the best setting is α = 1. If we choose a smaller thermal conductivity α, performance drops because some relevant nodes cannot receive enough heat. If we choose a relatively larger α, performance also decreases: heat transfers so fast that some irrelevant nodes gain more heat than necessary, which degrades the results.
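Steps 1, 3, and 4 of the query suggestion algorithm can be sketched end to end on a toy click graph. This is a minimal illustration under stated assumptions, not the authors' implementation: (a) click counts are converted as for Fig 3(b), taking "number of times the query was issued" to be the query's total clicks; (b) R is our reading of the four directed-diffusion assumptions, in which node v_j sends heat α·Δt·f_j·w_ji/d_j to each successor v_i (d_j being its total outgoing weight) and so loses α·Δt·f_j, giving R_jj = −1 and R_ij = w_ji/d_j; (c) e^{αR} is approximated by a truncated Taylor series (a library routine such as `scipy.linalg.expm` could replace it). The DFS sub-graph of step 2 is skipped, and all function names, node encodings, and data are hypothetical.

```python
from collections import defaultdict


def build_graph(clicks):
    """Step 1: turn click counts {(query, url): n} into weighted directed edges.

    Assumption: query->url weight = n / clicks of the query,
    url->query weight = n / clicks of the url.
    """
    q_total, l_total = defaultdict(float), defaultdict(float)
    for (q, l), n in clicks.items():
        q_total[q] += n
        l_total[l] += n
    edges = {}
    for (q, l), n in clicks.items():
        edges[(("q", q), ("l", l))] = n / q_total[q]
        edges[(("l", l), ("q", q))] = n / l_total[l]
    return edges


def diffusion_matrix(nodes, edges):
    """Assumed R: R[i][i] = -1 if node i has outgoing edges; R[i][j] = w_ji / d_j."""
    index = {v: k for k, v in enumerate(nodes)}
    d = defaultdict(float)
    for (src, _dst), w in edges.items():
        d[src] += w
    R = [[0.0] * len(nodes) for _ in nodes]
    for v in nodes:
        if d[v] > 0:
            R[index[v]][index[v]] = -1.0
    for (src, dst), w in edges.items():
        R[index[dst]][index[src]] += w / d[src]
    return R


def expm_times(A, x, terms=40):
    """Approximate e^A x by the truncated Taylor series sum_k A^k x / k!."""
    result, term = list(x), list(x)
    for k in range(1, terms):
        term = [sum(a * t for a, t in zip(row, term)) / k for row in A]
        result = [r + t for r, t in zip(result, term)]
    return result


def query_heat(clicks, query, alpha=1.0):
    """Step 3 on the whole graph: heat of every other query after diffusion."""
    edges = build_graph(clicks)
    nodes = sorted({v for edge in edges for v in edge})
    aR = [[alpha * r for r in row] for row in diffusion_matrix(nodes, edges)]
    f0 = [1.0 if v == ("q", query) else 0.0 for v in nodes]  # f_q(0) = 1
    f1 = expm_times(aR, f0)                                   # f(1) = e^{alpha R} f(0)
    return {v[1]: h for v, h in zip(nodes, f1) if v[0] == "q" and v[1] != query}


def suggest(clicks, query, alpha=1.0, top_k=2):
    """Step 4: the top-K other queries ranked by heat."""
    heat = query_heat(clicks, query, alpha)
    return sorted(heat, key=heat.get, reverse=True)[:top_k]


clicks = {("apple", "u1"): 10, ("iphone", "u1"): 8, ("apple", "u2"): 2,
          ("fruit", "u2"): 3, ("fruit", "u3"): 5, ("banana", "u3"): 4}
print(suggest(clicks, "apple"))  # ['iphone', 'fruit']
```

Re-running `query_heat` with a larger α (for example 3.0) gives the distant query "banana" a larger share of heat relative to "iphone", which mirrors the observation above that a large α lets heat reach less relevant nodes.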
2) Impact of the Size of the Sub-graph:
Web graphs are normally very large, so the query suggestion algorithm is run on a sub-graph derived from the original graph [8]. It is therefore necessary to evaluate how much the size of this sub-graph affects the accuracy of the recommendations.
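Step 2 of the query suggestion algorithm restricts diffusion to a sub-graph found by depth-first search, and the size of that sub-graph is the quantity varied in this experiment. A minimal sketch follows, assuming the converted graph is stored as an adjacency list keyed by hypothetical ("q", query) and ("l", url) tuples; the function name and toy graph are illustrative. Following the wording of step 2, the search stops as soon as the number of visited queries becomes larger than the limit, so the limit directly controls the sub-graph size.

```python
def dfs_subgraph(adj, start, max_queries):
    """Depth-first search from start over an adjacency list.

    Nodes are ("q", query) or ("l", url) tuples (a hypothetical encoding). As in
    step 2, the search stops once the number of visited queries is larger than
    max_queries; the set of visited nodes is the sub-graph used for diffusion.
    """
    visited, stack, n_queries = set(), [start], 0
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        if node[0] == "q":
            n_queries += 1
            if n_queries > max_queries:
                break
        # Push neighbours in reverse so they are explored in listed order.
        stack.extend(reversed(adj.get(node, [])))
    return visited


adj = {
    ("q", "a"): [("l", "x")],
    ("l", "x"): [("q", "a"), ("q", "b")],
    ("q", "b"): [("l", "x"), ("l", "y")],
    ("l", "y"): [("q", "b"), ("q", "c")],
    ("q", "c"): [("l", "y")],
}
print(sorted(dfs_subgraph(adj, ("q", "a"), max_queries=1)))
# [('l', 'x'), ('q', 'a'), ('q', 'b')]
```

A small limit cuts the search off before it reaches query "c", which illustrates why a very small sub-graph can miss relevant nodes.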
Fig 5: Impact of the size of the sub-graph (α = 1)
Fig 5 shows how performance changes with the size of the sub-graph. When the sub-graph is very small, for example a size of 500, the algorithm does not perform well, because such a sub-graph omits some highly relevant nodes. As the size of the sub-graph increases, performance also improves.
IV. CONCLUSIONS
Collaborative filtering is an attractive method because it can filter information by selecting and rejecting items on the basis of quality and taste. Our proposed model builds on the collaborative technique and enhances it by providing more latent, semantically relevant recommendation results; the graph construction is also shown. It also provides personalization features and is a single model that serves many tasks, such as image recommendation and query suggestion. As future work, the model can be extended to video recommendation.
REFERENCES
[1] H. Ma, H. Yang, M.R. Lyu, and I. King, “SoRec: Social Recommendation Using Probabilistic Matrix Factorization,” CIKM ’08: Proc. 17th ACM Conf. Information and Knowledge Management, pp. 931-94, 2008.
[2] H. Yang, I. King, and M.R. Lyu, “DiffusionRank: Possible Penicillin for Web Spamming,” SIGIR ’07: Proc. 30th Ann. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 431-438, 2007.
[3] A. Chirita, C.S. Firan, and W. Nejdl, “Personalized Query Expansion for the Web,” SIGIR ’07: Proc. 30th Ann. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 7-14, 2007.
[4] H. Ma and I. King, “Mining Web Graph for Recommendations,” IEEE Trans. Knowledge and Data Engineering, vol. 24, no. 6, June 2012.
[5] A.S. Das, M. Datar, A. Garg, and S. Rajaram, “Google News Personalization: Scalable Online Collaborative Filtering,” WWW ’07: Proc. 16th Int’l Conf. World Wide Web, pp. 271-280, 2007.
[6] J.L. Herlocker, J.A. Konstan, L.G. Terveen, and J.T. Riedl, “Evaluating Collaborative Filtering Recommender Systems,” ACM Trans. Information Systems, vol. 22, no. 1, pp. 5-53, 2004.
[7] Z. Huang, H. Chen, and D. Zeng, “Applying Associative Retrieval Techniques to Alleviate the Sparsity Problem in Collaborative Filtering,” ACM Trans. Information Systems, vol. 22, no. 1, pp. 116-42, 2004.
[8] B. Marlin, “Modeling User Rating Profiles for Collaborative Filtering,” Advances in Neural Information Processing Systems 16, S. Thrun, L. Saul, and B. Schölkopf, eds., MIT Press, 2004.
[9] Q. Mei, D. Zhou, and K. Church, “Query Suggestion Using Hitting Time,” CIKM ’08: Proc. 17th ACM Conf. Information and Knowledge Management, pp. 469-477, 2008.
[10] G. Dupret and M. Mendoza, “Automatic Query Recommendation Using Click-Through Data,” Proc. Int’l Federation for Information Processing, Professional Practice in Artificial Intelligence (IFIP PPAI), pp. 303-312, 2006.