A Survey on Web Recommendation
Systems Based on Web Usage Mining
N. M. Abo El-Yazeed
Demonstrator at High Institute for Management and Computer,
Port Said University, Egypt
no3man_mohamed@himc.psu.edu.eg
Abstract:
Web usage mining has become the subject of extensive research as its
potential for Web-based personalized services, prediction of users’
near-future intentions, adaptive Web sites, and customer profiling is
recognized. Recently, a variety of recommendation systems that predict
users’ future movements through Web usage mining have been proposed.
However, the quality of the recommendations that current systems
produce when predicting a user’s future requests on a particular Web
site remains unsatisfactory. Different efforts have been made to address
the problem of information overload on the Internet. Web recommendation
systems based on Web usage mining mine users’ behavior patterns from
Web access logs and recommend pages to the online user by matching the
user’s browsing behavior against the mined historical behavior patterns.
Keywords:
Web Mining, Web Usage Mining, Web-based recommendation systems,
Navigation pattern mining, Web Log, Web Personalization.
1. Introduction:
The volume of information available on the internet is increasing
rapidly with the explosive growth of the World Wide Web and the advent
of e-Commerce. While users are provided with more information and
service options, it has become more difficult for them to find the “right”
or “interesting” information, the problem commonly known as
information overload.
Recommender systems [1] are a promising, user-centric alternative for
tackling the problem of information overload: they adapt the content and
structure of websites to the needs of users by exploiting knowledge
acquired from the analysis of users’ access behaviors. They can be
generally defined as systems that guide users toward interesting or
useful objects in a large space of possible options [2].
In recent years there has been increasing interest in applying web
usage mining techniques to build web recommender systems [3,4,5]. Web
usage recommender systems take web server access logs as input and use
data mining techniques such as association rule mining and clustering to
extract implicit and potentially useful navigational patterns, which are
then used to provide recommendations. Web server access logs record user
browsing history, which contains plenty of hidden information about
users and their navigation. They could therefore be a good alternative
to explicit user ratings or feedback for deriving user models.
Traditional techniques, which mainly recommend a set of items (referred
to as the recommendation set) deemed to be of interest to the user, base
their decisions on user ratings of different items or on other explicit
feedback provided by the user [6,7]. Web usage based techniques, in
contrast, discover user preferences from implicit feedback, namely the
web pages users have visited. Clustering and collaborative filtering
approaches can readily incorporate both binary and non-binary page
weights, although binary weights are usually used for computational
efficiency [8]. Association Rule (AR) mining [9] can lead to higher
recommendation precision [8] and is easy to scale to large datasets, but
how to incorporate page weights into AR models has not been explored in
previous studies.
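As an illustration of association rule mining over sessions with binary page weights, the following is a minimal sketch, not a method taken from the cited works. It treats each session as a set of visited pages and derives single-page rules of the form {a} -> {b} with their support and confidence. The function name, the thresholds and the sample sessions are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

def pair_rules(sessions, min_support=0.4, min_confidence=0.6):
    """Mine rules {a} -> {b} from sessions treated as sets of pages."""
    n = len(sessions)
    page_count = Counter()   # number of sessions containing each page
    pair_count = Counter()   # number of sessions containing each page pair
    for session in sessions:
        pages = set(session)             # binary weight: visited or not
        page_count.update(pages)
        pair_count.update(combinations(sorted(pages), 2))

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / page_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first; names break ties so the order is deterministic.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))

# Hypothetical sessions; the page names are invented for illustration.
SESSIONS = [
    ["home", "products", "cart"],
    ["home", "products"],
    ["home", "about"],
    ["products", "cart"],
    ["home", "products", "cart"],
]
RULES = pair_rules(SESSIONS)
for lhs, rhs, support, confidence in RULES:
    print(f"{lhs} -> {rhs}  support={support:.2f} confidence={confidence:.2f}")
```

A recommender built on such rules would suggest the right-hand page when the left-hand page appears in the active session. Non-binary page weights, which the paragraph above notes are unexplored for AR models, are deliberately left out of this sketch.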
2. Web Mining:
Web mining is the application of data mining techniques to extract
knowledge from Web data, in which at least one of structure or usage
(Web log) data is used in the mining process. There are three broad
categories of Web mining [10]:
• Web content mining
Web content mining is the process of discovering useful information from
text, image, audio or video data on the web. It is sometimes called web
text mining, because text content is the most widely researched area.
The technologies normally used in web content mining are NLP (natural
language processing) and IR (information retrieval).
• Web structure mining
Web structure mining operates on the Web’s hyperlink structure. It is
the process of using graph theory to analyze the node and connection
structure of a web site. This graph structure can provide information
about the ranking or authoritativeness of a page and enhance search
results through filtering. According to the type of web structural data,
web structure mining can be divided into two kinds. The first kind
extracts patterns from hyperlinks in the web. A hyperlink is a
structural component that connects a web page to a different location.
The other kind mines the document structure: it uses the tree-like
structure to analyze and describe the HTML (Hyper Text Markup Language)
or XML (eXtensible Markup Language) tags within the web page.
• Web usage mining
Web usage mining, also known as web log mining, aims to discover
interesting and frequent user access patterns from web browsing data
stored in web server logs, proxy server logs or browser logs. It applies
data mining to analyze and discover interesting patterns in users’ usage
data on the web. The usage data records the user’s behavior when the
user browses or makes transactions on the web site. It is an activity
that involves the automatic discovery of patterns from one or more Web
servers.
The Web usage data includes the data from Web server access logs,
proxy server logs, browser logs, user profiles, registration data, user
sessions or transactions, cookies, user queries, bookmark data, mouse
clicks and scrolls, and any other data as the results of interactions.
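To make the step from raw server access logs to user sessions concrete, here is a minimal sketch. It assumes the Common Log Format, identifies a user by host address only, drops non-page requests by file suffix, and starts a new session after 30 minutes of inactivity. All of these choices, as well as the sample log lines, are assumptions for illustration and are not prescribed by the works surveyed here.

```python
import re
from datetime import datetime, timedelta

# Common Log Format is assumed; real servers may log additional fields.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")

def parse_line(line):
    """Return (host, timestamp, url, status), or None if the line is malformed."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    timestamp = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("host"), timestamp, m.group("url"), int(m.group("status"))

def sessionize(lines, timeout=timedelta(minutes=30)):
    """Group page requests per host into sessions split by an inactivity timeout."""
    sessions = []        # list of (host, [urls]) in order of first request
    open_session = {}    # host -> (time of last request, page list of open session)
    records = [r for r in map(parse_line, lines) if r is not None]
    for host, ts, url, status in sorted(records, key=lambda r: r[1]):
        # Data cleaning: keep successful page requests only.
        if status != 200 or url.lower().endswith(IGNORED_SUFFIXES):
            continue
        last = open_session.get(host)
        if last is None or ts - last[0] > timeout:
            pages = []
            sessions.append((host, pages))
        else:
            pages = last[1]
        pages.append(url)
        open_session[host] = (ts, pages)
    return sessions

# Hypothetical log lines, invented for illustration.
LOG_LINES = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2023:13:56:01 +0000] "GET /logo.png HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Oct/2023:13:57:10 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2023:13:58:12 +0000] "GET /products.html HTTP/1.1" 200 4100',
    '10.0.0.1 - - [10/Oct/2023:15:10:00 +0000] "GET /index.html HTTP/1.1" 200 2326',
]
SESSIONS = sessionize(LOG_LINES)
```

Identifying users by host address alone is a simplification; cookies or registration data, both listed above as usage data sources, would usually give a more reliable user identity.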
3. Recommendation Systems:
In the WWW context, recommender systems are becoming widely used by
users and by information retrieval systems for both prefetching and
recommendation. In the literature, most researchers focus on Web usage
mining, which analyzes Web logs through a process of knowledge discovery
in databases. Indeed, Web sites generate a large amount of Web log data
that contains useful information about user behavior. The term “Web
Usage Mining” was introduced by Cooley et al. in 1997, in a first
attempt at a taxonomy of Web mining; in particular, they define Web
mining as the “discovery and analysis of useful information from the
World Wide Web”. It is also defined as “the application of data mining
techniques to large Web data repositories”. According to the definition
that Cooley et al. gave in [11], Web usage mining is the “automatic
discovery of user access patterns from Web servers” [12].
Analyzing web log files to extract useful patterns is called web usage
mining. Web usage mining approaches include clustering, association
rule mining, sequential pattern mining, etc. A web recommendation model
is needed to facilitate web page access by users, and the web usage
mining approaches can be applied to predict the next page access.
4. Literature Review:
The importance of Web usage mining has led to a number of research
papers in the area. However, most of these papers are subject to some
kind of limitation. Different combinations of mining techniques have
already been suggested for web access recommendation:
Dhyani et al. [13] introduced a new model based on Markov processes for
web access prediction. Its drawback is high complexity, because all
access sequences are considered throughout the prediction process.
Chimphlee et al. [14] introduced a web access prediction model that
integrates rough set clustering with a Markov model. Its major drawback
is reduced prediction accuracy caused by approximation while forming
clusters: allowing an object to possibly belong to a cluster can reduce
cluster tightness, which in turn affects prediction accuracy. The
sequential mining suggested in that work is the all k-th order Markov
model.
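The all k-th order Markov idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [14]: models of order 1 up to a hypothetical max_order are trained together, and prediction tries the highest order whose state was seen in training before backing off to lower orders. The class name, the tie-breaking rule and the training sessions are invented for the example.

```python
from collections import Counter, defaultdict

class AllKthOrderMarkov:
    """Markov models of order 1..max_order with back-off at prediction time."""

    def __init__(self, max_order=2):
        self.max_order = max_order
        # counts[k][state] -> Counter of next pages; state is a tuple of k pages
        self.counts = {k: defaultdict(Counter) for k in range(1, max_order + 1)}

    def fit(self, sessions):
        for session in sessions:
            for k in range(1, self.max_order + 1):
                for i in range(len(session) - k):
                    state = tuple(session[i:i + k])
                    self.counts[k][state][session[i + k]] += 1
        return self

    def predict(self, history):
        """Return (predicted page, order used), or (None, 0) if nothing matches."""
        for k in range(min(self.max_order, len(history)), 0, -1):
            followers = self.counts[k].get(tuple(history[-k:]))
            if followers:
                # Most frequent follower; ties go to the alphabetically first page.
                page, _ = max(sorted(followers.items()), key=lambda kv: kv[1])
                return page, k
        return None, 0

# Hypothetical training sessions, invented for illustration.
TRAIN = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["D", "B", "E"],
    ["D", "B", "E"],
    ["D", "B", "E"],
]
model = AllKthOrderMarkov(max_order=2).fit(TRAIN)
print(model.predict(["A", "B"]))   # second-order state (A, B) is available
print(model.predict(["X", "B"]))   # unseen second-order state, backs off to order 1
```

With this data the second-order model predicts C after A, B, whereas the first-order model alone would predict E after B. The number of stored states grows with the order, which is the state space cost that several of the works below try to reduce.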
Khalil et al. [15] proposed a framework for predicting the next web
page access, “A framework of combining Markov model with association
rules for predicting web page accesses”. The framework uses a Markov
model for web prediction; if the Markov model is not able to predict the
next page, association rules are used to predict it instead.
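The fallback idea can be sketched as below. This is a simplified illustration, not the framework of [15]: it assumes that the first-order Markov model "cannot predict" when the current page was never seen or when its most frequent followers are tied, and in that case it picks the unvisited page with the highest single-page rule confidence given any visited page. The function names, that interpretation of failure and the sample sessions are assumptions.

```python
from collections import Counter, defaultdict

def train(sessions):
    """Collect first-order transition counts and page co-occurrence counts."""
    transitions = defaultdict(Counter)   # page -> Counter of next pages
    pair_count = defaultdict(Counter)    # page -> Counter of pages in same session
    page_count = Counter()               # number of sessions containing each page
    for session in sessions:
        for cur, nxt in zip(session, session[1:]):
            transitions[cur][nxt] += 1
        pages = set(session)
        page_count.update(pages)
        for a in pages:
            for b in pages - {a}:
                pair_count[a][b] += 1
    return transitions, pair_count, page_count

def predict_next(history, transitions, pair_count, page_count):
    """Return (page, source) where source is 'markov', 'rules' or 'none'."""
    followers = transitions.get(history[-1]) if history else None
    if followers:
        ranked = sorted(followers.items(), key=lambda kv: (-kv[1], kv[0]))
        # Only a unique most frequent follower counts as a Markov prediction.
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            return ranked[0][0], "markov"
    # Fallback: best rule {a} -> {b} with a visited and b not yet visited.
    visited = set(history)
    best_page, best_conf = None, 0.0
    for a in sorted(visited):
        for b, count in sorted(pair_count.get(a, {}).items()):
            confidence = count / page_count[a]
            if b not in visited and confidence > best_conf:
                best_page, best_conf = b, confidence
    if best_page is not None:
        return best_page, "rules"
    return None, "none"

# Hypothetical sessions, invented for illustration.
TRAIN = [
    ["scores", "home", "sports"],
    ["forecast", "home", "weather"],
    ["scores", "sports"],
]
MODEL = train(TRAIN)
print(predict_next(["forecast"], *MODEL))           # single follower, Markov decides
print(predict_next(["forecast", "home"], *MODEL))   # tie after "home", rules decide
```

In the second call the Markov model sees "sports" and "weather" equally often after "home", so the earlier visit to "forecast" decides through the rule forecast -> weather.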
Maratea and Petrosino [16] observe that personalized Web page
recommendation is strictly constrained by the nature of web logs, the
intrinsic complexity of the problem and the high efficiency
requirements. Because a usually highly rated Website has a large number
of meaningful clusters and visitor profiles, the model-based or
distance-based techniques of existing Web usage mining methods tend
either to make very strong and simple assumptions or to become highly
complex and slow. The authors designed a heuristic majority intelligence
technique that adjusts effortlessly to changing navigational patterns,
with a low cost of explicitly identifying them ahead of navigation. The
proposed technique imitates human behavior in an unfamiliar environment
in the presence of several individuals working in parallel, and it can
predict, with better accuracy and in real time, the next page group
visited by a user. The technique was evaluated on real data from users
browsing a popular general-content Website. Average accuracy on the test
sets is reported to be better on a 17-class problem and, most
importantly, remains steady as Web navigation goes on.
Rao and Kumari [17] introduce an approach that predicts users’ browsing
behavior at two levels to match the nature of the navigation: the
category level and the web page level. Level one predicts the category,
so that unnecessary categories can be excluded and the scope of
calculation is massively reduced. Level two then uses pruned
higher-order Markov models to predict the user’s next page more
effectively and with high operational performance. The experimental
results indicate low state complexity and good predictive power at both
levels.
Anitha [18] introduced an approach for next page access prediction that
integrates a Markov model with a proposed model that finds highly
homogeneous access patterns through pairwise nearest neighbor based
clustering. The resulting patterns are highly relevant, and the size of
the data set used for the sequential mining process is greatly reduced.
The proposed method gave good prediction accuracy with less state space
complexity. The drawback of this work is that loosely connected access
sequences are not considered in the mining process; hence, it is
suggested to extend the work by also considering noncontiguous access
sequences.
Jalali et al. [19] developed a recommendation system called WebPUM, an
online prediction system using Web usage mining, and proposed a novel
approach for classifying user navigation patterns to predict users’
future intentions. The approach is based on a new graph partitioning
algorithm that models user navigation patterns in the navigation pattern
mining phase. Furthermore, the longest common subsequence algorithm is
used to classify current user activities in order to predict the user’s
next movement. The proposed system was tested on the CTI and MSNBC
datasets, and the results show an improvement in the quality of
recommendations. Furthermore, the scalability experiments indicate that
the size of the dataset and the number of users in it do not
significantly affect accuracy.
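The classification step based on the longest common subsequence (LCS) can be sketched as follows. This is a minimal illustration, not the WebPUM implementation: it assumes that each mined navigation pattern is available as an ordered list of pages, assigns the active session to the pattern with the longest LCS, and recommends the pattern's pages that have not been visited yet. The pattern names, page names and the limit parameter are invented for the example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two page sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def classify(active_session, patterns):
    """Return (pattern name, LCS length) of the best matching pattern."""
    best_name, best_score = None, 0
    for name, pages in sorted(patterns.items()):
        score = lcs_length(active_session, pages)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score

def recommend(active_session, patterns, limit=2):
    """Recommend unvisited pages from the best matching pattern."""
    name, _ = classify(active_session, patterns)
    if name is None:
        return []
    seen = set(active_session)
    return [page for page in patterns[name] if page not in seen][:limit]

# Hypothetical navigation patterns, invented for illustration.
PATTERNS = {
    "shopping": ["home", "catalog", "item", "cart", "checkout"],
    "support": ["home", "faq", "contact", "ticket"],
}
ACTIVE = ["home", "catalog", "cart"]
print(classify(ACTIVE, PATTERNS))
print(recommend(ACTIVE, PATTERNS))
```

Because a subsequence does not have to be contiguous, the active session still matches the shopping pattern even though the "item" page was skipped.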
Nigam and Jain [20] proposed a new way of structuring the Markov model,
named the Dynamic Nested Markov model, for modeling user web navigation
sessions. The Dynamic Nested Markov model uses the nesting concept: the
higher-order Markov model is nested inside the lower-order Markov model.
Through this nesting, the second-order Markov model is accommodated
inside the first-order Markov model, so that the advantages of both the
lower-order and the higher-order model are achieved in one model. The
focus of this model is on time complexity and coverage of the prediction
state. The results show that high coverage has been achieved and that
time complexity has been reduced.
Anitha and Krishnan [21] focus on providing recommendations to learners
as well as web masters to improve the overall effectiveness of web based
teaching and learning. This work deals with the analysis of web log data
and the development of a recommendation framework using web usage mining
techniques such as upper approximation based rough set clustering using
k nearest neighbors, a dynamic support pruned all k-th order Markov
model, and all k-th order association rule mining by dynamic frequent
(k+1) item set generation using Apriori. The goal of this integrated
approach is to make accurate recommendations for learning management
systems with reduced state space complexity.
5. Conclusion:
The World Wide Web is growing rapidly. A system for web access
recommendation is essential to facilitate web browsing, to help users
during their surfing sessions, and to engage the users of a website at
an early stage of surfing. It is therefore necessary to study user web
navigation behavior in order to improve the quality of the web services
offered to the user. This analysis is achieved by modeling the web
navigation history.
Many approaches have been introduced for this task. Most of them are
based on the Markov model, which is the most widely used model of user
web navigation sessions. A lower-order Markov model provides high
coverage but low accuracy. A higher-order Markov model gives low
coverage but high accuracy, at the cost of more time complexity.
6. Future Work:
Because of the drawbacks of current web access models, such as high
complexity, low accuracy, and contradictory predictions, it is necessary
to enhance web page recommendation approaches so that they achieve high
recommendation accuracy and low complexity and eliminate the
disadvantages of current approaches.
7. References:
[1] P. Resnick, H. R. Varian, “Recommender Systems”,
Communications of the ACM, VOL. 40, No. 3, pp. 56-58, March
1997.
[2] R. Burke, “Hybrid Recommender Systems: Survey and
Experiments”, User Modeling and User-Adapted Interaction, pp.
331-370, 2002.
[3] X. Fu, J. Budzik, K. J. Hammond, “Mining Navigation History for
Recommendation”, In Intelligent User Interfaces, pp. 106–112,
2000.
[4] W. Lin, S.A. Alvarez, C. Ruiz, “Collaborative recommendation via
adaptive association rule mining”, In Proceedings of the Web
Mining for E-Commerce Workshop (WebKDD'2000), Boston,
August 2000.
[5] Y. H. Wu, Y. C. Chen, A. L. P. Chen, “Enabling Personalized
Recommendation on the Web based on User Interests and
Behaviors”, In 11th International Workshop on Research Issues in
Data Engineering, 2001.
[6] M. Deshpande, G. Karypis, “Item-Based Top-N Recommendation
Algorithms”, ACM Transactions on Information Systems, VOL.
22, NO. 1, pp. 143-177, January 2004.
[7] J. L. Herlocker, J. A. Konstan, A. Borchers, J. Riedl, “An
Algorithmic Framework for Performing Collaborative
Filtering”, In SIGIR 99: Proceedings of the 22nd Annual
International ACM SIGIR Conference on Research and
Development in Information Retrieval, pp. 230-237, 1999.
[8] B. Mobasher, “Web Usage Mining and Personalization”, In
Practical Handbook of Internet Computing, Munindar P. Singh
(ed.), CRC Press, 2005.
[9] M. Nakagawa, B. Mobasher, “A Hybrid Web Personalization
Model Based on Site Connectivity”, In The Fifth International
WEBKDD Workshop: Web Mining as a Premise to Effective and
Intelligent Web Applications, pp. 59–70, 2003.
[10] J. Vellingiri and S. Chenthur Pandian, “A Survey on Web Usage
Mining”, Global Journal of Computer Science and Technology,
VOL. 11, Issue 4, Version 1.0, USA, March 2011.
[11] R. Cooley, J. Srivastava, and B. Mobasher, “Web mining:
Information and pattern discovery on the world wide web”, In 9th
IEEE International Conference on Tools with Artificial Intelligence
(ICTAI’97), November 1997.
[12] M. Géry and H. Haddad, “Evaluation of Web Usage Mining
Approaches for User’s Next Request Prediction”, WIDM '03
Proceedings of the 5th ACM international workshop on Web
information and data management, New York, NY, USA, pp. 74-81, 2003.
[13] D. Dhyani, S. S. Bhowmick, and W. K. Ng, “Modelling and
predicting web page accesses using Markov Processes”, IEEE,
Computer Society, 2003.
[14] S. Chimphlee, N. Salim, M. S. B. Ngadiman, W. Chimphlee, and S.
Srinoy, “Rough Sets Clustering and Markov Model for Web Access
Prediction”, Proceedings of post graduate annual seminar, pp. 470-474,
2006.
[15] F. Khalil, J. Li, and H. Wang, “A framework of combining Markov
model with association rules for predicting web page accesses”,
Proc. Fifth Australasian Data Mining Conference (AusDM2006),
volume 61, pp 177–184, 2006.
[16] A. Maratea and A. Petrosino, “An Heuristic Approach to Page
Recommendation in Web Usage Mining”, Ninth International
Conference on Intelligent Systems Design and Applications, pp.
1043-1048, 2009.
[17] V. V. R. M. Rao and V. V. Kumari, “An Efficient Hybrid
Successive Markov Model for Predicting Web User Usage
Behavior using Web Usage Mining”, International Journal of Data
Engineering (IJDE), VOL. 1, Issue (5), pp.43-62, 2011.
[18] A. Anitha, “A New Web Usage Mining Approach for Next Page
Access Prediction”, International Journal of Computer
Applications VOL. 8, No.11, pp.7-10, October 2010.
[19] M. Jalali, N. Mustapha, Md. N. Sulaiman and A. Mamat,
“WebPUM: A Web-based recommendation system to predict user
future movements”, Expert Systems with Applications, VOL. 37,
Issue 10, pp. 6201–6212 , 2010.
[20] B. Nigam and S. Jain, “Generating a New Model for Predicting the
Next Accessed Web Page in Web Usage Mining”, Third
International Conference on Emerging Trends in Engineering and
Technology, India, Goa, pp.485-490, 2010.
[21] A. Anitha and N. Krishnan, “A Web Usage Mining based
Recommendation Model for Learning Management Systems”,
Computational Intelligence and Computing Research (ICCIC)
IEEE International Conference, 2010.