A Survey on Web Recommendation Systems Based on Web Usage Mining

N. M. Abo El-Yazeed
Demonstrator at High Institute for Management and Computer, Port Said University, Egypt
no3man_mohamed@himc.psu.edu.eg

Abstract: Web usage mining has become the subject of extensive research as its potential for Web-based personalized services, prediction of users’ near-future intentions, adaptive Web sites, and customer profiling has been recognized. Recently, a variety of recommendation systems that predict users’ future movements through Web usage mining have been proposed. However, the quality of the recommendations these systems make when predicting users’ future requests on a particular Web site remains unsatisfactory. Different efforts have been made to address the problem of information overload on the Internet. Web recommendation systems based on web usage mining try to mine users’ behavior patterns from web access logs and recommend pages to the online user by matching the user’s browsing behavior with the mined historical behavior patterns.

Keywords: Web Mining, Web Usage Mining, Web-based recommendation systems, Navigation pattern mining, Web Log, Web Personalization.

1. Introduction:
The volume of information available on the Internet is increasing rapidly with the explosive growth of the World Wide Web and the advent of e-commerce. While users are provided with more information and service options, it has become more difficult for them to find the “right” or “interesting” information, a problem commonly known as information overload.

Recommender systems [1] are a promising, user-centric alternative for tackling information overload: they adapt the content and structure of Web sites to the needs of users by exploiting knowledge acquired from analyzing users’ access behavior. They can be generally defined as systems that guide users toward interesting or useful objects in a large space of possible options [2].
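The general approach described in the abstract, mining behavior patterns from access logs and matching the current browsing behavior against them, can be illustrated with a small sketch. It assumes the logs have already been cleaned and split into sessions of page identifiers, and it uses an all-k-th-order Markov model, the family of models that many of the works reviewed in Section 4 build on. The class name, its methods, and the toy sessions are hypothetical and are not taken from any surveyed system.

```python
from collections import Counter, defaultdict


class AllKthOrderMarkov:
    """Hypothetical all-k-th-order Markov next-page predictor (illustrative only)."""

    def __init__(self, max_order=2):
        self.max_order = max_order
        # counts[context][next_page] = number of times next_page followed context
        self.counts = defaultdict(Counter)

    def fit(self, sessions):
        """Count transitions for every context length from 1 to max_order."""
        for session in sessions:
            for i in range(1, len(session)):
                for k in range(1, self.max_order + 1):
                    if i - k < 0:
                        break  # not enough history for a context of length k
                    context = tuple(session[i - k:i])
                    self.counts[context][session[i]] += 1
        return self

    def recommend(self, current_session, top_n=3):
        """Use the longest context seen in training, falling back to shorter ones."""
        for k in range(min(self.max_order, len(current_session)), 0, -1):
            context = tuple(current_session[-k:])
            if context in self.counts:
                return [page for page, _ in self.counts[context].most_common(top_n)]
        return []  # no matching context: no recommendation


# Toy sessions of page identifiers (hypothetical data).
sessions = [
    ["A", "B", "C"],
    ["A", "B", "C"],
    ["E", "B", "D"],
    ["E", "B", "D"],
    ["F", "B", "D"],
]
model = AllKthOrderMarkov(max_order=2).fit(sessions)
print(model.recommend(["A", "B"], top_n=1))  # -> ['C'] (second-order context ("A", "B"))
print(model.recommend(["X", "B"], top_n=1))  # -> ['D'] (falls back to first-order ("B",))
```

Trying the longest matching context first and falling back to shorter ones is one simple way to combine the accuracy of higher-order models with the coverage of lower-order models; several of the surveyed works additionally prune infrequent states to control state space complexity.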
In recent years there has been increasing interest in applying web usage mining techniques to build web recommender systems [3,4,5]. Web usage recommender systems take web server access logs as input and use data mining techniques such as association rule mining and clustering to extract implicit and potentially useful navigational patterns, which are then used to provide recommendations. Web server access logs record users’ browsing history, which contains plenty of hidden information about users and their navigation. They can therefore be a good alternative to explicit user ratings or feedback for deriving user models. Traditional techniques recommend a set of items deemed to be of interest to the user (referred to as the recommendation set) and base their decisions on user ratings of different items or on other explicit feedback provided by the user [6,7]. In contrast, web usage techniques discover user preferences from implicit feedback, namely the web pages users have visited. Clustering and collaborative filtering approaches can readily incorporate both binary and non-binary page weights, although binary weights are usually used for computational efficiency [8]. Association rule (AR) mining [9] can lead to higher recommendation precision [8] and scales easily to large datasets, but how to incorporate page weights into AR models has not been explored in previous studies.

2. Web Mining:
Web mining is the application of data mining techniques to extract knowledge from Web data, in which at least one of structure or usage (Web log) data is used in the mining process. There are three broad categories of Web mining [10]:

Web content mining
Web content mining is the process of discovering useful information from text, image, audio, or video data on the web. It is sometimes called web text mining because text is the most widely researched type of content.
The technologies normally used in web content mining are natural language processing (NLP) and information retrieval (IR).

Web structure mining
Web structure mining operates on the Web’s hyperlink structure. It is the process of using graph theory to analyze the node and connection structure of a web site. This graph structure can provide information about the ranking or authority of a page and can improve search results through filtering. According to the type of web structural data, web structure mining can be divided into two kinds. The first kind extracts patterns from hyperlinks in the web; a hyperlink is a structural component that connects a web page to a different location. The second kind mines the document structure: it uses the tree-like structure of a page to analyze and describe its HTML (HyperText Markup Language) or XML (eXtensible Markup Language) tags.

Web usage mining
Web usage mining, also known as web log mining, aims to discover interesting and frequent user access patterns from web browsing data stored in web server logs, proxy server logs, or browser logs. It applies data mining to analyze and discover interesting patterns in users’ usage data on the web. The usage data records the user’s behavior when the user browses or makes transactions on the web site, and mining it involves the automatic discovery of patterns from one or more Web servers. Web usage data includes data from Web server access logs, proxy server logs, browser logs, user profiles, registration data, user sessions or transactions, cookies, user queries, bookmark data, mouse clicks and scrolls, and any other data resulting from user interactions.

3. Recommendation Systems:
In the WWW context, recommender systems are becoming widely used by users and information retrieval systems to perform both prefetching and recommendation. In the literature, most researchers focus on Web usage mining, which analyzes Web logs through a process of knowledge discovery in databases. Indeed, Web sites generate large amounts of Web log data that contain useful information about user behavior. The term “Web Usage Mining” was introduced by Cooley et al. in 1997, when a first taxonomy of Web mining was attempted; in particular, they define Web mining as the “discovery and analysis of useful information from the World Wide Web”. It is also defined as “the application of data mining techniques to large Web data repositories”. Citing the definition that Cooley et al. gave in [11], Web usage mining is the “automatic discovery of user access patterns from Web servers” [12]. In short, analyzing web log files to extract useful patterns is called web usage mining. Web usage mining approaches include clustering, association rule mining, and sequential pattern mining. To facilitate users’ access to web pages, a web recommendation model is needed, and these web usage mining approaches can be applied to predict the next page access.

4. Literature Review:
The importance of Web usage mining has led to a number of research papers in the area. However, most of these papers suffer from some kind of limitation. Different combinations of mining techniques have already been suggested for web access recommendation:

Dhyani et al. [13] introduced a model based on a Markov process for web access prediction. Its drawback is high complexity, because all access sequences are considered throughout the prediction process.

Chimphlee et al. [14] introduced a web access prediction model that integrates rough set clustering with a Markov model. Its major drawback is a lack of prediction accuracy due to approximation while forming clusters.
The possibility of an object belonging to several clusters can reduce cluster tightness, which in turn affects prediction accuracy. The sequential mining technique used in that work is an all-k-th-order Markov model.

Khalil et al. [15] proposed a framework for predicting the next web page access, “A framework of combining Markov model with association rules for predicting web page accesses”. The framework uses a Markov model for web prediction; if the Markov model cannot predict the next page, association rules are used to predict it instead.

Maratea and Petrosino [16] note that personalized Web page recommendation is strongly constrained by the nature of web logs, the intrinsic complexity of the problem, and high efficiency requirements. When the problem is handled by existing Web usage mining methods, the large number of meaningful clusters and visitor profiles of a typical highly rated Web site means that model-based or distance-based techniques tend either to make overly strong, simplifying assumptions or to become highly complex and slow. The authors designed a heuristic majority intelligence technique that adapts seamlessly to changing navigational patterns without the need to identify them explicitly before navigation. The technique imitates human behavior in an unknown environment in which several individuals act in parallel, and it can predict, accurately and in real time, the next page group a user will visit. The technique was tested on real data from users browsing a popular general-content Web site. Average accuracy on the test sets is good on a 17-class problem and, most importantly, remains stable as Web navigation proceeds.

Rao and Kumari [17] introduced a new approach that predicts users’ browsing behavior at two levels to match the nature of navigation.
The first is the category level and the second is the web page level. The first level predicts the category, so that unnecessary categories can be excluded and the scope of computation is greatly reduced. The second level then uses pruned higher-order Markov models to predict the user’s next page more effectively and with high operational performance. Experimental results show low state complexity and good predictive power at both levels.

Anitha [18] introduced a new approach for next page access prediction. It combines a Markov model with a proposed model that finds highly homogeneous access patterns by pairwise nearest-neighbor based clustering. The resulting patterns are highly relevant, and the size of the data set used in the sequential mining process is greatly reduced. The proposed method achieved good prediction accuracy with lower state space complexity. The drawback of this work is that loosely connected access sequences are not considered in the mining process; hence, it is suggested to extend this work by also considering non-contiguous access sequences.

Jalali et al. [19] developed a recommendation system called WebPUM, an online prediction system based on Web usage mining, and proposed a novel approach for classifying user navigation patterns to predict users’ future intentions. The approach uses a new graph partitioning algorithm to model user navigation patterns in the navigation pattern mining phase, and a longest common subsequence algorithm to classify current user activities and predict the user’s next movement. The proposed system was tested on the CTI and MSNBC datasets, and the results show an improvement in the quality of recommendations. Furthermore, scalability experiments show that the size of the dataset and the number of users in it do not significantly affect accuracy.
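The classification step of WebPUM [19], matching the current session against mined navigation patterns with a longest common subsequence (LCS) measure, can be approximated by the following sketch. It is not the authors’ implementation: the mined patterns are assumed to be given as lists of page identifiers (for example, the output of a graph partitioning step), and the function names and the rule of recommending the unvisited pages of the best-matching pattern are illustrative assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two page sequences (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def recommend_by_lcs(current_session, patterns, top_n=3):
    """Hypothetical helper: match the session to the pattern with the longest LCS
    and recommend that pattern's pages the user has not visited yet."""
    best = max(patterns, key=lambda p: lcs_length(current_session, p), default=None)
    if best is None or lcs_length(current_session, best) == 0:
        return []  # no pattern shares any page with the current session
    visited = set(current_session)
    return [page for page in best if page not in visited][:top_n]


# Hypothetical mined navigation patterns.
patterns = [["A", "B", "C", "D"], ["A", "E", "F"], ["X", "Y"]]
print(recommend_by_lcs(["A", "C"], patterns))  # -> ['B', 'D']
```

Because LCS tolerates gaps, the session A, C still matches the pattern A, B, C, D; a matcher restricted to contiguous subsequences would miss such loosely connected sessions.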
Nigam and Jain [20] proposed a new way of structuring the Markov model, called the Dynamic Nested Markov model, for modeling user web navigation sessions. It uses a nesting concept in which the higher-order Markov model is nested inside the lower-order one; through this nesting, the second-order Markov model is accommodated inside the first-order Markov model. In this way, the advantages of both lower-order and higher-order models are obtained in a single model. The model focuses on time complexity and coverage of the prediction state, and the results show that high coverage is achieved while time complexity is reduced.

Anitha and Krishnan [21] focus on providing recommendations to learners as well as webmasters to improve the overall effectiveness of web-based teaching and learning. This work analyzes web log data and develops a recommendation framework using web usage mining techniques such as upper-approximation-based rough set clustering with k nearest neighbors, a dynamic-support-pruned all-k-th-order Markov model, and all-k-th-order association rule mining by dynamic frequent (k+1)-itemset generation using Apriori. The goal of this integrated approach is to make accurate recommendations for learning management systems with reduced state space complexity.

5. Conclusion:
The World Wide Web is growing rapidly, and a web access recommendation system is essential to facilitate browsing during a user’s surfing session and to engage users of a website at an early stage of surfing. It is therefore necessary to study user web navigation behavior to improve the quality of the web services offered to users. Analysis of user web navigation behavior is achieved by modeling web navigation history. Many approaches have been introduced for this task, and most of them are based on the Markov model, which is the most widely used model of user web navigation sessions.
Lower-order Markov models provide high coverage but low accuracy, whereas higher-order Markov models give high accuracy but low coverage and greater time complexity.

6. Future Work:
Because of the drawbacks of current web access models, such as high complexity, low accuracy, and contradictory predictions, web page recommendation approaches need to be enhanced to overcome these weaknesses, with improvements that yield highly accurate recommendations, low complexity, and the elimination of the disadvantages of current approaches.

7. References:
[1] P. Resnick and H. R. Varian, “Recommender Systems”, Communications of the ACM, Vol. 40, No. 3, pp. 56-58, March 1997.
[2] R. Burke, “Hybrid Recommender Systems: Survey and Experiments”, User Modeling and User-Adapted Interaction, pp. 331-370, 2002.
[3] X. Fu, J. Budzik, and K. J. Hammond, “Mining Navigation History for Recommendation”, In Intelligent User Interfaces, pp. 106-112, 2000.
[4] W. Lin, S. A. Alvarez, and C. Ruiz, “Collaborative recommendation via adaptive association rule mining”, In Proceedings of the Web Mining for E-Commerce Workshop (WebKDD'2000), Boston, August 2000.
[5] Y. H. Wu, Y. C. Chen, and A. L. P. Chen, “Enabling Personalized Recommendation on the Web based on User Interests and Behaviors”, In 11th International Workshop on Research Issues in Data Engineering, 2001.
[6] M. Deshpande and G. Karypis, “Item-Based Top-N Recommendation Algorithms”, ACM Transactions on Information Systems, Vol. 22, No. 1, pp. 143-177, January 2004.
[7] J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl, “An Algorithmic Framework for Performing Collaborative Filtering”, In SIGIR '99: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 230-237, 1999.
[8] B. Mobasher, “Web Usage Mining and Personalization”, In Practical Handbook of Internet Computing, M. P. Singh (ed.), CRC Press, 2005.
[9] M. Nakagawa and B. Mobasher, “A Hybrid Web Personalization Model Based on Site Connectivity”, In The Fifth International WEBKDD Workshop: Web Mining as a Premise to Effective and Intelligent Web Applications, pp. 59-70, 2003.
[10] J. Vellingiri and S. Chenthur Pandian, “A Survey on Web Usage Mining”, Global Journal of Computer Science and Technology, Vol. 11, Issue 4, Version 1.0, USA, March 2011.
[11] R. Cooley, J. Srivastava, and B. Mobasher, “Web mining: Information and pattern discovery on the world wide web”, In 9th IEEE International Conference on Tools with Artificial Intelligence (ICTAI’97), November 1997.
[12] M. Géry and H. Haddad, “Evaluation of Web Usage Mining Approaches for User’s Next Request Prediction”, In WIDM '03: Proceedings of the 5th ACM International Workshop on Web Information and Data Management, New York, NY, USA, pp. 74-81, 2003.
[13] D. Dhyani, S. S. Bhowmick, and W. K. Ng, “Modelling and predicting web page accesses using Markov Processes”, IEEE Computer Society, 2003.
[14] S. Chimphlee, N. Salim, M. S. B. Ngadiman, W. Chimphlee, and S. Srinoy, “Rough Sets Clustering and Markov Model for Web Access Prediction”, Proceedings of the Postgraduate Annual Seminar, pp. 470-474, 2006.
[15] F. Khalil, J. Li, and H. Wang, “A framework of combining Markov model with association rules for predicting web page accesses”, In Proc. Fifth Australasian Data Mining Conference (AusDM2006), Vol. 61, pp. 177-184, 2006.
[16] A. Maratea and A. Petrosino, “An Heuristic Approach to Page Recommendation in Web Usage Mining”, Ninth International Conference on Intelligent Systems Design and Applications, pp. 1043-1048, 2009.
[17] V. V. R. M. Rao and V. V. Kumari, “An Efficient Hybrid Successive Markov Model for Predicting Web User Usage Behavior using Web Usage Mining”, International Journal of Data Engineering (IJDE), Vol. 1, Issue 5, pp. 43-62, 2011.
[18] A. Anitha, “A New Web Usage Mining Approach for Next Page Access Prediction”, International Journal of Computer Applications, Vol. 8, No. 11, pp. 7-10, October 2010.
[19] M. Jalali, N. Mustapha, Md. N. Sulaiman, and A. Mamat, “WebPUM: A Web-based recommendation system to predict user future movements”, Expert Systems with Applications, Vol. 37, Issue 10, pp. 6201-6212, 2010.
[20] B. Nigam and S. Jain, “Generating a New Model for Predicting the Next Accessed Web Page in Web Usage Mining”, Third International Conference on Emerging Trends in Engineering and Technology, Goa, India, pp. 485-490, 2010.
[21] A. Anitha and N. Krishnan, “A Web Usage Mining based Recommendation Model for Learning Management Systems”, IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), 2010.