Predicting Users' Next Visit Using Grey Moving Probability Markov Model

Ch. Bindu Madhuri (1), Prof. J. A. Chandulal (2)
(1) Department of Information Technology, JNTUK-UCEV, Vizianagaram.
(2) Department of Computer Science and Engineering, GITAM University, Visakhapatnam, India.

Abstract
In web-based applications, Web Usage Mining (WUM) techniques are applied to the collected usage data. This paper focuses on building models that represent past users' behavior and, in turn, predict the links a user is most likely to request while viewing a web page. WUM is designed specifically to support such applications by analyzing usage data. The problem of predicting the next request during a user's navigational session has been studied extensively. In this context, the Grey Moving Probability Markov Model is proposed to model navigational sessions and to predict the next navigation step using different transition probability estimation approaches. The method uses Grey Relational Pattern Analysis (GRPA) with Variable-Length Markov Chains to analyze users' navigational behavior in order to generate user session clusters. The experimental results indicate that the approach can improve the quality of clustering of user navigation patterns in WUM systems, and that the results can be used to predict a user's next request on large web sites in order to customize a web site to the needs of specific users.
Keywords: Grey Moving Probability Markov Model, Web Personalization, Web Usage Mining (WUM)

1 Introduction
A huge amount of data now exists on the web owing to the rapid growth of both content and users, and internet users face multiple problems as a result. Providing a user with precisely the data that matters has therefore become a decisive issue in web usage based applications. This paper deals mainly with improving the performance of web management by extending and applying Web Usage Mining (WUM). WUM is a method that aims to ascertain the essential associations within web usage (log) files, expressed in the form of usage data, by studying the characteristics of that data with data mining (DM) techniques.

With the abundance of information available on the WWW, it has become increasingly necessary for users to find the desired information resources, and to track and analyze their usage patterns. Extracting useful knowledge from the huge amount of information available on the web by applying data mining techniques is referred to as Web Mining, that is, Knowledge Discovery in Databases (KDD) applied to the web. It is often categorized into three major areas [1, 2]: Web Content Mining, Web Structure Mining, and Web Usage Mining. Web Usage Mining is an emerging domain in web mining that exploits data mining techniques to discover valuable information from the navigation behavior of World Wide Web (WWW) users. It comprises three main steps: preprocessing, knowledge extraction, and result analysis [4]. Web Usage Mining [1] tries to make sense of the data generated by web surfers' sessions or behaviors.
While web content mining and web structure mining use the data on the web itself, Web Usage Mining mines the log data derived from users' interactions with the web. Web user data includes data from web server access logs, proxy server logs, browser logs, user profiles, registration data, user sessions, cookies, user queries, bookmark data, and mouse clicks. The preprocessing phase consists of data cleaning, user identification, session identification, and transaction identification [5]. Pattern discovery [3] draws on methods and algorithms developed in several fields, such as statistics, data mining, machine learning, and pattern recognition. Pattern analysis is the last step in the overall web usage mining process; it filters out the uninteresting patterns found in the pattern discovery phase. The major application areas for web usage mining fall into five categories: personalization, system improvement, site modification, business intelligence, and usage characterization [3].

This paper deals mainly with WUM, for which many DM techniques such as clustering have been applied to web server logs. We propose the Grey Moving Probability Markov Model (GMPMM), which is used to predict users' future visits in order to customize a web site to the needs of specific users. The model represents web users' navigational sessions and predicts the user's next navigational step using transition probabilities. The combination of GRPA with Markov Chains (MC) generates user session clusters by analyzing users' navigational behavior. The method is applicable only when the user's navigation history is known; the moving probability model can then predict subsequent visits. The proposed methodology predicts the user's interest more specifically.
The main argument is that the model is useful for evaluating predictions with the objective of discovering a web recommendation set. A broad evaluation of the proposed methods on actual data sets gives a strong indication that the model predicts accurately. This paper is organized as follows. Section 2 reviews related work. Section 3 introduces a methodology for predicting users' next visit. Section 4 presents experimental results, followed by conclusions in Section 5.

2 Related Work
Several WUM systems have been proposed to predict users' preferences and navigation behavior. Liu and Keselj [7] proposed the automatic classification of web user navigation patterns, with a novel approach to classifying user navigation patterns and predicting users' future requests. The approach is based on the combined mining of web server logs and the contents of the retrieved web pages. They used character N-grams to represent the contents of web pages, and combined them with user navigation patterns by building user navigation profiles composed of a collection of N-grams. Their off-line mining system can be incorporated into an on-line web recommendation system to observe and measure the degree of real users' satisfaction with the generated recommendations, which are derived from the requests predicted by the system.

Analog [8] is one of the first web usage mining (WUM) systems. It is structured as an off-line and an on-line component. The off-line component builds session clusters by analyzing past users' activity recorded in server log files. In [9, 10], Mobasher et al. present a system that provides dynamic recommendations to users as a list of hypertext links. The analysis is based on anonymous usage data combined with the structure formed by the hyperlinks of the site.
SUGGEST, the WUM system proposed by Baraglia and Palmerini [11, 12], provides useful information to ease web user navigation and to optimize web server performance. It adopts a two-level architecture composed of an off-line creation of historical knowledge and an on-line engine that understands users' behavior. As requests arrive at this system module, it incrementally updates a graph representation of the web site based on the active user sessions and classifies the active sessions using a graph partitioning algorithm. Mehrdad Jalali et al. [13] proposed a novel approach for classifying user navigation patterns using the Longest Common Subsequence (LCS) algorithm, which is exploited to improve the accuracy of classification.

In recent years, an increasing number of research works have addressed web usage mining [14, 15, 16, 17, 18, 19]. The main motivation of these studies is to gain a better understanding of the reactions and motivations behind users' navigation. Some studies also apply the mining results to improve the design of websites [20], to analyze system performance and network communications, or even to build adaptive websites [21]. Three web mining approaches that exploit web logs can be distinguished: Association Rules (AR) [22, 23], Frequent Sequences [24], and Frequent Generalized Sequences [25, 26]. Algorithms have been developed for all three approaches, but few experiments have been done with real web log data. Géry and Haddad proposed a recommender system that predicts users' next requests based on their behavior as discovered from web log data [27]. P. Kumar et al. [28] proposed a Sequential PAM algorithm to find clusters and to improve the web personalization system. This paper deals mainly with predicting users' next visit using the Grey Moving Probability Markov Model in order to customize a web site to the needs of specific users.
3 Methodology

3.1 Pattern discovery & Navigation Pattern
After data preprocessing, knowledge is extracted using Markov chains. Navigation patterns are derived from the users' access sessions. The on-line module then builds active user sessions, which makes it possible to identify pages related to those in the active session and to predict the next requested page.

3.1.1 Preliminaries of Markov Chain Models
This section briefly discusses Markov Chain Models and the Grey Moving Probability Markov Model. Grey System Theory seeks only the intrinsic structure of a system for which limited data are available. The problem addressed here is that, owing to a lack of information, it is difficult to determine the exact value of one or more entries in the Grey Moving Probability Matrix of a Markov Chain. The Grey Moving Probability Markov Model forecasts a web user's subsequent link; prediction here means the task of predicting that subsequent link.

Let $X$ be a sequence of $L$ random variables $X_1, X_2, \ldots, X_L$ representing a navigational sequence. By the chain rule of probability,

\[ \Pr(X) = \Pr(X_1, \ldots, X_L) = \Pr(X_L \mid X_{L-1}, X_{L-2}, \ldots, X_1)\, \Pr(X_{L-1} \mid X_{L-2}, \ldots, X_1) \cdots \Pr(X_1). \]

The key property of a first-order Markov Chain is that the probability of each $X_i$ depends only on the value of $X_{i-1}$:

\[ \Pr(X) = \Pr(X_L \mid X_{L-1})\, \Pr(X_{L-1} \mid X_{L-2}) \cdots \Pr(X_2 \mid X_1)\, \Pr(X_1) = \Pr(X_1) \prod_{i=2}^{L} \Pr(X_i \mid X_{i-1}). \]

A Markov Chain is described by its transition parameters $a_{x_{i-1} x_i}$, where $a_{x_{i-1} x_i} = \Pr(X_i = x_i \mid X_{i-1} = x_{i-1})$. The probability of a sequence $x$ of length $N$ is

\[ a_{S x_1} \prod_{i=2}^{N} a_{x_{i-1} x_i} = \Pr(x_1) \prod_{i=2}^{N} \Pr(x_i \mid x_{i-1}), \]

where $a_{S x_1}$ represents the transition from the start state.
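The first-order factorization above can be illustrated with a short sketch. The following Python snippet is only a minimal illustration under stated assumptions, not the authors' implementation: the page names, the `START` marker, and the values in `transitions` are hypothetical and were chosen only to show how the product of transition parameters is evaluated and how the most likely next page can be read off the row of the current page.

```python
from math import prod

START = "S"  # hypothetical marker for the start state

# Hypothetical transition parameters: a[prev][cur] = Pr(X_i = cur | X_{i-1} = prev).
# Each row sums to 1; the values are illustrative and not taken from the paper.
transitions = {
    START: {"home": 0.5, "news": 0.5},
    "home": {"news": 0.75, "sports": 0.25},
    "news": {"home": 0.25, "sports": 0.75},
    "sports": {"home": 0.5, "news": 0.5},
}


def sequence_probability(sequence, a):
    """Return a[S][x1] times the product over i >= 2 of a[x_{i-1}][x_i]."""
    if not sequence:
        return 1.0
    # Pair every page with its predecessor; the first page is preceded by START.
    steps = zip([START] + list(sequence[:-1]), sequence)
    # A transition that is absent from the table is treated as probability 0.
    return prod(a.get(prev, {}).get(cur, 0.0) for prev, cur in steps)


def predict_next(current_page, a):
    """Return the page with the highest transition probability from current_page."""
    row = a[current_page]
    return max(row, key=row.get)


print(sequence_probability(["home", "news", "sports"], transitions))
print(predict_next("news", transitions))
```

Treating unseen transitions as probability zero is a simplification made for this sketch; smoothed estimates of the transition parameters avoid such zeros.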
3.1.2 Preliminaries of Grey Moving Probability Markov Model
To overcome the difficulty of predicting an exact value, the uncertain entry is replaced by a Grey interval $P_{ij}(\otimes)$ based on the known values. When the moving probability matrix is Grey, the whitenization matrix $\tilde{P}(\otimes) = [\tilde{P}_{ij}(\otimes)]$ is required to satisfy the following:
1. $\tilde{P}_{ij}(\otimes) \ge 0$ for $i, j \in I$;
2. $\sum_{j \in I} \tilde{P}_{ij}(\otimes) = 1$ for any $i \in I$.
For any $n \in T$ and states $i, j \in I$, $P_{ij}(n) = P(X_{n+1} = j \mid X_n = i)$ is called the moving probability of the Markov Chain.

Properties of Grey Moving Probability Matrix
The moving probability matrix $P(n)$ has to satisfy the following properties:
1. $P_{ij}(n) \ge 0$ for $i, j \in I$;
2. $\sum_{j \in I} P_{ij}(n) = 1$ for any $i \in I$;
3. $P(n) = P^{n}$.

Transitional probability estimation models
Different transition probability estimation models are used, namely Maximum Likelihood Estimation (MLE) and a Bayesian Approach.

Maximum Likelihood Estimation (MLE): The maximum likelihood estimates are the observed relative frequencies, as shown in Equation 3.1, where $n_a$ is the number of observed occurrences of $a$:

\[ \Pr(a) = \frac{n_a}{\sum_i n_i} \tag{3.1} \]

A Bayesian Approach: Start with some prior belief for each state. One option is the Laplace estimate shown in Equation 3.2:

\[ \Pr(a) = \frac{n_a + 1}{\sum_i (n_i + 1)} \tag{3.2} \]

A prior probability can also be included explicitly, as in Equation 3.3:

\[ \Pr(a) = \frac{n_a + p_a m}{\left(\sum_i n_i\right) + m} \tag{3.3} \]

where $p_a$ is the prior probability of $a$ and $m$ is the number of "virtual" instances.

Procedure to predict the users' next visit
I. Identify the distinct users in the log files
   Step 1: Sort the log data by host name and by time stamp.
   Step 2: For each distinct host name, identify each as a different user.
   Step 3: If the referrer field is available in the log file, then do the following; else go to Step 5.
   Step 4: To discover every user, merge the user identification data from Steps 1 to 3. The users are identified; stop.
   Step 5: To identify users, take the output of Step 2.
II. Extract the URLs of the visited pages
III. Identify user sessions
   1. Allocate a distinct session ID to every user recognized in the user identification process.
   2. Define the timeout threshold.
   3. Find the time difference between every two successive web access log entries.
   4. If the difference exceeds the threshold, assign a new session ID to the user's next access.
   5. Sort the entries by session ID.
IV. Transaction identification
V. Split the resulting set of navigational patterns into a training set and a testing set to perform the experiments.
VI. Compute the similarity among the training set (navigation patterns) to form the clusters and analyze user behavior (using the Grey clustering algorithm).
VII. Estimate the transition probabilities of the sequences with the different estimation approaches.

[Algorithm]: Predicting Users' Next Visit
Input: Test sequence record
Output: Next navigational step
Step 1: Initially consider the test sequence record as the reference sequence pattern $s_{ri}$:

\[ s_{ri} = \langle 0, s_{ri}(1), s_{ri}(2), \ldots, s_{ri}(p), 0 \rangle \]

Represent this sequence in first-order and second-order Markov Chain format:

\[ s^{1}_{ri} = \langle 0 - s_{ri}(1),\; s_{ri}(1) - s_{ri}(2),\; s_{ri}(2) - s_{ri}(3),\; \ldots,\; s_{ri}(p-1) - s_{ri}(p),\; s_{ri}(p) - 0 \rangle \]

Step 2: Find the Grey Relational Pattern Grade between the reference sequence and the cluster $n$, using

\[ v(s_{ri}, C_n) = \left( \frac{S_{max} - \Delta_{sim}}{S_{max} - S_{min}} \right)^{\zeta}, \]

\[ \Delta_{sim} = A \cdot \frac{L_{LCS}(s_{ri}, C^{ns}_{cj})}{\max(|s_{ri}|, |C^{ns}_{cj}|)} + B \cdot \frac{|s_{ri} \cap C^{ns}_{cj}|}{|s_{ri} \cup C^{ns}_{cj}|}, \]

where $C^{ns}_{cj}$ denotes the comparative sequences in the cluster.
Step 3: Select the maximum Grey relational pattern grade $v(s_{ri}, C_n)$ among them and consider the sequences in that particular cluster as the active comparative pattern.
Step 4: Find the transition probabilities of the sequences with the different estimation approaches using Equations 3.1, 3.2 and 3.3.
Step 5: Calculate $P(\otimes) = [P_{ij}]$; /* the probability values obtained with the different estimation approaches are taken as the Grey values */
Step 6: $P^{T}(0) = P(\otimes)$; /* state 0 */
        $P^{T}(1) = P^{T}(0)\, P(\otimes)$; /* state 1 */
        $P^{T}(2) = P^{T}(1)\, P(\otimes)$; /* state 2 */
        $P^{T}(n) = P^{T}(n-1)\, P(\otimes)$; /* for n states */
Step 7: Repeat Step 4 until the desired number of further visits is obtained. /* yields the desired number of further visits */
Step 8: Generate $\tilde{P}(\otimes) = [\tilde{P}_{ij}(\otimes)]$; /* the link with the highest probability value, chosen using HM, is taken as the whitenization value of the Grey number */
Algorithm: Predicting the users' next visit using the Grey Moving Probability Markov Model

4 Experimental Results
The use of the Markov chain (MC) relies on the assumption that the states likely to be visited in the next navigation step depend only on the page a Web user is viewing now. Each element in the sequence matrix indicates the proportion of visits to state j at the next transition, given the present state i. A Web user not currently on the Web site is described as being in state 0.

Table 1: Predicting subsequent visit using Grey Moving Probability Markov Model

Table 2: Predicting subsequent visit using Grey Moving Probability Markov Model
Columns: Test sequence | Training sequence (excluding test sequences) | Further visit (without using estimation models) | Further visit (using MLE & BA)
3, 3,3,3,6,6,8,8,12,1 2,3,3 ,3,3,10,6,10,7. 3,3,6,6,6,6.7,7, 7,10,10 3,11,11,10,10, 12 4 ? 10,10,10,10,11,1 1,11 9,9,10,10,10,7, 3,6,11,4,12,7 3,6,11 13,11,6 3,6,11,14,14,1 1,14,14, 3,3,6,11,11,11, 10,11,6,7,7,7,3,7, 7,7. 7,17,8,7,7,8,13 ,11,3,1

5 Conclusion
In the prediction model, a Variable Length Markov model is used to predict the category of the user's next state from the transition probability.
This method uses two transition probability estimation models, Maximum Likelihood Estimation (MLE) and a Bayesian Approach (BA). Predicting the next request during a user's navigation session has been studied extensively, and higher-order Markov models have been widely used to model navigation sessions and to predict the next navigation step. Prediction accuracy has generally been evaluated with the Hit and Miss Score. Here, next-link prediction models are evaluated with the aim of finding a recommendation set. This approach reduces the online recommendation time while retaining predictive accuracy. On the msnbc dataset, predicting the future visit of users who navigated between 1 and 14 pages with the Grey Moving Probability variable length Markov model achieves a maximum of 96% success. On the cti dataset, for users who navigated between 3 and 7 pages, the model achieves a maximum of 93% success. On the msweb dataset, for users who navigated between 1 and 12 pages, the model achieves a maximum of 91% success.

References
[1] R. Kosala and H. Blockeel, "Web mining research: a survey," ACM SIGKDD Explorations Newsletter (1), 2000, pp. 1-15.
[2] F. M. Facca and P. L. Lanzi, "Mining interesting knowledge from web logs: a survey," Data & Knowledge Engineering, vol. 53, pp. 225-241, 2005.
[3] J. Srivastava, R. Cooley, M. Deshpande, and P.-N. Tan, "Web usage mining: discovery and applications of usage patterns from Web data," SIGKDD Explorations 1(2), 2000, pp. 1-12.
[4] D. Tanasa and B. Trousse, "Advanced data preprocessing for intersites web usage mining," IEEE Intelligent Systems, March/April 2004, pp. 59-65.
[5] R. Cooley, B. Mobasher, and J. Srivastava, "Data preparation for mining World Wide Web browsing patterns," Knowledge and Information Systems, vol. 1, pp. 5-32, 1999.
[6] M. Eirinaki and M. Vazirgiannis, "Web mining for Web personalization," ACM Transactions on Internet Technology 3(1), 2003, pp. 1-27.
[7] R. Liu and V. Keselj, "Combined mining of Web server logs and web contents for classifying user navigation patterns and predicting users' future requests," Data & Knowledge Engineering, Elsevier, 2007, pp. 304-330.
[8] T. W. Yan, M. Jacobsen, H. Garcia-Molina, and U. Dayal, "From user access patterns to dynamic hypertext linking," Fifth International World Wide Web Conference, 1996.
[9] B. Mobasher, R. Cooley, and J. Srivastava, "Automatic personalization based on web usage mining," Communications of the ACM, 43(8), pp. 142-151, 2000.
[10] M. Nakagawa and B. Mobasher, "A hybrid web personalization model based on site connectivity," WebKDD, pp. 59-70, 2003.
[11] R. Baraglia and F. Silvestri, "Dynamic personalization of Web sites without user intervention," Communications of the ACM, 2007, pp. 63-67.
[12] R. Baraglia and F. Silvestri, "An online recommender system for large Web sites," Web Intelligence, IEEE/WIC/ACM, pp. 20-24, 2004.
[13] M. Jalali, N. Mustapha, A. Mamat, and Md. N. B. Sulaiman, "A new classification model for online predicting users' future movements," IEEE, 2008.
[14] M.-S. Chen, J. S. Park, and P. S. Yu, "Data mining for path traversal patterns in a web environment," in 16th International Conference on Distributed Computing Systems, pp. 385-392, May 1996.
[15] D. Cheung, B. Kao, and J. Lee, "Discovering user access patterns on the world-wide web," in 1st Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'97), February 1997.
[16] M. Spiliopoulou and L. C. Faulstich, "WUM: A tool for web utilization analysis," in EDBT Workshop WebDB'98, Valencia, Spain, March 1998.
[17] M. Baumgarten, A. G. Büchner, S. S. Anand, M. D. Mulvenna, and J. G. Hughes, "User-driven navigation pattern discovery from internet data," in International ACM Workshop on Web Usage Analysis and User Profiling (WebKDD'99), pp. 74-91, 1999.
[18] B. Berendt, "Web usage mining, site semantics, and the support of navigation," in Workshop Web Mining for E-Commerce - Challenges and Opportunities, Boston, MA, August 2000.
[19] M. Hansen and E. Shriver, "Using navigation data to improve IR functions in the context of web search," in CIKM, pp. 135-142, 2001.
[20] F. Masseglia, P. Poncelet, and M. Teisseire, "Using data mining techniques on web access logs to dynamically improve hypertext structure," ACM SigWeb Letters, 8(3):13-19, October 1999.
[21] T. W. Yan, M. Jacobsen, H. Garcia-Molina, and U. Dayal, "From user access patterns to dynamic hypertext linking," in 5th World Wide Web Conference (WWW'96), Paris, France, May 1996.
[22] E. Frias-Martinez and V. Karamcheti, "A prediction model for user access sequences," in WEBKDD Workshop: Web Mining for Usage Patterns and User Profiles, July 2002.
[23] J. Bollen, H. V. de Sompel, and L. M. Rocha, "Mining associative relations from website logs and their application to context-dependent retrieval using spreading activation," in Workshop on Organizing Web Space (WOWS), Berkeley, California, August 1999.
[24] B. Mobasher, H. Dai, T. Luo, and M. Nakagawa, "Using sequential and non-sequential patterns for predictive web usage mining tasks," in Proceedings of the IEEE International Conference on Data Mining (ICDM'2002), Maebashi City, Japan, December 2002.
[25] H. Mannila and H. Toivonen, "Discovering generalized episodes using minimal occurrences," in Knowledge Discovery and Data Mining, pp. 146-151, 1996.
[26] W. Gaul and L. Schmidt-Thieme, "Mining web navigation path fragments," in Workshop on Web Mining for E-Commerce - Challenges and Opportunities, pp. 319-322, Boston, MA, August 2000.
[27] M. Géry and H. Haddad, "Evaluation of Web usage mining approaches for user's next request prediction," WIDM'03, November 7-8, 2003, New Orleans, Louisiana, USA.
[28] Sifeng Liu and Yi Lin, "Grey Information Theory and Practical Applications," Springer Science+Business Media, springer.com.
[29] Ch. Bindu Madhuri and J. A. Chandulal, "Analysis of the Navigation Behavior of the Users' using Grey Relational Pattern Analysis with Markov Chains," International Journal of Engineering Science and Technology, Vol. 2(10), 2010, pp. 5402-5412.
[30] Ch. Bindu Madhuri and J. A. Chandulal, "Analysis of Users' Web Navigation Behavior using GRPA with Variable Length Markov Chains," International Journal of Data Mining & Knowledge Management Process (IJDKP), Vol. 1, No. 2, March 2011.
