A web page usage prediction scheme using sequence indexing and clustering techniques

Adviser: Yu-Chiang Li
Speaker: Gung-Shian Lin
Date: 2010/10/15
Data & Knowledge Engineering, Vol. 69, No. 4, pp. 371-382, 2010.
Southern Taiwan University of Science and Technology (南台科技大學), Department of Computer Science and Information Engineering (資訊工程系)

Outline
1 Introduction
2 Definitions and background
3 Prediction/recommendation model
4 WST utilization – recommendation/prediction method
5 Evaluation
6 Conclusions and open issues

1. Introduction
We consider the problem of web page usage prediction in a web site by modeling users' navigation history and web page content with weighted suffix trees. We focus on web usage mining, the area of web data mining that tries to exploit the navigational traces of users in order to extract knowledge about their preferences and behavior.

We propose two novel methods for modeling user navigation history. The first method exploits knowledge extracted only from the user access sequences in the web server log file. The second method enhances the first one by also using web page content during the access pattern extraction phase.

2. Definitions and background

3. Prediction/recommendation model

WAS maintenance
Web access sequences (WASs) can be obtained in two ways: either the web server is programmed to store each WAS in a separate repository, or an extraction process over the log files is run at the beginning of the preprocessing procedure. For the sake of description, assume there are N sequences forming the set S = {WAS1, WAS2, …, WASN}.

WAS clustering
We chose k-windows as the clustering method for a variety of reasons, such as the enhanced quality of the produced clusters and its inherently parallel nature.

Figure:
(a) Sequential movements M2, M3, M4 of the initial window M1.
(b) Sequential enlargements E1, E2 of window M4.

Figure:
(a) W1 and W2 satisfy the similarity condition, and W1 is deleted.
(b) W3 and W4 satisfy the condition for the merge operation and are considered to belong to the same cluster.
(c) W5 and W6 overlap only slightly and capture two different clusters.

Figure: An example application of the k-windows algorithm.

WAS clustering exploiting web page content

Direct sequence alignment (DSA): when a pair of sequences is aligned (globally or locally), the score for aligning two characters, that is, two web pages, combines the importance label of each page with the similarity between the pages:

$$
\mathrm{Score}(P_i, P_j) =
\begin{cases}
1 \cdot \cos(TP_i, TP_j), & r(P_i) = r(P_j) \\
\frac{1}{2} \cdot \cos(TP_i, TP_j), & r(P_i) \neq r(P_j) \\
-1, & \bigl(r(P_i) = U \text{ or } r(P_j) = U\bigr) \text{ and } r(P_i) \neq r(P_j)
\end{cases}
$$

where r(P) is the importance label of page P and cos(TP_i, TP_j) is the cosine similarity between the content representations TP_i and TP_j of the two pages.

Sequence alignment with clustering preprocess (SACP): another way to incorporate the content of web pages into the sequence alignment algorithm is to first cluster the web pages by content and then score each pair of pages by cluster membership:

$$
\mathrm{Score}(P_i, P_j) =
\begin{cases}
1, & P_i, P_j \text{ in the same cluster} \\
0, & P_i, P_j \text{ not in the same cluster and } r(P_i), r(P_j) \neq U \\
-1, & r(P_i) = U \text{ or } r(P_j) = U
\end{cases}
$$

WAS cluster representation
When the WAS clustering procedure is over, each cluster is expressed as a weighted sequence. As an alternative, progressive or iterative pairwise alignment could be used to produce the multiple sequence alignment.

4. WST utilization – recommendation/prediction method
The recommendation/prediction algorithm works as follows: when a new user arrives in the system, he is assigned to the root of the generalized weighted suffix tree (gWST).

Figure: Weighted suffix tree navigation.

A sample run of the recommendation algorithm:

Figure: Recommendation method run. Numbers in the nodes express their weights.
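The WAS maintenance step of Section 3 mentions an extraction process over the log files without specifying it. The following is a minimal sketch, not the authors' procedure: it assumes log records have already been parsed into hypothetical (client, timestamp, page) tuples, and it splits each client's time-ordered requests into separate sequences using an assumed 30-minute inactivity timeout.

```python
from collections import defaultdict

# Hypothetical inactivity timeout in seconds (an assumption, not from the slides).
SESSION_TIMEOUT = 30 * 60

def extract_was(records, timeout=SESSION_TIMEOUT):
    """Group (client, timestamp, page) records into web access sequences.

    Requests from the same client are ordered by time; a gap longer than
    `timeout` starts a new sequence. Returns S = [WAS_1, ..., WAS_N].
    """
    by_client = defaultdict(list)
    for client, ts, page in records:
        by_client[client].append((ts, page))

    sequences = []
    for client in sorted(by_client):
        hits = sorted(by_client[client])  # chronological order
        current = [hits[0][1]]
        for (prev_ts, _), (ts, page) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:    # long pause: close the current WAS
                sequences.append(current)
                current = []
            current.append(page)
        sequences.append(current)
    return sequences
```

For instance, a client requesting A at t = 0 s, B at t = 60 s, and C at t = 4000 s yields two sequences, [A, B] and [C], because the 3940 s pause exceeds the assumed timeout.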
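The movement, enlargement, and merge operations of k-windows described in Section 3 can be illustrated in a simplified form. This is a sketch under assumptions, not the algorithm exactly as used in the paper: each WAS is assumed to be already mapped to a numeric point (the slides do not say how), all window sides are enlarged together in one step, and the similarity and merge operations are collapsed into a single overlap test. The window edge length `a` and the thresholds `theta_e` and `theta_m` are hypothetical parameters.

```python
import random

def points_in(points, center, half):
    """Points inside the axis-aligned window with the given center and half-widths."""
    return [p for p in points
            if all(abs(x - c) <= h for x, c, h in zip(p, center, half))]

def mean(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(pts) for coord in zip(*pts))

def move(points, center, half, tol=1e-6, max_iter=100):
    """Movement: re-center the window on the mean of the points it encloses."""
    for _ in range(max_iter):
        inside = points_in(points, center, half)
        if not inside:
            break
        new_center = mean(inside)
        shift = max(abs(a - b) for a, b in zip(new_center, center))
        center = new_center
        if shift < tol:
            break
    return center

def enlarge(points, center, half, theta_e=0.1, max_iter=20):
    """Enlargement: grow all sides by theta_e while the relative gain in points exceeds theta_e."""
    for _ in range(max_iter):
        before = len(points_in(points, center, half))
        bigger = tuple(h * (1 + theta_e) for h in half)
        new_center = move(points, center, bigger)
        after = len(points_in(points, new_center, bigger))
        if before == 0 or (after - before) / before <= theta_e:
            break
        center, half = new_center, bigger
    return center, half

def k_windows(points, k, a, theta_e=0.1, theta_m=0.1, seed=0):
    """Simplified k-windows: returns {point: cluster_id} for points covered by some window."""
    rng = random.Random(seed)
    half = tuple(a / 2 for _ in points[0])
    windows = [enlarge(points, move(points, c, half), half, theta_e)
               for c in rng.sample(points, k)]
    contents = [set(points_in(points, c, h)) for c, h in windows]

    # Union-find over windows: heavily overlapping windows form one cluster.
    parent = list(range(k))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(k):
        for j in range(i + 1, k):
            smaller = min(len(contents[i]), len(contents[j]))
            if smaller and len(contents[i] & contents[j]) / smaller >= theta_m:
                parent[find(i)] = find(j)

    labels = {}
    for i, members in enumerate(contents):
        for p in members:
            labels.setdefault(p, find(i))
    return labels
```

Windows that share a large fraction of their points end up in one cluster through union-find, mirroring case (b) of the window-operations figure; windows with little or no overlap, as in case (c), stay in separate clusters.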
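The DSA and SACP scores of Section 3 plug into a standard pairwise alignment. The sketch below uses Needleman-Wunsch global alignment (the slides allow either global or local alignment); the linear gap penalty of -1, the dictionaries holding labels, content vectors, and page clusters, and the example labels "H" and "L" are hypothetical. Where the printed conditions overlap, the sketch makes an assumption: in DSA the U case takes precedence over the 1/2 case, and in SACP the conditions are checked in the order listed, so the same-cluster case wins.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def dsa_score(pi, pj, label, vec, U="U"):
    """DSA score: importance labels combined with content similarity."""
    ri, rj = label[pi], label[pj]
    if ri == rj:
        return 1.0 * cosine(vec[pi], vec[pj])
    if ri == U or rj == U:  # labels differ and one page is labeled U
        return -1.0
    return 0.5 * cosine(vec[pi], vec[pj])

def sacp_score(pi, pj, cluster, label, U="U"):
    """SACP score: pages were clustered by content beforehand."""
    if cluster[pi] == cluster[pj]:
        return 1.0
    if label[pi] == U or label[pj] == U:
        return -1.0
    return 0.0

def global_align(s, t, score, gap=-1.0):
    """Needleman-Wunsch global alignment score of page sequences s and t."""
    n, m = len(s), len(t)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(s[i - 1], t[j - 1]),  # align pages
                          F[i - 1][j] + gap,   # gap in t
                          F[i][j - 1] + gap)   # gap in s
    return F[n][m]
```

Either score can be passed in with a closure, for example `global_align(s, t, lambda x, y: sacp_score(x, y, cluster, label))`; the resulting pairwise scores can then serve as the similarity input for WAS clustering.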
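The gWST navigation of Section 4 can be illustrated with a simplified stand-in. This sketch is not the paper's generalized weighted suffix tree: it builds an uncompressed suffix trie over plain access sequences instead of weighted sequences, and a node's weight is simply the number of suffixes passing through it. As in Section 4, a new user starts at the root; each visited page moves the user down the matching edge, and the recommendation is the child with the largest weight. Restarting from the root when no edge matches is an assumption.

```python
class Node:
    """Suffix-trie node; weight counts the suffixes passing through it."""
    def __init__(self):
        self.children = {}
        self.weight = 0

def build_suffix_trie(sequences):
    """Insert every suffix of every access sequence, incrementing node weights."""
    root = Node()
    for seq in sequences:
        for start in range(len(seq)):
            node = root
            for page in seq[start:]:
                node = node.children.setdefault(page, Node())
                node.weight += 1
    return root

class Recommender:
    def __init__(self, root):
        self.root = root
        self.node = root  # a new user is assigned to the root

    def visit(self, page):
        """Follow the edge for the visited page; restart from the root if none exists."""
        child = self.node.children.get(page)
        if child is None:
            child = self.root.children.get(page, self.root)
        self.node = child

    def recommend(self, top=1):
        """Pages on the heaviest outgoing edges of the current node."""
        ranked = sorted(self.node.children.items(),
                        key=lambda kv: (-kv[1].weight, kv[0]))
        return [page for page, _ in ranked[:top]]
```

For example, after `build_suffix_trie([["A", "B", "C"], ["A", "B", "D"], ["A", "B", "C"], ["B", "C"]])`, a user who visits A and then B is recommended C, whose node weight (2) exceeds that of D (1).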
5. Evaluation
Evaluation of the access-based method: our experimental results are compared with those of “A web page prediction model based on click-stream tree representation of user behavior”.

Evaluation of the access- and content-based methods: the experimental setting was exactly the same as in the evaluation of the access-based method.

6. Conclusions and open issues
We have proposed various techniques for predicting web page usage patterns by modeling users' navigation history with string processing techniques. Future work includes different ways of modeling web user access patterns.