A web page usage prediction scheme using
sequence indexing and clustering
techniques
Adviser: Yu-Chiang Li
Speaker: Gung-Shian Lin
Date: 2010/10/15
Data & Knowledge Engineering, Vol. 69, No. 4, pp. 371-382, 2010.
Southern Taiwan University
Department of Computer Science and Information Engineering
Outline
1. Introduction
2. Definitions and background
3. Prediction/recommendation model
4. WST utilization – recommendation/prediction method
5. Evaluation
6. Conclusions and open issues
1. Introduction
• We consider the problem of web page usage prediction in a
web site by modeling users’ navigation history and web page
content with weighted suffix trees.
• We focus on web usage mining, the area of web data mining
that exploits the navigational traces of users to extract
knowledge about their preferences and behavior.
• We propose two novel methods for modeling the user
navigation history.
• The first method exploits knowledge extracted only from
user access sequences in the web server log file.
• The second method enhances the first by also using web
page content during the access pattern extraction phase.
2. Definitions and background
3. Prediction/recommendation
• WAS (web access sequence) maintenance
  - Either the web server is configured to store each WAS in a
    separate repository, or an extraction process over the log
    files is run at the beginning of the preprocessing procedure.
Assume, for the sake of description, that there are N sequences
forming a set S = {WAS_1, WAS_2, ..., WAS_N}.
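As an illustration of the extraction option, the sketch below groups log records into web access sequences. The simplified record format (user, timestamp, page), the 30-minute idle timeout, and the name `extract_was` are assumptions made for this example; the slides do not specify how sessions are delimited.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; assumed session boundary, not from the paper


def extract_was(log_entries, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, page) records into web access sequences.

    A new sequence starts whenever the same user is idle longer than `timeout`.
    """
    by_user = defaultdict(list)
    for user, timestamp, page in log_entries:
        by_user[user].append((timestamp, page))

    sequences = []
    for user in sorted(by_user):
        visits = sorted(by_user[user])  # order each user's visits by time
        current = [visits[0][1]]
        for (prev_t, _), (t, page) in zip(visits, visits[1:]):
            if t - prev_t > timeout:
                sequences.append(current)
                current = []
            current.append(page)
        sequences.append(current)
    return sequences


# Hypothetical log records: (user id, seconds since start, page id)
log = [
    ("u1", 0, "A"), ("u1", 60, "B"), ("u2", 30, "A"),
    ("u1", 5000, "C"), ("u2", 90, "D"),
]
S = extract_was(log)  # plays the role of S = {WAS_1, ..., WAS_N}
print(S)
```

Here user u1 contributes two sequences because of the long idle gap before page C.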
• WAS clustering
  - We chose k-windows as the clustering method for several
    reasons, such as the enhanced quality of the produced
    clusters and its inherently parallel nature.
Figure: (a) Sequential movements M2, M3, M4 of the initial window M1.
(b) Sequential enlargements E1, E2 of window M4.
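The movement and enlargement steps named in the captions can be sketched roughly as follows. This is a simplified reading of k-windows, not the paper's implementation: windows are axis-aligned boxes given as (center, half-widths), all sides are enlarged at once, and the thresholds `theta_v`, `theta_e`, and `theta_c` are assumed names and values. How a WAS is mapped to a numeric point is not shown on the slides.

```python
def points_in(window, points):
    """Return the points inside an axis-aligned window (center, half_widths)."""
    center, half = window
    return [p for p in points
            if all(abs(x - c) <= h for x, c, h in zip(p, center, half))]


def move(window, points, theta_v=1e-3, max_iter=100):
    """Movement: re-center the window on the mean of the points it contains
    until the center shifts by less than theta_v."""
    center, half = window
    for _ in range(max_iter):
        inside = points_in((center, half), points)
        if not inside:
            break
        mean = tuple(sum(col) / len(inside) for col in zip(*inside))
        shift = max(abs(m - c) for m, c in zip(mean, center))
        center = mean
        if shift < theta_v:
            break
    return center, half


def enlarge(window, points, theta_e=0.5, theta_c=0.2):
    """Enlargement: grow every side by a factor (1 + theta_e), re-run movement,
    and keep the larger window only if the relative gain in points exceeds theta_c."""
    before = len(points_in(window, points))
    center, half = window
    candidate = move((center, tuple(h * (1 + theta_e) for h in half)), points)
    after = len(points_in(candidate, points))
    if before and (after - before) / before > theta_c:
        return candidate
    return window


# Hypothetical 2-D points: one dense group plus a far-away outlier
points = [(1.0, 1.0), (1.5, 1.0), (1.0, 1.5), (1.5, 1.5), (6.0, 6.0)]
moved = move(((0.75, 0.75), (0.5, 0.5)), points)
final = enlarge(moved, points)
print(moved, final)
```

In this toy run the window drifts onto the dense group, and the enlargement is rejected because it captures no additional points.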
Figure: (a) W1 and W2 satisfy the similarity condition and W1 is deleted.
(b) W3 and W4 satisfy the merge condition and are considered to belong to the same
cluster.
(c) W5 and W6 have a small overlap and capture two different clusters.
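The three cases in the captions can be read as a threshold test on how many points two overlapping windows share. The sketch below is one possible formulation; the ratio (shared points over the smaller window) and the thresholds `theta_s` and `theta_m` are assumptions for illustration, not values from the paper.

```python
def overlap_decision(points_w1, points_w2, theta_s=0.9, theta_m=0.5):
    """Classify two overlapping windows by the share of points they have in common.

    Returns "discard" (near-duplicate windows, one is deleted),
    "merge" (windows belong to the same cluster), or
    "separate" (small overlap, two different clusters).
    """
    a, b = set(points_w1), set(points_w2)
    if not a or not b:
        return "separate"
    shared = len(a & b) / min(len(a), len(b))
    if shared >= theta_s:
        return "discard"
    if shared >= theta_m:
        return "merge"
    return "separate"


# Points are represented by hypothetical integer ids
print(overlap_decision({1, 2, 3, 4}, {1, 2, 3, 4, 5}))
print(overlap_decision({1, 2, 3, 4}, {3, 4, 5, 6}))
print(overlap_decision({1, 2, 3, 4}, {4, 5, 6, 7}))
```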
An example of the application of the k-windows algorithm.
• WAS clustering exploiting web page content
  - Direct sequence alignment (DSA)
In the (global or local) alignment of a pair of sequences, the score for aligning two
characters (web pages) combines the importance label of each page with
the similarity metric between them.
\[
\mathrm{Score}(P_i, P_j) =
\begin{cases}
1 \cdot \cos(TP_i, TP_j), & r(P_i) = r(P_j) \\
\tfrac{1}{2} \cdot \cos(TP_i, TP_j), & r(P_i) \neq r(P_j) \\
-1, & (r(P_i) = U \text{ or } r(P_j) = U) \text{ and } r(P_i) \neq r(P_j)
\end{cases}
\]
  - Sequence alignment with clustering preprocess (SACP)
Another way to incorporate web page content into the sequence
alignment algorithm is to first cluster the web pages by content.
\[
\mathrm{Score}(P_i, P_j) =
\begin{cases}
1, & P_i, P_j \text{ in the same cluster} \\
0, & P_i, P_j \text{ not in the same cluster and } r(P_i), r(P_j) \neq U \\
-1, & r(P_i) = U \text{ or } r(P_j) = U
\end{cases}
\]
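To show how such a scoring function plugs into sequence alignment, the sketch below pairs an SACP-style score with a standard Needleman-Wunsch global alignment score. The linear gap penalty of -1, the lookup tables `cluster_of` and `label_of`, and the choice to test the U case before the cluster case are assumptions for this example; the slides do not state them.

```python
def sacp_score(p_i, p_j, cluster_of, label_of, unimportant="U"):
    """SACP-style score: -1 if either page is labeled U, 1 if both pages
    share a content cluster, 0 otherwise. The U case is tested first (assumed)."""
    if label_of[p_i] == unimportant or label_of[p_j] == unimportant:
        return -1
    return 1 if cluster_of[p_i] == cluster_of[p_j] else 0


def global_alignment_score(seq_a, seq_b, score, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(seq_b) + 1)]
    for i, a in enumerate(seq_a, start=1):
        curr = [i * gap]
        for j, b in enumerate(seq_b, start=1):
            curr.append(max(prev[j - 1] + score(a, b),  # align a with b
                            prev[j] + gap,              # a aligned to a gap
                            curr[j - 1] + gap))         # b aligned to a gap
        prev = curr
    return prev[-1]


# Hypothetical content clusters and importance labels
cluster_of = {"A": 0, "B": 0, "C": 1, "D": 1, "X": 2}
label_of = {"A": "I", "B": "I", "C": "I", "D": "I", "X": "U"}


def score(a, b):
    return sacp_score(a, b, cluster_of, label_of)


total = global_alignment_score(["A", "C"], ["B", "D"], score)
print(total)
```

The two sequences share no page, yet they align well because A/B and C/D fall into the same content clusters.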
• WAS cluster representation
  - When the WAS clustering procedure is complete, each
    cluster is expressed as a weighted sequence.
  - Alternatively, progressive or iterative pairwise alignment
    could be used to produce the multiple sequence alignment.
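One way to picture a cluster expressed as a weighted sequence: given the aligned sequences of a cluster, each alignment column becomes a distribution over the pages seen at that position. The sketch assumes the alignment is already available as equal-length rows with "-" as the gap symbol; the function name and the toy data are hypothetical.

```python
from collections import Counter


def weighted_sequence(aligned):
    """Turn equal-length aligned sequences into a list of {page: probability}
    dicts, one per alignment column. The gap symbol is counted like any page."""
    n = len(aligned)
    return [{page: count / n for page, count in Counter(column).items()}
            for column in zip(*aligned)]


# Hypothetical aligned cluster of four access sequences
cluster = [
    ["A", "B", "C", "-"],
    ["A", "B", "D", "E"],
    ["A", "C", "D", "E"],
    ["A", "B", "D", "E"],
]
ws = weighted_sequence(cluster)
for position, distribution in enumerate(ws, start=1):
    print(position, distribution)
```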
4. WST utilization – recommendation/prediction method
• The recommendation/prediction algorithm works as
follows: when a new user arrives in the system, the user
is assigned to the root of the generalized weighted
suffix tree (gWST).
Figure: Weighted suffix tree navigation
• A sample run of the recommendation algorithm:
Figure: Recommendation method run. The numbers in the nodes are their weights.
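The navigation idea can be sketched on a plain weighted tree. This is not a generalized weighted suffix tree: the `Node` class, the toy weights, and the rule of recommending the heaviest children of the current node are assumptions for illustration, since the slides only state that a new user starts at the root and that nodes carry weights.

```python
class Node:
    """Tree node holding a weight and its children keyed by page id."""

    def __init__(self, weight=0.0):
        self.weight = weight
        self.children = {}  # page -> Node


def recommend(root, visited, top_k=2):
    """Walk down from the root along the pages visited so far and return the
    top_k heaviest children of the node reached ([] if the path is unknown)."""
    node = root
    for page in visited:
        node = node.children.get(page)
        if node is None:
            return []
    ranked = sorted(node.children.items(), key=lambda kv: (-kv[1].weight, kv[0]))
    return [page for page, _ in ranked[:top_k]]


# Hypothetical tree: root -> A -> {B -> D, C}
root = Node()
root.children["A"] = Node(1.0)
root.children["A"].children["B"] = Node(0.75)
root.children["A"].children["C"] = Node(0.25)
root.children["A"].children["B"].children["D"] = Node(0.6)

print(recommend(root, []))          # new user at the root
print(recommend(root, ["A"]))       # after visiting A
print(recommend(root, ["A", "B"]))  # after visiting A, then B
```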
5. Evaluation
• Evaluation of the access-based method
  - We compare our experimental results with those of “A web page
    prediction model based on click-stream tree representation
    of user behavior”.
• Evaluation of the access- and content-based methods
  - The experimental setup was exactly the same as in the
    evaluation described in the previous section.
6. Conclusions and open issues
• We have proposed various techniques for predicting
web page usage patterns by modeling the users’
navigation history with string processing techniques.
• Future work includes different ways of modeling web
user access patterns.