
2012 International Conference on Innovation and Information Management (ICIIM 2012)

IPCSIT vol. 36 (2012) © (2012) IACSIT Press, Singapore

Building User Profile based on Concept and Relation for Web Personalized Services

Jie Yu +, Fangfang Liu and Haihong Zhao

School of Computer Engineering and Science, Shanghai University, China, 200072

Abstract.

How to capture and represent user interest is a key issue in personalized services for Web information seeking. This paper presents a method that builds and updates a user profile based on semantics and browsing sequence. First, the user profile is composed of concepts and relations, which gives a semantic representation of user interest. Secondly, the position of each Web page in the user's browsing sequence within a session is taken into account when the life times of concepts and relations are computed. Thirdly, a memory model from cognitive psychology is introduced to update the concepts and relations in the user profile after each session is finished, which keeps the user profile dynamic. Experimental results indicate that the method is valid and effective for building and updating user profiles, and suggest that it is promising for personalized services in Web information seeking.

Keywords:

user profile; information seeking; memory capacity model

1. Introduction

Web personalized services alleviate the burden of information overload by gathering information that meets the user's needs. For example, intelligent browsing and personalized search are widely used in e-Commerce, adaptive Websites, e-Learning, etc. An essential issue in Web personalized services is how to build the user profile, which holds the information and preferences of the user and has a great impact on the performance of Web personalized services.

In recent years, how to build the user profile has received much attention as a means of improving the quality of Web personalized services. Some researchers use static profiling techniques, which analyze the user's static and predictable characteristics. Li constructs the user profile with the technique of information fusion[1]. Lu uses a fuzzy set based clustering algorithm to generate the user profile[2]. Although they are easy to implement, such profiles become invalid once the user's interest changes, because they lack dynamics. Some approaches obtain the user's interests by analyzing the URLs that the user has browsed, like the work by Mobasher[3]. These profiles lack semantic information, which makes it difficult to represent the user's interests accurately. In addition, they cannot capture the user's dynamic interests, which affects the quality of personalized recommendation. There are also methods that put emphasis on representing the semantics of the user's interest. Qiu constructs a user profile based on the user's past click history and provides a ranking mechanism that identifies the user's interests[4]. The work by Sieg et al. builds an ontological representation of the user profile[5]. WebMate uses the Vector Space Model to represent the user's interest based on the content of pages[6]. Although the semantic information in these user profile models lets them adequately reflect the content of the Web pages the user is interested in, how to evolve the model and keep it adaptive, so that it represents the user's interests with a limited number of topics, is still a problem. In [7] Singh introduces a system for news filtering in which the user's interests are modeled by an interest hierarchy based on explicit user feedback. Unfortunately, studies have shown that most users are reluctant to provide any explicit feedback[8]. From the discussion above, we can see that existing research provides various means of extracting and representing the user's interests, but it remains difficult to capture the user's real-time interests accurately, and the quality of personalized recommendation suffers from the lack of semantics, dynamics and adaptivity.

+ Corresponding author. E-mail address: jieyu@shu.edu.cn, ffliu@shu.edu.cn, zhaohaihong@shu.edu.cn.

Aiming at solving these problems, this paper presents a novel method for building and updating the user profile, UP-CR (User Profile based on Concept and Relation), to represent the user's real-time preferences for Web personalized services. UP-CR is semantically composed of concepts and relations, which are extracted from the contents the user browses. The position of each Web page in the user's browsing sequence within a session is taken into account when computing the life times of concepts and relations. In addition, a memory model from cognitive psychology is introduced in updating UP-CR to keep it dynamic.

2. Building User Profile

A user's Web activity is an information seeking process in which interaction proceeds continuously. The Web provides information that the user is interested in, and the user's knowledge increases as he/she browses more and more information. According to the theory of information seeking[9], an increase of knowledge leads to a change of user interest. Therefore, we build the user profile based on each of his/her interactions with the Web. In this paper, UP-CR (User Profile based on Concept and Relation) is presented to describe the user's real-time interest in the process of Web activity. It is represented by concepts and relations which are extracted from the Web contents that the user has browsed. The dynamics of the user profile are ensured by updating the attributes of each concept and relation. The definition of UP-CR is given as follows.

2.1. Definition 1. (User Profile based on Concept and Relation, UP-CR)

A UP-CR is a two-tuple $P = (N, L)$ where

-- $N = \{s_1, s_2, \ldots, s_n\}$ is a finite non-empty set of nodes. Each node is a two-tuple $s_i = (c, life)$ where

   -- $c$ defines the semantic concept,

   -- $life$ defines the life time of this node;

-- $L = \{l_1, l_2, \ldots, l_m\}$ is a finite non-empty set of links between nodes. Each link is a two-tuple $l_i = (\langle s_p, s_q \rangle, life)$ where

   -- $s_p, s_q \in N$, and $\langle s_p, s_q \rangle$ is a pair which defines the association relation between two nodes,

   -- $life$ defines the life time of the link in the user profile.

A node in the definition above corresponds to a concept from the Web contents that the user has browsed. A link between nodes corresponds to a semantic relation. A node or link with a high life attribute indicates that the concept or relation contributes a lot to the representation of user interest.
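As a reading aid, Definition 1 can be sketched as a small data structure. This is only an illustration under stated assumptions: the class and method names (`Node`, `Link`, `UserProfile`, `add_node`, `add_link`) and the example life values are hypothetical and do not come from the paper, and the sketch does not enforce that the node and link sets are non-empty.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node s_i = (c, life): a semantic concept and its life time."""
    concept: str
    life: float


@dataclass
class Link:
    """A link l_i = (<s_p, s_q>, life): an association between two concepts."""
    source: str
    target: str
    life: float


@dataclass
class UserProfile:
    """UP-CR as the two-tuple P = (N, L)."""
    nodes: dict = field(default_factory=dict)  # concept -> Node
    links: dict = field(default_factory=dict)  # (source, target) -> Link

    def add_node(self, concept, life):
        self.nodes[concept] = Node(concept, life)

    def add_link(self, source, target, life):
        # Both endpoints must already be nodes of the profile (s_p, s_q in N).
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints of a link must be nodes in the profile")
        self.links[(source, target)] = Link(source, target, life)


# Hypothetical example values, chosen only for illustration.
profile = UserProfile()
profile.add_node("collaborative filtering", 0.8)
profile.add_node("user interest", 0.6)
profile.add_link("collaborative filtering", "user interest", 0.5)
```

Keying links by the pair of concepts makes it cheap to look up whether a relation already exists when a new session is merged in.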

The features of UP-CR are given as follows.
- Semantics. The user profile is generated from the Web contents that the user has browsed and is represented at the concept level. In addition, relations between concepts are also extracted from the Web contents. Therefore, concepts are not isolated in UP-CR; they are associated with each other, which makes UP-CR rich in semantics.
- Dynamics. Corresponding to the change of user interest, UP-CR evolves as the user interacts with Web resources. The life attribute of each node and link in UP-CR indicates how active the concept or relation is. Each time an interaction happens, the attributes of some nodes and links change to reflect the change of user interest. This evolution mechanism ensures the dynamics of UP-CR.

It can be seen that there are two key issues in building UP-CR: (1) how to obtain the initial user profile, that is, how to extract concepts and semantic relations when the user starts the first browsing session; (2) how to update the user profile as the user's browsing activities proceed, that is, how to build the evolution mechanism of concepts and semantic relations.


For the first issue, we adopt the concept and relation extraction method presented in [10], which extracts concepts from Web contents based on their support value. In addition, we take the user's browsing sequence into account. First, the life time of a concept in each browsing session is computed. Suppose that the user browses $n$ Web pages in the session and that $f$ is the number of Web pages in which the concept appears; then the concept's initial life time $sl\_i$ in this session is computed by the following formula.

$$ sl\_i = \frac{f}{n} \qquad (1) $$

According to the theory of information seeking, the user's browsing activity is a process in which the user's requirement becomes clearer and clearer. The contents browsed in the latter part of a session can therefore better express the user's interest in this session. The node life time is thus adjusted based on the browsing position of the Web pages in which the concept appears, with the following formula.

$$ sl = sl\_i \cdot \sum_{j}^{n} \frac{j}{n} \qquad (2) $$

where $sl$ denotes the adjusted node life time and $j$ denotes the browsing position of a Web page involving the concept.

It can be seen that the concepts appearing in the latter phase of the browsing process play a more important role in representing user interest. Based on the computed values, concepts whose values are bigger than the threshold are selected for UP-CR.
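The concept selection step can be illustrated with a short sketch. It assumes one reading of the position adjustment, namely that the initial life time f / n is multiplied by the sum of j / n over the browsing positions j of the pages that contain the concept; this reading, the function names, and the toy session are assumptions for illustration, not the authors' implementation. Each page is reduced to the set of concepts it contains.

```python
def initial_life(pages, concept):
    """Eq. (1): sl_i = f / n, the share of session pages containing the concept."""
    n = len(pages)
    f = sum(1 for page in pages if concept in page)
    return f / n


def adjusted_life(pages, concept):
    """Position-weighted life time under the assumed reading of Eq. (2)."""
    n = len(pages)
    # Browsing positions are 1-based: j = 1 is the first page of the session.
    position_factor = sum(
        j / n for j, page in enumerate(pages, start=1) if concept in page
    )
    return initial_life(pages, concept) * position_factor


def select_concepts(pages, threshold):
    """Keep the concepts whose adjusted life time is bigger than the threshold."""
    candidates = set().union(*pages)
    scores = {c: adjusted_life(pages, c) for c in candidates}
    return {c: s for c, s in scores.items() if s > threshold}


# Hypothetical session of five pages, in browsing order.
session = [
    {"recommendation"},
    {"recommendation", "rating"},
    {"user interest"},
    {"user interest", "collaborative filtering"},
    {"collaborative filtering"},
]
selected = select_concepts(session, threshold=0.2)
```

In this toy session "recommendation" and "collaborative filtering" both appear in two pages, but the latter appears at positions 4 and 5 and therefore receives the higher adjusted life time, which matches the intent that later pages express the session interest better.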

In UP-CR, the semantic relation between concepts can be extracted based on Pointwise Mutual Information (PMI)[11]. The life time of the relation between concepts $c_i$ and $c_j$ can be computed by the following formula:

$$ rl\_i(c_i, c_j) = \frac{\log\left( \dfrac{sf(c_i \cap c_j)}{sf(c_i) \cdot sf(c_j)} \right)}{\log N} \qquad (3) $$

where $sf(c_i \cap c_j)$ means the joint page frequency of $c_i$ and $c_j$, and $\log N$ is a normalization factor that keeps the value in the range [0, 1]. Then, similar to the adjustment of the concept life time, the adjusted life time $rl(c_i, c_j)$ of the semantic relation is computed by the following formula.

$$ rl(c_i, c_j) = rl\_i(c_i, c_j) \cdot \sum_{q}^{n} \frac{q}{n} \qquad (4) $$

where $q$ denotes the browsing position of a Web page involving both concepts $c_i$ and $c_j$. Based on the computed values, relations whose values are bigger than the threshold are selected for UP-CR. Thus, the initial UP-CR file for the user's first browsing session is built.

3. Updating User Profile

The user profile reflects the user's interest, which can be obtained from the user's browsing history. Therefore, as the user browses more Web pages, UP-CR should be updated based on the new browsing contents. After a new browsing session is finished, the corresponding session semantics file can be built with the same method that builds the initial UP-CR file. Updating UP-CR then focuses on how to merge the new session semantics file into the previous UP-CR while updating the existing nodes and links.

3.1. Updating the Life Time of Nodes and Links in UP-CR

In this paper, updating UP-CR is based on updating the life time of nodes and links. The new nodes and links that have just joined UP-CR have a relatively high life time, and the ones that will be deleted from the UP-CR file have a low life time. Updating life time is designed based on the following principles:
- The nodes and links which represent recently and frequently browsed contents should have a high life time. Contents which were browsed by the user in the past but have not been browsed recently are not kept in the user profile.
- A node or link that the user is interested in will stay in UP-CR longer than the ones that cannot represent the user's preference. In other words, this kind of node and link should have a high life time.
- Life time is the basis of updating UP-CR. A node or link whose life time is under the threshold α is not suitable for representing the user's interest and will be deleted from UP-CR.

In this paper, after obtaining the new session semantics file, we first check the nodes that show up both in the existing UP-CR file and in the new session semantics file. Obviously, this kind of node should be maintained in the updated UP-CR file, because it indicates that the user is still interested in the concept-related contents in the new session. Since the life attribute of such a node in the new session semantics file and in the existing UP-CR file may differ, the updated value is computed from both. Let $life_{up}$ denote the life attribute value in the existing UP-CR file and $life_s$ denote the life attribute value in the new session semantics file; then

$$ life'_{up} = \frac{life_{up} + life_s}{2} \qquad (5) $$

Besides the kind of node discussed above, there is another kind of node that is in the existing UP-CR but is not involved in the new session semantics file. Although such nodes do not show up in the user's new browsing session, this does not indicate that they are irrelevant to the user's interest. In addition, the function of the user profile is to reflect the user's recent interest. Therefore, not arising in one session does not mean that a node should be removed from UP-CR. On the other hand, the node life time should be adjusted to reflect the change of browsing contents in this session. To describe this situation, we introduce the memory capacity model SIMPLE (Scale Invariant Memory, Perception, and Learning) [12]. The probability of forgetting an item is used as the adjustment factor for nodes in the existing UP-CR that are not involved in the new session semantics file. According to SIMPLE, the probability of recalling a given item $i$ is related to the psychological distance between item $i$ and the item $j$ before it. This probability $p$ is computed by

$$ p = e^{-c \cdot dis} \qquad (6) $$

where $dis$ is the psychological distance between item $i$ and item $j$, and $c$ is the adjustment factor. Psychological distance between two items includes temporal and spatial distance.

In this paper, as an extension, psychological distance is computed based on the following two aspects.
- Semantic difference degree between the existing UP-CR file and the new session semantics file. Obviously, for all the nodes that are in the existing UP-CR but do not arise in the new session semantics file, the semantic difference degree is the same. In this paper, we compute the difference degree from the cosine value as follows.

$$ sd = 1 - cosine \qquad (7) $$

- Temporal difference degree between the last arising of the node and the updating moment. It is computed from the number of sessions between the node's last arising and the updating moment (including its last arising session).

$$ td = \frac{n - i}{n - 1} \qquad (8) $$

where $n$ is the total number of sessions in the user's browsing activities and $i$ is the session in which the node last arose. For example, if a concept arises in the 2nd, 3rd, 6th, 9th and 10th sessions of the user's whole browsing process, then when updating UP-CR for the 16th session, the temporal distance is computed as (16-10)/(16-1)=0.4.

Obviously, although there may be many nodes that have the same semantic difference degree between two files, they probably don’t have the same temporal distance.
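The two difference degrees can be sketched as follows. The paper does not state how the two files are turned into vectors for the cosine value, so this sketch assumes that each file is represented as a mapping from concept to life time; that representation and the function names are assumptions made for illustration.

```python
import math


def semantic_difference(profile_life, session_life):
    """sd = 1 - cosine, over concept -> life vectors (assumed representation)."""
    shared = set(profile_life) & set(session_life)
    dot = sum(profile_life[c] * session_life[c] for c in shared)
    norm_p = math.sqrt(sum(v * v for v in profile_life.values()))
    norm_s = math.sqrt(sum(v * v for v in session_life.values()))
    if norm_p == 0 or norm_s == 0:
        # An empty file shares nothing with the other one.
        return 1.0
    return 1.0 - dot / (norm_p * norm_s)


def temporal_difference(n, last_seen):
    """td = (n - i) / (n - 1), where i is the session of the last occurrence."""
    if n < 2:
        raise ValueError("at least two sessions are needed")
    return (n - last_seen) / (n - 1)


# The worked example from the text: last seen in session 10, updating at session 16.
td = temporal_difference(16, 10)
```

The semantic difference is computed once per update and shared by every node absent from the new session, whereas the temporal difference depends on each node's own history.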


Taking the above two aspects into account, psychological distance dis for a node is computed as the weighted sum of semantic difference degree and temporal difference degree.

$$ dis = w \cdot sd + (1 - w) \cdot td \qquad (9) $$

Then the probability of recalling node $i$ can be computed, which gives the adjusted life time of a node that is in the existing UP-CR file but does not arise in the new session semantics file.

$$ life'_{up} = (1 - p) \cdot life_{up} \qquad (10) $$

It can be seen that in this case the updated life time of a node decreases. If the node does not arise in the next several sessions, its life time will decrease greatly, and it will eventually be deleted from UP-CR.

Similar to nodes, links in the existing UP-CR file are also divided into two kinds:
- the ones arising both in the new session semantics file and in the existing UP-CR file;
- the ones not arising in the new session semantics file but arising in the existing UP-CR file.

For these links, the same updating method as for nodes is applied.
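The life time update described in this subsection can be sketched in a few lines. The sketch follows the text literally: items present in both files are averaged, and items absent from the new session are scaled by 1 - p. It assumes a negative exponent in the recall probability, as in the SIMPLE model, and the default values of `w` and `c`, the function names, and the example numbers are hypothetical since the paper does not report them.

```python
import math


def recall_probability(dis, c=1.0):
    """p = exp(-c * dis); the negative exponent is an assumption."""
    return math.exp(-c * dis)


def psychological_distance(sd, td, w=0.5):
    """dis = w * sd + (1 - w) * td, the weighted sum of both difference degrees."""
    return w * sd + (1 - w) * td


def update_life(profile, session, sd, last_seen, n, w=0.5, c=1.0):
    """Return the updated life time of every item already in the profile.

    profile, session: item -> life time (items may be concepts or links).
    sd: semantic difference degree between the two files.
    last_seen: item -> index of the session in which it last arose.
    n: total number of sessions so far.
    """
    updated = {}
    for item, life_up in profile.items():
        if item in session:
            # Present in both files: average the two life values.
            updated[item] = (life_up + session[item]) / 2
        else:
            # Absent from the new session: scale by 1 - p, as in the text.
            td = (n - last_seen[item]) / (n - 1)
            p = recall_probability(psychological_distance(sd, td, w), c)
            updated[item] = (1 - p) * life_up
    return updated


# Hypothetical values for illustration only.
profile = {"collaborative filtering": 0.8, "rating": 0.4}
session = {"collaborative filtering": 0.6}
updated = update_life(profile, session, sd=0.2, last_seen={"rating": 10}, n=16)
```

Because the same function accepts any hashable key, it can be applied to links by keying them on their concept pair.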

Fig.1. Adding new nodes and links to the existing UP-CR file

3.2. Adding/Deleting Nodes and Links to/from Existing UP-CR

When a new browsing session is finished, some new nodes and links will arise in the new session semantics file. Obviously, they are highly relevant to the user's interest. Therefore, they are added to the existing UP-CR file, and their life times in the new session semantics file are carried over to the UP-CR file. Fig.1 illustrates how new nodes and links are added to the existing UP-CR file.

As a user browses more and more Web contents, his/her interest will probably change, which means that the nodes and links representing his/her previous interest in the existing UP-CR file should be deleted. The principle of deleting nodes and links in this paper is that if a concept has not shown up in several neighboring sessions, then it and its links with other nodes can no longer represent the user's interest.

In this paper, when a new session semantics file is generated, the life times of nodes and links in the existing UP-CR file are updated, and some nodes and links are added to the existing UP-CR according to the method presented above. Then the n session semantics files of the most recent browsing sessions (including the newly generated one) are checked. If a node has not arisen in any of these n session semantics files, it is deleted from the existing UP-CR file, together with all the links related to it.

There is another case in which a node or a link is deleted from UP-CR: if its life time is under the deletion threshold, it is not kept in the UP-CR file.
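The two deletion rules can be sketched together. The function name `prune`, the dictionary layout, and the example values are assumptions for illustration; each recent session is reduced to the set of concepts that arose in it, and a single threshold `alpha` is applied to nodes and links alike.

```python
def prune(nodes, links, recent_sessions, alpha):
    """Apply both deletion rules and return the surviving nodes and links.

    nodes: concept -> life time.
    links: (concept, concept) -> life time.
    recent_sessions: list of sets, the concepts of the n most recent sessions.
    alpha: deletion threshold on life time.
    """
    seen_recently = set().union(*recent_sessions)
    # A node survives if it arose in a recent session and is not under alpha.
    kept_nodes = {
        c: life for c, life in nodes.items()
        if c in seen_recently and life >= alpha
    }
    # A link survives if both endpoints survive and it is not under alpha.
    kept_links = {
        pair: life for pair, life in links.items()
        if pair[0] in kept_nodes and pair[1] in kept_nodes and life >= alpha
    }
    return kept_nodes, kept_links


# Hypothetical profile state for illustration.
nodes = {"collaborative filtering": 0.7, "rating": 0.1, "tagging": 0.5}
links = {
    ("collaborative filtering", "rating"): 0.3,
    ("collaborative filtering", "tagging"): 0.4,
}
recent = [
    {"collaborative filtering"},
    {"collaborative filtering", "rating"},
    {"collaborative filtering"},
]
kept_nodes, kept_links = prune(nodes, links, recent, alpha=0.2)
```

Here "tagging" is dropped because it did not arise in any recent session, "rating" is dropped because its life time is under the threshold, and both links disappear with their endpoints.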

It can be seen that the number of nodes and links in UP-CR will not grow very large. On the one hand, when building a session semantics file, only nodes and links with relatively high life time values are selected. On the other hand, some nodes and links are deleted when updating the existing UP-CR file. Therefore, the scale of nodes and links can be controlled.

4. Experimental Results

In order to demonstrate the building and updating method proposed in this paper, we traced a user's browsing contents in the academic database Web of Science. First, according to the user's browsing sequence and contents, a user profile is built for a session in which he read five papers; it is illustrated in Fig.2. In this session, the query keyword is “collaborative filtering” and the user profile is built from the abstracts of these papers. The concept threshold value is set at 0.2, and the relation threshold value is set at 0.

Fig.2. UP-CR of the 1st session

Then the user browsed another two sessions, in each of which he read four papers. Correspondingly, the user profile is updated, as illustrated in Fig.3. It can be seen that the number of nodes and relations does not increase fast in the updating process. From the semantics of the nodes, it can be seen that the user's interest centers on collaborative filtering, user interest and personalized search, which supports the validity of the building and updating method presented in this paper.

(a). Updated UP-CR after the 2nd session

(b). Updated UP-CR after the 3rd session

Fig.3. Updated UP-CR after the 2nd and 3rd session

5. Summary

How to build and update user profiles is attracting more and more attention in the field of personalized services. This paper presents a method for building and updating a user profile based on concepts and relations. In building the user profile, the browsing sequence is taken into account, which improves the accuracy of the captured user interest. In addition, a memory capacity model is introduced into updating the user profile to capture the dynamic change of user interest. Experimental results demonstrate the validity of the proposed method and suggest that it is promising for personalized services.

6. Acknowledgement

Research work is supported by the National Science Foundation of China (grants 60803143 and 61003249), Shanghai Leading Academic Discipline Project (project no. J50103), and the Key Basic Research Program of Shanghai (project no. 09JC1406200).

7. References

[1] X. Li, S. K. Chang. A Personalized E-Learning System Based on User Profile Constructed Using Information Fusion. In Proceedings of the 11th International Conference on Distributed Multimedia Systems, Banff, Canada, 2005, pp. 109-114.

[2] F. Lu, X. Li, Q. T. Liu. Research on Personalized E-Learning System Using Fuzzy Set Based Clustering Algorithm. Lecture Notes in Computer Science. Springer Berlin, Heidelberg, 2007.

[3] B. Mobasher, H. Dai, T. Luo, Y. Q. Sun. Integrating Web Usage and Content Mining for More Effective Personalization. In Proceedings of the 1st International Conference on Electronic Commerce and Web Technologies, London, UK, 2000, pp. 165-176.

[4] F. Qiu, J. Cho. Automatic Identification of User Interest for Personalized Search. In Proceedings of the 15th International Conference on World Wide Web, Edinburgh, Scotland, 2006, pp. 727-736.

[5] A. Sieg, B. Mobasher, R. Burke. Ontological User Profiles for Representing Context in Web Search. In Proceedings of the 2007 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology, Silicon Valley, USA, 2007, pp. 91-94.

[6] L. Chen, K. Sycara. WebMate: Personal Agent for Browsing and Searching. In Proceedings of the 2nd International Conference on Autonomous Agents, Minneapolis, USA, 1998, pp. 132-139.

[7] S. Singh, M. Sarabdeep, J. Duffy. An Adaptive User Profile for Filtering News Based on a User Interest Hierarchy. In Grove, Andrew, Eds. Proceedings 69th Annual Meeting of the American Society for Information Science and Technology, Austin, USA, 2006.

[8] J. Carroll, M. Rosson. The Paradox of the Active User. Interfacing Thought: Cognitive Aspects of Human-Computer Interaction, 1987, pp. 80-111.

[9] P. Hansen, K. Järvelin. The information seeking and retrieval process at the Swedish patent and registration office. Moving from lab-based to real life work-task environment. Proceedings of the ACM SIGIR 2000 Workshop on Patent Retrieval, 2000, pp. 44-53.

[10] J. Yu, et al. Building Search Context with Sliding Window for Information Seeking. Proceedings of 2011 IEEE 3rd International Conference on Computer Research and Development, pp. 274-277.

[11] D. Bollegala, et al. Measuring Semantic Similarity between Words Using Web Search Engines. In Proceedings of the 16th International Conference on World Wide Web, 2007, pp. 757-786.

[12] I. Neath, G. D. A. Brown. SIMPLE: Further Applications of a Local Distinctiveness Model of Memory. Psychology of Learning and Motivation, Vol. 46, pp. 201-243. ISSN 0079-7421.
