Information Domain Modeling for Adaptive Web Systems Wenpu Xing and Ali A. Ghorbani Intelligent & Adaptive Systems (IAS) Research Group Faculty of Computer Science University of New Brunswick Fredericton, NB, Canada wenpu.xing, ghorbani@unb.ca Abstract This paper presents a Domain Modeling System, which builds a domain model framework for adaptive Web systems. It records concepts and the relationships among them and represents them as a concept network. To speed up run time searches, the system finds all related concepts by calculating the optimal paths between all pairs of concepts offline in advance. In addition, a new algorithm, Rich Maximal Frequent Sequence algorithm, is introduced in the system for discovering frequent sequence patterns among concepts. To test the Domain Modeling System, it is applied to an adaptive web system. The experiments demonstrate that the adaptive Web system is improved in the performance of accurate page recommendations and quick responses. 1. Introduction In the past few years, the World Wide Web has expanded quickly and has been permeating people’s lives. People can do many of their daily activities online, such as shopping, reading news, banking or booking a flight seat or a restaurant. The Web makes life convenient. However, with the fast growth of the Web, people are not satisfied with viewing the same content on the same web page and are not happy with easily getting lost in the hyperspace of the Web. To cater to people’s needs, Adaptive Web Systems (AWSs) are demanded to provide the exact information people need, present in the way people prefer, and guide people to a destination through an optimal path [2, 3, 5, 11]. This requires the systems to know exactly the information domain and manage it. To make the information domain much easier to manage, recording the information as conceptual units, which we call concepts, along with the associations among them, which we call relationships, [4, 10, 7, 9] is necessary. For example, in the existing AWSs, such as Interbook [2], AHA! [3], SKILL [8] and ELM-ART [11], concepts and relationships have been widely used. The systems study the concepts and relationships in their own information domains and generate dynamic pages based on the study and user information (i.e., interests, preference, goals and background). Moreover, to improve the reusability and the modifiability of AWSs, the information domain can be encapsulated as a domain model. Many AWSs, such as AHA! [3], have been built by recording the content and the navigation structure as domain models. However, there is no standard, comprehensive framework or design pattern for building them. To avoid having developers repeat the same work that others have already done for domain modeling of AWSs and promote knowledge transfer between AWSs, building a domain model framework for AWSs is paramount. To concentrate on this issue, this research works on studying the concepts and the relationships in the information domains and building a domain modeling system. The domain modeling system provides a domain model framework for AWSs with the techniques of effectively using concepts and relationships. Finally, to evaluate it, the domain modeling system is applied to an AWS, Adaptive Recommendation for Academic Scheduling (ARAS). The experiments are also shown in this paper. The rest of this paper is organized as follows. Section 2 presents the domain modeling system in detail. The experiments of evaluation are shown in Section 3. Finally, the conclusion is addressed in Section 4. 2 Domain Modeling System The Domain Modeling System (DMS) is aimed at supporting the development of AWSs. DMS builds a domain model framework for AWSs. The system focuses on two things: 1) defining a general structure to represent the information domains of AWSs and 2) providing techniques to consume the information more efficiently and quickly. The structure of the proposed DMS is shown in Figure 1. Domain Modeling System na Author Tool sets Domain Model used by Recommendation Provider request Concepts sets provides sets Pattern Miner Environment response Relationships uses concept remmendation uses sets uses Graph Generator usage data Figure 1. The Structure of the Domain Modeling System (DMS) The system consists of a data model, namely domain model, and four processors, namely author tool, pattern miner, graph generator and recommendation provider. The domain model encapsulates the information domain and describes how it is represented as concepts and relationships. The processors provide the functionalities of constructing and consuming the domain model. First of all, the author tool provides an interface for authors to input concepts and relationships to AWSs. Then new relationships are retrieved from the existing relationships or usage data and recorded in the domain model by the pattern miner. After that, the graph generator builds a concept network and finds out the optimal paths for each pair of the concepts based on the priorities of the concepts and the relationships stated in the concept network. Finally, the recommendation provider generates concept recommendations to given requests by using the optimal paths and other necessary information in the domain model. 2.1 Domain model 2.1.1 Concepts Concepts are the fundamental classifications or units of information within the system. According to the information included, concepts in DMS are divided into two categories: atomic concepts and composite concepts (as shown in Figure 2). Atomic concepts are a special kind of composite concept. They are the smallest items recorded in the domain model and do not need to be further broken down. For example, an icon, an image, or a fragment of text is an atomic concept. Composite concepts are those that consist of several other composite concepts, which can be composite concepts or atomic concepts. However, a composite concept cannot be constituted by its sub concepts. Composite Concepts * 1 Atomic Concepts Figure 2. Concept categorization The domain model represents the information domain as conceptual units, which are concepts, along with the associations among them, which are relationships. To record the concepts and the relationships in a common structure and then to facilitate information exchange between applications within different domains, a general structure, Ontology, is presented. The detailed information is described in the following subsections. 2.1.2 Relationships A relationship describes in what way (if any) two concepts are related and to what degree that relationship exists. In DMS, two kinds of relationships are considered: predefined relationships, which are those defined by the author, and discovered relationships, which are those mined by the system from the usage data. Predefined Relationships: the relationships that can be observed by studying the information domain. To state the generalization of these relationships to AWSs, the predefined relationships in DMS are further divided into two groups based on their life scopes: domain independent, which are independent from any domain specific concepts, and domain specific, which are dependent on some domain specific concepts and will not be listed until an application domain is specified. The domain independent relationships considered in DMS are listed as follows: 1. IsA(a,b): Concept a is a concept b iff a is defined as a sub concept or an instance of b. 2. Prerequisite(a,b): Concept a is a prerequisite of concept b iff accessing b requires knowing a. 3. Co-requisite(a,b): Concept a is a co-requisite of concept b iff they must be processed together. 4. Inhibition(a,b): Concept a is an inhibition of concept b iff a should not be accessed after accessing b. 5. Similarity(a,b): Concept a is similar to concept b iff their contents are similar. 6. Containment(a,b)/Member(b,a): Concept a contains concept b and b is a member of a iff b is held within container a. 7. Whole(a,b)/Part(b,a): Concept b is a part of concept a iff an occurrence of a would necessarily involve an occurrence of b, but not vice versa. 1. Association(a,b): Concept a is an association of concept b iff the presence of a in the sessions implies the presence of b. 2. MaximalFrequentSequence(a,b,c,d): The sequence (a, b, c, d) is a maximal frequent sequence iff the following three requirements are met: 1) these concepts are always presented in order in the sessions; 2) the occurrence of the sequence in the sessions is greater than the given threshold value; and, 3) the sequence is not contained in any other longer frequent sequences. 2.1.3 Ontology Ontology has become a popular word in information- and knowledge-based systems research, such as the Semantic Web [6], and has been developed in artificial intelligence to facilitate knowledge sharing and reuse. Ontologies play a key role in advanced information exchange as they provide a common understanding of a domain. In DMS, we present an ontology to represent the domain model by describing the vocabulary and structure of the information those domains contain. Figure 3 shows the structure of the ontology. The Ontology contains the concepts and the relationships and provides an interface to query, update, and create them. Each concept is represented by a set of attributes that are descriptive properties possessed by each instance of the concept. Each concept has one or more values for each of its attributes. Each relationship associates with one source concept and one target concept. Ontology 1 8. Sibling(a,b): Concept a is a sibling of concept b iff they are contained in the same concept. For example, part(a,x) and part(b,x), or member(a,y) and member(b,y) are satisfied. Concept 9. Equivalent(a,b): Concept a is equivalent to concept b iff they are essentially equal. Attribute 10. Complement(a,b): Concept a is a complement of concept b iff they are totally different and the union of them constitutes the universe. 11. Link(a,b): Concept b is a link of concept a iff a direct link from a to b exists. Discovered Relationships: the patterns among concepts observed from the usage data. They cannot be found by analyzing only the information domain. Because associations and frequent sequences of concepts are popularly used in electronic systems (i.e., e-commerce or e-learning) for providing recommendations, DMS defines the following two discovered relationships: * 1 1 1 source * * Relationship 1 target * * 1 * Value Figure 3. Ontology structure of DMS 2.2 Processors In contrast to the domain model, which provides a data structure for AWSs, the processors of DMS present functionalities of setting data, discovering useful information from the available data, and generating recommendations. Detailed information is provided in the following subsections. 2.2.1 Author Tool The author tool is a component of DMS for authors to interact with the system. By using this tool, authors can send requests for retrieving, updating, or adding the concepts and the relationships from or to the domain model. The system manipulates the concepts or the relationships according to the given requests. With the tool, authors can set all necessary concepts and the relationships between them to the domain model. However, with the exponential growth of Web systems, this task becomes not only time consuming but also challenging. To reduce the workload of the authors caused by figuring out the relationships that can be derived from available data, the system provides two relationship finders: sibling finder and similarity finder. The sibling finder discovers siblings from the relationships of containment and whole based on the definition of sibling relationship. The similarity finder discovers similar concepts by analyzing the topics contained in the concepts. Furthermore, to check the consistence and the integrality of the relationships, a relationship checker is provided based on the properties of the relationships and the implications between them. For example, sibling(a,b) implies sibling(b,a) and containment(a,b) implies member(b,a). 2.2.2 Pattern Miner The pattern miner provides two sub-miners, Association Miner and Rich Maximal Frequent Sequence Miner, to find the discovered relationships defined in DMS from usage data, respectively. 1. Association Miner: is used to discover the association relationships from the Web access logs. The miner applies the APRIORI algorithm. APRIORI was originally proposed by Agrawal in [1] in 1994 to find frequent itemsets and association rules in a transaction database. Now, it is the most basic and well-known algorithm to find frequent itemsets. The algorithm generates association rules by following the following process: at first, all frequent itemsets, whose occurrence is greater than a given threshold value of support, are incrementally discovered from the transaction database (for example, the itemsets with length k are spread from the itemsets with length k − 1); next, rules between the frequent itemsets are generated by checking the probability of the transactions containing both of the itemsets (for example, the association(a,b) holds iff the probability of the sessions containing both a and b is greater than the threshold value of confidence). The algorithm provides an efficient way to generate the candidate itemsets in a database pass by using only the itemsets found large in the previous pass. Algorithm 1 shows the algorithm. Detailed information is described in [1]. Algorithm 1 APRIORI Algorithm [1] Input: D: database Output: all frequent itemsets. Method: 1) L1 = {large 1-itemsets}; 2) for (k = 2; Lk−1 6= ∅; k + +) do begin 3) Ck = apriori-gen(Lk−1 );//New candidates 4) forall transactions t ∈ D do begin 5) Ct = subset(Ck , t);//Candidates contained in t 6) forall candidates c ∈ Ct do 7) c.count++; 8) end 9) Lk = {c ∈ Ck | c.count ≥ minsup} 10) end S 11) return k Lk ; 2. Rich Maximal Frequent Sequence Miner: is used to discover the traversal patterns, maximal frequent sequences, from Web access logs. A Frequent Sequence Tree (FSTree), a tree-like data structure, is introduced to record the frequent items and the maximal frequent sequences. The tree is constructed by following these steps. Firstly, an empty tree is defined with a root node only. Secondly, the frequent 1-item sequences are found by counting their occurrences in the sessions and added to the tree as child nodes of the root. Within each node, the occurrence of the included item and the path from the root to the current node are recorded to speed up later searches. Thirdly, for each node, M SN ode, in the newly updated level of the tree, every frequent 1-item C is considered as a potential child and a corresponding candidate frequent sequence is generated by appending C to the current path recorded in the node. Then, the candidate’s occurrence is counted by observing the sessions. If the candidate is frequent, C will be added to the tree as a child node of MSNode with the candidate sequence and its occurrence. Fourthly, step three will be continued until the construction of the tree is well done. In the finished FSTree, all maximal frequent sequences are recorded in the leaf nodes as their current path. Moreover, to speed up decision making at the cross points of similar sequences (i.e., (a, b, c, d) and (a, b, c, e)), weights are added to the sequences to identify the priority of step choices. For example, the sequence (c1 , c2 , c3 ) becomes ( c1 , wc1 c2 , c2 , wc2 c3 , c3 ), where wci ci+1 denotes the priority of the choice from ci to ci+1 . Sequences with weights are called rich frequent sequences while the algorithm that generates them is call Rich Maximal Frequent Sequence algorithm (RMFS). The pseudocode of RMFS is shown as Algorithm 2. Algorithm 2 RMFS Algorithm Input: S1 , S2 ,...,Sn :sessions. smin :minimum support threshold. Output: all rich maximal frequent sequences (RMFSs) Method: 1) F ST ree = an empty sequence tree; 2) F S1 = {frequent 1-item sequences}; 3) update(F ST ree, F S1 ); 4) for( k = 2; F Sk−1 6= ∅; k + +) do 5) F Sk−1 = {sequences with length k − 1 in ST }; 6) Ck = genPathCandidate(F Sk−1 ); 7) for i from 1 to n do 8) count(Ck , Si ); 9) F Sk = {p ∈ Ck |p.count ≥ smin } 10) update(F ST ree, F Sk ); 11) RMFSs = richPathsGen(F ST ree); 12) return RMFSs; their priorities. The weight of a concept a is propagated from its referrer concepts that are concepts linked to it. A concept gets one value proportional to its popularity (numbers of inlinks and outlinks) from each referrer concept. The popularity from the number in out of inlinks and outlinks is recorded as P(v,u) and P(v,u) , in respectively. P(v,u) is the popularity of link(v, u) calculated based on the number of inlinks from concept v to concept u and the number of inlinks of all referout ence concepts of concept v. P(v,u) is the popularity of link(v, u) calculated based on the number of outlinks of concept u and the number of outlinks of all reference concepts of concept v. in P(v,u) =P Iv,u c∈R(v) Ic out P(v,u) =P Ou c∈R(v) Oc (1) (2) where Iv,u represents the number of inlinks from concept v to concept u. I(c) represents the number of inlinks of concept c. Ou and Oc represent the number of outlinks of concept u and concept c, respectively. R(v) denotes the reference concept list of concept v. The weight of a concept is calculated by summing up the products of each referrer concept’s weight and corresponding popularity. W (u) = X in out W (v)P(v,u) P(v,u) (3) v∈B(u) 2.2.3 Graph Generator Because AWSs generate dynamic pages to the given requests on the fly, response time is a main concern. Therefore, processing as much information as possible in advance is necessary. The graph generator provides such a technique to accelerate the information query for providing recommendations by building a concept network and finding all optimal paths between each pair of concepts. The process is described as follows: • Building a concept network: Firstly, DMS builds a concept network by representing the concepts as nodes, and the relationships as links. At this step, multiple links are allowed between nodes because multiple relationships might exist between concepts. • Setting weights to the concepts and the relationships: Secondly, to emphasize the importance of the concepts and the relationships to the system, weights are set. The more important they are, the larger weights they will have. The weights of the relationships can be set arbitrarily by the author according to where B(u) is the set of concepts that link to concept u. W (u) and W (v) represent the weights of concepts u and v, respectively. The weights of the concepts and the relationships are set to the corresponding nodes and links in the concept network. • Combining the multiple links between nodes: To reduce the search time through the concept network, multiple links between nodes are combined. The multiple links are combined to a single link, and the maximal weight of the links is set to the combined link. • Finding related concepts: Finding related concepts for each concept off-line in advance is another efficient method to reduce the response time of AWSs. The related concepts are discovered by calculating optimal paths between each pair of concepts according to the importance of concepts and relationships. Equation 4 is defined to calculate the paths’ weights. The path with the largest weight is defined as the optimal path and saved into the domain model. P W (Ci ) × W (Ci−1 Ci ) n−1 (4) where PCs Ct is any path from the source concept Cs to the target concept Ct . W (PCs Ct ) represents the weight of the path PCs Ct . n is the number of concepts on the path. Ci denotes the ith concept on the path. W (Ci ) and W (Ci−1 Ci ) represent the weights of concept Ci and link Ci−1 Ci , respectively. W (PCs Ct ) = i = 2 to n 2.2.4 The Recommendation Provider Finally, a built-in recommendation provider is introduced into DMS to recommend related concepts to the given requests. Once DMS receives a request about a specific concept, the system checks the information contained in the request. If there is any user’s information (i.e., interests, browsed history in the current session), the information is passed to the recommendation provider with the requested concept. Otherwise, only the requested concept is passed. The recommendation provider finds a list of closely related concepts to the requested concept and/or the user information according to the optimal paths and the discovered patterns of association and maximal frequent sequences stated in the domain model. The name and/or URLs of the related concepts are passed to DMS. At the end, DMS provides the requested concept with these recommended concepts to the given request as a response. 3 Evaluation For the purpose of evaluating our work, DMS is applied to an AWS, Adaptive Recommendation for Academic Scheduling (ARAS). ARAS is an online system that aims to provide adaptive support to the course selection process for students without the assistance of advisors. The system generates recommendations to students based on course information, and users’ interests and course-taken history. To demonstrate the helpfulness of DMS to ARAS, three sample systems are developed and compared based on their performance: • System A: in which only relationships prerequisite and containment are considered. Once the system receives a concept request, it searches through the relationships and concepts and finds related concepts online for the given request based on the user’s interests. • System B: in which all predefined relationships defined in DMS are considered. The system finds all related concepts for each concept in the domain model in advance off-line by following the process described in section 2.2.3. Once the system receives a concept request, it provides the related concepts of the given request from its recorded related concept list immediately. • System C: in which all predefined relationships defined in DMS are considered. Similar to system B, the related concepts of each concept are found offline in advance based on these relationships and concept information. In addition, the discovered relationships, which are associations and maximal frequent sequences, are discovered in advance in this system. Once the system receives a concept request, the system not only finds the related concepts of the given request from its recorded related concept list, but also generates recommendations based on the associations and the maximal frequent sequences. The performance of these systems is measured in two ways: accuracy of recommendations, and response time of requests. The accuracy of recommendations measures how close the recommendations are to what users want. The response time measures how fast the system can provide the requested concept to users. In this research, the accuracy of recommendations is calculated based on two parameters: recall and precision. Recall is the portion of the number of concepts recommended correctly by the system to the number of concepts the users really want. Precision is the portion of the number of concepts recommended correctly by the system to the total number of recommended concepts. Since ARAS takes the University of New Brunswick, Fredericton, Canada, as the sample application domain, the real data of the students at the university is used as sample data for DMS. The sample data is preprocessed in two steps: firstly, a set of data is separated from the sample data set for mining the discovered relationships; then, the rest of the data is divided into training data sets and test data sets: the courses taken in each student’s last term are considered as a test set, which represents what the student wants, and the courses taken in previous terms are considered as a corresponding training set, which represents the student’s course taken history. The system generates recommendations for each student based on course information, curriculum information, and students’ interests, which are discovered from the student’s course taken history. The recommendations are measured against the corresponding test data set. Figure 4 presents the accuracy of the recommendations provided by the three sample systems. In the graph, the number of relationship types considered in the system is shown as the x-axis while accuracy is taken as the y-axis. The graph shows that the more relationships are considered in the system, the bigger recalls but smaller precisions the recommendations have because the more recommendations are provided. However, since the decrement ratio of the precisions is less than the increment ratio of the recalls, Sys- tems B and C provide better recommendations. Therefore, DMS represented by System C improves the performance of ARAS in accuracy of recommendations. 60 Recall Precision 50 References Accuracy (%) 40 30 20 10 0 3 4 5 6 7 8 9 Number of Relationship Types 10 11 12 13 Figure 4. Accuracy of the Sample Systems To measure the response time, the average response time of requests is calculated in the sample systems. The results are shown in Table 1. The key point of the results is that even though many more relationships are considered in Systems B and C, their response times are still less than that of System A. This result shows that DMS accelerates ARAS’ responses. System A System B System C tems. In the future, discovering techniques for generating relationship weights automatically based on concept information is planed. In addition, the system is planned to be extended with an ontology proxy for exchanging information between different domains. Average Response Time(millSec) 2740200 2020 2555 Table 1. Average response time of the sample systems In general, improvements of DMS to ARAS in accuracy of recommendations and response time demonstrate that DMS improves the performance of ARAS. 4 Conclusion In this paper, a domain modeling system is presented. The system not only provides a list of general concept relationships in AWSs, but also introduces a general structure for recording domain information for AWSs. Moreover, in order to improve the performance of AWSs, techniques of effectively using the domain information are addressed. Finally, the feasibility of the domain modeling system is presented by the experiment comparisons of three sample sys- [1] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In J. B. Bocca, M. Jarke, and C. Zaniolo, editors, Proc. 20th Int. Conf. Very Large Data Bases, VLDB, pages 487–499. Morgan Kaufmann, 12–15 1994. [2] P. Brusilovsky, J. Eklund, and E. Schwarz. Web-based education for all: A tool for developing adaptive courseware. In Computer Networks and ISDN Systems (Proceedings of Seventh International World Wide Web Conference), pages 291–300, April 1998. [3] P. De Bra and L. Calvi. AHA: a generic adaptive hypermedia system. In Proceedings of the 2nd Workshop on Adaptive Hypertext and Hypermedia, HYPERTEXT‘98, Pittsburgh, USA, June 20–24 1998. [4] C. Eliot, D. Neiman, and M. Lamar. Medtec: A web-based intelligent tutor for basic anatomy. In World Conference of the WWW, Internet, and Intranet (Web-Net’97), pages 161– 165, Toronto, Canada, October 1997. [5] M. Kilfoil, A. Ghorbani, W. Xing, Z. Lei, J. Lu, J. Zhang, and X. Xu. Toward an adaptive web: The state of the art and science. In Proceedings of Communication Network and Services Research (CNSR) 2003 Conference, pages 108– 119, Moncton, NB, Canada, May 15–16 2003. [6] M. Klein, J. Broekstra, D. Fensel, F. van Harmelen, and I. Horrocks. Spinning the Semantic Web, chapter 4, pages 95–141. The MIT Press, 2003. [7] W. Nejdl and M. Wolpers. Kbs hyperbook – a data-driven information system on the web. In WWW8 Conference, Toronto, May 1999. [8] G. Neumann and J. Zirvas. Skill - a scalable internet-based teaching and learning system. In Proceedings of WebNet 98, World Conference on WWW, Internet and Intranet AACE, pages 7–12, Orlando, Fl, November 1998. [9] M. Specht and R. Opermann. Ace - adaptive courseware environment. The New Review of Hypermedia and Multimedia, 4:141–161, 1998. [10] M. Specht, G. Weber, S. Heitmeyer, and V. Schoch. Ast: Adaptive www-courseware for statistics. In Workshop on Adptive Systems and User Modeling on the World Wide Web at UM’97 Conference, pages 91–95, Chia Laguna, Sardinia, Italy, June 1997. [11] G. Weber and M. Specht. User modeling and adaptive navigation support in www-based tutoring systems. In Proceedings of User Modeling’97, pages 289–300, 1997.