A SemanticLog – Semantic logging of user interaction evidence

A.1 Basic Information

Effective personalization and (automatic) user modeling rely on the availability and quality of evidence about a user’s interaction with an adaptive system. It is therefore imperative that precise data about user actions are recorded and that their associated semantics are preserved for further processing by inference agents in the user modeling process. Traditional logging approaches focus on server-side logs produced by web servers, which contain data about incoming client-side requests (e.g., timestamps, URLs). SemanticLog takes a different approach: events are logged by the application that actually processes requests into responses (and vice versa), both on the server side and on the client side.

A.1.1 Basic Terms

Event ontology: An ontology that describes the events that can occur in a system as a result of user interaction, together with their attributes and their mapping onto specific users and user sessions.

Semantic logging: The logging of defined events resulting from user actions (user interaction), together with semantic metadata that describe exactly which events occurred and with which attributes, based on an event ontology.

A.1.2 Method Description

The logging method used by SemanticLog is based on three principles:

- Representing events via an event ontology, which describes their possible types and attributes in a clear, concise and machine-readable way while preserving their semantics for further use by (third-party) inference agents.
- Logging of events by the respective application modules, which “know” best what actually happened within the application, i.e., which events happened and what their context and semantics were.
- Logging of both client-side and server-side events, with their subsequent integration into a single stream of events for individual users on a per-session basis.
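The third principle can be illustrated with a minimal Java sketch. It assumes a simplified, hypothetical LoggedEvent class (session identifier, timestamp, event type, origin) and a hypothetical EventStreamMerger helper. Neither is part of the SemanticLog API, and the real event ontology also carries typed attributes and display states. The sketch only shows one plausible way to combine client-side and server-side events into a single, time-ordered stream per session.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified event; not the SemanticLog object model. */
final class LoggedEvent {
    final String sessionId;
    final long timestampMillis;
    final String type;    // e.g. "SelectRestriction"
    final String source;  // "client" or "server"

    LoggedEvent(String sessionId, long timestampMillis, String type, String source) {
        this.sessionId = sessionId;
        this.timestampMillis = timestampMillis;
        this.type = type;
        this.source = source;
    }
}

/** Hypothetical helper that integrates events from both sides per session. */
final class EventStreamMerger {
    private EventStreamMerger() { }

    /** Groups all events by session and orders each group by timestamp. */
    static Map<String, List<LoggedEvent>> mergeBySession(
            List<LoggedEvent> clientEvents, List<LoggedEvent> serverEvents) {
        List<LoggedEvent> all = new ArrayList<>(clientEvents);
        all.addAll(serverEvents);

        // One list per session, in order of first appearance.
        Map<String, List<LoggedEvent>> streams = new LinkedHashMap<>();
        for (LoggedEvent event : all) {
            streams.computeIfAbsent(event.sessionId, key -> new ArrayList<>()).add(event);
        }

        // Stable sort: events with equal timestamps keep their arrival order.
        for (List<LoggedEvent> stream : streams.values()) {
            stream.sort(Comparator.comparingLong((LoggedEvent e) -> e.timestampMillis));
        }
        return streams;
    }
}
```

Calling EventStreamMerger.mergeBySession(clientEvents, serverEvents) returns one list per session identifier. Because the sort is stable, events with identical timestamps stay in the order in which they were supplied, client-side events first.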
The key part of the logging method is the event ontology, which describes events and their attributes in a flexible and extensible way (Fig. 1), so that they can be used for subsequent analysis of user behavior for personalization. Each event is associated with a unique user and user session, which provides support for multiple simultaneous user sessions (bottom left). For each event, a recursive set of associated eventAttributes can be stored, describing the event and its context in more detail (bottom right). Both events and their attributes have defined types.

[Figure 1: entity-relationship diagram with the tables users, sessions, events, typesOfEvents, eventAttributes, typesOfEventAttributes, displayStates (linked to events via fromState and toState), displayedItems, typesOfDisplayedItem, displayedItemAttributes and typesOfDisplayedItemAttributes.]

Fig. 1. The relational data model corresponding to the used event ontology schema.

To further support the processing of logs by (user model) inference agents, the state of the user interface at the time an event occurred is also logged. This includes the displayState before and after an event occurs (left). Each display state is composed of displayedItems and their attributes, which together make up the user interface visible to the user and thus capture the state of the world on which the user based the decision to cause an event (top right).

A.1.3 Scenarios of Use

SemanticLog can be used to log and integrate user interaction evidence in the form of semantic events occurring both on the server side and on the client side of a system.
Thus the possible usage scenarios are as follows:

- Client-side user interaction logging, e.g., moving the mouse, hovering over links.
- Server-side user interaction logging, which is usually associated with the processing of client requests into responses sent back to the client. For example, a faceted browser can log application-specific server-side events such as SelectRestriction, based on which links the user clicks in the user interface.

SemanticLog should not be used in the following cases:

- There is no event ontology available for the respective application.
- There is no consumer for the logged data.

A.1.4 External Links and Publications

Tvarožek, M., Barla, M., & Bieliková, M. (2007). Personalized Presentation in Web-Based Information Systems. In J. Van Leeuwen, G. F. Italiano, W. van der Hoek, H. Sack, C. Meinel, & F. Plášil (Eds.), Lecture Notes in Computer Science: Proceedings of SOFSEM 2007 - Theory and Practice of Computer Science. LNCS 4362, pp. 796-807. Harrachov, Czech Republic: Springer-Verlag, Berlin Heidelberg.

Andrejko, A., Barla, M., Bieliková, M., & Tvarožek, M. (2006). Softvérové nástroje pre získavanie charakteristík používateľa [Software tools for acquiring user characteristics]. In P. Vojtáš, & T. Skopal (Eds.), Proceedings of DATAKON ’06, (pp. 139-148). Brno, Czech Republic.

Hibernate, object-relational mapper, (http://www.hibernate.org/).

Log4J, Java-based logging utility, (http://logging.apache.org/log4j) Apache Software Foundation.

Tomcat, a Java servlet container, (tomcat.apache.org) Apache Software Foundation.

A.2 Integration Manual

SemanticLog is developed in Java (Standard Edition 5) as a library that uses relational persistence via Hibernate and MySQL. It should be used in conjunction with other applications that need to log user interaction evidence for further processing. The distribution of SemanticLog consists of these parts:

- A jar archive, which contains the binary files of the logging service.
- A MySQL script used to initialize the MySQL database schema.
- A set of configuration files, which initialize Log4j logging, describe the notification and use of external inference agents (LogAnalyzer), and define the mapping rules for Hibernate.

A.2.1 Dependencies

SemanticLog uses these external tools and libraries:

- LogAnalyzer user model inference agent, which is notified and/or invoked when new events are ready to be processed.
- UserLogs library, which facilitates relational persistence for the event ontology model via Hibernate.
- Hibernate for relational persistence.
- Log4J logging utility.

A.2.2 Installation

Deploying SemanticLog with other applications involves these steps (a Java integrated development environment should be used):

1. Including the SemanticLog jar archive in the project.
2. Adding the SemanticLog and dependency jar archives to the classpath.
3. Adding the optional SemanticLog.properties file to the root project directory.
4. Adding the SemanticLog-related settings to the log4j.properties file and the hibernate.cfg.xml file.

A.2.3 Configuration

SemanticLog must be configured to use a suitable relational database (hibernate.cfg.xml) and Log4J service (log4j.properties).

The optional SemanticLog.properties file contains the following values:

- CallLogAnalyzer: true if LogAnalyzer should be invoked (default = false).
- LogAnalyzerMessageThreshold: the number of events after which LogAnalyzer is invoked if it is enabled (default = 1).

The log4j.properties file should contain configuration for the semanticlog logger. The hibernate.cfg.xml file must contain connection data for a valid and initialized MySQL database in which the log data is to be stored.

Additionally, the metadata level of event types and their respective attributes should be initialized before their first use, as these are required during the logging process.

A.2.4 Integration Guide

SemanticLog is used to log events caused by user interaction.
The typical usage scenario involves first a user login (method LogEventUserLogin), then zero or more application events (method LogEvent) and finally a user logout (method LogEventUserLogout). For a detailed description of the individual logging methods, see the accompanying javadoc documentation.

Error handling

Most errors originate either from bad configuration and/or initialization (e.g., a nonexistent database, or missing event or attribute types), or from caching, connection pooling and session issues in MySQL as used by the associated tools. If an error is too serious or unknown and thus cannot be reasonably handled, an exception is thrown. Otherwise, a log entry is made via log4j.

A.3 Development Manual

A.3.1 Tool Structure

SemanticLog consists of a single package: sk.fiit.nazou.semanticlog.

A.3.2 Method Implementation

The implementation of SemanticLog is relatively straightforward. Its operation consists of two primary stages. First, during initialization, the existing event, attribute and display state types are loaded. Next, during normal operation, SemanticLog waits for incoming events and logs them as they arrive. The respective logging methods first create an object model of the incoming event in memory and then store it in a relational database via Hibernate. For faster processing, a set of hash maps is used to speed up lookups of event types, attribute types and sessions.

A.3.3 Enhancements and Optimization

Originally, SemanticLog was implemented as a Web service running in the Axis Web service container. However, during evaluation we identified a significant performance bottleneck related to the constant serialization and deserialization of displayStates. The design was therefore changed to a library that is invoked directly, which alleviates this problem.
However, this quick workaround almost negated the original intention to integrate events from various sources, since each source effectively has its own instance of SemanticLog (i.e., events from the same session may not be combined). To fix this problem we devised, but did not implement, a hybrid approach that uses a Web service for the initial logging of basic event data (user, session, event) and a jar library for the logging of complex displayState data (only available on the server side).

Otherwise, the performance of SemanticLog is near optimal, with the exception of event evaluation via LogAnalyzer, which is invoked synchronously and thus slows down the logging process. An interesting optimization prospect would therefore be to invoke LogAnalyzer asynchronously, possibly as a Web service.

A.4 Manual for use in Other Application Domains

The use of SemanticLog in other application domains requires only the definition of events, event types and the corresponding attributes, attribute types and display states relevant for the particular domain. As this can be done without any code or configuration changes, simply by preparing the required data offline and then populating the log database, no application changes are required to use SemanticLog in another application domain.

A.4.1 Configuration for use in Other Application Domains

No domain-specific configuration is necessary other than that described in the configuration section (A.2.3). Only the definition of the respective events and corresponding entities in the database is required.

A.4.2 Dependencies

Apart from LogAnalyzer, all dependencies are effectively domain independent. Furthermore, LogAnalyzer itself is not strictly required by the logging method as such; it is a dependency of convenience, so that log evaluation is invoked automatically when new data are available.
Any other inference agent specific to a particular domain can be used instead of LogAnalyzer, provided that it accepts the created logs as input (i.e., “understands” the event ontology). Detailed information about the domain dependency of LogAnalyzer is given in its own documentation. Its inference rules depend more on the navigation model of a faceted browser than on the application domain itself. However, since additional domain-specific rules may be used, some domain-specific adjustments may be required or useful.
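To illustrate how an inference agent other than LogAnalyzer could be plugged in, the following Java sketch pairs a hypothetical InferenceAgent interface with a hypothetical AgentNotifier class. Neither name comes from SemanticLog. The enabled and threshold parameters merely mirror the CallLogAnalyzer and LogAnalyzerMessageThreshold settings from A.2.3, and the agent runs on a background thread in the spirit of the asynchronous invocation suggested in A.3.3. It is a sketch under these assumptions, not the actual notification mechanism.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical contract for any agent that understands the event ontology. */
interface InferenceAgent {
    /** Called when a batch of newly logged events is ready for processing. */
    void processNewEvents(int eventCount);
}

/** Hypothetical notifier; assumed behavior, not the SemanticLog implementation. */
final class AgentNotifier {
    private final InferenceAgent agent;
    private final boolean enabled;  // mirrors CallLogAnalyzer
    private final int threshold;    // mirrors LogAnalyzerMessageThreshold
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private int pending = 0;

    AgentNotifier(InferenceAgent agent, boolean enabled, int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be at least 1");
        }
        this.agent = agent;
        this.enabled = enabled;
        this.threshold = threshold;
    }

    /**
     * Call once after each logged event. Returns true if the agent was
     * scheduled; the agent itself runs later on a background thread, so
     * the logging caller is not blocked by event evaluation.
     */
    synchronized boolean eventLogged() {
        if (!enabled) {
            return false;
        }
        pending++;
        if (pending < threshold) {
            return false;
        }
        final int batch = pending;
        pending = 0;
        executor.submit(() -> agent.processNewEvents(batch));
        return true;
    }

    /** Stops accepting work and waits for scheduled agent runs to finish. */
    boolean shutdown(long timeoutMillis) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With a threshold of 3, the first two calls to eventLogged() only count events, and the third schedules the agent with a batch size of 3. A single-threaded executor is used so that batches reach the agent in the order in which they were scheduled.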