Semantic Web Aided Itineraries Planner

CS 8803 Advanced Internet Application Development

Aditya Devurkar - adityad@gatech.edu
Aditya Sakhuja - aditya.sakhuja@gmail.com
Mayur Bhosle - mayoorbhosle@gmail.com
Rohit Sud - rohit@gatech.edu

Contents
Introduction
Objective
Research Issues addressed
    Extracting Semantic data for use
    Correlating user profiles, trip parameters and spot information
    Feedback
Overview
System Architecture
Concepts and Techniques Used
Contributions
Project Learning
Future Work
References

Introduction

The aim of the project is to develop a trip planner that automatically suggests optimal trip combinations based on the user's profile and preferences. Existing solutions in the literature provide suggestions derived from static data. Some of them, like Tour Guide Mike [1], suggest pre-existing trips and are bound by the pre-designed itineraries in the system's knowledge base. At the other extreme are simple itinerary tools that let the user record itinerary details without offering any useful suggestions. We therefore saw a need for an intelligent trip planner that suggests places of interest to the user.

The ‘intelligent’ solution to the problem involves analyzing the user's profile and current wishes and selecting cities based on these parameters. We also seek to leverage travel-related semantic information stored in the RDF format. We use SPARQL to query the data source and populate a relational database with the resulting records, which simplifies data access and makes it more efficient.

Objective

The objective of this project is to build a prototype that illustrates the advantage of a personalized, recommendation-based tour planner. This involves creating a module that interprets and extracts semantic data stored as RDF and populates a traditional data source for use by the application.
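The extraction step described above (semantic data read from RDF and loaded into a relational store) can be pictured with a minimal sketch. It assumes the query results have already been reduced to (subject, predicate, object) tuples. The `city:` and `geo:` prefixes, the sample coordinates, the `city` table and the `load_cities` helper are all hypothetical, and SQLite merely stands in for whichever relational database the project used.

```python
import sqlite3

# Hypothetical (subject, predicate, object) triples standing in for query results.
TRIPLES = [
    ("city:Atlanta", "geo:lat", "33.749"),
    ("city:Atlanta", "geo:long", "-84.388"),
    ("city:Denver", "geo:lat", "39.739"),
    ("city:Denver", "geo:long", "-104.990"),
]


def load_cities(triples, conn):
    """Pivot latitude/longitude triples into one relational row per city."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS city (name TEXT PRIMARY KEY, lat REAL, lon REAL)"
    )
    coords = {}
    for subject, predicate, obj in triples:
        if predicate in ("geo:lat", "geo:long"):
            coords.setdefault(subject, {})[predicate] = float(obj)
    for subject, values in coords.items():
        # Skip cities for which one of the two coordinates is missing.
        if "geo:lat" in values and "geo:long" in values:
            conn.execute(
                "INSERT OR REPLACE INTO city VALUES (?, ?, ?)",
                (subject.split(":", 1)[1], values["geo:lat"], values["geo:long"]),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_cities(TRIPLES, conn)
rows = conn.execute("SELECT name, lat, lon FROM city ORDER BY name").fetchall()
```

Once the rows are in a table, the application can use ordinary SQL lookups instead of issuing a semantic query for every request, which is the efficiency argument made in the Introduction.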
Another key objective is to identify and implement metrics that can be used to suggest personalised itineraries to a user based on several static and dynamic factors. The system was evaluated by comparing its results against the results expected for a set of test cases.

Research Issues addressed

Our project addresses multiple research issues, which are discussed in detail below.

Extracting semantic data for use

A huge amount of semantic data, of remarkable variety, is available in data repositories on the Internet. For our project we concentrated on the travel-related data held by these repositories. During our research we found quite a few repositories, but most of them contained only generic information about places: geographical information (latitude, longitude, etc.) and political information (country, whether the place is the capital of a region, etc.). We required more than this, so we looked for richer sources and found two that we could leverage.

One was DBpedia [2]. DBpedia provided us with information obtained from Wikipedia and classified using semantics. The data was available as RDF triples, which we could use as a source of place information. We used this source to obtain latitude and longitude information for the places in the knowledge base (KB).

The other source was WikiTravel.org, which proved to be richer. We were able to extract a good deal of information about cities and places of interest from it, including information about recurring events that take place in a city. We extracted information about the places of interest in each city, which we call spots. This information helped us classify spots into categories.
In addition, we obtained the cost of visiting each spot and the time a person needs to spend on each visit.

For events, we considered only recurring events, in particular those that recur yearly. Information about current events was not readily available in the data sources we used, and we felt that recurring events do more to shape a user's perception of a city as a place to visit. Hence we included only yearly events in our data source. We also attempted to classify these events by type (social, political, etc.).

Correlating user profiles, trip parameters and spot information (metrics)

We use a matrix to evaluate the similarities between various parameters.

User-City similarity matrix: It is based on the user's interests and the city's categories, where a city's categories are the aggregate of the categories of its constituent spots. For example, a user whose interests match the majority of a city's spot categories receives a higher score for that city. Every parameter has a value, and we use these values to compute the score with the formula

$\text{Score} = \sum_i w_i x_i$

where $w_i$ is the weight associated with the parameter $x_i$. A weight can be positive or negative; a negative weight indicates that the parameter contributes negatively to the ranking of the city. For example, we would not want to present a city known for its museums to a user interested in adventure sports. In such a scenario the weight associated with museums is negative.

Feedback System

The results presented to the user at some point may need to change because of the weather or the user's personal preferences. For this reason we built a feedback system that accounts for user preference.
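As a minimal sketch of how the weighted score from the metrics discussion and a feedback-driven rerun could fit together: the code below ranks cities by the sum of w_i * x_i and leaves out any city the user has rejected. The city names, feature values, weights and function names are illustrative assumptions, not values from the project, and each x_i is assumed to be a number between 0 and 1 describing how strongly a city matches a category.

```python
def score(city_features, weights):
    """Weighted sum over parameters: Score = sum_i w_i * x_i."""
    return sum(weights.get(name, 0.0) * value for name, value in city_features.items())


def select_cities(cities, weights, excluded=frozenset(), top_n=3):
    """Rank cities by score, skipping any city the user has rejected."""
    candidates = {name: feats for name, feats in cities.items() if name not in excluded}
    ranked = sorted(candidates, key=lambda name: score(candidates[name], weights), reverse=True)
    return ranked[:top_n]


# Hypothetical category strengths per city (the x_i values).
CITIES = {
    "Boulder": {"adventure": 0.8, "museum": 0.1, "park": 0.6},
    "Washington": {"adventure": 0.1, "museum": 0.9, "park": 0.4},
    "Moab": {"adventure": 0.9, "museum": 0.0, "park": 0.7},
}

# Hypothetical weights for an adventure-sports user; the museum weight is negative.
WEIGHTS = {"adventure": 1.0, "museum": -0.5, "park": 0.5}

first_pass = select_cities(CITIES, WEIGHTS, top_n=2)
# The user rejects Moab, so the selection is re-run without it.
second_pass = select_cities(CITIES, WEIGHTS, excluded={"Moab"}, top_n=2)
```

With these sample numbers the negative museum weight pushes the museum-heavy city to the bottom of the ranking, and it only appears in the list after another city has been excluded.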
Once the user indicates that a certain city (or several cities) is not what he wants, we re-run the algorithm, this time eliminating the cities the user indicated.

Overview

The system overview can be visualized by the following diagram:

[Diagram: USER REGISTERS (interests); USER SUBMITS QUERY (interests, time, cost, place); CITY SELECTOR (cities); DISPLAY MODULE (city_list).
Legend: interests: art, museum, park, sports …; time: length of trip; place: place of origin; cities: list of cities in the current plan; city_list: feedback from the user.]

We require the user to register with our service before using it, and we ask the user to detail his interests at the time of registration. Once the user has registered, he can either start planning right away or use the system at a later date. When the user wants to plan a trip, all he needs to provide is the start and end date of his journey and the place of origin. The CITY SELECTOR algorithm takes his interests into account, matches them with the cities in the KB and displays a set of cities that conform to his original specification. The user can either accept the cities or choose not to visit one or more cities in the output. In the latter case, the CITY SELECTOR algorithm runs again to provide the user with a fresh list of cities.

System Architecture

The system is organized as shown in the diagram below:

Figure 1: System Architecture for the Trip Planner

The chief modules of the system are:
1. Itinerary planner algorithm (knowledge base and inference module)
2. Semantic data extractors: extractors for XML/RDF sources and extractors for other data sources
3. Relevance feedback module
4. Result display module: to display the output on the maps (Google/Yahoo!), we need to convert the engine's output data into an extensible markup format so that it can be displayed on the maps
5. User authentication/constraint specification module
6. Registration module: user profile database

Registration Module/User Database
This is a simple interface that we provide for new users to register with us. It is linked to the user database, which stores relevant information such as the user's interests and the places that he/she has visited earlier.

Semantic Data Extractor
This module will use SPARQL to query all related semantic databases and will also incorporate the special 'nearby' function that we need when servicing a query of the form "Select all places 'nearby' Atlanta which satisfy certain constraints". Once it retrieves the results, it presents them to its inference engine mini-module. This module is responsible for operating on RDF data sources.

Weather / TravelWiki Extractor
This module is responsible for extracting weather-related information for the places concerned. Weather and location information is important for a traveler deciding whether to visit a place. We aim to provide weather information and other local news so that the user can make a judicious decision about whether to actually visit the place.

Feedback system
This system feeds the itinerary selection that the user has made from the set of available options back to the inference engine. Depending on the selections, the weights of the factors involved in the pair <requirements, itinerary> are updated.

Itinerary generator algorithm

Input
1. Static user details
Name, age, gender, interests, home town, activities (rock climbing, trekking, pubs, casinos, amusement parks, museums), trip members' details
2. Dynamic user details
Looking for (romantic place, eventful place, crowded place, calm/quiet place), start and end date, cost limitations
3. City details
• Source City Name
• City Climate
• City Category (romantic, …)
• Special events
4. Spot details
• Spot Name
• Spot Category (romantic, historic, scenic, …)
• City ID
• Spot Hotness rating
• Spot best visit time
• Expected time to spend
• Special events

Data structures generated
• City-City graph (air-travel time based)
• City-City graph (train-travel time based)
• City-City graph (road-travel time based)

Output
• The output is a map on which we display the cities that satisfy the interests of the user.

High level flow
1. Accept the source city, travelling start date, travelling end date, travel preferences (daytime, night-time, no preference) and the country/continent to tour.
2. Use the user's interests and activities to filter the cities in the specified country/continent.
3. Generate the city-city time matrix mentioned above from the data source (city-city travelling time).
4. Consider the following factors:
• "Time to spend at spot"
• "Spot hotness"
• "Spot visit time (day/night)"
5. The feedback module would fetch the satisfaction rating from users in response to the itinerary presented to them. The feedback would typically be collected after the user has travelled and returned. Each city and spot would get a hotness rating from the user, which would be used in producing future recommendations.
6. The registration and authentication modules are responsible for registration and authentication respectively.

Knowledge Base
This would contain the user feedback and the city/spot details. It is needed to perform meta-reasoning on the results that we generate using nearest neighbor reasoning.

Concepts and Techniques used

For retrieving data from the semantic web, we made use of data represented in the Resource Description Framework (RDF) format. RDF is a standard proposed by the W3C to model metadata about resources available on the web. It is extensively used to model information in knowledge representation systems. Our project primarily used the syntactical aspects of RDF data to retrieve relevant information for the travel domain. RDF describes data as triples.
A triple consists of a resource linked to another resource through an arc labelled with a third resource. A resource is a conceptual mapping of an entity. In our application, cities, spots and events are all resources. An example triple might be:

Figure 2: RDF triple

An RDF triple may also refer to text, called a literal, instead of a resource. Of course, these literals could themselves be modeled as resources.

Figure 3: RDF Triple with literal

Project Contributions

Our project provides a new way to access travel information in order to plan trip itineraries. Current planners are built to operate on machine readable data (like numbers and co-ordinates). Our project is able to use semantic information (like categories, cost of visit and travel duration) to come up with more useful results than those obtained from static data alone.

The best way to illustrate this is through an example. Suppose a user wants to spend a week at a scenic location, such as a tourist spot in the Rocky Mountains. Existing trip planners do not provide any mechanism for sending such a query to the system. Our system, on the other hand, allows the user to specify such parameters (which can only be found in a semantic data source) and still use traditional parameters like trip duration and distance to come up with a list of possible destinations. It can then match the user's profile information against the generated results and further refine the list. This is how our approach adds to the currently prevailing systems.

We have factored in the feedback system because we realize that no planning is complete until it is actually executed. Hence we allow the user to filter the response until he is satisfied with the output. We also display the likely weather conditions at the locations in the output about a week prior to the start date. This gives the user the opportunity to decide whether he actually wants to undertake the trip.
We realize that any recommendation system is only as good as its worst recommendation. We propose to build a reasoning system that analyses the relation between user interests and city categories. Such a system would be highly beneficial for benchmarking the results and providing users with better quality recommendations.

Our contribution to the project has been largely restricted by the data set. We had to generate the RDF data for our project manually. Our only motivation for creating a data set was to demonstrate that such an application can be built and that it scales to multi-city data.

Project Experience and Learning

Our learning from the project can be classified into two separate divisions:

Idea: We found that people plan vacations largely based on recommendations from their peers or relatives. There is no recommendation system on the market that is both generic and personalized, in the sense that it allows users to visit any location they choose while still giving priority to their interests. We strongly believe that a system like ours will find an audience, as it targets a market that has not been served before.

Technology: We worked with a wide array of technologies. We gradually built our understanding of them by writing simple scripts to communicate with various web services. We also researched the Resource Description Framework (RDF) extensively, to the extent that we created our own schema. One good thing about RDF is that it can be easily extended. Hence our project can be extended to serve a larger data set as and when the data becomes available. We were able to bring together information made available by various APIs and use it for one common purpose. Another interesting technology that we learnt during the course of the project was the SPARQL syntax. It is not enough to have information in machine readable form; one must also be able to access it. We merged a variety of services to achieve one common goal.
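To make the point about accessing machine readable information concrete, here is a small sketch of triple pattern matching, which is roughly what a single SPARQL pattern such as `SELECT ?spot WHERE { ?spot ex:category "scenic" }` expresses. It uses plain Python tuples rather than an RDF library. The `spot:`, `city:` and `ex:` names, the sample data and the `match` helper are hypothetical and are not taken from the project's schema.

```python
def match(triples, pattern):
    """Return variable bindings for each triple matching a (s, p, o) pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable that appears twice must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # No mismatch was found, so this triple satisfies the pattern.
            results.append(binding)
    return results


# Hypothetical triples describing two spots in one city.
TRIPLES = [
    ("spot:RedRocks", "ex:locatedIn", "city:Denver"),
    ("spot:RedRocks", "ex:category", "scenic"),
    ("spot:ArtMuseum", "ex:locatedIn", "city:Denver"),
    ("spot:ArtMuseum", "ex:category", "museum"),
]

scenic_spots = match(TRIPLES, ("?spot", "ex:category", "scenic"))
denver_spots = match(TRIPLES, ("?spot", "ex:locatedIn", "city:Denver"))
```

A real SPARQL engine joins several such patterns and handles prefixes, literals and data types; the sketch only shows the matching idea for one pattern.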
Tools and Services Used

[A] RAPI: RDF API for PHP
Source: (http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/tests.html)
[B] Google Maps API for Display Module
Source: (http://code.google.com/apis/maps/)
[C] Weather Information
Source: (http://www.weather.gov/forecast/xml/)

Future Work

This system has considerable scope for improvement, both in the range of information it provides for each result and in the quality of the result set. Future extensions would bring more dimensions into the system, such as suggesting places to eat and places to stay for the cities in the recommended paths. Ideas from [11] can be incorporated to extend the core engine.

The algorithm could support extra constraints specified by the user (for example: cities that she definitely wants to visit, cost limitations, or a minimum time to spend at specific holiday spots). We can also look at the possibility of using relevance feedback from users for a particular location to improve the quality of the itinerary generator algorithm. More realistic factors could be incorporated into the algorithm dynamically by adding modules such as a commute information module and an accommodation information module.

Another interesting extension would be to create a social network on top of the system. For example, if a user chooses to make his profile public, he can contact other people who are travelling at the same time and who share his interests. This would be really helpful, since the user can become part of a group and may come to know of other interesting places to visit in that region.
References:
[1] http://www.tourguidemike.com/
[2] www.dbpedia.org
[3] http://www.google.com/tripplanner
[4] http://www.mapquest.com
[5] http://www.tripit.com/
[6] Stefan Poslad, Heimo Laamanen, Rainer Malaka, Achim Nick, Phil Buckle and Alexander Zipf, "Crumpet: Creation of user-friendly mobile services personalised for tourism"
[7] Cyberguide: A Mobile context aware tour guide.
[8] http://www.travelok.com/
[9] http://wikitravel.org/en/Main_Page
[10] http://www.holidayandtravelguide.com/
[11] http://www.holidaytraveldestinations.com/
[12] http://www.lonelyplanet.com/
[13] http://www.w3c.org/RDF
[14] http://www.w3.org/TR/rdf-sparql-query/