Semantic Web Aided Itineraries Planner
CS 8803 Advanced Internet Application Development
Aditya Devurkar – adityad@gatech.edu
Aditya Sakhuja – aditya.sakhuja@gmail.com
Mayur Bhosle – mayoorbhosle@gmail.com
Rohit Sud – rohit@gatech.edu
Contents
Introduction
Objective
Research Issues addressed
Extracting Semantic data for use
Correlating user profiles, trip parameters and spot information
Feedback
Overview
System Architecture
Concepts and Techniques Used
Contributions
Project Learning
Future Work
References
Introduction
The aim of the project is to develop a trip planner that automatically
suggests optimal trip combinations based on the user's profile and preferences.
Existing solutions in the literature provide suggestions derived from data that is
static in nature. Some of them, like Tour Guide Mike [1], suggest pre-existing
trips. These are bound by the pre-designed itineraries inside the knowledge base
of the system. At the other extreme are simple itinerary solutions that provide
the user with a tool to record his itinerary details without offering any useful
suggestions. Thus we felt a need to develop an intelligent trip planner that
suggests places of interest to the user. The ‘intelligent’ solution to the problem
involves analyzing the user's profile and his current wishes, and selecting cities
based on these parameters.
We also seek to leverage semantic information relating to travel stored in the
RDF format. We use SPARQL to query data from the data source and use the
results to populate a relational database. This is done to simplify data access
and to make it more efficient.
Objective
The objective of this project is to come up with a prototype which illustrates the
advantage of a personalized, recommendation-based tour planner. This involves
creating a module for interpreting and extracting semantic data stored as RDF
and populating a traditional data source for use by the application. Another key
objective is to identify and implement metrics which can be used to suggest
personalized itineraries to a user based on several static and dynamic factors. The
system was evaluated by comparing its results with the results expected for a set
of test cases.
Research Issues addressed
Our project aims at addressing multiple research issues. They are discussed in
detail below.
Extracting semantic data for use
There is a huge amount of semantic data present in various data repositories on
the Internet, and the variety of this data is equally large. For the purpose of
our project, we concentrated on travel-related data held by these repositories.
During our research we found quite a few data repositories, but most of them
contained only generic information about places. This was limited to
geographical information (like latitude, longitude, etc.) and political
information (country, whether capital of a region, etc.). For our purpose we
required more information than this, so we looked for richer sources. We found
two sources that we were able to leverage to obtain more relevant information
about places. One was DBpedia [2]. DBpedia provided us with information obtained
from Wikipedia but classified using semantics. Further, the data was available
as RDF triples, which we could use as a source of place information. We used
this source to obtain latitude and longitude information for places in the
knowledge base (KB). The other source that we used was WikiTravel.org [9], which
provided us with richer information. We were able to extract quite a lot of
information relating to cities and places of interest from this source. It even
provided us with information regarding various recurring events that take place
in a city. We extracted information regarding the places of interest in a city,
which we call spots. This information helped us classify spots according to
various categories. In addition, we obtained the cost of visiting a spot and the
time that a person needs to spend on each visit. For events, we considered only
recurring events, especially the ones that recur yearly. The reason for this was
that information regarding current events was not readily available in the data
sources we used. We also felt that recurring events were more instrumental in
influencing a user's perception of a city as a place to visit. Hence we included
only yearly events in our data source. We also attempted a classification of
these events based on their type (like social, political, etc.).
Correlating user profiles, trip parameters and spot information
(metrics)
We use a matrix to evaluate the similarities between various parameters.
User-City similarity matrix: It is based on the user's interests and the city's
categories, where each city's category profile is the aggregate of the
categories of all its constituent spots. For example, a user who shares his
interests with the majority of a city's spot categories will have a higher score
for that city.
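A minimal sketch of this similarity computation follows. It assumes that each
spot carries a single category label and that a city's profile is simply the
count of its spots per category; the report does not fix the exact aggregation,
so the function names and sample data are illustrative only.

```python
from collections import Counter


def city_profile(spot_categories):
    """Aggregate a city's spots into a category -> count profile."""
    return Counter(spot_categories)


def user_city_similarity(user_interests, spot_categories):
    """Fraction of the city's spots whose category the user is interested in."""
    profile = city_profile(spot_categories)
    total = sum(profile.values())
    if total == 0:
        return 0.0
    matched = sum(n for category, n in profile.items() if category in user_interests)
    return matched / total


def similarity_matrix(users, cities):
    """Build {user: {city: similarity}} for all user/city pairs."""
    return {
        user: {city: user_city_similarity(interests, spots) for city, spots in cities.items()}
        for user, interests in users.items()
    }


# Hypothetical sample data, not taken from the project's knowledge base.
users = {"alice": {"museum", "art"}, "bob": {"sports"}}
cities = {
    "CityA": ["museum", "museum", "art", "park"],
    "CityB": ["sports", "park"],
}
matrix = similarity_matrix(users, cities)
```

Here a user whose interests cover most of a city's spots gets a value close to
1 for that city, and a user with no overlapping interests gets 0.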
Every parameter also has a value, and we use this value to compute the score
based on the formula:
$$\text{Score} = \sum_i w_i x_i$$
where $w_i$ is the weight associated with the parameter $x_i$.
The weight can be either positive or negative; a negative weight indicates that
the parameter contributes negatively to the ranking of the city. For example, we
would not want to present a city that is known for its museums to a user
interested in adventure sports. In such a scenario the weight associated with
museums is negative.
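The weighted sum can be sketched as below. The weight values, the parameter
names and the two sample cities are assumptions made for illustration; the
report does not list the actual weights used.

```python
def city_score(parameters, weights):
    """Return sum_i w_i * x_i over the parameters; missing weights count as 0."""
    return sum(weights.get(name, 0.0) * value for name, value in parameters.items())


def rank_cities(cities, weights):
    """Sort city names by descending score."""
    return sorted(cities, key=lambda name: city_score(cities[name], weights), reverse=True)


# Hypothetical weights for a user interested in adventure sports:
# the museum weight is negative, so museum-heavy cities are pushed down.
weights = {"adventure_sports": 1.0, "park": 0.5, "museum": -0.5}

# Hypothetical parameter values x_i (for example, share of spots per category).
cities = {
    "MuseumTown": {"museum": 0.8, "park": 0.2},
    "PeakCity": {"adventure_sports": 0.6, "park": 0.3, "museum": 0.1},
}
ranking = rank_cities(cities, weights)
```

With these sample values the museum-dominated city ends up with a negative
score and is ranked below the city offering adventure sports.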
Feedback System
We argue that the results presented to the user at some point may need to
change because of weather or personal preference. For this reason we have built
the feedback system to account for user preference. Once the user indicates that
a certain city (or cities) is not what he desires, we re-run the algorithm, this
time eliminating the cities that the user indicated.
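The re-run with eliminated cities could look like the following sketch. It
assumes that city scores have already been computed and that the planner
returns a fixed number of cities; both the function names and the sample scores
are hypothetical.

```python
def select_cities(scores, limit, excluded=frozenset()):
    """Return the top `limit` cities by score, skipping excluded ones."""
    candidates = [city for city in scores if city not in excluded]
    candidates.sort(key=lambda city: scores[city], reverse=True)
    return candidates[:limit]


def plan_with_feedback(scores, limit, rejections_per_round):
    """Re-run selection once per feedback round, accumulating rejected cities."""
    excluded = set()
    plan = select_cities(scores, limit, excluded)
    for rejected in rejections_per_round:
        excluded.update(rejected)
        plan = select_cities(scores, limit, excluded)
    return plan


# Hypothetical scores for four cities.
scores = {"A": 0.9, "B": 0.7, "C": 0.6, "D": 0.4}
first = select_cities(scores, 2)
revised = plan_with_feedback(scores, 2, [{"A"}])
```

Rejected cities accumulate across rounds, so a city the user has dismissed once
does not reappear in a later list.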
Overview
The system overview can be visualized by the following diagram:
[Diagram: USER REGISTERS (interests), USER SUBMITS QUERY (interests, time, cost,
place), CITY SELECTOR (cities), DISPLAY MODULE (city_list)]
Legend:
interests: art, museum, park, sports …
time: length of trip
place: place of origin
cities: list of cities in the current plan
city_list: feedback from the user
We require the user to register with our service before using it, and we ask the
user to detail his interests at the time of registration. Once the user has
registered, he can either start planning right away or use the system at a later
date.
When the user wants to plan his trip, all he needs to provide to the system is
the start and end dates of his journey and the place of origin. The CITY
SELECTOR algorithm takes his interests into account, matches them with the cities
in the KB and displays a set of cities that conform to his original
specification. The user can choose either to accept the cities or not to visit
one or more cities in the output. In the latter case, the CITY SELECTOR algorithm
runs again to provide the user with a fresh list of cities.
System Architecture
The system is organized as shown in the diagram below:
Figure 1: System Architecture for the Trip Planner
The chief modules of the system are:
1. Itinerary planner Algorithm (Knowledge base and Inference module)
2. Semantic data extractors - Extractors for XML/RDF sources, Other
(data source) extractors
3. Relevance Feedback module
4. Result display module - For displaying the output on the Maps
(Google/Yahoo!) we need to convert the engine's output data into an
extensible mark-up format so that it can be displayed on the Maps.
5. User Authentication/Constraint specification module
6. Registration module - Users profile database
Registration Module/User Database
This is a simple interface that we provide for new users to register with us. It is
linked to the user database, which will store relevant information like the user's
interests and the places that he/she has visited earlier.
Semantic Data Extractor
This module will use SPARQL to query all related semantic databases and will
also incorporate the special 'nearby' function that we would need when servicing
a query of the form: select all places 'nearby' Atlanta which satisfy certain
constraints. Once it retrieves the results, it will present them to its
inference engine mini-module. This module would be responsible for operating
on RDF data sources.
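One possible reading of the 'nearby' function is a radius filter over the
latitude and longitude values obtained from the data sources. The sketch below
assumes that the coordinates have already been extracted into the application
and that great-circle (haversine) distance is an acceptable measure of
nearness; the report does not specify either choice. The coordinates are
approximate and the function names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(origin, places, radius_km):
    """Names of places within radius_km of origin, nearest first."""
    distances = {
        name: haversine_km(origin[0], origin[1], lat, lon)
        for name, (lat, lon) in places.items()
    }
    within = [name for name, d in distances.items() if d <= radius_km]
    return sorted(within, key=distances.get)


# Approximate coordinates, used only as sample input.
atlanta = (33.749, -84.388)
places = {
    "Athens, GA": (33.951, -83.357),
    "Savannah, GA": (32.081, -81.091),
    "Chattanooga, TN": (35.046, -85.309),
}
result = nearby(atlanta, places, radius_km=200)
```

Any further constraints of the query (category, cost and so on) would then be
applied to the places that survive the radius filter.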
Weather / TravelWiki Extractor
This module is responsible for extracting weather related information of the
places concerned. The weather and location information is also important for a
traveler in deciding on a visit to a place. We aim to provide weather information
and other local news information so that the user can make a judicious decision
whether to actually visit the place.
Feedback system
This system reports back to the inference engine on the itinerary selection
which the user has made from the set of available options. Depending upon the
selections, the weights of the factors involved in the pair
<requirements, itinerary> are updated.
Itinerary generator algorithm
Input
1. Static user details
Name, age, gender, interests, home town, activities (Rock Climbing, Trekking,
pubs, casinos, amusement parks, museums), Trip members details
2. Dynamic user details
Looking for (romantic place, eventful place, crowded place, calm/quiet
places), start and end date, Cost limitations
3. City details
• Source City Name
• City Climate
• City Category (romantic, …)
• Special events
4. Spot Details
• Spot Name
• Spot Category (romantic, historic, scenic ….)
• City ID
• Spot Hotness rating
• Spot best visit time
• Expected time to spend
• Special events
Data Structures generated
City-City graph (air-travel time based)
City-City graph (train-travel time based)
City-City graph (road-travel time based)
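The three city-city graphs can be sketched as one weighted adjacency map per
travel mode. The record format, the assumption that travel times are symmetric
and the sample values are all illustrative; the report does not describe how
the graphs are stored.

```python
from collections import defaultdict


def build_travel_graphs(records):
    """Build one undirected weighted graph per travel mode.

    records: iterable of (mode, city_a, city_b, hours) tuples.
    Returns {mode: {city: {neighbour: hours}}}.
    """
    graphs = defaultdict(lambda: defaultdict(dict))
    for mode, a, b, hours in records:
        graphs[mode][a][b] = hours
        graphs[mode][b][a] = hours  # assumption: travel time is symmetric
    return graphs


def fastest_mode(graphs, a, b):
    """Return (mode, hours) of the quickest direct link, or None if unconnected."""
    options = [(g[a][b], mode) for mode, g in graphs.items() if b in g.get(a, {})]
    if not options:
        return None
    hours, mode = min(options)
    return mode, hours


# Hypothetical travel times in hours between placeholder cities.
records = [
    ("air", "X", "Y", 1.5),
    ("road", "X", "Y", 6.0),
    ("train", "X", "Y", 5.0),
    ("road", "Y", "Z", 2.0),
]
graphs = build_travel_graphs(records)
best_xy = fastest_mode(graphs, "X", "Y")
```

Keeping one graph per mode lets the planner compare air, train and road times
for the same pair of cities without mixing the edges.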
Output
• As output, we display a map showing the cities which satisfy the interests
of the user.
High level flow
• Accept the source city, travelling start date, travelling end date, travel
preferences (daytime, night-time, no pref.), Country/Continent to tour.
• Use the user's interests and activities to filter the cities in the specified
Country/Continent.
• Generate the city-city time matrix as mentioned above from the data
source (city-city travelling time).
• Consider the following factors:
  • “Time to spend at spot”
  • “Spot hotness”
  • “Spot visit time (day/night)”
5. The feedback module would fetch the satisfaction rating from the users in
response to the itinerary presented to them. The feedback would typically be
collected after the user travels and comes back. Each city and spot would get
the hotness rating from the user. It would be used in producing future
recommendations.
6. The registration and authentication modules are responsible for
registration/authentication respectively.
Knowledge Base
This would contain the user feedback and city/spot details. This is needed to
perform meta-reasoning on the results that we generated using the nearest
neighbor reasoning.
Concepts and Techniques used
For retrieving data from the semantic web, we made use of data represented
using the Resource Description Framework (RDF) format. RDF is a standard
proposed by the W3C to model meta-data about resources available on the web.
It is extensively used to model information in knowledge representation systems.
Our project primarily used the syntactical aspects of RDF data to retrieve
relevant information for the travel domain.
RDF describes data as triples. A triple consists of a resource linked to another
resource through an arc labelled with a third resource. A resource refers to a
conceptual mapping of an entity. In our application City, Spots and Events are
all resources.
An example triple might be:
Figure 2: RDF triple
Also an RDF triple might refer to text called literals instead of resources. Of
course, these literals could themselves be modeled as resources.
Figure 3: RDF Triple with literal
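To make the triple model concrete, the sketch below stores a few triples as
tuples, matches them against a pattern in the way a SPARQL variable would, and
loads the result into a relational table. It uses only the Python standard
library rather than the RDF API for PHP used in the project, and every URI,
predicate name and value in it is hypothetical.

```python
import sqlite3

# Hypothetical triples: (subject, predicate, object, object_is_literal).
TRIPLES = [
    ("ex:Atlanta", "rdf:type", "ex:City", False),
    ("ex:Atlanta", "ex:hasSpot", "ex:ExamplePark", False),
    ("ex:ExamplePark", "rdf:type", "ex:Spot", False),
    ("ex:ExamplePark", "ex:category", "park", True),    # literal object
    ("ex:ExamplePark", "ex:visitHours", "2", True),     # literal object
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None is a wildcard, like a SPARQL variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]


def load_spots(conn, triples):
    """Populate a relational `spot` table from the triples."""
    conn.execute("CREATE TABLE spot (uri TEXT PRIMARY KEY, city TEXT, category TEXT)")
    for city, _, spot, _ in match(triples, p="ex:hasSpot"):
        categories = match(triples, s=spot, p="ex:category")
        category = categories[0][2] if categories else None
        conn.execute("INSERT INTO spot VALUES (?, ?, ?)", (spot, city, category))
    conn.commit()


conn = sqlite3.connect(":memory:")
load_spots(conn, TRIPLES)
rows = conn.execute("SELECT uri, city, category FROM spot").fetchall()
```

Once the triples are flattened into rows like this, the rest of the application
can use ordinary SQL instead of querying the RDF source on every request.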
Project Contributions
Our project provides a new way to access travel information to plan trip
itineraries. Current planners are built to operate on machine readable data (like
numbers and co-ordinates). Our project is able to utilize semantic information
(like categories, cost of visit, travel duration) to come up with more useful
results than those obtained by just using static data.
The best way to illustrate this would be through the use of an example. Suppose
a user wants to spend his week at a scenic location like a tourist spot in the
Rocky Mountains. Currently existing trip planners do not provide him with any
mechanism to send such a query to the system. On the other hand, our system
allows the user to specify such parameters (which could only be found in a
semantic data source) and still use other traditional parameters like trip duration
and distance to come up with a list of possible destinations. Further it is able to
match a user’s profile information with the generated results and further refine
the list it comes up with. This is the way our approach adds to the currently
prevailing systems.
We have factored in the feedback system because we realize that no planning is
complete until it is actually executed. Hence we allow the user to refine the
results until he is satisfied with the output. We also display the possible
weather conditions at the locations in the output about a week prior to his
start date. This gives the user the opportunity to decide whether he actually
wants to undertake the trip.
We realize that any recommendation system is only as good as its worst
recommendation. We propose to build a reasoning system which analyses the
relation between user interests and city categories. Such a system would be
highly beneficial for benchmarking the results and providing users with better
quality recommendations.
Our contribution to the project has been largely restricted by the data set. We
had to manually generate the RDF data for our project. Our only motivation to
create a data set was to demonstrate that such an application can be created
and is scalable for multicity data.
Project Experience and Learning
Our learning from the project can be classified into two separate divisions:
Idea:
We found out that people plan vacations largely based on recommendations
from their peers or relatives. There is no recommendation system on the market
which is both generic and personalized, in the sense that it allows users to
visit any location they choose while still giving priority to their interests.
We strongly believe that a system like ours will have an audience, as it targets
a particular market that has not been served earlier.
Technology:
We worked with a wide array of technologies. We gradually built our
understanding of the language by trying to write simple scripts to communicate
with various web services. We also extensively researched the Resource
Description Framework (RDF), to the extent that we have created our own schema.
One good thing about RDF is that it can be easily extended. Hence our project
can be extended to serve a larger data set as and when the data becomes
available. We were able to bring together information made available by various
APIs and use it for one common purpose. Another interesting technology that
we learnt during the course of the project was the SPARQL syntax. It is not only
necessary to have information in machine readable form but also to be able to
access it. We merged a variety of services to achieve one common goal.
Tools and Services Used
[A] RAPI: RDF API for PHP
Source: (http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/tests.html)
[B] Google Maps API for Display Module
Source: (http://code.google.com/apis/maps/)
[C] Weather Information
Source: (http://www.weather.gov/forecast/xml/)
Future Work
This system has vast scope for improvement in the future, not only in terms of
the range of information it provides for each result in the result set but also
in the quality of the result set. Future extensions would involve bringing more
dimensions into the system, such as suggesting places to eat and places to stay
for the cities in the recommended paths. Ideas from [11] can be incorporated to
extend the core engine. The algorithm could support extra constraints specified
by the user (example constraints: some cities which she definitely wants to
visit, cost limitations, minimum time to spend in some specific holiday spots).
We can also look at the possibility of using relevance feedback from the users
for a particular location to improve the quality of the itinerary generator
algorithm. More realistic factors could be incorporated into the algorithm
dynamically by adding modules such as a commutation information module and an
accommodation information module.
Another interesting extension would be to create a social network on top of the
system. For example, if the user wishes to make his profile public, then he can
contact other people who are travelling at the same time and who have the
same interests. This would be really helpful, since the user can become part of
a group and may come to know other interesting places to visit in that region.
References:
[1] http://www.tourguidemike.com/
[2] www.dbpedia.org
[3] http://www.google.com/tripplanner
[4] http://www.mapquest.com
[5] http://www.tripit.com/
[6] "Crumpet: Creation of user-friendly mobile services personalised for
tourism", Stefan Poslad, Heimo Laamanen, Rainer Malaka, Achim Nick, Phil Buckle
and Alexander Zipf
[7] Cyberguide: A Mobile context aware tour guide.
[8] http://www.travelok.com/
[9] http://wikitravel.org/en/Main_Page
[10] http://www.holidayandtravelguide.com/
[11] http://www.holidaytraveldestinations.com/
[12] http://www.lonelyplanet.com/
[13] http://www.w3c.org/RDF
[14] http://www.w3.org/TR/rdf-sparql-query/