Developers Guide

Data Insight Visualization for popular travel destinations and attractions - Developers Guide
Overview
Orbitz collects enormous amounts of anonymized data about the travel habits of its customers.
We want to expose this data to Orbitz's customers in an accessible, digested form so they
can better plan their upcoming trips.
The tool displays a map with a heat map layer depicting the number of reservations made.
The map can be filtered further to show more specific information about a certain area,
certain hotel ratings and a certain date span.
In addition, the industry advisor requested a feature that returns a hotel score, based on
the hotel's location in relation to the reservation data, to help in sorting search results.
Development Environment and Tools:
• Java.
• JavaScript.
• GWT (Google Web Toolkit).
• GWT Google Maps v3 (a Java API for Google Maps).
• SQL.
• Eclipse IDE.
Environment Installation:
• Download and install the latest Eclipse: http://www.eclipse.org/downloads/
• Install the GWT plugin for Eclipse: https://developers.google.com/eclipse/docs/download
• Install a local server with SQL support. LAMP (for Linux) or WAMP (for Windows) is
recommended: http://www.wampserver.com/en/
• Import the included DB file into the SQL server (e.g. using phpMyAdmin).
• Import the project into Eclipse.
High Level Layer Design:
Client side:
• The client side is responsible for displaying the map, the heat map layer and other
information to the user.
• Sending requests to the server side. There are two types of requests, explicit and implicit:
o Explicit: requesting specific dates or hotel ratings.
o Implicit: changing the map location or zoom.
Server side:
• "The Brains": the server side is responsible for all heavy calculations for the heat map.
o Communicating with the client.
o Communicating with the SQL server.
o Calculating the location score.
• Communication with the client side:
o Receiving heat map requests from the client side.
o Sending new heat map points to the client side.
o Receiving a location from the client side and sending its location score back.
• Communication with the SQL server:
o Requesting heat map entries from the SQL server.
SQL server side:
• The SQL server contains the reservation data.
Data Flow:
[Figure: data flow. Client (user interface change) -> Server (get Filter data, send SQL query)
-> SQL (send reservation data entries) -> Server (receive reservation data, calculate score,
send heat point data) -> Client (receive new heat map data, receive location score).]
Code Flow Example – Change Date:
When the date changes (from or to), the map needs to be updated.
Changing the date in the date box causes the ValueChangeHandler to be executed.
com.bss.client.DateWidget: ValueChangeHandler:
Gets the new date from the date box.
Updates the Filter.
Calls getHeatMapEntriesByFilter from HeatMapWidget with the new
Filter information.
Implicitly updates the map with the newly received heat map entries, using
MapUpdateCallback.
Filter:
This class contains all of the user's map-related data.
HeatMapWidget:
The heat map widget class draws the heat map entries on Google Maps.
com.bss.client.heatMapEntriesFromSQL: HeatMapEntriesByFilter:
RPC (remote procedure call).
Implemented in HeatMapEntriesFromSQLImpl.
The function creates an SqlHelper object to communicate with the SQL server.
Using the getLocs function, it receives a DataRow of entries.
Creates a SharedHeatMapCollection to be sent back to the client side.
SqlHelper:
This class is responsible for communication with the SQL server.
DataRow:
Row data class for the SQL query results.
SharedHeatMapCollection:
Container class for all necessary heat map data.
com.bss.server.sql.SqlHelper: getLocs:
Connects to the SQL server using the init function.
Defines the SQL query using the Filter information.
Sends the query to the SQL server using the sqlQuery function.
Returns a DataRow with the new heat points' information.
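The query-definition step of getLocs can be illustrated with a minimal sketch. It is not the project's code: the FilterSketch fields, the table name (reservations) and the column names (lat, lng, checkin_date, hotel_rating) are all hypothetical, since the real schema lives in the included DB file. The sketch only shows how optional filter parts can be turned into a parameterized query, whose placeholders would then be bound through a java.sql.PreparedStatement. It also ignores map views that cross the 180° meridian.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the Filter class: visible map bounds, date span, minimum rating. */
final class FilterSketch {
    final double south, west, north, east;
    final LocalDate from, to;   // either may be null when the user set no date
    final int minRating;        // 0 means "no rating filter"

    FilterSketch(double south, double west, double north, double east,
                 LocalDate from, LocalDate to, int minRating) {
        this.south = south; this.west = west; this.north = north; this.east = east;
        this.from = from; this.to = to; this.minRating = minRating;
    }
}

/** Holds a parameterized SQL string together with the values for its '?' placeholders. */
final class HeatQueryBuilder {
    final String sql;
    final List<Object> params;

    private HeatQueryBuilder(String sql, List<Object> params) {
        this.sql = sql;
        this.params = params;
    }

    /** Builds the query from the filter; table and column names are assumptions. */
    static HeatQueryBuilder fromFilter(FilterSketch f) {
        List<Object> params = new ArrayList<>();
        StringBuilder sql = new StringBuilder(
            "SELECT lat, lng, COUNT(*) AS reservations FROM reservations"
            + " WHERE lat BETWEEN ? AND ? AND lng BETWEEN ? AND ?");
        params.add(f.south); params.add(f.north);
        params.add(f.west);  params.add(f.east);
        // Optional parts are appended only when the user actually set them.
        if (f.from != null) { sql.append(" AND checkin_date >= ?"); params.add(f.from); }
        if (f.to != null)   { sql.append(" AND checkin_date <= ?"); params.add(f.to); }
        if (f.minRating > 0) { sql.append(" AND hotel_rating >= ?"); params.add(f.minRating); }
        sql.append(" GROUP BY lat, lng");
        return new HeatQueryBuilder(sql.toString(), params);
    }
}
```

Restricting the query to the visible bounds and grouping by location keeps the filtering on the SQL server instead of in Java, which matches the division of work between the layers described earlier.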
com.bss.server.sql.SqlHelper: sqlQuery:
Requests an open connection to the SQL server using the
connectionPool function.
Sends the query to the SQL server with the getQueryResults function.
Closes the connection and returns the row data.
DatabaseUtilities.getQueryResults:
Performs the actual query against the SQL server.
Returns a DBResults object.
DBResults:
Contains all the raw heat point entries from the SQL query.
com.bss.client: MapUpdateCallback:
This class implements AsyncCallback<SharedHeatMapCollection>, which
defines the RPC behavior when called.
After a successful SQL query, it calls resetWithNewEntries, which resets the
map with the new set of heat map entries defined by the Filter, using the
SharedHeatMapCollection received from the query.
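The callback pattern can be sketched without GWT on the classpath. In the sketch below, AsyncCallbackSketch is a stand-in that copies the two methods of GWT's com.google.gwt.user.client.rpc.AsyncCallback (onSuccess and onFailure). HeatMapSketch and the List<double[]> payload are hypothetical simplifications of HeatMapWidget and SharedHeatMapCollection. Keeping the old entries on failure is an assumption, not documented behavior of the project.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in with the same two methods as GWT's AsyncCallback<T>, so the sketch compiles without GWT. */
interface AsyncCallbackSketch<T> {
    void onFailure(Throwable caught);
    void onSuccess(T result);
}

/** Hypothetical minimal heat map holder; the real HeatMapWidget draws on Google Maps. */
final class HeatMapSketch {
    private final List<double[]> entries = new ArrayList<>();

    /** Replaces everything currently held with the new {lat, lng, weight} entries. */
    void resetWithNewEntries(List<double[]> newEntries) {
        entries.clear();
        entries.addAll(newEntries);
    }

    int size() {
        return entries.size();
    }
}

/** Resets the map when the RPC succeeds; on failure it keeps the old entries (an assumption). */
final class MapUpdateCallbackSketch implements AsyncCallbackSketch<List<double[]>> {
    private final HeatMapSketch map;
    String lastError;   // null until a call fails

    MapUpdateCallbackSketch(HeatMapSketch map) {
        this.map = map;
    }

    @Override
    public void onSuccess(List<double[]> result) {
        map.resetWithNewEntries(result);
        lastError = null;
    }

    @Override
    public void onFailure(Throwable caught) {
        // Leave the previously drawn entries untouched and remember the error.
        lastError = caught.getMessage();
    }
}
```

Because the RPC is asynchronous, the caller returns immediately and the map changes only when one of the two methods is invoked. This is why the flow above describes the map update as implicit.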
Score per Location
A requirement introduced later in development was a tool that indicates the value of
geographical points which do not yet have data in the database. The value is calculated
from the data already in the database.
The chosen approach is a modified version of Inverse Distance Weights (IDW), an
interpolation based on the weights of adjacent known points and their distance to the
inspected point. The original interpolation is described in Appendix A.
The modification removes the normalization of the weights and caps each weight, to avoid
infinitely high readings at points very close to a known location (one that is in the
database).
There are two main parameters to choose in this interpolation:
1. The data points used in the interpolation.
2. The power value p: the weights are proportional to the inverse of the distance
(between the data point and the prediction location) raised to the power
value p.
The selected solution uses the data points from the filter, which is determined by the
current map zoom and location and by the UI selection boxes (dates and ratings), together
with a user-adjustable value of p.
Further research could determine the optimal distance within which points should affect
the interpolated location, and the optimal value of p. More information is available in
Appendix A.
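As an illustration, the modified IDW score can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the project's implementation: the class and method names are hypothetical, each data point is a plain {x, y, value} array, and the distance is ordinary Euclidean distance in whatever unit the caller uses. That unit also decides what the "distance greater than 1" cutoff means.

```java
final class LocationScore {
    /**
     * Modified IDW: an un-normalized sum of weight * value, with each weight capped at 1.
     * points[i] = {x, y, value}. The distance is Euclidean in the caller's units (an assumption).
     */
    static double score(double[][] points, double x, double y, double p) {
        double total = 0.0;
        for (double[] pt : points) {
            double d = Math.hypot(pt[0] - x, pt[1] - y);
            // The weight is 1 / d^p beyond distance 1 and is capped at 1 at or inside it,
            // so a query point sitting on a known location cannot produce an infinite score.
            double w = d > 1.0 ? 1.0 / Math.pow(d, p) : 1.0;
            total += w * pt[2];
        }
        return total;
    }

    public static void main(String[] args) {
        // Two hypothetical data points carrying 10 and 5 reservations.
        double[][] reservations = { {0, 0, 10}, {3, 4, 5} };
        System.out.println(score(reservations, 0, 0, 2.0));
    }
}
```

Because the weights are not normalized, the score grows with the number of nearby data points. It therefore reflects reservation density around the location and is not an average value.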
Problems encountered
Program
Performance
The tool uses a great amount of data to calculate what it displays. We anticipated, and
later confirmed, that the Google Maps and heat map APIs have trouble coping with such
amounts of data (displaying a map of the entire world takes hundreds of thousands of
locations). To cope, we incorporated data filtering based on the visible map, aggregation
of nearby points and smart SQL queries.
This allowed us to drop from 60,000+ data points to no more than 150 points sent to the
Google API.
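One common way to aggregate nearby points is to snap them to a grid and merge everything that falls into the same cell. The sketch below shows that idea only. The guide does not say which aggregation the project uses, so the grid approach, the class name and the cell size are all assumptions. Points are {lat, lng, weight} arrays with positive weights.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PointAggregator {
    /**
     * Merges points that fall into the same grid cell of size cellSizeDeg (in degrees).
     * Each input and output point is {lat, lng, weight}; weights are assumed positive.
     */
    static List<double[]> aggregate(List<double[]> points, double cellSizeDeg) {
        // cell key -> {sum(lat * w), sum(lng * w), sum(w)}
        Map<String, double[]> cells = new LinkedHashMap<>();
        for (double[] p : points) {
            long row = (long) Math.floor(p[0] / cellSizeDeg);
            long col = (long) Math.floor(p[1] / cellSizeDeg);
            double[] acc = cells.computeIfAbsent(row + ":" + col, k -> new double[3]);
            acc[0] += p[0] * p[2];
            acc[1] += p[1] * p[2];
            acc[2] += p[2];
        }
        List<double[]> merged = new ArrayList<>();
        for (double[] acc : cells.values()) {
            // One point per cell: the weighted centroid, carrying the cell's total weight.
            merged.add(new double[] {acc[0] / acc[2], acc[1] / acc[2], acc[2]});
        }
        return merged;
    }

    public static void main(String[] args) {
        List<double[]> pts = new ArrayList<>();
        pts.add(new double[] {10.01, 20.01, 1});
        pts.add(new double[] {10.02, 20.02, 3});
        pts.add(new double[] {50.0, 50.0, 2});
        System.out.println(aggregate(pts, 0.1).size() + " points after aggregation");
    }
}
```

Tying the cell size to the zoom level would keep the number of points sent to the heat map API roughly constant, because a zoomed-out view then uses larger cells.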
Tools
Google Maps & HeatMap:
Google Maps and the heat map API are written in JavaScript.
Several Java APIs exist, but they do not include support for the new heat map API.
It was necessary to adapt an existing Java API so that it supports the HeatMap JS API.
GWT
Google Web Toolkit (GWT) is a Java-based development environment from Google which is
intended for web developers. It includes a Java API for JS functions, and it limits which
parts of Java can be used in client code.
This required studying GWT and adapting the client Java code to meet the requirements
of GWT.
Google App Engine
While searching for a free web host that supports server-side Java code, we found that the
only free option was Google App Engine (GAE).
While trying to work with it, we spent a lot of time adapting the code and finding
workarounds for limitations imposed by GAE, such as the restriction to a special SQL
server (Google Cloud SQL). The attempt was abandoned and we decided to work on a local
Tomcat server.
Appendix A – Inverse Distance Weights
Inverse distance weighted (IDW) interpolation explicitly implements the assumption that
things that are close to one another are more alike than those that are farther apart. To
predict a value for any unmeasured location, IDW uses the measured values surrounding the
prediction location. The measured values closest to the prediction location have more
influence on the predicted value than those farther away. IDW assumes that each measured
point has a local influence that diminishes with distance. It gives greater weights to points
closest to the prediction location, and the weights diminish as a function of distance, hence
the name inverse distance weighted.
The Power function
As mentioned above, weights are proportional to the inverse of the distance (between the
data point and the prediction location) raised to the power value p. As a result, as the
distance increases, the weights decrease rapidly. The rate at which the weights decrease is
dependent on the value of p. If p = 0, there is no decrease with distance, and because each
weight w_i is the same, the prediction will be the mean of all the data values in the search
neighborhood. As p increases, the weights for distant points decrease rapidly. If the p value
is very high, only the immediate surrounding points will influence the prediction.
p = 2 is used as a default value, although there is no theoretical justification to prefer this
value over others, and the effect of changing p should be investigated by previewing the
output and examining the cross-validation statistics.
An optimal power value can be determined by minimizing the root mean square prediction
error (RMSPE). RMSPE quantifies the error of the prediction surface.
The equation:
A general form of finding an interpolated value $u$ at a given point $x$, based on samples
$u_i = u(x_i)$ for $i = 0, 1, \dots, N$, using IDW is the interpolating function:
\[
u(x) = \frac{\sum_{i=0}^{N} w_i(x)\, u_i}{\sum_{j=0}^{N} w_j(x)},
\qquad
w_i(x) = \frac{1}{d(x, x_i)^p}.
\]
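The standard, normalized IDW and the RMSPE criterion for choosing p can be sketched together. This is a minimal sketch, not a reference implementation: the names are hypothetical, samples are {x, y, value} arrays with Euclidean distance, and the RMSPE is computed by leave-one-out cross-validation, which is one common choice assumed here. At least two samples are needed for the leave-one-out step.

```java
final class IdwSketch {
    /**
     * Normalized IDW prediction at (x, y). samples[i] = {x, y, value}.
     * skip is the index of a sample to leave out, or -1 to use all samples.
     */
    static double predict(double[][] samples, int skip, double x, double y, double p) {
        double num = 0.0;
        double den = 0.0;
        for (int i = 0; i < samples.length; i++) {
            if (i == skip) {
                continue;
            }
            double d = Math.hypot(samples[i][0] - x, samples[i][1] - y);
            if (d == 0.0) {
                return samples[i][2];   // exact hit: return the measured value
            }
            double w = 1.0 / Math.pow(d, p);
            num += w * samples[i][2];
            den += w;
        }
        return num / den;
    }

    /** Leave-one-out RMSPE: predict each sample from all the others and pool the errors. */
    static double looRmspe(double[][] samples, double p) {
        double sumSq = 0.0;
        for (int i = 0; i < samples.length; i++) {
            double err = predict(samples, i, samples[i][0], samples[i][1], p) - samples[i][2];
            sumSq += err * err;
        }
        return Math.sqrt(sumSq / samples.length);
    }

    public static void main(String[] args) {
        double[][] samples = { {0, 0, 2}, {1, 0, 4}, {0, 1, 9}, {2, 2, 5} };
        // Scan a few candidate powers and report the error of each.
        for (double p = 0.0; p <= 4.0; p += 1.0) {
            System.out.println("p = " + p + ", RMSPE = " + looRmspe(samples, p));
        }
    }
}
```

With p = 0 every weight equals 1, so predict returns the plain mean of the samples, as described in the Power function section. Scanning several values of p and keeping the one with the lowest RMSPE is a simple way to pick the power value.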
The modified equation:
\[
u(x) = \sum_{i=0}^{N} w_i(x)\, u_i
\]
where
\[
w_i(x) =
\begin{cases}
\dfrac{1}{d(x, x_i)^p}, & d(x, x_i) > 1 \\
1, & \text{otherwise}
\end{cases}
\]
Sources:
http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//00310000002m000000.htm
http://en.wikipedia.org/wiki/Inverse_distance_weighting