Data Insight Visualization for Popular Travel Destinations and Attractions - Developer's Guide

Overview
Orbitz collects enormous amounts of (anonymous) data about its customers' travel habits. We want to expose this data to Orbitz's customers in an accessible, digested form so they can better plan their upcoming trips. This includes a map with a heat map layer that shows the number of reservations made. The map can then be filtered further to show more specific data for a certain area, certain hotel ratings, and a certain date span. In addition, the industry advisor requested a feature that returns a hotel score, based on the hotel's location in relation to the reservation data, to help sort search results.

Development Environment and Tools:
Java.
JavaScript.
GWT (Google Web Toolkit).
GWT Google Maps v3: a Java API for Google Maps.
SQL.
Eclipse IDE.

Environment Installation:
Download and install the latest Eclipse: http://www.eclipse.org/downloads/
Install the GWT plugin for Eclipse: https://developers.google.com/eclipse/docs/download
Install a local server with SQL support; LAMP (for Linux) or WAMP (for Windows) is recommended: http://www.wampserver.com/en/
Import the included DB file into the SQL server (e.g., using phpMyAdmin).
Import the project into Eclipse.

High Level Layer Design:
Client side:
The Client side is responsible for displaying the map, the heat map layer, and other information to the user.
It also sends requests to the Server side. There are two types of requests, explicit and implicit:
o Explicit: requesting specific dates or ratings.
o Implicit: changing the map location or zoom.
Server side:
"The Brains": the Server side is responsible for all heavy calculations for the heat map.
o Communicating with the Client.
o Communicating with the SQL server.
o Calculating the location score.
Communication with the Client side:
o Receiving heat map requests from the Client side.
o Sending new heat map points to the Client side.
o Receiving a location from the Client side and sending its location score back to the Client side.
Communication with the SQL server:
o Requesting heat map entries from the SQL server.
SQL server side:
The SQL server contains the reservation data.

Data Flow:
1. Client: a user interface change occurs; the Client gets the filter data and sends it to the Server.
2. Server: sends an SQL query to the SQL server.
3. SQL server: sends back the reservation data entries.
4. Server: receives the reservation data, calculates the score, and sends the heat point data to the Client.
5. Client: receives the new heat map data and the location score.

Code Flow Example – Change Date:
When the date changes (from or to), the map needs to be updated. Changing the date in the date box causes the ValueChangeHandler to be executed.

com.bss.client.DateWidget: ValueChangeHandler:
Gets the new date from the date box.
Updates the Filter.
Calls getHeatMapEntriesByFilter from HeatMapWidget with the new Filter information.
Implicitly updates the map with the new heat map entries received through MapUpdateCallback.
Filter: This class contains all the user's map-related data.
HeatMapWidget: This class draws the heat map entries on Google Maps.

com.bss.client.heatMapEntriesFromSQL: HeatMapEntriesByFilter:
RPC (remote procedure call), implemented in HeatMapEntriesFromSQLImpl.
The function creates an SqlHelper object to communicate with the SQL server.
With the getLocs function, it receives a DataRow of entries.
It creates a SharedHeatMapCollection to be sent back to the Client side.
SqlHelper: This class is responsible for communication with the SQL server.
DataRow: Row data class for the SQL query results.
SharedHeatMapCollection: Container class for all necessary heat map data.

com.bss.server.sql.SqlHelper: getLocs:
Connecting to the SQL server using the init function.
Defining the SQL query using the Filter information.
Sending the query to the SQL server using the sqlQuery function.
Returning a DataRow with the new heat point information.

com.bss.server.sql.SqlHelper: sqlQuery:
Requesting an open connection to the SQL server using the connectionPool function.
Sending the query to the SQL server with the getQueryResults function.
Closing the connection and returning the row data.
DatabaseUtilities.getQueryResults: Performs the actual query request to the SQL server and returns a DBResults object.
DBResults: Contains all the raw heat point entries from the SQL query.

com.bss.client: MapUpdateCallback:
This class implements AsyncCallback<SharedHeatMapCollection>, which defines what happens when the RPC call returns. After a successful SQL query, it calls resetWithNewEntries, which resets the map with the new set of heat map entries defined by the Filter, using the SharedHeatMapCollection received from the query.

Score per Location
A requirement introduced later in development was to create a tool that indicates the value of geographical points that do not yet have data in the database. The value is calculated from the data already in the database.
The chosen approach was a modified version of Inverse Distance Weighting (IDW), an interpolation based on the weights of adjacent known points and their distances to the inspected point. The original interpolation is described in Appendix A. The modifications to IDW are the removal of the weight normalization and a cap on each weight, to avoid infinitely high readings at points very close to a known location (in the database).
There are two main parameters to be changed in this interpolation:
1. The data points to be used in the interpolation.
2. The power value p: the weights are proportional to the inverse of the distance (between the data point and the prediction location) raised to the power p.
The selected solution uses the data points from the filter, which is determined by the current map zoom and location and by the UI selection boxes (dates and ratings), together with a user-adjustable value of p.
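The modified interpolation can be sketched in Java as follows. This is a minimal sketch, not the project's code: the class LocationScoreSketch and its Sample type are hypothetical, and the distance is assumed to be planar Euclidean distance in the same units as the weight cap at distance 1 (the guide does not state the distance metric or its units). It sums w_i(x)·u_i over the filtered data points without dividing by the sum of the weights, and caps each weight at 1, as described for the modified IDW in Appendix A.

```java
import java.util.List;

/**
 * Hypothetical sketch of the modified IDW location score:
 * unnormalized weights, each capped at 1 for distances of 1 or less.
 * Planar Euclidean distance is an assumption; the guide does not state the metric.
 */
final class LocationScoreSketch {
    /** A known point from the filtered data: position and value u_i (e.g. reservations). */
    static final class Sample {
        final double x, y, value;

        Sample(double x, double y, double value) {
            this.x = x;
            this.y = y;
            this.value = value;
        }
    }

    /** w_i = 1 / d^p when d > 1, otherwise 1 (caps the weight near known points). */
    static double weight(double d, double p) {
        return d > 1.0 ? 1.0 / Math.pow(d, p) : 1.0;
    }

    /** u(x) = sum_i w_i(x) * u_i, with no division by the sum of the weights. */
    static double score(double x, double y, List<Sample> samples, double p) {
        double sum = 0.0;
        for (Sample s : samples) {
            double d = Math.hypot(x - s.x, y - s.y);
            sum += weight(d, p) * s.value;
        }
        return sum;
    }
}
```

For example, with samples (0, 0, 10) and (3, 4, 20), score(0, 0, samples, 2.0) returns about 10.8 (10·1 + 20/25). With p = 0 every weight is 1, so the score is the plain sum of the values rather than their mean; this is a direct consequence of dropping the normalization.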
Further research could determine the optimal range (distance from the interpolated location) within which data points should affect it, as well as the optimal value of p. More information is available in Appendix A.

Problems Encountered
Program Performance
The tool uses a large amount of data to calculate what is displayed. We anticipated, and later confirmed, that the Google Maps and heat map APIs had trouble coping with these amounts of data (displaying a map of the entire world involves hundreds of thousands of locations). To cope with this, we incorporated data filtering based on the visible map area, aggregation of nearby points, and smart SQL queries. This allowed us to reduce the data sent to the Google API from 60,000+ points to no more than 150 points.

Tools
Google Maps & HeatMap:
Google Maps and the heat map API are written in JavaScript. Several Java APIs exist, but they do not support the new heat map API. It was necessary to adapt an existing Java API so that it supports the HeatMap JS API.
GWT
Google Web Toolkit (GWT) is a development environment developed by Google and intended for web developers. It compiles client-side Java code into JavaScript, so client code can use only the subset of the Java libraries that GWT supports. This required studying GWT and adapting the client Java code to meet its requirements.
Google App Engine
While searching for a free web host that supports server-side Java code, we found that the only free option was Google App Engine (GAE). While trying to work with it, we spent a lot of time adapting the code and finding workarounds for limitations imposed by GAE, such as the restriction to a special SQL server (Google Cloud SQL). The attempt was abandoned and we decided to work on a local Tomcat server.
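The aggregation of nearby points mentioned under Program Performance could look roughly like the following sketch. The guide does not describe the actual aggregation method, so the grid approach, the class GridAggregationSketch, and its parameters are all assumptions: points falling in the same grid cell are merged into a single point at their weighted centroid, whose weight is the sum of the merged weights (weights are assumed positive, e.g. reservation counts), and the cell size is doubled until no more than maxPoints points remain.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of grid aggregation of heat points.
 * Weights are assumed positive (e.g. reservation counts).
 */
final class GridAggregationSketch {
    static final class HeatPoint {
        final double lat, lng, weight;

        HeatPoint(double lat, double lng, double weight) {
            this.lat = lat;
            this.lng = lng;
            this.weight = weight;
        }
    }

    /** Merges points in the same cellSize x cellSize (degrees) cell into their weighted centroid. */
    static List<HeatPoint> aggregate(List<HeatPoint> points, double cellSize) {
        // Grid origin at the minimum coordinates, so every cell index is >= 0.
        double minLat = Double.POSITIVE_INFINITY, minLng = Double.POSITIVE_INFINITY;
        for (HeatPoint p : points) {
            minLat = Math.min(minLat, p.lat);
            minLng = Math.min(minLng, p.lng);
        }
        // Per cell: {sum of weight*lat, sum of weight*lng, sum of weight}.
        Map<String, double[]> cells = new LinkedHashMap<>();
        for (HeatPoint p : points) {
            long row = (long) Math.floor((p.lat - minLat) / cellSize);
            long col = (long) Math.floor((p.lng - minLng) / cellSize);
            double[] acc = cells.computeIfAbsent(row + ":" + col, k -> new double[3]);
            acc[0] += p.weight * p.lat;
            acc[1] += p.weight * p.lng;
            acc[2] += p.weight;
        }
        List<HeatPoint> merged = new ArrayList<>();
        for (double[] acc : cells.values()) {
            merged.add(new HeatPoint(acc[0] / acc[2], acc[1] / acc[2], acc[2]));
        }
        return merged;
    }

    /** Doubles the cell size until the result has at most maxPoints points. */
    static List<HeatPoint> aggregateToAtMost(List<HeatPoint> points, double initialCell, int maxPoints) {
        if (initialCell <= 0 || maxPoints < 1) {
            throw new IllegalArgumentException("initialCell must be > 0 and maxPoints >= 1");
        }
        double cell = initialCell;
        List<HeatPoint> merged = aggregate(points, cell);
        while (merged.size() > maxPoints) {
            // Terminates: once cell exceeds the data's extent, all points share cell 0:0.
            cell *= 2;
            merged = aggregate(points, cell);
        }
        return merged;
    }
}
```

A call such as aggregateToAtMost(points, 0.01, 150) would return at most 150 points while preserving the total weight; the starting cell size of 0.01 degrees is an arbitrary illustrative value. Using the weighted centroid, rather than the cell center, keeps each merged point close to where its reservations actually are.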
Appendix A – Inverse Distance Weights
Inverse distance weighted (IDW) interpolation explicitly implements the assumption that things that are close to one another are more alike than those that are farther apart. To predict a value for any unmeasured location, IDW uses the measured values surrounding the prediction location. The measured values closest to the prediction location have more influence on the predicted value than those farther away. IDW assumes that each measured point has a local influence that diminishes with distance. It gives greater weights to points closest to the prediction location, and the weights diminish as a function of distance, hence the name inverse distance weighted.

The Power Function
As mentioned above, the weights are proportional to the inverse of the distance (between the data point and the prediction location) raised to the power value p. As a result, as the distance increases, the weights decrease rapidly. The rate at which the weights decrease depends on the value of p. If p = 0, there is no decrease with distance, and because each weight w_i is the same, the prediction will be the mean of all the data values in the search neighborhood. As p increases, the weights for distant points decrease rapidly. If the p value is very high, only the immediate surrounding points will influence the prediction. p = 2 is used as a default value, although there is no theoretical justification for preferring this value over others, and the effect of changing p should be investigated by previewing the output and examining the cross-validation statistics. An optimal power value can be determined by minimizing the root mean square prediction error (RMSPE), which quantifies the error of the prediction surface.

The equation:
A general form of finding an interpolated value u at a given point x based on samples $u_i = u(x_i)$ for $i = 0, 1, \ldots, N$ using IDW is the interpolating function:
$$u(x) = \frac{\sum_{i=0}^{N} w_i(x)\, u_i}{\sum_{j=0}^{N} w_j(x)}, \qquad w_i(x) = \frac{1}{d(x, x_i)^{p}}$$
where $d(x, x_i)$ is the distance between $x$ and $x_i$.
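The standard interpolation and the cross-validation procedure for choosing p described above can be sketched together. This is an illustrative sketch under stated assumptions: the class IdwCrossValidationSketch is hypothetical, distances are planar Euclidean, a prediction exactly at a known point returns that point's value (the usual IDW convention, which also avoids dividing by zero), and the RMSPE is computed by leave-one-out cross-validation, predicting each known point from all the others.

```java
import java.util.List;

/**
 * Hypothetical sketch of standard (normalized) IDW and of choosing p
 * by leave-one-out RMSPE. Planar Euclidean distance is an assumption.
 */
final class IdwCrossValidationSketch {
    static final class Sample {
        final double x, y, u;

        Sample(double x, double y, double u) {
            this.x = x;
            this.y = y;
            this.u = u;
        }
    }

    /** u(x) = sum w_i u_i / sum w_j with w_i = 1/d^p, ignoring sample index skip (-1 = none). */
    static double idw(double x, double y, List<Sample> samples, int skip, double p) {
        double num = 0.0, den = 0.0;
        for (int i = 0; i < samples.size(); i++) {
            if (i == skip) continue;
            Sample s = samples.get(i);
            double d = Math.hypot(x - s.x, y - s.y);
            if (d == 0.0) return s.u; // at a known point, IDW returns its value
            double w = 1.0 / Math.pow(d, p);
            num += w * s.u;
            den += w;
        }
        return num / den;
    }

    /** Leave-one-out RMSPE: predict each sample from all the others. */
    static double rmspe(List<Sample> samples, double p) {
        if (samples.size() < 2) throw new IllegalArgumentException("need at least 2 samples");
        double sumSq = 0.0;
        for (int i = 0; i < samples.size(); i++) {
            Sample s = samples.get(i);
            double err = idw(s.x, s.y, samples, i, p) - s.u;
            sumSq += err * err;
        }
        return Math.sqrt(sumSq / samples.size());
    }

    /** Returns the candidate power with the lowest leave-one-out RMSPE. */
    static double bestPower(List<Sample> samples, double[] candidates) {
        if (candidates.length == 0) throw new IllegalArgumentException("no candidates");
        double best = candidates[0];
        double bestErr = Double.POSITIVE_INFINITY;
        for (double p : candidates) {
            double err = rmspe(samples, p);
            if (err < bestErr) {
                bestErr = err;
                best = p;
            }
        }
        return best;
    }
}
```

bestPower(samples, new double[] {1, 2, 3}) returns whichever candidate gives the lowest leave-one-out RMSPE on the given samples. The candidate list is an assumption; in practice the search would run over the filtered reservation data.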
The modified equation:
$$u(x) = \sum_{i=0}^{N} w_i(x)\, u_i, \qquad w_i(x) = \begin{cases} \dfrac{1}{d(x, x_i)^{p}}, & d(x, x_i) > 1 \\ 1, & \text{otherwise} \end{cases}$$

Sources:
http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//00310000002m000000.htm
http://en.wikipedia.org/wiki/Inverse_distance_weighting