Mapping

Just because something can be represented geographically doesn’t mean it should. The relevant story may have nothing to do with geography. Maps have biases. Maps can be misleading. They may emphasize land area in a way that obscures population density, or show “geographic” patterns that merely demonstrate an underlying demographic pattern. Before you proceed, make sure a map is what you actually want. For a more detailed take on this question, read When Maps Shouldn’t Be Maps.

What maps are made of

Maps generally consist of geographic data (we’ll call this geodata for short) and a system for visually representing that data.

Part 1: Geodata

Latitude and Longitude

Most geodata you encounter is based on latitude/longitude coordinates on Earth’s surface (mapping Mars is beyond the scope of this primer). Latitude ranges from -90 (the South Pole) to 90 (the North Pole), with 0 being the equator. Longitude ranges from -180 (halfway around the world going west from the prime meridian) to 180 (halfway around the world going east from the prime meridian), with 0 being the prime meridian. Yes, that means -180 and 180 are the same.

If you are an old-timey sea captain, you may find or write latitude and longitude in degrees + minutes + seconds, like:

37°46'42"N, 122°23'22"W

Computers are not old-timey sea captains, so it’s easier to give them decimals:

37.77833, -122.38944

A latitude/longitude number pair is often called a lat/lng or a lat/lon. We’ll call them lat/lngs. Want to quickly see where a lat/lng pair is on earth? Enter it into Google Maps, just like an address.

* Sometimes mapping software wants you to give a lat/lng with the latitude first, sometimes it wants you to give it with the longitude first. Check the documentation for whatever you’re using (or, if you’re lazy like me, just try it both ways and then see which one is right).
* Precision matters, so be careful with rounding lat/lngs. At the equator, one degree of longitude is about 69 miles!
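To make the sea-captain-to-decimal conversion concrete, here is a minimal Python sketch. The function names dms_to_decimal and max_rounding_error_miles are made up for this illustration, and the 69-miles-per-degree figure is the rough equator value quoted above, so treat the error estimate as a ballpark only.

```python
MILES_PER_DEGREE_AT_EQUATOR = 69  # rough figure quoted in the text above


def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative in decimal notation.
    return -value if hemisphere in ("S", "W") else value


def max_rounding_error_miles(decimals):
    """Worst-case shift (at the equator) from rounding a coordinate to `decimals` places."""
    return 0.5 * 10 ** (-decimals) * MILES_PER_DEGREE_AT_EQUATOR


lat = dms_to_decimal(37, 46, 42, "N")
lng = dms_to_decimal(122, 23, 22, "W")
print(round(lat, 5), round(lng, 5))  # 37.77833 -122.38944
```

Rounding to a whole degree can move a point by up to roughly 34 miles at the equator, which is why it pays to keep four or five decimal places.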
Map geometry

Almost any geographic feature can be expressed as a sequence of lat/lng points. They are the atomic building blocks of a map.

A location (e.g. a dot on a map) is a single lat/lng point:

37.77833,-122.38944

A straight line (e.g. a street on a map) is a pair of lat/lng points, one for the start and one for the end:

37.77833,-122.38944 to 34.07361,-118.24

A jagged line, sometimes called a polyline, is a list of straight lines in order, a.k.a. a list of pairs of lat/lng points:

37.77833,-122.38944 to 34.07361,-118.24
34.07361,-118.24 to 32.7073,-117.1566
32.7073,-117.1566 to 33.445,-112.067

A closed region (e.g. a country on a map) is just a special kind of jagged line that ends where it starts. These are typically called polygons:

37.77833,-122.38944 to 34.07361,-118.24
34.07361,-118.24 to 32.7073,-117.1566
32.7073,-117.1566 to 33.445,-112.067
33.445,-112.067 to 37.77833,-122.38944

The bottom line: almost any geodata you find, whether it represents every country in the world, a list of nearby post offices, or a set of driving directions, is ultimately a bunch of lists of lat/lngs.

Map features

Most common formats for geodata think in terms of features. A feature can be anything: a country, a city, a street, a traffic light, a house, a lake, or anything else that exists in a fixed physical location. A feature has geometry and properties.

A feature’s geometry consists of any combination of geometric elements like the ones listed above. So geodata for the countries of the world consists of about 200 features.* Each feature consists of a list of points to draw a jagged line step-by-step around the perimeter of the country back to the starting point, also known as a polygon. But wait, not every country is a single shape, you say! What about islands? No problem. Just add additional polygons for every unconnected landmass. By combining relatively simple geometric elements in complex ways, you can represent just about anything.
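The point, line, polyline, and polygon described above can be sketched in Python as plain tuples and lists. This is only one possible way to hold the data; the helper names segments and is_polygon are invented for this example and do not come from any mapping library.

```python
# A point is a (lat, lng) tuple; every other shape is an ordered list of points.
a = (37.77833, -122.38944)
b = (34.07361, -118.24)
c = (32.7073, -117.1566)
d = (33.445, -112.067)

point = a
line = [a, b]            # one straight segment
polyline = [a, b, c, d]  # a jagged line: three segments in order
polygon = [a, b, c, d, a]  # a jagged line that ends where it starts


def segments(points):
    """Return the straight lines ("from" point, "to" point) that make up a shape."""
    return list(zip(points, points[1:]))


def is_polygon(points):
    """A closed region needs at least three distinct corners plus the repeated start."""
    return len(points) >= 4 and points[0] == points[-1]


for start, end in segments(polygon):
    print(start, "to", end)
```

Running the loop prints the same four "from ... to ..." pairs listed for the polygon above, and is_polygon(polyline) is False because the jagged line never returns to its starting point.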
Let’s say you have the Hawaiian islands, each of which is represented as a polygon. Should that be seven features or one?* It depends on what kind of map we’re making. If we are analyzing something by state, we only care about the islands as a group and they’ll all be styled the same in the end. They should probably be a single feature with seven pieces of geometry. If, on the other hand, we are doing a map of Hawaiian wildlife by island, we need them to be seven separate features.

There is also something called a “feature collection,” where you can loosely group multiple features for certain purposes, but let’s not worry about that for now.

A feature’s properties are everything else that matters for your map. For the countries of the world, you probably want their names, but you may also want things like birth rate, population, largest export, or whatever else is going to be involved in your map.

* One of the lessons you will learn when you start making maps is that questions that you thought had simple answers (like “What counts as a country?” and “How many Hawaiian islands are there?”) get a little complicated.

Geodata formats

So we’ve learned that geodata is a list of features, and each feature is a list of geometric pieces, and each geometric piece is a list of lat/lngs, so the whole thing looks something like this:

Feature #1:
    geometry:
        polygon #1: [list of lat/lngs]
        polygon #2: [list of lat/lngs] (for Easter Island)
        ...
    properties:
        name: Chile
        capital: Santiago
        ...
Feature #2:
    geometry:
        polygon #1: [list of lat/lngs]
        polygon #2: [list of lat/lngs]
        ...
    properties:
        name: Argentina
        capital: Buenos Aires
        ...

So we just need a big list of lat/lng points and then we can all go home, right? Of course not. In the real world, this data needs to come in some sort of consistent format a computer likes. Ideally it will also be a format a human can read, but let’s not get greedy.
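The "seven features or one?" question can be sketched in Python. Everything here is illustrative: the Feature class and merge_features helper are invented for this example, and the island outlines are made-up placeholder triangles, not real coordinates.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    geometry: list  # a list of polygons, each a list of (lat, lng) points
    properties: dict = field(default_factory=dict)


def merge_features(features, properties):
    """Combine several features into one feature that carries all their polygons."""
    polygons = [polygon for f in features for polygon in f.geometry]
    return Feature(geometry=polygons, properties=dict(properties))


def placeholder_ring(offset):
    # Made-up closed triangle standing in for a real island outline.
    return [(offset, 0), (offset, 1), (offset + 1, 0), (offset, 0)]


# Wildlife-by-island map: seven separate features, each with its own properties.
islands = [
    Feature([placeholder_ring(i)], {"name": f"Island {i + 1}"}) for i in range(7)
]

# By-state map: one feature with seven pieces of geometry.
state = merge_features(islands, {"name": "Hawaii"})
print(len(islands), "features, or 1 feature with", len(state.geometry), "polygons")
```

The geometry is identical in both cases. What changes is where the properties attach, which is what decides how the shapes can be styled later.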
Now that you know that geodata is structured like this, you will see that most common formats are very similar under the hood. Four big ones that you will probably come across are:

Shapefiles

This is the most common format for detailed map data. A “shapefile” is actually a set of files:

● .shp: the geometry for all the features.
● .shx: a helper index file that stores what order the shapes should be in.
● .dbf: the properties of each feature, stored in a spreadsheet-like format.
● Other optional files storing things like a project description and styling (only the above three files are required).

If you open a shapefile in a text editor, it will look like gibberish, but it will play really nicely with desktop mapping software, also called GIS software or geospatial software. Shapefiles are great for doing lots of detailed manipulation and inspection of geodata. By themselves, they are pretty lousy for making web maps, but fortunately it’s usually easy to convert them into a different format.

GeoJSON

A specific flavor of JSON that is great for web mapping. It’s also fairly human readable if you open it in a text editor. Let’s use the state of Colorado as an example, because it’s nice and rectangular.

{
  "type": "Feature",
  "geometry": {
    "type": "Polygon",
    "coordinates": [
      [
        [-102.04, 36.99],
        [-102.04, 40.99],
        [-109.05, 40.99],
        [-109.05, 36.99],
        [-102.04, 36.99]
      ]
    ]
  },
  "properties": {
    "name": "Colorado",
    "capital": "Denver"
  }
}

This means: Draw a polygon by starting from the first point ([-102.04,36.99]), drawing a line to the next point ([-102.04,40.99]), and repeating until the end of the list. Note that each GeoJSON point is written longitude first, then latitude. Notice also that the last point is the same as the first point, closing the loop. The GeoJSON specification requires this closing point, although some software will close the loop for you if it is missing.

KML

A specific flavor of XML that is heavily favored by Google Maps, Google Earth, and Google Fusion Tables. The basic components behave very similarly to GeoJSON, but are contained in XML tags instead of curly braces.
KML supports lots of extra bells and whistles like camera positioning and altitude for making movies in Google Earth. It plugs really nicely into Google products, but generally needs to be converted to something else in order to make other web maps. So what does Colorado look like in KML?

<Polygon id="Colorado">
  <altitudeMode>clampToGround</altitudeMode>
  <outerBoundaryIs>
    <LinearRing>
      <coordinates>
        -102.04,36.99
        -102.04,40.99
        -109.05,40.99
        -109.05,36.99
        -102.04,36.99
      </coordinates>
    </LinearRing>
  </outerBoundaryIs>
</Polygon>

The XML tags can be very confusing, but note that the meat of this data is quite similar to the GeoJSON example. Both of them are just a list of points in order, with a lot of scary braces and brackets as window dressing.

TopoJSON

The new hotness. TopoJSON takes in a basic geodata format, like GeoJSON, and spits out a clever reduction of it by focusing on the part of a map we usually care about: borders and connections (a.k.a. topology). The details are beyond the scope of this primer, but you can read about the TopoJSON magic here: https://github.com/mbostock/topojson/wiki

Remember: different software accepts different file formats for geodata, but at the end of the day everyone speaks lat/lng. Different file formats are just dialects of the mother tongue.

See more at: http://schoolofdata.org/2013/11/09/web-mapping/

Part 2: Building an Election Map

In this guide, we will build a map of the 2012 Egyptian presidential election. Before building a data visualization, the first question to ask yourself is what the visualization is for. In other words, what should the user get from it? Your goal needs to serve the readers of the graphic, so knowing your readers is key. To illustrate: if you want to show comparisons, you might use a bar chart. If you want to show correlations, a scatter plot would be a great fit. And if you want to present and organize data, a map or histogram might accomplish your goal.
We want to visualize the Egyptian election data and show which governorates were won by the candidate Morsi and which were won by the candidate Shafik. Then we need to compare the percentage of votes for the two candidates in each governorate. To fulfill these two goals, we need two visualizations: a map and a bar chart. Instead of presenting two separate visualizations, we can build a map on which a bar chart pops up to compare the percentage of votes whenever we hover over a governorate.

In this guide, we will use thematic maps. According to Alberto Cairo, thematic maps are probably the purest and most successful form of information graphics. They consist of a traditional geographical map (a base map such as Google Maps or OpenStreetMap) with data overlays on top of it. These overlays should be designed so that the reader has no difficulty telling them apart. Thematic maps focus on a specific trend or theme in order to tell a story with data. The goal of a traditional map is to locate events, while the goal of a thematic map is to locate specific trends by mashing up datasets and tying them to geographic locations.

Thematic maps come in different types. I will discuss three of them:

● Choropleth Maps
● Dot Maps
● Proportional Symbol Maps

This is an example of a choropleth map developed by the Guardian to show the percentage of poverty in US states. The name of this technique is derived from the Greek words choros (area or region) and plethos (multitude). As you can see, choropleth maps are based on polygons, which are a bunch of lists of lat/lngs. These polygons are typically stored in a shapefile, ready for visualization. Choropleth maps are well suited for showing percentages, ratios, or any other kind of derived data, but they are not suited for visualizing raw amounts. They can present a distribution effectively because they use the colored polygon areas themselves as the symbol.
For example, they are good at displaying the crime rate per thousand people in Cairo’s governorates, but not the total number of crimes there.

This is an example of a dot map developed by the New York Times. It shows the grades given by the NYC health department to the city’s restaurants. We can conclude that dot maps are best suited for showing a discrete phenomenon where every dot represents a single unit: a restaurant is located at one dot in one place and nowhere else. Dot maps are not suited for raw and continuous data.

The third type of map I am covering here is the proportional symbol map. The map below, which was developed by the New York Times, represents the number of votes for Romney and Obama in US states. Each circle represents a specific number of votes, so each circle can be scaled up or down depending on the value of a variable in each place. This makes it the best map for showing raw data.

From the explanation above, you should have concluded that the best way to visualize our story is a choropleth map, because we will be showing the distribution of votes for the two candidates. Colored polygons can show where each candidate won. If this sounds fun, let’s start building our data visualization!

Our first step is to search for a dataset that contains the election results. This School of Data blog post contains a useful spreadsheet with the data we need. You can make your own copy of the dataset to start working on it.

As you can see, the data table is not clean and not yet ready for analysis and visualization. For example, we need to clean the first column and present the governorate names in English. That means removing the Arabic version and the “/” symbol. OK, let’s do it manually. Oh no! That’s a big waste of our time. Spreadsheets have plenty of functions that will save us a lot of time in refining this.
First, we’ll need to insert two columns to the right of our target column to hold the results of the functions. Set a cell equal to the value of the SPLIT() function, where the first parameter is the text, or a reference to the cell whose contents you wish to split, and the second parameter is the delimiter, i.e. the character at which the contents of the cell will be split. To split the contents of cell A4 at each instance of the “/” symbol, type the following formula into B4:

=SPLIT(A4,"/")

Notice that the contents are split into a new cell for every occurrence of the delimiter. In the above example, since there is one “/” character in A4, the contents will be split into 2 cells (B4 and C4). Now drag the small square handle at the bottom-right corner of the cell down the column to apply the formula to every row, as shown in the figure below.

Our table contains the number of votes for each candidate and the total number of votes in each governorate. For the goal we set at the beginning, we need the percentage of votes, which is easy to calculate: divide the candidate’s number of votes by the total number of votes and multiply by 100.

Next, we need to assign an ID number to each governorate so that we can join the data in later stages. We will also need a new column for the winner in each governorate. To fill it in, we will use the IF function:

=IF(H2>I2,"1","2")

In other words, for each governorate, if Morsi got more votes, enter 1, and if Shafik got more votes, enter 2. We are using numbers to make it easy for the mapping tool to visualize the results. We will cover this in more detail when we reach the mapping part. Your final cleaned table should look like this:

Notice that I included two additional columns containing the longitude and latitude of each governorate. These can be useful if you are developing a proportional symbol map.
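For readers who would rather script the clean-up than use spreadsheet formulas, the same three steps (split the name, compute percentages, flag the winner) can be sketched in Python. This is only an illustration: the clean_row function, its field names, and the sample numbers are hypothetical and are not taken from the real dataset, and picking the English name by checking for ASCII-only text is an assumption about how the name cells are written.

```python
def clean_row(gov_id, raw_name, morsi_votes, shafik_votes, total_votes):
    """Mirror the spreadsheet steps for a single governorate row."""
    # Same idea as SPLIT(A4, "/"): break the cell at the slash.
    parts = [part.strip() for part in raw_name.split("/")]
    # Assumption: the English name is the part made only of ASCII characters.
    english_name = next(part for part in parts if part.isascii())
    return {
        "id": gov_id,
        "name": english_name,
        # Votes divided by total votes, multiplied by 100.
        "morsi_pct": round(100 * morsi_votes / total_votes, 2),
        "shafik_pct": round(100 * shafik_votes / total_votes, 2),
        # Same rule as IF(H2>I2,"1","2"): 1 means Morsi won, 2 means Shafik won.
        "winner": 1 if morsi_votes > shafik_votes else 2,
    }


# Hypothetical sample row with made-up vote counts, for illustration only.
row = clean_row(1, "القاهرة / Cairo", 150, 250, 400)
print(row)
```

As in the spreadsheet formula, a tie falls through to the second branch and is recorded as 2, so it is worth checking for ties separately if they could occur in your data.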
There are different ways to get these latitude and longitude values. One of them is simply to use Google Spreadsheet, which can look the values up for you automatically. You can find a tutorial on how to do this by clicking here. You can also get them manually using GeoNames.

For the choropleth map, we need a set of polygons that represent the borders of the Egyptian governorates. As we said, these are stored in the shapefile format. Shapefiles can be drawn using tools like Google Earth or a GIS tool like QGIS. However, if you are not a GIS specialist, you will want a ready-made shapefile. You can search the web and find dozens of free shapefiles. You may still need to buy one from a geographic software provider, especially if you need a specific level of detail or if the free shapefiles you found are outdated (e.g. new governorates were added to a country last year). Luckily enough, I have a shapefile for Egyptian boundaries here: http://www.divagis.org/gdata

After you download the shapefile, open it in GIS software. For this guide, we will use Quantum GIS (QGIS), which is a great open source tool.

Once you have finished the QGIS edits, which include joining the cleaned election table to the shapefile, congratulations! You now have a ready shapefile.

Now it is time to design our interactive election map! For this purpose, we will use a design studio called TileMill, from MapBox.

Import your data:

To import data into TileMill as a CSV file, you need column headings in the first row, and the CSV must also contain columns with latitude and longitude coordinates. However, we are designing a choropleth map, so instead we need to import the shapefile we just prepared in QGIS.

Start TileMill and click on the “New project” button on the main screen. Enter a “Filename” for your project and click “Add”. You can leave the other fields alone for now. Click on the new project to open it. The project contains a default layer called #countries styled with some example CartoCSS code.
To add a shapefile layer, first click the “Layers” button located on the bottom left to bring up the Layers panel. Now click “Add layer”. Enter the shapefile as the datasource. Click the “Save & Style” button. This will add the layer to your project and insert a default CartoCSS rule for the layer. You should now see your shapefile on the map.

Styling our map:

The code you see in the right pane is called CartoCSS. TileMill uses this language to determine the look of a map. Colors, sizes, and shapes can all be manipulated by setting the relevant CartoCSS parameters in the stylesheet panel to the right of the map. It is very similar to CSS, the language well known from web design. Read the CartoCSS manual for a more detailed introduction to the language.

In the previous steps on importing data, we added a shapefile using the “Save & Style” button. This button automatically added several styling parameters to your stylesheet and gave them initial values.

#electionegypt
This is the layer to which the styles are applied.

polygon-fill
This is the color of the inside of the polygon. There are two methods for changing color values. You can either type in a new value, or you can use the color swatches at the bottom of the CartoCSS panel. Try changing the fill color by clicking the light red swatch and selecting a new color. Click “Save” in the color picker to see your changes. Notice that the corresponding color value is updated in the CartoCSS.

line-width
This is the thickness of the white border lines in the figure above. You can change it to make the borders thinner or thicker.

Now we have a beautiful red map, but it still doesn’t serve our goal of presenting the distribution of votes for the two candidates. In order to do so, we need conditional styles. Conditional CartoCSS styles allow you to change the appearance of the polygons on your map based on attributes in the data.
Review the available data for the layer in the feature inspector. Find the column called EgyptEle_8 and examine the range of values. This will help you decide how to style the polygons. As you can see, this is the Winner column from our data table; QGIS renamed the column when we joined the tables.

Add the following CartoCSS rule to the bottom of your stylesheet.

In plain English, the CartoCSS rule we added says that whenever the Winner column “EgyptEle_8” contains 1, the polygon color is set to scale3. We already defined scale3 in the code above.

One important note when it comes to coloring a map is that the designer of a graphic should achieve unity in colors. If you examine the colors used in the map below, you will discover that it is cluttered with too many different colors. The designer could have reached better unity by using a monochromatic color scheme, starting with a neutral color at the low end of the range (0) and grading into an accent color at the high end (1000). I would advise you to use ColorBrewer, which can help you set up a color gradient effectively.

If you apply the CartoCSS code below and click Save, you will get this map:

This map shows the governorates where Morsi got the most votes in blue and the governorates where Shafik got the most votes in red. But how can the reader of this map know which color refers to which candidate? That’s why we need to design a legend.

A legend stays on the map permanently and is useful for displaying titles, descriptions, and keys for what is being mapped. It can be styled using HTML, or it can simply contain an image. For a legend, try not to use more than six classes unless it is necessary. The average human eye finds it difficult to distinguish more than six shades of a hue.

Let’s add a legend that describes the theme of the map. Open the Templates panel by clicking on the pointer button in the bottom left. The Legend tab is open by default.
Enter your legend text/HTML in the Legend field. TileMill already provides ready-made HTML code for awesome legends. You can check them here. We will use this example:

Copy the code into your Legend field and change the colors as needed. Here we specify the same two colors used in our map to identify the votes for Morsi and Shafik. Click Save and voila! Your map should now look like this:

Our goal at the very beginning of this guide was to build a map on which a bar chart pops up to compare the percentage of votes whenever we hover over a governorate. In order to build this bar chart, we’ll need tooltips. Tooltips allow you to make maps interactive with dynamic content that appears when a user hovers over or clicks on a map. They can contain HTML and are useful for revealing additional data, images, and other content.

Open the Templates panel by clicking on the pointer button on the bottom left. Click on the “Teaser” tab. Teaser content appears when you hover over a feature, and Full content appears when you click on a feature. You can use the Location field to define a URL to be loaded when a feature is clicked. Select the “electionegypt” layer to use it for interaction. TileMill only supports one interactive layer at a time.

The data fields for the layer are displayed wrapped in curly Mustache tags. The tags represent the columns in our data table and will be replaced by data when you interact with the map. Locate the fields you want to use and write your template using the Mustache tags. Paste the following code into the Teaser field and use the preview to make sure it looks good:

Percentage of votes in<br/>{{{EgyptElect}}} for Morsi is {{{EgyptEle_6}}} and Shafik {{{EgyptEle_7}}}

Click “Save” to save your settings and refresh the map. Close the panel by clicking the close button (X) or by pressing the ESC key. Move your mouse over some governorates to see the tooltips. Cool, huh?

We built a great teaser, but I promised you a bar chart. So how do we implement this?
We’ll use the Google Chart API. It lets you embed dynamic charts and graphs in the interactive space of TileMill. Little to no programming experience is needed to adjust an existing chart from its gallery or to build one from scratch in its interactive chart playground. I will not go into the details of the Google Chart API here; you can follow this tutorial for that purpose. You can copy the following code into the Teaser field:

<img src="http://chart.apis.google.com/chart?
&chxt=x,y
&chxl=0:|
&chxr=0,1
&chxs=0,BD0026,10,0,l,BD0026|1,BD0026,10,0,l,BD0026|2,BD0026,10,0,l,BD0026
&chls=3,1,0
&chbh=a,4,4
&chg=14.3,9,1,1
&chs=250x250
&cht=bvg
&chco=24416f,ad232d
&chdl=Votes for Morsi|Votes for Shafik
&chm=b,24416f,0,0,0|b,ad232d,0,0,0
&chd=t:{{{EgyptEle_6}}}|{{{EgyptEle_7}}}
&chtt=++++++++++Percentage of Votes in {{{EgyptElect}}}+++
&chts=BD0026,10.5"
width="150" height="250" alt="" />

When you hover over a governorate, you will get a bar chart showing the percentage of votes for that specific governorate. Congratulations! You have successfully built an awesome election map!

Now that you’ve designed a cool map, you can export it by clicking “Export.” The easiest way to share your map is via the free service from MapBox, the company behind TileMill. You can upload your maps there and then embed them in any site the same way you embed a YouTube video. You can find out more about exporting in this tutorial.

Here’s mine: http://a.tiles.mapbox.com/v3/datascientist.cc9v0a4i/page.html