Mapping -Aub workshop

Mapping
Just because something can be represented geographically doesn’t mean it should. The
relevant story may have nothing to do with geography. Maps have biases. Maps can be
misleading. They may emphasize land area in a way that obscures population density,
or show “geographic” patterns that merely demonstrate an underlying demographic
pattern. Before you proceed, make sure a map is what you actually want.
For a more detailed take on this question, read When Maps Shouldn’t Be Maps.
What maps are made of
Maps generally consist of geographic data (we’ll call this geodata for short) and a
system for visually representing that data.
Part 1: Geodata
Latitude and Longitude
Most geodata you encounter is based on latitude/longitude coordinates on Earth’s
surface (mapping Mars is beyond the scope of this primer).
Latitude ranges from -90 (the South Pole) to 90 (the North Pole), with 0 being the
equator.
Longitude ranges from -180 (halfway around the world going west from the prime
meridian) to 180 (halfway around the world going east from the prime meridian), with 0
being the prime meridian. Yes, that means -180 and 180 are the same.
If you are an old-timey sea captain, you may find or write latitude and longitude in
degrees + minutes + seconds, like:
37°46'42"N, 122°23'22"W
Computers are not old-timey sea captains, so it’s easier to give them decimals:
37.77833, -122.38944
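If you ever need to do the sea-captain-to-computer conversion yourself, the arithmetic is just degrees + minutes/60 + seconds/3600, with south and west counted as negative. Here is a minimal sketch in Python; the function name dms_to_decimal is made up for this example.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative.
    if hemisphere.upper() in ("S", "W"):
        value = -value
    return round(value, 5)


lat = dms_to_decimal(37, 46, 42, "N")
lng = dms_to_decimal(122, 23, 22, "W")
print(lat, lng)
```

Running it on the example above reproduces the decimal pair 37.77833, -122.38944.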
A latitude/longitude number pair is often called a lat/lng or a lat/lon. We’ll call them
lat/lngs.
Want to quickly see where a lat/lng pair is on earth? Enter it into Google Maps, just like
an address.
* Sometimes mapping software wants you to give a lat/lng with the latitude first,
sometimes it wants you to give it with the longitude first. Check the documentation for
whatever you’re using (or, if you’re lazy like me, just try it both ways and then see which
one is right).
* Precision matters, so be careful with rounding lat/lngs. At the equator, one degree of
longitude is about 69 miles!
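To get a feel for how much rounding costs you, here is a rough sketch. It assumes a spherical Earth, about 69.17 miles per degree of longitude at the equator, and shrinks that by the cosine of the latitude; treat the results as ballpark figures, not survey-grade numbers. The function names are made up for this example.

```python
import math

# Approximate length of one degree of longitude at the equator, in miles.
MILES_PER_DEGREE_AT_EQUATOR = 69.17


def miles_per_degree_longitude(latitude):
    """Approximate miles covered by one degree of longitude at a given latitude."""
    return MILES_PER_DEGREE_AT_EQUATOR * math.cos(math.radians(latitude))


def rounding_error_miles(decimal_places, latitude=0.0):
    """Worst-case east-west error from rounding a longitude to N decimal places."""
    half_step = 0.5 * 10 ** (-decimal_places)
    return half_step * miles_per_degree_longitude(latitude)


for places in (0, 1, 2, 3, 4):
    print(places, "decimal places:", round(rounding_error_miles(places), 4), "miles")
```

Each extra decimal place cuts the worst-case error by a factor of ten, so keep as many decimals as your source data gives you.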
Map geometry
Almost any geographic feature can be expressed as a sequence of lat/lng points. They
are the atomic building blocks of a map.
A location (e.g. a dot on a map) is a single lat/lng point:
37.77833,-122.38944
A straight line (e.g. a street on a map) is a pair of lat/lng points, one for the start and one
for the end:
37.77833,-122.38944 to 34.07361,-118.24
A jagged line, sometimes called a polyline, is a list of straight lines in order, a.k.a. a list
of pairs of lat/lng points:
37.77833,-122.38944 to 34.07361,-118.24
34.07361,-118.24 to 32.7073,-117.1566
32.7073,-117.1566 to 33.445,-112.067
A closed region (e.g. a country on a map) is just a special kind of jagged line that ends
where it starts. These are typically called polygons:
37.77833,-122.38944 to 34.07361,-118.24
34.07361,-118.24 to 32.7073,-117.1566
32.7073,-117.1566 to 33.445,-112.067
33.445,-112.067 to 37.77833,-122.38944
The bottom line: almost any geodata you find, whether it represents every country in the
world, a list of nearby post offices, or a set of driving directions, is ultimately a bunch of
lists of lat/lngs.
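The building blocks above can be sketched as plain Python lists of (lat, lng) tuples. The variable and function names here are just for illustration; real mapping libraries have their own types.

```python
# Each point is a (lat, lng) tuple, in the same order used in the text above.
point = (37.77833, -122.38944)

# A straight line is a start point and an end point.
line = [(37.77833, -122.38944), (34.07361, -118.24)]

# A polyline is an ordered list of points; each neighboring pair is one segment.
polyline = [
    (37.77833, -122.38944),
    (34.07361, -118.24),
    (32.7073, -117.1566),
    (33.445, -112.067),
]

# A polygon is a polyline that ends where it starts.
polygon = polyline + [polyline[0]]


def segments(points):
    """Return the consecutive (start, end) pairs that make up a path."""
    return list(zip(points, points[1:]))


def is_closed(points):
    """A closed ring needs at least 4 points and identical first and last points."""
    return len(points) >= 4 and points[0] == points[-1]


print(len(segments(polyline)), "segments in the polyline")
print(len(segments(polygon)), "segments in the polygon")
print("polygon closed?", is_closed(polygon))
```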
Map features
Most common formats for geodata think in terms of features. A feature can be anything:
a country, a city, a street, a traffic light, a house, a lake, or anything else that exists in a
fixed physical location. A feature has geometry and properties.
A feature’s geometry consists of any combination of geometric elements like the ones
listed above. So geodata for the countries of the world consists of about 200 features.*
Each feature consists of a list of points to draw a jagged line step-by-step around the
perimeter of the country back to the starting point, also known as a polygon. But wait,
not every country is a single shape, you say! What about islands? No problem. Just add
additional polygons for every unconnected landmass. By combining relatively simple
geometric elements in complex ways, you can represent just about anything.
Let’s say you have the Hawaiian islands, each of which is represented as a polygon.
Should that be seven features or one?* It depends on what kind of map we’re making. If
we are analyzing something by state, we only care about the islands as a group and
they’ll all be styled the same in the end. They should probably be a single feature with
seven pieces of geometry. If, on the other hand, we are doing a map of Hawaiian
wildlife by island, we need them to be seven separate features. There is also something
called a “feature collection,” where you can loosely group multiple features for certain
purposes, but let’s not worry about that for now.
A feature’s properties are everything else that matters for your map. For the countries of
the world, you probably want their names, but you may also want things like birth rate,
population, largest export, or whatever else is going to be involved in your map.
* One of the lessons you will learn when you start making maps is that questions that
you thought had simple answers – like “What counts as a country?” and “How many
Hawaiian islands are there?” – get a little complicated.
Geodata formats
So we’ve learned that geodata is a list of features, and each feature is a list of
geometric pieces, and each geometric piece is a list of lat/lngs, so the whole thing looks
something like this:
Feature #1:
geometry:
polygon #1: [list of lat/lngs]
polygon #2: [list of lat/lngs] (for Easter Island)
...
properties:
name: Chile
capital: Santiago
...
Feature #2:
geometry:
polygon #1: [list of lat/lngs]
polygon #2: [list of lat/lngs]
...
properties:
name: Argentina
capital: Buenos Aires
...
So we just need a big list of lat/lng points and then we can all go home, right? Of course
not. In the real world, this data needs to come in some sort of consistent format a
computer likes. Ideally it will also be a format a human can read, but let’s not get
greedy.
Now that you know that geodata is structured like this, you will see that most common
formats are very similar under the hood. Four big ones that you will probably come
across are:
Shapefiles
This is the most common format for detailed map data. A “shapefile” is actually a set of
files:
● .shp: The geometry for all the features.
● .shx: A helper file that stores what order the shapes should be in.
● .dbf: Stores the properties of each feature in a spreadsheet-like format.
● Other optional files storing things like a project description and styling (only the
above three files are required).
If you open a shapefile in a text editor, it will look like gibberish, but it will play really
nicely with desktop mapping software, also called GIS software or geospatial software.
Shapefiles are great for doing lots of detailed manipulation and inspection of geodata.
By themselves, they are pretty lousy for making web maps, but fortunately it’s usually
easy to convert them into a different format.
GeoJSON
A specific flavor of JSON that is great for web mapping. It’s also fairly human readable if
you open it in a text editor. Let’s use the state of Colorado as an example, because it’s
nice and rectangular.
{
"type": "Feature",
"geometry": {
"type": "Polygon",
"coordinates": [
[
[-102.04,36.99],
[-102.04,40.99],
[-109.05,40.99],
[-109.05,36.99],
[-102.04,36.99]
]
]
},
"properties": {
"name": "Colorado",
"capital": "Denver"
}
}
This means: Draw a polygon by starting from the first point ([-102.04,36.99]), drawing a
line to the next point ([-102.04,40.99]), and repeating until the end of the list.
Notice that the last point is the same as the first point, closing the loop. The GeoJSON
specification requires this closing point, although some software is lenient and will close
the loop for you.
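If you want to poke at a GeoJSON feature from code, Python's built-in json module is enough. This sketch rebuilds the Colorado feature as a dictionary, writes it out as text, reads it back, and checks that the outer ring is closed. Note that GeoJSON lists each position as [longitude, latitude]. The helper name ring_is_closed is made up for this example.

```python
import json

# The Colorado feature from above; each position is [longitude, latitude].
colorado = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [
                [-102.04, 36.99],
                [-102.04, 40.99],
                [-109.05, 40.99],
                [-109.05, 36.99],
                [-102.04, 36.99],
            ]
        ],
    },
    "properties": {"name": "Colorado", "capital": "Denver"},
}


def ring_is_closed(ring):
    """True when the ring's first and last positions are identical."""
    return ring[0] == ring[-1]


text = json.dumps(colorado, indent=2)  # what you would save to a .geojson file
parsed = json.loads(text)              # what mapping software reads back in
outer_ring = parsed["geometry"]["coordinates"][0]
print(parsed["properties"]["name"], len(outer_ring), ring_is_closed(outer_ring))
```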
KML
A specific flavor of XML that is heavily favored by Google Maps, Google Earth, and
Google Fusion Tables. The basic components behave very similarly to GeoJSON, but
are contained in XML tags instead of curly braces. KML supports lots of extra bells and
whistles like camera positioning and altitude for making movies in Google Earth. It plugs
really nicely into Google products, but generally needs to be converted to something
else in order to make other web maps. So what does Colorado look like in KML?
<Polygon id="Colorado">
<altitudeMode>clampToGround</altitudeMode>
<outerBoundaryIs>
<LinearRing>
<coordinates>
-102.04,36.99
-102.04,40.99
-109.05,40.99
-109.05,36.99
-102.04,36.99
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
The XML tags can be very confusing, but note that the meat of this data is quite similar
to the GeoJSON example. Both of them are just a list of points in order, with a lot of
scary braces and brackets as window dressing.
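To see that the two formats really carry the same list of points, here is a small sketch that pulls the coordinates out of the KML snippet with Python's standard xml.etree.ElementTree module and turns them into the same nested list used in the GeoJSON example. It assumes a bare Polygon element with no XML namespace, as shown above; a full KML file wraps this in more tags. The function name kml_ring is made up.

```python
import xml.etree.ElementTree as ET

KML_POLYGON = """
<Polygon id="Colorado">
  <altitudeMode>clampToGround</altitudeMode>
  <outerBoundaryIs>
    <LinearRing>
      <coordinates>
        -102.04,36.99
        -102.04,40.99
        -109.05,40.99
        -109.05,36.99
        -102.04,36.99
      </coordinates>
    </LinearRing>
  </outerBoundaryIs>
</Polygon>
"""


def kml_ring(polygon_xml):
    """Return the outer boundary as a list of [longitude, latitude] pairs."""
    root = ET.fromstring(polygon_xml.strip())
    text = root.find("./outerBoundaryIs/LinearRing/coordinates").text
    ring = []
    for token in text.split():  # positions are separated by whitespace
        parts = token.split(",")
        # KML allows an optional third value (altitude); it is ignored here.
        ring.append([float(parts[0]), float(parts[1])])
    return ring


print(kml_ring(KML_POLYGON))
```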
TopoJSON
The new hotness. TopoJSON takes in a basic geodata format, like GeoJSON, and spits
out a clever reduction of it by focusing on the part of a map we usually care about:
borders and connections (a.k.a. topology). The details are beyond the scope of this
primer, but you can read about the TopoJSON magic here:
https://github.com/mbostock/topojson/wiki
Remember: different software accepts different file formats for geodata, but at the end
of the day everyone speaks lat/lng. Different file formats are just dialects of the mother
tongue.
- See more at: http://schoolofdata.org/2013/11/09/web-mapping/
Part 2: Building an Election Map
In this guide, we will build a map of the 2012 Egyptian election.
Before building a data visualization, the first question you need to ask yourself is what
the goal of the visualization is. In other words, what should the user get from it? Your goal
needs to satisfy the needs of the graphic’s readers, so knowing your readers is key. To
illustrate, if you want to show comparisons, you might use a bar chart. If you want to
show correlations, a scatter plot would be a great fit. And if you want to present and
organize data, a map or histogram might accomplish your goal.
We want to visualize Egyptian election data and show which governorates were won by the
candidate Morsi and which were won by the candidate Shafik. Then we need
to compare the percentage of votes between the two candidates in each governorate. To fulfill
these two goals, we need two visualizations: a map and a bar chart. Instead of
presenting two separate visualizations, we can build a map where, once we hover over a
governorate, a bar chart pops up to compare the percentage of votes.
In this guide, we will use thematic maps. According to Alberto Cairo, thematic maps are
probably the purest and most successful form of information graphics. They are a
traditional geographical map (a base map like Google Maps or OpenStreetMap) with data
overlays on top of it. These overlays should be designed so the reader has no
difficulty telling them apart. Thematic maps focus on a specific trend or theme in order
to tell a story with data.
The goal of a traditional map is to locate events, while the goal of a thematic map is to
reveal specific trends by combining datasets and tying them to geographic locations.
Thematic maps come in different types. I will discuss three of them:
● Choropleth Maps
● Dot Maps
● Proportional Symbol Maps
This is an example of a choropleth map developed by the Guardian to show the
percentage of poverty in US states. The name of this technique is derived from the
Greek words choros (space) and pleth (value). As you can see, choropleth maps are
based on polygons, which are lists of lat/lngs. These polygons are then stored
in the shapefile format, ready for visualization. Choropleth maps are well suited for
showing percentages, ratios, or any kind of derived data, but they are not suited for
visualizing raw amounts. They can present a distribution effectively because they use
the colored polygon areas themselves as symbols. For example, they are good at
displaying crime rates per thousand people in Cairo’s governorates,
but not the total number of crimes in Cairo’s governorates.
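The difference between a raw count and a derived rate is easy to see with a little arithmetic. This sketch uses two invented areas with made-up numbers (they are not real crime or population figures) and a hypothetical helper called rate_per_thousand.

```python
def rate_per_thousand(count, population):
    """Turn a raw count into a rate per 1,000 people."""
    return count * 1000 / population


# Made-up numbers for illustration only.
areas = {
    "Area A": {"incidents": 500, "population": 1_000_000},
    "Area B": {"incidents": 200, "population": 100_000},
}

for name, data in areas.items():
    rate = rate_per_thousand(data["incidents"], data["population"])
    print(name, "raw count:", data["incidents"], "rate per 1,000:", rate)
```

Area A has more incidents in total, but Area B has the higher rate. Coloring polygons by the raw count would make the bigger area look worse simply because more people live there, which is why choropleth maps should use the rate.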
This is an example of a dot map developed by the New York Times. It represents
the grades given by the NYC health department to the city’s restaurants. We can
conclude that dot maps are best suited for showing a discrete phenomenon where
every dot represents a single unit. So, a restaurant is located at one dot in one
place and nowhere else. Dot maps are not suited for raw or continuous data.
The third type of map I am covering here is the proportional symbol map. The map
below, which was developed by the New York Times, represents the number of votes for
Romney and Obama in US states. Each circle represents a specific number of votes.
Thus, each circle can be scaled up or down depending on the value of a variable in
each place. This makes it the best map for showing raw data.
From the explanation above, you should have concluded that the best way to visualize
our story is a choropleth map, because we will be showing the distribution of
votes for the two candidates. Colored polygons can show where each candidate
won.
If this sounds fun, let’s start building our data visualization!
Our first step is to search for a dataset that contains the election results. This School of
Data blog post contains a useful spreadsheet for the data we need. You can make your
own copy of the dataset to start working on it.
As you can see, the data table has not been cleaned and is not yet ready for analysis and
visualization. For example, we need to clean the first column and present the
governorate names in English only. Thus, we need to remove the Arabic version and the
symbol “/”. Ok let’s do it manually. Oh no! That’s a big waste of our time. There are a lot
of spreadsheet functions that will save us a lot of time in cleaning this up.
First, we’ll need to insert two columns to the right of our target column to hold the
results of the function.
Set the cell equal to the value of the SPLIT() function, where the first parameter is the text
or a reference to the cell whose contents you wish to split, and the second parameter is
the delimiter, i.e. the character at which you will split the contents of the cell.
To split the contents of cell A4 at each instance of the “/” symbol, type in
the following formula:
=SPLIT(A4,"/")
Notice that the contents are split into a new cell at every occurrence of the delimiter. In the
above example, since there is one “/” character in A4, the contents will be split into 2 cells
(B4 and C4). Now drag the small square at the bottom-right corner of the cell down the
column to apply the formula to rows 4 through 31, as shown in the figure below.
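If you prefer to clean the column in code rather than in a spreadsheet, the same split can be sketched in a few lines of Python. The sample cell below is an assumption about what the first column looks like (an Arabic name and an English name separated by “/”), and the helper names are made up; the sketch picks whichever piece is plain ASCII and treats it as the English name.

```python
def split_cell(cell, delimiter="/"):
    """Split a cell on the delimiter and trim spaces, like SPLIT() in a spreadsheet."""
    return [part.strip() for part in cell.split(delimiter)]


def english_part(cell, delimiter="/"):
    """Return the first non-empty piece that is plain ASCII (assumed to be English)."""
    for part in split_cell(cell, delimiter):
        if part and part.isascii():
            return part
    return None


sample = "القاهرة / Cairo"  # hypothetical cell contents
print(split_cell(sample))
print(english_part(sample))
```

Checking for ASCII means the order of the two names in the cell does not matter.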
Our table has the number of votes for each candidate and the total number of
votes in each governorate. To meet the goal we set at the beginning, we need to
calculate each candidate’s percentage of votes, which is easy: divide the candidate’s
number of votes by the total number of votes and multiply by 100. Next, we need to
assign an ID number to each governorate so that we can join the data in later stages.
We will also need a new column for the winner in each governorate. In order
to do this, we will use the IF function:
=If(H2>I2,"1","2")
In other words, we are saying here that for each governorate, if Morsi got the highest
votes then type number 1 and if Shafik got the highest votes then type number 2. We
are using numbers in order to make it easy for the mapping tool to visualize the results.
We will cover this more when we reach the mapping part.
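The same three derived columns (ID, percentages, winner) can be sketched in Python. The rows below are invented placeholders, not the real election results, and the column names are assumptions; the winner code follows the IF formula above, with 1 for Morsi and 2 for Shafik.

```python
def add_derived_columns(row):
    """Add percentage columns and a winner code (1 = Morsi, 2 = Shafik) to a row."""
    morsi_pct = row["morsi_votes"] * 100 / row["total_votes"]
    shafik_pct = row["shafik_votes"] * 100 / row["total_votes"]
    # Same logic as the spreadsheet IF: 1 if Morsi got more votes, otherwise 2.
    winner = 1 if row["morsi_votes"] > row["shafik_votes"] else 2
    return {
        **row,
        "morsi_pct": round(morsi_pct, 2),
        "shafik_pct": round(shafik_pct, 2),
        "winner": winner,
    }


# Placeholder rows with made-up numbers.
rows = [
    {"name": "Governorate A", "morsi_votes": 600, "shafik_votes": 400, "total_votes": 1000},
    {"name": "Governorate B", "morsi_votes": 300, "shafik_votes": 700, "total_votes": 1000},
]

# Number the governorates from 1 so the table can be joined to the map data later.
table = [{"id": i, **add_derived_columns(row)} for i, row in enumerate(rows, start=1)]

for record in table:
    print(record)
```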
Your final cleaned table should look like this:
Notice that I included two additional columns containing the longitude and latitude
of each governorate. These can be useful if you are developing a proportional symbol map.
There are different ways to get these values. One of them is simply using Google
Spreadsheet! It will automatically get the values for you. You can find a tutorial on how
to do this by clicking here. You can also get them manually by using GeoNames.
For the choropleth map, we need a set of polygons that represent the borders of the
Egyptian governorates. As we said, these are stored in the shapefile format.
Shapefiles can be drawn using tools like Google Earth or a GIS tool like QGIS. However,
if you are not a GIS specialist, you will want a ready-made shapefile. You can search the web and
find dozens of free shapefiles. However, you may need to buy a shapefile, which is
available from different geographic software providers, especially if you need a specific
level of detail or if the free shapefiles you found are outdated (e.g. new state
names were added to a country last year).
Luckily enough, I have a shapefile for Egyptian boundaries here: http://www.divagis.org/gdata
After you download the shapefile, open it in GIS software. For this guide,
we will use Quantum GIS (QGIS), which is a great open source tool.
After you have finished the QGIS edits, congratulations! You now have a ready
shapefile.
Now it is time to design our interactive election map! For this purpose, we will use a
design studio called TileMill, from MapBox.
Import your data:
To import data into TileMill as a CSV file, you need column headings on the first row.
The CSV must also contain columns with latitude and longitude coordinates.
However, we are designing a choropleth map, so instead we need to import the
shapefile we just prepared in QGIS.
Start TileMill and click on the “New project” button on the main screen.
Enter a “Filename” for your project and click “Add”. You can leave the other fields alone
for now.
Click on the new project to open it. The project contains a default layer called #countries
styled with some example CartoCSS code.
To add a shapefile layer, first click the “Layers” button located on the bottom left to bring
up the Layers panel.
Now click “Add layer”.
Enter the shapefile as the datasource.
Click the “Save & Style” button. This will add the layer to your project and insert a
default CartoCSS rule for the layer.
Now, you should see your shapefile in the map.
Styling our map:
The code you see in the right pane is called CartoCSS. TileMill uses the
CartoCSS language to determine the look of a map. Colors, sizes, and shapes can all be
manipulated by applying the relevant CartoCSS parameters in the stylesheet panel to the
right of the map. It’s very similar to CSS, the language well known from web
design. Read the CartoCSS manual for a more detailed introduction to the language.
In the previous steps on Importing data, we added a shapefile using the “Save & Style”
button. This button automatically added several styling parameters to your stylesheet
and gave them an initial value.
#electionegypt
This is the layer to which the styles are applied.
polygon-fill
This is the color of the inside of the polygon. There are two methods for changing color
values. You can either type in a new value, or you can use the color swatches at the
bottom of the CartoCSS panel. Try changing the polygon color by clicking the light red
swatch and selecting a new color.
Click “Save” in the color picker to see your changes. Notice the corresponding color value is
updated in the CartoCSS.
line-width
This is the width of the white border lines in the figure above. You can change it to make the borders thinner or thicker.
Now we have a beautiful red map, but it still doesn’t serve our goal of presenting the
distribution of votes for the two candidates.
In order to do so, we need conditional styles.
Conditional CartoCSS styles allow you to change the appearance of the polygons on
your map based on attributes in the data.
Review the available data for the layer in the feature inspector.
Find the column called EgyptEle_8 and examine the range of values. This will help you
decide how to style the polygons. As you can see, this is the Winner column from our data
table. QGIS renamed the column when we joined the tables.
Add the following CartoCSS rule to the bottom of your stylesheet. In plain English, the
CartoCSS rule we added says that whenever we have 1 in the Winner column
“EgyptEle_8”, set the polygon color to scale3. We already defined scale3 in the code
above.
One important note when it comes to coloring a map is that the designer of a graphic
should achieve unity in colors. If you examine the colors used in the map below, you will
discover that it is cluttered with too many different colors. The designer could have
achieved better unity in this graphic by using a monochromatic color scheme, starting
with a neutral color at range 0 and grading into an accent color at range
1000.
I would advise you to use ColorBrewer which can help you set a color gradient
effectively.
If you apply the CartoCSS code below and click Save, you will get this map:
This map shows that the governorates where Morsi got the highest number of votes are
colored in blue while the governorates where Shafik got the highest number of votes
are colored in red.
Nevertheless, how can the reader of this map know which color refers to which candidate? That’s
why we need to design a legend. A legend is permanently displayed on a map and is useful for
displaying titles, descriptions, and keys for what is being mapped. It can be styled using
HTML, or it can simply contain an image. For a legend, try not to use more than six
classes unless it is necessary. The average human eye finds it difficult to distinguish
more than six shades of a hue.
Let’s add a legend that describes the theme of the map.
Open the Templates panel by clicking on the pointer button in the bottom left.
The Legend tab is open by default.
Enter your legend text/HTML in the Legend field. TileMill already has ready-made HTML code
for awesome legends. You can check them here. We will use this example:
Copy the code into your legend menu and make necessary changes in colors. We are
specifying here the same two colors in our map to identify the votes for Morsi and
Shafik.
Click Save and voila! Your map should now look like this:
Our goal at the very beginning of this guide was to build a map where, once we hover over a
governorate, a bar chart pops up to compare the percentage of votes. In order to build this
bar chart, we’ll need tooltips.
Tooltips allow you to make maps interactive with dynamic content that appears when a
user hovers over or clicks on a map. They can contain HTML and are useful for
revealing additional data, images, and other content.
Open the Templates panel by clicking on the pointer button on the bottom left.
Click on the “Teaser” tab. Teaser content appears when you hover over a feature and
Full content appears when you click on a feature. You can use the Location field to
define a URL to be loaded when a feature is clicked.
Select the “electionegypt” layer to use it for interaction. TileMill only supports one interactive
layer at a time.
The data fields for the layer are displayed wrapped in curly Mustache tags. The tags
represent the columns in our data table. These tags will be replaced by data when you
interact with the map. Locate the fields you want to use.
Write your template using the Mustache tags. Paste the following code into the Teaser
field and use the preview to make sure it looks good:
Percentage of votes in<br/>{{{EgyptElect}}} for Morsi is {{{EgyptEle_6}}} and Shafik
{{{EgyptEle_7}}}
Click “Save” to save your settings and refresh the map. Close the panel by clicking the
close button (X) or by pressing the ESC key. Move your mouse over some governorates to see
the tooltips. Cool, huh?
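If you are curious what happens to those curly tags, here is a tiny Python sketch of the idea. It is not the real Mustache library, just a simplified stand-in that swaps each {{{field}}} tag for the matching value of the hovered feature. The feature values below are invented placeholders.

```python
import re

TEASER = (
    "Percentage of votes in<br/>{{{EgyptElect}}} for Morsi is {{{EgyptEle_6}}} "
    "and Shafik {{{EgyptEle_7}}}"
)


def render(template, feature):
    """Replace each {{{field}}} tag with the feature's value (empty if missing)."""
    def substitute(match):
        return str(feature.get(match.group(1), ""))

    return re.sub(r"\{\{\{\s*(\w+)\s*\}\}\}", substitute, template)


# Placeholder values standing in for one governorate's row of data.
feature = {"EgyptElect": "Governorate A", "EgyptEle_6": 60.0, "EgyptEle_7": 40.0}
print(render(TEASER, feature))
```

Every feature on the map has its own set of values, so the same template produces a different tooltip for each governorate.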
We built a great teaser; however, I promised you a bar chart. So how do we
implement this?
We’ll use the Google Chart API, a tool that allows you to
embed dynamic charts and graphs in the interactive space of TileMill. Little to no
programming experience is needed to adjust an existing chart in their gallery or build
one from scratch in their interactive chart playground. I will not go into detail about the
Google Chart API here. You can follow this tutorial for this purpose.
You can copy the code into the Teaser:
<img src="http://chart.apis.google.com/chart?
&chxt=x,y&chxl=0:|
&chxr=0,1
&chxs=0,BD0026,10,0,l,BD0026|1,BD0026,10,0,l,BD0026|2,BD0026,10,0,l,BD0026
&chls=3,1,0
&chbh=a,4,4
&chg=14.3,9,1,1
&chs=250x250
&cht=bvg
&chco=24416f,ad232d
&chdl=Votes for Morsi|Votes for Shafik
&chm=b,24416f,0,0,0|b,ad232d,0,0,0
&chd=t:{{{EgyptEle_6}}}|{{{EgyptEle_7}}}
&chtt=++++++++++Percentage of Votes in {{{EgyptElect}}}+++
&chts=BD0026,10.5
" width="150" height="250" alt="" />
When you hover over a governorate, you will get a bar chart showing the percentage of
votes for that specific governorate.
Congratulations! You successfully built an awesome Election Map!
Now that you’ve designed a cool map, you can upload it by clicking “Export.” The
easiest way to share your map is via the free service from MapBox, the company behind
TileMill. You can upload your maps there and then embed them in any site
the same way you embed a YouTube video. You can find more about exporting in this
tutorial.
Finally you can upload the map to the web via MapBox and enjoy playing with it. Here’s
mine: http://a.tiles.mapbox.com/v3/datascientist.cc9v0a4i/page.html