How to load CSO data into PostgreSQL

Lab 4
DT786
Getting CSO data into R
In this lab we will cover:
1) Describing data.frame and SpatialPolygonsDataFrame objects.
2) Loading shape files for the Dublin Electoral Divisions (dublin.eds).
3) Adding an additional column to a data.frame.
4) Saving the changes in a shape file.
5) Plotting the contents of a shape file.
6) Downloading education and car ownership data from CSO.
7) Adding the education data from the CSO to R data.frame.
8) Adding the car ownership data from the CSO to R data.frame.
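First we must load the required packages:
library(spdep)        # spatial dependence: neighbour weights and tests
library(maptools)     # readShapePoly() and writePolyShape(); archived on CRAN in 2023, so an older R setup is assumed
library(RColorBrewer) # colour palettes such as brewer.pal()
library(classInt)     # class intervals (breaks) for thematic maps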
1.1) In R, a data frame (data.frame) stores several vectors of equal length in a
single variable. Each vector becomes a column, and different columns can have
different types: for example, one column may hold factors, another character strings,
and another numbers. Data frames are tightly coupled collections of variables which
share many of the properties of matrices and of lists, and they are the fundamental
data structure used by most of R's modelling software. There are different ways to
create and manipulate data frames. Here are some examples.
L3 <- LETTERS[1:3]
# numeric columns x and y, plus a column of randomly chosen letters
df1 <- data.frame(cbind(x=1, y=1:10), fac=sample(L3, 10, replace=TRUE))
The same with automatic column names:
df2 <- data.frame(cbind(1, 1:10), sample(L3, 10, replace=TRUE))
is.data.frame(df1)   # returns TRUE
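Each column of a data frame is one vector with its own type, which can be checked column by column. A small sketch that rebuilds df1 so it runs on its own (set.seed() is added only to make sample() reproducible; from R 4.0 the fac column is character rather than factor by default):

```r
set.seed(1)                         # reproducible random letters
L3 <- LETTERS[1:3]
df1 <- data.frame(cbind(x = 1, y = 1:10), fac = sample(L3, 10, replace = TRUE))

dim(df1)             # 10 rows, 3 columns
sapply(df1, class)   # one type per column
df1$y[1:3]           # a column is an ordinary vector
df1[2, ]             # a row is a one-row data frame
```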
The data.frame structure is used as part of spatial classes such as
SpatialPolygonsDataFrame and SpatialPointsDataFrame to hold the attributes of spatial
objects.
A SpatialPolygonsDataFrame is made up of a data.frame and an object of class
SpatialPolygons. The SpatialPolygons object holds a list of objects of class Polygons,
and each Polygons object holds a list of objects of class Polygon. See Lab 3 on
examining the structure of spatial objects.
There is also a SpatialPointsDataFrame, which pairs a data.frame with an object of
class SpatialPoints holding the point coordinates.
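To make the nesting concrete, here is a sketch that builds a tiny SpatialPolygonsDataFrame by hand with the sp package, which provides these classes. The two square "EDs", their IDs and the counts are made up for illustration:

```r
library(sp)

# two made-up adjacent unit squares; clockwise rings mark outer boundaries
sq1 <- Polygon(cbind(c(0, 0, 1, 1, 0), c(0, 1, 1, 0, 0)))
sq2 <- Polygon(cbind(c(1, 1, 2, 2, 1), c(0, 1, 1, 0, 0)))

# each Polygons object wraps a list of Polygon rings and carries an ID
ed1 <- Polygons(list(sq1), ID = "ed1")
ed2 <- Polygons(list(sq2), ID = "ed2")

# SpatialPolygons holds the list of Polygons objects
sp_polys <- SpatialPolygons(list(ed1, ed2))

# attribute table: row names must match the Polygons IDs
attrs <- data.frame(MALE1_1 = c(120, 95), FEMALE1_1 = c(130, 101),
                    row.names = c("ed1", "ed2"))
spdf <- SpatialPolygonsDataFrame(sp_polys, data = attrs)

slotNames(spdf)                          # includes "data" and "polygons"
class(spdf@polygons[[1]])                # a Polygons object
class(spdf@polygons[[1]]@Polygons[[1]])  # a Polygon object
spdf@data
```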
2) How to load shape files for the Dublin Electoral Division (dublin.eds).
The Dublin Electoral Division data was exported from PostgreSQL as a shape file. It
contains all the geometry and some of the attributes (MALE1_1 and FEMALE1_1) for
the Dublin EDs. Copy the shape file (dublinEds.shp together with dublinEds.shx and
dublinEds.dbf) from student distrib to C:\My-R-Dir.
The file can be loaded into R with the command:
dublin.eds <- readShapePoly("C:\\My-R-Dir\\dublinEds.shp")
Note that this is the same data as used in the Spatial Databases course, but in R it is
called dublin.eds.
Examine dublin.eds:
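names(dublin.eds)           # attribute (column) names
is(dublin.eds)              # class and the classes it extends
is(dublin.eds$MALE1_1)      # type of one attribute column
dublin.eds$FEMALE1_1        # values of one attribute column
coordinates(dublin.eds)     # one label point per polygon
getClass(class(dublin.eds)) # class definition, including its slots
#Examine slots
slotNames(dublin.eds)
dublin.eds@data
dublin.eds@polygons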
3) Add a new column called POP to dublin.eds to store the sum of the male
and female populations for each ED.
dublin.eds@data$POP <- dublin.eds@data$MALE1_1 + dublin.eds@data$FEMALE1_1
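The same column arithmetic works on any data frame. A minimal sketch on a hypothetical three-row table standing in for dublin.eds@data (the labels and counts are invented):

```r
# hypothetical attribute table standing in for dublin.eds@data
eds <- data.frame(SAPS_LABEL = c("ED A", "ED B", "ED C"),
                  MALE1_1    = c(1200, 950, 2010),
                  FEMALE1_1  = c(1300, 1010, 1990))

# element-wise sum of two columns, stored as a new column
eds$POP <- eds$MALE1_1 + eds$FEMALE1_1
eds
```

If either count is NA for an ED, its POP will be NA too; rowSums(eds[, c("MALE1_1", "FEMALE1_1")], na.rm = TRUE) is one way to treat missing counts as zero, if that is what you want.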
4) Save the above changes in a shape file set called dubeds1:
details <- paste("C:\\My-R-Dir", "dubeds1", sep="/")
writePolyShape(dublin.eds,details)
A new shape file called dubeds1.shp (together with dubeds1.shx and dubeds1.dbf)
will be created in C:/My-R-Dir. It will contain the POP column calculated in the
current R session.
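paste(..., sep="/") above builds a path that mixes a backslash and a forward slash, which Windows accepts. A sketch of the more portable base-R alternative, file.path(), and of the three files a shape file set normally consists of (the directory is the lab's; nothing is written here):

```r
mixed    <- paste("C:\\My-R-Dir", "dubeds1", sep = "/")
portable <- file.path("C:/My-R-Dir", "dubeds1")
cat(mixed, "\n")     # C:\My-R-Dir/dubeds1
cat(portable, "\n")  # C:/My-R-Dir/dubeds1

# a shape file set is several files sharing one base name
shape_files <- paste0(portable, c(".shp", ".shx", ".dbf"))

# after writePolyShape() has run, these should all be TRUE
file.exists(shape_files)
```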
5) Plot the contents of dubeds1.shp
Load the file that you saved in step 4. It should contain the Dublin ED data and
the total population:
dublin.eds <- readShapePoly("C:\\My-R-Dir\\dubeds1.shp")
Load a colour library
library(RColorBrewer)
Set some colours:
pop8 <- brewer.pal(8,'Set2')
Use fixed ranges (breaks) for thematic colouring:
spplot(dublin.eds, "POP", col.regions=pop8, at=c(500,1000,2000,3000,4000,5000,6000,7000,8000), main='Dublin Population')
You should get a map of the total population per ED.
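With at= the breaks define the classes: nine break values give eight intervals, one per colour in pop8. A base-R sketch with made-up populations shows which interval each value falls into. Values below the first break or at/above the last (class 0 or 9 here) fall outside every interval and will typically be left uncoloured by spplot, so it is worth choosing breaks that cover the full range of POP:

```r
brks <- c(500, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000)
n_classes <- length(brks) - 1           # 8, matching brewer.pal(8, 'Set2')
pop <- c(450, 1500, 2600, 7999, 9100)   # hypothetical ED populations

# 0 = below the first break, 9 = at or above the last break
cls <- findInterval(pop, brks)
data.frame(pop, cls)

# FALSE here: two of the values lie outside the breaks
all(cls >= 1 & cls <= n_classes)
```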
6) Downloading education and car ownership data from CSO.
The CSO census for 2006 consists of over 70 themes. Each of these themes contains
several components. We will use three steps to get CSO data into R.
1) Download a particular topic for a given area. We will download the education
and car ownership data for the Dublin Electoral Divisions.
2) Load the new topic into a data.frame specifically designed for that topic.
3) Move the new topic to the Electoral Division (dublin.eds) table that contains
all topics and the geometry of each ED.
Download Education information
Go to the CSO Census 2006 reports page for loading into PostgreSQL:
http://census.cso.ie/census/ReportFolders/ReportFolders.aspx
Then follow these steps:
Scroll to Dublin City
Select Dublin City and download as a CSV file.
Set the dimension order.
Save the file as DublinEducation.csv in C:\My-R-Dir
Open DublinEducation.csv in TextPad or Notepad and delete the title line:
"Theme 10 - 4 : Persons aged 15 and over by sex, principal economic status and highest level of education completed, 2002"
Each generic theme has one or more components, and it is the theme components that
actually get downloaded. We are interested only in the educational parts of Theme
10 - 4. Note that the data from the CSO does not contain geometry. Selected columns from
the downloaded table will later be loaded into the main ED data frame dublin.eds, so you
must have dublin.eds loaded into R.
Now in R read in the file:
dubeduc <- read.csv(file="C:\\My-R-Dir\\DublinEducation.csv", header=TRUE)
names(dubeduc)
names(dublin.eds)
Output of names(dubeduc):
"Geographic.Area"
"No.formal.education"
"Primary.education"
"Lower.secondary.education"
"Upper.secondary"
"Technical.or.vocational.qualification"
"Upper.secondary.and.technical.or.vocational"
"Non.degree"
"Primary.degree"
"Professional.qualification..degree.status."
"Both.degree.and.professional.qualification"
"Post.graduate.certificate.or.diploma"
"Post.graduate.degree..masters."
"Doctorate..PhD."
"Not.stated"
"Total"
Output of names(dublin.eds):
"SAPS_LABEL"
"FORMAL_EDU"
"PRIMARY_ED"
"LOWER_SECO"
"UPPER_SECO"
"TECHNICAL1"
"UPPER_S_01"
"NON_DEGREE"
"PRIMARY_DE"
"PROFESSION"
"BOTH_DEGRE"
"POSTGRADUA"
"POSTGRA_01"
"DOCTORATE1"
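The dotted CSO names in the first list are produced by read.csv(), which by default (check.names = TRUE) turns spaces and punctuation in the header into dots. As an alternative to deleting the title line in a text editor, read.csv() can skip it with skip = 1, assuming the title occupies exactly one line. A sketch on a small inline CSV; the title text and the counts are invented:

```r
csv_text <- 'Hypothetical title line for Theme 10 - 4
"Geographic Area","No formal education","Doctorate (PhD)"
"ED A",12,3
"ED B",8,5'

# skip = 1 ignores the title line; the next line becomes the header
edu <- read.csv(text = csv_text, skip = 1, header = TRUE)

# spaces and brackets have become dots, as in names(dubeduc)
names(edu)
```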
Add the CSO data to dublin.eds as follows:
dublin.eds@data$FORMAL_EDU <- dubeduc$"No.formal.education"
dublin.eds@data$PRIMARY_ED <- dubeduc$"Primary.education"
dublin.eds@data$PROFESSION <- dubeduc$"Professional.qualification..degree.status."
Check that the data has been correctly updated:
sideBySide <-paste(dublin.eds@data$SAPS_LABEL,
dublin.eds@data$PROFESSION, dubeduc$"Geographic.Area",
dubeduc$"Professional.qualification..degree.status.")
writeLines(sideBySide)
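paste() only puts the two columns side by side; the copy itself assumes the CSO rows are in the same order as the ED rows. A sketch of a check with match(), assuming SAPS_LABEL and Geographic.Area spell each ED identically (the labels and counts are hypothetical):

```r
eds <- data.frame(SAPS_LABEL = c("ED A", "ED B", "ED C"))
cso <- data.frame(Geographic.Area = c("ED A", "ED C", "ED B"),
                  Professional.qualification..degree.status. = c(40, 15, 22))

# row of the CSO table holding each ED, in dublin.eds order
idx <- match(eds$SAPS_LABEL, cso$Geographic.Area)
all(!is.na(idx))                    # TRUE if every ED was found
identical(idx, seq_len(nrow(eds)))  # TRUE only if the orders already agree

# copy by name rather than by position
eds$PROFESSION <- cso$Professional.qualification..degree.status.[idx]
eds
```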
You do not have to check the rest of the data.
Add the rest of the education data in a similar fashion.
You can save the newly entered data as follows:
details <- paste("C:\\My-R-Dir", "dubeds2", sep="/")
writePolyShape(dublin.eds,details)
Plot the map of professionals:
Load a colour library if necessary
library(RColorBrewer)
Set some colours:
pop8 <- brewer.pal(8,'Set2')
Compute equal-interval breaks for thematic colouring:
lower <- min(dublin.eds@data$PROFESSION)
upper <- max(dublin.eds@data$PROFESSION)
# 9 equally spaced breaks give 8 classes, one per colour in pop8
brks <- seq(lower, upper, length.out=9)
Now plot the map with the above breaks:
spplot(dublin.eds, "PROFESSION", col.regions=pop8, at=brks, main='Dublin Professionals')
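For equal intervals the step is the range, (upper - lower) / 8, starting at the minimum. Dividing the sum (lower + upper) by 8 is not the class width: the breaks intrv, 2*intrv, ..., 8*intrv end at lower + upper rather than at upper, generally do not start at lower, and eight break values give only seven classes for eight colours. A toy comparison with made-up minimum and maximum counts:

```r
lower <- 5     # hypothetical minimum PROFESSION count
upper <- 85    # hypothetical maximum PROFESSION count

# sum-based step: 8 break values, only 7 classes, last break past the maximum
intrv <- (lower + upper) / 8
old_brks <- intrv * 1:8                          # 11.25, 22.5, ..., 90

# equal intervals over the actual range: 9 breaks, 8 classes
new_brks <- seq(lower, upper, length.out = 9)    # 5, 15, ..., 85
length(new_brks) - 1
```

classIntervals(x, n = 8, style = "equal")$brks from the classInt package, loaded at the start of the lab, should give comparable equal-interval breaks.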
You should get a map depicting the number of professionally qualified people per ED.
How would you make a map displaying the density of professionals per ED?
The following updates all the dublin.eds education fields with the CSO values:
dublin.eds@data$"FORMAL_EDU" <- dubeduc$No.formal.education
dublin.eds@data$"PRIMARY_ED" <- dubeduc$Primary.education
dublin.eds@data$"LOWER_SECO" <- dubeduc$Lower.secondary.education
dublin.eds@data$"UPPER_SECO" <- dubeduc$Upper.secondary
dublin.eds@data$"TECHNICAL1" <- dubeduc$Technical.or.vocational.qualification
dublin.eds@data$"UPPER_S_01" <- dubeduc$Upper.secondary.and.technical.or.vocational
dublin.eds@data$"NON_DEGREE" <- dubeduc$Non.degree
dublin.eds@data$"PRIMARY_DE" <- dubeduc$Primary.degree
dublin.eds@data$"PROFESSION" <- dubeduc$"Professional.qualification..degree.status."
dublin.eds@data$"BOTH_DEGRE" <- dubeduc$"Both.degree.and.professional.qualification"
dublin.eds@data$"POSTGRADUA" <- dubeduc$"Post.graduate.certificate.or.diploma"
dublin.eds@data$"POSTGRA_01" <- dubeduc$"Post.graduate.degree..masters."
dublin.eds@data$"DOCTORATE1" <- dubeduc$"Doctorate..PhD."
dublin.eds@data$"NOT_STATED" <- dubeduc$"Not.stated"
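The fourteen assignments above can also be driven by a lookup table, which makes a misspelt CSO column name stop with an error instead of silently assigning NULL (and so dropping the target column). A sketch; copy_columns is a hypothetical helper, the map pairs the names exactly as the assignments do, and, like the lab, it assumes both tables list the EDs in the same order:

```r
# lookup table: dublin.eds column name -> CSO column name
edu_map <- c(FORMAL_EDU = "No.formal.education",
             PRIMARY_ED = "Primary.education",
             LOWER_SECO = "Lower.secondary.education",
             UPPER_SECO = "Upper.secondary",
             TECHNICAL1 = "Technical.or.vocational.qualification",
             UPPER_S_01 = "Upper.secondary.and.technical.or.vocational",
             NON_DEGREE = "Non.degree",
             PRIMARY_DE = "Primary.degree",
             PROFESSION = "Professional.qualification..degree.status.",
             BOTH_DEGRE = "Both.degree.and.professional.qualification",
             POSTGRADUA = "Post.graduate.certificate.or.diploma",
             POSTGRA_01 = "Post.graduate.degree..masters.",
             DOCTORATE1 = "Doctorate..PhD.",
             NOT_STATED = "Not.stated")

# hypothetical helper: copy each mapped source column into the target
copy_columns <- function(target, source, mapping) {
  for (dest in names(mapping)) {
    src <- mapping[[dest]]
    if (!src %in% names(source)) stop("missing CSO column: ", src)
    target[[dest]] <- source[[src]]
  }
  target
}

# in the lab: dublin.eds@data <- copy_columns(dublin.eds@data, dubeduc, edu_map)
# toy demonstration with two made-up EDs and two mapped columns
tgt <- data.frame(SAPS_LABEL = c("ED A", "ED B"))
src <- data.frame(No.formal.education = c(3, 4),
                  Professional.qualification..degree.status. = c(7, 9))
out <- copy_columns(tgt, src, edu_map[c("FORMAL_EDU", "PROFESSION")])
out
```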
It is a good idea to save your data after changes.
It is also a good idea to increment the file version each time, as follows:
writePolyShape(dublin.eds, "C:\\My-R-Dir\\dubeds3")
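If you save often, building the versioned base name in one place avoids typos in the path. A small sketch; versioned_name and the "dubcars" stem are hypothetical, and the result is meant to be passed to writePolyShape() without a file extension, as above:

```r
# hypothetical helper: "<dir>/<stem><n>", e.g. C:/My-R-Dir/dubeds3
versioned_name <- function(n, dir = "C:/My-R-Dir", stem = "dubeds") {
  file.path(dir, sprintf("%s%d", stem, n))
}

versioned_name(3)
versioned_name(1, stem = "dubcars")   # e.g. for the car ownership data
# in the lab: writePolyShape(dublin.eds, versioned_name(3))
```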
Add the car data called:
"Theme 15 - 1 : Number of households with cars, 2006"
in a similar fashion.
Download