Introduction to Geographic Information Systems
Fall 2013 (INF 385T-28620)

Geodatabases

Dr. David Arctur
Research Fellow, Adjunct Faculty
University of Texas at Austin
Lecture 4
September 19, 2013

Outline
• Tables
• Geocodes
• Data table joins
• Spatial joins
• Spatial data formats
• Geodatabases
• Calculating geometry

INF385T(28620) – Fall 2013 – Lecture 4

Lecture 4 TABLES

Two kinds of tables in ArcGIS
• Feature attribute table of map layer
  • Attribute data is part of map layers
• Data table with geocodes (such as census IDs)
  • Can add as table to ArcMap
  • Can join to map layer to add more attributes to layer
  • Join via same geocode values in both the data table and map layer’s attribute table
  • Census data example: there are too many census variables to supply them all in the feature attribute table, so download a custom table and join it to the appropriate polygon layer

Data table format
• Rectangular table with one value per cell
• Columns (fields) are attributes
• Rows are observations (records)

Data table format
• First row must have column names that are self-documenting labels
  • E.g., Shape, POP2000
• First character of attribute name must be a letter
• Remaining characters can be any letter, digit, or the underscore character (but no blanks)

Data table format
• All additional rows of a data table must contain only attribute values (raw data)
• None of the rows can be sums, averages, or other statistics for raw data rows

Primary keys
• Each table has a primary key attribute with two properties
  • Each value is unique
  • There are no null values

Field calculator
• Add computed columns in ArcGIS
  • ArcGIS does not have the query capacity of relational database packages to compute new columns on the fly
  • So, must create permanent new columns
• Full range of computation
  • Can add, multiply, etc.
  • Has numeric and text functions
  • Can concatenate text values

Field calculator (numeric)

Field calculator (text)
• Concatenate house number and street fields

External table file formats for import into ArcGIS
• Plain ASCII text with comma-separated values (.csv)
  • Very transportable format, very large files
  • Each table record is a row terminated with a line-break character (invisible, nonprinting value)
  • Has values separated by a delimiter, usually a comma
  • For data values that contain the delimiter, enclose the value in double quotes
  • Sometimes columns get the wrong data type on import (use double quotes to force text data type for digits, say for house numbers)

External table file formats for import into ArcGIS
• Excel (.xls, .xlsx)
  • Excel 2003, up to 65,536 rows and 256 columns
  • Excel 2007, up to 1,048,576 rows and 16,384 columns
• dBase database table (.dbf)
  • Legacy format
  • ArcMap truncates field names to the first 10 characters
  • dBase IV has maximum of 255 columns
  • Can open dBase file in Excel but cannot save dBase from Excel
• Microsoft Access database (.mdb)
  • Up to 2 GB file size
  • See following for other limits: http://www.databasedev.co.uk/access_specifications.html

Lecture 4 GEOCODES

Geocodes (2000)
• Federal Information Processing Standards (FIPS)
  • Developed by the National Institute of Standards and Technology
  • Codes for place-names throughout the world
    • Countries
    • States/provinces
    • Counties
    • Metropolitan statistical areas (MSAs)
    • Cities
    • Places: Indian reservations, airports, and post offices in the US
• See http://www.genesys-sampling.com/pages/Template2/site2/61/default.aspx for additional geocodes.
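
Geocodes are a common case of the data-type caveat noted under the .csv format: many codes have leading zeros, so they need to stay text rather than become numbers. The sketch below is only an illustration in plain Python, not ArcGIS's import logic. The column names and sample rows are hypothetical, and Python's csv module reads every value as a string regardless of quoting, so it shows the effect of text typing rather than how ArcGIS decides field types.

```python
import csv
import io

# Hypothetical sample table: column names and values are made up for illustration.
SAMPLE_CSV = (
    'COUNTY_FIPS,HOUSE_NUM,STREET,NAME\n'
    '"003","1690","Seaton St","Smith, Jones and Co."\n'
    '"129","12","Main St","Acme Surveying"\n'
)


def read_table(text):
    """Parse CSV text into a list of dicts, keeping every value as a string."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)


rows = read_table(SAMPLE_CSV)

# A quoted value keeps its embedded comma instead of being split into two cells.
firm_name = rows[0]["NAME"]

# Kept as text, the county code retains its leading zeros.
county_text = rows[0]["COUNTY_FIPS"]

# Converted to a number, the leading zeros are lost.
county_number = int(county_text)

# Text concatenation, analogous to joining house number and street fields.
address = rows[0]["HOUSE_NUM"] + " " + rows[0]["STREET"]

print(firm_name)
print(county_text, county_number)
print(address)
```

A geocode that has been turned into a number no longer matches the text key in a map layer's attribute table, which is why the field type matters before attempting a join.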
Geocodes: hierarchical
• FIPS codes (political boundaries)
  • Country: US
  • State: 42 (Pennsylvania)
  • County: 003 (Allegheny)
  • Minor civil division: 4200361000 (Pittsburgh)
• Census codes (statistical boundaries)
  • Tract: 1917
  • Block group: 003
  • Block: 005 (US420031917003005)
• Local government cadastral data (legal boundaries)
  • Parcel block & lot number: 0096-P-00210000000 (1690 Seaton St, Pittsburgh, PA 15226)

World and US

US and state 42
State 42 and county 003

County 003 and municipality 61000
Municipality 61000 and tract 1917

Tract 1917 and block group 003
Block group 003 and block 005

Geocodes (2010)
• ANSI codes
  • American National Standards Institute codes
  • Replace the Federal Information Processing Standards (FIPS)
• The entities covered include:
  • States and statistically equivalent entities
  • Counties and statistically equivalent entities
  • Named populated and related location entities (such as places and county subdivisions)
  • American Indian and Alaska Native areas
• See http://www.census.gov/geo/www/ansi/ansi.html

Lecture 4 DATA TABLE JOINS

Review: Table joins
• Puts two tables together, on the fly, to make one table
• One-to-one join (e.g., join state attribute data to state shapefile by StateName)
• One-to-many join (e.g., join code table to feature attribute table to add code description. Many records can use the same code value.)
• Each table in a join must have a key attribute for matching
• The key must have the same values and data types in both tables

Example join
(figure: table + table = joined table)

Problems with joins
• Field types are different (e.g., one is numeric and one is text)
• Text values left align while numeric values right align

Solution
• Create a new field of the same type and use Field Calculator

Solution
• Key fields in both tables now have the same field type

Problems with joins
• Data format varies
• Must remove dashes

Lecture 4 SPATIAL JOINS

Spatial joins
• Joins using shape (not attribute field)
• Enables data aggregation (counting or summing points by polygon)
• Common spatial joins
  • Points to polygons (counts)
  • Polygons to points (adds text)
  • Points to points (distances)

Points to polygons
• How many businesses are in each neighborhood?
• Start with:
  • Business points
  • Neighborhood polygons

Points to polygons
• Right-click neighborhoods > Joins and Relates > Join

Spatial join result
• New polygon layer with count of points (number of architects and engineers)

Spatial join result
• Show as a choropleth map with labels, or table

  Neighborhood Name            Count
  Central Business District       53
  Southside Flats                 14
  Shadyside                        9
  Bloomfield                       8
  Lower Lawrenceville              8
  North Shore                      8
  Squirrel Hill South              6
  Strip District                   6
  Point Breeze                     4
  Squirrel Hill North              4
  Garfield                         3
  South Oakland                    3
  Friendship                       2
  North Oakland                    2
  Carrick                          2
  Central Lawrenceville            2
  East Allegheny                   2
  Mount Washington                 2
  East Liberty                     1
  Central Northside                1
  Westwood                         1
  Banksville                       1
  Brookline                        1
  Perry North                      1
  Highland Park                    1
  Larimer                          1
  Allegheny West                   1
  Middle Hill                      1
  Bluff                            1
  Southside Slopes                 1

Polygons to points
• What neighborhood is a business in?
• Start with:
  • Business points
  • Neighborhood polygons

Polygons to points
• Right-click business points > Joins and Relates > Join

Spatial join result
• Point shapefile with neighborhood data on each business

Points to points
• How close is the nearest bus stop to a business?
• Start with:
  • Business points
  • Bus stop points

Points to points
• Right-click business points > Joins and Relates > Join

Result
• Distance field added to new layer of businesses and stops joined

Lecture 4 SPATIAL DATA FORMATS

Esri legacy format: Coverage
• Folder with multiple files
• Can have points, lines, and/or polygons
• Has several intermediate data products (topology) to speed up processing (now calculated on the fly)

Esri legacy format: Shapefile
• Multiple files, all with the same name but different file extensions
• No intermediate data products, but has indices to speed data processing
• Widely used to share spatial data files

Shapefiles
• ArcView native format
• Minimum files
  • .shp: stores feature geometry
  • .shx: stores index of features
  • .dbf: stores attribute data
• Additional files
  • .prj: projection data
  • .xml: metadata
  • .sbn and .sbx: store additional indices

CAD drawings
• CAD software
  • Autodesk AutoCAD (.dwg, .dxf)
  • Bentley MicroStation (.dgn)
• Often used by engineering companies
• Better digitizing precision

CAD drawings

Lecture 4 GEODATABASES

Geodatabases
• A geodatabase is a container used to hold a collection of datasets (GIS features, tables, raster images, and other objects)
• Example: World.gdb holding a Country layer and a Graticule layer

Enterprise geodatabases
• Practically unlimited size and multiple simultaneous users
• Use enterprise data management systems
• Store spatial datasets in a number of DBMSs: IBM DB2, Microsoft SQL Server, Oracle, or PostgreSQL

Personal geodatabase
• Parallels enterprise geodatabase but on PC
• Stores datasets in a Microsoft Access .mdb file
• Limited to 2 GB
• Much overhead in space and extra structure
• Tempting to apply one’s own Access skills, but needs ArcGIS Catalog utility for manipulation

File geodatabase
• An Esri replacement for shapefiles
  • Vector and raster map layers
  • Other objects (tables)
• Stores one or more datasets in a folder of files with .gdb extension
• Can be up to 1 TB in size
• Can be used across platforms
• Can be compressed and encrypted for read-only, secure use

View geodatabases
• Cannot identify names in Windows Explorer
• Must use ArcCatalog

Non-Esri vector formats
• Interoperability
  • Ability of different vendors’ hardware and software to share data
  • Driven by the Internet with standards evolving for open data access (International Organization for Standardization, Open Geospatial Consortium, US Federal Geographic Data Committee)
• Over 110 vector file formats available in ArcGIS Data Interoperability extension (http://www.esri.com/library/fliers/pdfs/data-interop-formats.pdf)

KML (Keyhole Markup Language)
• XML schema for Internet-based maps
  • Originally created by Keyhole, Inc. for viewing satellite images; Keyhole was purchased by Google and its viewer became Google Earth
  • Provides a set of features (points, lines, polygons, images, text, etc.) with lat/long coordinates plus altitude for 3D viewing
  • KMZ is zipped KML and associated files, needed for upload to Google Maps
• Portability
  • Can import and export KML/KMZ via ArcToolbox in ArcGIS
  • Can upload to Google Maps from your computer

X,y data
• Point data table with x and y attributes
• Increasingly popular to include x and y with data
• Commonly used for GPS data

Lecture 4 CALCULATING GEOMETRY

Point centroids
• When displaying or analyzing small polygons it is often better to use point centroids

Calculate x,y fields
• Add new x and y fields in the attribute table

Calculate x,y fields
• Calculate geometry for x field, repeat for y

X,y field results
• Results are x and y values based on map properties (e.g., Long/Lat or x,y feet)

Export table with x,y values

Add x,y data table

Export features
• X,y events should be exported as permanent shapefile or feature class

Count point centroids
• Population centroids can be spatially joined to a buffer around polluting companies

Other geometry calculations
• Area
• Perimeter
• Length

Summary
• Tables
• Geocodes
• Data table joins
• Spatial joins
• Spatial data formats
• Geodatabases
• Calculating geometry
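
As a recap of the Calculating geometry section, the sketch below shows one way the centroid, area, and perimeter values could be computed for a single polygon. It uses the standard shoelace formula in plain Python and is not ArcGIS's implementation. The function name polygon_geometry and the sample parcel are hypothetical, and the coordinates are assumed to be planar (for example x,y feet), not longitude/latitude. The polygon is assumed to be simple and to have nonzero area.

```python
import math


def polygon_geometry(vertices):
    """Return (area, perimeter, (cx, cy)) for a simple polygon.

    vertices: list of (x, y) tuples in planar units; the first vertex
    is not repeated at the end. Assumes a nonzero area.
    """
    n = len(vertices)
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0       # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
        perimeter += math.hypot(x1 - x0, y1 - y0)
    signed_area = twice_area / 2.0
    # The centroid uses the signed area, so vertex order does not matter.
    centroid = (cx / (6.0 * signed_area), cy / (6.0 * signed_area))
    return abs(signed_area), perimeter, centroid


# Hypothetical 100 x 50 rectangular parcel in planar units (e.g., feet).
parcel = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0)]
area, perimeter, centroid = polygon_geometry(parcel)
print(area, perimeter, centroid)
```

The centroid x and y values correspond to what would be stored in the new x and y fields, after which the table can be exported and added back as x,y point data.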