A Short Course in Geoinformatics
Part I: Science Issues in GeoInformatics
Michael F. Goodchild

Outline
• A short history of GIS
• Basic principles of GIScience
• Uncertainty

A short history of GIS
• Maps in computers
  – for decision-making
    • each map representing one dimension of a decision
  – for managing data
    • aggregating census returns to reporting zones
    • managing the multiple data types of transportation planning
  – to support map-making
    • editing
    • projection change

A model for landscape architecture
• Ian McHarg’s school at the University of Pennsylvania
Meteorology, Geology, Hydrology, Plant ecology, Animal ecology, Limnology, Computation, Remote sensing
Ian McHarg, 1920-2001

“For the first time, a department of landscape architecture could recruit a faculty of distinguished natural scientists sharing the ecological view and determined to integrate their perceptions into a holistic discipline applied to the solution of contemporary problems.”
I.L. McHarg, A Quest for Life (Wiley, 1996, p. 192)

Integration of science into action
Frequently emulated as a model for environmental science
But with a weaker intervention component
The social context is missing
Computation and remote sensing do not fit the model

The Canada Geographic Information System
• Roger Tomlinson
  – IBM contracts 1964-68
• 7 layers of land characteristics
  – soil capability for agriculture
  – recreation capability
  – current land use
  – ….
• To assess the current use of Canadian land
  – to measure area, plan new uses

Technical aspects of CGIS
• Manuscript maps at 1:50,000
  – 7 per tile
• Hand-scribing of boundaries
• An optical scanner creating a raster of boundaries
• Vectorization
• Merging with area attributes
• The common boundary between two areas as the basic unit

Flat-file options (tape)
By face/polygon
  – double recording of internal boundaries
  – spurious differences
By edge/arc
  – half the data volume
  – compute area in O(vertices)
  – simplify overlay
  – attributes of adjacent polygons
  – no polygon records

Technical aspects…
• Storage on magnetic tape
  – variable-length records
  – leftpolyID, rightpolyID, #points, (x1,y1),…
• Indexing in Morton order
  – a quad-tree index
• Numerical output only
  – tabulations of area
  – no visual display
• Mainframe technology
  – later leased land lines at 300 bps

The quadtree
Recursive subdivision
  – variable depth depending on local detail
[Figure: quadtree cell labels 0, 1, 2, 3, 30, 31, 32, 33]

Other types of maps
Transportation links
  – linear features
  – networks
  – U.S. Bureau of the Census
  – blocks = 2-cells
  – street segments = 1-cells
  – intersections = 0-cells

Topological data structures
• 1977 conference
  – sponsored by Harvard University
• A unifying structure across many application areas
  – all three of: decision-making, managing data, editing maps
• The birth of ESRI

The relational model
The map as a collection of arcs, nodes, and faces
  – F - A + N = 2
Stored in tables with keys
GIS built on RDBMS
  – INFO
Vertices left out
  – a hybrid solution
  – ARC/INFO
  – the ARC data structure still proprietary

Square pegs in round holes
Cul-de-sacs
  – allow 1-nodes
Properties of parts of edges
  – dynamic segmentation
  – linear referencing
Non-planarity
  – overpasses and underpasses
  – turntables

A 1990s house of cards
Still no vertices in the RDBMS
Points
  – coordinates stored in tables
  – no topological relationships with other features

Does it have to be this hard?
  – simple CAD data model
  – points, lines, and areas in an empty space
  – potentially overlapping
  – no topological relationships
  – compute on the fly

Object-oriented data modeling
All features are instances of classes
Classes inherit properties from more general classes
Features can be aggregates of other features
Features can be composed of other features
Features can be associated

Address; Agriculture; Archiving; Atmospheric; Basemap; Biodiversity; Census-Administrative Boundaries; Defense-Intel; Energy Utilities; Energy Utilities MultiSpeak™; Environmental Regulated Facilities; Forestry; Geology; Groundwater; Health; Historic Preservation and Archaeology; Hydro; International Hydrographic Organization (IHO) S-57 for ENC; Land Parcels; Local Government; Marine; Petroleum; Pipeline; Raster; Telecommunications; Transportation; Water Utilities

A paradigm shift
Away from the map metaphor
  – georeferenced events, transactions
  – objects with no georeferences
  – phenomena that were never mapped
Neogeography
  – customized maps
    • user-centric
    • transitory

Interactions, flows
[Figure: Generic Flow Model diagram; labels: MINARD NAPOLEON MAP, INTERACTION, KARST FLOW ROUTES, ORIGINAL USE CASE MODELS]

The data modeling cycle
The set of all phenomena in the domain
Find workarounds, violate the data model
Adopt a generic solution
Identify inefficiencies and special cases

Is the process beginning again?
All features are instances of classes
  – are all phenomena naturally features?
  – is there a pre-feature stage?
Inherently continuous phenomena
  – roads, rivers
  – topography
  – the pre-patch ecological landscape

Basic principles of GIScience
• The atomic geographic fact
  – the geo-atom
  – <x,z>
  – a pair defining what (z) is where (x)
• Point observations are individual geo-atoms
  – data about lines, areas, volumes can be decomposed into geo-atoms
  – the boundary of California defines an infinite number of statements of the form <x,z>
    • where z = 1 if x is inside the boundary
    • else z = 0

The result of applying a 150 km wide kernel to points distributed over California
A typical kernel function

Discrete objects
• Points, lines, areas, or volumes
  – in an otherwise empty space
  – may overlap
  – countable
• Examples:
  – buildings
  – cars
  – instances of a disease
  – oil wells

Continuous fields
• Variables that can be measured anywhere
  – at any time
  – z = f(x,y), f(x,y,z), or f(x,y,z,t)
• Examples:
  – elevation of the ground surface
  – atmospheric temperature
  – soil pH
  – wind direction
• Variable can be a class
  – soil type
  – land use type

Fields as objects
Fields discretized as collections of objects
  – sample points
  – isolines
  – triangles of a mesh
  – samples of a Fourier transform
Methods implied by roles of objects
  – isolines cannot cross
  – polygons must not overlap
Mitchell, A., 1999. The ESRI Guide to GIS Analysis. Redlands: ESRI Press

Principle
• There are two fundamentally distinct ways of aggregating geo-atoms
  – into discrete objects
    • all points within an object have the attributes of the object
  – into continuous fields
    • every point is mapped to a variable
• Marginal cases:
  – weather highs, lows, fronts
  – mountain peaks
  – clouds in the sky

Beyond objects and fields
• Discrete objects that move
• Discrete objects that change shape
• Discrete objects that have internal structure

Helix representation
Spine: expresses spatiotemporal 3-D movement of the center of mass.
Prongs: express expansion or collapse of the object’s outline
May Yuan, University of Oklahoma

Hurricane Frances
Hurricane helixes

Spatially binary data
• <x1,x2,z>
  – information about the relationship between two locations
    • flow of migrants
    • distance
    • direction
    • time of travel
  – such information is key to understanding many social processes
  – conventional geographic information is spatially unary

1) Spatial dependence principle
• Tobler’s First Law of Geography (TFL)
  – “All things are similar, but nearby things are more similar than distant things”
• Horizontal context
  – geographic facts should be consistent with their surroundings
• Spatial dependence
  – the tendency for nearby observations to be correlated
  – violating an assumption of many statistical tests that observations are independent

Validity
“Nearby things are less similar than distant things”
  – negative spatial autocorrelation
  – possible at certain scales
    • the checkerboard
    • retailing
  – but negative autocorrelation at one scale requires positive autocorrelation at other scales
  – smoothing processes dominate sharpening processes

Formalization
Geostatistics
  – variogram, covariogram
  – measuring how similarity decreases (variance increases) with distance
  – parameters vary by phenomenon
    • does this make TFL less of a law?
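
As a rough illustration of the variogram idea above, the sketch below computes an empirical semivariogram from point samples: half the mean squared difference in z between pairs of points, grouped by separation distance. The function name `empirical_semivariogram`, the bin width, and the sample transect are assumptions made for illustration only; they are not taken from the course.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, bin_width):
    """Return {lag_bin: semivariance} for points given as (x, y, z) tuples.

    Semivariance for a bin is half the mean squared difference in z
    over all point pairs whose separation distance falls in that bin.
    """
    sums = defaultdict(float)   # sum of squared z differences per bin
    counts = defaultdict(int)   # number of point pairs per bin
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            h = math.hypot(xi - xj, yi - yj)   # separation distance
            k = int(h // bin_width)            # index of the lag bin
            sums[k] += (zi - zj) ** 2
            counts[k] += 1
    return {k: sums[k] / (2 * counts[k]) for k in sorted(counts)}


# Hypothetical transect: z rises steadily with x, so nearby samples are alike.
samples = [(float(x), 0.0, 2.0 * x) for x in range(6)]
gamma = empirical_semivariogram(samples, bin_width=1.0)
for lag_bin, value in gamma.items():
    print(f"lag bin {lag_bin}: semivariance {value:.1f}")
```

For a phenomenon that obeys TFL, the semivariance grows with lag distance, as it does for this transect; how quickly it grows is the part that varies by phenomenon.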
Utility
Representation
  – GI is reducible to statements of the form <x,z>
  – the atomic form of GI is unmanageable, encountered only in point samples
  – all other GI data models assume TFL
Spatial interpolation
  – IDW and Kriging implement TFL
If TFL weren’t true GIS would be impossible
  – a point sample is useful only with interpolation
Life would be impossible

2) Spatial heterogeneity principle
• The Earth’s surface is fundamentally heterogeneous
  – unlike humans, whose characteristics are distributed around an average
• It is difficult to generalize from a single case study
• The results of any case study depend explicitly on the spatial bounds of the study
• The second law of geography
• Again, problematic for science
Jorge Sifuentes, PhD dissertation

Practical implications of the second law
A state is not a sample of the nation
  – a country is not a sample of the world
Classification schemes will differ when devised by local jurisdictions
Figures of the Earth will differ when devised by local surveying agencies
Global standards will always compete with local standards

3) A fractal principle
The closer you look the more you see
  – and for many natural phenomena the rate is orderly
  – Richardson plots
  – lengths of national boundaries
    • Spain and Portugal
    • context of 1920s

Practical implications
Indexing schemes, quadtrees
  – partitioning of information at different scales
Length is a function of spatial resolution
  – and variously under-estimated in GIS
  – as are many other properties
    • slope
    • soil class
    • land cover class
  – spatial resolution should always be explicit in GIS analysis
    • easy in raster
    • much more difficult in vector

4) The uncertainty principle
No representation of the Earth’s surface can be complete
  – no measurement of position can be perfect
  – a GIS will always leave doubt about the true nature of the Earth’s surface
ArcMap 10.0, Plate Carrée projection

Error-sensitive GIS
Storing characterizations of uncertainty
Propagation through GIS operations
Visualization
Confidence limits on products

How to build one?
Augmentation of existing data models
  – new attributes of objects, object classes, data sets
  – metadata
  – the five-fold way
  – Lanter and Veregin, GeoLineus
  – inheritance, object-orientation
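
One way to picture "propagation through GIS operations" and "confidence limits on products" is Monte Carlo simulation: perturb the stored coordinates according to an assumed error model, repeat the operation many times, and report the spread of the results. The sketch below does this for polygon area, computed with the shoelace formula. The independent Gaussian error on each coordinate, the value of `sigma`, and all names are assumptions for illustration; the course does not prescribe this method.

```python
import random
import statistics


def polygon_area(vertices):
    """Shoelace formula; vertices ordered around the ring, closing vertex not repeated."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def simulate_area(vertices, sigma, runs=2000, seed=0):
    """Return (mean area, lower limit, upper limit) of an approximate 95% interval.

    Assumed error model: independent Gaussian noise with standard
    deviation sigma added to every x and y coordinate.
    """
    rng = random.Random(seed)
    areas = []
    for _ in range(runs):
        noisy = [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
                 for x, y in vertices]
        areas.append(polygon_area(noisy))
    areas.sort()
    lower = areas[int(0.025 * runs)]
    upper = areas[int(0.975 * runs) - 1]
    return statistics.mean(areas), lower, upper


# Hypothetical 100 m x 100 m parcel with 1 m positional uncertainty per coordinate.
square = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
mean_area, lower, upper = simulate_area(square, sigma=1.0)
print(f"area about {mean_area:.0f}, approximate 95% limits {lower:.0f} to {upper:.0f}")
```

In an error-sensitive GIS the value of `sigma` would come from the stored characterization of uncertainty (for example, an attribute of the object or its metadata) rather than being passed in by hand.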