GIScience and CyberGIS
Michael F. Goodchild
University of California Santa Barbara

The vision of CI-supported science
• Teams studying complex questions
  – distributed across disciplines
    • with disparate practices
  – distributed geographically
    • with powerful communication links
    • perhaps distributed temporally as well
• Access to vast repositories of data
  – possibly real-time
  – using powerful search tools
  – with comprehensive documentation
    • metadata, provenance, etc.

More on the vision
• Access to powerful software tools
  – manipulation, analysis, modeling
  – interoperable
    • easy to match to data
• Powerful computation
  – supercomputers
• Well-connected communities
  – skilled in the use of CI
  – easy recruitment, team formation
• Efficient methods of knowledge sharing

What’s special about spatial?
• Vision is generic
  – is there a specifically spatial perspective?
    • a specific case for cyberGIS?
• In some ways GIS is already ahead of the field
  – National Spatial Data Infrastructure 1993
  – metadata since 1992
  – geolibraries since 1993, geoportals
  – OGC standards (WMS, etc.)
  – a tradition of data sharing
    • much in the public domain, Feist, etc.

Community engagement
• GIS functions available to all
• People love maps
• VGI
• But in the scientific community
  – generalizing away from space and time
  – variable takeup of GIS
  – a belief that it is intuitively obvious
    • no widely recognized theory or principles
    • reluctance to see GIS as comparable to e.g. statistics

Elements of a geospatial CI
• Base mapping
  – easy-to-use mapping tools
    • Google Maps API, ArcGIS Online
    • boundary files
• Gazetteers, point-of-interest databases
  – geonames.org, Google Maps API, etc.
  – interoperability of georeferencing styles
• Geocoding services
• Powerful analysis packages
  – GeoDa, ArcGIS, R, Matlab, etc.

maps.google.com, raconline.org, www.csiss.org

More elements
• Geodemographics
  – Census data mapping
  – PRIZM, Tapestry, etc.
• National Spatial Data Infrastructure
  – 7 base layers

So why the interest in cyberGIS?
• Speedup?
  – how much effort is a speedup worth?
• Scale
  – being able to perform simulations and develop models on n = 10^6+ elements
  – making scale explicit
    • and addressing the MAUP and ecological fallacy

Because we can avoid shortcuts
• No need for Pythagorean distances
  – all analysis on a curved Earth
  – new methods needed
• No need to divide and conquer
  – small study areas
    • difficult to generalize from
• No need to restrict to least squares
  – use nonlinear optimization
• No need for parametric inferential statistics
  – use simulation and randomization tests

More reasons for interest
• Because the Earth’s systems really are parallel
  – humans and communities are semi-independent agents making simultaneous decisions
  – but conventional computing is serial
  – the architecture of cyberGIS can be closer to the architecture of the real world’s processes
• Because today’s science problems really are more complex
  – requiring multidisciplinary, distributed teams

New kinds of data
• Big Data
• Closer to real-time
• Vastly increased volume
• Poor and diminishing quality control
  – from disparate sources
  – no lengthy synthesis by experts
  – no metadata or provenance
• Need to automate quality control
  – and the production of metadata and provenance

The characteristics of Big Data
• Volume
  – peta-, exabyte scale
  – zetta (10^21)
  – yotta (10^24)
    • the mass of the Earth is 5,973.6 Yg
• Velocity
  – rapid change, speed of analysis
• Variety
  – many sources
  – varied quality

New kinds of analysis
• Of data with unknown or variable quality
• More suited to hypothesis generation than hypothesis testing
  – the softer end of science
  – exploration, sampling design
  – induction
• An increased role for machine learning

Challenging the norms of science
• Collective responsibility
  – plagiarism
• The black box
  – impossible to know all details of a project
• Replicability
  – impossible to report in sufficient detail
• Experimental design
• Poor data
  – and therefore poor results

New concepts of knowledge
• David Weinberger, Too Big to Know
• A strong legacy
  – academic advancement
• No stop events
  – publication
• All knowledge is contested
• All knowledge is uncertain

Selling the vision
• Why is cyberGIS important?
  – because it enables new applications, new discoveries
  – possibilities that were not realized before
• Has the case been made?
  – or is this a matter of faith?
• We need a set of compelling examples
  – of what could not be done without cyberGIS

Accessibility
• CyberGIS must be more accessible than GIS
  – to a larger user community
  – advanced technology tends to move initially in the opposite direction
  – only then will users be motivated to adopt
• Shorter learning curve
• More intuitive user interfaces
• Interoperable across more knowledge communities
• How accessible is GIS?

The user interface problem
• To support CyberGIS, service-oriented architectures, discovery of services, interdisciplinary research
  – we must formalize functionality
  – a common language to describe operations
  – interoperability across functions
  – a radically different user interface
• In 40 years of GIS development this has not been achieved
  – functionality is ad hoc, legacy, artifactual

Title                           Count of functions
3D Analyst Tools                 34
Analysis Tools                   19
Cartography Tools                43
Conversion Tools                 46
Data Interoperability Tools       2
Data Management Tools           178
Editing Tools                     7
Geocoding Tools                   7
Geostatistical Analyst Tools     22
Linear Referencing Tools          7
Multidimension Tools              7
Network Analyst Tools            21
Parcel Fabric Tools               4
Schematics Tools                  5
Server Tools                     14
Spatial Analyst Tools           171
Spatial Statistics Tools         26
Tracking Analyst Tools            2
Total                           615

Questions users want to ask
• To answer Question A you need to use Function B
  – or Function B1 followed by Function B2 followed by Function B3…

The Andy Mitchell books
• Mitchell A. The ESRI guide to GIS analysis. I. Geographic patterns and relationships.
  Redlands, CA: ESRI Press; 1995.
• Mitchell A. The ESRI guide to GIS analysis. II. Spatial measurements and statistics. Redlands, CA: ESRI Press; 2005.
• Mitchell A. The ESRI guide to GIS analysis. III. Modeling suitability, movement, and interaction. Redlands, CA: ESRI Press; 2012.

Topics of Volume I
• Mapping Where Things Are
• Mapping the Most and Least
• Mapping Density
• Finding What’s Inside
• Finding What’s Nearby
• Mapping Change

If you had an infinite supply of computing power how would you deploy it?
• On bigger simulation models?
• On synthesizing and analyzing larger quantities of data?
• On tools to allow researchers to collaborate better?
• On making the user interface more accessible?

Meet Dr Geo Analytics
• Ask Geo
  – Where are the counties with the highest percent uninsured?
  – Do these tend to be rural counties?
  – How is x related to y?
    • when x and y have different spatial support?

Towards a successful CyberGIS
• Think like a user
  – as well as a technically expert GIScientist
• Understand why CyberGIS is important
  – and how to argue that
    • to an anthropologist or an ecologist
    • in 30 seconds
• Simplify
  – the product must be easy to learn and use
    • as well as powerful and scientifically rigorous
• Think ahead
  – today’s technologies will evolve

CyberGIS as a game-changer
• Rethinking many aspects of GIS
• Changing traditional practices
• Asking new questions
  – be willing to move beyond the old questions
  – Galileo’s telescope allowed him to ask new questions
• Creating new priorities
• A powerful vision
  – and a wealth of opportunity
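
Illustrative note on "No need for Pythagorean distances"

The slide "Because we can avoid shortcuts" argues that analysis can be done on a curved Earth rather than on a flat plane. The sketch below is one way to see what that shortcut costs; it is not taken from the talk. It assumes a spherical Earth with a mean radius of 6371.0088 km, uses the haversine formula for the curved-Earth distance, and uses an equirectangular plane as a stand-in for the "Pythagorean" shortcut. The function names and the approximate city coordinates are hypothetical choices made for illustration.

```python
import math

# Assumption: a spherical Earth with the mean radius below (km).
EARTH_RADIUS_KM = 6371.0088


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def planar_km(lat1, lon1, lat2, lon2):
    """Pythagorean distance in km on an equirectangular plane (the shortcut)."""
    mean_phi = math.radians((lat1 + lat2) / 2)
    x = math.radians(lon2 - lon1) * math.cos(mean_phi)  # east-west offset
    y = math.radians(lat2 - lat1)                       # north-south offset
    return EARTH_RADIUS_KM * math.hypot(x, y)


if __name__ == "__main__":
    # Approximate coordinates, chosen only for illustration.
    pairs = {
        "short hop (Santa Barbara to Los Angeles)": (34.42, -119.70, 34.05, -118.24),
        "long haul (Santa Barbara to London)": (34.42, -119.70, 51.51, -0.13),
    }
    for label, p in pairs.items():
        gc, pl = great_circle_km(*p), planar_km(*p)
        print(f"{label}: great-circle {gc:.0f} km, planar {pl:.0f} km")
```

Over a small study area the two measures nearly coincide, which is why the shortcut has been tolerable. Over continental or global extents they diverge noticeably, which is the case the slide has in mind.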