Research on Volunteered Geographic Information
Michael F. Goodchild
University of California, Santa Barbara
Center for Spatial Studies

Geographic information
• Linking facts to locations within the geographic domain
  – geospatial
  – analogous spaces
• Geographic information systems
  – remote sensing
  – GPS
  – legacy of map data
  – tracking
  – volunteered geographic information

Geographic information science
• Fundamental issues raised by these technologies
• Ontology and representation
  – coding the geographic world
  – what to leave out
• Uncertainty
  – measuring the differences between databases and reality
  – problems of vagueness
  – propagation

GIScience and social networks
• Social networks are constrained by geography
  – the need for physical proximity
  – cultures are geographically defined
• Theories of spatial interaction
  – physical proximity paramount
• Theories of social networking
  – “the death of distance”
• Need for new theory
  – small worlds

User-generated content
• Trivial to georeference information
  – geotagging
  – map mashups
• Trivial to make maps
  – online, open-source software
• Significant alternative to traditional sources
  – things that were never mapped
  – traditional sources unsustainable

Screenshots: popvssoda.com, www.flickr.com, www.wikimapia.org (three slides)

The story so far
• The modern era
  – authoritative production of geographic information
    • official naming
    • guarantees of accuracy (or inaccuracy)
  – need for economies of scale
    • cost of entry
    • aerial photography, analytic stereoplotters
    • advanced skills
    • printing
  – generic products
    • multiple purposes
    • long-lived, emphasizing static phenomena

The end of the modern era
• Growing demands
  – geographic information to support Web services
  – wayfinding
  – public decision-making
  – management
• Legislatures less willing to fund
  – efforts to make the user pay
  – constraints on the US federal government
• Meltdown in the costs of entry
• Software replacing the need for skills
  – soft photogrammetry
  – basic cartography
  – “anyone can make a map”

Neogeography
• “In other words, the old geography involves a prescribed role/interaction between the four main components, namely the audience, the information, the presenter and the subject, which are common to most standard practises of learning. In NeoGeography, there are however no such boundaries on roles, ownership, and interactions of these four components.”
  Rana and Joliveau, Journal of Location-Based Services
• The citizen as both consumer and producer of geographic information

A distant mirror
• The Waldseemüller map
  – St Dié-des-Vosges, 1507
  – a name that stuck

Research questions
• Who’s doing it?
• About what?
• Quality

Who’s doing it?
• Long-tail distributions
  – Pareto scaling
    • 3 Wikimapia leaders: 140,000+ each
• IP addresses
• Inference from postings

Figure: Wikipedia articles with geotags (Robinson projection). Number of articles per unit area, log scale, 0.1° resolution; 988,522 articles at 103,291 distinct locations.

Wikipedia authorship
• Registered authors
  – only username required
  – name, email, etc. optional
  – IP address kept hidden
• Anonymous authors
  – IP address made public
  – but nothing else

Contributions to “Copenhagen Opera House”
# of contributions  Username or IP          Most recent
18                  Dybdahl                 18-Sep-2005
6                   85.233.237.71 (anon)    12-Jan-2008
3                   Viva-Verdi              8-Sep-2006
1                   Hemmingsen              3-Jan-2007
4                   81.62.92.47 (anon)      15-Apr-2006
1                   Thue                    28-Feb-2006
2                   Ghent                   30-Apr-2006
3                   Valentinian             7-Jan-2007
3                   83.77.92.205 (anon)     10-Apr-2006
3                   130.226.234.229 (anon)  29-Sep-2007
2                   86.149.109.196 (anon)   15-Oct-2007
2                   Uppland                 24-Dec-2005
2                   87.48.100.222 (anon)    12-Jan-2006

University of California, Santa Barbara: 135 anonymous authors with 719 revisions; signature distance = 533 km
• 64% of articles at 2,000 km or less

Cyberscape: Placemarks in post-Katrina New Orleans
Figure: Flooding reports (via Scipionus) in New Orleans, Sept. 2005
• Who was able to or interested in using this new technology?
• Which places were they interested in?
Crutcher and Zook. 2009.
Geoforum.

What are they doing it about?
• <x,Z,z(x)>
• Framework data
  – common themes that support wayfinding, georeferencing
  – Federal Geographic Data Committee
    • geodetic control
    • property ownership
    • administrative boundaries
    • Earth imagery
    • topography
    • hydrography
    • transportation

Screenshot: www.openstreetmap.org

The gazetteer
• The “names layer”
  – named features, points of interest
  – the interface to geographic information
  – Wikimapia

Beyond the framework
• Things that have never been mapped
  – where your friends are
  – cultural heritage
  – graffiti, trash
• Time-critical information
  – emergencies

Emergency management
• Recent fires in Santa Barbara
  – Zaca Fire (July 2007)
    • burned for 2 months
    • no houses lost
  – Gap Fire (July 2008)
    • burned for 7 days
    • no houses lost
  – Tea Fire (November 2008)
    • burned for 2 days
    • 230 houses lost
  – Jesusita Fire (May 2009)
    • burned for 2 days
    • 75 houses lost

Hits     Source
595,673  Jesusita Fire (Ethan)
188,308  SBC Jesusita Fire Santa Barbara, CA (Robert O'Connor - fire news blog)
89,214   Jesusita Fire Map (Randy - Independent.com)
67,525   Jesusita Fire in Santa Barbara - LA Times map (Los Angeles Times)
27,777   Map of burned homes in Santa Barbara (Los Angeles Times)
26,330   Jesusita Fire Evacuation Areas: Approximation (COSB)
25,454   Santa Barbara 'Jesusita Fire' (ABC7 Eyewitness News)
19,592   Jesusita Fire - Santa Barbara (lanewspace)
2,446    Santa Barbara Damaged Homes 2008 (Los Angeles Times, note: mapped for comparison with Jesusita)
2,048    Jesusita Fire (longhairedhippy)
1,314    Santa Barbara Fire Evacuation (Gary)
962      Jesusita Fire in Santa Barbara (ABC30 Action News)
788      Wildfire ~ Santa Barbara (Buffalo)
505      Closure map - Jesusita Fire in Santa Barbara (Los Angeles Times)
461      Untitled (Matthew, note: discovered via google.com.mx)
396      Jesusita Fire Structure Damage (Paul Bartsch)

Lessons learned
• Authoritative information
  – must be verified by officials
  – too slow for the Tea and Jesusita Fires
• Asserted information
  – carries risk of false positives
    • false rumor of Tea Fire in Mission Canyon
    • some unnecessary evacuations
  – people are willing to accept false positives
  – lack of authoritative information amounts to false negatives
  – false negatives are far less acceptable than false positives
    • there were some posted false negatives

Screenshot: LA Times, May 8 2009

Emphasis on the easy stuff
• Placenames, streets, pictures
  – georeferencing
  – well-defined reference systems and objects
• Free production by citizens replacing authoritative production
• Do other types of geographic information require experts?
  – a catalog of types

The FGDC framework layers
• Transportation
  – basic network
    • rapid updates
  – citizens as probes
    • real-time congestion
  – air quality
• Hydrography
  – water quality
• Elevation
  – adequate authoritative sources
• Orthoimagery
  – cost of entry
• Cadastral
  – legal issues
• Administrative units
  – legal issues
• Geodetic control
  – expertise in geodesy

Thematic layers
• Weather and climate
  – tradition of amateur observers
  – GLOBE
• Biota
  – Christmas Bird Count
  – e-flora
  – phenology
• Soils
  – Natural Resource Conservation Service
  – mapping for agricultural advice

The soil map
• An area-class map
  – irregular areas denoting uniform soil type
  – lengthy descriptions of types
  – made by highly trained experts
  – sample points
  – interpolation from ground observation and aerial photography
  – every point assigned to a single class
  – expressed in a unique mapping c = f(x)
• What is the nature of the expertise?

Diagram: sources of expertise, each linked to the application/use case
  – analysis of sample soils
  – aerial photography
  – historical records of crop performance
  – expert knowledge
  – covariates, e.g. elevation, climate, parent material
  – scale and accuracy issues
  – application knowledge

Data quality
• Traditional mapping guarantees bounds on inaccuracy
  – quality can be surprisingly poor
  – legacy data
• OSM studies show VGI compares well
• Geographic context
• Crowdsourcing metrics

Screenshots: www.flickr.com, earth.google.com, nationalmap.gov

Authority and assertion
• Authority
  – inaccuracies are guaranteed
  – formal testing programs
  – metadata
• Assertion
  – inaccuracies are undocumented
  – no metadata
  – data about popular places tend to be more accurate
  – inaccuracies often less than legacy authoritative data

Jesus and Allah
Figure: BLUE = more Jesus than Allah; RED = more Allah than Jesus. The size of each bubble shows the magnitude of the difference.
Crandall et al. 2009. Mapping the world’s photos. http://www.cs.cornell.edu/~crandall/papers/mapping09www.pdf

Figure: Tracks inferred from Flickr postings (http://www.cs.cornell.edu/~crandall/papers/mapping09www.pdf)

Future plans
• Conflation with traditional sources
  – comparison of quality
  – different emphases
• Methods of analysis and modeling for VGI
• VGI in remote regions
  – digital divide
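The quality comparison named under conflation can be illustrated with a minimal positional-accuracy check. This is a sketch under stated assumptions, not the method behind the OSM studies cited above: both sources are assumed to be reduced to named points in WGS84 latitude/longitude, features are matched by exact name, and offsets are measured as great-circle (haversine) distances on a sphere of radius 6,371 km. The `NamedPoint` record, the function names and the toy coordinates are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


@dataclass(frozen=True)
class NamedPoint:
    """Hypothetical gazetteer-style record: a feature name plus WGS84 coordinates."""
    name: str
    lat: float
    lon: float


def haversine_km(a: NamedPoint, b: NamedPoint) -> float:
    """Great-circle distance between two points on a sphere, in kilometres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def compare_sources(
    vgi: Sequence[NamedPoint], authoritative: Sequence[NamedPoint]
) -> Tuple[float, float, List[Tuple[str, float]]]:
    """Match each VGI point to the authoritative point with the same name
    (an assumed matching rule) and summarise the positional offsets.

    Returns (share of VGI points matched, mean offset in km, per-match offsets).
    """
    reference: Dict[str, NamedPoint] = {}
    for p in authoritative:
        reference.setdefault(p.name, p)  # first record wins if a name repeats
    matches: List[Tuple[str, float]] = []
    for p in vgi:
        ref = reference.get(p.name)
        if ref is not None:  # unmatched VGI points count only against the match rate
            matches.append((p.name, haversine_km(p, ref)))
    match_rate = len(matches) / len(vgi) if vgi else 0.0
    mean_offset = sum(d for _, d in matches) / len(matches) if matches else math.nan
    return match_rate, mean_offset, matches


if __name__ == "__main__":
    # Toy coordinates for illustration only; not real survey or VGI data.
    vgi_points = [NamedPoint("A", 0.0, 0.0), NamedPoint("B", 0.0, 1.0), NamedPoint("C", 10.0, 10.0)]
    reference_points = [NamedPoint("A", 0.0, 0.0), NamedPoint("B", 0.0, 0.0)]
    rate, mean_km, per_match = compare_sources(vgi_points, reference_points)
    print(f"matched {rate:.0%} of VGI features; mean offset {mean_km:.1f} km")
```

Exact name matching is the weakest assumption here: placenames in volunteered sources vary in spelling and language, so a real conflation step would likely need fuzzier matching, and line or area features would need a different offset measure than point-to-point distance.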