Crowd-sourcing metadata for VGI

Mohsen Kalantari, The University of Melbourne, Australia, saeidks@unimelb.edu.au
Abbas Rajabifard, The University of Melbourne, Australia, abbas.r@unimelb.edu.au
Hamed Olfat, The University of Melbourne, Australia, h.olfat@unimelb.edu.au
Ian Williamson, The University of Melbourne, Australia, ianpw@unimelb.edu.au

Abstract

There has been an increasing tendency to embrace the potential of Web 2.0 in knowledge creation, with users’ contributions becoming significantly important in the development of open data such as Volunteered Geographic Information (VGI). The increasing number of volunteers who add value to geospatial data requires effective ways of organising and providing access to value-added or newly created data. Metadata plays a critical role in sharing, discovering and utilising VGI. This paper discusses and demonstrates the requirements of an automatic method to create metadata for VGI. The paper proposes an approach using folksonomies created from users’ contributions.

Keywords: Geospatial data, metadata, VGI, automation, tagging, folksonomy, Web 2.0, crowd sourcing

1 Introduction

The power of the Web is being used to collect data and create knowledge. Linking web pages to each other, websites offering the collective work of net users, search engines that use the network characteristics of the Web rather than just the content of Web documents, and user engagement in buying and selling are examples of how the Web environment collates data from everywhere and everyone [9]. In other words, users’ contributions have become significantly important in the development of the World Wide Web (WWW).

This development is affecting Spatial Data Infrastructures (SDIs) as well. SDIs are often initiated and coordinated by government authorities and provide authoritative geospatial information to users.
The data preparation within SDIs is usually managed in a process where the quality of geospatial data is rigorously examined before publication to users. However, developments in VGI are challenging SDIs [3]. Geospatial information such as roads, addresses, and parcel maps is being created in initiatives such as OpenStreetMap (OSM), Wikimapia, and Google My Maps. These initiatives enable the collective intelligence of volunteers to provide geospatial data to a wider group [1; 2; 4]. However, the data provided by volunteers often lacks metadata, making it difficult to determine how the data was created, when, by whom, and possibly its fitness for use [8].

Since the use of VGI is increasing, in this paper we propose an approach to facilitate the creation of metadata for VGI. We propose a method in which VGI users provide descriptions consistent with the content of VGI. Because it relies on the users of the data, this method can potentially reveal the essence of the data.

2 Need for VGI metadata

The literature in this domain highlights two major issues with VGI. The first is inconsistency, caused by the range of volunteers and the different methods they use for creating data. The second is the lack of efficient specifications by which volunteers can create reliable information. While inconsistency is natural for VGI, an increase in the number of volunteers can potentially assist in solving the first issue. For the second issue, specifications can be set to facilitate the improvement of VGI. In the specifications, lineage plays a significant role in helping users to assess suitability for a particular use.

Even as VGI quality can be improved, understanding its fitness for use remains a significant challenge for users. As a potential solution, metadata for VGI can perform an important function in conveying its fitness for use. Experts and professionals usually create metadata [7]. In the same way, the bases of the majority of geospatial data catalogues have been created by professionals.
It is now common practice to create metadata based on standards and guidelines for the cataloguing and classification of geospatial data [5].

However, there is a fundamental issue here. Both professionally created and author-created VGI metadata have a similar disadvantage: the users of the VGI are not involved in the process. Metadata experts and authors may create metadata that does not necessarily address the needs of users. The VGI users remain disconnected from this process, and their perception and experience of the data are ignored. The significant missing step in both professionally created and author-created metadata is the ability of users to contribute towards the VGI without subscribing to the system and becoming a registered volunteer.

An alternative approach to creating metadata for VGI is crowd sourcing. The VGI users can use their own interpretation of a geographic phenomenon to describe it. Furthermore, the users can express their remarks about the description of the data, its quality, fitness for purpose etc. [6]. In this way we can link the users to the process of VGI improvement. If users can create metadata about VGI and share their notes with other users, it will help with the discoverability of VGI and with understanding its content. Through this approach, the collective intelligence of the users will help to improve other users’ understanding of VGI.

AGILE 2013 – Leuven, May 14-17, 2013

3 Crowd-sourcing VGI metadata

We believe the collection of tags and descriptions by users can facilitate the generation of metadata for VGI. In a VGI system where many users are allowed to tag geographic data, this collection of tags can become a geographic folksonomy, that is, a method of collaboratively creating and managing VGI metadata. We propose two models for VGI metadata creation. In the first model we create a database for metadata solely by monitoring users’ interactions, without the users being aware of it.
In the second model we explicitly allow users to provide input to create metadata.

3.1 Implicit model

In this model we monitor search words used by VGI users, analyse them, and then use them as descriptions to create the content of VGI feature metadata. The implicit model is streamlined in three steps: monitoring search words, recording search words, and assigning search words.

A VGI system consisting of geographic data typically provides users with a facility to find a feature, place or location. Users query the data using search words. The service then finds and retrieves the records matching those search words. The users are able to view the results and decide which data is more suitable.

An example is discussed here. The user is seeking ‘Blackburn High School in Victoria, Australia’ in Wikimapia using the search mechanism provided in the user interface. In the first instance, the result shows no matching record for this search (Figure 1). The first impression is that there is no Blackburn High School recorded in Wikimapia.

Figure 1: User searching for Blackburn High School in Victoria, Australia in Wikimapia

A change in the search words, however, provides a different result. By deleting ‘Victoria’ from ‘Blackburn High School in Victoria, Australia’ the system is able to retrieve Blackburn High School from the records and present it on the map (Figure 2).

Figure 2: A different search result by deleting a relevant keyword

At first glance, ‘Victoria’ seems to be a logical search word, which should be tagged to Blackburn High School. One way to address the issue is for an administrator to assign ‘Victoria’ to the Blackburn High School record manually. However, there are millions of these geographic features, which makes it extremely expensive and impractical to assign keywords and create metadata for them manually.
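As an illustration only, the three steps of the implicit model (monitoring, recording and assigning search words) might be sketched as follows. This is a minimal sketch under our own assumptions, not an implementation from the paper: the class name `ImplicitMetadataStore`, the stop-word list, the `min_uses` threshold and the feature identifier `"wikimapia:blackburn-hs"` are all hypothetical. We also assume the system can tell which feature a user finally selected, so that every query tried on the way to that feature can be recorded against it.

```python
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real system would need a fuller one.
STOP_WORDS = {"in", "the", "of", "at", "near"}


class ImplicitMetadataStore:
    """Counts the search words that led users to each feature."""

    def __init__(self, min_uses=2):
        # min_uses is an assumed threshold: a word becomes a keyword only
        # after it has been used at least this many times for a feature.
        self.min_uses = min_uses
        self._counts = defaultdict(Counter)  # feature id -> word counts

    def record_search(self, query, selected_feature_id):
        """Monitor a query and record its words against the selected feature."""
        for word in query.lower().replace(",", " ").split():
            if word not in STOP_WORDS:
                self._counts[selected_feature_id][word] += 1

    def keywords(self, feature_id):
        """Assign the frequently used search words as metadata keywords."""
        counts = self._counts.get(feature_id, Counter())
        return sorted(w for w, n in counts.items() if n >= self.min_uses)


store = ImplicitMetadataStore(min_uses=2)
store.record_search("Blackburn High School in Victoria, Australia",
                    "wikimapia:blackburn-hs")
store.record_search("Blackburn High School", "wikimapia:blackburn-hs")
store.record_search("Blackburn school Victoria", "wikimapia:blackburn-hs")
print(store.keywords("wikimapia:blackburn-hs"))
# prints ['blackburn', 'high', 'school', 'victoria']
```

In this sketch ‘victoria’ becomes a keyword because two users tried it while looking for the school, whereas ‘australia’, used only once, does not yet pass the threshold.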
The argument that this paper advances, however, is that whether ‘Victoria’ should be used as metadata or a tag to describe this geographic feature needs to be considered from the users’ perspective. A practical way to address this issue, as proposed by this paper, is to monitor and track the search words used by users as collective intelligence when discovering geographic features.

First, we identify the search words used by a user for finding features, places or locations and monitor them during the discovery step. Next, we record in a database any search word relevant to an identified geographic feature; these records form the basis for VGI metadata records. We then determine how many times a search word has found a geographic data set. If a search word is used frequently in discovering a feature, it should be regarded as a keyword that is of significance for the users. Increasing use of the same search word illustrates its usefulness. Any frequently utilised search word has the potential to form part of the metadata record for the VGI feature and datasets. The recorded search words that have been utilised frequently will be assigned to the geographic feature and stored in its metadata file.

Through this method, the commonly used search words for finding features (words that are semantically linked to those features) are recorded and made available to other users. Over a period of use, this method will create a basis for the keywords of metadata records related to geographic features by applying and refining the appropriate search words (Figure 2).

3.2 Explicit model

The explicit model creates the VGI metadata content directly through comments made by the users, as opposed to the implicit model, which creates metadata indirectly by monitoring the search words used by users. In the explicit model, users tag a feature based on their knowledge and understanding.
The users label the geographic information based on their awareness in the context of using that information, which is usually related to their individual requirements. These tags can be illustrated in a ‘Tag Cloud’. Within the tag cloud, the tags that are used more frequently by the users are highlighted and shown in a larger font (Figure 3). The users are also able to indicate their degree of agreement with the existing tags in the tag cloud by their choice of tag.

Figure 3: A demonstrator Tag Cloud representing metadata for Blackburn High School (tags shown: Victoria, Blackburn, High School, Melbourne, 2010, digitised)

The method proposed here for creating VGI metadata builds on crowd sourcing, relying upon the user at the centre to describe VGI. On the same incremental basis on which authors of VGI create it, users of VGI can provide descriptions for it. These descriptions then become metadata.

Crowd-sourcing metadata for VGI depends on the number of VGI users. In this context, the explicit model relies heavily on users’ willingness to contribute. The implicit model described earlier, on the other hand, captures search words to create metadata without requiring any deliberate contribution. The implicit and explicit methods are therefore complementary, and both are critical for the success of the method proposed in this paper.

A disadvantage of crowd-sourced metadata is that it is created and maintained by unsupervised users. There may be no description of the meaning of the metadata content. For example, the tag ‘Blackburn School’ might refer to the primary or the secondary school, and this can result in misuse of the data. Descriptions in crowd-sourced metadata may also be ambiguous. For example, one user may assign the tag ‘Property’ to an apartment while another user may use the same tag to refer only to a land parcel.
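To make the tag cloud of the explicit model concrete, the sketch below maps tag frequencies to font sizes so that more frequently used tags appear larger. It is a minimal illustration under our own assumptions: the function name `tag_cloud`, the linear scaling, the point sizes and the tag counts are hypothetical, and only the tag words themselves are taken from Figure 3.

```python
from collections import Counter


def tag_cloud(tags, min_pt=10, max_pt=26):
    """Return {tag: font size}, growing linearly with how often a tag was used."""
    # Normalise case and whitespace so 'Victoria' and 'victoria ' count together.
    counts = Counter(t.strip().lower() for t in tags if t.strip())
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for tag, n in counts.items():
        # If every tag is equally frequent, show them all at the largest size.
        scale = (n - lo) / span if span else 1.0
        sizes[tag] = round(min_pt + scale * (max_pt - min_pt))
    return sizes


# Hypothetical tag submissions for Blackburn High School.
submitted = (["High School"] * 5 + ["Blackburn"] * 3 + ["Victoria"] * 3
             + ["Melbourne"] * 2 + ["2010", "digitised"])
for tag, size in sorted(tag_cloud(submitted).items(), key=lambda kv: -kv[1]):
    print(f"{tag}: {size}pt")
```

A user who agrees with an existing tag simply submits it again, which raises its count and therefore its size in the cloud.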
Larger-scale folksonomies can nevertheless address some of the problems of semantics in VGI metadata, as users of a crowd-sourced system tend to notice the current use of ‘terms’ within the system and are thus encouraged to use existing terms in order to form connections to related items easily. In this way, folksonomies collectively develop a partial set of metadata standards that are semantically related through the ongoing involvement of non-expert geospatial users [6].

4 Conclusion

Crowd-sourcing metadata brings the same advantages and disadvantages as crowd sourcing geographic information. Authors and users of VGI will both potentially benefit from the use of crowd sourcing to create metadata for VGI. In addition, the authoritative sources of geospatial data could also benefit from the metadata created for VGI by comparing what is crowd sourced against what is considered to be accurate and complete metadata for data of similar content.

Crowd-sourced metadata can assist with the use of VGI; however, it cannot be a measure of fitness for purpose. One constraint of crowd-sourced VGI metadata is that it is restricted to what users can find (this applies to VGI itself as well) and not the universe of data that might be available. Folksonomies use a user-centric approach, and they may only convey the perception of users about what they are able to discover. If one VGI system is the only source users find, users may acknowledge its appropriateness even if the content of the VGI does not meet their requirements.

References

[1] Coleman, D.J., Georgiadou, Y. and Labonte, J. 2009. Volunteered Geographic Information: The Nature and Motivation of Producers. International Journal of Spatial Data Infrastructure Research 4, 332-358.
[2] Coleman, D.J., Sabone, B. and Nkhwanana, N. 2010. Volunteering Geographic Information to Authoritative Databases: Linking Contributor Motivations to Program Effectiveness. Geomatica 64, 383-396.
[3] Corcoran, P., Mooney, P. and Bertolotto, M. Analysing the growth of OpenStreetMap networks. Spatial Statistics.
[4] Goodchild, M. 2007. Citizens as sensors: the world of volunteered geography. GeoJournal 69, 211-221.
[5] Green, D. and Bossomaier, T. 2001. Online GIS and Spatial Metadata. Taylor and Francis, New York.
[6] Kalantari, M., Olfat, H. and Rajabifard, A. 2010. Automatic Spatial Metadata Enrichment: Reducing Metadata Creation Burden through Spatial Folksonomies. In Spatially Enabling Society, Research, Emerging Trends and Critical Assessment, A. Rajabifard, J. Crompvoets and M. Kalantari Eds. Leuven University Press, Leuven, 248.
[7] Mathes, A. 2004. Folksonomies - Cooperative Classification and Communication Through Shared Metadata.
[8] Mooney, P. and Corcoran, P. 2012. The Annotation Process in OpenStreetMap. Transactions in GIS 16, 561-579.
[9] O'Reilly, T. 2007. What is Web 2.0: Design Patterns and Business Models for the Next Generation of Software. Communications & Strategies, No. 1, p. 17, First Quarter 2007.