position paper

Crowd-sourcing metadata for VGI
Mohsen Kalantari
The University of Melbourne, Australia
saeidks@unimelb.edu.au
Abbas Rajabifard
The University of Melbourne, Australia
abbas.r@unimelb.edu.au
Hamed Olfat
The University of Melbourne, Australia
h.olfat@unimelb.edu.au
Ian Williamson
The University of Melbourne, Australia
ianpw@unimelb.edu.au
Abstract
There is a growing tendency to embrace the potential of Web 2.0 for knowledge creation, with users’
contributions becoming significantly important in the development of open data such as Volunteered Geographic Information
(VGI). An increasing number of volunteers who add value to geospatial data require effective ways of organising and providing
access to value-added or newly created data. Metadata plays a critical role in sharing, discovering and utilising VGI. This paper
discusses and demonstrates the requirements of an automatic method to create metadata for VGI. The paper proposes an
approach using folksonomies created from users’ contributions.
Keywords: Geospatial data, metadata, VGI, automation, tagging, folksonomy, Web 2.0, crowd sourcing
1
Introduction
The power of the Web is being used to collect data and create
knowledge. Linking web pages to each other, websites
offering the collective work of the net users, search engines
that use the network characteristics of the Web rather than just
the content of Web documents, and user engagement in
buying and selling are all examples of how the Web environment
collates data from everywhere and everyone [9]. In other
words, users’ contributions have become significantly
important in the development of the World Wide Web
(WWW).
This development is affecting Spatial Data Infrastructures
(SDIs) as well. SDIs are often initiated and coordinated by
government authorities and provide authoritative geospatial
information to users. The data preparation within the SDIs is
usually managed in a process where the quality of geospatial
data is rigorously examined before publication to users.
However, developments in VGI are challenging SDIs [3].
Geospatial information such as roads, addresses, and parcel
maps is being created in initiatives such as OpenStreetMap
(OSM), Wikimapia, and Google My Maps. These initiatives
enable the collective intelligence of volunteers to provide
geospatial data to a wider group [1; 2; 4].
However, the data provided by volunteers often lacks
metadata, making it difficult to determine how the data was
created, when, by whom, and whether it is fit for use
[8].
Since the use of VGI is increasing, in this paper we propose
an approach to facilitate the creation of metadata for VGI. We
propose a method in which VGI users provide descriptions
consistent with the content of VGI. Because it relies on the users
of the data, this method can potentially reveal the essence of the data.
2
Need for VGI metadata
The literature in this domain highlights two major issues with
VGI. The first is inconsistency, caused by the range of volunteers
and the different methods they use for creating data. The second is
the lack of efficient specifications by which volunteers can
create reliable information. While inconsistency is natural for
VGI, an increase in the number of volunteers can potentially
assist in solving the first issue. For the second issue,
specifications can be set to facilitate the improvement of VGI.
In the specifications, lineage plays a significant role in
helping users to assess the suitability of the data for a particular use.
Even if VGI quality can be improved, understanding its
fitness for use remains a significant challenge for users. As a
potential solution, metadata for VGI can perform an important
function in conveying its fitness for use.
Experts and professionals usually create metadata [7]. In the
same way, the bases of the majority of geospatial data
catalogues have been created by professionals. It is a common
practice now to create metadata based on standards and
guidelines of cataloguing, and classification of geospatial data
[5].
However, there is a fundamental issue here. Both
professionally created and author-created VGI metadata have
a similar disadvantage: the users of the VGI are not involved
in the process. Metadata experts and authors may create
metadata that does not necessarily address the needs of users.
The VGI users remain disconnected from this process, and
their perception and experience of the data are ignored. The
significant missing step in both professionally created and
author-created metadata is the ability of users to contribute towards
the VGI without subscribing to the system and becoming a
registered volunteer.
An alternative approach to creating metadata
for VGI is crowd sourcing. The VGI users can use
their own interpretation of a geographic phenomenon to
describe it. Furthermore, the users can express their remarks
about the description of data, its quality, fitness for purpose
etc. [6]. In other words, in this way we can link the users to
the process of VGI improvement. If users can create metadata
about VGI and share their notes with other users, it will
improve the discoverability of VGI and the understanding of its
content. Through this
approach, the collective intelligence of the users will help to
improve other users’ understanding of VGI.
AGILE 2013 – Leuven, May 14-17, 2013
3
Crowd-sourcing VGI metadata
We believe the collection of tags and descriptions by users
can facilitate the generation of metadata for VGI. In a VGI
system where many users are allowed to tag geographic data,
this collection of tags can become a geographic folksonomy,
that is, a method that can collaboratively create and manage
VGI metadata. We propose two models for VGI metadata
creation. In the first model we create a metadata database
solely by monitoring users’ interaction, without the users being
aware of it. In the second model we explicitly invite user input
to create metadata.
3.1
Implicit model
In this model we monitor search words used by VGI users,
analyse them, and then use them as descriptions to create the
content of VGI feature metadata. The implicit model
proceeds in three steps: monitoring search words, recording
search words, and assigning search words.
A VGI system consisting of geographic data typically
provides users with a facility to find a feature, place or
location. Users query the data using search words. The service
then finds and retrieves the records that match those
search words. The users can view the results and
decide which data is most suitable.
An example is discussed here. The user is seeking
‘Blackburn High School in Victoria, Australia’ in Wikimapia
using the search mechanism provided in the user interface. In
the first instance, the search returns no matching record
(Figure 1). The first impression is that there is no Blackburn
High School recorded in Wikimapia.
Figure 1: User searching for Blackburn High School in
Victoria, Australia in Wikimapia
Now, a change in the search words will provide a different
result. By deleting ‘Victoria’ from ‘Blackburn High School in
Victoria, Australia’ the system is able to retrieve Blackburn
High School from the records and present it on the map
(Figure 2).
Figure 2: A different search result by deleting a relevant
keyword
At first glance, ‘Victoria’ seems to be a logical search word,
which should be tagged to the Blackburn High School. One
way to address the issue is for an administrator to assign
‘Victoria’ to the Blackburn High School record manually. However,
there are millions of these geographic features, which makes it
extremely expensive and impractical to assign keywords and
create metadata for them manually.
Yet this paper argues that whether ‘Victoria’ should be used
as metadata or a tag to describe this
geographic feature needs to be considered from the users’
perspective. A practical way to address the issue, proposed by
this paper, is to monitor and track the search words used by
users as collective intelligence when discovering geographic
features.
Here, we identify the search words used by a user for
finding features, places or locations and monitor them during
the discovery step. We then record in a database any search word
relevant to an identified geographic feature; these records form the
basis for VGI metadata records. We also count how many
times a search word is used to find a geographic data set. If a search
word is used frequently in discovering a feature, it should be
regarded as a keyword that is of significance for the users.
Increasing use of the same search word illustrates its
usefulness. Any frequently utilised search word has the
potential to form part of the metadata record for the VGI feature and
datasets. Of the recorded search words, those that
have been used frequently will be assigned to the
geographic feature and stored in its metadata file.
Through this method, the commonly used search words
(that are semantically linked to features) for finding features
are recorded and made available to other users. This method
will, over a period of use, create a basis for keywords of
metadata records related to geographic features through
applying and refining the appropriate search words (Figure 2).
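The three steps of the implicit model (monitoring, recording and assigning search words) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name SearchWordLog, the stop-word list, the min_count threshold and the feature identifier are all hypothetical, and the paper does not specify how a query is linked to a feature. Here the caller is assumed to decide that, for example by logging every query in a session that ended with the user selecting the feature.

```python
from collections import Counter, defaultdict

# Assumed stop words; the paper does not define any filtering rule.
STOP_WORDS = {"in", "of", "the"}


class SearchWordLog:
    """Hypothetical store of search words that led users to a feature."""

    def __init__(self, min_count=3):
        # min_count is an assumed threshold for "frequently used".
        self.min_count = min_count
        self._counts = defaultdict(Counter)

    def record(self, feature_id, query):
        """Steps 1 and 2: monitor a query and record its words for a feature."""
        words = set(query.lower().replace(",", " ").split()) - STOP_WORDS
        # Each query counts a word at most once for the feature.
        self._counts[feature_id].update(words)

    def keywords(self, feature_id):
        """Step 3: return the frequently used words as candidate keywords."""
        counts = self._counts.get(feature_id, Counter())
        return sorted(w for w, n in counts.items() if n >= self.min_count)


log = SearchWordLog(min_count=2)
log.record("blackburn-hs", "Blackburn High School")
log.record("blackburn-hs", "Blackburn High School in Victoria, Australia")
log.record("blackburn-hs", "Blackburn school Victoria")
print(log.keywords("blackburn-hs"))
# prints ['blackburn', 'high', 'school', 'victoria']
```

In this toy run ‘victoria’ is promoted to a keyword because two users searched with it, while ‘australia’, used only once, is not.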
3.2
Explicit model
The explicit model creates the VGI metadata content directly
through comments made by the users, as opposed to the
implicit model that indirectly creates metadata via monitoring
the search words used by users.
In the explicit model, users tag a feature based on their
knowledge and understanding. The users label the geographic
information based on their awareness in the context of using
that information, which is usually related to their individual
requirements. These tags can be illustrated in a ‘Tag Cloud’.
Within the tag cloud, the tags that are used more frequently
by the users are highlighted and shown in a larger-sized
font (Figure 3). The users are also able to indicate their degree
of agreement with the existing tags in the tag cloud by
choosing to reuse a tag.
Figure 3: A demonstrator Tag Cloud representing metadata
for Blackburn High School (tags shown: Victoria, Blackburn,
High School, Melbourne, 2010, digitised)
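As a rough illustration of how a tag cloud can show frequently used tags in a larger font, the sketch below maps tag counts to font sizes. The function name tag_cloud_sizes, the linear scaling and the point sizes are assumptions made for this example; the paper does not prescribe a scaling rule. A user reusing an existing tag simply adds one more occurrence, which is how agreement raises a tag's weight.

```python
from collections import Counter


def tag_cloud_sizes(tags, min_pt=10, max_pt=28):
    """Map each tag to a font size that grows linearly with its frequency."""
    counts = Counter(tag.strip().lower() for tag in tags if tag.strip())
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for tag, n in counts.items():
        # If every tag is equally frequent, all get the smallest size.
        scale = (n - lo) / span if span else 0.0
        sizes[tag] = round(min_pt + scale * (max_pt - min_pt))
    return sizes


tags = ["Victoria", "Blackburn", "Victoria", "High School",
        "Victoria", "Blackburn", "digitised"]
print(tag_cloud_sizes(tags))
# prints {'victoria': 28, 'blackburn': 19, 'high school': 10, 'digitised': 10}
```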
The method proposed here for creating the VGI metadata
builds on crowd sourcing, relying upon the user at the centre
to describe VGI. On the same incremental basis that authors
of VGI create it, users of VGI can provide descriptions for it.
Therefore, the descriptions will become metadata.
Crowd-sourcing metadata for VGI depends on the number
of VGI users. In this context, the explicit model relies
heavily on users’ willingness to contribute. The implicit
model described earlier, on the other hand, captures search
words as a by-product of normal use and needs no deliberate
contribution. The implicit and explicit models are therefore
complementary, and both are critical
for the success of the method proposed in this paper.
A disadvantage of crowd-sourced metadata is that it is
created and maintained by unsupervised users. There may be
no description of the meaning of the metadata content.
For example, the tag ‘Blackburn School’ might refer to the
primary or the secondary school, and this can result in misuse
of the data.
Descriptions in crowd-sourced metadata may be ambiguous.
For example, one user may assign the tag ‘Property’ to an
apartment while another user may use the same tag to refer
only to a land parcel.
Yet, larger-scale folksonomies can address some of the
problems of semantics in VGI metadata, as users of a
crowd-sourced system tend to notice the current use of ‘terms’
within these systems, and thus are encouraged to use existing
terms in order to form connections to related items easily. In
this way, folksonomies collectively develop a partial set of
metadata standards that are semantically related through
ongoing involvement of non-expert geospatial users [6].
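One way a system might encourage the reuse of existing terms is to suggest them while a user types a new tag. The following is a minimal sketch, assuming a hypothetical helper suggest_tags and made-up tag counts; it is not part of the method described in the paper, and it does not by itself resolve ambiguous tags such as ‘Property’.

```python
from collections import Counter


def suggest_tags(prefix, tag_counts, limit=3):
    """Return existing tags that start with `prefix`, most used first."""
    prefix = prefix.strip().lower()
    if not prefix:
        return []
    matches = [(tag, n) for tag, n in tag_counts.items()
               if tag.lower().startswith(prefix)]
    # Sort by descending count, then alphabetically for stable output.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in matches[:limit]]


# Illustrative counts only; not data from the paper.
existing = Counter({"property": 12, "primary school": 7, "parcel": 5, "park": 2})
print(suggest_tags("p", existing))   # prints ['property', 'primary school', 'parcel']
print(suggest_tags("pa", existing))  # prints ['parcel', 'park']
```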
4
Conclusion
Crowd-sourcing metadata will bring the same advantages and
disadvantages as crowd sourcing geographic information.
Authors and users of VGI will both potentially benefit from
the use of crowd sourcing to create metadata for VGI. In
addition, the authoritative sources of geospatial data could
also benefit from the metadata created for VGI by comparing
what is crowd sourced against what is considered to be
accurate and complete metadata for the data of similar
content.
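The comparison between crowd-sourced and authoritative metadata could, for instance, start from a simple overlap of keyword sets. The sketch below uses the Jaccard index, which is our choice for illustration rather than a measure proposed in the paper; the function name keyword_overlap and both keyword lists are hypothetical.

```python
def keyword_overlap(crowd, authoritative):
    """Compare two keyword lists, ignoring case and surrounding spaces."""
    crowd_set = {k.strip().lower() for k in crowd if k.strip()}
    auth_set = {k.strip().lower() for k in authoritative if k.strip()}
    union = crowd_set | auth_set
    # Jaccard index: shared keywords divided by all distinct keywords.
    jaccard = len(crowd_set & auth_set) / len(union) if union else 0.0
    return {
        "jaccard": jaccard,
        "missing_from_crowd": sorted(auth_set - crowd_set),
        "only_in_crowd": sorted(crowd_set - auth_set),
    }


# Both lists are made up for this example.
crowd = ["Victoria", "Blackburn", "High School", "digitised"]
authoritative = ["Blackburn", "High School", "Education"]
print(keyword_overlap(crowd, authoritative))
# prints {'jaccard': 0.4, 'missing_from_crowd': ['education'],
#         'only_in_crowd': ['digitised', 'victoria']} on one line
```

Keywords that appear only in the crowd-sourced set show how users actually look for the data, while those missing from it point to terms users do not search with.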
Crowd-sourced metadata can assist with the use of VGI;
however, it cannot be a measure of fitness for purpose. One
constraint of crowd-sourced VGI metadata is that it is
restricted to what users can find (this applies to VGI itself as
well) and not the universe of data that might be available.
Folksonomies take a user-centric approach and may only
reflect users’ perception of what they are able to
discover. If one VGI system is the only source users find,
they may accept it as appropriate even if its content does not
meet their requirements.
References
[1] Coleman, D.J., Georgiadou, Y. and Labonte, J. 2009.
Volunteered Geographic Information: The Nature and
Motivation of Producers. International Journal of Spatial
Data Infrastructure Research 4, 332-358.
[2] Coleman, D.J., Sabone, B. and Nkhwanana, N. 2010.
Volunteering Geographic Information to Authoritative
Databases: Linking Contributor Motivations to Program
Effectiveness. Geomatica 64, 383-396.
[3] Corcoran, P., Mooney, P. and Bertolotto, M. Analysing
the growth of OpenStreetMap networks. Spatial Statistics.
[4] Goodchild, M. 2007. Citizens as sensors: the world of
volunteered geography. GeoJournal 69, 211-221.
[5] Green, D. and Bossomaier, T. 2001. Online GIS and
Spatial Metadata. Taylor and Francis, New York.
[6] Kalantari, M., Olfat, H. and Rajabifard, A. 2010.
Automatic Spatial Metadata Enrichment: Reducing Metadata
Creation Burden through Spatial Folksonomies. In Spatially
Enabling Society, Research, Emerging Trends and Critical
Assessment, A. Rajabifard, J. Crompvoets and M.
Kalantari Eds. Leuven University Press, Leuven, 248.
[7] Mathes, A. 2004. Folksonomies - Cooperative
Classification and Communication Through Shared Metadata.
[8] Mooney, P. and Corcoran, P. 2012. The Annotation
Process in OpenStreetMap. Transactions in GIS 16, 561-579.
[9] O'Reilly, T. 2007. What is Web 2.0: Design Patterns and
Business Models for the Next Generation of Software.
Communications & Strategies, No. 1, p. 17, First Quarter
2007.