A Semantic Web Service for Web and Mobile Citizen Science

Yu Chen, Linyun Fu, Yue Liu, Amar Viswanathan,
Joanne S. Luciano, Deborah L. McGuinness
Tetherless World Constellation, Rensselaer Polytechnic Institute, Troy, NY, USA 12180
{cheny18,ful2,liuy18,kannaa,jluciano}@rpi.edu
dlm@cs.rpi.edu
Abstract.
Citizen science is hampered by a knowledge gap between domain experts and citizens that impedes effective collaboration. To facilitate cooperative data collection and data manipulation by citizens, we present
a semantic web service based on the Semantic Automated Discovery and Integration (SADI) framework. The service supports the creation of customized user interfaces, RDF data processing,
and semantic and statistical reasoning, so that data are validated and
processed properly. All transactions with the service are in RDF, so
that everything in the service can be dereferenced and understood by machines.
The service can be invoked with HTTP GET, so that various clients can take advantage of it. To demonstrate the functionality of
the service, we illustrate in detail an example citizen science
project on eutrophication detection.
Keywords: Citizen Science, Semantic Web Service, Reasoning, Eutrophication, User Interface
1 Introduction
Citizen science is scientific research conducted, in whole or in part, by amateur or
non-professional scientists. Formally, citizen science has been defined as "the systematic collection and analysis of data; development of technology; testing of natural
phenomena; and the dissemination of these activities by researchers on a primarily
avocational basis" [1]. Enabled by ubiquitous mobile devices, citizen science has
gained more attention both from domain experts who require pervasive data collection and from citizens who care about their surroundings. However, problems such as the
knowledge gap between domain experts and citizens, and differing evaluation approaches among experts, have become obstacles to worldwide
scientific data sharing and processing. In response, much manual effort has been spent on knowledge sharing
and mass education, aiming at consensus on a standard workflow for contributing to
citizen science repositories so that the contributed data are of high quality. Nevertheless, these efforts are neither efficient nor effective. Even when trainers
are patient enough to share their knowledge with those interested in contributing data,
the trainees may be overwhelmed or even frustrated by those trivial
steps. Even if people know the respective data collection procedures, they may not
follow them rigorously. To address these issues, we developed a
straightforward approach that facilitates data collection, data processing, knowledge
sharing and the other procedures involved in citizen science, using
a semantic web service based on the Semantic Automated Discovery and Integration
(SADI) framework. Section 2 demonstrates the workflow by which data are collected using the SADI
web service. Section 3 introduces related work on facilitating citizen science.
Section 4 discusses remaining problems and future work.
2 Workflow
2.1 Use case
Eutrophication detection is taken as an example to illustrate how the semantic web service powers the citizen science process. Eutrophication is defined as the enrichment of bodies of fresh water by inorganic plant nutrients (e.g. nitrate, phosphate) [2]. It is a common water pollution event that can be
detected by biological, chemical and visual factors. In the following workflow sections, we show how semantic web technologies can help in data collection, data processing and knowledge sharing.
2.2 Web Service
2.2.1. Service Overview
We used SADI as the core processor for all semantic transactions. SADI
is a framework for discovery of, and interoperability among,
distributed data and analytical resources [3]. In short, it is more
than an HTTP GET service whose transaction files are all in
RDF: it enables automated use of a service based on the
service's semantic description, without knowledge of the service's operational details. The service self-description provides the input and output
classes in RDF, so that whenever a user or program issues an HTTP
GET request with appropriate RDF data, processed RDF is sent
back as the output. In this paper, the SADI service acts both as a processor
for RDF data and as a template specification for the construction of the
user interface. The following sections explain in detail how we utilized
the service.
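As a rough illustration of this request/response pattern, the sketch below builds an HTTP GET request for the service description using only the Python standard library. The endpoint is the KLAdaptor address used in this paper; the `Accept: text/turtle` header and the helper names are assumptions made for illustration, not part of SADI or of the deployed service.

```python
import urllib.request

# Endpoint of the KLAdaptor service as given in this paper.
SERVICE_URL = "http://leo.tw.rpi.edu:9109/KLAdaptor"


def build_description_request(url=SERVICE_URL):
    """Build (but do not send) a GET request for the service description."""
    # Asking for Turtle is an assumption; the service may ignore this header.
    return urllib.request.Request(
        url, headers={"Accept": "text/turtle"}, method="GET"
    )


def fetch_description(url=SERVICE_URL, timeout=10):
    """Send the request and return the response body as text."""
    request = build_description_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Separating request construction from the network call keeps the first helper easy to inspect and test without a running service.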
2.2.2 Get Service Description
First, a simple HTTP GET request from the client to the service
address returns the service description, which tells the client
how the UI is to be constructed and what parameters to provide in order to
interact with the service. A 'curl' command issued in
a Linux terminal has the same effect:
$ curl http://leo.tw.rpi.edu:9109/KLAdaptor
We then obtain the service description:
@prefix citizenscience: <http://leo.tw.rpi.edu/projects/citizensscience/eutrophicationOntology.owl#> .
@prefix mygrid: <http://www.mygrid.org.uk/mygrid-moby-service#> .
@prefix protegedc: <http://protege.stanford.edu/plugins/owl/dc/protegedc.owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<#> a mygrid:serviceDescription ;
    rdfs:label "KLAdaptor" ;
    mygrid:hasOperation <#operation> ;
    mygrid:hasServiceDescriptionText "Offering an index that shows the probability of eutrophication given a set of related parameters" ;
    mygrid:hasServiceNameText "KLAdaptor" ;
    mygrid:providedBy <http://code.google.com/p/surfrdf/3dff93b3-b1994dc3-aec6-97befd659e76> ;
    rdfs:comment "" .

<#input> a mygrid:parameter ;
    mygrid:objectType citizenscience:Eutrophication .

<#operation> a mygrid:operation ;
    mygrid:inputParameter <#input> ;
    mygrid:outputParameter <#output> .

<#output> a mygrid:parameter ;
    mygrid:objectType citizenscience:eutrophicationIndex .

<http://code.google.com/p/surfrdf/3dff93b3-b1994dc3-aec6-97befd659e76> a mygrid:organisation ;
    protegedc:creator "cheny18@rpi.edu" ;
    mygrid:authoritative true .
By parsing this RDF/Turtle file, the client can determine that the service offers computation and reasoning about the eutrophication event: it takes an instance of the class citizenscience:Eutrophication as input and outputs an instance of the class citizenscience:eutrophicationIndex.
The parser then dereferences the citizenscience:Eutrophication class and
learns which parameters are required from its owl:equivalentClass definition:
<owl:Class rdf:about="http://www.semanticweb.org/ontologies/2012/3/EutrophicationObservation.owl#Eutrophication">
  <owl:equivalentClass>
    <owl:Class>
      <owl:intersectionOf rdf:parseType="Collection">
        <rdf:Description rdf:about="http://www.semanticweb.org/ontologies/2012/3/EutrophicationObservation.owl#phenomena"/>
        <owl:Restriction>
          <owl:onProperty rdf:resource="http://www.semanticweb.org/ontologies/2012/3/EutrophicationObservation.owl#hasBiologicalIndicator"/>
          <owl:cardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:cardinality>
        </owl:Restriction>
        <owl:Restriction>
          <owl:onProperty rdf:resource="http://www.semanticweb.org/ontologies/2012/3/EutrophicationObservation.owl#hasChemicalIndicator"/>
          <owl:cardinality rdf:datatype="&xsd;nonNegativeInteger">3</owl:cardinality>
        </owl:Restriction>
        <owl:Restriction>
          <owl:onProperty rdf:resource="http://www.semanticweb.org/ontologies/2012/3/EutrophicationObservation.owl#hasVisualIndicator"/>
          <owl:cardinality rdf:datatype="&xsd;nonNegativeInteger">3</owl:cardinality>
        </owl:Restriction>
      </owl:intersectionOf>
    </owl:Class>
  </owl:equivalentClass>
  <owl:disjointWith rdf:resource="http://www.semanticweb.org/ontologies/2012/3/EutrophicationObservation.owl#Indicator"/>
</owl:Class>
This gives the further description that the class needs exactly 1 biological indicator, 3 chemical indicators, and 3 visual indicators. Cascading dereferencing is then conducted along
owl:onProperty until the client has obtained all related properties.
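The extraction of these cardinality restrictions can be sketched with the Python standard library alone. The snippet below is an abbreviated copy of the class definition above with the `&xsd;` entity written out in full; the function name is hypothetical, and a real client would more likely use an RDF library than raw XML parsing.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://www.semanticweb.org/ontologies/2012/3/EutrophicationObservation.owl#"

# Abbreviated version of the class definition shown above.
SNIPPET = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:owl="{OWL}">
  <owl:Class rdf:about="{BASE}Eutrophication">
    <owl:equivalentClass><owl:Class>
      <owl:intersectionOf rdf:parseType="Collection">
        <owl:Restriction>
          <owl:onProperty rdf:resource="{BASE}hasBiologicalIndicator"/>
          <owl:cardinality rdf:datatype="{XSD}nonNegativeInteger">1</owl:cardinality>
        </owl:Restriction>
        <owl:Restriction>
          <owl:onProperty rdf:resource="{BASE}hasChemicalIndicator"/>
          <owl:cardinality rdf:datatype="{XSD}nonNegativeInteger">3</owl:cardinality>
        </owl:Restriction>
        <owl:Restriction>
          <owl:onProperty rdf:resource="{BASE}hasVisualIndicator"/>
          <owl:cardinality rdf:datatype="{XSD}nonNegativeInteger">3</owl:cardinality>
        </owl:Restriction>
      </owl:intersectionOf>
    </owl:Class></owl:equivalentClass>
  </owl:Class>
</rdf:RDF>"""


def required_indicators(owl_xml):
    """Map each restricted property (local name) to its exact cardinality."""
    root = ET.fromstring(owl_xml)
    required = {}
    for restriction in root.iter(f"{{{OWL}}}Restriction"):
        prop = restriction.find(f"{{{OWL}}}onProperty")
        card = restriction.find(f"{{{OWL}}}cardinality")
        if prop is None or card is None:
            continue  # skip restrictions that are not cardinality restrictions
        uri = prop.get(f"{{{RDF}}}resource")
        required[uri.rsplit("#", 1)[-1]] = int(card.text)
    return required


requirements = required_indicators(SNIPPET)
```

Here `requirements` maps `hasBiologicalIndicator` to 1 and the chemical and visual indicator properties to 3 each, which is the information the client needs before it dereferences the individual properties.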
2.2.3 Creating UI according to RDF restriction
Following the owl:equivalentClass and owl:onProperty definitions, the
client software dereferences further to build the necessary and sufficient
UI components that satisfy the required SADI service input. In our preliminary scheme, we propose the following mapping between RDF/OWL vocabulary and UI components.
RDF/OWL vocabulary                                                      UI component
----------------------------------------------------------------------  -------------------------------------------
<owl:unionOf rdf:parseType="Collection">                                ComboBox
<owl:someValuesFrom>                                                    Textfield for numbers with range
<owl:onDatatype rdf:resource="&xsd;float"/>                             Textfield with datatype restriction
<xsd:maxInclusive rdf:datatype="&xsd;integer">300</xsd:maxInclusive>    Textfield for integers with a maximum of 300
<xsd:minInclusive rdf:datatype="&xsd;integer">0</xsd:minInclusive>      Textfield for integers with a minimum of 0
The UI is then created:
Fig. 1. User Login
Fig. 2. Interface for Primary User
Fig. 3. Interface for Advanced User
In this way, the somewhat complicated eutrophication event is translated into a description of explicit details that non-technical citizens are able to provide. The restrictions on the data entry fields help validate whether the citizen has provided
valid input and offer suggestions if not, all according to the ontology.
Logical semantic predicates are thus used to bridge the knowledge gap between domain
experts and citizens. Moreover, application development becomes easier because a large proportion of the UI design work can be omitted. The ontology plays a vital
and irreplaceable role in this scenario.
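A minimal sketch of how such a mapping and the resulting input validation might look in client code is given below. The widget names and the bounds 0 and 300 come from the mapping table in this section; the `NumericField` class, its messages, and the pairing of those bounds with `nitrogenVolume` are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Widget chosen for each OWL/XSD construct, following the mapping table.
WIDGET_FOR = {
    "owl:unionOf": "ComboBox",
    "owl:someValuesFrom": "Textfield for numbers with range",
    "owl:onDatatype": "Textfield with datatype restriction",
}


@dataclass
class NumericField:
    """Text field bounded by xsd:minInclusive / xsd:maxInclusive facets."""
    name: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def validate(self, raw: str) -> Optional[str]:
        """Return None for valid input, otherwise a suggestion for the user."""
        try:
            value = float(raw)
        except ValueError:
            return f"{self.name}: please enter a number"
        if self.minimum is not None and value < self.minimum:
            return f"{self.name}: value must be at least {self.minimum:g}"
        if self.maximum is not None and value > self.maximum:
            return f"{self.name}: value must be at most {self.maximum:g}"
        return None


# Hypothetical field: the facet values 0 and 300 applied to one input.
field = NumericField("nitrogenVolume", minimum=0, maximum=300)
```

Returning a message rather than raising an exception matches the idea that the UI should offer a suggestion when the input violates a restriction.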
2.2.4. Serialize input data into RDF and send it to the service
After the user's input data are collected from the UI, they are serialized into RDF
and sent to the SADI service. An example of serialized data is given
below.
@prefix void: <http://rdfs.org/ns/void#> .
@prefix citizenscience: <http://leo.tw.rpi.edu/projects/citizensscience/eutrophicationOntology.owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://citizenscience.org/observation/eutrophication>
    a citizenscience:Eutrophication ;
    citizenscience:waterColor "Red" ;
    citizenscience:deadFish "Many" ;
    citizenscience:algaeVolume "Many" ;
    citizenscience:oxygenVolume "2"^^xsd:float ;
    citizenscience:nitrogenVolume "200"^^xsd:float ;
    citizenscience:phosphorusVolume "77"^^xsd:float ;
    citizenscience:silicaVolume "0.2"^^xsd:float ;
    citizenscience:planktonVolume "88"^^xsd:float .
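The serialization step can be sketched as a small function that turns a flat dictionary of UI values into a Turtle block of the shape shown above. This is only an illustration under simplifying assumptions: the function name is hypothetical, all numeric values are typed as xsd:float, and a production client would normally rely on an RDF library instead of string building.

```python
CS = "http://leo.tw.rpi.edu/projects/citizensscience/eutrophicationOntology.owl#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def to_turtle(subject, rdf_class, values):
    """Serialize a flat dict of UI values as one Turtle subject block."""
    head = [
        f"@prefix citizenscience: <{CS}> .",
        f"@prefix xsd: <{XSD}> .",
        "",
        f"<{subject}>",
    ]
    body = [f"    a citizenscience:{rdf_class}"]
    for name, value in values.items():
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            # Numbers become typed literals, as in the example above.
            literal = f'"{value}"^^xsd:float'
        else:
            # Escape backslashes and quotes so the literal stays valid Turtle.
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            literal = f'"{escaped}"'
        body.append(f"    citizenscience:{name} {literal}")
    # Predicate-object pairs are separated by ';' and the block ends with '.'
    return "\n".join(head) + "\n" + " ;\n".join(body) + " .\n"


document = to_turtle(
    "http://citizenscience.org/observation/eutrophication",
    "Eutrophication",
    {"waterColor": "Red", "oxygenVolume": 2},
)
```

The resulting string can then be sent to the service as the body of the RDF transaction.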
2.2.5. Service performs reasoning
The service performs reasoning using the FuXi API [4] as well as computation using a statistical API in Python. We used the computation
model for the eutrophication event given in [5][6].
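The contribution of the ontology-based step, namely telling whether an observation supplies the required number of indicators, can be illustrated in plain Python. The sketch below is a stand-in and does not use or imitate the FuXi API; the cardinalities 1, 3 and 3 come from the class definition in Section 2.2.2, while the grouping of input fields under each indicator property is a hypothetical example.

```python
def missing_indicators(required, observed):
    """Compare observed indicator counts with exact cardinalities.

    required: {"hasChemicalIndicator": 3, ...}
    observed: {"hasChemicalIndicator": ["oxygenVolume", ...], ...}
    Returns {property: (expected, found)} for every mismatch.
    """
    problems = {}
    for prop, expected in required.items():
        found = len(observed.get(prop, []))
        if found != expected:
            problems[prop] = (expected, found)
    return problems


# Cardinalities from the ontology; the grouping below is hypothetical.
required = {
    "hasBiologicalIndicator": 1,
    "hasChemicalIndicator": 3,
    "hasVisualIndicator": 3,
}
observed = {
    "hasBiologicalIndicator": ["planktonVolume"],
    "hasChemicalIndicator": ["oxygenVolume", "nitrogenVolume"],
    "hasVisualIndicator": ["waterColor", "deadFish", "algaeVolume"],
}
problems = missing_indicators(required, observed)
```

With this hypothetical observation, one chemical indicator is missing, so the check reports that property only; a complete observation yields an empty dictionary.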
Detection algorithm used:
The process of eutrophication is described as follows:
All the elements on the left, which are collected according to the ontology,
contribute to the chemicals that cause the eutrophication event. Considering
the different amounts of the elements on the left that form the chemical on
the right, we adjust the weight of each observed element to evaluate
the probability of eutrophication. Since reasoning based on the ontology
can only tell whether any items are missing
from the necessary and sufficient conditions, statistical analysis is
required to offer a reliable quantified indicator. The TNI calculated according to the algorithm below is used as a statistical indicator of
eutrophication
where TNI is the sum of the indexes of all nutrient parameters, TNIj is the TNI of parameter j, Wj is the proportion of parameter j in the TNI, and rij is the relation of chlorophyll a (Chla) to the other parameters. The available parameters include total
nitrogen (TN), total phosphorus (TP), Chla, dissolved oxygen (DO), chemical oxygen
demand by the KMnO4 oxidation method (CODMn), biological oxygen demand
(BOD5), etc.; TN, TP and Chla are selected for calculating the TNI.
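Read against the symbol definitions above, the index is a weighted sum over the m selected parameters. The first line below follows directly from those definitions. The normalization of the weights in the second line is an assumption about how Wj is derived from rij (a squared-correlation normalization) and should be checked against [6].

```latex
\begin{align}
\mathrm{TNI} &= \sum_{j=1}^{m} W_j \,\mathrm{TNI}_j , \\
W_j &= \frac{r_{ij}^{2}}{\sum_{k=1}^{m} r_{ik}^{2}} ,
\qquad \sum_{j=1}^{m} W_j = 1 .
\end{align}
```

With TN, TP and Chla selected, m = 3.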
To obtain the probability of eutrophication, we referred to the surface water
criteria of the Environmental Protection Agency of China [7], which the
authors of the previously cited chemistry paper also reference.
Fig. 4. Eutrophication thresholds from the Environmental Protection Agency of China
In short, the probability is a weighted sum of all the relevant parameters.
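The weighted sum can be sketched in a few lines of Python. All numbers below are hypothetical and serve only to exercise the code, and deriving the weights by normalizing squared correlations is an assumption about the model rather than something stated in this paper.

```python
def weights_from_correlations(r):
    """Normalize squared correlations so the weights sum to 1 (assumed form)."""
    total = sum(value * value for value in r.values())
    return {name: value * value / total for name, value in r.items()}


def tni(sub_indexes, weights):
    """Weighted sum TNI = sum_j W_j * TNI_j over the selected parameters."""
    return sum(weights[name] * sub_indexes[name] for name in sub_indexes)


# Hypothetical correlations with Chla and hypothetical per-parameter indexes.
r = {"TN": 0.8, "TP": 0.6, "Chla": 1.0}
sub_indexes = {"TN": 70.0, "TP": 60.0, "Chla": 65.0}

weights = weights_from_correlations(r)  # TN 0.32, TP 0.18, Chla 0.50
index = tni(sub_indexes, weights)       # 0.32*70 + 0.18*60 + 0.50*65 = 65.7
```

Because the weights sum to 1, the combined index stays on the same scale as the per-parameter indexes, which is what allows it to be compared against a threshold table.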
2.2.6. Return results in RDF to client
After reasoning and computation on the service side, the processed RDF
is returned to the client. An example response is shown below:
@prefix citizenscience: <http://leo.tw.rpi.edu/projects/citizensscience/eutrophicationOntology.owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://citizenscience.org/observation/eutrophication> a citizenscience:eutrophicationIndex ;
    citizenscience:Eutrophication "YES" ;
    citizenscience:EutrophicationIndex "6.583e01"^^xsd:float .
We can thus tell that the data observed by the user indicate a eutrophication event with a probability of 65.8%. This result is
fed back to the client, so that the client can send a report containing the reasoning results
along with provenance information, such as the location, time, observer's profile and
photos taken at the observation point, to an interested party.
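On the client side, reading the two result values out of such a response can be sketched as follows. The sketch assumes the index literal is a well-formed xsd:float such as "6.583e01" and uses regular expressions only to stay within the standard library; the function name is hypothetical, and a real client would parse the Turtle with an RDF library.

```python
import re

RESPONSE = """@prefix citizenscience: <http://leo.tw.rpi.edu/projects/citizensscience/eutrophicationOntology.owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://citizenscience.org/observation/eutrophication> a citizenscience:eutrophicationIndex ;
    citizenscience:Eutrophication "YES" ;
    citizenscience:EutrophicationIndex "6.583e01"^^xsd:float .
"""


def read_result(turtle_text):
    """Extract the yes/no flag and the index (in percent) from a response."""
    flag = re.search(r'citizenscience:Eutrophication\s+"([^"]*)"', turtle_text)
    index = re.search(
        r'citizenscience:EutrophicationIndex\s+"([^"]*)"\^\^xsd:float',
        turtle_text,
    )
    return {
        "eutrophication": (flag.group(1) == "YES") if flag else None,
        "index_percent": float(index.group(1)) if index else None,
    }


result = read_result(RESPONSE)
```

The literal 6.583e01 parses to 65.83, which corresponds to the probability of roughly 65.8% discussed above.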
In summary, the workflow is illustrated in the figure below:
Fig. 5. Semantic Web Service Workflow
3 Related work
Much work has been done on guiding citizens to help collect data
of interest to some party. EpiCollect builds a platform that facilitates public data collection projects for environmental scientists and supports data collection on mobile devices.
mCrowd resembles EpiCollect but provides a RESTful API to its service for
sites interested in using its project publication service. However, both still
need to mind the knowledge gap when designing a data collection schema so that
users can easily understand and participate, which is significant for
scalability.
4 Future work and Discussion
In this poster, we discussed and implemented a semantic web service responsible for ontology-based reasoning and statistical computation.
However, given the nature of scientific research, much more complicated
computational models are required to simulate the actual process precisely. For this we need more powerful semantic processing and even
machine learning capabilities. The Google Prediction API provides a handy
interface for conducting machine learning operations on massive data, and it
will help boost the power of the semantic service by an order of magnitude.
We therefore plan to incorporate it into an advanced version of this system.
Knowledge transparency is also vital in evaluating a good semantic
service. Problems such as how to visualize the actual process so that
domain experts can better evaluate the model, and how to manage ontology evolution so that service quality can be defined, recorded and
managed, still await proper solutions. In the future, more effort
will be spent on how to represent the knowledge to domain experts who
do not have a semantic web background, i.e. who know neither ontologies nor
OWL syntax, in such a way that they can understand it and make modifications.
5 Conclusion
In this paper, a new semantic web service framework for citizen science, based on SADI, is proposed and demonstrated; it facilitates data collection, data processing
and knowledge sharing. Detailed semantic transactions are illustrated to show
how semantic web technology helps in eutrophication detection as a concrete
example. Related work on citizen science and its drawbacks are discussed,
as is possible future work that extends this framework.
References
[1] Citizen science. http://www.openscientist.org/2011/09/finalizing-definition-of-citizen.html
[2] Eutrophication definition by USGS. http://toxics.usgs.gov/definitions/eutrophication.html
[3] Semantic Automated Discovery and Integration (SADI) service framework. http://sadiframework.org/content/
[4] FuXi Python Reasoning API. http://code.google.com/p/fuxi/
[5] Biological and chemical indicators of eutrophication in the Yellowstone River and major tributaries during August 2000, USGS
[6] Xiao-e Yang, Xiang Wu, Hu-lin Hao, and Zhen-li He. Mechanisms and assessment of water eutrophication. Journal of Zhejiang University
[7] CNEPA (Environmental Protection Agency of China). Environmental Quality Standard for Surface Water. GB 3838-2002. 2002