A Semantic Web Service for Web and Mobile Citizen Science

Yu Chen, Linyun Fu, Yue Liu, Amar Viswanathan, Joanne S. Luciano, Deborah L. McGuinness

Tetherless World Constellation, Rensselaer Polytechnic Institute, Troy, NY, USA 12180
{cheny18,ful2,liuy18,kannaa,jluciano}@rpi.edu
dlm@cs.rpi.edu

Abstract. The development of citizen science suffers from a knowledge gap between domain experts and citizens that impedes effective collaboration. To empower and facilitate cooperative data collection and data manipulation by citizens, we developed a semantic web service based on the Semantic Automated Discovery and Integration (SADI) framework. The service helps create customized user interfaces (UIs), processes RDF data, and performs semantic and statistical reasoning so that data are validated and processed properly. All transactions with the service are in RDF, so that everything in the service can be dereferenced and understood by machines. The service is exposed over plain HTTP, so that a variety of clients can take advantage of it. To demonstrate the functionality of the service, a citizen science example on eutrophication detection is illustrated in detail.

Keywords: Citizen Science, Semantic Web Service, Reasoning, Eutrophication, User Interface

1 Introduction

Citizen science is scientific research conducted, in whole or in part, by amateur or non-professional scientists. Formally, citizen science has been defined as "the systematic collection and analysis of data; development of technology; testing of natural phenomena; and the dissemination of these activities by researchers on a primarily avocational basis" [1]. Enabled by ubiquitous mobile devices, citizen science has gained more attention both from domain experts who require pervasive data collection and from citizens who care about their surroundings.

adfa, p. 1, 2011.
© Springer-Verlag Berlin Heidelberg 2011
However, problems such as the knowledge gap between domain experts and citizens, and the differing evaluation approaches among experts, have become obstacles to worldwide scientific data sharing and processing. In response to these demanding requirements, much manual effort has been spent on knowledge sharing and public education, aiming at a consensus on a standard workflow for contributing to citizen science repositories so that the contributed data are of high quality. Nevertheless, this effort is neither efficient nor effective. Even when trainers are patient enough to share their knowledge with those interested in contributing data, the trainees may be overwhelmed or even frustrated by tedious steps. And even when people know the relevant data collection procedures, they do not necessarily follow the rigorous steps.

To address these issues, we developed a straightforward approach that facilitates data collection, data processing, knowledge sharing and the other procedures involved in citizen science, using a semantic web service based on the Semantic Automated Discovery and Integration (SADI) framework. In Section 2, we demonstrate a workflow showing how data are collected using the SADI web service. In Section 3, related work on facilitating citizen science is introduced. Section 4 discusses remaining problems and future work on this approach.

2 Workflow

2.1 Use case

Eutrophication detection is taken as an example to illustrate how the semantic web service powers the citizen science process. Eutrophication is defined as the enrichment of bodies of fresh water by inorganic plant nutrients (e.g. nitrate, phosphate) [2]. It is a water pollution event that can be detected through biological, chemical and visual indicators.
In the following sections, we show how semantic web technologies help with data collection, data processing and knowledge sharing.

2.2 Web Service

2.2.1 Service Overview

We used SADI as the core processor for all semantic transactions. SADI is a framework for the discovery of, and interoperability among, distributed data and analytical resources [3]. In short, a SADI service is essentially a plain HTTP service whose inputs and outputs are all RDF. Clients can use a service automatically on the basis of its semantic description, without knowing its operational details: the self-description declares the input and output classes in RDF, so any user or program that sends appropriately typed RDF data to the service receives processed RDF as output. In this paper, the SADI service acts both as a processor for RDF data and as a template specification for constructing the user interface. The following sections explain in detail how we used the service.

2.2.2 Get Service Description

First, a simple HTTP GET request from the client to the service URL returns the service description, which tells the client how the UI should be constructed and what parameters it must provide to interact with the service. The same request can be issued with curl from a Linux terminal:

$ curl http://leo.tw.rpi.edu:9109/KLAdaptor

This returns the following service description:

@prefix citizenscience: <http://leo.tw.rpi.edu/projects/citizensscience/eutrophicationOntology.owl#> .
@prefix mygrid: <http://www.mygrid.org.uk/mygrid-moby-service#> .
@prefix protegedc: <http://protege.stanford.edu/plugins/owl/dc/protege-dc.owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<#> a mygrid:serviceDescription;
    rdfs:label "KLAdaptor";
    mygrid:hasOperation <#operation>;
    mygrid:hasServiceDescriptionText "Offering an index that shows the probability of eutrophication given a set of related parameters";
    mygrid:hasServiceNameText "KLAdaptor";
    mygrid:providedBy <http://code.google.com/p/surfrdf/3dff93b3-b199-4dc3-aec6-97befd659e76>;
    rdfs:comment "" .

<#input> a mygrid:parameter;
    mygrid:objectType citizenscience:Eutrophication .

<#operation> a mygrid:operation;
    mygrid:inputParameter <#input>;
    mygrid:outputParameter <#output> .

<#output> a mygrid:parameter;
    mygrid:objectType citizenscience:eutrophicationIndex .

<http://code.google.com/p/surfrdf/3dff93b3-b199-4dc3-aec6-97befd659e76> a mygrid:organisation;
    protegedc:creator "cheny18@rpi.edu";
    mygrid:authoritative true .

By parsing this RDF/Turtle document, the client learns that the service computes and reasons about a eutrophication event: it takes an instance of the class citizenscience:Eutrophication as input and returns an instance of the class citizenscience:eutrophicationIndex as output.
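Reading the input and output classes out of the description amounts to following mygrid:hasOperation, then mygrid:inputParameter or mygrid:outputParameter, then mygrid:objectType. The Python sketch below illustrates this traversal; it is not the authors' client. It assumes the Turtle has already been parsed into (subject, predicate, object) string triples, for example by an RDF library, and it keeps relative IRIs such as <#operation> in their written form instead of resolving them against the service URL. The function names `objects` and `io_classes` are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

MYGRID = "http://www.mygrid.org.uk/mygrid-moby-service#"
CS = "http://leo.tw.rpi.edu/projects/citizensscience/eutrophicationOntology.owl#"


def objects(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object o such that (subject, predicate, o) is in triples."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def io_classes(triples: List[Triple], service: str) -> Tuple[List[str], List[str]]:
    """Follow service -> operation -> input/output parameter -> objectType."""
    inputs: List[str] = []
    outputs: List[str] = []
    for op in objects(triples, service, MYGRID + "hasOperation"):
        for param in objects(triples, op, MYGRID + "inputParameter"):
            inputs += objects(triples, param, MYGRID + "objectType")
        for param in objects(triples, op, MYGRID + "outputParameter"):
            outputs += objects(triples, param, MYGRID + "objectType")
    return inputs, outputs


# Triples transcribed from the service description (relative IRIs left unresolved).
DESCRIPTION: List[Triple] = [
    ("#", MYGRID + "hasOperation", "#operation"),
    ("#operation", MYGRID + "inputParameter", "#input"),
    ("#operation", MYGRID + "outputParameter", "#output"),
    ("#input", MYGRID + "objectType", CS + "Eutrophication"),
    ("#output", MYGRID + "objectType", CS + "eutrophicationIndex"),
]

if __name__ == "__main__":
    ins, outs = io_classes(DESCRIPTION, "#")
    print("input:", ins)
    print("output:", outs)
```

With the input class known, the client can go on to dereference it, as the next step describes.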
The parser then dereferences the citizenscience:Eutrophication class and finds its owl:equivalentClass, which specifies the required parameters:

<owl:Class rdf:about="http://www.semanticweb.org/ontologies/2012/3/EutrophicationObservation.owl#Eutrophication">
  <owl:equivalentClass>
    <owl:Class>
      <owl:intersectionOf rdf:parseType="Collection">
        <rdf:Description rdf:about="http://www.semanticweb.org/ontologies/2012/3/EutrophicationObservation.owl#phenomena"/>
        <owl:Restriction>
          <owl:onProperty rdf:resource="http://www.semanticweb.org/ontologies/2012/3/EutrophicationObservation.owl#hasBiologicalIndicator"/>
          <owl:cardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:cardinality>
        </owl:Restriction>
        <owl:Restriction>
          <owl:onProperty rdf:resource="http://www.semanticweb.org/ontologies/2012/3/EutrophicationObservation.owl#hasChemicalIndicator"/>
          <owl:cardinality rdf:datatype="&xsd;nonNegativeInteger">3</owl:cardinality>
        </owl:Restriction>
        <owl:Restriction>
          <owl:onProperty rdf:resource="http://www.semanticweb.org/ontologies/2012/3/EutrophicationObservation.owl#hasVisualIndicator"/>
          <owl:cardinality rdf:datatype="&xsd;nonNegativeInteger">3</owl:cardinality>
        </owl:Restriction>
      </owl:intersectionOf>
    </owl:Class>
  </owl:equivalentClass>
  <owl:disjointWith rdf:resource="http://www.semanticweb.org/ontologies/2012/3/EutrophicationObservation.owl#Indicator"/>
</owl:Class>

This definition states that an observation requires 1 biological indicator, 3 chemical indicators and 3 visual indicators. The client then dereferences each property referenced by owl:onProperty in turn, cascading until it has obtained all related properties.

2.2.3 Creating the UI According to RDF Restrictions

Following owl:equivalentClass and owl:onProperty, the client software dereferences further to build the UI components that are necessary and sufficient to supply the input required by the SADI service.
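The client-side logic described here can be sketched in Python: each exact owl:cardinality restriction is expanded into that many input slots, and entered values are checked against datatype facets such as xsd:minInclusive and xsd:maxInclusive (the facets listed in the mapping table that follows). This is a minimal illustration, not the authors' implementation. It assumes the restrictions have already been extracted from the RDF/XML. The classes `Restriction` and `FieldSpec`, the functions `required_slots` and `validate`, and the ComboBox options used in the usage lines are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Restriction:
    """One owl:Restriction: the owl:onProperty target and its owl:cardinality."""
    on_property: str
    cardinality: int


@dataclass
class FieldSpec:
    """A UI field derived from the ontology (hypothetical client-side model)."""
    name: str
    widget: str                            # "ComboBox" or "TextField"
    options: Optional[List[str]] = None    # members of an owl:unionOf
    minimum: Optional[float] = None        # xsd:minInclusive facet
    maximum: Optional[float] = None        # xsd:maxInclusive facet


# The three cardinality restrictions of the Eutrophication class.
EUTROPHICATION = [
    Restriction("hasBiologicalIndicator", 1),
    Restriction("hasChemicalIndicator", 3),
    Restriction("hasVisualIndicator", 3),
]


def required_slots(restrictions: List[Restriction]) -> List[str]:
    """Expand each exact-cardinality restriction into that many input slots."""
    slots: List[str] = []
    for r in restrictions:
        slots.extend([r.on_property] * r.cardinality)
    return slots


def validate(field: FieldSpec, raw: str) -> Optional[str]:
    """Return None if raw is acceptable, otherwise a suggestion for the user."""
    if field.widget == "ComboBox":
        allowed = field.options or []
        return None if raw in allowed else "choose one of: " + ", ".join(allowed)
    try:
        value = float(raw)
    except ValueError:
        return "enter a number"
    if field.minimum is not None and value < field.minimum:
        return f"value must be at least {field.minimum:g}"
    if field.maximum is not None and value > field.maximum:
        return f"value must be at most {field.maximum:g}"
    return None


if __name__ == "__main__":
    print(required_slots(EUTROPHICATION))
    nitrogen = FieldSpec("nitrogenVolume", "TextField", minimum=0, maximum=300)
    print(validate(nitrogen, "200"), validate(nitrogen, "350"))
```

Returning a suggestion string rather than raising keeps the check usable directly in a form, which matches the goal of guiding citizens toward valid input.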
In our preliminary scheme, we proposed the following mapping between RDF/OWL vocabulary and UI components:

RDF/OWL vocabulary                                                       UI component
<owl:unionOf rdf:parseType="Collection">                                 ComboBox
<owl:someValuesFrom>                                                     Text field for numbers within a range
<owl:onDatatype rdf:resource="&xsd;float"/>                              Text field with a datatype restriction
<xsd:maxInclusive rdf:datatype="&xsd;integer">300</xsd:maxInclusive>     Text field accepting integers up to a maximum of 300
<xsd:minInclusive rdf:datatype="&xsd;integer">0</xsd:minInclusive>       Text field accepting integers from a minimum of 0

The resulting UI is shown in Figs. 1-3.

Fig. 1. User Login
Fig. 2. Interface for Primary User
Fig. 3. Interface for Advanced User

In this way, the somewhat complicated eutrophication event is translated into a description of explicit details that non-technical citizens are able to provide. The restrictions on the data entry fields check whether the citizen has provided valid input and offer suggestions if not, all according to the ontology. Logical semantic predicates are thus used to bridge the knowledge gap between domain experts and citizens. Moreover, application development becomes easier because much of the UI is derived from the ontology rather than designed by hand. The ontology therefore plays a central role in this scenario.

2.2.4 Serializing Input Data into RDF and Sending It to the Service

After the user's input data are collected from the UI, they are serialized into RDF and sent to the SADI service. An example of serialized data is shown below.

@prefix void: <http://rdfs.org/ns/void#> .
@prefix citizenscience: <http://leo.tw.rpi.edu/projects/citizensscience/eutrophicationOntology.owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://citizenscience.org/observation/eutrophication> a citizenscience:Eutrophication;
    citizenscience:waterColor "Red";
    citizenscience:deadFish "Many";
    citizenscience:algaeVolume "Many";
    citizenscience:oxygenVolume "2"^^xsd:float;
    citizenscience:nitrogenVolume "200"^^xsd:float;
    citizenscience:phosphorusVolume "77"^^xsd:float;
    citizenscience:silicaVolume "0.2"^^xsd:float;
    citizenscience:planktonVolume "88"^^xsd:float .

2.2.5 The Service Performs Reasoning

The service performs reasoning using the FuXi API [4] as well as computation using a statistical API in Python. We used the computation model for eutrophication given in [5][6].

Detection algorithm. The process of eutrophication can be described by a chemical reaction in which all the elements on the left-hand side, collected according to the ontology, contribute to the chemical on the right-hand side that causes the eutrophication event. Because different amounts of the left-hand elements are needed to form the right-hand chemical, we adjust the weight of each observed element when evaluating the probability of eutrophication. Since ontology-based reasoning can only determine whether any items are missing from the necessary and sufficient conditions, statistical analysis is required to provide a reliable quantitative indicator. The calculated TNI serves as this statistical indicator of eutrophication. In its calculation, TNI is the sum of the indexes of all nutrient parameters, TNI_j is the index of parameter j, W_j is the proportion (weight) of parameter j in the TNI, and r_ij is the correlation between chlorophyll a (Chla) and parameter j. The available parameters include total nitrogen (TN), total phosphorus (TP), Chla, dissolved oxygen (DO), chemical oxygen demand by the KMnO4 oxidation method (CODMn), biological oxygen demand (BOD5), etc.; TN, TP and Chla are selected for calculating the TNI.
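The TNI equation itself did not survive in this text. The Python sketch below assumes the standard weighted form of the trophic level index, which we believe is the model referenced in [6]: $\mathrm{TNI} = \sum_{j=1}^{m} W_j\,\mathrm{TNI}_j$ with $W_j = r_{ij}^2 / \sum_{k=1}^{m} r_{ik}^2$. The per-parameter indexes TNI_j and the correlations r_ij are taken as inputs. The function names `tni_weights` and `tni` and all numbers in the usage lines are hypothetical illustrations, not data from this study.

```python
import math
from typing import Dict


def tni_weights(r: Dict[str, float]) -> Dict[str, float]:
    """W_j = r_ij^2 / sum_k r_ik^2, where r maps parameter j to its correlation with Chla."""
    total = math.fsum(v * v for v in r.values())
    if total == 0:
        raise ValueError("at least one non-zero correlation is required")
    return {j: v * v / total for j, v in r.items()}


def tni(indexes: Dict[str, float], r: Dict[str, float]) -> float:
    """TNI = sum_j W_j * TNI_j over parameters that have both an index and a correlation."""
    if set(indexes) != set(r):
        raise ValueError("indexes and correlations must cover the same parameters")
    w = tni_weights(r)
    return math.fsum(w[j] * indexes[j] for j in indexes)


if __name__ == "__main__":
    # Hypothetical correlations with Chla and per-parameter indexes (not data from this study).
    r = {"Chla": 1.0, "TP": 0.5, "TN": 0.5}
    print(tni_weights(r))
    print(tni({"Chla": 60.0, "TP": 30.0, "TN": 30.0}, r))
```

Squaring the correlations makes the weights non-negative and sum to 1, so parameters that track Chla closely dominate the index. How the TNI is then compared against the thresholds of [7] is not reproduced in this sketch.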
To obtain the probability of eutrophication, we referred to the surface water quality standard of the Environmental Protection Agency of China [7], which the authors of the chemistry paper referenced above also cite.

Fig. 4. Eutrophication threshold from the Environmental Protection Agency of China

In short, the probability is a weighted sum of all the relevant parameters.

2.2.6 Return Results in RDF to the Client

After reasoning and computation on the service side, the processed RDF is returned to the client. An example of the returned RDF is shown below:

@prefix citizenscience: <http://leo.tw.rpi.edu/projects/citizensscience/eutrophicationOntology.owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://citizenscience.org/observation/eutrophication> a citizenscience:eutrophicationIndex;
    citizenscience:Eutrophication "YES";
    citizenscience:EutrophicationIndex "6.583e-01"^^xsd:float .

The observation data provided by the user therefore indicate a eutrophication event with a probability of 65.8%. This result is returned to the client, which can then send a report containing the reasoning results, along with provenance information such as the location, time, observer's profile and photos taken at the observation point, to an interested party. The overall workflow is summarized in Fig. 5.

Fig. 5. Semantic Web Service Workflow

3 Related work

Much work has been done on helping citizens collect data in the interest of particular parties. EpiCollect provides a platform that supports public data collection projects run by environmental scientists, with data collected on mobile devices. mCrowd resembles EpiCollect but additionally provides a RESTful API so that other sites can use its project publication service. However, in both systems the knowledge gap must still be bridged by hand when designing a data collection schema that users can easily understand and follow, which is a significant obstacle to scalability.
4 Future work and Discussion

In this poster, we discuss and implement a semantic web service responsible for ontology-based reasoning and statistical computation. However, given the nature of scientific research, more complicated computational models are required to simulate the actual processes precisely. This calls for more powerful semantic processing and even machine learning capabilities. The Google Prediction API provides a convenient interface for running machine learning operations on massive data and could substantially extend the capabilities of the semantic service, so we plan to incorporate it into an advanced version of this system.

Knowledge transparency is also vital for a good semantic service. Problems such as how to visualize the actual process so that domain experts can better evaluate the model, and how to manage ontology evolution so that service quality can be defined, recorded and managed, still need proper solutions. In the future, more effort will be spent on representing the knowledge to domain experts without a semantic web background, i.e. who know neither ontologies nor OWL syntax, in such a way that they can understand and modify it.

5 Conclusion

In this paper, a new semantic web service framework for citizen science based on SADI is proposed and demonstrated; it facilitates data collection, data processing and knowledge sharing. Detailed semantic transactions are illustrated, using eutrophication detection as a concrete example of how semantic web technology helps. Related work on citizen science and its drawbacks are discussed, as is possible future work extending this framework.
References

[1] Citizen science. http://www.openscientist.org/2011/09/finalizing-definition-of-citizen.html
[2] Eutrophication definition by USGS. http://toxics.usgs.gov/definitions/eutrophication.html
[3] Semantic Automated Discovery and Integration (SADI) service framework. http://sadiframework.org/content/
[4] FuXi Python reasoning API. http://code.google.com/p/fuxi/
[5] Biological and chemical indicators of eutrophication in the Yellowstone River and major tributaries during August 2000. USGS.
[6] Xiao-e Yang, Xiang Wu, Hu-lin Hao, Zhen-li He: Mechanisms and assessment of water eutrophication. Journal of Zhejiang University.
[7] CNEPA (Environmental Protection Agency of China): Environmental Quality Standard for Surface Water. GB 3838-2002. 2002.