- Tetherless World Constellation

Towards Semantically-enabled Exploration and Analysis of Environmental Ecosystems

Ping Wang, Linyun Fu, Evan W. Patton, Deborah L. McGuinness
Tetherless World Constellation, Rensselaer Polytechnic Institute, Troy, USA
{wangp5, ful2, pattoe, dlm}@cs.rpi.edu

Joshua Dein, Sky Bristol
U.S. Geological Survey, USA
{sbristol, fjdein}@usgs.gov

Abstract: We aim to provide a broad and deep range of decision support tools for resource managers who need to examine large complex ecosystems and make recommendations in the face of many tradeoffs and conflicting drivers. We take a semantic technology approach, leveraging background ontologies and the growing body of open linked data. In previous work, we designed and implemented a semantically-enabled environmental monitoring framework called SemantEco and used it to build a water quality portal named SemantAqua. In this work, we significantly extend SemantEco to include knowledge required to support resource decisions concerning endangered species and their habitats. Our previous system included foundational ontologies to support environmental regulation violations and relevant human health effects. Our enhanced framework includes foundational ontologies to support modeling of wildlife observation and wildlife health impacts, thereby enabling deeper and broader support for large ecosystem analysis in the face of environmental pollution. Our results include a refactored and expanded version of the SemantEco portal. Additionally, the updated system is now compatible with the emerging best-in-class Extensible Observation Ontology (OBOE). A wider range of relevant data has been integrated, focusing on additions concerning wildlife health. The resulting system stores and exposes provenance concerning where the data came from, how it was used, and also the rationale for choosing the data. In this paper, we describe the system, highlight its research contributions, and describe current and envisioned usage.
Keywords: Semantic Web; Semantic Environmental Informatics; Provenance; Ecological Data Integration

I. INTRODUCTION

In many places around the world, wildlife and the habitats on which they depend are deteriorating. For instance, almost 40 percent of the United States’ freshwater fish species are considered at risk or vulnerable to extinction according to [1]. Aiming at preserving the environment and wildlife, scientists and resource managers have initiated various efforts to monitor ecological and environmental trends, investigate causes and possible effects of pollution, and identify threats to wildlife and their habitats [2]. Meanwhile, information technology experts have been building environmental information systems and using information technologies to improve access to information concerning ecological and environmental data.

In previous work [3], [4], we proposed the Tetherless World Constellation Semantic Ecology and Environment Portal (SemantEco) as both an environment portal application and an example of a semantic infrastructure for environmental informatics applications. In this paper, we extend the focus of SemantEco beyond water quality and related health effects to a more comprehensive effort that includes endangered species and their related health effects. This extension broadens the portal toward an ecosystem perspective, with one focus now on supporting resource managers as they attempt to make decisions about more complex ecosystems.

To realize these extension goals and to make the portal more reusable and extensible, and possibly lower barriers to adoption by environmental and observational communities, several issues needed to be addressed. The challenges fall into a number of high-level categories, including terminology, data integration, provenance, and scalability.

Terminology: In our previous iterations of SemantAqua¹ and SemantEco, we built our SemantEco ontology family driven by the requirements generated from our use cases.
This approach worked well in that it yielded a relatively small ontology that directly met our application needs. We modularized the ontologies according to domain (thus there is a water module², an air module³, a general contaminant threshold layer⁴, etc.). The basic ontology structure has held up through a number of extensions, and one of its nice properties is the relative simplicity of pollution detection based on regulations. However, it was built by people who are not specialists in observation and environmental data. Initially, we did not want to adopt larger environmental ontologies since their breadth was more than was needed, and some lacked depth. Now, however, as we move to a setting where more breadth is useful because larger ecosystems are considered, and where we hope to engage with environmentalists, it has become useful to make connections to ontologies that are already familiar in environmental communities interested in scientific observation data.

¹ http://aquarius.tw.rpi.edu/projects/semantaqua/
² http://escience.rpi.edu/ontology/2/0/water.owl#
³ http://escience.rpi.edu/ontology/2/0/air.owl#
⁴ http://escience.rpi.edu/ontology/semanteco/2/0/pollution.owl#

Data Integration: Quality wildlife observation and environmental data are becoming increasingly available on the web. However, they are often difficult to access, particularly when users want to understand the data well enough to integrate data from multiple sources and use it in their own applications. While some data sets and services are doing a better job of providing documentation, the documentation is usually in natural language, possibly separated from and not in sync with the data, or even incomplete. The semantics of the data are not explicitly captured, and thus it is often difficult for humans, let alone computers, to understand or reason over the data.

Scalability: Our initial efforts focused on a few US states worth of data.
When we scaled up to include data that covered the entire United States just for water quality and facilities monitoring, we reached over 5 billion triples. As the project now expands to endangered species and use cases begin to address more frequent observations, the scale of the raw data expands. Querying and reasoning over large collections of data can be time consuming, thus we are exploring techniques that allow the portal to scale while still doing relevant reasoning in a timely manner.

Provenance: Our portal has always captured provenance concerning where data was retrieved from, when the retrieval happened, and any manipulations that were done to the data. Further, it has used that information to provide some provenance-aware features that allow access, for example, to summary views that use only some resources or reason against particular regulations. Our collaborator Sky Bristol from USGS, however, asked for additional provenance: capturing the rationale for data or data service choices as well as manipulation choices. As pointed out by Sky, rationale that explains the choices made during data manipulation helps portal users obtain a deeper understanding of the data integration. Additionally, the rationale can be invaluable in helping data and data service providers determine the metadata and service characteristics they should provide to see increased usage of their services.

We use semantic technologies to respond to these challenges. Our updated system is compatible with the Extensible Observation Ontology (OBOE) [5]. This ontology aims to support interoperable observation data. Our ontology family complements OBOE by providing modules aimed at supporting environmental monitoring and potential correlations to observation data. Additionally, we integrated various ecological and environmental data: wildlife observation data from the Avian Knowledge Network (AKN⁵) and the U.S. Geological Survey (USGS⁶); environmental criteria for wildlife from the Environmental Protection Agency (EPA⁷); water body data from USGS; and health effects of contaminants on wildlife from Wildpro⁸. Our approach provides a formal encoding of the semantics of the data and provides services for automatic reasoning and visualizations over the data. Furthermore, we compared the performance of a standard reasoner with our customized rule-based reasoner over our data. Lastly, we enhanced our provenance support by incorporating rationale as provenance.

⁵ http://www.avianknowledge.net
⁶ http://www.usgs.gov/
⁷ http://www.epa.gov/
⁸ http://wildpro.twycrosszoo.org

In this paper, Sections 2 and 3 describe how semantic web technologies have been used to extend and improve the portal, including the extension for wildlife monitoring, the connection to OBOE, the capture of rationale as provenance, and the reasoner comparison. Section 4 reviews related work. Section 5 discusses impacts, highlights, and future directions. Section 6 presents the conclusion.

II. EXTENSION FOR WILDLIFE MONITORING

A. Use Case

The USGS provides integrated science and methodology to support the Wyoming Landscape Conservation Initiative (WLCI⁹): an effort to assess and enhance aquatic and terrestrial habitats at a landscape scale in southwest Wyoming. One vision that the USGS team has for WLCI is to produce a decision support system for resource managers that facilitates examination of the many tradeoffs and conflicting drivers at work in the focus area, from energy and agricultural development to fish and wildlife conservation. Our USGS collaborators are interested in using semantic science and technologies to build both the decision support system for Wyoming and the infrastructure that supports the building of such systems for other states.
To this end, we designed the following use case to identify necessary extensions to the portal for wildlife monitoring: The resource manager chooses a geographic region of interest by entering a zip code, and selects the species of concern in the species facet. The portal identifies polluted water sources and polluting facilities, and visualizes the results on a map using different icons. Meanwhile, the portal displays the distribution of the species in the region. Then, the resource manager views the map to find out whether the selected species might be endangered by water pollution in the region. The resource manager can click on polluting facilities or polluted sites to investigate the pollution further, e.g. the health effects of the pollution on the species.

To realize the use case, we enhanced our ontology for modeling the domain of wildlife observation, integrated wildlife observation data according to linked data principles [6], and developed visualizations to present the data and provenance.

⁹ http://www.wlci.gov/

B. Ontology for Wildlife Monitoring

There are a number of existing ontologies for modeling and publishing RDF data about species descriptions [7], [8]. After reviewing these ontologies, we chose to reuse the Geospecies ontology for modeling the domain of wildlife monitoring, as it contains most of the concepts required by the use case. For example, the Geospecies ontology defines the classes Observation and SpeciesConcept, and links the two classes with the object properties hasObservation and hasSpecies. However, the Observation class from Geospecies does not capture the observed habitat of the wildlife or the date of the observation. Thus, in the extension¹⁰, we introduce a new class WildlifeObservation, which is a subclass of the Observation class from Geospecies, enhanced with two properties: hasHabitat and hasDate. A subset of our ontology extension is illustrated in Fig. 1.
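As a rough illustration of the extension, the sketch below lists the triples that one WildlifeObservation record might produce, using plain Python tuples. The class WildlifeObservation and the properties hasHabitat and hasDate come from the text above; the `ex:` identifiers, the sample values, the use of hasSpecies with the observation as subject, the `xsd:date` typing of hasDate, and the helper name are assumptions made only for this example.

```python
from datetime import date


def wildlife_observation_triples(obs_id, species_id, habitat_id, observed_on):
    """Return (subject, predicate, object) triples for one observation.

    Prefixed names are used for readability; "wildlife:" stands for the
    SemantEco wildlife extension and "geospecies:" for the Geospecies ontology.
    """
    return [
        (obs_id, "rdf:type", "wildlife:WildlifeObservation"),
        # Assumption: hasSpecies points from the observation to the species.
        (obs_id, "geospecies:hasSpecies", species_id),
        (obs_id, "wildlife:hasHabitat", habitat_id),
        # Assumption: the observation date is stored as an xsd:date literal.
        (obs_id, "wildlife:hasDate", f'"{observed_on.isoformat()}"^^xsd:date'),
    ]


# Hypothetical record: identifiers and date are made up for illustration.
triples = wildlife_observation_triples(
    "ex:obs1", "ex:CanadaGoose", "ex:waterBody42", date(2007, 5, 14)
)
for s, p, o in triples:
    print(f"{s} {p} {o} .")
```

Keeping the habitat and date on the observation itself, rather than on the species, is what lets the portal later relate a sighting to a specific water body and time window.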
Figure 1. Subset of wildlife ontology

C. Wildlife Data Integration

1) Source Data

Bird observation data: One major source of bird observation data is AKN, which is an international effort of government and non-government institutions to understand the patterns and dynamics of bird populations across the Western Hemisphere. We obtain a subset of the eBird Reference Dataset (3.0) [9] from AKN via its database query interface. The dataset is based on reported observations from novice and experienced bird observers and contains count data for bird species, the location where the observation took place, and the time the observation started.

Fish observation data: The National Fisheries Data Infrastructure brings together local and regional fisheries information systems and provides fisheries managers and decision-makers with one source of comprehensive data and information on fish species¹². We fetch fish observation data from its query interface. The fish dataset includes the species name, the hydrological unit code (HUC)¹³ of the watersheds where the fish species is observed, the date of the observation, and the originating database.

Regulation Data: We integrate EPA's compilation of national recommended water quality criteria [10], which is presented as a summary table containing recommended water quality criteria for the protection of aquatic life and human health in surface water for approximately 150 contaminants.

Water Body Data: The USGS National Hydrography Dataset (NHD) services are used to get HUC codes given locations on the map¹⁴, and the data for the water body shapes¹⁵.

Health Data: We obtain the health effects on wildlife from the research effort Wildpro, an electronic encyclopedia and library for wildlife. Each contaminant can cause different adverse effects on different wildlife species. For example, when exposed to excessive zinc concentrations, mallards exhibited leg paralysis and decreased food consumption, while invertebrates exhibited decreased growth rate and increased mortality [2]. To help researchers investigate health impacts of contaminants on wildlife species, we refactored our ontologies to include the class HealthEffect to model the potential health effects of overexposure to contaminants¹¹. The property isCausedBy establishes the relationship between each effect and its causing contaminant, and the property forSpecies links each effect with its target species.

¹⁰ http://escience.rpi.edu/ontology/semanteco/3/0/wildlife.owl
¹¹ http://escience.rpi.edu/ontology/semanteco/3/0/wildlifehealtheffect.owl
¹² http://ecosystems.usgs.gov/fisheriesdata/querybystate.aspx
¹³ HUC8 is an 8-digit hydrological unit code identifying a sub-basin area of size around 700 square miles. See http://en.wikipedia.org/wiki/Hydrological_code

2) Data Conversion

The general-purpose csv2rdf4lod tool provides us with the capability of quick and easy data integration [11]. We provide the converter with declarative parameters that map properties of the raw data to terms defined in ontologies. For example, the field “Common Name” in the eBird dataset is mapped to the property geospecies:hasCommonName¹⁶. Using this mapping, the converter is able to generate RDF triples compatible with our ontologies from the raw tabular data provided by our data sources. We use the same regulation ontology design and conversion workflow as in our previous work [4] to map the rules in wildlife regulations to OWL [12] classes.

3) Data Visualization

We support two types of visualizations: (1) a map visualization that displays the sources of water pollution together with species habitat in the context of geographic regions, and (2) a time series visualization that depicts species count over time with respect to a particular geographic region.

The map visualization gets the sources of the water pollution from the back-end reasoner and the species habitats by querying the triple store.
We visualize clean and polluted water sources, and clean and polluting facilities, with different markers. In the extension, we focus on waterfowl and fish, and visualize the water bodies that are their habitats. We highlight these water bodies with a different color. When the user clicks one of the highlighted water bodies, information about the water body and the provenance of that information are shown in the "Water Body Properties" tab of the pop-up window. When the user clicks on a polluted site, a pop-up window shows more details about the pollution: names of contaminants, measured values, limit values, time of measurement, and health effects on the species. Fig. 2 shows an example of our map visualization. In the example, the portal applies EPA's water quality criteria for aquatic life to the region with the zip code 98103 (Seattle, WA) and identifies a polluting site that is close to bird habitats.

The time series visualization retrieves species count data within a particular geographic region by querying the triple store and displays the data as a time series using the d3.js library. Fig. 3 shows the count of Canada geese in Washington state in 2007.

¹⁴ http://services.nationalmap.gov/ArcGIS/rest/services/nhd/MapServer
¹⁵ http://nhd.usgs.gov/data.html
¹⁶ http://rdf.geospecies.org/ont/geospecies#

Figure 2. Map Visualization

Figure 3. Time Series Visualization

III. EXTENSION OF THE FRAMEWORK

To make the portal more reusable and extensible, and possibly lower barriers to adoption by environmental and observational communities, we upgrade the portal in three aspects: connecting to a more general ontology family, enhancing provenance support, and investigating reasoning performance.

A. Connect to OBOE

In our previous iterations of SemantAqua and SemantEco, we built our SemantEco ontology family driven by the requirements generated from our use cases, and the resulting ontologies are lightweight and directly met our application needs. In contrast, OBOE targets generic scientific observation and measurement, and serves as a convenient basis for adding semantic annotations to scientific data. Thus, the SemantEco ontology and OBOE differ in several ways:

1. With the SemantEco ontology, each observation record is modeled using the class Measurement. OBOE contains the additional class "Observation", and measurements are tied to the corresponding observations.

2. With the SemantEco ontology, each observation record generates only one Measurement entity as the subject, and all fields of the observation record are directly linked to the generated Measurement entity. According to OBOE, one observation record can generate multiple Observation and Measurement entities. For example, both the measured value and the measurement date are modeled as observations which contain measurements, and the measurement date is connected to the measured value using the predicate "hasContext".

3. While the SemantEco ontology captures fields like measurement value, unit, and date using datatype properties, OBOE models them using entities. For instance, the unit mg/L is encoded as oboe:MilligramPerLiter¹⁷.

To address the differences, we incorporate an adapter ontology from our SemantEco ontology family to OBOE and develop a data converter for encoding the water observation data according to OBOE. We refer to the previous version of our SemantEco ontologies as version 2 and the new version as version 3.

The updated data modeling is compatible with OBOE, except for the measurement value. We cannot encode measurement values as entities, since our regulation ontology maps the rules in regulations to OWL classes¹⁸ and encodes allowable ranges of regulated characteristics via numeric range restrictions on datatype properties. Whether an observational item implies a pollution event is reflected by whether the item is a member of any class mapped from the regulations. Measurement values must therefore be encoded as numerical data to enable such reasoning. So, to capture the measurement value, instead of the object property oboe:hasValue we use the data property hasNumbericValue, which is defined in an adapter ontology¹⁹ from our SemantEco ontology family to OBOE. Figs. 4 and 5 depict the data representation of one example water observation record generated according to SemantEco versions 2 and 3.

Figure 4. Measurement in OBOE

Figure 5. Measurement in our pollution ontology

¹⁷ http://ecoinformatics.org/oboe/oboe.1.0/oboe.owl#
¹⁸ e.g., http://purl.org/twc/ontology/swqp/region/ny; others are listed at http://purl.org/twc/ontology/swqp/region/
¹⁹ http://escience.rpi.edu/ontology/semanteco/3/0/oboe-pollution.owl#

We model a polluted site as something that is both a measurement site and a polluted thing. The different observation data models lead to different models of polluted things. In version 2, a polluted thing is defined as something that has at least one "measurement" that violates a regulation. With version 3, a polluted thing is modeled as something that has at least one observation of a regulation violation. An observation of a regulation violation is modeled as an observation that has at least one "measurement" that violates a regulation. Fig. 6 gives the updated model of a polluted site.

It is necessary to change the regulation encoding to reflect the updated model. We replaced some properties from our ontology with similar properties defined in OBOE, e.g. pol:hasCharacteristic²⁰ became oboe:ofCharacteristic. In addition, we promoted some properties from data properties to object properties. For example, pol:hasUnit originally had a range of string and has been replaced by oboe:usesStandard with a range of class oboe:Standard. The changes are relatively minor, and our regulation converter is robust and supported the changes with little effort.
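The version 3 definition of a polluted thing can be read as two nested "at least one" checks. The following Python sketch mirrors that reading outside of OWL. The class names, the `LIMITS` table, and the sample limit (chloride at 10.0 mg/l, borrowed from rule (3) later in this paper) are illustrative assumptions; the portal itself relies on an OWL reasoner rather than code like this.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Measurement:
    characteristic: str
    value: float  # kept numeric so range checks are possible
    unit: str


@dataclass
class Observation:
    measurements: List[Measurement] = field(default_factory=list)


@dataclass
class Site:
    site_id: str
    observations: List[Observation] = field(default_factory=list)


# Hypothetical regulation table: (characteristic, unit) -> limit value.
LIMITS = {("Chloride", "mg/l"): 10.0}


def violates_regulation(m: Measurement) -> bool:
    """A measurement violates a regulation if it reaches or exceeds its limit."""
    limit = LIMITS.get((m.characteristic, m.unit))
    return limit is not None and m.value >= limit


def is_violation_observation(o: Observation) -> bool:
    """Observation of a regulation violation: at least one violating measurement."""
    return any(violates_regulation(m) for m in o.measurements)


def is_polluted(site: Site) -> bool:
    """Polluted thing (version 3): at least one observation of a violation."""
    return any(is_violation_observation(o) for o in site.observations)


clean = Site("site-a", [Observation([Measurement("Chloride", 3.2, "mg/l")])])
dirty = Site("site-b", [Observation([Measurement("Chloride", 12.5, "mg/l")])])
print(is_polluted(clean), is_polluted(dirty))
```

The extra level of nesting compared with version 2 (site, then observation, then measurement) is the structural change that the regulation encoding has to follow.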
Figure 6. Updated model of a polluted site.

²⁰ http://escience.rpi.edu/ontology/semanteco/2/0/pollution.owl#

We investigate the reasoning implications of using OBOE for interpreting data. The data set we use contains 3517 water measurements taken from 588 USGS monitoring sites in Kent County, Rhode Island (county code: 003, state code: 44). Our reasoner uses the Pellet OWL reasoner [13] together with the Jena Semantic Web Framework [14] to reason over the data and ontologies to answer the queries for polluted sites. Query (1) is under SemantEco version 2, and query (2) is under version 3.

    select distinct ?pollutedSite where {
      ?violation a pol:RegulationViolation.
      ?violation pol:hasSite ?pollutedSite.
    }                                                        (1)

    select distinct ?pollutedSite where {
      ?observation a oboe-pol:ObservationOfRegulationViolation.
      ?observation oboe-core:hasContext ?context.
      ?context oboe-core:ofEntity oboe-pol:SpatialLocationEntity.
      ?context oboe-core:hasMeasurement ?locationMea.
      ?locationMea oboe-core:hasValue ?pollutedSite.
    }                                                        (2)

To identify polluted sites over the dataset, it takes 63263.4 ms when using version 3, while it takes 22477.2 ms with version 2. Each time is an average over 5 executions of the query. We can see the tradeoff between interoperability and reasoning performance: while OBOE brings greater interoperability for the portal, it incurs longer reasoning time.

Reasoning performance is tested on a Linux Vserver running Ubuntu 10.04 sharing its Linux 2.6.32 kernel with an Ubuntu 10.04 host. The physical hardware was a Dell PowerEdge R510 configured with a quad core Intel Xeon E5620 with hyper-threading operating at 2.4 GHz and 32 GB of 1333 MHz dual ranked memory. Tests using Pellet were conducted using the 64-bit Java 1.6.0_26 runtime environment (sun-java6-jre, version 6.26-1lucid1), and the virtual machine was configured with 1GB min heap size and 2GB max heap size.

B. Rationale as provenance

During the construction of an information portal, we make various choices: which data sources to fetch data from, which tools to use to integrate data, which manipulations to conduct over the data, etc. As pointed out by Sky Bristol from USGS, rationale, which explains how we make these choices, is very important information. Rationale helps portal users obtain a thorough understanding of the portal, and facilitates the maintenance and reuse of the portal. Users would be more likely to have confidence in the portal if the rationale behind the construction of the portal and the presentation of the data is acceptable to them. If other portal builders are interested in reusing the architecture or workflow of the portal, they can more easily decide whether they would like to reuse a dataset or software agent when given the rationale for why we selected it.

TABLE I. EXAMPLES OF THE RATIONALE CAPTURED

Identified thing | Type | Rationale
USGS | Data Organization | It is an authoritative government agency for science about the Earth, its resources, and the environment.
NWIS Dataset | Dataset | It is distributed via web services and can be accessed periodically with automated means.
csv2rdf4lod | Software | The open source tool provides a quick and easy way to convert tabular data into well-structured RDF. We have direct support from the author of the tool, who is our lab mate.

To encode rationale as provenance, we extend the Proof Markup Language (PML) 2 [15]. PML 2 is a modular explanation interlingua and contains three ontologies that focus on three types of explanation metadata: provenance, information manipulation or justifications, and trust. We introduce the property pmlp:hasRationale, whose domain is the class Identified Thing and whose range is String. Identified things can be information, languages, and sources (including organizations, persons, agents, and services). With hasRationale, we can provide, in simple text, the rationale for why we chose to adopt the identified things.
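A minimal sketch of how such rationale statements might be serialized follows. Only the property pmlp:hasRationale and the two rationale texts (taken from Table I) come from this paper; the `ex:` identifiers and the helper functions are hypothetical names introduced for illustration.

```python
RATIONALE_PROPERTY = "pmlp:hasRationale"

# Hypothetical identifiers for two of the identified things in Table I.
CHOICES = {
    "ex:USGS": (
        "It is an authoritative government agency for science about "
        "the Earth, its resources, and the environment."
    ),
    "ex:NWIS": (
        "It is distributed via web services and can be accessed "
        "periodically with automated means."
    ),
}


def rationale_triples(choices):
    """Pair each identified thing with its plain-text rationale."""
    return [(thing, RATIONALE_PROPERTY, text) for thing, text in choices.items()]


def to_turtle(triples):
    """Render triples as Turtle-style statements with string-literal objects."""
    lines = []
    for subject, predicate, text in triples:
        # Escape backslashes and double quotes so the literal stays valid.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} {predicate} "{escaped}" .')
    return "\n".join(lines)


print(to_turtle(rationale_triples(CHOICES)))
```

Because the range of hasRationale is a plain string, recording a rationale costs one extra triple per identified thing, which keeps the burden on portal builders low.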
By extending the scope of provenance to include rationale, we are able to capture important information that would otherwise be lost when the original builders of the portal leave the project. In Table I, we give three examples of the rationale captured.

C. Compare the performance of different reasoners

Because they compute a forward-chaining closure, standard reasoners such as Pellet are much slower than general rule reasoners loaded with only the necessary rules. For example, a standard reasoner for the RDF²¹ language would include the following rule (encoded with Jena rule syntax²²):

    [rdf1: (?uuu ?aaa ?yyy) -> (?aaa rdf:type rdf:Property)]

Although this rule ensures the fulfillment of the semantics of RDF, it is not useful in the query answering task of our system. There are 14 such rules embedded in a standard RDFS reasoner²³ and even more for an OWL-DL reasoner like Pellet. We avoid the invocation of these rules to boost query answering efficiency. For example, on the data set with 3517 measurements taken from 588 USGS monitoring sites in Kent County, Rhode Island (county code: 003, state code: 44), it takes a specifically tailored rule reasoner with only rule (3) 1242 ms to answer query (4), while it takes the Pellet reasoner 64620 ms to answer the same query. The experiment is performed on a laptop with an Intel Pentium P6000 CPU (1.86GHz X 2, 3MB L3 cache) and 4GB DDR3 memory, running the 64-bit Windows 7 operating system.

    [Chloride: (?x pol:hasValue ?v) ge(?v, 10.0)
               (?x pol:hasCharacteristic pol:Chloride)
               (?x repr:hasUnit 'mg/l')
               -> (?x rdf:type pol:ExcessiveChlorideMeasurement)]      (3)

    select ?s ?x ?v where {
      ?x a pol:ExcessiveChlorideMeasurement.
      ?x pol:hasValue ?v.
      ?x pol:hasSite ?s.
    }                                                                 (4)

²¹ http://www.w3.org/TR/2004/REC-rdf-mt-20040210/#RDFRules
²² http://jena.sourceforge.net/inference/#RULEsyntax

IV. RELATED WORK

In the ecology and environmental communities, there have been research efforts that facilitate domain knowledge integration via semantic approaches.
These research projects focus on different fields of ecological and environmental science. OBOE focuses on encoding generic scientific observation and measurement [5]. GeoSpecies is an effort to enable species data to be linked together as part of the Linked Data network [7]. Chen et al. proposed a prototype system that integrates water quality data from multiple sources [16]. As the goal of SemantEco is to build a comprehensive ecological and environmental information system, we need to model knowledge spanning multiple fields. Thus, we designed a family of ontologies for encoding water measurement, species observation, and the health effects of pollution on species, and we utilize the ontologies for data integration and reasoning.

eScience can benefit from provenance for a number of reasons. For example, provenance provides a context for data interpretation and enables one to evaluate the reliability of experiment results and replicate scientific workflows [17]. Research projects such as myGrid [18] and CMCS [19] have been conducted to build infrastructure that generates provenance data and allows users to view and use provenance data. The provenance support of this work differs from that of previous projects in that we extend the scope of provenance support to include rationale.

V. DISCUSSION AND FUTURE WORK

Ecological and environmental information systems benefit from semantic science and technology in several respects. Firstly, by encoding the domain knowledge required by the information system with ontologies, we make the information system easier to maintain and extend. In SemantEco, we encode the environmental regulation rules and the health effects of pollution on wildlife as OWL classes. If one regulation rule becomes stricter, we only need to update the OWL class to adopt the stricter threshold value.

²³ http://www.w3.org/TR/2004/REC-rdf-mt-20040210/#RDFSRules
Similarly, adding new OWL instances is sufficient for an extension such as introducing the health effect of a new contaminant. In contrast, if we embedded the domain knowledge in the source code of the information system, these changes would require us to modify the source code and re-deploy the system, which is more costly than changing the ontology files.

Semantic technologies facilitate data integration, which is common practice in building ecological and environmental information systems. Converting observation data according to our wildlife ontology leads to a controlled vocabulary across the datasets. For example, we map the fields "Latitude" and "Longitude" of the eBird Reference Dataset to the properties wgs:lat²⁴ and wgs:long.

Resource managers need analysis results from the collected ecological and environmental data to make informed decisions. This often involves large amounts of data, and the analysis can require much time and effort to arrive at a decision with significant impacts. Semantic technologies can be used to lower the cost and shorten the time required by such decision-making processes. We use SPARQL [20] to perform appropriate data aggregation, which is often used to get an overall understanding of the datasets and can only be performed when it is sensible to aggregate the data objects [5]. Not only does SPARQL enable us to specify the constraints of the data aggregation, it also supports aggregation functions including COUNT, SUM, AVG, MIN, and MAX. For example, we obtain the total counts of "Canada Goose" in Washington state in 2007 with SPARQL query (5). The query result can be provided in XML or JSON, which can be readily consumed by different visualization toolkits (e.g. D3.js) to produce a time series plot. This way, with SPARQL and visualization toolkits, the portal enables users to review and interact with the growing data resource in the form of maps and other visualizations.
GROUP BY ?month (5) It can be challenging to retrieve, aggregate and visualize data not encoded in semantic format. For instance, the Washington Department of Fish and Wildlife provide species distribution data in a spreadsheet. To retrieve the data to be aggregated and then perform the aggregation, a resource manager has three options: do it manually, write ad hoc programs, or write complex excel macros. All of the three options require considerable time and effort from the resource manager. The SemantEco portal has a lot of potential directions to explore. We can use semantic technologies to enable automatic analysis over ecological and environmental data. For example, we model "EndangeredSpot " as a place where some animals are reported as dead or sick, and "CriticalSpot" as a location that is both an " EndangeredSpot " and a "PollutedSite". Then, if we feed a semantic reasoner with the ontology and data, it will automatically identify the critical spots. :EndangeredSpot rdfs:subClassOf [ rdf:type owl:intersectionOf ( :SickEventSpot :DeathEventSpot ) ]. :SickEventSpot owl:equivalentClass [ rdf:type owl:someValuesFrom :WildlifeSickEvent ]. :DeathEventSpot owl:equivalentClass [ rdf:type owl:someValuesFrom :WildlifeDeathEvent ]. :CriticalSite rdfs:subClassOf [ PREFIX wildlife: <http://www.semanticweb.org/ontologies/2012/2/wildlife.owl# >. ] . (7) geospecies:hasCommonName "Canada Goose"; wildlife:hasYearCollected "2007"; wildlife:hasMonthCollected ?month; wildlife:hasObservationCount ?count.} 24 owl:Restriction ; owl:onProperty :hasWildlifeEvent ; rdf:type WHERE {?obv wildlife:hasState "Washington"; owl:Restriction ; owl:onProperty :hasWildlifeEvent ; PREFIX geospecies: <http://rdf.geospecies.org/ont/geospecies#> . SELECT ?month SUM(?count) as ?total owl:Class ; owl:Class ; owl:intersectionOf ( :PollutedSite :EndangeredSpot ) Such modeling and reasoning has a constraint: a monitoring site has records for both environmental observation and wildlife health events. 
However, as environmental quality and wildlife health are usually monitored at different sites, the constraint usually does not hold. In such cases, we can model "CriticalSite" as a "PollutedSite" having at least one "EndangeredSpot" nearby. Then we can get the location information of "PollutedSite" and "EndangeredSpot" with query snippet (8) and use a SPARQL filter to identify "CriticalSite" as shown in (9).

    ?pollutedSite wgs:long ?siteLong.
    ?pollutedSite wgs:lat ?siteLat.
    ?endangeredSpot wgs:long ?spotLong.
    ?endangeredSpot wgs:lat ?spotLat.                                (8)

    FILTER ( ?siteLat < (?spotLat+"+delta+") &&
             ?siteLat > (?spotLat-"+delta+") &&
             ?siteLong < (?spotLong+"+delta+") &&
             ?siteLong > (?spotLong-"+delta+"))                      (9)

To identify "EndangeredSpot", the portal needs data on wildlife health events, which are provided by web monitoring systems such as the Wildlife Health Event Reporter (WHER, footnote 25). We are interested in linking our portal to WHER to enable the portal to identify "EndangeredSpot" and "CriticalSite".

Next, for modeling wildlife habitat, we plan to connect to the HarmonISA project [21], which provides semantic descriptions of land-use and land-cover categories. Furthermore, EPA's water quality criteria [10] provide multiple types of thresholds and would be an important facet to complement our existing datasets. These criteria include measures of acute pollution in freshwater, chronic pollution in freshwater, acute pollution in saltwater, and chronic pollution in saltwater. We currently incorporate thresholds for acute pollution in freshwater for two reasons: 1) we mainly focus on inland water bodies; 2) acute pollution can affect both species that live near the polluting water source and species that pass by the water source occasionally. To support thresholds for chronic pollution, we need to consider additional factors, e.g., the time that species stay near the polluting water source. We would require species distribution models from animal experts to model chronic pollution. Lastly, we plan to enhance our modeling of rationale as provenance and to collect and integrate more data on the health effects of pollution on species.

24 http://www.w3.org/2003/01/geo/wgs84_pos
25 http://www.whmn.org/wher/

VI. CONCLUSION

We extended our SemantEco portal based on two driving factors: to facilitate decision support systems for resource managers and to make the portal more broadly reusable. Our extension includes: support for wildlife monitoring; connections to OBOE; integration of wildlife observation data as linked data; enhanced provenance support through the incorporation of rationale; and a performance comparison between a standard reasoner and a customized rule-based reasoner.

REFERENCES

[1] National Fish Habitat Board, “Through a Fish’s Eye: The Status of Fish Habitats in the United States 2010,” Washington D.C., 2010.
[2] M. S. Schwarz, K. R. Echols, M. J. Wolcott, and K. J. Nelson, “Environmental contaminants associated with a swine concentrated animal feeding operation and implications for McMurtrey national wildlife refuge,” Grand Island, Nebraska, 2004.
[3] P. Wang et al., “A Semantic Portal for Next Generation Monitoring Systems,” in Proceedings of the 10th International Semantic Web Conference, 2011, pp. 253-268.
[4] P. Wang, “Semantically Enabling Next Generation Environmental Informatics Portals,” RPI, 2012.
[5] J. Madin, S. Bowers, M. Schildhauer, S. Krivov, D. Pennington, and F. Villa, “An ontology for describing and synthesizing ecological observation data,” Ecological Informatics, vol. 2, no. 3, pp. 279-296, 2007.
[6] C. Bizer, T. Heath, and T. Berners-Lee, “Linked Data - The Story So Far,” International Journal on Semantic Web and Information Systems, vol. 5, no. 3, pp. 1-22, 2009.
[7] “GeoSpecies Knowledge Base.” [Online]. Available: http://lod.geospecies.org. [Accessed: 15-Jan-2009].
[8] L. Dodds and T. Scott, “BBC Ontologies - The Wildlife Ontology,” 2010.
[9] M. A. Munson et al., “The eBird Reference Dataset, Version 3.0,” Ithaca, NY, 2011.
[10] US EPA, “National Recommended Water Quality Criteria.” [Online]. Available: http://water.epa.gov/scitech/swguidance/standards/criteria/current/. [Accessed: 20-Jun-2012].
[11] T. Lebo and G. T. Williams, “Converting governmental datasets into linked data,” in Proceedings of the 6th International Conference on Semantic Systems, 2010, pp. 38:1-38:3.
[12] P. Hitzler, M. Krötzsch, B. Parsia, P. F. Patel-Schneider, and S. Rudolph, “OWL 2 Web Ontology Language Primer,” W3C Recommendation 27 October 2009, 2009. [Online]. Available: http://www.w3.org/TR/owl2-primer/. [Accessed: 05-Mar-2012].
[13] E. Sirin, B. Parsia, B. Cuenca Grau, A. Kalyanpur, and Y. Katz, “Pellet: A practical OWL-DL reasoner,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 5, no. 2, pp. 51-53, Jun. 2007.
[14] J. J. Carroll, I. Dickinson, C. Dollin, D. Reynolds, A. Seaborne, and K. Wilkinson, “Jena: implementing the semantic web recommendations,” in Proceedings of the 13th International World Wide Web Conference, 2004, pp. 74-83.
[15] D. L. McGuinness, L. Ding, P. P. D. Silva, and C. Chang, “PML 2: A Modular Explanation Interlingua,” in Proceedings of the AAAI 2007 Workshop on Explanation-aware Computing, 2007, pp. 22-23.
[16] Z. Chen, A. Gangopadhyay, S. Holden, G. Karabatis, and M. Mcguire, “Semantic integration of government data for water quality management,” Government Information Quarterly, vol. 24, no. 4, pp. 716-735, Oct. 2007.
[17] Y. L. Simmhan, B. Plale, and D. Gannon, “A survey of data provenance in e-science,” ACM SIGMOD Record, vol. 34, no. 3, pp. 31-36, Sep. 2005.
[18] J. Zhao, C. Goble, R. Stevens, and S. Bechhofer, “Semantically Linking and Browsing Provenance Logs for E-science,” in Proceedings of the 1st International Conference on Semantics of a Networked World, 2004, vol. 3226, pp. 158-176.
[19] J. Myers, C. Pancerella, C. Lansing, K. Schuchardt, and B. Didier, “Multi-scale science: supporting emerging practice with semantically derived provenance,” in ISWC 2003 Workshop on Semantic Web Technologies for Searching and Retrieving Scientific Data, 2003.
[20] E. Prud’hommeaux and A. Seaborne, “SPARQL Query Language for RDF,” W3C Recommendation, 2008. [Online]. Available: http://www.w3.org/TR/rdf-sparql-query/. [Accessed: 06-Mar-2012].
[21] “HarmonISA - Harmonisation of Land-Use Data.” [Online]. Available: http://harmonisa.uni-klu.ac.at/. [Accessed: 20-Jun-2012].
