Semantic Water Quality Portal

Jin Guang Zheng, Ping Wang, Deborah McGuinness, Joanne Luciano
Tetherless World Constellation, Rensselaer Polytechnic Institute, Troy, New York, USA
TWC-TR# (assigned once accepted)

1 INTRODUCTION

A semantic web portal (SWP) is a website built on semantic web technologies that collects information for a community of users and lets those users share and exchange it [1]. The benefits an SWP provides include automated inference and reasoning and semantic query. Despite these useful features, SWP deployment remains unsatisfactory. One of the main reasons is that portals and frameworks based on semantic web technologies are difficult for non-specialists to replicate and are not easily scaled to different domains. Non-specialists must learn a substantial amount of material, e.g. provenance and SPARQL, before they can adapt such a system to a new domain. Some SWP systems also require a substantial amount of configuration before they can be adapted to a new domain.

In this paper, we present an easy-to-deploy SWP. The proposed SWP imposes a minimal learning curve and requires little configuration, while enabling the following features based on semantic web technologies:
1. Providing provenance at both the data level and the application level.
2. Supporting OWL-based inference and reasoning.
3. Visualizing semantic data.

We have already deployed our SWP in the environmental domain; more specifically, we used the system to develop a Semantic Water Quality Portal (SWQP).

2 DATA

Data from three different sources are collected to develop our SWQP:
1. USGS data about water sources.
2. EPA data about facilities.
3. State regulation data about pollutants.

USGS data: This dataset provides measurements of many different chemicals (e.g. arsenic) in groundwater and waterways.
EPA data: This dataset provides information about specific companies that must abide by federal guidelines, and whether and when they have violated EPA regulations.
State Regulation data: Since USGS is not responsible for managing and enforcing regulations, its data must be evaluated under federal or state regulations such as Rhode Island's Water Quality Regulations (2009).

Since the data are collected from various sources, we use semantic technologies to combine the data and present them to end users.

3 METHOD/SYSTEM ARCHITECTURE

To support the aforementioned SWP features and build our SWQP, we implemented the following components:

Data Conversion Component: We implemented two converters. The first is a general converter that can convert any data in CSV format to RDF. The second is an ad hoc converter for SWQP that converts some of the regulation data from PDF to RDF.
Ontology Component: In SWQP, we designed a core regulation ontology. When data are converted to RDF, we encode them using this ontology, so that we can perform reasoning over the collected data. The ontology itself is designed and encoded in OWL 2 [2].
Provenance Component: Our provenance component captures two levels of provenance information. The first is data-level provenance: when data are converted to RDF by our data conversion component, we inject provenance information about the data sources using PML [3]. The second is application-level provenance: when a water source is marked as a polluted water source (or a facility is marked as a violating facility), we provide provenance information about which data were used for this reasoning.
Visualization Component: This component is responsible for mashing up the data we collected from various sources and presenting them in a meaningful way. Currently, we provide a geo map visualization of water sources and facilities.
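The general CSV-to-RDF conversion and the data-level provenance described above can be pictured with a minimal sketch. The Python below is not the portal's converter. It assumes a hypothetical CSV layout (site_id, characteristic, value, unit) and a hypothetical namespace http://example.org/swqp/, and it records the data source with a hypothetical hasSource property standing in for PML, whose vocabulary is not reproduced here. Output is plain N-Triples, built with the standard library only.

```python
import csv
import io

# Hypothetical namespace; the portal's real ontology URIs are not given in the paper.
EX = "http://example.org/swqp/"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(local_name):
    """Wrap a local name in the example namespace as an N-Triples IRI."""
    return f"<{EX}{local_name}>"


def literal(text, datatype=None):
    """Serialize a string as an N-Triples literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{datatype}>' if datatype else f'"{escaped}"'


def csv_to_ntriples(csv_text, source_url):
    """Convert each CSV row into six triples describing one measurement.

    Assumes site_id values are already safe to embed in an IRI.
    """
    lines = []
    for index, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        m = uri(f"measurement/{index}")
        lines.append(f"{m} <{RDF_TYPE}> {uri('WaterMeasurement')} .")
        lines.append(f"{m} {uri('hasSite')} {uri('site/' + row['site_id'])} .")
        lines.append(f"{m} {uri('hasCharacteristic')} {literal(row['characteristic'])} .")
        lines.append(f"{m} {uri('hasValue')} {literal(row['value'], XSD_DECIMAL)} .")
        lines.append(f"{m} {uri('hasUnit')} {literal(row['unit'])} .")
        # Data-level provenance: remember which source file the row came from.
        lines.append(f"{m} {uri('hasSource')} <{source_url}> .")
    return lines


if __name__ == "__main__":
    sample = "site_id,characteristic,value,unit\nUSGS-01,Arsenic,0.02,mg/l\n"
    for triple in csv_to_ntriples(sample, "http://example.org/source/usgs.csv"):
        print(triple)
```

Attaching the source to every measurement keeps the provenance queryable alongside the data itself, which is the property the application-level provenance relies on when it explains a reasoning result.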
Back-end Reasoner Component: We also built a back-end reasoner using Jena and Pellet. In SWQP, the reasoner performs OWL 2 reasoning, using the ontology we designed, over the data we collected from various sources to determine polluted water sources and violating facilities.

Luciano, J. et al. ETST_2011_Luciano_Joanne_A1

4 DISCUSSION

In this section, we discuss the project from two perspectives: 1. the class, and 2. the research tasks.

4.1 Discussion for the class

In this section, I discuss the project with respect to the purpose of the class, mainly answering two questions: who is primarily responsible for which tasks, and what do I think I will learn?

Who is primarily responsible for which tasks?
Data Conversion: Ping is primarily responsible for converting the different data sets, using either the converter written by her or the one written by Tim.
Ontology Development and Reasoner Development: Jin is primarily responsible for developing the ontologies and the reasoners used in the system.
Functionality Development: Both Jin and Ping are responsible for developing the functionalities required by the system.

What do I think I will learn?
I think I will learn more about ontology and reasoner development, paper-writing skills, and the applications of semantic web technologies.

4.2 Discussion for the research tasks

Some of the development and implementation tasks were already finished last semester. However, some minor adjustments and new functionalities need to be developed this semester.

Provenance: Two provenance tasks remain to be finished: provenance-based query and visualization of provenance data.
Portal Functionalities: We can also add a few more functionalities to our portal: an ontology-based facet browser and more ways to visualize our data.
Scale the portal: There are two dimensions along which we can and need to scale our portal. The first is scaling the portal with respect to state regulations. The second is scaling the portal to different domains, e.g. a Health Portal.
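One way to picture the first scaling dimension is to keep each state's limits as data rather than code, so that covering a new state means adding entries instead of changing logic. The Python sketch below is only an illustration under that assumption and is not the portal's OWL-based reasoner. The state codes, the Measurement type, and the limit values are hypothetical placeholders and are not taken from any regulation.

```python
from dataclasses import dataclass

# Hypothetical limits in mg/l, keyed by (state, characteristic).
# Placeholder numbers for illustration only, not real regulatory values.
LIMITS = {
    ("STATE_A", "Arsenic"): 1.0,
    ("STATE_B", "Arsenic"): 2.0,
}


@dataclass(frozen=True)
class Measurement:
    site_id: str
    state: str
    characteristic: str
    value: float  # mg/l


def exceeds_limit(m, limits=LIMITS):
    """Return True if a limit exists for the measurement's state and is exceeded."""
    limit = limits.get((m.state, m.characteristic))
    return limit is not None and m.value > limit


def polluted_sites(measurements, limits=LIMITS):
    """Collect the sorted IDs of sites with at least one measurement above its limit."""
    return sorted({m.site_id for m in measurements if exceeds_limit(m, limits)})
```

With these placeholder limits, the same reading of 1.5 mg/l is flagged in STATE_A but not in STATE_B, and a state without an entry is never flagged. In the portal itself this role is played by regulation data encoded against the ontology rather than by a Python dictionary.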
Ontology and Reasoner: The goal of this task is to build a more robust ontology that supports the various kinds of reasoning we need in our portal. Some interesting ontologies have already been implemented [4]. We can borrow ideas from these ontology designs and possibly link to the existing ontologies.

5 CONCLUSION

In this paper, we presented an easy-to-deploy Semantic Web Portal. We also deployed the portal in the environmental domain and built the Semantic Water Quality Portal. SWQP demonstrates interesting and useful features, based on semantic web technologies, that our SWP provides:
1. Providing provenance information about data and reasoning.
2. Supporting automatic OWL-based reasoning.
3. Visualizing semantic data in a meaningful way.

As discussed in the previous section, we will continue to work on the portal to make it more robust.

REFERENCES

[1] Lausen, H., Ding, Y., Stollberg, M., Fensel, D., Hernandez, R., and Han, S. (2005): Semantic web portals: state-of-the-art survey. Journal of Knowledge Management, vol. 9(5), pp. 40-49.
[2] Hitzler, P., Krötzsch, M., Parsia, B., Patel-Schneider, P., Rudolph, S. (2009): OWL 2 Web Ontology Language Primer. <http://www.w3.org/TR/owl2-primer/>
[3] McGuinness, D., Silva, P., Ding, L. (2007): Proof Markup Language (PML) Primer. <http://inferenceweb.org/2007/primer/>
[4] Parekh, V. (2005): Applying Ontologies and Semantic Web Technologies to Environmental Sciences and Engineering. Master's Thesis, University of Maryland, Baltimore County.