Advanced Semantic Technologies Project: S2S Publication

Eric Rozell*
Tetherless World Constellation, Rensselaer Polytechnic Institute, Troy, New York, USA

AST_2011_Rozell_Eric_Week6

ABSTRACT
During Summer 2010, I researched at Woods Hole Oceanographic Institution (WHOI) under the Summer Student Fellowship Program. My project involved the design and implementation of a customizable and extensible dashboard interface for searching and analyzing various kinds of oceanographic data. The outcome of the project was S2S, a web application and framework for building advanced search interfaces for web services described by semantic annotations. During the Advanced Semantic Technologies class, I plan to write up a publication for submission to the Earth Science Informatics journal about this project.

1 PROJECT SUMMARY
During my summer fellowship, I worked closely with oceanographers, data managers, and IT specialists at WHOI in the development of a dashboard interface for searching and analyzing oceanographic data. Interviews with oceanographers were used to develop scientific use cases that would refine the scope of the summer project and to establish reputable resources for oceanographic data on the Web. Interviews with data managers were held to learn more about data services at WHOI, and to identify a project that would be useful to the data management community.

Oceanographers identified the potential utility of a uniform interface for searching over data across oceanographic disciplines, claiming that they were often unaware of reputable sources outside their own disciplines, and experienced a learning curve when they found and needed access to such sources. Data managers identified the need to develop standardized web services, but noted that there was little internal benefit to such practices, other than visibility in a larger project or web portal. S2S was designed to suit the needs of the broad oceanographic community, including both scientists and data managers.
Oceanography covers a broad range of scientific disciplines (e.g., physics, chemistry, biology, engineering), and as such the data products found throughout the domain are highly heterogeneous. Thus, identifying a metadata model that suited the broad needs of the oceanographic community was beyond the scope of a summer project. Rather, S2S takes a service-oriented approach. S2S uses the Resource Description Framework (RDF) and the OWL Web Ontology Language to organize metadata about web services and user interface components. The purpose of these semantics is to enable community development from different perspectives (i.e., scientists’, service developers’, and user interface developers’), and to support a greater degree of extensibility and customizability than traditional object-oriented approaches. S2S can be used to search and analyze data from web services that comply with select web service standards supporting semantic annotation (e.g., OpenSearch, SAWSDL) and also SPARQL-enabled services. Search and analysis tools for S2S can be created and stored at distributed locations on the Web.

2 HOW IT WORKS
There are two primary components to the S2S framework: the S2S ontology and knowledge base, and the S2S server. The S2S ontology and (potentially distributed) knowledge base contains information relevant to the discovery of web services and user interface components. Scientists can use the contextual metadata provided by the knowledge base to assist in their selection of a web service. Scientists can also interchange user interface components based on usability preferences or prior experience (i.e., they can customize the interface). The ontology model associates user interface components with particular service output formats and search parameters they support, so that components can be reused across a variety of services. As an example, in oceanography, a geographic bounding box is a commonplace search parameter used across many data services.
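
The association between user interface components and search parameters can be illustrated with a minimal sketch. It is not the S2S implementation: the knowledge base is stood in for by a plain Python set of triples, and every identifier other than geo:box (the ex: widget, service, and property names) is a hypothetical placeholder chosen for illustration.

```python
# Hypothetical triples standing in for knowledge-base metadata.
# Only geo:box comes from the text; all ex: names are illustrative assumptions.
TRIPLES = {
    ("ex:MapWidget", "ex:supportsParameter", "geo:box"),
    ("ex:TextBoxWidget", "ex:supportsParameter", "geo:box"),
    ("ex:TextBoxWidget", "ex:supportsParameter", "ex:searchTerms"),
    ("ex:CalendarWidget", "ex:supportsParameter", "ex:dateRange"),
    ("ex:CruiseSearch", "ex:hasParameter", "geo:box"),
    ("ex:CruiseSearch", "ex:hasParameter", "ex:searchTerms"),
}


def objects(subject, predicate, triples=TRIPLES):
    """Return every object o such that (subject, predicate, o) is a triple."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(predicate, obj, triples=TRIPLES):
    """Return every subject s such that (s, predicate, obj) is a triple."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def widget_choices(service, triples=TRIPLES):
    """Map each search parameter of a service to the widgets that support it."""
    return {
        param: sorted(subjects("ex:supportsParameter", param, triples))
        for param in sorted(objects(service, "ex:hasParameter", triples))
    }


if __name__ == "__main__":
    # List the interchangeable widgets for each parameter of one service.
    for param, widgets in widget_choices("ex:CruiseSearch").items():
        print(param, "->", widgets)
```

Because the lookup is keyed on the parameter rather than on the service, a widget registered once becomes available to every service that declares the same parameter, which is the kind of reuse the ontology model is meant to support.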
A user interface developer could design a map “widget” that could be used to enable bounding box input. The semantic metadata for the map widget would associate it with the geo:box search parameter (a URI representing a geographic bounding box input). Thus, when a search interface is constructed for a web service that uses the geo:box parameter, the user would be presented with the choice to use the map widget, among other user interface components that support searching over the geo:box parameter.

* To whom correspondence should be addressed.

The second component of the S2S framework is the S2S server. The server handles requests for metadata from the knowledge base and acts as a proxy to distributed web services. The S2S server has an object-oriented interface for accessing information about web services that need not (or cannot) be encoded within the knowledge base (e.g., information that is represented in an XML document such as a SAWSDL or OpenSearch description document). To enable a web service standard, a PHP class can be created that implements the S2S search service interface and extracts important information from the web service or its description. The intent of the framework is to support the development of web portals for federating search across standard web services and to enable the development of customizable, extensible uniform interfaces. A prototype web application has been built for searching OpenSearch services using jQuery.

3 ADVANCED SEMANTIC TECHNOLOGIES PROJECT
The overall design of S2S is starting to converge, and, as such, I would like to use the Advanced Semantic Technologies class to write a paper summarizing the current results. I plan to submit to the Earth Science Informatics journal, and there are two kinds of paper I could write: a research paper or a methodology paper.

3.1 Research Paper
In order to write a research paper, I have to come up with a research question that is addressed by the S2S application and framework. Following the methods of The Craft of Research, I need to identify a research topic and research question. The research problem that will be used to answer the question, of course, would be to design and implement S2S. The topic I originally set out to work in was data integration, but as I started designing use cases, I discovered that I was mostly interested in web service and application integration due to the diversity of data products in oceanography. Thus, my research topic of interest is data and application integration.

I had planned from the beginning of the project to apply Semantic Web technologies. Since I was working primarily at the web service layer, and I was aware of existing approaches to apply semantics in the description of web services, I chose to apply Semantic Web technologies for the development of a data and application integration framework. Alternatively, I could have used solely object-oriented programming techniques in the implementation of the framework. Thus, the potential research question that I’ve identified is: “Can employing Semantic Web technologies in the implementation of a web application framework provide additional flexibility beyond a traditional object-oriented approach?” I think the research answer here is yes; using Semantic Web technologies can enable the development of web application frameworks that take advantage of the distributed nature of the Web. Rather than requiring a centralized code repository for all framework components, framework component metadata can be encoded in RDF and placed anywhere on the Web. The semantics of the application framework can be encoded using OWL, and the framework components can link to each other using these semantics.

3.2 Methodology Paper
Alternatively, this paper can be written as a methodology paper.
As Prof. McGuinness stated, methodology papers are not usually considered as significant as research papers, but this work does present an interesting combination of two methodologies for web systems development. One of these methodologies is the Semantic Web Methodology & Technology Development Process, which has been used extensively by Prof. McGuinness and Prof. Fox in the design and implementation of virtual observatories and eScience-related projects. The other methodology is that of Steve Lerner and Andrew Maffei in developing interfaces for heterogeneous oceanographic data, including imagery and real-time data, and designing reusable software. This paper would focus mostly on the details of implementing S2S and how these two methodologies were successfully combined.