
A Software Architecture for Highly
Data-Intensive Systems
Chris A. Mattmann
mattmann@usc.edu
USC Center for Software Engineering
Annual Research Review
March 2004
Special thanks to Dan Crichton, Steve Hughes, and Sean Kelly
for some of the slides!
Overview




Motivation
Problem Statement
OODT: A Software Architecture and
Middleware for Data-Intensive Systems
Evaluation: Science Problems



Planetary Science
Cancer Research
Conclusion
Motivation
Problem Statement

Information Integration in Data-Intensive Systems

Needed to support data access, distribution, processing and retrieval
across existing heterogeneous data sources



Software and Techniques exist to perform Information Integration

But…..




NASA’s Planetary Data System
NCI’s Early Detection Research Network
No Software Re-use
No Design Methods to start from
No mapping of integration techniques to software components, interaction
mechanisms, or arrangements of components
Lack of Re-use and software standards for information integration in
data-intensive systems has forced systems to be “built from scratch”



Little or no interoperability with other software systems
Programmer almost always “in the loop”
A new Ground Data System (GDS) proposal accompanies most new NASA mission proposals
Our Approach

A Software Architecture for Data-Intensive Systems

Data Architecture



Software Architecture




Data Dictionary
Resource Profiles
Components: Product Servers, Profile Servers, Query Servers
Connector: Messaging Layer
Configurations of Product/Profile/Query Servers
...and a middleware implementation based on the software architecture

Middleware leverages existing distributed object middleware frameworks such as
CORBA and RMI


We’re currently working on a SOAP version
Built and maintained at the Jet Propulsion Laboratory




Yes, the Mars folks
Architecture+middleware = OODT (Object Oriented Data Technology)
Middleware being developed at JPL
Architecture being formalized at USC-CSE
Data Dictionary

Common Data Model containing



Data Elements which the user is interested in querying for
Data Elements which the user would like to retrieve
Challenge:

Integrate linked-in data sources by exploiting the Data
Dictionary structure

Map the common data model to data source models across
the data-intensive system


Use a common data element structure

ISO-11179 Specification and Standardization of Data
Elements
This handles the integration of data models across the system,
but software interfaces still need to be integrated
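To make the mapping step concrete, here is a minimal Python sketch, assuming a data dictionary that is simply a set of common element names plus a per-source table mapping each common name to that source's native field name. The element names echo the domain elements used elsewhere in these slides, but the source names (`node_a`, `node_b`), their native fields, and the function `to_source_query` are hypothetical; OODT's actual data dictionary representation is not shown here.

```python
# Hypothetical common data dictionary: the element names every query may use.
COMMON_ELEMENTS = {"TARGET_NAME", "MISSION_NAME", "INSTRUMENT_NAME"}

# Hypothetical per-source mappings: common element name -> native field name.
SOURCE_MAPPINGS = {
    "node_a": {"TARGET_NAME": "target", "MISSION_NAME": "mission_id"},
    "node_b": {"TARGET_NAME": "TGT", "INSTRUMENT_NAME": "INSTR"},
}


def to_source_query(source, common_query):
    """Rewrite a query keyed by common element names into a source's native field names."""
    mapping = SOURCE_MAPPINGS[source]
    native = {}
    for element, value in common_query.items():
        if element not in COMMON_ELEMENTS:
            raise KeyError(f"{element} is not defined in the common data dictionary")
        if element not in mapping:
            raise KeyError(f"source {source} has no mapping for {element}")
        native[mapping[element]] = value
    return native
```

For example, `to_source_query("node_b", {"TARGET_NAME": "MARS"})` returns `{"TGT": "MARS"}`. Keeping the mapping in data rather than in code is what lets a new source be linked in without changing the common model.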
Resource Profiles

Provides mechanisms for describing data systems, data
products, etc., including




Common data attributes, using Dublin Core data elements (e.g., Title, Author,
Subject) to describe electronic resources
Mechanisms for describing where the data is located and how to
access it
Domain data elements that are useful for describing the product
(e.g., TARGET_NAME, MISSION_NAME, INSTRUMENT_NAME)
Enables “search and retrieval” of distributed data products

Searches to a Profile Server yield information regarding the
characteristics of distributed resources (e.g., descriptive
information about the product, access information)
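A resource profile can be pictured as three parts: Dublin Core resource attributes, an access location, and a set of domain profile elements. The Python sketch below models that structure with dataclasses whose fields mirror the XML tags in the example that follows (`resAttributes`, `resLocation`, `profileElement`, `elemName`, `elemValue`, `elemMinValue`, `elemMaxValue`). The class names and the `find_locations` search function are assumptions for illustration, not OODT's Profile Server API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProfileElement:
    """A domain data element describing the resource (a <profileElement>)."""
    name: str                                        # <elemName>
    values: list[str] = field(default_factory=list)  # enumerated <elemValue> entries
    min_value: Optional[float] = None                # <elemMinValue>, for ranged elements
    max_value: Optional[float] = None                # <elemMaxValue>


@dataclass
class ResourceProfile:
    """Descriptive, location, and domain information for one resource."""
    attributes: dict[str, str]  # Dublin Core <resAttributes>, e.g. Title, Subject
    location: str               # <resLocation>: where the resource is and how to reach it
    elements: list[ProfileElement] = field(default_factory=list)


def find_locations(profiles: list[ResourceProfile], attribute: str, value: str) -> list[str]:
    """Return access locations of profiles whose Dublin Core attribute equals value."""
    return [p.location for p in profiles if p.attributes.get(attribute) == value]
```

A search such as `find_locations(profiles, "Subject", "hurricanes")` returns only access locations, which a client (or a Query Server) can then use to contact the matching resources directly.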
Resource Profiles Example

“country = US and windspeed > 120”
<profile>…
<resAttributes>…
<resLocation>urn:eda:rmi:Western…
<profileElement>
<elemName>country</elemName>…
<elemValue>US</elemValue>…
<profileElement>
<elemName>state</elemName>…
<elemValue>WA</elemValue>
<elemValue>CA</elemValue>…
<profileElement>
<elemName>windspeed</elemName>…
<elemMinValue>3</elemMinValue>
<elemMaxValue>146</elemMaxValue>…
<profile>…
<resAttributes>…
<resLocation>urn:eda:rmi:Southern…
<profileElement>
<elemName>country</elemName>…
<elemValue>US</elemValue>…
<profileElement>
<elemName>state</elemName>…
<elemValue>LA</elemValue>
<elemValue>TX</elemValue>…
<profileElement>
<elemName>windspeed</elemName>…
<elemMinValue>1</elemMinValue>
<elemMaxValue>89</elemMaxValue>…
Matches!
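The slide does not spell out the matching rule. One plausible reading, sketched below in Python, is that a profile matches when every predicate in the query can be satisfied by some value the profile describes: an enumerated element (`elemValue`) must contain the requested value, and a ranged element (`elemMinValue`/`elemMaxValue`) must overlap the requested comparison. Under that reading the Western profile (windspeed 3 to 146) matches `windspeed > 120`, while the Southern profile (windspeed 1 to 89) does not. The parser handles only AND-joined `name op value` predicates using `=`, `>`, and `<`; the function names and the dict layout are hypothetical.

```python
import re


def parse_query(text):
    """Split 'a = x and b > 5' into [('a', '=', 'x'), ('b', '>', '5')]; AND-joined only."""
    predicates = []
    for clause in re.split(r"\s+and\s+", text.strip(), flags=re.IGNORECASE):
        match = re.fullmatch(r"(\w+)\s*(=|>|<)\s*(\S+)", clause)
        if match is None:
            raise ValueError(f"cannot parse predicate: {clause!r}")
        predicates.append(match.groups())
    return predicates


def element_satisfies(element, op, value):
    """True if some value this profile element describes could satisfy 'op value'."""
    if "values" in element:                  # enumerated <elemValue> entries
        return op == "=" and value in element["values"]
    low, high, v = element["min"], element["max"], float(value)  # ranged element
    if op == "=":
        return low <= v <= high
    if op == ">":
        return high > v                      # range reaches above v
    return low < v                           # op == "<": range reaches below v


def profile_matches(profile, predicates):
    """A profile matches when every predicate names an element it can satisfy."""
    elements = profile["elements"]
    return all(name in elements and element_satisfies(elements[name], op, value)
               for name, op, value in predicates)


# The two profiles from the slide, reduced to their profile elements.
western = {"label": "Western", "elements": {
    "country": {"values": ["US"]},
    "state": {"values": ["WA", "CA"]},
    "windspeed": {"min": 3, "max": 146}}}
southern = {"label": "Southern", "elements": {
    "country": {"values": ["US"]},
    "state": {"values": ["LA", "TX"]},
    "windspeed": {"min": 1, "max": 89}}}

query = parse_query("country = US and windspeed > 120")
matching = [p["label"] for p in (western, southern) if profile_matches(p, query)]
# matching == ['Western']
```

Matching against ranges rather than individual values is what lets a Profile Server answer without touching the underlying data: it only says a resource *could* hold matching products.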
Components

Product Server

Responsible for abstracting heterogeneous data source
interfaces



Provides a common query interface across heterogeneous data
sources
Attach a Product Server to each data source that is integrated
Profile Server

Describes data resources using resource profiles
Allows data resources to be discovered and located at query time
Query Server



Tie it all together
Uses Profile Servers to discover data resources which could
potentially satisfy a query
Queries discovered data resources (such as Product Servers)
and collects obtained data products to return to the user
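The division of labor among the three components can be sketched in Python as follows, assuming a dictionary-based query keyed by common element names. Each Product Server wraps one source behind the same `query` method (the second one translates field names, standing in for a heterogeneous source), the Profile Server records which elements each Product Server's profile covers, and the Query Server discovers candidates through Profile Servers and then collects their results. All class and method names beyond the three component names are hypothetical, and real OODT components communicate through the messaging layer rather than through direct calls.

```python
from __future__ import annotations


class InMemoryProductServer:
    """Wraps a source whose field names already match the common data model."""

    def __init__(self, rows: list[dict]):
        self.rows = rows

    def query(self, criteria: dict[str, str]) -> list[dict]:
        return [r for r in self.rows if all(r.get(k) == v for k, v in criteria.items())]


class RenamingProductServer:
    """Wraps a heterogeneous source by translating common names to native ones and back."""

    def __init__(self, rows: list[dict], mapping: dict[str, str]):
        self.rows = rows
        self.mapping = mapping                              # common name -> native name
        self.inverse = {n: c for c, n in mapping.items()}   # native name -> common name

    def query(self, criteria: dict[str, str]) -> list[dict]:
        native = {self.mapping[k]: v for k, v in criteria.items()}
        hits = [r for r in self.rows if all(r.get(k) == v for k, v in native.items())]
        return [{self.inverse.get(k, k): v for k, v in r.items()} for r in hits]


class ProfileServer:
    """Records which data elements each Product Server's resource profile covers."""

    def __init__(self, entries: list[tuple[set[str], object]]):
        self.entries = entries                              # (covered element names, server)

    def discover(self, criteria: dict[str, str]) -> list[object]:
        return [server for names, server in self.entries if set(criteria) <= names]


class QueryServer:
    """Discovers candidate Product Servers via Profile Servers, then gathers results."""

    def __init__(self, profile_servers: list[ProfileServer]):
        self.profile_servers = profile_servers

    def query(self, criteria: dict[str, str]) -> list[dict]:
        candidates = []
        for profile_server in self.profile_servers:
            for server in profile_server.discover(criteria):
                if server not in candidates:                # skip servers already found
                    candidates.append(server)
        results = []
        for server in candidates:
            results.extend(server.query(criteria))
        return results


# Usage: two heterogeneous sources queried through one Query Server.
source_a = InMemoryProductServer([{"TARGET_NAME": "MARS", "id": 1}, {"TARGET_NAME": "VENUS", "id": 2}])
source_b = RenamingProductServer([{"TGT": "MARS", "id": 3}], {"TARGET_NAME": "TGT"})
profiles = ProfileServer([({"TARGET_NAME"}, source_a), ({"TARGET_NAME"}, source_b)])
products = QueryServer([profiles]).query({"TARGET_NAME": "MARS"})
```

Here `products` holds the MARS rows from both sources, and the row from `source_b` comes back already renamed to `TARGET_NAME`, so the user never sees the native `TGT` field.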
Connectors

Messaging Layer

Each OODT component registers itself with a
Component Registry



Allows Components to define and provide services
Components are identified by unique URNs
Transfers OODT Query Object containing

OODT Style Query


(Keyword = Value) predicates joined by logical operators
(AND, OR, etc)
The result list to be populated
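A minimal Python sketch of the messaging-layer ideas above: components register a handler under a unique URN, and callers send an OODT-style query object whose expression is a tree of (Keyword = Value) predicates joined by AND/OR, plus a result list that the receiving component populates. The URNs (in the reserved `urn:example` namespace), the class names, and the handler signature are assumptions for illustration; they are not the actual OODT query object or registry interfaces.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Union


@dataclass
class Predicate:
    """A single (Keyword = Value) predicate."""
    keyword: str
    value: str


@dataclass
class Logical:
    """Predicates or sub-expressions joined by a logical operator."""
    op: str                      # "AND" or "OR"
    operands: list[Expr]


Expr = Union[Predicate, Logical]


@dataclass
class QueryObject:
    """Carries the query expression plus the result list to be populated."""
    expression: Expr
    results: list[dict] = field(default_factory=list)


def evaluate(expr: Expr, record: dict) -> bool:
    """Evaluate an expression tree against one record."""
    if isinstance(expr, Predicate):
        return record.get(expr.keyword) == expr.value
    outcomes = (evaluate(e, record) for e in expr.operands)
    if expr.op == "AND":
        return all(outcomes)
    if expr.op == "OR":
        return any(outcomes)
    raise ValueError(f"unsupported operator: {expr.op}")


class ComponentRegistry:
    """Maps each component's unique URN to the handler that services query objects."""

    def __init__(self) -> None:
        self._handlers: dict[str, Callable[[QueryObject], None]] = {}

    def register(self, urn: str, handler: Callable[[QueryObject], None]) -> None:
        if urn in self._handlers:
            raise ValueError(f"URN already registered: {urn}")
        self._handlers[urn] = handler

    def send(self, urn: str, query: QueryObject) -> QueryObject:
        self._handlers[urn](query)   # the component fills query.results in place
        return query


def product_server_handler(rows: list[dict]) -> Callable[[QueryObject], None]:
    """Build a handler that appends every matching row to the query's result list."""
    def handle(query: QueryObject) -> None:
        query.results.extend(r for r in rows if evaluate(query.expression, r))
    return handle


registry = ComponentRegistry()
registry.register("urn:example:product:planets", product_server_handler([
    {"TARGET_NAME": "MARS", "MISSION_NAME": "MGS"},
    {"TARGET_NAME": "MOON", "MISSION_NAME": "LP"},
    {"TARGET_NAME": "VENUS", "MISSION_NAME": "MAGELLAN"}]))
query = QueryObject(Logical("OR", [Predicate("TARGET_NAME", "MARS"),
                                   Predicate("TARGET_NAME", "MOON")]))
registry.send("urn:example:product:planets", query)
```

Because the result list travels inside the query object, the caller needs no component-specific return type: any registered component can add products to the same list.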
Configurations: Example
Configurations: Example (2)
Configurations: Example (3)
Planetary Science

Planetary Data System

Official NASA “Active” Archive for all Planetary Data





Data ingestion required as part of Announcement of
Opportunity (AO) for a mission
9 Nodes with data located at discipline sites
Common Data Architecture
Different data systems located at the sites
Prior to October 2002, no ability to find and share data
between PDS nodes


Data distribution via CD-ROM
Limited electronic distribution
OODT PDS Deployment
Early Detection Research
Network


OODT’s success has led to interagency agreements with both NIH
and NCI
OODT has provided the NCI with a bioinformatics infrastructure for
sharing data across the nation







Currently deployed at 10 of 31 NCI Research Institutions for the Early
Detection Research Network (EDRN)
Providing real-time access to distributed, heterogeneous databases
Created a national virtual repository for biospecimens (now an NCI
Director Initiative)
Now integrating new datasets: validation studies, images, biomarkers,
etc.
Meets Federal security regulations
Operational September 2002
Same core software framework as deployed in the planetary, earth, and
engineering domains
OODT EDRN Deployment
Conclusion

OODT is…..

A novel software architecture to describe data-intensive
systems


A reference implementation of the above software architecture



Supports integration, search, retrieval, and discovery of heterogeneous
data stored in heterogeneous domain data sources
Java-based middleware
C++, Perl, Python, and PHP client APIs
A process for annotating and creating standard metadata
models to describe heterogeneous data based on data
standards


Dublin Core
ISO-11179
Refereed Papers







Mattmann C, Ramirez P, Crichton D, Hughes JS. Packaging Data Products using Data Grid Middleware for Deep Space Mission Systems. Accepted for publication at the 8th International Conference on Space Operations, Montreal, Canada, 2004.
Mattmann C, Freeborn D, Crichton D. Towards a Distributed Information Architecture for Avionics Data. In Proceedings of the 2nd International IADIS Conference on the World Wide Web and Internet, Volume II, pp. 829-832. Algarve, Portugal, 2003.
Crichton D, Hughes JS, Kelly S. A Science Data System Architecture for Information Retrieval. In Clustering and Information Retrieval. Kluwer Academic Publishers, December 2003. (Book chapter on OODT)
Crichton D, Hughes JS, Kelly S, Ramirez P. A Component Framework Supporting Peer Services for Space Data Management. 2002 IEEE Aerospace Conference, Big Sky, Montana, March 2002.
Crichton D, Downing G, Hughes JS, Kincaid H, Srivastava S. An Interoperable Data Architecture for Data Exchange in a Biomedical Research Network. 14th IEEE Symposium on Computer-Based Medical Systems, July 2001.
Crichton D, Hughes JS, Hardman S, Kelly S. A Distributed Component Framework for Data Product Interoperability. 17th CODATA International Conference, Baveno, Italy, October 2000.
Crichton D, Hughes JS, Kelly S, Hyon J. Science Search and Retrieval using XML. Second National Conference on Scientific and Technical Data, National Academy of Sciences, Washington D.C., March 2000.
Questions?

Contacts


OODT Website: http://oodt.jpl.nasa.gov
Principal Investigator
Dan Crichton (Dan.Crichton@jpl.nasa.gov)
Co-Investigator
Steve Hughes (Steve.Hughes@jpl.nasa.gov)
Programmer/Research Grunt
Me (chris.mattmann@jpl.nasa.gov)
Thanks for your attention!
Backup Slides
Object Oriented Data
Technology

Object-Oriented Data Technology (OODT)


Funded in 1998 by NASA’s Office of Space Science to develop a
national software framework for sharing data across
heterogeneous, distributed data repositories
Develop…


a common data and software framework to enable data sharing
across multiple science and engineering disciplines
A reusable software architecture across data management projects




Reusable software components with common interfaces
Interfaces to enable new components to be plugged in
Mechanism to wrap legacy data system components with minimal impact
OODT should provide...


Science domain independence (use in engineering, science and
biomedicine)
Data location independence (describe what you want, not
how/where to get it)