DATABASE MANAGEMENT SYSTEMS IN DATA INTENSIVE ENVIRONMENTS
Leon Guzenda
Chief Technology Officer

AGENDA
• Introduction
• Issues and Approaches
• Summary & Resources
DMW2004
Copyright © Objectivity, Inc. 2004
3/16/04
Objectivity, Inc. & Objectivity/DB
Objectivity Corporate Information
Object Database Management for:
• Data intensive applications that manipulate complex data
• High throughput systems
• Very large volumes of data
Main Markets
• Government
• Scientific
• Telecommunications
• Engineering
• Manufacturing
• Complex IT
Product Highlights
• High Performance with complex data
• Scalability and High Availability
• Fully Distributed
• Interoperability
- C++, Java, Smalltalk, SQL and XML
- Linux, LynxOS, Unix and Windows
• Productivity
- Eclipse IDE
- Eliminates the object to DB mapping layer
SCALABILITY
• Data Volume - 890 Terabytes [BaBar]
• Throughput – Ingested 32 Terabytes per Day [Benchmark]
In a recent benchmark with Objectivity/DB running on 64 Irix processors (600 MHz), CXFS and a 100 Terabyte SAN, we achieved:
- An ingest rate of 32 Terabytes per day (input, correlate and commit)
- Simultaneous queries from 32 processors running at close to 100% CPU capacity
- Simultaneous movement and deletion of aged data to a long term repository
• Simultaneous Users – 100s of Thousands [SprintPCS]
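To put the benchmark's ingest figure in perspective, the short calculation below converts 32 Terabytes per day into a sustained rate. It is only a back-of-the-envelope sketch: it assumes decimal terabytes (10^12 bytes) and an even spread of the load across the 64 processors, neither of which is stated on the slide. The function name is illustrative.

```python
# Assumptions (not from the slide): 1 TB = 10**12 bytes, and the ingest
# load is spread evenly across all 64 processors.
TB = 10**12
SECONDS_PER_DAY = 24 * 60 * 60

def sustained_rate_mb_per_s(terabytes_per_day: float) -> float:
    """Convert a daily ingest volume into a sustained rate in MB/s."""
    return terabytes_per_day * TB / SECONDS_PER_DAY / 10**6

aggregate = sustained_rate_mb_per_s(32)   # whole system
per_processor = aggregate / 64            # one of 64 processors

print(f"aggregate: {aggregate:.1f} MB/s")          # about 370 MB/s
print(f"per processor: {per_processor:.2f} MB/s")  # about 5.8 MB/s
```

Under these assumptions the system sustains roughly 370 MB/s in aggregate, or a little under 6 MB/s per processor, while also serving queries and migrating aged data.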
Issues and Approaches
ISSUES
• Describing complex data
• Exponentially increasing data volumes
• Sharing data across sites
• Querying huge datasets
• Cost of Ownership
DESCRIBING COMPLEX DATA
Approaches:
• Old Way
- Definitions buried in header files
- Language-specific schema language (DDL/SQL)
• Current Approaches
- Unified Modeling Language [UML]
- XML
• Trends
- Java Database Objects [JDO]
- Grid Database Access and Integration Services
- Higher level schemas and ONTOLOGIES
DATA VOLUMES
Approaches:
• Old Way
- Keep data in compressed files and index them in a DBMS
- Proprietary tape archives
• Current Approaches
- Store everything in an ODBMS (lower overheads than an RDBMS)
- Hierarchical storage systems (HPSS etc.)
• Trends
- Solid State Disks at the front end, commodity disks at the back end
- Heterogeneous Storage Area Networks [SAN], e.g. CXFS
- Fiber Optic processor-to-SAN switches
- Grid enablement (totally distributed archives)
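The "Old Way" listed above, keeping data in compressed files and indexing them in a DBMS, can be sketched as follows. This is a minimal illustration rather than any particular system's design: SQLite stands in for the DBMS, gzip for the compression, and the `events` table, file naming scheme and function names are all hypothetical.

```python
import gzip
import os
import sqlite3
import tempfile

def store_event(index: sqlite3.Connection, data_dir: str,
                event_id: int, payload: bytes) -> None:
    """Write the payload to a gzip file and record its location in the index."""
    path = os.path.join(data_dir, f"event_{event_id}.gz")
    with gzip.open(path, "wb") as f:
        f.write(payload)
    with index:  # commits the insert on success
        index.execute(
            "INSERT INTO events (id, path, raw_size) VALUES (?, ?, ?)",
            (event_id, path, len(payload)),
        )

def load_event(index: sqlite3.Connection, event_id: int) -> bytes:
    """Look the file up in the index, then decompress and return its contents."""
    row = index.execute(
        "SELECT path FROM events WHERE id = ?", (event_id,)
    ).fetchone()
    if row is None:
        raise KeyError(event_id)
    with gzip.open(row[0], "rb") as f:
        return f.read()

# The DBMS holds only the index; the bulk data lives in compressed files.
index = sqlite3.connect(":memory:")
index.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, path TEXT NOT NULL, "
    "raw_size INTEGER NOT NULL)"
)
data_dir = tempfile.mkdtemp()
store_event(index, data_dir, 1, b"x" * 1000)
store_event(index, data_dir, 2, b"detector hit data")
restored = load_event(index, 2)
```

The split keeps the database small, but every read costs an index lookup plus a file open and decompress, and the files and index can drift out of step. Storing everything in one database, as in the "Current Approaches" bullet, avoids that bookkeeping.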
SHARING DATA ACROSS SITES
Approaches:
• Old Way
- Transfer files/disks/tapes
- Filesystem or no security
• Current Approaches
- Distributed databases and the World Wide Web
- High bandwidth networks
- Authentication and secure transport layers
• Trends
- Grid enablement
- Federated databases
- Ultra-high bandwidth networks and remote replication
- Flexible, localized security mechanisms
Distributed Federations
[Diagram: Organization X (Users X1, X2 and X3) and Organization Y (User Y1) sharing database A and its replicas, labeled A2 and A3.]
Distributed Federations
[Diagram: the same federation with User X1 mobile and detached. Organization X (Users X2 and X3) and Organization Y (User Y1) continue to share database A and its replicas, labeled A2 and A3.]
QUERYING HUGE DATASETS
Approaches:
• Old Way
- Hold metadata (indexes and relationships) in a searchable file
• Current Approaches
- Hold metadata in a RDBMS and data in files
- Hold metadata and data in an ODBMS
• Trends
- Adaptations of text search engines
- Distributed Parallel Query Engines
- Specialized search accelerators
Current Architecture
Queries run synchronously within the client
[Diagram components: DBA Tools, APPLICATION, Language Interfaces, Object & Schema Managers, Query & Index Managers, Storage & Transaction Managers, Networking & Event Managers, Lock Servers, Data “Page” Servers, Mass Storage.]
Parallel Query Engine [PQE]
Queries run asynchronously and in parallel, either locally or distributed
[Diagram components: DBA Tools, APPLICATION, Language Interfaces, Object & Schema Managers, Query & Index Managers, PQE, Storage & Transaction Managers, Networking & Event Managers, Lock Servers, Data “Page” Servers.]
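The general idea of running a query asynchronously and in parallel can be sketched as a scatter-gather over partitions. This is not the PQE's actual interface, which the slides do not show: the in-memory `partitions` dictionary is a hypothetical stand-in for data held by separate data servers, and the function names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-in for objects held by three separate data servers.
partitions = {
    "server_a": [{"id": 1, "energy": 4.2}, {"id": 2, "energy": 9.7}],
    "server_b": [{"id": 3, "energy": 12.1}, {"id": 4, "energy": 0.3}],
    "server_c": [{"id": 5, "energy": 8.8}],
}

def scan_partition(name, predicate):
    """Evaluate the predicate against one partition and return the matches."""
    return [obj for obj in partitions[name] if predicate(obj)]

def parallel_query(predicate):
    """Submit one scan per partition and merge results as each scan finishes."""
    matches = []
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        futures = [
            pool.submit(scan_partition, name, predicate) for name in partitions
        ]
        # Results arrive in completion order, not submission order.
        for future in as_completed(futures):
            matches.extend(future.result())
    return matches

hits = parallel_query(lambda obj: obj["energy"] > 8.0)
hit_ids = sorted(obj["id"] for obj in hits)
print(hit_ids)  # [2, 3, 5]
```

Because each scan is independent, the client no longer waits for one partition to finish before starting the next, which is the contrast with the synchronous, in-client querying of the current architecture.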
PQE and Search Accelerator
Queries run asynchronously and in parallel, but with Predicate Management within the Search Accelerator
[Diagram components: DBA Tools, APPLICATION, Language Interfaces, Object & Schema Managers, Query Manager, PQE, Search Accelerator (FPGA & RAM), Storage & Transaction Managers, Networking & Event Managers, Lock Servers, Data Servers.]
COST OF OWNERSHIP
Approaches:
• Old Way
- Build It Yourself (many hidden costs)
- Run It Yourself
• Current Approaches
- Use Commercial Off The Shelf [COTS] software
- Open Source
- Commodity hardware & tiered storage
• Trends
- Heterogeneous storage
- Grid Enablement
- Resource and Skill Brokers (Future)
SUMMARY
• Database languages are still evolving
• Data throughput is increasing and system latency is decreasing
• Sharing data across sites still presents many challenges
• Querying vast datasets will become faster and cheaper
• Software vendors are wrestling with Open Source issues
• Startup costs are still high, but the trends are downward
• Grid enablement will help
• Keep working on the Standards!
RESOURCES
• http://www.objectivity.com
- Technical Overview
- Data Sheets and White Papers
- Free downloadable Java and C++ evaluation software and tutorials
• Global Grid Forum
- http://www.ggf.org
• Email: info@objy.com
ANY QUESTIONS?