DATABASE MANAGEMENT SYSTEMS IN DATA INTENSIVE ENVIRONMENTS
Leon Guzenda
Chief Technology Officer

AGENDA
• Introduction
• Issues and Approaches
• Summary & Resources

DMW2004 Copyright Objectivity, Inc. 2004 3/16/04

Objectivity, Inc. & Objectivity/DB
Objectivity Corporate Information

Object Database Management for:
• Data intensive applications that manipulate complex data
• High throughput systems
• Very large volumes of data

Main Markets
• Government
• Scientific
• Telecommunications
• Engineering
• Manufacturing
• Complex IT

Product Highlights
• High Performance with complex data
• Scalability and High Availability
• Fully Distributed
• Interoperability
  - C++, Java, Smalltalk, SQL and XML
  - Linux, LynxOS, Unix and Windows
• Productivity
  - Eclipse IDE
  - Eliminates the object to DB mapping layer

SCALABILITY
• Data Volume - 890 Terabytes [BaBar]
• Throughput - Ingested 32 Terabytes per Day [Benchmark]
  In a recent benchmark with Objectivity/DB running on 64 Irix processors (600 MHz), CXFS and a 100 Terabyte SAN we achieved:
  - An ingest rate of 32 Terabytes per day (input, correlate and commit)
  - Simultaneous queries from 32 processors running at close to 100% CPU capacity
  - Simultaneous movement and deletion of aged data to a long term repository
• Simultaneous Users - 100s of Thousands [SprintPCS]

Issues and Approaches

ISSUES
• Describing complex data
• Exponentially increasing data volumes
• Sharing data across sites
• Querying huge datasets
• Cost of Ownership

DESCRIBING COMPLEX DATA
Approaches:
• Old Way
  - Definitions buried in header files
  - Language-specific schema language (DDL/SQL)
• Current Approaches
  - Unified Modeling Language [UML]
  - XML
• Trends
  - Java Data Objects [JDO]
  - Grid Database Access and Integration Services
  - Higher level schemas and ONTOLOGIES
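
As a rough check on the benchmark figures quoted on the SCALABILITY slide, the ingest rate can be converted into a sustained bandwidth. The sketch below assumes decimal terabytes (10^12 bytes) and, purely for illustration, an even split of the ingest work across all 64 processors; the slides do not say how the work was actually divided.

```python
# Convert the quoted ingest rate (32 Terabytes per day) into MB/s.
TERABYTE = 10**12               # assumption: decimal terabytes
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

ingest_bytes_per_day = 32 * TERABYTE
processors = 64                 # processor count quoted in the benchmark

aggregate_mb_per_s = ingest_bytes_per_day / SECONDS_PER_DAY / 10**6
# Assumption: ingest is spread evenly over every processor.
per_processor_mb_per_s = aggregate_mb_per_s / processors

print(f"aggregate: {aggregate_mb_per_s:.1f} MB/s")         # aggregate: 370.4 MB/s
print(f"per processor: {per_processor_mb_per_s:.2f} MB/s")  # per processor: 5.79 MB/s
```

Under these assumptions the system sustains roughly 370 MB/s in aggregate, which is a modest rate per processor; the difficulty lies in keeping the SAN and the commit path fed continuously.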

DATA VOLUMES
Approaches:
• Old Way
  - Keep data in compressed files and index them in a DBMS
  - Proprietary tape archives
• Current Approaches
  - Store everything in an ODBMS (lower overheads than an RDBMS)
  - Hierarchical storage systems (HPSS etc.)
• Trends
  - Solid State Disks at the front end, commodity disks at the back end
  - Heterogeneous Storage Area Networks [SAN], e.g. CXFS
  - Fiber Optic processor-to-SAN switches
  - Grid enablement (totally distributed archives)

SHARING DATA ACROSS SITES
Approaches:
• Old Way
  - Transfer files/disks/tapes
  - Filesystem or no security
• Current Approaches
  - Distributed databases and the World Wide Web
  - High bandwidth networks
  - Authentication and secure transport layers
• Trends
  - Grid enablement
  - Federated databases
  - Ultra-high bandwidth networks and remote replication
  - Flexible, localized security mechanisms

Distributed Federations
[Diagram: database A and its replicas A2 and A3 ("Replica of A"), shared by Organization X (Users X1, X2, X3) and Organization Y (User Y1).]

Distributed Federations
[Diagram: the same federation, with User X1 labeled "Mobile and Detached".]

QUERYING HUGE DATASETS
Approaches:
• Old Way
  - Hold metadata (indexes and relationships) in a searchable file
• Current Approaches
  - Hold metadata in an RDBMS and data in files
  - Hold metadata and data in an ODBMS
• Trends
  - Adaptations of text search engines
  - Distributed Parallel Query Engines
  - Specialized search accelerators
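
One of the current approaches listed under QUERYING HUGE DATASETS, holding metadata in an RDBMS and data in files, can be sketched in a few lines. This is a minimal illustration only: SQLite stands in for the RDBMS, and the table layout, run numbers, detector names and file names are all hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def build_catalog(data_dir: Path) -> sqlite3.Connection:
    """Write three small data files and index their metadata in SQLite."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE catalog (run INTEGER, detector TEXT, path TEXT)")
    # Hypothetical records: (run number, detector name, payload bytes).
    records = [
        (1, "tracker", b"aaaa"),
        (2, "calorimeter", b"bbbbbb"),
        (3, "tracker", b"cc"),
    ]
    for run, detector, payload in records:
        path = data_dir / f"run{run}.dat"
        path.write_bytes(payload)  # bulk data stays in a plain file
        db.execute(
            "INSERT INTO catalog VALUES (?, ?, ?)", (run, detector, str(path))
        )
    db.commit()
    return db


def find_payloads(db: sqlite3.Connection, detector: str) -> list[bytes]:
    """Query the metadata, then open only the files that matched."""
    rows = db.execute(
        "SELECT path FROM catalog WHERE detector = ? ORDER BY run", (detector,)
    )
    return [Path(path).read_bytes() for (path,) in rows]


with tempfile.TemporaryDirectory() as tmp:
    catalog = build_catalog(Path(tmp))
    tracker_payloads = find_payloads(catalog, "tracker")
    print(tracker_payloads)  # [b'aaaa', b'cc']
```

The query touches only the small metadata table; the data files are opened after the match, which is the point of the split. The drawback is that the database and the files must be kept consistent by the application.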

Current Architecture
Queries run synchronously within the client
[Diagram: DBA Tools and APPLICATION sit above the Language Interfaces, Object & Schema Managers, Query & Index Managers, Storage & Transaction Managers and Networking & Event Managers, which connect to the Lock Servers, the Data “Page” Servers and Mass Storage.]

Parallel Query Engine [PQE]
Queries run asynchronously and in parallel, either locally or distributed
[Diagram: DBA Tools and APPLICATION sit above the Language Interfaces, Object & Schema Managers, Query & Index Managers, Storage & Transaction Managers and Networking & Event Managers, with the PQE alongside the Lock Servers and Data “Page” Servers.]

PQE and Search Accelerator
Queries run asynchronously and in parallel, but with Predicate Management within the Search Accelerator
[Diagram: DBA Tools and APPLICATION sit above the Language Interfaces, Object & Schema Managers, Query Manager, Storage & Transaction Managers and Networking & Event Managers, with the PQE, a Search Accelerator (FPGA & RAM), the Lock Servers and the Data Servers.]

COST OF OWNERSHIP
Approaches:
• Old Way
  - Build It Yourself (many hidden costs)
  - Run It Yourself
• Current Approaches
  - Use Commercial Off The Shelf [COTS] software
  - Open Source
  - Commodity hardware & tiered storage
• Trends
  - Heterogeneous storage
  - Grid Enablement
  - Resource and Skill Brokers (Future)

SUMMARY
• Database languages are still evolving
• Data throughput and system latency times are decreasing
• Sharing data across sites still presents many challenges
• Querying vast datasets will become faster and cheaper
• Software vendors are wrestling with Open Source issues
• Startup costs are still high, but the trends are downward
• Grid enablement will help
• Keep working on the Standards!
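
The idea behind the Parallel Query Engine slide, queries that run asynchronously and in parallel over separately served data, can be illustrated with a small sketch. It is not Objectivity/DB's API: the partitions, function names and predicate are hypothetical, and Python threads stand in for the query agents.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions standing in for data held by separate page servers.
PARTITIONS = [
    [3, 18, 42, 7],
    [25, 1, 64],
    [9, 30, 12, 50],
]


def scan_partition(partition, predicate):
    """Apply the predicate to one partition and return its matches."""
    return [value for value in partition if predicate(value)]


def parallel_query(partitions, predicate, max_workers=3):
    """Scan every partition concurrently and merge the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(scan_partition, p, predicate) for p in partitions]
        # submit() returns immediately; result() blocks only when a
        # partial result is actually needed.
        merged = []
        for future in futures:
            merged.extend(future.result())
    return sorted(merged)


matches = parallel_query(PARTITIONS, lambda value: value >= 25)
print(matches)  # [25, 30, 42, 50, 64]
```

In the synchronous architecture the client would scan the three partitions one after another; here each scan proceeds independently and the client only merges the results.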

RESOURCES
• http://www.objectivity.com
  - Technical Overview
  - Data Sheets and White Papers
  - Free downloadable Java and C++ evaluation software and tutorials
• Global Grid Forum
  - http://www.ggf.org
• Email: info@objy.com

ANY QUESTIONS?