Matakuliah Tahun : M0264/Manajemen Basis Data : 2008 Manajemen Basis Data Pertemuan 11 Objectives • Introduction to Parallel Databases (Pengenalan Basis Data Paralel) • Introduction to Distributed Databases (Pengenalan Basis Data Terdistribusi) Bina Nusantara Introduction to Parallel Databases • Why Parallel Access To Data? 1 Terabyte 1 Terabyte 10 MB/s Parallelism: At 10 MB/s 1.2 days to scan Bina Nusantara 1,000 x parallel divide a big problem into many smaller ones 1.5 minute to scan. to be solved in parallel. Introduction to Parallel Databases • Parallelism is natural to DBMS processing – Pipeline parallelism: many machines each doing one step in a multi-step process. – Partition parallelism: many machines doing the same thing to different pieces of data. Bina Nusantara Introduction to Parallel Databases • Increased Availability • Distributed Access To Data • Analysis of Distributed Data Bina Nusantara Introduction to Parallel Databases • Shared Nothing • Shared Memory • Shared Disk Bina Nusantara Introduction to Parallel Databases • DBMSs are the most (only?) successful application of parallelism. – Teradata, Tandem vs. Thinking Machines, KSR.. – Every major DBMS vendor has some || server – Workstation manufacturers now depend on || DB server sales. • Reasons for success: – – – – Bina Nusantara Bulk-processing (= partition ||-ism). Natural pipelining. Inexpensive hardware can do the trick! Users/app-programmers don’t need to think in || Introduction to Distributed Databases • Data is stored at several sites, each managed by a DBMS that can run independently. • Distributed Data Independence: Users should not have to know where data is located (extends Physical and Logical Data Independence principles). • Distributed Transaction Atomicity: Users should be able to write Xacts accessing multiple sites just like local Xacts. Bina Nusantara Introduction to Distributed Databases • Users have to be aware of where data is located, i.e., Distributed Data Independence and Distributed Transaction Atomicity are not supported. • These properties are hard to support efficiently. • For globally distributed sites, these properties may not even be desirable due to administrative overheads of making location of data transparent. Bina Nusantara Introduction to Distributed Databases Types of Distributed Databases • Homogeneous: Every site runs same type of DBMS. • Heterogeneous: Different sites run different DBMSs (different RDBMSs or even non-relational DBMSs). Bina Nusantara Introduction to Parallel Databases and Distributed Databases • Parallel DBMSs designed for scalable performance. Relational operators very well-suited for parallel execution. – Pipeline and partitioned parallelism. • Distributed DBMSs offer site autonomy and distributed administration. Must revisit storage and catalog techniques, concurrency control, and recovery issues. Bina Nusantara