Manajemen Basis Data Pertemuan 11 Matakuliah : M0264/Manajemen Basis Data

advertisement
Matakuliah
Tahun
: M0264/Manajemen Basis Data
: 2008
Manajemen Basis Data
Pertemuan 11
Objectives
• Introduction to Parallel Databases (Pengenalan Basis
Data Paralel)
• Introduction to Distributed Databases (Pengenalan Basis
Data Terdistribusi)
Bina Nusantara
Introduction to Parallel Databases
• Why Parallel Access To Data?
1 Terabyte
1 Terabyte
10 MB/s Parallelism:
At 10 MB/s 1.2 days
to scan
Bina Nusantara
1,000 x parallel
divide a big problem
into many smaller ones 1.5 minute to scan.
to be solved in parallel.
Introduction to Parallel Databases
• Parallelism is natural to DBMS processing
– Pipeline parallelism: many machines each doing one step in a
multi-step process.
– Partition parallelism: many machines doing the same thing to
different pieces of data.
Bina Nusantara
Introduction to Parallel Databases
• Increased Availability
• Distributed Access To Data
• Analysis of Distributed Data
Bina Nusantara
Introduction to Parallel Databases
• Shared Nothing
• Shared Memory
• Shared Disk
Bina Nusantara
Introduction to Parallel Databases
• DBMSs are the most (only?) successful application of
parallelism.
– Teradata, Tandem vs. Thinking Machines, KSR..
– Every major DBMS vendor has some || server
– Workstation manufacturers now depend on || DB server sales.
• Reasons for success:
–
–
–
–
Bina Nusantara
Bulk-processing (= partition ||-ism).
Natural pipelining.
Inexpensive hardware can do the trick!
Users/app-programmers don’t need to think in ||
Introduction to Distributed Databases
• Data is stored at several sites, each managed by a
DBMS that can run independently.
• Distributed Data Independence: Users should not have
to know where data is located (extends Physical and
Logical Data Independence principles).
• Distributed Transaction Atomicity: Users should be able
to write Xacts accessing multiple sites just like local
Xacts.
Bina Nusantara
Introduction to Distributed Databases
• Users have to be aware of where data is located, i.e.,
Distributed Data Independence and Distributed
Transaction Atomicity are not supported.
• These properties are hard to support efficiently.
• For globally distributed sites, these properties may not
even be desirable due to administrative overheads of
making location of data transparent.
Bina Nusantara
Introduction to Distributed Databases
Types of Distributed Databases
• Homogeneous: Every site runs same type of DBMS.
• Heterogeneous: Different sites run different DBMSs
(different RDBMSs or even non-relational DBMSs).
Bina Nusantara
Introduction to Parallel Databases and Distributed
Databases
• Parallel DBMSs designed for scalable performance.
Relational operators very well-suited for parallel
execution.
– Pipeline and partitioned parallelism.
• Distributed DBMSs offer site autonomy and distributed
administration. Must revisit storage and catalog
techniques, concurrency control, and recovery issues.
Bina Nusantara
Download