PMIT-6102 Advanced Database Systems
By Jesmin Akhter, Assistant Professor, IIT, Jahangirnagar University
Continue from 16.01.2015
Every week, Friday, 2:30 PM-4:30 PM
NB: Schedule may change

Slide 2
Attendance = 10%
Exercise test (instant test, assignment, presentation) = 10%
Class Test (average of three) = 20%
Final Examination = 60%
================================
Total = 100%

Slide 3
Introduction (Lecture 01)
Overview of Relational DBMS (Lecture 02, 03)
Distributed Database Design (Lecture 04)
Overview of Query Processing (Lecture 05)
Distributed Query Processing (Lecture 06)
Tutorial-1
Tutorial-2
Distributed Transaction Management (Lecture 07)
Distributed Concurrency Control (Lecture 08, 09)
Reliability (Lecture 10, 11)
Parallel Database Systems (Lecture 12, 13)
Distributed Object DBMS (Lecture 14)
Tutorial-4

Slide 4
Tutorial-3

Slide 5
Tutorial             Date and Time
Tutorial-01          14th February 2014
Tutorial-02          14th March 2013
Mid term Examination 28th March 2014
Tutorial-03          2nd May 2014
Final Examination    13th June 2013
NB: Schedule may change

Slide 6
Lecture 01: Introduction to DDBMS
Introduction
Distributed Database System
Applications
Distributed DBMS Promises
Problem Areas
Architectural Models for Distributed DBMSs

Slide 8
[Figure: application programs 1, 2 and 3 access a shared database through a DBMS that provides data description, data manipulation and control.]

Slide 9
[Figure: database technology (integration) combined with computer networks (distribution) gives distributed database systems (integration).]

Slide 10
A distributed computing system consists of a number of autonomous processing elements that are interconnected by a computer network and that cooperate in performing their assigned tasks.
A "processing element" is a computing device that can execute a program on its own.

Slide 11
Processing logic: the processing logic or the processing elements are distributed.
Functions: various functions of a computer system can be delegated to various pieces of hardware or software.
Data: data used by a number of applications may be distributed to a number of processing sites.
Control: the control of the execution of various tasks may be distributed instead of being performed by one computer system.

Slide 12
"Distributed database system" (DDBS) refers jointly to the distributed database and the distributed DBMS.
A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.
A distributed database management system (D-DBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.

Slide 13
Physical distribution does not necessarily imply that the computer systems are geographically far apart; they may be in the same room.
Communication between them is done over a network instead of through shared memory or shared disk (as in multiprocessor systems), with the network as the only shared resource.

Slide 14
A timesharing computer system
A loosely or tightly coupled multiprocessor system
These are not DDBSs, because in a DDBS communication between computer systems is done over a network instead of through shared memory or shared disk, with the network as the only shared resource.
A database system which resides at one of the nodes of a network of computers: this is a centralized database on a network node.

Slide 15
The CPU time is shared by different processes.
The time slice used for sharing CPU time between processes is defined by the OS.

Slide 16
[Figure: processors P1 ... Pn sharing a memory M and a disk D.]
Not a DDBS

Slide 17
Each processor node has its own primary and secondary memory and may also have its own peripherals. Such systems are quite similar to the distributed environment, but there are differences. The fundamental difference is the mode of operation.
Database systems that run over multiprocessor systems are called parallel database systems.
[Figure: processors P1 ... Pn, each with its own memory (M1 ... Mn) and disk (D1 ... Dn).]
Not a DDBS

Slide 18
[Figure: Site 1 to Site 5 connected by a communication network.]
Not a DDBS

Slide 19
[Figure: Site 1 to Site 5 connected by a communication network.]

Slide 20
[Figure: several sites, each running DBMS software and serving user queries and user applications, connected through a communication subsystem.]

Slide 21
Data is stored at a number of sites.
    Each site logically consists of a single processor.
Processors at different sites are interconnected by a computer network.
    No multiprocessors (those are parallel database systems).
A distributed database is a database, not a collection of files.
    Data is logically related, as exhibited in the users' access patterns (relational data model).
A D-DBMS is a full-fledged DBMS.
    It is not a remote file system.

Slide 22
Manufacturing, especially multi-plant manufacturing
Military command and control
Electronic fund transfers and electronic trading
Corporate MIS
Airline reservations
Hotel chains
Any organization which has a decentralized organization structure

Slide 23
Transparent management of distributed, fragmented, and replicated data
Improved reliability/availability through distributed transactions
Improved performance
Easier and more economical system expansion

Slide 24
Example: four relations
    EMP(ENO, ENAME, TITLE)
    PROJ(PNO, PNAME, BUDGET)
    SAL(TITLE, AMT)
    ASG(ENO, PNO, RESP, DUR)
For a centralized DBMS, find the names and salaries of employees who worked on a project for more than 12 months:
    SELECT ENAME, AMT
    FROM EMP, ASG, SAL
    WHERE ASG.DUR > 12
    AND EMP.ENO = ASG.ENO
    AND SAL.TITLE = EMP.TITLE

Slide 25
EMP
ENO  ENAME      TITLE
E1   J. Doe     Elect. Eng.
E2   M. Smith   Syst. Anal.
E3   A. Lee     Mech. Eng.
E4   J. Miller  Programmer
E5   B. Casey   Syst. Anal.
E6   L. Chu     Elect. Eng.
E7   R. Davis   Mech. Eng.
E8   J. Jones   Syst. Anal.

ASG
ENO  PNO  RESP        DUR
E1   P1   Manager     12
E2   P1   Analyst     24
E2   P2   Analyst     6
E3   P3   Consultant  10
E3   P4   Engineer    48
E4   P2   Programmer  18
E5   P2   Manager     24
E6   P4   Manager     48
E7   P3   Engineer    36
E7   P5   Engineer    23
E8   P3   Manager     40

PROJ
PNO  PNAME              BUDGET
P1   Instrumentation    150000
P2   Database Develop.  135000
P3   CAD/CAM            250000
P4   Maintenance        310000

SAL
TITLE        AMT
Elect. Eng.  40000
Syst. Anal.  34000
Mech. Eng.   27000
Programmer   24000

Slide 26
Suppose we want to localize data so that data about the employees in the Waterloo office are stored in Waterloo, those in the Boston office are stored in Boston, and so forth. The same applies to the project and salary information. That is, the data is distributed.
We partition each of the relations and store each partition at a different site. This is known as fragmentation.
Data that are commonly accessed by one user can be placed on that user's local machine as well as on the machine of another user with the same access requirements. That is, the data is replicated.

Slide 27
Fully transparent access means that users can still write the query without paying any attention to the fragmentation, location, or replication of data, and let the system worry about resolving these issues.
    SELECT ENAME, AMT
    FROM EMP, ASG, SAL
    WHERE DUR > 12
    AND EMP.ENO = ASG.ENO
    AND SAL.TITLE = EMP.TITLE
[Figure: sites Tokyo, Paris, Boston, Montreal and New York connected by a communication network. The data labels shown are employees, projects and assignments for Paris, Boston, New York and Montreal, with some labels appearing at more than one site, plus "New York projects with budget > 200000".]

Slide 28
A transparent system "hides" the implementation details from users.
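
One way to picture this hiding is the following minimal sketch, which is an illustration and not part of the original slides. It loads the EMP, ASG and SAL data of Slide 25 into SQLite twice: once as ordinary tables (the centralized case) and once as horizontal fragments that are recombined by views named EMP and ASG. The fragment names (EMP_SITE1, ASG_SITE2, and so on) and the placement of employees E1 to E4 at "site 1" are assumptions made only for this example, and a single in-memory database stands in for several real sites.

```python
import sqlite3

# Sample relations from Slide 25.
EMP = [
    ("E1", "J. Doe", "Elect. Eng."), ("E2", "M. Smith", "Syst. Anal."),
    ("E3", "A. Lee", "Mech. Eng."), ("E4", "J. Miller", "Programmer"),
    ("E5", "B. Casey", "Syst. Anal."), ("E6", "L. Chu", "Elect. Eng."),
    ("E7", "R. Davis", "Mech. Eng."), ("E8", "J. Jones", "Syst. Anal."),
]
ASG = [
    ("E1", "P1", "Manager", 12), ("E2", "P1", "Analyst", 24),
    ("E2", "P2", "Analyst", 6), ("E3", "P3", "Consultant", 10),
    ("E3", "P4", "Engineer", 48), ("E4", "P2", "Programmer", 18),
    ("E5", "P2", "Manager", 24), ("E6", "P4", "Manager", 48),
    ("E7", "P3", "Engineer", 36), ("E7", "P5", "Engineer", 23),
    ("E8", "P3", "Manager", 40),
]
SAL = [("Elect. Eng.", 40000), ("Syst. Anal.", 34000),
       ("Mech. Eng.", 27000), ("Programmer", 24000)]

# The query from Slides 24 and 27, unchanged.
QUERY = """
SELECT ENAME, AMT
FROM EMP, ASG, SAL
WHERE ASG.DUR > 12
  AND EMP.ENO = ASG.ENO
  AND SAL.TITLE = EMP.TITLE
"""

SCHEMA = {
    "EMP": "(ENO TEXT, ENAME TEXT, TITLE TEXT)",
    "ASG": "(ENO TEXT, PNO TEXT, RESP TEXT, DUR INTEGER)",
    "SAL": "(TITLE TEXT, AMT INTEGER)",
}


def load(conn, table, relation, rows):
    """Create `table` with the schema of `relation` and insert `rows`."""
    conn.execute(f"CREATE TABLE {table} {SCHEMA[relation]}")
    if rows:
        marks = ", ".join("?" * len(rows[0]))
        conn.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)


def centralized():
    """All three relations stored whole in one database."""
    conn = sqlite3.connect(":memory:")
    load(conn, "EMP", "EMP", EMP)
    load(conn, "ASG", "ASG", ASG)
    load(conn, "SAL", "SAL", SAL)
    return conn


def fragmented():
    """EMP and ASG split into two fragments, hidden behind views."""
    conn = sqlite3.connect(":memory:")
    site1 = {"E1", "E2", "E3", "E4"}  # hypothetical placement, not from the slides
    for rel, rows in (("EMP", EMP), ("ASG", ASG)):
        load(conn, f"{rel}_SITE1", rel, [r for r in rows if r[0] in site1])
        load(conn, f"{rel}_SITE2", rel, [r for r in rows if r[0] not in site1])
        # The view rebuilds the global relation from its fragments.
        conn.execute(f"CREATE VIEW {rel} AS SELECT * FROM {rel}_SITE1 "
                     f"UNION ALL SELECT * FROM {rel}_SITE2")
    load(conn, "SAL", "SAL", SAL)
    return conn


central_rows = sorted(centralized().execute(QUERY).fetchall())
fragment_rows = sorted(fragmented().execute(QUERY).fetchall())
print("same answer:", central_rows == fragment_rows)
for ename, amt in central_rows:
    print(ename, amt)
```

The query text is identical in both cases; only the definition of what EMP and ASG mean changes. A real D-DBMS would also have to locate the fragments across the network and choose among replicas, which this single-process sketch does not attempt.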
The fundamental issue is to provide data independence in the distributed environment:
Network (distribution) transparency
Replication transparency
Fragmentation transparency
    horizontal fragmentation: selection
    vertical fragmentation: projection
    hybrid

Slide 29
Data independence refers to the immunity of user applications to changes in the definition and organization of data.
Logical data independence: the immunity of user applications to changes in the logical structure (i.e., schema) of the database.
Physical data independence: hiding the details of the storage structure from user applications.

Slide 30

Slide 31
What is the basic difference between database systems and distributed database systems?
What is being distributed?
Define loosely and tightly coupled multiprocessor systems.
Draw "Distributed Database System – Reality".
What do you mean by replicated data?
What are the promises of a distributed DBMS?
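
As a worked illustration of the fragmentation types named on Slide 28 (horizontal as selection, vertical as projection), here is a small sketch on the PROJ relation of Slide 25. It is an example under stated assumptions rather than the course's own method: the helper names `select` and `project` are hypothetical, the budget threshold of 200000 is borrowed from the label in the Slide 27 figure, and the choice to repeat the key PNO in both vertical fragments is made so that the relation can be rebuilt.

```python
# PROJ(PNO, PNAME, BUDGET) from Slide 25.
PROJ = [
    ("P1", "Instrumentation", 150000),
    ("P2", "Database Develop.", 135000),
    ("P3", "CAD/CAM", 250000),
    ("P4", "Maintenance", 310000),
]


def select(relation, predicate):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, columns):
    """Relational projection: keep only the listed column positions."""
    return [tuple(row[i] for i in columns) for row in relation]


# Horizontal fragmentation: each fragment is a selection (a subset of rows).
low_budget = select(PROJ, lambda row: row[2] <= 200000)
high_budget = select(PROJ, lambda row: row[2] > 200000)

# Vertical fragmentation: each fragment is a projection (a subset of columns).
# The key PNO is kept in both fragments so the rows can be matched up again.
proj_budget = project(PROJ, (0, 2))  # (PNO, BUDGET)
proj_name = project(PROJ, (0, 1))    # (PNO, PNAME)

# Horizontal fragments are recombined by union.
from_horizontal = sorted(low_budget + high_budget)

# Vertical fragments are recombined by a join on the key.
name_of = dict(proj_name)
from_vertical = sorted((pno, name_of[pno], budget) for pno, budget in proj_budget)

print(from_horizontal == sorted(PROJ))
print(from_vertical == sorted(PROJ))
```

Hybrid fragmentation would apply both steps, for example projecting columns out of `high_budget` only.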