SB03G: Introduction to DDB
Dr. Farhi Marir
KMG Research Group, School of Computing, London Met. University
Lecture 1
10/12/2003 Copyright © 2001 Dr. F. Marir

Content
• An overview of centralised, decentralised and distributed databases
• Motivation and advantages/disadvantages of DDBMS
• Distributed database design
• Architecture and functions of DDBMS
• What is a distributed DBMS
• Problems
• Current state of affairs

Database Environment
• A major aim of the database is to provide the user with an abstract view of the data.
• Different users may have different views of the data held in the database.
• Abstraction is the starting point for the design of a database for a given organisation.
• To achieve abstraction and the variety of views, a standard architecture is provided in most available commercial DBMSs.

ANSI-SPARC Three-level Architecture
[Figure: ANSI-SPARC three-level architecture]

An example of the three levels
[Figure: example of the three levels]

Database Schema and Instances
• The database schema is the overall description of the database.
• A database instance is the data in the database at any particular point in time.
• Three different types of schema are defined according to the three-level architecture:
  – the external, conceptual and internal schemas.
• The DBMS is responsible for
  – mapping between these three schemas, and
  – checking the schemas for consistency.

Data Independence
• Logical:
  – Refers to the immunity of external schemas to changes in the conceptual schema, e.g. addition/removal of entities.
• Physical:
  – Refers to the immunity of the conceptual schema to changes in the internal schema, e.g. using different file organisations or storage structures/devices.

Data Independence and the ANSI-SPARC Three-level Architecture
[Figure: data independence and the ANSI-SPARC three-level architecture]

Data Models
Data Model
• A high-level collection of concepts that can be used to describe the structure of the database, or schema,
  – i.e. describing data, relationships between data and constraints on the data in an organisation.
• A tool for providing the database levels of abstraction.
• Represents the data in an understandable way and promotes collaborative design of the database.

Categories of Data Models
• Object-based
• Record-based
• Physical

Object-based Data Models
• Entity-Relationship (E-R)
  – Uses concepts such as entity (a distinct object), attribute and relationship.
  – It is a main technique for conceptual database design.
  – Used as the design methodology for this module.
• Object-Oriented
  – Extends the E-R model to include the actions or operations of the object, i.e. the behaviour of the object.
• Other object-based models, e.g. semantic and functional models.

Record-based Data Models
• Physical data models and logical data models
  – Hierarchical Data Model
  – Network Data Model
  – Relational Data Model

The Physical Data Model
• Describes how data is stored in the computer
  – Represents information such as record structure, record ordering and access paths.
  – The most common ones are the unifying model and frame memory.

The Hierarchical Model
[Figure: hierarchical model example; branch records (e.g. B3, 163 Main St, Glasgow) as root records with staff records (e.g. SL21, John White, Manager, 30000) beneath them]

The Network Data Model
[Figure: network model example linking the same staff and branch records]

Limitations of the Hierarchical and Network Models
• They require the user to have knowledge of the physical database being accessed.
• They adopt a navigational approach, i.e. programs must be written to specify how the data is to be retrieved.
• However, they are better than the file-based approach when it comes to integrating an organisation's information.

The Relational Data Model
• The relational data model was first proposed by E. F. Codd (1970).
• Based on the mathematical concept of a relation.
• Allows a higher degree of data independence, i.e. application programs are not affected by changes to the internal data representation.
• Provides techniques for dealing with semantics, consistency and redundancy problems.

Distributed Database Systems

Centralised Database
• The database and the DBMS reside at a single computer or site.
• Users may be able to access the centralised database system remotely via terminals connected to the site;
• however, all the data access and processing takes place at the central site.

An example of centralised database
[Figure: example of a centralised database]

What is a Distributed Database System
• A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.
• A distributed database management system (D–DBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.
• Distributed database system (DDBS) = DDB + D–DBMS
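The idea that a D–DBMS makes distribution transparent can be illustrated with a minimal sketch. It assumes two in-memory SQLite databases standing in for two sites, each holding a fragment of one STAFF table. The class ToyDDBMS, the function make_site, the table layout and the assignment of staff to sites are all hypothetical, chosen only for illustration. A real D–DBMS would also handle networking, a catalogue, query optimisation and failures.

```python
import sqlite3


def make_site(rows):
    """Create one 'site': an independent in-memory database holding a fragment."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE staff (staff_no TEXT PRIMARY KEY, name TEXT, position TEXT)"
    )
    conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


class ToyDDBMS:
    """Hypothetical stand-in for a D-DBMS: hides which site holds which rows."""

    def __init__(self, sites):
        self.sites = sites  # mapping: site name -> sqlite3 connection

    def query(self, sql, params=()):
        # Send the same query to every site and merge the partial results,
        # so the caller never needs to know where a row is stored.
        rows = []
        for conn in self.sites.values():
            rows.extend(conn.execute(sql, params).fetchall())
        return sorted(rows)


# Illustrative data only: which staff live at which site is an assumption.
london = make_site([("SL21", "John White", "Manager"),
                    ("SL41", "Julie Lee", "Assistant")])
glasgow = make_site([("SG14", "David Ford", "Deputy"),
                     ("SG5", "Susan Brand", "Manager")])

ddb = ToyDDBMS({"London": london, "Glasgow": glasgow})
managers = ddb.query("SELECT staff_no FROM staff WHERE position = ?", ("Manager",))
print(managers)
```

The application issues one query against ddb and gets managers from both sites back, as if the data came from a single database.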
More on Distributed Database...
• Stored/spread physically across computers or sites in different locations
  – connected together by some form of data communication network.
• The sites of a distributed database may be spread over
  – a large area connected via a Wide Area Network (WAN), or
  – a small area connected via a Local Area Network (LAN).
• The fragments of the distributed database could be
  – on different platforms,
  – with different operating systems, and
  – managed by different database management systems.

Distributed Database Management System
• Users access the distributed database via applications.
• The database applications running at any of the system's sites should be able to operate on any of the database fragments transparently,
  – i.e. as if the data came from a single database managed by one DBMS.
• The software that manages a distributed database is called a Distributed Database Management System (DDBMS).

Decentralised and Distributed Databases
[Figure: decentralised and distributed databases]

Motivation for DDBMS
• Many organisations are naturally distributed over different locations.
  – For example, a company may have locations in different cities, or a bank may have multiple branches.
• It is natural for databases used in those organisations to be distributed over these locations.
• A vital requirement of such a distribution is that users can access data both locally and globally, from any other location.

Motivation...
• The necessary technology is already there.
[Figure: database technology (integration) combined with computer networks (distribution) gives distributed database systems (integration)]
• Integration, not centralisation.

Advantages
• The DDBMS mirrors the structure of an enterprise.
• Local autonomy (control).
  – Local data are locally owned and managed with local accountability. Security, integrity, storage representation and hardware are controlled locally. At the same time, users can access remote data when necessary.
• No reliance on a central site.
  – Avoids bottlenecks and system vulnerability.
• Reliability and availability.
  – The system continues to operate if one or more sites go down or communication links fail.

Advantages...
• Speed-up of query processing.
  – Queries about data stored locally are answered faster. Moreover, queries can be split to execute in parallel at different sites, or they can be redirected to less busy sites.
• Modular growth.
  – It is much easier to add another site than to expand a centralised system.
• Economics.
  – It costs much less to create and maintain a system of smaller computers with the equivalent power of a single large computer.

Disadvantages...
• Software complexity and high costs.
  – A DDBMS that hides the distributed nature from the user and provides an acceptable level of performance, reliability and availability is inherently more complex than a centralised DBMS.
  – Therefore, a DDBMS is more expensive to buy and maintain.
• Additional manpower costs to manage and maintain the local DBMSs and the underlying network.
• Processing overheads.
  – Increased query processing costs, catalogue management and consistency maintenance.

Disadvantages...
• Data integrity.
  – It is harder to enforce data integrity when data is updated at different sites simultaneously.
• Database design is more complex.
  – Fragmentation and replication of data, and allocation of fragments, have to be taken into account.
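The data-integrity point can be made concrete with a small sketch. It assumes a single replicated data item (an account balance) held at two sites. The Replica class and both update functions are hypothetical names. The "coordinated" variant simply applies every update at all sites and ignores failures, locking and commit protocols, which a real DDBMS would need.

```python
class Replica:
    """One site's copy of a replicated data item (here a single balance)."""

    def __init__(self, site, balance):
        self.site = site
        self.balance = balance


def uncoordinated_update(replica, delta):
    # The update is applied only to the local copy; other sites never see it.
    replica.balance += delta


def coordinated_update(replicas, delta):
    # The same update is applied at every site before it counts as done.
    for replica in replicas:
        replica.balance += delta


# Two sites update their own copies independently: the copies diverge.
london, glasgow = Replica("London", 100), Replica("Glasgow", 100)
uncoordinated_update(london, -30)
uncoordinated_update(glasgow, 50)
diverged = london.balance != glasgow.balance

# The same two updates, propagated to both copies: the copies agree.
a, b = Replica("London", 100), Replica("Glasgow", 100)
coordinated_update([a, b], -30)
coordinated_update([a, b], 50)
consistent = a.balance == b.balance
print(diverged, consistent, a.balance)
```

Keeping the copies consistent requires every update to reach every replica, which is part of the extra consistency-maintenance overhead listed among the disadvantages.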
Distributed DBMS Issues
• Distributed Database Design
  – how to distribute the database
  – replicated and non-replicated database distribution
  – a related problem in directory management
• Query Processing
  – convert user transactions to data manipulation instructions
  – optimisation problem
  – min{cost = data transmission + local processing}
  – general formulation is NP-hard

Distributed DBMS Issues…
• Concurrency Control
  – synchronisation of concurrent accesses
  – consistency and isolation of transactions' effects
  – deadlock management
• Reliability
  – how to make the system resilient to failures
  – atomicity and durability

Relationship Between Issues…
[Figure: relationships between directory management, query processing, distributed design, reliability, concurrency control and deadlock management]

Related Issues
• Operating System Support
  – operating system with proper support for database operations
  – dichotomy between general-purpose processing requirements and database processing requirements
• Open Systems and Interoperability
  – Distributed Multi-database Systems
  – More probable scenario
  – Parallel issues
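The query-processing objective listed under Distributed DBMS Issues, min{cost = data transmission + local processing}, can be sketched as a choice among candidate execution plans. Everything in the sketch below is an assumption made for illustration: the Plan class, the three plans, their byte and operation counts, and the unit costs. A real optimiser would estimate these figures from catalogue statistics. Because the general problem is NP-hard, it could not enumerate every plan as this toy does.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    bytes_shipped: int  # data moved between sites
    local_ops: int      # work done at the sites themselves


def cost(plan, per_byte=10, per_op=1):
    """cost = data transmission + local processing (integer unit costs assumed)."""
    return plan.bytes_shipped * per_byte + plan.local_ops * per_op


# Hypothetical ways to join STAFF (large) with BRANCH (small) held at different sites.
plans = [
    Plan("ship STAFF to the BRANCH site", bytes_shipped=50_000, local_ops=20_000),
    Plan("ship BRANCH to the STAFF site", bytes_shipped=2_000, local_ops=20_000),
    Plan("ship only matching STAFF rows", bytes_shipped=5_000, local_ops=30_000),
]

best = min(plans, key=cost)
print(best.name, cost(best))
```

With these made-up figures, shipping the small relation wins because transmission dominates the total. Changing the unit costs, for example on a fast LAN, can change which plan is cheapest.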