UU - DIS - UDBL
DATABASE SYSTEMS - 10p
Course No. 2AD235, Spring 2002
A second course on development of database systems
Kjell Orsborn
Uppsala Database Laboratory, Department of Information Technology, Uppsala University, Uppsala, Sweden
Kjell Orsborn 2016-03-23

Introduction to Distributed DBMSs (Elmasri/Navathe ch. 24)
Distributed DBMS (ch. 24.4 and 24.5 are omitted)

Distributed DBMSs
• A distributed database (DDB) is a collection of several logically interrelated databases distributed over a computer network consisting of a number of computers (nodes).
• A distributed database management system (DDBMS) is a software system that permits the management of DDBs and makes the distribution transparent to the user.
• A DDB is not:
  – a collection of files (structure and a DB manager are needed)
  – a client-server interface to a database
    • data on one node, clients on other nodes in the network
    • (almost) every centralized DBMS has a client-server interface

Background
• What is a distributed system?
• A distributed system is a number of autonomous computers communicating over a network, with software for integrated tasks.
• Examples of distributed systems:
  – Sun's Network File System (NFS), a distributed file system

Distributed DBMSs . . .
[Figure: two panels, "Distributed database over several nodes in a network" and "Centralized database in a network", each showing Node 1 to Node 5 connected by a communication network.]

Centralized Database Server
• Stream (row-by-row) based client-server interfaces
• DBMS-specific interfaces
• Compiler-integrated interfaces (embedded SQL)
• ODBC: SQL-based standardized subroutine call library (Microsoft)
• JDBC: ODBC for Java (not Microsoft)

Distributed Databases
• Database seen as one unit; queries and updates go to ONE database.
• Data in the database is transparently distributed over many DB nodes.
• Manual partitioning or fragmentation of data tables.
• The DBMS automatically optimizes queries and updates to the distributed database.

Multi-Databases
• Database seen as several heterogeneous units.
• A multi-database query language is needed to combine data from the databases.
• Primitives are needed to integrate (combine, fuse) data from the databases.
• Special query optimization techniques are needed to deal with heterogeneity and dynamism.

Example of Multi-Database
• Automatic Teller Machines, ATMs

Fragmentation of data
• data fragmentation (= data partitioning)
• division of data sets (e.g. a relation) into several pieces (fragments) that are transparently stored on several different nodes
• increased accessibility and performance
• several types of fragmentation:
  – horizontal fragmentation
  – vertical fragmentation
  – mixed fragmentation
• good when nodes are far apart

Replication of data
• copies of the same data on several nodes
• increased reliability and access performance
• more complex updating, transaction handling, and recovery
  – updates must be propagated to each replica!
  – special procedures after failures to restore consistency
  – more problematic transaction synchronization!
• types of replication:
  – full replication (whole db at each node)
  – no replication (each fragment only at one node)
  – partial replication (certain fragments replicated)
  – not necessary to replicate all tables
  – full replication often not realistic!

Transparency in a DDBMS
• By transparency we here mean the hiding of basic implementation details from one abstraction level to another.
• Data independence
  – logical data independence
  – physical data independence
• Network transparency
  – protects the user from operational details of the network
  – hides the existence of a network
  – no machine names in database table references
    • location transparency
    • naming transparency
• Replication transparency
  – the user should not experience data replicas
  – automatic handling of updates, such as replica propagation
  – automatic handling of node crashes
• Fragmentation transparency
  – hides the existence of fragments
    • e.g. that a logical relation is horizontally fragmented into local physical tables
  – handling of the transformation of global queries to fragmented queries

Advantages of Distributed DBMSs . . .
• Data sharing
  – uniform interface and sharing of data through the DDBMS
  – natural to distribute certain database applications
• Increased reliability
  – redundancy increases security and accessibility
  – crashes are less severe (if the application does not depend on non-local data)
• Local independence
  – allows sharing of data but keeps local control of data
• Improved performance
  – avoids unnecessary data transfer
• Expandability
  – easy to add new nodes (not always linear scale-up due to the central directory)
• Local autonomy
  – local control
  – local policies

Problems with Distributed DBMSs . . .
• Complexity
  – database administration becomes more complex (such as recovery)
  – increased complexity of system design, implementation and maintenance
• Security
  – maintaining security in a network is harder
    • networking is a known problem
• Distributed administration
  – less control and more meetings
• Cost
  – hardware, software, development/maintenance

Problems with Distributed DBMSs . . .
• Distributed schema management
  – the schema is accessed whenever an SQL query is issued!
  – global directory => central database becomes a hot spot
  – local directories => data replication
  – => since the schema is not updated often but needs to be accessed very often, it is normally fully replicated by the DDBMS
• Distributed concurrency control
  – consistency of replicas: mutual consistency
• Distributed deadlock management
• Reliability of DDBMS
  – consistency of replicas
  – bringing up the (fragmented) database at failed sites
• OS support
  – multiple layers of network software

Additional functionality required by DDBMS
• Access to physically divided databases: schema management
• Handling of distribution and replication of data
  – for example, which copy of the data should be used
• Handling of consistency of replicated data
• Handling of distributed queries
• Handling of distributed transactions (over several network nodes)
• Handling of recovery/restart from crashes (of nodes) and new types of errors such as communication errors/failures

Distributed database design
• Goal:
  – to minimize the combined cost of maintaining data while achieving efficient communication and good performance for transactions
• Problems:
  – where (on which node/nodes) shall data and applications be placed
  – partitioning of data (split data into distributed partitions)
  – replication of data (copies of data on several nodes)
  – NP-complete optimization problem
  – distributed query processing
    • automatically done by distributed query processor of DDBMS
    • analyze query --> distributed execution plan
    • factors:
      – data replication
      – data availability
      – communication costs
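
The three factors listed for distributed query processing (data replication, data availability, communication costs) can be illustrated with a small Python sketch. It is not how any particular DDBMS builds its execution plan: the `Replica` record, the `plan_query` function, the fragment and node names, and the cost figures are all hypothetical, and a real optimizer would also weigh join order, local processing cost and load. The sketch only shows the basic idea of choosing, for every fragment a query needs, the cheapest replica that is currently available.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One stored copy of a fragment (hypothetical catalog entry)."""
    fragment: str         # fragment held by this copy
    node: str             # node that stores the copy
    available: bool       # is the node currently reachable?
    transfer_cost: float  # assumed cost of shipping the data to the query site


def plan_query(needed_fragments, replicas):
    """Pick the cheapest available replica for each needed fragment.

    Returns (plan, total_cost), where plan maps fragment -> chosen node.
    Raises LookupError if a fragment has no available replica.
    """
    plan = {}
    total_cost = 0.0
    for fragment in needed_fragments:
        # Replication: several copies may exist; availability filters them.
        candidates = [
            r for r in replicas
            if r.fragment == fragment and r.available
        ]
        if not candidates:
            raise LookupError(f"no available replica for fragment {fragment!r}")
        # Communication cost decides among the remaining copies.
        best = min(candidates, key=lambda r: r.transfer_cost)
        plan[fragment] = best.node
        total_cost += best.transfer_cost
    return plan, total_cost


# Illustrative catalog: two fragments, each replicated on two nodes.
replicas = [
    Replica("emp_1", "node1", True, 5.0),
    Replica("emp_1", "node3", True, 2.0),
    Replica("emp_2", "node2", False, 1.0),  # cheapest copy, but node is down
    Replica("emp_2", "node4", True, 4.0),
]
plan, cost = plan_query(["emp_1", "emp_2"], replicas)
print(plan, cost)
```

In this example the copy of `emp_2` on `node2` is the cheapest but is skipped because its node is unavailable, so the plan falls back to `node4`. With no replication, each fragment would have exactly one candidate and a single node failure would make the query fail.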