And Franchise Colleges
HSQ - DATABASES & SQL
05s Concepts of DBMS
By MANSHA NAWAZ
Section 05 Concepts Of DBMS

Distributed Database Management Systems (DDBMS)
• Multiple physically connected sites, where users at one site can access the data held at another site.
• A logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.
• Distributed Databases (2 site example)

[Figure: two Local DDBMS sites linked by System Messages and Data Exchange]

• In a true distributed database, the data itself is located on more than one machine.
• There are various possible approaches, depending on the needs of the application and the degree of emphasis placed on central control versus local autonomy.
• In general, organisations may wish to:
  – Reduce data communications costs by putting data at the location where it is most often used,
  – Aggregate information from different sources,
  – Provide a more robust system (e.g. when one node goes down the others continue working),
  – Build in extra security by maintaining copies of the database at different sites.

• Distributed Database Systems are not always designed that way originally.
• Traditional systems may develop into Distributed Database Systems as organisational needs become apparent.
• One approach is for a complete central database to be maintained and updated in the normal way.
  – Local copies (in whole or part) are sent periodically (e.g. daily) to remote sites, to be used for fast and cheap retrieval.
  – Any local updates have no effect on the central database.
  – This approach is only effective if consistency between all copies of the database at all times is not crucial.

Types of DDBMS
• Homogeneous: All sites use the same DBMS product.
• Heterogeneous: Sites may use different DBMS products. Distributed database development may involve the linking together of previously separate systems, perhaps running on different machine architectures with different software packages.
  – Individual sites manage and update their own databases for standard operational applications, but that information is collected and aggregated for higher-level decision support functions.
  – In this case there is no single location where the whole database is stored; it is genuinely split over two or more sites.
• Homogeneous DDBMSs are generally based on a Relational Database Management System.

DDBMS Overview
• A collection of logically related data.
• The data is split into a number of fragments.
• Fragments may be replicated.
• Fragments / replicas are allocated to sites.
• Sites are linked by a communications network.
• The data at each site is under the control of a DBMS.
• The DBMS at each site can handle local applications autonomously.
• Each DBMS participates in at least one global application.

Advantages of a DDBMS Approach
• Organisational structure
  – Many organisations are naturally distributed over several locations.
• Shareability and local autonomy
  – The geographical distribution of an organisation can be reflected in the distribution of the data.
• Improved availability
  – Local data is kept locally - some local technical support required.
• Improved reliability
  – The failure of a node or a communication link does not necessarily make the data inaccessible.
• Improved performance
  – Local data storage is faster - total storage can be greater.
  – Distributed transactions may be faster - a complex issue.
  – Less contention for centralised CPU and I/O.
• Modular growth
  – It is easier to handle expansion.
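
The overview and the reliability advantage above can be illustrated with a minimal Python sketch. It assumes a hypothetical allocation catalogue that records which sites hold a replica of each fragment; the fragment names, the site names and the function `reachable_fragments` are invented for illustration and do not come from any particular DDBMS.

```python
# Hypothetical allocation catalogue: fragment name -> sites holding a replica.
allocation = {
    "orders_north": {"Site A", "Site B"},
    "orders_south": {"Site C"},
    "customers": {"Site A", "Site C"},
}


def reachable_fragments(allocation, failed_sites):
    """Return the fragments that still have a replica on a working site."""
    failed = set(failed_sites)
    return sorted(
        fragment
        for fragment, sites in allocation.items()
        if sites - failed  # non-empty set: at least one replica survives
    )


after_a_fails = reachable_fragments(allocation, {"Site A"})
after_c_fails = reachable_fragments(allocation, {"Site C"})
print(after_a_fails)
print(after_c_fails)
```

In this sketch every fragment survives the loss of Site A because each has a replica elsewhere, whereas losing Site C makes the unreplicated fragment `orders_south` unavailable.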

Disadvantages of a DDBMS Approach
• Complexity - more complex than a centralised DBMS.
• Cost - increased complexity means the costs for a DDBMS will be higher.
  – Experienced staff also required.
• Security issues.
  – Security becomes more important and complex.
• Integrity control more difficult.
  – Database integrity refers to the validity and consistency of stored data.
• Lack of standards - no standard for DDBMSs available.
• Lack of experience - finding experienced staff is difficult.
• Database design more complex - fragmentation, allocation, data replication, etc.
• However, to date, general-purpose distributed DBMSs have not been widely accepted.

User viewpoint of a DDBMS
• The user sees the system at the conceptual level as if it is physically and logically centralised.

[Figure: Global View (Global Schema) spanning Site A, Site B, Site C and Site D]

Replication and Fragmentation overview
• DDBMSs aim to support:
  – Location transparency
  – Fragmentation transparency
    • Horizontal
    • Vertical
  – Replication transparency

Replication
• Copies of tables (or fragments) are duplicated at a number of sites.
  – The DRDBMS (distributed relational DBMS) keeps data consistent between sites.
• Increased availability / parallelism.
  – Parallelism involves complex query optimisation.
• Reduction in data movement and thus comms costs.
  – Local data stays local.
  – Large local 'read only' transactions are more efficient.
• Increased resilience to failure.
  – If one site fails, the data at that site can be available in a replica on another site.
• Problems of integrity / concurrency etc.
  – DRDBMSs provide facilities to support this aspect.

Horizontal fragmentation
• Relations are split into a number of row-subset relations.
  – Local sites have their own rows.
  – Fragments can be replicas of data from other sites.
• The original relation can be reconstructed by the relational union operation.
• Increased localisation of data through horizontal fragmentation.
• Queries that require rows from many fragments (at many sites) are handled transparently by the DRDBMS.
• An example would be an ORDER table horizontally fragmented so that rows physically reside at the branch that generated them. Queries like

    SELECT * FROM ORDER WHERE date > '11-OCT-2000';

  do not require the user to know where the data actually resides or which fragment / replica is used.

Vertical Fragmentation
• Tables are vertically split, the resulting tables containing a subset of attributes. (Relational Projection etc.)
• The original relation can be reconstructed by use of the relational join operator.
• A simple example:
  – Using the table SALE from the Winsor & Allsthop Conservatories scenario.
  – A vertically fragmented replica of all the SALEs is placed at the head office, limited to (sale#, model#, branch#).
  – Horizontally fragmented sections of the SALE table, including all other attributes, are kept at the appropriate branches.
• Provides increased localisation of data through vertical fragmentation (and horizontal in the example above).
• Queries that require rows from many fragments (at many sites) are handled transparently by the DRDBMS - the user sees a global database only.

More Disadvantages?
• Communication and transfer of data can slow down response times.
  – 3 sites: A, B & C
  – A needs to join 10,000 records at A with 5 at B
• Possible approaches:
  – Send 10,000 rows to B, join there and send the result to A
  – Send 5 to A, join there!!
  – Send 10,000 from A to C, 5 from B to C, join there, and send the result to A
    • Thus using the CPU power of C if A & B are busy.
• Difference of 1 second vs. 'a long time'.
• Needs an intelligent SQL optimiser!
  – Theories of optimising parallel query processing are a favourite research topic. (C.J. Date et al)

Updates with Replication – site fails?
• Primary copy approach
• Concurrency over sites
  – Global deadlock problem

[Figure: global deadlock among transactions Ta, Tb, Tc and Td across Site A and Site B]

• A DRDBMS must provide a distributed concurrency control mechanism.

Recovery - brief overview
• Multiple updates and aborts
  – A DRDBMS must provide a distributed recovery control mechanism.
• Two-phase commit is used: every site first prepares its local commit, then the commit is confirmed globally.

Database Optimisation and Tuning
• Optimisation and Tuning
• DBMS Front end features

Database tuning involves ensuring that the database is configured so that it performs at maximum speed for all applications. In practice this is difficult to achieve because different application programs may have conflicting needs. Further, it is common for databases to be multi-user, so many different applications may be accessing a database at the same time.

In the case of a single user database the problems are generally less severe, as often only one application will be running at a time. The software has the whole resources of the computer, and such systems often have only a limited amount of data to deal with and thus perform quickly anyway.

Database tuning becomes ever more important as the volume of data in tables grows. For example, if a system has an order table with 500 orders, then searching for a single order will be simple, as the whole order table will often be loaded into main memory, making sophisticated search techniques largely unnecessary. However, if a multi-user system has 400,000 orders stored and up to 40 users (telephone sales operators) using this data, then the problem is quite different.

Indexes
• Database indexes are the key method of speeding up database access. In relational databases rows are not stored in any particular order. Thus if a customer table has 80,000 rows and a telesales operator wants to see the account of a customer called 'SMITH' then there are 80,000 rows to search.
    SELECT * FROM customer WHERE cust_name = 'SMITH';

• There would perhaps be many 'SMITH's returned, and every row would need to be searched to find the target rows.
• Creating a secondary index on the attribute 'cust_name' would make a major difference to the speed of this query.

    CREATE INDEX nameidx ON customer (cust_name);

• The index is called a secondary index to distinguish it from a primary index. A primary index is similar but ensures that all values indexed are unique, e.g.

    CREATE UNIQUE INDEX cust_primary_idx ON customer (cust_number);

Indexing Technology
• Modern Relational Database Management Systems use powerful indexing routines, generally making use of B+Tree technology. The speed and power of the indexing system is a highly important aspect of developing a competitive RDBMS product. The B+Tree index is fast and flexible. It is excellent at finding exact targets such as 'SMITH' but is also good at finding the results of range queries. For example, finding all the customers whose name starts with 'S':

    SELECT * FROM customer WHERE cust_name LIKE 'S%';

Response Times
• Use of indexes also provides a more consistent response time for queries. The time to find a particular target row is not dependent on its position in the table. The response time is dependent mainly on the depth of the index, and all queries have to navigate the full depth of the index. Thus response time is more even than when, for example, one query into a non-indexed table finds its target in the first of 80,000 rows and the next finds its target in the last of 80,000 rows.

Disadvantages of Indexing
• One problem of creating indexes to improve the performance of data retrieval queries (SELECT) is that each index is itself often a huge file of information. This file is usually hidden from the users but does consume a lot of space.
Further, inserting and deleting rows in a database table results in all the indexes on that table needing to be updated. If every attribute and useful combination of attributes in a table has a separate index then the overhead of insertion and deletion will be very large.

General Tuning Approaches
• A useful approach to database tuning is to implement the following indexes. This approach does require analysis of probable query types. However, this is normally done as part of Systems Analysis and Design techniques during the development of software.
• Indexes on:
  – Primary keys - primary indexes.
  – Foreign keys (to speed joining tables) - secondary indexes.
  – Attributes that are frequently queried and yield a small set of rows - secondary indexes.
  – Attributes that are frequently used for displaying data in a sorted output - secondary indexes.

Other Tuning Approaches

Hashing
• Another method of storing data in a way in which it can be found quickly is hashing. This method works by taking the data, say a customer number, and applying some mathematical formula to the value.
• The outcome of this formula is then used as a physical address in a file for the location where this record will be stored. So the record for the customer with the code 'C99762' is stored in a calculated position; to retrieve the record the system runs the calculation again to find where it originally put the record.

Clustering
• In this method the Database Administrator uses DBMS facilities to physically sort database tables based on likely access. Thus for an ORDER table, perhaps it is useful to physically group all the rows for each CUSTOMER together. Within that grouping ORDERS might also be sorted into date sequence if this is how they are normally retrieved. This also means that simpler, and thus faster, indexing techniques can be used. Each subset of rows that needs to be stored together is called a 'cluster'.
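
The basic clustering idea just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the order rows, the customer codes and the helper names `cluster_orders` and `orders_for_customer` are hypothetical, and an in-memory list stands in for the physical file that a real DBMS would reorganise.

```python
from bisect import bisect_left, bisect_right

# Hypothetical ORDER rows: (order_no, customer_no, order_date as an ISO string).
orders = [
    (1, "C200", "2000-10-03"),
    (2, "C100", "2000-10-05"),
    (3, "C200", "2000-09-21"),
    (4, "C100", "2000-09-30"),
    (5, "C300", "2000-10-01"),
]


def cluster_orders(rows):
    """Sort rows by customer, then by date within each customer's cluster."""
    return sorted(rows, key=lambda row: (row[1], row[2]))


def orders_for_customer(clustered, customer_no):
    """Return one customer's cluster as a single contiguous slice."""
    keys = [row[1] for row in clustered]
    start = bisect_left(keys, customer_no)
    end = bisect_right(keys, customer_no)
    return clustered[start:end]


clustered = cluster_orders(orders)
c100_orders = orders_for_customer(clustered, "C100")
print(c100_orders)
```

Because each customer's rows sit next to each other and are already in date order, a simple binary search finds the whole cluster in one contiguous read, which is the sense in which simpler indexing techniques become sufficient.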
• Clustering can go beyond this approach, depending on the facilities provided by the DBMS. For example, each cluster of ORDERS belongs to a single CUSTOMER, but each order is associated with a number of ORDER LINES in the ORDER LINE table.
• A more powerful clustering approach is to store the ORDER rows for each customer together, followed physically by the appropriate ORDER LINES. The ORDER LINES would also be appropriately physically sorted. Thus when SQL queries require the joining of ORDERS and ORDER LINES for a customer, all the data is quickly available.
• Some Database Management Systems allow you to go even further and interleave, for example, individual ORDERS and matching ORDER LINES.

Disadvantages of Physical Clustering
• Any kind of physical clustering is difficult to maintain. As new data is entered it will need physical clustering, and that may involve significant database reorganisation. Often a DBMS will offer facilities to optimise clustering during quiet periods (overnight etc.).
• Clustering is a powerful way of tuning database performance but will only be useful if a particular type of query that matches the clustering is the overwhelmingly dominant query for this data.
• Would this approach work for the databases supporting ATMs?

Client Server Systems
• Databases are by their very nature a shared resource. So far we have only made use of single user systems. However, databases in a commercial environment are nearly always a shared resource with, perhaps, many users adding, editing and deleting data. In this situation client/server systems are very effective.

[Figure: Typical Client / Server Architecture: Clients 1 to 4, each running its own application (App 1 to App 4), linked by a network connection to a server holding the DBMS, data etc.]

Client
• The client is a machine that provides a 'Front End' to the database. The front end is used to provide a suitable user interface for the users.
The front end software might be written in Java or Visual Basic (etc.), or it might be an SQL interpreter, a report generator or a full database tool like Microsoft Access. Thus the front end may be a specially written application in some language. It could also be a more general purpose interface, allowing users to access a remote database but to configure software (for example, write queries) on the local machine.

Database Server
• The database server is also connected to the network and is referred to as the 'Back End'. The back end Database Management System is installed on the server. With an MS Access front end, for example, the back end does not have to be any particular Database Management System, although Microsoft produces SQL Server for this purpose. Generally any high power DBMS can be configured to operate as the back end. ORACLE is a common choice due to its high performance and depth of technical resource.
• The back end server is generally a dedicated database machine. This implies that it will have far greater data bandwidth than a typical PC, meaning that it can transfer data to and from its hard disc system very quickly.
• Ultra fast hard disc access is expensive, especially if it is needed on many small desktop machines. By placing the database on a specialist machine the technology used can be far more appropriate to the needs of a busy DBMS.

Database Server cont.
• DBMS servers typically have:
  – Unix operating systems (not essential)
  – Very fast disc interfaces
  – Large main memory
  – Large dedicated hardware disc cache memory
  – Built in fast backup facilities
  – Uninterruptible power supplies (UPS)
  – Expert management
• To provide these facilities on individual desktop systems would be far more expensive. The other main advantage of the back end database server approach is that it naturally places all the data in a single location, making data sharing easier.
• Another advantage is that the huge load of handling a large volume of data, and the processing requirements of running a complex database management system, are removed from the local desktop machines. Further, the requirement for constant backup of data is moved from many machines to one, where automatic, fast and reliable backup technology can be used.
• Losing commercially sensitive data is a potential disaster for most companies - for example, what would happen to a large mail order company that lost details of all current orders, deliveries and outstanding accounts?

The following lists summarise client/server functions.

Database Client - Desktop PC
• Manages User Interface
• Accepts (& validates) user data
• Processes application program logic
• Generates database requests (SQL)
• Transmits requests (SQL) to the server
• Receives results from server
• Formats and displays results according to application software (could be tables, reports or graphical output)
• May import data into local system for local processing

Database Server
• Accepts database requests (SQL)
• Processes database requests:
  – Performs integrity checks
  – Handles concurrent access
  – Optimises SQL queries
  – Performs security checks (user access)
  – Provides database recovery from system failures (crashes)
• Transmits results of database requests to client
• Database physical optimisation
• Provides statistical information on database
• May import data from foreign systems
• Provides facilities for database administrator to optimise and tune database access performance

Further Client / Server Configurations
• More complex systems are possible where there are several database servers. These may be in different geographic locations, and the network connection may include elements of both Local Area and Wide Area networks (internet).

Summary: Distributed Databases
• Usually homogeneous & relational
• Advantages & Disadvantages (many of both!)
• Transparency: Location, Fragmentation, Replication
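
As a closing illustration of location and fragmentation transparency, here is a minimal sketch using Python's built-in `sqlite3` module. It rests on loud assumptions: a single in-memory SQLite database stands in for the whole network, each "site" is simulated by a separate table, and the table names, columns and rows are hypothetical. The global relation is called `orders` because ORDER is a reserved word in SQL.

```python
import sqlite3

# Assumption: one in-memory database simulates the network; each branch
# "site" is just a separate table, not a separate machine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Horizontal fragments: each branch holds only the rows it generated.
    CREATE TABLE orders_branch_a (
        order_no INTEGER PRIMARY KEY, branch TEXT, order_date TEXT);
    CREATE TABLE orders_branch_b (
        order_no INTEGER PRIMARY KEY, branch TEXT, order_date TEXT);

    INSERT INTO orders_branch_a VALUES (1, 'A', '2000-10-05');
    INSERT INTO orders_branch_a VALUES (2, 'A', '2000-10-20');
    INSERT INTO orders_branch_b VALUES (3, 'B', '2000-09-30');
    INSERT INTO orders_branch_b VALUES (4, 'B', '2000-11-02');

    -- Global view: the original relation rebuilt by relational union.
    CREATE VIEW orders AS
        SELECT * FROM orders_branch_a
        UNION
        SELECT * FROM orders_branch_b;
""")

# The user queries the global relation without naming any fragment or site.
rows = conn.execute(
    "SELECT order_no, branch FROM orders "
    "WHERE order_date > '2000-10-11' ORDER BY order_no"
).fetchall()
print(rows)  # [(2, 'A'), (4, 'B')]
```

The query names only the global view, so the user never needs to know which branch table holds a given row. A real DRDBMS would do the equivalent routing across machines, which this single-database sketch does not attempt.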