Concepts of Database Management Systems 1

And Franchise Colleges
HSQ - DATABASES & SQL
05s
Concepts of DBMS
By MANSHA NAWAZ
Section 05
Concepts Of DBMS
1
Distributed Database Management Systems (DDBMS)
• Multiple physically connected sites, where users at one site can access data held at another site.
• A logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.
Distributed Databases (2 site example)
[Figure: two sites, each with a local DDBMS, linked by system messages and data exchange.]
• In a true distributed database, the data itself is located on more than one machine.
• There are various possible approaches, depending on the needs of the application and the degree of emphasis placed on central control versus local autonomy.
• In general, organisations may wish to:
– Reduce data communications costs by putting data at the location where it is most often used,
– Aggregate information from different sources,
– Provide a more robust system (e.g. when one node goes down the others continue working),
– Build in extra security by maintaining copies of the database at different sites.
• Distributed Database systems are not always designed that way originally.
• Traditional systems may develop into Distributed Database Systems as organisational needs become apparent.
• One approach is for a complete central database to be maintained and updated in the normal way.
– Local copies (in whole or part) are sent periodically (e.g. daily) to remote sites, to be used for fast and cheap retrieval.
– Any local updates have no effect on the central database.
– This approach is only effective if consistency between all copies of the database at all times is not crucial.
Types of DDBMS
• Homogeneous: All sites use the same DBMS product.
• Heterogeneous: Distributed database development may involve the linking together of previously separate systems, perhaps running on different machine architectures with different software packages.
– Individual sites manage and update their own databases for standard operational applications, but that information is collected and aggregated for higher-level decision support functions.
– In this case there is no single location where the whole database is stored; it is genuinely split over two or more sites.
• Homogeneous DDBMSs are generally based on a Relational Database Management System.
DDBMS Overview
• A collection of logically related data.
• The data is split into a number of fragments.
• Fragments may be replicated.
• Fragments / Replicas are allocated to sites.
• Sites are linked by a communications network.
• The data at each site is under the control of a DBMS.
• The DBMS at each site can handle local applications, autonomously.
• Each DBMS participates in at least one global application.
Advantages of a DDBMS Approach
• Organisational structure
– Many organisations are naturally distributed over several locations.
• Shareability and local autonomy
– The geographical distribution of an organisation can be reflected in the distribution of the data.
• Improved availability
– Local data is kept locally - some local technical support required.
• Improved reliability
– The failure of a node or a communication link does not necessarily make the data inaccessible.
• Improved performance
– Local data storage is faster - total storage can be greater.
– Distributed transactions may be faster - a complex issue.
– Less contention for centralised CPU and I/O.
• Modular growth
– It is easier to handle expansion.
Disadvantages of a DDBMS Approach
• Complexity - more complex than a centralised DBMS.
• Cost - increased complexity means the costs for a DDBMS will be higher.
– Experienced staff also required.
• Security issues
– Security becomes more important and more complex.
• Integrity control more difficult
– Database integrity refers to the validity and consistency of stored data.
• Lack of standards - no standard for DDBMSs available.
• Lack of experience - finding experienced staff is difficult.
• Database design more complex - fragmentation, allocation, data replication, etc.
• However, to date, general-purpose distributed DBMSs have not been widely accepted.
User viewpoint of a DDBMS
• The user sees the system at the conceptual level as if it were physically and logically centralised.
[Figure: a single Global View (Global Schema) above Site A, Site B, Site C and Site D.]
Replication and Fragmentation overview.
• DDBMSs aim to support:
– Location transparency
– Fragmentation transparency
  • Horizontal
  • Vertical
– Replication transparency
Replication
• Copies of tables (or fragments) are duplicated at a number of sites.
– The distributed relational DBMS (DRDBMS) keeps data consistent between sites.
• Increased availability / parallelism.
– Parallelism involves complex query optimisation.
• Reduction in data movement and thus communications costs.
– Local data stays local.
– Large local 'read only' transactions are more efficient.
• Increased resilience to failure.
– If one site fails, the data at that site can be available in a replica on another site.
• Problems of integrity / concurrency etc.
– DRDBMSs provide facilities to support this aspect.
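The replication points above can be illustrated with a toy model. This is only a sketch under stated assumptions: the site names (Leeds, York), the `ReplicatedTable` class and the "update every copy on each write" policy are all hypothetical, and a real DRDBMS must also cope with writes arriving while a site is down.

```python
class ReplicatedTable:
    """Toy model: one copy of a table per site, kept identical on every write."""

    def __init__(self, sites):
        self.copies = {site: {} for site in sites}   # site name -> {key: row}
        self.down = set()                            # sites currently unavailable

    def write(self, key, row):
        # Eager propagation: every replica is updated before the write returns.
        # (Assumes all sites are up; a real system must handle failed sites.)
        for copy in self.copies.values():
            copy[key] = dict(row)

    def read(self, key, local_site):
        # Prefer the local replica; fall back to any other live replica.
        candidates = [local_site] + [s for s in self.copies if s != local_site]
        for site in candidates:
            if site not in self.down:
                return site, self.copies[site][key]
        raise RuntimeError("no replica available")


table = ReplicatedTable(["Leeds", "York"])
table.write(1, {"customer": "SMITH"})
print(table.read(1, "Leeds"))   # served from the local copy
table.down.add("Leeds")
print(table.read(1, "Leeds"))   # Leeds is down, so York answers
```

Reads stay local while the local site is up (less data movement), and the second read shows the resilience benefit: the same row is still available from the other replica.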
Horizontal fragmentation
• Relations are split into a number of row-subset relations.
– Local sites have their own rows.
– Fragments can be replicas of data from other sites.
• The original relation can be reconstructed by the relational union operation.
• Increased localisation of data through horizontal fragmentation.
• Queries that require rows from many fragments (at many sites) are handled transparently by the DRDBMS.
• An example would be horizontal fragmentation of an ORDER table, fragmented such that rows physically reside at the branch that generated them. Queries like
SELECT * FROM ORDER WHERE date > '11-OCT-2000';
do not require the user to know where the data actually resides or which fragment / replica is used.
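As a rough illustration of horizontal fragmentation, the sketch below keeps one fragment table per branch in a single in-memory SQLite database and rebuilds the global relation with a union view. The branch names, column names and the view name `all_orders` are assumptions made for the example (ORDER itself is a reserved word in SQL), and a real DRDBMS would place the fragments on different machines.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- One fragment per branch; each holds only the rows that branch generated.
    CREATE TABLE order_leeds (order_no INTEGER PRIMARY KEY, branch TEXT, order_date TEXT);
    CREATE TABLE order_york  (order_no INTEGER PRIMARY KEY, branch TEXT, order_date TEXT);
    INSERT INTO order_leeds VALUES (1, 'LEEDS', '2000-10-05'), (2, 'LEEDS', '2000-10-20');
    INSERT INTO order_york  VALUES (3, 'YORK',  '2000-10-12'), (4, 'YORK',  '2000-09-30');

    -- The global relation is the union of the fragments.
    CREATE VIEW all_orders AS
        SELECT * FROM order_leeds
        UNION ALL
        SELECT * FROM order_york;
""")

# The user queries the global view and never names a fragment.
rows = db.execute(
    "SELECT order_no, branch FROM all_orders WHERE order_date > ? ORDER BY order_no",
    ("2000-10-11",),
).fetchall()
print(rows)
```

`UNION ALL` is enough here because the fragments are disjoint; dates are stored in ISO format so that text comparison orders them correctly.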
Vertical Fragmentation.
• Tables are split vertically, the resulting tables each containing a subset of the attributes (relational projection etc.).
• The original relation can be reconstructed by use of the relational join operator (each fragment must retain the primary key).
• A simple example:
– Using the table SALE from the Winsor & Allsthop Conservatories scenario.
– A vertically fragmented replica of all the SALEs is placed at the head office, limited to (sale#, model#, branch#).
– Horizontally fragmented sections of the SALE table, including all other attributes, are kept at the appropriate branches.
• Provides increased localisation of data through vertical fragmentation (and horizontal in the example above).
• Queries that require rows from many fragments (at many sites) are handled transparently by the DRDBMS - the user sees a global database only.
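A minimal sketch of vertical fragmentation and its reconstruction by join, again using one in-memory SQLite database as a stand-in for separate sites. The head office fragment uses the three attributes named above; the remaining attributes (`sale_date`, `price`) and all the data values are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Head office fragment: the three attributes named on the slide.
    CREATE TABLE sale_head (sale_no INTEGER PRIMARY KEY, model_no TEXT, branch_no TEXT);
    -- Branch fragment: the key plus the remaining (hypothetical) attributes.
    CREATE TABLE sale_branch (sale_no INTEGER PRIMARY KEY, sale_date TEXT, price REAL);

    INSERT INTO sale_head   VALUES (101, 'M7', 'B1'), (102, 'M3', 'B2');
    INSERT INTO sale_branch VALUES (101, '2000-10-12', 4500.0), (102, '2000-10-15', 7200.0);
""")

# Joining the fragments on the shared key rebuilds the full SALE rows.
sale = db.execute("""
    SELECT h.sale_no, h.model_no, h.branch_no, b.sale_date, b.price
    FROM sale_head AS h
    JOIN sale_branch AS b ON b.sale_no = h.sale_no
    ORDER BY h.sale_no
""").fetchall()
print(sale)
```

The join only works because both fragments carry `sale_no`; without a shared key the original rows could not be matched up again.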
More Disadvantages?
• Communication and transfer of data can slow down response rate.
– 3 sites: A, B & C
– A needs to join 10,000 records at A with 5 at B
• Possible approaches:
– Send 10,000 rows to B, join there and send the result to A
– Send 5 rows to A and join there (by far the least data transferred)
– Send 10,000 from A to C and 5 from B to C, join there, and send the result to A
  • Thus using the CPU power of C if A & B are busy.
• Difference of 1 second vs. 'a long time'.
• Needs an intelligent SQL optimiser!
– Theories of optimising parallel query processing are a favourite research topic (C.J. Date et al).
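The three approaches can be compared by counting the rows each one moves across the network. This is a deliberately crude cost model: it ignores row width, CPU load and network latency, and the size of the join result (5 rows) is an assumption, not a figure from the example.

```python
def rows_shipped(rows_a, rows_b, result_rows):
    """Rows sent over the network for each strategy; the result must end up at A."""
    return {
        "ship A to B, join at B": rows_a + result_rows,
        "ship B to A, join at A": rows_b,
        "ship both to C, join at C": rows_a + rows_b + result_rows,
    }


costs = rows_shipped(rows_a=10_000, rows_b=5, result_rows=5)
best = min(costs, key=costs.get)
for plan, rows in costs.items():
    print(f"{plan}: {rows} rows")
print("cheapest:", best)
```

A real optimiser weighs more than transfer volume (the third plan may still win if A and B are heavily loaded), which is why the choice is described above as needing an intelligent optimiser.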
Updates with Replication – site fails?
• Primary copy approach
• Concurrency over sites
– Global deadlock problem
[Figure: transactions Ta and Tb at Site A, transactions Tc and Td at Site B.]
• A DRDBMS must provide a distributed concurrency control mechanism.
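The global deadlock problem can be sketched with wait-for graphs: neither site sees a cycle among its own waits, but the combined graph contains one. The particular waits below are hypothetical (the figure only names the transactions and sites), and `has_cycle` is an illustrative helper rather than any product's detection algorithm.

```python
def has_cycle(wait_for):
    """Return True if the wait-for graph (txn -> set of txns it waits on) has a cycle."""
    visiting, done = set(), set()

    def visit(txn):
        if txn in done:
            return False
        if txn in visiting:
            return True          # back edge: txn is waiting on itself indirectly
        visiting.add(txn)
        if any(visit(nxt) for nxt in wait_for.get(txn, ())):
            return True
        visiting.remove(txn)
        done.add(txn)
        return False

    return any(visit(txn) for txn in list(wait_for))


# Hypothetical waits: Ta and Tb run at Site A, Tc and Td at Site B.
site_a = {"Ta": {"Tb"}}                     # local wait seen at Site A
site_b = {"Tc": {"Td"}}                     # local wait seen at Site B
cross_site = {"Tb": {"Tc"}, "Td": {"Ta"}}   # waits that span the two sites

# Merge everything into one global wait-for graph.
global_graph = {}
for graph in (site_a, site_b, cross_site):
    for txn, waits in graph.items():
        global_graph.setdefault(txn, set()).update(waits)

print(has_cycle(site_a), has_cycle(site_b), has_cycle(global_graph))
```

Only the merged graph reveals the cycle Ta, Tb, Tc, Td, Ta, which is why deadlock detection cannot be left to each site's local concurrency control alone.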
Recovery - brief overview
• Multiple updates and aborts
– A DRDBMS must provide a distributed recovery control mechanism.
• Two-phase commit is used: each site first prepares and votes locally, then the transaction is committed globally only if every site voted to commit.
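The two phases can be sketched as follows. This is a simplified model under stated assumptions: the `Participant` class and `two_phase_commit` function are hypothetical names, and logging, timeouts and coordinator failure (the hard parts of a real protocol) are left out.

```python
class Participant:
    """One site's share of a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: the site gets ready to commit (or not) and votes.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (voting): ask every site whether it can commit.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit everywhere only if every vote was yes.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


ok = two_phase_commit([Participant("A"), Participant("B")])
failed_sites = [Participant("A"), Participant("B", can_commit=False)]
failed = two_phase_commit(failed_sites)
print(ok, failed, [p.state for p in failed_sites])
```

The point of the first phase is that no site commits on its own: a single "no" vote causes every site, including those that were ready, to abort.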
Database Optimisation and Tuning
• Optimisation and Tuning
• DBMS Front end features
• Database tuning involves ensuring that the database is configured so that it performs at maximum speed for all applications. In practice this is difficult to achieve because different application programs may have conflicting needs. Further, it is common for databases to be multi-user, so many different applications may be accessing a database at the same time. In the case of a single user database the problems are generally less severe, as often only one application will be running at a time. The software has all the resources of the computer, and such systems often have only a limited amount of data to deal with and thus perform quickly anyway.
• Database tuning becomes ever more important as the volume of data in tables grows. For example, if a system has an order table with 500 orders, then searching for a single order will be simple, as the whole order table will often be loaded into main memory, making sophisticated search techniques largely unnecessary. However, if a multi-user system has 400,000 orders stored and up to 40 users (telephone sales operators) using this data, then the problem is quite different.
Indexes
• Database indexes are the key method of speeding up database access. In relational databases rows are not stored in any particular order. Thus if a customer table has 80,000 rows and a telesales operator wants to see the account of a customer called 'SMITH' then there are 80,000 rows to search.
SELECT *
FROM customer
WHERE cust_name = 'SMITH';
• Many 'SMITH' rows might be returned, but without an index every row would still need to be searched to find them.
• Creating a secondary index on the attribute 'cust_name' would make a major difference to the speed of this query.
CREATE INDEX nameidx
ON customer (cust_name);
• The index is called a secondary index to distinguish it from a primary index. A primary index is similar but ensures that all values indexed are unique, e.g.
CREATE UNIQUE INDEX cust_primary_idx
ON customer (cust_number);
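One way to see the effect of the index is to ask the DBMS how it plans to run the query before and after the index exists. The sketch below uses SQLite through Python's `sqlite3` module purely as a convenient stand-in; `EXPLAIN QUERY PLAN` is SQLite-specific, other products have their own equivalents, and the generated customer data is made up.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (cust_number INTEGER, cust_name TEXT)")
db.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(n, "SMITH" if n % 100 == 0 else f"NAME{n}") for n in range(1, 1001)],
)

query = "SELECT * FROM customer WHERE cust_name = 'SMITH'"


def plan(sql):
    # EXPLAIN QUERY PLAN is SQLite-specific; the last column describes each step.
    return " / ".join(row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(query)
db.execute("CREATE INDEX nameidx ON customer (cust_name)")
db.execute("CREATE UNIQUE INDEX cust_primary_idx ON customer (cust_number)")
plan_after = plan(query)

print("before:", plan_before)   # a full scan of customer
print("after: ", plan_after)    # a search that uses nameidx
```

The exact wording of the plan text varies between SQLite versions, but the change from a table scan to an index search is the behaviour the slide describes.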
Indexing Technology
• Modern Relational Database Management Systems use powerful indexing routines, generally making use of B+Tree technology. The speed and power of indexing systems is a highly important aspect of developing a competitive RDBMS product. The B+Tree index is fast and flexible. It is excellent at finding exact targets such as 'SMITH' but is also good at finding the results of range queries. For example, finding all the customers whose name starts with 'S':
SELECT *
FROM customer
WHERE cust_name LIKE 'S%';
Response Times
• Use of indexes also provides a more consistent response time for queries. The time to find a particular target row is not dependent on its position in the table. The response time is dependent mainly on the depth of the index, and all queries have to navigate the full depth of the index. Thus response time is more even than when, for example, one query into a non-indexed table finds its target in the first of 80,000 rows and the next finds its target in the last of 80,000 rows.
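The two properties described here, cheap range queries and a lookup cost that depends on index depth rather than row position, can be illustrated without building a real B+Tree. In the sketch below a sorted Python list stands in for the ordered leaf level of the index, and `index_depth` estimates the number of levels for an assumed fan-out of 100 keys per node; both are simplifications for illustration only.

```python
from bisect import bisect_left

# A sorted list of keys behaves like the ordered leaf level of a B+Tree index.
names = sorted(["ADAMS", "BAKER", "SINGH", "SMITH", "SMITH", "STONE", "TAYLOR", "YOUNG"])

# Exact match: binary search lands on the first 'SMITH'.
first_smith = bisect_left(names, "SMITH")

# Range query LIKE 'S%': every key from 'S' up to (but excluding) 'T'.
starts_with_s = names[bisect_left(names, "S"):bisect_left(names, "T")]


def index_depth(rows, fanout):
    """Levels needed so that fanout ** depth >= rows (fanout is an assumed value)."""
    depth, reach = 1, fanout
    while reach < rows:
        depth += 1
        reach *= fanout
    return depth


print(names[first_smith], starts_with_s, index_depth(80_000, fanout=100))
```

With these assumptions an 80,000-row table needs only three index levels, so every lookup costs about the same whether the target row is the first or the last in the table.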
Disadvantages of Indexing
• One problem of creating indexes to improve performance of data manipulation queries (SELECT) is that each index is itself often a huge file of information. This file is usually hidden from the users but does consume a lot of space. Further, inserting and deleting rows in a database table results in all the indexes on that table needing to be updated. If every attribute and useful combination of attributes in a table has a separate index then the overhead of insertion and deletion will be very large.
General Tuning Approaches
• A useful approach to database tuning is to implement the following indexes. This approach does require analysis of probable query types. However, this is normally done as part of Systems Analysis and Design techniques during the development of software.
• Indexes on:
– Primary keys - primary indexes.
– Foreign keys (to speed joining tables) - secondary indexes.
– Attributes that are frequently queried and yield a small set of rows - secondary indexes.
– Attributes that are frequently used for displaying data in a sorted output - secondary indexes.
Other Tuning Approaches
• Hashing
• Another method of storing data in a way in which it can be found quickly is hashing. This method works by taking the data, say a customer number, and applying some mathematical formula to the value.
• The outcome of this formula is then used as a physical address in a file for the location where this record will be stored. So the record for the customer with the code 'C99762' is stored in a calculated position; to retrieve the record the system runs the calculation again to find where it originally put the record.
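A minimal sketch of the hashing idea, assuming a file of 8 "buckets" and using a CRC-32 checksum as the formula. Both choices are illustrative only, and the handling of two keys that land in the same bucket (each bucket is a small list) is an addition the slide does not discuss.

```python
import zlib

N_BUCKETS = 8                                  # assumed file size, in buckets
buckets = [[] for _ in range(N_BUCKETS)]       # each bucket holds (key, record) pairs


def address(key):
    # The "mathematical formula": a CRC-32 checksum reduced to a bucket number.
    return zlib.crc32(key.encode("utf-8")) % N_BUCKETS


def store(key, record):
    buckets[address(key)].append((key, record))


def fetch(key):
    # Re-run the same calculation, then look only inside that one bucket.
    for stored_key, record in buckets[address(key)]:
        if stored_key == key:
            return record
    return None


store("C99762", {"cust_name": "SMITH"})
store("C10001", {"cust_name": "JONES"})
print(address("C99762"), fetch("C99762"))
```

Retrieval never scans the whole file: the key alone determines where to look, which is why hashing suits exact-match lookups but not range queries.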
• Clustering
• In this method the Database Administrator uses DBMS facilities to physically sort database tables based on likely access. Thus for an ORDER table, perhaps it is useful to physically group all the rows for each CUSTOMER together. Within that grouping ORDERS might also be sorted into date sequence if this is how they are normally retrieved. This also means that simpler, and thus faster, indexing techniques can be used. Each subset of rows that need to be stored together is called a 'cluster'.
• Clustering can go beyond this approach, depending on the facilities provided by the DBMS. For example, each cluster of ORDERS belongs to a single CUSTOMER but each order is associated with a number of ORDER LINES in the ORDER LINE table.
• A more powerful clustering approach is to store the ORDER rows for each customer together, followed physically by the appropriate ORDER LINES. The ORDER LINES would also be appropriately physically sorted. Thus when SQL queries require the joining of ORDERS and ORDER LINES for a customer, all the data is quickly available.
• Some Database Management Systems allow you to go even further and interleave, for example, individual ORDERS and matching ORDER LINES.
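The basic form of clustering (group each customer's orders together, in date order) can be pictured as a physical sort of the rows. The sketch below uses an in-memory list in place of a stored table, and the customer names, dates and order numbers are invented for the example.

```python
from itertools import groupby

# (customer, order_date, order_no) rows in arrival order.
orders = [
    ("JONES", "2000-10-03", 1),
    ("SMITH", "2000-10-01", 2),
    ("JONES", "2000-09-20", 3),
    ("SMITH", "2000-10-09", 4),
]

# Clustering: physically reorder the rows by customer, then by date.
clustered = sorted(orders, key=lambda row: (row[0], row[1]))

# Each customer's orders now form one contiguous run, i.e. one 'cluster'.
clusters = {cust: [row[2] for row in rows]
            for cust, rows in groupby(clustered, key=lambda row: row[0])}
print(clusters)
```

Once the rows are contiguous, fetching all of one customer's orders is a single sequential read rather than a scattered set of lookups.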
Disadvantages of Physical Clustering
• Any kind of physical clustering is difficult to maintain. As new data is entered it will need physical clustering, and that may involve significant database reorganisation. Often a DBMS will offer facilities to optimise clustering during quiet periods (overnight etc.).
• Clustering is a powerful way of tuning database performance but will only be useful if a particular type of query that matches the clustering is the overwhelmingly dominant query for this data.
• Would this approach work for the databases supporting ATMs?
Client Server Systems
• Databases are by their very nature a shared resource. So far we have only made use of single user systems. However, databases in a commercial environment are nearly always a shared resource with, perhaps, many users adding, editing and deleting data. In this situation client/server systems are very effective.
[Figure: Typical Client / Server Architecture. Clients 1 to 4 (running App 1 to App 4, etc.) are linked by a network connection to a server holding the DBMS and the data.]
Client
• The client is a machine that provides a 'Front End' to the database. The front end is used to provide a suitable user interface for the users. The front end software might be written in Java or Visual Basic (etc.), or it might be an SQL interpreter, a report generator or a full database tool like Microsoft Access.
• Thus the front end may be a specially written application written in some language. It could also be a more general purpose interface, allowing users to access a remote database but to configure software (for example, write queries) on the local machine.
Database Server
• The database server is also connected to the network and is referred to as the 'Back End'. The back end Database Management System is installed on the server. This does not have to be, in the case of an MS Access front end, any particular Database Management System, although Microsoft produce SQL Server for this purpose. Generally any high power DBMS can be configured to operate as the back end. ORACLE is a common choice due to its high performance and depth of technical resource.
• The back end server is generally a dedicated database machine. This implies that it will have far greater data bandwidth than a typical PC. This means that its ability to transfer data to and from its hard disc system is very fast.
• Ultra fast hard disc access is expensive, especially if it is needed on many small desktop machines. By placing the database on a specialist machine the technology used can be far more appropriate to the needs of a busy DBMS.
Database Server cont..
• DBMS servers typically have:
– Unix operating systems (not essential)
– Very fast disc interfaces
– Large main memory
– Large dedicated hardware disc cache memory
– Built in fast backup facilities
– Un-interruptible power supplies (UPS)
– Expert management
• To provide these facilities on individual desktop systems would be far more expensive. The other main advantage of the back end database server approach is that it naturally places all the data in a single location, making data sharing easier.
• Another advantage is that the huge load of handling a large volume of data, and the processing requirements of running a complex database management system, are removed from the local desktop machines. Further, the requirement for relentless backup of data is moved from many machines to one, where automatic, fast and reliable backup technology can be used.
• Losing commercially sensitive data is a potential disaster for most companies - for example, what would happen to a large mail order company that lost details of all current orders, deliveries and outstanding accounts?
The following lists summarise client/server functions.
Database Client - Desktop PC
• Manages user interface
• Accepts (& validates) user data
• Processes application program logic
• Generates database requests (SQL)
• Transmits requests (SQL) to the server
• Receives results from server
• Formats and displays results according to application software (could be tables, reports or graphical output)
• May import data into local system for local processing
Database Server
• Accepts database requests (SQL)
• Processes database requests:
– Performs integrity checks
– Handles concurrent access
– Optimises SQL queries
– Performs security checks (user access)
– Provides database recovery from system failures (crashes)
• Transmits results of database requests to client
• Database physical optimisation
• Provides statistical information on database
• May import data from foreign systems
• Provides facilities for database administrator to optimise and tune database access performance
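The division of work between client and server can be sketched in a few lines. This is an in-process stand-in only: there is no real network here, the `Server` class and `client_lookup` function are hypothetical names, and SQLite plays the part of the back end DBMS.

```python
import sqlite3


class Server:
    """Back end: owns the data, checks access, runs SQL, returns rows."""

    def __init__(self, allowed_users):
        self.allowed_users = set(allowed_users)
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE customer (cust_number INTEGER PRIMARY KEY, cust_name TEXT NOT NULL)"
        )

    def request(self, user, sql, params=()):
        if user not in self.allowed_users:          # security check (user access)
            raise PermissionError(user)
        with self.db:                               # commit, or roll back on error
            return self.db.execute(sql, params).fetchall()


def client_lookup(server, user, raw_name):
    name = raw_name.strip().upper()                 # client validates user data
    if not name:
        raise ValueError("customer name required")
    rows = server.request(                          # client generates the SQL
        user, "SELECT cust_number, cust_name FROM customer WHERE cust_name = ?", (name,)
    )
    return [f"{number}: {cust}" for number, cust in rows]   # client formats results


server = Server(allowed_users=["sales1"])
server.request("sales1", "INSERT INTO customer VALUES (?, ?)", (1, "SMITH"))
print(client_lookup(server, "sales1", " smith "))
```

Validation and formatting happen on the client side, while access control, integrity constraints (`PRIMARY KEY`, `NOT NULL`) and transaction handling stay with the server, mirroring the split listed above.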
Further Client / Server Configurations
• More complex systems are possible where there are several database servers. These may be in different geographic locations, and the connection of the network may include elements of both Local Area and Wide Area networks (internet).
Summary: Distributed Databases
• Usually homogeneous & relational
• Advantages & Disadvantages (many of both!)
• Transparency: Location, Fragmentation, Replication
End of Lecture