SB03G : Introduction to DDB Lecture 1 Content

SB03G : Introduction to DDB
Dr. Farhi Marir
KMG Research Group,
School of Computing,
London Met. University
Lecture 1 Content
• An idea about Centralised, Decentralised and Distributed Databases
• Motivation and Advantages/Disadvantages of DDBMS
• Distributed Database design
• Architecture and Functions of DDBMS
• What is a distributed DBMS
• Problems
• Current state-of-affairs
10/12/2003
Copyright © 2001 Dr. F. Marir
Database Environment
• A major aim of a database system is to provide users with an abstract view of the data.
• Different users may have different views of the data held in the database.
• Abstraction is the starting point for the design of a database for a given organisation.
• To achieve abstraction and support a variety of views, most commercial DBMSs follow a standard architecture.
ANSI-SPARC three-level Architecture
An example of the three levels
Database Schema and Instances
• A database schema is the overall description of the database.
• A database instance is the data in the database at a particular point in time.
• There are three different types of schema in the database, defined according to the three-level architecture:
– the external, conceptual and internal schemas.
• The DBMS is responsible for:
– mapping between these three schemas, and
– checking the schemas for consistency.
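As a minimal sketch of the schema/instance distinction, assuming Python's built-in sqlite3 module and a hypothetical `staff` table (the table and column names are illustrative, not from the lecture), the `CREATE TABLE` statement plays the role of the schema, while the rows stored at a given moment form an instance. Inserting a row changes the instance but leaves the schema untouched.

```python
import sqlite3

# In-memory database; the "staff" table is a hypothetical example.
conn = sqlite3.connect(":memory:")

# Schema: the description of the data (attribute names and types).
conn.execute("CREATE TABLE staff (staff_no TEXT PRIMARY KEY, name TEXT, salary INTEGER)")

def schema_of(conn, table):
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]

def instance_of(conn, table):
    # The instance is the set of rows stored at this moment.
    return conn.execute(f"SELECT * FROM {table} ORDER BY 1").fetchall()

schema_before = schema_of(conn, "staff")

conn.execute("INSERT INTO staff VALUES ('SL21', 'John White', 30000)")
instance_1 = instance_of(conn, "staff")

conn.execute("INSERT INTO staff VALUES ('SG14', 'David Ford', 18000)")
instance_2 = instance_of(conn, "staff")

schema_after = schema_of(conn, "staff")

print(schema_before == schema_after)     # True: the schema is unchanged
print(len(instance_1), len(instance_2))  # 1 2: the instance has changed
```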
Data Independence
• Logical:
– Refers to the immunity of the external schemas to changes in the conceptual schema, e.g. the addition or removal of entities.
• Physical:
– Refers to the immunity of the conceptual schema to changes in the internal schema, e.g. using different file organisations or storage structures/devices.
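Both kinds of independence can be illustrated with a small sketch, assuming sqlite3 and a hypothetical `staff` table. Here a view stands in for an external schema, adding a column stands in for a conceptual-schema change, and adding an index stands in for an internal-schema change. The view's result is the same after each change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: a base table (hypothetical names and data).
conn.execute("CREATE TABLE staff (staff_no TEXT PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [("SL21", "John White", 30000), ("SG14", "David Ford", 18000)],
)

# External level: a user view exposing only part of the data.
conn.execute("CREATE VIEW staff_names AS SELECT staff_no, name FROM staff")

def external_view():
    return conn.execute("SELECT * FROM staff_names ORDER BY staff_no").fetchall()

before = external_view()

# Logical independence: change the conceptual schema by adding an attribute.
conn.execute("ALTER TABLE staff ADD COLUMN branch_no TEXT")
after_logical = external_view()

# Physical independence: change the internal schema by adding an access path.
conn.execute("CREATE INDEX idx_staff_salary ON staff (salary)")
after_physical = external_view()

print(before == after_logical == after_physical)  # True
```

The view names its columns explicitly; a view written with `SELECT *` may pick up the new column, which would weaken the logical independence shown here.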
Data Independence and the ANSI-SPARC Three-level Architecture
Data Models
Data Model
• A high-level collection of concepts that can be used to describe the structure of the database, or schema,
– i.e. describing data, relationships between data and constraints on the data in an organisation.
• A tool for providing the levels of abstraction of the database.
• Represents the data in an understandable way and promotes collaborative design of the database.
Categories of Data Models
• Object-based
• Record-based
• Physical
Object-based Data Models
• Entity-Relationship (E-R)
– Uses concepts such as entity (a distinct object), attribute and relationship.
– It is a main technique for conceptual database design.
– Used as the design methodology for this module.
• Object-Oriented
– Extends the E-R model to include the actions or operations of an object, i.e. the behaviour of the object.
• Other object-based models, e.g. semantic and functional models.
Record-based Data Models
• Physical data models and logical data models
– Hierarchical data model
– Network data model
– Relational data model
The Physical Data Model
• Describes how data is stored in the computer
– representing information such as record structures, record orderings and access paths;
– the most common ones are the unifying model and the frame memory model.
The Hierarchical Model
[Figure: hierarchical model of the sample data; a root record with branch records (B2, B3, B4, B5, B7) as children, each branch owning its staff records, e.g. SL21 John White, Manager, 30000.]
The Network Data Model
[Figure: network model of the same sample data; branch records (B2, B3, B4, B5, B7) and staff records (e.g. SL21 John White, Manager, 30000) linked as a network of records.]
Limitations of the Hierarchical and Network Models
• They require the user to have knowledge of the physical database being accessed.
• They adopt a navigational approach, i.e. users write programs that specify how the data is to be retrieved.
• However, they are better than the file-based approach when it comes to integrating organisational information.
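The contrast between navigational and declarative access can be sketched as follows. The record classes are hypothetical in-memory stand-ins for a hierarchical database (this is not the API of any real hierarchical DBMS), and the assignment of staff to branches is illustrative. The first function spells out how to walk the parent-child links; the second states what is wanted in SQL (via sqlite3) and leaves the access path to the DBMS.

```python
import sqlite3
from dataclasses import dataclass, field

# Hypothetical record types linked by parent-child pointers.
@dataclass
class StaffRecord:
    staff_no: str
    position: str
    salary: int

@dataclass
class BranchRecord:
    branch_no: str
    city: str
    children: list = field(default_factory=list)

# Illustrative data: which staff belong to which branch is assumed.
root = [
    BranchRecord("B3", "Glasgow", [StaffRecord("SG14", "Deputy", 18000),
                                   StaffRecord("SG5", "Manager", 24000)]),
    BranchRecord("B5", "London", [StaffRecord("SL21", "Manager", 30000)]),
]

def managers_navigational(root):
    # The program says HOW: visit each branch, then follow its child links.
    result = []
    for branch in root:
        for staff in branch.children:
            if staff.position == "Manager":
                result.append((branch.city, staff.staff_no))
    return sorted(result)

def managers_declarative(root):
    # The same data as relations; the query says only WHAT is wanted.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE branch (branch_no TEXT, city TEXT)")
    conn.execute("CREATE TABLE staff (staff_no TEXT, position TEXT, salary INTEGER, branch_no TEXT)")
    for b in root:
        conn.execute("INSERT INTO branch VALUES (?, ?)", (b.branch_no, b.city))
        for s in b.children:
            conn.execute("INSERT INTO staff VALUES (?, ?, ?, ?)",
                         (s.staff_no, s.position, s.salary, b.branch_no))
    rows = conn.execute(
        "SELECT b.city, s.staff_no FROM staff s JOIN branch b "
        "ON s.branch_no = b.branch_no WHERE s.position = 'Manager' "
        "ORDER BY b.city, s.staff_no"
    ).fetchall()
    conn.close()
    return rows

print(managers_navigational(root) == managers_declarative(root))  # True
```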
The Relational Data Model
• The relational data model was first proposed by E.F. Codd (1970).
• It is based on the mathematical concept of a relation.
• It allows a higher degree of data independence, i.e. application programs are not affected by changes to the internal data representation.
• It provides techniques for dealing with semantics, consistency and redundancy problems.
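To make "based on the mathematical concept of a relation" concrete, here is a minimal sketch that models a relation as a heading plus a set of tuples, with two relational-algebra operators (restriction and projection) written as plain set operations. The operator names come from relational algebra rather than from this slide, and the `staff` data is illustrative.

```python
from typing import FrozenSet, Tuple

# A relation: a heading (attribute names) and a body (a set of tuples).
# Because the body is a set, duplicate tuples cannot exist.
Relation = Tuple[Tuple[str, ...], FrozenSet[tuple]]

def make_relation(heading, rows) -> Relation:
    return (tuple(heading), frozenset(tuple(r) for r in rows))

def select(rel: Relation, predicate) -> Relation:
    # Restriction (sigma): keep the tuples that satisfy the predicate.
    heading, body = rel
    return (heading, frozenset(t for t in body if predicate(dict(zip(heading, t)))))

def project(rel: Relation, attrs) -> Relation:
    # Projection (pi): keep the named attributes; the set drops duplicates.
    heading, body = rel
    idx = [heading.index(a) for a in attrs]
    return (tuple(attrs), frozenset(tuple(t[i] for i in idx) for t in body))

staff = make_relation(
    ["staff_no", "position", "salary"],
    [("SL21", "Manager", 30000), ("SG14", "Deputy", 18000),
     ("SG5", "Manager", 24000), ("SA9", "Assistant", 9000)],
)

managers = project(select(staff, lambda t: t["position"] == "Manager"), ["staff_no"])
positions = project(staff, ["position"])

print(sorted(managers[1]))  # [('SG5',), ('SL21',)]
print(len(positions[1]))    # 3 distinct positions
```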
Distributed Database Systems
Centralised Database
• The database and the DBMS reside at a single computer or site.
• Users may be able to access the centralised database system remotely via terminals connected to the site;
• however, all data access and processing take place at the central site.
An example of a centralised database
What is a Distributed Database System?
• A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.
• A distributed database management system (D–DBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.
• Distributed database system (DDBS) = DDB + D–DBMS
More on Distributed Database...
• Stored/spread physically across computers or sites in different locations
– connected together by some form of data communication network.
• The sites of a distributed database may be spread over
– a large area connected via a wide area network (WAN), or
– a small area connected via a local area network (LAN).
• The fragments of the distributed database could be
– on different platforms,
– with different operating systems, and
– managed by different database management systems.
Distributed Database Management System
• Users access the distributed database via applications.
• Database applications running at any of the system's sites should be able to operate on any of the database fragments transparently,
– i.e. as if the data came from a single database managed by one DBMS.
• The software that manages a distributed database is called a Distributed Database Management System (DDBMS).
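Distribution transparency can be sketched with a toy facade. Everything here is hypothetical: `ToyDDBMS` and `make_site` are illustrative names, each "site" is simply a separate in-memory SQLite database holding one fragment of a global STAFF relation, and the "catalogue" is a dictionary. The application asks one logical relation for data and never names the site that stores a row.

```python
import sqlite3

def make_site(rows):
    # Hypothetical site: an independent database holding one STAFF fragment.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (staff_no TEXT, name TEXT, branch_no TEXT)")
    conn.executemany("INSERT INTO staff VALUES (?, ?, ?)", rows)
    return conn

class ToyDDBMS:
    """Hypothetical facade: one logical STAFF relation over several sites."""

    def __init__(self, sites):
        self.sites = sites  # toy catalogue: site name -> connection

    def query_staff(self, where="1=1", params=()):
        # Send the same local query to every site and merge the partial
        # results, so the caller sees what looks like a single database.
        rows = []
        for conn in self.sites.values():
            rows.extend(conn.execute(f"SELECT * FROM staff WHERE {where}", params).fetchall())
        return sorted(rows)

ddbms = ToyDDBMS({
    "glasgow": make_site([("SG14", "David Ford", "B3"), ("SG5", "Susan Brand", "B3")]),
    "london": make_site([("SL21", "John White", "B5")]),
})

print(ddbms.query_staff("name LIKE ?", ("%White%",)))  # [('SL21', 'John White', 'B5')]
```

A real DDBMS would consult its catalogue to send each query only to the sites that can hold matching rows. Broadcasting to every site keeps the sketch short at the cost of extra messages.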
Decentralised and Distributed Databases
Motivation for DDBMS
• Many organisations are naturally distributed over different locations.
– For example, a company may have offices in different cities, or a bank may have multiple branches.
• It is natural for the databases used in those organisations to be distributed over these locations.
• A vital requirement of such a distribution is that users can access data both locally and globally, from any other location.
Motivation...
• The necessary technology is already there.
[Figure: database technology (integration) combined with computer networks (distribution) gives distributed database systems; the aim is integration, not centralisation.]
Advantages
• The DDBMS mirrors the structure of an enterprise.
• Local autonomy (control).
– Local data are locally owned and managed, with local accountability. Security, integrity, storage representation and hardware are controlled locally. At the same time, users can access remote data when necessary.
• No reliance on a central site.
– Avoids bottlenecks and the vulnerability of depending on a single site.
• Reliability and availability.
– The system can continue to operate if one or more sites go down or communication links fail.
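One way a system keeps answering reads when a site fails is to hold copies of the data at more than one site. The sketch below assumes such replication. `Site`, `SiteDown` and `read_replicated` are hypothetical names, and a site failure is simulated with a boolean flag rather than a real network error.

```python
from dataclasses import dataclass, field

class SiteDown(Exception):
    """Raised when a (simulated) site or its link is unavailable."""

@dataclass
class Site:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def read(self, key):
        if not self.up:
            raise SiteDown(self.name)
        return self.data[key]

def read_replicated(replicas, key):
    # Try each site that holds a copy; the read succeeds as long as
    # at least one of them is reachable.
    for site in replicas:
        try:
            return site.read(key), site.name
        except SiteDown:
            continue
    raise SiteDown("all replicas unavailable")

copy = {"B3": "163 Main St, Glasgow"}
replicas = [Site("glasgow", data=dict(copy)), Site("london", data=dict(copy))]

replicas[0].up = False                  # simulate a failed site
print(read_replicated(replicas, "B3"))  # ('163 Main St, Glasgow', 'london')
```

Reads are the easy case. Keeping the copies consistent when updates arrive while a site is down is much harder, and ties back to the data integrity disadvantage.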
Advantages...
• Speed-up of query processing
– Queries about locally stored data are answered faster. Moreover, queries can be split to execute in parallel at different sites, or redirected to less busy sites.
• Modular growth.
– It is much easier to add another site than to expand a centralised system.
• Economics
– It can cost much less to create and maintain a system of smaller computers than a single large computer of equivalent power.
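Splitting a query to run in parallel at different sites can be sketched as follows, assuming a STAFF relation fragmented across three hypothetical sites (the fragment contents and site names are illustrative). Threads from `concurrent.futures` stand in for remote sites. Each site computes a small partial result over its own data, and only those partial results are combined.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fragments of STAFF held at three sites: (staff_no, salary).
fragments = {
    "glasgow": [("SG14", 18000), ("SG5", 24000), ("SG37", 12000)],
    "london": [("SL21", 30000), ("SL41", 9000)],
    "aberdeen": [("SA9", 9000)],
}

def local_subquery(rows):
    # Runs "at the site": a partial result over local data only.
    return len(rows), sum(salary for _, salary in rows)

def total_salary_bill(fragments):
    # Split the global query into one subquery per site, run them
    # concurrently, then combine the partial results.
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partials = list(pool.map(local_subquery, fragments.values()))
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return count, total, total / count

print(total_salary_bill(fragments))  # (6, 102000, 17000.0)
```

Shipping a count and a sum per site instead of every row is what keeps the combined query cheap. The average is computed from those two figures, not by averaging the per-site averages.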
Disadvantages...
• Software complexity and high costs.
– A DDBMS that hides the distributed nature from the user and provides an acceptable level of performance, reliability and availability is inherently more complex than a centralised DBMS.
– Therefore, a DDBMS is more expensive to buy and maintain.
• Additional manpower costs to manage and maintain the local DBMSs and the underlying network.
• Processing overheads
– Increased costs of query processing, catalogue management and consistency maintenance.
Disadvantages...
• Data integrity.
– It is harder to enforce data integrity when data is updated at different sites simultaneously.
• Database design is more complex.
– The fragmentation and replication of data, and the allocation of fragments to sites, have to be taken into account.
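The two basic kinds of fragmentation can be sketched on a small, illustrative STAFF relation. The fragment names (`S_B3`, `S_B5`, `S_names`, `S_jobs`) are hypothetical. Horizontal fragments are selections of whole tuples and are rebuilt by union. Vertical fragments are projections that each keep the key and are rebuilt by a join on that key. The check at the end tests the reconstruction property a correct fragmentation must satisfy.

```python
# Illustrative global STAFF relation: (staff_no, name, position, branch_no).
staff = [
    ("SL21", "John White", "Manager", "B5"),
    ("SG14", "David Ford", "Deputy", "B3"),
    ("SG5", "Susan Brand", "Manager", "B3"),
]

def horizontal_fragments(rows, predicates):
    # Horizontal fragmentation: each fragment selects whole tuples.
    return {name: [r for r in rows if p(r)] for name, p in predicates.items()}

def vertical_fragments(rows, column_groups):
    # Vertical fragmentation: each fragment projects some columns and
    # keeps the key (column 0) so that the relation can be rebuilt.
    return {name: [(r[0],) + tuple(r[i] for i in cols) for r in rows]
            for name, cols in column_groups.items()}

h = horizontal_fragments(staff, {
    "S_B3": lambda r: r[3] == "B3",
    "S_B5": lambda r: r[3] == "B5",
})
v = vertical_fragments(staff, {"S_names": [1], "S_jobs": [2, 3]})

# Reconstruction: union of the horizontal fragments...
rebuilt_h = sorted(h["S_B3"] + h["S_B5"])
# ...and a join of the vertical fragments on the key.
jobs = {row[0]: row[1:] for row in v["S_jobs"]}
rebuilt_v = sorted((k, name) + jobs[k] for k, name in v["S_names"])

print(rebuilt_h == sorted(staff), rebuilt_v == sorted(staff))  # True True
```

The horizontal predicates only work here because every tuple belongs to B3 or B5. A real design must make sure that the fragments cover every tuple (completeness), in addition to deciding where each fragment, or each replica of it, is allocated.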
Distributed DBMS Issues
• Distributed Database Design
– how to distribute the database
– replicated and non-replicated database distribution
– a related problem is directory management
• Query Processing
– converting user transactions into data manipulation instructions
– an optimisation problem
– min{cost = data transmission + local processing}
– the general formulation is NP-hard
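The cost objective min{cost = data transmission + local processing} can be made concrete with a toy comparison of join strategies. All sizes and unit costs are invented for illustration: STAFF is assumed to be 10,000 tuples of 100 bytes at one site and BRANCH 100 tuples of 50 bytes at another, and local processing is crudely counted as the number of input tuples. Exhaustively comparing three plans is feasible here only because there are so few of them.

```python
from dataclasses import dataclass

@dataclass
class Strategy:
    name: str
    bytes_shipped: int     # data transmission volume
    tuples_processed: int  # local processing work

# Hypothetical unit costs; a real optimiser would estimate them.
COST_PER_BYTE = 0.001
COST_PER_TUPLE = 0.01

def cost(s: Strategy) -> float:
    # cost = data transmission + local processing
    return COST_PER_BYTE * s.bytes_shipped + COST_PER_TUPLE * s.tuples_processed

def cheapest(strategies):
    # Exhaustive search over a handful of plans; the general problem is
    # NP-hard, so real optimisers rely on heuristics.
    return min(strategies, key=cost)

STAFF_BYTES = 10_000 * 100   # assumed size of STAFF
BRANCH_BYTES = 100 * 50      # assumed size of BRANCH
JOIN_INPUT = 10_000 + 100    # tuples read by the join

plans = [
    Strategy("ship STAFF to BRANCH site", STAFF_BYTES, JOIN_INPUT),
    Strategy("ship BRANCH to STAFF site", BRANCH_BYTES, JOIN_INPUT),
    Strategy("ship both to the query site", STAFF_BYTES + BRANCH_BYTES, JOIN_INPUT),
]

best = cheapest(plans)
print(best.name, round(cost(best), 2))
```

With these numbers the local processing term is the same for every plan, so the transmission term decides: shipping the small BRANCH relation to the site holding STAFF wins.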
Distributed DBMS Issues…
• Concurrency Control
– synchronisation of concurrent accesses
– consistency and isolation of transactions' effects
– deadlock management
• Reliability
– how to make the system resilient to failures
– atomicity and durability
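For deadlock management, one common technique (not named on the slide) is to detect cycles in a wait-for graph, where an edge (a, b) means transaction a waits for a lock held by b. The sketch below assumes that each site reports its local wait-for edges to a single detector. It shows why a distributed deadlock can be invisible at every individual site: neither local graph has a cycle, yet their union does. `find_cycle` is a hypothetical helper based on depth-first search.

```python
from collections import defaultdict

def find_cycle(edges):
    """Return one cycle of transaction ids in a wait-for graph, or None.
    An edge (a, b) means transaction a waits for a lock held by b."""
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = defaultdict(int)
    path = []

    def dfs(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if colour[nxt] == GREY:  # back edge: nxt is on the path, so a cycle
                return path[path.index(nxt):]
            if colour[nxt] == WHITE:
                found = dfs(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):
        if colour[node] == WHITE:
            found = dfs(node)
            if found:
                return found
    return None

site_1 = [("T1", "T2")]  # at site 1, T1 waits for T2
site_2 = [("T2", "T1")]  # at site 2, T2 waits for T1

print(find_cycle(site_1), find_cycle(site_2))  # None None: no local deadlock
print(find_cycle(site_1 + site_2))             # ['T1', 'T2']: global deadlock
```

Once a cycle is found, the usual resolution is to abort one transaction in it. Collecting the local graphs at one site is the simplest (centralised) scheme. It also reintroduces some dependence on a central site.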
Relationship Between Issues…
[Figure: relationships between directory management, query processing, distributed design, reliability, concurrency control and deadlock management.]
Related Issues
• Operating System Support
– operating systems with proper support for database operations
– the dichotomy between general-purpose processing requirements and database processing requirements
• Open Systems and Interoperability
– distributed multi-database systems
– a more probable scenario
– parallel issues