Pclec01

Pearcey Centre Course CO24
Database Design using SQL
My name is: Rod Simpson
My current office is C 4.46
My phone number is (03) 990 32352
My email is rod.simpson@csse.monash.edu.au
Pclec 01 / 1
Introduction to Database Technology,
and Database Design using SQL
Objectives: to introduce you to
– Database Technology
– RDB Management Systems
– The Relational Database Model
– Relational Database Design concepts
– Structured Query Language
– Data Warehousing
– Some insight into the components and structure of the Oracle DBMS (version 8i)
Database and Associated Topics
The objective of this lecture is to introduce you to a cross-section
of the material which will be covered over the next 9 lectures.
You will look at
- the scope of databases,
- why this form of data management is so deeply entrenched
in the Information Technology world,
- the different ‘sizes’ of database, and the reasons for them,
- the aspects of security, recovery, accuracy and integrity,
- and some of the advantages and disadvantages of database
technology.
SQL Development
Selected SQL commands (at user level) will be introduced, with
examples included in the lecture material,
AND
there will be exercises on SQL and its functions in each
laboratory session.
The laboratory sessions will also include some discussion and
review material.
Database Theory
Why database?
Data is a valuable corporate resource which needs
accuracy,
consistency,
and security controls.
The ‘centralised’ control of data means that, for many new
applications, the data will already exist, which facilitates quicker
development.
Data will no longer be related by application programs, but
by the structure defined in the database.
This also means
Easier, Faster and Less
Costly User System Maintenance
Traditional File Systems
Consider some of the problems of traditional file systems.
In the past, as new applications were written, they either used
existing files or created a new file or files for their own use.
Frequently, several existing files needed to be sorted and merged to
obtain the new file. Thus it is probable that several files contained
the same information stored in different ways. In other words, there
would have been redundant and possibly inconsistent data.
Consider the files for an insurance company:
PREMIUMS file: POLICYNUMBER, POLICYHOLDER, ADDRESS, PREMIUM-PA, PREMIUM-TOTAL
AGENCY file:   POLICYNUMBER, POLICYHOLDER, ADDRESS, AGENT-CODE, RENEWAL-DATE, RENEWAL-AMT
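The redundancy can be reproduced in a few lines. This is a minimal sketch using Python's standard sqlite3 module. It assumes each file becomes a table with the same fields (lower-cased, with hyphens replaced by underscores). The policy, holder and addresses are hypothetical sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sketch: the two insurance "files" as two separate tables,
# each keeping its own copy of POLICYHOLDER and ADDRESS.
conn.executescript("""
CREATE TABLE premiums (policynumber TEXT, policyholder TEXT, address TEXT,
                       premium_pa REAL, premium_total REAL);
CREATE TABLE agency   (policynumber TEXT, policyholder TEXT, address TEXT,
                       agent_code TEXT, renewal_date TEXT, renewal_amt REAL);
INSERT INTO premiums VALUES ('P100', 'J Smith', '1 High St', 500, 1500);
INSERT INTO agency   VALUES ('P100', 'J Smith', '1 High St', 'A7', '2001-07-01', 520);
""")

# The policyholder moves, but only the PREMIUMS application is updated.
conn.execute("UPDATE premiums SET address = '9 Low Rd' WHERE policynumber = 'P100'")

# The two copies now disagree: redundant data has become inconsistent data.
addresses = conn.execute("""
    SELECT p.address, a.address
    FROM premiums p JOIN agency a ON p.policynumber = a.policynumber
""").fetchone()
print(addresses)  # ('9 Low Rd', '1 High St')
```

Holding POLICYHOLDER and ADDRESS once, and letting both applications reach them through POLICYNUMBER, would remove the second copy that drifted.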
Information / Data
A General Definition:
Data:
raw (unprocessed or part-processed) facts
which represent the state of entities (things)
which have occurred.
Information: data which has been processed into
a form useful to the user.
What is information to one user may be
data to another user.
Basic Definitions
Database: A collection of related data
Data:
Known facts that can be captured and recorded
Schema: A description of the structure of the data stored
in the database, which models some part of the real world
(the ‘mini-world’).
Database Management System (DBMS):
A software package to facilitate the creation and
maintenance of a computerised database.
What is a Database ?
A DATABASE is a shared collection of
inter-related data which is designed to meet the
needs of multiple types of users and applications.
Hence the concept of USER VIEWS:
• Data stored is INDEPENDENT of the programs which
use it
• Data is structured to provide a foundation for future
applications
• Data may be physically distributed
Data Base Management System
The Primary Objectives of a DBMS are to provide
facilities for:
1. Definition of Database Logical Structures
2. Definition of Physical Structures
3. Access to the Database
4. Definition of Storage Structures to store user data
These components are known as the ‘database architecture’
DataBase Management System
• Software - provides access to a database
in an integrated and controlled manner.
• Must contain
(1) Definition/structure capabilities
(2) Data manipulation capabilities
DBMS Components
1. Data Description Language (DDL)
- used to describe data at the database level
2 levels: (1) Schema - complete description of a database
(2) Sub-schema - user view
2. Data Manipulation Language (DML)
Provides for Insert, Delete, Retrieve, Report, Update, Modify,
Calculate (derive)
(In SQL, Create and Drop statements belong to the DDL.)
---> Common term: ‘QUERY’
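A rough illustration of the DDL/DML split, sketched with Python's sqlite3 module rather than Oracle. This is an assumption made so the example runs anywhere; Oracle syntax differs in places. The EMP rows come from the transaction example later in this lecture. The view name dept_d1 is hypothetical and plays the part of a sub-schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure (schema) of the data.
conn.execute("CREATE TABLE emp (empno TEXT PRIMARY KEY, dept TEXT, salary INTEGER)")
# DDL: a view acts as a sub-schema, i.e. one user's view of the data.
conn.execute("CREATE VIEW dept_d1 AS SELECT empno, salary FROM emp WHERE dept = 'D1'")

# DML: insert, update, delete and retrieve data.
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [("E1", "D1", 50000), ("E2", "D1", 18000), ("E3", "D2", 30000)])
conn.execute("UPDATE emp SET salary = salary + 1000 WHERE empno = 'E2'")
conn.execute("DELETE FROM emp WHERE empno = 'E3'")

# DML query returning a calculated (derived) value through the sub-schema.
total = conn.execute("SELECT SUM(salary) FROM dept_d1").fetchone()[0]
print(total)  # 69000
```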
Three Schema Architecture
ANSI & ISO suggest that a DBMS should have three
schemas:
Conceptual Schema - the global logical model of the data
and processing of the enterprise, i.e. the community user view.
External Schema(s) - the logical application views of the
Conceptual Schema, i.e. individual user views.
Internal Schema - the internal level storage view.
Data Base Architecture
3 Schema Architecture
1. User Views - External Schema
2. Complete Database - Conceptual Schema
3. Physical Database - Internal Schema
[Figure: Three Schema Architecture. External Schema 1, External Schema 2, ..., External Schema n are each mapped onto the single Conceptual Schema, which is in turn mapped onto the Internal Schema.]
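One way to see the three levels in SQL is sketched below with Python's sqlite3 module:

- a base table plays the conceptual schema;
- two views play external schemas;
- an index is an internal-level (storage) decision.

The table, view and index names are hypothetical. Adding the index changes how the data is stored and accessed, but not what the views return. That is the data independence the architecture aims for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: the community view of the data (hypothetical columns).
conn.execute("""CREATE TABLE policy (
    policynumber TEXT PRIMARY KEY, policyholder TEXT,
    address TEXT, premium_pa REAL, agent_code TEXT)""")
conn.execute("INSERT INTO policy VALUES ('P100', 'J Smith', '1 High St', 500, 'A7')")

# External schemas: each user group sees only what it needs.
conn.execute("CREATE VIEW premiums_view AS SELECT policynumber, premium_pa FROM policy")
conn.execute("CREATE VIEW agency_view AS SELECT policynumber, agent_code FROM policy")

before = conn.execute("SELECT * FROM agency_view").fetchall()

# Internal-level change: add an access path without touching any view.
conn.execute("CREATE INDEX policy_agent_idx ON policy (agent_code)")

after = conn.execute("SELECT * FROM agency_view").fetchall()
print(before == after, after)  # True [('P100', 'A7')]
```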
Application Development
Applications and their data needs are not considered in
isolation.
Centralised control of one or several databases takes
place, i.e. database administration.
Data administration is seen as an important part of
system development.
[Figure: the CLAIMS and PREMIUMS applications both access their data through the DBMS]
Data Integrity
Validation or integrity rules may be defined and
automatically invoked at run time by the DBMS regardless
of the source of the update, i.e. an application program, a 4GL
screen or the query language.
There is significant variation among DBMSs in the level of
support for semantic data integrity.
ISO suggests that all (100%) enterprise rules should be held
in the conceptual schema, and specifically none in
application programs.
This was an area of significant development during the 1990s.
[Figure: Data Integrity. Application Programs, 4GL Screens & Stored Procedures, and the Interactive Query Language all pass through the DBMS, which applies the Data Definitions & Integrity Rules held in the CATALOGUE to the STORED DATA.]
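A minimal sqlite3 sketch of this idea, assuming a CHECK constraint is an acceptable stand-in for an enterprise rule. The rule is declared once, with the table definition. The DBMS therefore rejects a bad salary whether it arrives through an "application program" or through an ad hoc query. The two helper functions are hypothetical stand-ins for those update sources.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The integrity rule lives in the table definition (the catalogue),
# not in any application program.
conn.execute("""CREATE TABLE emp (
    empno  TEXT PRIMARY KEY,
    salary INTEGER NOT NULL CHECK (salary > 0))""")

def application_program_insert(empno, salary):
    # Hypothetical "application program" update path.
    conn.execute("INSERT INTO emp VALUES (?, ?)", (empno, salary))

def ad_hoc_query(sql):
    # Hypothetical "interactive query language" update path.
    conn.execute(sql)

results = []
for source, action in [
    ("application program", lambda: application_program_insert("E1", -5)),
    ("query language", lambda: ad_hoc_query("INSERT INTO emp VALUES ('E2', 0)")),
    ("application program", lambda: application_program_insert("E3", 30000)),
]:
    try:
        action()
        results.append((source, "accepted"))
    except sqlite3.IntegrityError:  # raised by the DBMS, not by our code
        results.append((source, "rejected"))

print(results)
```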
Inter-Related Data
[Figure: CLAIMS, RENEWALS and AGENCY data held together under one DBMS, with a single QUERY drawing on both CLAIMS and AGENCY]
Data related by structure
Flexible enquiry easier
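A minimal sqlite3 sketch of a "flexible enquiry". Once CLAIMS and AGENCY data sit in one database and share POLICYNUMBER, a question that no single application was written for becomes one query. The table layouts and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agency (policynumber TEXT PRIMARY KEY, agent_code TEXT);
CREATE TABLE claims (claimno TEXT PRIMARY KEY, policynumber TEXT, amount REAL);
INSERT INTO agency VALUES ('P100', 'A7'), ('P200', 'A9');
INSERT INTO claims VALUES ('C1', 'P100', 1200), ('C2', 'P100', 300), ('C3', 'P200', 800);
""")

# Ad hoc enquiry: total claims per agent, linking the two sets of data
# through the structure they share (the policy number).
rows = conn.execute("""
    SELECT a.agent_code, SUM(c.amount)
    FROM claims c JOIN agency a ON c.policynumber = a.policynumber
    GROUP BY a.agent_code
    ORDER BY a.agent_code
""").fetchall()
print(rows)  # [('A7', 1500.0), ('A9', 800.0)]
```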
Multiple Applications
[Figure: several applications, each with its own LOCAL VIEW, sharing one DATABASE containing AGENCY, CLAIMS and RENEWALS data]
Important Database Functions (1)
Data Integrity
Data Independence
Referential Integrity
Concurrency Control
Database Consistency
• Multi Users
• Distributed Database
• Replicated Database
• Partitioned Database
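Referential integrity can be sketched with sqlite3, used here as a stand-in for the course's Oracle DBMS. In this sketch, every EMP row must refer to an existing DEPT; the table design is an assumption for illustration. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE dept (dept TEXT PRIMARY KEY);
CREATE TABLE emp  (empno TEXT PRIMARY KEY,
                   dept  TEXT NOT NULL REFERENCES dept (dept));
INSERT INTO dept VALUES ('D1'), ('D2');
INSERT INTO emp  VALUES ('E1', 'D1');
""")

def try_sql(sql):
    """Run one statement and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    try_sql("INSERT INTO emp VALUES ('E9', 'D7')"),  # no department D7
    try_sql("DELETE FROM dept WHERE dept = 'D1'"),   # E1 still refers to D1
    try_sql("DELETE FROM dept WHERE dept = 'D2'"),   # nothing refers to D2
]
print(results)  # ['rejected', 'rejected', 'ok']
```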
Important Database Functions (2)
Recovery from Failure
• Transaction
• Media
Determinacy
• Consistent results
• Respond to ALL events
• and cater for events arriving in an unpredictable order
Scalability
Database Environment
Databases can be:
• Transaction Intensive Databases
• Decision Support Databases
• Mixed Load Databases
• Small Databases
• VLDB - Very Large Databases
• Non-traditional Databases (e.g. weather forecasting)
The Many Faces of Database
They can be:
Data Warehouses
Data Marts (and Data Martlets)
How is a database size measured?
There are a number of ‘measurements’:
Raw data size
Total database size
Total disk space size (which includes media
protection such as mirroring)
Hardware       Database   Raw Data   Total Disk
HP9000         Oracle     100GB      643GB
Digital 8400   Oracle     100GB      361GB
IBM SP2        DB2/6000   100GB      377GB
NCR5100        Teradata   100GB      880GB
NCR5100        Teradata   1,000GB    3,280GB
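The arithmetic behind the table is simple: dividing total disk by raw data gives the disk provisioned per gigabyte of raw data. This short Python sketch only recomputes the table's own figures. The table does not say which overheads, beyond media protection such as mirroring, make up the difference.

```python
# Figures copied from the table above (sizes in GB).
systems = [
    ("HP9000",       "Oracle",   100,  643),
    ("Digital 8400", "Oracle",   100,  361),
    ("IBM SP2",      "DB2/6000", 100,  377),
    ("NCR5100",      "Teradata", 100,  880),
    ("NCR5100",      "Teradata", 1000, 3280),
]

def disk_per_raw_gb(raw_gb, total_disk_gb):
    """Gigabytes of total disk provisioned for each gigabyte of raw data."""
    return total_disk_gb / raw_gb

ratios = [round(disk_per_raw_gb(raw, disk), 2) for _, _, raw, disk in systems]
for (hw, db, raw, _), ratio in zip(systems, ratios):
    print(f"{hw:13} {db:9} {raw:>5} GB raw -> {ratio} x disk")
```

On these figures, total disk ranged from about 3.3 to 8.8 times the raw data size.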
The first databases were stored on large centralised
mainframe computers.
They were accessed from terminals which had no
processing capability.
As distributed computing and microcomputers became
available during the early 1980s, two new kinds of database
emerged:
personal databases
client/server databases
Personal databases (e.g. Microsoft Access and FoxPro) are
aimed at single-user database applications which are
stored on the user's own desktop computer, a client
workstation.
When a personal DBMS is used for a multiuser
application, the database application files are stored on a file
server and transmitted to the individual users across a
network.
A server is any computer able to accept requests
from other computers and to share some or all of its
resources, such as printers, files and programs.
A network is an infrastructure of telecommunications
hardware and software which enables computers to transmit
messages to each other.
With a personal DBMS, each client workstation must load
the entire application into memory, along with the client
database application, in order to view, insert, update or print data.
A client request for a small amount of data from a large
database might require the server to transmit the entire
database to the client's workstation.
Newer personal databases use indexed files which enable
the server to send only part of the database. In either case
there is a heavy demand on client workstations and on the
network.
Client/server databases split the work between a DBMS
‘process’ running on the server and the applications
running on the client.
The client application sends data requests across the
network.
When the server receives a request, the server DBMS
process retrieves the data from the database, performs the
requested functions, and sends only the final query results
back to the client via the network.
This generates less network traffic than a personal database.
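The difference in traffic can be modelled very roughly with Python's sqlite3 module. In this hypothetical model, one in-memory database stands in for the server's data, and "rows shipped" stands in for network traffic. The function names are illustrative. The model ignores the indexed-file refinement used by newer personal databases.

```python
import sqlite3

# Hypothetical model: an in-memory database stands in for the server's data.
server = sqlite3.connect(":memory:")
server.execute("CREATE TABLE policy (policynumber INTEGER PRIMARY KEY, agent_code TEXT)")
server.executemany("INSERT INTO policy VALUES (?, ?)",
                   [(n, f"A{n % 50}") for n in range(10_000)])

def file_server_request(agent):
    """Personal-DBMS style: ship every row, then filter on the client."""
    shipped = server.execute(
        "SELECT policynumber, agent_code FROM policy ORDER BY policynumber").fetchall()
    result = [row for row in shipped if row[1] == agent]
    return result, len(shipped)

def client_server_request(agent):
    """Client/server style: the server filters and ships only the result."""
    result = server.execute(
        "SELECT policynumber, agent_code FROM policy WHERE agent_code = ? "
        "ORDER BY policynumber", (agent,)).fetchall()
    return result, len(result)

rows_a, shipped_a = file_server_request("A7")
rows_b, shipped_b = client_server_request("A7")
print(rows_a == rows_b, shipped_a, shipped_b)  # True 10000 200
```

Both approaches produce the same answer; only the amount of data crossing the (modelled) network differs.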
Another important difference between client/server and
personal databases is the handling of client failures.
In a personal database system, when a client workstation
fails, the database is likely to be damaged by interrupted
updates, deletions and insertions.
Records in use at the time of failure remain locked, and are
unavailable to other users. The database may be repairable,
but all users must log off during the repair process.
Often the processes active at the time of failure cannot be
reconstructed. The database must then be restored from the last
regular backup, and transactions made since that backup are not
(normally) recovered automatically.
A client/server database is not affected when a client
workstation fails. The failed client’s in-process transactions
are lost, but the failure of a single client should not affect
other users.
In the case of a server failure, a central synchronised
transaction log, which contains a record of all current
database changes, enables in-progress transactions from all
clients to be either fully completed or rolled back.
Rolling back has the effect of the database never having
processed the transactions. Client transactions can then be
resubmitted. Most client/server database servers have
additional features to minimise the risk of failure, and have
fast recovery mechanisms. Rolling back is a bit like the ‘undo’
you have met in some of Microsoft's Office software
(there is a small exercise with commit and rollback in a few
weeks' time).
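Here is a preview of commit and rollback, sketched with Python's sqlite3 module. Using sqlite3 is an assumption; the laboratory exercise itself uses Oracle. The EMP table and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno TEXT PRIMARY KEY, salary INTEGER)")
conn.execute("INSERT INTO emp VALUES ('E1', 50000)")
conn.commit()                       # make the insert permanent

conn.execute("UPDATE emp SET salary = 0 WHERE empno = 'E1'")
during = conn.execute("SELECT salary FROM emp WHERE empno = 'E1'").fetchone()[0]
conn.rollback()                     # undo everything since the last commit
after = conn.execute("SELECT salary FROM emp WHERE empno = 'E1'").fetchone()[0]

print(during, after)  # 0 50000
```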
Client/server systems also differ in the way they
handle competing transactions. A system of locking is
normally applied, which forces transactions other than the
current one to wait until the lock is released.
A personal database uses optimistic locking: it assumes
that two or more competing transactions will not occur at
the same time. User code can be written to detect and handle
conflicts if this assumption is not acceptable.
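A common way for user code to detect conflicts under optimistic locking is a version number stored in each row. This is an assumption about technique for illustration, not a description of Access or FoxPro internals. In the sqlite3 sketch below, a write succeeds only if the row still carries the version that was read. Otherwise the user code re-reads and retries. The table and helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: a version number is stored alongside the data.
conn.execute("CREATE TABLE part (partno INTEGER PRIMARY KEY, qoh INTEGER, version INTEGER)")
conn.execute("INSERT INTO part VALUES (2, 10, 1)")
conn.commit()

def read_part(partno):
    """Read without locking; the caller remembers the version it saw."""
    return conn.execute(
        "SELECT qoh, version FROM part WHERE partno = ?", (partno,)).fetchone()

def optimistic_update(partno, new_qoh, version_seen):
    """Write only if nobody changed the row since it was read."""
    cur = conn.execute(
        "UPDATE part SET qoh = ?, version = version + 1 "
        "WHERE partno = ? AND version = ?",
        (new_qoh, partno, version_seen))
    conn.commit()
    return cur.rowcount == 1

# Two users read Part 2 at the same time.
qoh_a, ver_a = read_part(2)
qoh_b, ver_b = read_part(2)

ok_a = optimistic_update(2, qoh_a + 10, ver_a)  # delivery of 10: succeeds
ok_b = optimistic_update(2, qoh_b - 5, ver_b)   # supply of 5: stale read, refused

# User code handles the conflict by re-reading and retrying.
if not ok_b:
    qoh_b, ver_b = read_part(2)
    ok_b = optimistic_update(2, qoh_b - 5, ver_b)

print(ok_a, ok_b, read_part(2))  # True True (15, 3)
```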
Transaction processing: the grouping of related database
changes into units which must either all succeed or all fail.
DataBase Environment
All databases require:
– Querying Capabilities
– Data Display facilities
– Database navigation
– Data entry (Initial Load, Transactions)
– Data validation
– Data deletion
– Committing capability
Database Transactions
• Sometimes several database operations need to be
treated as one atomic unit which must either succeed or
fail as a whole.
EMP
EMPNO   SALARY   DEPT
E3      30,000   D2
E4      60,000   D2
E1      50,000   D1
E2      18,000   D1
BUDGET
DEPT    TOTAL SALARY
D1      68,000
D2      90,000
To keep the budget correct, any alteration to salaries in EMP
must also flow through to BUDGET.
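The EMP/BUDGET example can be sketched as one atomic transaction with Python's sqlite3 module. The function give_raise and its fail_midway flag are hypothetical; the data is the EMP and BUDGET data above. Using the connection as a context manager commits if the block completes and rolls back if it raises. A failure between the two updates therefore leaves both tables as they were.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp    (empno TEXT PRIMARY KEY, salary INTEGER, dept TEXT);
CREATE TABLE budget (dept TEXT PRIMARY KEY, total_salary INTEGER);
INSERT INTO emp VALUES ('E3', 30000, 'D2'), ('E4', 60000, 'D2'),
                       ('E1', 50000, 'D1'), ('E2', 18000, 'D1');
INSERT INTO budget VALUES ('D1', 68000), ('D2', 90000);
""")

def give_raise(empno, amount, fail_midway=False):
    # 'with conn' commits on success and rolls back on an exception,
    # so the EMP and BUDGET updates happen together or not at all.
    with conn:
        dept = conn.execute("SELECT dept FROM emp WHERE empno = ?", (empno,)).fetchone()[0]
        conn.execute("UPDATE emp SET salary = salary + ? WHERE empno = ?", (amount, empno))
        if fail_midway:
            raise RuntimeError("simulated failure before BUDGET is updated")
        conn.execute("UPDATE budget SET total_salary = total_salary + ? WHERE dept = ?",
                     (amount, dept))

def budget_matches():
    """True if every BUDGET total equals the sum of its EMP salaries."""
    rows = conn.execute("""
        SELECT b.total_salary, SUM(e.salary)
        FROM budget b JOIN emp e ON e.dept = b.dept
        GROUP BY b.dept""").fetchall()
    return all(total == actual for total, actual in rows)

give_raise("E2", 2000)                    # both tables updated and committed
try:
    give_raise("E1", 5000, fail_midway=True)
except RuntimeError:
    pass                                  # the EMP change was rolled back

d1_total = conn.execute("SELECT total_salary FROM budget WHERE dept = 'D1'").fetchone()[0]
print(budget_matches(), d1_total)  # True 70000
```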
Concurrency Control
• The DBMS should support multiple concurrent users of
the same data and ensure that the data remains
consistent at all times.
Part 2 starts with QOH (quantity on hand) 10.
TX 1: delivery of 10 items (QOH = QOH + 10)
TX 2: supply of 5 items (QOH = QOH - 5)
If the two transactions interleave without control, Part 2
may end up with QOH 20 or QOH 5.
What is the correct result?
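The two wrong answers come from a lost update. This plain-Python simulation replays the interleaving in which both transactions read QOH before either writes, and compares it with running them one after the other. The function names are hypothetical, and no real DBMS is involved.

```python
def interleaved(last_writer):
    """Both transactions read QOH before either writes (no concurrency control)."""
    qoh = 10
    seen_by_tx1 = qoh                       # TX 1 reads 10
    seen_by_tx2 = qoh                       # TX 2 also reads 10
    writes = {"TX1": seen_by_tx1 + 10,      # delivery of 10 items
              "TX2": seen_by_tx2 - 5}       # supply of 5 items
    return writes[last_writer]              # the last writer overwrites the other

def serial():
    """TX 1 completes before TX 2 starts, as locking would ensure."""
    qoh = 10
    qoh = qoh + 10                          # TX 1
    qoh = qoh - 5                           # TX 2 sees TX 1's result
    return qoh

print(interleaved("TX1"))  # 20: TX 2's supply is lost
print(interleaved("TX2"))  # 5:  TX 1's delivery is lost
print(serial())            # 15
```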
Security
Each user may require identification with a user-id and
password.
Users may be limited in the data they can see and what
actions they can perform on that data.
The DBMS may encrypt and decrypt data as it is stored and
retrieved.
Many systems now provide data-value-sensitive security, where what a user may see depends on the data values themselves.
There is an article on ‘security’ in about Week 5.
Disadvantages of Database Processing
• Complexity
• Expense
• Vulnerability
• Size
• Training Costs
• Compatibility
• Technology Lock-In
Advantages of Database Processing
• Reduction in Data Redundancy
• Data Integrity
• Data Independence
• Data Security
• Data Consistency
• Easier Use of Data via DBMS Tools
(Query Language, 4GLs)
• Less Disk Storage
Costs Associated with Database
The initial purchase cost
Planning and design
Database education and training
Application and data conversion
System overheads (response)
Management and Administration
Complexity of support
The Users
So, who are the users?
There are 4 main groups:
1. Unsophisticated or ‘naïve’ users
They interact with the system by invoking one of the
application programs which have been written as part of
the design and implementation processes.
E.g. a person wishing to find a bank account balance
uses an ATM or Web program which has a ‘form’ the
person can complete and ‘send’.
The balance detail will be returned.
2. Application Programmers
Normally these are computer professionals who write
application programs. They can choose from many tools
to develop the interfaces. RAD (rapid application development)
tools, for instance, enable a programmer to construct forms and
reports.
There are languages which combine imperative control
structures (for loops, if-then-else statements) with
statements of the data manipulation language; these are known
as 4th generation languages.
3. ‘Sophisticated’ users.
They interact with the system without writing programs.
They develop their database requests using a
database query language (a non-procedural language).
The queries are submitted to a query processor, which
interprets each query and converts it into low-level instructions.
Online analytical processing (OLAP) tools simplify
analysts' tasks by letting them view results in a variety of
ways.
E.g. sales by region, by region and product, or by city
within a region.
Another class of tools is found in Data Mining
applications
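Here is a small sqlite3 sketch of the OLAP idea. The same sales data is summarised by region, by region and product, or by city within a region, simply by changing the GROUP BY columns. The sales table, its rows and the helper sales_by are hypothetical. Real OLAP tools do far more than this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, city TEXT, product TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("North", "Ballarat", "Home", 100),
    ("North", "Bendigo",  "Car",  250),
    ("North", "Bendigo",  "Home", 50),
    ("South", "Geelong",  "Car",  300),
])

ALLOWED = {"region", "city", "product"}

def sales_by(*dims):
    """Summarise the same data along whichever dimensions are requested."""
    # Column names cannot be bound as parameters, so only whitelisted names
    # are ever placed into the SQL text.
    assert set(dims) <= ALLOWED
    cols = ", ".join(dims)
    sql = f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return conn.execute(sql).fetchall()

print(sales_by("region"))             # [('North', 400), ('South', 300)]
print(sales_by("region", "product"))  # [('North', 'Car', 250), ('North', 'Home', 150), ('South', 'Car', 300)]
print(sales_by("region", "city"))     # [('North', 'Ballarat', 100), ('North', 'Bendigo', 300), ('South', 'Geelong', 300)]
```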
4. Specialised Users.
These are sophisticated users who write specialised
database applications which don’t fit into the ‘traditional’
or ‘normal’ data processing framework.
Computer aided design, knowledge based and expert
systems. Systems which store data with complex data
types such as graphics and audio data, and environment
modelling systems - such as the Country Fire Authority
and the Ambulance systems.
These are gaining in popularity and use.
And that’s it for the first session!