Chapter 1 - Introduction

advertisement
1.
Introduction
Syllabus
2. ER Model
3. Relational Model
4. SQL
5. Integrity and Security
6. Relational Database Design
7. File Structure, Indexing and Hashing
8. Transaction
9. Concurrency control
10. Recovery System
Chapter 1: Introduction
• What is Database?
• Purpose of Database Systems
• View of Data
• Data Models
• Data Definition Language
• Data Manipulation Language
• Transaction Management
• Storage Management
• Database Administrator
• Database Users
• Overall System Structure
Definition of Database
• A shared collection of logically related
data, designed to meet the information
needs of multiple users in an organization.
• The term database is often erroneously
referred to as a synonym for a “DataBase
Management System (DBMS)"
Contd…
•
A collection of data: part numbers, product codes, customer information, etc.
It usually refers to data organized and stored on a computer that can be
searched and retrieved by a computer program.
•
A data structure that stores metadata, i.e. data about data. More generally we
can say an organized collection of information.
•
A collection of information organized and presented to serve a specific
purpose. (A telephone book is a common database.) A computerized
database is an updated, organized file of machine readable information that
is rapidly searched and retrieved by computer.
•
An organized collection of information in computerized format.
•
A collection of related information about a subject organized in a useful
manner that provides a base or foundation for procedures such as retrieving
information, drawing conclusions, and making decisions.
Example
Thing
Data (Facts/Figures)
Cricket Player
Country, name, date of birth, specialty,
matches played, runs etc.
Scholars
Name, data of birth, age, country, field,
books published etc.
Food
Name, ingredients, taste, preferred
time, origin, etc.
Vehicle
Registration number, make, owner,
type, price, etc.
Purpose / Need of Database
Systems
Let us discuss an example
Example: Personal Calendar
• We might start by building a file with
the following structure:
What Day
When Who
Where
Lunch 10/24
CS123 10/25
Biking 10/26
Dinner 10/26
1pm
9am
9am
6PM
Joe’s Diner
Morris234
Jane’s house
Café Le Boeuf
Rick
Dr. Egghead
Jane
Jane
• This text file is easy to deal with. So
there's no need for a DBMS!
Problem 1: Data Organization
• Consider the all-important ``who'' field.
Do we also want to keep e-mail
addresses, telephone numbers etc?
• Expand our file to look like:
What When Who-name
Who-email
…
Who-tel …. Where
• Now we are keeping our address book in
our calendar and doing so redundantly.
“Link” Calendar with
Address Book?
• Two conceptual “entities” -- contact
information and calendar -- with a
relationship between them, linking
people in the calendar to their
contact information.
• This link could be based on something
as simple as the person's name.
Problem 2: Efficiency
• Size of personal address book is probably less
than one hundred entries, but there are things
we'd like to do quickly and efficiently.
– “Give me all appointments on 10/28”
– “When am I next meeting Jim?”
• “Program” these as quickly as possible.
• Have these programs executed efficiently.
• What would happen if you were using a
“corporate” calendar with hundreds of
thousands of entries?
Problem 3. Concurrency and
Reliability
• Suppose other people are allowed access to
your calendar and are allowed to modify it?
How do we stop two people changing the
file at the same time and leaving it in a
physical (or logical) mess?
• Suppose the system crashes while we are
changing the calendar. How do we recover
our work?
Example
Suppose a manager schedule a
meeting with his staff today (3:00pm)
and at the same time his secretary
schedules him to meet with the
Chairman. They both see that the
time is open, but presumably only one
of the two meetings will show on the
calendar later.
What is a DBMS?
• A database (DB) is a large, integrated
collection of data.
• A DB models a real-world enterprise / DBMS
contains information about a particular
enterprise.
• A database management system (DBMS) is a
software package designed to store and
manage databases / set of programs to
access the data.
• DBMS provides an environment that is
simultaneously convenient, secure and
efficient to use.
• Is the software or tool that is used to manage the
database and its users.
• A DBMS consist of different components or
subsystem.
• Each subsystem or component of the DBMS
performs different function(s).
• So a DBMS is collection of different programs
but they all work jointly to manage the data
stored in the database and its users.
• Database is collection of data, DBMS
is tool to manage this data, and both
jointly are called database system.
What the DBMS is about
•
•
•
•
•
Organization of data
Efficient retrieval of data
Reliable storage of data
Maintaining consistent data
All these topics are interrelated.
Drawbacks of file systems
• In the early days, database applications were
built directly on top of file systems
• Drawbacks of using file systems to store
data:
– Data redundancy and inconsistency
• Multiple file formats, duplication of information in
different files
– Difficulty in accessing data
• Need to write a new program to carry out each new
task
– Data isolation — multiple files and formats
Drawbacks of file systems
(Cont.)
– Integrity problems
• Integrity constraints (e.g. account balance > 0)
become “buried” in program code rather than being
stated explicitly
• Hard to add new constraints or change existing
ones
– Atomicity of updates
• Failures may leave database in an inconsistent
state with partial updates carried out
• Example: Transfer of funds from one account to
another should either complete or not happen at
all
Drawbacks of file systems
(Cont.)
• Concurrent access by multiple users
• Concurrent accessed needed for
performance
• Uncontrolled concurrent accesses can lead to
inconsistencies
– Example: Two people reading a balance
and updating it at the same time
• Security problems
• Hard to provide user access to some, but not
all, data
• Database systems offer solutions to all the above
problems
Database Applications
–
–
–
–
–
Banking: all business transactions
Airlines: reservations, schedules
Universities: registration, grades
Sales: customers, products, purchases
Manufacturing: production, inventory,
orders, supply chain
– Human resources: employee records,
salaries, tax deductions
Data and Information
• Data is the collection of raw facts collected
from any specific environment for a specific
purpose.
• Data in itself does not show anything about
its environment.
• So to get desired types of results from the
data we transform it into information by
applying certain processing on it.
• Once we have processed data using different
methods data is converted into meaningful
form and that form of the Data is called
information
Levels of Abstraction
DBMS users are not computer trained,
developers hide the complexity from
users thro’ levels of abstraction, to
simplify user’s interactions with the
system
• Physical level: lowest level describes
how a record (e.g., customer) is stored.
• Logical level: next higher level
describes what data stored in database,
and the relationships among the data.
type customer = record
customer_id : string;
customer_name : string;
customer_street : string;
customer_city : integer;
end;
• View level: highest level describes
only part of the entire database.
DBMS may provide many views for
the same database.
View of Data
An architecture for a database system
Data Abstraction
What data users and
application programs
see ?
View Level
View 1
What data is stored ?
describe data properties such as
data semantics, data relationships
How data is actually stored ?
e.g. are we using disks ? Which
file system ?
View 2
Logical
Level
Physical
Level
…
View n
Instances and Schemas
Similar to types and variables in programming
languages
• Schema – the logical structure / overall design
of the database
– Example: The database consists of information about
a set of customers and accounts and the relationship
between them
– Analogous to type information of a variable in a
program
– Physical schema: database design at the physical
level
– Logical schema: database design at the logical level
• Instance – the actual content of the database at
a particular point in time
– Analogous to the value of a variable
Data Independence
• Physical Data Independence – the ability to
modify the physical schema without changing the logical
schema
– Applications depend on the logical schema
– In general, the interfaces between the various levels
and components should be well defined so that
changes in some parts do not seriously influence
others.
• Logical Data Independence – the ability to
modify the logical schema without causing application
programs to be rewritten. It is difficult to achieve
since application programs are heavily dependent on
logical structure of data that they access.
Database Language
Language for accessing and manipulating the data
organized by the appropriate data model
• One to specify database schema, storage
structure and access methods (DDL) and
• other to express database queries and
updates (DML)
Data Definition Language (DDL)
• Specification notation for defining the database
schema
• DDL compiler generates a set of tables stored in a
data dictionary, a file that contains metadata, i.e.
data about data. Data dictionary is consulted
before reading or modifying the actual data
Data Manipulation Language
(DML)
• Language for accessing and manipulating the data
organized by the appropriate data model
– DML also known as query language
• Two classes of languages
– Procedural – user specifies what data is
required and how to get those data
– Declarative (nonprocedural) – user specifies
what data is required without specifying how to
get those data
• SQL is the most widely used query language
Data Modeling
• A data model is a collection of concepts for describing
data properties and domain knowledge:
–
–
–
–
Data relationships
Data semantics
Data constraints
Relational Model
• Only one abstract concept
• Closer to the physical representation on disk
• Normalization
• Entity-Relationship data model
(mainly for database design)
• Relational model
• Object-based data models (Objectoriented and Object-relational)
• Semistructured data model (XML)
• Other older models:
– Network model
– Hierarchical model
Entity-Relationship Model
• Models an enterprise as a collection of entities and relationships
– Entity: a “thing” or “object” in the enterprise that is
distinguishable from other objects
• Described by a set of attributes
– Relationship: an association among several entities
e.g. each employee is an entity described by empno, empname,
designation etc.
– Entity-relationship Model
•
•
•
•
Diagrammatic representation
Easier to work with
Syntax not important, but remember the “meaning”
Remember what you can model
• Entity set – set of all entities of the
same type
• Relationship set – set of all
relationships of same type
• Mapping cardinality – number of
entities to which another entity can
be associated via relationship set
• The overall logical design of database
can be expressed graphically by an
E-R diagram, which has following
components:
• Rectangles - entity set
• Ellipses –attributes of an entity
• Diamonds –relationship among entity
sets
• Lines – link attributes to entity sets
and entity sets to relationship sets
Each component of E-R diagram is labeled with entity
or relationship that it represents
Example of schema in the entity-relationship model
Relational Model
• It uses collection of tables to
represent data as well as relationship
among those data.
• Each table has multiple columns, each
column has unique name.
Other Models
Network Model
• Data are represented by collection of
records, and relationships among
those data are represented by links,
which are viewed as pointers.
• Records are organized as a collection
of arbitrary graph
Other Models Cont.
Hierarchical Model
• Data are represented by collection of
records, and relationships among
those data are represented by links,
which are viewed as pointers.
• Records are organized as a collection
of trees rather than arbitrary graph
Database Users
Users are differentiated by the way they
expect to interact with the system
•End Users
access to the database for querying, updating, and generating reports
Casual end users:
occasionally access the database
need different information each time
learn only a few facilities that they may use repeatedly.
use a sophisticated database query language to specify their requests
typically middle- or high-level managers or other occasional browsers
• Application programmers – interact with system
through DML calls
• Sophisticated users – form requests in a database
query language
Database Users
• Specialized users – write specialized database
applications that do not fit into the traditional
data processing framework
• Naïve users – invoke one of the permanent
application programs that have been written
previously
– E.g. people accessing database over the web,
bank tellers, clerical staff
Database Users
• System Analysts and Application Programmers
– Determine the requirements of end users, especially naive and
parametric end users, and develop specifications for canned
transactions that meet these requirements
– Application programmers implement these specifications as
programs; then they test, debug, document, and maintain these
canned transactions
• Workers behind the Scene
– Typically do not use the database for their own purposes
– DBMS system designers and implementers
– design and implement the DBMS modules (for implementing
the catalog, query language, interface processors, data access,
concurrency control, recovery, and security. ) and interfaces as
a software package
Database Users
• Tool developers
– Tools are optional packages that are often
purchased separately
– include packages for database design,
performance monitoring, natural language or
graphical interfaces, prototyping, simulation,
and test data generation.
• Operators and maintenance personnel
– system administration personnel who are
responsible for the actual running and
maintenance of the hardware and software
environment for the database system
Database Administrator
• Coordinates all the activities of the database
system; the database administrator has a good
understanding of the enterprise’s information
resources and needs.
• Database administrator's duties include:
–
–
–
–
–
–
Schema definition
Storage structure and access method definition
Schema and physical organization modification
Granting user authority to access the database
Specifying integrity constraints
Monitoring performance and responding to changes in
requirements
Database Actors
• Database Administrators
– In a database environment, the primary
resource is the database itself and the secondary
resource is the DBMS and related software
– authorizing access to the database
– coordinating and monitoring its use
– acquiring software and hardware resources as
needed
Database Actors
• Database Designers
– identifying the data to be stored in the database
– choosing appropriate structures to represent and store
this data undertaken before the database is actually
implemented and populated with data
– communicate with all prospective database users, in
order to understand their requirements
– develop a view of the database that meets the data and
processing requirements for each group of users
– These views are then analyzed and integrated with the
views of other user groups. The final database design
must be capable of supporting the requirements of all
user groups
Overall Database System
Structure
Query Processor Components
• DML Compiler – translates DML statements in a
query language into low level instructions
• DDL interpreter –
interprets DDL statements
and records them in a set of tables containing
metadata.
• Query evaluation engine –
executes low-level
instructions generated by DML compiler
Storage Manager Components
• Authorization & integrity manager –
tests
for satisfaction of integrity constraints and checks the
authority of user to access the data
• Transaction manager –
ensures the consistency
of the database despite system failures, and concurrent
transaction executions proceed with conflicting
• File Manager –
Manages the allocation of space on
disk storage and the data structures used to represent
information on disk
• Buffer Manager –
responsible for fetching data
from disk storage into main memory, and deciding what
data to catch in memory
Data structures are required as a part of
physical implementation
• Data files – stores database itself
• Data dictionary - stores meta data about
the structure of database
• Indices – which provides fast access to
data items that hold particular values
• Statistical data – which stores statistical
information about the data in the
database, used by strategy selector
DATABASE SYSTEM
ARCHITECTURE
Storage Management
• Storage manager is a program module that
provides the interface between the low-level
data stored in the database and the application
programs and queries submitted to the system.
• The storage manager is responsible to the
following tasks:
– Interaction with the file manager
– Efficient storing, retrieving and updating of data
• Issues:
– Storage access
– File organization
– Indexing and hashing
Transaction Management
• A transaction is a collection of operations that
performs a single logical function in a database
application
• Transaction-management component ensures
that the database remains in a consistent
(correct) state despite system failures (e.g.,
power failures and operating system crashes)
and transaction failures.
• Concurrency-control manager controls the
interaction among the concurrent transactions,
to ensure the consistency of the database.
Download