Database management UNIT-1 JNTUK-UCEV

Database Management Systems
What Is a Database System?
 Database:
A very large, integrated collection of data (logically related).
 Models a real-world enterprise
 Entities (e.g., teams, games)
 Relationships
(e.g., The Forty-Niners are playing in The Superbowl)
 More recently, also includes active components, often called
“business logic” (e.g., the BCS ranking system).
 A Database Management System (DBMS) is a software system designed
to store, manage, and facilitate access to databases.
 Database System: DBMS + data (+ applications)
1.2
Database : Applications
 Other examples of database applications can be:
 Purchases from the supermarket.
 Purchasing using your credit card.
 Booking a holiday at the travel agent.
 Using the local library.
 Using the internet.
 Studying at university.
1.3
Database Systems: Then
1.4
Database Systems: Today
1.5
From Friendster.com on-line tour
Other databases you may use
1.6

Is the WWW a DBMS?
Fairly sophisticated search available
 crawler indexes pages on the web
 Keyword-based search for pages

But, currently
 data is mostly unstructured and untyped
 search only:
 can’t modify the data
 can’t get summaries, complex combinations of data
 few guarantees provided for freshness of data, consistency across data items, fault
tolerance, …
 Web sites typically have a DBMS in the background to provide these functions.

The picture is changing
 New standards e.g., XML, Semantic Web can help data modeling
 Research groups (e.g., at Berkeley) are working on providing some of this
functionality across multiple web sites.
1.7
“Search” vs. Query
 What if you wanted to find out which actors donated to John Kerry’s
presidential campaign?
 Try “actors donated to john kerry” in your favorite search engine.
1.8
A “Database Query” Approach
1.9
Is a File System a DBMS?
 Thought Experiment 1:
 You and your project partner are editing the same file.
 You both save it at the same time.
 Whose changes survive?
A) Yours
B) Partner’s
C) Both
D) Neither
E) ???
 Thought Experiment 2:
 You’re updating a file.
 The power goes out.
 Which of your changes survive?
A) All
B) None
C) All Since Last Save
D) ???
Q: How do you write programs over a subsystem when it promises you only “???” ?
A: Very, very carefully!!
1.10
Traditional File Processing System
 File-based system is a collection of application programs that perform
services for the end-users such as the production of reports.
 Each program defines and manages its own data.
 Early attempt to computerize the manual filing system.
 Files in cabinet and locks for security.
 For searching we may have indexing system that helps locate what
we want quickly.
 Works well
 While the number of items to be stored is small.
 Even for a large number of items, when we only need to store and retrieve them.
1.11
Traditional File Processing System
 The manual system becomes inefficient when the information in the files
has to be processed.
 Typical real estate agent’s office holds two separate files:
 File for each property for sale or rent.
 File for each buyer and renter, and each member of staff.
1.12
Traditional File Processing System
[Figure: file processing in the Contract Department and the Sales Department]
1.13
Traditional File Processing System
 Consider the efforts that would be required to answer the following
questions:
 What three-bedroom properties do you have for sale with a garden
and garage?
 What flats do you have for rent within three miles of the city center?
 What is the average rent for a two-bedroom flat?
 What is the total annual salary bill for staff?
 How does last month’s turnover compare with the projected figure for
this month?
 What is the expected monthly turnover for the next financial year?
1.14
File-Based Approach
 The file-based system was developed in response to the needs of industry
for more efficient data access.
 Based on decentralized approach, where each department, with the
assistance of Data Processing (DP) staff, stored and controlled its
own data.
 Consider the DreamHome example.
1.15
File-Based Approach
1.16
File-Based Approach
 Significant amount of duplication of data.
 Before discussing the limitations, it is useful to understand the terminology
used in file-based systems.
 A file is simply a collection of records, which contain logically related
data.
 For example, the PropertyForRent file contains six records, one for
each property.
 Each record contains a logically connected set of one or more fields.
 Each field represents some characteristics of the real-world object
that is being modeled.
1.17
Limitations of File-Based Approach
 Separation and isolation of data
 Each program maintains its own set of data.
 Users of one program may be unaware of potentially useful data held
by other programs.
 Duplication of data
 Decentralized approach taken by each department.
 Same data is held by different programs.
 Wastes space, money, and time and, perhaps more importantly, can
compromise data integrity (in other words, data consistency).
1.18
Limitations of File-Based Approach
 Data Dependence
 File structure is defined in the program code.
 Also known as a Program-Data dependence.
 Incompatible file formats
 Programs are written in different languages, and so cannot easily
access each other’s files.
 Fixed Queries/Proliferation of application programs
 Programs are written to satisfy particular functions.
 Any new requirement needs a new program.
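Program-data dependence can be illustrated with a small sketch. The record layouts below (field widths, field order) are hypothetical and not taken from the slides; the point is only that a program with the file structure hard-coded stops working as soon as the structure changes.

```python
import struct

# Hypothetical layout baked into an old program: 20-byte name, 6-byte course, 2-byte grade.
LAYOUT_V1 = struct.Struct("20s6s2s")
# The owner of the file later widens the name field to 30 bytes.
LAYOUT_V2 = struct.Struct("30s6s2s")

def pack(layout, name, course, grade):
    """Encode one record using the given fixed-width layout."""
    return layout.pack(name.encode(), course.encode(), grade.encode())

def unpack(layout, raw):
    """Decode one record, stripping the NUL padding added by struct."""
    return tuple(field.rstrip(b"\x00").decode() for field in layout.unpack(raw))

record_v2 = pack(LAYOUT_V2, "John Smith", "CS112", "B")

# A program still written against the old layout cannot read the new record.
try:
    unpack(LAYOUT_V1, record_v2)
    old_program_ok = True
except struct.error:
    old_program_ok = False

# Only a program updated to the new layout sees the data correctly.
new_program_view = unpack(LAYOUT_V2, record_v2)
```

With a DBMS the layout is held in the schema rather than in each program, so a change like this would not require editing every application.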
1.19
Database Approach
 The above limitations of the file-based approach can be attributed to two factors:
 The definition of the data is embedded in the application programs,
rather than being stored separately and independently.
 There is no control over the access and manipulation of data beyond
that imposed by the application programs.
 The above limitations were overcome with the new approach called
database approach (database and DBMS).
1.20
Current Commercial Outlook
 A major part of the software industry:
 Oracle, IBM, Microsoft, Sybase
 also Informix (now IBM), Teradata
 smaller players: java-based dbms, devices, OO, …
 Well-known benchmarks (esp. TPC)
 Lots of related industries
 data warehouse, document management, storage, backup, reporting,
business intelligence, app integration
 Relational products dominant and evolving
 adapting for extensibility (user-defined types), adding native XML
support.
 Open Source coming on strong
 MySQL, PostgreSQL, BerkeleyDB
1.21
Why Study Databases??
 Shift from computation to information
 always true for corporate computing
 Web made this point for personal computing
 more and more true for scientific computing
 Need for DBMS has exploded in recent years
 Corporate: retail swipe/clickstreams, “customer relationship mgmt”,
“supply chain mgmt”, “data warehouses”, etc.
 Scientific: digital libraries, Human Genome project, NASA Mission
to Planet Earth, physical sensors, grid physics network
 DBMS encompasses much of CS in a practical discipline
 OS, languages, theory, AI, multimedia, logic
 Yet traditional focus on real-world apps
1.22
What’s the intellectual content?
 representing information
 data modeling
 languages and systems for querying data
 complex queries with real semantics*
 over massive data sets
 concurrency control for data manipulation
 controlling concurrent access
 ensuring transactional semantics
 reliable data storage
 maintain data semantics even if you pull the plug
* semantics: the meaning or relationship of meanings of a sign or set of signs
1.23
Files Vs DBMS
 Applications must stage large datasets between main memory
and secondary storage ( e.g., buffering, page oriented access,
32-bit addressing, etc. )
 Special code for different queries
 Must protect data from inconsistency due to multiple concurrent
users
 Crash recovery
 Security and access control.
1.24
Why Databases??
 Why not store everything on flat files: use the file system of the
OS, cheap/simple…
Name, Course, Grade
John Smith, CS112, B
Mike Stonebraker, CS234, A
Jim Gray, CS560, A
John Smith, CS560, B+
…………………
 Yes, but not scalable…
1.25
Problem 1
 Data redundancy and inconsistency
 Multiple file formats, duplication of information in different files
Name, Course, Email, Grade
John Smith, js@cs.bu.edu, CS112, B
Mike Stonebraker, ms@cs.bu.edu, CS234, A
Jim Gray, CS560, jg@cs.bu.edu, A
John Smith, CS560, js@cs.bu.edu, B+
Why is this a problem?
 Wasted space
 Potential inconsistencies (multiple formats, John
Smith vs Smith J.)
1.26
Problem 2
 Data retrieval:
 Find the students who took CSE
 Find the students with Percentage > 50
For every query we need to write a program!
 We need the retrieval to be:
 Easy to write
 Execute efficiently
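A minimal sketch of "for every query we need to write a program", using the flat file from the earlier slide (spaces after commas removed for parsing). The function names are illustrative only.

```python
import csv
import io

# The flat file from the slide, held in memory for the sketch.
FLAT_FILE = """Name,Course,Grade
John Smith,CS112,B
Mike Stonebraker,CS234,A
Jim Gray,CS560,A
John Smith,CS560,B+
"""

def students_in_course(course):
    """One hand-written program per question: who took `course`?"""
    rows = csv.DictReader(io.StringIO(FLAT_FILE))
    return [row["Name"] for row in rows if row["Course"] == course]

def students_with_grade(grade):
    """A second question needs a second, nearly identical program."""
    rows = csv.DictReader(io.StringIO(FLAT_FILE))
    return [row["Name"] for row in rows if row["Grade"] == grade]
```

Each new question means another function that re-reads and re-parses the whole file, which is neither easy to write nor efficient to execute.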
1.27
Problem 3
 Data Integrity
 No support for sharing:
 Prevent simultaneous modifications
 No coping mechanisms for system crashes
 No means of Preventing Data Entry Errors (checks must be hard-coded
in the programs)
 Security problems
 Database systems offer solutions to all the above problems
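As one example of such a solution, a DBMS can reject bad data entry itself instead of relying on checks hard-coded in every program. The sketch below uses SQLite through Python's standard `sqlite3` module; the table name and the list of allowed grades are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is declared once, in the schema, not in each application.
conn.execute("""
    CREATE TABLE grades (
        name   TEXT NOT NULL,
        course TEXT NOT NULL,
        grade  TEXT NOT NULL CHECK (grade IN ('A', 'B+', 'B', 'C', 'D', 'F'))
    )
""")
conn.execute("INSERT INTO grades VALUES ('John Smith', 'CS112', 'B')")

# An invalid grade is refused by the database, whichever program submits it.
try:
    conn.execute("INSERT INTO grades VALUES ('Jim Gray', 'CS560', 'Z')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM grades").fetchone()[0]
```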
1.28
Benefits of the Database Approach
 The data can be shared
 Redundancy can be reduced
 Inconsistency can be avoided
 Transaction support can be provided
 Integrity can be maintained
 Security can be enforced
 Conflicting requirements can be balanced
 Standards can be enforced
1.29
Data Organization
 Physical level or Internal level or storage view : describes how a
record (e.g., customer) is stored.
 Conceptual or Logical level or community user view: describes
data stored in database, and the relationships among the data in
terms of data models of the DBMS.
type customer = record
    name : string;
    street : string;
    city : integer;
end;
 Also, External (View) level: application programs hide details of
data types.
Views can also hide information (e.g., salary) for
security purposes.
1.30
View of Data
A logical architecture for a database system
1.31
Example
1.32
Levels of Abstraction
 Views describe how users see the data.
 Conceptual schema defines logical structure.
 Physical schema describes the files and indexes used.
 (sometimes called the ANSI/SPARC model)
[Figure: Users → View 1, View 2, View 3 → Conceptual Schema → Physical Schema → DB]
1.33
Example: University Database
 Conceptual schema:
 Students(sid: string, name: string, login: string, age: integer, gpa: real)
 Courses(cid: string, cname: string, credits: integer)
 Enrolled(sid: string, cid: string, grade: string)
 External Schema (View):
 Course_info(cid: string, enrollment: integer)
 Physical schema:
 Relations stored as unordered files.
 Index on first column of Students.
[Figure: View 1, View 2, View 3 → Conceptual Schema → Physical Schema → DB]
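The three levels of the university example can be sketched in SQLite. The SQL type names, the index name, the definition of the Course_info view, and the sample rows are assumptions; the slides give only the relation signatures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual schema
    CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
    CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
    CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

    -- Physical schema: an index on the first column of Students
    CREATE INDEX students_sid ON Students (sid);

    -- External schema: a view exposing only per-course enrollment counts
    CREATE VIEW Course_info AS
        SELECT c.cid AS cid, COUNT(e.sid) AS enrollment
        FROM Courses c LEFT JOIN Enrolled e ON e.cid = c.cid
        GROUP BY c.cid;
""")
# Hypothetical sample rows, only to give the view something to count.
conn.executemany("INSERT INTO Courses VALUES (?, ?, ?)",
                 [("CS112", "Intro", 4), ("CS560", "Databases", 4)])
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)",
                 [("s1", "CS112", "B"), ("s1", "CS560", "B+"), ("s2", "CS560", "A")])

course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_info ORDER BY cid").fetchall()
```

A user of Course_info sees enrollment counts but never the individual grades stored in Enrolled.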
1.34
Data Independence
 Applications insulated from how data is structured and stored.
 Logical data independence: Protection from changes in logical
structure of data.
 Physical data independence: Protection from changes in physical
structure of data.
 Q: Why are these particularly important for DBMS?
1.35
Architecture
1.36
Database Schema
 Similar to types and variables in programming languages
 Schema – the structure of the database
 e.g., the database consists of information about a set of
customers and accounts and the relationship between them
 Analogous to type information of a variable in a program
 Physical schema: database design at the physical level
 Logical schema: database design at the logical level
 Instance – the actual content of the database at a particular point in
time
 Analogous to the value of a variable
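The schema/instance distinction can be observed directly in SQLite. In this sketch the table is a simplified account relation and the inserted balance is a made-up value; `PRAGMA table_info` is SQLite's way of reporting a table's columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")

def schema():
    """The structure: column names and types (like a variable's type)."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(account)")]

def instance():
    """The content at this moment (like a variable's current value)."""
    return conn.execute("SELECT * FROM account ORDER BY account_number").fetchall()

schema_before, instance_before = schema(), instance()
conn.execute("INSERT INTO account VALUES ('A-101', 500)")  # hypothetical balance
schema_after, instance_after = schema(), instance()
```

The insert changes the instance while the schema stays the same.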
1.37
Data Models
 Data Model: A set of concepts to describe the structure of a
database, and certain constraints that the database should obey.
 Data Models: a framework for describing
 data
 data relationships
 data semantics
 data constraints
 Data Model Operations: Operations for specifying database
retrievals and updates by referring to the concepts of the data
model. Operations on the data model may include basic
operations and user-defined operations.
1.38
Categories of data models
 Conceptual (high-level, semantic) data models: Provide concepts
that are close to the way many users perceive data. (Also called
entity-based or object-based data models.)
 Physical (low-level, internal) data models: Provide concepts that
describe details of how data is stored in the computer.
 Implementation (representational) data models: Provide concepts
that fall between the above two, balancing user views with some
computer storage details.
1.39
History of Data Models
 Relational Model:
proposed in 1970 by E.F. Codd (IBM), first
commercial system in 1981-82. Now in several commercial products
(DB2, ORACLE, SQL Server, SYBASE, INFORMIX).
 Network Model: the first one to be implemented by Honeywell in 1964-65
(IDS System). Adopted heavily due to the support by CODASYL
(CODASYL - DBTG report of 1971). Later implemented in a large variety
of systems - IDMS (Cullinet - now CA), DMS 1100 (Unisys), IMAGE
(H.P.), VAX -DBMS (Digital Equipment Corp.).
 Hierarchical Data Model: implemented in a joint effort by IBM and North
American Rockwell around 1965. Resulted in the IMS family of systems.
The most popular model. Other systems based on this model: System 2k
(SAS inc.)
1.40
History of Data Models
 Object-oriented Data Model(s): several models have been proposed for
implementing in a database system.
One set comprises models of
persistent O-O Programming Languages such as C++ (e.g., in
OBJECTSTORE or VERSANT), and Smalltalk (e.g., in GEMSTONE).
Additionally, systems like O2, ORION (at MCC - then ITASCA), IRIS (at
H.P.- used in Open OODB).
 Object-Relational Models: Most Recent Trend. Started with Informix
Universal Server. Exemplified in the latest versions of Oracle 10g, DB2,
SQL Server, and other systems.
1.41
Hierarchical Model
1.42
Hierarchical Model
• Advantages:
• Hierarchical Model is simple to construct and operate on
• Corresponds to a number of natural hierarchically organized domains -
e.g., assemblies in manufacturing, personnel organization in companies
• Language is simple; uses constructs like GET, GET UNIQUE, GET
NEXT, GET NEXT WITHIN PARENT etc.
• Disadvantages:
• Navigational and procedural nature of processing
• Database is visualized as a linear arrangement of records
• Little scope for "query optimization"
1.43
Network Model
1.44
Network Model
• Advantages:
• Network Model is able to model complex relationships and represents
semantics of add/delete on the relationships.
• Can handle most situations for modeling using record types and relationship
types.
• Language is navigational; uses constructs like FIND, FIND member, FIND
owner, FIND NEXT within set, GET etc. Programmers can do optimal
navigation through the database.
• Disadvantages:
• Navigational and procedural nature of processing
• Database contains a complex array of pointers that thread through a set of
records.
• Little scope for automated "query optimization"
1.45
Entity-Relationship Model
Example of schema in the entity-relationship model
1.46
Entity Relationship Model (Cont.)
 E-R model of real world
 Entities (objects)
 E.g. customers, accounts, bank branch
 Relationships between entities
 E.g. Account A-101 is held by customer Johnson
 Relationship set depositor associates customers with accounts
 Widely used for database design
 Database design in E-R model usually converted to design in the
relational model (coming up next) which is used for storage and
processing
1.47
Relational Model
 Example of tabular data in the relational model (the columns are the attributes)
customer-id   customer-name  customer-street  customer-city  account-number
192-83-7465   Johnson        Alma             Palo Alto      A-101
019-28-3746   Smith          North            Rye            A-215
192-83-7465   Johnson        Alma             Palo Alto      A-201
321-12-3123   Jones          Main             Harrison       A-217
019-28-3746   Smith          North            Rye            A-201
1.48
Data Independence
 Applications insulated from how data is structured and stored.
 Logical data independence: Protection from changes in logical
structure of data.
 Physical data independence: Protection from changes in physical
structure of data.
 One of the most important benefits of using a DBMS!
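Both kinds of independence can be sketched with SQLite: the application's query text stays fixed while first the physical structure (an added index) and then the logical structure (a table split, hidden behind a view) change underneath it. The table contents and the names StudentNames, StudentGpas, and students_gpa are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                 [("s1", "Ann", 3.9), ("s2", "Bob", 3.1)])

# The application only ever issues this query.
APP_QUERY = "SELECT name FROM Students WHERE gpa > 3.5"
before = conn.execute(APP_QUERY).fetchall()

# Physical change: add an index. The query text does not change.
conn.execute("CREATE INDEX students_gpa ON Students (gpa)")
after_index = conn.execute(APP_QUERY).fetchall()

# Logical change: split the table in two, then keep the old name alive as a view.
conn.executescript("""
    CREATE TABLE StudentNames (sid TEXT, name TEXT);
    CREATE TABLE StudentGpas  (sid TEXT, gpa REAL);
    INSERT INTO StudentNames SELECT sid, name FROM Students;
    INSERT INTO StudentGpas  SELECT sid, gpa  FROM Students;
    DROP TABLE Students;
    CREATE VIEW Students AS
        SELECT n.sid AS sid, n.name AS name, g.gpa AS gpa
        FROM StudentNames n JOIN StudentGpas g ON g.sid = n.sid;
""")
after_split = conn.execute(APP_QUERY).fetchall()
```

The same query returns the same answer at all three points, which is what "insulated" means here.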
1.49
Data Storage
 Data Storage
Where can data be stored?
 Main memory
 Secondary memory (hard disks)
 Optical storage (DVDs)
 Tertiary store (tapes)
 Move data? Determined by buffer manager
 Mapping data to files? Determined by file manager
1.50
Storage Management
 Storage manager is a program module that provides the interface
between the low-level data stored in the database and the application
programs and queries submitted to the system.
 The storage manager is responsible for the following tasks:
 Interaction with the OS file manager
 Efficient storing, retrieving and updating of data
 Issues:
 Storage access
 File organization
 Indexing and hashing
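To make the buffer manager's role concrete, here is a toy buffer pool that moves pages from "disk" into a fixed number of memory frames. The least-recently-used eviction policy, the class name, and the page contents are assumptions for the sketch; the slides do not say which policy a real system uses.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer manager: holds at most `capacity` pages, evicting the LRU page."""

    def __init__(self, disk, capacity):
        self.disk = disk              # stand-in for secondary storage: page_id -> bytes
        self.capacity = capacity
        self.frames = OrderedDict()   # page_id -> contents, least recently used first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)   # hit: mark as most recently used
            return self.frames[page_id]
        self.disk_reads += 1                    # miss: fetch the page from disk
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)     # evict the least recently used page
        self.frames[page_id] = self.disk[page_id]
        return self.frames[page_id]

disk = {i: f"page-{i}".encode() for i in range(5)}
pool = BufferPool(disk, capacity=2)
for page_id in [0, 1, 0, 2, 1]:
    pool.get_page(page_id)
```

Of the five requests, only the repeated access to page 0 is served from memory; the other four go to disk.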
1.51
Database Architecture
(data organization)
[Figure: architecture diagram showing DBA, DDL Commands, DDL Interpreter, Storage Manager (File Manager, Buffer Manager), and Secondary Storage (Data, Metadata, Schema)]
1.52
Data retrieval
 Queries
Query = Declarative data retrieval
describes what data, not how to retrieve it
Ex. Give me the students with GPA > 3.5
vs
Scan the student file and retrieve the records with gpa>3.5
 Why?
1. Easier to write
2. Efficient to execute (why?)
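The two styles from the slide, side by side. The student rows are invented for the sketch.

```python
import sqlite3

# Hypothetical student records: (sid, name, gpa).
students = [(101, "Ann", 3.9), (102, "Bob", 3.1), (103, "Cy", 3.7)]

# Procedural: we spell out how to find the rows (scan, test, collect).
procedural = []
for sid, name, gpa in students:
    if gpa > 3.5:
        procedural.append(name)

# Declarative: we state what we want and let the DBMS decide how to get it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid INTEGER, name TEXT, gpa REAL)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", students)
declarative = [row[0] for row in
               conn.execute("SELECT name FROM student WHERE gpa > 3.5 ORDER BY sid")]
```

Both produce the same names, but only the declarative form leaves the system free to pick a different access path later, for example if an index on gpa is added.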
1.53
Data retrieval
[Figure: Query → Query Processor (Query Optimizer → Plan → Query Evaluator) → Data]
 Query Optimizer
“compiler” for queries (aka “DML Compiler”)
Plan ~ Assembly Language Program
Optimizer Does Better With Declarative Queries:
1. Algorithmic Query (e.g., in C): 1 plan to choose from
2. Declarative Query (e.g., in SQL): n plans to choose from
1.54
Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
1.55
Query Processing (Cont.)
 Alternative ways of evaluating a given query
 Equivalent expressions
 Different algorithms for each operation
 Cost difference between a good and a bad way of evaluating a query
can be enormous
 Need to estimate the cost of operations
 Depends critically on statistical information about relations which
the database must maintain
 Need to estimate statistics for intermediate results to compute cost
of complex expressions
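SQLite can report the plan its optimizer picked through `EXPLAIN QUERY PLAN`, which makes "alternative ways of evaluating a given query" visible. The table, the generated rows, and the index name below are made up for the sketch, and the exact wording of the plan text differs between SQLite versions, so it is captured rather than quoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid INTEGER, name TEXT, gpa REAL)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(i, f"student{i}", 2.0 + (i % 20) / 10) for i in range(1000)])

QUERY = "SELECT name FROM student WHERE sid = 101"

def plan(sql):
    """Return the optimizer's chosen plan as one string."""
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

plan_without_index = plan(QUERY)          # only option: scan the whole table
conn.execute("CREATE INDEX student_sid ON student (sid)")
plan_with_index = plan(QUERY)             # now an index lookup is available
answer = conn.execute(QUERY).fetchall()
```

The query text is identical in both cases; only the plan changes once a cheaper alternative exists.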
1.56
Data Definition Language (DDL)
 Specification notation for defining the database schema
 E.g.
create table account (
    account-number char(10),
    balance integer)
 DDL compiler generates a set of tables stored in a data dictionary
 Data dictionary contains metadata (i.e., data about data)
 Database schema
 Data storage and definition language
 language in which the storage structure and access methods
used by the database system are specified
 Usually an extension of the data definition language
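A small sketch of DDL and the data dictionary, using SQLite. The column name uses an underscore because most SQL dialects would read the hyphen in account-number as subtraction; `sqlite_master` is SQLite's own catalog table, and other systems name theirs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL statement: underscores replace the slide's hyphenated column name.
conn.execute("CREATE TABLE account (account_number CHAR(10), balance INTEGER)")

# SQLite records the definition in its built-in catalog table, sqlite_master.
catalog = conn.execute(
    "SELECT type, name, sql FROM sqlite_master WHERE name = 'account'"
).fetchall()
object_type, object_name, definition = catalog[0]
```

The catalog row is metadata: it describes the account table but holds none of its data.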
1.57
Data Manipulation Language (DML)
 Language for accessing and manipulating the data organized by the
appropriate data model
 DML also known as query language
 Two classes of languages
 Procedural – user specifies what data is required and how to get
those data
 Nonprocedural – user specifies what data is required without
specifying how to get those data
 SQL is the most widely used query language
1.58
SQL
 SQL: widely used (declarative) non-procedural language
 E.g. find the name of the customer with customer-id 192-83-7465
select customer.customer-name
from customer
where customer.customer-id = '192-83-7465'
 E.g. find the balances of all accounts held by the customer with
customer-id 192-83-7465
select account.balance
from depositor, account
where depositor.customer-id = '192-83-7465' and
      depositor.account-number = account.account-number
 Procedural languages: C++, Java, relational algebra
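The two SQL examples can be run against SQLite as a rough check. Column names use underscores instead of the slide's hyphens so that they parse as identifiers, the customer and depositor rows follow the relational-model table earlier in the deck, and the account balances are invented because the slides do not give any.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer  (customer_id TEXT, customer_name TEXT);
    CREATE TABLE account   (account_number TEXT, balance INTEGER);
    CREATE TABLE depositor (customer_id TEXT, account_number TEXT);
    INSERT INTO customer  VALUES ('192-83-7465', 'Johnson'), ('019-28-3746', 'Smith');
    -- Balances are invented for the sketch.
    INSERT INTO account   VALUES ('A-101', 500), ('A-201', 900), ('A-215', 700);
    INSERT INTO depositor VALUES ('192-83-7465', 'A-101'), ('192-83-7465', 'A-201'),
                                 ('019-28-3746', 'A-215'), ('019-28-3746', 'A-201');
""")

CUSTOMER = "192-83-7465"

# Query 1: name of the customer with the given id.
name = conn.execute(
    "SELECT customer.customer_name FROM customer WHERE customer.customer_id = ?",
    (CUSTOMER,)).fetchall()

# Query 2: balances of all accounts held by that customer (join via depositor).
balances = conn.execute(
    """SELECT account.balance
       FROM depositor, account
       WHERE depositor.customer_id = ?
         AND depositor.account_number = account.account_number
       ORDER BY account.balance""",
    (CUSTOMER,)).fetchall()
```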
1.59
Data retrieval:
Indexing
 How can we quickly answer the query “Find the student with SID = 101”?
 One approach is to scan the student table, check every student, and return
the one with id=101… very slow for large databases
 Any better idea?
1st Keep the student records sorted on SID and do a binary search… but what about updates?
2nd Use a dynamic search tree!! Allow insertions, deletions, updates and at the
same time keep the records sorted! In databases we use the B+-tree (multiway
search tree)
3rd Use a hash table. Much faster for exact match queries… but cannot support
range queries. (Also, special hashing schemes are needed for dynamic data)
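The trade-offs above can be sketched in a few lines. A sorted Python list stands in for records kept in SID order and a dict stands in for a hash index; neither is a B+-tree, and the keys are simply sample SIDs.

```python
import bisect

# Sample search keys in sorted order; each stands in for one student record.
sids = [3, 5, 11, 30, 35, 100, 101, 110, 120, 130, 150, 156, 179, 180, 200]

def linear_scan(keys, target):
    """Scan approach: check every record. Returns (found, comparisons made)."""
    for steps, key in enumerate(keys, start=1):
        if key == target:
            return True, steps
    return False, len(keys)

def binary_search(sorted_keys, target):
    """1st idea: records kept sorted on SID, so we can halve the search space."""
    i = bisect.bisect_left(sorted_keys, target)
    return i < len(sorted_keys) and sorted_keys[i] == target

def range_query(sorted_keys, low, high):
    """Sorted order also answers range queries: low <= SID <= high."""
    return sorted_keys[bisect.bisect_left(sorted_keys, low):
                       bisect.bisect_right(sorted_keys, high)]

# 3rd idea: a hash index gives fast exact-match lookups but has no key order.
hash_index = {sid: f"record-{sid}" for sid in sids}
```

The dict answers `hash_index[101]` in one step, but a request like "all SIDs between 100 and 130" needs the ordered structure.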
1.60
B+Tree Example (B=4)
Root: [120]
Internal nodes: [30, 100] [150, 180]
Leaves: [3, 5, 11] [30, 35] [100, 101, 110] [120, 130] [150, 156, 179] [180, 200]
1.61
Database Users
 Users are differentiated by the way they expect to interact with the system
 Application programmers – interact with system through DML calls
 Sophisticated users – form requests in a database query language
 Specialized users – write specialized database applications that do not fit
into the traditional data processing framework
 Naïve users – invoke one of the permanent application programs that
have been written previously
 E.g. people accessing database over the web, bank tellers, clerical
staff
1.62
Database Administrator
 Coordinates all the activities of the database system; the database
administrator has a good understanding of the enterprise’s information
resources and needs.
 Database administrator's duties include:
 Schema definition
 Storage structure and access method definition
 Schema and physical organization modification
 Granting user authority to access the database
 Specifying integrity constraints
 Acting as liaison with users
 Monitoring performance and responding to changes in requirements
1.63
Database Architecture
(data retrieval)
[Figure: architecture diagram showing DB Programmer (code w/ embedded queries, DML Precompiler), User (Query), DBA (DDL Commands, DDL Interpreter), Query Processor (Query Optimizer, Query Evaluator), Storage Manager (File Manager, Buffer Manager), and Secondary Storage (Indices, Data, Statistics, Metadata, Schema)]
1.64
Data Integrity
Transaction processing
 Why must concurrent access to data be managed?
John and Jane withdraw $50 and $100 from a common account…
John:                          Jane:
1. get balance                 1. get balance
2. if balance > $50            2. if balance > $100
3. balance = balance - $50     3. balance = balance - $100
4. update balance              4. update balance
Initial balance $300. Final balance=?
It depends…
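"It depends" can be made concrete by replaying different interleavings of the four steps. This is a deterministic simulation with no real threads or database; the function name and schedule encoding are illustrative, and steps 2 and 3 are merged into a single "withdraw" step.

```python
def run(schedule, initial=300):
    """Replay a schedule of (who, step) pairs against one shared balance."""
    amounts = {"John": 50, "Jane": 100}
    shared = initial
    local = {}                               # each person's private copy of the balance
    for who, step in schedule:
        if step == "get":
            local[who] = shared              # 1. get balance
        elif step == "withdraw":
            if local[who] > amounts[who]:    # 2. check funds
                local[who] -= amounts[who]   # 3. subtract from the private copy
        elif step == "update":
            shared = local[who]              # 4. update balance
    return shared

STEPS = ["get", "withdraw", "update"]

# One person finishes before the other starts.
serial = [("John", s) for s in STEPS] + [("Jane", s) for s in STEPS]

# Both read 300 before either writes; whoever writes last overwrites the other.
john_writes_last = [("John", "get"), ("Jane", "get"), ("Jane", "withdraw"),
                    ("Jane", "update"), ("John", "withdraw"), ("John", "update")]
jane_writes_last = [("Jane", "get"), ("John", "get"), ("John", "withdraw"),
                    ("John", "update"), ("Jane", "withdraw"), ("Jane", "update")]
```

Only the serial schedule leaves $150; in the interleaved schedules one withdrawal is lost, leaving $250 or $200.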
1.65
Transaction
 An execution of a DB program
 Key concept is transaction, which is an atomic sequence of database
actions (reads/writes).
 ACID properties
 A – Atomicity
 C – Consistency
 I – Isolation
 D – Durability
 How: log and concurrency control sub-system
1.66
Data Integrity
Recovery
Transfer $50 from account A ($100) to account B ($200)
1. get balance for A
2. If balanceA > $50
3. balanceA = balanceA – 50
4. Update balanceA in database
System crashes….
5. Get balance for B
6. balanceB = balanceB + 50
7. Update balanceB in database
Recovery management
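The transfer above can be sketched as a single transaction in SQLite. An exception stands in for the crash between steps 4 and 5, which is a simplification: it exercises rollback of an unfinished transaction, not recovery from a log after a real power failure. The table layout and function names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 200)])
conn.commit()

def transfer(amount, crash_midway=False):
    """Move `amount` from A to B as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            (balance_a,) = conn.execute(
                "SELECT balance FROM account WHERE name = 'A'").fetchone()
            if balance_a > amount:
                conn.execute(
                    "UPDATE account SET balance = balance - ? WHERE name = 'A'",
                    (amount,))
                if crash_midway:
                    raise RuntimeError("simulated crash between steps 4 and 5")
                conn.execute(
                    "UPDATE account SET balance = balance + ? WHERE name = 'B'",
                    (amount,))
    except RuntimeError:
        pass  # the half-finished transfer has already been rolled back

def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))

transfer(50, crash_midway=True)
after_crash = balances()
transfer(50)
after_success = balances()
```

After the simulated crash the $50 has not vanished from A; after the successful run it has moved to B.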
1.67
Transaction Management
 What if the system fails?
 What if more than one user is concurrently updating the same data?
 A transaction is a collection of operations that performs a single
logical function in a database application
 Transaction-management component ensures that the database
remains in a consistent (correct) state despite system failures (e.g.,
power failures and operating system crashes) and transaction failures.
 Concurrency-control manager controls the interaction among the
concurrent transactions, to ensure the consistency of the database.
1.68
Database Architecture
[Figure: architecture diagram showing DB Programmer (code w/ embedded queries, DML Precompiler), User (Query), DBA (DDL Commands, DDL Interpreter), Query Processor (Query Optimizer, Query Evaluator), Storage Manager (File Manager, Transaction Manager, Recovery Manager, Buffer Manager), and Secondary Storage (Indices, Data, Metadata, Integrity Constraints, Statistics, Schema)]
1.69
Client /Server Architecture
 The overall purpose of a database system is to support the development
and execution of database applications.
 From a high-level point of view, such a system can be regarded as having
a simple two-part structure consisting of a server, also called the back end,
and a set of clients, also called the front ends.
 The server is the DBMS itself. It supports all basic DBMS functions like
 Data definition
 Data manipulation
 Data security and integrity, etc.
1.70
Client /Server Architecture
 The clients are the various applications that run on top of the DBMS: both
user-written applications and built-in applications.
 User-written applications: regular application programs written either in
a conventional 3GL like C++ or COBOL, or in some proprietary 4GL.
 Vendor-provided applications (tools): applications whose basic
purpose is to assist in the creation and execution of other applications.
Ex: query language processors, report writers, business graphics subsystems,
spreadsheets, statistical packages, natural language processors, data extract
tools, application generators, other application tools including
computer-aided software engineering (CASE) products, and data mining and
visualization tools.
1.71
Database Applications
 Banking: all transactions
 Airlines: reservations, schedules
 Universities: registration, grades
 Sales: customers, products, purchases
 Manufacturing: production, inventory, orders, supply chain
 Human resources: employee records, salaries, tax deductions
 Databases touch all aspects of our lives
1.72