Information Resources Management April 17, 2001

Agenda
Administrivia
 Database Architectures

Administrivia

Homework #8
Database Architectures
Centralized
 Client-Server
 Parallel - single site
 Distributed - multiple sites

Database Architectures
[Figure: Centralized, Client-Server, Distributed, and (Parallel) architectures placed along two axes, Function and Data]
Centralized
PC, Mini, or Mainframe
 Single Database
 Single Database Manager
 One or More Users
 Data and Function in One Place

Client-Server
PCs to Mainframes to Minis
 PC to PC
 Mainframe to Mainframe
 Use Desktop Processing Power
 Better User Interface
 Greater Functionality
 Retain Centralized Control of Data

Client-Server: Basic Model
[Figure: several clients send a Request to a Server and receive a Result]
Servers
Supercomputer
 Mainframe
 Mini
 PC Server


All retain all data
Client-Server Architecture
[Figure: spectrum from Thin to Fat client, showing how Data and Function are split between the Client (Front-End) and the Server (Back-End)]
Functionality
Presentation
 I/O Processing
 Validation
 Business Rules
 Application Logic
 Data Management
 Validation
 Error Handling

“Thin” Client
Presentation Services Only
 Accept Input
 Format Output
 Display


Server does all processing
“Fat” Client
Presentation
 Validation
 Application Logic - Programs
 Data Management
 Send SQL to Server


Server is just DBMS
“In Between” Client
Client
 Presentation
 Some Application Logic
 Server
 Some Application Logic
 Data Management and Services

Benefits of Client-Server
Use Local Processing Power
 Better User Interface
 Some Functionality if System Down
 Use Sunk Costs of PCs
 Support Reengineering
 Support Intranets
 Flexibility, Scalability, Customizability

Challenges of Client-Server
Cost of (Upgraded) PCs
 Network Reliance
 Distributing Application Updates
 Management of Complex System
 Problem Identification & Resolution
 Application Partitioning

Other Client-Server Architectures
Traditional is Two-Tiered (client-server)
 Three-Tiered
 Client - Application Server - DB Server
 (PC - Mini - Mainframe)
 (PC - PC Server - Mainframe)
 Beyond Three
 PC - PC Server - Web Server - Mini - Mainframe

Client-Server vs. Distributed

Client-Server: Application Distribution

Distributed: Data Distribution
Often, “client-server” is used to refer to
either application distribution or data
distribution or both.
Middleware

What if
 Multiple databases (sources) need to be accessed from a single client?
 Different kinds of clients?
 Mix of clients and servers?
 Want to take advantage of existing base of applications (legacy systems)?
Middleware
Fat Clients just send SQL transactions
 Other types of transactions may be needed based on the server (system)

Middleware
Software that shields applications from the
complexity of the operating environment.
[Figure: multiple clients connect through Middleware to several (Legacy) systems]
Types of Middleware
Transaction Processing (TP) Monitor
 Database Middleware
 Remote Procedure Call (RPC)
 Message-Oriented Middleware (MOM)
 Object-Request Brokers
 (CORBA - ORB)

TP Monitor
Synchronous - sender must wait
 Queuing
 Message Delivery
 Guaranteed Delivery
 Either Direction

Database Middleware
Variety of Clients/Platforms
 Variety of Servers/DBMSs/Platforms
 Specific to DB transactions (SQL)

Message-Oriented Middleware (MOM)
Asynchronous - clients do not wait
 Queues & Queue Management/Recovery
 Message Delivery
 Guaranteed Delivery
 Either Direction

(like email or EDI, but carrying only transactions)
Advantages of Middleware
Leverage sunk costs (legacy systems)
 Reduce development cost
 Reduce development time
 Increase responsiveness
 Improve overall systems management
 Consolidate diffuse information

Challenges of Middleware
Cost
 Session management - Transaction state
 Security
 Network reliance
 Diversity of systems - lack of standards
 Constant technology change
 Availability of talent
 Middleware Management

Parallel and Distributed

Client-Server is an attempt to improve performance
Reduce time to execute a transaction
 Parallel
 Reduce time to get the data
 Distributed

Parallel Systems
Single site for data
 Very Large databases
 Operations performed simultaneously

Parallel Database Architectures
Shared Memory
 Shared Disk
 Shared Nothing
 Hierarchical

Shared Memory
[Figure: several processors (P) share one memory (M)]
Shared Memory
Advantages
 Extremely efficient communications
 Disadvantages
 Max of 32/64 processors
 Bus becomes bottleneck

Shared Disk
[Figure: each processor (P) has its own memory (M); all processors share the disks]
Shared Disk
Advantages
 No bus bottleneck
 Fault tolerance provided
 Disadvantages
 Disk access becomes bottleneck

Shared Nothing
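[Figure: each processor (P) has its own memory (M) and disk; processors share nothing and communicate over a network]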
Shared Nothing
Advantages
 No disk bottleneck
 Highly scalable
 Disadvantages
 High communication overhead/cost
 Between processors
 To another processor’s data

Hierarchical
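[Figure: hierarchical architecture combining nodes of processors (P) and memory (M), some nodes with several processors sharing one memory]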
Hierarchical
Advantages
 Best of all worlds
 Disadvantages
 Worst of all worlds
 Some high communication overhead/cost
 Between subsystems
 Complexity

Distributed Databases

Client-Server - distribute functionality

What about distributing data?
Distributed Databases
Overview
 Distributed Storage
 Distributed Queries
 Distributed Transactions
 Multidatabase (Middleware)

Distributed Databases
Multiple locations
 Single logical database
 Several physical databases
 Network connections

Advantages
Sharing across locations
 Local control
 Availability

Challenges
Development costs
 People & Equipment
 Testing
 Problem identification & resolution
 Technical expertise
 Network dependence
 Increased processing overhead

Distributed Data Storage
Replication
 Fragmentation
 Both

Replication
Data is repeated
 Spectrum of options available
 Temporary replication of specific rows
 Replicate infrequently changed data
 Replicate by site
 Central site - all data / each local site - its own data only
 Full replication
 Everything everywhere

Concerns with Replication
Availability needed
 Amount of parallelism in reads
 Overhead of updates
 Keeping replicas updated
 Conflicting updates

Fragmentation
Partitioning
 Divide data into subsets based on need
 Have to be able to put the fragments back together to get the original tables

Fragmentation
Horizontal
 by rows
 specified conditions
 Vertical
 by column
 each requires primary key (or created key)
 Mixed
 by row and column
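A minimal Python sketch of the two basic fragmentation styles, assuming a toy Employee table held as a list of dicts (the table, its columns, and the helper names are hypothetical). Horizontal fragments are rebuilt by a union of the row subsets; vertical fragments are rebuilt by joining on the primary key, which is why every vertical fragment must carry it.

```python
# Hypothetical Employee table used only for illustration.
employees = [
    {"emp_id": 1, "name": "Ada", "city": "Pgh", "salary": 70000},
    {"emp_id": 2, "name": "Bo", "city": "NY", "salary": 65000},
    {"emp_id": 3, "name": "Cy", "city": "Pgh", "salary": 80000},
]

def horizontal_fragments(rows, key):
    """Split rows into subsets by a condition (here: the value of `key`)."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[key], []).append(row)
    return fragments

def vertical_fragments(rows, column_groups, pk):
    """Split columns into groups; every fragment also keeps the primary key."""
    return [
        [{c: row[c] for c in (pk, *cols)} for row in rows]
        for cols in column_groups
    ]

def rebuild_horizontal(fragments):
    """Union of the row subsets gives back every original row."""
    return [row for frag in fragments.values() for row in frag]

def rebuild_vertical(fragments, pk):
    """Join the column subsets on the primary key."""
    joined = {}
    for frag in fragments:
        for row in frag:
            joined.setdefault(row[pk], {}).update(row)
    return list(joined.values())
```

If the vertical fragments did not all carry `emp_id`, `rebuild_vertical` would have no way to match the column subsets back to their rows.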

Fragmentation & Replication

Repeat as necessary:
 Replicate fragments
 Fragment replicas

Don’t lose track of what you have and where it is!
Network Transparency

Distributing data should not require that
the user know where or how it’s been
distributed.

The database should be seen as a
single entity no matter how fragmented
and replicated it becomes.
Network Transparency

Some DBMSs are starting to provide
this level of functionality so
transparency exists even at the program
level, but in many cases this
“transparency” must be programmed
into the applications.

It must always be designed into the
database.
Distributed Queries

How do you query data that is
everywhere?
Efficiency vs. Overhead
Splitting the query apart
 Keeping track of the data/locations
 Making sure everything gets executed
 Putting the results back together
 Generating network traffic
 Handling partial results

Distributed Queries

Full replication can avoid the overhead
 Huge increase in update overhead
 Parallel execution no longer possible
 Additional costs of replication
Example
5 sites - NY, Pgh, Chicago, Dallas, Los Angeles
 Data fragmented by site - no replication


Query (in Pgh), find the highest-paid employee:
SELECT Name, Salary FROM Employee
WHERE Salary = (SELECT MAX(Salary) FROM Employee)
Option 1 - High Bandwidth
1. Have all sites send their full employee tables to Pgh.
2. Build a temporary employee table.
3. Run the query against this table.
Option 2 - Not So High Bandwidth
1. Examine the query and determine it can be run separately at each location and the results combined.
2. Submit just the query to each location.
3. Wait for the results from each city.
4. As results return, build a temporary table (5 rows only).
5. Find the max using the temporary table.
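The two options can be compared with a small Python sketch, assuming made-up fragments for the five sites (names and salaries are invented for illustration) and ignoring salary ties. Both options find the same employee; what differs is how many rows have to be gathered in Pgh.

```python
# Hypothetical per-site Employee fragments (no replication): (name, salary).
sites = {
    "NY": [("Ann", 91000), ("Ben", 62000)],
    "Pgh": [("Cal", 75000)],
    "Chicago": [("Dee", 88000), ("Eve", 54000)],
    "Dallas": [("Fay", 67000)],
    "Los Angeles": [("Gus", 99000), ("Hal", 71000)],
}

def option1_ship_tables(sites):
    """Option 1: gather every full table in Pgh, then run the query once."""
    temp = [row for rows in sites.values() for row in rows]
    rows_gathered = len(temp)
    return max(temp, key=lambda r: r[1]), rows_gathered

def option2_ship_query(sites):
    """Option 2: each site runs the query locally and returns one row."""
    partials = [max(rows, key=lambda r: r[1]) for rows in sites.values() if rows]
    rows_gathered = len(partials)  # one row per site: 5 rows only
    return max(partials, key=lambda r: r[1]), rows_gathered
```

Option 2 works here because the maximum of the per-site maximums is the overall maximum; a query that could not be split this way would fall back toward Option 1.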
Distributed Transactions
Transaction Types
 Coordinators
 Commit Protocols
 Concurrency Controls
 Deadlocks

Transaction Types
Local - transaction only needs local data
 Global - transaction uses non-local data


My global transaction becomes someone else’s local transaction

Either type of transaction must still have ACID properties - global is the concern
System Structure

Things to do:
1. Process local transactions (transaction manager)
2. Process and track global transactions (transaction coordinator)
Global Processing
1. Recognize as global
2. Break up transaction
3. Distribute pieces
4. Assemble results
5. Coordinate termination
6. Handle problems
Coordinator of Coordinators
Coordinate among sites
 Detect problems
 Attempt to fix
 Share status with others

Coordinator Failure
Backup Coordinator
 receives all messages - maintains state
 monitors coordinator
 automatically takes over if coordinator down
 avoids delays - increases overhead
 Election
 highest pre-assigned number
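The election rule reduces to picking the highest pre-assigned number among the sites still running. This Python sketch shows only that outcome, assuming a hypothetical `is_up` check; the message exchange a real election algorithm (such as the bully algorithm) uses to discover which sites are alive is left out.

```python
def elect_coordinator(site_ids, is_up):
    """Return the live site with the highest pre-assigned number."""
    live = [s for s in site_ids if is_up(s)]
    if not live:
        raise RuntimeError("no live site available to coordinate")
    return max(live)
```

For example, if site 5 was the coordinator and goes down, `elect_coordinator([1, 2, 3, 4, 5], lambda s: s != 5)` chooses site 4.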

Commit Protocols
Two-Phase
 Three-Phase

All sites must commit or all sites have to rollback
 Replicated data only

Two-Phase Commit
Phase 1
 Send PREPARE to all sites
 Sites respond READY or ABORT
 Phase 2
 If all sites READY,
 COMMIT locally - Send COMMITs
 If not READY or time expires
 ROLLBACK locally - Send ROLLBACK
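A single-process Python sketch of the two phases, assuming hypothetical `Site` objects that vote and then log the decision. There is no real network or clock here: a site built with `responds=False` stands in for one that does not answer before the time expires, and the coordinator logs its decision locally before sending it to the sites.

```python
from enum import Enum

class Vote(Enum):
    READY = "READY"
    ABORT = "ABORT"

class Site:
    """Hypothetical participant: votes in phase 1, records the decision in phase 2."""
    def __init__(self, name, can_commit=True, responds=True):
        self.name = name
        self.can_commit = can_commit
        self.responds = responds
        self.log = []

    def prepare(self):
        if not self.responds:
            return None  # models a site that never answers (time expires)
        vote = Vote.READY if self.can_commit else Vote.ABORT
        self.log.append(vote.value)
        return vote

    def finish(self, decision):
        self.log.append(decision)

def two_phase_commit(coordinator_log, sites):
    # Phase 1: send PREPARE to all sites and collect their votes.
    votes = [site.prepare() for site in sites]
    # Phase 2: COMMIT only if every site answered READY; else ROLLBACK.
    decision = "COMMIT" if all(v is Vote.READY for v in votes) else "ROLLBACK"
    coordinator_log.append(decision)  # decide (COMMIT/ROLLBACK) locally first
    for site in sites:
        site.finish(decision)  # then send the decision to all sites
    return decision
```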

Two-Phase Commit
[Figure sequence: a Coordinator exchanging messages with several Sites]
 Site requests commit
 Phase 1: Send PREPARE - all sites
 Phase 1: Sites respond READY
 Phase 2: COMMIT locally
 Phase 2: Send COMMIT - all sites
 Phase 1: Site responds ABORT or does not respond
 Phase 2: ROLLBACK locally
 Phase 2: Send ROLLBACK - all sites
Site Failure - Recovery
COMMIT and ROLLBACK as normal
 If READY only
 Check with coordinator or other sites
 Either COMMIT or ROLLBACK
 If no one found, ROLLBACK

Coordinator Failure
Ask the sites
 If one has COMMIT, then REDO
 If one has ROLLBACK, then UNDO
 If one doesn’t have READY, UNDO
 If all READY only
 Coordinator must decide
 Sites must wait and locks are held
 “Blocking” occurs
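The coordinator-failure rules above can be read as a decision function over what the surviving sites have logged for one transaction. In this Python sketch the states are plain strings, with `"NONE"` as an assumed marker for a site that never logged READY.

```python
def recover_after_coordinator_failure(site_states):
    """site_states: one of "COMMIT", "ROLLBACK", "READY", "NONE" per site."""
    if "COMMIT" in site_states:
        return "REDO"   # someone committed, so everyone must commit
    if "ROLLBACK" in site_states:
        return "UNDO"   # someone rolled back, so everyone must roll back
    if any(s != "READY" for s in site_states):
        return "UNDO"   # a site never said READY, so COMMIT cannot have been decided
    return "BLOCK"      # all READY only: wait for the coordinator, locks stay held
```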

Three-Phase Commit



Phase 1
 Send PREPARE
 Sites respond READY or ABORT
Phase 2
 If all sites READY, send PRECOMMIT
 Else, ROLLBACK
 Sites must ACKNOWLEDGE
Phase 3
 If at least K sites ACKNOWLEDGE, send COMMIT
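A Python sketch of the coordinator's side of three-phase commit, assuming the phase-1 replies and the number of acknowledgements are already known (a real coordinator collects them over the network). It returns the messages sent, in order. The slide does not say what happens when fewer than K sites acknowledge, so the sketch simply stops there without sending COMMIT.

```python
def three_phase_commit(votes, acks_received, k):
    """
    votes: phase-1 replies per site ("READY" or "ABORT"; None means no reply).
    acks_received: how many sites ACKNOWLEDGE the PRECOMMIT (assumed known).
    k: minimum number of acknowledgements required before COMMIT.
    """
    sent = ["PREPARE"]                       # phase 1
    if not all(v == "READY" for v in votes):
        sent.append("ROLLBACK")              # phase 2, some site not READY
        return sent
    sent.append("PRECOMMIT")                 # phase 2, all sites READY
    if acks_received >= k:
        sent.append("COMMIT")                # phase 3
    return sent                              # fewer than k: still waiting
```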
Coordinator Failure
Three-Phase Commit prevents blocking
 If coordinator fails
 New coordinator is selected
 Sites queried to determine status
 New coordinator resumes

Network Partitioning
Network split creates two separate networks
 Each “half” selects a coordinator
 Coordinators make independent decisions
 Result could be different decisions
 Resolution of network problem may create need to resolve database problems

Concurrency Control
Single Lock Manager
 Multiple Lock Managers

Single Lock Manager

One site for all locking
 All other sites must go to it
 Can read from anywhere
 Updates must be to all copies

Advantages: Simple, Easy deadlock detection
Disadvantages: Bottleneck, Vulnerability

Simple Multiple Lock Mgrs

Each site locks a unique partition of the data
 non-replicated data

Advantages: Fairly simple, reduced bottlenecks
Disadvantages: Complicated deadlock detection

Majority Protocol





Each site locks its own data
 replication possible
Request owner for lock on data that isn’t local
When multiple owners, floor(n/2) + 1 (a majority) must provide the lock
Advantages: No bottlenecks
Disadvantages: More messages sent, Complicated deadlock detection, More deadlocks (each transaction may get only 1/2 of the locks)
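A Python sketch of the majority rule, assuming a hypothetical set `grants` of sites that have granted this transaction's lock request. With 4 copies the majority is 3, so if T1 holds copies A and B while T2 holds C and D, neither can proceed and each waits on the other: the "each gets 1/2" deadlock.

```python
def majority_needed(n_copies):
    """Locks required out of n replicas: floor(n/2) + 1."""
    return n_copies // 2 + 1

def try_majority_lock(replica_sites, grants):
    """True if enough replica sites granted the lock for the transaction to proceed."""
    granted = sum(1 for s in replica_sites if s in grants)
    return granted >= majority_needed(len(replica_sites))
```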
Biased Protocol
Reduced form of Majority Protocol
 For a READ, only need any single lock
 For a WRITE, need all locks



Advantages: No bottlenecks, Reduced traffic
Disadvantages: Update traffic, Deadlocks
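The biased rule differs from the majority rule only in how many copies must be locked. A minimal Python sketch, assuming lock modes are given as the strings `"READ"` and `"WRITE"`:

```python
def locks_needed(mode, n_copies):
    """Biased protocol: READ needs any single copy, WRITE needs every copy."""
    if mode == "READ":
        return 1
    if mode == "WRITE":
        return n_copies
    raise ValueError(f"unknown lock mode: {mode!r}")

def can_proceed(mode, n_copies, granted):
    """True once the transaction holds enough locks for this mode."""
    return granted >= locks_needed(mode, n_copies)
```

Reads become cheap (one message), while every write must reach all copies, which is where the update traffic on the slide comes from.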
Primary Copy
Site designated to hold “primary” copy
 Multiple sites
 Replicated Data
 All locks through that site



Advantages: Fairly simple, reduced bottlenecks
Disadvantages: Vulnerability, Complicated deadlock detection
Other Than Locking

Timestamps
 Centralized generation
 Local generation

Timestamp tests determine ability to read or write
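One common way to make local generation safe is to pair a site's own counter with its site number, so no two sites hand out the same timestamp. The Python sketch below assumes that scheme and applies the basic timestamp-ordering tests: reject a read that arrives after a younger transaction wrote the item, and reject a write that arrives after a younger transaction read or wrote it. The slide does not spell out which tests it means, so treat these as one standard choice.

```python
import itertools

class TimestampGenerator:
    """Local generation: (local counter, site id) is unique across sites."""
    def __init__(self, site_id):
        self.site_id = site_id
        self.counter = itertools.count(1)

    def new_timestamp(self):
        return (next(self.counter), self.site_id)

class DataItem:
    def __init__(self):
        self.read_ts = (0, 0)   # largest timestamp that has read the item
        self.write_ts = (0, 0)  # largest timestamp that has written the item

def can_read(item, ts):
    # Rejected (transaction must roll back) if a younger transaction already wrote.
    if ts < item.write_ts:
        return False
    item.read_ts = max(item.read_ts, ts)
    return True

def can_write(item, ts):
    # Rejected if a younger transaction already read or wrote the item.
    if ts < item.read_ts or ts < item.write_ts:
        return False
    item.write_ts = ts
    return True
```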
Deadlocks & Distributed Data
Centralized
 One Site
 Distributed


Centralized - same advantages and disadvantages as other centralized control (database or locking)
Distributed Deadlock Detection

Each site tracks all transactions accessing its own data
Dummy transaction for transactions that originated here but are executing elsewhere
If deadlock found that includes dummy transaction
 Must send deadlock information to other sites
 They check for deadlock
 May have to pass on to another site
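A simplified Python sketch of the idea, using wait-for graphs where an edge (a, b) means transaction a waits for b. The dummy node name `T_ext` is an assumption standing in for transactions executing elsewhere. A site that finds a cycle through the dummy node cannot conclude anything alone, so it passes its edges on; the receiving site here simply merges the real edges and checks again, which is a simplification of what a full path-passing protocol does.

```python
EXT = "T_ext"  # hypothetical dummy node: transactions executing at other sites

def find_cycle(edges):
    """Return the transactions on one cycle of a wait-for graph, or None."""
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, []).append(holder)
    on_path, finished, path = set(), set(), []

    def visit(node):
        on_path.add(node)
        path.append(node)
        for nxt in graph.get(node, []):
            if nxt in on_path:
                return path[path.index(nxt):]  # back edge closes a cycle
            if nxt not in finished:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        on_path.discard(node)
        finished.add(node)
        path.pop()
        return None

    for start in list(graph):
        if start not in finished:
            cycle = visit(start)
            if cycle:
                return cycle
    return None

def local_check(local_edges):
    """What one site does with its own wait-for graph."""
    cycle = find_cycle(local_edges)
    if cycle is None:
        return "no deadlock"
    if EXT in cycle:
        return "send to other sites"  # only a possible deadlock so far
    return "local deadlock"

def combined_check(*site_edge_lists):
    """Merge the real edges from several sites (dummy node dropped) and re-check."""
    merged = [(a, b) for edges in site_edge_lists for (a, b) in edges
              if EXT not in (a, b)]
    return find_cycle(merged)
```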
Homework #9
Continuing with the Carnegie Library
 Client/Server
 Distributed Database
