Information Resources Management April 17, 2001

Agenda
- Administrivia
- Database Architectures

Administrivia
- Homework #8

Database Architectures
- Centralized
- Client-Server
- Parallel - single site
- Distributed - multiple sites

[Figure: Centralized, Client-Server, and Distributed (Parallel) architectures compared by where function and data reside]

Centralized
- PC, Mini, or Mainframe
- Single Database
- Single Database Manager
- One or More Users
- Data and Function in One Place

Client-Server
- PCs to Mainframes to Minis
- PC to PC
- Mainframe to Mainframe
- Use Desktop Processing Power
- Better User Interface
- Greater Functionality
- Retain Centralized Control of Data

Client-Server: Basic Model
[Figure: several clients send requests to a server and receive results]

Servers
- Supercomputer
- Mainframe
- Mini
- PC Server
- All retain all data

Client-Server Architecture
[Figure: split of function and data between client (front-end) and server (back-end), ranging from “thin” to “fat” client]

Functionality
- Presentation
- I/O Processing
- Validation
- Business Rules
- Application Logic
- Data Management
- Validation
- Error Handling

“Thin” Client
- Presentation Services Only
  - Accept Input
  - Format Output
  - Display
- Server does all processing

“Fat” Client
- Presentation
- Validation
- Application Logic - Programs
- Data Management
- Send SQL to Server
- Server is just DBMS

“In Between”
- Client
  - Presentation
  - Some Application Logic
- Server
  - Some Application Logic
  - Data Management and Services

Benefits of Client-Server
- Use Local Processing Power
- Better User Interface
- Some Functionality if System Down
- Use Sunk Costs of PCs
- Support Reengineering
- Support Intranets
- Flexibility, Scalability, Customizability

Challenges of Client-Server
- Cost of (Upgraded) PCs
- Network Reliance
- Distributing Application Updates
- Management of Complex System
- Problem Identification & Resolution
- Application Partitioning

Other Client-Server Architectures
- Traditional is Two-Tiered (client-server)
- Three-Tiered
  - Client - Application Server - DB Server
  - (PC - Mini - Mainframe)
  - (PC - PC Server - Mainframe)
- Beyond Three
  - PC - PC Server - Web Server - Mini - Mainframe

Client-Server vs. Distributed
- Client-Server: Application Distribution
- Distributed: Data Distribution
Often, “client-server” is used to refer to application distribution, data distribution, or both.

Middleware
What if:
- Multiple databases (sources) need to be accessed from a single client?
- There are different kinds of clients?
- There is a mix of clients and servers?
- We want to take advantage of the existing base of applications (legacy systems)?

Middleware
- Fat clients just send SQL transactions
- Other types of transactions may be needed, depending on the server (system)

Middleware
Software that shields applications from the complexity of the operating environment.
[Figure: multiple clients connected through middleware to several (legacy) systems]

Types of Middleware
- Transaction Processing (TP) Monitor
- Database Middleware
- Remote Procedure Call (RPC)
- Message-Oriented Middleware (MOM)
- Object Request Brokers (CORBA - ORB)

TP Monitor
- Synchronous - sender must wait
- Queuing
- Message Delivery
- Insured Delivery
- Either Direction

Database Middleware
- Variety of Clients/Platforms
- Variety of Servers/DBMSs/Platforms
- Specific to DB transactions (SQL)

Message-Oriented Middleware (MOM)
- Asynchronous - clients do not wait
- Queues & Queue Management/Recovery
- Message Delivery
- Insured Delivery
- Either Direction
- (like email or EDI, transactions only)

Advantages of Middleware
- Leverage sunk costs (legacy systems)
- Reduce development cost
- Reduce development time
- Increase responsiveness
- Improve overall systems management
- Consolidate diffuse information

Challenges of Middleware
- Cost
- Session management - Transaction state
- Security
- Network reliance
- Diversity of systems - lack of standards
- Constant technology change
- Availability of talent
- Middleware Management

Parallel and Distributed
- Client-Server is an attempt to improve performance
- Parallel: reduce time to execute a transaction
- Distributed: reduce time to get the data

Parallel Systems
- Single site for data
- Very large databases
- Operations performed simultaneously

Parallel Database Architectures
- Shared Memory
- Shared Disk
- Shared Nothing
- Hierarchical

Shared Memory
[Figure: processors (P) attached to one shared memory (M)]
- Advantages
  - Extremely efficient communications
- Disadvantages
  - Max of 32/64 processors
  - Bus becomes bottleneck

Shared Disk
[Figure: processors (P), each with its own memory (M), sharing disks]
- Advantages
  - No bus bottleneck
  - Fault tolerance provided
- Disadvantages
  - Disk access becomes bottleneck

Shared Nothing
[Figure: nodes, each a processor (P) with its own memory (M)]
- Advantages
  - No disk bottleneck
  - Highly scalable
- Disadvantages
  - High communication overhead/cost
    - Between processors
    - To another processor’s data

Hierarchical
[Figure: subsystems combining processors (P) and memory (M)]
- Advantages
  - Best of all worlds
- Disadvantages
  - Worst of all worlds
  - Some high communication overhead/cost
    - Between subsystems
  - Complexity

Distributed Databases
- Client-Server - distribute functionality
- What about distributing data?
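In a shared-nothing system each node owns its own slice of the data, so an operation such as finding a maximum salary can run on every node simultaneously, with only small partial results crossing between processors. The following is a minimal Python sketch under stated assumptions: the helper names `partition`, `local_max`, and `parallel_max` are hypothetical, rows are hash-partitioned by key, threads stand in for separate nodes, and the sample salaries are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, num_nodes):
    """Hash-partition (key, value) rows so each node owns a disjoint subset."""
    nodes = [[] for _ in range(num_nodes)]
    for key, value in rows:
        nodes[hash(key) % num_nodes].append((key, value))
    return nodes


def local_max(node_rows):
    """Runs on one node and touches only that node's own data."""
    return max((value for _, value in node_rows), default=None)


def parallel_max(rows, num_nodes=4):
    nodes = partition(rows, num_nodes)
    # Every node scans its own partition at the same time
    # (threads are only a stand-in for independent machines).
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_max, nodes))
    # Only the per-node results are "communicated" and combined.
    return max(p for p in partials if p is not None)


salaries = [("Ann", 52000), ("Bob", 61000), ("Cy", 48000), ("Dee", 75000)]
print(parallel_max(salaries))  # 75000
```

The communication cost listed under Shared Nothing shows up here as the final combine step; keeping it to one value per node is what makes the approach scale.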
Distributed Databases Overview
- Distributed Storage
- Distributed Queries
- Distributed Transactions
- Multidatabase (Middleware)

Distributed Databases
- Multiple locations
- Single logical database
- Several physical databases
- Network connections

Advantages
- Sharing across locations
- Local control
- Availability

Challenges
- Development costs
  - People & Equipment
  - Testing
- Problem identification & resolution
- Technical expertise
- Network dependence
- Increased processing overhead

Distributed Data Storage
- Replication
- Fragmentation
- Both

Replication
- Data is repeated
- Spectrum of options available
  - Temporary replication of specific rows
  - Replicate infrequently changed data
  - Replicate by site
    - Central site holds all data; each local site holds only its own data
  - Full replication
    - Everything everywhere

Concerns with Replication
- Availability needed
- Amount of parallelism in reads
- Overhead of updates
  - Keeping replicas updated
  - Conflicting updates

Fragmentation
- Partitioning
- Divide data into subsets based on need
- Have to be able to pull back together to get original tables

Fragmentation
- Horizontal
  - By rows
  - Specified conditions
- Vertical
  - By column
  - Each requires primary key (or created key)
- Mixed
  - By row and column

Fragmentation & Replication
- Repeat as necessary:
  - Replicate fragments
  - Fragment replicas
- Don’t lose track of what you have and where it is!

Network Transparency
Distributing data should not require that the user know where or how it has been distributed. The database should be seen as a single entity no matter how fragmented and replicated it becomes.

Some DBMSs are starting to provide this level of functionality, so transparency exists even at the program level, but in many cases this “transparency” must be programmed into the applications. It must always be designed into the database.

Distributed Queries
How do you query data that is everywhere?
Efficiency vs. Overhead:
- Splitting the query apart
- Keeping track of the data/locations
- Making sure everything gets executed
- Putting the results back together
- Generating network traffic
- Handling partial results

Distributed Queries
- Full replication can avoid the overhead
- Huge increase in update overhead
- Parallel execution no longer possible
- Additional costs of replication

Example
- 5 sites - NY, Pgh, Chicago, Dallas, Los Angeles
- Data fragmented by site - no replication
- Query (in Pgh): name and salary of the highest-paid employee
    SELECT Name, Salary FROM Employee
    WHERE Salary = (SELECT MAX(Salary) FROM Employee)

Option 1 - High Bandwidth
1. Have all sites send their full employee tables to Pgh.
2. Build a temporary employee table.
3. Run the query against this table.

Option 2 - Not So High Bandwidth
1. Examine the query and determine that it can be run separately at each location and the results combined.
2. Submit just the query to each location.
3. Wait for the results from each city.
4. As results return, build a temporary table (5 rows only).
5. Find the max using the temporary table.

Distributed Transactions
- Transaction Types
- Coordinators
- Commit Protocols
- Concurrency Controls
- Deadlocks

Transaction Types
- Local - transaction only needs local data
- Global - transaction uses non-local data
  - My global becomes someone else’s local
- Either type of transaction must still have ACID properties - global is the concern

System Structure
Things to do:
1. Process local transactions (transaction manager)
2. Process and track global transactions (transaction coordinator)

Global Processing
1. Recognize as global
2. Break up transaction
3. Distribute pieces
4. Assemble results
5. Coordinate termination
6. Handle problems

Coordinator of Coordinators
- Coordinate among sites
- Detect problems
- Attempt to fix
- Share status with others

Coordinator Failure
- Backup Coordinator
  - Receives all messages - maintains state
  - Monitors coordinator
  - Automatically takes over if coordinator is down
  - Avoids delays - increases overhead
- Election
  - Highest pre-assigned number

Commit Protocols
- Two-Phase
- Three-Phase
- All sites must commit or all sites have to roll back
- Replicated data only

Two-Phase Commit
- Phase 1
  - Send PREPARE to all sites
  - Sites respond READY or ABORT
- Phase 2
  - If all sites READY: COMMIT locally, then send COMMIT to all sites
  - If any site is not READY, or time expires: ROLLBACK locally, then send ROLLBACK to all sites

[Figure sequence: Two-Phase Commit between a coordinator and several sites. A site requests commit. Phase 1: the coordinator sends PREPARE to all sites; the sites respond READY. Phase 2: the coordinator commits locally, then sends COMMIT to all sites. Failure case: in Phase 1 a site responds ABORT or does not respond; in Phase 2 the coordinator rolls back locally, then sends ROLLBACK to all sites.]

Site Failure - Recovery
- COMMIT and ROLLBACK as normal
- If READY only
  - Check with coordinator or other sites
  - Either COMMIT or ROLLBACK
  - If no one found, ROLLBACK

Coordinator Failure
- Ask the sites
  - If one has COMMIT, then REDO
  - If one has ROLLBACK, then UNDO
  - If one doesn’t have READY, UNDO
- If all READY only
  - Coordinator must decide
  - Sites must wait and locks are held
  - “Blocking” occurs

Three-Phase Commit
- Phase 1
  - Send PREPARE
  - Sites respond READY or ABORT
- Phase 2
  - If all sites READY, send PRECOMMIT
  - Else, ROLLBACK
  - Sites must ACKNOWLEDGE
- Phase 3
  - If at least K sites ACKNOWLEDGE, send COMMIT

Coordinator Failure
- Three-Phase Commit prevents blocking
- If coordinator fails
  - New coordinator is selected
  - Sites queried to determine status
  - New coordinator resumes

Network Partitioning
- Network split creates two separate networks
- Each “half” selects a coordinator
- Coordinators make independent decisions
- Result could be different decisions
- Resolution of network problem may create need to resolve database problems

Concurrency Control
- Single Lock Manager
- Multiple Lock Managers

Single Lock Manager
- One site for all locking
- All other sites must go to it
- Can read from anywhere
- Updates must be to all copies
- Advantages: Simple, easy deadlock detection
- Disadvantages: Bottleneck, vulnerability

Simple Multiple Lock Managers
- Each site locks a unique partition of the data
- Non-replicated data
- Advantages: Fairly simple, reduced bottlenecks
- Disadvantages: Complicated deadlock detection

Majority Protocol
- Each site locks its own data
- Replication possible
- Request the lock from the owner when data isn’t local
- When there are multiple owners (n copies), n/2 + 1 (a majority) must grant the lock
- Advantages: No bottlenecks
- Disadvantages: More messages sent, complicated deadlock detection, more deadlocks (each gets 1/2)

Biased Protocol
- Reduced form of Majority Protocol
- For a READ, only need any single lock
- For a WRITE, need all locks
- Advantages: No bottlenecks, reduced traffic
- Disadvantages: Update traffic, deadlocks

Primary Copy
- Site designated to hold “primary” copy
- Multiple sites
- Replicated data
- All locks through that site
- Advantages: Fairly simple, reduced bottlenecks
- Disadvantages: Vulnerability, complicated deadlock detection

Other Than Locking
- Timestamps
  - Centralized generation
  - Local generation
- Timestamp tests determine ability to read or write

Deadlocks & Distributed Data
- Centralized
  - One site
- Distributed
- Centralized - same advantages and disadvantages as other centralized control (database or locking)

Distributed Deadlock Detection
- Each site tracks all transactions accessing its own data
- Dummy transaction for transactions that originated here but are executing elsewhere
- If a deadlock is found that includes the dummy transaction
  - Must send deadlock information to other sites
  - They check for deadlock
  - May have to pass on to another site

Homework #9
- Continuing with the Carnegie Library
  - Client/Server
  - Distributed Database
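The two-phase commit protocol covered in these slides (PREPARE, READY or ABORT, then COMMIT or ROLLBACK everywhere) can be sketched as a single-process Python simulation. This is a minimal illustration under stated assumptions, not a production protocol: the `Site` and `Coordinator` classes are hypothetical, sites are in-memory objects, and a site that does not answer is modeled by returning `None`, which the coordinator treats the same as an expired timer.

```python
from enum import Enum


class Vote(Enum):
    READY = "READY"
    ABORT = "ABORT"


class Site:
    """Hypothetical participant site in a global transaction."""

    def __init__(self, name, will_prepare=True, responds=True):
        self.name = name
        self.will_prepare = will_prepare  # Phase 1 vote: READY if True, else ABORT
        self.responds = responds          # False simulates a site that never answers
        self.state = "ACTIVE"

    def prepare(self):
        if not self.responds:
            return None                   # no reply: treated like a timeout
        self.state = "READY" if self.will_prepare else "ABORTED"
        return Vote.READY if self.will_prepare else Vote.ABORT

    def commit(self):
        self.state = "COMMITTED"

    def rollback(self):
        self.state = "ROLLED BACK"


class Coordinator:
    def __init__(self, sites):
        self.sites = sites
        self.log = []                     # stand-in for the coordinator's log

    def run(self):
        # Phase 1: send PREPARE to all sites and collect their votes.
        self.log.append("PREPARE")
        votes = [site.prepare() for site in self.sites]

        # Phase 2: commit only if every site answered READY.
        if all(vote is Vote.READY for vote in votes):
            self.log.append("COMMIT")     # COMMIT locally first...
            for site in self.sites:
                site.commit()             # ...then send COMMIT to all sites
            return "COMMIT"

        self.log.append("ROLLBACK")       # ROLLBACK locally first...
        for site in self.sites:
            site.rollback()               # ...then send ROLLBACK to all sites
        return "ROLLBACK"


ok = Coordinator([Site("NY"), Site("Pgh"), Site("Dallas")])
print(ok.run())   # COMMIT

silent = Coordinator([Site("NY"), Site("Pgh", responds=False), Site("Dallas")])
print(silent.run())  # ROLLBACK
```

The sketch covers only the normal and abort paths. Real implementations also force these decisions to stable storage so that the site-failure and coordinator-failure cases above (including blocking when every site is only READY) can be resolved on recovery; none of that is modeled here.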
