Database Management Systems
Chapter 10: Distributed Databases and the Internet
Jerry Post
McGraw-Hill/Irwin
Copyright © 2005 by The McGraw-Hill Companies, Inc. All rights reserved.

Distributed Databases
- Definition
- Advantages / Uses
- Problems / Complications
- Client-Server / SQL Server
- Microsoft Access
A single query can combine the sales data stored at several national sites:
    SELECT Sales FROM Britain.Sales
    UNION ALL
    SELECT Sales FROM France.Sales
    UNION ALL
    SELECT Sales FROM Italy.Sales
(UNION ALL keeps every row; a plain UNION would silently drop sales rows that happen to have the same value.)
[Figure: map of sites in Germany, Britain, France, and Italy]

Distributed Database Definition
- Multiple independent databases.
- Each DBMS is a complete DBMS (engine, queries, locking, transactions, etc.).
- Usually on different machines.
- Usually in different locations.
- Connected by a network.
- Might run in different environments: hardware, operating system, DBMS software.
[Figure: networked databases Apollo, Zeus, and Athena located in England, France, and the United States]

Distributed Database Rules (C.J. Date)
- Rule 0, Transparency: the user should not know or care that the database is distributed.
- Local autonomy.
- No reliance on a central site.
- Continuous operation.
- Location independence.
- Fragmentation independence (physical storage).
- Replication independence.
- Distributed query processing.
- Distributed transaction management.
- Hardware independence.
- Operating system independence.
- Network independence.
- DBMS independence.

Distributed Features
- Each database can continue to run even if a portion of the system fails.
- Data and hardware can be moved without affecting operations or users, for example to handle expanding operations or performance issues.
- System expansion and upgrades:
  - Add a new section without affecting the others.
  - Upgrade hardware, network, and DBMS.

Advantages and Applications
- Business operations are often distributed:
  - Work and data are segmented by department.
  - Work and data are segmented by geographical location.
- Improved performance: most updates and queries are performed locally.
- Maintain local control and responsibility over data.
- Can still combine data across the system.
- Scalability and expansion: add on, rather than replace.
[Figure: each site handles its local transactions, with room for future expansion]

Creating a Distributed Database
- Design an administration plan.
- Choose the hardware, DBMS vendor, and network.
- Set up the network and DBMS connections.
- Choose locations for the data.
- Choose a replication strategy.
- Create a backup plan and strategy.
- Create local views and synonyms.
- Perform stress tests: loads and failures.

Distributed Query Processing
- Networks are slow:
  - Drives: 20-60 MB per sec.
  - LANs: 1-10 MB per sec (10-100 Mbps).
  - WANs: 0.01-5 MB per sec.
  - SANs: 10-100 MB per sec.
  - Faster is possible but expensive!
- The goal is to minimize transmissions.
- Each system must be capable of evaluating queries, preferably in SQL.
- Results depend heavily on how the system joins tables.
[Figure: relative transfer rates of a WAN, a LAN, and a disk drive]

Example Distributed Query Processing
Query: list the customers who bought blue products on March 1.
Data locations:
- NY: Customers(C#, …): 1,000,000 rows.
- LA: Products(P#, Color, …): 10,000,000 rows.
- Chicago: Sales(S#, C#, Sdate): 20,000,000 rows; SaleItem(S#, P#, …): 50,000,000 rows.
Bad idea #1: Transfer all rows to Chicago, then JOIN and select.
Better idea #2 (probably): Transfer the blue products from LA to Chicago.
Better idea #3:
- In Chicago, get the sale items (P#) sold on March 1.
- Send that P# list to LA and get back the blue products among them.
- Send the resulting C# list to NY and get back the matching customer data.

Data Replication
- Goals:
  - Minimize transmissions.
  - Improve performance.
  - Support heavy multiuser access.
- Problems:
  - Updating copies: periodic updates, bulk transmissions.
  - Site unavailable.
  - Concurrency: replicas make it easier for two people to change the same data at the same time.
- Common uses: decision support systems and data warehouses.
[Figure: Britain, France, and Spain each hold copies of Customers & Sales for all three countries; labels "Market research & data corrections" and "Update data"]
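To make the bandwidth figures concrete, the Java sketch below estimates WAN transfer times for idea #1 and idea #3 from the example query. The 100-byte row size, 8-byte key size, 1 MB/s WAN speed, and the counts of items sold on March 1, blue products, and customers are all hypothetical values chosen for illustration, not figures from the chapter.

```java
// Rough transfer-time estimate for the example distributed query.
// Every size, speed, and row count below is a hypothetical illustration value.
class TransferEstimate {

    // Seconds needed to move `rows` rows of `bytesPerRow` bytes over a link
    // sustaining `megabytesPerSecond` (1 MB = 1,000,000 bytes here).
    static double seconds(long rows, int bytesPerRow, double megabytesPerSecond) {
        double megabytes = rows * (double) bytesPerRow / 1_000_000.0;
        return megabytes / megabytesPerSecond;
    }

    public static void main(String[] args) {
        int rowBytes = 100;    // assumed average row size
        int keyBytes = 8;      // assumed size of one P# or C# key value
        double wanMBps = 1.0;  // assumed WAN speed, inside the 0.01-5 MB/s range

        // Idea #1: ship all Customers (1 M rows from NY) and all Products (10 M rows from LA) to Chicago.
        double idea1 = seconds(1_000_000L, rowBytes, wanMBps)
                     + seconds(10_000_000L, rowBytes, wanMBps);

        // Idea #3: ship only key lists plus the final matching customer rows (counts assumed).
        long pSoldMarch1 = 5_000;   // P# values sold on March 1, Chicago -> LA
        long blueP = 500;           // blue P# values among them, LA -> Chicago
        long customerIds = 2_000;   // C# values, Chicago -> NY
        long customerRows = 2_000;  // matching customer rows, NY -> requester
        double idea3 = seconds(pSoldMarch1, keyBytes, wanMBps)
                     + seconds(blueP, keyBytes, wanMBps)
                     + seconds(customerIds, keyBytes, wanMBps)
                     + seconds(customerRows, rowBytes, wanMBps);

        System.out.printf("Idea #1: %.0f s%n", idea1);
        System.out.printf("Idea #3: %.2f s%n", idea3);
    }
}
```

Under these assumptions idea #1 moves about 1,100 MB (roughly 1,100 seconds at 1 MB/s), while idea #3 moves well under 1 MB. The exact numbers depend entirely on the assumed sizes; the point is that shipping key lists instead of whole tables shrinks the transmission.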
Concurrency and Locks
- Each DBMS must maintain a lock facility.
- To update, each DBMS must use and recognize the other systems' lock mechanisms and return codes.
- Each DBMS must have a deadlock resolution protocol that recognizes the distributed databases, such as:
  - Random wait.
  - Optimistic updates.
  - Two-phase commit.
[Figure: distributed deadlock. On DBMS #1, Transaction A has locked the Accounts row for Jones (8898) and Transaction B is waiting; on DBMS #2, Transaction B has locked the Accounts row for Jones (3561) and Transaction A is waiting.]

Transactions & Two-Phase Commit
- Two (or more) separate lock managers.
- The DBMS initiating the update serves as the coordinator.
- Two phases:
  1. Prepare to commit. The coordinator sends a message and data to all machines to "get ready." The local machines lock tables, save data in logs, verify update status, and return a message.
  2. Commit. If all local machines report OK, the coordinator writes its log and instructs the others to proceed, and each updates all tables. If any fail, the coordinator sends a Rollback message.
[Figure: Database 1 initiates the transaction and coordinates Databases 2 and 3: prepare to commit (lock tables, save log), all agree?, then commit (update all tables)]

Distributed Transaction Managers
[Figure: each DBMS has its own resource manager and transaction manager; a transaction processing monitor links the transaction managers]
The distributed transaction coordinator (transaction processing monitor) makes the transaction decisions and coordinates them across the participating systems.

Distributed Design Questions
Question | Concurrent | Replication
What level of data consistency is needed? | High | Low-Medium
How expensive is storage? | Medium-High | Low
What are the shared access requirements? | Global | Local
How often are the tables updated? | Often | Seldom
Required speed of updates (transactions)? | Fast | Slow
How important are predictable transaction times? | High | Low
DBMS support for concurrency and locking? | Good-Excellent | Poor
Can shared access be avoided? | No | Yes

Distributed Databases in Oracle
- Database links:
  - Full database names: Schema.Table@Location, for example Scott.Emp@hq.acme.com.
  - CONNECT command.
- Linking through synonyms (CREATE SYNONYM …): central control over permissions.
- Linking through views/queries (CREATE VIEW … AS …): can assign local permissions.
- Linking through stored procedures (DELETE …): strong control over actions; the user can only run the procedure and has no other access.
[Figure: a server database reached through a synonym (Employee), a view with user permissions, and a procedure (DELETE FROM Employee WHERE ...)]

Client-Server
[Figure: a server holds the shared database; clients run the front-end user interface]

LAN File Server
- Not a distributed database.
- The data file is stored on the server.
- The server is passive; it appears to the PC as a giant disk drive.
- The PC processes all data, retrieving all needed data across the network.
- Performance improvements:
  - Indexes are crucial.
  - Store some data on each PC (replication).
  - Store applications on the PC (graphics & forms).
  - Convert to a SQL Server DBMS.
[Figure: the file server holds the data file, the application, and the shared data]
For the query below, the PC reads all data from all tables and performs the JOIN and the WHERE test itself; if an index is available, it reads the index first.
    SELECT Name, SaleDate
    FROM Customer INNER JOIN Sales ON Customer.C# = Sales.C#
    WHERE SaleDate BETWEEN #1-Mar-97# AND #9-Mar-97#;

LAN File Server: Slow
- The DBMS software is transferred.
- The application and query are transferred:
    SELECT * FROM Customer WHERE City = 'Sandy'
- One row at a time is transferred, until all rows are examined.
[Figure: file server holding MyFile.mdb, with Customer rows (CustID 115 Jenkins, 125 Juarez, …), Forms, and Order data]

Client-Server Databases
- One machine is dominant (the server) and handles data for many clients.
- Client machines handle front-end tasks and small data tables that are not shared.
[Figure: the client application sends an SQL statement to the SQL Server DBMS, which returns only the matching data from the shared data]
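The two-phase commit protocol described on the Transactions & Two-Phase Commit slide can be sketched in Java as follows. This is a minimal single-process illustration under stated assumptions: Participant, Coordinator, FakeSite, and TwoPhaseCommitDemo are hypothetical names, the coordinator's "log" is an in-memory list, and there is no network, timeout, or crash recovery, all of which a real transaction manager must handle.

```java
import java.util.ArrayList;
import java.util.List;

class TwoPhaseCommitDemo {
    public static void main(String[] args) {
        Coordinator coordinator = new Coordinator();
        FakeSite db2 = new FakeSite(true);   // votes OK
        FakeSite db3 = new FakeSite(false);  // refuses, forcing a rollback
        boolean committed = coordinator.run("T1", List.of(db2, db3));
        System.out.println(committed + " " + coordinator.log()); // prints "false [T1 ROLLBACK]"
    }
}

// Hypothetical interface for one local DBMS taking part in the transaction.
interface Participant {
    boolean prepare(String txId);  // phase 1: lock tables, save log, vote OK or not
    void commit(String txId);      // phase 2: make the update permanent
    void rollback(String txId);    // phase 2: undo the update and release locks
}

// The coordinator plays the role of the DBMS that initiated the update.
class Coordinator {
    private final List<String> log = new ArrayList<>(); // in-memory stand-in for the coordinator's log

    // Returns true if every participant committed, false if the transaction was rolled back.
    boolean run(String txId, List<? extends Participant> participants) {
        // Phase 1: ask every participant to "get ready".
        for (Participant p : participants) {
            if (!p.prepare(txId)) {
                // A single failure is enough: record the decision, then roll back everywhere.
                log.add(txId + " ROLLBACK");
                for (Participant q : participants) {
                    q.rollback(txId);
                }
                return false;
            }
        }
        // Phase 2: all agreed, so write the log entry first, then tell everyone to proceed.
        log.add(txId + " COMMIT");
        for (Participant p : participants) {
            p.commit(txId);
        }
        return true;
    }

    List<String> log() {
        return log;
    }
}

// Hypothetical local site that simply votes yes or no, for exercising the coordinator.
class FakeSite implements Participant {
    private final boolean voteYes;
    String state = "idle";

    FakeSite(boolean voteYes) {
        this.voteYes = voteYes;
    }

    public boolean prepare(String txId) {
        state = voteYes ? "prepared" : "refused";
        return voteYes;
    }

    public void commit(String txId) {
        state = "committed";
    }

    public void rollback(String txId) {
        state = "rolled back";
    }
}
```

Writing the COMMIT entry to the coordinator's log before sending any commit message mirrors the slide's ordering: once that decision is recorded, the transaction is committed even if a message is lost, which is what a recovery procedure would rely on.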
ADO and Direct Connections
- The database vendor provides its own data transport (e.g., Oracle or SQL Server), installed on both the server and the client.
- ADO provides a driver that connects your application to the transport services.
- ODBC can serve as the data transport if nothing else is available.
[Figure: the server computer runs the database server and the DBMS transport; the client computer runs the DBMS transport, ADO, and a Visual Basic application]

Three-Tier Client-Server
- Server: databases, transactions, legacy applications.
- Middle tier: locates databases; holds business rules and program code.
- Client: front end.
[Figure: database servers; middleware (database links, business rules, program code); client (application, front end, user interface)]

Database Independence on the Client
[Figure: the application connects through ADO, so the original DBMS can be replaced by a new DBMS]

Database Independence with Queries
Independent application query, which works with any DBMS:
    SELECT SaleID, SaleDate, CustomerID, CustomerName FROM SaleCustomer
Each DBMS stores its own version of the SaleCustomer query, so the application's query never changes.
Saved Oracle query (SaleCustomer):
    SELECT SaleID, SaleDate, Sale.CustomerID,
           LastName || ', ' || FirstName AS CustomerName
    FROM Sale, Customer
    WHERE Sale.CustomerID = Customer.CustomerID
Saved SQL Server query (SaleCustomer):
    SELECT SaleID, SaleDate, Sale.CustomerID,
           LastName + ', ' + FirstName AS CustomerName
    FROM Sale INNER JOIN Customer ON Sale.CustomerID = Customer.CustomerID

The Internet as Client-Server
[Figure: a client browser sends a request (http://server.location/page) across the Internet through routers to a web server, which returns information: HTML pages, forms, and graphics]

HTML Limited Clients
    <HTML>
    <HEAD>
    <TITLE>My main page</TITLE></HEAD>
    <BODY BACKGROUND="graphics/back0.jpg">
    <P>My text goes in paragraphs.</P>
    <P>Additional tags set <B>boldface</B> and <I>Italic</I>.</P>
    <P>Tables are more complicated and use a set of tags for rows and columns.</P>
    <TABLE BORDER=1>
    <TR><TD>First cell</TD><TD>Second cell</TD></TR>
    <TR><TD>Next row</TD><TD>Second column</TD></TR>
    </TABLE>
    <P>There are form tags to create input forms for collecting data. But you need CGI program code to convert and use the input data.</P>
    </BODY>
    </HTML>

HTML Output
My text goes in paragraphs.
Additional tags set boldface and Italic.
Tables are more complicated and use a set of tags for rows and columns.
    First cell    Second cell
    Next row      Second column
There are form tags to create input forms for collecting data. But you need CGI program code to convert and use the input data.

Web Server Database Fundamentals
[Figure: 0. The client browser requests Server/Form.html. 1. The web server returns the HTML form (Form.html). 2. The form data goes to program code on the web server (query template + code), which queries the DBMS and receives data. 3. The web server sends the result page back to the browser.]

Database Example: Client Side
[Figure: the client's view of the exchange: request Server/Form.html (0), initial form from the server (1), submission (2), and results (3)]

Client-Server Data Transfer
[Figure: Order Form with Order ID 1015, Customer "Jones, Martha" chosen from a combo box, and Order Date 12-Aug]
- What if there are 10,000 customers?
- How much time does it take to load the combo box?
- How do you refresh or reload the combo box?
- What are the alternatives?

Latency
[Figure: timeline. The server generates the form; transmission delay; the client receives the form; user delay; transmission delay; the server receives the form data.]

XML: Transferring Data
Order: OrderID, OrderDate, ShippingCost, Comment
    Item: ItemID, Description, Quantity, Cost
    Item: ItemID, Description, Quantity, Cost
    Item: ItemID, Description, Quantity, Cost
Many XML files contain hierarchical data, such as one order with many items.
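As a sketch of how a client program might read the Order/Item hierarchy, the Java code below parses a small order with the DOM parser that ships with the JDK (javax.xml.parsers) and totals Quantity * Cost. The sample document follows the layout of the chapter's XML Data Example but is trimmed, and it omits the DOCTYPE line so the parser does not try to fetch orderlist.dtd; OrderTotal, SAMPLE, and itemTotal are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class OrderTotal {
    // Trimmed sample order in the chapter's layout (no DOCTYPE, so no DTD lookup).
    static final String SAMPLE =
        "<OrderList><Order><OrderID>1</OrderID><Items>"
        + "<ItemID>30</ItemID><Description>Flea Collar-DogMedium</Description>"
        + "<Quantity>208</Quantity><Cost>$4.42</Cost>"
        + "<ItemID>27</ItemID><Description>Aquarium Filter &amp; Pump</Description>"
        + "<Quantity>8</Quantity><Cost>$24.65</Cost>"
        + "</Items></Order></OrderList>";

    // Sums Quantity * Cost, pairing the i-th Quantity element with the i-th Cost element.
    static double itemTotal(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList quantities = doc.getElementsByTagName("Quantity");
            NodeList costs = doc.getElementsByTagName("Cost");
            double total = 0.0;
            for (int i = 0; i < quantities.getLength(); i++) {
                int qty = Integer.parseInt(quantities.item(i).getTextContent().trim());
                // Costs are stored as text such as "$4.42", so drop the currency sign first.
                double cost = Double.parseDouble(costs.item(i).getTextContent().trim().replace("$", ""));
                total += qty * cost;
            }
            return total;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse order XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.printf("Item total: %.2f%n", itemTotal(SAMPLE));
    }
}
```

Pairing by position works only because each item's elements appear in order inside the single Items element, as in the chapter's sample; giving each item its own wrapper element would make the hierarchy easier to process reliably.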
XML: Schema Definition (xsd)
Partial file, generated by the .NET xsd.exe tool:
    <?xml version="1.0" encoding="utf-8"?>
    <xs:schema id="OrderList" xmlns=""
        xmlns:xs="http://www.w3.org/2001/XMLSchema"
        xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
      <xs:element name="OrderList" msdata:IsDataSet="true">
        <xs:complexType>
          <xs:choice maxOccurs="unbounded">
            <xs:element name="Order">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="OrderID" type="xs:string" minOccurs="0" />
                  <xs:element name="OrderDate" type="xs:date" minOccurs="0" />
                  <xs:element name="ShippingCost" type="xs:string" minOccurs="0" />
                  <xs:element name="Comment" type="xs:string" minOccurs="0" />
                  <xs:element name="Items" minOccurs="0" maxOccurs="unbounded">
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element name="ItemID" nillable="true" minOccurs="0" maxOccurs="unbounded">
                          <xs:complexType>
                            <xs:simpleContent msdata:ColumnName="ItemID_Text" msdata:Ordinal="0">
                              <xs:extension base="xs:string">
                              </xs:extension>
                            </xs:simpleContent>
                          </xs:complexType>
                        </xs:element>
                        <xs:element name="Description" nillable="true" minOccurs="0" maxOccurs="unbounded">
                          <xs:complexType>
                            <xs:simpleContent msdata:ColumnName="Description_Text" msdata:Ordinal="0">
                              <xs:extension base="xs:string">
                              </xs:extension>
                            </xs:simpleContent>
                          </xs:complexType>
                        </xs:element>
                        …

XML Data Example
XML: eXtensible Markup Language.
    <?xml version="1.0"?>
    <!DOCTYPE OrderList SYSTEM "orderlist.dtd">
    <OrderList>
      <Order>
        <OrderID>1</OrderID>
        <OrderDate>3/6/2004</OrderDate>
        <ShippingCost>$33.54</ShippingCost>
        <Comment>Need immediately.</Comment>
        <Items>
          <ItemID>30</ItemID>
          <Description>Flea Collar-DogMedium</Description>
          <Quantity>208</Quantity>
          <Cost>$4.42</Cost>
          <ItemID>27</ItemID>
          <Description>Aquarium Filter &amp; Pump</Description>
          <Quantity>8</Quantity>
          <Cost>$24.65</Cost>
        </Items>
      </Order>
    </OrderList>

XML Example in Explorer
[Figure: the XML data example displayed in Internet Explorer]

Java and JDBC
    import java.sql.*;  // Connection, DriverManager, Statement, ResultSet, SQLException

    public class AnimalList {  // illustrative wrapper class
        public static void main(String[] args) throws SQLException {
            // JDBC URLs start with "jdbc:"; driver name, database, login, and password are placeholders.
            Connection con = DriverManager.getConnection(
                "jdbc:myDriver:myDBName", "myLogin", "myPassword");
            Statement smt = con.createStatement();
            ResultSet rst = smt.executeQuery(
                "SELECT AnimalID, Name, Category, Breed FROM Animal");
            while (rst.next()) {  // advance one row at a time until no rows remain
                int iAnimal = rst.getInt("AnimalID");
                String sName = rst.getString("Name");
                String sCategory = rst.getString("Category");
                String sBreed = rst.getString("Breed");
                // Now do something with these four variables
            }
            // Release the database resources when finished.
            rst.close();
            smt.close();
            con.close();
        }
    }
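The JDBC pattern above can also be written with a PreparedStatement and try-with-resources (Java 7 and later), which closes the connection, statement, and result set automatically and sends a parameter value separately from the SQL text. This is a minimal sketch: the URL and credentials are the slide's placeholders, AnimalQuery, namesInCategory, and describe are hypothetical names, the category value "Dog" is an assumed example, and a JDBC driver for the target DBMS must be on the classpath before main() can run.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class AnimalQuery {
    // The ? placeholder is filled in by setString, so the category is sent as data,
    // not pasted into the SQL text.
    static final String SQL =
        "SELECT AnimalID, Name, Category, Breed FROM Animal WHERE Category = ?";

    // Formats one row as "AnimalID: Name (Breed)".
    static String describe(int id, String name, String breed) {
        return id + ": " + name + " (" + breed + ")";
    }

    // Returns one formatted line per animal in the requested category.
    static List<String> namesInCategory(Connection con, String category) throws SQLException {
        List<String> result = new ArrayList<>();
        // try-with-resources closes the statement and the result set even if an exception occurs.
        try (PreparedStatement stmt = con.prepareStatement(SQL)) {
            stmt.setString(1, category);
            try (ResultSet rst = stmt.executeQuery()) {
                while (rst.next()) {
                    result.add(describe(rst.getInt("AnimalID"), rst.getString("Name"),
                            rst.getString("Breed")));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws SQLException {
        // Placeholder URL and credentials, as on the slide.
        try (Connection con = DriverManager.getConnection(
                "jdbc:myDriver:myDBName", "myLogin", "myPassword")) {
            for (String line : namesInCategory(con, "Dog")) {  // "Dog" is a hypothetical category
                System.out.println(line);
            }
        }
    }
}
```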