Chapter 17 QUESTIONS

1. What is the role of clients and servers in distributed processing?
Ans: In distributed processing, clients make requests to servers. Servers perform the requests and communicate the results to the clients.

2. Briefly define the terms flexibility, interoperability, and scalability. How does client-server processing support interoperability, flexibility, and scalability?
Ans: Flexibility refers to the ease of maintaining and adapting a system. Interoperability refers to the ability of two or more systems to exchange and use software and data (open standards). Scalability refers to the ability to add and remove capacity in small units. Client-server processing supports flexibility because volatile sections of code can be isolated from more stable sections. It supports interoperability because open standards promote a marketplace of suppliers, leading to lower costs and higher quality. It supports scalability by allowing processing to be moved between clients and servers and capacity to be added on either side.

3. Discuss some of the pitfalls of developing client–server systems.
Ans: Client–server systems can be more complex to develop because of the range of architectural choices. In addition to architectural issues, the designer may face a difficult decision about building a client-server system on proprietary methods versus open standards.

4. Briefly define the terms scaleup and speedup and the measurement of these terms.
Ans: Scaleup measures how much larger a job can be completed while holding time constant. Scaleup is measured as the ratio of the amount of work completed with the larger configuration to the amount of work completed with the original configuration. With added computing capacity, speedup measures time reduction while holding the task constant.
Speedup is measured as the ratio of the completion time with the original configuration to the completion time with the additional capacity.

Chapter 17: End of Chapter Question Solutions

5. Briefly define high availability and indicate how parallel database processing supports high availability.
Ans: With highly available or fault-resilient computing, a system experiences little downtime and recovers quickly from failures. Parallel database processing allows individual resources to fail without halting processing.

6. How can a distributed database improve data control?
Ans: A distributed database allows the location of data to match an organization's structure. Decisions about sharing and maintaining data can be made locally to provide control closer to the data usage.

7. How can a distributed database reduce communication costs and improve performance?
Ans: A distributed database can lead to lower communication costs and improved performance. Data should be located so that about 80 percent of the requests are local. Local requests incur little or no communication cost or delay compared to remote requests. Increased data availability can also lead to improved performance.

8. Discuss some of the pitfalls when developing distributed databases.
Ans: Distributed database design is very difficult. A poor design can lead to higher communication costs and poor performance.

9. Discuss why distributed processing is more mature and more widely implemented than distributed databases.
Ans: Distributed databases are difficult to implement, and a poor design can lead to higher communication costs and poor performance. In addition, the market offers more mature tools for distributed processing than for distributed databases.

10. Why are division of processing and process management important in client–server architectures?
Ans: Division of processing is the allocation of tasks to clients and servers. The processing load must be balanced across clients and servers for the architecture to be efficient.
Process management ensures interoperability among clients and servers.

11. Explain how two-tier architectures address division of processing and process management.
Ans: A PC client contains the presentation code and the SQL statements for data access; the database server processes the SQL statements and sends the query results back. The database server also handles process management.

12. Explain how three-tier architectures address division of processing and process management.
Ans: Three-tier architectures have the same division of processing as the two-tier approach. Middleware is introduced to handle process management.

13. Explain how multiple-tier architectures address division of processing and process management.
Ans: Additional application servers, which can be invoked from PC clients, middleware, or database servers, provide a finer division of processing. Middleware handles process management.

14. What is a thin client? How does a thin client relate to division of processing in client–server architectures?
Ans: A thin client is a client with inexpensive hardware that handles only minimal tasks, typically presentation. In the division of processing, a thin client places most of the load on the servers.

15. List some reasons for choosing a two-tier architecture.
Ans: Simplicity. A two-tier architecture is a good choice for systems with stable requirements and a moderate number of clients.

16. List some reasons for choosing a three-tier architecture.
Ans: A three-tier architecture improves performance over the two-tier approach by reducing the load on the database server.

17. List some reasons for choosing a multiple-tier architecture.
Ans: The multiple-tier architecture, especially with the generality and power of software buses and frameworks, is the most general client-server architecture. It can be the most difficult to implement because of its generality, especially when using open standards. However, a good design and implementation of a multiple-tier architecture can provide the most benefits in terms of scalability, interoperability, and flexibility.
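The division of processing in the tiered architectures above (questions 11 through 13) can be sketched in miniature. The following Python sketch is purely illustrative, with hypothetical names and a toy in-memory "database": the client tier handles presentation, a middleware function stands in for process management, and a database-server function processes the request and returns results.

```python
# Hypothetical three-tier sketch: presentation (client), process
# management (middleware), and data access (database server).

DATABASE = {"SELECT name FROM customer": ["Alice", "Bob"]}  # toy database tier

def database_server(sql):
    # Database tier: process the SQL statement and return query results.
    return DATABASE.get(sql, [])

def middleware(request):
    # Middle tier: process management -- route the request to a server.
    return database_server(request["sql"])

def client():
    # Client tier: presentation code builds the request and formats the reply.
    rows = middleware({"sql": "SELECT name FROM customer"})
    return ", ".join(rows)

print(client())  # Alice, Bob
```

In a real three-tier system the middleware would also handle routing among multiple servers, queuing, and load balancing rather than calling a single function directly.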
18. What is the Web Services Architecture?
Ans: An architecture that supports electronic commerce among organizations. A set of related Internet standards supports high interoperability among service requestors, service providers, and service registries. The most important standard is the Web Services Description Language (WSDL), used by service requestors, service providers, and service registries.

19. How does the Web Services Architecture support interoperability?
Ans: To support interoperability, the Web Services Architecture uses a collection of interrelated Internet standards, as depicted in Table 17-2. XML (eXtensible Markup Language) is the underlying foundation for most of the standards. XML is a metalanguage that supports the specification of other languages. In the Web Services Architecture, the WSFL, UDDI, WSDL, and SOAP standards are XML-compliant languages. For a service requestor and provider, WSDL is the standard used directly to request and bind a Web service. A WSDL document provides an interface to a service, enabling the service provider to hide the details of providing the service.

20. Briefly describe the basic architectures for parallel database processing.
Ans: The degree of resource sharing determines the architectures for parallel database processing. The standard classification of architectures is shared everything (SE), shared disks (SD), and shared nothing (SN), as depicted in Figure 17.8. In the SE approach, memory and disks are shared among a collection of processors. The SE approach is usually regarded as a symmetric multiprocessing computer rather than a parallel database architecture. In the SD architecture, each processor has its own private memory, but disks are shared among all processors. In the SN architecture, each processor has its own memory and disks. Data must be partitioned among the processors in the SN architecture.
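Partitioning in the SN architecture can be illustrated with a small sketch. This is a hypothetical example (table and column names invented) of hash partitioning, one common scheme for dividing rows so that each processor owns a disjoint subset of the data:

```python
# Hypothetical sketch: hash-partition rows across shared-nothing (SN)
# nodes so that each processor owns its own disjoint subset of the data.
def hash_partition(rows, key, num_nodes):
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[hash(row[key]) % num_nodes].append(row)  # route by partitioning key
    return nodes

accounts = [{"acct_no": i, "balance": 100 * i} for i in range(8)]
nodes = hash_partition(accounts, "acct_no", 3)
# Each row lands on exactly one node: a lookup on acct_no can be routed
# to a single node, while a full scan runs on all nodes in parallel.
```

Real systems choose the partitioning key carefully, since a skewed key distribution leaves some nodes with far more work than others (the load-balancing issue in question 22).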
Partitioning is not necessary in the SD and SE architectures because each processor has access to all data.

21. Briefly describe the clustering extensions to the basic parallel database architectures.
Ans: In the clustered disk (CD) architecture, the processors in each cluster share all disks, but nothing is shared across clusters. In the clustered nothing (CN) architecture, the processors in each cluster share no resources, but the clusters can work in parallel to perform a task.

22. What are the primary design issues for parallel database processing? Identify the architecture most affected by the design issues.
Ans: The primary design issues that influence the performance of the parallel database architectures are load balancing, cache coherence, and interprocessor communication. The clustered nothing architecture is most sensitive to load balancing because of the need for data partitioning. It can be difficult to partition a subset of a database to achieve an equal division of work because data skew is common among database columns. By definition, the cache coherence problem is limited to shared disk architectures.

23. What is the cache coherence problem?
Ans: Cache coherence involves synchronization among local memories and common disk storage. After a processor reads a disk page, the image of this page remains in the cache associated with that processor. An inconsistency occurs if another processor has changed the page in its own buffer. To avoid inconsistencies, when a disk page is accessed, the caches of the other processors should be checked to coordinate changes made in those caches.

24. What is load balancing?
Ans: Load balancing involves the amount of work allocated to the different processors in a cluster. Ideally, each processor has the same amount of work so that the cluster is fully utilized.

25. What parallel database architecture is supported by Oracle Real Application Clusters?
What is a key technology in Oracle Real Application Clusters?
Ans: Oracle RAC uses the clustered disk architecture. Cache Fusion technology in RAC enables synchronized access to the caches across all the nodes in a cluster without incurring expensive disk I/O operations.

26. What parallel database architecture is supported by the IBM DB2 DPF option? What is a key technology in the DPF?
Ans: The DPF (Database Partitioning Feature) uses the clustered nothing architecture. Automatic partitioning is a key technology of the DPF.

27. What is a global request?
Ans: A global request uses data stored at more than one site.

28. How does the integration level of the distributed DBMS affect the component architecture?
Ans: A tightly integrated distributed DBMS can access the internal state of local data managers and thus can be more efficient. A loosely integrated distributed DBMS, in contrast, acts only as middleware to coordinate local data managers.

29. When is a tightly integrated distributed DBMS appropriate? When is a loosely integrated distributed DBMS appropriate?
Ans: When there is a need to work with legacy systems, a loosely integrated distributed DBMS is more appropriate; otherwise, a tightly integrated distributed DBMS is more efficient and a better choice.

30. Discuss the differences in the schema architecture for tightly and loosely integrated distributed DBMSs.
Ans: In a tightly integrated distributed DBMS, all internal schema formats are the same. The two new schemas are for fragmenting the database and locating the fragments. In a loosely integrated distributed DBMS, each site can have a different schema format. A mapping schema is needed to transform data from a local format into a global format that matches the global conceptual schema.

31. How is distributed database transparency related to data independence?
Ans: If database distribution is transparent, users can write queries with no knowledge of data distribution.
In other words, higher transparency means a higher level of data independence.

32. Is a higher level of distribution transparency always preferred? Briefly explain why or why not.
Ans: A higher level of transparency is not necessarily preferred. With higher transparency, users might not perceive the underlying distributed database processing and could write queries or transactions that consume excessive resources. In addition, DBMS support for high levels of transparency may be inefficient.

33. What is a derived horizontal fragment and why is it useful? What is the relationship of the semi-join operator and derived horizontal fragmentation?
Ans: A derived horizontal fragment is a fragment defined with a restriction and a semi-join operation. Because some fragments should contain only rows related to rows in other fragments, the semi-join operator is important for defining fragments.

34. What is the larger difference in query formulation: (1) fragmentation transparency to location transparency or (2) location transparency to local mapping transparency? Justify your answer.
Ans: The difference from fragmentation transparency to location transparency (1) is larger than the difference from location transparency to local mapping transparency (2). Queries and transactions that operate across fragments must be rewritten in a more complex form when moving from fragmentation transparency to location transparency. Moving from location transparency to local mapping transparency, however, is just a matter of adding the site names.

35. Why is fragment design and allocation a complex task?
Ans: Data need to be carefully fragmented and allocated so that most requests are local. A poor design can cause poor performance and poor resource utilization. Designing and allocating fragments is similar to index selection. Data about the frequency of queries, the frequency of parameter values in queries, and the behavior of the global query optimizer are needed.
In addition, data about the frequency of originating sites for each query are needed. The originating site for a query is the site at which the query is stored. Just as for index selection, optimization models and tools can aid decision making about fragment design and allocation.

36. Why is global query optimization important?
Ans: Global optimization involves data movement and site selection decisions that can significantly affect response time and resource consumption.

37. What are differences between global and local optimization in distributed query processing?
Ans: In a centralized environment, minimizing resource usage is consistent with minimizing response time. In a distributed environment, minimizing resource usage may conflict with minimizing response time because of parallel processing opportunities.

38. Why are there multiple objectives in distributed query processing? Which objective seems to be more important?
Ans: Because minimizing resource usage is not the same as minimizing response time, unlike in centralized processing, both objectives are important. Depending on the application, a designer must find a balance between the two.

39. What are the components of performance measures for distributed query processing? What factors influence how these components can be combined into a performance measure?
Ans: The components are communication costs, local processing costs, and local input-output costs. Communication costs involve fixed message delays and variable data transfer delays. The weighting of communication costs versus local costs (input-output and processing) depends on network characteristics. For wide area networks, communication costs can dominate local costs. For local area networks, communication costs are weighted more equally with local costs.

40. How does two-phase locking for distributed databases differ from two-phase locking for centralized databases?
Ans: For distributed databases, lock management must be coordinated across all sites. This can be done either by designating a central coordination site or by distributing lock management among all sites.

41. Why is the primary copy protocol widely used?
Ans: The primary copy protocol reduces overhead at the cost of secondary copies that may not be current. Usually, the improved performance is more important.

42. What kinds of additional failures occur in a distributed database environment? How can these failures be detected?
Ans: The new failures are communication related. They can be detected by coordination among sites.

43. What is the difference between the voting and the decision phases of the two-phase commit protocol?
Ans: In the voting phase, the coordinator writes a Begin-Commit record and waits for each participant to send a vote (ready or abort). The decision phase begins when the coordinator either receives votes from every participant or a timeout occurs. If a timeout occurs or at least one participant votes to abort, the coordinator sends abort messages to each participant. Otherwise, the coordinator sends commit messages to each participant and waits for responses.

44. Discuss the trade-offs between centralized and distributed coordination in distributed concurrency control and recovery.
Ans: Centralized coordination is simpler than distributed coordination but can be less reliable.

45. What level of transparency is provided by Oracle distributed databases?
Ans: Oracle supports local mapping transparency. With the use of synonyms, Oracle supports location transparency. With the use of views, Oracle supports fragmentation transparency.

46. What are database links in Oracle distributed database processing?
Ans: A database link provides a one-way connection from a local database to a remote database. A local database is the database to which a user connects. A remote database is another database that the user wants to access in a global request.
Database links allow a user to access another user's objects in a remote database without having an account on the remote site.
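The voting and decision phases described in question 43 can be sketched as follows. This is a hypothetical simplification of the coordinator's decision rule only, not a full protocol implementation (no logging, message transport, or recovery):

```python
# Hypothetical sketch of the coordinator's decision in two-phase commit.
# votes maps each participant site to its vote; None models a timeout.
def decide(votes):
    # Voting phase result: the coordinator has collected ready/abort votes.
    # Decision phase: commit only if every participant voted ready;
    # a timeout (None) or any abort vote forces a global abort.
    if votes and all(v == "ready" for v in votes.values()):
        return "commit"
    return "abort"

print(decide({"site1": "ready", "site2": "ready"}))  # commit
print(decide({"site1": "ready", "site2": None}))     # abort (timeout)
```

The asymmetry in the rule reflects the protocol: a single abort vote or timeout is enough to abort globally, while a commit requires a unanimous ready vote.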