Transaction Processing Monitors: An Overview
Module 2, COP 6730

Overview
• A reference architecture of a transaction-oriented system
• The role of a TP monitor within this framework:
  – services provided by a TP monitor
  – structure of this system component

The Role of TP Monitors
Operating systems, communication systems, etc. are usually not designed for the needs of a transaction-oriented environment.
• A TP monitor provides
  – either essential services absent from the host system, or
  – services the host performed so poorly that a new implementation was required.
• The main function of a TP monitor is to integrate other system components and make them work together to support transaction-oriented processing.

Characteristics of Transaction-Oriented Processing (1)
• Data sharing: computations read and update databases shared among all users.
• Repetitive workload: users do not run arbitrary programs; rather, they request the system to execute certain functions out of a predefined set.
• Mostly simple functions: each consumes 10^5 – 10^7 instructions and does some 10 disk I/Os.
• Variable requests: the request stream exhibits some statistical regularity but cannot be preplanned.

Characteristics of Transaction-Oriented Processing (2)
• Some batch transactions: these have the size and duration of typical batch jobs.
• Many concurrent users: 10^3 – 10^6.
• High availability: because of the large number of users, the system must be highly reliable and available. The system does its own recovery.
• Automatic load balancing: the system should deliver high throughput with guaranteed low response time (a soft real-time system).

Transaction Types
Transaction types are distinguished along three dimensions:
– Direct vs. queued
– Simple vs. complex
– Local vs. distributed

Direct vs Queued Transactions
Direct: the terminal and the process running the server program (handling the request) are associated with each other.
Queued: transactions are put in a queue and scheduled for processing according to the queuing discipline.
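A minimal Python sketch of the two dispatch styles, assuming a hypothetical `handle` function as the server program and FIFO as the queuing discipline (the slides do not fix one). In the direct case the caller gets its reply from the server immediately; in the queued case requests are stored and processed later in queue order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class Request:
    terminal: str
    function: str


def handle(request: Request) -> str:
    """Hypothetical server program: produces the reply for one request."""
    return f"{request.function} done for {request.terminal}"


def run_direct(request: Request) -> str:
    # Direct: the terminal is associated with the server process and
    # waits for the reply to its own request.
    return handle(request)


class QueuedTransactions:
    """Queued: requests are stored first and scheduled later (FIFO assumed)."""

    def __init__(self) -> None:
        self.queue: Deque[Request] = deque()

    def submit(self, request: Request) -> None:
        self.queue.append(request)  # the terminal does not wait for a server

    def run_next(self) -> Optional[str]:
        if not self.queue:
            return None
        return handle(self.queue.popleft())
```

Swapping the FIFO `deque` for a priority queue would model a different queuing discipline without changing the callers.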
[Figure: direct vs. queued request handling, each ending at a server program]

Simple vs Complex Transactions
Simple
– Single message: there is a single input message from the terminal, and upon commit a single output message is delivered.
– Short: the number of objects it touches is in the tens.
Complex
– Conversational: it allows repeated exchanges of messages between the user and the application.
– Long: the number of objects it touches is in the tens of thousands (a batch-like transaction).

Local vs Distributed Transactions
• Local: the transaction runs entirely on the network node where the request originated (centralized processing).
• Distributed: in addition to the local node, the transaction may also invoke services on other nodes.

A Taxonomy of Transaction Execution
Transaction
  Direct
    Single message → local or distributed → Direct OLTP transaction
    Conversational → local or distributed → Complex online transaction
  Queued
    Short → local or distributed → Queued OLTP transaction
    Long → local or distributed → Batch transaction (e.g., ad hoc queries)
OLTP: Online Transaction Processing

Transaction Processing Services
• Transaction services must provide a programming environment that integrates transaction control in a seamless manner.
  – The program need not worry about concurrency, failures, clean-up, and so forth.
• As far as data sharing is concerned, applications can use the services provided by a database system.

Transaction Processing Services (cont.)
Apart from the technical issue of access to shared data, more system services are required:
• Manage heterogeneity: the local transaction mechanisms in each subsystem are not sufficient to ensure the ACID properties for the whole function.
• Control communication: the status of communication sessions must also be subject to transaction control (e.g., transactional RPC).
• Terminal management: since the ACID properties must be perceived by the user, sending and receiving the message must be part of the transaction (e.g., was the response delivered to the user before the failure?).
• Presentation services: if the terminal uses sophisticated presentation services, then re-establishing the window environment after a crash is also part of the transaction guarantee.
• Context management: storing and recovering context must be bound to the sphere of control (SoC).
• Start/restart: the TP monitor must also handle restart after any failure. In doing so, it brings all the subsystems up in a state that is consistent with respect to the ACID rules.

Integrated Control
Database transaction control is not all there is to transaction processing.
Note:
• All components integrated by the transaction services must implement a basic set of protocols that enable them to cooperate in transaction processing.
• Subsystems that support these protocols are called resource managers.

Server and Server Class
• Typically, a number of services are bundled together in one application.
• A server class is a group of processes (servers) that are able to run the code of a given application program.
• At run time, a server class is maintained for each application program.
[Figure: a server class drawn as a group of server processes]
• Executing a service request requires sending the request to a process (a server) of the right server class; this is service invocation.

One Process per Terminal
• All applications are linked together to form one application program.
• At logon, each terminal is given its own process for the entire session (as in time-sharing systems).
[Figure: one process per terminal, each process containing all 100 applications]

One Process per Terminal (cont.)
Problems:
1. Too many capabilities per process: each process comes with more capabilities than a terminal needs.
2. Too many process switches: process switches are very expensive operations in most operating systems (2,000 – 5,000 instructions).
Limitation: acceptable only for small systems with fewer than 200 clients.

Only One Terminal Process
• All terminals talk to one process, which can be the TP monitor process itself.
• The TP monitor process receives the function requests and routes them to the programs that can service them.
[Figure: a single process containing all 100 applications]
Example: CICS (Customer Information Control System) is a transaction server that runs primarily on IBM mainframe systems.

Only One Terminal Process: Advantages & Disadvantages
• Advantages:
  – It makes transaction processing simpler.
  – The TP monitor can check the function requests, schedule them according to its own policies, and so on.
• Disadvantages:
  – Each page fault or other exception in the process stops the whole TP environment.
  – Since a single process can employ only one CPU at a time, the TP system can use only one CPU.
  – The process is confined to one address space, which can be a serious limitation for large applications.

Many Servers, One Scheduler
• One (data communication) process handles all request and response messages.
• There is a group of processes (i.e., a server class) for each application program.
  – Different applications are fenced off from each other.
  – The data communication process routes each service request to the appropriate server.

Many Servers, One Scheduler: Advantages & Disadvantages
• Example: IMS/DC. IMS (Information Management System) is a joint hierarchical database and information management system with extensive transaction processing capabilities.
• Advantage: simplicity! There is one place for scheduling and load control.
• Disadvantage: the data communication process can become a bottleneck.

Many Servers, Many Schedulers
A number of (functionally identical) data communication processes do the terminal handling.
– There is a server class for data communication services.
– The communication service must multiplex itself among the terminals it is attached to (i.e., it is a multi-threaded process).
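The routing step, which is the same whether one or several data communication processes do the scheduling, can be sketched as follows. Server classes are modelled as lists of server names, and round-robin assignment within a class is an assumption for illustration; `ServerClass`, `Scheduler`, and their methods are hypothetical names, not an API of CICS, IMS/DC, or Pathway.

```python
from itertools import cycle
from typing import Any, Dict, Tuple


class ServerClass:
    """A group of server processes that can all run one application program."""

    def __init__(self, application: str, size: int) -> None:
        self.servers = [f"{application}-server-{i}" for i in range(size)]
        self._next = cycle(self.servers)  # round-robin within the class (assumed)

    def pick(self) -> str:
        return next(self._next)


class Scheduler:
    """Hypothetical data communication process routing requests by application."""

    def __init__(self) -> None:
        self.classes: Dict[str, ServerClass] = {}

    def add_class(self, application: str, size: int) -> None:
        self.classes[application] = ServerClass(application, size)

    def route(self, application: str, request: Any) -> Tuple[str, Any]:
        # Each application has its own server class, so applications stay fenced off.
        if application not in self.classes:
            raise LookupError(f"no server class for {application!r}")
        return self.classes[application].pick(), request
```

With a single `Scheduler` every request passes through one object, which is where the one-scheduler bottleneck appears; in the many-schedulers design several such processes would share the server classes.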
[Figure: terminals connected to many data communication processes (presentation services), a monitor process, and many application servers grouped by application, Application 1 … Application n]

Many Servers, Many Schedulers (cont.)
Load control, activation/deactivation of processes, etc. must be coordinated by a separate instance, the monitor process.
• The application server classes are set up as in the "many servers, one scheduler" scenario.
• The application servers can be simple, single-threaded processes. The presentation service processes should be multi-threaded to support multiple terminals.

Many Servers, Many Schedulers: Advantages & Disadvantages
Examples: Tandem’s Pathway, DEC’s ACMS (Application Control Management System).
Advantages:
• The data communication process is no longer a bottleneck.
• Expensive process switches can generally be replaced by much cheaper process-internal thread switches.
Disadvantage: load balancing becomes more difficult.

Tasks of TP Monitors (1)
• Scheduling: service requests must be mapped to the proper servers.
• Server class management: the TP monitor is responsible for setting up the server classes.
• Recovery: after a crash, the TP monitor is responsible for bringing up the TP environment:
  – it starts all the system processes,
  – brings up the server classes, and then
  – passes control to the transaction manager.

Tasks of TP Monitors (2)
• Resource administration: information about the terminals, databases, application programs, users, etc. is kept in a system repository managed by the TP monitor.
• Authentication and authorization: service requests must be cleared by the TP monitor before they are executed.
• System operation: the TP monitor must
  – provide the operators with sufficient information to tune the system, and
  – inform them about any problems that occur during normal operation.

Resource Managers
A resource manager is a software subsystem that ties into the TP monitor to provide protected actions on its state.
It must be able to participate in transaction-oriented recovery.
Example: a transaction program in a "many servers, one scheduler" configuration
    BEGIN WORK
    receive (input message)
    < some SQL >
    send (statistics menu) to (window w1);
    COMMIT WORK;
Annotations: BEGIN WORK starts the sphere of control (SoC); the TRID it establishes is used to tag all subsequent messages; DB2 participates in the transaction through the SQL calls.

Context-Sensitive Scheduling
• The completion of a request typically frees the server so that it can be reassigned to another request.
• However, there are cases in which a server is reserved for a particular user.
Example: for chained transactions, the server must be reserved for the "next" transaction, because that transaction may refer to local context variables available only in that server process.

Transaction Manager (TM)
Once the transaction program has started, the TP monitor has little to do with transaction management. The coordination of the resource managers is done by the transaction manager.

Transaction Manager (TM) (cont.)
We want to separate
• the components exercising transaction control (transaction manager) from
• those that do transaction-oriented resource scheduling (TP monitor).
Reason: there are transactions that do not come in through the TP monitor. Examples:
• the ad hoc query interface of an SQL system;
• CAD applications that run their own terminal environment.
[Figure: queries entering the DBMS directly, alongside requests arriving through the TP environment]

Responsibilities of TP Monitors (1)
• The TP monitor brings up the resource managers upon startup.
• For restart, the TP monitor only has to bring up the resource managers. The actual recovery protocol is handled entirely between the resource managers and the transaction manager.

Responsibilities of TP Monitors (2)
• To dispatch a server for a request, the TP monitor creates a process (or reuses an existing one) and loads the code into it.
• All calls among resource managers are so-called transactional remote procedure calls (TRPCs). The mechanisms to handle them are provided by the TP monitor.
Example: BEGIN_WORK is a TRPC to the transaction manager.
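Context-sensitive scheduling, combined with the create-or-reuse rule for dispatching servers, might look like the sketch below. Server processes are simulated by names, and `ServerClassDispatcher`, `dispatch`, and `complete` are hypothetical names rather than a real TP monitor interface; whether a transaction is chained is simply passed in by the caller.

```python
from typing import Dict, List


class ServerClassDispatcher:
    """Hypothetical TP-monitor dispatcher for one server class.

    Servers are plain names here; a real TP monitor would create OS
    processes and load the application code into them."""

    def __init__(self, application: str) -> None:
        self.application = application
        self.free: List[str] = []           # idle servers that can be reused
        self.reserved: Dict[str, str] = {}  # user -> server holding chained context
        self.created = 0

    def dispatch(self, user: str) -> str:
        if user in self.reserved:           # chained transaction: same server again
            return self.reserved.pop(user)
        if self.free:                       # reuse an existing server
            return self.free.pop()
        self.created += 1                   # otherwise "create" a new one
        return f"{self.application}-{self.created}"

    def complete(self, user: str, server: str, chained: bool) -> None:
        if chained:
            self.reserved[user] = server    # keep its local context variables
        else:
            self.free.append(server)        # can be reassigned to any request
```

A reserved server is unavailable to everyone else until the chained transaction returns, which is the price of keeping context in the server process.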
Transactional Remote Procedure Call (TRPC)

Remote Procedure Call (RPC)
An RPC system enables a client program to communicate with server programs on different computers by calling procedures in a way similar to the conventional use of procedure calls in a high-level language.
[Figure: a client program on Computer 1 calling server programs on Computer 1 and Computer 2]

Export/Import Service
– Export procedures: at the RPC level, a service may be viewed as a module with an interface that exports a set of procedures appropriate for operating on some data abstraction or resource.
[Figure: a server exporting Procedures 1–3 that operate on a resource]
– Import procedures: from the perspective of client programs, a service provides the same facilities as a software module, enabling clients to import its procedures.
[Figure: a client importing one of the server's procedures]

Marshalling
• Marshalling is the process of taking a collection of data items and assembling them into a form suitable for transmission in a message:
  – flatten structured data items into a sequence of basic data items;
  – translate those data items into an external data representation.

Unmarshalling
• Unmarshalling is the process of disassembling a message on arrival to produce an equivalent collection of data items at the destination:
  – translate the external data representation to the local one;
  – unflatten the data items.

Message Destinations
• Potential clients need to know an identifier for communicating with a server.
• In the Internet protocols, the destination address of a message is specified as
  – a port number used by a process, and
  – the Internet address of the computer on which it runs.
[Figure: Send (p, message) delivers a message to port p at an Internet address, where the receiving process calls Receive (p, message); a second port q is also shown]

RPC: Main Tasks
The software that supports remote procedure calling has three main tasks:
– Binding: locating an appropriate server for a particular service.
– Communication handling: transmitting and receiving request and reply messages.
– Interface processing: integrating the RPC mechanism with client and server programs in conventional programming languages:
  • dispatching of request messages to the appropriate procedure in the server;
  • marshalling and unmarshalling of arguments in the client and the server.

Stub Procedure
An RPC system provides a stub procedure to stand in for each remote procedure that is called by the client program.
[Figure: client process (client, client stub procedure, communication module) and server process (communication module, dispatcher, server stub, service procedure). Client: local call → marshal arguments → send request. Server: receive request → select procedure → unmarshal arguments → execute procedure → return → marshal results → send reply. Client: receive reply → unmarshal results → local return]

Client Stub Procedure
The purpose of a client stub procedure is to convert a local procedure call into a remote procedure call to the server. It must
– marshal the arguments and pack them up with the procedure identifier into a message,
– send the message to the server and then await the reply message, and
– unmarshal the reply and return the results.

Server Stub Procedure
An RPC system provides a dispatcher and a set of server stub procedures.
Dispatcher: uses the procedure identifier in the request message to select one of the server stub procedures and passes on the arguments.
Server stub procedure:
– unmarshals the arguments,
– calls the appropriate service procedure, and
– when it returns, marshals the output arguments into a reply message.
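The division of labour between client stub, communication module, dispatcher, and server stub can be sketched in Python. Assumptions: JSON serves as the external data representation, a request message is the procedure identifier, a newline, and the JSON-encoded argument list, and the communication module is replaced by an in-process call where a real RPC system would send the bytes over the network.

```python
import json
from typing import Any, Callable, Dict, List


# ---- server side --------------------------------------------------------
def add(a: int, b: int) -> int:
    """A service procedure exported by the server."""
    return a + b


SERVICE_PROCEDURES: Dict[str, Callable[..., Any]] = {"add": add}


def server_stub(procedure: Callable[..., Any], raw_args: bytes) -> bytes:
    args: List[Any] = json.loads(raw_args)  # unmarshal the arguments
    result = procedure(*args)               # call the service procedure
    return json.dumps(result).encode()      # marshal the result into a reply


def dispatcher(request: bytes) -> bytes:
    # The procedure identifier precedes the first newline of the message.
    proc_id, _, raw_args = request.partition(b"\n")
    return server_stub(SERVICE_PROCEDURES[proc_id.decode()], raw_args)


# ---- client side --------------------------------------------------------
def communication_module(request: bytes) -> bytes:
    """Stand-in for send request / receive reply; calls the server in-process."""
    return dispatcher(request)


def add_client_stub(a: int, b: int) -> int:
    request = b"add\n" + json.dumps([a, b]).encode()  # marshal with procedure id
    reply = communication_module(request)             # send and await the reply
    return json.loads(reply)                          # unmarshal the result
```

To the client program, `add_client_stub(2, 3)` looks like an ordinary local call even though the arguments travel as bytes.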
Remote Procedure Calls (RPCs)
[Figure: the caller (client) makes 1. a subroutine call to its RPC stub, which sends 2. a request message to the RPC stub at the callee (server), which makes 3. a subroutine call to the service routine]
• RPC makes the invocation of services at remote nodes look like local subroutine calls.
• The RPC stub on the callee's side acts fully complementary to the stub on the caller's side.

Interface Definition
• The types of the arguments and results in the client stub must conform to those expected by the server stub. This is achieved by using a common interface definition.
• An RPC interface definition specifies those characteristics of the procedures provided by a server that are visible to the server's clients:
  – the names of the procedures, and
  – the types of their parameters.

Interface Compilers
[Figure: an interface definition (written in an Interface Definition Language) is processed by interface compiler A into the client stub and by interface compiler B into the server stub and dispatcher; compiler A and a linker combine the client program with the client stub, and compiler B and a linker combine the server program with the server stub]
Interface compilers can be designed to process interfaces for use with different languages, enabling clients and servers written in different languages to communicate by using RPCs.
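A shared interface definition can be sketched as a table of procedure names and parameter types that both the client stub and the server side consult. This builds stubs at run time, as a stand-in for an interface compiler that would generate stub source code at build time; `INTERFACE`, `make_client_stub`, and `make_server_dispatch` are hypothetical names, and the transport is an in-process function.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical interface definition shared by client and server:
# procedure name -> (parameter types, result type).
INTERFACE: Dict[str, Tuple[List[type], type]] = {
    "deposit": ([str, int], int),
}

Transport = Callable[[str, Tuple[Any, ...]], Any]


def check_args(name: str, args: Tuple[Any, ...]) -> None:
    params, _ = INTERFACE[name]
    if len(args) != len(params) or not all(isinstance(a, t) for a, t in zip(args, params)):
        expected = ", ".join(t.__name__ for t in params)
        raise TypeError(f"{name}({expected}) called with {args!r}")


def make_client_stub(name: str, send: Transport) -> Callable[..., Any]:
    """Build a client stub from the interface definition."""
    _, result_type = INTERFACE[name]

    def stub(*args: Any) -> Any:
        check_args(name, args)  # client side conforms to the definition
        result = send(name, args)
        if not isinstance(result, result_type):
            raise TypeError(f"{name} returned {type(result).__name__}")
        return result

    return stub


def make_server_dispatch(impls: Dict[str, Callable[..., Any]]) -> Transport:
    """Server side: checks the same definition before calling the implementation."""

    def dispatch(name: str, args: Tuple[Any, ...]) -> Any:
        check_args(name, args)
        return impls[name](*args)

    return dispatch
```

Because both sides read the same `INTERFACE` table, a type mismatch is caught before any procedure runs, which is the role the common interface definition plays.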
Invocation of SQL Resource Manager (SQL Pre-compiler)
1. Server side: the SQL pre-compiler parses and translates the SQL statement into an internal representation that can be interpreted directly by the SQL executor.
2. Client side: the pre-compiler also generates code in the host language to call the SQL server:
    !sqlselect('fastsql', format_CB, expression_CB, &variable_CB);
   Annotations: the call as a whole is a resource manager invocation recognized by the stub compiler; sqlselect is the entry point; 'fastsql' is the resource manager name (RMNAME); the remaining arguments are the parameters.
[Figure: a SELECT … FROM … WHERE … query passes through the precompiler, which produces the internal representation for the executor and the host-language call]

Execution Plans
Embedded SQL is compiled once; from then on, the generated query plan is executed.
• At compile time, the client has to issue an rmCall to the SQL server so that the server compiles the statement.
• The SQL server compiles the statement, generates the access plan, and hands back an ID for that plan.
• At run time, the rmCalls from the client refer to the access plan ID and thereby ask the server to run that pre-compiled query.
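The compile-once, run-by-plan-ID protocol can be illustrated with a hypothetical `SQLServer` class. Python's built-in `sqlite3` module is used only as a stand-in executor, and the stored "plan" is just the statement text rather than a real access plan; `compile` and `run` play the roles of the compile-time and run-time rmCalls.

```python
import itertools
import sqlite3
from typing import Any, Dict, List, Sequence, Tuple


class SQLServer:
    """Hypothetical SQL resource manager that hands out access-plan IDs.

    sqlite3 is only a stand-in executor; the stored "plan" is the
    statement text, not a real compiled access plan."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.plans: Dict[int, str] = {}
        self.compilations = 0
        self._ids = itertools.count(1)

    def compile(self, statement: str) -> int:
        # Compile-time rmCall: done once per embedded statement.
        self.compilations += 1
        plan_id = next(self._ids)
        self.plans[plan_id] = statement
        return plan_id

    def run(self, plan_id: int, params: Sequence[Any] = ()) -> List[Tuple[Any, ...]]:
        # Run-time rmCall: refers only to the plan ID plus host-variable values.
        return self.db.execute(self.plans[plan_id], params).fetchall()
```

The client keeps only the integer plan ID, so later calls carry no SQL text at all.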
Binding
An interface definition specifies a textual service name for a server. However, client request messages must be addressed to a server port.
• Look-up: when a client process starts, it sends a message to the binder requesting it to look up the identifier of the server port of a named service.
• Registration: when a server process starts executing, it sends a message to the binder requesting it to register its service name and server port.
[Figure: the server registers its service name and server port with the binder (name service); the client sends a service name to the binder and receives the server port]

Transactional RPC (TRPC)
TP monitors provide the mechanism to handle RPCs. In addition, TP monitors turn each RPC into a TRPC:
• Bind RPCs to transactions: each RPC is tagged with a TRID.
• Inform the transaction manager: the TP monitor makes sure that the transaction manager knows the callee is participating in a transaction (i.e., the sphere of control expands).
• Bind processes to transactions: when dispatching a server, the TP monitor remembers the transaction for which the server is running and thus can inform the transaction manager if that process crashes.

TP Monitors & O.S.
TP monitors allocate resources for other system components to do the work, rather than doing the work themselves.
– Their tasks are similar to the duties of an operating system.
– Some believe it would be best if the operating system simply swallowed the TP monitor.

Summary
The TP monitor's function is twofold:
1. It extends standard RPC mechanisms to include server class management.
2. It provides the transaction manager with enough information to keep the dynamically expanding web of resource managers participating in a transaction within a sphere of control.

Dynamics of TRPC (1)
1. Bind the RMNAME in the invocation to a NODEID and an RMID; this information is obtained from the name server.
2. Look up the callee's interface prototype description (in the repository).
3. Coerce* the local parameter representation into the one expected by the invoked resource manager.
4. Pack all the transformed parameter values into a byte string (parameter marshalling).
*E.g., mapping the data type from big-endian (most significant byte at the smallest address) to little-endian (least significant byte at the smallest address).

Dynamics of TRPC (2)
5. Send the message to the peer TRPC stub.
6. The caller is now suspended until the response from the server arrives.
7. When the response arrives, unpack the byte string (reverse marshalling).
8. Coerce the received parameter values into the representation used by the caller.
Note: "Client makes it right": coercing the parameter values is done at the caller's site. "Server makes it right": coercing is done at the server's site.
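The eight TRPC steps, together with binder registration and look-up, are sketched below using Python's `struct` module for coercion and marshalling. The `Binder` class, the repository contents, the resource manager ID 42, and the message layout (an 8-byte TRID and a 4-byte RMID header followed by the parameters) are assumptions for illustration, and the remote resource manager is replaced by an in-process function. The caller converts to the callee's byte order, i.e. the "client makes it right" variant.

```python
import struct
from typing import Dict, Tuple


class Binder:
    """Hypothetical name server: maps an RMNAME to (NODEID, RMID)."""

    def __init__(self) -> None:
        self._table: Dict[str, Tuple[str, int]] = {}

    def register(self, rmname: str, node_id: str, rm_id: int) -> None:
        self._table[rmname] = (node_id, rm_id)  # done by the server at start-up

    def lookup(self, rmname: str) -> Tuple[str, int]:
        return self._table[rmname]              # done by the caller before the call


# Hypothetical repository of interface prototypes, as struct format strings:
# ">" = big-endian byte order expected by the callee, "q" = 8-byte signed int.
REPOSITORY: Dict[int, str] = {42: ">qq"}
HEADER = ">qi"  # TRID (8 bytes) and RMID (4 bytes) tag every request


def callee_add(message: bytes) -> bytes:
    """Stand-in for the remote resource manager: adds its two parameters."""
    trid, rm_id = struct.unpack_from(HEADER, message)
    a, b = struct.unpack_from(REPOSITORY[rm_id], message, struct.calcsize(HEADER))
    return struct.pack(">qq", trid, a + b)      # reply carries the same TRID


def trpc(binder: Binder, trid: int, rmname: str, *args: int) -> int:
    node_id, rm_id = binder.lookup(rmname)      # 1. bind RMNAME to NODEID, RMID
    fmt = REPOSITORY[rm_id]                     # 2. look up the callee's prototype
    # 3-4. struct.pack coerces each value into the callee's byte order and
    #      marshals everything, tagged with the TRID, into one byte string.
    request = struct.pack(HEADER, trid, rm_id) + struct.pack(fmt, *args)
    # 5-6. node_id would select the destination; here the call is in-process.
    reply = callee_add(request)
    reply_trid, result = struct.unpack(">qq", reply)  # 7. unmarshal the reply
    if reply_trid != trid:
        raise RuntimeError("reply belongs to a different transaction")
    return result                               # 8. already a native Python int
```

For example, `struct.pack('>H', 1)` yields `b'\x00\x01'` while `struct.pack('<H', 1)` yields `b'\x01\x00'`, which is exactly the big-/little-endian difference that step 3 must bridge.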