The Whirlwind Tour

[Course timetable, Aug. 2 to Aug. 6, with sessions at 9:00, 11:00, 13:30, 15:30, and 18:00. Aug. 2: Intro & terminology, Reliability, Fault tolerance, Transaction models, Reception. Topics on Aug. 3 to Aug. 6 include TP monitors and ORBs, logging, resource and transaction managers, files and buffer management, locking, COM+, access paths, CICS, CORBA/EJB, groupware, queueing, replication, performance and TPC, workflow, and Cyberbricks.]

Chapter 1a
© Jim Gray, Andreas Reuter Transaction Processing - Concepts and Techniques WICS August 2 - 6, 1999

Transactions: Where It All Started

[Cuneiform] documents now number about half a million, three-quarters of them more or less directly related to the history of law dealing, as they do, with contracts, acknowledgment of debts, receipts, inventories, and accounts, as well as containing records and minutes of judgments rendered in courts, business letters, administrative and diplomatic correspondence, laws, international treaties, and other official transactions. The total evidence enables the historian to reach back as far as the beginnings of writing, to the dawn of history. [ ... ] Moreover, because of the inconvenience of writing in stone or clay, Mesopotamians wrote only when economic or political necessity demanded it.
(Encyclopaedia Britannica, 1974 edition)

From Transactions to Transaction Processing Systems - I

The Sumerian way of doing business involved two components:
Database. An abstract system state, represented as marks on clay tablets, was maintained. Today, we would call this the database.
Transactions. Scribes recorded state changes with new records (clay tablets) in the database. Today, we would call these state changes transactions.
From Transactions to Transaction Processing Systems - II

[Figure: Reality and Abstraction. Labels: DB, Transaction, DB', Query, Change, Answer.]

The real state is represented by an abstraction, called the database, and the transformation of the real state is mirrored by the execution of a program, called a transaction, that transforms the database.

Transactions Are In ...

Communications: Each time you make a phone call, there is a call setup transaction that allocates some resources to your conversation; the call teardown is a second transaction, freeing those resources. The call setup increasingly involves complex algorithms to find the callee (800 numbers could be anywhere in the world) and to decide who is to be billed (800 and 900 numbers have complex billing). The system must deal with features like call forwarding, call waiting, and voice mail. After the call teardown, billing may involve many phone companies.

Transactions Are In ...

Finance: Each time you purchase gas using a credit card, the point-of-sale terminal connects to the credit card company's computer. In case that fails, it may alternatively try to debit the amount to your account by connecting to your bank. This generalizes to all kinds of point-of-sale terminals such as cash registers, ATMs, etc. When banks balance their accounts with each other (electronic fund transfer), they use transactions for reliability and recoverability.

Transactions Are In ...

Travel: Making reservations for a trip requires many related bookings and ticket purchases from airlines, hotels, rental car companies, and so on.
From the perspective of the customer, the whole trip package is one purchase. From the perspective of the multiple systems involved, many transactions are executed: one per airline reservation (at least), one for each hotel reservation, one for each car rental, one for each ticket to be printed, one for setting up the bill, etc. Along the way, each inquiry that may not have resulted in a reservation is a transaction, too.

Transactions Are In ...

Manufacturing: Order entry, job and inventory planning and scheduling, accounting, and so on are classical application areas of transaction processing. Computer integrated manufacturing (CIM) is a key technique for improving industrial productivity and efficiency. Just-in-time inventory control, automated warehouses, and robotic assembly lines each require a reliable data storage system to represent the factory state.

Transactions Are In ...

Real-Time Systems: This application area includes all kinds of physical machinery that needs to interact with the real world, either as a sensor or as an actuator. Traditionally, such systems were custom made for each individual plant, starting from the hardware. The usual reason was that 20 years ago off-the-shelf systems could not guarantee the real-time behavior that is critical in these applications. This has changed, and so has the feasibility of building entire systems from scratch. Standard software is now used to ensure that the application will be portable.

A Transaction Processing System

A transaction processing system (TP-system) provides tools to ease or automate application programming, execution, and administration of complex, distributed applications.
Transaction processing applications typically support a network of devices that submit queries and updates to the application. Based on these inputs, the application maintains a database representing some real-world state. Application responses and outputs typically drive real-world actuators and transducers that alter or control the state. The applications, database, and network tend to evolve over several decades. Increasingly, the systems are geographically distributed, heterogeneous (they involve equipment and software from many different vendors), continuously available (there is no scheduled downtime), and have stringent response time requirements.

ACID Properties: First Definition

Atomicity: A transaction's changes to the state are atomic: either all happen or none happen. These changes include database changes, messages, and actions on transducers.
Consistency: A transaction is a correct transformation of the state. The actions taken as a group do not violate any of the integrity constraints associated with the state. This requires that the transaction be a correct program.
Isolation: Even though transactions execute concurrently, it appears to each transaction T that others executed either before T or after T, but not both.
Durability: Once a transaction completes successfully (commits), its changes to the state survive failures.

Structure of a Transaction Program

The application program declares the start of a new transaction by invoking BEGIN_WORK(). All subsequent operations will be covered by the transaction.
Eventually, the application program will call COMMIT_WORK() if a new consistent state has been reached. This makes sure the new state becomes durable.
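The bracketing of a transaction program, including the rollback path taken when the program cannot complete properly, can be sketched in a few lines. This is only an illustration, not the interface used in the course: it assumes Python's sqlite3 module, with the SQL statements BEGIN, COMMIT, and ROLLBACK standing in for BEGIN_WORK(), COMMIT_WORK(), and ROLLBACK_WORK(). The accounts table, the transfer function, and the "no negative balance" constraint are hypothetical.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to `dst` as one transaction.

    Returns True if the transaction committed, False if it was rolled back.
    The schema and the consistency rule are assumptions made for this sketch.
    """
    cur = conn.cursor()
    cur.execute("BEGIN")  # stands in for BEGIN_WORK()
    try:
        cur.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        cur.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        # Hypothetical integrity constraint: no account may go negative.
        (lowest,) = cur.execute("SELECT MIN(balance) FROM accounts").fetchone()
        if lowest < 0:
            raise ValueError("consistency constraint violated")
        cur.execute("COMMIT")  # stands in for COMMIT_WORK()
        return True
    except Exception:
        cur.execute("ROLLBACK")  # stands in for ROLLBACK_WORK(): undo everything
        return False


def balances(conn):
    """Return the current state as a dict {account id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))


# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN / COMMIT / ROLLBACK above are the only transaction brackets.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
```

A call such as transfer(conn, "a", "b", 30) commits and changes both balances together, while transfer(conn, "a", "b", 500) violates the constraint and leaves the state exactly as it was before the transaction began.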
If the application program cannot complete properly (violation of consistency constraints), it will invoke ROLLBACK_WORK(), which appeals to the atomicity of the transaction, thus removing all effects the program might have had so far.
If for some reason the application fails to call either commit or rollback (there could be an endless loop, a crash, a forced process termination), the transaction system will automatically invoke ROLLBACK_WORK() for that transaction.

The End User's View of a Transaction Processing System

[Figure: a mail system as seen by the end user. Operations on mail and mailboxes: Logon (name, password), Headers (from, subject), Read Message, Delete Message, Cancel Message, Send Message (text, sound, image). Mailboxes: Bruce, Chris, Andreas, Betty, Jim.]

The Administrator's/Operator's View of a TP System

[Figure: administrator and operator view. Labels: Hong Kong, New York, Berlin, Data Base, Application, Data Comm, Repository, Mail Gateway, Other Mail Systems.]

Performance Measures of Interactive Transactions

Performance/Transaction   Small/Simple   Medium        Complex
Instr./transaction        100k           1M            100M
Disk I/O / TA             1              10            1000
Local msgs. (B)           10 (5KB)       100 (50KB)    1000 (1MB)
Remote msgs. (B)          2 (300B)       2 (4KB)       100 (1MB)
Cost/TA/second            10k$/tps       100k$/tps     1M$/tps
Peak tps/site             1000           100           1

Client-Server Computing: The Classical Idea

[Figure: a workstation client with presentation services invokes Logon, Headers, Read, Send, and Delete on host server(s) through transactional remote procedure calls over data communications. Host labels: TP Monitor, Data Base.]

Client-Server Computing: The CORBA Idea

[Figure: a client on a workstation (presentation services etc.) issues a request (Delete) through an IDL stub; the Object Request Broker delivers it to the IDL skeleton of the object implementation, Jim's mailbox.]

Client-Server Computing: The WWW Idea

[Figure: a WWW browser obtains a Java applet and Java Database Connection (JDBC) driver code from an HTTP server. Paths to the database server: JDBC driver (proprietary protocol); JDBC-ODBC bridge and ODBC driver (proprietary protocol); JDBC network driver (public protocol, e.g. TCP/IP).]

Using Transactional Remote Procedure Calls (TRPCs)

[Figure: time diagram. Labels: User Screen, Client, Network, TP Monitor, Service (server), Database, Network, Another TP-Monitor and Server.]

Terms We Have Introduced So Far

Resource manager: The system comes with an array of transactional resource managers that provide ACID operations on the objects they implement. Database systems, persistent programming languages, and queue managers are typical examples.
Durable state: Application state represented as durable data stored by the resource managers.
TRPC: Transactional remote procedure calls allow the application to invoke local and remote resource managers as though they were local.
They also allow the application designer to decompose the application into client and server processes on different computers.
Transaction program: Inquiries and state transformations are written as programs in conventional or specialized programming languages. The programmer brackets the successful execution of the program with a Begin-Commit pair and brackets a failed execution with a Begin-Rollback pair.

Terms We Have Introduced So Far

Atomicity: At any point before the commit, the application or the system may abort the transaction, invoking rollback. If the transaction is aborted, all of its changes to durable objects will be undone (reversed), and it will be as though the transaction never ran.
Consistency: The work within a Begin-Commit pair must be a correct transformation.
Isolation: While the transaction is executing, the resource managers ensure that all objects the transaction reads are isolated from the updates of concurrent transactions.
Durability: Once the commit has been successfully executed, all the state transformations of that transaction are made durable and public.

The World According to the Resource Manager

[Figure labels: Transaction, Application, Application Servers, Transaction Manager, Resource Managers.]

Where To Split Client/Server?

[Figure: the layers Presentation, Flow Control, Application Logic (= business objects), and Data Access, with the split point ranging from thin client / fat server to fat client / thin server.]

Client/Server Infrastructure

[Figure: client, middleware, and server. Labels: Objects, GUI, OOUI, SQL, Files, ORB, TRPC, Mail, WWW, System Mgmt., Security, Groupware, TP-Mon., DBMS, Transport, OS, etc.]

Transactional Core Services

[Figure labels: Application, Begin_Work(), transid, Work Requests, Resource Manager, Normal Functions, Recovery Manager, Lock Manager, Lock Requests, Join_Work, Log Manager, Log Records, Commit_Work(), Transaction Recovery Functions, Commit Phase 1?, Yes/No, Commit Phase 2, ack, Write Commit Log Record & Force Log.]

The X/Open TP-Model

[Figure: Application, TM (Transaction Manager), and RM (Resource Manager). Labels: Begin, Commit, Abort; Join; Prepare, Commit, Abort; Requests.]

The X/Open Distributed Transaction Processing Model

[Figure: Application, TM (Transaction Manager), CM (Communications Manager), and RM (Resource Manager) on each side, plus Server. Labels: Begin, Commit, Abort; Outgoing; Incoming; Start; Remote Requests; Requests; Prepare, Commit, Abort.]

The OTS Model

[Figure labels: transaction originator, recoverable server, Transaction service, creation, termination, invocation, commit coordination, TA context (transmitted with request).]

Transaction Processing System Feature List

Application development features
Application generators; graphical programming interfaces; screen painters; compilers; CASE tools; test data generators; starter system with a complete set of administrative and operations functions, security, and accounting.

Repository features
Description of all components of the system, both hardware and software.
Description of the dependencies among components (bill-of-material).
Description of all changes to all components to keep track of different versions.
The repository is a database. It must be complete, extensible, and active, and it must allow for local autonomy.

TP-Monitor features
Process management; server classes; transactional remote procedure calls; request-based authentication and authorization; support for applications and resource managers in implementing ACID operations on durable objects.

Transaction Processing System Feature List

Data communications features
Uniform I/O interfaces; device independence; virtual terminal; screen painter support; support for RPC and TRPC; support for context-oriented communication (peer-to-peer).

Database features
Data independence; data definition; data manipulation; data control; data display; database operations.

Operations features
Archiving; reorganization; diagnosis; recovery; disaster recovery; change control; security; system extension.

Education and testing features
Imbedded education; online documentation; training systems; national language features; test database generators; test drivers.

Data Communications Protocols

[Figure: applications sit on a standard interface to all networks, which adds transactions, RPC, naming, security, reliable messages, and a uniform interface.]
[Figure labels, network protocols: SNA LU0, X.25, TCP/IP, Named Pipes, SNA LU6.2 PU2.1, OSI.]

Presentation Management

[Figure: presentation management (PM) between the application, a form description in the repository, and a device description. Example screen: OUR BANK, NAME, PASSWORD. Form description: 1 LOGON, 2 NAME PIC X(20), 2 PIN PIC 9(4). Application steps: READ TERMINAL, CHECK PIN, DISPLAY HELLO OR NO.]

SQL Data Definition

[Figure: table employee (name, dept, loc) and view emp_view (dept, loc). Terminology: TABLE (= file), TUPLE (= record), COLUMN (= field), DOMAIN (= type).]

DEFINE VIEW emp_view AS
SELECT dept, loc
FROM employee
WHERE loc = 7;

SQL Data Manipulation

[Figure: PROJECT (column subset), SELECT (row subset), and JOIN (matching values). Labels: employee (name, dept, loc); address, dept, mgr.]

Summary of Chapter 1

A transaction processing system is a large web of application generators, system design and operation tools, and the more mundane language, database, network, and operations software.
The repository and the applications that maintain it are the mechanisms needed to manage the TP system. The repository is a transaction processing application. It represents the system configuration as a database and supplies change control by transactions that manipulate the configuration and the repository.
The transaction concept, like contract law, is intended to resolve the situation when exceptions arise. The first order of business in designing a system is, therefore, to have a clear model of system failure modes. What breaks? How often do things break?

Basic Terminology

[Course timetable, repeated from the opening slide.]

Chapter 1b

A Word About Words (Chapter 2)

Humpty Dumpty: "When I use a word, it means exactly what I choose it to mean; nothing more nor less."
Alice: "The question is, whether you can make words mean so many different things."
Humpty Dumpty: "The question is, which is to be master, that's all."
Lewis Carroll

Basic Computer Terms

To get any confusion that might be caused by the many synonyms in our field out of the way, let us adopt the following conventions for the rest of this class:
domain = data type = ...
field = column = attribute = ...
record = tuple = object = entity = ...
block = page = frame = slot = ...
file = data set = table = ...
process = task = thread = actor = ...
function = request = method = ...
All the other terms and definitions we need will be briefly introduced and explained during the session.

Basic Hardware Architecture I

In Bell and Newell's classic taxonomy, hardware consists of three types of modules: processors, memory, and communications (switches or wires).
Processors execute instructions from a program, read and write memory, and send data via communication lines.
Computers are generally classified as supercomputers, mainframes, minicomputers, workstations, and personal computers.
However, these distinctions are becoming fuzzy with current shifts in technology.

Basic Hardware Architecture II

Today's workstation has the power of yesterday's mainframe. Similarly, today's WAN (wide area network) has the communications bandwidth of yesterday's LAN (local area network). In addition, electronic memories are growing in size to include much of the data formerly stored on magnetic disk.
These technology trends have deep implications for transaction processing.

Basic Hardware Architecture III

Distributed processing: Processing is moving closer to the producers and consumers of the data (workstations, intelligent sensors, robots, and so on).
Client-server: These computers interact with each other via request-reply protocols. One machine, called the client, makes requests to another, called the server. Of course, the server may in turn be a client to other machines.
Clusters: Powerful servers consist of clusters of many processors and memories, cooperating in parallel to perform common tasks.

Basic Hardware Architecture IV

[Figure: processors and memories connected through the network.]

Memories - The Economic Perspective I

The processor executes instructions from virtual memory, and it reads and alters bytes from the virtual memory. The mapping between virtual memory and real memory includes electronic memory, which is close to the processor, volatile, fast, and expensive, and magnetic memory, which is "far away" from the processor, non-volatile, slow, and cheap.
The mapping process is handled by the operating system with some hardware assistance.
Memory performance is measured by its access time: Given an address, the memory presents the data at some later time. The delay is called the memory access time. Access time is a combination of latency (the time to deliver the first byte) and transfer time (the time to move the data). Transfer time, in turn, is determined by the transfer size and the transfer rate. This produces the following overall equation:

memory access time = latency + (transfer size / transfer rate)

Memories - The Economic Perspective II

Memory price-performance is measured in one of two ways:
Cost/byte. The cost of storing a byte of data in that medium.
Cost/access. The cost of reading a block of data from that medium. This is computed by dividing the device cost by the number of accesses per second that the device can perform. The actual units are cost/access/second, but the time unit is implicit in the metric's name.
These two cost measures reflect the two different views of a memory's purpose: it stores data, and it receives and retrieves data.

Memories - The Economic Perspective III

[Figure: typical large system capacity.]

Memories - The Economic Perspective IV

[Figure: $ / MB.]

Magnetic Memory

There are two types of magnetic storage media: disk and tape. Disks rotate, passing the data in the cylinder by the electronic read-write heads every few milliseconds. This gives low access latency. The disk arm can move among cylinders in tens of milliseconds.
Tapes have approximately the same storage density and transfer rate, but they must move long distances if random access is desired. Consequently, tapes have large random access latencies, on the order of seconds.

Disk Access Time = Seek_Time + Rotational_Latency + (Transfer_Size / Transfer_Rate)

Magnetic Memory

Compare the times required for two access patterns to 1 MB stored in 1000 blocks on disk:
Sequential access: Read or write sectors [x, x + 1, ..., x + 999] in ascending order. This requires one seek (10 ms) and half a rotation (5 ms) before the data in the cylinder begins transferring the megabyte at 10 MBps (the transfer takes 100 ms, ignoring one-cylinder seeks). The total access time is 115 ms.
Random access: Read the 1000 sectors [x, ..., x + 999] in random order. In this case, each read requires a seek (10 ms), half a rotation (5 ms), and then the 1 KB transfer (.1 ms). Since there are 1000 of these events, the total access time is 15.1 seconds.

Memory Hierarchies

[Figure: memory hierarchy ordered by capacity. Levels: processor registers; cache; main memory (electronic storage, current data); online external storage (block addressed, non-volatile, electronic or magnetic); near line (archive) storage (tape or disc robots); off line.]

Memory Hierarchies

The hierarchy uses small, fast, expensive cache memories to cache some data present in larger, slower, cheaper memories. If hit ratios are good, the overall memory speed approximates the speed of the cache.
At any level of the memory hierarchy, the hit ratio is defined as:

hit ratio = references satisfied by cache / all references to cache

Suppose a cache memory with access time C has hit rate H, and suppose that on a miss the secondary memory access time is S. Further, suppose that C = .01 • S. The effective access time of the cache will be as follows:

Effective memory access time = H • C + (1 - H) • S
                             = H • (.01 • S) + (1 - H) • S
                             = (1 - .99 • H) • S
                             ≈ (1 - H) • S

The Five Minute Rule

Assume there are no special response time (real-time) requirements; the decision to keep something in cache is, therefore, purely economic. To make things simple, suppose that data blocks are 10 KB. At 1995 prices, 10 KB of main memory cost about $1. Thus, we could keep the data in main memory forever if we were willing to spend a dollar. With 10 KB of disk costing only $.10, we could save $.90 if we kept the 10 KB on disk. In reality, the savings are not so great; if the disk data is accessed, it must be moved to main memory, and that costs something. How much, then, does a disk access cost? A disk, along with all its supporting hardware, costs about $3,000 (in 1995) and delivers about 30 accesses per second; the cost per access per second, therefore, is about $100. At this rate, if the data is accessed once a second, it costs $100.10 to store it on disk (disk storage and disk access costs). That is considerably more than the $1 to store it in main memory. The break-even point is about one access per 100 seconds. At that rate, the main memory cost is about the same as the disk storage cost plus the disk access costs. At a more frequent access rate, disk storage is more expensive. At a less frequent rate, disk storage is cheaper. Anticipating the cheaper main memory that will result from technology changes, this observation is called the five-minute rule rather than the two-minute rule.
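The break-even arithmetic above can be checked with a short calculation. The sketch below is only a restatement of the slide's 1995 figures ($1 and $.10 per 10 KB block, a $3,000 disk delivering 30 accesses per second); the function and parameter names are made up for this illustration.

```python
def break_even_interval(cache_cost, disk_storage_cost, device_cost, accesses_per_second):
    """Seconds between accesses at which memory and disk cost the same.

    cache_cost and disk_storage_cost are the dollars needed to hold one
    block in main memory and on disk; device_cost / accesses_per_second is
    the dollar cost of sustaining one access per second.
    """
    access_cost = device_cost / accesses_per_second          # $ per (access/second)
    frequency = (cache_cost - disk_storage_cost) / access_cost  # accesses/second
    return 1.0 / frequency


# 1995 figures from the slide: 10 KB blocks, $1 in memory, $.10 on disk,
# a $3,000 disk that delivers 30 accesses per second.
interval = break_even_interval(1.00, 0.10, 3000.0, 30.0)
print(f"break-even: one access every {interval:.0f} seconds")
```

With these inputs the break-even frequency is 0.009 accesses per second, that is, one access roughly every 111 seconds, which the slide rounds to "about one access per 100 seconds".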
The Five Minute Rule

Keep a data item in electronic memory if it is accessed at least once every five minutes; otherwise keep it in magnetic memory.
Similar arguments apply to objects stored on tape and cached on disk. Given the object size, the cost of cache, the cost of secondary memory, and the cost of accessing the object in secondary memory once per second, the frequency at the break-even point in units of accesses per second (a/s) is given by the following formula:

Frequency ≈ ((Cache_Cost/Byte - Secondary_Cost/Byte) • Object_Bytes) / (Object_Access_Per_Second_Cost)  a/s

The Rules of Exponential Growth

Electronic memory (Moore's Law):
MemoryChipCapacity(year) = 4^((year-1970)/3) Kb/chip, for year in [1970...2000]
Magnetic memory (Hoagland's Law):
MagneticAreaDensity(year) = 10^((year-1970)/10) Mb/inch^2, for year in [1970...2000]
Processors (Joy's Law):
SunMips(year) = 2^(year-1984) MIPS, for year in [1984...2000]

Communication Hardware: The Early 90s

The four kinds of networks are defined by their diameters. These diameters imply certain latencies (based on the speed of light). In 1990, Ethernet (at 10 Mbps) was the dominant LAN. Metropolitan networks typically are based on 1 Mbps public lines. Such lines are too expensive for transcontinental links at present; most long-distance lines are therefore 50 Kbps or less. As you will gather from the news, these things are changing fast.

Type of Network                 Diameter    Latency   Bandwidth   Send 1 KB
Cluster                         100 m       .5 µs     1 Gbps      10 µs
LAN (local area network)        1 km        5 µs      10 Mbps     1 ms
MAN (metro area network)        100 km      .5 ms     1 Mbps      10 ms
WAN (wide area network)         10,000 km   50 ms     50 Kbps     210 ms

Communication Hardware: Scenario 2000

Point-to-point bandwidth likely to be common among computers by the year 2000.

Type of Network                 Diameter    Latency   Bandwidth   Send 1 KB
Cluster                         100 m       .5 µs     1 Gbps      5 µs
LAN (local area network)        1 km        5 µs      1 Gbps      10 µs
MAN (metro area network)        100 km      .5 ms     100 Mbps    .6 ms
WAN (wide area network)         10,000 km   50 ms     100 Mbps    50 ms

Processor Architectures

[Figure: processors attached to the network in three arrangements. Labels: Private Memory, Private Memories, Global Memory, Shared Disks / tapes, Shared Memory.]

Processor Architectures

Shared nothing: In a shared-nothing design, each memory is dedicated to a single processor. All accesses to that data must pass through that processor. Processors communicate by sending messages to each other via the communications network.
Shared global: In a shared-global design, each processor has some private memory not accessible to other processors. There is, however, a pool of global memory shared by the collection of processors. This global memory is usually addressed in blocks (units of a few kilobytes or more) and is RAM disk or disk.
Shared memory: In a shared-memory design, each processor has transparent access to all memory. If multiple processors access the data concurrently, the underlying hardware regulates the access to the shared data and provides each processor a current view of the data.
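The shared-nothing rule, that every access to a piece of data passes through the one processor that owns it, can be imitated in software. The sketch below is a toy simulation under stated assumptions, not a description of any real system: Python threads play the role of processors, queues play the role of the communications network, and the names Node, request, and owner are invented for this example.

```python
import queue
import threading


class Node:
    """One simulated processor with private memory, reachable only by message."""

    def __init__(self):
        self.inbox = queue.Queue()   # the node's end of the "network"
        self._memory = {}            # private: touched only by the node's own thread
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        # Handle one request message at a time and answer on the reply queue.
        while True:
            op, key, value, reply = self.inbox.get()
            if op == "put":
                self._memory[key] = value
                reply.put(None)
            else:  # "get"
                reply.put(self._memory.get(key))


def request(node, op, key, value=None):
    """Send a message to the owning node and wait for its reply."""
    reply = queue.Queue(maxsize=1)
    node.inbox.put((op, key, value, reply))
    return reply.get()


nodes = [Node(), Node()]


def owner(key):
    """Hypothetical partitioning rule: each key belongs to exactly one node."""
    return nodes[sum(key.encode()) % len(nodes)]


request(owner("x"), "put", "x", 42)
request(owner("y"), "put", "y", 7)
```

In a shared-memory design the two values would simply live in one structure visible to every processor; here a reader must ask the owner, for example request(owner("x"), "get", "x").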
Address Spaces

[Figure: three process address spaces with shared data segments and shared code segments.]

Address Spaces

Memory segmentation and sharing: A process executes in an address space: a paged, segmented array of bytes. Some segments may be shared with other address spaces. The sharing may be execute-only, read-only, or read-write. Most of the segment slots are empty (lightly shaded boxes), and most of the occupied segments are only partially full of programs or data.
To simplify memory addressing, the virtual address space is divided into fixed-size segment slots, and each segment partially fills a slot. Typical slot sizes range from 2**24 to 2**32 bytes. This gives a two-dimensional address space, where addresses are {segment_number, byte}. Again, segments are often partitioned into virtual memory pages, which are the unit of transfer between main and secondary memory. If an object is bigger than a segment, it can be mapped into consecutive segments of the address space.

Processes

A process is a virtual processor. It has an address space that contains the program the process is executing and the memory the process reads and writes. One can imagine a process executing Java programs statement by statement, with each statement reading and writing bytes in the address space or sending messages to other processes.
Processes provide an ability to execute programs in parallel; they provide a protection entity; and they provide a way of structuring computations into independent execution streams. So they provide a form of fault containment in case a program fails.
Processes are building blocks for transactions, but the two concepts are orthogonal. A process can execute many different transactions over time, and parts of a single transaction may be executed by many processes.

Each process executes on behalf of some user, or authority, and with some priority. The authority determines what the process can do: which other processes, devices, and files the process can address and communicate with. The process priority determines how quickly the process's demand for resources will be serviced if other processes make competing demands. Short tasks typically run with high priority, while large tasks are given lower priority.

Protection Domains

There are two ways to provide protection:

Process = protection domain: Each subsystem executes as a separate process with its own private address space. Applications execute subsystem requests by switching processes, that is, by sending a message to a process.

Address space = protection domain: A process has many address spaces: one for each protected subsystem and one for the application. Applications execute subsystem requests by switching address spaces. The address space protection domain of a subsystem is just an address space that contains some of the caller's segments; in addition, it contains program and data segments belonging to the called subsystem. A process connects to the domain by asking the subsystem or OS kernel to add the segment to the address space. Once connected, the domain is callable from other domains in the process by using a special instruction or kernel call.

[Figure: one process containing the protection domains Application, DataBase, Network, and OS Kernel.]

A process may have many protection domains.

Threads

There is a need for multiple processes per address space: For example, to scan through a data stream, one process is appointed the producer, which reads the data from an external source, while the second process processes the data. Further examples of cooperating processes are file read-ahead, asynchronous buffer flushing, and other housekeeping chores in the system.

Processes can share the same address space simply by having all their address spaces point to the same segments. Most operating systems do not make a clean distinction between address spaces and processes. Thus a new concept, called a thread or a task, is introduced.

But note: Several operating systems do not use the term process at all. For example, in the Mach operating system, thread means process, and task means address space; in MVS, task means process, and so on.

The term thread often implies a second property: inexpensive to create and dispatch. Threads are commonly provided by some software that found the operating system processes to be too expensive to create or dispatch. The thread software multiplexes one big operating system process among many threads, which can be created and dispatched hundreds of times faster than a process.

The term thread is used in the following to connote these lightweight processes. Unless this lightweight property is intended, "process" is used. Several threads usually share a common address space. Typically, all the threads have the same authorization identifier, since they are part of the same address space domain, but they may have different scheduling priorities.
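
The producer/consumer arrangement described above can be sketched with two threads that share one address space. This is only an illustration, not something taken from the slides: the function name run_pipeline, the bounded queue of size 4, and the doubling step that stands in for "processing" are all assumptions.

```python
import queue
import threading


def run_pipeline(source):
    """Run one producer thread and one consumer thread over a shared queue."""
    buf = queue.Queue(maxsize=4)  # bounded buffer living in the shared address space
    results = []                  # written by the consumer thread only
    done = object()               # sentinel that marks the end of the stream

    def producer():
        for item in source:       # stands in for reading an external source
            buf.put(item)         # blocks while the buffer is full
        buf.put(done)

    def consumer():
        while True:
            item = buf.get()      # blocks while the buffer is empty
            if item is done:
                break
            results.append(item * 2)  # placeholder for the real processing step

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Both threads see the same buf and results objects without any copying, which is the point of sharing an address space; the queue supplies the synchronization that the sharing makes necessary.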

Messages and Sessions

There are two styles of communication among processes:

Datagrams: The sender of a message determines the recipient's address (e.g. the process name) and constructs an envelope consisting of the sender's name and address, the recipient's name and address, and the message text. This envelope is delivered to the capable hands of the communication system. It is analogous to sending letters by mail.

Sessions: Before any messages are sent, a fixed connection is established between sender and receiver, a so-called session. Once it has been established, both parties can send and receive messages via this session. This symmetry is often referred to as "peer-to-peer". Establishing a session requires a datagram. A session must at some point be closed down explicitly. It is analogous to a phone conversation.

Advantages of Sessions

Shared state: A session represents shared state between the client and the server. A datagram might go to any process with the designated name, but a session goes to a particular instance of that name.

Authorization: Processes do not always trust each other. The server often checks the client's credentials to see that the client is authorized to perform the requested function. The authentication protocols require multi-message exchanges. Once the session key is established, it is shared state.

Error correction: Messages flowing in each session direction are numbered sequentially. These sequence numbers can detect lost messages and duplicate messages.

Performance: The operations described are fairly costly. Each of the steps often involves several messages. By establishing a session, this information is cached.
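
How sequence numbers detect lost and duplicate messages can be sketched as follows. The class SessionReceiver and its fields are hypothetical names chosen for this illustration, and the sketch assumes that the network never reorders messages; a real session protocol would also request retransmission of the lost ones.

```python
class SessionReceiver:
    """Receiving end of one session direction (hypothetical helper)."""

    def __init__(self):
        self.next_seq = 0     # sequence number expected next
        self.delivered = []   # message texts accepted so far, in order
        self.lost = []        # sequence numbers that were skipped
        self.duplicates = []  # sequence numbers that arrived again

    def receive(self, seq, text):
        """Accept a message; return False if it is a duplicate."""
        if seq < self.next_seq:
            # Already seen (assuming no reordering): a duplicate.
            self.duplicates.append(seq)
            return False
        if seq > self.next_seq:
            # A gap in the numbering: everything in between was lost.
            self.lost.extend(range(self.next_seq, seq))
        self.delivered.append(text)
        self.next_seq = seq + 1
        return True


receiver = SessionReceiver()
for seq, text in [(0, "a"), (1, "b"), (1, "b"), (3, "d")]:
    receiver.receive(seq, text)
```

After this run, receiver.lost holds the skipped number 2 and receiver.duplicates holds the repeated number 1. A datagram receiver has no such per-connection counter, which is why this check counts as an advantage of sessions.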

Clients and Servers

The question of how computations consisting of many interacting processes should be structured has no simple answer. Currently, two styles are particularly popular: peer-to-peer and client-server. The debate about which style is "better" often creates the impression that they are radically different. But in reality, peer-to-peer is more general and more complex, and it subsumes client-server. Here is a brief characterization:

Peer-to-peer: The two processes are independent peers, each executing its computation and occasionally exchanging data with the other.

Client-server: The two processes interact via request-reply exchanges in which one process, the client, makes a request to a second process, the server, which performs this request and replies to the client.

The limitation of the client-server model lies in the fact that it implies a synchronous pattern of one request/one response. There are, however, cases in which one request generates thousands of replies, or where thousands of requests generate one reply. Operations that have this property include transferring a file between the client and server or bulk reading and writing of databases. In other situations, a client request generates a request to a second server, which, in turn, replies to the client.

Parallelism is a third area where simple RPC is inappropriate. Because the client-server model postulates synchronous remote procedure calls, the computation uses one processor at a time. However, there is growing interest in schemes that allow many processes to work on problems in parallel. The RPC model in its simplest form does not allow any parallelism.
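
The synchronous one-request/one-reply pattern can be made concrete with a very small remote procedure call sketch. Everything here is an assumption made for illustration: JSON is used as the message format, the names rpc_call, server_handle, and PROCEDURES are invented, and the "network" is simply a direct function call standing in for a real transport.

```python
import json


def add(x, y):
    return x + y


PROCEDURES = {"add": add}  # server-side table of callable procedures


def server_handle(request_bytes):
    """Server side: unpack the request, call the procedure, pack the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


def rpc_call(proc, *args, transport=server_handle):
    """Client stub: pack and send the request, wait for the reply, unpack it."""
    request_bytes = json.dumps({"proc": proc, "args": list(args)}).encode("utf-8")
    reply_bytes = transport(request_bytes)  # synchronous: the caller blocks here
    return json.loads(reply_bytes.decode("utf-8"))["result"]


z = rpc_call("add", 2, 3)
```

The caller does nothing while transport runs, so one call keeps only one processor busy at a time; that is the limitation the text points out.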

Remote Procedure Calls (RPCs)

[Figure: local versus remote procedure call.
Local procedure call: the caller executes z = add(x,y); the procedure add(int x,y) { return x + y } runs in the same process and returns z.
Remote procedure call: the caller executes z = add(x,y); the client side packs and sends add, x, y; the server unpacks and calls add(int x,y) { return x + y }, then packs and sends x + y; the client side unpacks and returns z.]

Naming

Naming has to do with the problem of how a client denotes a server it wants to invoke. Typical naming schemes distinguish between an object's name, its address, and its location. The name is an abstract identifier for the object, the address is the path to the object, and the location is where the object is.

An object can have several names. Some of these names may be synonyms, called aliases. Let us say that Bruce and Lindsay are two aliases for Bruce Lindsay. For this to be explicit, all names, addresses, and locations must be interpreted in some context, called a directory. For example, in our RPC context, Bruce means Bruce Nelson, and in our publishing context, Bruce means Bruce Spatz. Within the 408 telephone area, Bruce Lindsay's address is 927-1747, and outside the United States it is +1-408-927-1747.

Name Servers

Names are grouped into a hierarchy called the name space. An international commission has defined a universal name space standard, X.500, for computer systems. The commission administers the root of that name space.

Each interior node of the hierarchy is a directory. A sequence of names delimited by a period (.) gives a path name from the directory to the object.

No one stores the entire name space: it is too big, and it is changing too rapidly. Certain processes, called name servers, store parts of the name space local to their neighborhood; in addition, they store a directory of more global name servers.
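
The idea that each name server holds only its neighborhood plus a pointer to more global servers can be sketched as below. This is a deliberately simplified model, not the X.500 protocol: the class NameServer, the single parent link, and the example names and addresses are all hypothetical.

```python
class NameServer:
    """Stores part of the name space and knows one more global server."""

    def __init__(self, parent=None):
        self.entries = {}     # path name -> address, for the local neighborhood
        self.parent = parent  # a more global name server, or None at the root

    def register(self, path, address):
        self.entries[path] = address

    def resolve(self, path):
        """Look locally first, then ask progressively more global servers."""
        server = self
        while server is not None:
            if path in server.entries:
                return server.entries[path]
            server = server.parent
        raise KeyError(path)


# Hypothetical names and addresses, for illustration only.
root = NameServer()
root.register("org.example.archive", "node-17")
local = NameServer(parent=root)
local.register("dept.printer", "node-3")
```

Here local.resolve("dept.printer") is answered from the local entries, while local.resolve("org.example.archive") falls through to the more global server. No server needs the whole name space.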

Authentication Techniques

Passwords are the simplest technique. The client has a secret password, a string of bytes known only to it and the server. The client sends its password to the server to prove the client's identity. A second password is then needed to authenticate the server to the client. Thus, two passwords are required, and they must be sent across the wire.

Challenge-response uses only one password or key. In this scheme, the client and the server share a secret encryption key. The server picks a random number, N, and encrypts it with the key as EN. The server sends EN to the client and challenges the client to decrypt it using the secret key. If the client responds with N, the server believes the client knows the secret encryption key. The client can also authenticate the server by challenging it to decrypt a second random number. The shared secret is stored at both ends, but only random numbers are sent across the wire.

Public key system: Each authid has a pair of keys: a public encryption key, EK, and a private decryption key, DK. The keys are chosen so that DK(EK(X)) = X, but knowing only EK and EK(X) it is hard to compute X. Thus, a process's ability to compute X from EK(X) is proof that the process knows the secret DK. Each authid publishes its public key to the world. Anyone wanting to authenticate the process as that authid goes through the challenge protocol: The challenger picks a random number X, encrypts it with the authid's public key EK, and challenges the process to compute X from EK(X). Secrets are stored in one place only, and they do not go across the wire.
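
The shared-key challenge-response exchange described above can be sketched with the Python standard library. Note the substitution: the standard library has no symmetric cipher, so this sketch does not encrypt and decrypt N as on the slide. Instead the client proves knowledge of the key by returning an HMAC of the random challenge, a common variant with the same property that the secret itself never crosses the wire. The function names are assumptions.

```python
import hashlib
import hmac
import secrets


def make_challenge():
    """Server: pick a fresh random number to send to the client."""
    return secrets.token_bytes(16)


def respond(shared_key, challenge):
    """Client: prove knowledge of the key without revealing it."""
    return hmac.new(shared_key, challenge, hashlib.sha256).digest()


def verify(shared_key, challenge, response):
    """Server: recompute the expected answer and compare in constant time."""
    expected = hmac.new(shared_key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


key = secrets.token_bytes(32)        # shared secret, stored at both ends
challenge = make_challenge()         # travels server -> client
answer = respond(key, challenge)     # travels client -> server
ok = verify(key, challenge, answer)
```

Only challenge and answer are transmitted. Running the same exchange in the opposite direction with a second random number would authenticate the server to the client.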

Scheduling

The purpose of scheduling is to make sure all requests get processed, i.e. are assigned to a specific server process. There are basically two additional constraints:

Short response times: The requests should not wait longer than necessary before they get serviced.

Economic usage of resources: The required throughput should be achieved with the minimum number of resources (processors, nodes, links, etc.).

Throughput and response time at resource utilization r are related by the following formula:

Average_Response_Time(r) = (1 / (1 - r)) • Service_Time

For example, at r = 0.5 the average response time is twice the service time, and at r = 0.9 it is ten times the service time.

[Figure: The Scheduling Problem, Response Time vs Utilization. Vertical axis: response time in multiples of service time (0 to 30); horizontal axis: utilization (0 to 1).]

File Organizations

[Figure: classification of file organizations, with the labels file, unstructured, structured, direct, entry sequenced, associative, relative, key sequenced, and hash.]

SQL in a Distributed Environment

[Figure: a client application program (SQL: set-oriented logic; file system: record logic) and several SQL servers (SQL: set-oriented logic), each layered on a file server (records and files) and on the network (message transport).]

Software Performance

[Figure: cost of common operations on logarithmic scales, in instructions (10 to 1,000,000) and in microseconds (.1 to 100,000, with 10 mips and Ethernet). Operations shown: process create, simple database transaction, main memory transaction, null transaction, WAN rpc, random read/write disc record, random write memory record, LAN rpc, random read memory record, sequential write record, local rpc, process dispatch, sequential read record, domain switch, procedure call, 1KB on Ethernet, 1KB memory copy. WAN transmit delay and disc access are marked on the microsecond scale.]

Protocol Standards

[Figure: Porting and installation steps: a portable program written against an API passes through the compiler and linker/loader to become a "local" compiled program. Operation and inter-operation: a client process on a client machine (Unix operating system) and a server on a server machine (VMS operating system) each run a protocol machine and exchange FAP message formats.]

Relevant FAP-Standards

CSMA/CD, Token Ring, etc.: Low-level protocols that specify how bits are physically transmitted across a shared medium.

IP/TCP, NetBIOS, HTTP: Transport level protocols.

LU6.2: SNA's peer-to-peer protocol that allows both session oriented and client-server-style communication under transaction protection.

OSI-TP: ISO's rendering of a protocol that provides a functionality very similar to LU6.2.

ASN.1: Protocol for exchanging data formatting and structuring information. Required for RPCs in a heterogeneous environment.

DRDA: Interoperability standard for IBM SQL-systems.
ODBC, JDBC: Interoperability standards for general SQL-systems.

Relevant API-Standards

SQL: Portability standard for accessing relational databases (lots of proprietary extensions).

APPC, CPI-C: Two of IBM's APIs for the LU6.2 protocol.

X/Open-XA, X/Open-XA+, etc.: APIs by the X/Open consortium on ISO's OSI-TP protocols.

IDL: OMG's interface definition language to let objects be integrated through an object request broker.

STDL: Language for programming TP-applications; based on the ACMS TP-monitor.

Java: The web's favorite programming language; comes with its own FAP-component.

OSI Standards and X/Open APIs

[Figure: OSI/TP and CCR protocols between two nodes. On each node an application or server issues requests to its resource manager (RM) and calls the transaction manager (TM) with begin, commit, abort. The TM sends prepare, commit, abort to the RM and to the communications manager (CM). The CM tells the TM that a new transid is arriving at, or a transid is leaving, this node, and carries remote requests. The two TMs exchange prepare, commit, abort, +ack, -ack, restart.]

A Last Glance at TP-Standards

PARTICIPANTS           PROTOCOL / API                       DEFINER
application : TM       TX                                   X/Open DTP
application : RM       RM specific (e.g. SQL, Queues)       various
application : server   RPC or ROSE                          OSI + application
TM : RM                XA                                   X/Open DTP
TM : CM                XA+                                  X/Open DTP
TM - TM                OSI-TP + CCR                         OSI

Each resource manager (RM) registers with its local transaction manager (TM). Applications start and commit transactions by calling their local TM. At commit, the TM invokes every participating RM. If the transaction is distributed, the communications manager informs the local and remote TM about the incoming or outgoing transaction, so that the two TMs can use the OSI-TP protocol to commit the transaction.
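
The local part of this interplay (RMs register with the TM, the application asks the TM to commit, the TM invokes every participating RM) can be sketched as follows. This is not the X/Open XA interface: the class and method names are invented, the prepare-then-commit ordering is assumed from the prepare, commit, abort messages named in the figure, and logging, failures, and the distributed TM-to-TM exchange are left out.

```python
class ResourceManager:
    """Hypothetical resource manager that votes on commit."""

    def __init__(self, name, vote=True):
        self.name = name
        self.vote = vote        # what this RM will answer to prepare
        self.state = "active"

    def prepare(self):
        return self.vote

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


class TransactionManager:
    """Hypothetical local transaction manager."""

    def __init__(self):
        self.resource_managers = []

    def register(self, rm):
        self.resource_managers.append(rm)

    def commit(self):
        """Ask every RM to prepare; commit all only if all voted yes."""
        votes = [rm.prepare() for rm in self.resource_managers]
        if all(votes):
            for rm in self.resource_managers:
                rm.commit()
            return True
        for rm in self.resource_managers:
            rm.abort()
        return False


tm = TransactionManager()
database = ResourceManager("database")
message_queue = ResourceManager("queue")
tm.register(database)
tm.register(message_queue)
outcome = tm.commit()
```

With both RMs voting yes, outcome is True and both end in the committed state; a single no vote would leave every registered RM aborted.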

Summary

Transaction processing systems comprise all parts of a system, software and hardware. Building such a system requires considering end-to-end arguments at all levels of abstraction.

The performance of distributed TP systems is influenced by the hardware architecture (what is shared), by software issues (which protocols are used), and by configuration aspects (what limits scalability).

The multitude of those influences gives rise to a constant dilemma: Should one restrict the variety to a few (proprietary) components for better tuning and performance, or should one embrace all the standards for openness, at the risk of poor scalability and performance?