Q1: JAWS: Understanding High Performance Web Systems

Introduction

The emergence of the World Wide Web (Web) as a mainstream technology has forced the issue on many hard problems for network application writers with regard to providing a high quality of service (QoS) to application users. Client-side strategies have included client-side caching and, more recently, caching proxy servers. However, the other side of the problem persists: a popular Web server may be unable to handle the request load placed upon it. Some recent implementations of Web servers have been designed to deal specifically with high load, but they are tied to a particular platform (e.g., SGI WebFORCE) or employ a specific strategy (e.g., a single thread of control). I believe that the key to developing high performance Web systems is a design which is flexible enough to accommodate different strategies for dealing with server load and is configurable from a high level specification describing the characteristics of the machine and the expected use load of the server.

There is a related role of server flexibility, namely that of making new services, or protocols, available. The Service Configurator pattern has been identified as a solution for making different services available, with inetd cited as an example of this pattern in use. While a per-process approach to services may be the right abstraction some of the time, a more integrated (yet modular!) approach may allow for greater strategic reuse. That is, a per-process model of services requires each server to redesign and reimplement code which is common to all, while making it difficult to reuse strategies developed for one service in another. To gain ground in this area, the server should be designed so that new services can be easily added and can easily use strategies provided by the adaptive server framework.
But generalizing the notion of server-side service adaptation, one can envision a framework in which clients negotiate with servers about how services should be handled. Most protocols today have been designed so that data manipulation is handled entirely on one side or the other. An adaptive protocol would enable a server and a client to negotiate which parts of a protocol should be handled on each end for optimal performance.

Motivation

Web servers are synonymous with HTTP servers, and the HTTP 1.0 and 1.1 protocols are relatively straightforward. HTTP requests typically name a file; the server locates the file and returns it to the requesting client. On the surface, therefore, Web servers appear to have few opportunities for optimization. This may lead to the conclusion that optimization efforts should be directed elsewhere (such as transport protocol optimizations, specialized hardware, and client-side caching). Empirical analysis reveals that the problem is more complex and the solution space is much richer. For instance, our experimental results show that a heavily accessed Apache Web server (the most popular server on the Web today) is unable to maintain satisfactory performance on a dual-CPU 180 MHz UltraSPARC 2 over a 155 Mbps ATM network, due largely to its choice of process-level concurrency. Other studies have shown that the relative performance of different server designs depends heavily on server load characteristics (such as hit rate and file size).

The explosive growth of the Web, coupled with the larger role servers play on the Web, places increasingly larger demands on servers. In particular, the severe loads servers already encounter handling millions of requests per day will be compounded by the deployment of high speed networks, such as ATM. Therefore, it is critical to understand how to improve server performance and predictability.
Server performance is already a critical issue for the Internet and is becoming more important as Web protocols are applied to performance-sensitive intranet applications. For instance, electronic imaging systems based on HTTP (e.g., Siemens MED or Kodak Picture Net) require servers to perform computationally-intensive image filtering operations (e.g., smoothing, dithering, and gamma correction). Likewise, database applications based on Web protocols (such as AltaVista Search by Digital or Lexis-Nexis) support complex queries that may generate a higher number of disk accesses than a typical Web server.

Modeling

Overview of the JAWS Model

Underlying Assumptions

Infinite network bandwidth. This is consistent with my interests in high-speed networks. For a model of Web servers which limits the network bandwidth, see [Slothouber:96].

Fixed network latency. We assume the contribution of network latency to be negligible. This will be more true with persistent HTTP connections and true request multiplexing.

Client requests are "serialized". This simply means that the server will process successive requests from a single client in the order they are issued by the client.

Research questions

What is performance when the average server rate is constant? What is performance when the average server rate degrades with request rate? What degradation best models actual performance?

Benchmarking

Benchmarking Testbed Overview

Hardware Testbed

Our hardware testbed consisted of one Sun Ultra-1 and four Sun Ultra-2 workstations. The Ultra-1 has 128MB of RAM with a 167MHz UltraSPARC processor. Each Ultra-2 has 256MB of RAM and is equipped with 2 UltraSPARC processors running at 168MHz. Each processor has 1MB of internal cache. All the machines are connected in a regular Ethernet configuration. The four Ultra-2 workstations are connected via an ATM network running through a Bay Networks LattisCell 10114 ATM switch, with a maximum bandwidth of 155Mbps.
One of the Ultra-2 workstations hosted the Web server, while the three remaining Ultra-2 workstations were used to generate requests to benchmark the server. The Ultra-1 workstation served to coordinate the startup of the benchmarking clients and the gathering of data after the end of benchmarking runs.

Software

Request Generator

Request load was generated by the WebSTONE webclient, which was modified to be multithreaded. Each ``child'' of the webclient iteratively issues a request, receives the requested data, issues a new request, and so on. Server load can be increased by increasing the number of webclient ``children''. The results of the tests are collected and reported by the webclients after all the requests have completed.

Experiments

Each experiment consists of several rounds, one round for each server in our test suite. Each round is conducted as a series of benchmarking sessions. Each session consists of having the benchmarking client issue a number of requests (N) for a designated file of a fixed size (Z), at a particular load level beginning at l. Each successive session increases the load by some fixed step value (d) to a maximum load (L). The webclient requests a standard file mix distributed by WebSTONE, which is representative of typical Web server request patterns.

Findings

By far, the greatest impediment to performance is the host filesystem of the Web server. However, factoring out I/O, the primary determinant of server performance is the concurrency strategy. For single-CPU machines, single-threaded solutions are acceptable and perform well. However, they do not scale for multi-processor platforms. Process-based concurrency implementations perform reasonably well when the network is the bottleneck. However, on high-speed networks like ATM, the cost of spawning a new process per request is relatively high. Multi-threaded designs appear to be the choice of the top Web server performers.
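The thread-pool strategy favored by these findings can be sketched in a few lines. The sketch below is illustrative Python, not JAWS code (JAWS itself is built in C++ on ACE): a fixed set of workers is spawned once and pulls requests from a shared queue, so no per-request creation cost is paid.

```python
import queue
import threading

def run_thread_pool(requests, num_workers=4):
    """Serve a batch of requests with a fixed pool of worker threads.

    Workers are created once (avoiding per-request spawn cost) and pull
    work from a shared queue; the queue and the results lock are the
    mutual-exclusion points this strategy pays for.
    """
    work = queue.Queue()
    results = []
    lock = threading.Lock()           # protects the shared results list

    def worker():
        while True:
            req = work.get()
            if req is None:           # sentinel: shut this worker down
                break
            with lock:
                results.append(f"served {req}")
            work.task_done()

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for r in requests:
        work.put(r)
    work.join()                       # wait until every request is served
    for _ in workers:                 # one sentinel per worker
        work.put(None)
    for w in workers:
        w.join()
    return results
```

The function names and the string results are invented for the illustration; the point is only the structure: spawn once, then dispatch many requests.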
The cost of spawning a thread is much cheaper than that of a process. Additional information is available in this paper.

Adaptation

Concurrency Strategies

Each concurrency strategy has positive and negative aspects, which are summarized in the table below. Thus, to optimize performance, Web servers should be adaptive, i.e., be customizable to utilize the most beneficial strategy for particular traffic characteristics, workload, and hardware/OS platforms. In addition, workload studies indicate that the majority of requests are for small files. Thus, Web servers should adaptively optimize themselves to give higher priority to smaller requests. These techniques combined could potentially produce a server that is highly responsive and maximizes throughput. The next generation of the JAWS server plans to implement the prioritized strategy.

Strategy             Advantages                      Disadvantages
Single-threaded      No context switching overhead.  Does not scale for
                     Highly portable.                multi-processor systems.
Process-per-request  More portable for machines      Creation cost high.
                     without threads.                Resource intensive.
Process pool         Avoids creation cost.           Requires mutual exclusion in
                                                     some operating systems.
Thread-per-request   Much faster than fork.          May require mutual exclusion.
                                                     Not as portable.
Thread pool          Avoids creation cost.           Requires mutual exclusion in
                                                     some operating systems.

Summary of Concurrency Strategies

Protocol Processing

There are instances where the contents being transferred may require extra processing. For instance, in HTTP/1.0 and HTTP/1.1, files may have some encoding type. This generally corresponds to a file having been stored in some compressed format (e.g., gzip). In HTTP, it has been customary for the client to perform the decoding. However, there may be cases where the client lacks the proper decoder. To handle such cases, it would be nice if the server would do the decoding on behalf of the client.
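A minimal sketch of such server-side decoding, using Python's gzip module (the function and parameter names are invented for illustration, not JAWS APIs): the server checks whether the client advertises gzip support and decompresses the stored file on the client's behalf when it does not.

```python
import gzip

def prepare_body(stored_bytes, content_encoding, client_accepts):
    """Return (body, encoding) for the response.

    If the file is stored gzip-compressed but the client cannot decode
    gzip, the server decodes on the client's behalf; otherwise it sends
    the compressed bytes through unchanged.
    """
    if content_encoding == "gzip" and "gzip" not in client_accepts:
        return gzip.decompress(stored_bytes), "identity"
    return stored_bytes, content_encoding

# Example: a file stored gzip-compressed on the server
stored = gzip.compress(b"<html>hello</html>")
body, enc = prepare_body(stored, "gzip", client_accepts=[])  # legacy client
```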
A more advanced server may detect that a particularly large file would transfer more quickly for the client in some compressed form. But this kind of processing would require negotiation between the client and the server about which content transformations are possible for the server and acceptable to the client. Thus, the server would be required to adapt to the abilities of the client, as well as the conditions of the network connection.

JAWS Adaptive Web Server

Here we will briefly describe the object-oriented architecture of the JAWS Web server framework. In order to understand the design, it is important to motivate the need for framework architectures.

Solutions to the Reuse Problem

Software reuse is a vital issue in the successful development of large software systems. Software reuse can reduce development effort and maintenance costs. Thus, much effort in software engineering has been devoted to the problem of creating reusable software. The techniques for developing reusable software have evolved through several generations of language features (e.g., structured programming, functional programming, 4GLs, object-oriented programming), compilation tools (e.g., source file inclusion, compiled object files, class libraries, components), and system design methods (e.g., functional design, complexity analysis, formal methods, object-oriented design, design patterns). While each of these techniques helps to facilitate the development and integration of reusable software, their roles are passive. This means that the software developer must decide how to put together the software system from the repository of reusable software. The figure below illustrates the passive nature of these solutions.

Application development with class libraries and design patterns.

The advantage of this approach is that it maximizes the number of available options for software developers.
This can be important in development environments with open-ended requirements, where design flexibility is of premium value. However, the disadvantage is that every new project must be implemented from the ground up every single time. To gain architectural reuse, software developers may utilize an application framework to create a system. An application framework provides reusable software components for applications by integrating sets of abstract classes and defining standard ways that instances of these classes collaborate. Thus, a framework provides an application skeleton which can be customized by inheriting and instantiating from reusable components in the framework. The result is pre-fabricated design at the cost of reduced design flexibility. An application framework architecture is shown in the figure below.

Application development with an application framework.

Frameworks can allow developers to gain greater reuse of designs and code. This comes from leveraging the knowledge of an expert application framework developer who has largely pre-determined which libraries and objects to use, what patterns they follow, and how they should interact. However, a framework is much more difficult to develop than a class library. The design must provide an adequate amount of flexibility and at the same time dictate enough structure to be a nearly complete application. This balance must be just right for the framework to be useful.

The JAWS Web Server Framework

The figure below illustrates the object-oriented software architecture of the JAWS Web server framework. As indicated earlier, our results demonstrate the performance variance that occurs as a Web server experiences changing load conditions. Thus, performance can be improved by dynamically adapting the server behavior to these changing conditions. JAWS is designed to allow Web server concurrency and event dispatching strategies to be customized in accordance with key environmental factors.
These factors include static characteristics, such as support for kernel-level threading and/or asynchronous I/O in the OS, and the number of available CPUs, as well as dynamic factors, such as Web traffic patterns and workload characteristics.

JAWS Framework Overview

JAWS is structured as a framework that contains the following components: an Event Dispatcher, Concurrency Strategy, I/O Strategy, Protocol Pipeline, Protocol Handlers, and Cached Virtual Filesystem. Each component is structured as a set of collaborating objects implemented with the ACE C++ communication framework. The components and their collaborations follow several design patterns, which are named along the borders of the components. Each component plays the following role in JAWS:

Event Dispatcher: This component is responsible for coordinating the Concurrency Strategy with the I/O Strategy. The passive establishment of connections with Web clients follows the Acceptor Pattern. New incoming requests are serviced by some concurrency strategy. As events are processed, they are dispensed to the Protocol Handler, which is parameterized by an I/O strategy. The ability to dynamically bind to a single concurrency strategy and I/O strategy from a number of choices follows the Strategy Pattern.

Concurrency Strategy: This implements concurrency mechanisms (such as single-threaded, thread-per-request, or thread pool) that can be selected adaptively at run-time, using the State Pattern, or pre-determined at initialization-time. Configuring the server as to which concurrency strategies are available follows the Service Configurator Pattern. When concurrency involves multiple threads, the strategy creates protocol handlers that follow the Active Object Pattern.

I/O Strategy: This implements the I/O mechanisms (such as asynchronous, synchronous, and reactive). Multiple I/O mechanisms can be used simultaneously. Asynchronous I/O is implemented utilizing the Asynchronous Completion Token Pattern.
Reactive I/O is accomplished through the Reactor Pattern. Both asynchronous and reactive I/O utilize the Memento Pattern to capture and externalize the state of a request so that it can be restored at a later time.

Protocol Handler: This object allows system developers to apply the JAWS framework to a variety of Web system applications. A Protocol Handler object is parameterized by a concurrency strategy and an I/O strategy, but these remain opaque to the protocol handler. In JAWS, this object implements the parsing and handling of HTTP request methods. The abstraction allows other protocols (e.g., HTTP/1.1 and DICOM) to be incorporated easily into JAWS. To add a new protocol, developers simply write a new Protocol Handler implementation, which is then configured into the JAWS framework.

Protocol Pipeline: This component provides a framework to allow a set of filter operations to be incorporated easily into the data being processed by the Protocol Handler. This integration is achieved by employing the Adapter Pattern. Pipelines follow the Streams Pattern for input processing. Pipeline components are made available with the Service Configurator Pattern.

Cached Virtual Filesystem: This component improves Web server performance by reducing the overhead of filesystem accesses. The caching policy is strategized (e.g., LRU, LFU, Hinted, and Structured) following the Strategy Pattern. This allows different caching policies to be profiled for effectiveness and enables optimal strategies to be configured statically or dynamically. The cache is instantiated using the Singleton Pattern.

Tilde Expander: This mechanism is another cache component that uses a perfect hash table to map abbreviated user login names (e.g., ~schmidt) to user home directories (e.g., /home/cs/faculty/schmidt).
When personal Web pages are stored in user home directories, and user directories do not reside under one common root, this component substantially reduces the disk I/O overhead required to access a system user information file, such as /etc/passwd.

Q2: ECE1770: Trends in Middleware Systems

Course Lecture, Jan 18, 2001

This lecture includes the following:

o Examples of distributed applications
o Problems in developing distributed applications
o Middleware platforms, definitions and characteristics
o Categories of middleware

Examples of Distributed Applications

Distributed applications are applications that are designed to run on distributed systems. Our concern in this course is with network-based or connected distributed systems, such as LANs and WANs, and not clusters or multiprocessor machines. Some of the most popular distributed applications are:

o OLTP (Online Transaction Processing). Online reservation systems in travel agencies are an example of OLTP. In these kinds of systems there is a central system, and there are many other machines connected to the central system from different geographical locations.
o Banking applications
o Database management applications
o Groupware applications like Lotus Notes

In the above systems there are usually clients and servers with a network in between. On the server side there may be databases and database management systems, and on the client side there may be any number of clients with different applications. Client applications may be thick or thin. Thin client programs have very little business logic in them, like an applet or an HTML browser, but thick client programs have a lot of business logic and computation in them, like banking or simulation applications.

Some problems in developing distributed applications:

o Data sharing and concurrency
o Heterogeneity: different operating systems, different platforms, different architectures, different programming languages, and so on.
For instance, transferring data between machines with different architectures needs conversion, because different machines use different data formats. Imagine there is a 64-bit architecture on one side and a 32-bit architecture on the other, or a machine with little-endian binary format on one side and big-endian format on the other. Since these machines interpret information differently, exchanging numbers or other information between these systems requires a special program to do the necessary conversions.
o Reliability in communication: If there is a failure in a single machine, the machine simply crashes, but in distributed systems it is more complicated. For example, if one side sends a request and doesn't get a response from the other side, what should it do in reaction? Is the connection down, or is the server down? Should the caller application wait, or should it send another request?
o Session management and tracking
o Security: This is more severe in distributed systems than in local systems.

There are many other problems that must be addressed for a distributed application to be developed. If one were to implement a distributed system entirely alone, one would have to solve all such problems. Due to these problems we need a standard solution, with modules, libraries, functionality, and services to address them, and this is what we generally refer to as middleware. Middleware tries to address these problems.

Middleware Platforms, definitions and characteristics

Although there are many definitions of middleware, there is still no very clear and exact definition for it. There are some services that used to be part of operating systems and are now considered part of middleware. On the other hand, there are some services that are part of middleware and in the future will be part of operating systems.
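The big-endian/little-endian mismatch listed among the problems above can be made concrete with Python's struct module: the same 32-bit integer has different byte layouts on the two kinds of machines, so wire formats fix one order (conventionally big-endian, "network order") and each end converts.

```python
import struct

value = 0x12345678                     # one 32-bit integer

big    = struct.pack(">I", value)      # big-endian (network order) bytes
little = struct.pack("<I", value)      # little-endian bytes

# The same number, two different byte layouts:
assert big == b"\x12\x34\x56\x78"
assert little == b"\x78\x56\x34\x12"

# A receiver that knows the wire format recovers the value
# regardless of its own native byte order:
assert struct.unpack(">I", big)[0] == value
```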
Some people may attribute other services, such as the TCP/IP protocol that is part of the network layer, to middleware or to a middleware layer. The essential role of middleware is to manage the complexity and heterogeneity of distributed infrastructures and thereby provide a simpler programming environment for distributed application developers. It is therefore most useful to define middleware as any software layer that is placed above the distributed system's infrastructure (the network OS and APIs) and below the application layer. One way of viewing a middleware platform is to look at it as residing in the middle, that is, underneath the application and above the operating system. This is possibly where the term "middleware" came from. The classical definition of an operating system is "the software that makes the hardware usable". Similarly, middleware can be considered to be the software that makes a distributed system programmable. Just as a bare computer without an operating system could be programmed only with great difficulty, programming a distributed system is in general much more difficult without middleware. Middleware is sometimes informally called "plumbing" because it connects parts of a distributed application with data pipes and then passes data between them. It is also sometimes called "glue" technology, because it is often used to integrate legacy components. Middleware provides transparency with respect to implementation language and to the other heterogeneity issues mentioned above. Transparency in this context means that these issues are invisible to the application developer at the implementation level. Any kind of service that fits one of the aforementioned definitions can be considered a middleware platform.

Figure 1.

Early distributed systems technologies such as OSF DCE (Open Software Foundation Distributed Computing Environment) and Sun's RPC (Remote Procedure Call) can be viewed as middleware.
DCE has very good security support; it has a time service and a directory service, but it supports only the C language. Several distributed object platforms have recently become quite popular. These platforms extend earlier distributed systems technologies. These middleware platforms are CORBA (Common Object Request Broker Architecture), DCOM (Distributed Component Object Model), and Java RMI (Remote Method Invocation). The differences between the above middleware platforms lie in the languages and the platforms that they support. CORBA supports multiple languages and multiple platforms. One can use different operating systems in a distributed system built with CORBA. It also supports object orientation. Java RMI supports just one language, Java, but it can run on different platforms provided a Java virtual machine exists on those platforms. It supports object orientation too. DCOM is a Microsoft middleware solution. It runs only on Microsoft's operating systems (Windows et al.) and it supports only Microsoft's programming languages. Therefore it does not support multiple platforms. DCOM is also object oriented. Which is the best? Which one should we use? Which one will survive in the future? There is no simple answer to these questions. Each of the above middleware platforms has strengths and weaknesses.

The characteristics of the new middleware platforms can be summarized as follows:

o Masking heterogeneity in the underlying infrastructure by cloaking system specifics.
o Permitting heterogeneity at the application level by allowing the various components of the distributed application to be written in any suitable language.
o Providing structure for distributed components by adopting object-oriented principles.
o Offering invisible, behind-the-scenes distribution, as well as the ability to know what's happening behind the scenes.
o Providing general-purpose distributed services that aid application development and deployment.

Categories of middleware platforms

o Distributed Tuple Spaces

A distributed relational database offers the abstraction of distributed tuples. Its Structured Query Language (SQL) allows programmers to manipulate sets of these tuples in a declarative language, yet with intuitive semantics and rigorous mathematical foundations based on set theory. Linda is a framework offering a distributed tuple abstraction called Tuple Space (TS). It allows people to publish information into the TS. On one side there are publishers and on the other side there are subscribers. Publishers can publish into the TS, and subscribers can subscribe to an item of interest and use the published materials. The advantage of such a system is that it offers spatial decoupling by allowing depositing and withdrawing processes to be unaware of each other's identities. Publishers (producers) and subscribers (consumers) are decoupled, i.e., they don't have to be in the system at the same time; however, they can work concurrently. Data items flow into and out of the system separately. JavaSpaces is a concept very closely related to Linda's TS. Jini is a network technology built on top of JavaSpaces. Jini network technology provides a simple infrastructure for delivering services in a network and for creating spontaneous interaction between programs that use these services, regardless of their hardware/software implementation.
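The Tuple Space deposit/withdraw decoupling described above can be shown with a toy in-memory version (a sketch only, not the Linda or JavaSpaces API): producers deposit tuples, and consumers withdraw by pattern, with neither side naming the other.

```python
class TupleSpace:
    """Minimal in-memory tuple space: out() deposits a tuple, in_()
    withdraws the first tuple matching a template, where a None field
    in the template is a wildcard."""

    def __init__(self):
        self._tuples = []

    def out(self, tup):
        self._tuples.append(tup)

    def in_(self, template):
        for i, tup in enumerate(self._tuples):
            if len(tup) == len(template) and all(
                t is None or t == f for t, f in zip(template, tup)
            ):
                return self._tuples.pop(i)   # withdrawal removes the tuple
        return None

space = TupleSpace()
space.out(("temperature", "room1", 21))      # producer, unaware of consumers
space.out(("temperature", "room2", 19))
match = space.in_(("temperature", "room2", None))   # consumer, by pattern
```

A real Linda `in` operation blocks until a matching tuple appears; this sketch simply returns None so the decoupling idea stays visible in a few lines.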
Any kind of network made up of services (applications, databases, servers, devices, information systems, mobile appliances, storage, printers, etc.) and clients (requesters of services) of those services can be easily assembled, disassembled, and maintained on the network using Jini technology. Services can be added to or removed from the network, and new clients can find existing services, all without administration.

o RPC (Remote Procedure Call)

RPC offers the abstraction of being able to invoke a procedure whose body is across the network. An RPC is a call sent from one machine or process to another machine or process for some service. An RPC is synchronous, beginning with a request from a local calling program to use a remote procedure and ending when the calling program receives the results from the procedure. An implementation of the Distributed Computing Environment, DCE RPC, includes a compiler that translates an interface definition into a client stub, which marshals a procedure call and its parameters into a packet, and a server stub, which unmarshals the packet into a local server call. The client stub can marshal parameters from a language and machine representation different from the server stub's, thereby enabling interoperation. For example, if the client machine has a 32-bit architecture and the server machine has a 64-bit architecture, and the server is to send an integer to the client, it is the stub that is aware of the differences in data formats and does the conversion. All of this is invisible to the applications (transparency). An RPC implementation also includes a run-time library which implements the protocol for message exchanges on a variety of network transports, enabling interoperation at that level.

Note: There is a problem with transparency! Imagine a client and a server, and a simulation application with a lot of computation on the server side and the visualization on the client side.
The server computes and sends the results to the client machine. Now imagine the server program includes three nested loops as follows:

  for i = 1 to 100
    for j = 1 to 100
      for k = 1 to 100
        <RPC call>

The RPC will be called a million times. Every time the RPC is called, there are some operations that must be done before the actual data is sent, like opening connections, depending on the RPC protocol. In this case a connection will be opened a million times, and each time only a few bytes of information will be sent. The ideal is to open the connection once and send a lot of information. Since the RPC call is invisible to the programmer, the code might become very inefficient. The point is that although transparency is a very good feature, it allows programmers to implement inefficient systems.

o Distributed Object Middleware

This is a refinement of the RPC category. The RPC category is procedure-oriented, and languages like C, Pascal, and Modula are involved. Distributed object middleware is an evolution of RPC with object-oriented technology. It provides the abstraction of an object that is remote, but whose methods can be invoked just like those of an object in the same address space as the caller. CORBA (Common Object Request Broker Architecture) is a standard for distributed object computing. It is part of the OMA (Object Management Architecture), developed by the OMG (Object Management Group). CORBA is considered by most experts to be the most advanced kind of middleware commercially available, and the most faithful to classical object-oriented programming principles.

Figure 2.

DCOM (Distributed Component Object Model) is a distributed object technology from Microsoft that evolved from its OLE (Object Linking and Embedding) and COM (Component Object Model).

Figure 3.

Java has a facility called RMI (Remote Method Invocation) that is similar to the distributed object abstractions of CORBA and DCOM.
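Returning to the nested-loop RPC example above, the waste can be quantified with a simple cost model (the numbers are illustrative units, not measurements of any real RPC library): if each call pays a fixed connection overhead, a million one-value calls cost far more than batched calls carrying many values each.

```python
def total_cost(n_values, batch_size, per_call_overhead=100, per_value_cost=1):
    """Cost of shipping n_values results when each RPC call carries
    batch_size values and pays a fixed per-call overhead. Units are
    arbitrary; only the ratio matters."""
    n_calls = -(-n_values // batch_size)          # ceiling division
    return n_calls * per_call_overhead + n_values * per_value_cost

naive   = total_cost(1_000_000, batch_size=1)     # one RPC per loop iteration
batched = total_cost(1_000_000, batch_size=1000)  # 1000 values per RPC
```

With these assumed costs, the naive version is roughly ninety times more expensive, which is the inefficiency that transparent per-call RPC hides from the programmer.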
RMI is a specification from Sun Microsystems that enables the programmer to create distributed Java-to-Java applications, in which the methods of remote Java objects can be invoked from other Java virtual machines, possibly on different hosts. RMI provides heterogeneity across operating systems and Java vendors, but not across languages.

Figure 4.

Note: What is the advantage of object orientation? The advantage of object orientation can be viewed from the user and application developer perspectives. Some of the advantages from the user perspective are as follows:

o Application objects are presented as objects that can be manipulated in a way that is similar to the manipulation of real-world objects.
o Common functionality in different applications is realized by common shared objects, leading to a uniform and consistent user interface.
o Existing applications can be embedded in an object-oriented environment, and object-oriented technology does not make existing applications obsolete.

Some of the advantages from the application developer's view are as follows:

o Through encapsulation of object data, applications are built in a truly modular fashion.
o It is possible to build applications in an incremental way, preserving correctness during the development process.
o Cost and lead-time can be saved by making use of existing components.

Encapsulation, inheritance, and polymorphism are features of object orientation.

o MOM (Message Oriented Middleware)

MOM provides the abstraction of a message queue that can be accessed across a network. In this system, information is passed in the form of a message from one program to one or more other programs. In MOM there are queues on the client and server sides. These queues can be viewed as mailboxes, storage places, or buffers that are managed by the MOM. Such queues can support persistency.
Persistent queues store and keep their values: if a message arrives and the system is not ready to receive it, the queue guarantees that the message will be available to the system later. The issues with such queues are managing the information or messages in the queue (database issues) and reliability, i.e., ensuring messages are delivered in order even if they arrive out of order. In systems with a lot of request processing and computation, queues are very useful to keep the system from being overwhelmed. Another kind of application that can benefit from persistent queues is one that deals with disconnected operation, i.e., the system is not online all the time. It may be stalled or migrated, or it may be a wireless device that cannot receive information in certain situations. With persistent queues, messages or requests are stored in the queue, and whenever the system becomes available again, it can retrieve the information from the queue. MOM provides temporal and spatial decoupling.

NOTE: Persistency and transiency are two terms in programming languages with the following meanings:

Persistency: the life cycle of data units or objects extends beyond the program's life cycle. The program executes completely, but data units or objects related to it still exist. This becomes possible by using special storage, such as queues, for the data units or objects.

Transiency: data units or objects do not exist after the program has completely executed; only the result may be available.

o TP Monitors (Transaction Processing Monitors)

The main function of a TP monitor is to coordinate the flow of requests between terminals or other devices and application programs that can process these requests. A request is a message that asks the system to execute a transaction. The application that executes the transaction usually accesses resource managers, such as database and communications systems.
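The all-or-nothing behavior a transaction demands can be sketched with SQLite standing in for the resource manager; this is a toy illustration of transactional rollback, not how a real TP monitor is built, and the account data is made up:

```python
import sqlite3

# An in-memory bank with one account table; SQLite plays the role of
# the resource manager whose updates must be all-or-nothing.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO account VALUES (1, 500), (2, 500)")
db.commit()

def transfer(amount, src, dst):
    """Move money between accounts; any failure rolls back both updates."""
    try:
        db.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                   (amount, src))
        if db.execute("SELECT balance FROM account WHERE id = ?",
                      (src,)).fetchone()[0] < 0:
            raise ValueError("insufficient funds")  # simulated mid-transaction failure
        db.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                   (amount, dst))
        db.commit()          # both updates become durable together
    except Exception:
        db.rollback()        # neither update survives

transfer(800, 1, 2)  # fails partway through: it would overdraw account 1
balances = [row[0] for row in
            db.execute("SELECT balance FROM account ORDER BY id")]
print(balances)      # the failed transfer changed nothing
```

The debit had already been applied when the failure occurred, yet after rollback both balances are untouched, which is exactly the unit-of-work guarantee described in the text.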
A typical TP monitor includes functions for transaction management, transactional interprogram communication, queuing, and forms and menu management. In banking applications, imagine there is a crash or power outage during a withdrawal from or deposit to an account. The system must roll back the whole operation and leave the account balance unchanged. In such systems the entire workload is one unit of work (a transaction): either all of it or none of it gets done. TP monitors provide the primitives, functionality and services to realize transactions in a distributed system. It can be said that the concept of middleware started from early TP monitors in OLTP and has evolved into today's middleware concepts. There can also be a combination of message queues and TP that unites the properties of both; it is an evolution of each of the message queue and TP concepts.

o Directory Services

A directory is like a database, but tends to contain more descriptive, attribute-based information. The information in a directory is generally read much more often than it is written. As a consequence, directories don't usually implement the complicated transaction or rollback schemes regular databases use for high-volume complex updates. Directory updates are typically simple, all-or-nothing changes, if they are allowed at all. Directories are tuned to give quick responses to high-volume lookup or search operations. They may have the ability to replicate information widely in order to increase availability and reliability while reducing response time. When directory information is replicated, temporary inconsistencies between the replicas may be acceptable, as long as they eventually get in sync. DNS (Domain Name System) is a distributed database that resides on multiple machines on the Internet and is used to convert between names and addresses and to provide email routing information.
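The replication trade-off described above, fast but possibly stale reads that eventually converge, can be sketched as follows; the class and entry names are purely illustrative:

```python
class DirectoryReplica:
    """Toy read-optimized directory replica (names are illustrative)."""
    def __init__(self):
        self.entries = {}

    def lookup(self, name):
        return self.entries.get(name)   # fast local read, possibly stale

    def apply(self, name, value):
        self.entries[name] = value      # simple all-or-nothing update

primary, replica = DirectoryReplica(), DirectoryReplica()

def update(name, value):
    # Writes go to the primary; replicas catch up later.
    primary.apply(name, value)

def sync():
    # Eventual consistency: push the primary's entries to the replica.
    replica.entries.update(primary.entries)

update("www.example.org", "192.0.2.10")
stale = replica.lookup("www.example.org")   # replica not yet in sync
sync()
fresh = replica.lookup("www.example.org")   # now both replicas agree
print(stale, fresh)
```

Between the update and the sync the replica answers lookups with stale (here, missing) data, which a directory tolerates in exchange for cheap, widely distributed reads.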
LDAP (Lightweight Directory Access Protocol) is a protocol for accessing online directory services. It runs over TCP/IP and is based on entries. An entry is a collection of attributes that has a name, called a distinguished name (DN). The DN is used to refer to the entry unambiguously. Each of the entry's attributes has a type and one or more values. The types are typically mnemonic strings, like "cn" for common name or "mail" for email address. The values depend on the type of attribute. For example, a mail attribute might contain the value "babs@umich.edu", while a jpegPhoto attribute would contain a photograph in binary JPEG/JFIF format. In LDAP, directory entries are arranged in a hierarchical tree-like structure that reflects political, geographic and/or organizational boundaries. Entries representing countries appear at the top of the tree. Below them are entries representing states or national organizations. Below those might be entries representing people, organizational units, printers, documents, or just about anything else you can think of. In addition, LDAP allows you to control which attributes are required and allowed in an entry through the use of a special attribute called objectClass. The values of the objectClass attribute determine the schema rules the entry must obey.

A trader service is like the yellow pages. There is a language for accessing the trader that allows you to specify categories. For instance, one can use a trader to publish a system that has 8 processors with a 64-bit architecture and high-speed interconnects; someone else looking for such a machine can use the trader's language to submit a query, and the trader will return an object reference to the address of that machine.

o Component Oriented Frameworks

Application programming models fall into two categories.
The classic model entails the creation of an application as a single standalone entity, whereas the component model allows the creation of an application as a set of reusable components. Perhaps the most significant recent development is components. Component-based middleware evolves beyond object-oriented software: you develop applications by gluing together off-the-shelf components that may be supplied in binary form by a range of vendors. This type of development has been strongly influenced by Sun's JavaBeans and Microsoft's COM technologies. EJB (Enterprise JavaBeans) is a specification created by Sun Microsystems that defines a framework for server-side Java components. EJB makes distributed object technology more accessible and easier to use by offering a higher level of abstraction, which makes software development more efficient. COM+ is the next generation of DCOM that greatly simplifies DCOM programming. COM refers to both a specification and an implementation, developed by Microsoft Corporation, that provides a framework for integrating components. This framework supports interoperability and reusability of distributed objects by allowing developers to build systems from reusable components, supplied by different vendors, which communicate via COM. By applying COM to build systems of preexisting components, developers hope to reap benefits in maintainability and adaptability. COM defines an application programming interface (API) to allow the creation of components for use in integrating custom applications or to allow diverse components to interact. However, in order to interact, components must adhere to a binary structure specified by Microsoft. As long as they adhere to this binary structure, components written in different languages can interoperate.
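The idea that components interoperate by honoring a published contract can be sketched loosely in Python, with an abstract base class standing in for COM's binary interface standard; the interface and vendor names here are hypothetical:

```python
from abc import ABC, abstractmethod

class ISpellChecker(ABC):
    """Stand-in for a published interface contract: any component that
    implements it can be plugged in, regardless of who supplied it."""
    @abstractmethod
    def check(self, word: str) -> bool: ...

class VendorAChecker(ISpellChecker):
    WORDS = {"middleware", "component"}
    def check(self, word):
        return word in self.WORDS           # dictionary-based implementation

class VendorBChecker(ISpellChecker):
    def check(self, word):
        return word.isalpha()               # a different, interchangeable implementation

def count_ok(checker: ISpellChecker, words):
    # The application is written against the interface only, so either
    # vendor's component can be dropped in without changing this code.
    return sum(checker.check(w) for w in words)

words = ["middleware", "c0m"]
print(count_ok(VendorAChecker(), words), count_ok(VendorBChecker(), words))
```

The application code never names a concrete vendor class, which is the (loose) analogue of assembling a system from binary components that merely share an interface.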
CCM (CORBA Component Model) is the OMG's counterpart: a component model for CORBA.

o Database Access Technology Mediators

ODBC (Open Database Connectivity) was developed to create a single standard for database access in the Windows environment. JDBC is an API for database access. Java's JDBC API provides a shared language through which applications can talk to database engines. It offers a set of interfaces that create a common point at which database applications and database engines can meet.

o Application Servers

Application servers are applications that realize most of the business logic. The application server concept argues that business logic should reside on the server rather than in a thick client. The advent of the PC made possible a dramatic paradigm shift away from the monolithic architecture of mainframe-based applications, in which the user interface, business logic, and data access functionality are all contained in one application. The client/server architecture was in many ways a revolution from the old way of doing things. Despite solving the problems of mainframe-based applications, however, client/server was not without faults. For example, because database access functionality and business logic were often contained in the client component, any change to the business logic, database access, or even the database itself often required deploying a new client component to all users of the application. Usually such changes would break earlier versions of the client components, resulting in a fragile application. The problems with traditional (two-tier) client/server were addressed by the multi-tier client/server architecture.
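The multi-tier separation can be sketched as three small layers in one process; the account data and the withdrawal rule are made up for illustration, and each tier talks only to the one below it:

```python
# --- Database access layer ---
_ACCOUNTS = {"alice": 120, "bob": 45}   # stand-in for a real database

def db_get_balance(user):
    return _ACCOUNTS[user]

# --- Business rules layer ---
def can_withdraw(user, amount):
    # The rule lives on the server side, not in the client: changing it
    # does not require redeploying anything to the users.
    return amount <= db_get_balance(user)

# --- User interface layer ---
def ui_request(user, amount):
    # The client only presents results; it embeds no business logic.
    return "approved" if can_withdraw(user, amount) else "denied"

print(ui_request("alice", 100), ui_request("bob", 100))
```

Because the user interface layer knows nothing about balances or rules, a change to the withdrawal policy or the data store touches only the lower tiers, which is the insulation the multi-tier architecture is after.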
Conceptually an application can have any number of tiers, but the most popular multi-tier architecture is three-tier, which partitions the system into three logical tiers:

o User interface layer
o Business rules layer
o Database access layer

Three-tier client/server enhances the two-tier architecture by further insulating the client from changes in the rest of the application, and hence creates a less fragile application.

o DBMS (Database Management System)

o Web Servers