EMTM 553: E-commerce Systems Lecture 4: Performance and Scalability Insup Lee Department of Computer and Information Science University of Pennsylvania lee@cis.upenn.edu www.cis.upenn.edu/~lee 12/15/00 EMTM 553 1 System Architecture Tier 1 Tier 2 Tier 3 Tier N DMS Client 12/15/00 Web Server Application Server EMTM 553 Database Server 2 Goals and Approaches • Internet-based E-Commerce has made system growth more rapid and dynamic. • Improving performance and reliability to provide – Higher throughput – Lower latency (i.e., response time) – Increase availability • Approaches – Scaling network and system infrastructure o How performance, redundancy, and reliability are related to scalability – Load balancing – Web caching 12/15/00 EMTM 553 3 Network Hardware Review • Firewall – Restricts traffic based on rules and can “protect” the internal network from intruders • Router – Directs traffic to a destination based on the “best” path; can communicate between subnets • Switch – Provides a fast connection between multiple servers on the same subnet • Load Balancer – Takes incoming requests for one “virtual” server and redirects them to multiple “real” servers SOURCE: SCIENT 12/15/00 EMTM 553 4 Case Study: Consumer Retail eBusiness Internet Data Circuit Router Switch Web Server Database Server SOURCE: SCIENT 12/15/00 EMTM 553 5 Scaling the existing network Internet Data Circuit Router Switch Web Server Web Server Web Server Web Server Web Server Web Server Database Database Server Server SOURCE: SCIENT 12/15/00 EMTM 553 6 Initial Redesign Internet FIREWALL ROUTER LOAD BALANCER SWITCH VLAN2 Firewall Router Load Balancer Catalyst VLAN2 WEB SERVERS SWITCH VLAN3 APPLICATION FIREWALL SWITCH VLAN4 Catalyst VLAN3 Firewall Catalyst VLAN4 DATABASE SERVERS SOURCE: SCIENT 12/15/00 EMTM 553 7 The Unforseen Redesign: Internet FIREWALL ROUTERS SWITCHES VLAN1 ary n Prim ectio nn Co Firewall Router Primary Red u Con ndant nec tion Firewall Router Backup Catalyst 1 VLAN1 Catalyst 2 VLAN1 LOAD BALANCERS Load Balancer Load Balancer SWITCHES VLAN2 Catalyst 1 VLAN2 Catalyst 2 VLAN2 Catalyst 1 VLAN3 Catalyst 2 VLAN3 WEB SERVERS SWITCHES VLAN3 FIREWALLS Primary FW SWITCHES VLAN4 Catalyst 1 VLAN4 Backup FW Catalyst 2 VLAN4 DATABASE SERVERS SOURCE: SCIENT 12/15/00 EMTM 553 8 How to get rid of the last SPOF: Internet nn ecti o n d Re da un nt e nn Co ctio n Firewall Router Backup Co Firewall Router Primary Redund ary California n nectio ant Co nnectio n New York Prim y Con Primar Firewall Router Primary Firewall Router Backup Catalyst 1 VLAN1 Catalyst 2 VLAN1 Catalyst 1 VLAN1 Catalyst 2 VLAN1 Load Balancer Load Balancer Load Balancer Load Balancer Catalyst 1 VLAN2 Catalyst 2 VLAN2 Catalyst 1 VLAN2 Catalyst 2 VLAN2 Catalyst 1 VLAN3 Catalyst 2 VLAN3 Catalyst 1 VLAN3 Catalyst 2 VLAN3 Primary FW Catalyst 1 VLAN4 Backup FW Catalyst 2 VLAN4 Primary FW Catalyst 1 VLAN4 Backup FW Catalyst 2 VLAN4 SOURCE: SCIENT 12/15/00 EMTM 553 9 Scaling Servers: Two Approaches • Multiple smaller servers – Add more servers to scale – Most commonly done with web servers • Fewer larger servers to add more internal resources – Add more processors, memory, and disk space – Most commonly done with database servers SOURCE: SCIENT 12/15/00 EMTM 553 10 Where to Apply Scalability • To the network • To individual servers • Make sure the network has capacity before scaling by adding servers SOURCE: SCIENT 12/15/00 EMTM 553 11 Performance, Redundancy, and Scalability • People scale because they want better performance • But a fast site that goes down because of one component is a Bad Thing • Incorporate all three into your design in the beginning more difficult to do after the eBusiness is live SOURCE: SCIENT 12/15/00 EMTM 553 12 Lessons Learned • Don’t take the network for granted. • Most people don’t want to think about the network - they just want it to work. • You will never know everything up front so plan for changes. • Warning: as you add redundancy and scalability the design becomes complicated. SOURCE: SCIENT 12/15/00 EMTM 553 13 Approaches to Scalability • Application Service Providers (sites) grow by – scale up: replacing servers with larger servers – scale out: adding extra servers • Approaches – – – – – Farming Cloning RACS Partitioning RAPS • Load balancing • Web caching 12/15/00 EMTM 553 14 Farming • Farm - the collection of all the servers, applications, and data at a particular site. – Farms have many specialized services (i.e., directory, security, http, mail, database, etc.) 12/15/00 EMTM 553 15 Cloning • A service can be cloned on many replica nodes, each having the same software and data. • Cloning offers both scalability and availability. – If one is overloaded, a load-balancing system can be used to allocate the work among the duplicates. – If one fails, the other can continue to offer service. 12/15/00 EMTM 553 16 Two Clone Design Styles •Shared Nothing is simpler to implement and scales IO bandwidth as the site grows. •Shared Disc design is more economical for large or update-intensive databases. 12/15/00 EMTM 553 17 RACS • RACS (Reliable Array of Cloned Services) – a collection of clones for a particular service – shared-nothing RACS o each clone duplicates the storage locally o updates should be applied to all clone’ s storage – shared-disk RACS (cluster) o all the clones share a common storage manager o storage server should be fault-tolerant o subtle algorithms need to manage updates (cache invalidation, lock managers, etc.) 12/15/00 EMTM 553 18 Clones and RACS • can be used for read-mostly applications with low consistency requirements. – i.e., Web servers, file servers, security servers… • the requirements of cloned services: – – – – 12/15/00 automatic replication of software and data to new clones automatic request routing to load balance the work route around failures recognize repaired and new nodes EMTM 553 19 Partitions and Packs •Data Objects (mailboxes, database records, business objects,…) are partitioned among storage and server nodes. •For availability, the storage elements may be served by a pack of servers. 12/15/00 EMTM 553 20 Partition • grows a service by – duplicating the hardware and software – dividing the data among the nodes (by object), e.g., mail servers by mailboxes • should be transparent to the application – requests to a partitioned service are routed to the partition with the relevant data • does not improve availability – the data is stored in only one place - partitions are implemented as a pack of two or more nodes that provide access to the storage 12/15/00 EMTM 553 21 Taxonomy of Scaleability Designs 12/15/00 EMTM 553 22 RAPS • RAPS (Reliable Array of Partitioned Services) – nodes that support a packed-partitioned service – shared-nothing RAPS, shared-disk RAPS • Update-intensive and large database applications are better served by routing requests to servers dedicated to serving a partition of the data (RAPS). 12/15/00 EMTM 553 23 Performance and Summary • What is the right building block for a site? – IBM mainframe (OS390): highly available, .99999 up (less than 5 mins outage per year) – Sun UE1000 – PC servers – Homogenous site is easier to manage (all NT, all FreeBSD, Solaris, OS390) • Summary, – For scalability, replicate. – Shared-nothing clones (RACS) – For databases or update-intensive services, packedpartition (RAPS) 12/15/00 EMTM 553 24 Load Sharing 12/15/00 EMTM 553 25 Load Sharing • What is the problem? – Too much load – Need to replicate servers • Why do we need to balance load? 12/15/00 EMTM 553 26 Load Sharing Strategies • Flat architecture – DNS rotation, switch based, MagicRouter • Hierarchical architecture • Locality-Aware Request Distribution 12/15/00 EMTM 553 27 DNS Rotation - Round Robin Cluster 12/15/00 EMTM 553 28 Flat Architecture - DNS Rotation • DNS rotates IP addresses of a Web site • Pros: • • • • – treat all nodes equally – A simple clustering strategy – Client-side IP caching: load imbalance, connection to down node – expensive, inefficient – – – – Cisco, Foundry Networks, and F5Labs Cluster servers by one IP Distribute workload (load balancing) Failure detection – Not sufficient for dynamic content Cons: Hot-standby machine (failover) Switching products Problem 12/15/00 EMTM 553 29 Switch-based Cluster 12/15/00 EMTM 553 30 Flat Architecture - Switch Based • Switching products – Cluster servers by one IP – Distribute workload (load balancing) o typically round-robin – Failure detection – Cisco, Foundry Networks, and F5Labs • Problem – Not sufficient for dynamic content 12/15/00 EMTM 553 31 Various Architectures for Distributed Web Servers Berkley MagicRouter ‘Rewrite’ packets Emphasis: Fast Packet Interposing 12/15/00 EMTM 553 32 Flat Architectures in General • Problems – Not sufficient for dynamic content – Adding/Removing nodes is difficult o Manual configuration required – limited load balancing in switch 12/15/00 EMTM 553 33 Hierarchical Architecture • Master/slave architecture • Two levels – Level I o Master: static and dynamic content – Level II o Slave: only dynamic 12/15/00 EMTM 553 34 Hierarchical Architecture M/S Architecture 12/15/00 EMTM 553 35 Hierarchical Architecture 12/15/00 EMTM 553 36 Hierarchical Architecture • Benefits – Better failover support o Master restarts job if a slave fails – Separate dynamic and static content o resource intensive jobs (CGI scripts) runs by slave 12/15/00 EMTM 553 37 Locality-Aware Request Distribution • Content-based distribution – Improved hit rates – Increased secondary storage – Specialized back end servers • Architecture – Front-end o distributes request – Back-end o process request 12/15/00 EMTM 553 38 Locality-Aware Request Distribution Naïve Strategy 12/15/00 EMTM 553 39 Various Architectures for Distributed Web Servers 12/15/00 EMTM 553 40 Web Caching 12/15/00 EMTM 553 41 World Wide Web • • • • A large distributed information system Inexpensive and faster accesses to information Rapid growth of WWW (15% per month) Web performance and scalability issues – network congestion – server overloading 12/15/00 EMTM 553 42 Web Architecture • Client (browser), Web server Web server Client (browser) 12/15/00 EMTM 553 43 Web Proxy • Intermediate between clients and Web servers • It is used to implement firewall • To improve performance, caching can be placed Client (browser) 12/15/00 Web server EMTM 553 44 Web Architecture • Client (browser), Proxy, Web server Web server Firewall Proxy Client (browser) 12/15/00 EMTM 553 45 Web Caching System • Caching popular objects is one way to improve Web performance. • Web caching at clients, proxies, and servers. Web server Proxy Client (browser) 12/15/00 EMTM 553 46 Advantages of Web Caching • Reduces bandwidth consumption (decrease network traffic) • Reduces access latency in the case of cache hit • Reduces the workload of the Web server • Enhances the robustness of the Web service • Usage history collected by Proxy cache can be used to determine the usage patterns and allow the use of different cache replacement and prefetching policies. 12/15/00 EMTM 553 47 Disadvantages of Web Caching • Stale data can be serviced due to the lack of proper updating • Latency may increase in the case of a cache miss • A single proxy cache is always a bottleneck. • A single proxy is a single point of failure • Client-side and proxy cache reduces the hits on the original server. 12/15/00 EMTM 553 48 Web Caching Issues • • • • Cache replacement Prefetching Cache coherency Dynamic data caching 12/15/00 EMTM 553 49 Cache Replacement • Characteristics of Web objects – different size, accessing cost, access pattern. • Traditional replacement policies do not work well – LRU (Least Recently Used), LFU (Least Frequently Used), FIFO (First In First Out), etc • There are replacement policies for Web objects: – key-based – cost-based 12/15/00 EMTM 553 50 Two Replacement Schemes • Key-based replacement policies: – Size: evicts the largest objects – LRU-MIN: evicts the least recently used object among ones with largest log(size) – Lowest Latency First: evicts the object with the lowest download latency • Cost-based replacement policies – Cost function of factors such as last access time, cache entry time, transfer time cost, and so on – Least Normalized Cost Replacement: based on the access frequency, the transfer time cost and the size. – Server-assisted scheme: based on fetching cost, size, next request time, and cache prices during request intervals. 12/15/00 EMTM 553 51 Prefetching • The benefit from caching is limited. – Maximum cache hit rate - no more than 40-50% – to increase hit rate, anticipate future document requests and prefetch the documents in caches • documents to prefetch – considered as popular at servers – predicted to be accessed by user soon, based on the access pattern • It can reduce client latency at the expense of increasing the network traffic. 12/15/00 EMTM 553 52 Cache Coherence • Cache may provide users with stale documents. • HTTP commands for cache coherence – GET : retrieves a document given its URL – Conditional GET: GET combined with the header IFModified-Since. – Progma: no-cache : this header indicate that the object be reloaded from the server. – Last-Modified : returned with every GET message and indicate the last modification time of the document. • Two possible semantics – Strong cache consistency – Weak cache consistency 12/15/00 EMTM 553 53 Strong cache consistency • Client validation (polling-every-time) – sends an IF-Modified-Since header with each access of the resources – server responses with a Not Modified message if the resource does not change • Server invalidation – whenever a resource changes, the server sends invalidation to all clients that potentially cached the resource. – Server should keep track of clients to use. – Server may send invalidation to clients who are no longer caching the resource. 12/15/00 EMTM 553 54 Weak Cache Consistency – Adaptive TTL (time-to-live) o adjust a TTL based on a lifetime (age) - if a file has not been modified for a long time, it tends to stay unchanged. o This approach can be shown to keep the probability of stale documents within reasonable bounds ( < 5%). o Most proxy servers use this mechanism. o No strong guarantee as to document staleness – Piggyback Invalidation o Piggyback Cache Validation (PCV) - whenever a client communicates with a server, it piggybacks a list of cached, but potentially stale, resources from that server for validation. o Piggyback Server Invalidation (PSI) - a server piggybacks on a reply to a client, the list of resources that have changed since the last access by the client. o If access intervals are small, then the PSI is good. But, if the gaps are long, then the PCV is good. 12/15/00 EMTM 553 55 Dynamic Data Caching • Non-cacheable data – authenticated data, server dynamically generated data, etc. – how to make more data cacheable – how to reduce the latency to access non-cacheable data • Active Cache – allows servers to supply cache applets to be attached with documents. – the cache applets are invoked upon cache hits to finish necessary processing without contacting the server. – bandwidth savings at the expense of CPU costs – due to significant CPU overhead, user access latencies are much larger than without caching dynamic objects. 12/15/00 EMTM 553 56 Dynamic Data Caching • Web server accelerator – resides in front of one or more Web servers – provides an API which allows applications to explicitly add, delete, and update cached data. – The API allows static/dynamic data to be cached. – An example - the official Web site for the 1998 Olympic Winter Games o whenever new content became available, updated Web reflecting these changes were made available within seconds. o Data Update Propagation (DUP, IBM Watson) is used for improving performance. 12/15/00 EMTM 553 57 Dynamic Data Caching • Data Update Propagation (DUP) – maintains data dependence information between cached objects and the underlying data which affect their values – upon any change to underlying data, determines which cached objects are affected by the change. – Such affected cached objects are then either invalidated or updated. – With DUP, about 100% cache hit rate at the 1998 Olympic Winter Games official Web site. – Without DUP, 80% cache hit rate at the 1996 Olympic Games official Web site. 12/15/00 EMTM 553 58 Q&A 12/15/00 EMTM 553 59