Slide 1: Team 1 — Box Office
17-654: Analysis of Software Artifacts
18-846: Dependability Analysis of Middleware
JunSuk Oh, YounBok Lee, KwangChun Lee, SoYoung Kim, JungHee Jo
Electrical & Computer Engineering

Slide 2: Team Members
• JunSuk Oh, YounBok Lee, KwangChun Lee, SoYoung Kim, JungHee Jo
• http://www.ece.cmu.edu/~ece846/team1/index.html

Slide 3: Baseline Application
• System description
  – Box Office is a system that lets users search for movies and reserve tickets
• Base features
  – A user can log in
  – A user can search movies
  – A user can reserve tickets
• Configuration
  – Operating system
    • Server: Windows 2000 Server, Windows XP Professional
    • Client: Windows XP Professional
  – Language: Java SDK 1.4.2
  – Middleware: Enterprise JavaBeans (EJB)
  – Third-party software
    • Database: MySQL
    • Web application server: JBoss
    • Java IDEs: Eclipse, NetBeans
    • J2EE Eclipse plug-in: Lomboz

Slide 4: Baseline Application — Configuration Selection Criteria
• Operating system
  – Easier to set up the development environment than a Linux cluster
  – Easier for us to manage ourselves
• JBoss
  – Environment supported by the teaching assistants
• EJB
  – Popular technology in industry; members' preference
• MySQL
  – Easy to install and use
  – Development documentation is easy to find
• Eclipse
  – All team members have experience with it
• Lomboz
  – Enables Java developers to build, test, and deploy J2EE applications

Slide 5: Baseline Architecture
[Diagram: client tier → middle tier → DB tier. Clients locate the session bean via JNDI lookup and call it via RPC; the session bean uses entity beans (cardinfo, login, movie, reserv, session, user), which map to database tables through a DB connection pool.]

Slide 6: Fault-Tolerance Goals
• Replication style
  – Passive replication
• Approach
  – Replication: two replicas located on separate machines
  – Sacred components: Replication Manager, database, client
  – Fault detector: on the client
  – State: all beans are stateless; state is stored in the database

Slide 7: FT-Baseline Architecture
[Diagram, part 1: the sacred client side (Clients 1…n), the JNDI service, and the database, with fault-tolerant Machine 1 hosting the primary replica.]
[Diagram, part 2: a factory on Machine 1, and Machine 2 hosting the backup replica, a JNDI service, the sacred Replication Manager, and another factory.]

Slide 8: Mechanisms for Fail-Over (1)
• Fault injector
  – Periodically kills the replicas in turn (every 1 minute)
• Replication Manager
  – 10 seconds after a server fails, the Replication Manager invokes a factory to relaunch the failed replica
• Fail-over mechanism
  – Fault detection
  – Replica location
  – Connection establishment
  – Retry
[Diagram: 1. the client sends a request; 2. the fault injector has killed the primary replica, so the server has failed; 3. the client establishes a connection to the backup replica; 4. the client retries the request. The Replication Manager uses the factories to replicate the failed server.]

Slide 9: Mechanisms for Fail-Over (2)
• Fault detection: exception handling by the client
  – RemoteException: NoSuchObjectException, ConnectException (RMI)
  – NameNotFoundException (JNDI failure)
• Replica location
  – The client knows which servers it can request service from
• Connection establishment
  – Get a connection to the new replica
  – The server reference must be looked up
    • when the client requests the service for the first time
    • when the client detects a server failure and redirects the request to the other server
  – The client retries the request on the backup replica until the service becomes available
• Retry
  – Request the service again

Slide 10: Fail-Over Mechanism (3) — Avoiding Duplicate Transactions
• Target case: the transaction is stored in the DB, but the client cannot be informed of it
  1. Service request
  2. Store to DB
  3. Return result
  4.
  Inform client
• Database mechanism
[Diagram: Replica 1 and Replica 2 share the single database, so a committed transaction remains visible after fail-over.]

Slide 11: Fail-Over Measurements
• Round-trip time during fail-over (14 fault injections)
• High peaks: RemoteException; low peaks: NameNotFoundException
[Graph: RTT (ms, log scale, 10–10,000) against number of invocations (0–100), showing a spike at each fault injection.]

Slide 12: Fail-Over Measurements — Decomposition of RTT
• Low peaks: FD 16 ms (7%), CE 82 ms (38%), retry 116 ms (55%)
• High peaks: FD 7661 ms (97%), CE 93 ms (1%), retry 123 ms (2%)
• FD: fault detection; CE: connection establishment

Slide 13: RT-FT-Baseline Architecture
• Two optimization steps
  – Step 1: reduce the connection-establishment time
    • The client needs to reconnect to an available replica after fault detection
    • Pre-established connections: a Connector on the client side maintains a connection to each replica in the background
    ► Reconnection time disappears, but the graph still shows spikes caused by the time spent catching the connection exception
  – Step 2: reduce the fault-detection time
    • Reduce the exception-catching time (RemoteException: NoSuchObjectException, ConnectException)
    • Put a fault detector on the client side
    • The fault detector updates the replicas' status periodically, so clients know the status beforehand
    ► Eliminates the fault-detection time, and with it the spikes!
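The fail-over mechanism described above (catch the remote exception, switch to the backup replica, retry until the service answers) can be sketched in Java as follows. This is a simplified illustration rather than the team's actual code: the `TicketService` interface and `FailoverClient` class are hypothetical stand-ins for the Box Office session-bean stub and client, and the pre-established connections are modeled as a list of already-resolved references.

```java
import java.rmi.RemoteException;
import java.util.List;

/** Hypothetical stand-in for the Box Office session bean's remote interface. */
interface TicketService {
    String reserve(String movie) throws RemoteException;
}

/**
 * Client-side fail-over sketch: try the primary replica first; on a
 * RemoteException (which covers NoSuchObjectException and ConnectException),
 * fall back to the backup, and keep retrying until some replica answers.
 */
class FailoverClient {
    // Pre-established connections, primary first, then backup.
    private final List<TicketService> replicas;

    FailoverClient(List<TicketService> replicas) {
        this.replicas = replicas;
    }

    String reserve(String movie) {
        while (true) {
            for (TicketService replica : replicas) {
                try {
                    return replica.reserve(movie); // normal case: replica answers
                } catch (RemoteException e) {
                    // fault detected on this replica -- try the next one
                }
            }
            // All replicas down: wait briefly, then retry until the
            // Replication Manager has relaunched a replica.
            try { Thread.sleep(100); } catch (InterruptedException ignored) { }
        }
    }
}
```

Because all beans are stateless and state lives in the database, the retry is safe apart from the duplicate-transaction case handled separately on the slide above.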
Slide 14: RT-FT-Baseline Architecture
[Diagram: the client's Connector establishes connections to Replica 1 and Replica 2 in the background, while a local fault detector (Local FD) pings both replicas to check their status and updates statusServer1/statusServer2.]

Slide 15: Bounded "Real-Time" Fail-Over Measurements
• Fail-over graph after optimization step 1
[Graph: RTT against number of invocations after step 1.]

Slide 16: Bounded "Real-Time" Fail-Over Measurements
• Fail-over graph after optimization step 2
[Graph: RTT against number of invocations after step 2.]

Slide 17: Analysis of Fail-Over Optimization

             High peaks                  Low peaks
             FD       CE     Retry       FD     CE     Retry
Before       7661 ms  93 ms  123 ms      16 ms  82 ms  116 ms
After        0 ms     0 ms   104 ms      0 ms   0 ms   104 ms
Reduction    100%     100%   15.45%      100%   100%   10.34%

• FD: fault detection; CE: connection establishment
• The fault-detection and connection-establishment components were eliminated entirely; only the retry time remains.

Slide 18: High Performance — Load Balancing
• Distribute clients' requests among multiple servers
• Use a separate load balancer to control access to the servers
• Strategy
  – Static load balancing
    • Round robin: assign the servers in turns
  – Dynamic load balancing
    • The load balancer periodically checks each server's current number of clients
    • It dynamically assigns a server to each client
  – Simulation strategy
    • Measure RTT on the actual servers A and B
    • Move to the simulation environment
    • Find a working load-balancing strategy
    • Confirm the strategy in the actual environment
    • Find alternative load-balancing strategies

Slide 19: Load Balancing Strategy
[Diagram, part 1: Strategy 1 (round robin) — the load balancer assigns Replica A and Replica B to Clients 1…N in turns. Strategy 2 (check the number of clients) — a client asks the balancer which server to use (step 4); the balancer asks each replica how many clients it is serving (step 2); one replica answers "two" (step 3).]
[Diagram, part 2: the other replica answers "ten" (step 3), so the balancer directs the client to the less-loaded server, "Server B" (step 5).]

Slide 20: Performance Measurements
[Graph: "Load Balance Test — RTT of a Client" with one replica per server, comparing a single server, Load Balance 1 (round robin), and Load Balance 2; RTT (0–1600 ms) against number of clients (0–50).]

Slide 21: Load Balancing Strategy
• Choose a load-balancing strategy using historical data and a simulation system
• Test the strategy in the simulation environment
• Predict the strategy's performance
[Diagram: RTT samples (Sample_{i,1} … Sample_{i,n} for Clients 1…50) are collected from Servers A and B; the data drives load-balancing algorithm development (round robin, random, min-max) and algorithm performance prediction. The slide also shows illustrative histograms of the sampled data.]

Slide 22: More on Strategy
• Consider X clients, each with RTT samples Sample_{i,1} … Sample_{i,n}
• An algorithm (min-max, random) allocates Y clients to Server A and X−Y clients to Server B
• Repeat 1000 times; compare the average RTT of the Y clients on A, of the X−Y clients on B, and overall
[Graph: "Comparison of Load Balancing Strategy" — average RTT (roughly 200–800 ms) against number of clients.]
Slide 23: Server A & B Performance Measurements (RTT)
[Graphs: per-client RTT (0–2000 ms) measured separately on Server A and Server B for Clients 1–50.]

Slide 24: Performance Measurements (II)
[Graphs: Server A and Server B per-client RTT, plus "Comparison of Load Balancing Strategy" curves — average RTT against number of clients for random, min-max, and LP load balancing.]

Slide 25: Other Features
• Load-balancer intelligence update
  – Experimental data from the servers feeds algorithm testing with empirical data and parameter updates
[Diagram: Servers A and B supply experimental data; the candidate algorithms (random load balancing, min-max load balancing) are re-tested against it, updating the balancer's parameters. The slide again shows illustrative histograms of the data.]

Slide 26: Insights from Measurements
• FT
  – Two different types of peak were measured, corresponding to the two exception types
• RT-FT
  – Connection-establishment time was removed by pre-connecting before fail-over
  – But the high peak still remained
  – Fault-detection time was removed by a watchdog that runs before the exception is caught
• RT-FT performance
  – Round robin is good for our situation
    • The servers have similar capacity
  – The load-balancing algorithm can be selected according to the running environment
• Test environment
  – Keep the environment clean to reduce jitter

Slide 27: What We Learned & Accomplished
• What we learned
  – How to handle JBoss (a first experience for most team members)
  – Careful analysis of test results definitely saves time
  – How to control the experimental factors to get better data
• What we accomplished
  – FT
    • Passive replication strategy
    • Avoidance of duplicate transactions
  – RT-FT
    • Pre-established connection strategy
    • Local fault detector that checks server status beforehand
  – Performance
    • Implemented static load balancing
    • Implemented dynamic load balancing
    • Simulated several load-balancing strategies

Slide 28: Open Issues & Future Challenges
• Open issues
  – FindAll() does not work on JBoss on Linux (it works on Windows)
  – Implementing further load-balancing strategies: min-max and LP (linear programming) algorithms
• Future challenges
  – Separate the JNDI service
  – Get the server list from the Replication Manager dynamically
  – Try active replication
  – Try development without an IDE
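The two assignment policies compared in the deck — static round robin and the dynamic "check the number of clients" strategy — can be sketched as follows. This is an illustrative reconstruction, not the team's implementation; the class and method names (`LoadBalancer`, `roundRobin`, `leastClients`) are ours, and the per-server client counts stand in for the load reports the balancer would poll from the servers.

```java
/**
 * Sketch of the two load-balancing policies: static round robin and
 * dynamic least-clients assignment. A real balancer would also need
 * synchronization and periodically refreshed load reports.
 */
class LoadBalancer {
    private final int[] clientsPerServer; // current client count per server
    private int next = 0;                 // cursor for round robin

    LoadBalancer(int servers) {
        clientsPerServer = new int[servers];
    }

    /** Static strategy: assign the servers in turns, ignoring load. */
    int roundRobin() {
        int s = next++ % clientsPerServer.length;
        clientsPerServer[s]++;
        return s;
    }

    /** Dynamic strategy: ask each server how many clients it is serving
     *  and assign the new client to the least-loaded one. */
    int leastClients() {
        int best = 0;
        for (int s = 1; s < clientsPerServer.length; s++) {
            if (clientsPerServer[s] < clientsPerServer[best]) {
                best = s;
            }
        }
        clientsPerServer[best]++;
        return best;
    }
}
```

When the servers have similar capacity and clients impose similar load, the two policies produce nearly the same assignment, which is consistent with the measurement that round robin was good enough in this setup; the dynamic policy matters once server loads diverge.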