Distributed Bingo
17-654: Analysis of Software Artifacts
18-841: Dependability Analysis of Middleware
Team 5: Jack Yao, Bubba Beasley, Kai Liao, Alex Berendeyev
http://www.ece.cmu.edu/~ece846/team5

Team Members
Kai Liao, Bubba Beasley, Alex Berendeyev

Real-World Demonstration

Real-World Characteristics
• Each player gets a Bingo card to start
• A player joining mid-game can catch up with knowledge of previous draws
• The host announces each draw
• The winning player announces Bingo
• The host verifies the win
• The host announces the win

Baseline Application
• A distributed, online version of Bingo
• The clients pull data from the servers and the servers push data to the clients
• DB: SQL Server 2000, Windows XP, MSE Cave machine
• Servers: JBoss (JNDI, JMS) on Linux, ECE Game cluster
• Clients: Windows, Linux (theoretically anywhere)
    Automated command-line client
    Interactive GUI-based client

Publish/Subscribe with JMS
[Diagram: Servers, each with JMS, linked to Clients by JMS connections. Key: JMS Connection, Server, Client, JMS.]

Baseline Architecture
[Diagram: BingoClient calls Join(), GetSnapshot(), and DeclareBingo() on BingoServer (HS, AS, JNDI, JMS), which is backed by DBServer. Key: Remote Method Call, JMS Connection, DB Call, Server, Client, DBServer.]

From the Real to the App
• Each player gets a Bingo card to start
• A player joining mid-game can catch up with knowledge of previous draws
    The Client asks the AnsweringServer to join and receives a Bingo card and all previous draws.
• The host announces each draw
    At regular intervals (5 seconds), the HostServer broadcasts the draws via JMS.
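The join-and-catch-up flow described above can be sketched as client-side state. This is a minimal illustration, not the team's code: the class name CatchUpSketch, the Snapshot type, and the row/column-only win check are all assumptions, since the slides do not show the client classes or the winning patterns.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical client-side state; names are illustrative only. */
final class CatchUpSketch {

    /** Assumed result of joining: a square card plus every draw made so far. */
    static final class Snapshot {
        final int[][] card;
        final List<Integer> previousDraws;

        Snapshot(int[][] card, List<Integer> previousDraws) {
            this.card = card;
            this.previousDraws = previousDraws;
        }
    }

    private final int[][] card;
    private final Set<Integer> drawn = new LinkedHashSet<>();

    /** Catch up on join by replaying all previous draws from the snapshot. */
    CatchUpSketch(Snapshot snapshot) {
        this.card = snapshot.card;
        this.drawn.addAll(snapshot.previousDraws);
    }

    /**
     * Record one broadcast draw. Returns false for a number already seen,
     * so a draw delivered both in the snapshot and by broadcast is harmless.
     */
    boolean onDraw(int number) {
        return drawn.add(number);
    }

    /** Assumed win rule: any complete row or column (no diagonals, no free square). */
    boolean hasBingo() {
        int n = card.length;
        for (int i = 0; i < n; i++) {
            boolean row = true;
            boolean col = true;
            for (int j = 0; j < n; j++) {
                row &= drawn.contains(card[i][j]);
                col &= drawn.contains(card[j][i]);
            }
            if (row || col) {
                return true;
            }
        }
        return false;
    }
}
```

Keeping the drawn numbers in a set makes each draw idempotent, which matters when a late joiner receives the same number once in the snapshot and again from the periodic broadcast.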
From the Real to the App
• The winning player announces Bingo
    The Client asks the AnsweringServer to verify Bingo.
• The host verifies the win
    The AnsweringServer verifies and stores the winner's ID in the database.
• The host announces the win
    The HostServer checks for a winner in the DB and broadcasts that there is a win.

GUI Interface
Java GUI on top of a Java command-line interface

Miscellany
• Each game: 100 draws, rather than 75
• Approximately 1.4 x 10^30 card combinations (1.4 million trillion trillion cards)
• Duplicate cards are not a problem, so theoretically there is no limit on the number of players in a single game
• No guarantee of fairness in declaring a winner

Fault Tolerance Goals (1)
Server Faults
• JBoss process crashes
• Machine crashes
Network Faults
"Sacred Machine" Assumptions
• Replication Manager
• Global Fault Detector
• DB Server
Replicas
• N replicas
• Tested with 3 replicas
• Round-robin recovery of JBoss servers

FT Baseline Architecture
• One connection to the primary JMS server
• One JBoss server up
• JBoss servers on replicas are not running
[Diagram: three BingoClients; RepMan with FD (HealthChecker class); BingoServers (HS, AS, JNDI, JMS); DBServer. Key: Remote Method Call, JMS Connection, DB Call, SSH Connection, Server, Client, DBServer.]

Baseline Fail-Over Measurements
[Chart: FT Baseline Fail-Over. x-axis: Number of faults injected (0 to 200). y-axis: Time (seconds), 0 to 90. Series: FD, SN. Annotations: Slow Server, Fast Server.]

Baseline Fail-Over Measurements
[Chart: FT Baseline Client Time B/w Messages. x-axis: Number of faults injected (0 to 5000). y-axis: Time (seconds), 0 to 120. Annotations: Slow Server, Fast Server.]

RT-FT Optimized 1 Architecture
• Pre-established connections to all JMS servers
• All JBoss servers on replicas are up
[Diagram: three BingoClients; RepMan with FD (HealthChecker class); BingoServers (HS, AS, JNDI, JMS); DBServer. Key: Remote Method Call, JMS Connection, DB Call, SSH Connection, Server, Client, DBServer.]

RT Optimized 1 Fail-Over
Measurements
[Chart: RT Optimization 1 Fail-Over. x-axis: Number of faults injected (0 to 100). y-axis: Time (seconds), 0 to 30. Series: FD, SN. Annotations: Slow Server, Fast Server.]

RT-FT Optimized 2 Architecture
• FD runs as a daemon
[Diagram: three BingoClients; RM__FD; BingoServers (HS, AS, JNDI, JMS); DBServer. Key: Remote Method Call, JMS Connection, DB Call, SSH Connection, Server, Client, DBServer.]

RT Optimized 2 Fail-Over Measurements
[Chart: RT Optimization 2 Fail-Over (Repman=1s, HealthCheckerDaemon=0.5s). x-axis: Number of faults injected (0 to 30). y-axis: Time (seconds), 0 to 5. Series: FD, SN.]

RT-FT Optimized 3 Architecture
• FD runs as a daemon + Local FD
[Diagram: three BingoClients; RM__FD; BingoServers (HS, AS, JNDI, JMS), each with a Local FD; DBServer. Key: Remote Method Call, JMS Connection, DB Call, SSH Connection, Server, Client, DBServer.]

RT Optimized 3 Fail-Over Measurements
[Chart: RT Optimization 3 Fail-Over (Repman=0.1s, Local Checker=0.4s). x-axis: Number of faults injected (0 to 45). y-axis: Time (seconds), 0 to 3.5. Series: FD, SN.]

Measurements Insights (1)
[Chart: FD (seconds), 0 to 8, against Repman Period (seconds), 0.1 to 1.9. Series by Local Checker Period (seconds): 0.4, 0.7, and 1.0, each with MAX, AVG, and MIN.]

Measurements Insights (2)
Average Fault Detection Time Comparison
Real-Time Tuning:
• Repman period
• Local FD period
• Broadcast period
[Chart: average fault detection time against Repman Period (seconds), 0.1 to 1.9, and Local FD Period (seconds): 0.4 (AVG), 0.7 (AVG), 1.0 (AVG). Annotations: FD=3.1s; Local FD=0.7s, Repman=1.0s; Local FD=1.0s, Repman=0.1s.]

Lessons Learned
• Potentially, Publish/Subscribe (JMS) can hide server errors from clients
[Diagram: timelines comparing the Typical Pull Architecture with Our Push Architecture. Labels: FD, Start New Replica, Reconnection.]
• The JBoss server should be run in the minimal configuration (default configurations are not suited for RT)

Questions?
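As a supplement to the Optimized 1 architecture above (pre-established connections to all JMS servers, round-robin recovery), the following is a minimal sketch of how a client might pick a live connection without reconnecting from scratch. FailoverSketch, ServerLink, and activeIndex() are hypothetical names; the slides do not show how the client actually detects a dead server or switches connections, so the liveness probe here is an assumption.

```java
import java.util.List;

/** Hypothetical round-robin selector over pre-established server links. */
final class FailoverSketch {

    /** Stand-in for one pre-established connection; the real liveness check is not shown in the slides. */
    interface ServerLink {
        boolean isAlive();
    }

    private final List<ServerLink> links;
    private int current = 0;

    FailoverSketch(List<ServerLink> links) {
        if (links.isEmpty()) {
            throw new IllegalArgumentException("at least one server link is required");
        }
        this.links = links;
    }

    /**
     * Returns the index of the link to use. Stays on the current link while it
     * is alive; otherwise advances round-robin until a live link is found.
     * Returns -1 when every link has been tried and none is alive.
     */
    int activeIndex() {
        for (int tried = 0; tried < links.size(); tried++) {
            if (links.get(current).isAlive()) {
                return current;
            }
            current = (current + 1) % links.size();
        }
        return -1;
    }
}
```

Because the links already exist, fail-over in this sketch is only an index change, which is the intuition behind pre-establishing connections instead of creating one after a fault is detected.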