Distributed Bingo Team 5: Jack Yao Bubba Beasley

advertisement
Distributed Bingo
17-654: Analysis of Software Artifacts
18-841: Dependability Analysis of Middleware
Team 5:
Jack Yao
Bubba Beasley
Kai Liao
Alex Berendeyev
http://www.ece.cmu.edu/~ece846/team5
Team Members
Kai Liao
Bubba Beasley
Alex Berendeyev
2
Real-World Demonstration
3
Real-World Characteristics
¾ Each player gets a Bingo card to start
¾ A player joining mid-game can catch up with
knowledge of previous draws
¾ The host announces each draw
¾ The winning player announces Bingo
¾ The host verifies the win
¾ The host announces the win
4
Baseline Application
¾ A distributed, online version of Bingo
¾ The clients pull data from the servers and the servers
push data to the clients
¾ DB: SQL Server 2000, Windows XP, MSE Cave machine
Servers: JBoss (JNDI, JMS) on Linux, ECE Game cluster
Clients: Windows, Linux (theoretically anywhere)
Automated command-line client
Interactive GUI-based client
5
Publish/Subscribe with JMS
Server
Server
JMS
JMS
Client
Client
Client
Key
JMS Connection
Server
Client
JMS
6
Baseline Architecture
BingoClient
Join()
GetSnapshot()
DeclareBingo()
BingoServer
HS
AS
JNDI
JMS
DBServer
Key
Remote Method Call
JMS Connection
DB Call
Server
Client
DBServer
7
From the Real to the App
¾ Each player gets a Bingo card to start
¾ A player joining mid-game can catch up with
knowledge of previous draws
The Client asks the AnsweringServer to join and receives a
bingo card and all previous draws.
¾ The host announces each draw
At regular intervals (5 seconds), the HostServer broadcasts
the draws via the JMS.
8
From the Real to the App
¾The winning player announces Bingo
¾ The host verifies the win
¾ The host announces the win
The Client asks the AnsweringServer to verify Bingo
The AnsweringServer verifies and stores the winner's ID in
the database
The HostServer checks for a winner in the DB and
broadcasts that there is a win
9
GUI Interface
Java GUI on top of a Java
command-line interface
10
Miscellany
¾ Each game: 100 draws, rather than 75
¾ Approximately 1.4 x 1030 card combinations
(1.4 billion trillion trillion cards)
¾ Duplicate cards are not a problem, so theoretically
no limit on the number of players in a single game
¾ No guarantee of fairness in declaring a winner
11
Fault Tolerance Goals (1)
Server Faults
“Sacred Machine” Assumptions
• JBoss Process crashes
• Replication Manager
• Machine crashes
• Global Fault Detector
Network Faults
• DB Server
Replicas
N replicas
Tested with 3 replicas
Round-robin recovery of JBoss servers
12
FT Baseline Architecture
One Connection
to the primary
JMS Server
BingoClient
BingoClient
BingoClient
RepMan
FD (HealthChecker class)
One JBoss
server up
BingoServer
BingoServer
JBoss Servers
on replicas are
not running
HS
AS
JNDI
JMS
Key
Remote Method Call
JMS Connection
DB Call
SSH Connection
DBServer
Server
Client
DBServer
13
Baseline
Fail-Over Measurements
FT Baseline Fail-Over
Slow Server
90
80
Time (seconds)
Second
70
60
Fast Server
50
FD
SN
40
30
20
10
0
0
50
100
150
200
Number of faults injected
14
Baseline Fail-Over Measurements
FT Baseline Client
Time B/w Messages
120
Slow Server
100
Second
Time (seconds)
80
60
Fast Server
40
20
0
0
1000
2000
3000
4000
5000
Number of faults injected
15
RT-FT Optimized 1 Architecture
BingoClient
BingoClient
BingoClient
Pre-established
Connections to
all JMS Servers
RepMan
FD (HealthChecker class)
All JBoss
servers on
replicas are up
BingoServer
BingoServer
HS
AS
JNDI
JMS
Key
Remote Method Call
JMS Connection
DB Call
SSH Connection
DBServer
Server
Client
DBServer
16
RT Optimized 1 Fail-Over Measurements
RT Optimization 1 Fail-Over
30
Slow Server
25
Second
Time (seconds)
20
FD
SN
15
10
Fast Server
5
0
0
10
20
30
40
50
60
70
80
90
100
Number of faults injected
17
RT-FT Optimized 2 Architecture
BingoClient
BingoClient
BingoClient
RM__FD
BingoServer
BingoServer
FD runs as a
daemon
HS
AS
JNDI
JMS
Key
Remote Method Call
JMS Connection
DB Call
SSH Connection
DBServer
Server
Client
DBServer
18
RT Optimized 2 Fail-Over Measurements
RT Optimization 2 Fail-Over (Repman=1s, HealthCheckerDaemon=0.5s)
5
4.5
4
Time (seconds)
Second
3.5
3
FD
SN
2.5
2
1.5
1
0.5
0
0
5
10
15
20
25
30
Number of faults injected
19
RT-FT Optimized 3 Architecture
BingoClient
BingoClient
BingoClient
RM__FD
BingoServer
BingoServer
FD runs as a
daemon
+
Local FD
Local FD
HS
AS
JNDI
JMS
Key
Remote Method Call
JMS Connection
DB Call
SSH Connection
DBServer
Server
Client
DBServer
20
RT Optimized 3 Fail-Over Measurements
RT Optimization 3 Fail-Over (Repman=0.1s, Local Checker=0.4s)
3.5
3
Time (seconds)
Second
2.5
2
FD
SN
1.5
1
0.5
0
0
5
10
15
20
25
30
Number of faults injected
35
40
45
21
Measurements Insights (1)
6
4
2
FD (Second)
8
0
0.1
0.4
0.7
1.0
1.3
1.6
1.9
Repman Period
(Second)
0.4
(MAX)
0.4
(AVG)
0.4
(MIN)
0.7
(MAX)
0.7
(AVG)
0.7
(MIN)
1.0
(MAX)
1.0
(AVG)
1.0
(MIN)
Local Checker Period (Second)
22
Measurements Insights (2)
Average Fault Detection Time Comparison
Real Time Tuning:
•Repman period
•Local FD period
•Broadcast period
0.7
1
1.3
Repman Period (Second)
Repman Period (Second)
0.4 (AVG)
0.7 (AVG)
1.6
1.9
1.0 (A V G )
0.4
0.7 (A V G )
0.1
0.4 (A V G )
FD=3.1s
Local FD Period (Second)
Local FD=0.7s
Repman=1.0s
Local FD=1.0s
Repman=0.1s
Local FD
Period
(Second)
1.0 (AVG)
23
Lessons Learned
• Potentially Publish/Subscribe (JMS) can hide
server errors from clients
Typical Pull Architecture
FD
Start New
Replica
Our Push Architecture
FD
Start New
Replica
Reconnection
• JBoss Server should be run in the minimal configuration.
(default configurations are not suited for RT)
24
Questions?
25
Download