Team 4: IJAMS
17-654: Analysis of Software Artifacts
18-846: Dependability Analysis of Middleware
IJAMS (Integrated Job Applicants Management System)
DaeHyun Jung // InHong Kim // JungJin Suh // Hyoungiel Park // Huiyeong Choi
Electrical & Computer Engineering
Team 4 Members
DaeHyun Jung: djung@andrew.cmu.edu
InHong Kim: ihk@andrew.cmu.edu
JungJin Suh: jjsuh@andrew.cmu.edu
Hyoungiel Park: hip@andrew.cmu.edu
Huiyeong Choi: hchoi1@andrew.cmu.edu
Team 4 Homepage:
http://www.ece.cmu.edu/~ece846/team4/index.html
Application Overview
‹ What does the application do?
• Integrated Job Applicants Management System (IJAMS)
• IJAMS connects registered headhunting agencies to an integrated HR database
‹ What makes it interesting?
• An LDAP (Lightweight Directory Access Protocol) server as the database
• Heavy data transmission per transaction (~100 applicant entries returned per query)
• A remote LDAP server (in Korea) and a local one (in the MSE Cave)
‹ LDAP vs. an RDBMS
• Reads are faster than with an RDBMS
• Fewer resources (memory) are consumed
• Connecting to an external DB is easier
• http://www.openldap.org/
Development Environment
‹ Middleware: CORBA 2.3.1 (embedded in JDK 1.4.2, idlj 3.1)
Lightweight: no administrative privileges are needed to run the server, restarts take less time, and less resource is consumed at runtime than with an EJB server
‹ Language: Java 1.4.2
‹ API: Netscape Directory SDK 4.0 for Java
‹ Platform: Linux (ECE cluster; ssh is used in building the replication manager)
‹ Main database: SunOne LDAP 5.1
Back-end data tier; high performance in data retrieval
‹ Checkpointing database: MySQL 4.0 (Sun Solaris 2.9), for the FT baseline
Baseline Architecture
[Diagram: headhunter agency clients 1 through n look up the CORBA server in the Naming Service, bind to it, and send requests; the CORBA server answers each request with an LDAP query against the back-end LDAP server. Legend: client, Naming Service, CORBA server, LDAP; arrows mark look-up, bind, request, and LDAP query.]
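To make the back-end step concrete, here is a minimal sketch of the server-side LDAP query using the Netscape Directory SDK for Java listed under the development environment; the host name, base DN, and filter attribute are illustrative assumptions, not values from the project.

    import netscape.ldap.LDAPConnection;
    import netscape.ldap.LDAPEntry;
    import netscape.ldap.LDAPSearchResults;

    // Sketch of the CORBA server's data tier: query the LDAP directory
    // for applicant entries. Connection details and the filter
    // attribute are assumed for illustration.
    public class ApplicantSearch {
        public static void main(String[] args) throws Exception {
            LDAPConnection ld = new LDAPConnection();
            ld.connect("ldap.example.com", LDAPConnection.DEFAULT_PORT);

            // A single query can return on the order of 100 applicant
            // entries, which is why each transaction is data-heavy.
            LDAPSearchResults results = ld.search(
                "ou=applicants,dc=ijams,dc=org",  // base DN (assumed)
                LDAPConnection.SCOPE_SUB,          // search the whole subtree
                "(skill=java)",                    // filter (assumed attribute)
                null,                              // return all attributes
                false);

            while (results.hasMoreElements()) {
                LDAPEntry entry = results.next();
                System.out.println(entry.getDN());
            }
            ld.disconnect();
        }
    }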
Fault-Tolerance Strategies
‹ Passive replication
– Replicating the middle tier on different machines in the ECE cluster
– State information in the CORBA servant
• Saved to MySQL for checkpointing (user id/password, user level, transaction id, operation flag); see the sketch after this list
– On the sacred machine:
• CORBA Naming Service, LDAP server, Replication Manager
‹ Elements of the fault-tolerance framework
– Replication Manager: main process
– Fault detector and automatic recovery: thread
– Fault injector: thread
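A minimal sketch of the checkpointing step, assuming a JDBC connection to the MySQL checkpoint database; the table and column names are illustrative assumptions, but the fields are the ones listed above.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    // Checkpointing sketch: persist the servant's per-user state to
    // MySQL after each state change so the backup replica can restore
    // it on fail-over. Table and column names are assumptions.
    public class Checkpointer {
        private final Connection db;

        public Checkpointer(String url, String user, String pass) throws Exception {
            Class.forName("com.mysql.jdbc.Driver");  // MySQL Connector/J driver
            db = DriverManager.getConnection(url, user, pass);
        }

        // Upsert the state fields listed on the slide for one user.
        public void checkpoint(String userId, String password, int userLevel,
                               long transactionId, int operationFlag) throws Exception {
            PreparedStatement ps = db.prepareStatement(
                "REPLACE INTO checkpoint (user_id, password, user_level, txn_id, op_flag) "
                + "VALUES (?, ?, ?, ?, ?)");
            ps.setString(1, userId);
            ps.setString(2, password);
            ps.setInt(3, userLevel);
            ps.setLong(4, transactionId);
            ps.setInt(5, operationFlag);
            ps.executeUpdate();
            ps.close();
        }
    }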
FT-Baseline Architecture (1)
Data-Retrieval Operation
[Diagram: the client resolves the primary replica via the Naming Service and sends its retrieval request; the primary replica answers with an LDAP query; the Replication Manager process runs the fault detector and the fault injector, and checkpointing state lives in MySQL; a backup replica stands by.]
No particular fault-injection point is fixed for this scenario because, whenever an error occurs during this operation, the operation is simply retried from scratch.
FT-Baseline Architecture (2)
Data-Update Operation
[Diagram: the client resolves the primary replica via the Naming Service and sends its update request; the primary replica checkpoints the transaction state to MySQL and applies the update to the LDAP server; the Replication Manager process runs the fault detector and the fault injector; a backup replica stands by. A simplified transaction spans client, server, and back end.]
Since the data-update operation changes data on the LDAP server, the operation must not be duplicated. Recovery is possible by having the new primary replica continue processing without starting from scratch (sketched below).
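One way the checkpointed transaction id and operation flag could support this, sketched under stated assumptions: the flag encoding, column names, and helper methods are illustrative, not the project's actual protocol.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    // Recovery sketch for the update path: on take-over the new primary
    // reads the checkpoint and resumes the in-flight update instead of
    // re-running it, so the LDAP write is never duplicated.
    public class UpdateRecovery {
        static final int STARTED = 1;    // checkpoint written, LDAP write pending
        static final int COMPLETED = 2;  // LDAP write already applied

        // Called by the backup when it becomes primary for a user session.
        public static void resume(Connection db, String userId) throws Exception {
            PreparedStatement ps = db.prepareStatement(
                "SELECT txn_id, op_flag FROM checkpoint WHERE user_id = ?");
            ps.setString(1, userId);
            ResultSet rs = ps.executeQuery();
            if (rs.next() && rs.getInt("op_flag") == STARTED) {
                // Only a pending transaction is re-applied; a COMPLETED
                // flag means the LDAP update landed and must not repeat.
                long txnId = rs.getLong("txn_id");
                applyUpdateToLdap(txnId);
                markCompleted(db, userId, txnId);
            }
            rs.close();
            ps.close();
        }

        private static void applyUpdateToLdap(long txnId) { /* LDAP modify, not shown */ }
        private static void markCompleted(Connection db, String userId, long txnId) { /* set op_flag */ }
    }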
Mechanisms for Fail-Over
‹ Fault detection
– The client obtains the names of the replica references when it starts
– We use one of the CORBA exceptions: COMM_FAILURE
– The client gets a new CORBA replica from the Naming Service
‹ Fail-over
– The backup replica waits to take over
– The client retries the operation with the new replica
– The client's user is re-authenticated on the new replica with the user data in the checkpointing DB
– If a fault occurs, the backup replica takes over with the saved checkpointing information (see the sketch below)
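A client-side sketch of this detect-and-retry loop; COMM_FAILURE and the Naming Service calls are standard CORBA, while the "IJAMSServer" binding name and the Server/ServerHelper IDL-generated types are assumptions.

    import org.omg.CORBA.COMM_FAILURE;
    import org.omg.CORBA.ORB;
    import org.omg.CosNaming.NamingContextExt;
    import org.omg.CosNaming.NamingContextExtHelper;

    // Fail-over sketch: on COMM_FAILURE, re-resolve the server from the
    // Naming Service and retry. Server/ServerHelper are IDL-generated
    // types (not shown); the binding name is an assumption.
    public class FailoverClient {
        static final int MAX_RETRIES = 5;

        public static String searchWithFailover(ORB orb, String query) throws Exception {
            NamingContextExt nc = NamingContextExtHelper.narrow(
                orb.resolve_initial_references("NameService"));
            for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
                // Resolve whichever replica is currently bound under the name.
                Server server = ServerHelper.narrow(nc.resolve_str("IJAMSServer"));
                try {
                    return server.search(query);  // remote CORBA invocation
                } catch (COMM_FAILURE e) {
                    // Primary failed: loop and re-resolve. The Replication
                    // Manager rebinds the backup under the same name, and
                    // the server re-authenticates the user from the
                    // checkpointing DB.
                }
            }
            throw new Exception("all replicas unreachable");
        }
    }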
Fail-Over Measurements
[Chart: Phase I fail-over measurements; response time (msec, 0 to 3,500) over roughly 1,065 trials, with pronounced fail-over spikes.]
[Pie chart: contribution factors for fail-over: 1. Fault Detection, 2. Get Replica Location, 3. Reconnection, 4. LDAP Query, 5. Parse; shares of 49%, 22%, 15%, 14%, and 0%, with reconnection the dominant factor.]
RT-FT-Performance Strategy
‹ The primary reason for the fail-over "spike"
– TCP/IP reconnection between client and server
‹ The RT-FT strategy is to reduce the TCP/IP reconnection time
– The client pre-establishes connections with both backup replicas
– 3 replicas (a primary and 2 backups)
– Locator: chooses the replica most likely to have established its connection (sketched below)
» Sends a dummy request to each replica, incrementing a number that represents the replica's age
» The oldest replica is the one prepared for fail-over
– CORBA Object Factory: returns a CORBA object reference to the client
» The object reference of the oldest replica
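A minimal, local-only sketch of the Locator's replica-age bookkeeping; the Replica type here is a stand-in for a pre-established CORBA reference, and in the real system ping() would be the remote dummy request.

    import java.util.ArrayList;
    import java.util.List;

    // Locator sketch: periodically ping every replica; each reply
    // carries the replica's "age". On fail-over the oldest replica is
    // chosen, since it is most likely to have its connections ready.
    public class Locator {

        // Stand-in for a replica proxy (real one wraps a CORBA reference).
        public static class Replica {
            final String host;
            long age = -1;                 // -1 = unreachable
            Replica(String host) { this.host = host; }
            long ping() { return ++age; }  // dummy request; remote in the real system
        }

        private final List replicas = new ArrayList();
        public void add(Replica r) { replicas.add(r); }

        // Send a dummy request to each replica, refreshing its age.
        public void pingAll() {
            for (int i = 0; i < replicas.size(); i++) {
                Replica r = (Replica) replicas.get(i);
                try { r.age = r.ping(); } catch (RuntimeException e) { r.age = -1; }
            }
        }

        // Choose the oldest reachable replica for fail-over.
        public Replica oldest() {
            Replica best = null;
            for (int i = 0; i < replicas.size(); i++) {
                Replica r = (Replica) replicas.get(i);
                if (r.age >= 0 && (best == null || r.age > best.age)) best = r;
            }
            return best;
        }
    }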
RT-FT Baseline Architecture
[Diagram: the client process runs a CORBA Object Factory thread and a Locator (replica-age checker) thread; the Locator sends dummy requests to the primary and backup replicas to increment their ages; the Replication Manager process runs the fault detector and the fault injector; the Naming Service, checkpointing DB, and LDAP server are as in the FT baseline.]
Locator: decides which replica is more likely to have established its connection; it increases each replica's age by periodically sending it a dummy request.
CORBA Object Factory: returns a CORBA object reference (that of the oldest replica) to the client.
RT-FT-Performance Measurements
[Chart: Phase II fail-over measurements; response time (msec, 0 to 3,500) over roughly 1,057 trials.]
<Contribution Factors for Failover (msec)>
[Pie chart: Fault Detection, Get Replica Location, Reconnection, LDAP Query, Parse; values of 91.2, 604, 881.2, 527.6, and 262.4 msec.]
Bounded “Real-Time” Fail-Over Measurements
[Bar chart, "Highlight of RT-FT Performance": process time (msec, 0 to 1,200) per fail-over factor (Fault Detection, Get Replica Location, Reconnection, LDAP Query, Parse), comparing Phase I and Phase II.]
‹ Reconnection time is reduced by 49.5%
• LDAP query time: differs because of the LDAP server's location (Phase I: in Korea; Phase II: local)
• Fault detection time: reflects client-side load from the background TCP/IP connections
• Parsing time: affected by variability in the system environment
Performance Strategy
‹ Load balancing, by one of two metrics (see the sketch below)
– Number of clients connected to a server
– CPU load of a server
‹ Test objects
1: CPU-based load balancing
2: Request-based load balancing
[Diagram: a load-balancing process allocates connections from client worker threads (concurrent users) across three servers, which share one LDAP back end. Legend: load-balancing process, data flow, client process, client worker threads.]
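A sketch of the two balancing policies under test; ServerInfo and its fields are illustrative assumptions, and in the project the CPU load was sampled via the "ssh" command (see Open Issues).

    import java.util.ArrayList;
    import java.util.List;

    // Load-balancer sketch for the two tested policies: route the next
    // client to the server with the fewest connections (request-based)
    // or the lowest sampled CPU load (CPU-based).
    public class LoadBalancer {

        public static class ServerInfo {
            final String name;
            int connections;   // current client connections
            double cpuLoad;    // e.g. 1-minute load average, sampled via ssh
            ServerInfo(String name) { this.name = name; }
        }

        private final List servers = new ArrayList();
        public void add(ServerInfo s) { servers.add(s); }

        // Request-based policy: fewest connected clients wins.
        public ServerInfo pickByRequests() {
            ServerInfo best = null;
            for (int i = 0; i < servers.size(); i++) {
                ServerInfo s = (ServerInfo) servers.get(i);
                if (best == null || s.connections < best.connections) best = s;
            }
            return best;
        }

        // CPU-based policy: lowest sampled CPU load wins.
        public ServerInfo pickByCpu() {
            ServerInfo best = null;
            for (int i = 0; i < servers.size(); i++) {
                ServerInfo s = (ServerInfo) servers.get(i);
                if (best == null || s.cpuLoad < best.cpuLoad) best = s;
            }
            return best;
        }
    }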
Performance Measurements
[Charts: average response time (msec) vs. number of clients (threads, 1 to ~194) for one, two, and three servers; response time grows roughly linearly with the number of clients (fitted trend y = 4.7571x, R² = 0.9947); a scalability-trend chart plots the response-time slope ("pitch size") against the number of servers (1 to 3).]
Superimposed Performance Data
<Data for 1, 2, and 3 Servers>
[Chart: average response time (msec, 0 to 7,000) vs. number of clients (threads, 1 to 97) for five configurations: one server, two servers (CPU-based), two servers (request-based), three servers (CPU-based), and three servers (request-based).]
Insights from Measurements
‹ Fault tolerance
– The TCP/IP connection is the most significant factor in fault-recovery latency
‹ RT-FT
– Background connection pre-establishment helps to reduce fail-over time
‹ Performance
– The thread pool and the bottleneck tier must be considered
Open Issues
‹ Issues
– What is the exact source of the bottleneck?
– We suspect the back-end database
‹ Additional features, if we had more time
– FT
• Replication of sacred services such as the Replication Manager or the Naming Service
• Active replication
– RT-FT
• Saving the LDAP connection object as part of the checkpoint, to avoid re-authentication time
• Optimization of the CORBA server
• CORBA persistent references, to reduce Naming Service load
– Performance
• Reducing the delay of checking CPU load via the "ssh" command
Conclusions
‹ Accomplishments
– FT: passive replication strategy and selection criteria
– RT-FT: background connection pre-establishment strategy
– Performance: load-balancing strategy (CPU load and number of connections)
– RT and performance analysis
‹ Lessons learned
– Identifying the exact points of the bottleneck
– Checkpointing strategy
‹ Considerations if we restarted afresh
– Set the replication point after a complete performance analysis of each tier
– Analyze WAN vs. LAN impact