Conservative Simulation using
Distributed-Shared Memory
Teo, Y. M., Ng, Y. K. and Onggo, B. S. S.
Department of Computer Science
National University of Singapore
PADS 2002
Objectives

Improve the performance of SPaDES/Java by reducing overhead in:
 Synchronization of events
 Distributed communications
Study the memory requirements of parallel simulations.
Presentation Outline

Parallel Simulation
 Null Message Protocol
 Performance Improvement
 Memory Requirement
 Conclusion
Parallel Simulation

Sequential simulations execute on a single thread on one processor.
Ideally, parallelizing the simulation should improve its wall-clock performance, since the workload is distributed.
But causality must be maintained throughout a parallel simulation:
=> Event synchronization protocols are required.
=> These add to inter-process communication.
=> New bottleneck!
Null Message Protocol
First designed by Chandy and Misra (1979).
Prevents deadlock between logical processes (LPs).
LPi sends null messages to each of its neighbours at the end of every simulation pass, with timestamp = local virtual time of LPi.
The timestamp T on a null message indicates that the source LP will not send any messages to other LPs before T.
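The rule above can be sketched in a few lines of Python. This is a minimal illustration, not SPaDES/Java code: the class `LogicalProcess`, its fields and its method names are hypothetical, and the null message timestamp is simply the sender's local clock, as stated on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalProcess:
    """Toy LP that tracks the latest timestamp promised on each input channel."""
    name: str
    clock: float = 0.0
    # channel_clock[src] = largest timestamp received so far from src
    channel_clock: dict = field(default_factory=dict)
    neighbours: list = field(default_factory=list)

    def receive(self, src: str, timestamp: float) -> None:
        # Real and null messages both advance the channel clock.
        self.channel_clock[src] = max(self.channel_clock.get(src, 0.0), timestamp)

    def safe_time(self) -> float:
        # No input channel will deliver a message earlier than this bound.
        return min(self.channel_clock.values()) if self.channel_clock else float("inf")

    def end_of_pass(self) -> int:
        # Send one null message, stamped with the local clock, to every neighbour.
        for lp in self.neighbours:
            lp.receive(self.name, self.clock)
        return len(self.neighbours)


# Usage: A (clock = 4) finishes a pass; B was blocked at 0 on its channel from A.
a = LogicalProcess("A", clock=4.0)
b = LogicalProcess("B")
c = LogicalProcess("C")
a.neighbours = [b, c]
b.channel_clock = {"A": 0.0, "C": 7.0}
sent = a.end_of_pass()
```

After the pass, B's bound rises from 0 to 4, so it can advance without waiting for a real message from A. The cost is one null message per neighbour per pass, which is the overhead the later slides try to reduce.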
Null Message Protocol
[Figure: diagram of four LPs exchanging messages. Labels include "Clock = 4", timestamps 4 and 7, and an FEL.]
Performance Improvement
The Chandy-Misra-Bryant (CMB) protocol performs poorly due to high null message overhead: it transmits null messages on every simulation pass, so the null message ratio (NMR) approaches 1 over nearly all of [0, T).
 Optimizations incorporated:
Carrier-null message scheme
Flushing mechanism
Demand-driven null message algorithm
Remote communications using JavaSpace
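To make the NMR figures concrete, here is a small sketch of how the ratio could be computed. The slides do not define NMR; the definition used here (null messages divided by all messages sent) is an assumption, and the function name and the message counts are hypothetical.

```python
def null_message_ratio(null_msgs: int, real_msgs: int) -> float:
    """Fraction of all transmitted messages that are null messages (assumed definition)."""
    total = null_msgs + real_msgs
    return null_msgs / total if total else 0.0


# Hypothetical run: a null message to each of 4 neighbours on every one of
# 1000 simulation passes, against only 100 event-carrying messages.
nmr = null_message_ratio(null_msgs=4 * 1000, real_msgs=100)
```

With these counts the ratio is 4000/4100, which is close to 1. Every optimization in the list above lowers the ratio by sending fewer null messages for the same number of real ones.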
Carrier-Null Message Algorithm
Problem: cyclic topologies.
Use the carrier-null message algorithm (Wood, Turner, 1996).
Avoids transmission of redundant null messages in such cycles.
Performance Improvement
Demand driven null messaging + flushing
[Figure: demand-driven null messaging with flushing. Labels: Output Channel (A) with Flusher (timestamps 20, 25, 30, 35), Logical Process (A), FEL (20, 18, 35), Logical Process (B), REQ, Request Channel (B).]
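One plausible reading of this slide, sketched in Python: null messages are buffered on the sender's output channel instead of being transmitted, the flusher discards every buffered null message except the latest, and one null message is sent only when the receiver asks for it. The class and method names are hypothetical, and this is an interpretation of the diagram rather than the SPaDES/Java implementation.

```python
class OutputChannel:
    """Buffers null messages for one receiver until they are requested."""

    def __init__(self):
        self.pending = []  # timestamps of buffered null messages
        self.sent = 0      # null messages actually transmitted

    def queue_null(self, timestamp):
        # Demand-driven: buffer the promise instead of transmitting it.
        self.pending.append(timestamp)

    def flush(self):
        # A later promise subsumes all earlier ones, so keep only the maximum.
        if self.pending:
            self.pending = [max(self.pending)]

    def on_request(self):
        # The receiver is blocked and sent a REQ: reply with one null message.
        self.flush()
        if not self.pending:
            return None
        self.sent += 1
        return self.pending.pop()


# Usage with the timestamps shown on the output channel in the figure.
chan = OutputChannel()
for t in (20, 25, 30, 35):
    chan.queue_null(t)
reply = chan.on_request()
```

Here four buffered null messages collapse into a single transmission stamped 35. Under plain CMB all four would have crossed the network.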
Performance Evaluation
Experiments were conducted on:
A PC cluster of 8 nodes running Red Hat Linux 7.0. Each node has a 400 MHz Pentium II processor and 256 MB of memory; the nodes are connected through a 100 Mbps switch.
2 benchmark programs:
 PHOLD system
 Linear Pipeline
PHOLD (3x3, m)

Closed system
[Figure: the nine Node elements of the 3x3 PHOLD system.]
Linear Pipeline (4, )

Open system
[Figure: Customer population -> Service Center -> Service Center -> Service Center -> Service Center -> Depart]
PHOLD (n x n, m)
[Figure: NMR (axis 0.2 to 1) versus problem size n x n (4x4, 8x8, 16x16) for m = 1, 8 and 16. Curves: CMB, + Carrier-Null, + Flushing, + Demand-driven null messaging.]
Linear Pipeline (n, )
[Figure: NMR (axis 0.4 to 1) versus problem size n (4, 8, 12, 16) for parameter values 0.2, 0.4, 0.6 and 0.8. Curves: CMB + Carrier-Null, + Flushing, + Demand-driven null messaging.]
Performance Summary

Percentage reduction in NMR:
PHOLD system
 CMB -> Carrier-null: 30%
 -> Flushing incorporated: 42%
 -> Demand-driven null msg: 55%
Linear Pipeline
 CMB -> Carrier-null: 0%
 -> Flushing incorporated: 23%
 -> Demand-driven null msg: 35%
Distributed Communications

Originally, SPaDES/Java used the RMI library to transmit messages between remote LPs, but the serialization phase presents a bottleneck.
Previous performance optimization effort: message deflation.
The only solution to remote communication overhead => send fewer messages. How?
Target null messages.
JavaSpaces

A special Java-Jini service developed by Sun Microsystems, Inc., built on top of Java’s RMI, mimicking a tuple space.
Abstract platform for developing complex distributed applications.
Distributed data persistence.
Holds objects, known as entries, with variable attribute types.
Key concept: matching of attribute types/values.
JavaSpaces
 4 generic operations: write, read, take and notify.
[Figure: clients interacting with a space through read, take and write, with a Notifier delivering notify to a client.]
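The matching idea behind write, read and take can be illustrated with a toy in-memory space. This is a Python stand-in written for illustration only, not the JavaSpaces API: the class `ToySpace` and the `kind` and `time` fields are hypothetical, entries are plain dictionaries, and the real operations also involve leases, transactions, blocking timeouts and notify, all of which are left out here.

```python
class ToySpace:
    """In-memory stand-in for a tuple space; entries are plain dicts."""

    def __init__(self):
        self._entries = []

    @staticmethod
    def _matches(entry, template):
        # A template field set to None is a wildcard; others must be equal.
        return all(v is None or entry.get(k) == v for k, v in template.items())

    def write(self, entry):
        self._entries.append(dict(entry))

    def read(self, template):
        # Return a copy of the first match, leaving it in the space.
        for e in self._entries:
            if self._matches(e, template):
                return dict(e)
        return None

    def take(self, template):
        # Return the first match and remove it from the space.
        for i, e in enumerate(self._entries):
            if self._matches(e, template):
                return self._entries.pop(i)
        return None


# Usage: LP2 writes two entries addressed to LP1; LP1 looks them up by template.
space = ToySpace()
space.write({"kind": "SProcess", "sender": 2, "receiver": 1, "time": 5})
space.write({"kind": "NullMsg", "sender": 2, "receiver": 1, "time": 7})
peek = space.read({"receiver": 1, "kind": "NullMsg"})
got = space.take({"receiver": 1, "kind": None})
```

The receiver never names the sender's host. It only describes the entry it wants, which is what lets a single shared space replace point-to-point message passing.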
Distributed Communications

Replace the RMI communication module in SPaDES/Java with one running on a single JavaSpace.
Use a FrontEndSpace: permits crash recovery of entries in the space.
Transmission of processes and null messages between remote hosts goes through the FrontEndSpace as space entries.
Space Communications: Processes
[Figure: SProcess entries held in the space between LP1 and LP2. Labels include Time, sender = 2, receiver = 1 and receiver = 2.]
Space Communications: Null Messages
[Figure: Req and NullMsg entries held in the space among LP1, LP2, LP3 and LP4. Labels include sender = 2.]
Performance Evaluation – PHOLD(n x n, m)
[Figure: NMR (axis 0.2 to 0.55) versus problem size n x n (4x4, 8x8, 16x16) for m = 1, 8 and 16. Curves: RMI/JavaSpace (1 processor), RMI (4 processors), RMI (8 processors), JavaSpace (4 processors), JavaSpace (8 processors).]
Overall Performance Evaluation – PHOLD(n x n, m)
[Figure: NMR (axis 0.2 to 1) versus problem size n x n (4x4, 8x8, 16x16) for m = 1, 8 and 16. Curves: CMB, + Carrier-Null, + Flushing, + Demand-driven null messaging, JavaSpace (4 procs), JavaSpace (8 procs).]
Performance Summary

Percentage reduction in NMR:
CMB -> Carrier-null: 30%
-> Flushing incorporated: 42%
-> Demand-driven null msg: 55%
-> JavaSpace (4 processors): 63%
-> JavaSpace (8 processors): 74%
Memory Requirement

$M_{prob} = \sum_{i=1}^{n} \mathrm{MaxQueueSize}(LP_i)$

$M_{ord} = \sum_{i=1}^{n} \mathrm{MaxFELSize}(LP_i)$

$M_{sync} = \sum_{i=1}^{n} \mathrm{MaxNullMsgBufferSize}(LP_i)$
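Each of the three quantities has the same shape: take the largest size a structure reached in each LP, then add those maxima over all n LPs. A minimal sketch, assuming the sizes are available as per-LP samples; the function name and the sample values are hypothetical.

```python
def memory_requirement(samples_per_lp):
    """Sum over LPs of the largest size each LP's structure ever reached."""
    return sum(max(samples) for samples in samples_per_lp.values())


# Hypothetical FEL sizes sampled over a run for three LPs.
fel_sizes = {"LP1": [2, 5, 3], "LP2": [1, 1, 4], "LP3": [0, 2, 2]}
m_ord = memory_requirement(fel_sizes)
```

Because each LP's maximum may occur at a different moment, the sum of maxima is an upper bound on the memory in use at any single instant. In this example no instant uses more than 9 units while the bound is 11.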
Memory Requirement
Space Usage
                    PIPELINE (16, p)              PHOLD (16x16, m)
                    p=0.2  p=0.4  p=0.6  p=0.8    m=1    m=8   m=16
Mprob                  98    192    320    740    256   2048   4096
Mord                   50     52     54     56      -      -      -
Msync (RMI)           331    341    348    352    665    651    638
Msync (JavaSpaces)    305    308    311    312    347    332    317
M (RMI)               479    585    722   1148    921   2699   4734
M (JavaSpaces)        453    552    685   1108    603   2380   4413
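The totals in the Space Usage table appear to be the sum of the three components, M = Mprob + Mord + Msync. The slides do not state this relation, so the check below treats it as an assumption and tests it against the PIPELINE (16, p) values copied from the table. The dictionary layout and key names are hypothetical.

```python
# PIPELINE (16, p) values from the Space Usage table, keyed by p.
pipeline = {
    0.2: dict(m_prob=98, m_ord=50, sync_rmi=331, sync_js=305, m_rmi=479, m_js=453),
    0.4: dict(m_prob=192, m_ord=52, sync_rmi=341, sync_js=308, m_rmi=585, m_js=552),
    0.6: dict(m_prob=320, m_ord=54, sync_rmi=348, sync_js=311, m_rmi=722, m_js=685),
    0.8: dict(m_prob=740, m_ord=56, sync_rmi=352, sync_js=312, m_rmi=1148, m_js=1108),
}


def total(row, sync_key):
    # Assumed relation: M = Mprob + Mord + Msync.
    return row["m_prob"] + row["m_ord"] + row[sync_key]


# Collect any p for which the assumed relation disagrees with the table.
mismatches = [
    p for p, row in pipeline.items()
    if total(row, "sync_rmi") != row["m_rmi"] or total(row, "sync_js") != row["m_js"]
]

# Difference between the RMI and JavaSpaces totals for each p.
savings = {p: row["m_rmi"] - row["m_js"] for p, row in pipeline.items()}
```

The relation holds for all four pipeline columns. Since Mprob and Mord are the same for both transports, the whole difference between M (RMI) and M (JavaSpaces) comes from Msync.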
Achievements & Conclusion
Enhanced the performance of SPaDES/Java through various synchronization protocols, achieving an NMR below 30%.
Implemented a new discrete-event simulation library based on the concept of shared memory in a JavaSpace.
Implemented a TSA in SPaDES/Java that can be used as a test bench for memory usage studies in parallel simulations.
Acknowledgments

Port of Singapore Authority (PSA)
 Ministry of Education, Singapore
 Constructive feedback from referees
References

SPaDES/Java homepage
http://www.comp.nus.edu.sg/~pasta/spades-java/spadesJava.html

Current project webpage
http://www.comp.nus.edu.sg/~ngyewkwo/HYP.html

MSG homepage
http://www.comp.nus.edu.sg/~rpsim/MSG