Conservative Simulation using Distributed-Shared Memory
Teo, Y. M., Ng, Y. K. and Onggo, B. S. S.
Department of Computer Science, National University of Singapore
PADS 2002

Objectives
- Improve the performance of SPaDES/Java by reducing the overhead of:
  - event synchronization
  - distributed communications
- Study the memory requirements of parallel simulations.

Presentation Outline
- Parallel Simulation
- Null Message Protocol
- Performance Improvement
- Memory Requirement
- Conclusion

Parallel Simulation
- Sequential simulations execute on a single thread on one processor.
- Ideally, parallelizing a simulation should improve its real-time performance, since the workload is distributed across processors.
- Causality must be maintained throughout a parallel simulation, which requires event synchronization protocols.
- These protocols add inter-process communication, which becomes a new bottleneck.

Null Message Protocol
- First designed by Chandy and Misra (1979).
- Prevents deadlock between logical processes (LPs).
- LPi sends a null message to each of its neighbours at the end of every simulation pass, with timestamp = local virtual time (LVT) of LPi.
- A null message with timestamp T indicates that the source LP will not send any message with a timestamp earlier than T to other LPs.

[Figure: LPs exchanging null messages, each with clock = 4; an event at time 7 in an FEL.]

Performance Improvement
- The Chandy-Misra-Bryant (CMB) protocol performs poorly because of its high null message overhead: it transmits null messages on every simulation pass, so the null message ratio (NMR) stays close to 1 over nearly all of the simulated interval [0, T).
- Optimizations incorporated:
  - carrier-null message scheme
  - flushing mechanism
  - demand-driven null message algorithm
  - remote communications using a JavaSpace

Carrier-Null Message Algorithm
- Problem: cyclic topologies.
- Solution: the carrier-null message algorithm (Wood and Turner, 1996), which avoids transmitting redundant null messages around such cycles.
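The null message rule can be illustrated with a small sketch. This is a minimal Python illustration under stated assumptions, not SPaDES/Java code: the `LogicalProcess` class, its channel-clock bookkeeping and its method names are hypothetical, and events with equal timestamps are assumed to be order-independent. Each pass processes only events no later than the minimum input-channel clock (the safe bound), then emits a null message stamped with the LP's LVT to every neighbour, as the slide describes.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class LogicalProcess:
    """Hypothetical conservative LP; a sketch, not the SPaDES/Java API."""
    name: str
    in_channels: list                                   # upstream LP names
    out_channels: list                                  # downstream LP names
    lvt: float = 0.0                                    # local virtual time
    fel: list = field(default_factory=list)             # future event list (min-heap)
    channel_clock: dict = field(default_factory=dict)   # last timestamp seen per input channel

    def __post_init__(self):
        # Nothing has been received yet, so every input channel clock starts at 0.
        for src in self.in_channels:
            self.channel_clock.setdefault(src, 0.0)

    def receive(self, src, timestamp, event=None):
        """Accept a real message (event given) or a null message (event is None).
        Either way, src promises nothing earlier than `timestamp` will follow."""
        self.channel_clock[src] = max(self.channel_clock[src], timestamp)
        if event is not None:
            heapq.heappush(self.fel, (timestamp, event))

    def safe_time(self):
        # Lower bound on the timestamp of any message still to arrive.
        return min(self.channel_clock.values(), default=float("inf"))

    def simulation_pass(self):
        """Process all safe events, then return one null message
        (destination, timestamp = LVT) per neighbour, as in CMB."""
        processed = []
        bound = self.safe_time()
        # Assumption: ties at the bound are order-independent, so <= is used.
        while self.fel and self.fel[0][0] <= bound:
            ts, ev = heapq.heappop(self.fel)
            self.lvt = ts
            processed.append(ev)
        nulls = [(dst, self.lvt) for dst in self.out_channels]
        return processed, nulls


lp = LogicalProcess("B", in_channels=["A", "C"], out_channels=["D"])
lp.receive("A", 5.0, "arrive")   # real message from A
lp.receive("C", 3.0)             # null message from C: bound is now 3.0
print(lp.simulation_pass())      # event at 5.0 is not yet safe
lp.receive("C", 6.0)             # another null message raises the bound to 5.0
print(lp.simulation_pass())
```

Because a null message is sent on every pass whether or not a neighbour needs it, the count of null messages grows with the number of passes; this is the overhead that the carrier-null, flushing and demand-driven optimizations target.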

Performance Improvement: Demand-Driven Null Messaging + Flushing
[Figure: LP A with its FEL and output channel (A), drained by a flusher; LP B and request channel (B) carrying REQ messages.]

Performance Evaluation
- Experiments were conducted on a PC cluster of 8 nodes running Red Hat Linux 7.0.
- Each node has a 400 MHz Pentium II processor and 256 MB of memory; the nodes are connected through a 100 Mbps switch.
- Two benchmark programs:
  - the PHOLD system
  - a linear pipeline

PHOLD (3x3, m): closed system
[Figure: a 3x3 grid of nine nodes.]

Linear Pipeline (4, p): open system
[Figure: customer population, four service centers in series, departure.]

PHOLD (n x n, m)
[Plot: NMR vs. problem size (4x4, 8x8, 16x16) for CMB, + carrier-null, + flushing and + demand-driven null messaging, each with m = 1, 8, 16.]

Linear Pipeline (n, p)
[Plot: NMR vs. problem size n (4, 8, 12, 16) for CMB / carrier-null, + flushing and + demand-driven null messaging, each with p = 0.2, 0.4, 0.6, 0.8.]

Performance Summary: percentage reduction in NMR (optimizations applied cumulatively)

  Optimization                       PHOLD    Linear Pipeline
  CMB + carrier-null                 30%      0%
  + flushing                         42%      23%
  + demand-driven null messaging     55%      35%

Distributed Communications
- SPaDES/Java originally used the Java RMI library to transmit messages between remote LPs, but the serialization phase is a bottleneck.
- A previous performance optimization effort: message deflation.
- The only way to overcome the remote communication overhead is to send fewer messages. How?
- Target the null messages.

JavaSpaces
- A Java-Jini service developed by Sun Microsystems, Inc., built on top of Java RMI, that mimics a tuple space.
- An abstract platform for developing complex distributed applications.
- Provides distributed data persistence.
- Holds objects, known as entries, whose attributes may be of various types.
- Key concept: entries are matched by attribute types and values.

JavaSpaces: Operations
- Four generic operations: write, read, take and notify.
[Figure: clients write, read and take entries in a space; a notifier delivers notify events to a client.]

Distributed Communications
- Replace the RMI communication module in SPaDES/Java with one running on a single JavaSpace.
- Use a FrontEndSpace, which permits crash recovery of the entries in the space.
- Processes and null messages transmitted between remote hosts go through the FrontEndSpace as space entries.

Space Communications: Processes
[Figure: SProcess entries in the space, each tagged with a time (0 or t > 0), a sender (2) and a receiver (1 or 2), exchanged between LP1 and LP2.]

Space Communications: Null Messages
[Figure: Req and NullMsg entries with sender = 2 in the space, exchanged among LP1, LP2, LP3 and LP4.]

Performance Evaluation: PHOLD (n x n, m)
[Plot: NMR vs. problem size (4x4, 8x8, 16x16) for RMI/JavaSpace on 1 processor, RMI on 4 and 8 processors, and JavaSpace on 4 and 8 processors, each with m = 1, 8, 16.]

Overall Performance Evaluation: PHOLD (n x n, m)
[Plot: NMR vs. problem size (4x4, 8x8, 16x16) for CMB, + carrier-null, + flushing, + demand-driven null messaging, and JavaSpace on 4 and 8 processors, each with m = 1, 8, 16.]

Performance Summary: percentage reduction in NMR (PHOLD)

  Optimization                       Reduction
  CMB + carrier-null                 30%
  + flushing                         42%
  + demand-driven null messaging     55%
  JavaSpace (4 processors)           63%
  JavaSpace (8 processors)           74%

Memory Requirement

  $M_{prob} = \sum_{i=1}^{n} \mathrm{MaxQueueSize}(LP_i)$
  $M_{ord} = \sum_{i=1}^{n} \mathrm{MaxFELSize}(LP_i)$
  $M_{sync} = \sum_{i=1}^{n} \mathrm{MaxNullMsgBufferSize}(LP_i)$

Memory Requirement: Space Usage

  PIPELINE (16, p)
  p      Mprob   Mord   Msync (RMI)   Msync (JavaSpaces)   M (RMI)   M (JavaSpaces)
  0.2    98      50     331           305                  479       453
  0.4    192     52     341           308                  585       552
  0.6    320     54     348           311                  722       685
  0.8    740     56     352           312                  1148      1108

  PHOLD (16x16, m)
  m      Mprob   Mord   Msync (RMI)   Msync (JavaSpaces)   M (RMI)   M (JavaSpaces)
  1      256     -      665           347                  921       603
  8      2048    -      651           332                  2699      2380
  16     4096    -      638           317                  4734      4413

Achievements & Conclusion
- Enhanced the performance of SPaDES/Java through various synchronization protocols, achieving an NMR below 30%.
- Implemented a new discrete-event simulation library based on the concept of shared memory in a JavaSpace.
- Implemented a TSA in SPaDES/Java that can be used as a test bed for memory usage studies in parallel simulations.

Acknowledgments
- Port of Singapore Authority (PSA)
- Ministry of Education, Singapore
- Constructive feedback from the referees

References
- SPaDES/Java homepage: http://www.comp.nus.edu.sg/~pasta/spades-java/spadesJava.html
- Current project webpage: http://www.comp.nus.edu.sg/~ngyewkwo/HYP.html
- MSG homepage: http://www.comp.nus.edu.sg/~rpsim/MSG
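The Memory Requirement definitions sum a per-LP maximum for each buffer type, and in the PIPELINE (16, p) Space Usage rows the total satisfies M = Mprob + Mord + Msync (for example, 98 + 50 + 331 = 479). The sketch below computes these quantities. It is a minimal Python illustration, assuming hypothetical per-LP traces of queue, FEL and null-message buffer sizes sampled during a run; the `LPTrace` type and `memory_requirement` function are not part of SPaDES/Java.

```python
from dataclasses import dataclass


@dataclass
class LPTrace:
    """Hypothetical per-LP samples of three buffer sizes over one run."""
    queue_sizes: list        # service-queue lengths (problem-related memory)
    fel_sizes: list          # future event list lengths (ordering memory)
    null_buffer_sizes: list  # buffered null messages (synchronization memory)


def memory_requirement(traces):
    """Return (Mprob, Mord, Msync, M); each component is the sum over LPs
    of that LP's maximum observed size, and M is the sum of the three."""
    m_prob = sum(max(t.queue_sizes, default=0) for t in traces)
    m_ord = sum(max(t.fel_sizes, default=0) for t in traces)
    m_sync = sum(max(t.null_buffer_sizes, default=0) for t in traces)
    return m_prob, m_ord, m_sync, m_prob + m_ord + m_sync


# Two hypothetical LPs: maxima are (3, 4, 5) and (2, 1, 3).
traces = [
    LPTrace(queue_sizes=[1, 3, 2], fel_sizes=[2, 4], null_buffer_sizes=[0, 5, 1]),
    LPTrace(queue_sizes=[0, 2], fel_sizes=[1, 1], null_buffer_sizes=[3]),
]
print(memory_requirement(traces))  # (5, 5, 8, 18)
```

Summing per-LP maxima gives an upper bound on simultaneous usage, since different LPs need not reach their peaks at the same instant; only Msync depends on the synchronization and communication scheme, which is why the RMI and JavaSpaces columns differ only in that component.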