Distributed simulation with MPI in ns-3


Distributed simulation with MPI in ns-3

Joshua Pelkey and Dr. George Riley Wns3 March 25, 2011



• Standard sequential simulation techniques with substantial network traffic – Lengthy execution times – Large amount of computer memory • Parallel and distributed discrete event simulation [1] – Allows single simulation program to run on multiple interconnected processors – Reduced execution time! Larger topologies!


Overview (cont.)

• Important Note – It is mandatory that distributed simulations produce the same results as identical sequential simulations

Overview: terminology

• Logical Process (LP) – An individual sequential simulation • Rank or system id – The unique number assigned to each LP 4 Figure 1. Simple point-to-point topology, distributed


Overview: related work

• • Parallel/Distributed ns (PDNS) [2] Georgia Tech Network Simulator (GTNetS) [3] – Both use a federated approach and a conservative (blocking) mechanism


Implementation Details in ns-3

• LP communication – Message Passing Interface (MPI) standard – Send/Receive time-stamped messages – MpiInterface in ns-3 • Synchronization – Conservative algorithm using lookahead – DistributedSimulator in ns-3


Implementation Details in ns-3 (cont.)

• • Assigning rank to nodes – Handled manually in simulation script Remote point-to-point links – Created automatically between nodes with different ranks through point-to-point helper – When a packet is set to cross a remote point-to-point link, the packet is transmitted via MPI using our interface • Merged since ns-3.8


Implementation Details in ns-3: limitations

• All nodes created on all LPs, regardless of rank – It is up to the user to only install applications on the correct rank • • Nodes are assigned rank manually – An MpiHelper class could be used to assign rank to nodes automatically. This would enable easy distribution of existing simulation scripts.

Pure distributed wireless is currently not supported – At least one point-to-point link must exist in order to divide the simulation


Performance Study

• DARPA NMS campus network simulation – Using nms-p2p-nix-distributed example available in ns-3 – Allows creation of very large topologies – Any number of campus networks are created and connected together – Different campus networks can be placed on different LPs – Tested with 2 CNs, 4 CNs, 6 CNs, 8 CNs, and 10 CNs

Performance Study: campus network topology

200 ms, 10 us 10 Figure 2. Campus network topology block [4]


Performance Study: Georgia Tech clusters used

• • Hogwarts Cluster – 6 nodes, each with 2 quad-core processors and 48GB of RAM Ferrari Cluster – Mix of machines, including 3 quad-core nodes and 8 dual core nodes

Performance Study: simulations on Hogwarts

12 Figure 3. Campus network simulations on Hogwarts with (A) 2 CNs (B) 4 CNs (C) 6 CNs (D) 8 CNs (E) 10 CNs

Performance Study: simulations on Ferrari

13 Figure 4. Campus network simulations on Ferrari with (A) 2 CNs (B) 4 CNs (C) 6 CNs (D) 8 CNs (E) 10 CNs


Performance Study: speedup

Figure 5. Speedup using distributed simulation for campus network topologies on the (A) Hogwarts cluster and (B) Ferrari cluster


Performance Study: speedup (cont.)

Hogwarts Ferrari Table 1: Speedup for Hogwarts and Ferrari

2 CNs



4 CNs



6 CNs



8 CNs



10 CNs



• Linear speedup for Hogwarts, not for Ferrari. Further investigation revealed Ferrari consisted of a mix of machines, with the first two nodes considerably faster


Performance Study: changing the lookahead

• By changing the delay between campus networks, the lookahead was varied (200ms to 10 µs) • For Hogwarts and Ferrari, the 10 µs simulations ran, on average, 25% and 47% slower, respectively • As expected, a smaller lookahead time decreases the potential speedup, as the simulators must synchronize with a greater frequency


Future Work

• • • MpiHelper class to facilitate creating distributed topologies – Nodes assigned rank automatically – Existing simulation scripts could be distributed easily Distributing the topology could occur at the node level, rather than the application – Ghost nodes, save memory Pure distributed wireless support



• Distributed simulation in ns-3 allows a user to run a single simulation in parallel on multiple processors • Very large-scale simulations can be run in ns-3 using the distributed simulator • Distributed simulation in ns-3 offers potentially optimal linear speedup compared to identical sequential simulations


19 [1] R.M. Fujimoto. Parallel and Distributed Simulation Systems. Wiley Interscience, 2000.

[2] PDNS - Parallel/Distributed ns. http://www.cc.gatech.edu/computing/compass/pdns , March 2004.

[3] G. F. Riley. The Georgia Tech Network Simulator. In Proceedings of the

ACM SIGCOMM workshop on Models, methods and tools for

reproducible network research, MoMeTools ’03, pages 5-12, New York, NY, USA, 2003 ACM.

[4] Standard baseline NMS challenge topology. http://www.ssfnet.org/Exchange/gallery/baseline , July 2002