DEVS PhD Dissertation Awards

Hybrid Modeling and Simulation of Complex Data Networks
Matias Bonaventura (mbonaventura@dc.uba.ar)
Supervisor: Prof. Rodrigo Castro
Computer Science Department and ICC-CONICET, School of Exact and Natural Sciences, University of Buenos Aires, Argentina
May 18th, 2020

Outline
1. Challenges, motivation, and research hypothesis
2. Background and theoretical framework
3. Network packet-level simulation
4. Network fluid-flow simulation
5. Network hybrid simulation
6. Conclusions and future work

1 Challenges, motivation, and research hypothesis

Large Hadron Collider (LHC) at CERN
● World's largest particle accelerator (27 km of superconducting magnets)
● Collides bunches of particles every 25 ns at an energy of 13 TeV
● Hosts 4 main detectors at collision points
● Data center stores >30 PB of data per year

Data Network Simulation - Why?
Real case study: The ATLAS detector @ CERN
● Detector: 40 MHz proton collisions generate ~64 TB/s
● Level 1: hardware filters down to ~160 GB/s
● High Level (HLT): software filters down to ~1.6 GB/s
● Simulation supports: network design, sizing, fine-tuning, predicting changes, etc.
[Figure labels: ~2000 multicore servers; 1-10 Gbps links]

Network Simulation Approaches
● Packet-level simulation: individual packet-by-packet
  ○ Discrete events (continuous time) simulations
  ○ Detailed results & vast literature and tools
  ○ Limitations: execution is proportional to #packets
Not adequate for large network simulation

Network Simulation Approaches
● Packet-level simulation: individual packet-by-packet
  ○ Discrete
events (continuous time) simulations
  ○ Detailed results & vast literature and tools
  ○ Limitations: execution is proportional to #packets
● Fluid-flow simulation: averaged dynamics (ODEs)
  ○ Discrete time (Euler, Runge-Kutta)
  ○ Better execution time & scarce literature
  ○ Limitation: ODE modeling is hard to adopt
● Hybrid simulation: combines the best of each world (detailed results & performance)
  ○ Limited literature available. Ad-hoc solutions.
  ○ Incompatible time bases: discrete events in continuous time (packets) vs. discrete time steps (fluid)
  ○ Different sets of tools and knowledge (Matlab, ODEs vs. protocols, topologies)

Main Research Hypothesis
"The packet-level, fluid-flow and hybrid models can be represented within a unified M&S formalism (DEVS) providing advantages of modeling expressiveness and simulation performance"

2 Background and theoretical framework

DEVS Formalism and Hybrid Systems
● DEVS (Discrete EVent System Specification), Bernard Zeigler, ’76
● DEVS allows one to:
  ○ Exactly represent any discrete system
  ○ Approximate continuous systems with any desired accuracy
[Diagram labels: Fluid-flow models (ODEs); Classic fluid-flow simulation (Euler, Runge-Kutta); DEVS: packet-level network simulation, fluid-flow simulation (QSS), new hybrid network simulation]

Modeling and Simulation-Driven Methodology
● Iterative cycles and incremental phases

Solution Proposal
● Goal: "Probe the theoretical and practical feasibility of unifying the packet-level, fluid-flow, and hybrid approaches within a unifying formal framework"
  ○ Unifying formalism: DEVS
  ○ Unifying tool: PowerDEVS (DEVS+QSS, some existing network models)
  ○ Real use case: TDAQ system at CERN
● Phases: (A) DEVS packet-level models, (B) fluid-flow approximations, (C) hybrid models

3 Packet-level simulation under the DEVS formalism

Packet-level Simulation Approach under DEVS
● DEVS-based iterative methodology: bottom-up, emergent behaviour, OSI layers 3-7
  ○ Minimum complexity for the desired precision [Robinson]
[Figure label: Modular atomic DEVS]
● Formal
implementation (DEVS) of TCP Reno
  ○ Validated against OMNeT++ and real TDAQ traffic (tcpdump)
[Figure labels: Sliding Window; Congestion Control; TCP Sender and TCP Receiver exchanging pkt, req, ack; TCP state machine with states SS, CA, EB, FR, FR]

Modeling Complex Network Topologies
● Library with >30 new network models (DEVS: hierarchical & modular)
● PowerDEVS GUI: build and design network topologies graphically
● New modeling tools to ease the design of larger, complex topologies (Laurito, Bonaventura, Pozo, Castro, WSC 2017):
  ○ TopoGen: automatic SDN topology generation
  ○ Py2PDEVS: Python binding to define PowerDEVS models programmatically
[Figure labels: Low-level Model Library; PowerDEVS Graphical Topology]

Modeling TDAQ Applications
TDAQ/HLT Model in PowerDEVS (Bonaventura, Foguelman, Castro, CiSE 2016)
● 100 kHz event IDs
● ~2000 servers
● ~50000 applications
● ~50 Gb/s data
● ~50 switches, 2 routers
● ~450 links (1-10 Gbps)
[Figure labels: 2000 servers; 100 kHz; TDAQ network ~50 Gbps; 100-200 servers; ~50 K applications]

Empirical Validation with the Real TDAQ System
● Emergent dynamics and their impact on the event filtering time:
  1. Reproduce the "TCP Incast" pathology present in TDAQ
  2. Reproduce event filtering times for topology changes
  3. Fine-tuning of traffic control applications
  4. Predictive load-balancing simulation (Bonaventura, Jonckheere, Castro, WSC 2018)
     i. Initial study based on queuing theory
     ii. Evaluation in the model predicts improvements
     iii. Implementation in the real system: improvements confirmed
[Figure label: Policy: FFFA]

Hypothesis on the Model and Empirical Results
● Experimental framework: 9 racks (267 DCMs, 6408 PUs), events of 1.7 MB (trigger limited by network capacity)
● Simulation accurately reproduces the real system metrics
● The new policy does reduce latency (under controlled conditions)
● Downside: the real load balancer could not cope with the 100 kHz rate using the newly proposed algorithm
[Plot annotations: RMSE = 3.24; RMSE = 15.97; average latency reduction of ~25%; 10.84 ms]

Packet-level Simulation Conclusions
Packet-level main contributions
● New library of formal (DEVS) network models
● New tools for automatic and programmatic topologies (big scale)
● Simulations: acceptable precision (network and application metrics)
  ○ Validation: with other simulators, analytical models and real hardware
  ○ Successful in the design and fine-tuning of the real-world network at CERN
● Execution times: grow linearly with respect to transmitted packets
  ○ Full TDAQ system (50 racks): 60 s simulated → 1 day and 9 hours of execution
  ○ By 2026, TDAQ will increase system size ~50 times
  ○ Parallel simulation offers gains only when inter-subnetwork traffic is light

4 Fluid-flow Simulations

Fluid-flow Simulation Approach
Same formalism (DEVS) + same tool (PowerDEVS) as in packet-level simulation
● Approach: decouple 3 areas of knowledge required for fluid approximations
  1. Simulation of ODEs with DEVS: numerical solutions with QSS
  2. Mathematical model: based on the set of equations proposed by MGT[*]
  3. Network modeling: topological description similar to packet-level
[*] V. Misra, W.-B. Gong, and D. Towsley. “Fluid-based Analysis of a Network of AQM Routers Supporting TCP Flows with an Application to RED”. ACM/SIGCOMM, 2000

Numerical Integration with DEVS: Quantized State System[*]
● QSS originally proposed by Zeigler ‘98.
Later formalized by Kofman ‘01.
● Basic idea in QSS: quantize the state variables while preserving the continuous time domain
● Given an ODE system: $\dot{x}(t) = f(x(t), u(t))$
● QSS quantizes the state variables ($x_i \to q_i$, with quantum $\Delta Q_i$): $\dot{x}(t) = f(q(t), u(t))$
● Resulting in a discrete event system (asynchronous)
● PowerDEVS provides a set of QSS methods (QSS1-3, LIQSS, etc.)
[Figure: trajectory x_i(t) and its quantized version with quantum ΔQ_i, over time]
[*] Kofman, E., and S. Junco. 2001. “Quantized-State Systems: a DEVS Approach for Continuous System Simulation”. Simulation.

Mathematical Model: Fluid Queue+Server Subsystem
● Based on the set of equations proposed by Towsley ‘01-06
● Fluid finite buffer: intuitive analogy with a water reservoir of finite capacity
[Figure labels: arrival rates a_1(t), a_2(t), ..., a_i(t); 𝜇_i(t); queue q(t) <= Qmax; capacity C; departure rates d_1(t), d_2(t), ..., d_i(t); time axis t]

Mathematical Model: Fluid Queue+Server Subsystem
● Challenges for the numerical solution of the differential system
  ○ Asynchronous discontinuities
  ○ Delayed dynamics. In QSS: DQSS (Castro et al., 2011)
  ○ Implicit expression. New QSS extension: FDQSS (convergence theorem in this thesis)

Modeling of Fluid-Flow Networks: Modular Topology
[Figures: TCP model; queue+server model]

Fluid-Flow Networks: Experimental Results
● Experiment:
  ○ Random Early Discard (RED) buffers
  ○ Multiple ON/OFF competing TCP sessions
  ○ Packet-level: stochastic packet size and generation times
[Figure labels: Fluid-flow model; Packet-level model]

Fluid-Flow Networks: Experimental Results (Accuracy)
1. Fluid-flow approximation captures averaged behaviour
2. Same dynamic profile: congestion window, resource sharing
[Plot panels: 1 host; 2 hosts; 1 host]

Fluid-Flow Network: Experimental Results (Performance)
● Scalability comparison against packet-level simulations
  ○ Increasing (by a factor K): bandwidth, buffer size, RED params, #TCP sessions
● Scaling link speed
  ○ Packet-level: ~linear in #packets
  ○ Fluid-flow: ~constant
[Figure labels: Packet-level model; Fluid-flow model]
● Speedup
  ○ e.g.
1 Gbps links => speedup x200

Fluid-Flow Simulation Conclusions
Fluid-flow contributions (Bonaventura, Castro, WSC 2018)
Same formalism (DEVS) + same tool (PowerDEVS) as in packet-level simulation
● Simulation:
  ○ Acceptable approximation when compared with packet-level simulation
  ○ Performance advantages for large networks (independence of link speed)
  ○ New numerical method for FDQSS
● Modeling:
  1. New fluid network model libraries (DEVS): generic, reusable, and modular
  2. Network modeling: topological description similar to packet-level (same tools)
  3. Topology design: no knowledge of ODEs or numerical solvers required

5 Hybrid Simulation

Hybrid Simulation: Packet → Fluid
● Combine packet and fluid simulations affecting each other
● Idea: augment each packet arriving at a link with a fluid signal (ON: sending, OFF: passive)
● DEVS algorithm: for each discrete packet
  ○ Generate a signal with value C (link speed) together with the packet
  ○ After time t = packet.size/C: generate a signal with value 0
● Optional: smoothing by averaging over a time window
[Figure: ON/OFF signal of amplitude C, alternating ON and OFF intervals]

Hybrid Simulation: Hybrid Queue (Fluid ↔ Packet)
● Interaction between models occurs at hybrid queues
  ○ Packet→Fluid: augmented packet (ON/OFF signal) input to a fluid queue
  ○ Fluid→Packet: fluid queue metrics (delay and discards) affect each packet
    ■ Fluid signals applied to packets: discard probability and delay [formulas not shown]
● QSS numerical method
  ○ Does not require any synchronization
● Dense fluid signals
  ○ Known ∀t
  ○ Bounded error < 𝚫Qi
[Figure labels: Packets; Fluid; hybrid queue+server; Fluid → Packet; Fluid queue+server]

Hybrid Simulation: Topologies
[Figure labels: Fluid-flow model; Hybrid Router; Hybrid model; Packet-level model; Packet ON/OFF hybridization; Hybrid queue+server]

Hybrid Simulation: Experimental Results
● Topology for experiments:
  ○ Nf fluid flows; Nc packet flows; sharing the same bottleneck link
● Hybrid simulation scenarios
  ○ Hybrid router only with packet flows (Nf = 0)
  ○ Background/foreground traffic: Nc = 1; Nf >> Nc
  ○ Adjust precision vs. performance: Nc + Nf = 40
[Figure label: bottleneck]

Hybrid Simulation: Experimental Results
● Adjusting the trade-off between precision and performance
  ○ 40 TCP sessions in total: Npacket (precision) + Nfluid (performance) = 40
  ○ Increase precision by increasing the packet/fluid ratio
  ○ Bottleneck = 100 Mbps; bandwidth packet = fluid = 200 Mbps
[Plot "Total Gbits sent (60s simulated)": total sent (Gb) vs. # packet TCP sessions; series: packet traffic, fluid traffic]
[Plot "Execution time (70s simulated)": execution time (s) vs. # packet TCP sessions; series: hybrid model, hybrid (smooth=10ms), 100% packet-level]

Hybrid Simulation: Experimental Results
TCP sessions: 20 fluid + 20 packet (1 probe)

Conclusions
Selected contributions
● DEVS as a unifying framework for packet, fluid and hybrid network models
● Modeling advantages: simplifies adoption (no ODE knowledge) and the modeling process (unified tool)
● Hybrid models allow adjusting the trade-off between precision and performance
● Real-world application and validation in CERN networks
Scientific production
● 3 posters
● 4 full conference papers
● 1 journal article accepted (1 in preparation)

Future Work and Open Problems
● Hybrid simulations in the TDAQ context
  ○ Design of the TDAQ architecture for Phase II (2026)
  ○ Mathematical models for trigger data flows (e.g.
ROS -> DCM)
● Extend packet and fluid models
  ○ Huge range of packet features/protocols. Challenging: wireless networks, SDNs.
  ○ Other mathematical models (discontinuous) that better exploit QSS features (e.g. different ODEs for the different TCP states)
● Extend the study of hybrid network models
  ○ Parameter sensitivity (e.g. 𝚫Qrel, 𝚫Qmin) and analysis of variance (e.g. jitter)
  ○ Different smoothing techniques (e.g. higher orders in QSS)
  ○ Study the "ripple effect" according to network topology

Current last-minute application of the hybrid network framework:
Modeling COVID-19 spread on a "network" of urban conglomerates in Argentina
● Dynamic model: a Susceptible-Infected-Removed (SIR) model represented as a network of ODEs and solved with QSS in each urban cluster + individual persons moving among clusters
● Georeferenced SIR models. Interaction graph defined with Python (Py2PDEVS)

Thanks! Questions?

Main references
- Bonaventura, M., D. Foguelman, and R. Castro. 2016. “Discrete Event Modeling and Simulation-Driven Engineering for the ATLAS Data Acquisition Network”. Computing in Science & Engineering 18:70–83.
- Foguelman, D. J., M. Bonaventura, and R. D. Castro. 2016. “MASADA: A Modeling and Simulation Automated Data Analysis Framework for Continuous Data-Intensive Validation of Simulation Models”.
- Laurito, A., M. Bonaventura, M. E. Pozo Astigarraga, and R. Castro. 2017. “TopoGen: A Network Topology Generation Architecture with Application to Automating Simulations of Software Defined Networks”. In Proceedings of the 2017 Winter Simulation Conference, Volume 50, 1049–1060.
- Bonaventura, M., and R. Castro. 2018. “Fluid-Flow and Packet-Level Models of Data Networks Unified Under a Modular/Hierarchical Framework: Speedups and Simplicity, Combined”. 2018 Winter Simulation Conference.
- Bonaventura, M., M. Jonckheere, and R. Castro. 2018.
“Simulation Study of Dynamic Load Balancing for Processor Sharing Servers with Finite Capacity Under Generalized Halfin-Whitt Regimes”. 2018 Winter Simulation Conference.
- Zeigler, B. P., A. Muzy, and E. Kofman. 2018. Theory of Modeling and Simulation, 3rd Edition: Discrete Event and Iterative System Computational Foundations. Elsevier.
- Wainer, G. A., and P. J. Mosterman. 2010. Discrete-Event Modeling and Simulation: Theory and Applications. CRC Press.
- Misra, V., W.-B. Gong, and D. Towsley. 2000. “Fluid-based Analysis of a Network of AQM Routers Supporting TCP Flows with an Application to RED”. ACM/SIGCOMM.
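
Backup: Packet → Fluid hybridization, illustrative sketch
The Packet → Fluid step from the Hybrid Simulation slides (emit a signal of value C with each packet, then a signal of value 0 after packet.size/C) can be sketched in plain Python as below. This is only a minimal illustration under stated assumptions: packets reach the link already serialized (no overlap), and the names `Packet`, `packets_to_on_off`, and `window_average` are hypothetical, not part of PowerDEVS or Py2PDEVS. The simple time-window average stands in for the optional smoothing and may differ from the method used in the thesis.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Packet:
    """Hypothetical packet record: arrival time in seconds, size in bits."""
    arrival_time: float
    size_bits: float


def packets_to_on_off(packets: Iterable[Packet],
                      link_speed: float) -> List[Tuple[float, float]]:
    """Turn discrete packets into an ON/OFF fluid signal.

    For each packet, emit (arrival_time, C) and, after size/C seconds,
    (arrival_time + size/C, 0.0), where C is the link speed in bits/s.
    Assumes packets do not overlap in time on the link.
    """
    events: List[Tuple[float, float]] = []
    for p in sorted(packets, key=lambda pkt: pkt.arrival_time):
        events.append((p.arrival_time, float(link_speed)))                   # ON
        events.append((p.arrival_time + p.size_bits / link_speed, 0.0))      # OFF
    return events


def window_average(events: List[Tuple[float, float]],
                   t_start: float, window: float) -> float:
    """Time-average of the piecewise-constant signal over [t_start, t_start + window)."""
    t_end = t_start + window
    total = 0.0
    value = 0.0        # the signal is OFF (0) before the first event
    last_t = t_start
    for t, v in events:
        if t >= t_end:
            break
        if t > last_t:
            total += value * (t - last_t)   # area accumulated at the previous level
            last_t = t
        value = v
    total += value * (t_end - last_t)       # tail of the window
    return total / window


if __name__ == "__main__":
    # Toy numbers chosen for readability, not taken from the experiments.
    demo = packets_to_on_off([Packet(0.0, 500.0), Packet(1.0, 250.0)], link_speed=1000.0)
    print(demo)
    print(window_average(demo, t_start=0.0, window=2.0))
```

The averaged value is the kind of smoothed rate that could feed a fluid queue; a larger window gives a smoother input at the cost of reacting more slowly to bursts.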