Simulation for Grid Computing
Henri Casanova, Univ. of California, San Diego — casanova@cs.ucsd.edu
Grid Performance Workshop 2005, Edinburgh, Scotland, June 2005

Grid Research

Grid researchers often ask questions in the broad area of distributed computing:
- Which scheduling algorithm is best for this application on a given Grid?
- Which design is best for implementing a distributed Grid resource information service?
- Which caching strategy is best for enabling a community of users to do distributed data analysis?
- What are the properties of several resource management policies in terms of fairness and throughput?
- What is the scalability of my publish-subscribe system under various levels of failures?
- ...

Grid Research: Analytical or Experimental?

Analytical Grid computing research:
- Some have developed purely analytical / mathematical models for Grid computing
- Makes it possible to prove interesting theorems
- Often too simplistic to convince practitioners, but sometimes useful for understanding principles in spite of dubious applicability
- One often uncovers NP-complete problems anyway (e.g., routing, partitioning, scheduling problems), and then one must run experiments

Experimental Grid computing research:
- Most published works are based on experiments

Grid Computing as Science?
You can tell you are a scientific discipline if:
- You can read a paper, easily reproduce (at least a subset of) its results, and improve on them
- You can tell a grad student "Here are the standard tools; go learn how to use them and come back in one month"
- You can give a 1-hour seminar on widely accepted tools that are the basis for doing research in the area

We are not there today. But perhaps I can give a 1-hour seminar on emerging tools that could be the basis for doing research in the area, provided a few open questions are addressed. We need standard ways to run "Grid experiments".

Grid Experiments

Real-world experiments are good:
- Eminently "believable"
- Demonstrate that the proposed approach can be implemented in practice

But...
- They can be time-intensive: execution of "applications" for hours, days, months, ...
- They can be labor-intensive: the entire application needs to be built and functional, including all design / algorithm alternatives and all hooks for deployment
- Is it bad engineering practice to build many full-fledged solutions just to find out which one works best?

Grid Experiments (2)

What experimental testbed?
My own little testbed:
- Well-behaved, controlled, stable
- Often not representative of real Grids

Real Grid platforms:
- (Still) challenging for many Grid researchers to obtain
- Not built as a computer scientist's playpen: other users may disrupt experiments, and other users may find the experiments disruptive
- The platform will experience failures that may disrupt the experiments
- The platform configuration may change drastically while experiments are being conducted
- Experiments are uncontrolled and unrepeatable: even if disruption from other users is part of the experiment, it prevents back-to-back runs of competing designs / algorithms

Grid Experiments (3)

It is difficult to obtain statistically significant results on an appropriate testbed. And to make things worse:
- Experiments are limited to the testbed: what part of the results is due to idiosyncrasies of the testbed? Extrapolations are possible, but rarely convincing, so one must use a collection of testbeds...
- Explorations of "what if" scenarios remain limited: what if the network were different? What if we were in 2015?
- It is difficult for others to reproduce results — and this is the basis for scientific advances!

Simulation

Simulation can solve many (all) of these difficulties:
- No need to build a real system
- Controlled / repeatable experiments
- In principle, no limits to experimental scenarios
- Possible for anybody to reproduce results
- ...
Simulation: the "representation of the operation of one system (A) through the use of another (B)". In computer simulation, B is a computer program. The key question is validation: the correspondence between the simulation and the real world.

Simulation in Computer Science

Microprocessor design:
- A few standard "cycle-accurate" simulators are used extensively (http://www.cs.wisc.edu/~arch/www/tools.html)
- Possible to reproduce simulation results

Networking:
- A few standard "packet-level" simulators: NS-2, DaSSF, OMNeT++
- Well-known datasets for network topologies, and well-known generators of synthetic topologies
- SSF standard: http://www.ssfnet.org/
- Possible to reproduce simulation results

Grid computing?
- None of the above up until a few years ago
- Most people built their own "ad-hoc" solutions
- Promising recent developments

Simulation of Distributed Computing Platforms?

Simulation of parallel platforms has been used for decades, but most simulations have made drastic assumptions.

Simplistic platform model:
- Processors perform work at some fixed rate (Flops)
- Network links send data at some fixed rate
- The topology is fully connected (no communication interference) or a bus (simple communication interference)
- Communication and computation are perfectly overlappable

Simplistic application model:
- All computation is CPU-intensive
- Clear-cut communication and computation phases
- The application is deterministic

Simulation is straightforward in most cases: just fill in a Gantt chart with a computer rather than by hand. No need for a "simulation standard".

Grid Simulations?
Simple models are perhaps justifiable for a switched, dedicated cluster running a matrix-multiply application, but hardly justifiable for Grid platforms:
- Complex and wide-reaching network topologies: multi-hop networks, heterogeneous bandwidths and latencies, non-negligible latencies, complex bandwidth-sharing behaviors, contention with other traffic
- Overhead of middleware
- Complex resource access/management policies
- Interference of communication and computation

Grid Simulations

Recognized as a critical area. The Grid eXplorer (GdX) project (INRIA) aims to build an actual scientific instrument:
- Databases of "experimental conditions"
- A 1000-node cluster for simulation/emulation
- Visualization tools
- Still in planning — what simulation technology?

Two main questions:
1. What does a "representative" Grid look like?
2. How does one do simulation on a synthetic representative Grid?

Presentation Outline
- Introduction
- Simulation for Grid Computing?
- Generating Synthetic Grids
- Simulating Applications on Synthetic Grids
- Current Work and Future Directions

Why Synthetic Platforms?

Two goals of simulation: simulate platforms beyond the ones at hand, and perform sensitivity analyses. Hence the need for synthetic platforms: examine real platforms, discover principles, and implement "platform generators".

What simulation results go in my paper?
- Results for a few "real" platforms?
- Results for many synthetic platforms?

Generation of Synthetic Grids

Three main elements:
- Network topology: graph, bandwidths and latencies
- Compute resources (and other resources)
- "Background" conditions: load, unavailability, and failures

What is representative and tractable?

Synthetic Network Topologies

The network community has wondered about the properties of the Internet topology for years. The Internet grows in a decentralized fashion with seemingly complex rules and incentives. Could it have a mathematical structure? Could we then have generative models? There have been three "generations" of graph generators: random, structural, and degree-based.

Random Topology Generators

Simplistic: place N nodes randomly in a square, and connect vertices u, v with some fixed probability P(u,v).

Waxman [JSAC'88]: place N nodes randomly in a C x C square, and connect u, v with probability

  P(u,v) = α e^(-d / (β C √2)),   0 < α, β ≤ 1,

where d is the Euclidean distance between u and v. This was the first model to be widely used for network simulations. Others exist (exponential, locality-based, ...). Shortcoming: real networks have a non-random structure with some "hierarchy".

Structural Generators

"... the primary structural characteristic affecting the paths between nodes in the Internet is the distribution between stub and transit domains... In other words, there is a hierarchy imposed on nodes..." [Zegura et al., 1997]. This holds both at the "AS level" (peering relationships) and at the "router level" (Layer 2). It quickly became accepted wisdom:
- Transit-Stub [Calvert et al., 1997]
- Tiers [Doar, 1996]
- GT-ITM [Zegura et al., 1997]

Power Laws!

In 1999, Faloutsos et al. [SIGCOMM'99] rocked topology generation with power laws, with results obtained both for AS-level and router-level data from real networks. Consider the outdegree (the number of edges leaving a node): for each node v, compute its outdegree d_v and its rank r_v, the index of v in order of decreasing degree (nodes can have the same rank). The law: d_v is proportional to r_v^R, for some constant R. Random generators do not agree with the power laws; structural generators do not agree with the power laws either.

Degree-Based Generators

The new common wisdom: a topology that does not agree with power laws cannot be representative. A flurry of "power-law generators" followed the Faloutsos paper:
- CMU power-law graph generator [Palmer et al., 2000]
- Inet [Jin et al., 2000]
- BRITE [Medina et al., 2001]
- PLRG [Aiello et al., 2000]

Structure vs. Power Law

We know networks have structure AND power laws. Combine both?
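The Waxman model described above is simple enough to sketch in a few lines. This is a minimal illustration, not any particular generator's implementation; the parameter values α = β = 0.4 and the square size are illustrative choices, not taken from the talk.

```python
import math
import random

def waxman(n, alpha=0.4, beta=0.4, side=100.0, seed=0):
    """Waxman random topology: n nodes placed uniformly in a side x side
    square; edge (u, v) added with probability alpha * exp(-d / (beta * L)),
    where d is the Euclidean distance and L = side * sqrt(2) is the maximum
    possible distance. Returns node positions and the edge list."""
    rng = random.Random(seed)
    pos = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
    L = side * math.sqrt(2)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            d = math.dist(pos[u], pos[v])
            if rng.random() < alpha * math.exp(-d / (beta * L)):
                edges.append((u, v))
    return pos, edges
```

Raising α makes the graph denser overall, while raising β reduces the penalty on long edges — which is exactly why, as noted above, Waxman graphs show no hierarchy: edge existence depends only on distance, never on a node's role.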
The GridG project [Dinda et al., SC'03] uses a Tiers-generated topology and adds random links to satisfy the power laws. How about just using power laws? A comparison paper [Tangmunarunkit et al., SIGCOMM'02] found that degree-based generators capture the large-scale structure of real networks very well — better than structural generators: structural generators impose "too strict" a hierarchy, while hierarchy (e.g., backbone links) arises naturally from degree-based generators. This holds both at the AS level and at the router level.

Short Story: What Generator?

- To model large networks (e.g., > 500 routers), use degree-based generators
- To model small networks (e.g., < 100 routers), use structural generators; degree-based generators will not create any hierarchy

Bandwidths, Latencies, Traffic, etc.

Topology generators only produce a graph; we need link characteristics as well.

Option #1: model physical characteristics.
- Some "models" exist in topology generators
- Background traffic must be simulated, but there is no accepted model for generating it, and the simulation can be very costly

Option #2: model end-to-end performance.
- Models ([Lee, HCW'01]) or measurements (NWS, ...)
- Go from path modeling to link modeling? This turns out to be a difficult question (DARPA workshop on network modeling)

Or maybe none of this matters?
- Fiber inside the network is mostly unused; the communication bottleneck is the local link
- Appropriate tuning of TCP, or better protocols, should saturate the local link
- Then we don't care about topology at all!
- Or maybe none of this matters for my application: no network contention

Compute Resources

What resources do we put at the endpoints?
Option #1: "ad-hoc" generalization. Look at the TeraGrid and generate new sites based on existing sites.

Option #2: statistical modeling. Examine many production resources, identify key statistical characteristics, and come up with a generative/predictive model.

Synthetic Clusters?

Many Grid resources are clusters. What is the "typical" distribution of clusters? The "commodity cluster synthesizer" [Kee et al., SC'04]:
- Examined 114 production clusters (10K+ processors)
- Came up with statistical models
- Validated the models against a set of 191 clusters (10K+ processors)
- The models allow "extrapolation" to future configurations
- The models are implemented in a resource generator

Architecture/Clock Models

[Figures: current distribution of processor families (Pentium2, Celeron, Pentium3, Pentium4, Itanium, Athlon, AthlonMP, AthlonXP, Opteron); clock rate (GHz) vs. release year; number of processors vs. year]
- Current distribution of processor families
- Linear fit between clock rate and release year within a processor family
- Quadratic model of the fraction of processors released in a given year
- Together, these model future distributions and speeds

Other Models

- Cache size: grows logarithmically
- Number of processors per node: log2-normal
- Memory size: log2-normal
- Number of nodes per cluster: log2-normal

The models were validated against a snapshot of ROCKS clusters; these clusters have been added to the "training" set, and more clusters are added every month. GridG provides a generic framework in which such laws and other correlations can be encoded.

Resource Availability / Workload

Probabilistic models:
- Naive: exponentially distributed availability and unavailability intervals
- Sophisticated: Weibull distributions [Wolski et al.]
- Traces: NWS, etc.; desktop Grid resources [Kondo, SC'04]

Workload models (e.g., batch schedulers):
- Traces
- Models [Feitelson, JPDC'03]: job inter-arrival times: Gamma; amount of work requested: Hyper-Gamma; number of processors requested: Compounded (2^, 1, ...)

A Sample Synthetic Grid

- Generate 5,000 routers with BRITE
- Annotate latencies according to BRITE's Euclidean-distance method (scaling to obtain the desired network diameter)
- Annotate bandwidths based on a set of end-to-end NWS measurements
- Pick 30% of the end-points
- Generate a cluster at each end-point according to Kee's synthesizer for year 2006
- Model cluster load with Feitelson's model, with a range of parameters for the random distributions
- Model resource failures based on Inca measurements on the TeraGrid

Synthetic Grid Generation

Still far from widely accepted standards, but there are many ongoing, promising efforts:
- Researchers have recognized this as an issue
- Tools from networking can be reused
- A few "Grid" tools are available

What is really needed: a repository of "Grid measurements" and a repository of "synthetic/generated Grids".

Presentation Outline
- Introduction
- Simulation for Grid Computing?
- Generating Synthetic Grids
- Simulating Applications on Synthetic Grids
- Current Work and Future Directions

Simulation Levels

Simulating applications on a synthetic platform?
There is a spectrum of simulation levels, from more abstract to less abstract:
- Mathematical simulation: based solely on equations
- Discrete-event simulation: abstraction of the system as a set of dependent actions and events (fine- or coarse-grain)
- Emulation: trapping and virtualization of low-level application/system actions

The boundaries are blurred (discrete-event simulation ~ emulation), and a simulation can combine all paradigms at different levels.

Simulation Options

Network (from more abstract to less abstract):
- Macroscopic: flows in "pipes" (coarse-grain discrete-event simulation + mathematical simulation)
- Microscopic: packet-level simulation (fine-grain discrete-event simulation)
- Actual flows go through "some" network (emulation)

CPU (from more abstract to less abstract):
- Macroscopic: flows in a "pipe" (coarse-grain discrete-event simulation + mathematical simulation)
- Microscopic: cycle-accurate simulation (fine-grain discrete-event simulation)
- Virtualization via another CPU / virtual machine (emulation)

Application (from more abstract to less abstract):
- Macroscopic: application = analytical "flow"
- Less macroscopic: sets of abstract "tasks" with resource needs and dependencies (coarse-grain discrete-event simulation)
- Application specification or pseudo-code API
- Virtualization of actual code with trapping of application-generated events (emulation)

Two projects: MicroGrid (UCSD) and SimGrid (UCSD + IMAG + Univ. Nancy).

MicroGrid

A set of simulation tools for evaluating middleware, applications, and network services for Grid systems. Applications are supported by emulation and virtualization: actual application code is executed on "virtualized" resources. MicroGrid accounts for the application, the CPU, and the network (virtual resources mapped onto physical resources).

MicroGrid Virtualization

- Resource virtualization: "resource names" are virtualized (gethostname, sockets, Globus' GIS, MDS, NWS, etc.)
- Time virtualization: simulating the TeraGrid on a 4-node cluster, or simulating a 4-node cluster on the TeraGrid
- CPU virtualization: direct execution on (a fraction of) a physical resource
- No application modification
- Main challenge: synchronization between real time and virtual time

MicroGrid Network

Packet-level simulation: network calls are intercepted and sent to a network simulator that has been configured with the virtual network topology. Implemented with MaSSF, an MPI version of DaSSF [Liu et al., 2001], configured via the SSF standard (DML, etc.). Protocol stacks (TCP, UDP, BGP, etc.) are implemented on top of MaSSF. Socket calls are trapped without application modification: real data is sent, and delays are scaled to match the virtual time. Main challenges: scaling (20,000 routers on a 128-node cluster) and synchronization with computation.

MicroGrid in a Nutshell

- Application: virtualization via trapping of application events (emulation). Can be really slow, but hopefully accurate. Can have high overhead — but it captures the overhead!
- CPU: virtualization via another CPU (emulation)
- Network: microscopic packet-level simulation (fine-grain discrete-event simulation). Can be really slow for long transfers, but hopefully accurate.

SimGrid

Originally developed for scheduling research, SimGrid must be fast to allow for thousands of simulations.

Application:
- No real application code is executed
- The application consists of tasks that have dependencies and resource consumptions

Resources:
- No virtualization: a resource is defined by a rate at which it does "work", a fixed overhead that must be paid by each task, and traces of the above if needed (+ failures)
- A task can use multiple resources: a data transfer over multiple links, or a computation that uses a disk and a CPU

SimGrid uses a combination of mathematical simulation and coarse-grain discrete-event simulation, with a simple API to "specify" an application rather than requiring it to be fully implemented. The result is fast simulation.

Key issue: resource sharing. In MicroGrid, resource sharing "emerges" out of the low-level emulation and simulation: packets of different connections are interleaved by routers, and different processes get slices of the CPU. The drawback is slow simulation. How can one do something faster that is still reasonable? Come up with macroscopic models of resource sharing.

Resource Sharing in SimGrid

Macroscopic resource sharing can be easy. On a CPU, CPU-bound processes/threads get a fair share of the CPU "in steady state" — so why go through the trouble of emulating CPU-bound processes? Just say how many cycles they need, and "compute" how many cycles they get per second.

Macroscopic resource sharing can also be not so easy: the network has many end-points, routers, and links, carrying many end-to-end TCP flows. How much bandwidth does each flow receive?
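One classical macroscopic answer to this question is max-min fair sharing, which can be computed by "progressive filling": raise all flow rates equally until some link saturates, freeze the flows crossing it, and repeat. The sketch below is illustrative (the two-link, two-flow example is hypothetical, chosen to reproduce the 8 Mb/s / 2 Mb/s split that appears in the deck's bandwidth-sharing figures); it assumes every flow crosses at least one link.

```python
def max_min_fair(links, flows):
    """Progressive filling for max-min fairness.
    links: {link_name: capacity}; flows: {flow_name: set of links crossed}.
    Returns {flow_name: allocated rate}. Assumes each flow crosses >= 1 link."""
    rate = {f: 0.0 for f in flows}
    cap = dict(links)          # remaining capacity per link
    active = set(flows)        # flows not yet frozen by a saturated link
    while active:
        # Smallest equal increment that saturates some link used by an active flow
        inc = min(cap[l] / sum(1 for f in active if l in flows[f])
                  for l in cap
                  if any(l in flows[f] for f in active))
        for f in active:
            rate[f] += inc
        for l in cap:
            cap[l] -= inc * sum(1 for f in active if l in flows[f])
        # Freeze every flow that crosses a saturated link
        saturated = {l for l in cap if cap[l] <= 1e-9}
        active = {f for f in active if not (flows[f] & saturated)}
    return rate
```

For example, with a 10 Mb/s link A and a 2 Mb/s link B, a flow crossing only A and a flow crossing A and B receive 8 and 2 Mb/s respectively: the 2 Mb/s bottleneck freezes the second flow, and the first absorbs the rest.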
Bandwidth Sharing

Macroscopic TCP modeling is a research field in itself. The "fluid in a pipe" analogy gives a rule of thumb: the share a flow gets of its bottleneck link is inversely proportional to its round-trip time.

[Diagram: example network with 10 Mb/s links, where contending flows receive 8 Mb/s and 2 Mb/s shares]

It turns out that TCP in steady state implements a type of resource sharing called max-min fairness.

Max-Min Fairness Principle

Consider the set L of all network links, where c_l is the capacity of link l, and the set R of all flows, where a flow r is a subset of L and x_r is the bandwidth allocated to flow r. Bandwidth capacities are respected:

  for all l in L:   sum over { r in R : l in r } of x_r  <=  c_l

TCP in steady state is such that min over r in R of x_r is maximized. This allocation can be computed efficiently (with appropriate data structures).

SimGrid

SimGrid uses the max-min fairness principle for all resource sharing:
- Fast
- Validated in the real world for CPUs
- Validated against NS-2 for networks

Limitation: max-min fairness holds in steady state only — e.g., no TCP slow-start, no process priority boosts — and it is unclear when it starts breaking down. It is justified for "long enough" transfers and computations: reasonable for scientific applications, less so for applications such as a Grid information service.

SimGrid in a Nutshell

- Network: macroscopic flows in a "pipe" (mathematical simulation + coarse-grain discrete-event simulation). Very fast, but not accurate for short transfers.
- CPU: macroscopic flows in a "pipe" (mathematical simulation + coarse-grain discrete-event simulation). Very fast; abstract application model.
- Application: macroscopic abstract "tasks" with resource needs and dependencies (coarse-grain discrete-event simulation). Very fast; abstract application model.

Other Projects

ModelNet:
- Network emulation: unmodified application packets are routed through a "core cluster" of gigabit-switched nodes running a modified kernel that emulates router queues
- More "emulation" than MicroGrid
- Only for networking, but there are plans to add support for computation

There are still many "in-house" simulators that may aspire to become widely used tools: EmuLab/DummyNet, ChicagoSim, OptorSim, EDGSim, GridSim, ...

So What Should I Use?

It really depends on your goals and resources:
- SimGrid's network model has clear limitations, e.g., for short transfers
- SimGrid simulations are easy to set up
- MicroGrid simulations take a lot of time (although they can be parallelized)
- ModelNet requires some hardware setup
- SimGrid does not require a full application to be written
- MicroGrid models the overhead of system calls implicitly
- ...

The key trade-off is accuracy vs. speed: the more abstract the simulation, the faster; the less abstract, the more accurate. Does this trade-off really hold?

Simulation Validation

This is the crux of most simulation work in most domains of computer science. Validation is difficult and almost never done convincingly. The typical pattern:
- Provide justification that the model is plausible
- Convince people that the simulator implements the model ("verification")
- Provide a few graphs showing that it is reasonable: validation in a few special cases at best, or validation against another "validated" simulator
- Argue that although absolute values are off, the trends are respected
- Conclude that the simulator is useful to compare algorithms/designs
- Obtain scientific results?????

FLASH vs. FLASH

"FLASH vs. (Simulated) FLASH: Closing the Simulation Loop" [Gibson et al., ASPLOS'00]. The FLASH project at Stanford built large-scale shared-memory multiprocessors, going from conception, to design, to actual hardware (32 nodes), and used simulation heavily over 6 years. The authors went back and compared their simulations to the real world:
- Simulation error is unavoidable; 30% error was not rare in their case, negating the impact of claims like "we got a 1.5% improvement"
- One should focus on simulating the important things
- A more complex simulator does not ensure better simulation: simple simulators worked better than sophisticated ones (which were unstable), and predicted trends as well as the slower, sophisticated ones
- It is key to use the real world to tune/calibrate the simulator

Conclusion: for FLASH, the simple simulator was all that was needed.

Presentation Outline
- Introduction
- Simulation for Grid Computing?
- Generating Synthetic Grids
- Simulating Applications on Synthetic Grids
- Current Work and Future Directions

Grid Simulation: Accuracy vs. Speed

Comparing simulators and validating them is a gigantic amount of work:
- It will never be clear-cut; one must identify simulation "regimes"
- It doesn't lead to many papers
- It's eminently politically incorrect
- Results depend on what the simulation is used for
- Therefore nobody does it

The story one would like to tell is, e.g.: start with SimGrid simulations to identify promising approaches, then move to MicroGrid emulations to precisely quantify the trade-offs. How can we substantiate this story?
Current Work in SimGrid

SimGrid currently uses a simulation engine called SURF, which performs a blend of mathematical simulation and discrete-event simulation. Current work: adding a "MaSSF" back-end (i.e., MicroGrid) and a "ModelNet" back-end underneath SURF. The goal is to evaluate the speed-accuracy trade-off for simulation of the network. Expected result: SimGrid is sufficient and fast for large-enough messages (good for scientific applications), while MaSSF/ModelNet is required for small/frequent messages (good for middleware such as NWS, and for p2p applications). But beware FLASH vs. FLASH!

What About Validation?

The GRAS project [Quinson et al.]: provide a way to compile the same code both into real-world code and into simulation code.
- Write the application using the GRAS API
- Compile it into a SimGrid simulation
- Compile it into real-world code (GRAS currently provides its own back-end and deployment; it could use Globus as a back-end)
- Run and compare

Goals: a smooth transition from design to prototyping to production, and easy validation and simulation calibration. The plan: use GRAS to easily compare SimGrid, MaSSF, ModelNet, and real-world networks, then move on to full applications.

Conclusion

Simulation is difficult; the eternal question is: what really matters? Grid researchers are actively working on it, and usable tools exist — Grid simulation today should not "re-invent the wheel". Two crucial next steps:
- A repository of synthetic Grids, Grid measurement datasets, and Grid simulation software
- Scientifically sound validation experiments, to validate simulators and to understand what matters and what does not

Only then will we have a scientific discipline, with a standard way to conduct experiments and a way for researchers to reproduce each other's results.
Questions?

A Simple Experiment

- Sent out files of 500 MB up to 1.5 GB with TTCP
- Used from 1 up to 16 simultaneous connections
- Recorded the data transfer rate per connection

Experimental Results

[Figure: normalized data rate per connection vs. number of concurrent TCP connections]

Max-Min Fairness

Max-min fairness captures other kinds of resource sharing beyond networks: interference of communication and computation [Kreaseck et al., IJHPCA'05], and CPU sharing.
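For the experiment above, the max-min model makes a simple prediction: n equal-RTT connections sharing one bottleneck link each receive 1/n of the capacity, so the normalized per-connection rate falls as 1/n. A minimal sketch (the 100 Mb/s capacity is an illustrative value, not the testbed's actual link speed):

```python
def predicted_per_connection_rate(capacity_mbps, n_connections):
    """Max-min prediction for n equal-RTT connections sharing a single
    bottleneck link: each connection gets an equal share of the capacity."""
    return capacity_mbps / n_connections

# Normalized rate (relative to a single connection) for 1..16 connections,
# mirroring the range used in the TTCP experiment above.
normalized = [predicted_per_connection_rate(100.0, n) / 100.0
              for n in range(1, 17)]
```

Comparing such a 1/n curve against the measured normalized rates is one way to check how far real TCP behavior deviates from the steady-state max-min assumption.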