ROSS: Parallel Discrete-Event Simulations on Near Petascale Supercomputers
Christopher D. Carothers, Department of Computer Science, Rensselaer Polytechnic Institute, chrisc@cs.rpi.edu
Sponsors: NSF CAREER, NeTS, PetaApps, ANL/ALCF

Motivation
Why Parallel Discrete-Event Simulation (DES)?
– Large-scale systems are difficult to understand
– Analytical models are often constrained
Parallel DES offers:
– Dramatically shorter model execution times
– Prediction of future "what-if" system performance
– Potential for real-time decision support
• Minutes instead of days
• Analysis can be done right away
– Example models: national air space (NAS), ISP backbones, distributed content caches, next-generation supercomputer systems

Ex: Movies over the Internet
• Suppose we want to model 1 million home ISP customers downloading a 2 GB movie
• How long to compute?
– Assume a nominal 100K ev/sec sequential simulator
– Assume each packet takes 8 hops on average
– 2 GB movies yield 2 trillion 1 KB data packets
– At 8 hops per packet, that is 16+ trillion events
– 16+ trillion events @ 100K ev/sec: over 1,900 days, or 5+ years!!!
• Need massively parallel simulation to make this tractable

Outline
• Intro to DES
• Time Warp and other PDES Schemes
• Reverse Computation
• Blue Gene/L & /P
• ROSS Implementation
• ROSS Performance Results – PHOLD & PCS
• Observations on PDES Performance
• Future Directions

Discrete Event Simulation (DES)
Discrete event simulation: a computer model of a system in which changes in the state of the system occur at discrete points in simulation time.
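As a sanity check on the movie example above, the event count and sequential run time can be reproduced with a few lines of arithmetic. This is a sketch; it assumes 2 GB = 2^31 bytes and 1 KB = 1024-byte packets, which is consistent with the slide's totals:

```python
# Back-of-envelope cost of simulating 1M customers downloading a 2 GB movie.
def sequential_sim_days(customers, movie_bytes, packet_bytes, hops, events_per_sec):
    packets = customers * (movie_bytes / packet_bytes)
    events = packets * hops              # one event per packet hop
    seconds = events / events_per_sec    # sequential simulator wall time
    return seconds / 86400.0             # convert to days

days = sequential_sim_days(1e6, 2 * 2**30, 1024, 8, 1e5)
print(f"{days:.0f} days ({days / 365:.1f} years)")
```

This prints roughly 1942 days (about 5.3 years), matching the slide's "over 1,900 days, or 5+ years."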
Fundamental concepts:
• system state (state variables)
• state transitions (events)
A DES computation can be viewed as a sequence of event computations, where each event computation is assigned a (simulation time) time stamp.
Each event computation can:
• modify state variables
• schedule new events

DES computation example: air traffic at an airport
Events: aircraft arrival, landing, departure
Example timeline: an arrival at 8:00 schedules a landed event at 8:05, which schedules a departure at 9:15; a later arrival occurs at 9:30. At any moment the computation has processed events, a current event, and unprocessed events ordered along the simulation-time axis.
• Unprocessed events are stored in a pending list
• Events are processed in time stamp order

Discrete Event Simulation System
The model of the physical system is kept independent of the simulation executive:
Simulation Application
• state variables
• code modeling system behavior
• I/O and user interface software
The application makes calls to schedule events; the executive makes calls to event handlers.
Simulation Executive
• event list management
• managing advances in simulation time

Event-Oriented World View
State variables:
Integer: InTheAir;
Integer: OnTheGround;
Boolean: RunwayFree;
Event handler procedures: Arrival Event { … }, Landed Event { … }, Departure Event { … }
Pending Event List (PEL), with Now = 8:45: events at 9:00, 9:16, 10:10
Event processing loop:
While (simulation not finished)
    E = smallest time stamp event in PEL
    Remove E from PEL
    Now := time stamp of E
    Call the event handler for E

Ex: Air Traffic at an Airport
Model aircraft arrivals and departures, with arrival queueing. Single-runway model; ignores departure queueing.
• R = time runway is used for each landing aircraft (constant)
• G = time required on the ground before departing (constant)
State Variables
• Now: current simulation time
• InTheAir: number of aircraft landing or waiting to land
• OnTheGround: number of landed aircraft
• RunwayFree: Boolean, true if runway available
Model Events
• Arrival: denotes aircraft arriving in air space of airport
• Landed: denotes aircraft landing
• Departure: denotes aircraft leaving

Arrival Events
New aircraft arrives at airport. If the runway is free, it will begin to land. Otherwise, the aircraft must circle and wait to land.
• R = time runway is used for each landing aircraft
• G = time required on the ground before departing
• Now: current simulation time
• InTheAir: number of aircraft landing or waiting to land
• OnTheGround: number of landed aircraft
• RunwayFree: Boolean, true if runway available
Arrival Event:
    InTheAir := InTheAir + 1;
    If (RunwayFree)
        RunwayFree := FALSE;
        Schedule Landed event @ Now + R;

Landed Event
An aircraft has completed its landing.
• R = time runway is used for each landing aircraft
• G = time required on the ground before departing
• Now: current simulation time
• InTheAir: number of aircraft landing or waiting to land
• OnTheGround: number of landed aircraft
• RunwayFree: Boolean, true if runway available
Landed Event:
    InTheAir := InTheAir - 1;
    OnTheGround := OnTheGround + 1;
    Schedule Departure event @ Now + G;
    If (InTheAir > 0)
        Schedule Landed event @ Now + R;
    Else
        RunwayFree := TRUE;

Departure Event
An aircraft now on the ground departs for a new destination.
• R = time runway is used for each landing aircraft
• G = time required on the ground before departing
• Now: current simulation time
• InTheAir: number of aircraft landing or waiting to land
• OnTheGround: number of landed aircraft
• RunwayFree: Boolean, true if runway available
Departure Event:
    OnTheGround := OnTheGround - 1;

Execution Example
Parameters: R = 3, G = 4. Initial events: Arrival F1 @ 1, Arrival F2 @ 3.
Now = 1, process Arrival F1: InTheAir = 1; RunwayFree = false; schedule Landed F1 @ 4
Now = 3, process Arrival F2: InTheAir = 2
Now = 4, process Landed F1: InTheAir = 1; OnTheGround = 1; schedule Depart F1 @ 8; schedule Landed F2 @ 7
Now = 7, process Landed F2: InTheAir = 0; OnTheGround = 2; schedule Depart F2 @ 11; RunwayFree = true
Now = 8, process Depart F1: OnTheGround = 1
Now = 11, process Depart F2: OnTheGround = 0
Over simulation time 0 to 11, InTheAir goes 0, 1, 2, 1, 0; OnTheGround goes 0, 1, 2, 1, 0; RunwayFree goes true, false, true.

Summary
• A DES computation is a sequence of event computations
– Modify state variables
– Schedule new events
• DES System = model + simulation executive
• Data structures
– Pending event list to hold unprocessed events
– State variables
– Simulation time clock variable
• Program (Code)
– Main event processing loop
– Event procedures
– Events processed in time stamp order

Outline
• Intro to DES
• Time Warp and other PDES Schemes
• Reverse Computation
• Blue Gene/L & /P
• ROSS Implementation
• ROSS Performance Results – PHOLD & PCS
• Observations on PDES Performance
• Future Directions

How to Synchronize Parallel Simulations?
Parallel time-stepped simulation proceeds in lock-step, with a barrier between steps. Parallel discrete-event simulation must instead allow for sparse, irregular event computations across virtual time.
Problem: events arriving in a PE's past ("straggler" events).
Solution: Time Warp.

Massively Parallel Discrete-Event Simulation Via Time Warp
Local Control Mechanism: error detection and rollback, performed per LP over virtual time:
(1) undo state deltas
(2) cancel "sent" events
Global Control Mechanism: compute Global Virtual Time (GVT); collect versions of state/events and perform I/O operations that are < GVT.
Event classes: processed, unprocessed, "straggler," and "committed" events.

Whew … Time Warp sounds expensive. Are there other PDES schemes?
• "Non-rollback" options:
– Called "conservative" because they disallow out-of-order event execution
– Deadlock Avoidance
• NULL Message Algorithm
– Deadlock Detection and Recovery

Deadlock Avoidance Using Null Messages
Null Message Algorithm (executed by each LP):
Goal: ensure events are processed in time stamp order and avoid deadlock.
WHILE (simulation is not over)
    wait until each FIFO contains at least one message
    remove smallest time stamped event from its FIFO
    process that event
    send null messages to neighboring LPs with a time stamp indicating a lower bound on future messages sent to that LP (current time plus minimum transit time between airports)
END-LOOP
Variation: an LP requests a null message when its FIFO becomes empty
• Fewer null messages
• Delay to get time stamp information

The Time Creep Problem
Three LPs in a cycle: ORD (waiting on SFO) holds events 7 and 15; SFO (waiting on JFK) holds event 10; JFK (waiting on ORD) holds events 8 and 9.
Null messages exchanged: JFK: timestamp = 5.5; SFO: timestamp = 6.0; ORD: timestamp = 6.5;
Five null messages to process a single event!
JFK: timestamp = 7.0; SFO: timestamp = 7.5; now ORD can process the time stamp 7 message.
(Here the minimum delay between airports is 0.5 units of time, and JFK is initially at time 5.)
Many null messages are needed if the minimum flight time is small!

Livelock Can Occur!
Suppose the minimum delay between airports is zero! ORD (waiting on SFO, holding events 7 and 15), SFO (waiting on JFK, holding event 10), and JFK (waiting on ORD, holding events 8 and 9) endlessly exchange null messages with timestamp 5.0.
Livelock: an un-ending cycle of null messages in which no LP can advance its simulation time.
To avoid it, there cannot be a cycle where, for each LP in the cycle, an incoming message with time stamp T results in a new message sent to the next LP in the cycle with time stamp T (a zero-lookahead cycle).

Lookahead
The null message algorithm relies on a "prediction" ability referred to as lookahead.
• "ORD is at simulation time 5, and the minimum transit time between airports is 3, so the next message sent by ORD must have a time stamp of at least 8."
Lookahead is a constraint on an LP's behavior:
• Link lookahead: if an LP is at simulation time T, and an outgoing link has lookahead Li, then any message sent on that link must have a time stamp of at least T + Li
• LP lookahead: if an LP is at simulation time T and has a lookahead of L, then any message sent by that LP must have a time stamp of at least T + L
– Equivalent to link lookahead where the lookahead on each outgoing link is the same

Lookahead and the Simulation Model
Lookahead is clearly dependent on the simulation model:
• It could be derived from physical constraints in the system being modeled, such as the minimum simulation time for one entity to affect another (e.g., a weapon fired from a tank requires L units of time to reach another tank, or the maximum speed of the tank places a lower bound on how soon it can affect another entity)
• It could be derived from characteristics of the simulation entities, such as non-preemptable behavior (e.g., a tank is traveling north at 30 mph, and nothing in the federation model can cause its behavior to change over the next 10 minutes, so all output from
the tank simulator can be generated immediately up to time "local clock + 10 minutes")
• It could be derived from tolerance to temporal inaccuracies (e.g., users cannot perceive temporal differences of 100 milliseconds, so messages may be time stamped 100 milliseconds into the future)
• Simulations may be able to precompute when their next interaction with another simulation will be (e.g., if the time until the next interaction is stochastic, pre-sample the random number generator to determine the time of the next interaction)
Lookahead changes as the LP topology changes, which can have a profound impact on the performance of network models (wired or wireless).

Why is Lookahead Important?
Problem: limited concurrency. Each LP must process events in time stamp order.
Without lookahead, only the event with the smallest time stamp is safe, because any LP could still send a message into another LP's near future.
With lookahead, each LP A using logical time declares a lookahead value LA; the time stamp of any event generated by the LP must be ≥ LTA + LA. Events below every neighbor's guarantee LTA + LA are OK to process; later events are not OK to process yet.
• Lookahead is used in virtually all conservative synchronization protocols
• It is essential to allow concurrent processing of events
Lookahead is necessary to allow concurrent processing of events with different time stamps (unless optimistic event processing is used).

Null Message Algorithm: Speedup
Experimental setup (speedup figure): toroid topology; message density of 4 messages per LP; 1 millisecond of computation per event; the time stamp increment distribution is varied; ILAR = lookahead / average time stamp increment.
Conservative algorithms live or die by their lookahead!
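The lookahead-creep example above can be reproduced with a tiny simulation. The sketch below assumes the three-airport ring (JFK sends to SFO, SFO to ORD, ORD to JFK), a lookahead of 0.5, and JFK starting at time 5; it counts null messages circulated until ORD's time stamp 7 event becomes safe:

```python
def null_messages_until_safe(jfk_start, lookahead, event_ts):
    """Circulate null messages JFK -> SFO -> ORD -> JFK -> ... and count how
    many are sent before ORD's earliest event (time stamp event_ts) is safe."""
    receivers = ["SFO", "ORD", "JFK"]          # JFK sends the first null message
    n = 0
    while True:
        n += 1
        bound = jfk_start + n * lookahead      # lower bound carried by null #n
        # ORD may process its event once an incoming bound reaches event_ts
        if receivers[(n - 1) % 3] == "ORD" and bound >= event_ts:
            return n

print(null_messages_until_safe(5.0, 0.5, 7.0))  # 5, as on the slide
```

With a lookahead of 3 instead of 0.5, only two null messages are needed before ORD can act, illustrating why small lookahead causes time creep.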
Deadlock Detection & Recovery
Algorithm A (executed by each LP):
Goal: ensure events are processed in time stamp order.
WHILE (simulation is not over)
    wait until each FIFO contains at least one message
    remove smallest time stamped event from its FIFO
    process that event
END-LOOP
• No null messages
• Allow the simulation to execute until deadlock occurs
• Provide a mechanism to detect deadlock
• Provide a mechanism to recover from deadlock

Deadlock Recovery
Deadlock recovery: identify "safe" events (events that can be processed without violating local causality).
Deadlocked state: ORD (waiting on SFO) holds event 7; SFO (waiting on JFK) holds event 10; JFK (waiting on ORD) holds events 8 and 9. Assume the minimum delay between airports is 3.
Which events are safe?
• Time stamp 7: smallest time stamped event in the system
• Time stamps 8, 9: safe because of the lookahead constraint (7 + 3 = 10)
• Time stamp 10: OK if events with the same time stamp can be processed in any order
• No lookahead creep!

Preventing Lookahead Creep Using Next-Event Time Information
Same scenario: ORD (waiting on SFO) holds events 7 and 15; SFO (waiting on JFK) holds event 10; JFK (waiting on ORD) holds events 8 and 9.
Observation: the smallest time stamped event is safe to process.
• Lookahead creep is avoided by allowing the synchronization algorithm to immediately advance to the (global) time of the next event
• The synchronization algorithm must know the time stamp of each LP's next event
• Each LP guarantees a logical time T such that, if no additional events are delivered to the LP with time stamp < T, all subsequent messages that the LP produces have a time stamp of at least T + L (L = lookahead)

No Free Lunch for PDES!
• Time Warp: state saving overheads
• Null message algorithm: the lookahead creep problem
– No zero-lookahead cycles allowed
• Lookahead: essential for concurrent processing of events in conservative algorithms
– Has a large effect on performance; you need to program for it
• Deadlock detection and recovery: the smallest time stamp event is safe to process
– Others may also be safe (requires additional work to determine this)
• Using the time of the next event avoids lookahead creep, but it is hard to compute at scale…
Can we avoid some of these overheads and complexities?

Outline
• Intro to DES
• Time Warp and other PDES Schemes
• Reverse Computation
• Blue Gene/L & /P
• ROSS Implementation
• ROSS Performance Results – PHOLD & PCS
• Observations on PDES Performance
• Future Directions

Our Solution: Reverse Computation…
• Use Reverse Computation (RC)
– automatically generate reverse code from model source
– undo by executing reverse code
• Delivers better performance
– negligible overhead for forward computation
– significantly lower memory utilization

Ex: Simple Network Switch (N input ports, queue of capacity B)
Original code, on packet arrival:
    if( qlen < B )
        qlen++
        delays[qlen]++
    else
        lost++
Forward code (instrumented):
    if( qlen < B )
        b1 = 1
        qlen++
        delays[qlen]++
    else
        b1 = 0
        lost++
Reverse code:
    if( b1 == 1 )
        delays[qlen]--
        qlen--
    else
        lost--

Benefits of Reverse Computation
• State size reduction
– from B+2 words to 1 word
– e.g., B = 100 => 100x reduction!
• Negligible overhead in forward computation
– state saving is removed from the forward computation
– and moved to the rollback phase
• Result
– significant increase in speed
– significant decrease in memory
• How?…

Beneficial Application Properties
1. The majority of operations are constructive
– e.g., ++, --, etc.
2. Size of control state < size of data state
– e.g., size of b1 < size of qlen, sent, lost, etc.
3. Perfectly reversible high-level operations can be gleaned from irreversible smaller operations
– e.g., random number generation

Rules for Automation…
Generation rules, and upper bounds on bit requirements, for various statement types:

T0 (simple choice): if() s1; else s2;
    Forward: if() {s1; b=1;} else {s2; b=0;}
    Reverse: if(b==1) {inv(s1);} else {inv(s2);}
    Bits: self 1; children x1, x2; total 1 + max(x1, x2)
T1 (compound choice, n-way): if() s1; elseif() s2; elseif() s3; … else sn;
    Forward: if() {s1; b=1;} elseif() {s2; b=2;} elseif() {s3; b=3;} … else {sn; b=n;}
    Reverse: if(b==1) {inv(s1);} elseif(b==2) {inv(s2);} elseif(b==3) {inv(s3);} … else {inv(sn);}
    Bits: self lg(n); children x1, …, xn; total lg(n) + max(x1, …, xn)
T2 (fixed iterations, n): for(n) s;
    Forward: for(n) s;
    Reverse: for(n) inv(s);
    Bits: self 0; child x; total n*x
T3 (variable iterations, maximum n): while() s;
    Forward: b=0; while() {s; b++;}
    Reverse: for(b) inv(s);
    Bits: self lg(n); child x; total lg(n) + n*x
T4 (function call): foo();
    Reverse: inv(foo)();
    Bits: self 0; child x; total x
T5 (constructive assignment): v @= w;   (@ a constructive operator, e.g., +=)
    Reverse: v = @w;   (apply the inverse of operator @)
    Bits: 0
T6 (k-byte destructive assignment): v = w;
    Forward: {b = v; v = w;}
    Reverse: v = b;
    Bits: self 8k; total 8k
T7 (sequence): s1; s2; …; sn;
    Reverse: inv(sn); …; inv(s2); inv(s1);
    Bits: children x1, …, xn; total x1 + … + xn
T8 (nesting of T0–T7): recursively apply the rules above.

Destructive Assignment…
• Destructive assignment (DA):
– examples: x = y; x %= y;
– requires all modified bytes to be saved
• Caveat:
– the reversing technique for DAs can degenerate to traditional incremental state saving
• Good news:
– certain collections of DAs are perfectly reversible!
– queueing network models contain collections of easily/perfectly reversible DAs
• queue handling (swap, shift, tree insert/delete, …)
• statistics collection (increment, decrement, …)
• random number generation (reversible RNGs)

Reversing an RNG?
double RNGGenVal(Generator g)
{
    long k, s;
    double u = 0.0;

    s = Cg[2][g];
    k = s / 15499;
    s = 138556 * (s - k * 15499) - k * 3979;
    if (s < 0) s = s + 2147483423;
    Cg[2][g] = s;
    u = u + 4.65661336096842131e-10 * s;
    if (u >= 1.0) u = u - 1.0;

    s = Cg[0][g];
    k = s / 46693;
    s = 45991 * (s - k * 46693) - k * 25884;
    if (s < 0) s = s + 2147483647;
    Cg[0][g] = s;
    u = u + 4.65661287524579692e-10 * s;

    s = Cg[1][g];
    k = s / 10339;
    s = 207707 * (s - k * 10339) - k * 870;
    if (s < 0) s = s + 2147483543;
    Cg[1][g] = s;
    u = u - 4.65661310075985993e-10 * s;
    if (u < 0) u = u + 1.0;

    s = Cg[3][g];
    k = s / 43218;
    s = 49689 * (s - k * 43218) - k * 24121;
    if (s < 0) s = s + 2147483323;
    Cg[3][g] = s;
    u = u - 4.65661357780891134e-10 * s;
    if (u < 0) u = u + 1.0;

    return (u);
}

Observation: k = s / 46693 is a destructive assignment.
Result: RC degrades to classic state saving … can we do better?

RNGs: A Higher-Level View
The previous RNG is based on the following recurrence:
    x_{i,n} = a_i * x_{i,n-1} mod m_i
where x_{i,n} is one of the four seed values in the nth set, m_i is one of the four largest primes less than 2^31, and a_i is a primitive root of m_i.
Now, the above recurrence is in fact reversible. The inverse of a_i modulo m_i is defined as:
    b_i = a_i^(m_i - 2) mod m_i
Using b_i, we can generate the reverse recurrence as follows:
    x_{i,n-1} = b_i * x_{i,n} mod m_i

Reverse Code Efficiency…
• Property…
– Non-reversibility of individual steps DOES NOT imply that the computation as a whole is not reversible
– Can we automatically find this "higher-level" reversibility?
• Other reversible structures include…
– Circular shift operations
– Insertion & deletion operations on trees (i.e., priority queues)
Reverse computation is well suited for small-grain event models!

RC Applications
• PDES applications include:
– Wireless telephone networks
– Distributed content caches
– Large-scale Internet models
• TCP over the AT&T backbone
• Leverages RC "swaps"
– Hodgkin-Huxley neuron models
– Plasma physics models using particle-in-cell (PIC) methods
• Non-DES applications include:
– Debugging
– PISA: a reversible instruction set architecture for low-power computing
– Quantum computing

Outline
• Intro to DES
• Time Warp and other PDES Schemes
• Reverse Computation
• Blue Gene/L & /P
• ROSS Implementation
• ROSS Performance Results – PHOLD & PCS
• Observations on PDES Performance
• Future Directions

Target Systems: Blue Gene/L & /P
• Configuration:
– BG/L nodes: 2 x 700 MHz PPC cores
– BG/P nodes: 4 x 850 MHz PPC cores
– Dedicated compute and I/O nodes (32:1 or 8:1)
– 3-D torus point-to-point network
– Additional barrier, collective, I/O, and Ethernet networks
– Can partition the system into dedicated slices from 32 nodes to the whole system
• Properties for GOOD scaling:
– Balanced architecture between network(s) and processor speed
– Exclusive access to network and processor
– Exceptionally low OS jitter
– Collective overheads not adversely impacted at large node counts
(Figures: 1 rack of IBM Blue Gene/L; Blue Gene/L layout; Blue Gene/L SoC; Blue Gene/L network; Blue Gene/P layout)

Blue Gene/P Architectural Highlights
• Scaled performance via density and frequency increases
– 2x performance increase by doubling the processors per node
– 1.2x from the frequency increase: 700 MHz to 850 MHz
• Enhanced function
– 4-way SMP; 3 modes: SMP / DUAL / VNM
– L2, L3 changed for SMP mode
– DMA for torus, remote put-get, user-programmable memory prefetch
– Enhanced 64-bit performance counters via the PPC450 core
– Double Hummer FPU and the networks are the same … except:
• Better network
– 2.4x more bandwidth and lower latency on the torus and tree networks
– 10x higher Ethernet I/O bandwidth
• 72K nodes in 72 racks for 1 PF peak performance
– Low power via aggressive power management

Blue Gene: L vs.
P
(Figure: Blue Gene/P compute card)

Outline
• Intro to DES
• Time Warp and other PDES Schemes
• Reverse Computation
• Blue Gene/L & /P
• ROSS Implementation
• ROSS Performance Results – PHOLD & PCS
• Observations on PDES Performance
• Future Directions

Local Control Implementation
• MPI_Isend/MPI_Irecv are used to send/receive off-core events
• Event & network memory is managed directly
– The pool is allocated at startup
• The event list is kept sorted using a splay tree (O(log N))
• LP-to-core mapping tables are computed, not stored, to avoid the need for large global LP maps
Local Control Mechanism: error detection and rollback:
(1) undo state deltas
(2) cancel "sent" events

Global Control Implementation
GVT computation (kicks off when memory is low):
1. Each core counts #sent, #recv
2. Receive all pending MPI messages
3. MPI_Allreduce sum on (#sent - #recv)
4. If #sent - #recv != 0, go to step 2
5. Compute the local core's lower-bound time stamp (LVT)
6. GVT = MPI_Allreduce min on LVTs
The algorithm needs efficient MPI collectives; local/global control can be very sensitive to OS jitter.
Global Control Mechanism: compute Global Virtual Time (GVT); collect versions of state/events and perform I/O operations that are < GVT.
So, how does this translate into Time Warp performance on BG/L & BG/P?

Outline
• Intro to DES
• Time Warp and other PDES Schemes
• Reverse Computation
• Blue Gene/L & /P
• ROSS Implementation
• ROSS Performance Results – PHOLD & PCS
• Observations on PDES Performance
• Future Directions

Performance Results: Setup
• PHOLD
– Synthetic benchmark model
– 1024 x 1024 grid of LPs
– Each LP has 10 initial events
– Events are routed randomly among all LPs based on a configurable "percent remote" parameter
– Time stamps are exponentially distributed with a mean of 1.0 (i.e., lookahead is 0).
• PCS – Personal Communications Services network
– Cell phone call network model, NxN grid
– Cell phones are modeled as events; LPs are the "grid" region spaces
– Call arrivals, service times, and mobility are all exponentially distributed
– 4096 x 4096 grid of LPs with 10 initial cell phones per LP
– Measures call blocking statistics
• ROSS parameters
– GVT_Interval: number of times through the "scheduler" loop before computing GVT
– Batch: number of local events to process before "checking" the network for new events
• Batch x GVT_Interval events are processed per GVT epoch
– KPs: kernel processes that hold the aggregated processed-event lists for LPs, to lower search overheads during fossil collection of "old" events
– Send/Recv buffers: number of network events for sending or receiving; used as a flow-control mechanism

PHOLD on 8192 BG/L cores (series of performance figures)
• 7.5 billion ev/sec for 10% remote on 32,768 cores!!
• 2.7 billion ev/sec for 100% remote on 32,768 cores!!
• Stable performance across processor configurations, attributed to the near-noiseless OS…
• 12.27 billion ev/sec for 10% remote on 65,536 cores!!
• 4 billion ev/sec for 100% remote on 65,536 cores!!
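To put the aggregate rates above in perspective, the per-core event rate can be derived directly (a quick check, assuming the quoted totals are exact):

```python
def per_core_rate(total_events_per_sec, cores):
    """Average events per second processed by each core."""
    return total_events_per_sec / cores

# The two 10%-remote PHOLD runs quoted above
print(round(per_core_rate(7.5e9, 32768)))    # roughly 229K ev/sec per core
print(round(per_core_rate(12.27e9, 65536)))  # roughly 187K ev/sec per core
```

Each core therefore sustains about twice the nominal 100K ev/sec assumed for the sequential simulator in the motivating movie example.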
Rollback Efficiency = 1 - Erb / Enet   (Erb = rolled-back events, Enet = net committed events)

Outline
• Intro to DES
• Parallel DES via Time Warp
• Reverse Computation
• Blue Gene/L & /P Implementation
• Performance Results – PHOLD & PCS
• Observations on PDES Performance
• Future Directions

History of PHOLD Performance
Year   Author                       Event-Rate (ER)   Processor Efficiency, ER/(MHz * cores)
1995*  Fujimoto                     101,000           158
1996*  Hao                          95,000            238
2000*  Carothers, Bauer, Pearce     375,000           186
2005*  Chen & Szymanski             228 Million       221
2006*  Bauer & Carothers            10 Million        63
2007   Perumalla                    210 Million       37
2009   Bauer, Carothers & Holder    12.26 Billion     220
*These results are not completely comparable, which explains the large variation in event rate and processor efficiency.

Movies over the Internet Revisited
• Suppose we want to model 1 million home ISP customers over AT&T downloading a 2 GB movie
• How long to compute with massively parallel DES?
• 16+ trillion events @ 1 billion ev/sec … ~4.5 hours!!

Observations…
• ROSS on Blue Gene indicates billion-events-per-second models are feasible today!
– Yields significant TIME COMPRESSION of current models
• LP-to-PE mapping is less of a concern…
– Past systems were very sensitive to this
• ~90 TF systems can yield "giga-scale" event rates
– Tera-event models require teraflop systems
– Assumes most of the event processing time is spent in event-list management (splay tree enqueue/dequeue)
• Potential: 10 PF supercomputers will be able to model near peta-event systems
– 100 trillion to 1 quadrillion events in less than 1.4 to 14 hours
– Current "testbed" emulators don't come close to this for network modeling and simulation…
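The processor-efficiency metric in the history table and the ~4.5-hour estimate above can both be reproduced. This is a sketch; it assumes the 2009 run used 65,536 cores at 850 MHz, the BG/P configuration quoted earlier:

```python
def processor_efficiency(event_rate, mhz, cores):
    """Events per second normalized by aggregate clock rate: ER / (MHz * cores)."""
    return event_rate / (mhz * cores)

def parallel_hours(total_events, events_per_sec):
    """Wall-clock hours to simulate a fixed population of events."""
    return total_events / events_per_sec / 3600.0

print(round(processor_efficiency(12.26e9, 850, 65536)))  # 220, as in the table
print(round(parallel_hours(16e12, 1e9), 1))              # 4.4 hours
```

The same `parallel_hours` arithmetic gives the 1.4-to-14-hour range quoted for 100 trillion to 1 quadrillion events, assuming a 10 PF machine sustaining roughly 20 billion ev/sec.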
Outline
• Intro to DES
• Parallel DES via Time Warp
• Reverse Computation
• Blue Gene/L & /P Implementation
• Performance Results – PHOLD & PCS
• Observations on PDES Performance
• Future Directions

Future Models Enabled by XScale Computing
• Discrete "transistor-level" models for whole multi-core architectures…
– Potential for more rapid improvements in processor technology…
• Model nearly the whole U.S. Internet at the packet level…
– Potential to radically improve overall QoS for all
• Model all C4I networks/systems for a whole theatre of war faster than real time, many times over…
– Enables real-time "active" network control…

Future Models Enabled by XScale Computing
• Realistic discrete models of the human brain
– 100 billion neurons with 100 trillion synapses (i.e., connections; huge fan-out)
– Potential for several exa-events per run
• Detailed "discrete" agent-based models for every human on Earth, for…
– global economic modeling
– pandemic flu/disease modeling
– food / water / energy usage modeling…

ROSS Website…
• GOTO: odin.cs.rpi.edu
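As a closing worked example, the reversible-RNG recurrence from the reverse-computation section can be checked directly. The sketch below uses plain modular arithmetic rather than the Schrage-style factorization in the quoted RNGGenVal code, and borrows the generator with modulus m = 2^31 - 1 and multiplier a = 45991 from that code:

```python
M = 2147483647          # 2**31 - 1, a prime modulus from RNGGenVal
A = 45991               # the multiplier used with that modulus

B = pow(A, M - 2, M)    # modular inverse of A: b = a^(m-2) mod m (Fermat)

def forward(seed):
    """One forward step of the generator: x_n = a * x_(n-1) mod m."""
    return (A * seed) % M

def reverse(seed):
    """One reverse step: x_(n-1) = b * x_n mod m. No state saving needed."""
    return (B * seed) % M

s = 12345
assert reverse(forward(s)) == s      # rollback recovers the seed exactly
print(forward(s), reverse(forward(s)))
```

This is exactly the "perfectly reversible high-level operation" property: the individual statements of RNGGenVal contain destructive assignments, yet the generator as a whole rolls back with a single multiply instead of saved state.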