The Experiment Lifecycle and its Major Programs

Experiment Lifecycle: the User Perspective

Creating an Experiment
• Done with 'batchexp' for both batch and interactive experiments
  – "batch" is a historical name
• Can bring the experiment to one of three states:
  – swapped: pre-run only
  – posted: queued experiment, ready to run
  – active: experiment swapped in

Swapping an Experiment
• Done with 'swapexp'
• Can effect several transitions:
  – swapped to active (swap in experiment)
  – active to swapped (swap out experiment)
  – active to active (modify experiment)
  – posted to swapped (dequeue batch experiment)

Pre-run (tbprerun)
• Parse the NS file (parse-ns and parse.tcl)
  – Put virtual state in the database (xmlconvert)
• Do visualization layout (prerender)
• Compute static routes (staticroutes)

swapped to active (tbswap in)
• Mapping: find nodes for the experimenter
  – assign_wrapper
  – assign
• Allocate nodes (nalloc)
  – Set up serial console access (console_setup)
• Set up NFS exports (exports_setup)
• Set up DNS names (named_setup)
• Reboot nodes and wait for them (os_setup)
  – Load disks if necessary (os_load)

swapped to active (contd.)
• Start the event system (eventsys_control)
• Create VLANs (snmpit)
• Set up mailing lists (genelists)
• Failure at any step results in a swapout

active to swapped (tbswap out)
• Stop the event system (eventsys_control)
• Tear down VLANs (snmpit)
• Free nodes (nfree)
  – Scheduled reservations (sched_reserve)
  – Place nodes in the reloadpending experiment
  – Revoke console access (console_setup)
• Reset DNS (named_setup)
• Reset NFS exports (exports_setup)
• Reset mailing lists (genelists)

active to active (tbswap modify)
• Purpose: experiment modification
  – Get new virtual state (re-parse the NS file)
  – Bring the physical mapping into sync with the new state
• Leaves alone nodes whose physical mapping matches the new virtual state

Important Daemons
• batch_daemon
  – Picks up posted experiments
  – Attempts a swapin
  – One experiment at a time for each user
  – Swaps out finished batch experiments
• reload_daemon
  – Picks up nodes from the reloadpending experiment
  – Frees them when done reloading

Next, in More Depth
• Parsing
• Resource allocation
  – Setup for the action: assign_wrapper
  – The real brains: assign
• Serial console management
• Link shaping
• IP routing support
• Traffic generation
• Inter-node synchronization
• Event system

Parsing Experiment Configurations

Experiment Configuration Language
• General-purpose OTcl scripting language based on NS
• Exports an API nearly identical to NS's, albeit a subset
• Testbed-specific actions via the tb-* procedures
  – We provide a compatibility script to include when running under an NS simulation
• Define your own procedures / classes / methods

Making Sense out of Others' Code
• The parser is also written in OTcl
• It mirrors a subset of the NS classes
• The implemented methods of those classes capture the user-specified experiment attributes
• Convert the experiment attributes to an intermediate XML format
  – The generic format makes it easy to add support for other configuration languages
• Store the configuration in the virt_* tables, such as virt_nodes and virt_lans
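To make the input side concrete, a minimal NS file of the kind the parser consumes might look like the sketch below. The node and link names are arbitrary, and tb_compat.tcl is the compatibility script mentioned above (names as used in the Emulab tutorial):

    source tb_compat.tcl
    set ns [new Simulator]

    # Two nodes joined by a shaped link
    set n0 [$ns node]
    set n1 [$ns node]
    set link0 [$ns duplex-link $n0 $n1 100Mb 10ms DropTail]

    # Testbed-specific attributes via tb-* procedures
    tb-set-node-os $n0 FBSD-STD
    tb-set-hardware $n1 pc850

    $ns rtproto Static
    $ns run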
Implementation Quirks
• Capture top-level resource names for later use
  – E.g., use 'n0' to name the physical node when the user asks for: set n0 [$ns node]
• Rename resources to work around restrictions, such as those of DNS
  – E.g., node 'n(0)' becomes 'n-0'
• The parser is run on ops for security reasons
  – Mixing trusted and untrusted OTcl code on the main server (boss) is dangerous
• Read tbsetup/ns2ir/README in the source tree for details

Assign Wrapper (PG Version)

Assign Wrapper
• Perl frontend to assign
• Converts the virtual DB representation to the more neutral "top" file format (assign's input)
• Converts the results from plain-text format into the physical DB representation
• assign_wrapper is extremely testbed-aware
• Moves information from virtual tables to physical tables

Virtual Representation
• An experiment is really a set of tables in the database
• Includes "virt_nodes" and "virt_lans", which describe the nodes and the network topology
• Other tables include routes, program agents, traffic generators, virtual types, etc.

Virtual Representation (contd.)
• Example:
    set n1 [$ns node]
    set n2 [$ns node]
    set link0 [$ns duplex-link $n1 $n2 100MB 10ms]
    tb-set-hardware $n2 pc600
• Is stored in database tables:
    virt_nodes ('n1', '10.1.1.1', 'pc850', 'FBSD-STD', ...)
    virt_nodes ('n2', '10.1.1.2', 'pc600', 'RHL-STD', ...)
    virt_lans ('link0', 'n1', '100MB', '5ms', ...)
    virt_lans ('link0', 'n2', '100MB', '5ms', ...)
• Note that the 10ms link delay is split between the two member rows; a packet crossing the link incurs both 5ms delays

What's a top File?
• Stands for "topology" file, but that's too many syllables.
• The input file to assign, specifying nodes, links, and desires.
• The DB representation above converts to:
    node n1 pc850
    node n2 pc600
    link link0/n1:0,n2:0 n1 n2 100000 0 0
• assign combines this with the currently free physical resources to come up with a solution.

Assign Results
• assign maps n1 and n2 to pc1 and pc41 based on types and bandwidth:
    Nodes
    n1 pc1
    n2 pc41
    End Nodes
    Edges
    link0/n1:0,n2:0 intraswitch pc1/eth3 pc41/eth1
    End Edges
• The above is a simplified version of the actual results. Gory details are available elsewhere.

Assign Wrapper Continues
• Allocate the physical resources (nodes) specified by assign
• Allocate virtual resources (vnodes) on physical nodes (local and remote)
• If some nodes were already allocated (someone else got them before you), try again
• Keep trying until the maximum number of tries is exceeded; assign might fail to find a solution on the first N tries

Assign Wrapper Keeps Going…
• Insert the set of "vlans" into the database
  – pc1/eth3 connected to pc41/eth1
• Update the "interfaces" table with the IP addresses assigned by the parser
• Update the "nodes" table with user-specified values from virt_nodes
  – OSids, RPMs, tarballs, etc.
• Update the "linkdelays" table with end-node traffic shaping configuration (from virt_lans)

And Going and Going
• Update the "delays" table with delay-node traffic shaping configuration
• Update the "tunnels" table with tunnel configuration (wide-area nodes)
• Update the "agents" table with the locations where events should be sent to control traffic shaping
• Call exit(0) and rest!

Resource Allocation: assign

assign's Job
• Maps virtual resources to local nodes and VLANs
• A general combinatorial-optimization approach to an NP-hard problem
• Uses simulated annealing
• Minimizes inter-switch links, the number of switches used, and other cost terms
• Takes seconds for most experiments
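To make the approach concrete, here is a toy simulated-annealing loop in the spirit of assign. This is only a sketch: the physical plant, move generator, and cost function are invented for illustration, they are far simpler than assign's real ones, and assign itself is not written in Tcl.

    # Toy plant: pc1,pc2 on switch A; pc3,pc4 on switch B (hypothetical).
    array set sw {pc1 A pc2 A pc3 B pc4 B}
    # Virtual topology: links v0-v1 and v1-v2 among three virtual nodes.
    set vlinks {{0 1} {1 2}}

    # A candidate solution is a permutation of the physical nodes;
    # slot i holds the physical node assigned to virtual node i.
    proc cost {perm} {
        global sw vlinks
        set c 0
        foreach l $vlinks {
            lassign $l a b
            # charge one unit for every virtual link that crosses switches
            if {$sw([lindex $perm $a]) ne $sw([lindex $perm $b])} { incr c }
        }
        return $c
    }

    proc neighbor {perm} {
        # generate a nearby solution by swapping two random slots
        set n [llength $perm]
        set i [expr {int(rand()*$n)}]
        set j [expr {int(rand()*$n)}]
        set t [lindex $perm $i]
        lset perm $i [lindex $perm $j]
        lset perm $j $t
        return $perm
    }

    set cur {pc1 pc3 pc2 pc4}
    for {set temp 10.0} {$temp > 0.01} {set temp [expr {$temp * 0.95}]} {
        set cand [neighbor $cur]
        set delta [expr {[cost $cand] - [cost $cur]}]
        # always accept improvements; accept regressions with a
        # probability that falls as the temperature drops
        if {$delta <= 0 || rand() < exp(-$delta/$temp)} {
            set cur $cand
        }
    }
    puts "v0..v2 -> [lrange $cur 0 2] (cost [cost $cur])"

In assign proper, the cost function also reflects the constraints and preferences listed on the next slide, and the move generator is considerably more sophisticated.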
What's Hard About It?
• Satisfy constraints
  – Requested types
  – Can't go over inter-switch bandwidth
  – Domain-specific constraints
    • LAN placement for virtual nodes
    • Subnodes
• Maximize the opportunity for future mappings
  – Minimize inter-switch bandwidth
  – Avoid scarce nodes

What It Can Do
• Handle multiple types of nodes on multiple switches
• Allow users to ask for classes of nodes
• Prefer or discourage the use of certain nodes
• Map multiple virtual nodes to one physical node
• Handle nodes that are 'hosted' in some other node
• Partial solutions

What It Doesn't Do
• Map based on observed end-to-end network characteristics
  – Applicable to wide-area and wireless
  – But we have another program, wanassign, that can
• Satisfy requests for specific link types
  – But we could approximate this with subnodes
• Full node resource description

Issues
• Complicated
  – Several authors
  – Subject of a paper evaluating many configurations
  – The nature of the randomized algorithm makes debugging hard
  – Evolved over time to keep up with features
• Scaling
  – Particularly with virtual and simulated nodes
  – Not just scale (1000s); it's the type of node
  – Pre-passes may help
• The good: it's coped with a lot of new demands!

Remote Console Access

Executive Summary
• Allow user access to consoles via serial line
• A console proxy enables remote access
• Authentication and encryption
• All console output is logged
• Requires OS support for serial consoles
• Utah Emulab: all nodes have serial lines
  – Not required, but handy

Serial Consoles
• Can redirect the console in three places
  – BIOS: on most "server" motherboards
  – Boot loader: easy on BSD and Linux
  – OS: easy on BSD and Linux
• Boot loaders and OSes must be configured
  – Generally via the boot loader configuration

The Serial Line Proxy (capture)
• Original purpose was to log console output
  – Read/write the serial line, log the data, present a tty interface
  – Use "tip" to access the pty
• Enhanced to "remote" the console
  – Presents a socket interface
  – Can be accessed from anywhere on the network
• One capture process per serial line

Authentication (capserver)
• Only users in an experiment can access its consoles
• Uses a one-time key
  – capture, running on the serial line host, generates a new key for every "session"
• capture sends the key to capserver on the boss node
  – capserver records the key in the DB and returns ownership info
  – capture uses that info to protect the ACL and log files

Clients (console, tiptunnel)
• console is the replacement for tip
  – Run on ops; obtains access info via the ACL file created by capture
  – File permissions restrict user access
• tiptunnel is the remote version
  – Binaries for Linux, BSD, and Windows
  – Run as a helper app from the browser
  – Access info is passed via a secure web connection
  – All communication via SSL

Emulab Link Shaping

Executive Summary
• Emulab allows setting and modifying bandwidth, latency, and loss rate on a per-link basis
• Interface through the NS script or a command line tool
• Implemented either by dedicated "delay" nodes or on the end nodes
• Delay nodes work with any end node OS
• End node shaping requires FreeBSD or Linux

Delay Nodes
• Run FreeBSD + dummynet + bridging
• FreeBSD kernel:
  – Runs at 10000 Hz to improve timer accuracy
  – Uses polling device drivers to reduce overhead
• Nodes are dedicated to an experiment
• One node can shape multiple links
• Transparent to the end nodes
• Not transparent to the switch fabric

VLANs and Delay Nodes
(diagram)

End Node Shaping ("link delays")
• Handles link shaping at both ends of the link
• Requires OS support on the end nodes
  – FreeBSD: dummynet
  – Linux: "tc" with modifications
• Conserves Emulab resources at the potential expense of emulation fidelity
• Works in environments where delay nodes are not practical or possible
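For a flavor of what either form of shaping amounts to on FreeBSD, dummynet is driven by ipfw rules along these lines (a sketch only: the pipe number, interface name, and parameters are hypothetical, and the rules Emulab actually installs are more involved):

    # push traffic arriving on one interface through a dummynet pipe
    ipfw add pipe 10 ip from any to any in recv fxp0
    # shape that pipe: 50 Mb/s bandwidth, 10 ms delay, 1% packet loss
    ipfw pipe 10 config bw 50Mbit/s delay 10 plr 0.01

The delay_agent described on the next slide effects run-time changes by rewriting pipe configurations like the second line.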
Dynamic Control
• Link settings can be modified at "run time"
  – "at" commands in the NS file
  – The tevc command
• A control agent (delay_agent) runs on every node implementing shaping
• It listens for events and interacts with the kernel to effect changes
• OS specific

IP Routing Support in Emulab

Executive Summary
• Emulab offers three options for IP routing in a topology: none, manual, or automatic
• Specified via the NS file
• Routes are set up automatically at boot time
• There is no agent for dynamic modification of routes

User-Specified Routing
• "None"
  – No experimental network routes will be set up
  – Used for LANs and routing experiments
• "Manual"
  – Explicit specification of routes in the NS file
  – Routes become part of the DB state of the experiment
  – Passed to a node at boot as part of its self-configuration
  – Implies IP forwarding is enabled

Emulab-Provided Routing
• "Static"
  – Emulab calculates routes at experiment creation (routecalc, staticroutes)
  – Shortest-path calculation between all pairs
  – Optimized to coalesce host routes into network routes
• "Session"
  – Dynamic routing: runs gated/OSPF on all nodes
  – The auto-generated config file uses only active experimental interfaces

Routing Gotchas
• A node's default route uses the control net
  – Missing manual routes result in lost traffic
• The control net is visible to the routing daemons
  – Makes their job easy (one hop to anyone)
• NxN "Static" route computation and storage do not scale as N increases, e.g., with multiplexed virtual nodes

Traffic Generation in Emulab

Executive Summary
• Emulab allows experiments to run and control background traffic generators
• Interface through the NS script or a command line tool
• Constant Bit Rate traffic only right now
• UDP or TCP only right now

Implementation Details
• Based on TG (http://www.postel.org/tg/)
  – UDP or TCP, one-way, various distributions of interarrival time and packet length
• Modified to be an event agent
  – Start and stop; change packet rate and size
• Interface:
  – NS: standard syntax for traffic sources/sinks
  – The tevc command line tool

Inter-node Synchronization in Emulab

Executive Summary
• Provides a simple inter-node barrier synchronization mechanism for experiments
• Example: wait for all nodes to finish running a test before starting the next one
• Not a centralized service (per-experiment infrastructure), so it scales well
• Easy to use: can be scripted

History
• Originally implemented a single-barrier, single-use "ready" mechanism:
  – Allowed users to know when all nodes were "up"
  – Used the centralized TMCC to report/query status
  – Network/server unfriendly: constant polling
• Users wanted a more general mechanism
  – Multiple barriers, reusable barriers
• They tended to roll their own
  – Often network unfriendly as well

Enter the Sync Server
• In the NS file, declare a node as the server:
    set node1 [$ns node]
    tb-set-sync-server $node1
• When the node boots, it starts the sync server automatically
• Nodes requiring synchronization use the emulab-sync application
• Use can be scripted via the program agent (see the sketch below)

Example Client Use
• One node acts as the barrier master, initializing the barrier and waiting for a number of clients:
    /usr/testbed/bin/emulab-sync -i 4
• All other client nodes contact the barrier:
    /usr/testbed/bin/emulab-sync
• emulab-sync blocks until the barrier count is reached
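Scripted use might look like the following per-node driver (a sketch: the phase scripts, client count, and master-selection test are hypothetical):

    #!/bin/sh
    # Run one test phase everywhere, rendezvous, then run the next phase.
    ./run-phase1.sh                        # hypothetical test phase

    # One designated node initializes the barrier for three clients;
    # everyone else just waits at it.
    if [ "$(hostname -s)" = "node1" ]; then
        /usr/testbed/bin/emulab-sync -i 3
    else
        /usr/testbed/bin/emulab-sync
    fi

    ./run-phase2.sh                        # starts only after all nodes arrive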
Implementation
• A simple TCP-based server and client program
  – A UDP version is in the works
• Client:
  – Gets server info from a config file written at boot
  – Connects to the server and writes a small record
  – Blocks until a reply is read
• Server:
  – Accepts connections and reads records from clients
  – Writes a reply when all clients have connected

Issues
• Why not use the event system for synchronization?
  – The event system is a centralized service
  – As we move toward decentralization, we may reconsider
• Authentication: none
  – Local: uses the shared control net, so this is a problem; it won't be once we have control net VLANs
  – Wide-area: wide open; add an HMAC à la the event system, or just use the event system

The Emulab Event System

Emulab Control Plane
• Many of Emulab's features are dynamically controllable:
  – Traffic generators: can be started and stopped, and their parameters altered
  – Link shaping: links can be brought up and down, and their characteristics modified
• Control is via the NS file, the web interface, or a command line tool.

Example: A Link
• NS: create a shaped link:
    set link0 [$ns duplex-link $n1 $n2 50Mb 10ms DropTail]
• NS: control the link:
    $ns at 100 "$link0 modify DELAY=20 BANDWIDTH=25"
    $ns at 200 "$link0 down"
• Command line: control the link:
    tevc -e tutorial/linktest +10 link0 down

What's Really Happening?
• A link "agent" runs on each (delay) node to control all of the links for that node.
• The agent listens for "events" from the server telling it what to do.
• A per-experiment scheduler doles out the events at the proper time, sending them to the agents.
• Other agents include the traffic generators, program objects, and the link tester.

Come On, What's Really Happening?!
• Use Elvin (http://elvin.dstc.edu.au/)
  – An off-the-shelf publish-subscribe system
• Agents "listen" for events by "subscribing" to those they care about.
• The per-experiment scheduler "publishes" events as they come due.
• Events flow from the scheduler through the Elvin daemon to the nodes, and ultimately to the agents that wanted them.

Static/Dynamic Event Flow
(diagram)

Issues: Time
• What happens to "event time" when an experiment is swapped?
  – Run in real time: events could be lost
  – Suspend time: dilation of experiment time
  – Restart time: replay the static event stream
• Timing for dynamic events
  – tevc … +10 link0 down; tevc … +10 link1 up
  – What is the latency between events?
• What latency do we need to guarantee?

Issues: Security
• The Elvin mechanism is too heavyweight
  – Requires encryption to protect authentication keys
  – We have no reason to encrypt our events
• We don't want to tie ourselves to Elvin
  – In principle
  – Elvin has gone closed source
• Emulab past: no authentication, no wide-area
• Emulab current: an end-to-end HMAC
  – Key transferred via TMCC
  – Wide-area nodes supported, cannot inject events

Issues: Scaling
• An open Elvin TCP connection for every agent
  – Use a per-node proxy
  – But agents still send events directly to boss
  – And there are still a lot of nodes
• Use UDP?
  – What about lost events?
• Deliver static events to nodes early?
  – Doesn't help dynamic ("now") events
• Multicast, someday (not the current usage model)
• You'd think we could just find a better pub/sub system, but we haven't.
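Tying the pieces together, a command-line session driving both a traffic agent and a link agent might look like the sketch below. The experiment name and agent names are illustrative (cbr0 assumes a CBR agent declared in the NS file), and the exact event arguments should be checked against the tevc documentation:

    tevc -e tutorial/linktest now cbr0 start        # begin background traffic
    tevc -e tutorial/linktest +30 link0 modify DELAY=50
    tevc -e tutorial/linktest +60 link0 down        # take the link down a minute in
    tevc -e tutorial/linktest +90 link0 up
    tevc -e tutorial/linktest +120 cbr0 stop

Each command injects an event that flows through the per-experiment scheduler to the appropriate agent, exactly as in the static $ns at case.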