testbed-chpc - University of Utah

Cluster or Network?
An Emulation Facility for Research
Jay Lepreau
Chris Alfeld
David Andersen (MIT)
Mac Newbold
Rob Place
Kristin Wright
Dept. of Computer Science
University of Utah
http://www.cs.utah.edu/flux/testbed/
February 3, 2000
Research We Do
• Operating systems, local and distributed
• Distributed systems
Web caching schemes, distributed objects, ...
• Active Networks
code in every packet: route me!
Configurable router
• Router operating systems
What?
• A configurable Internet (cluster) in a room
230 nodes, 1000 links, BFS (switch)
virtualizable topology, links, software
• An instrument for experimental CS research
• Universally available to any remote
experimenter
• Simple to use!
Why?
• “We evaluated our system on five nodes.”
- job talk from a university with a 300-node cluster
• “We evaluated our Web proxy design with 10 clients on
100Mbit ethernet.”
• “Simulation results indicate ...”
• “Memory and CPU demands on the individual nodes were
not measured, but we believe will be modest.”
• “The authors ignore interrupt handling overhead in their
evaluation, which likely dominates all other costs.”
• “Resource control remains an open problem.”
Why? (continued)
• “You have to know the right people to get access to the
cluster.”
• “The cluster is hard to use.”
• “<Experimental network X> runs FreeBSD 2.2.x.”
• “October’s schedule for <experimental network Y> is…”
• “<Experimental network Z> is tunneled through the
Internet.”
Complementary to Other Experimental
Environments
• Simulation
• Small static testbeds
• Live networks
• Maybe someday, a large-scale set of
distributed small testbeds (“Access”)
Some Unique Characteristics
• Significant scale: initially 225 nodes; the 42 core routers are
interconnected by 100 Mb links, four per router (degree four).
• User-configurable control of “physical” characteristics:
shaping of link latency/bandwidth/drops/errors
(via invisibly interposed “shaping nodes”),
router processing power, buffer space, …
• Node breakdown (230 total): 42 core, 160 edge, 26 shaping,
2 management
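The link-shaping bullet above can be made concrete with a toy model of what an interposed shaping node does to each packet. This is a minimal sketch, assuming a single FIFO link with fixed bandwidth, fixed one-way latency, and independent random drops; `LinkShape` and `shape` are hypothetical names, not the testbed's software, and buffer limits and bit errors are left out.

```python
import random
from dataclasses import dataclass


@dataclass
class LinkShape:
    """Hypothetical shaping parameters for one emulated link."""
    bandwidth_bps: float  # emulated bandwidth, bits per second
    latency_s: float      # fixed one-way delay, seconds
    loss_rate: float      # probability that each packet is dropped


def shape(packets, link, seed=0):
    """Toy model of a shaping node on one link (an assumption, not the real node).

    `packets` is a list of (send_time_s, size_bytes) tuples in send order.
    Returns, per packet, its arrival time at the far end, or None if dropped.
    The link is one FIFO: a packet cannot start until the previous one has
    finished, transmission takes size*8/bandwidth seconds, then the fixed
    latency is added. Drops are independent coin flips.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    link_free_at = 0.0         # time at which the link finishes its current packet
    arrivals = []
    for send_time, size in packets:
        if rng.random() < link.loss_rate:
            arrivals.append(None)  # dropped; it never occupies the link
            continue
        start = max(send_time, link_free_at)                  # queueing delay
        link_free_at = start + size * 8 / link.bandwidth_bps  # serialization
        arrivals.append(link_free_at + link.latency_s)        # propagation
    return arrivals


link = LinkShape(bandwidth_bps=10e6, latency_s=0.005, loss_rate=0.0)
print(shape([(0.0, 1250), (0.0, 1250)], link))
```

Two back-to-back 1250-byte packets on a 10 Mb/s link with 5 ms latency each take 1 ms to serialize, so the second waits behind the first and they arrive about 6 ms and 7 ms after being sent.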
More Unique Characteristics
• Capture of low-level node behavior such as interrupt
load and memory bandwidth
• User-replaceable node OS software
• User-configurable physical link topology
(VLAN via BFS; “P-LAN” via BFPP)
• Completely configurable and usable by external
researchers, including node power cycling
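Configuring the physical link topology through VLANs on the switch comes down to bookkeeping: every virtual link or LAN becomes one VLAN containing the switch ports of its members' interfaces. Below is a minimal sketch, assuming each node interface is cabled to its own switch port and that VLAN IDs are handed out sequentially from an arbitrary base; `build_vlans`, the port names, and the base ID 100 are hypothetical, and programming the switch itself is not shown.

```python
def build_vlans(virtual_lans, port_of, first_vlan_id=100):
    """Hypothetical helper: turn a virtual topology into per-VLAN port lists.

    virtual_lans: dict mapping a LAN name to its member node names;
                  a point-to-point link is simply a two-member LAN.
    port_of:      dict mapping (node, lan) to the switch port that carries
                  that node's interface on that LAN.
    Returns a dict mapping VLAN ID to a sorted list of switch ports.
    """
    vlans = {}
    used = set()  # each physical port may belong to only one VLAN
    for offset, lan in enumerate(sorted(virtual_lans)):
        ports = []
        for node in virtual_lans[lan]:
            port = port_of[(node, lan)]
            if port in used:
                raise ValueError(f"port {port} assigned to more than one LAN")
            used.add(port)
            ports.append(port)
        vlans[first_vlan_id + offset] = sorted(ports)
    return vlans


# Illustrative topology: one point-to-point link and one three-node LAN.
lans = {"link0": ["a", "b"], "lan1": ["a", "c", "d"]}
ports = {("a", "link0"): "1/1", ("b", "link0"): "1/5",
         ("a", "lan1"): "1/2", ("c", "lan1"): "2/1", ("d", "lan1"): "2/3"}
print(build_vlans(lans, ports))
# {100: ['1/2', '2/1', '2/3'], 101: ['1/1', '1/5']}
```

Node `a` appears in both VLANs, but through two different interfaces, hence two different ports; reusing a port is rejected.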
Fundamental Research Leverage:
Extremely Configurable
Obligatory Pictures
Prototype Pieces: edge nodes
Big Iron
A View from the Dark Side
And the Light Side
Artist’s Conception
Zoom in: “Delay” Node
Feature:
Automatic mapping of desired topologies and
characteristics to physical resources
• Algorithm goals:
minimize likelihood of experimental artifacts (bottlenecks)
“optimal” packing of multiple simultaneous experiments
Complete in finite time!
• Constraint-based heuristic algorithm (version 2!)
• Feature: accepts ns-compatible specification
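An ns specification describes a topology in ns's OTcl scripting language: a simulator object, its nodes, and links annotated with bandwidth, delay, and queue type. The helper below is hypothetical and emits only a tiny subset of standard ns-2 commands (`new Simulator`, `$ns node`, `$ns duplex-link`, `$ns run`); which parts of ns syntax the testbed's front end actually accepts is an assumption, not something stated here.

```python
def to_ns_script(nodes, links):
    """Hypothetical helper: emit a minimal ns-2 style OTcl topology script.

    nodes: list of node names (used as Tcl variable names).
    links: list of (node_a, node_b, bandwidth, delay) tuples, with bandwidth
           and delay written as ns unit strings such as "100Mb" and "10ms".
    """
    lines = ["set ns [new Simulator]"]
    for name in nodes:
        lines.append(f"set {name} [$ns node]")
    for a, b, bandwidth, delay in links:
        # DropTail is ns-2's basic FIFO queue discipline.
        lines.append(f"$ns duplex-link ${a} ${b} {bandwidth} {delay} DropTail")
    lines.append("$ns run")
    return "\n".join(lines)


print(to_ns_script(["n0", "n1", "n2"],
                   [("n0", "n1", "100Mb", "10ms"), ("n1", "n2", "10Mb", "50ms")]))
```

The same text could be fed to the ns simulator, which is the appeal of an ns-compatible input: one description serves both simulation and emulation.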
Current Algorithm
• Simulated annealing
Make random change (move node from one switch to another),
compute score, accept/reject based on current temp.
• Heuristic algorithm
• ~4 seconds for 30 nodes; polynomial time
• Improve:
Hardwired node connections will slow it down 100×
Edge nodes
Speed: incremental score recomputation
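The annealing loop on this slide can be sketched in a few lines. This is a toy illustration under stated assumptions, not the testbed's actual mapper: the score simply counts virtual links whose endpoints land on different switches (a stand-in for inter-switch bottlenecks) plus a hypothetical penalty `OVER` for exceeding a switch's node capacity, and the schedule parameters `t0`, `cooling`, and `steps` are arbitrary. Because each move relocates one node, the score change is computed from that node's links and the two switches involved rather than from scratch, which is one way to do the incremental recomputation listed under "Improve".

```python
import math
import random


def anneal(vnodes, vlinks, switches, capacity, steps=20000, t0=2.0,
           cooling=0.9995, seed=0):
    """Toy simulated-annealing mapper (hypothetical, for illustration).

    Score = (# virtual links crossing switches)
          + OVER * (# nodes placed beyond a switch's capacity); lower is better.
    Each step moves one random node to a random switch.
    """
    OVER = 10.0  # hypothetical penalty weight for overfilling a switch
    rng = random.Random(seed)
    neighbors = {v: [] for v in vnodes}
    for a, b in vlinks:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Start from a round-robin placement and count nodes per switch.
    place = {v: switches[i % len(switches)] for i, v in enumerate(vnodes)}
    load = {s: 0 for s in switches}
    for v in vnodes:
        load[place[v]] += 1

    def overflow(n, s):
        return max(0, n - capacity[s])

    def delta(v, new):
        """Score change if v moves to `new`: only v's links and 2 switches matter."""
        old = place[v]
        d = 0.0
        for u in neighbors[v]:
            d += (place[u] != new) - (place[u] != old)
        d += OVER * (overflow(load[old] - 1, old) - overflow(load[old], old))
        d += OVER * (overflow(load[new] + 1, new) - overflow(load[new], new))
        return d

    score = float(sum(place[a] != place[b] for a, b in vlinks))
    score += OVER * sum(overflow(load[s], s) for s in switches)
    temp = t0
    for _ in range(steps):
        v = rng.choice(vnodes)
        new = rng.choice(switches)
        if new == place[v]:
            continue
        d = delta(v, new)
        # Always accept improvements; accept worse moves with prob e^(-d/T).
        if d <= 0 or rng.random() < math.exp(-d / temp):
            load[place[v]] -= 1
            load[new] += 1
            place[v] = new
            score += d
        temp *= cooling
    return place, score


# Two four-node cliques joined by a single link; two switches of 5 ports.
a = [f"a{i}" for i in range(4)]
b = [f"b{i}" for i in range(4)]
links = [(x, y) for grp in (a, b) for i, x in enumerate(grp) for y in grp[i + 1:]]
links.append(("a0", "b0"))
place, score = anneal(a + b, links, ["s0", "s1"], {"s0": 5, "s1": 5})
print(place, score)
```

With two four-node cliques joined by one link and room for five nodes per switch, the search should settle with each clique on its own switch, leaving only the joining link crossing between switches.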
Virtual Topology
Mapping into Physical Topology
Roatan: Remote Console for a Node
Early Network Configuration GUI
Research Applications
• Simulation validation
• Active networks
• Resource demands of services inside routers
• Denial-of-service resistance
• Interaction of adaptive applications and protocols
• All sorts of distributed system experiments
• ...
Research Applications (continued)
• Detailed performance monitoring and analysis
• Relationships between {node, link, topology}
characteristics and
Application performance
Task scheduling and assignment
Communication software
Application algorithms
…
Study: Interconnection Techniques
• Point-to-point vs. always through a switch
Salmon et al. (Caltech)
• Cost vs. performance
• Of most interest on large clusters
• Locality of communication patterns
• Interference with local processing
• Ad hoc mobile networking
Research Issues and Other Challenges
• Calibration, validation, and scaling: how to emulate
different speed networks? Scaling behavior of
emulating faster links by slowing nodes?
• Can we sufficiently capture real router internal
behavior in a PC?
• Assuring validity: detecting switch bottlenecks,
measuring and controlling physical characteristics
without introducing artifacts.
• Algorithms and software to map requirements to
resources while minimizing artifacts.
• Integrate with ns?
• Providing a reasonable user interface to all this.
Final Remarks
• Should be limping next month
• Looking for feedback on your potential use
• Looking for early users
• Collaborators/clients: UU Physics, CMU CS,
MIT CS, Georgia Tech, IBM Research
• Sponsors: University of Utah, Novell, DARPA,
Compaq, Nortel, <your_name_here>