Network Performance Management
S. Keshav
C/NRG
(with Rosen Sharma, Andy Choi, Wilson Huang, Lili Qiu, Russell Schwager, Rachit Siamwalla, Jia Wang, and Yin Zhang)

Motivation
Networks are increasing in breadth…
– greater density of connections
– PCs come with built-in networking
– ADSL and cable modems
– wireless networking
as well as in depth
– variety of qualities, policies, and media

The current situation
Loss of productivity from
– slow file access
– web site disconnection
– slow access to a web site
– no one knows exactly why!
Greater breadth and depth => even more dependency on the network => even more problems

Is QoS enough?
Lots of research in the area of QoS
– RSVP, differentiated services, etc. provide a good overall user experience, one stream at a time
– Is QoS all there is to a good user experience?
An incorrect reservation => poor service for one stream
A misconfigured router => complete loss of service to one or more ports!

Aha!
User experience is affected more by ‘mundane’ network management than by ‘exotic’ QoS research
This motivates our entire research effort

Why networks fail
Link or router failure
Transient overload
Unanticipated increase in load
Misconfiguration
Increasingly harder to detect

Need Better Network Management
Current approaches
– GUI-centric
– lots of flashing lights, but no intelligence
Can detect failures but...
– ad hoc capacity planning
– ad hoc configuration
  • no way of testing other than “just try it!”
Can’t manage network performance

Performance management
Topology discovery
Configure new hardware (simulation)
Collect statistics (monitoring)
Fix problems (AI and simulation)
Identify problems (display and simulation)

Discovery: Project Octopus
[Diagram labels: Temporary Set, Heuristic, Permanent Set]
Techniques
– DNS-ls
– SNMP
– Random probe
– Traceroute
– Directed broadcast ping

Results
Have automatically discovered entire CS department topology
As well as entire Stanford topology (> 220 subnets)
Cornell topology is being discovered as we speak!
– info being shared with CIT

Monitoring
A Perl script uses SNMP to query a router for various MIB entries.
The MIB entries to query are stored in an input file.
The values gathered from the router are stored in a file.
The script works on both UNIX and WinNT.

Monitoring (contd.)
Other Perl scripts parse the data and convert it to other formats.
Currently supported formats:
– HTML: the data is presented as an HTML table.
– GNUPlot graphs: the data can be graphed or saved in pbm format.

A Case Study: CSGate2
From 2/19/98 to 2/23/98, the router CSGate2 was probed every 5 minutes, recording various statistics on the data coming into and going out of the router.
[Figure: Incoming bytes at CSGate2]

Display goals
We want to display multiple views
Views should be dynamic
Should allow expansion and contraction
Rapid creation of user interface
Reusability of GUI components

Solution: Script Java
Component-based system
Reusable manageable components
Can build large manageable applications
Sharing over the web
Record and playback

Architecture
Use JavaScript/Visual Basic as the scripting language
Use Java to write components
Create an adapter hierarchy for the current AWT components
[Diagram labels, in extraction order: Script Java Objects; Communication Abstraction; Data Model; multicast channels; linearized data; HTML pages; Java structures; intelligence; protection by namespace; structures; java; perl; javascript]

Advantages
Allows us to glue components using a scripting language, allowing rapid prototyping and development
New components can be easily integrated
For large applications, a lot of the complexity and chaos can be taken out of scripting

Advantages (cont.)
JavaScript can be streamed from the server, allowing for presentations and sharing
Dynamic HTML
– layers are windows
– these windows render HTML

Storage goals
We need to store topology and monitoring results somewhere
Database: too structured and too much overhead
File system: not enough semantics
Idea: treat URL as a file system link and HTML tags as associated semantics

WebFS
HTML tags allow arbitrary semantic abstractions
Manipulate these abstractions to present a virtualized file system
  grep -headings *.html
  sed '/<annot tag=foo>/jdbc("tags.db", "foo")/'

The magic bullet: simulation
Realistic simulation where networking subsystem interacts with other parts of kernel
Fast simulation for large networks (> 1000 hosts)
Hide the abstraction of simulated network, same API as system calls

FreeBSD kernel
[Diagram labels: User Space, machine, gated, Telnetd, ping, msg, traps, Sockets, Network Stack, Kernel wrapper, Kernel core]

Simulated machine
[Diagram labels: machine, gated, Telnetd, ping, msg, Kernel wrapper, Kernel core, Simulated link]
Task based approach
– a trap sends a message to the kernel
– an upcall is a message from the kernel
All components of a simulated machine live in the same process

More on simulated machine
Capture network-related system calls, with automatic file descriptor re-mapping
Virtual file system root
Single-threaded kernel, therefore no need for locking

Simulated network
[Diagram labels: machine, Telnetd, msg, gated, ping, Kernel core]

Integrating with real network
Use U-Net to interact with external device
Router has the illusion of being in a physical network
Test equipment before actual deployment
[Diagram labels: Unet, Physical Router]

Tradeoffs
Balance between realism and speed
– Using FreeBSD as basis for realistic simulation
– Using session-level simulation to speed up
Ease of porting applications

Open issues
Fault identification
– Bayesian networks?
– Ensemble of experts?
– Other AI approaches?
How to do session-level simulation?
Configuring real systems
– IP9000
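
Sketch: task-based simulated machine
The task-based approach described for the simulated machine (a trap is a message to the kernel, a call from the kernel back to an application is also a message, and everything runs in one process) might look roughly like the following. This is a minimal illustration in Python, not the FreeBSD-based implementation from the slides. The class SimulatedMachine, its two queues, and the "send" call are hypothetical names introduced only for this example.

```python
from collections import deque


class SimulatedMachine:
    """Hypothetical single-threaded simulated machine (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.to_kernel = deque()  # traps: application -> kernel messages
        self.to_apps = deque()    # kernel -> application messages
        self.log = []             # messages delivered back to applications

    def trap(self, app, call, *args):
        """An application 'system call' is just a message queued for the kernel."""
        self.to_kernel.append((app, call, args))

    def _kernel_handle(self, app, call, args):
        """Kernel core: handle one trap and queue a reply message."""
        if call == "send":
            result = len(args[0])  # pretend the payload was transmitted
        else:
            result = None          # unknown call: no result
        self.to_apps.append((app, call, result))

    def run(self):
        """Drain both queues in a single thread, so no locking is needed."""
        while self.to_kernel or self.to_apps:
            if self.to_kernel:
                self._kernel_handle(*self.to_kernel.popleft())
            if self.to_apps:
                self.log.append(self.to_apps.popleft())
        return self.log


machine = SimulatedMachine("m1")
machine.trap("ping", "send", b"echo")
machine.trap("telnetd", "send", b"login:")
log = machine.run()
```

Because applications and the kernel core only exchange queued messages and a single loop drains them, no shared state is touched concurrently, which is the property the slides rely on to avoid locking.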