An Architectural Approach to Managing Data in Transit

Micah Beck
Director & Associate Professor
Logistical Computing and Internetworking Lab
Computer Science Department
University of Tennessee
DOE Data Management Workshop, 3/17/2004

“Data in Transit”
» After being generated by an instrument or supercomputer
» Not stored in a permanent archive
» Serving the diverse purposes of a community of users and applications
» Being transferred, processed and stored to meet changing and unanticipated needs
  • Visualization
  • Data Mining
  • Collaboration
  • Distributed Computing

Interoperability via a Common Interface
» Span heterogeneous physical resources, operating systems and local management schemes
» Serve changing and unexpected application requirements; enable application autonomy
» We measure success in terms of infrastructure deployment scalability
  • In networks and distributed systems, this means number, distribution, global reach, spanning administrative domains…
  • The Internet is the gold standard of infrastructure deployment scalability

Layering as an Architectural Approach
» Abstractions at each layer can hide differences at lower layers
» Exposed approaches avoid creating overly complex mechanisms at lower layers
» The End-to-End (E2E) Principle: attributes of lower layers implemented on shared infrastructure enable deployment scalability
  • Generality: serve diverse application needs, model diverse lower-layer resources
  • Weak semantics: don’t give too much away at one time!

The IP Network Stack
  Application
  Transport
  Network (IP): common interface
  Link
  Physical

IP’s Failure of Scalability
» Today, IP is failing as a common interface
» The design of IP is out of date
  • Application communities are more diverse
  • Link layer technologies violate IP assumptions
» Application communities are defining their own common interfaces for general resource sharing and deploying their own infrastructure (e.g. the Grid)
» Some networking communities have abandoned interoperability at the network layer between widely divergent link layer technologies (e.g. optical switching & IP)

The Transit Layer: A New Location for Interoperability
» Expand the link layer to a local layer to model transfer, storage and processing resources
» Insert a new transit layer between the local and network layers to implement a common interface to diverse technologies at the local layer
» Adopt a highly general common interface at the transit layer, providing a uniform view of all of the resources of the network node
» Build diverse network services on top of this common interface to model diverse application requirements
» “Locating Interoperability in the Network Stack”, Micah Beck & Terry Moore, UT-CS-04-520, Technical Report, Computer Science Dept., University of Tennessee

The Transit Network Stack
  Application
  Transport
  …
  Network
  Transit: common interface
  Local: transfer, storage, processing
  Physical

Transit Networking: A Unified View
“… memory locations … are just wires turned sideways in time”
Dan Hillis, 1982, Why Computer Science is No Good

Logistical Networking: An Overlay Implementation of the Transit Layer
» Logistical Networking is an overlay implementation of transit layer functionality built on top of the IP network
» The Internet Backplane Protocol (IBP) is the common transit layer interface for Logistical Networking
» Network nodes are IBP “depots” that run as user-level processes and communicate using TCP/IP as well as other link and network layer protocols
» Depots also serve storage and processing resources to Logistical Networking clients

LN Tools and Deployment
» The Logistical Runtime System (LoRS) is a set of tools based on IBP that enable users to take advantage of the resources of IBP depots
» The Logistical Distribution Network (LoDN) is a data directory, monitoring and management system
» The Logistical Backbone (L-Bone) is a resource discovery service and global experimental IBP testbed
  • Over 35 TB of storage available
  • Over 300 depots in 21 countries
  • Leverages the resources of PlanetLab
» Additional depots deployed at ORNL & NERSC

L-Bone: August 2003 (20 TB)

Example LN Applications
» Astrophysics: Terascale Supernova Initiative (A. Mezzacappa, ORNL; J. Blondin, NCSU)
  • Management of massive datasets
» Fusion Energy Research (S. Klasky, PPPL)
  • Streaming of simulation data during generation
» Viewset-Based Visualization
  • Prestaging & caching of distant data
» Content Distribution
  • Heroic data distribution problems (Linux ISOs)
» Multimedia Networking
  • Creation, management & delivery of high-value content

LN Futures and Directions
» Storage
  • Implementation of file system services
  • Moving data through firewalls at line speed
  • QoS in highly controlled environments
» Networking
  • Interoperability at ultrascale
  • Advanced services (e.g. multicast)
» Computation
  • Offloading visualization to IBP depots
  • Developing sets of operations to support application communities

Thank you!
mbeck@cs.utk.edu
http://loci.cs.utk.edu