The Data Logistics Toolkit
Martin Swany
Professor, School of Informatics and Computing
Executive Associate Director, Center for Research in Extreme Scale Computing (CREST)
Indiana University

The Data Logistics Toolkit
• Logistics - the management of the flow of resources from the point of origin to the point of consumption
• The DLT integrates local and distributed storage infrastructure, file transfer software, performance monitoring and tuning
• The DLT software distribution supports the creation of network-optimized data nodes

DLT Overview
• Set of packages with configuration scripts, etc.
• Allows the configuration of
  – DTN with GridFTP
  – IBP storage depot for content distribution
  – Phoebus WAN accelerator
  – On-ramp for Internet2 AL2S using XSP
• Includes Periscope/perfSONAR monitoring
• Automatic network tuning

DTN with AL2S On-Ramp
• Working with the Globus team at U. Chicago and Argonne
• Leveraging our eXtensible Session Protocol (XSP) to create end-to-end “sessions”
  – user-network interface (UNI)
• XSP daemon acts as network controller
  – signals AL2S/OESS, OSCARS, OpenFlow
• GridFTP XIO driver, updating to use the Globus Transfer Network Controller API
• Generic, transparent on-ramp to circuit networks like AL2S

WAN Acceleration
• A key reason the Science DMZ model “works” is the separation of lossy access networks from high-bandwidth, long-latency links
• Termination of TCP connections in “middleboxes” can increase throughput by reducing the RTT
• Protocol translation
• Storage in the network to buffer and burst

Distributed Storage for Content Distribution
• IBP provides a primitive, scalable, in-network storage service
• File-like abstractions can be built on top of this
• Uses a data structure known as an exNode (like a Unix inode) to track allocations
• These basic building blocks can be used to build various instances
  – Parallel filesystem
  – Distributed RAID-like storage
  – Content distribution network
  – BitTorrent-like peer-to-peer transfers
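
The exNode idea above can be illustrated with a small sketch: just as an inode maps file offsets to disk blocks, an exNode maps byte ranges of a logical file to allocations held on storage depots, possibly with several replicas per range. Everything in the block below (the Extent and ExNode classes, the depot host names, the capability strings) is hypothetical and chosen for illustration; it is not the actual IBP or exNode API or on-disk format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Extent:
    """One allocation: a byte range of the logical file held on one depot."""
    offset: int     # start offset within the logical file
    length: int     # number of bytes covered by this allocation
    depot: str      # hypothetical depot address (host:port)
    read_cap: str   # hypothetical opaque read capability for the allocation


@dataclass
class ExNode:
    """Hypothetical exNode: file metadata plus the extents that back it."""
    name: str
    size: int
    extents: List[Extent] = field(default_factory=list)

    def add(self, extent: Extent) -> None:
        """Record one more allocation for this file."""
        self.extents.append(extent)

    def replicas_for(self, offset: int) -> List[Extent]:
        """Return every extent (replica) that covers the given byte offset."""
        return [e for e in self.extents
                if e.offset <= offset < e.offset + e.length]

    def is_complete(self) -> bool:
        """True if the extents together cover every byte in [0, size)."""
        covered = 0
        for e in sorted(self.extents, key=lambda e: e.offset):
            if e.offset > covered:
                return False  # gap before this extent
            covered = max(covered, e.offset + e.length)
        return covered >= self.size


# Example: a 200-byte file striped over two depots, first half replicated.
node = ExNode(name="scene.tif", size=200)
node.add(Extent(0, 100, "depot-a.example.org:6714", "cap-a"))
node.add(Extent(100, 100, "depot-b.example.org:6714", "cap-b"))
node.add(Extent(0, 100, "depot-c.example.org:6714", "cap-c"))
```

In this sketch a client reading offset 50 could choose between two depots, which is the property the RAID-like and content-distribution uses listed above rely on; a striped parallel read would fetch the two halves from different depots at once.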

Architecture
• Unified Network Information Service (UNIS)
  – Descendant of perfSONAR Lookup and Topology Services
  – Network and service “graph”
• Intelligent Data Movement Service (IDMS)
  – Data dispatcher
  – Operates on UNIS data
  – Spawns storage services dynamically in GENI
• Periscope/perfSONAR
  – Monitoring for operational integrity and optimization, BLiPP
• Storage Services
  – IBP, prototype based on Ceph
• Other services
  – Data transfer (GridFTP), WAN acceleration

Earth Observation Depot Network (EODN)
  – An open, community-specific content distribution network for remote sensing data

Landsat data
• Landsat 8 launched February 11th, 2013
• Covers the entire land surface of the Earth every 16 days
  – 8 day offset from Landsat 7
• ~700 scenes each day
• Each scene contains a GeoTIFF product: high-resolution sensor images
  – ~1GB compressed, 2GB uncompressed
• Traditionally used for environmental monitoring and land use and land cover change studies

EODN
[Diagram: EODN workflow. Labels: (1) subscribe; (2) harvest (EODN Harvester, Landsat Ground Network); (4) publish (UNIS, DMS); (5) fast download; (6) Processing… (Client: RealEarth, UW-Madison). EODN (DLT): WISC, IU, NYSER, MIZZ. Web GUI.]

Cisco Appliance Platform
• In collaboration with Internet2, Cisco and Fusion-io
• Cisco C220 server
  – 2x Intel® Xeon® E5-2680, 16 cores@4GHz, 64GB DDR3 RAM
  – Fusion-io ioDrive2 1.2 TB
• CentOS 6.4 Linux with DLT RPMs and tuning for data transfer throughput

Acknowledgements
• Staff Scientist Dr. Ezra Kissel leads the DLT development efforts and is PI of the GENI IDMS effort
• CC-NIE integration project with U. Tennessee and Vanderbilt U.
• CC-NIE integration project with the Globus team at U. Chicago and Argonne Nat’l Lab
• EODN development with AmericaView, U. Wisconsin

Phoebus-SLaBS performance
GridFTP transfers over dedicated 10G path, increasing WAN latency, 4ms LAN RTT and 0.001% edge loss
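
To give a rough feel for why terminating TCP at a gateway helps in the setup named in the caption above (4 ms LAN RTT, 0.001% edge loss, 10G path), the sketch below uses the well-known Mathis et al. approximation for loss-limited TCP throughput, rate ≈ (MSS / RTT) · sqrt(3/2) / sqrt(p). This is a swapped-in textbook model, not the method used to produce the Phoebus results. The 1460-byte MSS, the 100 ms WAN RTT in the example, and the assumption that the WAN segment is loss-free and can fill the link are all assumptions made for illustration.

```python
import math

LINK_BPS = 10e9        # dedicated 10G path (from the caption)
LAN_RTT_S = 0.004      # 4 ms LAN RTT (from the caption)
EDGE_LOSS = 0.00001    # 0.001% edge loss (from the caption)
MSS_BYTES = 1460       # assumed segment size, not stated in the slides


def mathis_throughput_bps(mss_bytes: float, rtt_s: float, loss: float) -> float:
    """Approximate loss-limited TCP rate: (MSS/RTT) * sqrt(3/2) / sqrt(p)."""
    if rtt_s <= 0 or not 0 < loss < 1:
        raise ValueError("rtt_s must be > 0 and loss must be in (0, 1)")
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5) / math.sqrt(loss)


def end_to_end_bps(wan_rtt_s: float) -> float:
    """One TCP connection: edge loss is paid over the full LAN + WAN RTT."""
    rate = mathis_throughput_bps(MSS_BYTES, LAN_RTT_S + wan_rtt_s, EDGE_LOSS)
    return min(LINK_BPS, rate)


def split_bps() -> float:
    """Connection terminated at a gateway: the lossy edge segment sees only
    the LAN RTT; the WAN segment is assumed loss-free and link-limited."""
    rate = mathis_throughput_bps(MSS_BYTES, LAN_RTT_S, EDGE_LOSS)
    return min(LINK_BPS, rate)


if __name__ == "__main__":
    wan_rtt = 0.100  # assumed 100 ms WAN RTT, for illustration only
    print(f"end-to-end: {end_to_end_bps(wan_rtt) / 1e6:.1f} Mb/s")
    print(f"split:      {split_bps() / 1e6:.1f} Mb/s")
```

Under this model, and as long as neither case is capped by the link rate, the gain from splitting the connection is simply the ratio of the two RTTs, so it grows as WAN latency increases. Real transfers depend on the TCP variant, buffer sizes, and parallel streams, none of which this sketch captures.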