The Data Logistics Toolkit
Martin Swany
Professor, School of Informatics and Computing
Executive Associate Director, Center for Research in Extreme
Scale Computing (CREST)
Indiana University
The Data Logistics Toolkit
• Logistics - the management of the flow of
resources from the point of origin to the point
of consumption
• The DLT integrates local and distributed
storage infrastructure, file transfer software,
performance monitoring and tuning
• The DLT software distribution supports the
creation of network-optimized data nodes
DLT Overview
• Set of packages with configuration scripts,
etc.
• Allows the configuration of
– DTN with GridFTP
– IBP storage depot for content distribution
– Phoebus WAN accelerator
– On-ramp for Internet2 AL2S using XSP
• Includes Periscope/perfSONAR monitoring
• Automatic network tuning
DTN with AL2S On-Ramp
• Working with the Globus team at U. Chicago and
Argonne
• Leveraging our eXtensible Session Protocol
(XSP) to create end-to-end “sessions”
– user-network interface (UNI)
• XSP daemon acts as network controller
– signals AL2S/OESS, OSCARS, OpenFlow
• GridFTP XIO driver, updating to use the Globus
Transfer Network Controller API
• Generic, transparent on-ramp to circuit networks
like AL2S
WAN Acceleration
• A key reason the Science DMZ model “works” is
the separation of lossy access networks from
high-bandwidth, long-latency links
• Termination of TCP connections in
“middleboxes” can increase throughput by
reducing the RTT
• Protocol translation
• Storage in the network to buffer and burst
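The RTT argument above can be illustrated with a rough calculation. The sketch below uses the well-known Mathis et al. approximation for loss-limited TCP throughput, which is not something the slides themselves cite. The 4 ms edge RTT and 0.001% edge loss are taken from the performance slide in this deck; the 100 ms WAN RTT, the 1460-byte MSS, and the assumption that the WAN segment is loss-free are illustrative choices only.

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate):
    """Approximate loss-limited TCP throughput: (MSS / RTT) * (C / sqrt(p))."""
    c = math.sqrt(3.0 / 2.0)  # constant from the Mathis et al. model
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))

MSS_BYTES = 1460        # assumed Ethernet-sized segment
EDGE_RTT_S = 0.004      # 4 ms LAN RTT (from the performance slide)
WAN_RTT_S = 0.100       # assumed WAN RTT, for illustration
EDGE_LOSS = 1e-5        # 0.001% edge loss (from the performance slide)
LINK_BPS = 10e9         # dedicated 10G path

# One end-to-end connection: edge loss is "seen" across the full RTT.
end_to_end_bps = mathis_throughput_bps(MSS_BYTES, EDGE_RTT_S + WAN_RTT_S, EDGE_LOSS)

# Connection terminated at a middlebox: the lossy edge segment only
# experiences the short edge RTT. The WAN segment is assumed loss-free,
# so the overall rate is capped by the slower of the edge and the link.
edge_segment_bps = mathis_throughput_bps(MSS_BYTES, EDGE_RTT_S, EDGE_LOSS)
split_bps = min(edge_segment_bps, LINK_BPS)

if __name__ == "__main__":
    print(f"end-to-end: {end_to_end_bps / 1e6:.1f} Mbit/s")
    print(f"split:      {split_bps / 1e6:.1f} Mbit/s")
    print(f"speed-up:   {split_bps / end_to_end_bps:.1f}x")
```

Under this model the gain is simply the ratio of the two RTTs for as long as the edge segment, not the link, is the bottleneck. Real transfers depend on congestion control variant, parallel streams, and buffer sizes, none of which are modeled here.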
Distributed Storage for Content Distribution
• IBP provides a primitive, scalable, in-network
storage service
• File-like abstractions can be built on top of this
• Uses a data structure known as an exNode (like
a Unix inode) to track allocations
• These basic building blocks can be used to build
various instances
– Parallel filesystem
– Distributed RAID-like storage
– Content distribution network
– Bittorrent-like peer to peer transfers
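To make the inode analogy concrete, here is a minimal sketch of an exNode-like structure that maps byte ranges of a logical file onto storage allocations held at different depots. It is not the real exNode format or the IBP API: the class names, fields, depot names, and the opaque `allocation` string are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class Extent:
    """One byte range of the logical file, stored in one depot allocation."""
    offset: int       # start of the range within the logical file
    length: int       # number of bytes in the range
    depot: str        # hypothetical depot host name
    allocation: str   # opaque handle standing in for whatever the depot returns

@dataclass
class ExNode:
    """Hypothetical exNode-like record: file metadata plus extent mappings."""
    name: str
    size: int
    extents: List[Extent] = field(default_factory=list)

    def add(self, offset, length, depot, allocation):
        self.extents.append(Extent(offset, length, depot, allocation))

    def replicas(self, offset):
        """Return every extent that holds the byte at `offset`."""
        return [e for e in self.extents
                if e.offset <= offset < e.offset + e.length]

    def is_complete(self):
        """True if the extents cover every byte of the file without gaps."""
        covered = 0
        for e in sorted(self.extents, key=lambda e: e.offset):
            if e.offset > covered:
                return False  # gap before this extent
            covered = max(covered, e.offset + e.length)
        return covered >= self.size

if __name__ == "__main__":
    node = ExNode("scene.tif", size=200)
    node.add(0, 100, "depot-a", "handle-1")
    node.add(100, 100, "depot-b", "handle-2")
    node.add(0, 100, "depot-c", "handle-3")   # replica of the first half
    print(node.is_complete(), [e.depot for e in node.replicas(50)])
```

Striping extents across depots gives the parallel-filesystem flavor, while overlapping extents on different depots give replication, which is the property a content distribution network or peer-to-peer style download would exploit.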
Architecture
• Unified Network Information Service (UNIS)
– Descendant of perfSONAR Lookup and Topology Services
– Network and service “graph”
• Intelligent Data Movement Service (IDMS)
– Data dispatcher
– Operates on UNIS data
– Spawn storage services dynamically in GENI
• Periscope/perfSONAR
– Monitoring for operational integrity and optimization, BLiPP
• Storage Services
– IBP, prototype based on Ceph
• Other services
– Data transfer (GridFTP), WAN acceleration
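One way to picture a dispatcher operating on a network and service graph is a shortest-path query that picks the storage depot closest to a client. The sketch below is an assumption-laden illustration, not the UNIS schema or the IDMS algorithm: the graph is a plain dictionary of hypothetical node names with made-up link latencies, and the selection policy (lowest total latency via Dijkstra's algorithm) is chosen here purely for demonstration.

```python
import heapq

def shortest_latency(graph, source):
    """Dijkstra's algorithm: lowest total latency from source to each node."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, latency_ms in graph.get(node, {}).items():
            candidate = d + latency_ms
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return dist

def pick_depot(graph, client, depots):
    """Return the reachable depot with the lowest latency, or None."""
    dist = shortest_latency(graph, client)
    reachable = [d for d in depots if d in dist]
    if not reachable:
        return None
    return min(reachable, key=lambda d: dist[d])

# Hypothetical topology: node -> {neighbor: link latency in ms}
GRAPH = {
    "client":   {"router-a": 2.0},
    "router-a": {"client": 2.0, "depot-1": 30.0, "router-b": 5.0},
    "router-b": {"router-a": 5.0, "depot-2": 8.0},
    "depot-1":  {"router-a": 30.0},
    "depot-2":  {"router-b": 8.0},
}

if __name__ == "__main__":
    print(pick_depot(GRAPH, "client", ["depot-1", "depot-2"]))
```

A real dispatcher would presumably weigh measured throughput, depot capacity, and monitoring data as well; latency alone is used here to keep the example small.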
Earth Observation Depot Network (EODN):
An open, community-specific content distribution
network for remote sensing data
Landsat data
• Landsat 8 launched February 11, 2013
• Covers the entire land surface of the Earth every 16 days (8-day offset from Landsat 7)
– ~700 scenes each day
• Each scene contains a GeoTIFF product: high-resolution sensor images
– ~1GB compressed, 2GB uncompressed
• Traditionally used for environmental monitoring and land use and land cover change
studies
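The figures above imply a daily data volume that is easy to work out. The short calculation below uses only the numbers on this slide (about 700 scenes per day, about 1 GB compressed and 2 GB uncompressed per scene, a 16-day cycle); decimal units (1 GB = 10^9 bytes) and an even spread of scenes over the day are simplifying assumptions.

```python
SCENES_PER_DAY = 700       # approximate, from the slide
COMPRESSED_GB = 1.0        # approximate size per scene, compressed
UNCOMPRESSED_GB = 2.0      # approximate size per scene, uncompressed
REVISIT_DAYS = 16          # full land-surface coverage cycle
SECONDS_PER_DAY = 86_400

# Daily volume in gigabytes (decimal units assumed).
daily_compressed_gb = SCENES_PER_DAY * COMPRESSED_GB
daily_uncompressed_gb = SCENES_PER_DAY * UNCOMPRESSED_GB

# Volume for one full 16-day coverage cycle, in terabytes.
cycle_compressed_tb = daily_compressed_gb * REVISIT_DAYS / 1000

# Average rate needed just to keep up with new compressed scenes.
sustained_mbps = daily_compressed_gb * 1e9 * 8 / SECONDS_PER_DAY / 1e6

if __name__ == "__main__":
    print(f"compressed per day:   {daily_compressed_gb:.0f} GB")
    print(f"uncompressed per day: {daily_uncompressed_gb:.0f} GB")
    print(f"compressed per cycle: {cycle_compressed_tb:.1f} TB")
    print(f"sustained ingest:     {sustained_mbps:.1f} Mbit/s")
```

The sustained ingest rate is modest, so the challenge for a content distribution network is less about harvesting and more about serving many large downloads quickly to distributed users.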
EODN
[Figure: EODN workflow diagram. Components: EODN Harvester, Landsat Ground Network, UNIS, DMS, EODN (DLT) depots at WISC, IU, NYSER and MIZZ, web GUI, Client, RealEarth (UW-Madison). Labeled steps: (1) subscribe, (2) harvest, (4) publish, (5) fast download, (6) processing.]
Cisco Appliance Platform
• In collaboration with Internet2, Cisco and
Fusion-io
• Cisco C220 server
– 2x Intel® Xeon® E5-2680, 16 cores@4GHz, 64GB DDR3 RAM
– Fusion-io ioDrive2 1.2 TB
• CentOS 6.4 Linux with DLT RPMs and tuning for
data transfer throughput
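The slides do not say which settings the DLT tuning adjusts. As a general illustration of why host tuning matters for long-distance transfers, the sketch below computes the bandwidth-delay product (BDP), the amount of data that must be in flight to keep a path full, and therefore a lower bound on the TCP buffer a single stream needs. The 10G link speed matches the performance slide in this deck; the RTT values are illustrative assumptions.

```python
def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: bandwidth (bit/s) * RTT (s) / 8."""
    # Integer arithmetic: bits/s * ms / 1000 (ms -> s) / 8 (bits -> bytes).
    return bandwidth_bps * rtt_ms // 8000

LINK_BPS = 10_000_000_000   # 10 Gbit/s path
RTTS_MS = (4, 50, 100)      # assumed round-trip times, for illustration

if __name__ == "__main__":
    for rtt_ms in RTTS_MS:
        size = bdp_bytes(LINK_BPS, rtt_ms)
        print(f"RTT {rtt_ms:>3} ms -> at least {size / 1e6:.0f} MB in flight")
```

If the socket buffers are smaller than the BDP, a single TCP stream cannot fill the path regardless of how fast the disks or NICs are, which is why default operating system limits are commonly raised on data transfer nodes.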
Acknowledgements
• Staff Scientist Dr. Ezra Kissel leads the DLT
development efforts and is PI of the GENI IDMS effort
• CC-NIE integration project with U. Tennessee
and Vanderbilt U.
• CC-NIE integration project with the Globus team
at U. Chicago and Argonne Nat’l Lab
• EODN development with AmericaView, U.
Wisconsin
Phoebus-SLaBS performance
GridFTP transfers over a dedicated 10G path with increasing WAN latency, 4 ms LAN RTT and 0.001%
edge loss