NSF Collaborative Research: CC-NIE Integration: Developing Applications with Networking Capabilities via End-to-End Software Defined Networking (DANCES)
Kathy Benninger, Pittsburgh Supercomputing Center
Workshop on the Development of a Next-Generation Cyberinfrastructure
1-Oct-2014
What is DANCES?
• The DANCES project, an NSF-funded CC-NIE collaborative award, is developing mechanisms for managing network bandwidth by adding end-to-end software-defined networking (SDN) capability and interoperability to selected CI applications and to application endpoint network infrastructure
DANCES Participants and Partner Sites
• Pittsburgh Supercomputing Center (PSC)
• National Institute for Computational Sciences (NICS)
• Pennsylvania State University (Penn State)
• National Center for Supercomputing Applications (NCSA)
• Texas Advanced Computing Center (TACC)
• Georgia Institute of Technology (GaTech)
• eXtreme Science and Engineering Discovery Environment
(XSEDE)
• Internet2
DANCES Partner Sites on AL2S XSEDEnet
[Figure: DANCES partner sites on the Internet2 AL2S XSEDEnet]
DANCES Application Integration Targets
• Add network bandwidth scheduling capability using
SDN to supercomputing infrastructure applications
• Resource management and scheduling
– Torque/MOAB scheduling software
– Enable bandwidth reservation for file transfer (see the sketch after this list)
• Wide area distributed file systems
– XSEDE-wide file system (XWFS)
– SLASH2 wide area distributed file system developed by PSC
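As a rough illustration of this scheduling target, a transfer job might request network bandwidth as a schedulable resource alongside nodes and walltime. The minimal Python sketch below uses a real qsub invocation pattern, but the "bandwidth" resource name and its syntax are assumptions for illustration, not the project's final interface.

import subprocess

# Hypothetical sketch: submit a file-transfer job that also requests a
# network bandwidth reservation as a schedulable Torque resource. The
# "bandwidth" resource name and syntax are assumptions for illustration.
def submit_transfer_job(script_path, mbps):
    cmd = [
        "qsub",
        "-l", "nodes=1,walltime=02:00:00",  # compute resources
        "-l", f"bandwidth={mbps}",          # assumed custom resource
        script_path,
    ]
    return subprocess.check_output(cmd, text=True).strip()  # job id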
File System Application Integration Research
• XWFS
– Based on IBM’s GPFS, this WAN file system is deployed across several XSEDE Service Providers. The research activity is integrating XWFS data flows with SDN/OpenFlow across XSEDEnet/Internet2
• SLASH2
– PSC’s SLASH2 WAN file system is deployed at PSC and partner sites. The research activity is integrating SLASH2 data flows with SDN/OpenFlow and resource scheduling across XSEDEnet/Internet2
Application Integration Research
• GridFTP
– Integration of SDN/OpenFlow capability with the resource
management and scheduling subsystems of XSEDE’s
advanced computational cyberinfrastructure to support
the GridFTP data transfer application
DANCES System Diagram
[Figure: DANCES system diagram]
SDN/OpenFlow Infrastructure Integration
• Application interface with SDN/OF environment
– Torque Prologue and Epilogue scripts to set up and tear down network
reservation for scheduled file transfer via file system (XWFS, SLASH2)
or GridFTP
– Map SLASH2 and XWFS file system interfaces to network bandwidth
reservation
– Interface to Internet2’s Open Exchange Software Suite (OESS); see the sketch after this list
• AL2S VLAN provisioning
• Establish end-to-end path between file transfer source and destination
sites
• SDN/OF-capable switches
– Existing infrastructure at some sites (e.g., CC-NIE and CC*IIE recipients)
– Evaluating hardware for deployment
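The OESS interaction might look roughly like the following Python sketch. OESS does expose a web-services interface for AL2S circuit provisioning, but the endpoint URL, parameter names, and response shape below are assumptions for illustration, not taken from OESS documentation.

import requests

# Hypothetical sketch of provisioning an AL2S VLAN path through an
# OESS-style web service. The URL, parameters, and response fields
# are assumptions for illustration only.
OESS_URL = "https://oess.example.net/services/provisioning"  # assumed

def provision_circuit(session, workgroup_id, endpoints, vlan, mbps):
    params = {
        "action": "provision_circuit",  # OESS-style action selector
        "workgroup_id": workgroup_id,
        "node": endpoints,              # [source node, destination node]
        "tag": vlan,                    # VLAN tag at each endpoint
        "bandwidth": mbps,              # reserved bandwidth in Mbps
    }
    resp = session.get(OESS_URL, params=params)
    resp.raise_for_status()
    return resp.json()["results"]["circuit_id"]  # assumed response shape

An analogous teardown call from the Epilogue script would remove the circuit when the transfer job completes.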
Workflow Example: SDN-enabled SLASH2
Note: SLASH2 supports file replication and multiple residency
1. User requests file residency at a particular site
2. SLASH2 checks and returns the file residency status
3. User authorization for bandwidth scheduling is checked
4. SLASH2 initiates path setup with end-site OpenFlow configuration and a transaction with Internet2’s FlowSpace Firewall and OESS for wide-area authorization and path provisioning
5. During the transfer, SLASH2 polls for remote residency completion (see the sketch below)
6. Upon completion of the transfer, the provisioned path is removed
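Step 5's polling might look like the loop below. check_residency() is a stand-in for whatever SLASH2 status query reports replication state (for example, a wrapper around its control utility); it and the interval values are assumptions for illustration.

import time

def wait_for_residency(check_residency, poll_interval=30, timeout=86400):
    # Poll until the requested replica is fully resident at the
    # destination site. check_residency() is an assumed stand-in for a
    # SLASH2 status query that returns True once replication completes.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if check_residency():
            return
        time.sleep(poll_interval)
    raise TimeoutError("replication did not complete before timeout")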
Workflow Example: Torque/MOAB with GridFTP
1. User creates DANCES-GridFTP job and submits it
2. Torque/MOAB schedules the job when resources are
available
3. DANCES-GridFTP job initiated
4. Torque uses a Prologue script to send a Northbound API instruction to the SDN controller to create the end-to-end path (see the sketch below)
5. Path setup includes local OpenFlow configuration and a transaction with Internet2’s FlowSpace Firewall and OESS for wide-area authorization and path provisioning
6. A Torque/MOAB Epilogue script tears down the provisioning when the job finishes
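Since a Torque Prologue can be any executable, step 4 could be realized as a small Python prologue like the sketch below. The controller URL and JSON payload fields are assumptions; northbound APIs are controller-specific and the slides do not fix one. Torque does pass the job id as the prologue's first argument.

#!/usr/bin/env python
# Hypothetical Torque Prologue sketch: ask the SDN controller to
# provision an end-to-end path before the DANCES-GridFTP job starts.
# The controller URL and payload fields are assumptions for illustration.
import json
import sys
import urllib.request

CONTROLLER = "http://sdn-controller.example.org:8080/paths"  # assumed

def request_path(job_id, src_site, dst_site, mbps):
    payload = json.dumps({"job_id": job_id, "src": src_site,
                          "dst": dst_site, "bandwidth_mbps": mbps}).encode()
    req = urllib.request.Request(CONTROLLER, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["path_id"]  # assumed response field

if __name__ == "__main__":
    # Torque passes the job id as the first prologue argument.
    request_path(sys.argv[1], "psc", "tacc", mbps=10000)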
User Interaction
• The user community consists primarily of domain researchers and scientists, so DANCES emphasizes transparent operation of the bandwidth scheduling mechanism
• Administratively, a user requests bandwidth reservation capability
– As a computational resource from the XRAC (typically one year)
– To support a limited-time large data set transfer need (< one year)
• Operationally, a user’s bandwidth reservation request may
– Succeed: bandwidth is scheduled and the transfer proceeds
– Be deferred by the scheduler, with the user’s permission, until bandwidth is available
– Fail: the request is declined, the user is notified, and the transfer proceeds as best-effort along with unscheduled traffic (see the sketch below)
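The three operational outcomes could be handled by client-side logic along these lines; reserve_bandwidth(), its status strings, and do_transfer() are illustrative assumptions, not project interfaces.

import time

def start_transfer(reserve_bandwidth, do_transfer, mbps):
    # Illustrative handling of the three reservation outcomes above.
    # reserve_bandwidth() returns an assumed (status, start_time) pair.
    status, start_time = reserve_bandwidth(mbps)
    if status == "granted":
        do_transfer(scheduled=True)            # reserved bandwidth
    elif status == "deferred":                 # wait for the granted slot
        time.sleep(max(0, start_time - time.time()))
        do_transfer(scheduled=True)
    else:                                      # declined: notify, fall back
        print("Reservation declined; transferring best-effort")
        do_transfer(scheduled=False)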
Cyberinfrastructure Issues - Policy
• Criteria for allocating bandwidth scheduling capability to
users/projects
• Agreement on the dedicated bandwidth that each site
commits for scheduled transfers
• Monitoring and accounting of bandwidth usage
Cyberinfrastructure Issues - Technical
• Authentication and authorization mechanisms that allow users/projects to request bandwidth reservations
– Site/XSEDE context
– Internet2 AL2S context
• Real-time cross-site tracking and management of allocated
bandwidth resources
• Extending Torque/MOAB, XWFS, and SLASH2 to support SDN commands
• Vendor support for OpenFlow 1.3 flow metering (see the example meter configuration below)
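For context on the flow-metering point, an OpenFlow 1.3 meter that rate-limits a flow can be expressed in a controller framework such as Ryu roughly as follows. Ryu is used purely as an example (the slides do not name a controller), and the 1 Gbps rate is an arbitrary placeholder.

from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3

class MeterSketch(app_manager.RyuApp):
    # Minimal sketch: install an OpenFlow 1.3 meter that drops traffic
    # above 1 Gbps, then attach it to a flow entry. Ryu is used only as
    # an illustrative controller; DANCES does not mandate one.
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def on_switch_ready(self, ev):
        dp = ev.msg.datapath
        ofp, parser = dp.ofproto, dp.ofproto_parser

        # Meter 1: drop packets above 1,000,000 kbps (1 Gbps).
        band = parser.OFPMeterBandDrop(rate=1000000, burst_size=0)
        dp.send_msg(parser.OFPMeterMod(dp, command=ofp.OFPMC_ADD,
                                       flags=ofp.OFPMF_KBPS,
                                       meter_id=1, bands=[band]))

        # Flow entry directing all IPv4 traffic through meter 1.
        match = parser.OFPMatch(eth_type=0x0800)
        inst = [parser.OFPInstructionMeter(1),
                parser.OFPInstructionActions(
                    ofp.OFPIT_APPLY_ACTIONS,
                    [parser.OFPActionOutput(ofp.OFPP_NORMAL)])]
        dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=10,
                                      match=match, instructions=inst))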
Research Questions
• How do multiple SDN/OF controllers overlay onto the existing CI?
• Does OpenFlow 1.3 flow metering meet the performance
needs?
• Are there significant SDN/OF operational differences
between wide area and machine room environments?
• How well do multi-vendor OpenFlow 1.3 implementations
interoperate?
• How can bandwidth scheduling be used to optimize network bandwidth utilization?
• What level of verification by the project team is sufficient to pave the way for production deployment at XSEDE and campus sites?