CC-NIE Networking Infrastructure: 100 Gb/s Science DMZ

UCSC 100 Gbps Science DMZ – 1 Year, 9 Month Update
Brad Smith & Mary Doyle
Slide 1
Goal 1 - 100 Gbps DMZ - Complete!
[Network diagram: CENIC HPR and global research networks, plus CENIC DC and the global Internet, connect through redundant Border routers to the new 100 Gb/s Science DMZ infrastructure (Science DMZ router, L2 paths, dtn.ucsc.edu DTN), alongside the existing 10 Gb/s SciDMZ links and the 10 Gb/s campus distribution core.]
Slide 2
Goal 2 – Collaborate with users to use it!
• MCD Biologist doing brain wave imaging
• SCIPP analyzing LHC ATLAS data
• HYADES cluster doing Astrophysics visualizations
• CBSE Cancer Genomics Hub
Slide 3
Exploring mesoscale brain wave imaging data
James Ackman
Assistant Professor
Department of Molecular, Cell, & Developmental Biology
University of California, Santa Cruz
1. Record brain activity patterns
2. Analyze cerebral connectivity
• external computing
• local computing
• Acquire 60 2.1 GB TIFF images/day (~120 GB/day total).
• Initial transfers ran at 20 Mbps = 12-15 min/TIFF = ~15 hrs/day!
• With the Science DMZ: 354 Mbps = ~1 min/TIFF = ~1 hr/day! (see the transfer-time sketch below)
• Data volume expected to grow 10x over the near term.
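As a quick sanity check on those numbers, here is a minimal back-of-the-envelope sketch (Python, illustrative only; the 2.1 GB image size, 60 images/day, and the 20 vs 354 Mbps rates come from the slide):

```python
# Back-of-the-envelope check (illustrative, not from the slides):
# per-image and per-day transfer time at the two observed rates.

GIGABYTE_BITS = 8e9                  # bits in a decimal gigabyte
image_bits = 2.1 * GIGABYTE_BITS
images_per_day = 60

def minutes_per_image(rate_mbps):
    """Minutes to move one 2.1 GB TIFF at a sustained rate in Mb/s."""
    return image_bits / (rate_mbps * 1e6) / 60

for label, mbps in [("before (20 Mbps)", 20), ("Science DMZ (354 Mbps)", 354)]:
    m = minutes_per_image(mbps)
    print(f"{label}: {m:.0f} min/TIFF, {m * images_per_day / 60:.1f} hrs/day")
# ~14 min/TIFF and ~14 hrs/day before, ~1 min/TIFF and ~0.8 hrs/day after,
# in line with the "15 hrs/day" vs "1 hr/day" figures above.
```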
Slide 4
SCIPP Network Usage for Physics with ATLAS
Ryan Reece
ryan.reece@cern.ch
Santa Cruz Institute for Particle Physics
Slide 5
ATLAS Detector
[Figure: cutaway of the ATLAS detector with colliding proton (p+) beams; humans and a T. rex shown for scale.]
Slide 6
Data Volume
• LHC running 2009-2012 produced ~ 100 PB
– Currently ~10 PB/year
• SCIPP processes and skims that data on the LHC computing grid, and brings ~10 TB of it back to SCIPP each year.
– The 12-hour transfer time impacts our ability to provide input for the next experiment
• Expect ≈4x the data volume in the next run (2015-2018).
• Our bottleneck is downloading the skimmed data to SCIPP.
• Current download rate: ~a few TB every few weeks.
Slide 7
Throughput with 1 Gb/s links – 400 Mbps
[Cluster diagram: atlas01 (headprv) acts as the public-private network bridge for users; atlas02 (int0prv), atlas03 (nfsprv, ≈20 TB of NFS storage, fed by downloads from the grid), and atlas04 (int1prv) sit on the private network; worker nodes wrk0prv through wrk7prv (128 CPUs, ≈20 TB) serve data over XROOTD. All hosts connect at 1 Gb/s through a 2007-era Dell 6248 switch, with a 1 Gb/s link to the campus network.]
Slide 8
Throughput with 10 Gb/s links – 400 Mbps?!
[Same cluster diagram with several host links upgraded from 1 Gb/s to 10 Gb/s, while the worker nodes remain at 1 Gb/s behind the 2007 Dell 6248 switch; measured throughput is still 400 Mbps.]
Slide 9
Offload the Dell Switch (with help from ESnet!) – 1.6 Gbps
[Same cluster diagram after offloading the download path from the Dell 6248 switch onto 10 Gb/s links; measured throughput rises to 1.6 Gbps.]
Slide 10
SCIPP Summary
• Quadrupled throughput
– Reduced download time from 12 hrs to 3 hrs
• Still a long way from the 10 Gbps potential
– ~30 min at 10 Gbps (factor of 8; see the throughput sketch below)
• Probably not going to be enough for the new run
– ~4x data volume
• Possible problems:
– atlas03 storage (not enough spindles)
– WAN or protocol problems
– 6-year-old Dell switch
– Investigating a GridFTP solution and a new LHC data access node from SDSC
• We are queued up to help them when they’re ready…
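A minimal sketch (Python, illustrative only) relating the three throughput levels above to download time; the ~2.2 TB batch size is an assumption chosen to be consistent with the quoted 12-hour baseline at ~400 Mbps, not a number from the slides:

```python
# Illustrative sketch relating throughput to download time for one skim
# batch. The ~2.2 TB batch size is assumed to match the quoted 12-hour
# baseline at 400 Mbps; it is not stated on the slides.

TERABYTE_BITS = 8e12

def download_hours(volume_tb, rate_gbps):
    return volume_tb * TERABYTE_BITS / (rate_gbps * 1e9) / 3600

batch_tb = 2.2   # assumed skim size per download
for label, gbps in [("original 1 Gb/s setup (~0.4 Gbps)", 0.4),
                    ("Dell switch offloaded (1.6 Gbps)", 1.6),
                    ("10 Gbps potential", 10.0)]:
    print(f"{label}: {download_hours(batch_tb, gbps):.1f} hrs")
# ~12 hrs -> ~3 hrs -> ~0.5 hrs, matching the 12 hr / 3 hr / ~30 min
# figures above.
```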
Slide 11
Hyades
• Hyades is an HPC cluster for Computational Astrophysics
• Funded by a $1 million grant from NSF in 2012
• Users from the departments of:
– Astronomy & Astrophysics
– Physics
– Earth & Planetary Sciences
– Applied Math & Statistics
– Computer Science, etc.
• Many are also users of national supercomputers
Slide 12
Hyades Hardware
• 180 Compute Nodes
• 8 GPU Nodes
• 1 MIC Node
• 1 big-memory Analysis Node
• 1 3D Visualization Node
• Lustre storage, providing 150 TB of scratch space
• 2 FreeBSD file servers, providing 260 TB of NFS space
• 1 PetaByte cloud storage system, using Amazon S3 protocols
Slide 13
Slide 14
Data Transfer
• 100+ TB between Hyades and NERSC (see the transfer-time sketch below)
• 20 TB between Hyades and NASA Pleiades; in the process of
moving 60+ TB from Hyades to NCSA Blue Waters
• 10 TB from Europe to Hyades
• Shared 10 TB of simulation data with collaborators in Australia,
using the Huawei Cloud Storage
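For a sense of scale, a minimal sketch (Python, illustrative only; it assumes ideal sustained rates and ignores protocol and storage overhead) of how long a 100 TB transfer like the Hyades-NERSC one takes at 1 Gb/s versus 10 Gb/s:

```python
# Illustrative only: wall-clock time to move a bulk dataset at a given
# sustained rate, ignoring protocol overhead and storage bottlenecks.

def transfer_days(volume_tb, rate_gbps):
    bits = volume_tb * 8e12            # decimal TB -> bits
    return bits / (rate_gbps * 1e9) / 86400

for rate_gbps in (1, 10):
    print(f"100 TB at {rate_gbps} Gb/s: {transfer_days(100, rate_gbps):.1f} days")
# ~9.3 days at 1 Gb/s vs ~0.9 days at 10 Gb/s, best case.
```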
Slide 15
Remote Visualization
• Ein is a 3D Visualization workstation, located in an Astronomy office
(200+ yards from Hyades)
• Connected to Hyades via a 10G fiber link
• Fast network enables remote visualization in real time:
– Graphics processing locally on Ein
– Data storage and processing remotely, either on Hyades or on NERSC
supercomputers
Slide 16
CBSE CGHub
• NIH/NCI archive of cancer genomes
• 10/2014: 1.6 PB of genomes uploaded
• 1/2014: 1 PB/month downloaded(!)
• Located at SDSC… managed from UCSC
• Working with CGHub to explore L2/“engineered” paths (the bandwidth sketch below shows why)
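A quick sketch (Python, illustrative only) of what a 1 PB/month download rate implies as average sustained bandwidth, which is part of what makes engineered L2 paths attractive here:

```python
# Illustrative: average sustained bandwidth implied by 1 PB/month.

PETABYTE_BITS = 8e15
MONTH_SECONDS = 30 * 86400

avg_gbps = PETABYTE_BITS / MONTH_SECONDS / 1e9
print(f"1 PB/month ≈ {avg_gbps:.1f} Gb/s sustained")   # ≈ 3.1 Gb/s average
```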
Slide 17
Innovations…
• “Research Data Warehouse”
– DTN with long-term storage
• Whitebox switches
– On-chip packet buffer – 12 MB
– 128 10 Gb/s SERDES… so 32 40-gig ports
– SOC… price leader, uses less power
– Use at the network edge (see the port/buffer sketch below)
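A small sketch (Python, illustrative only; the 4-lanes-per-40GbE-port bonding is an assumption consistent with the lane count) of the whitebox SOC arithmetic: how 128 lanes of 10 Gb/s SERDES map to 40 GbE ports, and how thin the shared 12 MB buffer is per port:

```python
# Illustrative arithmetic for the whitebox switch SOC described above.

serdes_lanes = 128      # 10 Gb/s SERDES lanes on the chip
lane_gbps = 10
port_gbps = 40          # assume each 40 GbE port bonds 4 lanes

ports_40g = serdes_lanes * lane_gbps // port_gbps
print(f"{serdes_lanes} x {lane_gbps} Gb/s lanes -> {ports_40g} x 40 GbE ports")

buffer_mb = 12          # shared on-chip packet buffer
print(f"Shared buffer per 40G port: {buffer_mb * 1024 / ports_40g:.0f} KB")
# 32 ports and roughly 384 KB of buffer per port: workable at the network
# edge, but shallow for big long-haul TCP flows, as the slide suggests.
```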
Slide 18
Project Summary
• 100 Gbps Science DMZ completed
• Improved workflow for a number of research groups
• Remaining targets
– Extend the Science DMZ to more buildings
– Further work with SCIPP… when they need it
– L2 (“engineered”) paths with CBSE (genomics)
– SDN integration
• Innovations
– “Research Data Warehouse” - DTN as long-term storage
– Whitebox switches
Slide 19
Questions?
Brad Smith
Director Research & Faculty Partnerships, ITS
University of California Santa Cruz
brad@ucsc.edu
Slide 20