High Performance Cyberinfrastructure
Enables Data-Driven Science
in the Globally Networked World
Keynote Presentation
Sequencing Data Storage and Management Meeting at
The X-GEN Congress and Expo
San Diego, CA
March 14, 2011
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor,
Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
Follow me on Twitter: lsmarr
Abstract
High performance cyberinfrastructure (10 Gbps dedicated optical channels end-to-end) enables new
levels of discovery for data-intensive research projects such as next generation sequencing. In
addition to international and national optical fiber infrastructure, we need local campus high
performance research cyberinfrastructure (HPCI) to provide "on-ramps," as well as scalable
visualization walls and compute and storage clouds, to augment the emerging remote commercial
clouds. I will review how UCSD has built out just such an HPCI and is in the process of connecting
it to a variety of high throughput biomedical devices. I will show how high performance
collaboration technologies allow distributed interdisciplinary teams to analyze these large data
sets in real time.
Two Calit2 Buildings Provide
Laboratories for “Living in the Future”
• "Convergence" Laboratory Facilities
  – Nanotech, BioMEMS, Chips, Radio, Photonics
  – Virtual Reality, Digital Cinema, HDTV, Gaming
• Over 1000 Researchers in Two Buildings
  – Linked via Dedicated Optical Networks
UC San Diego
UC Irvine
www.calit2.net
Over 400 Federal Grants, 200 Companies
The Required Components of
High Performance Cyberinfrastructure
• High Performance Optical Networks
• Scalable Visualization and Analysis
• Multi-Site Collaborative Systems
• End-to-End Wide Area CI
• Data-Intensive Campus Research CI
The OptIPuter Project: Creating High Resolution Portals
Over Dedicated Optical Channels to Global Science Data
OptIPortal: Scalable Adaptive Graphics Environment (SAGE)
Picture Source: Mark Ellisman, David Lee, Jason Leigh
Calit2 (UCSD, UCI), SDSC, and UIC Leads—Larry Smarr PI
Univ. Partners: NCSA, USC, SDSU, NW, TA&M, UvA, SARA, KISTI, AIST
Industry: IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent
Visual Analytics: Use of Tiled Display Wall OptIPortal
to Interactively View Microbial Genome (5 Million Bases)
Acidobacteria bacterium Ellin345: Soil Bacterium, 5.6 Mb, ~5000 Genes
Source: Raj Singh, UCSD
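To make the tiled-wall rationale concrete, here is a rough back-of-envelope sketch (mine, not from the talk) of how much of a 5.6 Mb genome can be on screen at once at a readable zoom level; the glyph size and display resolutions are illustrative assumptions.

```python
# Rough comparison (illustrative assumptions, not from the talk): how many bases
# can be on screen at once if each base is drawn as a ~16x16 pixel glyph?

GENOME_BASES = 5_600_000                 # ~5.6 Mb genome, ~5000 genes
PIXELS_PER_BASE_GLYPH = 16 * 16          # assumed glyph size for a readable zoom

displays = {
    "1080p monitor": 1920 * 1080,        # ~2.1 Mpixel
    "4K monitor": 3840 * 2160,           # ~8.3 Mpixel
    "100 Mpixel OptIPortal wall": 100_000_000,
}

for name, pixels in displays.items():
    visible = pixels / PIXELS_PER_BASE_GLYPH
    print(f"{name:28s}: ~{visible:10,.0f} bases visible "
          f"({100 * visible / GENOME_BASES:5.1f}% of the genome at once)")
```

Under these assumptions a desktop shows well under 1% of the genome at once, while a ~100-megapixel wall shows several percent, which is the practical difference between scrolling through a genome and seeing long-range context side by side.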
Use of Tiled Display Wall OptIPortal
to Interactively View Microbial Genome
Source: Raj Singh, UCSD
Large Data Challenge: Average Throughput to End User
on Shared Internet is 10-100 Mbps
Tested
January 2011
Transferring 1 TB:
– 50 Mbps = 2 Days
– 10 Gbps = 15 Minutes
(arithmetic sketch below)
http://ensight.eos.nasa.gov/Missions/terra/index.shtml
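A quick check of the two transfer times quoted above, assuming 1 TB = 10^12 bytes and a link running at its full nominal rate with no protocol overhead:

```python
# Reproduce the slide's transfer-time arithmetic for moving 1 TB.
# Assumes 1 TB = 1e12 bytes and a link running at its full nominal rate.

TB_BITS = 1e12 * 8  # 1 terabyte expressed in bits

for label, rate_bps in [("50 Mbps shared Internet", 50e6),
                        ("10 Gbps dedicated lightpath", 10e9)]:
    seconds = TB_BITS / rate_bps
    print(f"{label:28s}: {seconds:10.0f} s "
          f"= {seconds / 3600:6.1f} h = {seconds / 86400:5.2f} days")
```

Real transfers add TCP, disk, and protocol overheads, which is why the slide rounds the raw 13-minute figure up to 15 minutes.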
Solution: Give Dedicated Optical Channels
to Data-Intensive Users
Wavelength-Division Multiplexing (WDM) "Lambdas": 10 Gbps per User,
~100-1000x Shared Internet Throughput
Source: Steve Wallach, Chiaro Networks
Parallel Lambdas are Driving Optical Networking
The Way Parallel Processors Drove 1990s Computing
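Background on the "lambda" terminology (standard DWDM facts, not specific to this talk): each dedicated channel is one wavelength on the ITU C-band grid, and wavelength and carrier frequency are related by lambda = c / f:

```python
# General DWDM background (not from the talk): each "lambda" is one optical
# carrier on the ITU C-band grid; wavelength and frequency satisfy lambda = c / f.

C = 299_792_458.0          # speed of light, m/s

def wavelength_nm(freq_thz: float) -> float:
    """Convert an optical carrier frequency in THz to wavelength in nm."""
    return C / (freq_thz * 1e12) * 1e9

# A few 100 GHz-spaced channels around the ITU grid anchor at 193.1 THz.
for f_thz in (193.0, 193.1, 193.2, 193.3):
    print(f"{f_thz:6.1f} THz  ->  {wavelength_nm(f_thz):8.2f} nm")
```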
Dedicated 10Gbps Lightpaths Tie Together
State and Regional Fiber Infrastructure
Interconnects Two Dozen State and Regional Optical Networks
Internet2 Dynamic Circuit Network Is Now Available
NLR 40 x 10Gb Wavelengths
The Global Lambda Integrated Facility: Creating a Planetary-Scale High Bandwidth Collaboratory
Research Innovation Labs Linked by 10G Dedicated Lambdas
www.glif.is
Created in Reykjavik,
Iceland 2003
Visualization courtesy of
Bob Patterson, NCSA.
Launch of the 100 Megapixel OzIPortal Kicked Off
a Rapid Build Out of Australian OptIPortals
January 15, 2008
No Calit2 Person Physically Flew to Australia to Bring This Up!
Covise, Phil Weber, Jurgen Schulze, Calit2
CGLX, Kai-Uwe Doerr, Calit2
http://www.calit2.net/newsroom/release.php?id=1421
“Blueprint for the Digital University”--Report of the
UCSD Research Cyberinfrastructure Design Team
• Focus on Data-Intensive Cyberinfrastructure (April 2009)
• No Data Bottlenecks: Design for Gigabit/s Data Flows
research.ucsd.edu/documents/rcidt/RCIDTReportFinal2009.pdf
Campus Preparations Needed
to Accept CENIC CalREN Handoff to Campus
Source: Jim Dolgonas, CENIC
Current UCSD Prototype Optical Core:
Bridging End-Users to CENIC L1, L2, L3 Services
[Network diagram] The Quartzite Communications Core (Year 3) sits at the "optical" center of
campus, bridging end-users to the CalREN-HPR research cloud and the campus research cloud.
Approximately 0.5 Tbit/s (32 x 10GigE) arrives at the core, where switching is a hybrid of
packet, lambda, and circuit – OOO and packet switches.
Endpoints: >= 60 endpoints at 10 GigE; >= 32 packet switched; >= 32 switched wavelengths;
>= 300 connected endpoints.
Hardware in the diagram: Quartzite wavelength-selective core switch, Glimmerglass production
OOO switch, Lucent, Force10 packet switch, Juniper T320, and GigE switches with dual 10GigE
uplinks to cluster nodes (GigE, 10GigE, 4 GigE, and 4-pair fiber links).
Source: Phil Papadopoulos, SDSC/Calit2 (Quartzite PI, OptIPuter co-PI)
Quartzite Network MRI #CNS-0421555; OptIPuter #ANI-0225642
Calit2 Sunlight
Optical Exchange Contains Quartzite
Maxine Brown, EVL, UIC, OptIPuter Project Manager
UCSD Planned Optical Networked
Biomedical Researchers and Instruments
Campus sites on the planned network:
• CryoElectron Microscopy Facility
• San Diego Supercomputer Center
• Cellular & Molecular Medicine East
• Cellular & Molecular Medicine West
• Calit2@UCSD
• Bioengineering
• National Center for Microscopy & Imaging
• Radiology Imaging Lab
• Center for Molecular Genetics
• Pharmaceutical Sciences Building
• Biomedical Research
Connects at 10 Gbps:
– Microarrays
– Genome Sequencers
– Mass Spectrometry
– Light and Electron Microscopes
– Whole Body Imagers
– Computing
– Storage
UCSD Campus Investment in Fiber Enables
Consolidation of Energy Efficient Computing & Storage
Resources linked by N x 10Gb/s campus fiber:
• WAN 10Gb: CENIC, NLR, I2
• Gordon – HPD System
• Cluster Condo
• Scientific Instruments
• GreenLight Data Center
• Triton – Petascale Data Analysis
• Digital Data Collections
• DataOasis (Central) Storage
• Campus Lab Cluster
Source: Philip Papadopoulos, SDSC, UCSD
OptIPortal
Tiled Display Wall
Community Cyberinfrastructure for Advanced
Microbial Ecology Research and Analysis
http://camera.calit2.net/
Calit2 Microbial Metagenomics Cluster:
Next Generation Optically Linked Science Data Server
• 512 Processors, ~5 Teraflops
• ~200 TB Sun X4500 Storage
• 1GbE and 10GbE Switched/Routed Core
• 4000 Users from 90 Countries
Source: Phil Papadopoulos, SDSC, Calit2
OptIPuter Persistent Infrastructure Enables
Calit2 and U Washington CAMERA Collaboratory
Photo Credit: Alan Decker
Feb. 29, 2008
Ginger Armbrust's Diatoms: Micrographs, Chromosomes, Genetic Assembly
iHDTV: 1500 Mbits/sec Calit2 to
UW Research Channel Over NLR
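The 1500 Mbits/sec figure matches the rate of an uncompressed HD-SDI stream. A rough check, assuming 10-bit 4:2:2 sampling over the full 1080-line raster (an assumption about the iHDTV format, not stated on the slide):

```python
# Why uncompressed HD video needs ~1.5 Gbps: SMPTE 292M HD-SDI arithmetic.
# Assumes 10-bit 4:2:2 sampling and the full 2200x1125 raster (including blanking)
# at ~30 frames/s; that iHDTV carried exactly this format is an assumption.

samples_per_pixel = 2          # luma plus alternating chroma (4:2:2)
bits_per_sample = 10
total_width, total_height = 2200, 1125   # full raster for 1080-line HD
frames_per_second = 30000 / 1001         # ~29.97 fps

bits_per_frame = total_width * total_height * samples_per_pixel * bits_per_sample
rate_bps = bits_per_frame * frames_per_second
print(f"Uncompressed HD-SDI rate: {rate_bps / 1e9:.3f} Gbps")   # ~1.48 Gbps
```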
Creating CAMERA 2.0: Advanced Cyberinfrastructure Service-Oriented Architecture
Source: CAMERA CTO Mark Ellisman
The GreenLight Project:
Instrumenting the Energy Cost of Computational Science
• Focus on 5 Communities with At-Scale Computing Needs:
– Metagenomics
– Ocean Observing
– Microscopy
– Bioinformatics
– Digital Media
• Measure, Monitor, & Web Publish Real-Time Sensor Outputs
– Via Service-oriented Architectures
– Allow Researchers Anywhere To Study Computing Energy Cost
– Enable Scientists To Explore Tactics For Maximizing Work/Watt (see the sketch after this slide)
• Develop Middleware that Automates Optimal Choice
of Compute/RAM Power Strategies for Desired Greenness
• Data Center for School of Medicine Illumina Next Gen
Sequencer Storage and Processing
Source: Tom DeFanti, Calit2; GreenLight PI
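A minimal sketch of the work-per-watt bookkeeping the bullets above describe, assuming per-node power samples from the instrumented container are already available; the sample format and the notion of a "work unit" are hypothetical placeholders, not GreenLight's actual middleware:

```python
# Hypothetical sketch of GreenLight-style work/watt accounting.
# Assumes power samples (timestamp_seconds, watts) collected while a job ran,
# plus an application-defined count of completed work units (e.g. reads assembled).

def energy_joules(samples):
    """Integrate (t, watts) samples with the trapezoidal rule -> joules."""
    total = 0.0
    for (t0, w0), (t1, w1) in zip(samples, samples[1:]):
        total += 0.5 * (w0 + w1) * (t1 - t0)
    return total

def work_per_watt(samples, work_units):
    """Work units delivered per average watt over the job's runtime."""
    joules = energy_joules(samples)
    duration = samples[-1][0] - samples[0][0]
    avg_watts = joules / duration
    return work_units / avg_watts

# Example: a 10-minute job sampled once a minute at ~350 W, finishing 1200 work units.
samples = [(60 * i, 350 + 5 * (i % 3)) for i in range(11)]
print(f"Energy: {energy_joules(samples) / 1000:.1f} kJ, "
      f"work/watt: {work_per_watt(samples, 1200):.2f}")
```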
Moving to Shared Enterprise Data Storage & Analysis
Resources: SDSC Triton Resource & Calit2 GreenLight
http://tritonresource.sdsc.edu
SDSC
Large Memory
Nodes
• 256/512
GB/sys
• 8TB Total
• 128 GB/sec
• ~ 9 TF
Source: Philip Papadopoulos, SDSC, UCSD
x256
x28
UCSD Research Labs
SDSC Data Oasis
Large Scale Storage
• 2 PB
• 50 GB/sec
• 3000 – 6000 disks
• Phase 0: 1/3 TB,
8GB/s
Campus
Research
Network
N x 10Gb/s
Calit2 GreenLight
SDSC Shared
Resource
Cluster
• 24 GB/Node
• 6TB Total
• 256 GB/sec
• ~ 20 TF
NSF Funds a Data-Intensive Track 2 Supercomputer:
SDSC’s Gordon – Coming Summer 2011
• Data-Intensive Supercomputer Based on
SSD Flash Memory and Virtual Shared Memory SW
– Emphasizes MEM and IOPS over FLOPS
– Supernode has Virtual Shared Memory:
– 2 TB RAM Aggregate
– 8 TB SSD Aggregate
– Total Machine = 32 Supernodes
– 4 PB Disk Parallel File System >100 GB/s I/O
• System Designed to Accelerate Access to Massive Databases being Generated
in Many Fields of Science, Engineering, Medicine, and Social Science
(see the IOPS sketch after this slide)
Source: Mike Norman, Allan Snavely SDSC
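A rough illustration of why the design emphasizes IOPS over FLOPS: the time to service a burst of small random reads from spinning disk versus flash. The IOPS figures are typical order-of-magnitude values assumed for illustration, not Gordon's measured specifications:

```python
# Why IOPS matter for data-intensive work: time to do many small random reads.
# Device IOPS below are rough, assumed order-of-magnitude values, not Gordon specs.

random_reads = 10_000_000          # e.g. index probes into a large on-disk database
devices = {
    "7200 rpm disk (~150 IOPS)": 150,
    "disk array, 48 spindles": 150 * 48,
    "single SSD (~30,000 IOPS)": 30_000,
    "flash supernode, 16 SSDs": 30_000 * 16,
}

for name, iops in devices.items():
    hours = random_reads / iops / 3600
    print(f"{name:28s}: {hours:8.2f} hours for {random_reads:,} random reads")
```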
Data Mining Applications
will Benefit from Gordon
• De Novo Genome Assembly from Sequencer Reads & Analysis of Galaxies
from Cosmological Simulations & Observations
– Will Benefit from Large Shared Memory
• Federations of Databases & Interaction Network Analysis
for Drug Discovery, Social Science, Biology, Epidemiology, Etc.
– Will Benefit from Low Latency I/O from Flash
Source: Mike Norman, SDSC
Rapid Evolution of 10GbE Port Prices
Makes Campus-Scale 10Gbps CI Affordable
• Port Pricing is Falling
• Density is Rising – Dramatically
• Cost of 10GbE Approaching Cluster HPC Interconnects
– $80K/port: Chiaro (60 ports max)
– $5K/port: Force 10 (40 ports max)
– ~$1000/port (300+ ports max)
– $500/port: Arista, 48 ports
– $400/port: Arista, 48 ports (2010)
(Chart years: 2005, 2007, 2009, 2010)
Source: Philip Papadopoulos, SDSC/Calit2
10G Switched Data Analysis Resource:
SDSC’s Data Oasis
[Network diagram] Radical Change Enabled by Arista 7508 10G Switch (384 10G-Capable Ports)
10Gbps links to: OptIPuter, UCSD RCI, Co-Lo, CENIC/NLR
Connected systems: Triton, Trestles (100 TF), Dash, Gordon, Existing Commodity Storage (1/3 PB)
Oasis Procurement (RFP): 2000 TB, > 50 GB/s
• Phase 0: > 8 GB/s Sustained Today
• Phase I: > 50 GB/sec for Lustre (May 2011)
• Phase II: > 100 GB/s (Feb 2012)
Source: Philip Papadopoulos, SDSC/Calit2
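To see why the > 50 GB/s Lustre target is plausible, a back-of-envelope sketch: aggregate bandwidth is the sum of many modest per-disk streaming rates across the 3000-6000 disks cited on the earlier Triton slide. The per-disk rate and efficiency derating below are assumptions, not SDSC measurements:

```python
# Back-of-envelope: aggregate parallel-filesystem bandwidth from many spindles.
# Per-disk streaming rate and efficiency derating are assumptions, not SDSC specs.

target_gb_s = 50
per_disk_mb_s = 90        # assumed sustained streaming rate of one 2011-era SATA disk
efficiency = 0.5          # assumed derating for RAID, Lustre, and network overheads

disks_needed = target_gb_s * 1000 / (per_disk_mb_s * efficiency)
print(f"Disks needed for {target_gb_s} GB/s at {per_disk_mb_s} MB/s "
      f"and {efficiency:.0%} efficiency: ~{disks_needed:,.0f}")

# Conversely, what 3000-6000 disks could deliver under the same assumptions:
for n in (3000, 6000):
    print(f"{n} disks -> ~{n * per_disk_mb_s * efficiency / 1000:.0f} GB/s aggregate")
```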
Calit2 CAMERA Automatic Overflows
into SDSC Triton
CAMERA Managed Job Submit Portal (VM) @ Calit2 Transparently Sends Jobs
to Submit Portal on Triton Resource @ SDSC (10Gbps)
CAMERA Data Direct Mount == No Data Staging
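A highly simplified, hypothetical sketch of the overflow pattern described above: the portal runs jobs locally while the local cluster has free slots, otherwise forwards them to Triton, and never stages data because the CAMERA store is mounted on both systems. All names and the scheduling policy are invented for illustration:

```python
# Hypothetical sketch of the CAMERA overflow pattern (names invented for illustration):
# run locally while the local cluster has free slots, otherwise forward to Triton.
# There is no staging step because the shared data store is mounted on both systems.

from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    free_slots: int

def choose_resource(local: Resource, remote: Resource, slots_needed: int) -> Resource:
    """Prefer the local cluster; overflow to the remote resource when it is full."""
    return local if local.free_slots >= slots_needed else remote

local = Resource("camera-cluster@calit2", free_slots=2)
remote = Resource("triton@sdsc", free_slots=512)

for job, slots in [("blast-batch-1", 1), ("assembly-7", 8), ("blast-batch-2", 2)]:
    target = choose_resource(local, remote, slots)
    target.free_slots -= slots
    print(f"{job:14s} -> {target.name}  (reads shared data in place, no staging)")
```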
California and Washington Universities Are Testing
a 10Gbps Connected Commercial Data Cloud
• Amazon Experiment for Big Data
– Only Available Through CENIC & Pacific NW GigaPOP
– Private 10Gbps Peering Paths
– Includes Amazon EC2 Computing & S3 Storage Services
• Early Experiments Underway
– Robert Grossman, Open Cloud Consortium
– Phil Papadopoulos, Calit2/SDSC Rocks
Academic Research OptIPlanet Collaboratory:
A 10Gbps “End-to-End” Lightpath Cloud
[Diagram] Components linked by National LambdaRail 10G Lightpaths and Campus Optical Switch:
HD/4k Live Video; HPC; End User OptIPortal; Local or Remote Instruments;
Data Repositories & Clusters; HD/4k Video Repositories
You Can Download This Presentation
at lsmarr.calit2.net