Introduction to the TeraGrid
Daniel S. Katz
d.katz@ieee.org
Lead, LONI as a TeraGrid Resource Provider
Director of Science, TeraGrid GIG
Director, Cyberinfrastructure Development,
Center for Computation & Technology, LSU
Adjunct Associate Professor,
Electrical and Computer Engineering Department, LSU
What is the TeraGrid?
• World’s largest open scientific discovery infrastructure
• Leadership class resources at eleven partner sites combined to create
an integrated, persistent computational resource
– High-performance networks
– High-performance computers (>1 Pflops (~100,000 cores) -> 1.75 Pflops)
• And a Condor pool (w/ ~13,000 CPUs)
– Visualization systems
– Data Collections (>30 PB, >100 discipline-specific databases)
– Science Gateways
– User portal
– User services - Help desk, training, advanced app support
• Allocated to US researchers and their collaborators through national
peer-review process
– Generally, review of computing, not science
• Extremely user-driven
– MPI jobs, ssh or grid (GRAM) access, etc.
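Most TeraGrid cycles go to batch MPI jobs. A minimal sketch of such a job, written here with the mpi4py package purely for illustration (an assumption; production codes are typically C or Fortran MPI):

# Minimal MPI job sketch using mpi4py (an illustrative assumption; most
# TeraGrid applications are C or Fortran MPI codes).
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's ID within the job
size = comm.Get_size()   # total number of MPI processes in the job

# Each rank contributes its rank number; the sum lands on rank 0.
total = comm.reduce(rank, op=MPI.SUM, root=0)
if rank == 0:
    print("Running on %d ranks; sum of ranks = %d" % (size, total))

On a TeraGrid system this would typically be launched from a batch script, e.g. mpirun -np 64 python hello_mpi.py (the launcher and file name here are hypothetical; details vary by site).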
Governance
• 11 Resource Providers (RPs) funded under
separate agreements with NSF
– Different start and end dates
– Different goals
– Different agreements
– Different funding models
• 1 Coordinating Body – Grid Integration Group (GIG)
• University of Chicago/Argonne
• Subcontracts to all RPs and six other universities
• 7-8 Area Directors
• Working groups with members from many RPs
• TeraGrid Forum with Chair
TeraGrid
TG New Large Resources
• Ranger@TACC
– First NSF ‘Track2’ HPC system
– 504 TF
– 15,744 Quad-Core AMD
Opteron processors
– 123 TB memory, 1.7 PB disk
• Kraken@NICS (UT/ORNL)
– Second NSF ‘Track2’ HPC system
– 170 TF Cray XT4 system
– To be upgraded to Cray XT5 at 1 PF
• 10,000+ compute sockets
• 100 TB memory, 2.3 PB disk
• Blue Waters@NCSA
– NSF Track 1 system
– 10 PF peak
– Coming in 2011
• Something@PSC
– Third NSF ‘Track2’ HPC system
Who Uses TeraGrid (2007)
• Molecular Biosciences: 31%
• Physics: 17%
• Chemistry: 17%
• Astronomical Sciences: 12%
• Materials Research: 6%
• Chemical, Thermal Systems: 5%
• Advanced Scientific Computing: 4%
• Atmospheric Sciences: 3%
• Earth Sciences: 3%
• All 19 Others: 2%
How TeraGrid Is Used
Use Modality                                      Community Size (rough est. - number of users)
Batch Computing on Individual Resources            850
Exploratory and Application Porting                650
Workflow, Ensemble, and Parameter Sweep            250
Science Gateway Access                             500
Remote Interactive Steering and Visualization       35
Tightly-Coupled Distributed Computation             10
(2006 data)
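For the workflow/ensemble/parameter-sweep modality, a minimal sketch of the pattern, assuming Python on a login node with a PBS-style qsub; the job script name and parameter are hypothetical placeholders:

# Parameter-sweep sketch: submit one batch job per parameter value.
# Assumes a PBS-style qsub on the login node; the job script
# (sweep_job.pbs) and PARAM variable are hypothetical placeholders.
import subprocess

def submit_sweep(values):
    """Submit one job per value; return the scheduler job IDs."""
    job_ids = []
    for v in values:
        result = subprocess.run(
            ["qsub", "-v", "PARAM=%s" % v, "sweep_job.pbs"],
            capture_output=True, text=True, check=True,
        )
        job_ids.append(result.stdout.strip())
    return job_ids

print(submit_sweep([0.1, 0.2, 0.5, 1.0]))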
How One Uses TeraGrid
[Diagram: users reach resources through POPS (for now), the User Portal, Science Gateways, or the command line; requests pass over the common TeraGrid Infrastructure (Accounting, Network, Authorization, ...) to the Compute, Viz, and Data Services at RP 1, RP 2, RP 3, ...]
Slide courtesy of Dane Skow and Craig Stewart
User Portal: portal.teragrid.org
Access to resources
• Terminal: ssh, gsissh
• Portal: TeraGrid user portal, Gateways
– Once logged in to the portal, click on “Login”
• Also, SSO from the command line (see the sketch below)
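A minimal sketch of scripted terminal access, assuming Python with the third-party paramiko SSH library; the host and user names are hypothetical, and real single sign-on uses gsissh with a grid proxy certificate rather than local keys or passwords:

# Sketch of scripted terminal access to a TeraGrid login node, assuming the
# third-party paramiko SSH library.  Host and user names are hypothetical;
# real single sign-on uses gsissh with a grid proxy rather than keys/passwords.
import paramiko

def run_remote(host, user, command):
    """Open an SSH session, run one command, and return its output."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(hostname=host, username=user)  # uses local keys/agent
    _, stdout, _ = client.exec_command(command)
    output = stdout.read().decode()
    client.close()
    return output

print(run_remote("login.example-rp.teragrid.org", "username", "qstat -u username"))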
User Portal: User Information
• Knowledge Base for quick answers to technical questions
• Documentation
• Science Highlights
• News and press releases
• Education, outreach and training events and resources
Data Storage Resources
• GPFS-WAN
– 700 TB disk storage at SDSC, accessible from machines at NCAR,
NCSA, SDSC, ANL
• Data Capacitor
– 535 TB storage at IU, including databases
• Data Collections
– Storage at SDSC (files, databases) for collections used by
communities
• Tape Storage
– Available at IU, NCAR, NCSA, SDSC
• Access is generally through GridFTP (through portal or
command-line)
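A minimal sketch of a scripted data transfer, assuming Python and the Globus Toolkit command-line client globus-url-copy, with a valid grid proxy (e.g. from grid-proxy-init); the endpoint and paths are hypothetical placeholders:

# Sketch of a scripted GridFTP transfer, assuming the Globus Toolkit client
# globus-url-copy is installed and a valid grid proxy exists (grid-proxy-init).
# The endpoint and paths below are hypothetical placeholders.
import subprocess

def gridftp_copy(src_url, dst_url):
    """Copy one file between GridFTP endpoints via globus-url-copy."""
    subprocess.run(["globus-url-copy", src_url, dst_url], check=True)

gridftp_copy(
    "gsiftp://gridftp.example-rp.teragrid.org/scratch/user/results.dat",
    "file:///home/user/results.dat",
)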
Science Gateways
• A natural extension of Internet & Web 2.0
• Idea resonates with scientists
– Researchers can imagine scientific capabilities provided through a familiar interface
• Mostly web portals, or web- or client-server programs
• Designed by communities; provide interfaces understood by those communities
– Also provide access to greater capabilities (back end)
– Without users needing to understand the details of those capabilities
– Scientists know they can undertake more complex analyses, and that’s all they want to focus on
– TeraGrid provides tools to help developers (a sketch of the pattern follows)
• Seamless access doesn’t come for free
– Hinges on very capable developers
Slide courtesy of Nancy Wilkins-Diehr
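As an illustration of the gateway pattern only (not any specific TeraGrid gateway), a minimal sketch assuming Python with Flask and a PBS-style qsub; the route, parameter, and job script names are hypothetical:

# Gateway-pattern sketch: a thin web front end accepts domain-level
# parameters and submits a batch job on the user's behalf, hiding the
# scheduler and file systems.  Assumes Flask and a PBS-style qsub;
# the route, parameter, and job script (run_analysis.pbs) are hypothetical.
from flask import Flask, jsonify, request
import subprocess

app = Flask(__name__)

@app.route("/submit", methods=["POST"])
def submit():
    params = request.get_json()
    molecule = params.get("molecule", "water")
    result = subprocess.run(
        ["qsub", "-v", "MOLECULE=%s" % molecule, "run_analysis.pbs"],
        capture_output=True, text=True, check=True,
    )
    # Return the scheduler job ID so the portal can poll for status later.
    return jsonify({"job_id": result.stdout.strip()})

if __name__ == "__main__":
    app.run(port=8080)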
Current Science Gateways
• Biology and Biomedicine Science Gateway
• Open Life Sciences Gateway
• The Telescience Project
• Grid Analysis Environment (GAE)
• Neutron Science Instrument Gateway
• TeraGrid Visualization Gateway, ANL
• BIRN
• Open Science Grid (OSG)
• Special PRiority and Urgent Computing Environment (SPRUCE)
• National Virtual Observatory (NVO)
• Linked Environments for Atmospheric Discovery (LEAD)
• Computational Chemistry Grid (GridChem)
• Computational Science and Engineering Online (CSE-Online)
• GEON (GEOsciences Network)
• Network for Earthquake Engineering Simulation (NEES)
• SCEC Earthworks Project
• Network for Computational Nanotechnology and nanoHUB
• GIScience Gateway (GISolve)
• Gridblast Bioinformatics Gateway
• Earth Systems Grid
• Astrophysical Data Repository (Cornell)
Slide courtesy of Nancy Wilkins-Diehr
Focused Outreach
• Campus Champions
– Volunteer effort of a staff member at a university
• Pathways for Broadening Participation
– Low-level trial effort
– TeraGrid RP staff have extended interactions with minority-serving institutions (MSIs)
• Both
– Have initial small TG allocation
• Can create and distribute suballocations very quickly
– Work with users as needed to help them
TG App: Predicting storms
• Hurricanes and tornadoes cause massive
loss of life and damage to property
• TeraGrid supported spring 2007 NOAA
and University of Oklahoma Hazardous
Weather Testbed
– Major Goal: assess how well ensemble
forecasting predicts thunderstorms,
including the supercells that spawn
tornadoes
– Nightly reservation at PSC, spawning jobs
at NCSA as needed for details
– Input, output, and intermediate data
transfers
– Delivers “better than real time” prediction
– Used 675,000 CPU hours for the season
– Used 312 TB on HPSS storage at PSC
Slide courtesy of Dennis Gannon, ex-IU, and LEAD Collaboration
App: GridChem
Different licensed applications with different queues
Will be scheduled for workflows
Slide courtesy of Joohyun Kim
Apps: Genius and Materials
• Genius: modeling blood flow before (during?) surgery – HemeLB on LONI
• Materials: fully-atomistic simulations of clay-polymer nanocomposites – LAMMPS on TeraGrid
• Why cross-site / distributed runs?
1. Rapid turnaround: conglomeration of idle processors to run a single large job
2. Run big compute & big memory jobs not possible on a single machine
Slide courtesy of Steven Manos and Peter Coveney
TeraGrid: Both Operations and Research
• Operations
– Facilities/services on which users rely
– Infrastructure on which other providers build
AND
• R&D
– Learning how to do distributed, collaborative science
on a global, federated infrastructure
– Learning how to run multi-institution shared
infrastructure
Looking for science that needs TG and NGS/DEISA/UK