Advancing Scientific Discovery through TeraGrid

Scott Lathrop
TeraGrid Director of Education, Outreach and Training
University of Chicago and Argonne National Laboratory
lathrop@mcs.anl.gov
www.teragrid.org
11 Resource Providers, One Facility
[Map of TeraGrid sites: UW, Grid Infrastructure Group (UChicago), PSC, UC/ANL, NCAR, PU, NCSA, Caltech, UNC/RENCI, IU, ORNL, NICS, USC/ISI, SDSC, LONI, TACC]
[Map legend: Resource Provider (RP); Software Integration Partner]
TeraGrid Objectives
• DEEP Science: Enabling Petascale Science
– Make Science More Productive through an integrated set of very-high capability resources
• Address key challenges prioritized by users
• WIDE Impact: Empowering Communities
– Bring TeraGrid capabilities to the broad science community
• Partner with science community leaders - “Science Gateways”
• OPEN Infrastructure, OPEN Partnership
– Provide a coordinated, general purpose, reliable set of services and resources
• Partner with campuses and facilities
TeraGrid Resources and Services
• Computing - nearly a petaflop of computing power today
and growing
– 500 Tflop Ranger system at TACC
– NICS (U Tenn) system to come on-line this year
– Centralized help desk for all resource providers
• Remote visualization servers and software
• Data
– Allocation of data storage facilities
– Over 100 Scientific Data Collections
• Central allocations process
• Technical Support
– Central point of contact for support of all systems
– Advanced Support for TeraGrid Applications (ASTA)
– Education and training events and resources
– Over 20 Science Gateways
Requesting Allocations of Time
• TeraGrid resources are provided for free to
academic researchers and educators
• Development Allocations Committee (DAC): requests
for start-up and course accounts of up to 30,000
hours of time are processed in two weeks
• Medium Resource Allocations Committee (MRAC):
requests of up to 500,000 hours of time are
reviewed four times a year
• Large Resource Allocations Committee (LRAC):
requests of over 500,000 hours of time are
reviewed twice a year
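The committee tiers above amount to a threshold lookup on the number of hours requested. A minimal sketch, assuming the thresholds exactly as listed on this slide; the function name `allocation_committee` is hypothetical and is not part of any TeraGrid tool:

```python
def allocation_committee(hours_requested: int) -> str:
    """Return the reviewing committee for a request, using the slide's thresholds."""
    if hours_requested <= 0:
        raise ValueError("hours_requested must be positive")
    if hours_requested <= 30_000:
        return "DAC"   # start-up and course requests, processed in two weeks
    if hours_requested <= 500_000:
        return "MRAC"  # reviewed four times a year
    return "LRAC"      # over 500,000 hours, reviewed twice a year


for hours in (10_000, 250_000, 2_000_000):
    print(hours, allocation_committee(hours))
```

A request sitting exactly on a boundary (30,000 or 500,000 hours) is treated here as belonging to the lower tier, following the "up to" wording.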
TeraGrid User Community
[Figure: stacked bar chart, 0% to 100%, showing the share of each discipline: All 20 Others (< 2% usage each), Atmospheric Sciences, Chemical, Thermal Systems, Materials Research, Astronomical Sciences, Physics, Chemistry, Molecular Biosciences. Bars: PIs (879), Active Users (3,197), Charging Users (1,141), Allocations (1.8B NUs), NUs (618M NUs).]
TeraGrid Usage
[Figure: monthly usage in Normalized Units (millions), 2004 to 2007, split into Specific Allocations and Roaming Allocations. Annotation: 33% Annual Growth.]
TeraGrid currently delivers an average of 420,000 cpu-hours per day -> ~21,000 CPUs DC
Source: Dave Hart (dhart@sdsc.edu)
TeraGrid Usage Modes in CY2006
Use Modality                                     Community Size (est. number of people/projects)
Batch Computing on Individual Resources          850
Exploratory and Application Porting              650
Workflow, Ensemble, and Parameter Sweep          160
Science Gateway Access                           100
Remote Interactive Steering and Visualization    35
Tightly-Coupled Distributed Computation          10
[Slide annotation: Grid-y Users]
Advanced Support for TeraGrid Applications
Virtualized Resources, Ensembles: FOAM Climate Model
Liu (UWisc)
Coupled Simulation: Full Body Arterial Tree Simulation
Karniadakis (Brown)
Sources: Ian Foster (UC/ANL), Mike Papka (UC/ANL), George Karniadakis (Brown). Images by UC/ANL.
On Demand: Predicting Severe Weather
Droegemeier (OU) and LEAD
Large Data; Virtualized Resources: Earthquake Simulation
Olsen (SDSU), Okaya (USC), Southern California Earthquake Center
Sources: Kelvin Droegemeier (OU), Dennis Gannon (IU), Tom Jordan (USC). Images by PSC and SDSC.
TeraGrid Science Highlights 2007
Cosmology
Tiziana di Matteo, Carnegie Mellon U
Gas density is shown (increasing with brightness) with temperature (increasing
from blue to red color). Yellow circles indicate black holes (diameter increasing
with mass). At about 6 billion years, the universe has many black holes and a
pronounced filamentary structure.
• Found that black holes regulate galaxy formation. As they swallow gas, they
radiate so much energy that they stop the inflow of gas.
• Worked with PSC to improve scaling and use hybrid MPI-shared memory
programming for GADGET.
Arterial Tree Simulation and Visualization
Brown University, Northern Illinois University, and
University of Chicago/Argonne National Laboratory
Blood flow visualization demonstration at SC07
Simulation runs across multiple TeraGrid sites
Computation:
– NCSA: 256 processors
– UC/ANL: 64 processors
– SDSC: 128 processors
– SDSC: 144 processors
– Total: 592 processors
Data transfer from compute to visualization site (GridFTP):
– UC/ANL: 4 processors
Visualization:
– UC/ANL: 16 processors
SC07 Exhibit floor
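The per-site processor counts listed for the computation can be cross-checked against the stated total. A small sketch; the list structure is only an illustration, and the site labels are copied as they appear on the slide, including the two SDSC entries:

```python
# Compute-site processor counts as listed on the slide.
compute_sites = [("NCSA", 256), ("UC/ANL", 64), ("SDSC", 128), ("SDSC", 144)]

total = sum(count for _, count in compute_sites)
print(total)  # 592, matching the slide's stated total
```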
Storm prediction
Ming Xue, U. of Oklahoma
• Better alerts for thunderstorms, especially supercells
that spawn tornados, could save millions of dollars and
many lives.
• Unprecedented experiment, run every day from April 15 to June 8 (tornado season), to test the ability of storm-scale ensemble prediction under real forecasting
conditions for the US east of the Rockies.
• First time for
– ensemble forecasting at storm scale
– real-time in a simulated operational
environment
• Successful predictions of the overall pattern and
evolution of many of the convective-scale features,
sometimes out to the second day, and good ability to
capture storm-scale uncertainties
Top: prediction 21 hours ahead of time for May 24, 2007; Bottom: observed.
Protein Structure
David Baker, U. of Washington
• David Baker’s Rosetta code has proved
the best at predicting protein 3-D
structure from sequence in biennial
competitions (CASP - Critical
Assessment of Structure Prediction)
• Used 1.3 M hours on NCSA Condor to
identify promising targets, then refined
22 promising targets using 730,000 hours
of SDSC Blue Gene.
• SDSC helped improve scaling to run on
a 40,960-processor Blue Gene at IBM,
which reduced the running time for a
single prediction to 3 hours, instead
of weeks on a typical 1,000-processor
cluster.
Protein structure prediction by the
Rosetta code, showing the predicted
structure (blue), the X-ray structure
(red), and a low-resolution NMR
structure (green).
Solve any Rubik’s Cube in 26 moves?
• Rubik's Cube is perhaps the most
famous combinatorial puzzle of its time.
• > 43 quintillion states (4.3x10^19)
• Gene Cooperman and Dan Kunkle of
Northeastern Univ. just proved that any state
can be solved in 26 moves or fewer.
• 7TB of distributed storage on TeraGrid
allowed them to develop the proof
URL: http://www.physorg.com/news99843195.html
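The "43 quintillion" figure can be reproduced from the standard counting argument for the 3x3x3 cube: the 8 corners can be permuted and twisted, the 12 edges can be permuted and flipped, and parity constraints leave only some combinations reachable. A short sketch of that count; the function name is illustrative, and this computes the number of states only, not anything about the 26-move proof:

```python
from math import factorial


def rubiks_cube_states() -> int:
    """Count the reachable states of a standard 3x3x3 Rubik's Cube."""
    corners = factorial(8) * 3**7   # 8 corner positions; the last corner's twist is forced
    edges = factorial(12) * 2**11   # 12 edge positions; the last edge's flip is forced
    return corners * edges // 2     # corner and edge permutation parities must match


print(f"{rubiks_cube_states():,}")
```

The result is roughly 4.3x10^19, which is why the proof needed terabytes of distributed storage rather than a direct enumeration in memory.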
TeraGrid Web Resources
TeraGrid provides a
rich array of web-based resources:
• TeraGrid User Portal for
managing user allocations
and job flow
• Knowledge Base for quick
answers to technical
questions
• User Information including
documentation, information
about hardware and software
resources
• Science Highlights
• News and press releases
• Education, outreach and
training events and resources
In general, seminars and workshops will be
accessible via video on the Web. Extensive
documentation will also be Web-based.
Science Gateways
Broadening Participation in TeraGrid
• Increasing investment by
communities in their own
cyberinfrastructure, but
heterogeneous:
• Resources
• Users – from expert to K-12
• Software stacks, policies
• Science Gateways
– Provide “TeraGrid Inside”
capabilities
– Leverage community investment
Source: Dennis Gannon (gannon@cs.indiana.edu)
• Three common forms:
– Web-based Portals
– Application programs running on
users' machines but accessing
services in TeraGrid
– Coordinated access points
enabling users to move
seamlessly between TeraGrid and
other grids.
Build standard portals to meet the domain
requirements of the biology communities
Develop federated databases to be
replicated and shared across TeraGrid
[Figure: OGCE Science Portal. Diagram labels: Technical Approach; OGCE Portlets with Container; Service API; Apache Jetspeed Internal Services; Grid Service Stubs; Local Portal Services; Remote Content Services; Remote Content Servers (HTTP); Grid Services; Grid Resources; Grid Protocols; Java CoG Kit; Workflow Composer; Open Source Tools]
Gateways are Expanding
• 10 initial projects as part of TG proposal
• >20 Gateway projects today
• No limit on how many gateways can use TG
resources
– Prepare services and documentation so developers
can work independently
• Open Science Grid (OSG)
• Special PRiority and Urgent Computing Environment (SPRUCE)
• National Virtual Observatory (NVO)
• Linked Environments for Atmospheric Discovery (LEAD)
• Computational Chemistry Grid (GridChem)
• Computational Science and Engineering Online (CSE-Online)
• GEON (GEOsciences Network)
• Network for Earthquake Engineering Simulation (NEES)
• SCEC Earthworks Project
• Network for Computational Nanotechnology and nanoHUB
• GIScience Gateway (GISolve)
• Biology and Biomedicine Science Gateway
• Open Life Sciences Gateway
• The Telescience Project
• Grid Analysis Environment (GAE)
• Neutron Science Instrument Gateway
• TeraGrid Visualization Gateway, ANL
• BIRN
• Gridblast Bioinformatics Gateway
• Earth Systems Grid
• Astrophysical Data Repository (Cornell)
TeraGrid as a Social Network
• Annual TeraGrid
conference - TeraGrid ‘08 Las Vegas - June
• Science Gateway
community very successful
– Transitioning to consulting
model
• Campus Champions
– Campus Representatives
assisting local users
• HPC University
– training and education
resources and events
• Education and Outreach
– Engaging thousands of people
TeraGrid ‘08 Conference
Call for Participation!
Riviera Hotel and Casino
Las Vegas
June 9th-13th, 2008
Science, Technology and Education
Papers
Tutorials
BOFs
Student Competitions
Visualization Showcase
Student Competition Teams
Campus Champions Program
• Training program for campus representatives
• Campus advocate for TeraGrid and CI
• TeraGrid ombudsman for local users
• Quick start-up accounts for campus
• TeraGrid contacts for problem resolution
• We’re looking for interested campuses!
HPC Education and Training
TeraGrid partners offer training and education events and resources to educators and researchers:
• Workshops, institutes and seminars
on high-performance scientific
computing
• Hands-on tutorials on porting and
optimizing code for the TeraGrid
systems
• On-line self-paced tutorials
• High-impact educational and visual
materials suitable for K–12,
undergraduate and graduate classes
“HPC University”
• Advance researchers’ HPC skills
– Catalog of live and self-paced training
– Schedule series of training courses
– Gap analysis of materials to drive development
• Work with educators to enhance the curriculum
– Search catalog of HPC resources
– Schedule workshops for curricular development
– Leverage good work of others
• Offer Student Research Experiences
– Enroll in HPC internship opportunities
– Offer Student Competitions
• Publish Science and Education Impact
– Promote via TeraGrid Science Highlights, iSGTW
– Publish education resources to NSDL-CSERD
Sampling of Training Topics Offered
• HPC Computing
– Introduction to Parallel Computing
– Toward Multicore Petascale Applications
– Scaling Workshop - Scaling to Petaflops
– Effective Use of Multi-core Technology
– TeraGrid-Wide BlueGene Applications
– Introduction to Using SDSC Systems
– Introduction to the Cray XT3 at PSC
– Introduction to & Optimization for SDSC Systems
– Parallel Computing on Ranger & Lonestar
• Domain-specific Sessions
– Petascale Computing in the Biosciences
– Workshop on Infectious Disease Informatics at NCSA
• Visualization
– Introduction to Scientific Visualization
– Intermediate Visualization at TACC
– Remote/Collaborative TeraScale Visualization on the TeraGrid
• Other Topics
– NCSA to host workshop on data center design
– Rocks Linux Cluster Workshop
– LCI International Conference on HPC Clustered Computing
• Over 30 on-line asynchronous tutorials
SC08-SC10 Education Program
• Multi-year, year-long, Education Programs to
provide continuity and sustained impact
• Integrate HPC into high school and
undergraduate science, technology, engineering
and mathematics classrooms
– Foster High School - College partnerships
• Significantly expanded digital libraries of
resources for teaching and learning (CSERD/NSDL, ACM Digital Library)
• Sponsors: ACM, IEEE, TeraGrid, NCSI, CSERD,
Krell, and NSF
• Recruiting faculty and institutions to innovate
their curriculum
Internships and Fellowships
TeraGrid Partners offer internships and fellowships that allow undergraduates,
post-graduate students and faculty to be located on-site and work with
TeraGrid staff and researchers in areas critical to advancing scientific discovery:
• Computer science in
user support and
operations
• Future technologies
• Research activities
Broadening Participation in TeraGrid
• Broaden awareness of TeraGrid
– Campus Visits (coupled with CI Days)
– Professional Society Meetings
– Develop promotional materials
• Build human capacity for Terascale research
– In-depth consulting (5-8 consultants)
– TeraGrid Fellowship Program for faculty and students
– Mentoring Program
– Campus Champions
• Enhance the usability and access of TG via SGs
– Assess Science Gateway readiness and community requirements
– Develop replicable strategies for integrating TeraGrid resources
into SGs, with an emphasis on under-served community needs
For More Information
www.teragrid.org
www.s-education.org
cserd.nsdl.org
lathrop@mcs.anl.gov