Diane Baxter's presentation

TeraGrid Resources
Enabling Scientific Discovery
Through Cyberinfrastructure
(CI)
Diane Baxter, Ph.D.
San Diego Supercomputer Center
University of California, San Diego
The National TeraGrid
[Map of TeraGrid partner sites, each marked as a Resource Provider (RP) or a Software Integration Partner: UW, PSC, UC/ANL, NCAR, PU, NCSA, Caltech, IU, UNC/RENCI, ORNL, U Tenn., USC/ISI, SDSC, LSU, TACC. Grid Infrastructure Group at UChicago.]
http://www.teragrid.org/
A complex collaboration of over a dozen organizations
working together to provide cyberinfrastructure
that goes beyond what can be provided by individual
institutions,
to improve research productivity and enable
breakthroughs not otherwise possible.
TeraGrid . . . .
• Deep - provides leadership class resources at
11 partner sites
• Wide - is an integrated, persistent
computational resource for broad user
communities
• Open - is an open scientific discovery
infrastructure
• Is the world's largest, most comprehensive
distributed cyberinfrastructure for open
scientific research.
To be more specific, TeraGrid . . .
• Uses high-performance network connections (10-30 Tb/sec)
• Integrates high-performance computers; resources for data analysis, visualization, and storage; data collection tools; high-end experimental facilities; and supporting expertise around the country;
• Provides more than a petaflop of computing capability;
• Offers more than 30 petabytes of online and archival data storage, as well as systems to manage data acquisition and access; and
• Provides researchers access to over 100 discipline-specific databases.
What’s in it (TeraGrid) for me?
• An instrument that delivers high-end IT resources: computation, storage, visualization, and data/services
– A computational facility – over a PetaFLOP in parallel computing
capability
– A data storage and management facility - over 30 PetaBytes of
storage (disk and tape), over 100 scientific data collections
– A high-bandwidth national data network
• Services: help desk and consulting, Advanced Support
for TeraGrid Applications (ASTA), education and training
events and resources
• Access - without financial cost
– Research accounts allocated via peer review
– Startup and Education accounts automatic
TeraGrid Compute Power
[Figure: Computational Resources by site (size approximate, not to scale): UC/ANL, NCAR, NCSA, PSC, PU, IU, ORNL, Tennessee, LONI/LSU, SDSC, TACC]
Slide Courtesy Tommy Minyard, TACC
TeraGrid Data Storage and Management
• Persistent storage on disk and tape
• Allocatable tape-based, geographically distributed storage systems for backups of critical data:
» IU (Indiana University)
» NCAR (National Center for Atmospheric Research)
» NCSA (National Center for Supercomputing Applications)
» SDSC (San Diego Supercomputer Center)
• Command line usage with GridFTP, or the File Manager tool in the TeraGrid User Portal
• GPFS-WAN (General Parallel File System Wide Area Network), ~1 petabyte
• IU Data Capacitor
• Long term disk storage allocations
(1 PB spinning disk for short-term data storage)
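As a rough illustration of the command-line route, the sketch below assembles a `globus-url-copy` invocation (the GridFTP client distributed with the Globus Toolkit) without running it. The helper name, host name, and paths are hypothetical placeholders; real endpoints come from each Resource Provider's documentation.

```python
import shlex


def build_gridftp_command(source_url, dest_url, parallel_streams=4):
    """Return the argv list for a globus-url-copy transfer (not executed)."""
    if parallel_streams < 1:
        raise ValueError("parallel_streams must be at least 1")
    return [
        "globus-url-copy",
        "-vb",                        # report transfer performance
        "-p", str(parallel_streams),  # number of parallel data streams
        source_url,
        dest_url,
    ]


# Hypothetical endpoints, for illustration only.
cmd = build_gridftp_command(
    "file:///home/user/results.tar",
    "gsiftp://gridftp.example.org/archive/user/results.tar",
)
print(shlex.join(cmd))
```

Building the argument list separately from running it makes it easy to inspect the command before handing it to a shell or to `subprocess.run`.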
TeraGrid Architecture
[Diagram labels: access paths (POPS, User Portal, Science Gateways, Command Line); TeraGrid Infrastructure (Network, Authorization, Accounting, …); Resource Providers RP 1, RP 2, RP 3; Compute Service, Viz Service, Data Service]
(Are your eyes glazing over?)
Translation please!
Enter: Science Gateways
• A Science Gateway
– Enables scientific communities of users
with a common scientific goal and
vocabulary
– Has a common interface
– Leverages community investment
• Three common forms:
– Web-based Portals
– Application programs running on users'
machines but accessing services in
TeraGrid
– Coordinated access points enabling
users to move seamlessly between
TeraGrid and other grids.
Today, there are approximately 29 gateways
using the TeraGrid
How do Gateways help?
• Make science more productive
– Researchers use same tools
– Complex workflows
– Common data formats
– Data sharing
• Bring TeraGrid capabilities to the broad science community
– Lots of disk space
– Lots of compute resources
– Powerful analysis capabilities
– A community-friendly interface to information and research tools
But it’s not just ease of use.
What can scientists do that they couldn’t do
previously?
• LEAD – access to radar data
• NVO – access to sky surveys
• OOI – access to sensor data
• PolarGrid – access to polar ice sheet data
• SIDGrid – analysis tools for social scientists
• GridChem – developing multiscale coupling
How would this have been done before gateways?
Gateways can enhance and support
investments in other projects
• Increase access
– To instruments
• Increase capabilities
– To data analysis tools
• Improve workforce development
– For underserved populations, through broad access to
learning resources
• Increase outreach
• Increase public awareness
– Public sees value in investments in large facilities
• Slice bread
Gateways Greatly Expand Access
• Almost anyone can investigate scientific questions using
high end resources
– Not restricted to those in research groups with allocations
– Gateways allow anyone with a web browser to explore
• Fosters new ideas, cross-disciplinary approaches
• Encourages students to experiment
• But Gateways are used in production too
– Significant number of papers resulting from gateways including
GridChem, nanoHUB
– Scientists can focus on challenging science problems rather than
challenging infrastructure problems
How do we develop a new gateway?
Advanced support for Gateway Development
• Same peer review process used to request
resources
– 30,000 CPUs
– + 6 months of help from a TG Gateway Team member
– Reviews based on appropriate use of resources, science
is not reviewed if already funded
•Petascale
•Multisite workflows
•Gateways
•Domain expertise
Support is Very Targeted
• Start with well-defined objectives
– Focus on efficient or novel use of national CI resources
• Minimum .25 FTE for months to a year
– Enough investment to really understand and help solve
complex problems
• Must have commitment from PIs
– Want to make sure work is incorporated into production
codes and gateways
• Good candidates for targeted support include:
– Large, high impact projects
– Ability to influence new communities
– Suggestions from NSF directorates on important projects
• Lessons learned move into training and documentation
When is a gateway most appropriate?
• Researchers using defined sets of tools in different ways
– Same executables, different input
•GridChem, CHARMM
– Creating multi-scale or complex workflows
– Shared datasets
• Common data formats
– National Virtual Observatory
– Earth System Grid
– Some groups have invested significant efforts already, e.g.:
•caBIG, extensive discussions to develop common terminology and formats
•BIRN, extensive data sharing agreements
• Difficult to access data/advanced workflows
– Sensor/radar input
•LEAD, GEON
Things you can do with the TeraGrid:
Simulate cell membrane processes
Work by Emad Tajkhorshid and James Gumbart,
of University of Illinois Urbana-Champaign.
– Mechanics of Force Propagation in TonB-Dependent Outer Membrane Transport.
Biophysical Journal 93:496-504 (2007).
– Results of the simulation may be seen at
www.life.uiuc.edu/emad/TonB-BtuB/btub2.5Ans.mpg
• Modeled mechanisms for transport of
molecules through cell membrane.
• Used 400,000 CPU hours [45 processor-years]
on systems at National Center for
Supercomputing Applications, IU, Pittsburgh
Supercomputing Center
Image courtesy of Emad Tajkhorshid, UIUC
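The bracketed processor-years figure on this slide can be checked with a one-line conversion. The sketch below assumes a processor-year means one processor running for 365 days of 24 hours; the function name is illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def cpu_hours_to_processor_years(cpu_hours):
    """Convert CPU hours to years of continuous use of one processor."""
    return cpu_hours / HOURS_PER_YEAR


membrane_run = cpu_hours_to_processor_years(400_000)
print(f"{membrane_run:.1f} processor-years")
```

Under that assumption the 400,000 CPU hours come to a little under 46 processor-years, in line with the roughly 45 quoted on the slide.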
Predict storms
• Hurricanes and tornadoes cause massive
loss of life and damage to property
• TeraGrid supported the spring 2007 NOAA
and University of Oklahoma Hazardous
Weather Testbed
– Major Goal: assess how well ensemble
forecasting predicts thunderstorms,
including supercells and tornadoes.
– Delivers “better than real time”
prediction
– Used 675,000 CPU hours for the
season
– Used 312 TB on HPSS storage at PSC
Slide courtesy of Dennis Gannon, IU, and LEAD Collaboration
Watch Polar Ice Caps Melt (PolarGrid)
• Cyberinfrastructure Center
for Polar Science (CICPS)
– Experts in polar science,
remote sensing and
cyberinfrastructure
– Indiana, ECSU, CReSIS
• Satellite observations show
disintegration of ice shelves
in West Antarctica and
speed-up of several glaciers
in southern Greenland
– Most existing ice sheet
models, including those used
by IPCC, cannot explain the
rapid changes
http://www.polargrid.org/polargrid/images/4/42/C0050-polargrid-big.m4v
Source: Geoffrey Fox
CY2007 Usage by Discipline
3.95B SUs delivered in CY2007
• Molecular Biosciences: 31%
• Physics: 17%
• Chemistry: 17%
• Astronomical Sciences: 12%
• Materials Research: 6%
• Chemical, Thermal Systems: 5%
• All 19 Others: 4%
• Atmospheric Sciences: 3%
• Earth Sciences: 3%
• Advanced Scientific Computing: 2%
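To put the CY2007 shares in absolute terms, the sketch below multiplies each discipline's percentage by the 3.95B SU total. The chart percentages are rounded, so the results are approximate, and the pairing of the 2% and 3% slices with Advanced Scientific Computing and Atmospheric Sciences is inferred from the order of the chart labels.

```python
TOTAL_SUS = 3.95e9  # service units delivered in CY2007, from the slide

# Percent shares as read from the chart labels (rounded on the slide).
SHARE_PERCENT = {
    "Molecular Biosciences": 31,
    "Physics": 17,
    "Chemistry": 17,
    "Astronomical Sciences": 12,
    "Materials Research": 6,
    "Chemical, Thermal Systems": 5,
    "All 19 Others": 4,
    "Atmospheric Sciences": 3,           # pairing inferred from label order
    "Earth Sciences": 3,
    "Advanced Scientific Computing": 2,  # pairing inferred from label order
}


def sus_for(discipline):
    """Approximate SUs delivered to one discipline in CY2007."""
    return TOTAL_SUS * SHARE_PERCENT[discipline] / 100


# The slices should account for the whole chart.
assert sum(SHARE_PERCENT.values()) == 100

for name, pct in SHARE_PERCENT.items():
    print(f"{name:32s} {pct:3d}%  ~{sus_for(name) / 1e9:.2f}B SUs")
```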
Do you want to see more Gateway examples?
• Yes
• No
Recent Gateways using TeraGrid
Significantly
• SCEC
• SIDGrid
• CIG
SCEC using gateway to produce hazard map
• PSHA hazard map for
California using newly
released Earthquake
Rupture Forecast
(UCERF2.0) calculated
using SCEC Science
Gateway
• Warm colors indicate
regions with a high
probability of experiencing
strong ground motion in the
next 50 years.
• High resolution map,
significant CPU use
LEAD (portal.leadproject.org/)
• Simple enough an undergraduate can use it! http://wxchallenge.com/
• National Center for Supercomputing Applications (NCSA) and IU teamed up to
support WxChallenge weather forecast competition. 64 teams, 1000 students,
~16,000 CPU hours on Big Red
• XBaya is available from http://www.collab-ogce.org/
nanoHUB Harnesses TeraGrid for Education
Nanotechnology education
• Used in dozens of courses
at many universities
• Teaching materials
• Collaboration space
• Research seminars
• Modeling tools
• Access to cutting edge
research software
Social Informatics Data Grid
• Heavy use of “multimodal”
data.
– Subject might be viewing a
video, while a researcher
collects heart rate and eye
movement data.
• Events must be
synchronized for analysis,
large datasets result
• Extensive analysis
capabilities are not
something that each
researcher should have to
create for themselves.
http://www.ci.uchicago.edu/research/files/sidgrid.mov
• Social scientists have traditionally worked in isolated labs without the
capability to share data or insights with others.
• SIDGrid enables a number of capabilities.
– Data that is expensive to collect can now be shared with others, increasing
the potential for scientific impact.
– Geographically distant researchers can collaborate on the analysis of the
same data set.
– Complex analysis tools and workflows are now available for all to use, rather
than having each lab duplicate efforts.
– All researchers now have access to the highest quality computational
resources
• SIDGrid uses TeraGrid resources for computationally intensive tasks such as media
transcoding, algorithms for pitch analysis of audio tracks, and fMRI image analysis
• SIDGrid is unique among social science data archive projects
– Focused on streaming data which change over time
– Provides the ability to investigate multiple datasets, collected at different
time scales, simultaneously
• Active users of the SIDGrid system include a human neuroscience group
and linguistic research groups from the University of Chicago and the
University of Nottingham, UK
SIDGrid
sidgrid.ci.uchicago.edu
TeraGrid Pathways Activities
• 2 Gateway components
– Adapt gateways for educational use by underrepresented
communities
•GEON – SDSC, Navajo Tech
– Teach participants from underrepresented communities
how to build gateways
•PolarGrid – IU, ECSU
Navajo Technical College and gateways
•Incorporating the use of
gateways in their curricula
•GEON, GISolve areas of initial
interest
TG Resources and Services
•Computing – over a petaflop of computing
power and growing
•Data
– Data storage facilities & management tools
– Scientific data collections
•Over 30 Science Gateways
•Remote visualization servers and software
•Technical Support
– Central point of contact for support of all systems
– Advanced Support for TeraGrid Applications (ASTA)
•Education and training events and
resources
– K-12 Education
– Pathways
– Campus Champions
Human Connection: Your Campus Champion
• The Campus Champions program supports campus representatives as
the local source of knowledge about high-performance computing
opportunities and resources.
Knowledge plus assistance will empower campus researchers,
educators, and students to advance scientific discovery.
• Your campus will benefit from direct access to the TeraGrid and
input to its staff, resource allocations awarded for campus use, and
assistance in using those resources.
• TeraGrid will support the Campus Champion. See
– http://www.teragrid.org/eot/campuschamps.html
– To join the Campus Champions program, contact the TeraGrid Campus
Champions Program Coordinator, at tgcc-help@teragrid.org.
Online Resources
• Online resources at www.teragrid.org
• TeraGrid User Portal for managing allocations and job flow
• Documentation
– Knowledge Base for quick answers to FAQs
– HPC University to increase general HPC knowledge
• Calendar of events including upcoming workshops and
training
– Annual conference - TG09
• Arlington, VA
• June 22-26, 2009
TeraGrid: greater than the sum of its parts
• Leadership in cyberinfrastructure development, deployment
and support
• Expertise in building national computing and data resources
• Leveraging extensive resources, expertise, R&D, and EOT
– Leveraging activities at other participant sites
– Learning from each other improves expertise of all TG staff
– Shared training, education, and outreach resources benefit all
• Simplified access to high end resources
– Single unified allocations process
– Single point of contact for problem reporting
– Coordinated software environments
– Uniform access to heterogeneous resources to solve a single scientific problem
Would you like to learn more about
getting a TeraGrid allocation?
Yes
Not today
How does the Allocations process work?
• Startup allocations: for code development, experimentation with
TeraGrid platforms, and application testing. Startup requests may total
up to 200,000 service units (SUs) of computation, plus up to 5 TB of disk
and 25 TB of tape storage.
• Education allocations: for use in classroom instruction or training
activities, with the same SU and storage limits as Startup allocations.
• Research allocations: require a detailed justification of resource
usage. Requests are reviewed four times a year by the Resource
Allocations Committee.
– National peer-review process
•allocates computational and data resources
•makes recommendations on allocation of advanced direct support services
•Currently awarding >1B Normalized Units of resources
– Principal investigator (PI) must be a researcher, educator, or postdoctoral
researcher at a US academic or non-profit research institution.
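The limits above suggest a quick self-check before filling out a request. The sketch below is a hypothetical helper, not part of POPS: it encodes only the Startup/Education ceilings stated on this slide and treats anything larger as a Research request that needs peer review.

```python
# Ceilings for Startup and Education requests, as stated on the slide.
STARTUP_LIMITS = {"sus": 200_000, "disk_tb": 5, "tape_tb": 25}


def fits_startup_limits(sus, disk_tb=0, tape_tb=0):
    """True if the request stays within the Startup/Education ceilings."""
    return (
        sus <= STARTUP_LIMITS["sus"]
        and disk_tb <= STARTUP_LIMITS["disk_tb"]
        and tape_tb <= STARTUP_LIMITS["tape_tb"]
    )


def suggested_allocation_type(sus, disk_tb=0, tape_tb=0, for_teaching=False):
    """Suggest which request type to look at first (illustrative only)."""
    if not fits_startup_limits(sus, disk_tb, tape_tb):
        return "Research"
    return "Education" if for_teaching else "Startup"


print(suggested_allocation_type(150_000, disk_tb=2))
print(suggested_allocation_type(50_000, for_teaching=True))
print(suggested_allocation_type(1_000_000))
```

The function only compares sizes; eligibility rules such as the PI requirements still apply to every request type.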
Go to the POPS page https://pops-submit.teragrid.org
Create a POPS Login
Indicate that you are “New” to the TeraGrid
Indicate this is a “Start-up” Request
Select Startup or Educational
Fill out PI information
Skip Co-PIs info
Fill out info on your project
Fill out info on your funding
Estimate your computing need (reasonably)
Upload your CV and Submit when ready!
Acknowledgements
• This work is made possible by the dedicated efforts of the TeraGrid staff. In particular, slides came from Scott Lathrop, Craig Stewart, John Towns, Dane Skow, Daphne Siefert-Herron, Vickie Lynch, David Hart (Indiana Dave), David Hart (California Dave), Fran Berman, Nancy Wilkins-Diehr, Laura McGinnis and probably others.
• The Grid Infrastructure Group management of the TeraGrid is funded by NSF grant 0503697.
• The LEAD portal is developed under the leadership of IU Professors Dr. Dennis Gannon and Dr. Beth Plale, and supported by NSF grant 331480. Marcus Christie and Suresh Marru of the Extreme! Computing Lab contributed the LEAD graphics.
• The ChemBioGrid Portal is developed under the leadership of IU Professor Dr. Geoffrey C. Fox and Dr. Marlon Pierce and funded via the Pervasive Technology Labs (supported by the Lilly Endowment, Inc.) and the National Institutes of Health grant P20 HG003894-01.
• Any opinions, findings and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation (NSF), National Institutes of Health (NIH), Lilly Endowment, Inc., or any other funding agency.
Thank you!
• Questions?