TeraGrid and Web 2.0 Technologies

Daniel S. Katz
d.katz@ieee.org
Director of Science, TeraGrid GIG
Senior Computational Researcher, Computation Institute,
University of Chicago & Argonne National Laboratory
Affiliate Faculty, Center for Computation & Technology, LSU
Adjunct Associate Professor, Electrical and Computer
Engineering Department, LSU
Outline
• Introduction to TeraGrid
• Web 2.0 in TeraGrid in general
• Web 2.0 in Cactus
• Web 2.0 in Science Gateways
What is the TeraGrid
• World’s largest distributed cyberinfrastructure for open scientific research,
supported by US NSF
• Integrated high performance computers (>2 PF HPC & >27000 HTC CPUs),
data resources (>3 PB disk, >60 PB tape, data collections), visualization,
experimental facilities (VMs, GPUs, FPGAs), network at 11 Resource Provider
sites
• Allocated to US researchers and their collaborators through national peer-review
process
• DEEP: provide powerful computational resources to enable research that can’t
otherwise be accomplished
• WIDE: grow the community of computational science and make the resources
easily accessible
• OPEN: connect with new resources and institutions
• Integration: Single: portal, sign-on, help desk, allocations process, advanced
user support, EOT, campus champions
Who Uses TeraGrid (2008)
How TeraGrid Is Used
Use modality: community size (rough est., number of users), 2006 data
• Batch Computing on Individual Resources: 850
• Exploratory and Application Porting: 650
• Workflow, Ensemble, and Parameter Sweep: 250
• Science Gateway Access: 500
• Remote Interactive Steering and Visualization: 35
• Tightly-Coupled Distributed Computation: 10
How One Uses TeraGrid
[Diagram. Access paths: POPS (for now), User Portal, Science Gateways, Command Line. Middle layer: TeraGrid Infrastructure (Accounting, Network, Authorization, …). Resources: RP 1, RP 2, RP 3, offering Compute Service, Viz Service, Data Service.]
Slide courtesy of Dane Skow and Craig Stewart
User Portal: portal.teragrid.org
Access to resources
• Terminal: ssh, gsissh
• Portal: TeraGrid user
portal, Gateways
– Once logged in to portal,
click on “Login”
• Also, SSO from
command-line
Science Gateways
• A natural extension of Internet & Web 2.0
• Idea resonates with Scientists
– Researchers can imagine scientific capabilities provided through
familiar interface
• Mostly a web portal, or a web or client-server program
• Designed by communities; provide interfaces understood by
those communities
– Also provide access to greater capabilities (back end)
– Without the user needing to understand the details of those capabilities
– Scientists know they can undertake more complex analyses and that’s
all they want to focus on
– TeraGrid provides tools to help developers
• Seamless access doesn’t come for free
– Hinges on very capable developer
Slide courtesy of Nancy Wilkins-Diehr
Current Science Gateways
• Biology and Biomedicine Science Gateway
• Open Life Sciences Gateway
• The Telescience Project
• Grid Analysis Environment (GAE)
• Neutron Science Instrument Gateway
• TeraGrid Visualization Gateway, ANL
• BIRN
• Open Science Grid (OSG)
• Special PRiority and Urgent Computing
Environment (SPRUCE)
• National Virtual Observatory (NVO)
• Linked Environments for Atmospheric
Discovery (LEAD)
• Computational Chemistry Grid (GridChem)
• Computational Science and Engineering
Online (CSE-Online)
• GEON (GEOsciences Network)
• Network for Earthquake Engineering
Simulation (NEES)
• SCEC Earthworks Project
• Network for Computational Nanotechnology
and nanoHUB
• GIScience Gateway (GISolve)
• Gridblast Bioinformatics Gateway
• Earth Systems Grid
• Astrophysical Data Repository (Cornell)
Slide courtesy of Nancy Wilkins-Diehr
TG App: Predicting storms
• Hurricanes and tornadoes cause
massive loss of life and damage to
property
• TeraGrid supported spring 2007 NOAA
and University of Oklahoma Hazardous
Weather Testbed
– Major Goal: assess how well ensemble
forecasting predicts thunderstorms, including
the supercells that spawn tornadoes
– Nightly reservation at PSC, spawning jobs at
NCSA as needed for details
– Input, output, and intermediate data
transfers
– Delivers “better than real time” prediction
– Used 675,000 CPU hours for the season
– Used 312 TB on HPSS storage at PSC
Slide courtesy of Dennis Gannon, ex-IU, and LEAD Collaboration
TG App: SCEC-PSHA
• Part of SCEC (Tom Jordan, USC)
• Using the large scale simulation data,
estimate probabilistic seismic hazard
(PSHA) curves for sites in southern
California (probability that ground motion
will exceed some threshold over a given
time period)
• Used by hospitals, power plants, schools,
etc. as part of their risk assessment
• For each location, need a CyberShake run
followed by roughly 840,000 parallel short
jobs (420,000 rupture forecasts, 420,000
extractions of peak ground motion)
– Parallelize across locations, not individual
workflows
• Completed 40 locations to date, targeting
200 in 2009, and 2000 in 2010
Managing these requires effective grid workflow tools for
job submission, data management, and error recovery,
using Pegasus (ISI) and DAGMan (Wisconsin)
Information/image courtesy of Phil Maechling
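The job counts on this slide can be turned into a quick scale estimate. The sketch below only multiplies the slide's own rough figures (about 840,000 short jobs per location; 40, 200, and 2000 locations); the names are illustrative, and the totals exclude the CyberShake run itself.

```python
# Rough per-location job counts quoted on the slide ("roughly 840,000").
RUPTURE_FORECAST_JOBS = 420_000
PEAK_MOTION_JOBS = 420_000
JOBS_PER_LOCATION = RUPTURE_FORECAST_JOBS + PEAK_MOTION_JOBS

# Location counts from the slide: completed so far, then the two targets.
LOCATION_COUNTS = {"to date": 40, "2009 target": 200, "2010 target": 2000}


def total_short_jobs(locations: int) -> int:
    """Short jobs needed for the given number of locations."""
    return locations * JOBS_PER_LOCATION


if __name__ == "__main__":
    for label, locations in LOCATION_COUNTS.items():
        total = total_short_jobs(locations)
        print(f"{label}: {locations} locations -> {total:,} short jobs")
```

At the 2010 target this is on the order of a billion short jobs, which is why the slide stresses workflow tooling and parallelizing across locations.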
App: GridChem
[Figure annotations: different licensed applications with different queues; will be scheduled for workflows]
Slide courtesy of Joohyun Kim
TG Apps: Genius and Materials
• Modeling blood flow before (during?) surgery: HemeLB on LONI
• Fully-atomistic simulations of clay-polymer nanocomposites: LAMMPS on TeraGrid
• Why cross-site / distributed runs?
– 1. Rapid turnaround: conglomeration of idle processors to run a single large job
– 2. Run big compute & big memory jobs not possible on a single machine
Slide courtesy of Steven Manos and Peter Coveney
TeraGrid Future
• Current RP agreements end in March 2011
– Except track 2 centers (current and future)
• TeraGrid XD (eXtreme Digital) starts in April 2011
– Potential interoperation with OSG and others
• Current TG GIG continues through July 2011
– Allows four months of overlap in coordination
– Probable overlap between GIG and XD members
• Blue Waters (track 1) production in 2011
TeraGrid: Both Operations and Research
• Operations
– Facilities/services on which users rely
– Infrastructure on which other providers build
AND
• R&D
– Learning how to do distributed, collaborative science on a global,
federated infrastructure
– Learning how to run multi-institution shared infrastructure
Web 2.0 in TeraGrid in general
• EOT/ER
– TGCommunity: Use social networking tool(s) to provide a collaboration
environment in support of users of on-line training materials
– Student Engagement: use of social networks to engage more students in
Computational Science Problem of the week; general student engagement
– ER communications: Facilitate storage of and access to science images and stories
• RDAV
– Remote Data Analysis and Visualization (RDAV) system to be added to TeraGrid in
2010
– Will include DoE’s Scientific Data Management (SDM) Dashboard, which includes
methods for disseminating (sharing) results among defined groups
• TG staff makes heavy use of wiki
– Almost all working group communication/collaboration
– Including drafting and assembling quarterly and annual reports, project plans,
budgets
– Most old presentations
– Area for campus champions
– Working on user wiki, but not there yet
Web 2.0 in Cactus
• Cactus: community toolkit, framework, or environment, http://www.cactuscode.org/
[Diagram labels: individual research groups; domain-specific shared infrastructure; Flesh: APIs, information, orchestration; adaptive mesh refinement, parallel I/O, interaction, …]
Credit: Gabrielle Allen (Integrating Web 2.0 Technologies..., IEEE Cluster Comp. 2009)
Cactus Project Info Historically
• 1995 - Material put in http for Mosaic
– Mosaic encouraged content to make WWW useful
• mid to late 90’s – collaborative cork board (CoCoBoard)
– Web-based project pages – could attach images (1-D result plots)
• Up to present – wiki
– Project-based private wiki
– Cons: network needed to access/edit wiki, editing slow
Credit: Gabrielle Allen (Integrating Web 2.0 Technologies..., IEEE Cluster Comp. 2009)
Cactus Simulation Info Historically
• 1999 – httpd thorn (in main dist. in 2000)
– First collaborative tool integrated into Cactus
– Published simulation status, variables, timing, viewport, output files,
etc. to web page
– Allowed parameter steering through web page
– Issues:
• Authorization to web pages (username/password in parameter file) is
insecure and awkward (newer version uses https and can also use X.509)
• Browsers can display images in certain formats, a Visualization thorn uses
gnuplot to include e.g. performance with time, physical parameters
• Problem deploying on compute nodes where web server cannot be directly
accessed (port forwarding, firewalls)
• How to find and track the simulations, publicize existence to a
collaboration?
Credit: Gabrielle Allen (Integrating Web 2.0 Technologies..., IEEE Cluster Comp. 2009)
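The two core ideas of the httpd thorn described above (publishing simulation status as a web page, and steering parameters through it) can be sketched in a few lines. This is not the Cactus implementation: the state dictionary, the `output_every` parameter, and both function names are hypothetical, and the sketch leaves out the server, authentication, and firewall issues listed on the slide.

```python
from html import escape
from urllib.parse import parse_qs


def render_status(sim: dict) -> str:
    """Render the simulation state as a small HTML status page."""
    rows = "".join(
        f"<tr><td>{escape(name)}</td><td>{escape(str(value))}</td></tr>"
        for name, value in sorted(sim["parameters"].items())
    )
    return (
        "<html><body>"
        f"<p>Iteration: {sim['iteration']}</p>"
        f"<table>{rows}</table>"
        "</body></html>"
    )


def apply_steering(sim: dict, query: str) -> list:
    """Apply a query string such as 'output_every=5' to known parameters.

    Unknown names and non-integer values are ignored, so a web request
    cannot create new parameters. Returns the names that were changed.
    """
    changed = []
    for name, values in parse_qs(query).items():
        if name not in sim["parameters"]:
            continue
        try:
            sim["parameters"][name] = int(values[-1])
        except ValueError:
            continue
        changed.append(name)
    return changed


if __name__ == "__main__":
    # Hypothetical state; a real run would expose far more information.
    sim = {"iteration": 120, "parameters": {"output_every": 10}}
    print(apply_steering(sim, "output_every=5"))
    print(render_status(sim))
```

Keeping rendering and steering as plain functions separates them from whatever serves the page, which matters given the slide's point that compute nodes often cannot be reached directly.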
Cactus Simulation Info Historically
• 2001 – prototype of readable report automatically
generated for each simulation (computation and physics)
• How to collect reports in one place?
• Mail Thorn (sendmail)
– Email reliable and fault tolerant (spool)
– Supercomputers do not allow mail to be sent from compute nodes
• Notification also was done by SMS
• All had to be custom-written for Cactus, then maintained and
ported to various machines
Credit: Gabrielle Allen (Integrating Web 2.0 Technologies..., IEEE Cluster Comp. 2009)
New Web 2.0 technologies in Cactus
• Twitter thorn
– Uses libcurl
– Includes parameters for twitter name/passwd
– Uses twitter API for status/updates
• Flickr thorn
– Uses flickcurl, libcurl, libxml2, openssl
– Authentication more complex (API key, shared secret)
– Sends images from running simulation, generated by other Cactus
thorns
– Each simulation gets its own Flickr set of images
Credit: Gabrielle Allen (Integrating Web 2.0 Technologies..., IEEE Cluster Comp. 2009)
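As a rough illustration of what a status-posting thorn has to do, the sketch below builds (but does not send) an authenticated HTTP POST carrying a short status message. It assumes HTTP Basic authentication with a name and password, as the thorn's parameters suggest. The endpoint URL is a placeholder and the function names are hypothetical; the actual thorn is written against libcurl, not Python.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint for illustration only; NOT a real service URL.
STATUS_URL = "https://example.org/statuses/update"
MAX_STATUS_LENGTH = 140  # Twitter's status length limit at the time


def format_status(simulation: str, iteration: int, note: str) -> str:
    """Compose a short status line and truncate it to the length limit."""
    text = f"{simulation} it={iteration}: {note}"
    return text[:MAX_STATUS_LENGTH]


def build_status_request(user: str, password: str, status: str) -> Request:
    """Build a form-encoded POST with a Basic auth header (not sent here)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    body = urlencode({"status": status}).encode("ascii")
    return Request(STATUS_URL, data=body, headers={"Authorization": f"Basic {token}"})


if __name__ == "__main__":
    message = format_status("bbh_run", 1200, "checkpoint written")
    request = build_status_request("alice", "secret", message)
    print(request.get_method(), request.full_url)
    print(request.data.decode("ascii"))
```

Storing a name and password in a parameter file has the same weakness the earlier slide notes for the httpd thorn, which is one reason the Flickr thorn's API key and shared secret are described as more complex.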
Future Web 2.0 technologies in Cactus
• Video thorn
– Sends animations of simulations to Flickr, YouTube, Vimeo
• Common authentication mechanism for multiple services?
• Social networking model – making it easier for groups to
form and collaborate
• Other possibilities: Dropbox to publish files across a
collaboration, SlideShare for presentations, WordPress for
simulation reports/blogs, Facebook to replace grid portals
and aggregate services, cloud computing APIs for “grid”
scenarios, …
Credit: Gabrielle Allen (Integrating Web 2.0 Technologies..., IEEE Cluster Comp. 2009)
Science Portals 2.0
• Workspace customized by user for specific projects
• Pluggable into iGoogle or other compliant (and open source) containers
• Integrates with user workspace to provide a complete and dynamic view of the user’s science,
alongside other aspects of their lives (e.g. weather, news)
• Integrates with social networks (e.g. FriendConnect, MySpace) to support collaborative science
• TG User Portal provides this same information, but this view is more dynamic and more
reusable, and can be more flexibly integrated into the user’s workspace
• Gadgets suitable for use on mobile devices
Technology Detail
• Gadgets are HTML/JavaScript
embedded in XML
• Gadgets conform to specs that
are supported by many
containers (iGoogle, Shindig,
Orkut, MySpace)
Resource Load gadget shows current view
of load on available resources
Job Status gadget shows current view of
job queues by site, user, status
File Transfer gadget for
lightweight access to data stores,
simple global file searching, and
reliable transfers
Domain science gadgets
complement general purpose
gadgets to encompass the full
range of scientists’ interests
Slide courtesy of Wenjun Wu, Thomas Uram, Michael Papka
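To make "HTML/JavaScript embedded in XML" concrete, the sketch below generates a minimal gadget spec in the general shape used by gadget containers: a `Module` element holding `ModulePrefs` and a `Content` element whose HTML is wrapped in CDATA. The function name, gadget title, and HTML body are illustrative assumptions, not taken from any TeraGrid gadget.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def build_gadget_spec(title: str, html_body: str) -> str:
    """Return a minimal gadget XML spec with the HTML wrapped in CDATA."""
    if "]]>" in html_body:
        raise ValueError("html_body must not contain the CDATA terminator ']]>'")
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<Module>\n"
        f"  <ModulePrefs title={quoteattr(title)} />\n"
        '  <Content type="html"><![CDATA[\n'
        f"{html_body}\n"
        "  ]]></Content>\n"
        "</Module>\n"
    )


if __name__ == "__main__":
    # Hypothetical gadget body: plain HTML plus a little JavaScript.
    body = (
        '<div id="load">Loading...</div>\n'
        "<script>\n"
        '  document.getElementById("load").textContent = "Resource load: (no data)";\n'
        "</script>"
    )
    spec = build_gadget_spec("Resource Load (sketch)", body)
    # Parsing the result confirms the generated spec is well-formed XML.
    root = ET.fromstring(spec.encode("utf-8"))
    print(root.find("ModulePrefs").get("title"))
    print(spec)
```

Because the whole gadget is one XML file whose logic runs in the browser, the same file can be dropped into any compliant container, which is the reusability argument these slides make.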
OLSGW Gadgets
• OLSGW integrates bioinformatics applications
– BLAST, InterProScan, CLUSTALW, MUSCLE, PSIPRED, ACCPRO, VSL2
• 454 Pyrosequencing service under development
• Four OLSGW gadgets have been published in the iGoogle gadget directory. Search for
“TeraGrid Life Science”.
Slide courtesy of Wenjun Wu, Thomas Uram, Michael Papka
Run Social and Behavioral Science Tools as SIDGrid Gadgets
• SIDGrid experiment browsing page
– Lists project files and available analysis tools
– Provides a browser-side gadget execution environment
• Three steps to launch SIDGrid application gadgets:
1. Select data files to analyze
2. Select an analysis application
3. Launch SIDGrid gadgets (Praat and workflow
history gadget) to run the analysis and monitor its
progress
Slide courtesy of Wenjun Wu, Thomas Uram, Michael Papka
PolarGrid
• Goal: Work with Center for
Remote Sensing of Ice Sheets (CReSIS)
• Requirements:
– View CReSIS data sets, run filters, and view results through Web map interfaces
– See/share user’s events in a calendar
– Update results to a common repository with appropriate access controls
– Post the status of computational experiments
– Support collaboration and information exchange by interfacing to blogs and discussion areas
• Solution: Web 2.0-enabled PolarGrid Portal
• Login screen: interface to create new users and log in using existing accounts; integrated with OpenID API for authentication
Slide courtesy of Raminder Singh, Gerald Guo, Marlon Pierce
PolarGrid
Home Page with a set of gadgets like Google Calendar, Picasa, Facebook, Blog, Twitter
Slide courtesy of Raminder Singh, Gerald Guo, Marlon Pierce
Google Gadgets & Satellite Data
• Purdue: disseminating remote sensing products
– Flash- and HTML-based gadgets, programmed by undergraduate students
• Bring traffic/usage to a broad set of satellite products at Purdue
Terrestrial Observatory
• Backend system including commercial satellite data receivers, a smaller
cluster, TG data collection, etc.
• User control includes zoom, pause/resume, frame step through, etc.
[Screenshots: MODIS satellite viewer; GOES-12 satellite viewer]
Slide courtesy of Carol Song
Future App: Real-time High Resolution Radar Data
• Delivering 3D visualization of
radar data via a Google gadget
• LiveRadar3D
– Super high res, real-time NEXRAD
data
– Continuously updated as new data
arrives
– 3D rendering that includes multiple
stations in the US
– Significant processing (high
throughput) and rendering supported
by TG systems
– To be released next spring
Slide courtesy of Carol Song
GISolve 2.0
• TG GIScience Gateway (for high performance, distributed, collaborative GIS) uses the
GISolve Toolkit middleware to synthesize cyberinfrastructure, GIS, and spatial analysis
and modeling capabilities, including Web 2.0:
– AJAX for highly-interactive user interface
– GeoServer & OpenLayers & Google Maps for online mapping & visualization, interactive
data browsing/selection/editing
– Twitter for status updates of TG analysis jobs for online collaboration
– REST for TeraGrid/application/gateway integration
Slide courtesy of Shaowen Wang and Yan Liu
Comparison of Gadgets and Portlets
• Reusability
– OpenSocial Gadget (Web 2.0): reusable browser-side web module; XML, HTML, CSS, JavaScript
– Java Portlet (Web 1.0): reusable server-side portal module; Web Form, Portlet/JSP Markup, Portlet code
– Advantage: wider applicability
• Application Logic
– OpenSocial Gadget: defined in the JavaScript code of the gadget
– Java Portlet: defined in the server-side portlet
– Advantage: greater interactivity and scalability
• Communication with Server
– OpenSocial Gadget: AJAX
– Java Portlet: Web Form, Portlet/Servlet
– Advantage: greater interactivity
• Container/Language Dependence
– OpenSocial Gadget: OpenSocial-compliant container, language independent (PHP, Java, …)
– Java Portlet: JSR 168 Portal, Java
– Advantage: larger community of users and developers == faster advancement
• Deployment
– OpenSocial Gadget: OpenSocial containers: iGoogle, MySpace, Orkut, …
– Java Portlet: portlet containers: Gridsphere, Websphere
– Advantage: larger community, more reuse between sites
Slide courtesy of Wenjun Wu, Thomas Uram, Michael Papka
Thanks & an advert
• GCE2009 Workshop: Fifth workshop on Grid
Computing Environments
– Supercomputing 2009 Workshop, November 20th
• Topics: Science Gateways, Social Networking, Web
Security, Gateway Toolkits, Mobile Applications,
and Information Services
– See http://www.collabogce.org/gce09/index.php/Program
– Proceedings to be published by ACM Digital Library
• GCE08 Proceedings:
– http://ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=4738437&isYear=2008