Cyberinfrastructure,
E-Science and
the San Diego Supercomputer Center
Chaitan Baru
San Diego Supercomputer Center
California Institute for Telecommunications and Information Technology
University of California, San Diego
GEON Workshop, Auckland, New Zealand, Nov 26-28, 2007
Acknowledgements
• US National Science Foundation
– Sponsors of GEON, and GEON international activities
• The University of Auckland
– Local hosts
Cyberinfrastructure and E-science
• Cyberinfrastructure:
– “…The comprehensive infrastructure needed to capitalize on
dramatic advances in information technology…”
– “…essential to support the frontiers of research and education
in this field…”
– From NSF’s Cyberinfrastructure Vision for 21st Century
Discovery, www.nsf.gov/od/oci/ci-v7.pdf, July 20, 2006
• “E-Science”- the science enterprise enabled by
the use of such cyberinfrastructure
– “Science increasingly performed through distributed global
collaborations enabled by the Internet, using very large data
collections, terascale computing resources and high performance
visualizations.”
– From Oxford e-Science Center, http://e-science.ox.ac.uk/public/general/definitions.xml
SDSC’s Support for CI and e-Science
• Production Services
– For nationally allocated supercomputer platforms, as well as computational
platforms and storage systems for other projects
• User Services
– For nationally allocated supercomputers
• Research and Development Collaborations
– In support of computational science and informatics in a wide variety of
science, engineering, humanities, and other disciplines
– To develop common cyberinfrastructure (software) components
• R&D constitutes >50% of SDSC’s activities
– In funding as well as staffing
Integrated Cyberinfrastructure System
• Applications
– Geosciences
– Environmental Sciences
– Neurosciences
– High Energy Physics …
• Domain-specific Cybertools (software)
• Shared Cybertools (software)
• Development Tools & Libraries
• Middleware Services
• Hardware
• Distributed Resources (computation, storage, communication, etc.)
• Education and Training
• Discovery & Innovation
Source: Dr. Deborah Crawford, Chair, NSF CI Working Committee
TeraGrid Network
• Grid Infrastructure Group (UChicago)
• Sites shown (Resource Providers (RP) and Software Integration Partners): UW, PSC, UC/ANL, NCAR, PU, NCSA, Caltech, IU, UNC/RENCI, ORNL, U Tenn., USC/ISI, SDSC, LSU, TACC
TeraGrid Science Gateways
• Provide entry points into TeraGrid for
community-specific tools
• Community-led initiative for the TeraGrid
• URL
– http://www.teragrid.org/programs/sci_gateways/
Computational Science and Informatics:
And the CS/IT context
• Computational physics and chemistry
– Born at the time of Fortran, file-based systems, and expensive
supercomputers, Internet, ftp, and HTML
• Bioinformatics
– Born at the time of Relational Database Management Systems
(RDBMS), microprocessors, client-server computing, the Web, 3-tier
architectures, CORBA, XML
• Geoinformatics
– Being born at the time of Web 2.0, Google, MySpace, YouTube,
mashups, social networking, and ontologies…
Ref: Caring and Sharing of e-Science Data, C. Baru, Commentary, International
Journal of Digital Libraries, October 2007
Community Cyberinfrastructure Projects
Friendly Work-Facilitating Portals
• Science Domains
– Ocean Observing (OOI)
– Ecological Observatories (NEON)
– Earthquake Engineering (NEES)
– Geosciences (GEON)
– Biomedical Informatics (BIRN)
– High Energy Physics (GriPhyN)
• Your Specific Tools & User Apps.
• Shared Tools
– Authentication - Authorization - Auditing - Workflows - Visualization - Analysis
• Development Tools & Libraries
• Middleware Services
• Hardware
• Distributed Computing, Instruments and Data Resources
Adapted from: Mark Ellisman, UC San Diego
Portal-based Science Environments
Support for resource sharing and collaborations
Common CI Software Elements
• NSF Software Development for Cyberinfrastructure
(SDCI) Program
– ROCKS -- Cluster Management Software
– SRB/IRODS -- Collection-based Data Management
– Kepler -- Scientific Workflow Software
– Open Source DataTurbine -- Streaming Data Middleware
– Inca -- Testing and Monitoring Software
• Other Common Software
– GAMA -- Grid Account Management Architecture
– GridSphere -- Portlet-based Portal Infrastructure
– RDV -- Realtime Data Viewer
• Common Portlets
– GEON portlets: Registration, Search, myWorkspace, TeraGrid Gateway
– Used in several other CI projects
Observing Systems
• An important area for several US agencies, including the
National Science Foundation
– Several agencies support observing system networks, e.g. USGS, NOAA,
EPA, DoE, DoD, NASA, DHS, etc.
• A range of projects
– Major research equipment: deployment of coordinated, regional, continental,
international-scale instrumentation and sensor networks
–  standardized instrumentation and protocols
– Cyberinfrastructure: development of IT and software for managing sensor
networks; collecting, analyzing, distributing data; data assimilation and
execution of forecasting models
–  standardized IT infrastructure (interfaces, technology implementations)
– Individual investigator, or small group-driven research:
– Local (regional) sensor networks, to study specific phenomena
– Analysis of collected data
– Modeling and data assimilation
– …
Observing Systems Efforts
• Some NSF Projects
– EarthScope: Obtain “snapshot” of the lithospheric structure of the continental US
– US Array; Plate Boundary Observatory (PBO); San Andreas Fault Observatory at Depth
(SAFOD)
– Ocean Observatories Initiative (OOI): Understand ocean phenomena in the deep ocean and at
the coastal margins
– Regional Coastal Observatory; Global Observatory
– National Ecological Observatory Network (NEON): model and predict the state of the
ecosystem of the US
– 17 climatic domains across contiguous states + 2 in Alaska + 1 in Hawaii
– Long-Term Ecological Research Network (LTER): intensive studies at local and
regional scales
– >30 LTER sites across US
– WATERS: monitor watersheds across US to study hydrologic as well as
environmental engineering issues
– CLEANER: Environmental engineering-based observatory projects
– Hydrologic Information System (HIS): Hydrology-based observing systems projects
– NEES, NVO, …
• Moore Foundation-funded Projects
– CAMERA: Metagenomics and marine microbial ecology
– GLEON: Global Lake Ecological Observatory Network
– TEAM: Tropical Ecology Assessment and Monitoring Network
Cyberinfrastructure (CI) Components in
Observing Systems
• “Embedded CI”
– Software for managing instruments, dataloggers, and data in sensor networks,
including metadata generation
– “Cyberdashboard” for management of instruments/sensor networks
• Data Management
– of data streams (with metadata) from dataloggers (in the field) to data repositories, to
data archives
– “Cyberdashboard” to keep track of data collection protocols
• Analysis and Computation
– Support for model runs, data assimilation, data analysis, data mining, including
periodic reprocessing of archived data
• Data Access
– Authenticated access to a range of data products, from raw to highly derived,
including the ability to “push” data to client applications
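The "push" style of data access mentioned above can be illustrated with a small publish/subscribe sketch. This is only an in-memory illustration under stated assumptions: the `StreamHub` class, the channel name, and the reading fields are hypothetical and are not the API of DataTurbine or of any observing-system project named in this talk.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Reading = Dict[str, object]
Callback = Callable[[Reading], None]


class StreamHub:
    """Hypothetical in-memory hub: dataloggers publish, client apps subscribe."""

    def __init__(self) -> None:
        # channel name -> callbacks registered by client applications
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callback) -> None:
        """Register a client callback for one data channel."""
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, reading: Reading) -> int:
        """Push a reading to every subscriber; return how many were notified."""
        callbacks = self._subscribers.get(channel, [])
        for callback in callbacks:
            callback(reading)
        return len(callbacks)


hub = StreamHub()
received: List[Reading] = []
hub.subscribe("lake/temperature", received.append)
notified = hub.publish(
    "lake/temperature",
    {"site": "buoy-1", "value": 14.2, "units": "degC"},  # metadata travels with the value
)
```

A client never polls here: it registers interest once and the hub delivers each new reading, which is the behaviour the slide describes as pushing data to client applications.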
NSF Ocean Observatories Initiative (OOI)
Courtesy: John Orcutt, Scripps Institution of Oceanography, University of California, San Diego
OOI - Coastal Scale Observatory
Courtesy: John Orcutt, Scripps Institution of Oceanography, University of California, San Diego
OOI - Regional
Courtesy: John Orcutt, Scripps Institution of Oceanography, University of California, San Diego
OOI - Global Node
Courtesy: John Orcutt, Scripps Institution of Oceanography, University of California, San Diego
OOI - From Construction to Operations
Courtesy: John Orcutt, SIO; Matt Arrott, Calit2, University of California, San Diego
OOI - Conceptual View of the
Cyberinfrastructure
NEON Cyberinfrastructure
NEON Domains
The NEON “Single String” Testbed
NEON Single String Testbed (SSTB)
James Reserve, CA
SDSC, San Diego
MoveBank
For Animal Tracking and Photo Monitoring Data
• A data repository
• A live data pipeline
• Online mapping and
analysis tools
• An educational tool
• A community of
collaborators
• www.movebank.org
• NSF BD&I: 0756920
PIs: Roland Kays (NY History Museum), Martin Wikelski (Princeton), Tony
Fountain (SDSC, UCSD), Sameer Tilak (SDSC, UCSD)
MoveBank
Current Activities
• Designing Data System
– Requirements analysis
– Schema definitions for camera trap and
tracking data (trajectories)
• Extending DataTurbine streaming
data system for animal tracking
and photo monitoring
– Integration of cameras to data acquisition
system
– Event detection and notification system
design
• Building a knowledge base of best
practices
• Networking with other animal
tracking communities and
researchers to build collaborations
Moore Observing Systems Projects
• Some projects funded by Gordon and Betty Moore Foundation
at UCSD
• CAMERA: Metagenomics project
– Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research
and Analysis (Craig Venter, Larry Smarr)
– Provide access to metagenomics databases collected from ocean water samples
from around the world
• OceanLife: Biodiversity in seamounts
– Karen Stocks & Amarnath Gupta, SDSC
– Integrated information source for seamount biodiversity
• GLEON: Global Lake Ecological Observatory Network
– Peter Arzberger, Calit2/UCSD, Tony Fountain, SDSC
– Tim Kratz, Paul Hanson, U.Wisc
• Cyberinfrastructure for TEAM
– Tropical Ecology Assessment and Monitoring
Source: Paul Hanson, U.Wisc
Courtesy: Peter Arzberger, Calit2/UCSD
GLEON’s Mission
Facilitate interaction and build collaborations
among an international, multidisciplinary
community of researchers focused on
understanding, predicting, and communicating
the impact of natural and anthropogenic
influences on lake ecosystems by developing,
deploying, and using networks of emerging
observational system technologies and
associated cyberinfrastructure.
Source: Tim Kratz, U.Wisc
http://gleon.org
Lake site
Cyber-support site
1. 19 countries participating
2. More than 120 scientists
3. Most sites are developing
Source: Paul Hanson, U.Wisc
3 Networks
People
Data
Lake observatories
Source: Paul Hanson
Tropical Ecology Assessment and
Monitoring (TEAM) Network
• Conservation International project
– PI: Sandy Andelman, Vice President, Conservation International
– Funded by Gordon and Betty Moore Foundation
• Monitor wildland plots in tropical regions
– Current sites: Brazil (3), Costa Rica, Suriname
– Upcoming site: Madagascar
• Cyberinfrastructure provided by SDSC
TEAM Cyberinfrastructure Goals
• Provide secure, reliable access to near real-time data from all
TEAM sites
• Facilitate timely, efficient, consistent data entry
– By assisting with adherence to site-specific protocols
– Providing up-to-date status of data entry
– Providing ready visualizations of cross-site, network-level data
• Manage a variety of different data types
– Field collections, sensor data, museum collections, remote sensing data
– Sensor data includes images and acoustic data
• Provide customized portals (portlets)
– E.g. site-specific information (with multi-lingual support), and project specific data
and tools
• CI goals are similar to those of other environmental
observatory projects, e.g. NEON…
TEAM Initial Implementation
• Local PoP node:
E.g. at a site in a given country, or
One PoP node for a country
• Future capability
TEAM Portal and Data Management
• Portal based on
– Drupal: for content management
– GridSphere: for sharing and collaboration of data and tools
• Support different data types
– Observational data
– Climate data; Photos / images
– Spatial (GIS) data
– Different layers, e.g. including socioeconomic data
– Museum collections
– E.g. MetaCat, EcoGrid
– Acoustic data
– Algorithms for classifying acoustic data
– Remote sensing data
– Landsat, MODIS, ASTER, LiDAR
CUAHSI
Hydrologic Information System (HIS)
• Hydrology Data Portal
• Digital Watershed
• Hydrologic Analysis
(Source: David Maidment, UT Austin)
HIS Service Oriented Architecture
• Web portal Interface (HDAS): HTML - XML
– Information input, display, query and output services
– Preliminary data exploration and discovery. See what is available and perform exploratory analyses
• WaterOneFlow Web Services: web services interface, WSDL - SOAP
– Data access through web services (downloads)
– Data storage through web services (uploads)
• Servers
– 3rd party servers, e.g. USGS, NCDC
– Observatory servers
– Workgroup HIS
– SDSC HIS servers
• Client tools
– GIS
– Matlab
– IDL
– Splus, R
– D2K, I2K
– Programming (Fortran, C, VB)
WaterML and CUAHSI HIS Mediation
• Develop WaterML as an interchange standard
for hydrologic data
• HIS serves as a mediator across multiple
agency and individual PI data
– Provides identifiers for sites, variables, etc. across observation
networks
– Manages and publishes controlled vocabularies, and provides
vocabulary/ontology management and update tools
– Provides common structural definitions for data interchange
– Provides a sample protocol implementation
– Governance framework: a consortium of universities, MOUs with
federal agencies, collaboration with key commercial partners, led by
renowned hydrologists, and NSF support for core development and
test beds
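The mediation role described above (shared identifiers plus a controlled vocabulary across observation networks) can be sketched in a few lines. Everything in this sketch is an illustrative assumption: the network names, local variable codes, vocabulary terms, and the `Observation` record are hypothetical, and the record is not the WaterML schema.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary: (network, local variable code) -> shared term.
VARIABLE_VOCAB = {
    ("agency_a", "flow_cfs"): "streamflow",
    ("agency_b", "Q"): "streamflow",
    ("agency_b", "TEMP_W"): "water_temperature",
}


@dataclass(frozen=True)
class Observation:
    """Common structure for interchange (illustrative only, not WaterML)."""
    site_id: str    # network-qualified site identifier, e.g. "agency_a:1234"
    variable: str   # term from the controlled vocabulary
    value: float
    units: str


def mediate(network: str, local_site: str, local_var: str,
            value: float, units: str) -> Observation:
    """Translate one source-specific record into the common structure."""
    try:
        variable = VARIABLE_VOCAB[(network, local_var)]
    except KeyError:
        raise ValueError(f"no vocabulary entry for {network}/{local_var}") from None
    return Observation(f"{network}:{local_site}", variable, value, units)


a = mediate("agency_a", "1234", "flow_cfs", 12.5, "cfs")
b = mediate("agency_b", "XY-9", "Q", 0.4, "m3/s")
```

After mediation both records carry the same variable name, so a client can query "streamflow" without knowing each provider's local code. Unit harmonization is deliberately left out of the sketch.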
NEESit and the NEES User Community
• NEES Equipment Sites (15 large-scale labs)
• NEESR Research Grants (>40 NSF projects)
• Earthquake Engineering Researchers
• Earthquake Engineering Practitioners
• K-12 and Undergraduate students
The NEESit System
Scientific Collaboration Environment (NEES Portal)
Telepresence
EOT
Cyber Accessibility
Video, Data, Audio
Archiving
Secure Communication
Data Repository
Structured Metadata
Community Content
Graphical User Interface
On-line experiment
Phys/Comp.
e-publications
Curated
Computational Tools
High Performance Computing
Hybrid Simulation (Phys/Comp.)
Visualization
Scientific Workflows
NEES Portal: Parallel Computing &
TeraGrid Access
Emergency Response Projects
• Katrinasafe and Disastersafe
– Collaboration between American Red Cross and SDSC during Hurricane
Katrina
– Continuing now as disastersafe.redcross.org
– Funded by an NSF grant for exploratory research on cyberinfrastructure
preparedness
• Spatiotemporal analysis of 911 call data
– Collaboration with Public Safety Network
– Funded by the NSF Digital Government program
• UCSD Hazards Initiative
disastersafe.redcross.org
• Outcome of collaboration on Katrinasafe
– Site hosted at SDSC
Spatiotemporal Analysis of 9-1-1
Emergency Call Streams
• Funded by NSF Digital Government program
• Project Goals
– Provide situational awareness at a command and decision level (vs
operational)
– Assist local and State level emergency responses by
– Generating immediate and dynamic information about the impact of
medium- to large-scale events
– Facilitating dynamic resource allocation
– Serving as an early warning system of emergency events
• Collaboration among
– California Office of Emergency Services (OES)
– University of California, San Diego
– Public Safety Network
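One simple way to picture the "early warning" goal is a threshold test on call counts. The sketch below is an assumption for illustration, not the project's method: it flags an hour whose call count exceeds the historical mean for that hour by `k` standard deviations, and the function name, the default `k`, and the one-call floor on the spread are all hypothetical choices.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_anomalous(history: Sequence[int], current: int, k: float = 3.0) -> bool:
    """Return True if `current` is unusually high compared with `history`.

    `history` holds past call counts for the same hour of day at one PSAP.
    """
    if len(history) < 2:
        return False  # not enough data to form a baseline
    baseline = mean(history)
    spread = max(pstdev(history), 1.0)  # floor avoids a zero-width band
    return current > baseline + k * spread


past_counts = [40, 42, 38, 41, 39]      # hypothetical counts for one hour of day
quiet = is_anomalous(past_counts, 44)
surge = is_anomalous(past_counts, 120)
```

A per-hour baseline is used here because call volume varies strongly over the day, so a single global threshold would either miss night-time surges or raise false alarms at peak hours.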
Temporal Extent of Collected Data
• San Francisco Bay Area:
30 months of data
• San Diego County:
16 months of data
• Total of
5,301,191 calls
Spatial Extent of Collected Data
San Francisco Bay
Area, 69 PSAPs
San Diego County, 20 PSAPs
(Dithered to approx. 300m; One day of 9-1-1 call activity shown)
= landline call
= cellular call
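The slide does not say how the call locations were dithered. As a minimal sketch, assuming a uniform random displacement within a disc of about 300 m, the idea could look like this (the function name, the Earth-radius constant, and the flat-Earth approximation are assumptions for illustration):

```python
import math
import random

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; adequate for small offsets


def dither(lat: float, lon: float, max_offset_m: float = 300.0, rng=random):
    """Displace a point by a random offset of at most `max_offset_m` metres."""
    # sqrt() makes the displaced points uniform over the disc's area
    distance = max_offset_m * math.sqrt(rng.random())
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    dlat = distance * math.cos(bearing) / EARTH_RADIUS_M
    dlon = distance * math.sin(bearing) / (
        EARTH_RADIUS_M * math.cos(math.radians(lat))
    )
    return lat + math.degrees(dlat), lon + math.degrees(dlon)


rng = random.Random(0)  # seeded only so the example is repeatable
masked = dither(37.77, -122.42, 300.0, rng)
```

Dithering of this kind keeps the city-scale pattern of calls visible on a map while hiding the exact address of any individual caller.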
Call Stream Shows Temporal Regularity
Average hourly call volume for the San
Francisco Combined Emergency
Communications Center (CECC) PSAP.
Average daily call volume for the
San Francisco Combined
Emergency Communications
Center (CECC) PSAP.
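An average hourly call-volume profile like the one plotted here can be computed directly from call timestamps. The following is a sketch under simple assumptions (one PSAP, timestamps already in local time); the function name and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List


def average_hourly_volume(timestamps: Iterable[datetime]) -> List[float]:
    """Mean number of calls in each hour of day, averaged over observed days."""
    stamps = list(timestamps)
    days = len({ts.date() for ts in stamps})
    if days == 0:
        return [0.0] * 24
    per_hour = Counter(ts.hour for ts in stamps)
    return [per_hour[hour] / days for hour in range(24)]


calls = [
    datetime(2007, 1, 1, 9, 5),
    datetime(2007, 1, 1, 9, 40),
    datetime(2007, 1, 2, 9, 10),
    datetime(2007, 1, 2, 17, 0),
]
profile = average_hourly_volume(calls)
```

The same grouping applied to weekdays instead of hours would give the average daily profile shown alongside it.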
Daily Call Volume
4th of July
Data collection
process offline
Histogram of daily call volume for
the collected data
Time series of daily call
volume for the collected
data (SF)
Animation: Clustering of phone calls
Cessna plane collision in San Diego
Other projects
• PRAGMA: Pacific Rim Assembly for Grid
Middleware Applications
– PI: Dr. Peter Arzberger, UCSD; Co-PI: Phil Papadopoulos
– GEON is a participant in PRAGMA, and co-chairs the PRAGMA
Geosciences Working Group
• Optical fiber links and Lambda Grid
– PI: Prof. Larry Smarr
Technical Interoperability Issues
• Authentication
– Need a common authentication framework, to provide role-based
access to distributed resources
– Otherwise, users will be burdened with too many accounts and
passwords, one for each site
• Information security
– Provenance, IP issues
• Distributed Data (and Metadata)
– Metadata search interoperability
– Large archives will remain distributed. Need metadata search
interoperability so that a single search can search several metadata
catalogs
– Caching and replication of frequently used (large) data
– “Distributed curation with centralized hosting” could be an option
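The metadata search interoperability point above, a single search that reaches several metadata catalogs, can be sketched as follows. This is only an illustration: the catalogs are in-memory lists, and `make_catalog`, `federated_search`, and the record fields are hypothetical names, not an existing catalog API.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]
Catalog = Callable[[str], Iterable[Record]]


def make_catalog(name: str, records: List[Record]) -> Catalog:
    """Wrap a list of metadata records as a searchable catalog (stand-in for a remote site)."""
    def search(term: str) -> List[Record]:
        needle = term.lower()
        return [dict(r, catalog=name) for r in records if needle in r["title"].lower()]
    return search


def federated_search(term: str, catalogs: Iterable[Catalog]) -> List[Record]:
    """Send one query to every catalog and merge results, dropping duplicate ids."""
    merged: List[Record] = []
    seen = set()
    for search in catalogs:
        for record in search(term):
            if record["id"] not in seen:
                seen.add(record["id"])
                merged.append(record)
    return merged


site_a = make_catalog("site_a", [
    {"id": "ds-1", "title": "LiDAR survey of a fault zone"},
    {"id": "ds-2", "title": "Gravity anomaly grid"},
])
site_b = make_catalog("site_b", [
    {"id": "ds-1", "title": "LiDAR survey of a fault zone"},  # replicated dataset
    {"id": "ds-3", "title": "Airborne lidar point cloud"},
])
hits = federated_search("lidar", [site_a, site_b])
```

The data stays at each site; only the query and the matching metadata records move. Deduplicating on a shared identifier matters once frequently used datasets are cached or replicated across sites.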
Technical Interoperability Issues
• Distributed Computing
– “Portlet aggregation”
– A set of functionality, e.g. data+Web services, can be implemented as
a portlet
– A portal can be deployed containing a number of such distributed
portlets
– Portals can provide “gateways” to large storage and computing
resources, e.g. including the TeraGrid
– “Federated portlets”
– A set of portlets shared by more than one community
• Technologies to Support Collaborations in
Virtual Organizations
– Standard tools (email, forums, wikis)
– Social networking
– Development of ontologies, and recommendation systems
Thanks!
• email: baru@sdsc.edu