Director, San Diego Supercomputer Center
Professor and High Performance Computing Endowed Chair,
UC San Diego
SAN DIEGO SUPERCOMPUTER CENTER
Fran Berman
UCSD
UNIVERSITY OF CALIFORNIA
Science
Commerce
Information
Entertainment
• Today’s “computer” is a coordinated set of hardware, software, data, and services providing an “end-to-end” resource.
• Cyberinfrastructure captures the integrated character of today’s IT environment
[Figure: The “computer” as an integrated set of resources. Labels: field instruments, sensors, computers, storage, viz, data, network, wireless]
Cyberinfrastructure =
Resources
(computers, data storage, networks, scientific instruments, experts, etc.)
+ “Glue”
(integrating software, systems, and organizations)
Cyberinfrastructure-enabled Neurosurgery
Radiologists and neurosurgeons at Brigham and Women’s Hospital (BWH), Harvard Medical School, are exploring transmission of 30/40 MB brain images (generated during surgery) to SDSC for analysis and alignment
• PROBLEM: Neurosurgeons seek to remove as much tumor tissue as possible while minimizing removal of healthy brain tissue
• Brain deforms during surgery
• Surgeons must align the preoperative brain image with intra-operative images to have the best opportunity for intra-surgical navigation
Transmission repeated every hour during 6-8 hour surgery.
Transmission and output must take on the order of minutes.
A finite element simulation on a biomechanical model of volumetric deformation is performed at SDSC; the results are sent back to BWH, where updated images are shown to surgeons.
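As a rough illustration of the timing budget above, the sketch below estimates how long a single 40 MB image would take to cross a network link. The link speeds and the `transfer_seconds` helper are assumptions made for illustration; the slides do not state the bandwidth available between BWH and SDSC.

```python
def transfer_seconds(size_mb: float, link_mbit_per_s: float) -> float:
    """Seconds to move size_mb megabytes over a link of link_mbit_per_s megabits/s.

    Ignores protocol overhead and latency, so real transfers take longer.
    """
    size_megabits = size_mb * 8  # 1 byte = 8 bits
    return size_megabits / link_mbit_per_s


# Hypothetical link speeds in Mbit/s (assumed, not taken from the slides).
ASSUMED_LINKS = {"10 Mbit/s": 10, "100 Mbit/s": 100, "1 Gbit/s": 1000}

if __name__ == "__main__":
    for label, speed in ASSUMED_LINKS.items():
        print(f"40 MB over {label}: {transfer_seconds(40, speed):.2f} s")
```

Under these assumed speeds the raw transfer is a matter of seconds, which suggests that most of a minutes-scale budget would go to the simulation and image alignment rather than to moving the data.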
SDSC
• National facility funded by NSF, NIH, DOE, Library of Congress, NARA, etc.
• Employs nearly 400 researchers, staff and students
• National Facility and UCSD Organized Research Unit
• Home to many associated activities including
• Protein Data Bank
• Biomedical Informatics Research Network (BIRN) Coordinating Center
• Geosciences Network (GEON)
• NEES IT Center, etc.
[Figure labels: Data and Knowledge Systems; Grid and Cluster Computing; SW tools, workbenches, toolkits; Community Databases and Data Collections; High Performance Computing; Data-oriented Science and Engineering; Networking; Computational Science and Engineering]
• COMPUTE SYSTEMS
• DataStar
• 2,528 Power4+ processors
• IBM p655 8-way and p690 32-way nodes
• 7 TB total memory
• Up to 3 GBps I/O to disk
• TeraGrid Cluster
• 512 Itanium2 IA-64 processors
• 1 TB total memory
• Also 128 2-way data nodes
• Blue Gene Data
• First academic IBM Blue Gene system
• 2,048 PowerPC processors
• 128 I/O nodes
http://www.sdsc.edu/user_services/
• DATA ENVIRONMENT
• 1.4 PB Storage-area Network (SAN)
• 6 PB StorageTek tape library
• HPSS and SAM-QFS archival systems
• DB2, Oracle, MySQL
• Storage Resource Broker
• 72-CPU Sun Fire 15K
• IBM p690s – HPSS, DB2, etc.
http://datacentral.sdsc.edu/
Support for community data collections and databases
Data management, mining, analysis, and preservation
SCIENCE and TECHNOLOGY STAFF, SOFTWARE, SERVICES
• User Services
• Application/Community Collaborations
• Education and Training
• SDSC Synthesis Center
• Community SW, toolkits, portals, codes
• http://www.sdsc.edu/
• Over the next decade, data will come from everywhere
• Scientific instruments
• Experiments
• Sensors and sensornets
• New devices (personal digital devices, computer-enabled clothing, cars, …)
• And be used by everyone
• Scientists
• Consumers
• Educators
• General public
• Cyberinfrastructure must support unprecedented diversity, globalization, integration, scale, and use
[Figure labels: Data from simulations; Data from sensors; Data from instruments; Volunteer data; Data from analysis]
Kilo = 10^3, Mega = 10^6, Giga = 10^9, Tera = 10^12, Peta = 10^15, Exa = 10^18
• 1 low-resolution photo = 100 KiloBytes
• 1 novel = 1 MegaByte
• iPod Shuffle (up to 120 songs) = 512 MegaBytes
• Printed materials in the Library of Congress = 10 TeraBytes
• 1 human brain at the micron level = 1 PetaByte
• SDSC HPSS tape archive = 6 PetaBytes
• All worldwide information in one year = 2 ExaBytes
* Rough/average estimates
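To make the jumps between prefixes concrete, here is a minimal sketch that converts the slide's rough estimates to bytes and compares a few of them. The helper `to_bytes` and the variable names are illustrative, not part of the original talk, and the inputs are the slide's own rough/average figures.

```python
# Exponents for the decimal prefixes listed on the slide.
PREFIX_EXPONENT = {"Kilo": 3, "Mega": 6, "Giga": 9, "Tera": 12, "Peta": 15, "Exa": 18}


def to_bytes(value: int, prefix: str) -> int:
    """Convert e.g. (6, "Peta") to a byte count using powers of ten."""
    return value * 10 ** PREFIX_EXPONENT[prefix]


# Rough/average estimates taken from the slide.
novel = to_bytes(1, "Mega")
ipod_shuffle = to_bytes(512, "Mega")
library_of_congress_print = to_bytes(10, "Tera")
sdsc_tape_archive = to_bytes(6, "Peta")
worldwide_per_year = to_bytes(2, "Exa")

if __name__ == "__main__":
    print("Novels per iPod Shuffle:", ipod_shuffle // novel)
    print("Library of Congress print collections per SDSC archive:",
          sdsc_tape_archive // library_of_congress_print)
    print("SDSC archives per year of worldwide information:",
          round(worldwide_per_year / sdsc_tape_archive))
```

Because every input is an order-of-magnitude estimate, the ratios are only meaningful as orders of magnitude too.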
How dangerous is the southern San Andreas Fault?
• The SCEC TeraShake simulation is the result of an immense effort by the geoscience community over more than 10 years
• Focus is on understanding big earthquakes and how they will impact sediment-filled basins.
• Simulation combines massive amounts of data, high-resolution models, large-scale supercomputer runs
• TeraShake results provide new information enabling better
• Estimation of seismic risk
• Emergency preparation, response and planning
• Design of next generation of earthquake-resistant structures
• Such simulations provide potentially immense benefits by saving many lives and avoiding billions in economic losses
[Figure: Major Earthquakes on the San Andreas Fault, 1680-present: 1680 (M 7.7), 1857 (M 7.8), 1906 (M 7.8), ?]
Domain: 600 km x 300 km x 80 km
Mesh Dimension: 3000 x 1500 x 400
Spatial resolution = 200 m
Simulated time = 200 s
Number of time steps = 20,000
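The parameters above are mutually consistent, and a short sketch shows how the mesh dimensions and time step follow from the domain size, resolution, and simulated time. The 4-bytes-per-value figure used for the memory estimate is an assumption for illustration only; the slides do not say how the simulation stores its fields.

```python
DOMAIN_KM = (600, 300, 80)   # domain extent from the slide
RESOLUTION_M = 200           # spatial resolution from the slide
SIMULATED_TIME_S = 200       # simulated time from the slide
TIME_STEPS = 20_000          # number of time steps from the slide

# ASSUMPTION: one single-precision (4-byte) value per mesh point per field.
ASSUMED_BYTES_PER_VALUE = 4


def mesh_dimensions(domain_km, resolution_m):
    """Number of mesh cells along each axis."""
    return tuple(side_km * 1000 // resolution_m for side_km in domain_km)


mesh = mesh_dimensions(DOMAIN_KM, RESOLUTION_M)
total_points = mesh[0] * mesh[1] * mesh[2]
time_step_s = SIMULATED_TIME_S / TIME_STEPS
bytes_per_field = total_points * ASSUMED_BYTES_PER_VALUE

if __name__ == "__main__":
    print("Mesh:", mesh)
    print("Total mesh points:", total_points)
    print("Time step (s):", time_step_s)
    print("GB per scalar field (assumed 4 bytes/value):", bytes_per_field / 10**9)
```

With 1.8 billion mesh points and 20,000 time steps, even a single stored field per saved step adds up quickly, which is consistent with the multi-terabyte storage figures quoted for the run.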
• What you’re looking at:
• L.A. experiences strong ground motion from the S->N scenario
• The N->S rupture generates strong reverberations in the Imperial Valley, ultimately hitting Mexicali and other northern Mexico cities.
• Large local peaks in ground motion near Palm Springs, resulting in immense damage.
• Computers and Systems
• 80,000 hours on 240 processors of DataStar
• 256 GB memory p690 used for testing, p655s used for production run, TeraGrid (TG) used for porting
• 30 TB global parallel file system (GPFS)
• Run-time 100 MB/s data transfer from GPFS to SAM-QFS
• 27,000 hours post-processing for high resolution rendering
• People
• 20+ people involved in information technology support
• 20+ people involved in geoscience modeling and simulation
• Data Storage
• 47 TB archival tape storage on Sun StorEdge SAM-QFS
• 47 TB backup on High Performance Storage System (HPSS)
• SRB Collection with 1,000,000 files
• Funding
• SDSC Cyberinfrastructure resources for TeraShake funded by NSF
• Southern California Earthquake Center is an NSF-funded geoscience research and development center
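For a sense of scale, the sketch below turns two of the TeraShake figures quoted above into elapsed time: how long 47 TB takes to move at 100 MB/s, and the wall-clock duration of the run. The second calculation rests on an assumption, since the slides do not say whether "80,000 hours on 240 processors" means total processor-hours; it is treated as such here purely for illustration.

```python
TB = 10**12  # decimal terabyte
MB = 10**6   # decimal megabyte
SECONDS_PER_DAY = 86_400


def transfer_days(volume_bytes: float, rate_bytes_per_s: float) -> float:
    """Days needed to move volume_bytes at a sustained rate_bytes_per_s."""
    return volume_bytes / rate_bytes_per_s / SECONDS_PER_DAY


# 47 TB archived at the quoted 100 MB/s GPFS-to-SAM-QFS rate.
archive_days = transfer_days(47 * TB, 100 * MB)

# ASSUMPTION: 80,000 hours is the total across all 240 processors.
wall_clock_hours = 80_000 / 240

if __name__ == "__main__":
    print(f"Archiving 47 TB at 100 MB/s: {archive_days:.2f} days")
    print(f"Wall-clock run time if 80,000 is processor-hours: {wall_clock_hours:.1f} h")
```

Either way, the data movement alone occupies several days of sustained transfer, which is why storage and I/O appear alongside compute in the resource list.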
• Many Science, Cultural, and Official Collections must be sustained for the foreseeable future
• Critical collections must be preserved:
• community reference data collections (e.g. Protein Data Bank)
• irreplaceable collections (e.g. Shoah collection)
• longitudinal data (e.g. PSID – Panel Study of Income Dynamics)
• No plan for preservation often means that data is lost or damaged
“… the progress of science and useful arts … depends on the reliable preservation of knowledge and information for generations to come.”
“Preserving Our Digital Heritage”, Library of Congress
• What should we preserve?
• What materials must be “rescued”?
• How to plan for preservation of materials by design?
• How should we preserve it?
• Formats
• Storage media
• Stewardship – who is responsible?
• Who should pay for preservation?
• The content generators?
• The government?
• The users?
• Who should have access?
Print media provides easy access for long periods of time but is hard to data-mine
Digital media is easier to data-mine but requires management of evolution of media and resource planning over time
• Comprehensive approach to infrastructure for long-term preservation requires the integration of
• Collection ingestion
• Access and Services
• Research and development for new functionality and adaptation to evolving technologies
• Business model, data policies, and management issues critical to success of the infrastructure
[Figure labels: Ingestion; Services; R&D; Policy; Consortium]
• First program of its kind to support research and community data collections and databases
• Comprehensive resources
• Disk: 400 TB accessible via HPC systems, Web, SRB, GridFTP
• Databases: DB2, Oracle, MySQL
• SRB: Collection management
• Tape: 6 PB, accessible via file system, HPSS, Web, SRB, GridFTP
• Data collection and database hosting
• Batch-oriented access
• Collection management services
• Collaboration opportunities:
• Long-term preservation
• Data technologies and tools
New Allocated Data Collections include
• Bee Behavior (Behavioral Science)
• C5 Landscape DB (Art)
• Molecular Recognition Database (Pharmaceutical Sciences)
• LIDAR (Geoscience)
• LUSciD (Astronomy)
• NEXRAD-IOWA (Earth Science)
• AMANDA (Physics)
• SIO_Explorer (Oceanography)
• Tsunami and Landsat Data (Earthquake Engineering)
• UC Merced Library Japanese Art Collection (Art)
• Terabridge (Structural Engineering)
datacentral-allocations@sdsc.edu
SDSC/UC Academic Associates Program
Cyberinfrastructure and “Seeding” Activities
• Targeted workshops
• Priority SW installation and support
• Priority participation for Cyberinfrastructure Summer Institute
• Focused assistance with developing successful proposals for national allocation programs
• Targeted user services
• Special UC compute and data allocations
• Priority for “early usage” of new national resources
SDSC Cyberinfrastructure Resources Heavily Used by UC Faculty and Students
• UC PIs account for 329+ trillion bytes of data stored at SDSC
• In FY05, over 5 million CPU hours on HPC machines at SDSC were used by UC faculty and students at all campuses
• UCSD faculty make up 40% of the top users of SDSC compute resources
• Cyberinfrastructure captures the practice and potential of modern science and engineering
• Cyberinfrastructure is the focus of an increasing number of federal programs
• NSF (all directorates), NIH (BISTI, Bioinformatics, Computational Biology, etc.), DOE (Science Grid), etc.
• Cyberinfrastructure is critical for success in modern research and education initiatives
• Stem cell research
• Grid computing
• Multi-disciplinary science and engineering
Leadership in Cyberinfrastructure provides a competitive edge to California researchers, educators, practitioners, and business leaders
berman@sdsc.edu
www.sdsc.edu