A Vision for a New Era in Computational Science


Cyberinfrastructure and California

Dr. Francine Berman

Director, San Diego Supercomputer Center

Professor and High Performance Computing Endowed Chair, UC San Diego

SAN DIEGO SUPERCOMPUTER CENTER

Fran Berman

UCSD

UNIVERSITY OF CALIFORNIA

The Digital World

Science

Commerce

Information

Entertainment

Today’s Technology is a Team Sport

• Today’s “computer” is a coordinated set of hardware, software, data, and services providing an “end-to-end” resource.

• Cyberinfrastructure captures the integrated character of today’s IT environment

[Figure: The “computer” as an integrated set of resources: field instruments, sensors, computers, storage, visualization (viz), and data linked by wired and wireless networks]

Cyberinfrastructure -- An Integrating Concept

Cyberinfrastructure = Resources (computers, data storage, networks, scientific instruments, experts, etc.) + “Glue” (integrating software, systems, and organizations)

How does Cyberinfrastructure Work?

Cyberinfrastructure-enabled Neurosurgery

Radiologists and neurosurgeons at Brigham and Women’s Hospital (BWH), Harvard Medical School, are exploring transmission of 30/40 MB brain images (generated during surgery) to SDSC for analysis and alignment

• PROBLEM: Neurosurgeons seek to remove as much tumor tissue as possible while minimizing removal of healthy brain tissue

• Brain deforms during surgery

• Surgeons must align the preoperative brain image with intra-operative images to have the best opportunity for intra-surgical navigation

Transmission repeated every hour during 6-8 hour surgery.

Transmission and output must take on the order of minutes

Finite element simulation on biomechanical model for volumetric deformation performed at SDSC; output results are sent to BWH where updated images are shown to surgeons

SDSC is a National Cyberinfrastructure Center

National facility funded by NSF, NIH, DOE, Library of Congress, NARA, etc.

Employs nearly 400 researchers, staff and students

National Facility and UCSD Organized Research Unit

Home to many associated activities including

• Protein Data Bank

• Biomedical Informatics Research Network (BIRN) Coordinating Center

• Geosciences Network (GEON)

• NEES IT Center, etc.

[Figure labels: Data and Knowledge Systems; Grid and Cluster Computing; SW tools, workbenches, toolkits; Community Databases and Data Collections; High Performance Computing; Data-oriented Science and Engineering; Networking; Computational Science and Engineering]

SDSC Resources Are Available to the Community

COMPUTE SYSTEMS

• DataStar

• 2,528 Power4+ processors

• IBM p655 8-way and p690 32-way nodes

• 7 TB total memory

• Up to 3 GBps I/O to disk

• TeraGrid Cluster

• 512 Itanium2 IA-64 processors

• 1 TB total memory

• Also 128 2-way data nodes

• Blue Gene Data

• First academic IBM Blue Gene system

• 2,048 PowerPC processors

• 128 I/O nodes

http://www.sdsc.edu/user_services/

DATA ENVIRONMENT

• 1.4 PB Storage-area Network (SAN)

• 6 PB StorageTek tape library

• HPSS and SAM-QFS archival systems

• DB2, Oracle, MySQL

• Storage Resource Broker

• 72-CPU Sun Fire 15K

• IBM p690s – HPSS, DB2, etc.

http://datacentral.sdsc.edu/

Support for community data collections and databases

Data management, mining, analysis, and preservation

SCIENCE and TECHNOLOGY STAFF, SOFTWARE, SERVICES

• User Services

• Application/Community Collaborations

• Education and Training

• SDSC Synthesis Center

• Community SW, toolkits, portals, codes

• http://www.sdsc.edu/

Cyberinfrastructure Can Help Harness Today’s Deluge of Data

• Over the next decade, data will come from everywhere

• Scientific instruments

• Experiments

• Sensors and sensornets

• New devices (personal digital devices, computer-enabled clothing, cars, …)

• And be used by everyone

• Scientists

• Consumers

• Educators

• General public

• Cyberinfrastructure must support unprecedented diversity, globalization, integration, scale, and use

[Figure labels: Data from simulations; Data from sensors; Data from instruments; Data from analysis; Volunteer Data]

How much Data is there?*

Kilo (10^3): 1 low-resolution photo = 100 KiloBytes

Mega (10^6): 1 novel = 1 MegaByte; iPod Shuffle (up to 120 songs) = 512 MegaBytes

Giga (10^9)

Tera (10^12): Printed materials in the Library of Congress = 10 TeraBytes

Peta (10^15): 1 human brain at the micron level = 1 PetaByte; SDSC HPSS tape archive = 6 PetaBytes

Exa (10^18): All worldwide information in one year = 2 ExaBytes

* Rough/average estimates
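The byte scale on this slide can be turned into a small calculation. The sketch below is illustrative only: it assumes the decimal prefixes shown on the slide (Kilo = 10^3 through Exa = 10^18), treats the quoted sizes as the rough estimates they are, and uses hypothetical names (`to_bytes`, `how_many_fit`) that do not come from the talk.

```python
# Decimal byte prefixes as listed on the slide: prefix -> power of ten.
PREFIX_EXPONENT = {"Kilo": 3, "Mega": 6, "Giga": 9, "Tera": 12, "Peta": 15, "Exa": 18}


def to_bytes(value, prefix):
    """Convert a quantity such as (10, "Tera") into a byte count."""
    return value * 10 ** PREFIX_EXPONENT[prefix]


# Rough/average estimates quoted on the slide (not measurements).
ESTIMATES = {
    "low-resolution photo": to_bytes(100, "Kilo"),
    "novel": to_bytes(1, "Mega"),
    "iPod Shuffle": to_bytes(512, "Mega"),
    "Library of Congress print": to_bytes(10, "Tera"),
    "human brain (micron level)": to_bytes(1, "Peta"),
    "SDSC HPSS tape archive": to_bytes(6, "Peta"),
    "worldwide information per year": to_bytes(2, "Exa"),
}


def how_many_fit(small, large):
    """Number of whole copies of `small` that fit into `large`."""
    return ESTIMATES[large] // ESTIMATES[small]


if __name__ == "__main__":
    # Library of Congress print collections per SDSC tape archive.
    print(how_many_fit("Library of Congress print", "SDSC HPSS tape archive"))
    # Novels per iPod Shuffle.
    print(how_many_fit("novel", "iPod Shuffle"))
```

Under these estimates, the 6 PetaByte archive corresponds to about 600 times the printed Library of Congress figure, which is the kind of comparison the slide invites.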

Cyberinfrastructure and Data: Using Data for Analysis and Simulation

Cyberinfrastructure-enabled Disaster Preparedness

The SCEC TeraShake simulation is the result of immense effort by the geoscience community over more than 10 years

Focus is on understanding big earthquakes and how they will impact sediment-filled basins.

• Simulation combines massive amounts of data, high-resolution models, and large-scale supercomputer runs

[Figure: Major Earthquakes on the San Andreas Fault, 1680-present: 1680 (M 7.7), 1857 (M 7.8), 1906 (M 7.8), ?]

How dangerous is the southern San Andreas Fault?

TeraShake results provide new information enabling better

• Estimation of seismic risk

• Emergency preparation, response and planning

• Design of next generation of earthquake-resistant structures

Such simulations provide potentially immense benefits, both in saving many lives and in avoiding billions in economic losses

Domain: 600 km x 300 km x 80 km

Mesh Dimension: 3000 x 1500 x 400

Spatial resolution = 200 m

Simulated time = 200 s

Number of time steps = 20,000
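The simulation parameters above are mutually consistent, which a few lines of arithmetic can confirm. This is only a sketch of that check: the constants are the values quoted on the slide, the function names are hypothetical, and the uniform grid spacing is an assumption read off the stated 200 m resolution rather than a description of the TeraShake code itself.

```python
import math

# Values quoted on the slide.
DOMAIN_KM = (600, 300, 80)   # domain extent in km
RESOLUTION_M = 200           # spatial resolution in metres
SIMULATED_TIME_S = 200       # simulated time in seconds
TIME_STEPS = 20_000          # number of time steps


def mesh_dimensions(domain_km, resolution_m):
    """Cells per axis, assuming a uniform grid at the given spacing."""
    return tuple(extent_km * 1000 // resolution_m for extent_km in domain_km)


def total_grid_points(mesh):
    """Total number of cells in the mesh."""
    return math.prod(mesh)


def time_step_seconds(simulated_time_s, steps):
    """Length of one time step in seconds."""
    return simulated_time_s / steps


if __name__ == "__main__":
    mesh = mesh_dimensions(DOMAIN_KM, RESOLUTION_M)
    print(mesh)                      # cells per axis
    print(total_grid_points(mesh))   # cells in the whole domain
    print(time_step_seconds(SIMULATED_TIME_S, TIME_STEPS))
```

Dividing each domain extent by 200 m reproduces the 3000 x 1500 x 400 mesh, which amounts to 1.8 billion cells, each updated over 20,000 steps of 0.01 s.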

• What you’re looking at:

• L.A. experiences strong ground motion from the S->N scenario

• The N->S rupture generates strong reverberations in the Imperial Valley, ultimately hitting Mexicali and other northern Mexico cities.

• Large local peaks in ground motion near Palm Springs, resulting in immense damage.

Making TeraShake Work -- Resources

• Computers and Systems

• 80,000 hours on 240 processors of DataStar

• 256 GB memory p690 used for testing, p655s used for production run, TG used for porting

• 30 TB General Parallel File System (GPFS)

• Run-time 100 MB/s data transfer from GPFS to SAM-QFS

• 27,000 hours post-processing for high resolution rendering

• People

• 20+ people involved in information technology support

• 20+ people involved in geoscience modeling and simulation

• Data Storage

• 47 TB archival tape storage on Sun StorEdge SAM-QFS

• 47 TB backup on High Performance Storage System (HPSS)

• SRB Collection with 1,000,000 files

• Funding

• SDSC Cyberinfrastructure resources for TeraShake funded by NSF

• Southern California Earthquake Center is an NSF-funded geoscience research and development center
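The resource figures on this slide can be related to one another with back-of-the-envelope arithmetic. The sketch below rests on assumptions the slide does not state: decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), a sustained 100 MB/s rate applied to the full 47 TB, and a reading of "80,000 hours on 240 processors" as processor-hours. The function names are hypothetical.

```python
TB = 10**12  # assumed decimal terabyte
MB = 10**6   # assumed decimal megabyte


def transfer_seconds(volume_bytes, rate_bytes_per_s):
    """Time to move a volume of data at a constant sustained rate."""
    return volume_bytes / rate_bytes_per_s


def wall_clock_hours(processor_hours, processors):
    """Wall-clock time if processor-hours are spread evenly over all processors."""
    return processor_hours / processors


if __name__ == "__main__":
    seconds = transfer_seconds(47 * TB, 100 * MB)
    print(seconds)                   # seconds to move 47 TB at 100 MB/s
    print(seconds / 86_400)          # the same duration in days
    # Only meaningful if the slide's "hours" are processor-hours (an assumption).
    print(wall_clock_hours(80_000, 240))
```

Under these assumptions, archiving 47 TB at 100 MB/s takes roughly five and a half days, which suggests why the transfer from GPFS to SAM-QFS ran at run-time rather than as a separate step afterwards.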

Cyberinfrastructure and Data: Preserving our Scientific and Cultural Heritage

Data Preservation

• Many Science, Cultural, and Official Collections must be sustained for the foreseeable future

• Critical collections must be preserved:

• community reference data collections (e.g. Protein Data Bank)

• irreplaceable collections (e.g. Shoah collection)

• longitudinal data (e.g. PSID – Panel Study of Income Dynamics)

• No plan for preservation often means that data is lost or damaged

“… the progress of science and useful arts … depends on the reliable preservation of knowledge and information for generations to come.”

“Preserving Our Digital Heritage”, Library of Congress

Key Challenges for Digital Preservation

• What should we preserve?

• What materials must be “rescued”?

• How to plan for preservation of materials by design?

• How should we preserve it?

• Formats

• Storage media

• Stewardship – who is responsible?

• Who should pay for preservation?

• The content generators?

• The government?

• The users?

• Who should have access?

Print media provides easy access for long periods of time but is hard to data-mine

Digital media is easier to data-mine but requires management of evolution of media and resource planning over time

Planning Ahead for Preservation

• Comprehensive approach to infrastructure for long-term preservation requires the integration of

• Collection ingestion

• Access and Services

• Research and development for new functionality and adaptation to evolving technologies

• Business model, data policies, and management issues critical to success of the infrastructure

[Figure labels: Ingestion; Services; R&D; Policy; Consortium]

Cyberinfrastructure Resources at SDSC

SDSC Data Central

• First program of its kind to support research and community data collections and databases

• Comprehensive resources

• Disk: 400 TB accessible via HPC systems, Web, SRB, GridFTP

• Databases: DB2, Oracle, MySQL

• SRB: Collection management

• Tape: 6 PB, accessible via file system, HPSS, Web, SRB, GridFTP

• Data collection and database hosting

• Batch oriented access

• Collection management services

• Collaboration opportunities:

• Long-term preservation

• Data technologies and tools

New Allocated Data Collections include

• Bee Behavior (Behavioral Science)

• C5 Landscape DB (Art)

• Molecular Recognition Database (Pharmaceutical Sciences)

• LIDAR (Geoscience)

• LUSciD (Astronomy)

• NEXRAD-IOWA (Earth Science)

• AMANDA (Physics)

• SIO_Explorer (Oceanography)

• Tsunami and Landsat Data (Earthquake Engineering)

• UC Merced Library Japanese Art Collection (Art)

• Terabridge (Structural Engineering)

datacentral-allocations@sdsc.edu

SDSC Academic Associates Program Targets Enabling Cyberinfrastructure Collaborations

SDSC/UC Academic Associates Program Cyberinfrastructure and “Seeding” Activities

• Targeted workshops

• Priority SW installation and support

• Priority participation for Cyberinfrastructure Summer Institute

• Focused assistance with developing successful proposals for national allocation programs

• Targeted user services

• Special UC compute and data allocations

• Priority for “early usage” of new national resources

SDSC Cyberinfrastructure Resources Heavily Used by UC faculty and students

• UC PIs account for 329+ trillion bytes of data stored at SDSC

• In FY05, over 5 million CPU hours on HPC machines at SDSC were used by UC faculty and students at all campuses

• UCSD faculty make up 40% of the top users of SDSC compute resources

Cyberinfrastructure is Fundamental for California

• Cyberinfrastructure captures the practice and potential of modern science and engineering

• Cyberinfrastructure is the focus of an increasing number of federal programs

• NSF (all directorates), NIH (BISTI, Bioinformatics, Computational Biology, etc.), DOE (Science Grid), etc.

• Cyberinfrastructure is critical for success in modern research and education initiatives

• Stem cell research

• Grid computing

• Multi-disciplinary science and engineering

Leadership in Cyberinfrastructure provides a competitive edge to California researchers, educators, practitioners, and business leaders

Thank You

berman@sdsc.edu

www.sdsc.edu
