
12 Jan 2012
2012 NSF I/UCRC Annual Meeting
CHREC and Novo-G:
An Innovative and Synergistic Research Project and
The World’s Most Powerful Reconfigurable Supercomputer
Alan D. George, Ph.D.
Director, NSF CHREC Center
Professor of ECE, University of Florida
Herman Lam, Ph.D.
Assoc. Professor of ECE, University of Florida
Research highlighted in this presentation was supported in part by the I/UCRC Program of the
National Science Foundation under Grant No. EEC-0642422.
Outline

CHREC
• Center overview
• CHREC sites, faculty, & students
• CHREC members & memberships
• Industry impact & technology transfer

Novo-G
• Reconfigurable computing
• Overview
• Machine architecture
• Application acceleration
• International Novo-G forum

Conclusions and Looking Ahead
I/UCRC grant originated in Sep. 2006
• 1 university site, 9 membership commitments

Strong growth in first 5 years (Phase-I)
• Grown to 4 university sites (UF, GW, BYU, VT)
• Grown to 29 members (aerospace, IT, etc.)
• Grown to 42 memberships (all full, $35K/ea)

Strong scholarship record
• >115 refereed journal & conference papers
• Several NSF CAREER awards
• Best-paper awards, keynotes, etc.

Strong graduation record
• Dozens of Ph.D. & M.S. graduates to date
• Many hired by CHREC members
• Dozens more served with members as interns

World-class facilities developed in-house
• Novo-G: world’s top reconfigurable computer
• HokieSpeed: GPU-centric supercomputer
• Pyramid: CPU-centric supercomputer

Center Mission and Theme
• R&D to advance S&T in nexus of reconfigurable, high-performance, and/or high-performance embedded computing (i.e., RC, HPC, HPEC)
• Computing performance, power, adaptivity, scalability, productivity, cost, size, weight, etc.
• From space satellites to supercomputers!
CHREC Faculty

University of Florida (lead)
• Dr. Alan D. George, Professor of ECE – Center Director
• Dr. Herman Lam, Associate Professor of ECE
• Dr. Ann Gordon-Ross, Assistant Professor of ECE
• Dr. Greg Stitt, Assistant Professor of ECE
• Dr. Jose Principe, Distinguished Professor of ECE and BME
• Dr. Andy Li, Associate Professor of ECE
• Dr. Vikas Aggarwal, Research Scientist in ECE

Brigham Young University
• Dr. Brent E. Nelson, Professor of ECE – BYU Site Director
• Dr. Michael J. Wirthlin, Professor of ECE
• Dr. Brad L. Hutchings, Professor of ECE
• Dr. Michael Rice, Professor of ECE

George Washington University
• Dr. Tarek El-Ghazawi, Professor of ECE – GWU Site Director
• Dr. Vikram Narayana, Assistant Research Professor in ECE

Virginia Tech
• Dr. Peter Athanas, Professor of ECE – VT Site Director
• Dr. Wu-Chun Feng, Associate Professor of CS and ECE
• Dr. Patrick Schaumont, Assistant Professor of ECE
• Dr. Heshan Lin, Senior Research Associate in CS

Most importantly, CHREC features an exceptional team of >40 graduate students spanning our 4 university sites.
CHREC Members
42 memberships ($35K/ea) from 29 members in 2011
1. AFRL Munitions Directorate (4)
2. AFRL Sensors Directorate
3. AFRL Space Vehicles Directorate (2)
4. Altera
5. AMD
6. Arctic Region Supercomputing Center (2)
7. Army RD&E Command
8. Boeing Research & Technology
9. GiDEL
10. Harris
11. Honeywell (2)
12. Intel
13. Lockheed Martin MFC
14. Lockheed Martin SSC
15. Lockheed Martin SVIL
16. Los Alamos National Laboratory (2)
17. Mentor Graphics
18. Monsanto
19. NASA Goddard Space Flight Center
20. NASA Marshall Space Flight Center
21. National Instruments (2)
22. National Security Agency (4)
23. Northrop-Grumman Aerospace Systems
24. Oak Ridge National Laboratory (2)
25. Office of Naval Research
26. Sandia National Laboratories
27. SEAKR Engineering
28. Veritomyx
29. Xilinx (2)
Industry Impact & Tech Transfer

12 projects spanning broad areas of RC, HPC, HPEC
• Performance – optimizing speed, power, scalability, adaptability
  – Parallel algorithms, applications, architectures (FPGA, GPU, Manycore)
• Productivity – reducing design complexity for developers and users
  – Design concepts, tools, modeling, middleware, compilation, integration
• Aerospace – addressing unique needs in this key community
  – Space-based processing, reliable architectures, partial reconfiguration

Industry impact
• CHREC drives & influences many industry programs
• Annual surveys routinely cite millions of $ per year in industry impact
• Many very close relationships between sites & members

Technology transfer to date
• Dozens of industry personnel hires, dozens of internships
• >115 new papers and >30 new tools crafted with/for members
What is Reconfigurable Computing?

General characteristics:
• Architecture adapts to match unique needs of each app
  – e.g., FPGA; “Custom Fit” usage strategy; reconfigurable by task or app
• Relatively new and revolutionary paradigm of computing
  – Limited but growing list of available devices, tools, systems, and apps

Technical advantages:
• GREAT performance when app not well suited to fixed processor
  – Why? Customized hardware parallelism (width, depth), data precision (size, format), operations and units (type, quantity), memory structure, etc.
• LOWER energy consumption than fixed processors (CPU, GPU)

Technical disadvantages:
• Relatively new and immature paradigm of computing
• Programming complexity with adaptive hardware
  – Causes: inherent with novelty of approach; “newness” of field and tools
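The data-precision advantage above can be made concrete with a small sketch: DNA bases need only 2 bits each, so an FPGA datapath can size its comparators and memory words to exactly that width, whereas a CPU typically handles each base as an 8-bit character. The packing scheme below is a generic illustration, not a CHREC tool:

```python
# DNA uses a 4-letter alphabet, so 2 bits per base suffice.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def pack_bases(seq):
    """Pack a DNA string into an integer at 2 bits per base.

    A CPU stores each base in (at least) an 8-bit char; an FPGA
    datapath can instead size every comparator and register to
    exactly 2 bits, packing 4x as many bases per wire.
    """
    word = 0
    for base in seq:
        word = (word << 2) | BASE_CODE[base]
    return word

# "ACGT" packs to binary 00 01 10 11, i.e., the integer 27
```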
What is Novo-G?

Motivation
• Growing computational demands in many science and engineering domains becoming principal bottleneck
• Scalable RC systems (e.g., Novo-G) uniquely capable of both high performance and low energy, cooling, TCO

Goals: Investigate, develop, evaluate, & showcase:
• Most powerful RC machine ever fielded for research
• Innovative suite of productivity tools for app development
• Impactful set of scalable kernels/apps in key science areas

Emphases
• Performance (system), Productivity (concepts/tools), Impact (apps)

Theme
• Novo-G is an RC-centric machine (not merely CPUs with accelerators!)
  – Features FPGA/RAM coupling (4.25 or 8.5 GB in 3 banks coupled to each FPGA)
  – Features FPGA/FPGA coupling (up to 8 coupled; e.g., systolic array, virtual FPGA)
  – CPUs and GPUs serve in supporting role (e.g., I/O, preprocessing, postprocessing)
Novo-G Machine
* Our cluster vendor is Ace Computers

Novo-G Annual Growth
• 2009: 96 top-end Stratix-III FPGAs, each with 4.25GB SDRAM
• 2010: 96 more Stratix-III FPGAs, each with 4.25GB SDRAM
• 2011: 96 top-end Stratix-IV FPGAs, each with 8.5GB SDRAM
• 2012: 96 more Stratix-IV FPGAs, each with 8.5GB SDRAM

1 head-node server (1U) with:
• 2 Xeon E5520 2.26 GHz quad-core CPUs
• 24GB ECC DDR3, 3 x 1TB SATA2

24 compute servers (4U), totaling 192 FPGAs, each server with:
• Xeon E5520 quad-core CPU
• 6GB ECC DDR3, 250GB SATA2
• 2 GiDEL ProcStar-III PCIe x8 cards, each with 4 Stratix-III E260 FPGAs and 4x4.25 = 17GB RAM

6 compute servers (4U), totaling 96 FPGAs, each server with:
• 2 Xeon E5620 2.4GHz quad-core CPUs
• 16GB ECC DDR3, 2TB SATA2
• GTX-480 GPU
• 4 GiDEL ProcStar-IV PCIe x8 cards, each with 4 Stratix-IV E530 FPGAs and 4x8.5 = 32GB RAM
Impactful Novo-G App Research: BioRC examples

Smith-Waterman (SW) with trace-back
• Optimal alg. for local alignment of DNA and RNA sequences

Needleman-Wunsch (NW)
• Optimal alg. for global alignment of DNA and RNA sequences

Novel systolic array architecture
• Complex-controller performance with simple-controller overhead
• Extendable across FPGAs using neighbor bus
• Computation of trace-back for SW overlapped with hardware processing of next sequence

Smith-Waterman baseline: database length 226 bases v 512, length-500 seqs; software runtime: 7,126 CPU-hours on 2.4 GHz Opteron

# FPGAs     Runtime (sec)   Speedup
1           25,927          989
4           6,482           3,958
96          271             94,639
128         206             124,710
192 (est.)  137             187,492

Needleman-Wunsch baseline: 192·225, length-850 sequence comparisons; software runtime: 11,026 CPU-hours on 2.4 GHz Opteron

# FPGAs     Runtime (sec)   Speedup
1           47,616          833
4           12,014          3,304
96          503             78,914
128         391             101,518
192 (est.)  270             147,013

Each 3D chart (for Smith-Waterman and Needleman-Wunsch) illustrates performance of a single FPGA under varying input conditions; each table shows scaling performance with varying numbers of FPGAs under optimal input conditions.

• Jaguar supercomputer @ ORNL: 224,256 cores (2.4 GHz hexa-core Opterons) @ 6.95 MW
• K Computer in Japan (largest supercomputer in world): 548,352 cores; “uses enough electricity to power almost 10,000 homes at a cost of about $10 million per year” (New York Times, 06/19/11)
• By contrast, with 192+192 FPGAs (Summer 2012), for key BioRC apps, Novo-G speedup is approaching 500K cores @ <16 kW
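The accelerated kernel behind these results is the classic Smith-Waterman dynamic-programming recurrence. A minimal sequential sketch is below; the scoring parameters are arbitrary illustrative choices, and the comment on systolic mapping describes the general technique rather than the specific CHREC design:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Score of the best local alignment of strings a and b.

    Sequential sketch of the DP kernel. A systolic array assigns one
    recurrence cell per processing element, so each anti-diagonal of
    the H matrix (whose cells are mutually independent) is computed
    in parallel as the second sequence streams through.
    """
    # H[i][j] = best local-alignment score ending at a[i-1], b[j-1]
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                    # local alignment may restart
                          H[i - 1][j - 1] + s,  # extend along the diagonal
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
            best = max(best, H[i][j])
    return best
```

Needleman-Wunsch (global alignment) uses the same recurrence without the clamp at 0, which is why both map onto the same systolic fabric.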
Broad Range of Novo-G App Research

Broad range of Novo-G research
• BioRC
  – Smith-Waterman (w/ or w/o traceback), Needleman-Wunsch, Needle-Distance, Isoformic proteomics, BLASTp (collaboration with Boston University), CHREC BLAST Toolset (Novo-BLAST and BSW: BLAST-wrapped SW)
• FinRC: e.g., barrier options using Heston model
• DSP: e.g., information-theoretic approach to image segmentation
• Domain exploration in other science and engineering fields
• Very promising results (speed, energy)
  – 50x to 5000x speedup per FPGA vs. fast CPU core

Technology Transfer
• CHREC BLAST Toolset (Monsanto)
  – Computation demand in bioinformatics becoming prohibitive bottleneck
  – Novo-BLAST: accelerates BLAST’s word-matching algorithm up to 19x on single Stratix III
  – BLAST-wrapped SW: Smith-Waterman core (previous slide) with BLAST wrapper; SSEARCH-like accuracy with BLAST-like performance
  – Code transfer & field test in 1st qtr. 2012
• Isotopic Pattern Calculator (Veritomyx)
  – Dominating bottleneck in proteomics app for cancer research
  – Measured up to 470x speedup for single Stratix IV FPGA
  – Code transfer & field test in 1st qtr. 2012
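The word-matching stage that Novo-BLAST accelerates scans a query for short fixed-length words and looks each one up in a precomputed index of the database. The sketch below illustrates that stage only; the function names and word length w=3 are assumptions, and real BLAST goes on to score and extend each seed hit:

```python
import collections

def build_word_index(db_seq, w=3):
    """Index every length-w word of the database by its positions."""
    index = collections.defaultdict(list)
    for i in range(len(db_seq) - w + 1):
        index[db_seq[i:i + w]].append(i)
    return index

def seed_hits(query, index, w=3):
    """Return (query_pos, db_pos) pairs where a length-w word matches."""
    hits = []
    for q in range(len(query) - w + 1):
        for d in index.get(query[q:q + w], ()):
            hits.append((q, d))
    return hits
```

Because every word lookup is independent, this stage parallelizes naturally across FPGA fabric, which is what makes it a good acceleration target.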
International Novo-G Forum

Founded in January 2010
• International community research forum to explore performance, productivity, and sustainability of RC at scale

Consists of 11 academic teams using common platform
• Each team working on its own research apps and/or tools
• Each team has one or more local Novo-G quad-FPGA boards
• Remote access to big Novo-G @ Florida for large-scale runs

Teams:
• Boston University
• Clemson University
• Federal University of Pernambuco (Brazil)
• University of Florida
• George Washington University
• University of Glasgow (UK)
• Imperial College (UK)
• Northeastern University
• University of South Carolina
• University of Tennessee
• Washington University in St. Louis
Conclusions and Looking Ahead

RC: revolutionary paradigm of computing
• Architecture adapts to match unique needs of each app

CHREC Novo-G reconfigurable supercomputer
• Most powerful RC machine ever fielded for research
• World-class speedups for key apps in science and engineering
• Rivaling the world’s largest conventional supercomputers, but at a tiny fraction of their size, power, cost, and weight

Synergistic activity
• Leverages private, state, and federal funding resources
• Close partnership with CHREC member organizations: Altera, GiDEL, Monsanto, Veritomyx, et al.
• Novo-G Forum: international team of 11 universities

Novo-G future: science and engineering domain exploration
• New RC-amenable apps in BioRC, DSP, and FinRC
• Explore new promising domains, e.g., computational chemistry, cryptanalysis

CORBI anyone?