The Ranger Supercomputer and Its Legacy

Dan Stanzione
Texas Advanced Computing Center
The University of Texas at Austin
December 2, 2013
[email protected]
The Texas Advanced Computing Center: A
World Leader in High Performance Computing
1,000,000x performance increase in UT computing
capability in 10 years (computation: 1,000,000x; network: 100x)
Ranger: 62,976 processor cores,
123 TB RAM, 579 TeraFlops; fastest
open-science machine in the world at launch
Lonestar: 23,000 processors,
44TB RAM, Shared Mem and GPU
subsystems, #25 in the world 2011
Stampede: #7 in the world today. Roughly
half a million processor cores (Intel Sandy
Bridge and Intel MIC, built by Dell): >10 Petaflops.
NSF Cyberinfrastructure Strategic Plan
circa 2007 – much of this never happened
NSF Cyberinfrastructure Strategic Plan
released March 2007
NSF investing in world-class computing
Articulates importance of CI overall
Chapters on computing, data, collaboration,
and workforce development
Annual “Track2” HPC systems ($30M)
Single “Track1” HPC system in 2011 ($200M)
Complementary solicitations for software,
applications, education
Software Development for CI (SDCI)
Strategic Technologies for CI (STCI)
Petascale Applications (PetaApps)
CI-Training, Education, Advancement,
Mentoring (CI-TEAM)
Cyber-enabled Discovery & Innovation
(CDI) starting in 2008: $0.75B!
First NSF Track2 System: 1/2 Petaflop
• TACC selected for first
NSF ‘Track2’ HPC
– $30M system acquisition
– Sun Constellation Cluster
– AMD Opteron processors
• Project included 4 years
operations and support
– System maintenance
– User support
– Technology insertion
– Extended to 5 years
Ranger System Summary
• Compute power - 579 Teraflops
– 3,936 Sun four-socket blades
– 15,744 AMD Opteron “Barcelona” processors
• Quad-core, 2.3 GHz, four flops/cycle (dual pipelines)
• Memory - 125 Terabytes
– 2 GB/core, 32 GB/node
– 132 GB/s aggregate bandwidth
• Disk subsystem - 1.7 Petabytes
– 72 Sun x4500 “Thumper” I/O servers, 24TB each
– ~72 GB/sec total aggregate bandwidth
– 1 PB in largest /work filesystem
• Interconnect - 10 Gbps bandwidth, 2.3 µsec latency
– Sun InfiniBand-based switches (2) with 3456 ports each
– Full non-blocking 7-stage Clos fabric
– Mellanox ConnectX IB cards
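The headline numbers above can be cross-checked with simple arithmetic. A minimal sketch, assuming the 2.3 GHz clock of the deployed Barcelona parts and decimal (SI) units:

```python
# Cross-check of the Ranger system summary figures.
# Assumes 2.3 GHz Barcelona parts and decimal units.

NODES = 3_936             # four-socket Sun blades
SOCKETS_PER_NODE = 4
CORES_PER_SOCKET = 4      # quad-core Opteron "Barcelona"
CLOCK_HZ = 2.3e9
FLOPS_PER_CYCLE = 4       # dual pipelines

cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
peak_tflops = cores * CLOCK_HZ * FLOPS_PER_CYCLE / 1e12
memory_tb = NODES * 32 / 1000      # 32 GB per node

print(cores)               # 62976
print(round(peak_tflops))  # 579
print(round(memory_tb))    # 126 (quoted as ~125 TB)
```

The core count and peak flops come out exactly as quoted; the memory total lands within rounding of the 125 TB figure.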
Ranger I/O Subsystem
Disk Object Storage Servers (OSS) based on Sun x4500 “Thumper” servers
– Each x4500:
• 48 SATA II 500GB drives (24TB total)
• running internal software RAID
• Dual Socket/Dual-Core Opterons @ 2.6 GHz
– Downside: these nodes have PCI-X, and raw I/O bandwidth can exceed a
single PCI-X 4X InfiniBand HCA
• We use dual PCI-X HCAs
– 72 Servers Total: 1.7 PB raw storage
Metadata Servers (MDS) based on Sun
Fire x4600s
MDS is Fibre Channel-connected to 9 TB Flexline storage
Target Performance
– Aggregate bandwidth: 70+ GB/sec
– To largest $WORK filesystem: ~40 GB/sec
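The targets above are consistent with simple per-server arithmetic. A rough sketch using only the figures quoted here:

```python
# Sanity check of the Ranger I/O subsystem figures.
OSS_COUNT = 72
DRIVES_PER_OSS = 48
DRIVE_TB = 0.5              # 500 GB SATA II drives
TARGET_AGGREGATE_GBPS = 70  # GB/sec, aggregate target

raw_pb = OSS_COUNT * DRIVES_PER_OSS * DRIVE_TB / 1000
per_oss_gbps = TARGET_AGGREGATE_GBPS / OSS_COUNT

print(raw_pb)                  # 1.728 (quoted as 1.7 PB)
print(round(per_oss_gbps, 2))  # 0.97  (~1 GB/sec per server)
```

Each x4500 therefore needs to sustain roughly 1 GB/sec, which is why the single-HCA PCI-X bottleneck mattered.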
Ranger Space, Power, and Cooling
• Total Project Power: 3.4 MW
• System: 2.4 MW
– 96 racks – 82 compute, 12 support, plus 2 switches
– 116 APC In-Row cooling units
– 2,054 sqft total footprint (~4,500 sqft including PDUs)
• Cooling: ~1 MW
– In-row units fed by three 350-ton chillers (N+1)
– Enclosed hot-aisles by APC
– Supplemental 280-tons of cooling from CRAC units
• Observations:
– Space less an issue than power
– Cooling > 25kW per rack difficult
– Power distribution a challenge, almost 1,400 circuits
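The per-rack cooling difficulty noted above follows directly from the totals. A quick sketch:

```python
# Power density behind the "cooling > 25 kW per rack" observation.
SYSTEM_KW = 2_400   # 2.4 MW system power
RACKS = 96          # 82 compute + 12 support + 2 switch racks

kw_per_rack = SYSTEM_KW / RACKS
print(kw_per_rack)  # 25.0
```

Averaged over all 96 racks the density is already 25 kW; the compute racks alone run hotter still.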
Interconnect Architecture
Ranger InfiniBand Topology
[Topology diagram: 12x InfiniBand links, three cables combined]
Who Used Ranger?
• On Ranger alone, TACC has ~6,000 users
who have run about three million simulations
over the last four years.
– UT-Austin
– UT System (through the UT Research Cyberinfrastructure project)
– Texas A&M and Texas Tech (through the Lonestar partnership)
– Industry (through the STAR program)
– Users from around the nation and world (through
NSF’s TeraGrid/XSEDE)
Japanese Earthquake Simulation
Simulation of the seismic wave from the
earthquake in Japan, propagating
through an earth model
Researchers using TACC’s Ranger
supercomputer have modeled the
processes responsible for continental
drift and plate tectonics in greater detail
than any previous simulation.
Modeling the propagation of seismic
waves through the earth is an essential
first step to inferring the structure of
earth's interior.
This research is led by Omar Ghattas at
The University of Texas at Austin
Studying H1N1 (“Swine Flu”)
Researchers at the
University of Illinois
and the University of
Utah used Ranger to
simulate the molecular
dynamics of antiviral
drugs interacting with
different kinds of flu.
Image produced by Brandt Westing, TACC
• They discovered how commercial medications reach the “binding
pocket” – and why Tamiflu wasn’t working on the new swine flu
• UT researcher Lauren Meyers also used Lonestar to predict the
best course of action in the event of an outbreak
Science at the Center of the Storm
Using the Ranger supercomputer at the Texas
Advanced Computing Center, National Oceanic and
Atmospheric Administration (NOAA) scientists and
their university colleagues tracked Hurricanes Ike
and Gustav during the 2008 storm season.
The real-time, high-resolution global and mesoscale
(regional) weather predictions they produced used up
to 40,000 processing cores at once — nearly two-thirds
of Ranger — and included, for the first time, data
streamed directly from NOAA planes inside the storm.
The forecasts also took advantage of ensemble
modeling, a method of prediction that runs dozens of
simulations with slightly different starting points in
order to determine the most likely path and intensity.
This new method and workflow was only possible
because of the massive parallel processing power that
TeraGrid resources can devote to complex scientific
problems and the interagency collaboration that
brought scientists, resources and infrastructure
together seamlessly.
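The ensemble idea described above can be illustrated with a toy model. This is purely a sketch, not the NOAA/TeraGrid workflow: a hypothetical one-dimensional "storm track" model is run from slightly perturbed starting positions, and the spread across members indicates forecast uncertainty.

```python
# Toy ensemble forecast: run one simple model many times from
# slightly perturbed initial states, then summarize the spread.
# The model itself is fictional, for illustration only.
import random

def storm_track(x0, steps=10):
    # fictional 1-D position update: steady drift plus weak feedback
    x = x0
    for _ in range(steps):
        x = 1.05 * x + 1.0
    return x

random.seed(0)
members = [storm_track(100.0 + random.gauss(0, 0.5)) for _ in range(30)]
ensemble_mean = sum(members) / len(members)
spread = max(members) - min(members)

print(round(ensemble_mean, 1))  # most likely final position
print(round(spread, 2))         # uncertainty across members
```

Real hurricane ensembles work the same way at vastly larger scale: dozens of full atmospheric simulations, each perturbed slightly, summarized into a most likely track and an uncertainty cone.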
A simulation of Hurricane Ike on TACC's
Ranger supercomputer shortly before the storm
made landfall in Galveston, Texas, on Sept. 13,
2008. Credit: NOAA; Bill Barth, John Cazes,
Greg P. Johnson, Romy Schneider and Karl
Schulz, TACC
Large Eddy Simulation of the Near-Nozzle Region
of Jets Exhausting from Chevron Nozzles
Noise from jet engines causes hearing damage in the
military and angers communities near airports. With
funding from NASA, Ali Uzun (Florida State University)
is using Ranger to simulate new exhaust designs that may
significantly reduce jet noise.
One way to minimize jet noise is to modify the turbulent
mixing process using special control devices, such as
chevrons—triangle-shaped protrusions at the end of the
nozzle. Since noise is a by-product of the turbulent
mixing of jet exhaust with ambient air, one can reduce the
noise by modifying the mixing process.
To determine how a given design would react to high-speed
jet exhaust, Uzun first created a computer model of
the chevron-shaped exhaust nozzle. This was then
integrated into a parallel simulation code that calculated
the turbulence of the air as exhaust was forced through
the nozzle. Uzun’s simulations had unprecedented
resolution and detail. They proved that computational
simulations can match experimental results, while
supplying much more detailed information about minute
physical processes.
A picture depicting a two-dimensional cut through the
jet flow. The picture visualizes the turbulence in the jet
flow and the resulting noise radiation away from the jet.
Ranger Project Costs
• NSF Award: $59M
– Purchases full system, plus initial test equipment
– Includes 4 years of system maintenance
– Covers 4 years of operations and scientific support
• UT Austin providing power: $1M/year
• UT Austin upgraded data center infrastructure: $10-15M
• TACC upgrading storage archival system: $1M
• Total cost: $75-80M
– Thus, system cost > $50K/operational day
– Must enable user to conduct world-class science every day!
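The cost-per-day figure follows from the totals above. A rough sketch, assuming the four operational years of the award:

```python
# Rough cost-per-operational-day behind the ">$50K/day" figure.
TOTAL_COST = 75e6           # low end of the $75-80M total
OPERATIONAL_DAYS = 4 * 365  # four years of operations

cost_per_day = TOTAL_COST / OPERATIONAL_DAYS
print(round(cost_per_day))  # 51370
```

Even at the low end of the cost range, every idle day forfeits more than $50K of investment.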
Ranger-Era TeraGrid HPC Systems
Big Deployments Always Have Challenges
• We’ve gotten extremely good in bringing in large deployments
on time, but it is not an easy process.
– Impossible to rely solely on vendors; it must be a cooperative process.
• Ranger slipped several months, and was changed from the
original proposed plan:
– Original 2 phase deployment scrapped in favor of a single larger phase.
– Several “early product” design flaws detected and corrected through the
course of the project.
Cable Manufacturing Defect
Illustration of example problematic InfiniBand 12X cables as a result of kinks
imposed by the initial manufacturing process: (left) dismantled cable with
inner foil removed and (right) cracked twinax as seen through a microscope.
Ranger: Circa 2007
Ranger Lives On
• 20 Ranger cabinets have been sent to CHPC for
distribution to South African Universities
• 16 more racks have been shipped to Tanzania.
• 4 racks are awaiting shipment to Botswana
• Other components are at Texas A&M, Baylor College of
Medicine, ARL (UT classified facility).
• Original Ranger user community now migrated to Stampede
• After a remarkably successful production run, Ranger will
continue to deliver science and educate HPC
researchers around the world.
Ongoing Partnerships
• We at TACC are eager to use Ranger as a
basis for building sustained and meaningful partnerships.
• Hardware is a start (and there is always the
*next* system) but training, staff
development, data sharing, etc. provide new
opportunities as well.