SEMCO SIG-Computing:
Compute Clusters—Building Blocks of the
Public Cloud
Innovation Intelligence®
Jeff Marraccini, Vice President, Computer Systems
jeff@altair.com
14-Dec-2014
Copyright © 2014 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.
Overview of today’s talk
• Why clusters?
• Some history
• “Private cloud” clusters
• Architecture
• Failures
• The Virtual Machine era
• “Public cloud” clusters
• Facebook and the Open Data Center
• Appliance Computing
• Resources to learn more
Why clusters? And what’s the big deal?
• Mainframe costs
• Microcomputer performance and Moore’s Law
• Networking + computers + “cluster stack” = often vast power
• What do we do with these 3-5 year old computers on a 7-10 year budget
cycle?
• Sony PlayStations and Apple Xserves 
Universities, government agencies, companies,
and basements near you…
• They got us started…
  • NASA (you may be using an Ethernet driver based on their work!)
  • NSA fed back scalability ideas (!!)
  • Universities worldwide – open source contributions
  • Military projects
  • The Beowulf Project
  • Basement clusters
  • RenderMan users
    • LucasFilm
    • MASSIVE (Peter Jackson/WETA Digital!)
• Older operating systems: Digital VAX/VMS & OpenVMS, some UNIX, Microsoft Windows Server Clusters
What do they do?
• Scientific and engineering computing – the start of it all
• Render farms – special effects for movies, TV, commercials, games, live
TV and sports overlays…
• Media conversion (YouTube!)
• Web services
• E-Mail
• Databases, “Big Data”
• Storage
• Building and testing software!
• Social media (combining a lot of the above)
• Cracking passwords, encryption
• Neural networks / expert systems / IBM WATSON
Some of the largest clusters are…
• 10’s-100’s of thousands of cores
  • NSA (probably), along with other governments’ security arms
  • Other classified installations
  • CERN
  • TOP500 supercomputers
  • Public clouds (Google, Amazon, Microsoft, Rackspace, IBM, others)
• 1’s-10’s of thousands of cores
  • Square Kilometre Array (Australia / South Africa, just got back from there)
  • Weather forecasting
  • Japan’s Earth Simulator project (early 2000’s)
  • Government labs
  • Large organizations (corporate, universities, “smaller” public cloud providers)
• Small businesses often have dozens to hundreds of cores, and may not realize it if leasing public cloud services!
Software development for clusters & architecture
• Message passing (MPI), can achieve huge scales
• Shared memory with proprietary interconnect (Some Cray, NEC, SGI
Altix)
• Single system image (Beowulf, openMosix, some Cray, NEC, SGI Altix)
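The message-passing model can be sketched on a single machine. Below is a minimal illustration using Python's multiprocessing as a stand-in for real MPI (an actual cluster code would use MPI_Send/MPI_Recv or mpi4py across nodes; `worker` and `parallel_sum` are hypothetical names for illustration only):

```python
from multiprocessing import Process, Queue

def worker(rank, chunk, results):
    # Each "node" computes a partial result and sends it back as a message.
    results.put((rank, sum(chunk)))

def parallel_sum(data, nprocs=4):
    """Split data across nprocs workers; collect partial sums via messages."""
    results = Queue()
    chunks = [data[i::nprocs] for i in range(nprocs)]  # round-robin split
    procs = [Process(target=worker, args=(r, chunks[r], results))
             for r in range(nprocs)]
    for p in procs:
        p.start()
    total = sum(results.get()[1] for _ in procs)  # gather one message per worker
    for p in procs:
        p.join()
    return total

if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))  # 499500
```

The key property, as on a real cluster, is that workers share no memory: everything crosses the "fabric" (here, a queue) as explicit messages.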
“Private Cloud”
• Internal use clusters
• Sometimes accessible via remote access, Virtual Private Networks
• “Secret sauce” behind internal tools
• Requires forging the networking, storage, and computing teams into one
• Oracle 10g databases were often the first exposure
• Scalable internal storage (EMC Isilon, ExaGrid, HP 3PAR, etc.)
Altair’s Internal Clusters
• We use PBS Professional for all (it’s our product!)
• HyperWorks Unlimited – “cluster in a box” – several around the world,
hundreds to 2048 cores
• Legacy “E-Compute” & Compute Manager (newer) – several clusters of
a couple hundred cores each
• HyperWorks build/QA – several hundred cores, Windows, Linux, Mac,
Michigan and India, 50+ compilations and thousands of tests daily
• Test clusters – 128-256 cores, often restaged
High Availability Private Cluster Block Diagram
• Firewall
  • Protects often-unpatched cluster software and firmware
  • Load balancer
  • Remote access
• Head Nodes (1, 2)
  • Authentication, scheduling, staging, reloading, push notifications, periodic checkpointing
• Switch Fabrics (1, 2)
  • InfiniBand, Myrinet, 1/10/40/100Gb Ethernet, proprietary (Cray!)
• Execution Nodes (1…N)
  • Local storage
• Shared Storage Pools
  • Staging
  • Checkpoints
(Photo: 256-core SuperMicro TwinBlade chassis w/ 100TB storage, QDR InfiniBand, no HA)
A regular cluster (or a basement one!)
• Head Node
  • Authentication, scheduling, staging, (reloading, push notifications, periodic checkpointing)
• Cluster fabric(s)
  • Ethernet switch
  • InfiniBand switch
  • Storage Area Network
• Execution Nodes
  • 1 … N (could be varying hardware)
  • Local storage (maybe!)
• Shared Storage Pools
  • Staging
  • Checkpoints (maybe!)
  • Could be FreeNAS, NFS over ZFS, Lustre…
Could well ALL be running on a single virtual machine hypervisor!
The Fabric – cluster scaling and speed
• InfiniBand
• Myrinet
• PCIe
• Ethernet
• Proprietary (CrayLink and others)
• Virtual network switches
(Photo: 256-core SGI half rack, QDR InfiniBand, Nvidia GPUs, no HA)
Storage
• Varying needs = varying capacities (CFD, “crash”, chemistry, atmospherics, password cracking…)
• Cluster storage is HARD, especially scale-out
• Reliability – high availability often more than doubles the cost
• Local storage limits (blades)
• Spinning it down = complex
Management
• Staging nodes – potentially thousands
• Herding cats = scheduling different user communities’ requirements
• Failures and recovery
• Staging jobs in/out
• Push notifications
• Portals
• Continuous resource monitoring
• Checkpointing
• Energy efficiency
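The scheduling bullet above can be illustrated with a toy first-come-first-served placement loop. This is a sketch only (hypothetical names and a single-node-per-job simplification); a real workload manager such as PBS Professional adds priorities, fairshare, backfill, and preemption:

```python
from collections import deque

def fifo_schedule(jobs, free_cores):
    """Place queued jobs on nodes in strict arrival order.

    jobs: list of (job_name, cores_needed), in submission order.
    free_cores: dict mapping node name -> available cores (mutated in place).
    Returns (placements, still_waiting). Toy model: a job must fit on one node.
    """
    queue = deque(jobs)
    placements = []
    while queue:
        name, need = queue[0]
        # First-fit: pick the first node with enough free cores.
        node = next((n for n, c in free_cores.items() if c >= need), None)
        if node is None:
            break  # strict FIFO: the blocked head holds up everything behind it
        queue.popleft()
        free_cores[node] -= need
        placements.append((name, node))
    return placements, list(queue)
```

The "herding cats" problem shows up immediately: in strict FIFO one large job idles the whole cluster, which is exactly why production schedulers use backfill to slot small jobs into the gaps.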
When it breaks
• Nodes will fail
• We have hardware failures every week; bigger clusters may see hourly
failures or even more
• Checkpointing = costly in storage and processing time
• Restoring from a checkpoint
• Restaging
• Job migration
• Jeff’s “I meant to type 11 and typed 1” glitch
• The dreaded faulty InfiniBand cable
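Application-level checkpointing, as described on this slide and the management slide, boils down to "periodically persist the state, and resume from the last saved state after a failure." A minimal Python sketch (the file name and the state dictionary layout are made up for illustration):

```python
import os
import pickle
import tempfile

def save_checkpoint(path, state):
    # Write atomically: a crash mid-write must not corrupt the last good checkpoint.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)

def run(path, total_steps, checkpoint_every=100):
    """Resume from the last checkpoint if one exists, else start fresh."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"step": 0, "acc": 0}
    while state["step"] < total_steps:
        state["acc"] += state["step"]      # stand-in for the real computation
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state
```

The slide's cost caveat is visible even here: every checkpoint is a full serialization of the state, so on real jobs the interval is a trade-off between storage/IO cost and how much work a restart throws away.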
The Virtual Machine Cluster
• Great way to demo cluster software
• SIMH & OpenVMS (Jeff’s VMS cluster on a Surface Pro 2 tablet)
• Virtual network switches
• “Pull” the virtual network cable
• Test your upgrades
• Learn without spending $50,000
• Hypervisors add I/O latency 
• Fabric support limited
• = Scalability limited
“I’m out of oomph” -> BURSTING
• “Promise” of the Public Cloud
• Credit card financed computing
• Possibly loosely coupled
• Fabric compromises
• Getting better!
(Diagram: Internal Cluster → VPN to Amazon AWS / Microsoft Azure → Cloud Execution Nodes, Cloud storage, Cloud fabric)
Spread out clusters
• May be in the “Public Cloud” or at multiple “Private Cloud” sites
(research centers, remote data centers, leased private capacity)
• Redundancy – Google File System and similar systems quickly replicate
object data and store archival copies
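The redundancy bullet can be sketched as k-way replication: each object is written to k nodes, so losing a node loses no data. This is only a toy model of the idea, with invented names, not GFS's actual design or API:

```python
import hashlib

def replica_nodes(key, nodes, k=3):
    """Pick k distinct nodes for an object, deterministically from its key."""
    ranked = sorted(nodes,
                    key=lambda n: hashlib.sha256((key + n).encode()).hexdigest())
    return ranked[:k]

class ReplicatedStore:
    """Toy object store: every put lands on k replicas; get reads any live one."""

    def __init__(self, nodes, k=3):
        self.k = k
        self.nodes = list(nodes)
        self.disks = {n: {} for n in nodes}   # node -> {key: blob}
        self.down = set()                     # simulated failed nodes

    def put(self, key, blob):
        for n in replica_nodes(key, self.nodes, self.k):
            self.disks[n][key] = blob

    def get(self, key):
        # Any live replica can serve the read.
        for n in replica_nodes(key, self.nodes, self.k):
            if n not in self.down and key in self.disks[n]:
                return self.disks[n][key]
        raise KeyError(key)
```

Real systems add the part this sketch omits: detecting the dead node and re-replicating its objects elsewhere to restore the replica count, which is where the slide's latency and dark-fiber concerns come in.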
• Scalability
• Lots of “dark fiber” available for leasing
• Latency sensitivity
Facebook and Open Compute Project
• Mainly useful for big organizations
• Power efficiency, reduced impact
  • Shared power supplies
  • Optimized cooling
  • Storage & node spin-down
• Designed to fail and be easily serviceable
• Quick upgrades
• Scalability beyond conventional designs
• Might slow commodity server price drops as commodity volumes decrease
• http://www.opencompute.org/
Appliances and Platform as a Service (PaaS)
• “Cluster in a box” (well, racks!)
• Bursting
• Project-based computing
• Nimble
• Geek skills embedded
• Easy portal / front ends
Where do we go from here?
• Public library access to Lynda.com – Amazon AWS & Microsoft Azure
“Up and Running” courses
• Install SIMH and set up a hobbyist OpenVMS cluster!
https://vanalboom.org/node/18
• OpenStack on virtual machines: http://www.openstack.org/ and
http://docs.openstack.org/developer/devstack/#quick-start
• Example appliance: http://www.altair.com/hwul/
• PBS Professional, IBM LSF, GridEngine, other cluster mgmt. software
• Lustre storage free software - http://wiki.lustre.org/
• Aside from security, the ability to build and maintain private and public
cluster systems is near the top of the pay scale in IT!