High Performance Cluster (HPC)

The Emory Life Physical Sciences cluster (ELLIPSE), a 256-node, 1024-CPU, high-performance Sun computing cluster, is the latest arrival in the Emory High Performance Compute Cluster (EHPCC), a subscription-based, shared resource for the University community that is managed by the High Performance Computing (HPC) Group.
The addition of ELLIPSE to existing campus computational resources, such as those at
the Biomolecular Computing Resource (BIMCORE) and Cherry Emerson Center for
Scientific Computation, moves the University into the top tier of institutions for
conducting computational investigations in neural simulation, genomic comparison,
biological sequence analysis, statistical research, algorithm research and development,
and numerical analysis.
Computing clusters provide a reasonably inexpensive method to aggregate computing
power and dramatically cut the time needed to find answers in research that requires the
analysis of vast amounts of data. Eight and one-half hours of ELLIPSE operation is equivalent to an entire year of 24-hour days on a fast desktop, and four to five days is equivalent to two months on its 128-CPU predecessor, which is still in service.
Additional Information:
Keven Haynes, khaynes@emory.edu
HPC FAQs:
What does High Performance Computing (HPC) mean?
• Computing used for scientific research
• Performs highly calculation-intensive tasks (e.g., weather forecasting, molecular modeling, string matching)
• A large collection of computers, connected via a high-speed network or fabric
• Uses multiple CPUs to distribute computational load and aggregate input/output
• Computation runs in parallel
• Work managed via a “job scheduler”
HPC Cluster
• 256 dual-core, dual-socket AMD Opteron-based compute nodes (1024 cores total)
• 8 GB RAM per node, 2 GB RAM per core
• 250 GB local storage per node
• ~8 TB global storage (parallel file system)
• Gigabit Ethernet, with a separate management network
• 11 additional servers
Cluster Nodes
• 256 Sun x2200s
• AMD Opteron 2218 processors
• CentOS 4 Linux (whitebox Red Hat)
• 8 GB DDR2 RAM (except “Fat” nodes with 32 GB RAM), local 250 GB SATA drive
• Single gigabit data connection to the switch
• Global file system (IBRIX) mounted
Networking
• Separate data and management networks
• Data network: Foundry BI-RX 16
• Management network: 9 Foundry stackables
• MRV console switches
Storage
• Global, parallel file system: IBRIX
• Sun StorEdge 6140, five trays of 16 15K rpm FC drives, connected via 4 Gb/s Fibre Channel
• Five Sun x4100 file-system servers: one IBRIX Fusion Manager and four Segment servers with four bonded Ethernet connections
IBRIX file system
• Looks like an ext3 file system because it is one (not NFSv4); IBRIX works by segmenting ext3
• Scales horizontally to thousands of servers and hundreds of petabytes
• Efficient with both small and large I/O
• Partial online operation, dynamic load balancing
• Will run on any Linux hardware
The scheduler
• Users submit work to the cluster via SGE (the ‘qsub’ command) and ssh (see the sketch after this list)
• SGE can manage up to 200,000 job submissions
• Distributed Resource Management (DRM)
• Policy-based resource allocation algorithms (queues)
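As a rough sketch of what a submission can look like (the script name, job name, and resource values below are illustrative placeholders, not site-mandated settings), a user would typically write a small batch script with SGE directives and hand it to qsub after logging in over ssh:

    #!/bin/bash
    # myjob.sh - hypothetical SGE batch script; names and values are examples only
    #$ -N example_job            # job name reported by qstat
    #$ -q all.q                  # target queue (all.q is the default/general queue)
    #$ -cwd                      # run the job from the submission directory
    #$ -l h_cpu=3600             # request 1 hour of CPU time, well under all.q's 12-hour cap
    #$ -o example_job.out        # standard output file
    #$ -e example_job.err        # standard error file

    ./my_analysis input.dat      # placeholder for the actual computation

The job is then submitted with 'qsub myjob.sh' and monitored with 'qstat -u $USER'; SGE assigns a job ID and schedules the work according to the queue policies described under Rate Structure below.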
Applications
• MATLAB (see the sketch after this list)
• Geant4
• Genesis (Neuroscience)
• Supported soon:
  o iNquiry (BioInformatics)
  o GCC compilers (soon: PGI compilers)
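As a hedged illustration of how one of these applications might be run in batch, the sketch below wraps a MATLAB run in an SGE script; the .m script name is a placeholder, and the exact way MATLAB is installed or put on the PATH on the cluster may differ:

    #!/bin/bash
    # matlab_job.sh - hypothetical non-interactive MATLAB run under SGE
    #$ -N matlab_example
    #$ -q long.q                 # the inexpensive queue with no time limits
    #$ -cwd

    # Start MATLAB without its GUI and quit when the script finishes.
    # 'my_analysis' stands in for a user's own my_analysis.m file.
    matlab -nodisplay -nosplash -r "my_analysis; exit"

Submitting this with 'qsub matlab_job.sh' queues the MATLAB run like any other batch job.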
Performance
• Estimated ~3 teraflops at 80% efficiency (theoretical)
• Achieved 2 GB/sec writes over the network
• 10 minutes of cluster operation ≈ 7 days on a fast desktop
• 8.5 hours of cluster operation ≈ an entire year of 24-hour days on a fast desktop (a quick consistency check follows this list)
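Both desktop comparisons work out to roughly a thousand-fold speedup over a single fast desktop. That figure is inferred from the numbers above rather than quoted from the HPC Group, and the arithmetic can be checked directly in a shell:

    # Back-of-the-envelope check of the two desktop comparisons (inferred, not an official figure)
    echo $(( 365 * 24 * 60 / 510 ))   # minutes in a year / 510 cluster minutes (8.5 h) -> ~1030
    echo $(( 7 * 24 * 60 / 10 ))      # minutes in 7 desktop days / 10 cluster minutes  -> ~1008

The two claims are therefore consistent with each other to within a few percent.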
Rate Structure
The "long.q" cluster queue ($.025/CPU hour)
 Currently spans all compute nodes
 All users can access this queue
 No hard or soft job/time limits
 1 job slot per compute node at this time
 All jobs are run with a Unix NICE level of +15
 Queue suspend threshold of NP_LOAD_AVG=1.75
 Queue instance will close/subordinate if there are 1 or more express jobs
running
 Queue instance will close/subordinate if there are 2 or more general jobs
running
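Sending work to long.q is just a matter of naming the queue at submission time; myjob.sh below is the same hypothetical script sketched in the scheduler section:

    # Target the cheap, unlimited long.q instead of the default queue.
    # long.q jobs run niced (+15), so they yield the CPU to express and general work.
    qsub -q long.q myjob.sh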
The "all.q" cluster queue ($.03/CPU hour)
 This is the default or "general" queue
 Spans all compute nodes





No access control list
Queue suspend threshold of NP_LOAD_AVG=1.75
4 job slots per compute node at this time
12 hour hard limit (h_cpu = 43200) on all jobs
Queue instance will close/subordinate if there is 1 or more express job running
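Because all.q is the default, a plain qsub lands there; the main thing to respect is the 12-hour CPU cap, which a job can acknowledge by requesting h_cpu explicitly (the values and script name below are placeholders):

    # Default submission goes to all.q; jobs hitting h_cpu = 43200 seconds (12 hours) are terminated.
    qsub myjob.sh
    # Optionally state the expected CPU time up front, e.g. 8 hours:
    qsub -l h_cpu=28800 myjob.sh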
The "express.q" cluster queue ($ .05/CPU hour)
 Spans all compute nodes
 Has an access control list (currently only expressUsers user list can access)
 Queue suspend threshold of NONE
 2 job slots per compute node at this time hour hard limit (h_cpu = 7200) on all
jobs
 Not subordinate to any other queue
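For users on the expressUsers list, an express run is again just a queue selection plus a CPU-time request under the 2-hour cap; the script name and value below are illustrative:

    # express.q is never suspended or subordinated, but jobs must finish within h_cpu = 7200 seconds.
    qsub -q express.q -l h_cpu=7000 myjob.sh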
Additional Resources
BIMCORE
http://www.bimcore.emory.edu/
Contact: Steve Pittard, wsp@emory.edu
Cherry Emerson Center for Scientific Computation
http://www.emerson.emory.edu/
Contact: