Clusters in Molecular
Sciences Applications
Serguei Patchkovskii@#, Rochus Schmid@, Tom Ziegler@,
Siu Pang Chan#, Andrew McCormack#, Roger Rousseau#, Ian Skanes#

@ Department of Chemistry, University of Calgary, 2500 University Dr. NW,
  Calgary, Alberta, T2N 1N4, Canada
# Theory and Computation Group, SIMS, NRC, 100 Sussex Dr., Ottawa,
  Ontario, K1A 0R6, Canada

2nd Annual iHPC Cluster Workshop, Ottawa, January 11, 2002
Overview
• Beowulf-style clusters have entered the mainstream
• Are clusters a lasting, efficient investment?
• Odysseus: an internal cluster at the SIMS
theory group
• Clusters in molecular science applications:
software availability and performance
• Three war stories, and a cautionary message
• Summary and conclusions
Shared, Academic Clusters in Canada

Location               CPUs            URL or other info
Carleton U.            8xPII-400       www.scs.carleton.ca/~gis/
UBC                    256xPIII-1000   www.gdcfd.ubc.ca/Monster
U of Calgary           179xAlpha       www.maci-cluster.ucalgary.ca
U of Western Ontario   144xAlpha       GreatWhite.sharcnet.ca
U of Western Ontario   48xAlpha        DeepPurple.sharcnet.ca
McMaster U             106xAlpha       Idra.physics.mcmaster.ca
U of Guelph            120xAlpha       Hammerhead.uoguelph.ca
U of Windsor           8xAlpha
Wilfrid Laurier U      8xAlpha
Canadian top-500 facilities

[Chart: Canadian entries in the Top-500 list, with the cluster entries highlighted]
Internal, “workhorse” clusters

Location                              CPUs             URL or other info
U of Alberta                          98xPIII-450      www.phys.ualberta.ca/THOR
U of Calgary                          94x21164-500     www.cobalt.chem.ucalgary.ca
U of Calgary                          120xPIII-1000    www.ucalgary.ca/~tieleman/elk.html
U of Calgary                          32xPIII
Memorial U                            32xPII-300       weland.esd.mun.ca
MDS Proteomics                        400xPIII-1000    www.mdsproteomics.com
ICPET, NRC                            80xPIII-800
DRAO, NRC                             16xPII-450
SIMS, NRC                             32xPIII-933
Samuel Lunenfeld Research Institute   224xPIII-450     Bioinfo.mshri.on.ca/yac/
Sherbrooke U                          64xPII-400
U of Saskatchewan                     12xAthlon-800    Sasquatch.usask.ca
Simon Fraser U                        16xPIII-500      www.sfu.ca/acs/cluster/
U of Victoria                         39xPIII-450      Pingu.phys.uvic.ca/muse/ (?)
McMaster U                            32xPIII-700      www.cim.mcgill.ca/~cvr/beowulf/
CERCA, Montreal                       16xAthlon-1200   www.cerca.umontreal.ca/~fourmano/
U of Western Ontario                  various          www.baldric.uwo.ca
Clusters are everywhere
Lemma 1: A computationally-intensive research group
in Canada is in one of three states:
a) It owns a cluster, or
b) It is building a cluster, or
c) It plans to build a cluster RSN

Clusters have become a mainstream research tool: useful,
but no longer automatically worthy of a separate mention
Cobalt: Hardware
[Diagram: “Computers on benches, all linked together”: Node 1 … Node 93
connected to a switch stack over 93x100BaseTx links; the switch reaches
the outside world over 100BaseTx (half-duplex). Per node: 2x100BaseTx,
128 Mbytes memory; 18 Gbytes RAID-1 (4 spindles).]
Cobalt: Nodes and Network
Node: Digital/Compaq Personal Workstation 500au
  CPU           Alpha 21164A, 500 MHz
  Cache         96 Kbytes on-chip (L1 and L2)
  Peak flops    10^9 Flop/second
  SpecInt 95    15.7 (estimate)
  SpecFP 95     19.5 (estimate)

Network: 4 x 3COM SuperStack II 3300
  Peak aggregate b/w         500.0 MB/s
  Peak internode b/w (TCP)   11.2 MB/s
  NFS read/write             3.4/4.1 MB/s
  Round-trip (TCP)           360 μs
  Round-trip (UDP)           354 μs
Cobalt: Software
OS, communications, and cluster management:
Base OS: Tru64, using DMS, NIS, and NFS
Compilers: Digital/Compaq C, C++, Fortran
Communications: PVM, MPICH
Batch queuing: DQS
Application software:
ADF: Amsterdam Density Functional (PVM)
PAW: Projector-Augmented Wave (MPI)
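Both application codes sit on a message-passing layer (PVM or MPICH). Purely as an illustrative sketch, not taken from the talk: a minimal MPI program of the kind MPICH compiles with mpicc and launches with mpirun.

    /* Minimal MPI program; hypothetical illustration, not from the slides.
       Build: mpicc -o hello hello.c    Run: mpirun -np 4 ./hello        */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);               /* start the MPI runtime     */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's rank       */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of processes */
        printf("process %d of %d reporting\n", rank, size);
        MPI_Finalize();                       /* shut down cleanly         */
        return 0;
    }

Each rank prints one line; in production, the batch system supplies the node allocation rather than a hand-written host list.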
Cobalt: Return on the Investment
Investment (dollars):
  Total cost                     390,800
  … including:
    Initial purchase             346,000
    Operating (’98-’01):
      power (6¢/kWh)              15,800
      admin (20% PDF)             24,000
      spare parts                  5,000

Payback (research articles):
  Total publications                  92
  … including:
    Organometallics                   21
    J. Am. Chem. Soc.                 12
    J. Phys. Chem.                    11
    J. Chem. Phys.                    10
    Inorg. Chem.                       6

ROI: 1 publication / $4,250
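The headline ROI is simply the total cost divided by the article count:

    $390,800 / 92 articles ≈ $4,250 per article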
Odysseus: Low-tech solution for high-tech problems (1)
Odysseus: Low-tech solution for high-tech problems (2)
Nodes (16+1)
ABIT VP6 motherboard
2xPIII-933, 133MHz FSB
4x256Mbytes RAM
3COM 3C905C
36Gb 7200rpm IDE
… plus, on the front end:
Intel PRO/1000
Adaptec AHA-2940UW
60Gb 7200rpm IDE
Odysseus: Low-tech solution for high-tech problems (3)

Network: SCI + 100Mbit
• Dolphin D339 (2D SCI), with horizontal and vertical rings
• HP Procurve 2524 + 1Gig
Odysseus: Low-tech solution for high-tech problems (4)
Backup unit:
VXA tape (www.ecrix.com)
35Gbytes/cartridge (physical)
TreeFrog autoloader
(www.spectralogic.com)
16 cartridge capacity
UPS Unit:
Powerware 5119
2880VA
Odysseus: Low-tech solution for high-tech problems (5)

[Photo: the assembled cluster, on four little wheels]

Odysseus at a glance:
  Processors    32 (+2)
  Memory        16 Gbytes
  Disk          636 Gbytes
  Peak flops    29.9 GFlop/s
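The quoted peak follows from the clock rate alone, assuming (our reading; the slide does not spell it out) one floating-point operation per cycle on each of the 32 compute CPUs, the +2 front-end CPUs excluded:

    32 × 933 MHz × 1 flop/cycle ≈ 29.9 GFlop/s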
Odysseus: cost overview
Expense                                         Dollars
Nodes                                            40,640
SCI network (cards & cables)                     26,771
Backup unit (tape + robot)                        5,860
Spare parts in stock                              5,024
Ethernet (switch, cables, and head node link)     4,190
Compiler (PGI)                                    3,780
UPS                                               2,265
Backup tapes (16+1)                               1,911
Total:                                           90,441
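As a sanity check, the line items sum exactly to the quoted total:

    40,640 + 26,771 + 5,860 + 5,024 + 4,190 + 3,780 + 2,265 + 1,911 = 90,441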
Clusters in molecular science – software availability

• Gaussian        • ADF
• Turbomole       • PAW
• GAMESS          • CPMD
• NWChem          • AMBER
• GROMOS          • VASP
                  • PWSCF
                  • ABINIT
Software: ADF
ADF – Amsterdam Density Functional (www.scm.com)

[Plot: speedup vs. number of Cobalt nodes]

Example: Cr(N)Porph, full geometry optimization
• 38 atoms
• 580 basis functions
• C4v symmetry
• 45 Mbytes of memory
• Serial time: 683 minutes
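Speedup here, and on the benchmark slides that follow, is taken in the usual sense (the slides do not define it explicitly):

    S(N) = T_serial / T(N),    with parallel efficiency E(N) = S(N) / N

With T_serial = 683 minutes for this example, a speedup of 8 on 8 nodes would correspond to roughly 85 minutes of wall-clock time.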
Software: PAW
PAW – “Projector-Augmented Wave”
(www.pt.tu-clausthal.de/~ptpb/PAW/pawmain.html)

[Plot: speedup vs. number of Cobalt nodes]

Example: SN2 reaction, CH3I + [Rh(CO)2I2]-
• 11 Å unit cell
• Serial time per step: 83 seconds
• Memory: 231 Mbytes
Software: CPMD
CPMD – Car-Parrinello Molecular Dynamics
(www.mpi-stuttgart.mpg.de/parinello/)

[Plot: benchmark timings on odysseus]

Example: H in Si64
• 65 atoms, periodic
• 40 Ryd cut-off
• Geometry opt. (2 steps) + free MD (70 steps)
Software: AMBER
AMBER – “Assisted Model Building with Energy Refinement”
(www.amber.ucsf.edu/amber/)

[Plot: time (hours) vs. Ncpu]

Example: 22-residue polypeptide + 4 K+ + 2500 H2O, 1 ns MD
Software: VASP
VASP – Vienna Ab-initio Simulation Package (cms.mpi.univie.ac.at/vasp/)

[Plot: benchmark timings on odysseus]

Example: Li198 at 1000 GPa
• 300 eV cut-off
• 9 k-points
• 10 WF optimization steps + stress tensor
Software: PWSCF
PWSCF and PHONON – plane-wave pseudopotential codes,
optimized for phonon spectra calculations (www.pwscf.org/)

[Plot: benchmark timings on odysseus]

Example: MgB2 solid, geometry opt.
• 40 Ryd cut-off
• 60 k-points
Software: ABINIT
ABINIT (www.mapr.ucl.ac.be/ABINIT/)

Example: SiO2 (stishovite)
• 70 Ryd cut-off
• 6 k-points
• 12 SCF iterations
War Story #1
Odysseus hardware maintenance log, Oct 19, 2001:
“Overnight, node 6 had a kernel OOPS … it responds to
network pings and keyboard, but no new processes can be
started …”

Reason:        Heat sink on CPU#1 became loose, resulting in
               overheating under heavy load
Resolution:    Reinstall the heat sink
Detected by:   Elevated temperature readings for CPU#1 (lm_sensors)
Downtime:      20 minutes (the affected node)
War Story #2
Odysseus hardware maintenance log, Nov 12, 2001:
“A large, 16-CPU VASP job fails with ‘LAPACK: Routine
ZPOTRF failed’, or gives random total energies”

Reason:        DIMM in bank #0 on node 17 developed a single-bit
               failure at address 0xfd9f0c
Resolution:    Replace the memory module in bank #0
Detected by:   Rerunning the failing job on different sets of nodes,
               followed by a memory diagnostic on the affected
               node (memtest32)
Downtime:      1 day (the whole cluster) + 2 days (the affected node)
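What a diagnostic like memtest32 automates is, at heart, a pattern write/read-back sweep. A toy single-pass version in C illustrates the principle (a hypothetical sketch; the real tool runs many more patterns, over all physical memory, outside a running OS):

    /* Toy single-pass memory pattern test; hypothetical illustration. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* Test a 64-Mbyte heap buffer; a real diagnostic walks all RAM. */
        const size_t n = 64ul * 1024 * 1024 / sizeof(unsigned);
        unsigned *buf = malloc(n * sizeof *buf);
        if (!buf) { perror("malloc"); return 2; }

        const unsigned patterns[] = { 0x00000000u, 0xffffffffu,
                                      0xaaaaaaaau, 0x55555555u };
        int bad = 0;
        for (size_t p = 0; p < sizeof patterns / sizeof *patterns; p++) {
            for (size_t i = 0; i < n; i++)   /* fill with the pattern  */
                buf[i] = patterns[p];
            for (size_t i = 0; i < n; i++)   /* read back and compare  */
                if (buf[i] != patterns[p]) {
                    printf("mismatch at offset 0x%zx: got 0x%08x, "
                           "expected 0x%08x\n",
                           i * sizeof *buf, buf[i], patterns[p]);
                    bad = 1;
                }
        }
        free(buf);
        return bad;   /* non-zero exit when any mismatch was seen */
    }

A report of one fixed address failing repeatedly, as in this story, points at a bad DIMM rather than a flaky CPU or interconnect.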
War Story #3
Odysseus hardware maintenance log, Dec 10, 2001:
“Apparently random application failures are observed”

Reason:        Multiple single-bit memory failures on the nodes
               (bank #): 6 (#2), 7 (#2, #3), 8 (#0), 10 (#0), 11 (#0)
Resolution:    Replace the memory modules
Detected by:   Cluster-wide memory diagnostic (memtest32)
Downtime:      3 days (the whole cluster)
Cautionary Note
• Using inexpensive, consumer-grade hardware
potentially exposes you to low-quality components
• Never use components that have no built-in
hardware monitoring and error-detection capability
• Always configure your clusters to report corrected
errors and out-of-range hardware sensor readings
• Act on the early warnings
• Otherwise, you run the risk of producing garbage
science, and never knowing it
Hardware Monitoring with Linux
Category      Parameter                          Package
Motherboard   Temperature; power supply          lm_sensors#
              voltage; fan status
Hard drives   Corrected error counts;            ide-smart$, S.M.A.R.T. Suite%
              impending failure indicators
Memory        Corrected error counts             ecc.o^
Network       Hardware-dependent

# http://www2.lm-sensors.nu/~lm78/
$ http://www.linux-ide.org/smart.html
% http://csl.cse.ucsc.edu/smart.shtml
^ http://www.anime.net/~goemon/linux-ecc/ (2.2 kernels only)
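The table above covers detection; acting on it takes very little code. A toy threshold check in C (a hypothetical sketch: the location and layout of lm_sensors /proc entries vary by chip and kernel version, so the sensor file and the limit are passed as arguments rather than hard-coded):

    /* Toy sensor-threshold check; hypothetical illustration.
       Usage: sensorchk <sensor-file> <limit>
       Exits non-zero when the reading exceeds the limit.     */
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <sensor-file> <limit>\n", argv[0]);
            return 2;
        }
        FILE *f = fopen(argv[1], "r");
        if (!f) { perror(argv[1]); return 2; }

        /* Read all numeric fields; keep the last one, assumed here
           (chip-dependent assumption) to be the current reading.  */
        double value = 0.0, last = 0.0;
        while (fscanf(f, "%lf", &value) == 1)
            last = value;
        fclose(f);

        double limit = atof(argv[2]);
        if (last > limit) {
            printf("WARNING: %s reads %.1f (limit %.1f)\n",
                   argv[1], last, limit);
            return 1;   /* non-zero: let cron mail the warning */
        }
        return 0;
    }

Run periodically from cron on each node, the WARNING line in the mailed output gives exactly the kind of early notice the previous slide asks you to act on.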
Summary and Conclusions
• Clusters are no longer a techno-geek’s toy, and will
remain the primary workhorse of many research
groups, at least for a while
• Clusters give an impressive return on investment,
and may remain useful longer than expected
• Many (most?) useful research codes in molecular
sciences are readily available on clusters
• Configuring and operating PC clusters can be tricky.
Consider a reputable system integrator with Beowulf
hardware and software experience