Parallel Computing in Chemistry

Brian W. Hopkins
Mississippi Center for Supercomputing Research
4 September 2008
What We’re Doing Here
• Define some common terms of parallel and
high-performance computing.
• Discuss HPC concepts and their relevance to
chemistry.
• Discuss the particular needs of various
computational chemistry applications and
methodologies.
• Briefly introduce the systems and applications
in use at MCSR.
Why We’re Doing It
• Increasing your research throughput
– More Data
– Less Time
– More Papers
– More Grants
– &c.
• Stop me if there are questions!
Some Common Terms
• Processors: x86, ia64, em64t, &c.
• Architectures: distributed vs. shared
memory.
• Parallelism: message-passing vs.
shared-memory parallelism.
Processor Types: Basics
• Integer bit length: the number of individual binary “bits”
used to store an integer in memory.
– 32-bit systems: 32 bits define an integer
• Signed: −2,147,483,648 to +2,147,483,647
• Unsigned: 0 to +4,294,967,295
– 64-bit systems: 64 bits define an integer
• Signed: −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807
• Unsigned: 0 to +18,446,744,073,709,551,615
• Because integers are used for memory addresses, array
indices, counters, and the like, integer bit length is an
important constraint on what a processor can do. (The
ranges above follow from the formula below.)
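
The ranges above follow from a standard formula, not spelled out on the original slide: an n-bit integer stored in two's-complement form spans

    -2^{\,n-1} \;\le\; x \;\le\; 2^{\,n-1} - 1 \qquad \text{(signed)}
    0 \;\le\; x \;\le\; 2^{\,n} - 1 \qquad \text{(unsigned)}

Plugging in n = 32 and n = 64 reproduces the limits listed above.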
Processor Types: In Use Today
• x86: Pentium, &c.
– The most common microprocessor family in history.
– Technically includes 16- and 64-bit processors, but “x86” is
most commonly used to describe 32-bit systems.
• em64t: Athlon 64, Opteron, Xeon, Core 2
– A 64-bit extension to the x86 family of processors.
– Sometimes referred to as x86-64 or amd64.
• ia64: Itanium, Itanium2
– A different, non-x86 compatible 64-bit architecture from Intel.
• mips: R12000, R14000
– 64-bit RISC-type processors used in SGI supercomputers
SUPER-computing
• Modern supercomputers typically consist of a
very large number of off-the-shelf
microprocessors hooked together in one or
more ways.
– clusters
– symmetric multiprocessors
• All commonly used processor types are in use
in current “super”-computers.
Supercomputers: Clusters
• The easiest and cheapest way to build a
supercomputer is to hook many individual computers
together with network cables.
• These supercomputers tend to have “distributed
memory”, meaning each processor principally works
with a memory bank located in the same case with it.
Clusters: The Interconnect
• With a cluster-type supercomputer, the performance of
the network connecting the nodes is critically
important.
• Network fabrics can be optimized for either latency or
bandwidth, but not both (see the cost model below).
• Due to the special needs of high-performance
clusters, a number of special networking technologies
have been developed for these systems.
– hardware and software together
– Myrinet, InfiniBand, &c.
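
A useful first-order cost model for interconnects (a textbook approximation, not from the original slide): sending an n-byte message takes roughly

    T(n) \approx \alpha + \frac{n}{\beta}

where \alpha is the per-message latency and \beta is the bandwidth. Codes that exchange many small messages are latency-bound; codes that move a few large blocks are bandwidth-bound. This is why a fabric tuned for one workload can disappoint on the other.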
Supercomputers: SMP Systems
• Alternatively, it is possible to custom-build a computer
case and circuitry to hold a large number of
processors.
• These systems are distinguished by having “shared”
memory, meaning that all or many of the processors
can access a huge shared pool of memory.
Supercomputers: Hybrid Systems
• Many modern supercomputers consist of a high-speed
cluster of SMP machines.
• These systems contain large numbers of distributed-memory
nodes, each of which has a set of processors
using shared memory.
Parallel Applications
• Each type of supercomputer architecture
presents its own unique programming
challenges.
• Cluster systems require parallel programs to
call special message-passing routines to
communicate information between nodes.
• Symmetric multiprocessing systems require
special care to prevent each spawned thread
from altering or deleting data that is needed by
the other threads.
• Hybrid systems typically require both of these.
Parallel Programming for Clusters
• The nodes in a cluster talk to each other via a
network connection.
• Because the network connection is a
bottleneck, it is important to minimize the
internode communication required by the
program.
• Most commonly, programs use a message-passing
library for communication between nodes.
Message Passing
• Cluster computing is extremely popular.
• To streamline programming for these
architectures, various libraries of standard
message passing functions have been
developed.
– MPI
– TCGMSG
– Linda, &c.
• These vary in portability and flexibility.
The Ubiquitous MPI
• The most popular message passing library is the
Message Passing Interface (MPI).
– used by AMBER, among many others
• Most companies marketing supercomputers also
market their own versions of MPI specially tuned to
maximize performance of the systems they sell.
– SGI’s MPT
– IBM MPI
– HP-MPI
• In principle, MPI functions are standardized, so any
program built to work with Intel MPI should also work
with HP-MPI. In practice…not so much.
• In addition, an open-source, portable MPI is available
through the MPICH project. (A minimal MPI example
follows below.)
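
As a concrete, if toy, illustration of message passing, here is a minimal MPI sketch in C. The "partial result" arithmetic is invented for the example; the MPI calls themselves are standard.

    /* Rank 0 collects one partial result from every other rank.
       Build with an MPI compiler wrapper, e.g.:  mpicc mpi_demo.c  */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank != 0) {
            double partial = (double) rank;  /* stand-in for real work */
            MPI_Send(&partial, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        } else {
            double partial, total = 0.0;
            int i;
            for (i = 1; i < size; i++) {
                MPI_Recv(&partial, 1, MPI_DOUBLE, i, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                total += partial;
            }
            printf("collected %f from %d workers\n", total, size - 1);
        }

        MPI_Finalize();
        return 0;
    }

Every explicit Send must be matched by a Recv on the other side; that bookkeeping is exactly the burden the libraries above try to ease.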
TCGMSG
• The TCGMSG library is a stripped-down message
passing library designed specifically for the needs of
computational chemistry applications.
– NWChem
– Molpro
• TCGMSG is generally more efficient than MPI for the
specific internode communication operations most
common in comp-chem programs.
• However, from a programmer’s point of view
TCGMSG is less flexible and capable than MPI.
• There is some chatter about TCGMSG falling into
disuse and becoming a legacy library.
A Word On Linda et al.
• Some software vendors want extra money to use their
software on a cluster rather than a single machine.
• The surest way to get that money is to build your very
own MP library, build the parallel version of your code
to work only with it, and then sell the MP library for
extra $$.
• Hence, Linda and similar MP interfaces.
• The nature of these libraries is such that you’re
unlikely to need to code with them, and can expect
system administrators to build and link them as
needed.
Programming for SMP Systems
• There are different approaches to programming
for these machines
– Multithreaded programs
– Message passing
• The most common approach to SMP
programming is the OpenMP interface
(see the sketch below).
– Watch out! “MP” here stands for “multiprocessing,”
not “message passing.”
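
A minimal OpenMP sketch in C, with the array and sum invented for illustration. The reduction clause is one example of the "special care" mentioned earlier: it gives each thread a private running total, so the threads cannot trample one another's data.

    /* Several threads share one array in memory.
       Build with an OpenMP flag, e.g.:  gcc -fopenmp omp_demo.c  */
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double x[N];
        double sum = 0.0;
        int i;

        /* Each thread handles a slice of i; reduction(+:sum) combines
           the threads' private partial sums at the end of the loop.  */
        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < N; i++) {
            x[i] = 1.0 / (i + 1.0);
            sum += x[i];
        }

        printf("sum = %f using up to %d threads\n",
               sum, omp_get_max_threads());
        return 0;
    }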
Programming for Hybrid Systems
• Hybrid systems have characteristics of both clusters
and SMP systems.
• The most efficient programs for hybrid systems have
hybrid design (sketched after this slide).
– multithreaded or OpenMP parallelism within a node
– message-passing parallelism between nodes
• Although hybrid systems are rapidly increasing in
popularity, true hybrid programming remains very rare.
• Consequently it’s common for coders working on
hybrid systems to use a pure message-passing
approach.
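
A minimal hybrid sketch in C, combining the ingredients of the two previous examples (the structure is illustrative, not any particular package's design): MPI between nodes, OpenMP threads within each node.

    /* Each MPI rank spawns OpenMP threads inside its node's shared
       memory.  Build with, e.g.:  mpicc -fopenmp hybrid_demo.c  */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, provided, nthreads = 1;

        /* FUNNELED: only the master thread will make MPI calls. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        #pragma omp parallel
        {
            #pragma omp master
            nthreads = omp_get_num_threads();
        }

        printf("MPI rank %d is running %d OpenMP threads\n",
               rank, nthreads);

        MPI_Finalize();
        return 0;
    }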
Parallel Efficiency
• Whenever you use multiple procs instead of one,
you incur two problems:
– Parts of a calculation cannot be parallelized.
– The processors must perform extra communication tasks.
• As a result, most calculations will require more
resources to do in parallel than in serial. (Amdahl’s
law, below, quantifies the first cost.)
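
The first of these costs is captured by Amdahl's law, a textbook result not stated on the original slide: if a fraction p of the work parallelizes perfectly and the rest is serial, the best possible speedup on N processors is

    S(N) = \frac{1}{(1 - p) + p/N}

With p = 0.95 and N = 100, S \approx 16.8: even a 95%-parallel code gets nowhere near the ideal 100x.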
Practical HPC, or, HPC and You
• Computational chemistry applications fall into
four broad categories:
– Quantum Chemistry
– Molecular Simulation
– Data Analysis
– Visualization
• Each of these categories presents its own
challenges.
Quantum Chemistry
• Lots of Q-Chem at UM and around the state
– UM: Tschumper, Davis, Doerksen, others
– MSU: Saebo, Gwaltney
– JSU: Leszczynski and many others.
• Quantum programs are the biggest consumer of
resources at MCSR by far:
– Redwood: 99% (98 of 99 jobs)
– Mimosa: 100% (86 of 86 jobs)
– Sweetgum: 100% (24 of 24 jobs)
Features of QC Programs
• Very memory intensive
– Limits of 32-bit systems will show
– Limits on total memory in a box will also show
• Very cycle intensive
– Clock speed of procs is very important
– Fastest procs at MCSR are in mimosa; use them!
• Ideally not very disk intensive
– Watch out! If there’s a memory shortage, QC programs will
start building read/write files.
– I/O will absolutely kill you.
– To the extent that I/O cannot be avoided, do NOT do it over
a network.
Project-Level Parallelism
• Quantum projects typically comprise many dozens
to thousands of individual calculations.
• Typically, the most efficient way to finish 100
jobs on 100 available processors is to run 100
1-proc jobs simultaneously.
• Total wall time needed increases with every
increase in individual job parallelism (see the
worked example below):
– 100x1 < 50x2 < 25x4 < 10x10 < 5x20 < 1x100
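
A worked example with invented but plausible numbers, reusing Amdahl's law from the Parallel Efficiency slide with p = 0.95: suppose each job takes time t on one processor. Then

    T_{100 \times 1} = t \qquad \text{(all 100 jobs at once, one proc each)}
    T_{1 \times 100} \approx 100 \cdot \frac{t}{S(100)} \approx \frac{100\,t}{16.8} \approx 6\,t

Running the jobs one at a time across all 100 processors takes roughly six times longer, and that is before counting any communication overhead.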
When to Parallelize a Job
• Some jobs simply will not run in serial in any
useful amount of time.
• These jobs should be parallelized as little as
possible.
• In addition, jobs that require extensive
parallelization should be run using special
high-performance QC programs.
Implications for Gaussian 03
• Do not allow G03 to choose its own calculation
algorithms.
– MP2(fulldirect), SCF(direct)
• Jobs that can run on mimosa ought to run there.
• Jobs that can run in serial ought to run in serial.
• Use local disk for unavoidable I/O.
– copy files back en masse after a job.
• For truly intensive jobs, consider moving to another
program suite.
– We’re here to help!
Molecular Simulation
• Simulation is not as common at MCSR as quantum
chemistry.
• Still, some simulation packages are available:
– AMBER
– NWChem
– CPMD
– CP2K
• And some scientists here do use them
– MedChem, Wadkins, &c.
Features of MD Programs
• At their core, MD programs perform a single, simple
calculation millions of times (see the cartoon loop below).
• As a result, MD programs consume a LOT of clock
cycles.
• These programs must constantly write large molecular
configurations to output, and so are often I/O
restricted.
• Memory load from ordinary MD programs is trivial.
• Communication load can be quite high: long-range
electrostatics.
• Almost all MD programs are MPI-enabled.
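
To make "a single, simple calculation millions of times" concrete, here is a cartoon of an MD force loop in C. It uses a bare Lennard-Jones pair force in reduced units and is not taken from any real package.

    /* O(N^2) pairwise force evaluation, repeated every timestep for
       millions of steps -- hence the enormous appetite for cycles.  */
    #define N 1000

    void forces(const double x[N][3], double f[N][3])
    {
        int i, j, k;

        for (i = 0; i < N; i++)
            for (k = 0; k < 3; k++)
                f[i][k] = 0.0;

        for (i = 0; i < N; i++) {
            for (j = i + 1; j < N; j++) {
                double dr[3], r2 = 0.0, r6, fr;
                for (k = 0; k < 3; k++) {
                    dr[k] = x[i][k] - x[j][k];
                    r2 += dr[k] * dr[k];
                }
                /* Lennard-Jones (force magnitude)/r, reduced units. */
                r6 = 1.0 / (r2 * r2 * r2);
                fr = 24.0 * r6 * (2.0 * r6 - 1.0) / r2;
                for (k = 0; k < 3; k++) {
                    f[i][k] += fr * dr[k];  /* Newton's third law:  */
                    f[j][k] -= fr * dr[k];  /* equal and opposite   */
                }
            }
        }
    }

Real codes add neighbor lists, cutoffs, and Ewald-type long-range electrostatics, but the shape of the work is the same.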
Parallelizing Simulations
• MD simulations generally parallelize much more
readily than QC jobs.
• Also, individual MD sims tend to take much longer
than individual QC jobs: 1 sim vs. 100s of EPs.
• As a result, simulations are almost always run in
parallel.
Notes on Parallel Simulations
• Simulations scale better with increasing system size
• Highly program- and methods-dependent
• Because jobs are long, pre-production profiling is
essential.
• Help is available!
Data Analysis
• Computational chemists typically produce our
own data analysis programs.
• These can be constructed to run in parallel
using MPI, OpenMP, or both.
• For your data analysis needs, compilers and
libraries are available on all of our systems.
• MCSR consultants will work with you to help
build and debug parallel analysis codes.
Visualization
• We have some molecular visualization
software available and will help you with
it.
• However, one of the best programs is
free and will usually run better locally.
– VMD
MCSR Systems: A Snapshot
• Redwood
– SGI Altix SMP system
– Fast, 64-bit IA64 processors
– Very busy = long queue times
• Mimosa
– Linux cluster w/ distributed memory
– Very fast 32-bit x86 processors
– Less busy = shorter queue times
• Sweetgum
– SGI Origin SMP system
– Slow 64-bit MIPS processors
– Mostly unused = no queue time at all
• Sequoia
– SGI Altix XE hybrid cluster
– Very fast, multicore x86-64 processors
– Not open for production yet, but soon
Seminars to Come
• Chemistry in Parallel Computing
• Building Computational Chemistry
Applications
• Using Gaussian 03
• Using AMBER
• Using NWChem
• Whatever you need!
Packages available at MCSR
• Gaussian (our most popular)
• NWChem (flexible; better for parallel jobs)
• MOLPRO (high-tech code for high-accuracy
ES calculations)
• MPQC (very scalable code with limited
methodologies)
• AMBER (popular MD code)
• CPMD, CP2K (ab initio MD codes for
specialists)
• Anything you need (within budget)