TEN GOOD PRACTICES IN SCIENTIFIC PROGRAMMING

SCIENTIFIC PROGRAMMING
Editor: Paul F. Dubois, dubois1@llnl.gov
Over the last five years in Computers in Physics, “Scientific Programming” has covered a wide variety of tools and techniques for writing scientific programs.
With your help, I hope to cover an
even wider spectrum in the future, as is
appropriate to our new readership. As
always, I’ll focus on giving you practical information that will help you do
your job.
The following “Ten Good Practices” distill some of the ideas from
past articles and from my own experience. I hope this list will give new
readers an idea of the kinds of subjects
the department has covered in the
past, while reminding old readers of
topics they meant to learn more about.
#1: Organize for change
Center your programming decisions
around the problem of dealing with
change. Change is the hallmark of scientific programming—changes in the
hardware, models, user interface, operating system, and libraries you use.
To illustrate, the large Fortran application I support is more than 20 years
old, yet we make at least 75 significant
changes each year. It has survived
changes in host computers, operating
systems, languages, user interfaces, and
personnel.
JANUARY–FEBRUARY 1999
Programming decisions can reflect this commitment to change: use only standard language and library features, never a proprietary feature or preprocessor. Use techniques for portable precision and mixed-language computing. Encapsulate calls to libraries such as MPI or a graphics package so that you do not have information that must be changed in many different places when fashion switches direction. I discuss a methodology for dealing with change in Object Technology for Scientific Computing (Prentice Hall, 1997).
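As a minimal sketch of what such encapsulation can look like, assuming a C++11 compiler: the names Communicator, SerialCommunicator, and total_energy are hypothetical and were invented for this illustration. The application talks only to a small interface, so switching communication libraries would touch one class rather than every call site.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical interface: the only view of the communication library
// that the rest of the application is allowed to see.
class Communicator {
public:
    virtual ~Communicator() = default;
    // Sum of the values contributed by all processes taking part in the run.
    virtual double global_sum(const std::vector<double>& local) const = 0;
};

// Single-process stand-in. An MPI-backed class would implement the same
// interface, and it would be the only source file that mentions MPI.
class SerialCommunicator : public Communicator {
public:
    double global_sum(const std::vector<double>& local) const override {
        return std::accumulate(local.begin(), local.end(), 0.0);
    }
};

// Application code depends on the interface, not on any particular library.
double total_energy(const Communicator& comm,
                    const std::vector<double>& cell_energy) {
    assert(!cell_energy.empty());  // caller must supply at least one cell
    return comm.global_sum(cell_energy);
}
```

A caller would create one SerialCommunicator (or, later, a parallel replacement) at startup and pass it to routines such as total_energy; nothing else in the program needs to know which one is in use.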
The object-oriented programming
revolution has swept over most areas of
computing precisely because it gives
people a better way of dealing with
change. If you haven’t learned about
OOP, I recommend Bertrand Meyer’s
Object-Oriented Software Construction
(Prentice Hall, 1997). OOP is worth
learning even if you remain committed
to Fortran.
#2: Write a script instead of a
compiled program
When possible, write in an interactive
scripting language instead of a compiled
language. You get immediate feedback
on your algorithm, and the language
usually incorporates sophisticated data
structures such as associative arrays.
The “Big Three” general-purpose scripting languages are Perl [1,2], Tcl [3,4], and Python [5,6]. There are commercial interpreters for numerical calculations, including Matlab and IDL, and some free products.
My favorite choice is Python, a free, small, easy-to-learn, object-oriented interpreter (http://www.python.org). It is especially easy to extend with compiled code [7–9]. A numerical package [10] adds array calculations with near-compiled speed, because only the decision to do the operation takes place in the interpreter, while the compiled routines do the actual work. There is an interface to Tcl’s Tk toolkit for creating portable GUIs.
Of the Big Three, Python has the cleanest, most scientist- or engineer-friendly syntax and semantics, and excellent performance.
#3: Use a modern source-code
management system
Most large scientific programs start life
as an effort by one or two people to
give themselves a useful tool. Later, the
successful efforts grow rapidly, and
there is a painful transition (which
some are unable to make) to an effort
with several or many authors.
Even in the earliest stages, it pays to
use a modern source-code management
tool. It costs no more effort than the
traditional tar-and-move approach,
leaves a trail of documentation about
your decisions, and supports alternate
development and multiauthor collaboration. Early SCM tools such as SCCS
were useful for one-directory programs.
Today, we always use a tool that can deal
with a multiple-directory project.
My favorite SCM tool is Perforce
(Perforce Software, Alameda, Calif.).
You can read about it and download it
from http://www.perforce.com. Perforce
is especially attractive to scientists:
• Perforce licenses are per person, not
per machine, and are inexpensive.
You can use it for free for up to two
users, each on as many computers as
he or she likes.
• It is simple to learn and has excellent documentation.
• Perforce works across networks and doesn’t require any system hacks or common NFS file systems or user IDs.
• The server can be on Windows NT
or almost any Unix platform, while
the clients can be on Windows
95/98/NT or Unix. Text files are
given the correct format on each
machine.
#4: Use “Design by Contract”
Bertrand Meyer developed the Design by Contract theory in the context of his work on the Eiffel programming language [11]. (See http://www.eiffel.com for more information about Eiffel and Design by Contract.) I’m not alone in believing that using Design by Contract is the single most effective way to improve correctness and hasten development. While Eiffel has the best support for Design by Contract, the concept works with any programming language. (For Design by Contract in C, see the GNU Nana project, http://www.cs.ntu.edu.au/homepages/pjm/nana-home. For Design by Contract in Fortran, a preprocessor such as MPPL might be useful. MPPL is part of Basis; see http://xfiles.llnl.gov/Basis.)
Using this theory, you add to each
routine a statement of a contract between the routine and its callers in the
form of assertion statements, which set
forth the requirements on the callers of
this routine and the results they may
expect. Tools can extract this contract
as a document, and users can check the
statements at runtime during development to catch violators of the contract.
While the idea is simple, there are
some nuances for object-oriented languages and for the ways of implementing the assertions.
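A minimal sketch of the idea in C++, using only the standard assert macro rather than Eiffel’s or Nana’s own mechanisms. The function name checked_sqrt and the tolerance in the postcondition are assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>

// Contract for checked_sqrt (hypothetical routine):
//   require: x >= 0
//   ensure:  result >= 0 and result * result is close to x
double checked_sqrt(double x) {
    // Precondition: the obligation placed on the caller.
    assert(x >= 0.0 && "require: x >= 0");

    const double result = std::sqrt(x);

    // Postconditions: what the caller may rely on afterwards.
    assert(result >= 0.0 && "ensure: result >= 0");
    assert(std::fabs(result * result - x) <= 1e-12 * (1.0 + x) &&
           "ensure: result * result is close to x");
    return result;
}
```

The assertions are checked at run time in a development build. Compiling with NDEBUG defined removes them, which matches the practice of checking contracts while developing and switching the checks off for production runs.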
#5: Use Fortran 95, not Fortran 77
By the time the Fortran 90 standard was released, I had become enamored
of object-oriented languages and was
busy learning the subtle points of such
things as C++, Eiffel, and Python. One
day in about 1991, I was giving a review talk and someone asked, “What
about Fortran 90?” to which I arrogantly replied, “I don’t know, I never
plan to learn it.”
Last year, a paper by Randy Roberts and Mark Gray [12] of Los Alamos National Laboratory’s Scientific Programming Department and some correspondence with UCLA’s Viktor Decyk and JPL’s Charles Norton changed my mind. I was just in time to admit my mistake, for finally Fortran 90 compilers have become available that perform well and implement the standard. In the meantime, the Fortran 95 standard has been accepted, which adds another small dose of capability [13].
Now I see that Fortran 95 is a very big
improvement to Fortran.
Fortran 77 is very nearly a subset of Fortran 95 (the newer standard deletes only a handful of obsolescent features, such as the PAUSE statement and the assigned GO TO), so almost all of your old programs should still work. I advocate embracing the new parts of Fortran 95: using free-form source, employing modules instead of common blocks, putting many functions inside modules, declaring interfaces for others, and combining user-defined types and modules in an “object-based” style as exemplified by Roberts and Gray. (You can view four lectures I gave last year on “Object-Based Programming in Fortran 90” and download the class materials from ftp-icf.llnl.gov/pub/OBF90.)
#6: Steer your calculation
Adding compiled routines to an interpreter greatly extends the interpreter’s
power both by connecting it to other libraries and by increasing the variety of
problems that can be solved in a reasonable time. Some systems also allow direct manipulation of compiled variables.
Sidebar: Object-oriented numerics list
To subscribe, mail to majordomo@monet.uwaterloo.ca, with “subscribe oon-list” in the body of the message. The list is also available in digest format: “subscribe oon-digest.” The Web page is http://monet.uwaterloo.ca/oon.
When the compiled portion is large
compared to the interpreted portion,
we say that the interpreter is being
used for steering the compiled code [14].
The interpreter language becomes the
user-interface language, either directly
or as the implementation engine for a
GUI.
Steering dramatically changes your
life. You build a graphical debugger into your program, and you turn the creativity of your users loose. Requests for
program modifications drop because
users can do many things for themselves. Indeed, a steered code is sometimes put to an entirely different purpose than originally intended. I’ve
never seen anyone try it and then give
it up.
Steering is so effective because it
deals with change. Bertrand Meyer
wrote, “The problem with top-down
structured design is that real problems
have no top.” By replacing that all-too-volatile “top” with an interpreter, you
are enabling significant change without
source modifications.
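To make the idea concrete, here is a toy sketch assuming a C++11 compiler. It is not how Python, Basis, PACT, or Yorick work internally: a real steered code would embed a full interpreter. Here a small command table stands in for the interpreter, and the names Simulation, Steering, set_dt, and advance are hypothetical. The point is only that the sequence of operations comes from text supplied at run time instead of from a compiled “top.”

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <string>

// Stand-in for the compiled physics code.
struct Simulation {
    double time = 0.0;
    double dt = 0.1;
    int steps = 0;

    void advance(int n) {
        assert(n >= 0);
        for (int i = 0; i < n; ++i) {
            time += dt;
            ++steps;
        }
    }
};

// Toy command table playing the role of the interpreter. It keeps a
// reference to the simulation, which must outlive the Steering object.
class Steering {
public:
    explicit Steering(Simulation& sim) {
        commands_["set_dt"] = [&sim](double v) { sim.dt = v; };
        commands_["advance"] = [&sim](double v) {
            sim.advance(static_cast<int>(v));
        };
    }

    // Runs one line such as "advance 10". Returns false when the line is
    // malformed or names an unknown command.
    bool execute(const std::string& line) {
        std::istringstream in(line);
        std::string name;
        double value = 0.0;
        if (!(in >> name >> value)) return false;
        const auto it = commands_.find(name);
        if (it == commands_.end()) return false;
        it->second(value);
        return true;
    }

private:
    std::map<std::string, std::function<void(double)>> commands_;
};
```

A user who types “set_dt 0.5” followed by “advance 4” has changed the course of the run without anyone recompiling the program; adding a capability means registering one more command.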
In our newest local project, we use
Python for steering a C++ code. Three
older, more established systems for creating steered programs are available
free from Lawrence Livermore National Laboratory:
• PACT is Stuart Brown’s system for creating steered programs with a Scheme-based interpreter. It includes a scientific database package, PDB, a graphics package, and many other facilities [15]. See http://www.llnl.gov/adiv/pact.html.
• Basis is my group’s system for constructing Fortran applications with an interpreted array-Fortran-like user interface. Packages are available to link to the graphics package from the National Center for Atmospheric Research and to PDB files. See http://xfiles.llnl.gov/Basis [16].
• Yorick is Dave Munro’s C-based interpreter and graphics package, with a strong ability to interface to binary files written in various formats. Information is available at http://xfiles.llnl.gov/Yorick [17].
Figure 1. C++ is an excellent choice of programming language for a 3D program that must deal with unstructured meshes. Shown here is a test of how well we can maintain the geometry of an imploding ICF capsule using a deliberately bizarre unstructured (Voronoi) mesh. (Courtesy of Michael Owens, LLNL)
COMPUTING IN SCIENCE & ENGINEERING
#7: Use multiple languages (wisely)
The same program can use more than one language [18,19]. This allows you to use an appropriate mode of expression, depending on the data structures and algorithms you want. Some projects combine Fortran, C++, and Basis, for example.
You need to be aware of the portability problems involved; ensuring
portability is hard because you only
have access to a limited number of platforms yourself. An upcoming installment of this department will offer
some information about how to combine Fortran and C++ and will describe
general multilanguage, multiplatform
programming.
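Here’s one hint for combining Fortran with C: be sure the C routines to be called from Fortran do not have mixed-case names such as “MixedCase.” Fortran is case-insensitive, and compilers typically fold external names to all lowercase or all uppercase, so a mixed-case C name cannot be reached portably.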
#8: Use C++ rather than C
C++ is like a real power tool: you can
do amazing things with it, and you can
also cut off your foot. It is absolutely
great when your algorithms and data
structures are complicated yet speed is
important. (See Figure 1.) If you are
going to try C++, absolutely promise yourself that you are not going to just read the book and start coding [20]. No matter how smart you are, you are going to need training and a mentor (whose chief job is to explain the error messages from the compiler) to help you through the first months. Be sure to buy and memorize Scott Meyers’ Effective C++ and More Effective C++ (Addison-Wesley, 1996 and 1998).
If you are a C programmer, you
should still use C++. A C++ compiler
checks more things about your program, and you can slowly start to replace those parts of the program that are
clumsy or impossible to do in C but easy
or reasonable in C++, such as exception
handling and resource management.
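As a hedged illustration of those two features working together, assuming a C++11 compiler: the class Workspace and the functions integrate and failure_cleans_up are hypothetical names invented here. The scratch storage is released automatically on every exit path, including the one taken when an exception is thrown, which in C would need an explicit free() before each return.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Number of Workspace objects currently alive; kept only so that the
// cleanup can be observed in this illustration.
static int live_workspaces = 0;

// Owns a scratch array. The destructor runs on every exit path, including
// an exception, so there is no free() call to forget.
class Workspace {
public:
    explicit Workspace(std::size_t n) : data_(n, 0.0) { ++live_workspaces; }
    ~Workspace() { --live_workspaces; }
    Workspace(const Workspace&) = delete;
    Workspace& operator=(const Workspace&) = delete;

    double& operator[](std::size_t i) {
        assert(i < data_.size());
        return data_[i];
    }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<double> data_;
};

// Left-endpoint rectangle-rule estimate of the integral of f(x) = x over
// n intervals of width step. Rejects a non-positive step by throwing.
double integrate(std::size_t n, double step) {
    Workspace w(n);
    if (step <= 0.0) throw std::invalid_argument("step must be positive");
    double sum = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i) {
        w[i] = step * static_cast<double>(i);
        sum += w[i] * step;
    }
    return sum;
}

// True if integrate() reported the bad step and left no Workspace behind.
bool failure_cleans_up() {
    try {
        integrate(4, -1.0);
    } catch (const std::invalid_argument&) {
        return live_workspaces == 0;
    }
    return false;
}
```

The error is reported where it is detected and handled where the caller chooses, and the cleanup code is written once, in the destructor.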
Arguments you might hear from people, that C++ doesn’t have a standard or isn’t as portable as C, are outdated. C++ has an ISO standard, and the areas in which some C++ compilers do not yet meet the standard are not things you’re going to encounter as a beginner. You only need to be careful when using the high-performance template technology described below.
#9: Use templates in C++
A template is a parameterized definition of some class or function. For example, if I write a particle pusher, I might have some functions
    void move_alpha(AlphaParticle& a);
    void move_electron(ElectronParticle& e);
The implementations of these might be exactly the same, using such methods of a and e as mass(), charge(), x(), y(), and z(). Using a template means only writing such things once:
    template<class T>
    void move(T& t);
where I refer to t.mass(), t.charge(), and so on. If a new
kind of particle is added to my system,
I don’t have to add a function to move
it; as long as my particle has the requisite methods, the compiler will use the
template to write a new version of
move for my new particle.
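A fuller sketch of the particle pusher described above, assuming a C++11 compiler. Several details are assumptions made for illustration and are not from the original: the motion is one-dimensional, the vx() accessor and the extra field and dt parameters are invented, and the masses and charges are in arbitrary units chosen to keep the arithmetic simple.

```cpp
#include <cassert>

// Two unrelated particle classes. They share no base class; they merely
// offer the same methods.
class AlphaParticle {
public:
    double mass() const { return 4.0; }
    double charge() const { return 2.0; }
    double& x() { return x_; }
    double& vx() { return vx_; }

private:
    double x_ = 0.0;
    double vx_ = 0.0;
};

class ElectronParticle {
public:
    double mass() const { return 1.0; }
    double charge() const { return -1.0; }
    double& x() { return x_; }
    double& vx() { return vx_; }

private:
    double x_ = 0.0;
    double vx_ = 0.0;
};

// Written once; the compiler generates a separate version for each
// particle type it is called with. The update is a simple explicit step:
// accelerate in a uniform field, then drift with the new velocity.
template <class T>
void move(T& t, double field, double dt) {
    assert(dt > 0.0);
    const double acceleration = t.charge() * field / t.mass();
    t.vx() += acceleration * dt;
    t.x() += t.vx() * dt;
}
```

Calling move on an AlphaParticle and on an ElectronParticle compiles two specialized functions from the one definition. A class lacking, say, charge() would be rejected at compile time rather than at run time.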
Some such problems can also be handled using polymorphism, and deciding which to use is often difficult. Geoffrey Furnish’s article “Container-Free Numerical Algorithms in C++” will help you understand the trade-offs [21].
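For comparison, here is a sketch of the polymorphic alternative, assuming a C++11 compiler. The names Particle, Alpha, Electron, and move_all, the one-dimensional motion, and the arbitrary units are all assumptions made for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Common base class: every particle type must derive from it.
class Particle {
public:
    virtual ~Particle() = default;
    virtual double mass() const = 0;
    virtual double charge() const = 0;
    double x = 0.0;
    double vx = 0.0;
};

class Alpha : public Particle {
public:
    double mass() const override { return 4.0; }
    double charge() const override { return 2.0; }
};

class Electron : public Particle {
public:
    double mass() const override { return 1.0; }
    double charge() const override { return -1.0; }
};

// One compiled function serves every Particle; which mass() and charge()
// to call is decided at run time through the virtual-function mechanism.
void move(Particle& p, double field, double dt) {
    assert(dt > 0.0);
    p.vx += p.charge() * field / p.mass() * dt;
    p.x += p.vx * dt;
}

// A mixed collection is possible because every element shares the base class.
void move_all(std::vector<std::unique_ptr<Particle>>& particles,
              double field, double dt) {
    for (auto& p : particles) move(*p, field, dt);
}
```

The trade-off is roughly this: polymorphism allows different particle types in one container and one copy of the compiled code, at the price of a virtual call per method, while a template resolves everything at compile time but needs the concrete type to be known where it is used.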
Besides using your own templated
classes and functions, be sure to learn
and use the Standard Template
Library [22].
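A short sketch of what the library provides, assuming a C++11 compiler. The function names total, median, and count_above are hypothetical; the containers and algorithms they call (std::vector, std::accumulate, std::nth_element, std::count_if) are part of the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Sum of all samples.
double total(const std::vector<double>& samples) {
    return std::accumulate(samples.begin(), samples.end(), 0.0);
}

// Middle element after ordering (the upper of the two middle elements when
// the count is even). Takes its argument by value because nth_element
// reorders the data it works on.
double median(std::vector<double> samples) {
    assert(!samples.empty());
    const auto middle = samples.begin() + samples.size() / 2;
    std::nth_element(samples.begin(), middle, samples.end());
    return *middle;
}

// Number of samples strictly above a threshold.
long count_above(const std::vector<double>& samples, double threshold) {
    return std::count_if(samples.begin(), samples.end(),
                         [threshold](double s) { return s > threshold; });
}
```

None of these functions contains a hand-written loop or any memory management, and the same algorithms work unchanged on other containers and element types.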
Amazing template tricks as described
in papers by Todd Veldhuizen, Scott
Haney, and Geoffrey Furnish have led
to Fortran-level performance in C++,
including Veldhuizen’s vector/matrix
facility for C++ programmers called
Blitz++, described at http://monet.uwaterloo.ca/oon [23–26].
#10: Embrace your Inner
Programmer
If you ask the average scientist or engineer who programs a computer what
his job is, he will never answer, “professional programmer.” But in fact, closer
examination will reveal that he programs computers all day—and for
money. Although your self-identification and chief value to your profession
might not come from your skills as a
professional programmer, you can’t
avoid being one.
Sidebar: Bienvenue au Café Dubois
I would like to exchange information with you frequently about interesting developments. Because I do not write the main essay every month, I’m including this “Café Dubois” box each issue so that you can pull up a chair via dubois1@llnl.gov. We can use “Café Dubois” to share information about interesting software, books, projects, and hardware. So if you see something others might find interesting, please let me know about it.
Meet the proprietor
I was originally a pure mathematician, with a thesis in infinite Abelian
groups. After six years of postdoctoral and teaching experience, I became a numerical mathematician at Lawrence Livermore National Laboratory in 1976,
where I still am. I led a group devoted to improving the mathematical algorithms in the large simulation programs used there. We advised scientists and
engineers in every discipline who needed to solve mathematical problems and
choose mathematical software.
During this work, I learned and used the systems that built many programs.
This gave me an idea for a reusable system for developing steered Fortran 77
programs. (The main article explains steering.) I got the chance to actually create this system, called Basis, beginning in 1984, as part of a magnetic-fusion
simulation. Since 1987, I have led the computer science efforts of the group
that develops Lasnex, a large inertial-confinement fusion simulation written in
Fortran.
Because of Basis’ wide use, I have worked with many developers on their
problems. I will try to use that experience to choose essays that will interest you.
How to write for “Scientific Programming”
Readers of this department are a very diverse group, so I need essays submitted
by readers that cover the wide range of topics of interest to you.
Write to me at dubois1@llnl.gov and describe an essay you propose to write. If
I believe your idea might fit the needs of the readers, I will encourage you to
send me a draft. If I accept the draft, I will give you a publication date and work
with you to produce the final paper. The magazine’s staff will supply further editing and support. Typically, an essay should be about 3,000 words including up
to eight illustrations or tables and no more than 15 references.
We know that “real” professional programmers must stay up night and day to keep up on their computer science. How can you do all that and your own profession too? Here are some suggestions:
• Respect that part of you that is the professional programmer. We might call this your Inner Programmer (sorry, I live in California; we can’t help it). The Inner Programmer needs to attend a course or seminar now and then and do some reading.
• Find computer scientists you respect and listen to their recommendations. When they are still making the same recommendations a year later, look into it.
• Lose all interest in hardware. There is always someone around who knows what to buy. Spend your limited time keeping up on software ideas.
I hope “Scientific Programming” will be a useful part of embracing your Inner Programmer. In addition to the main articles, I’ll be bringing you news and ideas each issue in my sidebar, “Café Dubois.”
Acknowledgment
This work was produced at the University of California, Lawrence Livermore National Laboratory, under contract W-7405-ENG-48 (Contract 48) between the US Department of Energy and the Regents of the University of California for the operation of UC LLNL. The views and opinions of the author expressed herein do not necessarily state or reflect those of the US Government or the University of California, and shall not be used for advertising or product endorsement purposes.
References
1. L. Wall, T. Christiansen, and R.L. Schwartz, Programming Perl, 2nd ed., O’Reilly & Associates, Sebastopol, Calif., 1996.
2. P.F. Dubois, “Perl by Example,” Computers in Physics, Vol. 7, No. 5, Sept./Oct. 1993, pp. 545–550.
3. J. Ousterhout, Tcl and the Tk Toolkit, Addison-Wesley, Reading, Mass., 1994.
4. E.F. Johnson, Graphical Applications with Tcl & Tk, M&T Books, New York, 1996.
5. M. Lutz, Programming Python, O’Reilly & Associates, Sebastopol, Calif., 1996.
6. A. Watters, G. van Rossum, and J.C. Ahlstrom, Internet Programming with Python, M&T Books, 1996.
7. P.F. Dubois and T.-Y. Yang, “Extending Python,” Computers in Physics, Vol. 10, No. 4, July/Aug. 1996, pp. 359–365.
8. P.F. Dubois, “A Facility for Extending Python in C++,” to be published in Proc. Seventh Int’l Python Conf., Fortec Seminars, Inc., Reston, Va., 1998, pp. 61–68; http://www.python.org.
9. D. Beazley, “SWIG and Automated C/C++ Scripting Extensions,” Dr. Dobb’s Journal, Vol. 282, Feb. 1998, pp. 30–36.
10. P.F. Dubois, K. Hinsen, and J. Hugunin, “Numerical Python,” Computers in Physics, Vol. 10, No. 3, May/June 1996, pp. 262–267; documentation can be downloaded from ftp-icf.llnl.gov/pub/python/LLNLDistribution.tgz.
11. J.-M. Jézéquel and B. Meyer, “Design by Contract: The Lessons of Ariane,” Computer, Vol. 30, No. 2, Jan. 1997, pp. 129–130.
12. M.G. Gray and R.M. Roberts, “Object-Based Programming in Fortran 90,” Computers in Physics, Vol. 11, No. 4, July/Aug. 1997, pp. 355–361.
13. J.C. Adams et al., Fortran 95 Handbook: Complete ISO/ANSI Reference, MIT Press, Cambridge, Mass., 1997.
14. P.F. Dubois, “Making Applications Programmable,” Computers in Physics, Vol. 8, No. 1, Jan./Feb. 1994, pp. 70–73.
15. S.A. Brown, P.F. Dubois, and D.H. Munro, “Creating and Using PDB Files,” Computers in Physics, Vol. 9, No. 2, Mar./Apr. 1995, pp. 173–176.
16. P.F. Dubois et al., The Basis System, UCRL-MA-118543, Lawrence Livermore Nat’l Laboratory, Livermore, Calif., 1988–1998; http://xfiles.llnl.gov/Basis.
17. D.H. Munro, “Using the Yorick Interpreted Language,” Computers in Physics, Vol. 9, No. 6, Nov./Dec. 1995, pp. 609–615.
18. L. Busby and P.F. Dubois, “Powerful, Portable Fortran Programming,” Computers in Physics, Vol. 7, No. 2, Jan./Feb. 1993, pp. 38–43.
19. L. Busby and P.F. Dubois, “Portable Programming and the Fortran Standard,” Computers in Physics, Vol. 7, No. 2, Mar./Apr. 1993, pp. 162–165.
20. B. Stroustrup, The C++ Programming Language, 3rd ed., Addison-Wesley, 1997.
21. G. Furnish, “Container-Free Numerical Algorithms in C++,” Computers in Physics, Vol. 12, No. 3, May/June 1998, pp. 258–265.
22. M. Nelson, C++ Programmer’s Guide to the Standard Template Library, IDG Books Worldwide, Foster City, Calif., 1995.
23. T. Veldhuizen, “Expression Templates,” C++ Report, Vol. 7, No. 5, May/June 1995, pp. 26–31.
24. S.W. Haney, “Beating the Abstraction Penalty in C++ Using Expression Templates,” Computers in Physics, Vol. 10, No. 6, 1996, pp. 552–557; “Correction,” Computers in Physics, Vol. 11, No. 1, Jan./Feb. 1997, p. 14.
25. G. Furnish, “Disambiguated Glommable Expression Templates,” Computers in Physics, Vol. 11, No. 3, May/June 1997, pp. 263–269.
26. A.D. Robison, “C++ Gets Faster for Scientific Computing,” Computers in Physics, Vol. 10, No. 5, Oct./Nov. 1996, pp. 458–462.