SCIENTIFIC PROGRAMMING
Editor: Paul F. Dubois, dubois1@llnl.gov
JANUARY–FEBRUARY 1999

TEN GOOD PRACTICES IN SCIENTIFIC PROGRAMMING

Over the last five years in Computers in Physics, “Scientific Programming” has covered a wide variety of tools and techniques for writing scientific programs. With your help, I hope to cover an even wider spectrum in the future, as is appropriate to our new readership. As always, I’ll focus on giving you practical information that will help you do your job.

The following “Ten Good Practices” distill some of the ideas from past articles and from my own experience. I hope this list will give new readers an idea of the kinds of subjects the department has covered in the past, while reminding old readers of topics they meant to learn more about.

#1: Organize for change

Center your programming decisions around the problem of dealing with change. Change is the hallmark of scientific programming: changes in the hardware, models, user interface, operating system, and libraries you use. To illustrate, the large Fortran application I support is more than 20 years old, yet we make at least 75 significant changes each year. It has survived changes in host computers, operating systems, languages, user interfaces, and personnel.

Programming decisions can reflect this commitment to change: use only standard language and library features, never a proprietary feature or preprocessor. Use techniques for portable precision and mixed-language computing. Encapsulate library calls to MPI or graphics so that you do not have information that must be changed in many different places when fashion switches direction.

I discuss a methodology for dealing with change in Object Technology for Scientific Programming (Prentice-Hall, 1997). The object-oriented programming revolution has swept over most areas of computing precisely because it gives people a better way of dealing with change.
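The advice about encapsulating library calls can be illustrated with a small sketch. Everything named here (SerialBackend, Parallel, total_energy) is hypothetical and invented for illustration; no real message-passing library is called. The point is only that the application talks to one wrapper class, so switching libraries should mean changing one place.

```python
class SerialBackend:
    """Hypothetical stand-in backend for a single-process run."""

    def rank(self):
        return 0

    def size(self):
        return 1

    def sum_all(self, value):
        # With only one process, the global sum is just the local value.
        return value


class Parallel:
    """The one place in the application that knows which library is in use.

    A backend built on a real message-passing library would supply the same
    three methods (rank, size, sum_all); none is shown here.
    """

    def __init__(self, backend=None):
        self._backend = backend if backend is not None else SerialBackend()

    def rank(self):
        return self._backend.rank()

    def size(self):
        return self._backend.size()

    def global_sum(self, value):
        return self._backend.sum_all(value)


def total_energy(local_energies, parallel):
    """Application code: talks only to Parallel, never to the library."""
    return parallel.global_sum(sum(local_energies))


if __name__ == "__main__":
    print(total_energy([1.0, 2.0, 3.5], Parallel()))
```

Under these assumptions, moving to a different communication library would mean writing one new backend class; total_energy and the rest of the application would stay untouched.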
If you haven’t learned about OOP, I recommend Bertrand Meyer’s Object-Oriented Software Construction (Prentice Hall, 1997). OOP is worth learning even if you remain committed to Fortran.

#2: Write a script instead of a compiled program

When possible, write in an interactive scripting language instead of a compiled language. You get immediate feedback on your algorithm, and the language usually incorporates sophisticated data structures such as associative arrays. The “Big Three” general-purpose scripting languages are Perl [1, 2], Tcl [3, 4], and Python [5, 6]. There are commercial interpreters for numerical calculations, including Matlab and IDL, and some free products.

My favorite choice is Python, a free, small, easy-to-learn, object-oriented interpreter (http://www.python.org). It is especially easy to extend with compiled code [7–9]. A numerical package [10] adds array calculations with near-compiled speed, because only the decision to do the operation takes place in the interpreter, while the compiled routines do the actual work. There is an interface to Tcl’s Tk toolkit for creating portable GUIs. Of the Big Three, Python has the cleanest, most scientist- or engineer-friendly syntax and semantics, and excellent performance.

#3: Use a modern source-code management system

Most large scientific programs start life as an effort by one or two people to give themselves a useful tool. Later, the successful efforts grow rapidly, and there is a painful transition (which some are unable to make) to an effort with several or many authors.

Even in the earliest stages, it pays to use a modern source-code management tool. It costs no more effort than the traditional tar-and-move approach, leaves a trail of documentation about your decisions, and supports alternate development and multiauthor collaboration. Early SCM tools such as SCCS were useful for one-directory programs. Today, we always use a tool that can deal with a multiple-directory project.
My favorite SCM tool is Perforce (Perforce Software, Alameda, Calif.). You can read about it and download it from http://www.perforce.com. Perforce is especially attractive to scientists:

• Perforce licenses are per person, not per machine, and are inexpensive. You can use it for free for up to two users, each on as many computers as he or she likes.
• It is simple to learn and has excellent documentation.
• Perforce works across networks and doesn’t require any system hacks or common NFS file systems or user IDs.
• The server can be on Windows NT or almost any Unix platform, while the clients can be on Windows 95/98/NT or Unix. Text files are given the correct format on each machine.

#4: Use “Design by Contract”

Bertrand Meyer developed the Design by Contract theory in the context of his work on the Eiffel programming language [11]. (See http://www.eiffel.com for more information about Eiffel and Design by Contract.) I’m not alone in believing that using Design by Contract is the single most effective way to improve correctness and hasten development.

While Eiffel has the best support for Design by Contract, the concept works with any programming language. (For Design by Contract in C, see the GNU Nana project, http://www.cs.ntu.edu.au/homepages/pjm/nana-home. For Design by Contract in Fortran, a preprocessor such as MPPL might be useful. MPPL is part of Basis; see http://xfiles.llnl.gov/Basis.)

Using this theory, you add to each routine a statement of a contract between the routine and its callers in the form of assertion statements, which set forth the requirements on the callers of this routine and the results they may expect. Tools can extract this contract as a document, and users can check the statements at runtime during development to catch violators of the contract. While the idea is simple, there are some nuances for object-oriented languages and for the ways of implementing the assertions.
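The contract idea can be sketched in a language without built-in support by using plain assertion statements. The following is a minimal illustration in Python, not Eiffel's mechanism and not any particular tool's; the function name checked_sqrt and its tolerances are assumptions chosen for the example. The docstring states the contract, the first assertions check the caller's obligations, and the last ones check what the routine promises in return.

```python
import math


def checked_sqrt(x):
    """Return the non-negative square root of x.

    Contract (hypothetical example):
      require: x is a real number and x >= 0
      ensure:  result >= 0 and result * result is close to x
    """
    # Preconditions: obligations of the caller.
    assert isinstance(x, (int, float)), "require: x must be a real number"
    assert x >= 0, "require: x must be non-negative"

    result = math.sqrt(x)

    # Postconditions: obligations of this routine.
    assert result >= 0, "ensure: result must be non-negative"
    assert math.isclose(result * result, x, rel_tol=1e-9, abs_tol=1e-12), \
        "ensure: result squared must reproduce x"
    return result


if __name__ == "__main__":
    print(checked_sqrt(4.0))
    try:
        checked_sqrt(-1.0)  # the caller breaks the contract here
    except AssertionError as violation:
        print("contract violated:", violation)
```

Python removes assert statements when run with the -O option, so checks written this way can stay active during development and be switched off for production runs.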
#5: Use Fortran 95, not Fortran 77

By the time the Fortran 90 standard was released, I had become enamored of object-oriented languages and was busy learning the subtle points of such things as C++, Eiffel, and Python. One day in about 1991, I was giving a review talk and someone asked, “What about Fortran 90?” to which I arrogantly replied, “I don’t know, I never plan to learn it.”

Last year, a paper by Randy Roberts and Mark Gray [12] of Los Alamos National Laboratory’s Scientific Programming Department and some correspondence with UCLA’s Viktor Decyk and JPL’s Charles Norton changed my mind. I was just in time to admit my mistake, for finally Fortran 90 compilers have become available that perform well and implement the standard. In the meantime, the Fortran 95 standard has been accepted, which adds another small dose of capability [13].

Now I see that Fortran 95 is a very big improvement to Fortran. Fortran 77 is almost entirely a subset of Fortran 95 (only a few obsolescent features were deleted), so your old programs should still work. I advocate embracing the new parts of Fortran 95, using free-form input, employing modules instead of common blocks, putting many functions inside modules, declaring interfaces for others, and combining user-defined types and modules in an “object-based” style as exemplified by Roberts and Gray. (You can view four lectures I gave last year on “Object-Based Programming in Fortran 90” and download the class materials from ftp-icf.llnl.gov/pub/OBF90.)

Object-oriented numerics list

To subscribe, mail to majordomo@monet.uwaterloo.ca, with “subscribe oon-list” in the body of the message. The list is also available in digest format: “subscribe oon-digest.” The Web page is http://monet.uwaterloo.ca/oon.

#6: Steer your calculation

Adding compiled routines to an interpreter greatly extends the interpreter’s power both by connecting it to other libraries and by increasing the variety of problems that can be solved in a reasonable time. Some systems also allow direct manipulation of compiled variables.
When the compiled portion is large compared to the interpreted portion, we say that the interpreter is being used for steering the compiled code [14]. The interpreter language becomes the user-interface language, either directly or as the implementation engine for a GUI.

Steering dramatically changes your life. You build a graphical debugger into your program, and you turn the creativity of your users loose. Requests for program modifications drop because users can do many things for themselves. Indeed, a steered code is sometimes put to an entirely different purpose than originally intended. I’ve never seen anyone try it and then give it up.

Steering is so effective because it deals with change. Bertrand Meyer wrote, “The problem with top-down structured design is that real problems have no top.” By replacing that all-too-volatile “top” with an interpreter, you are enabling significant change without source modifications.

COMPUTING IN SCIENCE & ENGINEERING

Figure 1. C++ is an excellent choice of programming language for a 3D program that must deal with unstructured meshes. Shown here is a test of how well we can maintain the geometry of an imploding ICF capsule using a deliberately bizarre unstructured (Voronoi) mesh. (Courtesy of Michael Owens, LLNL)

In our newest local project, we use Python for steering a C++ code. Three older, more established systems for creating steered programs are available free from Lawrence Livermore National Laboratory:

• PACT is Stuart Brown’s system for creating steered programs with a Scheme-based interpreter. It includes a scientific database package, PDB, a graphics package, and many other facilities [15]. See http://www.llnl.gov/adiv/pact.html.
• Basis is my group’s system for constructing Fortran applications with an interpreted array-Fortran-like user interface. Packages are available to link to the graphics package from the National Center for Atmospheric Research and to PDB files.
  See http://xfiles.llnl.gov/Basis [16].
• Yorick is Dave Munro’s C-based interpreter and graphics package, with a strong ability to interface to binary files written in various formats. Information is available at http://xfiles.llnl.gov/Yorick [17].

#7: Use multiple languages (wisely)

The same program can use more than one language [18, 19]. This allows you to use an appropriate mode of expression, depending on the data structures and algorithms you want. Some projects combine Fortran, C++, and Basis, for example. You need to be aware of the portability problems involved; ensuring portability is hard because you only have access to a limited number of platforms yourself. An upcoming installment of this department will offer some information about how to combine Fortran and C++ and will describe general multilanguage, multiplatform programming. Here’s one hint for combining Fortran with C: be sure the C routines to be called from Fortran have names that are mixed-case, such as “MixedCase.”

#8: Use C++ rather than C

C++ is like a real power tool: you can do amazing things with it, and you can also cut off your foot. It is absolutely great when your algorithms and data structures are complicated yet speed is important. (See Figure 1.)

If you are going to try C++, absolutely promise yourself that you are not going to just read the book and start coding [20]. No matter how smart you are, you are going to need training and a mentor (whose chief job is to explain the error messages from the compiler) to help you through the first months. Be sure to buy and memorize Scott Meyers’s Effective C++ and More Effective C++ (Addison-Wesley, 1996 and 1998).

If you are a C programmer, you should still use C++. A C++ compiler checks more things about your program, and you can slowly start to replace those parts of the program that are clumsy or impossible to do in C but easy or reasonable in C++, such as exception handling and resource management.
Arguments you might hear from people, that C++ doesn’t have a standard or isn’t as portable as C, are outdated. C++ has an ISO standard, and the areas in which some C++ compilers do not yet meet the standard are not things you’re going to encounter as a beginner. You only need to be careful when using the high-performance template technology described below.

#9: Use templates in C++

A template is a class or function definition that is parameterized, typically by a type. For example, if I write a particle pusher, I might have some functions

    void move_alpha (AlphaParticle& a);
    void move_electron (ElectronParticle& e);

The implementations of these might be exactly the same, using such methods of a and e as mass(), charge(), x(), y(), and z(). Using a template means only writing such things once:

    template<class T> void move (T& t);

where I refer to t.mass(), t.charge(), and so on. If a new kind of particle is added to my system, I don’t have to add a function to move it; as long as my particle has the requisite methods, the compiler will use the template to write a new version of move for my new particle.

Some such problems can also be handled using polymorphism, and deciding which to use is often difficult. Geoffrey Furnish’s article “Container-Free Numerical Algorithms in C++” will help you understand the trade-offs [21].

Besides using your own templated classes and functions, be sure to learn and use the Standard Template Library [22]. Amazing template tricks as described in papers by Todd Veldhuizen, Scott Haney, and Geoffrey Furnish have led to Fortran-level performance in C++, including Veldhuizen’s vector/matrix facility for C++ programmers called Blitz++, described at http://monet.uwaterloo.ca/oon [23–26].

#10: Embrace your Inner Programmer

If you ask the average scientist or engineer who programs a computer what his job is, he will never answer, “professional programmer.” But in fact, closer examination will reveal that he programs computers all day, and for money. Although your self-identification and chief value to your profession might not come from your skills as a professional programmer, you can’t avoid being one.

Bienvenue au Café Dubois

I would like to exchange information with you frequently about interesting developments. Because I do not write the main essay every month, I’m including this “Café Dubois” box each issue so that you can pull up a chair via dubois1@llnl.gov. We can use “Café Dubois” to share information about interesting software, books, projects, and hardware. So if you see something others might find interesting, please let me know about it.

Meet the proprietor

I was originally a pure mathematician, with a thesis in infinite Abelian groups. After six years of postdoctoral and teaching experience, I became a numerical mathematician at Lawrence Livermore National Laboratory in 1976, where I still am. I led a group devoted to improving the mathematical algorithms in the large simulation programs used there. We advised scientists and engineers in every discipline who needed to solve mathematical problems and choose mathematical software.

During this work, I learned and used the systems that built many programs. This gave me an idea for a reusable system for developing steered Fortran 77 programs. (The main article explains steering.) I got the chance to actually create this system, called Basis, beginning in 1984, as part of a magnetic-fusion simulation. Since 1987, I have led the computer science efforts of the group that develops Lasnex, a large inertial-confinement fusion simulation written in Fortran. Because of Basis’ wide use, I have worked with many developers on their problems.
I will try to use that experience to choose essays that will interest you.

How to write for “Scientific Programming”

Readers of this department are a very diverse group, so I need essays submitted by readers that cover the wide range of topics of interest to you. Write to me at dubois1@llnl.gov and describe an essay you propose to write. If I believe your idea might fit the needs of the readers, I will encourage you to send me a draft. If I accept the draft, I will give you a publication date and work with you to produce the final paper. The magazine’s staff will supply further editing and support. Typically, an essay should be about 3,000 words including up to eight illustrations or tables and no more than 15 references.

We know that “real” professional programmers must stay up night and day to keep up on their computer science. How can you do all that and your own profession too? Here are some suggestions:

• Respect that part of you that is the professional programmer. We might call this your Inner Programmer (sorry, I live in California; we can’t help it). The Inner Programmer needs to attend a course or seminar now and then and do some reading.
• Find computer scientists you respect and listen to their recommendations. When they are still making the same recommendations a year later, look into it.
• Lose all interest in hardware. There is always someone around who knows what to buy. Spend your limited time keeping up on software ideas.

I hope “Scientific Programming” will be a useful part of embracing your Inner Programmer. In addition to the main articles, I’ll be bringing you news and ideas each issue in my sidebar, “Café Dubois.”

Acknowledgment

This work was produced at the University of California, Lawrence Livermore National Laboratory, under contract W-7405-ENG-48 (Contract 48) between the US Department of Energy and the Regents of the University of California for the operation of UC LLNL. The views and opinions of the author expressed herein do not necessarily state or reflect those of the US Government or the University of California, and shall not be used for advertising or product endorsement purposes.

References

1. L. Wall, T. Christiansen, and R.L. Schwartz, Programming Perl, 2nd ed., O’Reilly & Associates, Sebastopol, Calif., 1996.
2. P.F. Dubois, “Perl by Example,” Computers in Physics, Vol. 7, No. 5, Sept./Oct. 1993, pp. 545–550.
3. J. Ousterhout, Tcl and the Tk Toolkit, Addison-Wesley, Reading, Mass., 1994.
4. E.F. Johnson, Graphical Applications with Tcl & Tk, M&T Books, New York, 1996.
5. M. Lutz, Programming Python, O’Reilly & Associates, Sebastopol, Calif., 1996.
6. A. Watters, G. van Rossum, and J.C. Ahlstrom, Internet Programming with Python, M&T Books, 1996.
7. P.F. Dubois and T.-Y. Yang, “Extending Python,” Computers in Physics, Vol. 10, No. 4, July/Aug. 1996, pp. 359–365.
8. P.F. Dubois, “A Facility for Extending Python in C++,” to be published in Proc. Seventh Int’l Python Conf., Fortec Seminars, Inc., Reston, Va., 1998, pp. 61–68; http://www.python.org.
9. D. Beazley, “SWIG and Automated C/C++ Scripting Extensions,” Dr. Dobb’s Journal, Vol. 282, Feb. 1998, pp. 30–36.
10. P.F. Dubois, K. Hinsen, and J. Hugunin, “Numerical Python,” Computers in Physics, Vol. 10, No. 3, May/June 1996, pp. 262–267; documentation can be downloaded from ftp-icf.llnl.gov/pub/python/LLNLDistribution.tgz.
11. J.-M. Jézéquel and B. Meyer, “Design by Contract: The Lessons of Ariane,” Computer, Vol. 30, No. 2, Jan. 1997, pp. 129–130.
12. M.G. Gray and R.M. Roberts, “Object-Based Programming in Fortran 90,” Computers in Physics, Vol. 11, No. 4, July/Aug. 1997, pp. 355–361.
13. J.C. Adams et al., Fortran 95 Handbook: Complete ISO/ANSI Reference, MIT Press, Cambridge, Mass., 1997.
14. P.F. Dubois, “Making Applications Programmable,” Computers in Physics, Vol. 8, No. 1, Jan./Feb. 1994, pp. 70–73.
15. S.A. Brown, P.F. Dubois, and D.H. Munro, “Creating and Using PDB Files,” Computers in Physics, Vol. 9, No. 2, Mar./Apr. 1995, pp. 173–176.
16. P.F. Dubois et al., The Basis System, UCRL-MA-118543, Lawrence Livermore Nat’l Laboratory, Livermore, Calif., 1988–1998; http://xfiles.llnl.gov/Basis.
17. D.H. Munro, “Using the Yorick Interpreted Language,” Computers in Physics, Vol. 9, No. 6, Nov./Dec. 1995, pp. 609–615.
18. L. Busby and P.F. Dubois, “Powerful, Portable Fortran Programming,” Computers in Physics, Vol. 7, No. 2, Jan./Feb. 1993, pp. 38–43.
19. L. Busby and P.F. Dubois, “Portable Programming and the Fortran Standard,” Computers in Physics, Vol. 7, No. 2, Mar./Apr. 1993, pp. 162–165.
20. B. Stroustrup, The C++ Programming Language, 3rd ed., Addison-Wesley, 1997.
21. G. Furnish, “Container-Free Numerical Algorithms in C++,” Computers in Physics, Vol. 12, No. 3, May/June 1998, pp. 258–265.
22. M. Nelson, C++ Programmer’s Guide to the Standard Template Library, IDG Books Worldwide, Foster City, Calif., 1995.
23. T. Veldhuizen, “Expression Templates,” C++ Report, Vol. 7, No. 5, May/June 1995, pp. 26–31.
24. S.W. Haney, “Beating the Abstraction Penalty in C++ Using Expression Templates,” Computers in Physics, Vol. 10, No. 6, 1996, pp. 552–557; “Correction,” Computers in Physics, Vol. 11, No. 1, Jan./Feb. 1997, p. 14.
25. G. Furnish, “Disambiguated Glommable Expression Templates,” Computers in Physics, Vol. 11, No. 3, May/June 1997, pp. 263–269.
26. A.D. Robison, “C++ Gets Faster for Scientific Computing,” Computers in Physics, Vol. 10, No. 5, Oct./Nov. 1996, pp. 458–462.