An essay on the craft of programming

Nuno A. Fonseca
e-mail: nf@ncc.up.pt
November 29, 2006

1 Introduction

Programming is a kind of art. Why? It is like painting in the sense that the programmer starts with a blank sheet and, with a combination of science, art and craft, sketches the overall shape of the program and then fills in the details. Just as painters need to know when to stop working on the details, a programmer should know when to stop refining and embellishing a program - it can never be perfect.

A programmer is a listener, adviser, interpreter and dictator in the process of making a computer do something. The programmer tries to capture elusive requirements and express them in some programming language so that the computer does what it is expected to do. In the process, a programmer also tries to document the work, so that others (and the programmer!) can understand it, and to develop the work so that others can build on it. Furthermore, all this is done against the clock.

Software development is usually a long process that takes time (and money). Programmers spend most of their time not developing but debugging, fixing bugs, and maintaining their programs. Being a good programmer [1] is a difficult and noble aspiration. Several activities related to programming are important, among them designing, testing and debugging. This essay enumerates good programming practices and rules that, when followed, may reduce development time, including the time spent on debugging and maintenance. Following them should also lead to good code (a somewhat subjective notion). The practices and rules are general and can therefore be applied to most programming languages.

2 Programming Rules

A pragmatic philosophy of programming embraces a bottom-up strategy (as opposed to top-down): complex programs (systems) are constructed by building a set of simpler components (functions, methods or programs) that perform simple things, but do them well. A paradigmatic example is the Unix philosophy: write programs that do one thing and do it well; write programs to work together; and write programs to handle text streams, because that is a universal interface.

Several rules of good programming practice are outlined next (most of them are found in [2]). The terms program and function are used interchangeably.

1. Modularity Rule
   (a) Write simple parts and connect them by clean interfaces;
   (b) This reduces debugging time (which often dominates development time).

2. Clarity Rule
   (a) Clarity is better than cleverness;
   (b) Code that is clear is less likely to break and is easily understood by the next person who has to change or fix it (who may be you);
   (c) Remember: software maintenance is important (and expensive).

3. Simplicity Rule
   (a) Main motto: “small programs/functions are beautiful”!
   (b) Design for simplicity - add complexity only when strictly necessary.

4. Parsimony Rule
   (a) Write a big program/function only when it is clear by demonstration that nothing else will do;
   (b) A program/function is big when it is large in volume of code and/or in internal complexity.

5. Composition Rule
   (a) Write programs that can be connected to other programs;
   (b) Bigger programs can be a composition of smaller ones;
   (c) Connections can be made in many ways, namely through interprocess communication mechanisms or simply by using text streams.

6. Separation Rule
   (a) Separate policy from mechanism and separate GUI from engines;
   (b) This allows the policy or GUI to change without destabilizing the mechanism/engines.

7. Transparency Rule
   (a) A program is transparent if one can look at it working and see what it is doing and how;
   (b) A program is discoverable when it has facilities for monitoring and displaying its internal state;
   (c) Design for visibility to make inspection and debugging easier.

8. Robustness Rule
   (a) Software is robust when it performs well under unexpected conditions that stress the designer's assumptions;
   (b) Robustness results from transparency and simplicity;
   (c) Unfortunately, most software systems are fragile and buggy.

9. Representation Rule
   (a) Data is more tractable than program logic (e.g. a diagram of a 20-node tree is more easily understood than the flowchart of a program that generates the tree);
   (b) Look for ways to shift complexity from code to data;
   (c) Fold knowledge into data so that program logic can be simple and robust.

10. Least Surprise Rule
   (a) In interface design, always do the least surprising thing;
   (b) Interfaces should follow the KISS principle - Keep It Simple, Stupid.

11. Silence Rule
   (a) When a program has nothing interesting to say, it should say nothing (no spurious messages).

12. Repair Rule
   (a) When the program fails, it should fail noisily and as soon as possible;
   (b) Programs should cope with incorrect inputs and their own execution errors as elegantly as possible;
   (c) When errors cannot be dealt with, the program should fail in a way that makes the diagnosis of the problem as easy as possible.

13. Economy Rule
   (a) Programmer time is expensive - conserve it in preference to machine time.

14. Generation Rule
   (a) Use code generators to automate error-prone tasks;
   (b) Avoid hand-hacking - write programs to write programs whenever possible;
   (c) Hand-hacking, i.e. manually customizing programs, is error prone (e.g. some details are often neglected).

15. Optimization Rule
   (a) Prototype before polishing, get it working before optimizing;
   (b) Make it work first, then make it work better (faster);
   (c) When in doubt, use a brute force approach;
   (d) Bottlenecks often occur in surprising places. Do not guess and do not optimize the code until you have proof of where the bottleneck is;
   (e) Do not tune for speed until you have measured, and then only when one part of the code overwhelms the rest;
   (f) Fancy algorithms are slow when n (the input size) is small, and n is often small. Before using fancy algorithms, check the previous point;
   (g) Fancy algorithms are much harder to implement and therefore often buggier than simpler ones. Use simple algorithms and data structures first;
   (h) When tuning, do it systematically, so that the biggest performance gains are achieved with the minimum increase in code complexity.

16. Extensibility Rule
   (a) Design for the future, because it will be here sooner than you think;
   (b) Do not assume that there is a single solution (one true way) to a problem;
   (c) Leave room for the code and data formats to grow;
   (d) Add comments in the code of the type “if you ever need to do X then ...”.

17. SPOT Rule (Single Point of Truth)
   (a) DRY Principle (Don't Repeat Yourself) - do not repeat code: every piece of knowledge must have a single, unambiguous, authoritative representation within the system;
   (b) Repetition leads to inconsistency and broken code (e.g. when only some of the repetitions are modified instead of all of them);
   (c) Code repetition can be removed by refactoring. Refactoring is the process of rewriting, reworking and re-architecting code. It is used to eliminate duplication, non-orthogonal design and outdated knowledge, and to improve performance;
   (d) Data repetition motto: no junk, no confusion. The data structure (the model) should be minimal, i.e. it should not be so general that it can represent situations that cannot exist;
   (e) Seek data structures whose states have a one-to-one correspondence with the states of the real world.

18. Pragmatic rule: there is no perfect software.

19. Beware of offered code: avoid wizard code or other code that you do not understand.

20. The users rule: when in doubt, work with a user to think like a user.

3 Programming Philosophy

3.1 Duplication is evil

The DRY principle - Don't Repeat Yourself - states that the programmer should not duplicate knowledge throughout the system. One way to avoid duplication is to be aware of its main types:
• imposed duplication (the programmer does not have a choice)
• inadvertent duplication
• impatient duplication (laziness, and because it is easier to copy code)
• interdeveloper duplication

3.2 Orthogonality is good

Do not split pieces of knowledge across multiple system components. Organize the code/system around functionality and not around job functions (of the client). Try to have independent and decoupled components. Doing this eliminates effects between unrelated components and increases productivity, since changes and tests are local. Therefore, a programmer should
• keep the code decoupled - modules/components do not reveal anything to each other, and manipulation is done through some kind of API;
• avoid global data - it allows components to leak information to each other in an uncontrolled way;
• avoid similar functions;
• perform testing at the component/module level.

3.3 Target for modularity

Are the functions/methods too large? Size here is not measured by the number of lines but by the complexity of what the function does. If you cannot describe what it does in one line, then it is too big.
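
As a minimal sketch of the one-line test (the function names and the small reporting task are hypothetical, chosen only for illustration), a routine that parses, summarizes and formats can be split into parts that each fit a one-line description:

```python
def parse_scores(text):
    """Parse whitespace-separated numbers into a list of floats."""
    return [float(token) for token in text.split()]


def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def format_report(label, value):
    """Format a label and a value as one line of text."""
    return f"{label}: {value:.2f}"


def report_mean(text):
    """Build a one-line report of the mean of the numbers in text."""
    # Each step is a separate function that can be tested in isolation.
    return format_report("mean", mean(parse_scores(text)))
```

Each docstring is the one-line description; a function whose docstring needs an "and then" is a candidate for splitting.
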
Another hint that a function should be divided into smaller functions is that it has too many levels of indentation or too many local variables. To profit from modularity, one needs a good interface: one that makes sense without looking at the implementation behind it. Test: try to describe it to another programmer over the phone and see whether it is understood.

3.4 Tracer coding

When doing something novel, where the requirements are vague, a programmer should use something like a tracer bullet. Tracer bullets burn very brightly during their flight, making them visible to the naked eye. Soldiers place them among the other bullets so that the shooter can follow the trajectory relative to the target and correct the aim. The same concept can be applied to programming. Instead of specifying the system in every detail and producing tons of paper, a programmer develops a framework with the basic functionality (to see whether everything works together), starting with the underlying structure. Then, incrementally, the remaining functionality is added. The code added at each stage is kept; therefore it should contain error checking, be structured and be documented. Another advantage of this approach is that a demo is available early for both programmers and users, making it easier to see the progress and correct the course if necessary.

3.5 A prototype?

Prototypes are used to analyze and expose risk, and often to answer a few questions. What to prototype? Anything experimental or that you do not have experience with. The goal of prototyping is to learn - it is the only time that the value does not reside in the code.
For instance, you can prototype
• architectures
• new functionalities
• structure or contents of external data
• third-party tools or components
• performance issues
• user interface design

3.6 Always Automate

Automation ensures consistency, repeatability and accuracy. Do not use manual procedures (for compiling, testing, backups, versioning, website generation, ...); always automate them. One example of automation is code generators, i.e. code that writes code. There are two main types of code generators:
• Passive code generators - are run once to produce a result, e.g. creating new source files (templates, copyright notices, ...) or performing one-off conversions among programming languages;
• Active code generators - are used each time the results are required, e.g. generating the source code for data structures from a database schema.

3.7 Debugging

Debugging a program is seen by many programmers as a nightmare, a task to avoid at all costs, because it may be tedious and frustrating and often takes a long time. The problem gets worse when a programmer needs to debug third-party code. However, debugging can be rather painless, and maybe even fun, if it is attacked as a puzzle to be solved.

Use a debugger to pinpoint the problem. Whenever possible, use a debugger that allows you to visualize the data. Often it is necessary to use tracing statements - little diagnostic messages printed by a program to the screen or to a file that say things like “I got here” and “the value of X=10”. Tracing is important in concurrent systems, real-time systems and event-based applications.

A useful technique for finding the cause of a problem is simply to explain it to someone else. The listener does not need to say anything: the simple act of explaining the problem, step by step, often leads to understanding its cause. In the process of debugging, do not assume anything: prove it.
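
The tracing statements and the advice to prove assumptions can be sketched as follows. This is only an illustration under stated assumptions: the function average_positive and its messages are hypothetical, and Python's standard logging module stands in for whatever tracing facility a project actually uses.

```python
import logging

# Send trace messages to the screen (stderr); a file could be used instead.
logging.basicConfig(level=logging.DEBUG,
                    format="%(levelname)s %(funcName)s: %(message)s")
log = logging.getLogger("trace")


def average_positive(values):
    """Average the strictly positive entries of values (hypothetical example)."""
    log.debug("got here with %d values", len(values))
    positives = [v for v in values if v > 0]
    log.debug("positives=%r", positives)
    # Prove the assumption instead of trusting it: fail noisily if it is wrong.
    assert positives, "expected at least one positive value"
    result = sum(positives) / len(positives)
    log.debug("result=%s", result)
    return result
```

Raising the logging level (for example to logging.WARNING) silences the trace without removing it from the code. Note that Python skips assert statements when run with the -O option, so checks that must always run should raise an exception explicitly.
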
Identify the causes of the bug and check whether they exist anywhere else in the code.

3.8 Meta-programming

Systems should be highly configurable. The choice of algorithms, database product, middleware technology and user interface style should be implemented as configuration options. To this effect, a programmer should use metadata (data about data, i.e. any data that describes the application) to describe the configuration options of an application: tuning parameters, user preferences, installation directory, etc.

3.9 Avoid programming by coincidence

A strategy often used by novice programmers is what is called programming by coincidence. It works as follows: the programmer types some code, tries it and it seems to work; the programmer then types more code, tries it, and continues this process until the program seems to work. Naturally, after some time the program stops working, and the programmer then has great difficulty in fixing the bug. The problem with this strategy lies in the fact that the programmer does not know why the code worked in the first place; understanding why it no longer works therefore becomes a more difficult job.

3.10 Testing

Test early, often, and automatically. What to test?
• unit testing: a software unit test is code that exercises a module by establishing an artificial environment and then calling routines in the module being tested. Testing should be done at the module level, in isolation, to verify the module's behavior;
• integration testing: evaluates whether the major subsystems that make up the project work well with each other;
• validation and verification: validates the results produced by the system;
• resource exhaustion or errors: checks what happens when resources fail (lack of memory, disk space, network, ...);
• performance testing: verifies how the system behaves under stress and evaluates its performance;
• usability testing: is done by users.

How to test?
• regression tests - compare the output of the system with previously known values
• test data (synthetic and real)
• exercising the GUI
• testing the tests - use saboteurs that introduce errors in the code and then check whether the tests detect them

3.11 Optimization

The most important thing to know about optimizing for performance is when not to do it. The smartest, cheapest, and often fastest way to improve performance is to wait a few months for the hardware to become faster (Moore's law, taken as a doubling every 18 months, implies a gain of about 26% in six months just by buying new hardware, since 2^(6/18) ≈ 1.26). Before optimizing the code, a programmer should measure where the program spends its time (profile the code).

3.12 Complexity

“Everything should be made as simple as possible, but no simpler.” - Albert Einstein

How to define simplicity in programming? There are several ways:
• Implementation complexity (programmer's view) - the degree of difficulty that a programmer will experience while attempting to understand a program well enough to model it mentally or debug it
• Interface complexity (user's view)
• Number of lines of code (codebase size): more lines of code tend to mean more bugs

3.13 Develop multivalent programs

A multivalent program should have the following traits:
• the application logic lies in a library with a documented API and can be linked to other programs;
• one UI mode is a GUI, either linked directly to the core library or acting as a separate process.

3.14 Estimating Times

When you estimate the duration of some task, such as developing a system, you should pay attention to the units used. If you say that something takes about 120 working days, people will expect it to be completed on a date pretty close to that (a difference of a day or less). But if you say 6 months, people will expect completion somewhere between 5 and 7 months. Therefore, select the unit of the estimate (days, weeks, or months) wisely. How to estimate?
The easiest and probably the most accurate approach is to ask someone who has done the same thing before. If that is not possible then, after understanding what is being asked, build a model of the system (a rough picture of how the system and its components are going to be implemented) and estimate the time needed to develop each component. Estimates should become more accurate with experience.

4 About User Interfaces

4.1 User Interface Designs

• Apply the rule of least surprise whenever possible.
• A program interface should be
  – transparent (WYSIWYG)
  – concise
  – expressive
  – scriptable
• A program interface is scriptable if it is easily manipulated by other programs (this allows task automation);
• A program interface is concise when the actions required to perform a task are short and simple (few keystrokes, mouse clicks, ...);
• A program interface is expressive when it can readily be used to perform a variety of actions.

4.2 Web Browser as a Universal Front End

For a large class of applications it makes increasing sense to use a web browser as the interface. The main advantages are:
• the GUI does not need to be implemented; instead it can be described using languages like HTML;
• it avoids complex and expensive coding to implement the interface;
• the application becomes Internet ready.

On the other hand, a possible disadvantage is that the interaction becomes batch style.

5 Final Remarks

To conclude, a good (pragmatic) programmer [3], besides following the above rules, philosophies, and principles, should also be:
• an early/fast adopter
• inquisitive
• a critical thinker
• realistic
• a jack of all trades

To become a good programmer, or just to keep up to date, one should:
• invest regularly in improving oneself: e.g. learn a new language every year, read a technical book each quarter, ...;
• diversify: e.g. read a non-technical book periodically;
• invest in learning emerging technologies;
• review and rebalance one's methods and choices.

Finally, a good programmer does not leave bad design or poor code alone: he/she fixes them when they are discovered, to avoid increasing the entropy of the code. The problem of inaction is similar to what happens when a broken window in a building is left unrepaired - it instills a sense that the building has been abandoned.

References

[1] Robert L. Read. How to be a programmer: A short, comprehensive, and personal summary. http://samizdat.mines.edu/howto/HowToBeAProgrammer.html.
[2] Eric S. Raymond. The Art of UNIX Programming. Addison-Wesley, 2004.
[3] Andrew Hunt and David Thomas. The Pragmatic Programmer. Addison-Wesley, 1999.