Programming Languages: Design, Specification, and Implementation
G22.2210-001
Rob Strom
September 7, 2006

Outline
• Conceptual Background
  • Requirements of Programming Languages
  • Specification vs. Implementation
  • Syntax, Semantics, Types, Static Analysis
• Language Paradigms
  • Imperative: FORTRAN, Cobol, Algol, Pascal, PL/I, C
  • Applicative: LISP, Scheme, ML
  • Object-oriented: Smalltalk, Java
  • “Fourth-generation”: SETL, SQL
  • Logic Programming: Prolog
  • Concurrent-Distributed: Concurrent Pascal, Hermes
  • Languages vs. “Tools”/Patterns
• Implementation Issues
  • Compile time: parsing, type analysis, static checking
  • Run time: parameter passing, garbage collection, method dispatching, remote invocation, just-in-time compiling, parallelization

Tentative Outline
• Motivations, Universals
• Fortran and Algol 60 models: recursion and the stack
• Algol 68, parameter passing, PL/I, C, heap
• Functional programming: Scheme
• Formal type systems: ML
• Object-oriented languages: objects, inheritance, C++
• Ada: packages, generics
• Patterns and Pitfalls in Imperative Languages
• Logic Programming
• 4th generation languages
• Concurrency and Distribution: memory models
• Implementation issues

Readings
Main Text:
Secondary Texts:
• David Gelernter and Suresh Jagannathan: “Programming Linguistics”, MIT Press, 1990.
• Michael L. Scott: “Programming Language Pragmatics”, Academic Press, 2000.
• Benjamin C. Pierce: “Types and Programming Languages”, MIT Press, 2002.
Language References:
• Giannesini et al: “Prolog”, Addison-Wesley, 1986.
• Gosling et al: “The Java Language Specification”, http://java.sun.com/docs/books/jls/
• Dewhurst & Stark: “Programming in C++”, Prentice Hall, 1989.
• Ada 95 Reference Manual, http://www.adahome.com/rm95/
• MIT Scheme Reference, http://www-swiss.ai.mit.edu/projects/scheme/documentation/scheme.html
• Strom et al: “Hermes: A Language for Distributed Computing”, Prentice-Hall, 1991.
Other sources:
• R. Kent Dybvig, “The SCHEME Programming Language”, Prentice Hall, 1987.
• Jan Skansholm, “ADA 95 From the Beginning”, Addison Wesley, 1997.

Grading
• Projects – 60%
• Other required homework – 10%
• Final examination – 30%
Programming

First 2 Lectures: Readings
• Pierce, (formal operational semantics)
  • Optional exercises 3.5.16, 8.3.8
• Scott, chapters 2, 3, 8
  • chapter 2, section 1: How do we define tokens and syntax
• Gelernter, chapter 3, sections 1-2
  • Introduction to FORTRAN and Algol 60

Issues in Language Design
• Dijkstra, “Go To Statement Considered Harmful”, http://www.acm.org/classics/oct95/#WIRTH66
• Backus, “Can Programming Be Liberated from the von Neumann Style?”, http://www.stanford.edu/class/cs242/readings/backus.pdf
• Hoare, “The Emperor’s Old Clothes”, http://www.braithwaite-lee.com/opinions/p75-hoare.pdf
• Hoare, “An Axiomatic Basis For Computer Programming”, http://www.spatial.maine.edu/~worboys/processes/hoare%20axiomatic.pdf
• Parnas, “On the Criteria to be Used in Decomposing Systems into Modules”, http://www.acm.org/classics/may96/

Scope
General purpose high-level programming languages
Excludes:
• Machine languages
• Assembly languages
• “Scripting” languages
• Markup languages
• Special purpose languages (e.g. report writing)
• Graphical languages

How do we judge languages?
• Compactness – writability/expressibility
• Readability – ease of validation
• Familiarity of Model
• Less Error-Prone
• Portability
• Hides Details – simpler model
• Early detection of errors
• Modularity – Reuse
• Modularity – Composability
• Modularity – Isolation
• Performance Transparency
• Optimizability

Program Specifications
What properties a solution will meet
E.g.:
• Accept a list of input tuples of the form <k, v>, where k is an integer and v is any string
• Deliver a list of output tuples such that
  • If tuple <k, v> appears n times in input, it appears n times in output
  • If tuple <k, v> appears n times in output, it appears n times in input
  • For any two successive tuples <k1, v1> and <k2, v2> in output, k1 <= k2
• Provided (some restriction, e.g. max number/range of tuples)
Doesn’t necessarily say how to compute a solution, and preferably allows for many possible solutions
Usually more compact than an implementation

Language Specifications
Given a “program text”
• How to tell whether it is a valid expression in the language
• What it “means” as a specification or an implementation of a program
Usually, also:
• How to take a “chunk” of a “program text”
• How to determine what it means as a specification of a component
• How to put together the specifications of components to define the specification of the program

Universals
• Syntax: lexical and syntactic levels
• Naming: defining and applied occurrences
• Scope
• Types
• Semantics
  • Operational
  • Denotational
  • Algebraic

$18.5 Million Bug (?)
      IF (TVAL .LT. 0.2E-2) GOTO 40
      DO 40 M = 1, 3
      W0 = (M-1)*0.5
      X = H*1.74533E-2*W0
      DO 20 N0 = 1, 8
      EPS = 5.0*10.0**(N0-7)
      CALL BESJ(X, 0, B0, EPS, IER)
      IF (IER .EQ. 0) GOTO 10
   20 CONTINUE
      DO 5 K = 1. 3
      T(K) = W0
      Z = 1.0/(X**2)*B1**2+3.0977E-4*B0**2
      D(K) = 3.076E-2*2.0*(1.0/X*B0*B1+3.0977E-4*(B0**2-X*B0*B1))/Z
      E(K) = H**2*93.2943*W0/SIN(W0)*Z
      H = D(K)-E(K)
    5 CONTINUE
   10 CONTINUE
      Y = H/W0-1
   40 CONTINUE
(The period in DO 5 K = 1. 3 is the bug: fixed-form Fortran ignores blanks, so the statement assigns 1.3 to a variable named DO5K instead of starting a loop.)
http://www-aix.gsi.de/~giese/swr/mariner1.html

2003 North American blackout
The XA/21 monitoring software runs on Unix and is made up of several subsystems. According to hacker journalist Kevin Poulsen, the bug was a race condition in the one million lines of C++ code that made up the event processing subsystem.
According to Mike Unum, manager at GE Energy in Melbourne, Florida: “There was a couple of processes that were in contention for a common data structure, and through a software coding error in one of the application processes, they were both able to get write access to a data structure at the same time.
And that corruption led to the alarm event application getting into an infinite loop and spinning.”

Therac-25 Radiation Therapy
Software bugs caused overdoses; 5 died
• The design did not have any hardware interlocks to prevent the electron beam from operating in its high-energy mode without the target in place.
• The engineer had reused software from older models. These models had hardware interlocks that masked their software defects. Those hardware safeties had no way of reporting that they had been triggered, to at least indicate the existence of faulty software commands.
• The hardware provided no way for the software to verify that sensors were working correctly (see open-loop controller).
• The table-position system was the first implicated in Therac-25's failures; the manufacturer gave it redundant switches to cross-check their operation.
• The equipment control task did not properly synchronize with the operator interface task, so that race conditions occurred if the operator changed the setup too quickly. This was evidently missed during testing, since it took some practice before operators were able to work quickly enough for the problem to occur.
• The software set a flag variable by incrementing it. Occasionally an arithmetic overflow occurred, causing the software to bypass safety checks.
• The software was written in assembly language. While this was more common at the time than it is today, assembly language is harder to debug than most high-level languages.

BNF
expr ::= expr "+" term | expr "-" term | term
term ::= term "*" factor | term "/" factor | factor
factor ::= number | identifier | "(" expr ")"
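
The expression grammar above is left-recursive, so a hand-written recursive-descent parser cannot follow it literally without looping forever. As a minimal sketch (not the course's implementation), the Python below swaps in the usual iterative rewrite, expr ::= term { ("+" | "-") term } and likewise for term, which accepts the same strings and still builds left-associative trees. The lexical rules for number and identifier, the tuple-based tree shape, and the names tokenize, Parser, parse, and evaluate are all assumptions made for illustration; the dash in the grammar is treated as ASCII "-".

```python
import re

# Assumed lexical rules (the grammar leaves number and identifier undefined):
# a number is digits with an optional fraction, an identifier is a C-style name.
TOKEN_RE = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|([A-Za-z_]\w*)|(.))")


def tokenize(text):
    """Turn text into a list of (kind, value) pairs: 'num', 'id', or 'op'."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        number, ident, op = match.groups()
        if number is not None:
            tokens.append(("num", float(number)))
        elif ident is not None:
            tokens.append(("id", ident))
        elif op in "+-*/()":
            tokens.append(("op", op))
        else:
            raise SyntaxError(f"unexpected character {op!r}")
        pos = match.end()
    return tokens


class Parser:
    """Recursive descent over the grammar with left recursion turned into loops."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("eof", None)

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr ::= term { ("+" | "-") term }
        node = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            op = self.advance()[1]
            node = (op, node, self.term())  # fold to the left
        return node

    def term(self):
        # term ::= factor { ("*" | "/") factor }
        node = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            op = self.advance()[1]
            node = (op, node, self.factor())
        return node

    def factor(self):
        # factor ::= number | identifier | "(" expr ")"
        kind, value = self.advance()
        if kind in ("num", "id"):
            return (kind, value)
        if (kind, value) == ("op", "("):
            node = self.expr()
            if self.advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")


def parse(text):
    """Parse a whole string as an expr and return its tree."""
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek()[0] != "eof":
        raise SyntaxError("trailing input after expression")
    return tree


def evaluate(node, env=None):
    """Evaluate a tree; identifiers are looked up in the env dictionary."""
    env = env or {}
    tag = node[0]
    if tag == "num":
        return node[1]
    if tag == "id":
        return env[node[1]]
    left, right = evaluate(node[1], env), evaluate(node[2], env)
    if tag == "+":
        return left + right
    if tag == "-":
        return left - right
    if tag == "*":
        return left * right
    return left / right
```

With these definitions, parse("8 - 3 - 2") groups as (8 - 3) - 2, matching the left recursion in the expr rule, and evaluate(parse("2 + 3 * x"), {"x": 4}) gives 14.0 because term binds tighter than expr.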