Chapter 1 Preliminaries
ISBN 0-321-33025-0

Purpose of This Book
• To examine carefully the underlying concepts of the various constructs and capabilities of programming languages

Chapter 1 Topics
• Reasons for Studying Concepts of Programming Languages
• Programming Domains
• Language Evaluation Criteria
• Influences on Language Design
• Language Categories
• Language Design Trade-Offs
• Implementation Methods
• Programming Environments

Reasons for Studying Concepts of Programming Languages (1)
• Increased ability to express ideas
  – The language in which programmers develop software places limits on the kinds of control structures, data structures, and abstractions they can use
    • Awareness of a wider variety of programming language features can reduce such limitations in software development
  – Language constructs can be simulated in other languages that do not support those constructs directly; however, the simulation is often
    • less elegant
    • more cumbersome
    • less safe

Reasons for Studying Concepts of Programming Languages (2)
• Improved background for choosing appropriate languages
• Increased ability to learn new languages
  – According to TIOBE, C, Java, and C++ were the three most popular languages in use in Feb. 2015.

Reasons for Studying Concepts of Programming Languages (3)
• Better understanding of the significance of implementation
  – Program bugs
  – Performance
• Better use of languages that are already known
• Overall advancement of computing
  – In some cases, those in positions to choose languages were not sufficiently familiar with programming language concepts

Programming Domains (1)
• Scientific applications
  – Large number of floating-point computations
  – Simple data structures
  – Fortran
    • Originally developed by IBM in the 1950s
• Business applications
  – Facilities for
    • producing elaborate reports
    • storing decimal numbers and character data
    • specifying decimal arithmetic operations
  – COBOL
    • The initial version appeared in 1960
• Artificial intelligence
  – Symbols, consisting of names rather than numbers, are manipulated
  – LISP
    • Appeared in 1959

Programming Domains (2)
• Systems programming
  – The operating system and all of the programming support tools of a computer system are collectively known as its systems software.
    • Systems software is used almost continuously and so must be efficient.
  – A language for this domain must provide
    • fast execution
    • low-level features that allow the software interfaces to external devices to be written
  – C
    • The Unix OS is written almost entirely in C

Programming Domains (3)
• Web software
  – Eclectic collection of languages: markup (e.g., XHTML), scripting (e.g., PHP), general-purpose (e.g., Java)

Language Evaluation Criteria
• Readability: the ease with which programs can be read and understood
• Writability: the ease with which a language can be used to create programs
• Reliability: conformance to specifications (i.e., a program performs to its specifications)
• Cost: the ultimate total cost

Why Readability Is Important
• Maintenance was recognized as a major part of the software life cycle, particularly in terms of cost.
• Ease of maintenance is determined in large part by the readability of programs.

Characteristics Contributing to Readability - Simplicity
• Overall simplicity
  – A manageable set of features and constructs
    • Readability problems occur whenever the program's author has learned a different subset of the language from the subset with which the reader is familiar.
  – Little feature multiplicity (more than one way of doing the same operation)
    • For example, in Java an integer variable can be incremented in any of the following ways:
        count = count + 1
        count += 1
        count++
        ++count
  – Minimal operator overloading (a single operator symbol has more than one meaning)
    • Overloading may simplify a language by reducing the number of operators; however, it can reduce readability if users are allowed to create their own overloadings and do not do so sensibly.

Excessive Simplicity
• Simplicity improves readability; however, excessive simplicity may also reduce readability.
  – For example:
    • The form and meaning of most assembly language statements are models of simplicity.
    • This very simplicity, however, makes assembly language programs less readable. Because they lack more complex control statements, program structure is less obvious.

Characteristics Contributing to Readability - Control Statements
• The presence of well-known control structures (e.g., the while statement)
• A program that can be read from top to bottom is much easier to understand than a program that requires the reader to jump from one statement to some nonadjacent statement in order to follow the execution order.
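
The point about top-to-bottom reading can be illustrated with a small sketch in C. The two functions below are hypothetical (they are not part of the original slides) and compute the same sum of 1..n; the first forces the reader to follow jumps between labels, while the second uses a while statement and reads in execution order.

```c
#include <assert.h>

/* Hypothetical helper: sums 1..n using explicit jumps.
   The reader must trace the labels to recover the loop structure. */
int sum_with_goto(int n) {
    assert(n >= 0);
    int i = 1, total = 0;
loop:
    if (i > n) goto done;
    total += i;
    i++;
    goto loop;
done:
    return total;
}

/* Same computation with a while statement: the loop is visible
   at a glance and the code reads from top to bottom. */
int sum_with_while(int n) {
    assert(n >= 0);
    int i = 1, total = 0;
    while (i <= n) {
        total += i;
        i++;
    }
    return total;
}
```

Both functions return the same value for any non-negative n; only the ease of following the control flow differs.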

Characteristics Contributing to Readability - Data Types and Structures
• The presence of adequate facilities for defining data structures
• Example:
  – If a language doesn't have a Boolean type, then it may need to use a numeric type as an indicator flag
        timeOut = 1
  – In a language that provides a Boolean type, the following statement is much more readable
        timeOut = true

Characteristics Contributing to Readability - Syntax Considerations (1)
• Identifier forms: flexible composition
  – Restricting identifiers to very short lengths detracts from readability

Characteristics Contributing to Readability - Syntax Considerations (2)
• Special words
  – Program appearance and thus program readability are strongly influenced by the forms of a language's special words
  – Can the special words of a language be used as names for program variables?
• Methods of forming compound statements
  – C and its descendants use braces to specify compound statements.
  – Languages that always terminate statement groups in the same way suffer because it is difficult to determine which group is being ended when an end or } appears.

Characteristics Contributing to Readability - Syntax Considerations (3)
• Form and meaning: self-descriptive constructs, meaningful keywords

Evaluation Criteria: Writability
• Writability is a measure of how easily a language can be used to create programs for a chosen problem domain.
• Most of the language characteristics that affect readability also affect writability.
  – This follows directly from the fact that the process of writing a program requires the programmer frequently to reread the part of the program that is already written

Writability Comparison between Two Different Languages
• It is simply not reasonable to compare the writability of two languages in the realm of a particular application when one was designed for that application and the other was not.

Characteristics Contributing to Writability - Support for Abstraction
• Abstraction: the ability to define and use complex structures or operations in ways that allow details to be ignored
• Programming languages can support two distinct categories of abstraction:
  – Process
  – Data

Process Abstraction
• A simple example of process abstraction is the use of a subprogram to implement a sort algorithm that is required several times in a program.

Data Abstraction (1) [Wikipedia]
• Data abstraction enforces a clear separation between the abstract properties of a data type and the concrete details of its implementation.
• The abstract properties are those that are visible to client code that makes use of the data type (the interface to the data type), while the concrete implementation is kept entirely private, and indeed can change, for example to incorporate efficiency improvements over time.
• The idea is that such changes are not supposed to have any impact on client code, since they involve no difference in the abstract behavior.

Data Abstraction (2)
• For example, one could define an abstract data type called lookup table which uniquely associates keys with values, and in which values may be retrieved by specifying their corresponding keys.
• Such a lookup table may be implemented in various ways: as
  – a hash table,
  – a binary search tree, or
  – even a simple linear list of (key:value) pairs.
• As far as client code is concerned, the abstract properties of the type are the same in each case.

Data Abstraction (3)
• Example: implementing a binary tree
  – Fortran 77: use integer arrays
  – C++ and Java: use a class with two pointers (or references) and an integer

Characteristics Contributing to Writability - Expressivity
• A set of relatively convenient ways of specifying operations
  – Example:
    • the inclusion of the for statement in many modern languages makes writing counting loops easier than with the use of while.
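
As a minimal sketch of the for-versus-while point, the two hypothetical C functions below (names and task chosen only for illustration) count the even values in an array. With while, the initialization, test, and update of the counter are spread over three places; with for, they sit together in the loop header.

```c
#include <assert.h>

/* Counting loop written with while: the loop variable is initialized
   before the loop, tested in the header, and updated at the bottom. */
int count_evens_while(const int *values, int n) {
    assert(n >= 0);
    int count = 0;
    int i = 0;
    while (i < n) {
        if (values[i] % 2 == 0) {
            count++;
        }
        i++;
    }
    return count;
}

/* Same loop written with for: initialization, test, and update
   of the loop variable are all stated in one line. */
int count_evens_for(const int *values, int n) {
    assert(n >= 0);
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (values[i] % 2 == 0) {
            count++;
        }
    }
    return count;
}
```

The two versions are equivalent; the for version is simply harder to get wrong, for example by forgetting the i++ at the end of the loop body.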

Evaluation Criteria: Reliability
• A program is said to be reliable if it performs to its specifications under all conditions.

Characteristics Contributing to Reliability - Type Checking
• Testing for type errors in a given program, either by the compiler or during program execution.
  – Run-time type checking is expensive
  – Compile-time type checking is more desirable
  – The earlier errors in programs are detected, the less expensive it is to make the required repairs
• Example:
    #include <stdio.h>

    /* The int argument -1 is silently converted to a very large
       unsigned value, so the comparison gives a surprising result. */
    void greater_than(unsigned int a, int b) {
        if (a < b)
            printf("a<b\n");
        else
            printf("a>=b\n");
    }

    void bar(void) {
        greater_than(-1, 2);   /* prints "a>=b"; no type error is reported */
    }

Characteristics Contributing to Reliability - Exception Handling
• Exception handling is the ability of a program to
  – intercept run-time errors,
  – take corrective measures, and
  – then continue execution

Characteristics Damaging Reliability - Aliasing
• Presence of two or more distinct referencing methods for the same memory location
• It is now widely accepted that aliasing is a dangerous feature in a programming language
• Most programming languages allow some kind of aliasing
  – for example, two pointers set to point to the same variable.
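
The danger of two pointers referring to the same variable can be sketched in C. The function add_twice below is a hypothetical helper, not from the original slides: it looks as if it adds twice the value of *src to *dst, but when both pointers alias the same variable, the first update changes the value read by the second.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: intended to add 2 * (*src) to *dst.
   This only holds when dst and src point to different variables. */
void add_twice(int *dst, const int *src) {
    assert(dst != NULL && src != NULL);
    *dst += *src;   /* if dst == src, this also changes *src */
    *dst += *src;   /* so this adds the already-updated value */
}
```

With distinct variables x = 1 and y = 2, add_twice(&x, &y) leaves x equal to 5, as intended. With z = 1, add_twice(&z, &z) leaves z equal to 4 rather than the 3 a reader might expect, and nothing in the source text of the call warns about it.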

Characteristics Contributing to Reliability - Readability and Writability
• Readability and writability
  – A language that does not support "natural" ways of expressing an algorithm will necessarily use "unnatural" approaches, which reduces reliability

Evaluation Criteria: Cost
• Training programmers to use the language
• Writing programs
• Compiling programs
• Executing programs
• Language implementation system: availability of free compilers
• Reliability: poor reliability leads to high costs
• Maintaining programs

Evaluation Criteria: Others
• Portability
  – The ease with which programs can be moved from one implementation to another
• Generality
  – The applicability to a wide range of applications
• Well-definedness
  – The completeness and precision of the language's official definition

Influences on Language Design
• Computer architecture
  – Languages are developed around the prevalent computer architecture, known as the von Neumann architecture
• Programming methodologies
  – New software development methodologies (e.g., object-oriented software development) led to new programming paradigms and, by extension, new programming languages

Von Neumann Architecture
• Most of the popular languages of the past 50 years have been designed around the prevalent computer architecture: the von Neumann architecture
• These languages are called imperative languages.
  – Data and programs are stored in the same memory
  – Memory is separate from the CPU
  – Instructions and data are transmitted from memory to the CPU
  – Results of operations in the CPU must be moved back to memory
• Nearly all digital computers built since the 1940s have been based on the von Neumann architecture

(Figure: The Motherboard of a Computer)

(Figure: The von Neumann Architecture)

Central Features of Imperative Languages
• Variables: model memory cells
• Assignment statements: model piping (the transfer of data between memory and the CPU)
• Iteration is fast on von Neumann computers because instructions are stored in adjacent cells of memory and repeating the execution of a section of code requires only a simple branch instruction

Program Execution on a Von Neumann Computer
• The execution of a machine code program on a von Neumann architecture computer occurs in a process called the fetch-execute cycle.
• Each instruction to be executed must be moved from memory to the processor.
• The address of the next instruction to be executed is maintained in a register called the program counter.

Fetch-Execute Cycle (on a von Neumann Architecture)
    initialize the program counter
    repeat forever
        fetch the instruction pointed to by the counter
        increment the counter
        decode the instruction
        execute the instruction
    end repeat
• Note: the "decode the instruction" step in the algorithm means that the instruction is examined to determine what action it specifies.

Functional Language Programs Executed on a Von Neumann Machine
• A functional language is one in which the primary means of computation is applying functions to given parameters.
• Programming can be done in a functional language
  – without the kind of variables that are used in imperative languages,
  – without assignment statements, and
  – without iteration.
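
C is not a functional language, but the style described above can be imitated in it as a rough sketch. The hypothetical functions below (not from the original slides) compute their results purely by applying functions to parameters: recursion takes the place of iteration, and no assignment statement or mutable local variable appears.

```c
/* Sum of 1..n by recursion: no loop and no assignment statement. */
unsigned long sum_to(unsigned int n) {
    return n == 0 ? 0UL : n + sum_to(n - 1);
}

/* A function whose result depends only on its parameter. */
unsigned long square(unsigned long x) {
    return x * x;
}

/* Sum of 1^2..n^2, built only from function applications. */
unsigned long sum_of_squares_to(unsigned int n) {
    return n == 0 ? 0UL : square(n) + sum_of_squares_to(n - 1);
}
```

On a von Neumann machine each recursive call costs a function call and stack space, whereas the equivalent loop needs only a branch instruction, which hints at why this style tends to run less efficiently on such hardware unless the implementation optimizes it.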
• Although many computer scientists have expounded on the myriad benefits of functional languages, it is unlikely that they will displace the imperative languages until a non-von Neumann computer is designed that allows efficient execution of programs in functional languages

Evolution of Programming Methodologies (1)
• 1950s and early 1960s:
  – simple applications
  – worry about machine efficiency
• 1970s:
  – hardware costs decreased
  – programmer costs increased
  – larger and more complex problems were being solved by computers
  – Emphasis:
    • structured programming
    • top-down design and step-wise refinement
  – Deficiency:
    • Incompleteness of type checking

Evolution of Programming Methodologies (2)
• Late 1970s:
  – shift from procedure-oriented to data-oriented design
  – emphasis on data design, focusing on the use of abstract data types to solve problems
  – most languages designed since the late 1970s support data abstraction
• Middle 1980s: object-oriented programming
  – Data abstraction
    • encapsulates processing with data objects
    • controls access to data
  – Inheritance
    • enhances the potential reuse of existing software, thereby providing the possibility of significant increases in software development productivity
  – Dynamic method binding
    • allows more flexible use of inheritance
    • overloaded method
    • overridden method

• All of the evolutionary steps in software development methodologies led to new language constructs to support them.

Programming Language Categories
• Imperative
  – Central features are variables, assignment statements, and iteration
  – Examples: C, Pascal
• Functional
  – Main means of making computations is by applying functions to given parameters
  – Examples: LISP, Scheme
• Logic
  – Rule-based (rules are specified in no particular order)
  – Example: Prolog
• Object-oriented
  – Data abstraction, inheritance, late binding
  – Examples: Java, C++

Should Languages That Support Object-Oriented Programming Form a Separate Language Category?
• The author of this book does not consider languages that support object-oriented programming to form a separate category of languages, because both imperative languages and functional languages support object-oriented programming.

Subcategories of Imperative Languages
• Visual languages
  – e.g., Visual BASIC and Visual BASIC .NET
  – These languages include capabilities for drag-and-drop generation of code segments.
  – Once called fourth-generation languages
  – Provide a simple way to generate graphical user interfaces to programs.
• Scripting languages
  – e.g., Perl, JavaScript, and Ruby

(Figure: A Typical Session in Microsoft Visual Basic 6)

Execution Order of Programs
• In an imperative language,
  – an algorithm is specified in great detail and
  – the specific order of execution of the instructions or statements must be included.
• In a rule-based language, however, rules are specified in NO particular order
  – The language implementation system must choose an execution order that produces the desired result.

Markup/Programming Hybrid Languages
• Markup languages are not programming languages, but are used to specify the layout of information in Web documents
  – Examples: XHTML, XML
  – However, some programming capability has crept into some extensions to XHTML and XML

Benefits of Modular Design
• Modular design brings with it great productivity improvements.
  – First of all, small modules can be coded quickly and easily.
  – Secondly, general-purpose modules can be reused, leading to faster development of subsequent programs.
  – Thirdly, the modules of a program can be tested independently, helping to reduce the time spent debugging.

Language Design Trade-offs
• The programming language evaluation criteria provide a framework for language design; however, that framework is self-contradictory.

Instances of Language Design Trade-Offs
• Reliability vs.
cost of execution
  – Conflicting criteria
  – Example: Java demands that all references to array elements be checked to ensure that the index is in its legal range, but that leads to increased execution costs
• Readability vs. writability
  – Another pair of conflicting criteria
  – Example: APL provides many powerful operators (and a large number of new symbols), allowing complex computations to be written in a compact program but at the cost of poor readability
• Writability (flexibility) vs. reliability
  – Another pair of conflicting criteria
  – Example: C++ pointers are powerful and very flexible, but their use is unreliable

Primary Components of a Computer
• Internal memory
  – Used to store data and programs
• Processor
  – A collection of circuits that provides a realization of a set of primitive operations, or machine instructions, such as those for arithmetic and logic operations.

The Machine Language of a Computer
• Is its set of instructions.
• Is the ONLY language that the hardware of the computer can understand directly.
• Provides the most commonly needed primitive operations.
• Programs written in high-level languages require system software (language implementation systems) to translate them into corresponding machine language versions.

Operating Systems
• Supply higher-level primitives than those of the machine language.
• These primitives provide
  – system resource management
  – input and output operations
  – a file management system
  – text and/or program editors
  – a variety of other commonly needed functions

Language Implementation Systems and the Operating System
• Because language implementation systems need many of the operating system facilities, they utilize the operating system to do their work rather than develop their own code to interact with the hardware directly.

Implementation Methods
• Compilation
  – Programs are translated into machine language, which can be executed directly on the computer
• Pure interpretation
  – Programs are interpreted by another program known as an interpreter
• Hybrid implementation systems
  – A compromise between compilers and pure interpreters

Compilation
• Translates a high-level program (source language) into machine code (machine language)
• Slow translation, fast execution

Phases of the Compilation Process
• Lexical analysis: gathers the characters of the source program into lexical units.
  – Lexical units: identifiers, special words, operators, and punctuation symbols
• Syntax analysis: transforms lexical units into parse trees, which represent the syntactic structure of the program
• Intermediate code generation: translates the source program into an intermediate-language version
  – Semantic analysis: checks for errors that are difficult if not impossible to detect during syntax analysis, such as type errors.
• Code generation: machine code is generated

Optimization
• Optimization improves programs (usually in their intermediate code version) by making them smaller or faster or both; it is often an optional part of compilation.
• Some compilers are incapable of doing any significant optimization.
• Optimization may
  – omit some code in your program
  – change the execution order of code in your program
• Note: Sometimes, especially when synchronization between processes is required, these changes may create bugs in your programs that cannot be detected by just checking the source code.

Optimization vs. Reliability
• The variable a resides in memory that both process 1 and process 2 can access.
• Source code:
    int a;
    int foo() { a = 1; if (a > 0) a = 3; else a = -1; return a; }
• Code after optimization:
    int a;
    int foo() { a = 3; return a; }
• If no other process can change a (so a does not need to be volatile), the optimization simply improves performance; if another process can change a between the statements, the optimized code no longer behaves like the source, which introduces a race-condition problem.

Symbol Table
• The symbol table serves as a database for the compilation process.
• The primary contents of the symbol table are
  – the type and attribute information of each user-defined name in the program.
• Note: This information is placed in the symbol table by the lexical and syntax analyzers and is used by the semantic analyzer and the code generator.

(Figure: The Compilation Process)

User Program Supporting Code
• The machine language generated by a compiler can be executed directly on the hardware; however, it must nearly always be run along with some other code.
• Most user programs also require functions from the OS.
• Among the most common of these are functions for input and output.

Linking Operation
• Before the machine language programs produced by a compiler can be executed, the required functions from the OS must be found and linked to the user program.
• The linking operation connects the user program to the system functions by placing the addresses of the entry points of the system functions in the calls to them in the user program.

Combine a User Program and All Supporting Functions Together
(Figure: compilation turns main() { printf() } into machine code labeled main: that contains a call; linking and loading place main: and printf: in the address space of a process and put the address of printf, shown as 0x40ffffff, into the call.)

Linking Operation
• Load module (executable image): the user and system code together
• Linking and loading (linking): the operation of collecting system functions and linking them to user programs
  – Accomplished by a systems program called a linker

Libraries
• In addition to system functions, user programs must often be linked to previously compiled user functions that reside in libraries.
• The linker not only links a given program to system functions; it may also link it to other user functions.

Von Neumann Bottleneck
• The connection speed between a computer's memory and its processor determines the speed of the computer
• Program instructions often can be executed much faster than the connection can supply them; the connection speed thus results in a bottleneck
• Known as the von Neumann bottleneck; it is the primary limiting factor in the speed of computers

Interpreter
• Programs are interpreted by another program called an interpreter, with no translation at all.
• The interpreter program acts as a software simulation of a machine whose fetch-execute cycle deals with high-level language program statements rather than machine instructions.
• This software simulation provides a virtual machine for the language.

Advantages of Interpretation
• Allows easy implementation of many source-level debugging operations, because all run-time error messages can refer to source-level units.
  – For example, if an array index is found to be out of range, the error message can easily indicate the source line and the name of the array.

Disadvantages of Interpretation (1)
• Slower execution (10 to 100 times slower than compiled programs)
  – The decoding of high-level language statements is far more complex than the decoding of machine language instructions.
  – Regardless of how many times a statement is executed, it must be decoded every time.
  – Therefore, statement decoding, rather than the connection between the processor and memory, is the bottleneck of a pure interpreter.

Disadvantages of Interpretation (2)
• Often requires more space.
  – In addition to the source program, the symbol table must be present during interpretation
  – The source program may be stored in a form designed for easy access and modification rather than one that provides for minimal size

Popularity of Interpretation
• Some simple early languages of the 1960s (APL, SNOBOL, and LISP) were purely interpreted.
• By the 1980s, the approach was rarely used on high-level languages.
• In recent years, pure interpretation has made a significant comeback with some Web scripting languages, such as JavaScript and PHP, which are now widely used.

(Figure: Pure Interpretation Process)

Hybrid Implementation Systems
• A compromise between compilers and pure interpreters
• A high-level language program is translated to an intermediate language that allows easy interpretation
• Faster than pure interpretation

Example (1)
• Perl programs
  – are partially compiled to detect errors before interpretation and to simplify the interpreter.

Example (2)
• Java
  – Initial implementations of Java were all hybrid
  – Its intermediate form, byte code, provides portability to any machine that has a byte code interpreter and the Java class library.
  – There are now systems that translate Java byte code into machine code for faster execution.

Java Bytecode Example [Wikipedia]
(Figure: Java code (*.java) is translated by a Java compiler, javac, into Java bytecode (*.class).)

Java Virtual Machine [Wikipedia]
• A Java virtual machine (JVM) [Wikipedia][zhebel] is an abstract computing machine.
  – Note: computing machine ≡ computer
• There are three notions of the JVM:
  – specification,
  – implementation,
  – and instance.

Java Virtual Machine Specification [Wikipedia]
• The specification is a book that formally describes what is required of a JVM implementation.
• Having a single specification ensures all implementations are consistent.

Java Virtual Machine Implementation [Wikipedia]
• A JVM implementation is a computer program that implements the requirements of the JVM specification.

Java Virtual Machine Instance [Wikipedia]
• An instance of the JVM is a process that executes a computer program compiled into Java bytecode.

Java Runtime Environment [Wikipedia]
• The Oracle Corporation owns the Java trademark.
• Oracle distributes the Java Virtual Machine implementation HotSpot together with an implementation of the Java Class Library.
• Together, the JVM and the Java Class Library are called the Java Runtime Environment (JRE).

The java Command [oracle]
• The java command starts a Java application.
  – It does this by starting a Java runtime environment, loading a specified class, and calling that class's main method.

Java Class Library [Wikipedia]
• The Java Class Library (JCL) is a set of dynamically loadable libraries that Java applications can call at run time.
• Because the Java Platform is not dependent on a specific operating system, applications cannot rely on any of the platform-native libraries.
• Instead, the Java Platform provides a comprehensive set of standard class libraries, containing the functions common to modern operating systems.

(Figure: Hybrid Implementation Process)

Just-in-Time (JIT) Implementation Systems
• Initially translate programs to an intermediate language
• Then, during execution, compile intermediate-language methods into machine code when they are called
• The machine code version is kept for subsequent calls
• JIT systems are widely used for Java programs
• .NET languages are implemented with a JIT system

Preprocessors
• A preprocessor is a program that processes a program immediately before the program is compiled.

Preprocessor Instructions
• Preprocessor instructions are embedded in programs.
• Preprocessor instructions are commonly used to specify that code from another file is to be included.
  – For example, the C preprocessor instruction
        #include "myLib.c"
    causes the preprocessor to copy the contents of myLib.c into the program at the position of the #include.

More Preprocessor Instructions
• Other preprocessor instructions are used to define symbols to represent expressions.
  – For example, one could use
        #define max(A, B) ((A) > (B) ? (A) : (B))
    to determine the larger of two given expressions.
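
Because a macro such as max is expanded by textual substitution, an argument expression may be evaluated more than once. The sketch below illustrates this in C; max_int and the two increments_via_* helpers are hypothetical names introduced only for this illustration and are not part of the original slides.

```c
/* Macro from the slide: the preprocessor pastes the argument text
   into the expression, so A or B can appear (and run) twice. */
#define max(A, B) ((A) > (B) ? (A) : (B))

/* Hypothetical function alternative: each argument is evaluated
   exactly once, before the call. */
static inline int max_int(int a, int b) {
    return a > b ? a : b;
}

/* Reports how many times i is incremented by max(i++, limit).
   The call expands to ((i++) > (limit) ? (i++) : (limit)). */
int increments_via_macro(int start, int limit) {
    int i = start;
    (void)max(i++, limit);
    return i - start;
}

/* Same experiment with the function version. */
int increments_via_function(int start, int limit) {
    int i = start;
    (void)max_int(i++, limit);
    return i - start;
}
```

When start is greater than limit, the macro version increments i twice (once in the comparison and once in the selected branch), while the function version always increments it once. This is one reason macros with side-effecting arguments are usually avoided.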

Programming Environments
• The collection of tools used in software development
• This collection may consist of only
  – a file system
  – a text editor
  – a linker
  – a compiler
• Or a programming environment may include a large collection of integrated tools, each accessed through a uniform user interface.

Programming Environment Examples
• UNIX
  – Provides a wide array of powerful support tools for software production and maintenance in a variety of languages.
  – Nowadays often used through a GUI (e.g., CDE, KDE, or GNOME) that runs on top of UNIX
• Borland JBuilder
  – An integrated development environment for Java
• Microsoft Visual Studio.NET
  – A large and elaborate collection of software development tools, all used through a windowed interface.
  – Used to program in C#, Visual BASIC.NET, JScript, J#, or C++

Summary
• The study of programming languages is valuable for a number of reasons:
  – It increases our capacity to use different constructs
  – It enables us to choose languages more intelligently
  – It makes learning new languages easier
• The most important criteria for evaluating programming languages include:
  – Readability, writability, reliability, cost
• Major influences on language design have been machine architecture and software development methodologies
• The major methods of implementing programming languages are:
  – compilation,
  – pure interpretation,
  – hybrid implementation