For Java (and most other langauges we will cover), you are given a ticket. We can’t possibly cover everything you need to know, but we have allowed you admitance to the “show”. You know the debugger. You can run hello world. You understand key differences. You know where the documenation is. What you do with your admission ticket is up to you! Chapter 1 Louden, Programming Languages 1 Chapter 1 - Introduction Programming Languages: Principles and Practice, 2nd Ed. © Kenneth C. Louden, 2003 Adapted by Vicki Allan 2006 2 Course Motivation Why are programming languages the way they are? How are particular language features implemented/supported? Chapter 1 Louden, Programming Languages 3 Course Motivation cont… understand the underlying ideas of the main programming paradigms know more about the huge variety of programming languages understand how the syntax and semantics of languages can be defined precisely. know how important features are supported have a deeper understanding of the history and rationale behind languages like C++. Chapter 1 Louden, Programming Languages 4 Relationship between thought and language. The Sapir-Whorf hypothesis in linguistics states that the structure of one's mothertongue influences the way one's mind perceives the world. It has found at best very limited experimental support, at least in its strong form. One study has shown that subjects in memory tests are more likely to remember a given color if their mother language includes a word for that color. Example – if you had no identified concept of recursion, how would that affect the ability to reason about it? Chapter 1 Louden, Programming Languages 5 Why study programming languages? Increased capacity to express ideas improved background for choosing language increased ability to learn new languages Better understanding of significance of implementation Ability to design new languages - or user interface Chapter 1 Louden, Programming Languages 6 Example Beginning students – always wanted to know specific answers:Can I do X? What happens if I do Y? Often hadn’t tried the specific test, but could reason about it from general knowledge of implementation. Ex: What happens if I try to return a reference to a local variable? Chapter 1 Louden, Programming Languages 7 The human-computer semantic gap Human: Interested in modelling the real world More interested in what computer should do than how Computer: Only data it can manipulate is sequences of zeros and ones. Understands low-level “how” instructions. Chapter 1 Louden, Programming Languages 8 What are programming languages… High-level languages bridge the human-computer semantic gap by providing a higher level notation that can still be executed by computer Chapter 1 Louden, Programming Languages 9 What is a Programming Language? Definition: A programming language is a notational system for describing computation in machine-readable and human-readable form. Chapter 1 Louden, Programming Languages 10 Computation: Described by a Turing Machine - a very simple computer that can carry out all known computations (albeit not very efficiently). A programming language is Turing complete if it can be used to describe any computation performed by a Turing Machine. Chapter 1 Louden, Programming Languages 11 Turing Machine – 1936 Alan Turing based on the idea of a person executing a well-defined procedure by changing the contents of an unlimited paper tape, divided into squares that can contain one of a finite set of symbols. "If your state is 42 and the symbol you see is a '0' then replace this with a '1', move one symbol to the right, and assume state 17 as your new state." A Turing machine is equivalent to a pushdown automaton made more powerful by relaxing the last-in-first-out requirement of its stack. More precisely, a Turing machine consists of: – – – – – Chapter 1 A tape which is divided into cells, one next to the other. Each cell contains a symbol from some finite alphabet. The alphabet contains a special blank symbol (here written as '0') and one or more other symbols. The tape is assumed to be arbitrarily extendible to the left and to the right, i.e., the Turing machine is always supplied with as much tape as it needs for its computation. Cells that have not been written to before are assumed to be filled with the blank symbol. A head that can read and write symbols on the tape and move left and right. A state register that stores the state of the Turing machine. An action table (or transition function) that tells the machine what symbol to write, how to move the head ('L' for one step left, and 'R' for one step right) and what its new state will be, given the symbol it has just read on the tape and the state it is currently in. If there is no entry in the table for the current combination of symbol and state then the machine will halt. Note that every part of the machine is finite; it is the potentially unlimited amount of tape that gives it an unbounded amount of storage space. Louden, Programming Languages 12 What is needed for Turing completeness? Virtually nothing: – A programming language is Turing complete provided it has • integer variables and • arithmetic and • sequentially executes statements, which include assignment, selection (if) and loop (while) statements. Even if statements are unnecessary Chapter 1 Louden, Programming Languages 13 Machine-readability: Also not a huge requirement: Basically, the existence of a (more or less) linear-time translation algorithm. Usually boils down to: The syntax must be given by a context-free grammar. Chapter 1 Louden, Programming Languages 14 Human-readability: This is the real issue! Virtually all the complex details of a programming language are there to (supposedly) enhance human readability. Still not very well understood. Is strongly dependent on good choice of abstractions. Chapter 1 Louden, Programming Languages 15 What about human “writability??” Aren’t programming languages there to promote the writing of programs, not the reading of them? Nonsense! Writability is a hacker’s goal: Perl is very writable, but try to read it! Readability is the real goal: many people are going to have to read your program after you have written it. Chapter 1 Louden, Programming Languages 16 Abstractions: S im p le S tru c tu re d U n it D a ta in t, char C o n tro l g o to , = Chapter 1 c la s s , s tru c t if { } e ls e { }, w h ile { }, p ro c e d u re Louden, Programming Languages file , package, AP I, AD T file , package, AP I, AD T 17 Computational Paradigms Programming languages began by imitating the operations of a computer. It is not surprising that the kind of computer for which they were written had significant effect on their design. – variables representing memory – assignment to change values – sequential execution of statements Chapter 1 Louden, Programming Languages 18 Language Paradigms: Imperative (procedural): traditional sequential programming (passive data, active control). Characterized by variables, assignment, and loops. Object-oriented: data-centric, data controls its own use, action by request to data objects. Characterized by messages, instance variables, and protection. Extension of imperative paradigm. Functional: passive data, but no sequential control; all action by function evaluation (“call”), particularly recursion. No local variables! ~ to mathematics Chapter 1 Louden, Programming Languages 19 Language Paradigms (cont.): Logic: Assertions are the basic data; logic inference the basic control. Again, no sequential operation. ~ to mathematics ex: I am your sister if I am female and we have common parents. Parallel: well, maybe not really a paradigm, but some think so. Again, no sequential operation. “Declarative”: Logic and functional paradigms share this property: state “what” needs computing, not “how” (sequence). Chapter 1 Louden, Programming Languages 20 Languages and paradigms Imperative: C, Pascal, core Ada, FORTRAN Functional: Lisp (Scheme), ML, Haskell Object-oriented: C++, Java, Smalltalk Logic: Prolog Parallel: Java (threads), Ada (tasks) Chapter 1 Louden, Programming Languages 21 Perl The overall structure of Perl derives broadly from the programming language C. Perl is a procedural programming language, with variables, expressions, assignment statements, brace-delimited code blocks, control structures, and subroutines. Like the Unix shells, Perl has many built-in functions for common tasks, like sorting, and for accessing system facilities. Perl takes lists from Lisp, associative arrays from awk, and regular expressions from sed. These simplify and facilitate all manner of parsing, text handling, and data management tasks. Perl has many and varied applications. It has been used since the early days of the Web to write CGI scripts, and is an integral component of the popular LAMP (Linux / Apache / MySQL / (Perl / PHP / Python)) platform for web development. Perl is often used as a "glue language", tying together systems and interfaces that were not specifically designed to interoperate. Chapter 1 Louden, Programming Languages 22 Are functional languages Turingcomplete? Previous theorem on Turing-completeness depends on the existence of variables and loops. Functional programs do not have variables or loops. Can all computation be expressed? Yes!: – A programming language is Turing complete if it has integer values, arithmetic functions on those values, and if it has a mechanism for defining new functions using existing functions, selection, and recursion. Chapter 1 Louden, Programming Languages 23 Paradigm use is rarely “pure”: The C program (in text)defined gcd function in a purely functional style, even though C is mainly imperative. The Java program used some imperative code to compute the gcd, and was not completely objectoriented (integers aren’t objects). The Scheme code used sequencing to do I/O, an imperative feature. Chapter 1 Louden, Programming Languages 24 Examples of languages that are pure (mostly): Imperative: (old) FORTRAN Functional: Haskell Object-oriented: Smalltalk Chapter 1 Louden, Programming Languages 25 Language definition Syntax: the structure of a program. Usually given a formal (i.e., mathematical) definition using a context-free language. (Lexical structure - the structure of the words or tokens - uses regular expressions.) Semantics: the actual result of execution. Usually described in English, but can be done mathematically. Semantics can have a static component: type checking, definition checking, other consistency checks prior to execution. What are dynamic components of semantics? Chapter 1 Louden, Programming Languages 26 Language translation source compiler executable inputs run outputs Compiler: two-step process that (1) translates source code into target code; then (2) the user executes the target code. Chapter 1 Louden, Programming Languages 27 Language translation source inputs run outputs Interpreter: one-step process in which the source code is executed directly. Hybrids are also possible (Java). inputs source Java compiler bytecode machine dependent interpreter outputs Bytecode is composed of instructions that have been brought to the lowest level possible without making them machine dependent. Chapter 1 Louden, Programming Languages 28 Compilation, Interpretation, and Hybrid systems Consider this piece of code: public class Test { public static void main(String args[]) { int i; i = 2; i = i + 7; } } Chapter 1 Louden, Programming Languages 29 If we were to compile it, we would change it to machine instructions that would only work for one architecture. If we were to interpret it, our interpreter would have to be able to understand the high level code AND would repeatedly parse it (if the code was in a loop). When we use the hybrid approach of Java, we produce the file Test.class, which is a binary file that's not readable by most humans. We can convert the file to a readable form with the javap tool as shown here: Chapter 1 Louden, Programming Languages 30 C:\ > javap -c Test Compiled from Test.java public class Test extends java.lang.Object { public Test(); // a default constructor created public static void main(java.lang.String[]); } Method Test() 0 aload_0 1 invokespecial #3 4 return Method void main(java.lang.String[]) 0 iconst_2 // Put integer 2 on stack 1 istore_1 // Store the top stack value at location 1 2 iload_1 // Put the value at location 1 on stack 3 bipush 7 // Put the value 7 on the stack 5 iadd // Add two top stack values together 6 istore_1 // The sum, on top of stack, stored at location 1 7 return // Finished processing Chapter 1 Louden, Programming Languages 31 Although the bytecode cannot access registers or directly reference memory locations and must obey various other restrictions, the actual JVM (java virtual machine) program can use internally whatever techniques are convenient to use for a particular platform. As long as the Java bytecode sees only a JVM specification compliant system, the JVM programmer has broad discretion for its implementation Chapter 1 Louden, Programming Languages 32 Language Implementation Methods Compilation: lexical analysis: characters grouped into logical chunks (keywords, constants, etc) syntax analysis: figure out what it means usually use parse trees (grammars to define). Like diagraming sentences. small dogs and cats – what is meant? optimization - to make smaller or faster linking: supplying missing addresses to system code load module: user code augmented with system code Chapter 1 Louden, Programming Languages 33 The tall boy ran fast. Chapter 1 Louden, Programming Languages 34 "Maria gave Joe the rice" Chapter 1 Louden, Programming Languages 35 Language Implementation Methods (cont) Pure Interpretation: no translation phase - fetch, decode, and execute the source code (not a machine code) Advantages/Disadvantages 1. easy to write debugger - as source lines are unchanged 2. execution is 10-100 times slower; statement decoding is bottleneck 3. better for simple structure - as not so slow to decode 4. natural for some kinds of features - like dynamic binding of type. Ex: a = a+b If a may be integer, string, or a set, how can we know what code to generate? Chapter 1 Louden, Programming Languages 36 What is meant by dynamic binding? Girls choice dance: – Will you go with Sofie? (early binding) – Will you go with Sofie/Ann/Betty (whoever shows up at your door)? (delayed binding) – No specific partner assigned, but will change throughout the night. (changing binding) Chapter 1 Louden, Programming Languages 37 Hybrid Implementation System Compile into intermediate code. Java - compiles to bytecode. Then bytecode is interpreted. JVM (Java Virtual Machine) is byte code interpreted and run time system. Java is delivered partially compiled. Developers compile it into byte codes, downloaded across the network. Then those byte codes are interpreted by the browser. Architecture-neutral: Sun designed the language so that it is also partially interpreted. Creating Bytecode does about 80% of the compilation work. However, one set of Bytecode can run on any Java-enabled computer. The last 20% is performed at runtime by the Java environment provided by the machine specific browser. Eliminates the version mismatch problems: All external program references are resolved when the application is executed. Chapter 1 Louden, Programming Languages 38 Compilation Steps Figure out the full name of the class to be invoked Determine which method signature to use If there is more than one matching signature, the one that is most specific is chosen. Example: doit(Object o) or doit(ColoredPoint p) or doit (Point p) A method is applicable if – the number of parameters matches – the type of each actual argument can be converted to the type of the corresponding parameter. A method is accessible if the access modifier allows access. Chapter 1 Louden, Programming Languages 39 Error classification Lexical: character-level error, such as illegal character (hard to distinguish from syntax). Syntax: error in structure (e.g., missing semicolon or keyword). Static semantic: non-syntax error prior to execution (e.g., undefined vars, type errors). Dynamic semantic: non-syntax error during execution (e.g., division by 0). Logic: programmer error, program not at fault. Chapter 1 Louden, Programming Languages 40 Notes on error reporting A compiler will report lexical, syntax, and static semantic errors. It cannot report dynamic semantic errors. An interpreter will often only report lexical and syntax errors when loading the program. Static semantic errors may not be reported until just prior to execution. Indeed, most interpreted languages (e.g. Lisp, Smalltalk) do not define any static semantic errors. No translator can report a logic error. Chapter 1 Louden, Programming Languages 41 Sample Errors (Java): public int gcd ( int v# ) // lexical { int z = value // syntax - missing ; y = v; // static semantic - y undefined while ( y >= 0 ) // dynamic semantic // division by zero { int t = y; y = z % y; z = t; } return y; // logic - should return z } Chapter 1 Louden, Programming Languages 42 Language design Good, consistent set of abstractions. Tie-in to other technology: – C : Unix – Java : Internet – C++ : most efficient OO language Now also: – Ease of interface with other languages and systems – Good libraries Chapter 1 Louden, Programming Languages 43