Programming languages

The evolution and development of programming languages is really the process of making computing “convenient” and accessible to a broader category of users. As we have already discussed, the first electronic computers were monstrous contraptions, filling several rooms and costing millions of dollars (but with the computing power of a modern hand-held calculator).

From Machine Language to Higher Level Languages:

Initially, programmer time was considerably cheaper than computing time, and programs were developed in “machine language”. Machine language is the native language of a computer. It is the notation to which the computer responds directly, and consists of a series of bits that directly control a processor, causing it to add, compare, and move data from one location to another. Machine code is typically written as a series of binary or hexadecimal codes, for example:

    00000010101111001010

This is what a machine (or family of machines) can directly interpret and execute, BUT writing it is an enormously tedious task: it is VERY difficult to write and debug, and is largely unintelligible to humans.

As computing hardware advanced, and people began wishing to write larger programs, it quickly became apparent that a less error-prone notation was required. Programming languages are designed to be both higher level and general purpose.
Higher level: independent of the underlying machine architecture.
General purpose: can be applied to a wide range of problems.

The first step in the development of programming languages was the development of assembly languages, which used names and symbols to represent the actual codes for machine operations, values, and storage locations, making instructions more readable. For example:

    beq a0, zero, D

Assembly language was specific to a particular machine: still heavily tied to a specific architecture, and cryptic. It is a very low level form of programming, yet very efficient, with a one-to-one correspondence between mnemonics and machine instructions.
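
The one-to-one correspondence between mnemonics and machine instructions can be sketched as a simple table lookup. The instruction set below (LOAD, ADD, STORE, HALT, one byte per instruction) is hypothetical and invented only for illustration; it does not correspond to any real machine.

```python
# Toy assembler for a HYPOTHETICAL 8-bit machine (not a real architecture).
# Each instruction is one byte: a 4-bit opcode followed by a 4-bit address.
OPCODES = {
    "LOAD": 0b0001,    # accumulator = memory[address]
    "ADD": 0b0010,     # accumulator += memory[address]
    "STORE": 0b0011,   # memory[address] = accumulator
    "HALT": 0b1111,    # stop the machine
}


def assemble(source):
    """Translate assembly text into a list of 8-bit machine words."""
    symbols = {}   # symbolic storage-location names -> numeric addresses
    words = []
    for line in source.splitlines():
        line = line.split(";")[0].strip()   # drop comments and blank lines
        if not line:
            continue
        parts = line.split()
        mnemonic = parts[0].upper()
        if mnemonic not in OPCODES:
            raise ValueError(f"unknown mnemonic: {mnemonic}")
        if len(parts) == 1:
            address = 0
        elif parts[1].isdigit():
            address = int(parts[1])
        else:
            # Give a name the next free address the first time it is seen.
            address = symbols.setdefault(parts[1], len(symbols))
        if not 0 <= address < 16:
            raise ValueError(f"address out of range: {address}")
        # Exactly one machine word is produced per assembly instruction.
        words.append((OPCODES[mnemonic] << 4) | address)
    return words


program = """
    LOAD  x      ; accumulator = x
    ADD   y      ; accumulator += y
    STORE total  ; total = accumulator
    HALT
"""
machine_code = assemble(program)
for word in machine_code:
    print(f"{word:08b}")
```

Four assembly instructions go in and four machine words come out, which is what makes an assembler so much simpler than a compiler.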

Translating assembly language into actual machine language became the responsibility of an “assembler”:

    Assembly language program -> Assembler -> Machine language

Because assembly language was tied to a particular machine, programs had to be rewritten for every new machine.

GOAL: To develop a machine-independent language in which one could express numerical computations in something that more closely resembled mathematical formulae.

Between 1954 and 1957 the inventors of Fortran created the first successful high-level language (HLL); the original version appeared in 1957 and was soon followed by Lisp and Algol. (Fortran was arguably the first, although the idea of creating a high-level language that is compiled into object code wasn’t new.)

    Source language program -> Compiler -> Assembly or machine language

Compilers are substantially more complicated than assemblers because the one-to-one correspondence between source and target operations no longer exists. An individual instruction in a high-level language can be translated into many assembly language or machine language instructions.

The move to higher level languages was strongly influenced by:
1) More readable, familiar notations: formulas could be expressed using traditional mathematical symbols.
2) Machine independence: compilers could be written for a specific hardware platform, while the language itself remained machine independent.
3) Availability of program libraries: libraries of commonly used functions (sin, cos, ...) could be created, tested, and distributed with compilers, easing the work of programming.
4) Consistency and syntax checking, which can detect some types of errors before execution.

There was significant initial resistance to high level languages:
1) Programmers could at first write assembly code that was more efficient and ran faster than what a compiler could produce.
2) Early compilers were expensive, some were buggy, and they were not standardized. Different vendors might implement their own language extensions.
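
The loss of the one-to-one correspondence described above can be seen in a minimal sketch of a translator for a single assignment statement. Several things here are assumptions made for illustration: the target is a hypothetical stack machine (PUSH, PUSHI, ADD, SUB, MUL, DIV, STORE are invented names), and the sketch borrows Python's own `ast` module to do the parsing rather than implementing a parser.

```python
import ast

# Hypothetical stack-machine instruction for each supported operator.
BINARY_OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL", ast.Div: "DIV"}


def emit_expression(node, code):
    """Append instructions that leave the value of `node` on the stack."""
    if isinstance(node, ast.Name):
        code.append(f"PUSH {node.id}")
    elif isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        code.append(f"PUSHI {node.value}")
    elif isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        emit_expression(node.left, code)    # operands first ...
        emit_expression(node.right, code)
        code.append(BINARY_OPS[type(node.op)])   # ... then the operator
    else:
        raise SyntaxError("unsupported expression")


def compile_assignment(source):
    """Translate one assignment such as 'd = a + b * c' into target code."""
    statement = ast.parse(source).body[0]
    if not isinstance(statement, ast.Assign):
        raise SyntaxError("expected an assignment")
    target_name = statement.targets[0]
    if not isinstance(target_name, ast.Name):
        raise SyntaxError("expected a simple variable on the left")
    code = []
    emit_expression(statement.value, code)
    code.append(f"STORE {target_name.id}")
    return code


target = compile_assignment("d = a + b * c")
print("\n".join(target))
```

The single source statement becomes six target instructions, and their order depends on operator precedence, something an assembler never has to work out.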

Over time, compilers became more efficient, and now there are many hundreds of programming languages. Why?
1) Evolution: Computer science is still a young discipline. In the 1960s and 70s the structured programming approach took hold, and goto-based control flow gave way to while loops and case statements. In the late 1980s the nested block structure of languages such as Algol, Pascal, and Ada gave way to the object-oriented structure of languages such as C++ and Smalltalk, which encapsulate both data and operations in the same programming construct.
2) Special purposes: Many languages were designed for a specific problem domain. Fortran was designed for numeric/scientific calculations. Ada was designed for embedded programming.
3) Personal preference: Different people like different things; it is a matter of taste. Some people love the terseness and flexibility of C while others hate it. Some think naturally in terms of recursion, others prefer iteration.

But some languages are more successful than others, and the reasons vary:
1) Expressive power: In a technical sense all languages are equivalent: each can be used, if awkwardly, to write anything written in the others. Still, some language features clearly have a huge impact on the programmer's ability to write clear, concise, maintainable code, especially for large systems.
2) Ease of use for novices: Each language has its own learning curve. (Basic is typically assumed to have a gentle learning curve, while Ada and C have a steep one.)
3) Ease of implementation: The ease with which the language can be implemented on different machines. (Pascal: Niklaus Wirth developed a simple, portable implementation of the language and shipped it free to universities all over the world.)
4) Excellent compilers: Some languages are successful because they have compilers and supporting tools that do an unusually good job of helping the programmer manage very large projects.
5) Economics, patronage, and inertia: COBOL and PL/1 owe their life to IBM, and Ada to the US Department of Defense. Some languages remain in use long after “better” languages are available, because of a huge base of installed software and programmer expertise that would cost too much to replace.

Programming Language Families:

Existing programming languages can be classified into families based on their model of computation.

Imperative: action-oriented languages such as Pascal, C, PL/1, and Fortran. The focus is on HOW the computer should perform its task. Computation is viewed as a sequence of actions, and instructions are viewed as performing actions on data stored in memory. A program is a series of steps, each of which performs a calculation, retrieves input, or produces output. These languages provide procedural abstraction, assignments, loops, sequences, and conditional statements.

Functional: a computational model based on the recursive definition of functions (originated with Lisp). A program is considered a function from inputs to outputs, defined in terms of simpler functions through a process of refinement. A program is a collection of mathematical functions, each with an input (domain) and a result (range). Functions interact and combine with each other using functional composition, conditionals, and recursion. Examples: Lisp, Scheme.

Object oriented: relatively recent, these languages can trace their roots to Simula 67. They are closely related to imperative languages, but they have much more structure and a distributed model of both memory and computation. Rather than picture computation as the operation of a monolithic processor on a monolithic memory, object-oriented languages picture it as interactions among semi-independent objects, each of which has both its own internal state and executable functions to manage that state. A program is viewed as a collection of objects that interact with one another by passing messages that transform an object's state.
Object modeling, classification, inheritance, and information hiding are fundamental building blocks for object-oriented languages. Examples: Ada 95, C++, Java.

Logic programming (constraint-based programming): takes its inspiration from predicate logic. Computation is modeled as an attempt to find values that satisfy certain specified relationships, using a goal-directed search through a list of logical rules. It attempts to use logical reasoning to answer queries. A program is a collection of logical declarations about WHAT outcome a function should accomplish rather than HOW that outcome should be accomplished. Execution of the program applies these declarations to arrive at a series of possible solutions to a problem. Example: Prolog.

Compilation vs Interpretation

There are 2 basic approaches to implementing a program in a higher-level language:
1) The language is brought down, or converted, to the level of the machine using a translator called a compiler.
2) The machine is brought up to the level of the language, by building a higher-level machine (a virtual machine) that can run the language directly: an interpreter.

Compilation

At the highest level of abstraction, the compilation and execution of a program looks like:

    Source program -> Compiler -> Target program
    Input -> Target program -> Output

The compiler translates the source program into an equivalent target program, typically in machine or assembly language, and then goes away. At some arbitrary later time the user tells the operating system to run the target program. It is the target program that is executed, not the source program!
1) The compiler is the focus of control during compilation.
2) The target program is the focus of control during execution.

The compiler is itself a machine language program, produced from source written in some language. When written to a file in a format understood by the operating system, machine language is commonly known as object code.
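
The separation between translation time and run time can be seen in miniature with Python's built-in `compile` and `exec`. This is only an analogy, and the caveat matters: CPython's `compile` produces bytecode for a virtual machine, not machine language or object code, so the sketch illustrates the two-step structure rather than a true compiler. The small `area` program is made up for the example.

```python
source = """
def area(width, height):
    return width * height

result = area(3, 4)
"""

# Translation time: the source text is analysed and translated once.
# Nothing in the program has been run yet.
target = compile(source, "<example>", "exec")

# Errors that are evident from the program text are reported during
# translation, before execution ever starts.
try:
    compile("result = area(3, 4", "<broken>", "exec")
    caught = None
except SyntaxError as err:
    caught = err
print("translation-time error:", type(caught).__name__)

# Run time: at some later point the translated program is executed.
# It is the translated form that runs, not the source text.
namespace = {}
exec(target, namespace)
print(namespace["result"])
```

The translated object `target` can be kept and executed as many times as needed without touching the source text again, which mirrors the way a target program outlives the compilation that produced it.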

Interpretation

The alternative is interpretation:

    Source program + Input -> Interpreter -> Output

Unlike a compiler, an interpreter stays around during execution, and is the focus of control during execution. An interpreter implements a virtual machine whose machine language is the high-level programming language. The interpreter reads statements one at a time, verifying and executing them as it goes along.

Comparing the two:

A static property of a program is a property that is evident from the program text. A dynamic property is evident only upon running the program. Compilers are biased toward static properties, while interpreters are biased toward dynamic properties.
1) Interpretation offers greater flexibility and better diagnostics (error messages): because the source code is being executed directly, the interpreter can include an excellent source-level debugger.
2) Compilation generally leads to better performance.

Although conceptually the difference is clear, many language implementations include a mixture of both:

    Source program -> Translator -> Intermediate program
    Intermediate program + Input -> Virtual machine -> Output

A language is said to be interpreted when the initial translator is simple. If the translator is “complex” (it analyzes the source code thoroughly, and the intermediate program doesn’t bear a strong resemblance to the source), the language is said to be compiled.

There is a large spectrum of implementation strategies. Most interpreted languages employ an initial translator (a preprocessor) that removes comments and white space, and groups characters together into tokens such as keywords, identifiers, numbers, and symbols. It may also expand abbreviations, and may identify higher-level syntactic structures such as loops and subroutines. The GOAL is for the intermediate form to mirror the structure of the source, but in a form that can be interpreted more efficiently.

The typical Fortran implementation comes close to pure compilation: the compiler translates Fortran into machine language.
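
Returning to the pure interpreter described earlier, its read, verify, and execute loop can be sketched in a few lines. The toy language here is hypothetical and exists only for this example: it has assignments of the form `name = term + term ...` and a `print name` statement.

```python
def interpret(source, output):
    """Execute a toy language one statement at a time.

    HYPOTHETICAL language: 'name = term + term ...' and 'print name'.
    Printed values are appended to the `output` list.
    """
    variables = {}   # the interpreter's run-time state

    def value_of(term, line_no):
        if term.isdigit():
            return int(term)
        if term not in variables:   # a dynamic check, made only when reached
            raise NameError(f"line {line_no}: undefined variable {term!r}")
        return variables[term]

    for line_no, line in enumerate(source.splitlines(), start=1):
        words = line.split()
        if not words:
            continue
        if words[0] == "print" and len(words) == 2:
            output.append(value_of(words[1], line_no))
        elif len(words) >= 3 and words[1] == "=":
            terms = words[2:]
            # Verify the shape: term (+ term)*
            if len(terms) % 2 == 0 or any(op != "+" for op in terms[1::2]):
                raise SyntaxError(f"line {line_no}: malformed expression")
            variables[words[0]] = sum(value_of(t, line_no) for t in terms[0::2])
        else:
            raise SyntaxError(f"line {line_no}: cannot parse {line!r}")
    return variables


program = """x = 2 + 3
y = x + 10
print y
print z
"""
printed = []
error_message = None
try:
    interpret(program, printed)
except NameError as err:
    error_message = str(err)

print(printed)
print(error_message)
```

The first three statements run and produce output before the problem on line 4 is discovered. The interpreter stays in control throughout and can report the error in terms of the source line, which is the diagnostic advantage noted in the comparison above.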

Compiled Fortran programs are also linked to libraries of subroutines, which are not part of the source program but are provided with the “compiler” to implement common mathematical (and string manipulation) functions such as sin, cos, and log, as well as I/O.

    Fortran program -> Compiler -> Incomplete machine language
    Incomplete machine language + Library routines -> Linker -> Machine language program

Many compilers generate assembly language instead of machine language. This facilitates debugging, since assembly language is easier for people to read, and it isolates the compiler from changes in the format of machine language files that may be mandated by new releases of the operating system (only the assembler must change, and it can be shared by many compilers).

    Source program -> Compiler -> Assembly language
    Assembly language -> Assembler -> Machine language

Compilers for C and many other languages running on UNIX begin with a preprocessor that removes comments and expands macros and #include directives. The preprocessor can also be asked to delete portions of the code, providing conditional compilation with #if, #ifdef, and #ifndef (discussed in section 8.8 of your C text). This allows several versions of the program to be created from the same source, for example to eliminate or change platform-dependent code.

    Source program -> Preprocessor -> Modified source program
    Modified source program -> Compiler -> Assembly language

C++ compilers based on the early AT&T compiler actually generate an intermediate program in C instead of assembly language.

    Source program -> Preprocessor -> Modified source program
    Modified source program -> C++ compiler -> C code
    C code -> C compiler -> Assembly language

The C++ compiler is a true compiler: it performs a complete analysis of the syntax and semantics of the C++ source program, and with very few exceptions generates all of the error messages that a programmer will see prior to running the program.
Many programmers are unaware that the C compiler is being used behind the scenes. The C++ compiler doesn’t invoke the C compiler unless it can generate C code that should pass through the second round of compilation without producing any error messages.

These examples illustrate (but are not a definitive set of) the different variations of a compiler.

A difference between compilation and interpretation:

Overview of compilation

Compilers are among the most well-studied types of computer programs. In a typical compiler, compilation proceeds through a series of well-defined phases. Each phase discovers information of use to later phases, or transforms the program into a form that is more useful to the subsequent phase. In general, the phases of a compiler, with the input and output of each, are:

    Character stream
      -> Scanner (lexical analysis): breaks the character stream into tokens
    Token stream
      -> Parser (syntax analysis): determines whether the tokens occur in the correct order according to the language's syntax (grammar)
    Parse tree
      -> Semantic analysis and intermediate code generation
    Abstract syntax tree or other intermediate form
      -> Machine-independent code improvement (optional)
    Modified intermediate form
      -> Target code generation
    Assembly/machine language or other target language
      -> Machine-specific code improvement (optional)
    Modified target language

Symbol table: a list of the symbols (such as variable names) that occur in the program, and where they will be stored in memory.

The first few phases (up to semantic analysis) serve to figure out the meaning of the program; they are called the front end. The last few phases construct the target program and are called the back end.

Compilation can also be described as a series of passes, where a pass is a phase or set of phases that is serialized with respect to the rest of compilation: it doesn't start until previous phases have completed, and it finishes before any subsequent phases start.
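
The first phase in the list above, the scanner, can be sketched with regular expressions. The token categories, the keyword set, and the sample input are assumptions chosen for the example and do not describe any particular language.

```python
import re

# Token categories for a HYPOTHETICAL language, tried in this order.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("SYMBOL", r"[-+*/=();]"),
    ("SKIP", r"\s+"),        # white space is recognized, then discarded
]
KEYWORDS = {"while", "if", "else"}
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def scan(text):
    """Break a character stream into a list of (kind, lexeme) tokens."""
    tokens = []
    position = 0
    while position < len(text):
        match = MASTER.match(text, position)
        if match is None:
            raise SyntaxError(
                f"unexpected character {text[position]!r} at offset {position}"
            )
        kind, lexeme = match.lastgroup, match.group()
        if kind == "IDENT" and lexeme in KEYWORDS:
            kind = "KEYWORD"   # keywords look like identifiers to the regex
        if kind != "SKIP":
            tokens.append((kind, lexeme))
        position = match.end()
    return tokens


tokens = scan("while (count) count = count - 1;")
for kind, lexeme in tokens:
    print(kind, lexeme)
```

The parser never sees individual characters or white space; it works on the token stream, which is why each phase can be simpler than a single monolithic translator would be.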

In the past, a pass may have been written as a separate program that read its input from a file and wrote its output to a file.

A brief discussion of the purpose of each phase of the compiler can be found in a handout on the metal shelves beside my door. The pages in the handout, and some of the information in this lecture, came from the textbook “Programming Language Pragmatics”, by Michael L. Scott, chapter 1.

From this lecture and the handout you should be able to discuss each of the “review questions” marked with an asterisk (*) found at the end of the handout.