Programming languages lecture #1

Programming languages
The evolution and development of programming languages is really the
process of making computing “convenient” and accessible to a broader
category of users.
As we have already discussed, the first electronic computers were monstrous
contraptions, filling several rooms and costing millions of dollars (but with
the computing power of a modern hand-held calculator).
From Machine language to Higher level languages:
Initially, programmers' time was considerably cheaper than computing time,
and programs were developed in “machine language”. Machine language
is the native language of a computer. It is the notation to which the
computer responds directly, and consists of a series of bits that directly
control a processor, causing it to add, compare, and move data from one
location to another.
Machine code is typically written as a series of binary or hexadecimal
codes, for example:
00000010101111001010
This is what a machine (or family of machines) can directly interpret and
execute, BUT it is VERY difficult to write and debug, and is largely
unintelligible to humans. Programming in it is an enormously tedious task.
As computing hardware advanced, and people began wishing to write larger
programs, it quickly became apparent that a less error-prone notation was
required. Programming languages are designed to be both higher level and
general purpose. Higher level: independent of the underlying machine
architecture. General purpose: can be applied to a wide range of
problems.
The first step in the development of programming languages was the
development of assembly languages, which used names and symbols to
represent the actual codes for machine operations, values, and storage
locations, making instructions more readable. Each assembly language was
specific to a particular machine.
beq a0, zero, D
Assembly language is still heavily tied to a specific architecture, and
cryptic. It is a very low level form of programming, yet very efficient,
with a one-to-one correspondence between mnemonics and machine instructions.
Translating assembly language into actual machine language became the
responsibility of an “ASSEMBLER”.
Assembly language program --> ASSEMBLER --> Machine language
Because assembly language was tied to one machine, programs had to be
rewritten for every new machine.
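The one-to-one translation that an assembler performs can be sketched in a few lines of Python. This is only an illustration: the four mnemonics, their opcode numbers, and the 8-bit word layout are invented for the example and do not belong to any real machine.

```python
# Toy assembler: every mnemonic maps to exactly one (invented) opcode.
OPCODES = {"load": 0x1, "add": 0x2, "store": 0x3, "halt": 0xF}  # hypothetical encoding


def assemble(source):
    """Translate assembly text into a list of 8-bit machine words."""
    words = []
    for line in source.splitlines():
        line = line.split("#")[0].strip()   # drop comments and surrounding blanks
        if not line:
            continue                        # skip empty lines
        parts = line.split()
        mnemonic = parts[0]
        operand = int(parts[1]) if len(parts) > 1 else 0
        # High 4 bits hold the opcode, low 4 bits hold a storage location.
        words.append((OPCODES[mnemonic] << 4) | operand)
    return words


program = """
load 5    # fetch the value at location 5
add 6     # add the value at location 6
store 7   # save the result at location 7
halt
"""
machine_code = assemble(program)
print([format(word, "08b") for word in machine_code])
```

Each line of assembly produces exactly one machine word, which is why an assembler is so much simpler than a compiler.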
GOAL: To develop a machine-independent language, in which one could
express numerical computations in something which more closely resembled
mathematical formulae.
Between 1954 and 1957 the inventors of Fortran created the first successful
high level language (HLL). The original version appeared in 1957, and was
soon followed by Lisp and Algol.
(Fortran is arguably the first, although the idea of creating a high level
language which is compiled into object code wasn’t new…)
Source language program --> COMPILER --> Assembly or machine language
Compilers are substantially more complicated than assemblers because the
one-to-one correspondence between source and target operations no longer
exists. An individual instruction in a high level language can be translated
into many assembly language or machine language instructions.
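To make the one-to-many relationship concrete, here is a minimal sketch that turns a single assignment statement into several instructions for an imaginary stack machine. The instruction names (LOAD, PUSH, ADD, MUL, STORE, …) are assumptions made up for this example, and Python's own `ast` module is borrowed as the parser, so this is not how a real Fortran or C compiler is built.

```python
import ast

# Hypothetical stack-machine instruction names, one per arithmetic operator.
OPERATIONS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL", ast.Div: "DIV"}


def compile_expr(node):
    """Return the list of instructions that evaluates an arithmetic expression."""
    if isinstance(node, ast.BinOp):
        # Evaluate both operands first, then apply the operator to them.
        return (compile_expr(node.left)
                + compile_expr(node.right)
                + [OPERATIONS[type(node.op)]])
    if isinstance(node, ast.Name):
        return ["LOAD " + node.id]
    if isinstance(node, ast.Constant):
        return ["PUSH " + str(node.value)]
    raise ValueError("unsupported construct")


def compile_assignment(statement):
    """Compile a single 'name = expression' statement."""
    tree = ast.parse(statement).body[0]
    return compile_expr(tree.value) + ["STORE " + tree.targets[0].id]


code = compile_assignment("x = a + b * 2")
for instruction in code:
    print(instruction)
```

One high level statement becomes six low level instructions here, and the compiler had to understand operator precedence to order them correctly.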
The move to higher level languages was strongly influenced by:
1) More readable, familiar notations: Formulas could be expressed in
notations using traditional mathematical symbols.
2) Machine independence: Compilers could be written for each specific
hardware platform, while the language itself remained
machine independent.
3) Availability of program libraries: Libraries of commonly used
functions (sin, cos, …) could be created, tested, and distributed with
compilers, easing the work of programming.
4) Consistency and syntax checking, which can detect some types of errors
before execution.
There was significant initial resistance to high level languages:
1) Programmers could at first write assembly code that was more efficient
and ran faster than what a compiler could produce.
2) Early compilers were expensive, sometimes buggy, and not standardized.
Different vendors might implement their own language extensions.
Over time, compilers became more efficient, and now there are many hundreds
of programming languages. Why?
1) Evolution: Computer science is still a young discipline. The 60's and
70's brought the structured programming approach, in which goto-based
control flow gave way to while loops and case statements. In the late 80's
the nested block structure of languages such as Algol, Ada, and Pascal gave
way to the object-oriented structure of languages such as C++ and Smalltalk,
which encapsulate both data and operations in the same programming construct.
2) Special purposes: Many languages were designed for a specific problem
domain. Fortran was designed for numeric/scientific calculations. Ada
was designed for embedded programming.
3) Personal preference: Different people like different things. It is a
matter of taste: some people love the terseness and flexibility of C while
others hate it; some think naturally recursively while others prefer iteration.
But some languages are more successful than others, and the reasons why
vary:
1) Expressive power: In a technical sense all languages are equivalent:
each can be used, if awkwardly, to write anything written in the others.
Still, some language features clearly have a huge impact on the
programmer's ability to write clear, concise, maintainable code, especially
for large systems.
2) Ease of use for novices: Each language has its own learning curve.
(Basic is typically assumed to have a gentle learning curve, while Ada and
C have a steep one.)
3) Ease of implementation: The ease with which a language can be implemented
on different machines. (Pascal: Niklaus Wirth developed a simple,
portable implementation of the language and shipped it free to
universities all over the world.)
4) Excellent compilers: Some languages are successful because they have
compilers and supporting tools that do an unusually good job of helping
the programmer manage very large projects.
5) Economics, patronage, and inertia: COBOL and PL/1 owe their life to
IBM, and Ada to the US Dept. of Defense. Some languages remain in use long
after “better” languages appear, because of a huge base of installed
software and programmer expertise which would cost too much to replace.
Programming Language Families:
Existing programming languages can be classified into families based on
their model of computation
Imperative: these are action-oriented languages: Pascal, C, PL/1, Fortran.
The focus is on HOW the computer should perform its task. Computation is
viewed as a sequence of actions, and instructions are viewed as performing
actions on data stored in memory. A program is a series of steps, each of
which performs a calculation, retrieves input, or produces output. These
languages provide procedural abstraction, assignments, loops,
sequences, and conditional statements.
Functional programming: a computational model based on the recursive
definition of functions (it originated with Lisp). A program is considered a
function from inputs to outputs, defined in terms of simpler functions
through a process of refinement. A program is a collection of mathematical
functions, each with an input (domain) and a result (range). Functions
interact and combine with each other using functional composition,
conditionals, and recursion. Examples: Lisp, Scheme.
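The contrast between the imperative and functional views can be seen by writing the same small computation both ways. The sketch below uses Python only because it permits both styles; the function names are made up for illustration.

```python
def sum_of_squares_imperative(numbers):
    """Imperative style: a sequence of actions that update data in memory."""
    total = 0
    for n in numbers:              # loop
        total = total + n * n      # assignment changes the stored value
    return total


def square(n):
    return n * n


def sum_of_squares_functional(numbers):
    """Functional style: a recursive definition built from simpler functions."""
    if not numbers:                # conditional selects the base case
        return 0
    # Compose 'square' with a recursive call on the rest of the list.
    return square(numbers[0]) + sum_of_squares_functional(numbers[1:])


print(sum_of_squares_imperative([1, 2, 3]))
print(sum_of_squares_functional([1, 2, 3]))
```

The imperative version describes how a memory cell changes step by step, while the functional version never reassigns anything and simply states what the result is in terms of smaller results.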
Object oriented: these languages are relatively recent and can trace their
roots to Simula 67. They are closely related to imperative languages, but
have much more structure and a distributed model of both memory and
computation. Rather than picturing computation as the operation of a
monolithic processor on a monolithic memory, object-oriented languages
picture it as interactions among semi-independent objects, each of which
has both its own internal state and executable functions to manage that
state. A program is viewed as a collection of objects that interact with one
another by passing messages that transform an object's state. Object
modeling, classification, inheritance, and information hiding are
fundamental building blocks for OO languages. Examples: Ada 95, C++, Java.
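A small sketch of these ideas, written in Python rather than one of the languages named above, and with class names invented for the example: each object keeps its own internal state, other code changes that state only by sending the object a message (calling a method), and one class inherits from another.

```python
class Counter:
    """An object with internal state and the functions that manage it."""

    def __init__(self):
        self._count = 0            # internal state, hidden by convention

    def increment(self):           # a "message" that transforms the state
        self._count += 1

    def value(self):
        return self._count


class StepCounter(Counter):
    """Inherits from Counter but responds to 'increment' differently."""

    def __init__(self, step):
        super().__init__()
        self._step = step

    def increment(self):
        self._count += self._step


counters = [Counter(), StepCounter(5)]
for counter in counters:
    counter.increment()            # same message, object-specific behaviour
    counter.increment()
print([counter.value() for counter in counters])
```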
Logic programming (constraint-based programming): takes its inspiration
from predicate logic. Computation is viewed as an attempt to find values
that satisfy certain specified relationships, using a goal-directed search
through a list of logical rules. The language attempts to use logical
reasoning to answer queries. A program is a collection of logical
declarations about WHAT outcome should be accomplished rather than HOW that
outcome should be accomplished. Execution of the program applies these
declarations to arrive at a series of possible solutions to a problem.
Example: Prolog.
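Python is not a logic language, but the flavour of "find values that satisfy the specified relationships" can be imitated with a brute-force search. In the sketch below the facts, the names, and the single rule (X is a grandparent of Z if X is a parent of some Y and Y is a parent of Z) are all invented for illustration; a real Prolog system searches far more cleverly.

```python
# Facts: parent(X, Y) means X is a parent of Y.
PARENT = {("tom", "bob"), ("bob", "ann"), ("bob", "pat")}


def grandparent(x=None, z=None):
    """Return every (X, Z) satisfying the rule; None means 'unknown, find it'."""
    solutions = []
    for (px, py) in sorted(PARENT):
        for (qy, qz) in sorted(PARENT):
            # Rule: grandparent(X, Z) holds if parent(X, Y) and parent(Y, Z).
            if py == qy and x in (None, px) and z in (None, qz):
                solutions.append((px, qz))
    return solutions


print(grandparent(z="ann"))    # who is a grandparent of ann?
print(grandparent(x="tom"))    # whose grandparent is tom?
```

The caller states what relationship must hold and which values are unknown; the search, not the caller, works out how to find them.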
Compilation vs Interpretation
There are 2 basic approaches to implementing a program in a higher-level
language:
1) The language is brought down, or converted, to the level of the
machine using a translator called a compiler.
2) The machine is brought up to the level of the language, by building a
higher level machine (virtual machine) which can run the language
directly. This is called an interpreter.
Compilation
At the highest level of abstraction, the compilation and execution of a
program looks like:
Source program --> COMPILER --> Target program
Input --> TARGET PROGRAM --> Output
Here the compiler translates the program into an equivalent target program,
typically in machine or assembly language, and then goes away. At some
arbitrary later time the user tells the operating system to run the target
program. It is the target program which is executed, not the source
program!
1) The compiler is the focus of control during compilation.
2) The target program is the focus of control during execution.
The compiler itself is a machine language program, presumably produced by
compiling a program written in some high level language. When written to a
file in a format understood by the operating system, machine language is
commonly known as object code.
The alternative is interpretation:
Source program + Input --> INTERPRETER --> Output
Unlike a compiler, an interpreter stays around during execution, and is the
focus of control during execution. Interpreters implement a virtual
machine whose machine language is the high-level programming language.
The interpreter reads statements one at a time, verifying and executing them
as it goes along.
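A minimal sketch of such an interpreter, for a toy language invented for this example (assignments of the form `x = 3` or `y = x + 4`, and `print y`). Note how each statement is checked and then executed immediately, so an error on a later line is only discovered once execution reaches it.

```python
def interpret(source):
    """Run a toy program one statement at a time; return the printed values."""
    variables = {}
    output = []

    def value_of(token):
        if token.isdigit():
            return int(token)
        if token not in variables:
            raise NameError("undefined variable: " + token)
        return variables[token]

    for number, line in enumerate(source.splitlines(), start=1):
        words = line.split()
        if not words:
            continue                                  # blank line
        # Verify the form of the statement, then execute it right away.
        if words[0] == "print" and len(words) == 2:
            output.append(value_of(words[1]))
        elif len(words) == 3 and words[1] == "=":
            variables[words[0]] = value_of(words[2])
        elif len(words) == 5 and words[1] == "=" and words[3] == "+":
            variables[words[0]] = value_of(words[2]) + value_of(words[4])
        else:
            raise SyntaxError("line %d: cannot understand %r" % (number, line))
    return output


print(interpret("x = 3\ny = x + 4\nprint y"))
```

The interpreter stays in control for the whole run: the toy program never exists as a separate target program.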
Comparing the two:
A static property of a program is a property that is evident from the program
text. A dynamic property is evident only upon running that program.
Compilers are biased toward static properties, while interpreters are biased
toward dynamic properties.
1) Interpretation offers greater flexibility and better diagnostics (error
messages): the source code is being executed directly, and the interpreter
can include an excellent source level debugger.
2) Compilation leads to better performance in general.
Although conceptually the difference is clear, many language
implementations include a mixture of both:
Source program --> TRANSLATOR --> Intermediate program
Intermediate program + Input --> VIRTUAL MACHINE --> Output
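One familiar instance of this mixed arrangement is the standard Python implementation (CPython): a translator turns source text into bytecode, and a virtual machine then executes the bytecode. The sketch below uses the built-in `compile` and `exec` functions and the standard `dis` module to make the two stages visible; the variable names and the sample source line are just for illustration.

```python
import dis

source = "result = (a + 3) * 4"

# Stage 1: the translator turns source text into an intermediate program.
code_object = compile(source, "<example>", "exec")

# Stage 2: the virtual machine executes the intermediate program on some input.
namespace = {"a": 2}
exec(code_object, namespace)
print(namespace["result"])

# The intermediate program can be listed in a human-readable form.
dis.dis(code_object)
```

The bytecode listing bears little resemblance to the source line, yet it is not machine language either; it is the "machine language" of the virtual machine.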
A language is said to be interpreted when the initial translator is simple.
If the translator is “complex” (it analyzes the source code thoroughly, and
the intermediate program doesn’t bear a strong resemblance to the source),
the language is said to be compiled. There is a large spectrum of
implementation strategies:
• Most interpreted languages employ an initial translator (preprocessor)
that removes comments and white space, and groups characters together
into tokens such as keywords, identifiers, numbers, and symbols. It may
also expand abbreviations, and may identify higher level syntactic
structures such as loops and subroutines. The GOAL is for the intermediate
form to mirror the structure of the source, but in a form that can be
interpreted more efficiently.
• The typical Fortran implementation comes close to pure compilation:
the compiler translates Fortran into machine language. Programs are also
linked to libraries of subroutines, which are not part of the source
program, but are provided with the “compiler” to implement common
mathematical functions (sin, cos, log, …), string manipulation, and I/O.
Fortran program --> COMPILER --> Incomplete machine language
Incomplete machine language + Library routines --> LINKER --> Machine language program
• Many compilers generate assembly language instead of machine
language. This facilitates debugging, since assembly language is easier for
people to read, and it isolates the compiler from changes in the format of
machine language files that may be mandated by new releases of the
operating system (only the assembler must change, and it can be shared
by many compilers).
Source program --> COMPILER --> Assembly language --> ASSEMBLER --> Machine language
• Compilers for C and many other languages running on UNIX begin with
a preprocessor that removes comments and expands macros (#include).
The preprocessor can also be asked to delete portions of the code,
providing conditional compilation with #if, #ifdef, and #ifndef
(discussed in section 8.8 of your C text).
This allows several versions of the program to be created from the same
source (for example, to eliminate or change platform dependent code).
Source --> PREPROCESSOR --> Modified source --> COMPILER --> Assembly language
• C++ compilers based on the early AT&T compiler actually generate an
intermediate program in C instead of assembly language.
Source --> PREPROCESSOR --> Modified source --> C++ COMPILER --> C code
C code --> C COMPILER --> Assembly language
• The C++ compiler is a true compiler: it performs a complete analysis of
the syntax and semantics of the C++ source program, and with very few
exceptions generates all of the error messages that a programmer will see
prior to running the program. Many programmers are unaware
that the C compiler is being used behind the scenes. The C++ compiler
doesn’t invoke the C compiler unless it can generate C code that should
pass through the second round of compilation without producing any
error messages.
These examples illustrate some of the different variations of a compiler;
they are not a definitive set.
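The conditional compilation performed by the C preprocessor, mentioned above, can be sketched as a simple line filter. The version below is a deliberately reduced model written in Python: it understands only #ifdef, #ifndef, #else, and #endif, takes the set of defined names as a parameter, and ignores everything else a real preprocessor does (#if expressions, #define, #include, macro expansion). The sample lines and the name UNIX are made up for the example.

```python
def preprocess(lines, defined):
    """Keep only the lines whose enclosing #ifdef/#ifndef conditions hold."""
    defined = set(defined)
    keep_stack = [True]        # keep_stack[-1]: are lines currently being kept?
    output = []
    for line in lines:
        words = line.split()
        directive = words[0] if words else ""
        if directive in ("#ifdef", "#ifndef"):
            holds = (words[1] in defined) == (directive == "#ifdef")
            keep_stack.append(keep_stack[-1] and holds)
        elif directive == "#else":
            current = keep_stack.pop()
            # Flip the condition, but only if the enclosing region is kept.
            keep_stack.append(keep_stack[-1] and not current)
        elif directive == "#endif":
            keep_stack.pop()
        elif keep_stack[-1]:
            output.append(line)
    return output


source = [
    "#ifdef UNIX",
    "use_unix_paths();",
    "#else",
    "use_other_paths();",
    "#endif",
    "run();",
]
print(preprocess(source, {"UNIX"}))
print(preprocess(source, set()))
```

The same source yields two different programs depending on which names are defined, which is how platform dependent code is selected before the compiler proper ever sees it.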
A difference between compilation and interpretation:
Overview of compilation
Compilers are among the most thoroughly studied types of computer programs.
In a typical compiler, compilation proceeds through a series of well defined
phases. Each phase discovers information of use to later phases, or
transforms the program into a form that is more useful to the subsequent
phase. In general, the phases of a compiler are:
(Unindented lines are the input/output of each phase; lines marked -->
are the phases.)
Character stream
  --> Scanner (lexical analysis): breaks the character stream into tokens
Token stream
  --> Parser (syntax analysis): determines if tokens occur in the correct
      order according to the language's syntax (grammar)
Parse tree
  --> Semantic analysis and intermediate code generation
Abstract syntax tree or other intermediate form
  --> Machine-independent code improvement (optional)
Modified intermediate form
  --> Target code generation
Assembly/machine language or other target language
  --> Machine-specific code improvement (optional)
Modified target language
Symbol table: a list of the symbols (variable names) which occur, and where
they will be stored in memory.
The first few phases (up to semantic analysis) serve to figure out the
meaning of the program. They are called the front end.
The last few phases construct the target program and are called the
back end.
Compilation can be described as a series of passes, where a pass is a
phase or set of phases that is serialized with respect to the rest of
compilation: it doesn't start until previous phases have completed,
and it finishes before any subsequent phases start. In the past a pass may
have been written as a separate program which read input from a file
and wrote output to a file.
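The first two phases can be sketched in a few dozen lines of Python. The token categories and the tiny grammar (an identifier, an equals sign, then terms joined by + or -) are assumptions chosen for this illustration; the scanner turns a character stream into a token stream, and the parser checks the order of the tokens and builds a small tree.

```python
import re

# Scanner: token categories described by regular expressions.
TOKEN_SPEC = [("NUMBER", r"\d+"), ("IDENT", r"[A-Za-z_]\w*"),
              ("OP", r"[-+*/=()]"), ("SKIP", r"\s+"), ("ERROR", r".")]
TOKEN_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def scan(text):
    """Break the character stream into a list of (kind, text) tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue                          # white space is discarded
        if kind == "ERROR":
            raise SyntaxError("unexpected character %r" % match.group())
        tokens.append((kind, match.group()))
    return tokens


def parse(tokens):
    """Check tokens against: statement -> IDENT '=' term { ('+'|'-') term }"""
    position = 0

    def expect(kind, text=None):
        nonlocal position
        if position >= len(tokens):
            raise SyntaxError("unexpected end of input")
        found_kind, found_text = tokens[position]
        if found_kind != kind or (text is not None and found_text != text):
            raise SyntaxError("unexpected token %r" % found_text)
        position += 1
        return found_text

    def term():
        if position < len(tokens) and tokens[position][0] == "NUMBER":
            return ("num", expect("NUMBER"))
        return ("var", expect("IDENT"))

    target = expect("IDENT")
    expect("OP", "=")
    tree = term()
    while position < len(tokens):
        operator = expect("OP")
        if operator not in "+-":
            raise SyntaxError("unexpected token %r" % operator)
        tree = (operator, tree, term())
    return ("assign", target, tree)


tokens = scan("x = a + 42")
print(tokens)
print(parse(tokens))
```

Later phases would walk the tree produced by `parse` to check meaning and generate code; a statement such as `x = + 3` is rejected by the parser because its tokens occur in the wrong order.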
A brief discussion of the purpose of the phases of the compiler can be
found in a handout in the metal shelves beside my door. The pages in the
handout, and some of the information found in this lecture, came from the
text book:
“Programming Language Pragmatics”, by Michael L. Scott, chapter 1.
From this lecture and the handout you should be able to discuss each of
the “review questions” marked with an asterisk (*) found at the end of
the handout.