CSE 305 Introduction to Programming Languages
Lecture 9 – CFG and Pushdown Automata
CSE @ SUNY-Buffalo
Zhi Yang
Courtesy of Professor Yacov Hel-Or
Courtesy of Dr. David Reed

Notice Board
•  First, on June 27, 2013, you will have the midterm exam, which covers lectures 1 to 10.
•  Second, on June 20, 2013 (Thursday), you will have the second long quiz.

Our objective
•  The first objective of our class is to comprehend a new programming language within a very short time period; because you have this ability to shorten your learning curve, you will be able to manipulate the language with insight.
•  The second objective is to even engineer your own language!

Review what we've learnt and see the future
e.g.: Egyptian Number System; Complement Number
e.g.: Abacus Number System
e.g.: Gate system, including different underlying devices
1st Generation language: Machine Code
e.g.: MIPS
2nd Generation language: Assembly Code
e.g.: Fortran
Regular Expression

What's next?
3rd Generation Language: Macro function
Basic Calculation System; Lexer; Compiler System; Virtual Machine; Parser; Push Down Automata; Type Checking; Context-Free Grammar; Lambda Calculus Theory

Context-Free Language
•  Languages that are generated by context-free grammars are context-free languages.
•  Context-free grammars are more expressive than finite automata: if a language L is accepted by a finite automaton, then L can be generated by a context-free grammar.
•  Beware: the converse is NOT true.
Context-Free Grammar
Definition. A context-free grammar is a 4-tuple (∑, NT, R, S), where:
•  ∑ is an alphabet (each character in ∑ is called a terminal)
•  NT is a set (each element in NT is called a nonterminal)
•  R, the set of rules, is a subset of NT × (∑ ∪ NT)*
   If (α, β) ∈ R, we write the production α → β; β is called a sentential form
•  S, the start symbol, is one of the symbols in NT
Alternate Definition: Context-Free Grammars
Grammar G = (V, T, S, P), where V is the set of variables, T the set of terminal symbols, S the start variable, and P the set of productions of the form A → α, where α is a string of variables and terminals.
Courtesy Costas Busch - RPI

Properties of PDAs
•  We had two ways to describe regular languages:
   –  Regular expressions (syntactic)
   –  DFA / NFA (computational)
•  How about context-free languages?
   –  CFG (syntactic)
   –  PDA (computational)
CFG = PDA
Theorem: A language is context-free iff some pushdown automaton recognizes it.
Proof:
•  CFL → PDA: we show that if L is a CFL then a PDA recognizes it.
•  PDA → CFL: we show that if a PDA recognizes L then L is a CFL.
From CFG to PDA
•  Proof idea: use the PDA to simulate leftmost derivations.
•  Leftmost derivation: a derivation of a string is a leftmost derivation if at every step the leftmost remaining variable is the one replaced.
•  We use the stack to store the suffix that has not been derived so far.
•  Any terminal symbols appearing before the leftmost variable are matched right away.
Different derivations for the same parse tree
CFG: E → E×E | E+E
     E → 0 | 1 | 2 | … | 9

Leftmost derivation:
E ⇒ E×E
  ⇒ E+E×E
  ⇒ 5+E×E
  ⇒ 5+3×E
  ⇒ 5+3×2

Rightmost derivation:
E ⇒ E×E
  ⇒ E×2
  ⇒ E+E×2
  ⇒ E+3×2
  ⇒ 5+3×2

Parse tree (shared by both derivations):
        E
      / | \
     E  ×  E
   / | \   |
  E  +  E  2
  |     |
  5     3
From CFG to PDA: example run
Input: 5+3×2. The stack is written with its top at the left; $ marks the bottom.

Derivation so far                              Unread input   Stack
E                                              5+3×2          E$
E ⇒ E×E                                        5+3×2          E×E$
E ⇒ E×E ⇒ E+E×E                                5+3×2          E+E×E$
E ⇒ E×E ⇒ E+E×E ⇒ 5+E×E                        5+3×2          5+E×E$
(5 and + are matched against the input)        3×2            E×E$
E ⇒ E×E ⇒ E+E×E ⇒ 5+E×E ⇒ 5+3×E                3×2            3×E$
(3 and × are matched against the input)        2              E$
E ⇒ E×E ⇒ E+E×E ⇒ 5+E×E ⇒ 5+3×E ⇒ 5+3×2        2              2$
(2 is matched against the input)               (none)         $
($ is popped)                                  (none)         empty

The string "5+3×2" is accepted.
From CFG to PDA
Informally:
1. Place the marker symbol $ and the start variable S on the stack.
2. Repeat the following steps:
   –  If the top of the stack is a variable A: choose a rule A → α1…αk and substitute A with α1…αk.
   –  If the top of the stack is a terminal a: read the next input symbol and compare it to a. If they don't match, reject (die).
   –  If the top of the stack is $, go to the accept state.
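The informal procedure above can be sketched in Python as a breadth-first search over PDA configurations (input position plus stack). This is only an illustrative sketch, not the construction from the slides: the function name `pda_accepts`, the dictionary encoding of the grammar, and the use of `*` in place of × are assumptions made here. The pruning step also assumes the grammar has no ε-rules, so every stack symbol must still produce at least one input symbol; that assumption is what makes the search finite.

```python
from collections import deque

def pda_accepts(rules, start, word):
    """Explore all runs of the CFG-derived PDA on `word` (breadth-first)."""
    # A configuration is (input position, stack); the stack is a tuple, top first.
    initial = (0, (start, "$"))
    queue, seen = deque([initial]), {initial}
    while queue:
        pos, stack = queue.popleft()
        top, rest = stack[0], stack[1:]
        if top == "$":
            # Marker on top: accept only if the whole input has been read.
            if pos == len(word):
                return True
            continue
        if top in rules:
            # Variable on top: replace it by each possible right-hand side.
            successors = [(pos, tuple(rhs) + rest) for rhs in rules[top]]
        elif pos < len(word) and word[pos] == top:
            # Terminal on top: match it against the next input symbol.
            successors = [(pos + 1, rest)]
        else:
            successors = []  # mismatch: this run dies
        for config in successors:
            new_pos, new_stack = config
            # Assumption: no ε-rules, so each non-marker stack symbol needs
            # at least one unread input symbol. Prune hopeless runs.
            if len(new_stack) - 1 > len(word) - new_pos:
                continue
            if config not in seen:
                seen.add(config)
                queue.append(config)
    return False

# Expression grammar from the earlier slide, with "*" standing in for ×.
GRAMMAR = {"E": ["E*E", "E+E"] + list("0123456789")}

print(pda_accepts(GRAMMAR, "E", "5+3*2"))
print(pda_accepts(GRAMMAR, "E", "5+*2"))
```

The nondeterministic choice of a rule is modelled by exploring every choice; a real PDA "guesses" the right one.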
From CFG to PDA
•  For a given CFG G = (V, Σ, S, R), we construct a PDA P = (Q, Σ, Γ, δ, q0, F) where:
   –  Q = {q_start, q_loop, q_accept}
   –  Γ = V ∪ Σ ∪ {$}
   –  q0 = q_start
   –  F = {q_accept}
From CFG to PDA
•  We define δ as follows (shorthand notation):
   –  δ(q_start, ε, ε) = {(q_loop, S$)}
   –  δ(q_loop, ε, A) = {(q_loop, α1…αk) | for each A → α1…αk in R}
   –  δ(q_loop, a, a) = {(q_loop, ε) | for each a ∈ Σ}
   –  δ(q_loop, ε, $) = {(q_accept, ε)}
State diagram (each label reads "input, pop → push"):
   q_start → q_loop:     ε, ε → S$
   q_loop → q_loop:      {ε, A → α1…αk | for rules A → α1…αk}
                         {a, a → ε | for all a ∈ Σ}
   q_loop → q_accept:    ε, $ → ε
Example
•  Construct a PDA for the following CFG G:
   S → aTb | b
   T → Ta | ε
State diagram:
   q_start → q_loop:     ε, ε → S$
   q_loop → q_loop:      ε, S → aTb
                         ε, S → b
                         ε, T → Ta
                         ε, T → ε
                         a, a → ε
                         b, b → ε
   q_loop → q_accept:    ε, $ → ε
From PDA to CFG
(pushdown automaton → context-free grammar)
•  First, we simplify the PDA:
   –  It has a single accept state q_f
   –  $ is always popped exactly before accepting
   –  Each transition is either a push or a pop, but not both
From PDA to CFG
•  Single accept state q_f: (diagram) ε, ε → ε transitions lead from the former accept states to q_f.
•  $ is always popped exactly before accepting: (diagram) a loop {ε, A → ε | A ∈ Γ, A ≠ $} empties the stack, followed by ε, $ → ε.
•  Each transition is either a push or a pop: (diagram)
   –  σ, a → b is replaced by σ, a → ε followed by ε, ε → b
   –  σ, ε → ε is replaced by σ, ε → z followed by ε, z → ε, for some z ∈ Γ
From PDA to CFG
•  For any word w accepted by a PDA P = (Q, Σ, Γ, δ, q0, q_f), the process starts at q0 with an empty stack and ends at q_f with an empty stack.
•  Definition: for any two states p, q ∈ Q we define L_{p,q} to be the language of all words w such that, if we start at p with an empty stack and run on w, we end at q with an empty stack.
•  For each L_{p,q} we define a variable A_{p,q} such that L_{p,q} = {w | A_{p,q} ⇒* w}.
•  Note that L(P) = L_{q0,q_f}.
From PDA to CFG
Let P = (Q, Σ, Γ, δ, q0, q_f) be a given PDA. We construct a CFG G = (V, Σ, S, R) as follows*:
•  V = {A_{p,q} | p, q ∈ Q}
•  S = A_{q0,q_f}
•  R is a set of rules constructed as follows:
* Proof of correctness and further reading at the supplementary
-
From PDA to CFG
•  Add the following rules to R:
1. For each p, q, r, s ∈ Q, t ∈ Γ, and a, b ∈ Σ ∪ {ε}: if (r, t) ∈ δ(p, a, ε) and (q, ε) ∈ δ(s, b, t), add the rule A_{p,q} → a A_{r,s} b.
   (Diagram: p moves to r on "a, ε → t", and s moves to q on "b, t → ε".)
2. For each p, q, r ∈ Q, add the rule A_{p,q} → A_{p,r} A_{r,q}.
3. For each p ∈ Q, add the rule A_{p,p} → ε.
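The three rule schemas above translate almost directly into code. The following Python sketch is illustrative only: the encoding of transitions as tuples `(state, input, pop, next_state, push)` with `""` standing for ε, the function name `pda_to_cfg_rules`, and the state numbering in the sample PDA are all assumptions made here, not taken from the slides. It presumes the PDA has already been simplified so that every transition either pushes or pops exactly one symbol.

```python
from itertools import product

def pda_to_cfg_rules(states, transitions):
    """Build the rule set R; a variable A_{p,q} is encoded as the pair (p, q)."""
    rules = set()
    # Push transitions: (p, a, "", r, t) means p --a, ε → t--> r.
    pushes = [(p, a, r, t) for (p, a, pop, r, t) in transitions
              if pop == "" and t != ""]
    # Pop transitions: (s, b, t, q, "") means s --b, t → ε--> q.
    pops = [(s, b, t, q) for (s, b, t, q, push) in transitions
            if push == "" and t != ""]
    # Schema 1: A_{p,q} → a A_{r,s} b when the same symbol t is pushed and popped.
    for (p, a, r, t), (s, b, t2, q) in product(pushes, pops):
        if t == t2:
            body = tuple(x for x in (a, (r, s), b) if x != "")
            rules.add(((p, q), body))
    # Schema 2: A_{p,q} → A_{p,r} A_{r,q}.
    for p, q, r in product(states, repeat=3):
        rules.add(((p, q), ((p, r), (r, q))))
    # Schema 3: A_{p,p} → ε (empty body).
    for p in states:
        rules.add(((p, p), ()))
    return rules

# Hypothetical simplified PDA for {0^n # 1^n}, with states numbered 1..5.
STATES = [1, 2, 3, 4, 5]
TRANSITIONS = {
    (1, "", "", 2, "$"),   # push the bottom marker
    (2, "0", "", 2, "A"),  # push an A for every 0
    (2, "#", "", 3, "z"),  # read # while pushing a dummy symbol z
    (3, "", "z", 4, ""),   # pop the dummy symbol again
    (4, "1", "A", 4, ""),  # pop an A for every 1
    (4, "", "$", 5, ""),   # pop the bottom marker
}
RULES = pda_to_cfg_rules(STATES, TRANSITIONS)
print(len(RULES))
```

Most of the generated rules are useless (they mention variables that derive nothing); the interesting ones here come from schema 1, such as A_{2,4} → 0 A_{2,4} 1.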
Example
A PDA P with L(P) = {0^n # 1^n} (state diagrams omitted):
•  Original transitions: ε, ε → $;  0, ε → A;  #, ε → ε;  1, A → ε;  ε, $ → ε
•  After simplification, #, ε → ε is replaced by #, ε → z followed by ε, z → ε, so that every transition is either a push or a pop.
•  Productions of the resulting grammar: a rule A_{p,p} → ε for each state p, rules A_{p,q} → A_{p,r} A_{r,q} for all states p, q, r, and one rule A_{p,q} → a A_{r,s} b for each matching pair of push and pop transitions.
How to determine CFG?
•  So some languages seem not to be context-free.
•  The question is: which?
•  This can be determined using the pumping lemma for context-free languages.
The Pumping Lemma - background
•  Let L be a CFL and let G be a simple grammar (no unit/ε rules) generating it.
•  Let w ∈ L be a long enough word (we will say later what "long" means).
•  The parse tree of w contains a long path from S to some leaf (terminal).
•  On this long path some variable R must repeat (remember, w is long).
The Pumping Lemma - background
•  Divide w into uvxyz according to the parse tree, as in the figure.
•  Each occurrence of R has a subtree under it.
[Figure: parse tree with root S and two nested occurrences of R; the leaves are split, left to right, into u, v, x, y, z.]
The Pumping Lemma - background
•  The upper occurrence of R has a larger subtree and generates vxy.
•  The lower occurrence of R has a smaller subtree and generates only x.
•  Both subtrees are generated by the same variable R.
•  That means that if we substitute one for the other we still obtain valid parse trees.
[Figure: the original parse tree, and the trees obtained by each substitution.]
•  Replacing the smaller subtree by the larger repeatedly generates the string uv^i x y^i z for each i > 0.
•  Replacing the larger subtree by the smaller generates the string uxz, that is, uv^i x y^i z with i = 0.
•  Therefore, for all i ≥ 0, w_i = uv^i x y^i z is also in L.
The Pumping Length
•  That means that every CFL has a special value, called the pumping length, such that all strings longer than the pumping length can be "pumped".
•  The string can be divided into 5 parts: w = uvxyz.
•  The second and fourth parts can be pumped to produce additional words in L.
•  For all k ≥ 0, w_k = uv^k x y^k z can also be generated by the grammar.
Pumping Lemma for CFL
Lemma: Let L be a context-free language. There is a positive integer p (the pumping length) such that every string w ∈ L with |w| ≥ p can be divided into five pieces w = uvxyz satisfying the following conditions:
1. |vy| > 0
2. |vxy| ≤ p
3. for each i ≥ 0, uv^i x y^i z ∈ L
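To see how the lemma is used to show that a language is not context-free, the sketch below tries every division w = uvxyz allowed by conditions 1 and 2 and tests condition 3 for a few values of i. It is a hedged illustration: the function names, the candidate value p = 4, and the bound `max_i` are choices made here. Finding no valid split for some w refutes pumpability for that p; finding one proves nothing, because only finitely many i are checked.

```python
def in_anbncn(s):
    """Membership test for {a^n b^n c^n | n >= 0} (not context-free)."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n

def in_anbn(s):
    """Membership test for {a^n b^n | n >= 0} (context-free)."""
    n = len(s) // 2
    return s == "a" * n + "b" * n

def find_pumpable_split(w, p, member, max_i=3):
    """Return some (u, v, x, y, z) that survives pumping for i = 0..max_i, or None."""
    for start in range(len(w) + 1):                     # where v begins
        last = min(start + p, len(w))                   # enforce |vxy| <= p
        for end in range(start, last + 1):              # where y ends
            for v_end in range(start, end + 1):         # where v ends
                for y_start in range(v_end, end + 1):   # where y begins
                    u, v = w[:start], w[start:v_end]
                    x, y, z = w[v_end:y_start], w[y_start:end], w[end:]
                    if not (v or y):                    # enforce |vy| > 0
                        continue
                    if all(member(u + v * i + x + y * i + z)
                           for i in range(max_i + 1)):
                        return (u, v, x, y, z)
    return None

P = 4  # candidate pumping length, chosen only for the demonstration
print(find_pumpable_split("a" * P + "b" * P + "c" * P, P, in_anbncn))
print(find_pumpable_split("a" * P + "b" * P, P, in_anbn))
```

For a^p b^p c^p no split survives, because v and y together can touch at most two of the three letter blocks; for a^p b^p a split with v inside the a's and y inside the b's works.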
Proof - value of p
•  First we find out the value of p.
•  Let G be a CFG for the CFL L.
•  Let b be the maximum number of symbols on the right side of any rule in G (a rule with right side α1 α2 α3 … αb).
•  So we know that in any parse tree of G a node can't have more than b children.
•  So if the height of a parse tree for w ∈ L is h, then |w| ≤ b^h, that is, h ≥ log_b |w|.
•  Let |V| be the number of variables in G.
•  We set p = b^(|V|+2).
•  Then for any string of length at least p the parse tree requires height at least |V|+2 (note that b > 1, since there are no unit rules).
•  Given a string w ∈ L such that |w| ≥ p, since G has only |V| variables, at least one of the variables repeats on a longest path (height |V|+2 ⇒ |V|+1 variables + a terminal).
•  W.l.o.g. assume this variable is R.
Proof – condition 1
•  To prove condition 1 (|vy| > 0) we have to show that it is impossible for both v and y to be ε.
•  We use a grammar without unit rules.
•  But the only way to have v = y = ε is to have a rule R → R, which is a unit rule. Contradiction.
•  So condition 1 is satisfied.
Proof – condition 2
•  To prove condition 2 (|vxy| ≤ p) we check the height of the subtree rooted at the first (upper) R, which is the subtree that generates vxy.
•  Its height is at most |V|+2 (R was selected as a variable that has two occurrences within the bottom |V|+1 levels of the parse tree).
•  So it can generate a string of length at most b^(|V|+2).
•  Since p = b^(|V|+2), condition 2 is satisfied.
Proof – condition 3
[Figure: the parse trees obtained by substituting one R-subtree for the other.]
•  Replacing the smaller subtree by the larger repeatedly generates the string uv^i x y^i z for each i > 0.
•  Replacing the larger subtree by the smaller generates the string uxz, that is, uv^i x y^i z with i = 0.
•  Therefore, for all i ≥ 0, w_i = uv^i x y^i z is also in L.
Syntax
•  syntax: the form of expressions, statements, and program units in a programming language
   –  programmers & implementers need a clear, unambiguous description
formal methods for describing syntax:
§  Backus-Naur Form (BNF)
      developed to describe ALGOL (originally by Backus, updated by Naur)
      allowed for clear, concise ALGOL 60 report
      (paralleled grammar work by Chomsky: BNF = context-free grammar)
§  Extended BNF (EBNF)
§  syntax graphs
BNF is a meta-language
•  a grammar is a collection of rules that define a language
§  rules can be conditional, using | to represent OR
      <IF-STMT> → if <LOGIC-EXPR> then <STMT>
                | if <LOGIC-EXPR> then <STMT> else <STMT>
   –  BNF rules define abstractions in terms of terminal symbols and abstractions
      <ASSIGN> → <VAR> := <EXPRESSION>
§  arbitrarily long expressions can be defined using recursion
      <IDENT-LIST> → <IDENTIFIER>
                   | <IDENTIFIER> , <IDENT-LIST>
Deriving expressions from a grammar
•  from ALGOL 60:
   <letter>     → a | b | c | ... | z | A | B | ... | Z
   <digit>      → 0 | 1 | 2 | ... | 9
   <identifier> → <letter>
                | <identifier> <letter>
                | <identifier> <digit>
can derive language elements (i.e., substitute definitions for abstractions):
   <identifier> ⇒ <identifier> <digit>
                ⇒ <identifier> <letter> <digit>
                ⇒ <letter> <letter> <digit>
                ⇒ C <letter> <digit>
                ⇒ CU <digit>
                ⇒ CU1
the above is a leftmost derivation (expand leftmost abstraction first)
Derivations vs. parse trees
   <identifier> ⇒ <identifier> <digit>
                ⇒ <identifier> <letter> <digit>
                ⇒ <letter> <letter> <digit>
                ⇒ C <letter> <digit>
                ⇒ CU <digit>
                ⇒ CU1
a derivation can be represented hierarchically as a parse tree
   –  internal nodes are abstractions
   –  leaf nodes are terminal symbols

                <identifier>
               /            \
       <identifier>        <digit>
        /        \            |
  <identifier>  <letter>      1
       |           |
   <letter>        U
       |
       C
Ambiguous grammars
•  consider a grammar for simple assignments
   <assign> → <id> := <expr>
   <id>     → A | B | C
   <expr>   → <expr> + <expr>
            | <expr> * <expr>
            | ( <expr> )
            | <id>
•  A grammar is ambiguous if there exist sentences with 2 or more distinct parse trees
   e.g., A := A + B * C has two parse trees:
   –  one with + at the root of <expr>, grouping the operands as A + (B * C)
   –  one with * at the root of <expr>, grouping the operands as (A + B) * C
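The ambiguity can be made concrete by counting parse trees. The Python sketch below is a small illustration written for this purpose (the function name `count_expr_trees` and the token-list input are assumptions made here): it counts how many distinct parse trees the ambiguous <expr> rules above give to a token sequence, by trying every operator as the root of the tree.

```python
from functools import lru_cache

def count_expr_trees(tokens):
    """Count parse trees of `tokens` under the ambiguous <expr> grammar."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways <expr> can derive tokens[i:j].
        span = tokens[i:j]
        total = 0
        if len(span) == 1 and span[0] in ("A", "B", "C"):
            total += 1                        # <expr> → <id>
        if len(span) >= 3 and span[0] == "(" and span[-1] == ")":
            total += count(i + 1, j - 1)      # <expr> → ( <expr> )
        for k in range(i + 1, j - 1):         # operator at the root
            if tokens[k] in ("+", "*"):       # <expr> → <expr> op <expr>
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_expr_trees(["A", "+", "B", "*", "C"]))
print(count_expr_trees(["(", "A", "+", "B", ")", "*", "C"]))
```

Any count above 1 witnesses the ambiguity; parentheses force a single tree.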
Ambiguity is bad!
•  programmer perspective
   –  need to know how code will behave
•  language implementer's perspective
   –  need to know how the compiler/interpreter should behave
can build concepts such as operator precedence into grammars
§  introduce a hierarchy of rules, lower level → higher precedence
   <assign> → <id> := <expr>
   <id>     → A | B | C
   <expr>   → <expr> + <term> | <term>
   <term>   → <term> * <factor> | <factor>
   <factor> → ( <expr> ) | <id>
higher precedence operators bind tighter, e.g., A+B*C ≡ A+(B*C)
Operator precedence
   <assign> → <id> := <expr>
   <id>     → A | B | C
   <expr>   → <expr> + <term> | <term>
   <term>   → <term> * <factor> | <factor>
   <factor> → ( <expr> ) | <id>

A := A + B * C

<assign>
   <id>: A
   :=
   <expr>
      <expr>
         <term>
            <factor>
               <id>: A
      +
      <term>
         <term>
            <factor>
               <id>: B
         *
         <factor>
            <id>: C

Note: because of the hierarchy, + must appear above * in the parse tree. Here, if we tried * above, we would not be able to derive + from <term>.
In general, lower precedence (looser bind) operators will appear above higher precedence operators in the parse tree.
Operator associativity
When we combine operators to form expressions, the order in which the operators are to be applied may not be obvious. For example, a+b+c can be interpreted as ((a + b) + c) or as (a + (b + c)). We say that + is left-associative if operands are grouped left to right as in ((a + b) + c). We say it is right-associative if it groups operands in the opposite direction, as in (a + (b + c)).
A.V. Aho & J.D. Ullman 1977, p. 47

   <assign> → <id> := <expr>
   <id>     → A | B | C
   <expr>   → <expr> + <term>
            | <term>
   <term>   → <term> * <factor>
            | <factor>
   <factor> → ( <expr> ) | <id>

A := A + B + C

<assign>
   <id>: A
   :=
   <expr>
      <expr>
         <expr>
            <term>
               <factor>
                  <id>: A
         +
         <term>
            <factor>
               <id>: B
      +
      <term>
         <factor>
            <id>: C
Right associativity
•  suppose we wanted exponentiation ^ to be right-associative
   –  need to add a right-recursive level to the grammar hierarchy

   <assign> → <id> := <expr>
   <id>     → A | B | C
   <expr>   → <expr> + <term>
            | <term>
   <term>   → <term> * <factor>
            | <factor>
   <factor> → <exp> ^ <factor>
            | <exp>
   <exp>    → ( <expr> ) | <id>

A := A ^ B ^ C

<assign>
   <id>: A
   :=
   <expr>
      <term>
         <factor>
            <exp>
               <id>: A
            ^
            <factor>
               <exp>
                  <id>: B
               ^
               <factor>
                  <exp>
                     <id>: C
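The layered grammar maps naturally onto a recursive-descent parser with one function per level. The sketch below is an illustration under stated assumptions, not part of the original slides: the function name `parenthesize` and the single-character tokens are choices made here, and the left-recursive rules for + and * are implemented with loops (a standard way to keep left-associativity without infinite recursion). It returns the expression fully parenthesized, so the grouping chosen by the grammar becomes visible.

```python
def parenthesize(source):
    """Parse an <expr> over ids A, B, C and operators + * ^ and show its grouping."""
    tokens = source.replace(" ", "")   # every token is a single character
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"unexpected {tok!r} at position {pos}")
        pos += 1
        return tok

    def expr():                        # <expr> → <expr> + <term> | <term>
        left = term()
        while peek() == "+":           # loop = left-associative
            take("+")
            left = f"({left}+{term()})"
        return left

    def term():                        # <term> → <term> * <factor> | <factor>
        left = factor()
        while peek() == "*":
            take("*")
            left = f"({left}*{factor()})"
        return left

    def factor():                      # <factor> → <exp> ^ <factor> | <exp>
        base = exp()
        if peek() == "^":              # recursion on the right = right-associative
            take("^")
            return f"({base}^{factor()})"
        return base

    def exp():                         # <exp> → ( <expr> ) | <id>
        if peek() == "(":
            take("(")
            inner = expr()
            take(")")
            return inner
        tok = take()
        if tok not in ("A", "B", "C"):
            raise SyntaxError(f"expected an identifier, got {tok!r}")
        return tok

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return result

for text in ["A + B * C", "A + B + C", "A ^ B ^ C"]:
    print(text, "=>", parenthesize(text))
```

The lower a rule sits in the hierarchy, the deeper its function is called, which is why ^ binds tighter than * and * tighter than +.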
In ALGOL 60…
   <math expr>   → <simple math>
                 | <if clause> <simple math> else <math expr>
   <if clause>   → if <boolean expr> then
   <simple math> → <term>
                 | <add op> <term>
                 | <simple math> <add op> <term>
   <term>        → <factor> | <term> <mult op> <factor>
   <factor>      → <primary> | <factor> ↑ <primary>
   <add op>      → + | -
   <mult op>     → × | / | %
   <primary>     → <unsigned number> | <variable>
                 | <function designator> | ( <math expr> )
•  precedence? associativity?

Dangling else
•  consider the C++ grammar rule:
   <selection stmt> → if ( <expr> ) <stmt>
                    | if ( <expr> ) <stmt> else <stmt>
potential problems?
   if (x > 0)
       if (x > 100)
           cout << foo << endl;
   else
       cout << bar << endl;
ambiguity!
•  to which if does the else belong?
in C++, ambiguity remains in the grammar rules
•  is clarified in the English description (else matches nearest if)
Example1: Fortran/C/Pascal Parse Tree

Example2: Perl Number Terminal Tokens
<number> ::= ((number_1to999 number_1e6)? (number_1to999 number_1e3)? number_1to999?) | number_0
<number_0> ::= "zero"
<number_1to9> ::= "one" | "two" | "three" | "four" | "five" | "six" | "seven" | "eight" | "nine"
<number_10to19> ::= "ten" | "eleven" | "twelve" | "thirteen" | "fourteen" | "fifteen"
                  | "sixteen" | "seventeen" | "eighteen" | "nineteen"
<number_1to999> ::= (number_1to9? number_100)? (number_1to9 | number_10to19 | (number_tens number_1to9))?
<number_tens> ::= "twenty" | "thirty" | "fourty" | "fifty" | "sixty" | "seventy" | "eighty" | "ninety"
<number_100> ::= "hundred"
<number_1e3> ::= "thousand"
<number_1e6> ::= "million"
Recall ……..
•  first computers (e.g., ENIAC) were not programmable
   –  had to be rewired/reconfigured for different computations
•  late 40s / early 50s: coded directly in machine language
   –  extremely tedious and error prone
   –  machine specific
   –  used numeric codes, absolute
011111110100010101001100010001100000000100000010000000010000000000000000000000
000000000000000000000000000000000000000000000000000000000000000001000000000000
001000000000000000000000000000000001000000000000000000000000000000000000000000
000000000000000000000000000000000000000000001010000100000000000000000000000000
000000000000000000110100000000000000000000000000000000000000000000101000000000
000000100000000000000000010000000000101110011100110110100001110011011101000111
001001110100011000010110001000000000001011100111010001100101011110000111010000
000000001011100111001001101111011001000110000101110100011000010000000000101110
011100110111100101101101011101000110000101100010000000000010111001110011011101
000111001001110100011000010110001000000000001011100111001001100101011011000110
000100101110011101000110010101111000011101000000000000101110011000110110111101
101101011011010110010101101110011101000000000000000000000000000000000010011101
111000111011111110010000000100110000000000000000000000001001000000010010011000
000000000000010101000000000000000000000000100100100001001010100000000000000100
000000000000000000000000000000000001000000000000000000000000101000000001000000
000000000010001001000000010000000000000001000000010101000000000000000000000000
100100100001001010100000000000000100000000000000000000000000000000000001000000
000000000000000000101100000001000000000000000100001000000000000000000000100000
000100000000000000000000000010000001110001111110000000001000100000011110100000
000000000000000000000000000000000000000000000001001000011001010110110001101100
011011110111011101101111011100100110110001100100001000010000000000000000000000
000000000000000000000000000000000000000001000000000000000000000000000000000000
000000000000000000000000000000000100000000001111111111110001000000000000000000
000000000000010000000000000000000000000000000000000000000000000000000000000000
000001000000000011111111111100010000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000001100000000000000000000
001100000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000010000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000110000000
000000000000000100000000000000000000000000001101000000000000000000000000000000
000000000000000000000000000000000000001000000000000000000000000000000000000000
000000000000000100010000000000000000000000000000000000000000000000000000000000
000000000010000000000000000000000000000000000000000000000000000010001100000000
000000000000000000000000000000000000000000000000000000000000100000000000000000
000000000000000000000000000000000000101100000000000000000000000000000000000000
000000000000000000000000000000001000000000000000000000000000000000000000000000
000000001101001000000000000000000000000000000000000000000000000000000000100100
000010010000000000000000000000010000000000000000000000000011011100000000000000
000000000000000000000000000000000000000000000000000000100000000000000000000000
000000000000001101000011001010110110001101100011011110010111001100011011100000
111000000000000011001110110001101100011001100100101111101100011011011110110110
101110000011010010110110001100101011001000010111000000000010111110101000101011
111011100010111010001101111011001000000000001011111010111110110110001110011010
111110101111100110111011011110111001101110100011100100110010101100001011011010
101000001000110010100100011011101101111011100110111010001110010011001010110000
101101101010111110101001000110111011011110111001101110100011100100110010101100
001011011010000000001011111010111110110110001110011010111110101111100110111011
011110111001101110100011100100110010101100001011011010101000001000011011000110
000000001100101011011100110010001101100010111110101111101000110010100100011011
101101111011100110111010001110010011001010110000101101101000000000110110101100
001011010010110111000000000011000110110111101110101011101000000000000000000000
000000000000000000000000000000000000000000000000000000
Recall ……..
mid 1950s: assembly languages developed
§  mnemonic names replaced numeric codes
§  relative addressing via names and labels
a separate program (assembler) translated from assembly code to machine code
•  still machine specific, low-level
.file "hello.cpp"
gcc2_compiled.:
.global _Q_qtod
.section ".rodata"
.align 8
.LLC0: .asciz "Hello world!"
.section ".text"
.align 4
.global main
.type main,#function
.proc 04
main:
!#PROLOGUE# 0
save %sp,-112,%sp
!#PROLOGUE# 1
sethi %hi(cout),%o1
or %o1,%lo(cout),%o0
sethi %hi(.LLC0),%o2
or %o2,%lo(.LLC0),%o1
call __ls__7ostreamPCc,0
nop
mov %o0,%l0
mov %l0,%o0
sethi %hi(endl__FR7ostream),%o2
or %o2,%lo(endl__FR7ostream),%o1
call __ls__7ostreamPFR7ostream_R7ostream,0
nop
mov 0,%i0
b .LL230
nop
.LL230: ret
restore
.LLfe1: .size main,.LLfe1-main
.ident "GCC: (GNU) 2.7.2"
Recall ……..
late 1950s: high-level languages developed
§  allowed user to program at higher level of abstraction
however, bridging the gap to low-level hardware was more difficult
•  a compiler translated code all at once into machine code (e.g., FORTRAN, C++)
•  an interpreter simulated execution of the code line-by-line (e.g., BASIC, Scheme)
// File: hello.cpp
// Author: Dave Reed
//
// This program prints "Hello world!"
////////////////////////////////////////
#include <iostream>
using namespace std;
int main()
{
cout << "Hello world!" << endl;
return 0;
}
Software development methodologies……
•  by 70s, software costs rivaled hardware
   → new development methodologies emerged
•  early 70s: top-down design
   –  stepwise (iterative) refinement (Pascal)
•  late 70s: data-oriented programming
   –  concentrated on the use of ADTs (Modula-2, Ada, C/C++)
•  early 80s: object-oriented programming
   –  ADTs + inheritance + dynamic binding (Smalltalk, C++, Eiffel, Java)
•  mid 90's: extreme programming, agile programming (???)

Architecture influences design…….
•  virtually all computers follow the von Neumann architecture
   fetch-execute cycle: repeatedly
   •  fetch instructions/data from memory
   •  execute in CPU
   •  write results back to memory
•  imperative languages parallel this behavior
   –  variables (memory cells)
   –  assignments (changes to memory)
   –  sequential execution & iteration (fetch/execute cycle)
since features resemble the underlying implementation, tend to be efficient
declarative languages emphasize problem-solving approaches far-removed from the underlying hardware
e.g., Prolog (logic): specify facts & rules, interpreter performs logical inference
      LISP/Scheme (functional): specify dynamic transformations to symbols & lists
tend to be more flexible and expressive, but not as efficient
FORTRAN (Formula Translator)
•  FORTRAN was the first* high-level language
   –  developed by John Backus at IBM
   –  designed for the IBM 704 computer, all control structures corresponded to 704 machine instructions
   –  704 compiler completed in 1957
   –  despite some early problems, FORTRAN was immensely popular – adopted universally in 50's & 60's
   –  FORTRAN evolved based on experience and new programming features
      •  FORTRAN II (1958)
      •  FORTRAN IV (1962)
      •  FORTRAN 77 (1977)
      •  FORTRAN 90 (1990)
C
C FORTRAN program
C Prints "Hello world" 10 times
C
PROGRAM HELLO
DO 10, I=1,10
PRINT *,'Hello world'
10 CONTINUE
STOP
END
LISP (List Processing)
•  LISP is a functional language
   –  developed by John McCarthy at MIT
   –  designed for Artificial Intelligence research – needed to be symbolic, flexible, dynamic
   –  LISP interpreter completed in 1959
   –  LISP syntax is very simple but flexible, based on the λ-calculus of Church
   –  all memory management is dynamic and automatic – simple but inefficient
   –  LISP is still the dominant language in AI
   –  dialects of LISP have evolved
      •  Scheme (1975)
      •  Common LISP (1984)
;;; LISP program
;;; (hello N) will return a list containing
;;;    N copies of "Hello world"
(define (hello N)
(if (zero? N)
'()
(cons "Hello world" (hello (- N 1)))))
> (hello 10)
("Hello world"
"Hello world"
"Hello world"
"Hello world"
"Hello world"
"Hello world"
"Hello world"
"Hello world"
"Hello world"
"Hello world")
>
ALGOL (Algorithmic Language)
•  ALGOL was an international effort to design a universal language
   –  developed by joint committee of ACM and GAMM (German equivalent)
   –  influenced by FORTRAN, but more flexible & powerful, not machine specific
   –  ALGOL introduced and formalized many common language features of today
      •  data type
      •  compound statements
      •  natural control structures
      •  parameter passing modes
      •  recursive routines
      •  BNF for syntax (Backus & Naur)
   –  ALGOL evolved (58, 60, 68), but not widely adopted as a programming language
      •  instead, accepted as a reference language
comment ALGOL 60 PROGRAM
displays "Hello world" 10 times;
begin
integer counter;
for counter := 1 step 1 until 10 do
begin
printstring("Hello world");
end
end
C → C++ → Java → JavaScript
ALGOL influenced the development of virtually all modern languages
§  C (1971, Dennis Ritchie at Bell Labs)
   •  designed for system programming (used to implement UNIX)
   •  provided high-level constructs and low-level machine access
§  C++ (1985, Bjarne Stroustrup at Bell Labs)
   •  extended C to include objects
   •  allowed for object-oriented programming, with most of the efficiency of C
§  Java (1993, Sun Microsystems)
   •  based on C++, but simpler & more reliable
   •  purely object-oriented, with better support for abstraction and networking
§  JavaScript (1995, Netscape)
   •  Web scripting language
#include <stdio.h>
main() {
for(int i = 0; i < 10; i++) {
printf ("Hello World!\n");
}
}
#include <iostream>
using namespace std;
int main() {
for(int i = 0; i < 10; i++) {
cout << "Hello World!" << endl;
}
return 0;
}
class HelloWorld {
public static void main (String args[]) {
for(int i = 0; i < 10; i++) {
System.out.print("Hello World ");
}
}
}
<html>
<body>
<script language="JavaScript">
for(i = 0; i < 10; i++) {
document.write("Hello World<br>");
}
</script>
</body>
</html>
Other influential languages
•  COBOL (1960, Dept of Defense/Grace Hopper)
   –  designed for business applications, features for structuring data & managing files
•  BASIC (1964, Kemeny & Kurtz – Dartmouth)
   –  designed for beginners, unstructured but popular on microcomputers in 70's
•  Simula 67 (1967, Nygaard & Dahl – Norwegian Computing Center)
   –  designed for simulations, extended ALGOL to support classes/objects
•  Pascal (1971, Wirth – ETH Zurich)
   –  designed as a teaching language but used extensively, emphasized structured programming
•  Prolog (1972, Colmerauer, Roussel – Aix-Marseille, Kowalski – Edinburgh)
   –  logic programming language, programs stated as collection of facts & rules
•  Ada (1983, Dept of Defense)
   –  large & complex (but powerful) language, designed to be official govt. contract language

There is no silver bullet
•  remember: there is no best programming language
   –  each language has its own strengths and weaknesses
•  languages can only be judged within a particular domain or for a specific application
   business applications → COBOL
   artificial intelligence → LISP/Scheme or Prolog
   systems programming → C
   software engineering → C++ or Java or Smalltalk
   Web development → Java or JavaScript or VBScript or perl