Inductive Definitions
COS 510
David Walker
Inductive Definitions
Inductive definitions play a central role in
the study of programming languages.
They specify the following aspects of a
language:
• Concrete syntax (via CFGs)
• Abstract syntax (via CFGs/ML datatypes)
• Static semantics (via typing rules)
• Dynamic semantics (via evaluation rules)
Reading
• Read Pierce’s Text:
– Chapter 2 (skim definitions; understand 2.4)
• we will use sets, relations, functions, sequences
• you should know basics such as mathematical
induction, reflexivity, transitivity, symmetry, total
and partial orders, domains and ranges of functions,
etc.
– Chapter 3
Inductive Definitions
• An inductive definition consists of:
– One or more judgments (i.e., assertions)
– A set of rules for deriving these judgments
• For example:
– Judgment is “n nat”
– Rules:
• zero nat
• if n nat, then succ(n) nat.
Inference Rule Notation
Inference rules are normally written as:
$$\frac{J_1 \quad \cdots \quad J_n}{J}$$
where J and J1, ..., Jn are judgments. (For
axioms, n = 0.)
An example
For example, the rules for deriving n nat are
usually written:
$$\text{zero nat} \qquad \frac{n \text{ nat}}{\text{succ}(n) \text{ nat}}$$
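In ML terms, the two rules for n nat have the same shape as a datatype with one constructor per rule. A minimal OCaml sketch; the names `nat`, `Zero`, `Succ`, and `to_int` are our own illustrative choices, not part of the course material:

```ocaml
(* Hypothetical encoding: one constructor per rule.
   Zero <-> the axiom "zero nat"
   Succ <-> the rule "if n nat, then succ(n) nat" *)
type nat = Zero | Succ of nat

(* Read a unary numeral as an OCaml int (for display only). *)
let rec to_int (n : nat) : int =
  match n with
  | Zero -> 0
  | Succ m -> 1 + to_int m

let two = Succ (Succ Zero)

let () = Printf.printf "succ(succ(zero)) denotes %d\n" (to_int two)
```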
Derivation of Judgments
• A judgment J is derivable iff either
– there is an axiom
$$J$$
– or there is a rule
$$\frac{J_1 \quad \cdots \quad J_n}{J}$$
– such that J1, ..., Jn are derivable
Derivation of Judgments
• We may determine whether a judgment is
derivable by working backwards.
• For example, the judgment
succ(succ(zero)) nat
is derivable as follows:
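$$\dfrac{\dfrac{\text{zero nat}\ (\text{zero})}{\text{succ(zero) nat}}\ (\text{succ})}{\text{succ(succ(zero)) nat}}\ (\text{succ})$$
This tree is a derivation (i.e., a proof). The rule names in parentheses, recording the rule used at each step, are optional.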
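Working backwards can be mechanized: for n nat, at most one rule's conclusion matches a given term, so the search just peels off succ until it reaches zero or gets stuck. A sketch in OCaml under our own assumptions: the `term` type (with a hypothetical `Atom` form for terms that are not numbers), the `deriv` type, and the functions `derive` and `print_deriv` are all illustrative.

```ocaml
(* Hypothetical term syntax; Atom stands for a term that no rule covers. *)
type term = Zero | Succ of term | Atom of string

(* A derivation of "t nat": the rule used at the root plus its sub-derivation. *)
type deriv =
  | ZeroRule                  (* axiom: zero nat *)
  | SuccRule of term * deriv  (* from "n nat" conclude "succ(n) nat" *)

(* Work backwards: choose the rule whose conclusion matches t,
   then try to derive its premise. *)
let rec derive (t : term) : deriv option =
  match t with
  | Zero -> Some ZeroRule
  | Succ n -> (
      match derive n with
      | Some d -> Some (SuccRule (t, d))
      | None -> None)
  | Atom _ -> None  (* no rule concludes "Atom _ nat" *)

let rec show_term (t : term) : string =
  match t with
  | Zero -> "zero"
  | Succ n -> "succ(" ^ show_term n ^ ")"
  | Atom s -> s

(* Print the derivation leaves-first, naming the rule used at each step. *)
let rec print_deriv (d : deriv) : unit =
  match d with
  | ZeroRule -> print_endline "zero nat (zero)"
  | SuccRule (t, d') ->
      print_deriv d';
      Printf.printf "%s nat (succ)\n" (show_term t)

let () =
  match derive (Succ (Succ Zero)) with
  | Some d -> print_deriv d
  | None -> print_endline "not derivable"
```

Printed leaves-first, the output lists the same three steps as the derivation above, with the rule used at each one.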
Binary Trees
• Here is a set of rules defining the judgment t tree, which states that t is a binary tree:
$$\text{empty tree} \qquad \frac{t_1 \text{ tree} \quad t_2 \text{ tree}}{\text{node}(t_1, t_2) \text{ tree}}$$
• Prove that the following judgment is derivable:
node(empty, node(empty, empty)) tree
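One way to check the exercise is to search for the derivation mechanically, working backwards from the goal. A hypothetical OCaml sketch: the `term` type (whose `Other` form stands for terms no rule covers) and the `derive` function are our own names.

```ocaml
(* Hypothetical term syntax; Other is any term not built from empty and node. *)
type term = Empty | Node of term * term | Other of string

let rec show (t : term) : string =
  match t with
  | Empty -> "empty"
  | Node (t1, t2) -> "node(" ^ show t1 ^ ", " ^ show t2 ^ ")"
  | Other s -> s

(* Search backwards for a derivation of "t tree", printing each conclusion
   with its premises indented beneath it. Returns whether one exists. *)
let rec derive (indent : string) (t : term) : bool =
  match t with
  | Empty ->
      Printf.printf "%sempty tree  (axiom)\n" indent;
      true
  | Node (t1, t2) ->
      Printf.printf "%s%s tree  (node rule), from:\n" indent (show t);
      let ok1 = derive (indent ^ "  ") t1 in
      let ok2 = derive (indent ^ "  ") t2 in
      ok1 && ok2
  | Other s ->
      Printf.printf "%s%s tree  (no rule applies)\n" indent s;
      false

let () =
  let ok = derive "" (Node (Empty, Node (Empty, Empty))) in
  Printf.printf "derivable: %b\n" ok
```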
Rule Induction
• By definition, every derivable judgment
– is the consequence of some rule...
– whose premises are derivable
• That is, the rules are an exhaustive description of
the derivable judgments
• Just like an ML datatype definition is an
exhaustive description of all the objects in the type
being defined
Rule Induction
• To show that every derivable judgment has
a property P, it is enough to show that
– For every rule
$$\frac{J_1 \quad \cdots \quad J_n}{J}$$
if J1, ..., Jn have the property P, then J has
property P
This is the principle of rule induction.
Example: Natural Numbers
• Consider the rules for n nat
• We can prove that the property P holds of
every n such that n nat by rule induction:
– Show that P holds of zero;
– Assuming that P holds of n, show that P holds
of succ(n).
• This is just ordinary mathematical
induction!!
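Computationally, this proof principle corresponds to structural recursion with one case per rule, where the recursive result plays the role of the induction hypothesis. A sketch in OCaml (`nat_rec`, `add`, and `to_int` are illustrative names); running code like this is an analogy, not a proof:

```ocaml
type nat = Zero | Succ of nat

(* Recursion principle for "n nat": one argument per rule.
   [base] covers the axiom; [step] builds the result for succ(m) from m
   and the result already computed for m (the "induction hypothesis"). *)
let rec nat_rec (base : 'a) (step : nat -> 'a -> 'a) (n : nat) : 'a =
  match n with
  | Zero -> base
  | Succ m -> step m (nat_rec base step m)

(* Addition, by recursion on the first argument. *)
let add (m : nat) (n : nat) : nat = nat_rec n (fun _ r -> Succ r) m

(* Conversion to int is itself an instance of nat_rec. *)
let to_int (n : nat) : int = nat_rec 0 (fun _ r -> r + 1) n

let () = Printf.printf "2 + 1 = %d\n" (to_int (add (Succ (Succ Zero)) (Succ Zero)))
```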
Example: Binary Tree
• Similarly, we can prove that every binary
tree t has a property P by showing that
– empty has property P;
– If t1 has property P and t2 has property P, then
node(t1, t2) has property P.
• This might be called tree induction.
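The same correspondence holds for trees: a recursor with one case per rule, where the two recursive results play the role of the two induction hypotheses. A hypothetical OCaml sketch (`tree_rec`, `count_nodes`, and `count_leaves` are our own names):

```ocaml
type tree = Empty | Node of tree * tree

(* Tree recursion: [base] covers the axiom "empty tree"; [node] combines the
   results r1 and r2 already computed for t1 and t2. *)
let rec tree_rec (base : 'a) (node : 'a -> 'a -> 'a) (t : tree) : 'a =
  match t with
  | Empty -> base
  | Node (t1, t2) -> node (tree_rec base node t1) (tree_rec base node t2)

(* Number of node(...) constructors in t. *)
let count_nodes (t : tree) : int = tree_rec 0 (fun r1 r2 -> 1 + r1 + r2) t

(* Number of empty leaves in t. *)
let count_leaves (t : tree) : int = tree_rec 1 (fun r1 r2 -> r1 + r2) t
```

By tree induction one can show count_leaves t = count_nodes t + 1 for every tree; the code only computes both sides.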
Example: The Height of a Tree
• Consider the following equations:
– hgt(empty) = 0
– hgt(node(t1, t2)) = 1 + max(hgt(t1), hgt(t2))
• Claim: for every binary tree t there exists a
unique integer n such that hgt(t) = n.
• That is, the above equations define a
function.
Example: The Height of a Tree
• We will prove the claim by rule induction:
– If t is derivable by the axiom
empty tree
– then n = 0 is determined by the first equation:
hgt(empty) = 0
– is it unique? Yes.
Example: The Height of a Tree
• If t is derivable by the rule
$$\frac{t_1 \text{ tree} \quad t_2 \text{ tree}}{\text{node}(t_1, t_2) \text{ tree}}$$
then t = node(t1, t2), and by the induction hypothesis we may assume that:
• there exists a unique n1 such that hgt(t1) = n1;
• there exists a unique n2 such that hgt(t2) = n2.
Hence, there exists a unique n, namely
1 + max(n1, n2),
such that hgt(t) = n.
Example: The Height of a Tree
This is awfully pedantic, but it is useful to see the
details at least once.
• It is not obvious a priori that a tree has a well-defined height!
• Rule induction justified the existence of the
function hgt.
It is “obvious” from the equations that there is at
most one n such that hgt(t) = n. The proof shows
that there exists at least one.
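In ML, the equations transcribe directly into a function; the exhaustive match (one case per rule) and the recursion on subtrees mirror the rule-induction argument above. A minimal OCaml sketch, assuming our own `tree` datatype:

```ocaml
type tree = Empty | Node of tree * tree

(* The two equations for hgt, one match case per rule. *)
let rec hgt (t : tree) : int =
  match t with
  | Empty -> 0
  | Node (t1, t2) -> 1 + max (hgt t1) (hgt t2)

let () = Printf.printf "%d\n" (hgt (Node (Empty, Node (Empty, Empty))))
```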
Inductive Definitions in PL
• In this course, we will be looking at
inductive definitions that determine
– abstract syntax
– static semantics (typing)
– dynamic semantics (evaluation)
– other properties of programs and programming
languages
Inductive Definitions
First up: Syntax
Abstract vs Concrete Syntax
• the concrete syntax of a program is a string
of characters:
– ‘(’ ‘3’ ‘+’ ‘2’ ‘)’ ‘*’ ‘7’
• the abstract syntax of a program is a tree
representing the computationally relevant
portion of the program:
[tree: * at the root; its left child is + with children 3 and 2; its right child is 7]
Abstract vs Concrete Syntax
• the concrete syntax of a program contains many
elements necessary for parsing:
– parentheses
– delimiters for comments
– rules for precedence of operators
• the abstract syntax of a program is much simpler;
it does not contain these elements
– precedence is given directly by the tree structure
Abstract vs Concrete Syntax
• in this class, we work with abstract syntax
– we want to define what programs mean
– we will work with simple ASTs
• nevertheless, we need a notation for writing
down abstract syntax trees
– when we write (3 + 2) * 7, you should visualize
the tree:
[tree: * at the root; its left child is + with children 3 and 2; its right child is 7]
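The same idea in ML: an abstract syntax tree needs no parentheses or precedence rules, because grouping is recorded by which node is the parent. A sketch with a hypothetical `ast` datatype (`Num`, `Plus`, `Times`) and evaluator `eval`, none of which come from the slides:

```ocaml
(* Hypothetical abstract syntax: no parentheses, no precedence. *)
type ast =
  | Num of int
  | Plus of ast * ast
  | Times of ast * ast

(* (3 + 2) * 7 : the * node is the root *)
let e1 = Times (Plus (Num 3, Num 2), Num 7)

(* 3 + 2 * 7 : precedence puts the + node at the root *)
let e2 = Plus (Num 3, Times (Num 2, Num 7))

let rec eval (e : ast) : int =
  match e with
  | Num n -> n
  | Plus (a, b) -> eval a + eval b
  | Times (a, b) -> eval a * eval b

let () = Printf.printf "%d %d\n" (eval e1) (eval e2)
```

Although the two concrete strings differ only in their parentheses, the trees are different and evaluate to 35 and 17.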
Arithmetic Expressions, Informally
• Informally, an arithmetic expression e is
– a boolean value
– an if statement (if e1 then e2 else e3)
– the number zero
– the successor of a number
– the predecessor of a number
– a test for zero (iszero e)
Arithmetic Expressions, Formally
• An arithmetic expression e is
– a boolean value:
true exp
false exp
– an if statement (if e1 then e2 else e3):
$$\frac{e_1 \text{ exp} \quad e_2 \text{ exp} \quad e_3 \text{ exp}}{\text{if } e_1 \text{ then } e_2 \text{ else } e_3 \text{ exp}}$$
Arithmetic Expressions, formally
• An arithmetic expression e is
– a boolean, an if statement, a zero, a successor, a
predecessor or a 0 test:
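$$\text{true exp} \qquad \text{false exp} \qquad \text{zero exp}$$
$$\frac{e_1 \text{ exp} \quad e_2 \text{ exp} \quad e_3 \text{ exp}}{\text{if } e_1 \text{ then } e_2 \text{ else } e_3 \text{ exp}}$$
$$\frac{e \text{ exp}}{\text{succ } e \text{ exp}} \qquad \frac{e \text{ exp}}{\text{pred } e \text{ exp}} \qquad \frac{e \text{ exp}}{\text{iszero } e \text{ exp}}$$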
BNF
• Defining every bit of syntax by inductive
definitions can be lengthy and tedious
• Syntactic definitions are an especially
simple form of inductive definition:
– context insensitive
– unary predicates
• There is a very convenient abbreviation:
BNF (Backus-Naur Form)
Arithmetic Expressions, in BNF
e ::= true | false | if e then e else e
| 0 | succ e | pred e | iszero e
• e: pick a new letter (Greek symbol/word) to represent any object in the set of objects being defined
• |: separates alternatives (7 alternatives implies 7 inductive rules)
• each e inside an alternative is a subterm/subobject, i.e., any “e” object
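Each BNF alternative becomes one constructor of an ML datatype, just as each alternative abbreviates one inductive rule. A sketch in OCaml; the constructor names and the printer `show` (which emits fully parenthesized concrete syntax) are our own choices:

```ocaml
(* One constructor per BNF alternative: 7 alternatives, 7 constructors. *)
type exp =
  | True
  | False
  | If of exp * exp * exp
  | Zero
  | Succ of exp
  | Pred of exp
  | IsZero of exp

(* Render an AST back into fully parenthesized concrete syntax. *)
let rec show (e : exp) : string =
  match e with
  | True -> "true"
  | False -> "false"
  | If (e1, e2, e3) ->
      "(if " ^ show e1 ^ " then " ^ show e2 ^ " else " ^ show e3 ^ ")"
  | Zero -> "0"
  | Succ e1 -> "(succ " ^ show e1 ^ ")"
  | Pred e1 -> "(pred " ^ show e1 ^ ")"
  | IsZero e1 -> "(iszero " ^ show e1 ^ ")"

let () = print_endline (show (If (IsZero Zero, Succ Zero, Zero)))
```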
An alternative definition
b ::= true | false
e ::= b | if e then e else e
| 0 | succ e | pred e | iszero e
corresponds to two inductively defined judgments:
1. b bool
2. e exp
the key rule is an inclusion of booleans in expressions:
$$\frac{b \text{ bool}}{b \text{ exp}}$$
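In ML the two judgments become two datatypes, and the inclusion rule becomes a constructor that wraps a boolean as an expression. A hypothetical OCaml sketch (the names `b`, `e`, `Bool`, and `count_bools` are illustrative):

```ocaml
(* b ::= true | false *)
type b = True | False

(* e ::= b | if e then e else e | 0 | succ e | pred e | iszero e
   The Bool constructor plays the role of the inclusion rule. *)
type e =
  | Bool of b
  | If of e * e * e
  | Zero
  | Succ of e
  | Pred of e
  | IsZero of e

(* Booleans may appear wherever an expression is expected, via Bool. *)
let rec count_bools (x : e) : int =
  match x with
  | Bool _ -> 1
  | If (x1, x2, x3) -> count_bools x1 + count_bools x2 + count_bools x3
  | Zero -> 0
  | Succ x1 | Pred x1 | IsZero x1 -> count_bools x1
```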
Metavariables
b ::= true | false
e ::= b | if e then e else e
| 0 | succ e | pred e | iszero e
• b and e are called metavariables
• they stand for classes of objects, programs, and other things
• they must not be confused with program variables
Two Functions Defined over Terms
constants(true) = {true}
constants(false) = {false}
constants(0) = {0}
constants(succ e) = constants(pred e) = constants(iszero e) = constants e
constants(if e1 then e2 else e3) = constants e1 ∪ constants e2 ∪ constants e3
size(true) = 1
size(false) = 1
size(0) = 1
size(succ e) = size(pred e) = size(iszero e) = size e + 1
size(if e1 then e2 else e3) = size e1 + size e2 + size e3 + 1
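Both functions transcribe directly into ML by structural recursion, one case per rule. A sketch in OCaml, assuming our own `exp` datatype and using the standard library's `Set.Make` for sets of constants:

```ocaml
(* Hypothetical AST for the arithmetic expressions. *)
type exp =
  | True
  | False
  | Zero
  | If of exp * exp * exp
  | Succ of exp
  | Pred of exp
  | IsZero of exp

(* Sets of expressions; structural compare is fine since exp holds no functions. *)
module ExpSet = Set.Make (struct
  type t = exp
  let compare = compare
end)

(* The set of constants occurring in e. *)
let rec constants (e : exp) : ExpSet.t =
  match e with
  | True | False | Zero -> ExpSet.singleton e
  | Succ e1 | Pred e1 | IsZero e1 -> constants e1
  | If (e1, e2, e3) ->
      ExpSet.union (constants e1) (ExpSet.union (constants e2) (constants e3))

(* The number of nodes in the syntax tree of e. *)
let rec size (e : exp) : int =
  match e with
  | True | False | Zero -> 1
  | Succ e1 | Pred e1 | IsZero e1 -> size e1 + 1
  | If (e1, e2, e3) -> size e1 + size e2 + size e3 + 1

let () =
  let e = If (True, Zero, Succ Zero) in
  Printf.printf "|constants e| = %d, size e = %d\n"
    (ExpSet.cardinal (constants e)) (size e)
```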
A Lemma
• The number of distinct constants in any
expression e is no greater than the size of e:
| constants e | ≤ size e
• How to prove it?
– By rule induction on the rules for “e exp”
– More commonly called induction on the
structure of e
– a form of “structural induction”
Structural Induction
• Suppose P is a predicate on expressions.
– structural induction:
• for each expression e, we assume P(e’) holds for
each subexpression e’ of e and go on to prove P(e)
• result: we know P(e) for all expressions e
– you’ll use this idea every single week in the rest
of the course.
Back to the Lemma
• The number of distinct constants in any
expression e is no greater than the size of e:
| constants e | ≤ size e
• Proof:
By induction on the structure of e.
case e is 0, true, false: ...
case e is succ e’, pred e’, iszero e’: ...
case e is (if e1 then e2 else e3): ...
(Always state the proof method first; then separate the cases, one case per rule.)
The Lemma
• Lemma: | constants e | ≤ size e
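• Proof: ...
case e is 0, true, false:
$$\begin{aligned} |\text{constants } e| &= |\{e\}| && \text{(by def of constants)} \\ &= 1 && \text{(simple calculation)} \\ &= \text{size } e && \text{(by def of size)} \end{aligned}$$
(This is a 2-column proof: each step of the calculation on the left, its justification on the right.)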
A Lemma
• Lemma: | constants e | ≤ size e
...
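case e is pred e’:
$$\begin{aligned} |\text{constants } e| &= |\text{constants } e'| && \text{(def of constants)} \\ &\le \text{size } e' && \text{(IH)} \\ &< \text{size } e && \text{(by def of size)} \end{aligned}$$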
A Lemma
• Lemma: | constants e | ≤ size e
...
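case e is (if e1 then e2 else e3):
$$\begin{aligned} |\text{constants } e| &= \Big|\bigcup_{i=1}^{3} \text{constants } e_i\Big| && \text{(def of constants)} \\ &\le \sum_{i=1}^{3} |\text{constants } e_i| && \text{(property of sets)} \\ &\le \sum_{i=1}^{3} \text{size } e_i && \text{(IH on each } e_i\text{)} \\ &< \text{size } e && \text{(def of size)} \end{aligned}$$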
A Lemma
• Lemma: | constants e | ≤ size e
...
other cases are similar. QED
(“other cases are similar”: this had better be true! “QED”: use Latin to show off.)
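Testing is no substitute for the proof, but a quick exhaustive check over small expressions can catch a false lemma before you try to prove it. A sketch in OCaml (it uses `List.concat_map`, available from OCaml 4.10); `all_exps`, which enumerates every expression of depth at most d, and `lemma_holds` are hypothetical helpers, while `constants` and `size` follow the equations above:

```ocaml
type exp =
  | True | False | Zero
  | If of exp * exp * exp
  | Succ of exp | Pred of exp | IsZero of exp

module ExpSet = Set.Make (struct
  type t = exp
  let compare = compare
end)

let rec constants (e : exp) : ExpSet.t =
  match e with
  | True | False | Zero -> ExpSet.singleton e
  | Succ e1 | Pred e1 | IsZero e1 -> constants e1
  | If (e1, e2, e3) ->
      ExpSet.union (constants e1) (ExpSet.union (constants e2) (constants e3))

let rec size (e : exp) : int =
  match e with
  | True | False | Zero -> 1
  | Succ e1 | Pred e1 | IsZero e1 -> size e1 + 1
  | If (e1, e2, e3) -> size e1 + size e2 + size e3 + 1

(* Every expression whose syntax tree has depth at most d. *)
let rec all_exps (d : int) : exp list =
  let leaves = [ True; False; Zero ] in
  if d = 0 then leaves
  else
    let sub = all_exps (d - 1) in
    let unary = List.concat_map (fun e -> [ Succ e; Pred e; IsZero e ]) sub in
    let ifs =
      List.concat_map
        (fun e1 ->
          List.concat_map
            (fun e2 -> List.map (fun e3 -> If (e1, e2, e3)) sub)
            sub)
        sub
    in
    leaves @ unary @ ifs

(* The statement of the lemma for one expression. *)
let lemma_holds (e : exp) : bool = ExpSet.cardinal (constants e) <= size e

let () =
  let es = all_exps 2 in
  Printf.printf "checked %d expressions: %b\n"
    (List.length es) (List.for_all lemma_holds es)
```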
A Lemma
• In reality, this lemma is so simple that you might not
bother to write down all the details
– “By induction on the structure of e.” is a sufficient statement
• BUT, when you omit the details of a proof, you had better
be sure it is trivial!
– when in doubt, present the details.
• NEVER hand-wave through a proof
– it is better to admit you don’t know than to fake it
– if you cannot do part of the proof for homework, explicitly state
the part of the proof that fails (if I had lemma X here, then ...)
What is a proof?
• A proof is an easily-checked justification of
a judgment (i.e., a theorem)
– different people have different ideas about what
“easily-checked” means
– the more formal a proof, the more “easily-checked”
– in this class, we have a pretty high bar
• If there is one thing you’ll learn in this
class, it is how to write a proof!
Inductive Definitions
Next up: Evaluation
Evaluation
• There are many different ways to formalize
the evaluation of expressions
• In this course we will use different sorts of
operational semantics:
– direct expression of how an interpreter works
– can be implemented in ML directly
– easy to prove things about
– scales up to complete languages easily
Values
• A value is an object that has been completely evaluated
• The values in our language of arithmetic expressions are
v ::= true | false | zero | succ v
• These values are a subset of the expressions
• By calling “succ v” a value, we’re treating “succ v” like a
piece of data; “succ v” is not function application
– “succ zero” is a value that represents 1
– “succ (succ zero)” is the value that represents 2
– we are counting in unary
• Remember, there is an inductive definition behind all this
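The value grammar corresponds to a predicate on expressions, and reading succ^n zero as the integer n makes the unary counting explicit. A sketch in OCaml under our own naming assumptions (`exp`, `is_value`, `to_int`; `Option.map` needs OCaml 4.08 or later):

```ocaml
type exp =
  | True | False | Zero
  | If of exp * exp * exp
  | Succ of exp | Pred of exp | IsZero of exp

(* v ::= true | false | zero | succ v *)
let rec is_value (e : exp) : bool =
  match e with
  | True | False | Zero -> true
  | Succ v -> is_value v
  | If _ | Pred _ | IsZero _ -> false

(* A numeric value succ^n zero denotes n; anything else has no reading. *)
let rec to_int (v : exp) : int option =
  match v with
  | Zero -> Some 0
  | Succ v' -> Option.map (fun n -> n + 1) (to_int v')
  | _ -> None
```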
Defining evaluation
• single-step evaluation judgment:
e → e’
• in English, we say “expression e evaluates
to e’ in a single step”
Defining evaluation
• single-step evaluation judgment:
e → e’
• evaluation rules for booleans (rules like these do the “real work”):
$$\text{if true then } e_2 \text{ else } e_3 \to e_2$$
$$\text{if false then } e_2 \text{ else } e_3 \to e_3$$
• what if the first position in the “if” is not true or false? A “search” rule takes a step inside the first position:
$$\frac{e_1 \to e_1'}{\text{if } e_1 \text{ then } e_2 \text{ else } e_3 \to \text{if } e_1' \text{ then } e_2 \text{ else } e_3}$$
Defining evaluation
• single-step evaluation judgment:
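e → e’
• evaluation rules for numbers:
$$\frac{e \to e'}{\text{succ } e \to \text{succ } e'} \qquad \frac{e \to e'}{\text{pred } e \to \text{pred } e'} \qquad \frac{e \to e'}{\text{iszero } e \to \text{iszero } e'}$$
$$\text{iszero (zero)} \to \text{true} \qquad \text{iszero (succ } v) \to \text{false} \qquad \text{pred (succ } v) \to v$$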
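The single-step rules can be transcribed as a partial function: with these rules at most one rule applies to any term. A sketch in OCaml, assuming our own `exp` datatype; `step e` returns `Some e'` when e → e’ by the rules above and `None` when no rule applies:

```ocaml
type exp =
  | True | False | Zero
  | If of exp * exp * exp
  | Succ of exp | Pred of exp | IsZero of exp

(* v ::= true | false | zero | succ v *)
let rec is_value (e : exp) : bool =
  match e with
  | True | False | Zero -> true
  | Succ v -> is_value v
  | If _ | Pred _ | IsZero _ -> false

(* One match case per evaluation rule, tried in order. *)
let rec step (e : exp) : exp option =
  match e with
  | If (True, e2, _) -> Some e2
  | If (False, _, e3) -> Some e3
  | If (e1, e2, e3) -> Option.map (fun e1' -> If (e1', e2, e3)) (step e1)
  | Succ e1 -> Option.map (fun e1' -> Succ e1') (step e1)
  | Pred (Succ v) when is_value v -> Some v
  | Pred e1 -> Option.map (fun e1' -> Pred e1') (step e1)
  | IsZero Zero -> Some True
  | IsZero (Succ v) when is_value v -> Some False
  | IsZero e1 -> Option.map (fun e1' -> IsZero e1') (step e1)
  | True | False | Zero -> None
```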
Defining evaluation
• single-step evaluation judgment:
e → e’
• other evaluation rules:
– there are none!
• Consider the term iszero true
– We call such terms stuck
– They aren’t values, but no rule applies
– They are nonsensical programs
– An interpreter for our language will either raise an exception when it tries to evaluate a stuck program or maybe do something random or even crash!
– It is a bad scene.
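Given a single-step function, stuckness is easy to detect: a term is stuck when it is not a value and no rule applies. A sketch in OCaml, repeating hypothetical `exp`, `is_value`, and `step` definitions (our own transcription of the rules) so the block is self-contained:

```ocaml
type exp =
  | True | False | Zero
  | If of exp * exp * exp
  | Succ of exp | Pred of exp | IsZero of exp

let rec is_value (e : exp) : bool =
  match e with
  | True | False | Zero -> true
  | Succ v -> is_value v
  | If _ | Pred _ | IsZero _ -> false

let rec step (e : exp) : exp option =
  match e with
  | If (True, e2, _) -> Some e2
  | If (False, _, e3) -> Some e3
  | If (e1, e2, e3) -> Option.map (fun e1' -> If (e1', e2, e3)) (step e1)
  | Succ e1 -> Option.map (fun e1' -> Succ e1') (step e1)
  | Pred (Succ v) when is_value v -> Some v
  | Pred e1 -> Option.map (fun e1' -> Pred e1') (step e1)
  | IsZero Zero -> Some True
  | IsZero (Succ v) when is_value v -> Some False
  | IsZero e1 -> Option.map (fun e1' -> IsZero e1') (step e1)
  | True | False | Zero -> None

(* Stuck: not a value, and no evaluation rule applies. *)
let is_stuck (e : exp) : bool = (not (is_value e)) && step e = None
```

With exactly the rules above, pred zero is also stuck, since no rule has pred zero on its left-hand side.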
Defining evaluation
• Multistep evaluation: e →* e’
• In English: “e evaluates to e’ in some
number of steps (possibly 0)”:
$$e \to^* e \ \ (\text{reflexivity}) \qquad \frac{e \to e'' \quad e'' \to^* e'}{e \to^* e'}\ (\text{transitivity})$$
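Multi-step evaluation can be computed by applying the single-step function repeatedly: each step corresponds to one use of the transitivity rule, and the reflexivity rule ends the chain. The function below computes the particular e’ at which evaluation stops (a value or a stuck term). A sketch in OCaml; `eval_star` is a hypothetical name, and the `exp`, `is_value`, and `step` definitions are our own transcription of the rules:

```ocaml
type exp =
  | True | False | Zero
  | If of exp * exp * exp
  | Succ of exp | Pred of exp | IsZero of exp

let rec is_value (e : exp) : bool =
  match e with
  | True | False | Zero -> true
  | Succ v -> is_value v
  | If _ | Pred _ | IsZero _ -> false

let rec step (e : exp) : exp option =
  match e with
  | If (True, e2, _) -> Some e2
  | If (False, _, e3) -> Some e3
  | If (e1, e2, e3) -> Option.map (fun e1' -> If (e1', e2, e3)) (step e1)
  | Succ e1 -> Option.map (fun e1' -> Succ e1') (step e1)
  | Pred (Succ v) when is_value v -> Some v
  | Pred e1 -> Option.map (fun e1' -> Pred e1') (step e1)
  | IsZero Zero -> Some True
  | IsZero (Succ v) when is_value v -> Some False
  | IsZero e1 -> Option.map (fun e1' -> IsZero e1') (step e1)
  | True | False | Zero -> None

(* Returns the final term and the number of single steps taken. *)
let eval_star (e : exp) : exp * int =
  let rec go e n =
    match step e with
    | None -> (e, n)                (* reflexivity: stop here *)
    | Some e' -> go e' (n + 1)      (* transitivity: one step, then continue *)
  in
  go e 0
```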
Single-step Induction
• We have defined the evaluation rules inductively, so we get
a proof principle:
– Given a property P of the single-step rules
– For each rule
$$\frac{e_1 \to e_1' \quad \cdots \quad e_k \to e_k'}{e \to e'}$$
– we get to assume P(ei → ei’) for i = 1..k and must prove the conclusion P(e → e’)
– Result: we know P(e → e’) for all valid judgments with the form e → e’
– called induction on the structure of the operational semantics
Multi-step Induction
– Given a property P of the multi-step rules
– For each rule
$$\frac{e_1 \to^* e_1' \quad \cdots \quad e_k \to^* e_k'}{e \to^* e'}$$
– we get to assume P(ei →* ei’) for i = 1..k and
must prove the conclusion P(e →* e’)
Multi-step Induction
– In other words, given a property P of the multi-step
rules
– we must prove:
• P(e →* e)
• P(e →* e’) when
$$\frac{e \to e'' \quad e'' \to^* e'}{e \to^* e'}$$
and we get to assume P(e’’ →* e’) and (of course) any properties
we have proven already of the single-step relation e → e’’
• this means, to prove things about multi-step rules, we normally
first need to prove a lemma about the single-step rules
A Theorem
• Remember the function size(e) from earlier
• Theorem: if e →* e’ then size(e’) ≤ size(e)
• Proof: By induction on the structure of the multi-step
operational rules.
– consider the transitivity rule:
$$\frac{e \to e'' \quad e'' \to^* e'}{e \to^* e'}$$
– ... we are going to need a similar property of the single-step
evaluation function
A Lemma
• Lemma: if e → e’ then size(e’) ≤ size(e)
• Proof: By induction on the structure of the
single-step operational rules.
– one case for each rule, for example:
– case:
$$\frac{e \to e'}{\text{succ } e \to \text{succ } e'}$$
– case:
$$\text{pred (succ } v) \to v$$
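Before writing out every case, a quick exhaustive test can confirm the lemma on small expressions. This OCaml sketch, with hypothetical helpers `all_exps` and `step_shrinks` and our own transcription of the rules as `step`, actually checks the stronger, strict inequality size(e’) < size(e), which holds for every rule above; again, testing supplements the proof rather than replacing it.

```ocaml
type exp =
  | True | False | Zero
  | If of exp * exp * exp
  | Succ of exp | Pred of exp | IsZero of exp

let rec is_value (e : exp) : bool =
  match e with
  | True | False | Zero -> true
  | Succ v -> is_value v
  | If _ | Pred _ | IsZero _ -> false

let rec step (e : exp) : exp option =
  match e with
  | If (True, e2, _) -> Some e2
  | If (False, _, e3) -> Some e3
  | If (e1, e2, e3) -> Option.map (fun e1' -> If (e1', e2, e3)) (step e1)
  | Succ e1 -> Option.map (fun e1' -> Succ e1') (step e1)
  | Pred (Succ v) when is_value v -> Some v
  | Pred e1 -> Option.map (fun e1' -> Pred e1') (step e1)
  | IsZero Zero -> Some True
  | IsZero (Succ v) when is_value v -> Some False
  | IsZero e1 -> Option.map (fun e1' -> IsZero e1') (step e1)
  | True | False | Zero -> None

let rec size (e : exp) : int =
  match e with
  | True | False | Zero -> 1
  | Succ e1 | Pred e1 | IsZero e1 -> size e1 + 1
  | If (e1, e2, e3) -> size e1 + size e2 + size e3 + 1

(* Every expression whose syntax tree has depth at most d. *)
let rec all_exps (d : int) : exp list =
  let leaves = [ True; False; Zero ] in
  if d = 0 then leaves
  else
    let sub = all_exps (d - 1) in
    let unary = List.concat_map (fun e -> [ Succ e; Pred e; IsZero e ]) sub in
    let ifs =
      List.concat_map
        (fun e1 ->
          List.concat_map
            (fun e2 -> List.map (fun e3 -> If (e1, e2, e3)) sub)
            sub)
        sub
    in
    leaves @ unary @ ifs

(* If e takes a step to e', then e' is strictly smaller. *)
let step_shrinks (e : exp) : bool =
  match step e with
  | None -> true
  | Some e' -> size e' < size e

let () =
  Printf.printf "strict size lemma holds up to depth 2: %b\n"
    (List.for_all step_shrinks (all_exps 2))
```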
A Lemma
• Once we have proven the lemma, we can then prove
the theorem
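– Theorem: if e →* e’ then size(e’) ≤ size(e)
– When writing out a proof, always state lemmas before the results that use them, to
make it clear there is no circularity in the proof!
• A consequence: the same case analysis shows that each single step strictly decreases size (if e → e’ then size(e’) < size(e)); since size is a natural number, evaluation always terminates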
– our properties are starting to get more useful!
Summary
• Everything in this class will be defined
using inductive rules
• These rules give rise to inductive proofs
• How to succeed in this class:
– Dave: how do we prove X?
– Student: by induction on the structure of Y. (Finding the right Y: that’s the only tricky part.)