Inductive Definitions
COS 510
David Walker
Inductive Definitions
Inductive definitions play a central role in
the study of programming languages
They specify the following aspects of a language:
• Concrete syntax (via CFGs)
• Abstract syntax (via CFGs/ML datatypes)
• Static semantics (via typing rules)
• Dynamic semantics (via evaluation rules)
• Read Pierce’s Text:
– Chapter 2 (skim definitions; understand 2.4)
• we will use sets, relations, functions, sequences
• you should know basics such as mathematical
induction, reflexivity, transitivity, symmetry, total
and partial orders, and domains and ranges of functions
– Chapter 3
Inductive Definitions
• An inductive definition consists of:
– One or more judgments (i.e., assertions)
– A set of rules for deriving these judgments
• For example:
– Judgment is “n nat”
– Rules:
• zero nat
• if n nat, then succ(n) nat.
Inference Rule Notation
Inference rules are normally written as:
  J1 ... Jn
  ---------
      J
where J and J1, ..., Jn are judgments. (For
axioms, n = 0.)
An example
For example, the rules for deriving n nat are
usually written:
  -------- (zero)
  zero nat

     n nat
  ----------- (succ)
  succ(n) nat
Derivation of Judgments
• A judgment J is derivable iff either
– there is an axiom
    ---
     J
– or there is a rule
    J1 ... Jn
    ---------
        J
  such that J1, ..., Jn are derivable
Derivation of Judgments
• We may determine whether a judgment is
derivable by working backwards.
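• For example, the judgment
succ(succ(zero)) nat
is derivable as follows:
  -------------------- (zero)
        zero nat
  -------------------- (succ)
     succ(zero) nat
  -------------------- (succ)
  succ(succ(zero)) nat
This is a derivation (i.e., a proof); the names of the rules used at each step are shown in parentheses.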
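Working backwards can be sketched as a small recursive search. This is only an illustration, not part of the original slides: the encoding of terms (the string "zero" and the tuple ("succ", n)) and the names `show` and `derive_nat` are assumptions.

```python
# Hypothetical encoding: "zero" is zero, ("succ", n) is succ(n).

def show(t):
    """Render a term built only from zero and succ."""
    return "zero" if t == "zero" else "succ(" + show(t[1]) + ")"

def derive_nat(t):
    """Search backwards for a derivation of `t nat`.

    Returns a list of (judgment, rule name) pairs, axiom first and goal last,
    or None when no rule can conclude `t nat`.
    """
    if t == "zero":                                   # axiom (zero)
        return [("zero nat", "zero")]
    if isinstance(t, tuple) and len(t) == 2 and t[0] == "succ":
        premise = derive_nat(t[1])                    # rule (succ) needs `n nat`
        if premise is None:
            return None
        return premise + [(show(t) + " nat", "succ")]
    return None

for judgment, rule in derive_nat(("succ", ("succ", "zero"))):
    print(judgment, "(" + rule + ")")
# zero nat (zero)
# succ(zero) nat (succ)
# succ(succ(zero)) nat (succ)
```

The returned list mirrors the derivation read from top to bottom; a result of None means the judgment is not derivable.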
Binary Trees
• Here is a set of rules defining the judgment t
tree stating that t is a binary tree:
  ----------
  empty tree

  t1 tree   t2 tree
  ------------------
  node (t1, t2) tree
• Prove that the following is a valid
judgment:
node(empty, node(empty, empty)) tree
Rule Induction
• By definition, every derivable judgment
– is the consequence of some rule...
– whose premises are derivable
• That is, the rules are an exhaustive description of
the derivable judgments
• Just like an ML datatype definition is an
exhaustive description of all the objects in the type
being defined
Rule Induction
• To show that every derivable judgment has
a property P, it is enough to show that
– For every rule,
    J1 ... Jn
    ---------
        J
if J1, ..., Jn have the property P, then J has
property P
This is the principle of rule induction.
Example: Natural Numbers
• Consider the rules for n nat
• We can prove that the property P holds of
every n such that n nat by rule induction:
– Show that P holds of zero;
– Assuming that P holds of n, show that P holds
of succ(n).
• This is just ordinary mathematical
induction.
Example: Binary Tree
• Similarly, we can prove that every binary
tree t has a property P by showing that
– empty has property P;
– If t1 has property P and t2 has property P, then
node(t1, t2) has property P.
• This might be called tree induction.
Example: The Height of a Tree
• Consider the following equations:
– hgt(empty) = 0
– hgt(node(t1, t2)) = 1 + max(hgt(t1), hgt(t2))
• Claim: for every binary tree t there exists a
unique integer n such that hgt(t) = n.
• That is, the above equations define a
function.
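The two equations translate directly into a recursive function. A minimal sketch, assuming a hypothetical encoding in which the string "empty" is the empty tree and the tuple ("node", t1, t2) is node(t1, t2):

```python
# Hypothetical encoding: "empty" is the empty tree, ("node", t1, t2) is node(t1, t2).

def hgt(t):
    """Height of a binary tree, one branch per equation."""
    if t == "empty":
        return 0                          # hgt(empty) = 0
    _, t1, t2 = t
    return 1 + max(hgt(t1), hgt(t2))      # hgt(node(t1, t2)) = 1 + max(hgt(t1), hgt(t2))

t = ("node", "empty", ("node", "empty", "empty"))
print(hgt(t))  # 2
```

The recursion terminates for the same reason the rule-induction proof goes through: each recursive call is on a tree that appears in a premise of the rule that built t.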
Example: The Height of a Tree
• We will prove the claim by rule induction:
– If t is derivable by the axiom
empty tree
– then n = 0 is determined by the first equation:
hgt(empty) = 0
– is it unique? Yes.
Example: The Height of a Tree
• If t is derivable by the rule
  t1 tree   t2 tree
  ------------------
  node (t1, t2) tree
then we may assume that:
• there exists a unique n1 such that hgt(t1) = n1;
• there exists a unique n2 such that hgt(t2) = n2;
Hence, there exists a unique n, namely
1+max(n1, n2)
such that hgt(t) = n.
Example: The Height of a Tree
This is awfully pedantic, but it is useful to see the
details at least once.
• It is not obvious a priori that a tree has a well-defined height!
• Rule induction justified the existence of the
function hgt.
It is “obvious” from the equations that there is at
most one n such that hgt(t) = n. The proof shows
that there exists at least one.
Inductive Definitions in PL
• In this course, we will be looking at
inductive definitions that determine
– abstract syntax
– static semantics (typing)
– dynamic semantics (evaluation)
– other properties of programs and programming
languages
Inductive Definitions
First up: Syntax
Abstract vs Concrete Syntax
• the concrete syntax of a program is a string
of characters:
– ‘(’ ‘3’ ‘+’ ‘2’ ‘)’ ‘*’ ‘7’
• the abstract syntax of a program is a tree
representing the computationally relevant
portion of the program:
        *
       / \
      +   7
     / \
    3   2
Abstract vs Concrete Syntax
• the concrete syntax of a program contains many
elements necessary for parsing:
– parentheses
– delimiters for comments
– rules for precedence of operators
• the abstract syntax of a program is much simpler;
it does not contain these elements
– precedence is given directly by the tree structure
Abstract vs Concrete Syntax
• in this class, we work with abstract syntax
– we want to define what programs mean
– will work with the simple ASTs
• nevertheless, we need a notation for writing
down abstract syntax trees
– when we write (3 + 2) * 7, you should visualize
the tree:
        *
       / \
      +   7
     / \
    3   2
Arithmetic Expressions, Informally
• Informally, an arithmetic expression e is
– a boolean value
– an if statement (if e1 then e2 else e3)
– the number zero
– the successor of a number
– the predecessor of a number
– a test for zero (isZero e)
Arithmetic Expressions, Formally
• An arithmetic expression e is
– a boolean value:
  --------    ---------
  true exp    false exp
– an if statement (if e1 then e2 else e3):
  e1 exp   e2 exp   e3 exp
  -------------------------
  if e1 then e2 else e3 exp
Arithmetic Expressions, formally
• An arithmetic expression e is
– a boolean, an if statement, a zero, a successor, a
predecessor or a 0 test:
  --------    ---------    --------
  true exp    false exp    zero exp

     e exp          e exp           e exp
  ----------     ----------     ------------
  succ e exp     pred e exp     iszero e exp

  e1 exp   e2 exp   e3 exp
  -------------------------
  if e1 then e2 else e3 exp
• Defining every bit of syntax by inductive
definitions can be lengthy and tedious
• Syntactic definitions are an especially
simple form of inductive definition:
– context insensitive
– unary predicates
• There is a very convenient abbreviation:
Arithmetic Expressions, in BNF
e ::= true | false | if e then e else e
| 0 | succ e | pred e | iszero e
– e: pick a new letter (or a Greek symbol or a word) to represent any object in the set of objects being defined
– 7 alternatives correspond to 7 inductive rules
– each e on the right-hand side is any “e”
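Each BNF alternative corresponds to one inductive rule, so a checker for the judgment "e exp" has one branch per alternative. A minimal sketch, assuming a hypothetical encoding that is not in the slides: the constants are the strings "true", "false", and "zero" (for 0), and compound terms are tuples tagged with the operator name.

```python
# Hypothetical encoding: "true" | "false" | "zero" | ("succ", e) | ("pred", e)
#                        | ("iszero", e) | ("if", e1, e2, e3)

def is_exp(e):
    """Return True iff `e exp` is derivable from the seven rules."""
    if e in ("true", "false", "zero"):                # the three axioms
        return True
    if isinstance(e, tuple) and e:
        if e[0] in ("succ", "pred", "iszero") and len(e) == 2:
            return is_exp(e[1])                       # one premise: e exp
        if e[0] == "if" and len(e) == 4:
            return all(is_exp(sub) for sub in e[1:])  # three premises
    return False                                      # no rule applies

print(is_exp(("if", "true", "zero", ("succ", "zero"))))  # True
print(is_exp(("succ",)))                                  # False
```

Note that the check is purely syntactic: ("succ", "true") is accepted, just as succ true is generated by the grammar.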
An alternative definition
b ::= true | false
e ::= b | if e then e else e
| 0 | succ e | pred e | iszero e
corresponds to two inductively defined judgments:
1. b bool
2. e exp
the key rule is an inclusion of booleans in expressions:
  b bool
  ------
  b exp
b ::= true | false
e ::= b | if e then e else e
| 0 | succ e | pred e | iszero e
• b and e are called metavariables
• they stand for classes of objects, programs, and other things
• they must not be confused with program variables
Two Functions Defined over Terms
constants(true) = {true}
constants(false) = {false}
constants(0) = {0}
constants(succ e) = constants(pred e) = constants(iszero e) = constants e
constants(if e1 then e2 else e3) = ∪i=1..3 (constants ei)
size(true) = 1
size(false) = 1
size(0) = 1
size(succ e) = size(pred e) = size(iszero e) = size e + 1
size(if e1 then e2 else e3) = Σi=1..3 (size ei) + 1
A Lemma
• The number of distinct constants in any
expression e is no greater than the size of e:
| constants e | ≤ size e
• How to prove it?
– By rule induction on the rules for “e exp”
– More commonly called induction on the
structure of e
– a form of “structural induction”
Structural Induction
• Suppose P is a predicate on expressions.
– structural induction:
• for each expression e, we assume P(e’) holds for
each subexpression e’ of e and go on to prove P(e)
• result: we know P(e) for all expressions e
– you’ll use this idea every single week in the rest
of the course.
Back to the Lemma
• The number of distinct constants in any
expression e is no greater than the size of e:
| constants e | ≤ size e
• Proof:
By induction on the structure of e.
case e is 0, true, false: ...
case e is succ e’, pred e’, iszero e’: ...
case e is (if e1 then e2 else e3): ...
(1 case per rule)
The Lemma
• Lemma: | constants e | ≤ size e
• Proof: ...
case e is 0, true, false:
| constants e | = |{e}|     (by def of constants)
              = 1         (simple calculation)
              = size e    (by def of size)
A Lemma
• Lemma: | constants e | ≤ size e
case e is pred e’:
| constants e | = |constants e’|   (def of constants)
              ≤ size e’          (IH)
              < size e           (by def of size)
A Lemma
• Lemma: | constants e | ≤ size e
case e is (if e1 then e2 else e3):
| constants e | = |∪i=1..3 constants ei|   (def of constants)
              ≤ Σi=1..3 |constants ei|   (property of sets)
              ≤ Σi=1..3 (size ei)        (IH on each ei)
              < size e                   (def of size)
A Lemma
• Lemma: | constants e | ≤ size e
other cases are similar. (this had better be true)
QED (use Latin to show off)
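A quick sanity check of the lemma, which is no substitute for the proof: the sketch below enumerates every expression up to a small nesting depth and tests the inequality on each. The tuple encoding of terms and the names `terms`, `constants`, and `size` as Python functions are assumptions made for illustration.

```python
from itertools import product

# Hypothetical encoding: constants are the strings "true", "false", "zero";
# compound terms are tagged tuples such as ("succ", e) or ("if", e1, e2, e3).

def constants(e):
    if isinstance(e, str):
        return {e}
    if e[0] == "if":
        return constants(e[1]) | constants(e[2]) | constants(e[3])
    return constants(e[1])

def size(e):
    if isinstance(e, str):
        return 1
    return 1 + sum(size(sub) for sub in e[1:])

def terms(depth):
    """All expressions whose nesting depth is at most `depth`."""
    base = ["true", "false", "zero"]
    if depth == 0:
        return base
    smaller = terms(depth - 1)
    unary = [(op, e) for op in ("succ", "pred", "iszero") for e in smaller]
    ifs = [("if", a, b, c) for a, b, c in product(smaller, repeat=3)]
    return base + unary + ifs

print(len(terms(1)))                                        # 39
print(all(len(constants(e)) <= size(e) for e in terms(2)))  # True
```

A finite check like this can catch a misstated lemma early, but only the induction covers all expressions.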
A Lemma
• In reality, this lemma is so simple that you might not
bother to write down all the details
– “By induction on the structure of e.” is a sufficient statement
• BUT, when you omit the details of a proof, you had better
be sure it is trivial!
– when in doubt, present the details.
• NEVER hand-wave through a proof
– it is better to admit you don’t know than to fake it
– if you cannot do part of the proof for homework, explicitly state
the part of the proof that fails (if I had lemma X here, then ...)
What is a proof?
• A proof is an easily-checked justification of
a judgment (i.e., a theorem)
– different people have different ideas about what
“easily-checked” means
– the more formal a proof, the more “easily-checked”
– in this class, we have a pretty high bar
• If there is one thing you’ll learn in this
class, it is how to write a proof!
Inductive Definitions
Next up: Evaluation
• There are many different ways to formalize
the evaluation of expressions
• In this course we will use different sorts of
operational semantics
– direct expression of how an interpreter works
– can be implemented in ML directly
– easy to prove things about
– scales up to complete languages easily
• A value is an object that has been completely evaluated
• The values in our language of arithmetic expressions are
v ::= true | false | zero | succ v
• These values are a subset of the expressions
• By calling “succ v” a value, we’re treating “succ v” like a
piece of data; “succ v” is not function application
– “succ zero” is a value that represents 1
– “succ (succ zero)” is the value that represents 2
– we are counting in unary
• Remember, there is an inductive definition behind all this
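The inductive definition behind the grammar of values can be read as a recursive predicate. A minimal sketch, assuming a hypothetical tuple encoding of terms; the name `is_value` is not from the slides.

```python
# Hypothetical encoding: "true" | "false" | "zero" | ("succ", e) | other tagged tuples.

def is_value(e):
    """Return True iff e is generated by v ::= true | false | zero | succ v."""
    if e in ("true", "false", "zero"):
        return True
    # succ v is a value exactly when v is a value; succ is treated as data here.
    return isinstance(e, tuple) and len(e) == 2 and e[0] == "succ" and is_value(e[1])

print(is_value(("succ", ("succ", "zero"))))  # True: represents 2 in unary
print(is_value(("pred", "zero")))            # False: pred is not a value form
```

Read literally, the grammar also accepts succ true as a value, and the sketch follows the grammar as written.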
Defining evaluation
• single-step evaluation judgment:
e → e’
• in English, we say “expression e evaluates
to e’ in a single step”
Defining evaluation
• single-step evaluation judgment:
e → e’
• evaluation rules for booleans:
if true then e2 else e3 → e2
if false then e2 else e3 → e3
• what if the first position in the “if”
is not true or false?
Defining evaluation
• single-step evaluation judgment:
e → e’
• evaluation rules for booleans:
if true then e2 else e3 → e2
if false then e2 else e3 → e3
(rules like this do the “real work”)
                  e1 → e1’
----------------------------------------------
if e1 then e2 else e3 → if e1’ then e2 else e3
(a “search” rule)
Defining evaluation
• single-step evaluation judgment:
e → e’
• evaluation rules for numbers:
      e → e’
-----------------
succ e → succ e’

      e → e’
-----------------
pred e → pred e’

        e → e’
---------------------
iszero e → iszero e’

iszero (succ v) → false
pred (succ v) → v
iszero (zero) → true
Defining evaluation
• single-step evaluation judgment:
e → e’
• other evaluation rules:
– there are none!
• Consider the term iszero true
– We call such terms stuck
– They aren’t values, but no rule applies
– They are nonsensical programs
– An interpreter for our language will either raise an
exception when it tries to evaluate a stuck program or
maybe do something random or even crash!
– It is a bad scene.
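The single-step rules can be transcribed into a function that returns the next expression, or None when no rule applies. This is a sketch under assumptions rather than the course's interpreter: the tuple encoding of terms and the names `is_value` and `step` are hypothetical, and returning None is one possible way to report that a term cannot step.

```python
# Hypothetical encoding: "true" | "false" | "zero" | ("succ", e) | ("pred", e)
#                        | ("iszero", e) | ("if", e1, e2, e3)

def is_value(e):
    if e in ("true", "false", "zero"):
        return True
    return isinstance(e, tuple) and len(e) == 2 and e[0] == "succ" and is_value(e[1])

def step(e):
    """Return e' with e -> e' in a single step, or None if no rule applies."""
    if isinstance(e, str):
        return None                                   # constants never step
    op = e[0]
    if op == "if":
        _, e1, e2, e3 = e
        if e1 == "true":
            return e2                                 # if true then e2 else e3 -> e2
        if e1 == "false":
            return e3                                 # if false then e2 else e3 -> e3
        e1p = step(e1)                                # search rule for if
        return None if e1p is None else ("if", e1p, e2, e3)
    arg = e[1]
    if op == "iszero" and arg == "zero":
        return "true"                                 # iszero (zero) -> true
    if op in ("iszero", "pred") and isinstance(arg, tuple) \
            and arg[0] == "succ" and is_value(arg[1]):
        return "false" if op == "iszero" else arg[1]  # iszero/pred applied to succ v
    argp = step(arg)                                  # search rules for succ, pred, iszero
    return None if argp is None else (op, argp)

print(step(("pred", ("succ", "zero"))))               # zero
print(step(("if", ("iszero", "zero"), "zero", "true")))
# ('if', 'true', 'zero', 'true')
print(step(("iszero", "true")))                       # None: not a value, yet no rule applies
```

A term is stuck exactly when `step` returns None and `is_value` returns False, as with iszero true.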
Defining evaluation
• Multistep evaluation: e →* e’
• In English: “e evaluates to e’ in some
number of steps (possibly 0)”:
-------
e →* e

e → e’’   e’’ →* e’
---------------------
       e →* e’
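Multi-step evaluation amounts to taking single steps repeatedly. The sketch below is restricted to the boolean fragment (the three `if` rules) to stay short; the tuple encoding and the names `step` and `trace` are assumptions made for illustration.

```python
# Hypothetical encoding: "true" | "false" | ("if", e1, e2, e3).

def step(e):
    """Single step for the boolean fragment only; None if no rule applies."""
    if isinstance(e, tuple) and e[0] == "if":
        _, e1, e2, e3 = e
        if e1 == "true":
            return e2
        if e1 == "false":
            return e3
        e1p = step(e1)                    # search rule
        if e1p is not None:
            return ("if", e1p, e2, e3)
    return None

def trace(e):
    """Return [e0, ..., ek] where e0 = e, each ei steps to the next, and ek cannot step.

    By the two multi-step rules, e ->* ei holds for every element of the list:
    the first rule gives e ->* e, and the second rule extends a derivation by one step.
    """
    seq = [e]
    nxt = step(e)
    while nxt is not None:
        seq.append(nxt)
        nxt = step(nxt)
    return seq

e = ("if", ("if", "true", "false", "true"), "true", "false")
for ei in trace(e):
    print(ei)
# ('if', ('if', 'true', 'false', 'true'), 'true', 'false')
# ('if', 'false', 'true', 'false')
# false
```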
Single-step Induction
• We have defined the evaluation rules inductively, so we get
a proof principle:
– Given a property P of the single-step rules
– For each rule:
e1 → e1’ .... ek → ek’
-----------------------
        e → e’
– we get to assume P(ei → ei’) for i = 1..k and must prove the
conclusion P(e → e’)
– Result: we know P(e → e’) for all valid judgments with the form
e → e’
– called induction on the structure of the operational semantics
Multi-step Induction
– Given a property P of the multi-step rules
– For each rule:
e1 →* e1’ .... ek →* ek’
-------------------------
         e →* e’
– we get to assume P(ei →* ei’) for i = 1..k and
must prove the conclusion P(e →* e’)
Multi-step Induction
– In other words, given a property P of the multi-step
rules
– we must prove:
• P(e →* e)
• P(e →* e’) when
e → e’’   e’’ →* e’
---------------------
       e →* e’
and we get to assume P(e’’ →* e’) and (of course) any properties
we have proven already of the single step relation e → e’’
• this means, to prove things about multi-step rules, we normally
first need to prove a lemma about the single-step rules
A Theorem
• Remember the function size(e) from earlier
• Theorem: if e →* e’ then size(e’) ≤ size(e)
• Proof: ?
A Theorem
• Remember the function size(e) from earlier
• Theorem: if e →* e’ then size(e’) ≤ size(e)
• Proof: By induction on the structure of the multi-step
operational rules.
– consider the transitivity rule:
e → e’’   e’’ →* e’
---------------------
       e →* e’
– ... we are going to need a similar property of the single-step
evaluation relation
A Lemma
• Lemma: if e → e’ then size(e’) ≤ size(e)
• Proof: ?
A Lemma
• Lemma: if e → e’ then size(e’) ≤ size(e)
• Proof: By induction on the structure of the
single-step operational rules.
– one case for each rule, for example:
– case:
      e → e’
-----------------
succ e → succ e’
– case:
pred (succ v) → v
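The two example cases can be worked out as follows. This derivation is a sketch filled in from the definition of size given earlier; it is not taken from the slides, and it assumes the amsmath package for the align* environment.

```latex
\begin{align*}
\text{case } \mathsf{succ}\ e \to \mathsf{succ}\ e' :\quad
  \mathit{size}(\mathsf{succ}\ e')
    &= \mathit{size}(e') + 1 && \text{(def of size)} \\
    &\le \mathit{size}(e) + 1 && \text{(IH on the premise } e \to e'\text{)} \\
    &= \mathit{size}(\mathsf{succ}\ e) && \text{(def of size)} \\[1ex]
\text{case } \mathsf{pred}\ (\mathsf{succ}\ v) \to v :\quad
  \mathit{size}(v)
    &\le \mathit{size}(v) + 2 && \text{(arithmetic)} \\
    &= \mathit{size}(\mathsf{pred}\ (\mathsf{succ}\ v)) && \text{(def of size, twice)}
\end{align*}
```

The first case uses the induction hypothesis because the rule has a premise; the second is an axiom, so it follows from the definition of size alone.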
A Lemma
• Once we have proven the lemma, we can then prove
the theorem
– Theorem: if e →* e’ then size(e’) ≤ size(e)
– When writing out a proof, always write lemmas in order to
make it clear there is no circularity in the proof!
• The consequence of our theorem: evaluation always
terminates (each single step in fact makes the expression strictly smaller)
– our properties are starting to get more useful!
• Everything in this class will be defined
using inductive rules
• These rules give rise to inductive proofs
• How to succeed in this class:
– Dave: how do we prove X?
– Student: by induction on the structure of Y. (that’s the only tricky part)