has - Purdue University

advertisement
Automatic Differentiation of Functional Programs
or
Lambda the Ultimate Calculus
Jeffrey Mark Siskind
qobi@purdue.edu
School of Electrical and Computer Engineering
Purdue University
Lecture Circuit 2009
Part of this talk covers joint work with Barak A. Pearlmutter.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
1 / 73
The Essence
(define (f x) 2x3 )
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (f x) 2x3 )
Jeffrey Mark Siskind (Purdue/ECE)
(define (f ′ x) 6x2 )
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (g x) sin f (x))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (g x) sin f (x))
Jeffrey Mark Siskind (Purdue/ECE)
(define (g′ x) f ′ (x) cos f (x))
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (g x) sin f (x))
Jeffrey Mark Siskind (Purdue/ECE)
(define (g′ x) f ′ (x) cos f (x))
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (f x) 2x3 )
(define (g x) sin f (x))
Jeffrey Mark Siskind (Purdue/ECE)
(define (g′ x) f ′ (x) cos f (x))
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (f x) 2x3 )
(define (g x) sin f (x))
Jeffrey Mark Siskind (Purdue/ECE)
(define (g′ x) f ′ (x) cos f (x))
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (f x) 2x3 )
(define (g x) sin f (x))
Jeffrey Mark Siskind (Purdue/ECE)
(define (g′ x) f ′ (x) cos f (x))
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (f x) 2x3 )
(define (f ′ x) 6x2 )
(define (g x) sin f (x))
(define (g′ x) f ′ (x) cos f (x))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (f x) 2x3 )
(define (f ′ x) 6x2 )
(define (g x) sin f (x))
(define (g′ x) f ′ (x) cos f (x))
(D g)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (f x) 2x3 )
(define (f ′ x) 6x2 )
(define (g x) sin f (x))
(define (g′ x) f ′ (x) cos f (x))
(D g)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (f x) 2x3 )
(define (f ′ x) 6x2 )
(define (g x) sin f (x))
(define (g′ x) f ′ (x) cos f (x))
(D g)
Jeffrey Mark Siskind (Purdue/ECE)
h{f 7→ λx 2x3 }, λx sin f (x)i
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (f x) 2x3 )
(define (f ′ x) 6x2 )
(define (g x) sin f (x))
(define (g′ x) f ′ (x) cos f (x))
(D g)
Jeffrey Mark Siskind (Purdue/ECE)
h{f 7→ λx 2x3 }, λx sin f (x)i
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (f x) 2x3 )
(define (f ′ x) 6x2 )
(define (g x) sin f (x))
(define (g′ x) f ′ (x) cos f (x))
(D g)
Jeffrey Mark Siskind (Purdue/ECE)
h{f 7→ λx 2x3 }, λx sin f (x)i
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (f x) 2x3 )
(define (f ′ x) 6x2 )
(define (g x) sin f (x))
(define (g′ x) f ′ (x) cos f (x))
(D g)
Jeffrey Mark Siskind (Purdue/ECE)
=⇒ (D h{f 7→ λx 2x3 }, λx sin f (x)i)
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (f x) 2x3 )
(define (f ′ x) 6x2 )
(define (g x) sin f (x))
(define (g′ x) f ′ (x) cos f (x))
(D g)
=⇒ (D h{f 7→ λx 2x3 }, λx sin f (x)i)
=⇒ h{f 7→ λx 2x3 , f ′ 7→ λx 6x2 },
λx f ′ (x) cos f (x)i
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (f x) 2x3 )
(define (f ′ x) 6x2 )
(define (g x) sin f (x))
(define (g′ x) f ′ (x) cos f (x))
(D g)
=⇒ (D h{f 7→ λx 2x3 }, λx sin f (x)i)
=⇒ h{f 7→ λx 2x3 , f ′ 7→ λx 6x2 },
λx f ′ (x) cos f (x)i
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (f x) 2x3 )
(define (f ′ x) 6x2 )
(define (g x) sin f (x))
(define (g′ x) f ′ (x) cos f (x))
(D g)
=⇒ (D h{f 7→ λx 2x3 }, λx sin f (x)i)
=⇒ h{f 7→ λx 2x3 , f ′ 7→ λx 6x2 },
λx f ′ (x) cos f (x)i
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (f x) 2x3 )
(define (f ′ x) 6x2 )
(define (g x) sin f (x))
(define (g′ x) f ′ (x) cos f (x))
(D g)
=⇒ (D h{f 7→ λx 2x3 }, λx sin f (x)i)
=⇒ h{f 7→ λx 2x3 , f ′ 7→ λx 6x2 },
λx f ′ (x) cos f (x)i
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (f x) 2x3 )
(define (f ′ x) 6x2 )
(define (g x) sin f (x))
(define (g′ x) f ′ (x) cos f (x))
(D g)
=⇒ (D h{f 7→ λx 2x3 }, λx sin f (x)i)
=⇒ h{f 7→ λx 2x3 , f ′ 7→ λx 6x2 },
λx f ′ (x) cos f (x)i
(map-closure
f h{x1 7→ v1 , . . .}, ei)
Jeffrey Mark Siskind (Purdue/ECE)
=⇒h{x1 7→ f (v1 ), . . .}, ei
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (f x) 2x3 )
(define (f ′ x) 6x2 )
(define (g x) sin f (x))
(define (g′ x) f ′ (x) cos f (x))
(D g)
=⇒ (D h{f 7→ λx 2x3 }, λx sin f (x)i)
=⇒ h{f 7→ λx 2x3 , f ′ 7→ λx 6x2 },
λx f ′ (x) cos f (x)i
(map-closure
f h{x1 7→ v1 , . . .}, ei)
Jeffrey Mark Siskind (Purdue/ECE)
=⇒h{x1 7→ f (v1 ), . . .}, ei
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (f x) 2x3 )
(define (f ′ x) 6x2 )
(define (g x) sin f (x))
(define (g′ x) f ′ (x) cos f (x))
(D g)
=⇒ (D h{f 7→ λx 2x3 }, λx sin f (x)i)
=⇒ h{f 7→ λx 2x3 , f ′ 7→ λx 6x2 },
λx f ′ (x) cos f (x)i
(map-closure
f h{x1 7→ v1 , . . .}, ei)
=⇒h{x1 7→ f (v1 ), . . .}, ei
need reflective transformation of closure bodies
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (f x) 2x3 )
(define (f ′ x) 6x2 )
(define (g x) sin f (x))
(define (g′ x) f ′ (x) cos f (x))
(D g)
=⇒ (D h{f 7→ λx 2x3 }, λx sin f (x)i)
=⇒ h{f 7→ λx 2x3 , f ′ 7→ λx 6x2 },
λx f ′ (x) cos f (x)i
(map-closure
f h{x1 7→ v1 , . . .}, ei)
=⇒h{x1 7→ f (v1 ), . . .}, ei
need reflective transformation of closure bodies
want transformation done at compile time
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (f x) 2x3 )
(define (f ′ x) 6x2 )
(define (g x) sin f (x))
(define (g′ x) f ′ (x) cos f (x))
(D g)
=⇒ (D h{f 7→ λx 2x3 }, λx sin f (x)i)
=⇒ h{f 7→ λx 2x3 , f ′ 7→ λx 6x2 },
λx f ′ (x) cos f (x)i
(map-closure
f h{x1 7→ v1 , . . .}, ei)
=⇒h{x1 7→ f (v1 ), . . .}, ei
need reflective transformation of closure bodies
want transformation done at compile time
need flow analysis
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
2 / 73
The Essence
(define (f x) 2x3 )
(define (f ′ x) 6x2 )
(define (g x) sin f (x))
(define (g′ x) f ′ (x) cos f (x))
(D g)
=⇒ (D h{f 7→ λx 2x3 }, λx sin f (x)i)
=⇒ h{f 7→ λx 2x3 , f ′ 7→ λx 6x2 },
λx f ′ (x) cos f (x)i
(map-closure
f h{x1 7→ v1 , . . .}, ei)
=⇒h{x1 7→ f (v1 ), . . .}, ei
need reflective transformation of closure bodies
want transformation done at compile time
need polyvariant flow analysis
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
2 / 73
Nesting
(sqrt (sqrt x))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
3 / 73
Nesting
(sqrt (sqrt x))
(D (D f))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
3 / 73
Nesting
(sqrt (sqrt x))
(D (D f))
(map (lambda (x) . . . (map (lambda (y) . . .) . . .) . . .) . . .)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
3 / 73
Nesting
(sqrt (sqrt x))
(D (D f))
(map (lambda (x) . . . (map (lambda (y) . . .) . . .) . . .) . . .)
(D (lambda (x) . . . (D (lambda (y) . . .) . . .) . . .) . . .)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
3 / 73
Nesting
(sqrt (sqrt x))
(D (D f))
(map (lambda (x) . . . (map (lambda (y) . . .) . . .) . . .) . . .)
(D (lambda (x) . . . (D (lambda (y) . . .) . . .) . . .) . . .)
max min f (x, y)
x
Jeffrey Mark Siskind (Purdue/ECE)
y
AD of Functional Programs
Lecture Circuit 2009
3 / 73
R, the Software, Finds Fans in Data Analysts - NYTimes.com http://www.nytimes.com/2009/01/07/technology/busines...
Welcome to
TimesPeople
What’s this?
HOME PAGE
TimesPeople Lets You Share and Discover the Best of NY...
TODAY'S PAPER
VIDEO
MOST POPULAR
Get Started
10:54 AM
My Account
TIMES TOPICS
No, thanks
Welcome, qobiyan Log Out Help
Search All NYTimes.com
Business Computing
WORLD
U.S.
N.Y. / REGION
BUSINESS
Search Technology
TECHNOLOGY
SCIENCE
HEALTH
SPORTS
OPINION
Inside Technology
Internet
Start-Ups
Business Computing
Companies
ARTS
STYLE
TRAVEL
JOBS
REAL ESTATE AUTOS
Bits
Personal Tech »
Blog »
Cellphones, Cameras, Computers and more
Advertise on NYTimes.com
Data Analysts Captivated by R’s Power
More Articles in Technology »
Advertisement
"Do Wrinkle Creams Work?"
Read how a Mom combined these
2 products and got rid of her
wrinkles forever! Learn more
Stuart Isett for The New York Times
R first appeared in 1996, when the statistics professors Robert Gentleman, left, and Ross Ihaka released the code as a free
software package.
Jeffrey Mark
Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
4 / 73
The Essence of Forward-Mode AD
f (c + ε) =
f (c) f ′ (c)
f ′′ (c) 2
f (i) (c) i
+
ε+
ε + ··· +
ε + ···
0!
1!
2!
i!
Taylor, B. (1715). Methodus Incrementorum Directa et Inversa. London.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
5 / 73
The Essence of Forward-Mode AD
f (c + ε) =
f (c) f ′ (c)
f ′′ (c) 2
f (i) (c) i
+
ε+
ε + ··· +
ε + ···
0!
1!
2!
i!
To compute D f c:
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
5 / 73
The Essence of Forward-Mode AD
f (c + ε) =
f (c) f ′ (c)
f ′′ (c) 2
f (i) (c) i
+
ε+
ε + ··· +
ε + ···
0!
1!
2!
i!
To compute D f c:
evaluate f
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
5 / 73
The Essence of Forward-Mode AD
f (c + ε) =
f (c) f ′ (c)
f ′′ (c) 2
f (i) (c) i
+
ε+
ε + ··· +
ε + ···
0!
1!
2!
i!
To compute D f c:
evaluate f at the term c + ε
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
5 / 73
The Essence of Forward-Mode AD
f (c + ε) =
f (c) f ′ (c)
f ′′ (c) 2
f (i) (c) i
+
ε+
ε + ··· +
ε + ···
0!
1!
2!
i!
To compute D f c:
evaluate f at the term c + ε to get a power series,
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
5 / 73
The Essence of Forward-Mode AD
f (c + ε) =
f (c) f ′ (c)
f ′′ (c) 2
f (i) (c) i
+
ε+
ε + ··· +
ε + ···
0!
1!
2!
i!
To compute D f c:
evaluate f at the term c + ε to get a power series,
extract the coefficient of ε,
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
5 / 73
The Essence of Forward-Mode AD
f (c + ε) =
f (c) f ′ (c)
f ′′ (c) 2
f (i) (c) i
ε+
+
ε + ··· +
ε + ···
0!
1!
2!
i!
To compute D f c:
evaluate f at the term c + ε to get a power series,
extract the coefficient of ε,
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
5 / 73
The Essence of Forward-Mode AD
f (c + ε) =
f (c) f ′ (c)
f ′′ (c) 2
f (i) (c) i
+
ε+
ε + ··· +
ε + ···
0!
1!
2!
i!
To compute D f c:
evaluate f at the term c + ε to get a power series,
extract the coefficient of ε, and
multiply by 1!
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
5 / 73
The Essence of Forward-Mode AD
f (c + ε) =
f (c) f ′ (c)
f ′′ (c) 2
f (i) (c) i
+
ε+
ε + ··· +
ε + ···
0!
1!
2!
i!
To compute D f c:
evaluate f at the term c + ε to get a power series,
extract the coefficient of ε, and
multiply by 1! (noop).
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
5 / 73
The Essence of Forward-Mode AD
f (c + ε) =
f ′′ (c) 2
f (c) f ′ (c)
f (i) (c) i
+
ε+
ε + ··· +
ε + ···
0!
1!
2!
i!
To compute D f c:
evaluate f at the term c + ε to get a power series,
extract the coefficient of ε, and
multiply by 1! (noop).
Key idea: Only need output to be a finite truncated power series a + bε.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
5 / 73
The Essence of Forward-Mode AD
f (c + ε) =
f (c) f ′ (c)
f ′′ (c) 2
f (i) (c) i
+
ε+
ε + ··· +
ε + ···
0!
1!
2!
i!
To compute D f c:
evaluate f at the term c + ε to get a power series,
extract the coefficient of ε, and
multiply by 1! (noop).
Key idea: Only need output to be a finite truncated power series a + bε.
The input c + ε is also a truncated power series.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
5 / 73
The Essence of Forward-Mode AD
f (c + ε) =
f (c) f ′ (c)
f ′′ (c) 2
f (i) (c) i
+
ε+
ε + ··· +
ε + ···
0!
1!
2!
i!
To compute D f c:
evaluate f at the term c + ε to get a power series,
extract the coefficient of ε, and
multiply by 1! (noop).
Key idea: Only need output to be a finite truncated power series a + bε.
The input c + ε is also a truncated power series.
Can do a nonstandard interpretation of f over truncated power series.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
5 / 73
The Essence of Forward-Mode AD
f (c + ε) =
f (c) f ′ (c)
f ′′ (c) 2
f (i) (c) i
+
ε+
ε + ··· +
ε + ···
0!
1!
2!
i!
To compute D f c:
evaluate f at the term c + ε to get a power series,
extract the coefficient of ε, and
multiply by 1! (noop).
Key idea: Only need output to be a finite truncated power series a + bε.
The input c + ε is also a truncated power series.
Can do a nonstandard interpretation of f over truncated power series.
Preserves control flow: Augments original values with derivatives.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
5 / 73
The Essence of Forward-Mode AD
f (c + ε) =
f (c) f ′ (c)
f ′′ (c) 2
f (i) (c) i
+
ε+
ε + ··· +
ε + ···
0!
1!
2!
i!
To compute D f c:
evaluate f at the term c + ε to get a power series,
extract the coefficient of ε, and
multiply by 1! (noop).
Key idea: Only need output to be a finite truncated power series a + bε.
The input c + ε is also a truncated power series.
Can do a nonstandard interpretation of f over truncated power series.
Preserves control flow: Augments original values with derivatives.
(D f ) is O(1) relative to f (both space and time).
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
5 / 73
Arithmetic on Complex Numbers
a + bi
Hamilton, W. R. (1837). Theory of conjugate functions, or algebraic couples;
with a preliminary and elementary essay on algebra as the science of pure
time. Transactions of the Royal Irish Academy, 17(1):293–422.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
6 / 73
Arithmetic on Complex Numbers
a + bi
i2 = −1
Hamilton, W. R. (1837). Theory of conjugate functions, or algebraic couples;
with a preliminary and elementary essay on algebra as the science of pure
time. Transactions of the Royal Irish Academy, 17(1):293–422.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
6 / 73
Arithmetic on Complex Numbers
a + bi
i2 = −1
(a1 + b1 i) + (a2 + b2 i) = (a1 + a2 ) + (b1 + b2 )i
(a1 + b1 i) × (a2 + b2 i) = (a1 × a2 ) + (a1 × b2 + a2 × b1 )i + (b1 × b2 )i2
Hamilton, W. R. (1837). Theory of conjugate functions, or algebraic couples;
with a preliminary and elementary essay on algebra as the science of pure
time. Transactions of the Royal Irish Academy, 17(1):293–422.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
6 / 73
Arithmetic on Complex Numbers
a + bi
i2 = −1
(a1 + b1 i) + (a2 + b2 i) = (a1 + a2 ) + (b1 + b2 )i
(a1 + b1 i) × (a2 + b2 i) = (a1 × a2 ) + (a1 × b2 + a2 × b1 )i + (b1 × b2 )i2
Hamilton, W. R. (1837). Theory of conjugate functions, or algebraic couples;
with a preliminary and elementary essay on algebra as the science of pure
time. Transactions of the Royal Irish Academy, 17(1):293–422.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
6 / 73
Arithmetic on Complex Numbers
a + bi
i2 = −1
(a1 + b1 i) + (a2 + b2 i) = (a1 + a2 ) + (b1 + b2 )i
(a1 + b1 i) × (a2 + b2 i) = (a1 × a2 −b1 × b2 ) + (a1 × b2 + a2 × b1 )i
Hamilton, W. R. (1837). Theory of conjugate functions, or algebraic couples;
with a preliminary and elementary essay on algebra as the science of pure
time. Transactions of the Royal Irish Academy, 17(1):293–422.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
6 / 73
Arithmetic on Complex Numbers
a + bi
i2 = −1
(a1 + b1 i) + (a2 + b2 i) = (a1 + a2 ) + (b1 + b2 )i
(a1 + b1 i) × (a2 + b2 i) = (a1 × a2 −b1 × b2 ) + (a1 × b2 + a2 × b1 )i
ha, bi
Hamilton, W. R. (1837). Theory of conjugate functions, or algebraic couples;
with a preliminary and elementary essay on algebra as the science of pure
time. Transactions of the Royal Irish Academy, 17(1):293–422.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
6 / 73
Arithmetic on Complex Numbers
a + bi
i2 = −1
(a1 + b1 i) + (a2 + b2 i) = (a1 + a2 ) + (b1 + b2 )i
(a1 + b1 i) × (a2 + b2 i) = (a1 × a2 −b1 × b2 ) + (a1 × b2 + a2 × b1 )i
ha, bi
ha1 , b1 i + ha2 , b2 i = h(a1 + a2 ), (b1 + b2 )i
ha1 , b1 i × ha2 , b2 i = h(a1 × a2 − b1 × b2 ), (a1 × b2 + a2 × b1 )i
Hamilton, W. R. (1837). Theory of conjugate functions, or algebraic couples;
with a preliminary and elementary essay on algebra as the science of pure
time. Transactions of the Royal Irish Academy, 17(1):293–422.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
6 / 73
Arithmetic on Dual Numbers
x + x′ ε
Clifford, W. K. (1873). Preliminary Sketch of Bi-quaternions. Proceedings of
the London Mathematical Society, 4:381–95.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
7 / 73
Arithmetic on Dual Numbers
x + x′ ε
ε2 = 0, but ε 6= 0
Clifford, W. K. (1873). Preliminary Sketch of Bi-quaternions. Proceedings of
the London Mathematical Society, 4:381–95.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
7 / 73
Arithmetic on Dual Numbers
x + x′ ε
ε2 = 0, but ε 6= 0
(x1 + x1′ ε) + (x2 + x2′ ε) = (x1 + x2 ) + (x1′ + x2′ )ε
(x1 + x1′ ε) × (x2 + x2′ ε) = (x1 × x2 ) + (x1 × x2′ + x2 × x1′ )ε + (x1′ + x2′ )ε2
Clifford, W. K. (1873). Preliminary Sketch of Bi-quaternions. Proceedings of
the London Mathematical Society, 4:381–95.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
7 / 73
Arithmetic on Dual Numbers
x + x′ ε
ε2 = 0, but ε 6= 0
(x1 + x1′ ε) + (x2 + x2′ ε) = (x1 + x2 ) + (x1′ + x2′ )ε
(x1 + x1′ ε) × (x2 + x2′ ε) = (x1 × x2 ) + (x1 × x2′ + x2 × x1′ )ε + (x1′ + x2′ )ε2
Clifford, W. K. (1873). Preliminary Sketch of Bi-quaternions. Proceedings of
the London Mathematical Society, 4:381–95.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
7 / 73
Arithmetic on Dual Numbers
x + x′ ε
ε2 = 0, but ε 6= 0
(x1 + x1′ ε) + (x2 + x2′ ε) = (x1 + x2 ) + (x1′ + x2′ )ε
(x1 + x1′ ε) × (x2 + x2′ ε) = (x1 × x2 ) + (x1 × x2′ + x2 × x1′ )ε
Clifford, W. K. (1873). Preliminary Sketch of Bi-quaternions. Proceedings of
the London Mathematical Society, 4:381–95.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
7 / 73
Arithmetic on Dual Numbers
x + x′ ε
ε2 = 0, but ε 6= 0
(x1 + x1′ ε) + (x2 + x2′ ε) = (x1 + x2 ) + (x1′ + x2′ )ε
(x1 + x1′ ε) × (x2 + x2′ ε) = (x1 × x2 ) + (x1 × x2′ + x2 × x1′ )ε
hx, x′ i
Clifford, W. K. (1873). Preliminary Sketch of Bi-quaternions. Proceedings of
the London Mathematical Society, 4:381–95.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
7 / 73
Arithmetic on Dual Numbers
x + x′ ε
ε2 = 0, but ε 6= 0
(x1 + x1′ ε) + (x2 + x2′ ε) = (x1 + x2 ) + (x1′ + x2′ )ε
(x1 + x1′ ε) × (x2 + x2′ ε) = (x1 × x2 ) + (x1 × x2′ + x2 × x1′ )ε
hx, x′ i
hx1 , x1′ i + hx2 , x2′ i = h(x1 + x2 ), (x1′ + x2′ )i
hx1 , x1′ i × hx2 , x2′ i = h(x1 × x2 ), (x1 × x2′ + x2 × x1′ )i
Clifford, W. K. (1873). Preliminary Sketch of Bi-quaternions. Proceedings of
the London Mathematical Society, 4:381–95.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
7 / 73
Dynamic Overloading: SCMUTILS
(define-structure bundle primal tangent)
(define (primal p) (if (bundle? p) (bundle-primal p) p))
(define (tangent p) (if (bundle? p) (bundle-tangent p) 0))
(define +
(let ((+ +))
(lambda (x1 x2)
(make-bundle (+ (primal x1) (primal x2))
(+ (tangent x1) (tangent x2))))))
(define *
(let ((+ +) (* *))
(lambda (x1 x2)
(make-bundle (* (primal x1) (primal x2))
(+ (* (primal x1) (tangent x2))
(* (tangent x1) (primal x2)))))))
(define ((D f) x) (tangent (f (make-bundle x 1))))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
8 / 73
Dynamic Overloading: SCMUTILS
(define-structure bundle primal tangent)
(define (primal p) (if (bundle? p) (bundle-primal p) p))
(define (tangent p) (if (bundle? p) (bundle-tangent p) 0))
(define +
(let ((+ +))
(lambda (x1 x2)
(make-bundle (+ (primal x1) (primal x2))
(+ (tangent x1) (tangent x2))))))
(define *
(let ((+ +) (* *))
(lambda (x1 x2)
(make-bundle (* (primal x1) (primal x2))
(+ (* (primal x1) (tangent x2))
(* (tangent x1) (primal x2)))))))
(define ((D f) x) (tangent (f (make-bundle x 1))))
(define (f x) (* 2 (* x (* x x))))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
8 / 73
Dynamic Overloading: SCMUTILS
(define-structure bundle primal tangent)
(define (primal p) (if (bundle? p) (bundle-primal p) p))
(define (tangent p) (if (bundle? p) (bundle-tangent p) 0))
(define +
(let ((+ +))
(lambda (x1 x2)
(make-bundle (+ (primal x1) (primal x2))
(+ (tangent x1) (tangent x2))))))
(define *
(let ((+ +) (* *))
(lambda (x1 x2)
(make-bundle (* (primal x1) (primal x2))
(+ (* (primal x1) (tangent x2))
(* (tangent x1) (primal x2)))))))
(define ((D f) x) (tangent (f (make-bundle x 1))))
(define (f x) (* 2 (* x (* x x))))
(D f)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
8 / 73
Dynamic Overloading: SCMUTILS
(define-structure bundle primal tangent)
(define (primal p) (if (bundle? p) (bundle-primal p) p))
(define (tangent p) (if (bundle? p) (bundle-tangent p) 0))
(define +
(let ((+ +))
(lambda (x1 x2)
(make-bundle (+ (primal x1) (primal x2))
(+ (tangent x1) (tangent x2))))))
(define *
(let ((+ +) (* *))
(lambda (x1 x2)
(make-bundle (* (primal x1) (primal x2))
(+ (* (primal x1) (tangent x2))
(* (tangent x1) (primal x2)))))))
(define ((D f) x) (tangent (f (make-bundle x 1))))
(define (f x) (* 2 (* x (* x x))))
(D f)
(D (D f))
(D (lambda (x) . . . (D (lambda (y) . . .) . . .) . . .) . . .)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
8 / 73
Dynamic Overloading: SCMUTILS
(define-structure bundle primal tangent)
(define (primal p) (if (bundle? p) (bundle-primal p) p))
(define (tangent p) (if (bundle? p) (bundle-tangent p) 0))
(define +
(let ((+ +))
(lambda (x1 x2)
(make-bundle (+ (primal x1) (primal x2))
(+ (tangent x1) (tangent x2))))))
(define *
(let ((+ +) (* *))
(lambda (x1 x2)
(make-bundle (* (primal x1) (primal x2))
(+ (* (primal x1) (tangent x2))
(* (tangent x1) (primal x2)))))))
(define ((D f) x) (tangent (f (make-bundle x 1))))
(define (f x) (* 2 (* x (* x x))))
(D f)
(D (D f))
(D (lambda (x) . . . (D (lambda (y) . . .) . . .) . . .) . . .)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
8 / 73
Dynamic Overloading: SCMUTILS
(define-structure bundle primal tangent)
(define (primal p) (if (bundle? p) (bundle-primal p) p))
(define (tangent p) (if (bundle? p) (bundle-tangent p) 0))
(define +
(let ((+ +))
(lambda (x1 x2)
(make-bundle (+ (primal x1) (primal x2))
(+ (tangent x1) (tangent x2))))))
(define *
(let ((+ +) (* *))
(lambda (x1 x2)
(make-bundle (* (primal x1) (primal x2))
(+ (* (primal x1) (tangent x2))
(* (tangent x1) (primal x2)))))))
(define ((D f) x) (tangent (f (make-bundle x 1))))
(define (f x) (* 2 (* x (* x x))))
(D f)
(D (D f))
(D (lambda (x) . . . (D (lambda (y) . . .) . . .) . . .) . . .)
Convenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
8 / 73
Dynamic Overloading: SCMUTILS
(define-structure bundle primal tangent)
(define (primal p) (if (bundle? p) (bundle-primal p) p))
(define (tangent p) (if (bundle? p) (bundle-tangent p) 0))
(define +
(let ((+ +))
(lambda (x1 x2)
(make-bundle (+ (primal x1) (primal x2))
(+ (tangent x1) (tangent x2))))))
(define *
(let ((+ +) (* *))
(lambda (x1 x2)
(make-bundle (* (primal x1) (primal x2))
(+ (* (primal x1) (tangent x2))
(* (tangent x1) (primal x2)))))))
(define ((D f) x) (tangent (f (make-bundle x 1))))
(define (f x) (* 2 (* x (* x x))))
(D f)
(D (D f))
(D (lambda (x) . . . (D (lambda (y) . . .) . . .) . . .) . . .)
Convenient but slow
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
8 / 73
Dynamic Overloading: SCMUTILS
(define-structure bundle primal tangent)
(define (primal p) (if (bundle? p) (bundle-primal p) p))
(define (tangent p) (if (bundle? p) (bundle-tangent p) 0))
(define +
(let ((+ +))
(lambda (x1 x2)
(make-bundle (+ (primal x1) (primal x2))
(+ (tangent x1) (tangent x2))))))
(define *
(let ((+ +) (* *))
(lambda (x1 x2)
(make-bundle (* (primal x1) (primal x2))
(+ (* (primal x1) (tangent x2))
(* (tangent x1) (primal x2)))))))
(define ((D f) x) (tangent (f (make-bundle x 1))))
(define (f x) (* 2 (* x (* x x))))
(D f)
(D (D f))
(D (lambda (x) . . . (D (lambda (y) . . .) . . .) . . .) . . .)
Convenient but slow
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
8 / 73
Dynamic Overloading: SCMUTILS
(define-structure bundle primal tangent)
(define (primal p) (if (bundle? p) (bundle-primal p) p))
(define (tangent p) (if (bundle? p) (bundle-tangent p) 0))
(define +
(let ((+ +))
(lambda (x1 x2)
(make-bundle (+ (primal x1) (primal x2))
(+ (tangent x1) (tangent x2))))))
(define *
(let ((+ +) (* *))
(lambda (x1 x2)
(make-bundle (* (primal x1) (primal x2))
(+ (* (primal x1) (tangent x2))
(* (tangent x1) (primal x2)))))))
(define ((D f) x) (tangent (f (make-bundle x 1))))
(define (f x) (* 2 (* x (* x x))))
(D f)
(D (D f))
(D (lambda (x) . . . (D (lambda (y) . . .) . . .) . . .) . . .)
Convenient but slow
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
8 / 73
Dynamic Overloading: SCMUTILS
(define-structure bundle primal tangent)
(define (primal p) (if (bundle? p) (bundle-primal p) p))
(define (tangent p) (if (bundle? p) (bundle-tangent p) 0))
(define ((D f) x)
(fluid-let ((+ (lambda (x1 x2)
(make-bundle (+
(+
(* (lambda (x1 x2)
(make-bundle (*
(+
(primal x1) (primal x2))
(tangent x1) (tangent x2)))))
(primal x1) (primal x2))
(* (primal x1) (tangent x2))
(* (tangent x1) (primal x2)))))))
(tangent (f (make-bundle x 1)))))
(define (f x) (* 2 (* x (* x x))))
(D f)
(D (D f))
(D (lambda (x) . . . (D (lambda (y) . . .) . . .) . . .) . . .)
Convenient but slow
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
8 / 73
Dynamic Overloading: SCMUTILS
(define-structure bundle primal tangent)
(define (primal p) (if (bundle? p) (bundle-primal p) p))
(define (tangent p) (if (bundle? p) (bundle-tangent p) 0))
(define ((D f) x)
(fluid-let ((+ (lambda (x1 x2)
(make-bundle (+
(+
(* (lambda (x1 x2)
(make-bundle (*
(+
(primal x1) (primal x2))
(tangent x1) (tangent x2)))))
(primal x1) (primal x2))
(* (primal x1) (tangent x2))
(* (tangent x1) (primal x2)))))))
(tangent (f (make-bundle x 1)))))
(define (f x) (* 2 (* x (* x x))))
(D f)
(D (D f))
(D (lambda (x) . . . (D (lambda (y) . . .) . . .) . . .) . . .)
Convenient but slow
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
8 / 73
Preprocessor: ADIFOR and TAPENADE
function f(x)
double precision x, f
f = 2.0d0*x*x*x
end
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
9 / 73
Preprocessor: ADIFOR and TAPENADE
function f(x)
double precision x, f
f = 2.0d0*x*x*x
end
function gf(x, gx, gresult)
double precision x, gx, gf, gresult
gf = 2.0d0*x*x*x
gresult = 6.0d0*x*x*gx
end
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
9 / 73
Preprocessor: ADIFOR and TAPENADE
function f(x)
double precision x, f
f = 2.0d0*x*x*x
end
function gf(x, gx, gresult)
double precision x, gx, gf, gresult
gf = 2.0d0*x*x*x
gresult = 6.0d0*x*x*gx
end
Fast
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
9 / 73
Preprocessor: ADIFOR and TAPENADE
function f(x)
double precision x, f
f = 2.0d0*x*x*x
end
function gf(x, gx, gresult)
double precision x, gx, gf, gresult
gf = 2.0d0*x*x*x
gresult = 6.0d0*x*x*gx
end
Fast but inconvenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
9 / 73
Preprocessor: ADIFOR and TAPENADE
function f(x)
double precision x, f
f = 2.0d0*x*x*x
end
AD_TOP = f
function gf(x, gx, gresult)
double precision x, gx, gf, gresult
gf = 2.0d0*x*x*x
gresult = 6.0d0*x*x*gx
end
Fast but inconvenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
9 / 73
Preprocessor: ADIFOR and TAPENADE
function f(x)
double precision x, f
f = 2.0d0*x*x*x
end
AD_TOP = f
AD_IVARS = x
AD_DVARS = f
function gf(x, gx, gresult)
double precision x, gx, gf, gresult
gf = 2.0d0*x*x*x
gresult = 6.0d0*x*x*gx
end
Fast but inconvenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
9 / 73
Preprocessor: ADIFOR and TAPENADE
function f(x)
double precision x, f
f = 2.0d0*x*x*x
end
AD_TOP = f
AD_IVARS = x
AD_DVARS = f
function gf(x, gx, gresult)
double precision x, gx, gf, gresult
gf = 2.0d0*x*x*x
gresult = 6.0d0*x*x*gx
end
Fast but inconvenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
9 / 73
Preprocessor: ADIFOR and TAPENADE
function f(x)
double precision x, f
f = 2.0d0*x*x*x
end
AD_TOP = f
AD_IVARS = x
AD_DVARS = f
function gf(x, gx, gresult)
double precision x, gx, gf, gresult
gf = 2.0d0*x*x*x
gresult = 6.0d0*x*x*gx
end
Fast but inconvenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
9 / 73
Preprocessor: ADIFOR and TAPENADE
function f(x)
double precision x, f
f = 2.0d0*x*x*x
end
AD_TOP = f
AD_IVARS = x
AD_DVARS = f
function gf(x, gx, gresult)
double precision x, gx, gf, gresult
gf = 2.0d0*x*x*x
gresult = 6.0d0*x*x*gx
end
AD_TOP = gf
AD_IVARS = x, gx
AD_DVARS = gf, gresult
Fast but inconvenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
9 / 73
Preprocessor: ADIFOR and TAPENADE
function f(x)
double precision x, f
f = 2.0d0*x*x*x
end
AD_TOP = f
AD_IVARS = x
AD_DVARS = f
function gf(x, gx, gresult)
double precision x, gx, gf, gresult
gf = 2.0d0*x*x*x
gresult = 6.0d0*x*x*gx
end
AD_TOP = gf
AD_IVARS = x, gx
AD_DVARS = gf, gresult
function ggf(x, gx, gx, ggx, gresult, ggresult, gresult)
double precision x, gx, gx, ggx, ggf, gresult, gresult, ggresult
ggf = 2.0d0*x*x*x
gresult = 6.0d0*x*x*gx
gresult = 6.0d0*x*x*gx
ggresult = 6.0d0*x*x*ggx+12.0d0*x*gx*gx
end
Fast but inconvenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
9 / 73
Preprocessor: ADIFOR and TAPENADE
function f(x)
double precision x, f
f = 2.0d0*x*x*x
end
AD_TOP = f
AD_IVARS = x
AD_DVARS = f
function gf(x, gx, gresult)
double precision x, gx, gf, gresult
gf = 2.0d0*x*x*x
gresult = 6.0d0*x*x*gx
end
AD_TOP = gf
AD_IVARS = x, gx
AD_DVARS = gf, gresult
function ggf(x, gx, gx, ggx, gresult, ggresult, gresult)
double precision x, gx, gx, ggx, ggf, gresult, gresult, ggresult
ggf = 2.0d0*x*x*x
gresult = 6.0d0*x*x*gx
gresult = 6.0d0*x*x*gx
ggresult = 6.0d0*x*x*ggx+12.0d0*x*gx*gx
end
Fast but inconvenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
9 / 73
Preprocessor: ADIFOR and TAPENADE
function f(x)
double precision x, f
f = 2.0d0*x*x*x
end
AD_TOP = f
AD_IVARS = x
AD_DVARS = f
function gf(x, gx, gresult)
double precision x, gx, gf, gresult
gf = 2.0d0*x*x*x
gresult = 6.0d0*x*x*gx
end
AD_TOP = gf
AD_IVARS = x, gx
AD_DVARS = gf, gresult
function ggf(x, gx, gx, ggx, gresult, ggresult, gresult)
double precision x, gx, gx, ggx, ggf, gresult, gresult, ggresult
ggf = 2.0d0*x*x*x
gresult = 6.0d0*x*x*gx
gresult = 6.0d0*x*x*gx
ggresult = 6.0d0*x*x*ggx+12.0d0*x*gx*gx
end
Fast but inconvenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
9 / 73
Preprocessor: ADIFOR and TAPENADE
function f(x)
double precision x, f
f = 2.0d0*x*x*x
end
AD_TOP = f
AD_IVARS = x
AD_DVARS = f
function gf(x, gx, gresult)
double precision x, gx, gf, gresult
gf = 2.0d0*x*x*x
gresult = 6.0d0*x*x*gx
end
AD_TOP = gf
AD_IVARS = x, gx
AD_DVARS = gf, gresult
function ggf(x, gx, gx, ggx, gresult, ggresult, gresult)
double precision x, gx, gx, ggx, ggf, gresult, gresult, ggresult
ggf = 2.0d0*x*x*x
gresult = 6.0d0*x*x*gx
gresult = 6.0d0*x*x*gx
ggresult = 6.0d0*x*x*ggx+12.0d0*x*gx*gx
end
Fast but inconvenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
9 / 73
Preprocessor: ADIFOR and TAPENADE
function f(x)
double precision x, f
f = 2.0d0*x*x*x
end
AD_TOP = f
AD_IVARS = x
AD_DVARS = f
function gf(x, gx, gresult)
double precision x, gx, gf, gresult
gf = 2.0d0*x*x*x
gresult = 6.0d0*x*x*gx
end
AD_TOP = gf
AD_IVARS = x, gx
AD_DVARS = gf, gresult
AD_PREFIX = h
function hgf(x, hx, gx, hgx, gresult, hgresult, hresult)
double precision x, hx, gx, hgx, hgf, hresult, gresult, hgresult
hgf = 2.0d0*x*x*x
hresult = 6.0d0*x*x*hx
gresult = 6.0d0*x*x*gx
hgresult = 6.0d0*x*x*hgx+12.0d0*x*gx*hx
end
Fast but inconvenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
9 / 73
Static Overloading: FADBAD ++
double f(double x) {return 2*x*x*x;}
double x;
. . . f(x) . . .
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
10 / 73
Static Overloading: FADBAD ++
double f(double x) {return 2*x*x*x;}
double x;
. . . f(x) . . .
F<double> f(F<double> x) {return 2*x*x*x;}
F<double> x;
x.diff(0, 1);
. . . f(x).d(0) . . .
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
10 / 73
Static Overloading: FADBAD ++
double f(double x) {return 2*x*x*x;}
double x;
. . . f(x) . . .
F<double> f(F<double> x) {return 2*x*x*x;}
F<double> x;
x.diff(0, 1);
. . . f(x).d(0) . . .
F<F<double> > f(F<F<double> > x) {return 2*x*x*x;}
F<F<double> > x;
x.diff(0, 1);
x.diff(0, 1).diff(0,1);
. . . f(x).d(0).d(0) . . .
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
10 / 73
Static Overloading: FADBAD ++
double f(double x) {return 2*x*x*x;}
double x;
. . . f(x) . . .
F<double> f(F<double> x) {return 2*x*x*x;}
F<double> x;
x.diff(0, 1);
. . . f(x).d(0) . . .
F<F<double> > f(F<F<double> > x) {return 2*x*x*x;}
F<F<double> > x;
x.diff(0, 1);
x.diff(0, 1).diff(0,1);
. . . f(x).d(0).d(0) . . .
Slow
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
10 / 73
Static Overloading: FADBAD ++
double f(double x) {return 2*x*x*x;}
double x;
. . . f(x) . . .
F<double> f(F<double> x) {return 2*x*x*x;}
F<double> x;
x.diff(0, 1);
. . . f(x).d(0) . . .
F<F<double> > f(F<F<double> > x) {return 2*x*x*x;}
F<F<double> > x;
x.diff(0, 1);
x.diff(0, 1).diff(0,1);
. . . f(x).d(0).d(0) . . .
Slow
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
10 / 73
Static Overloading: FADBAD ++
double f(double x) {return 2*x*x*x;}
double x;
. . . f(x) . . .
F<double> f(F<double> x) {return 2*x*x*x;}
F<double> x;
x.diff(0, 1);
. . . f(x).d(0) . . .
F<F<double> > f(F<F<double> > x) {return 2*x*x*x;}
F<F<double> > x;
x.diff(0, 1);
x.diff(0, 1).diff(0,1);
. . . f(x).d(0).d(0) . . .
Slow
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
10 / 73
Static Overloading: FADBAD ++
double f(double x) {return 2*x*x*x;}
double x;
. . . f(x) . . .
F<double> f(F<double> x) {return 2*x*x*x;}
F<double> x;
x.diff(0, 1);
. . . f(x).d(0) . . .
F<F<double> > f(F<F<double> > x) {return 2*x*x*x;}
F<F<double> > x;
x.diff(0, 1);
x.diff(0, 1).diff(0,1);
. . . f(x).d(0).d(0) . . .
Slow
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
10 / 73
Static Overloading: FADBAD ++
double f(double x) {return 2*x*x*x;}
double x;
. . . f(x) . . .
F<double> f(F<double> x) {return 2*x*x*x;}
F<double> x;
x.diff(0, 1);
. . . f(x).d(0) . . .
F<F<double> > f(F<F<double> > x) {return 2*x*x*x;}
F<F<double> > x;
x.diff(0, 1);
x.diff(0, 1).diff(0,1);
. . . f(x).d(0).d(0) . . .
Slow and inconvenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
10 / 73
Static Overloading: FADBAD ++
double f(double x) {return 2*x*x*x;}
double x;
. . . f(x) . . .
F<double> f(F<double> x) {return 2*x*x*x;}
F<double> x;
x.diff(0, 1);
. . . f(x).d(0) . . .
F<F<double> > f(F<F<double> > x) {return 2*x*x*x;}
F<F<double> > x;
x.diff(0, 1);
x.diff(0, 1).diff(0,1);
. . . f(x).d(0).d(0) . . .
Slow and inconvenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
10 / 73
Static Overloading: FADBAD ++
double f(double x) {return 2*x*x*x;}
double x;
. . . f(x) . . .
F<double> f(F<double> x) {return 2*x*x*x;}
F<double> x;
x.diff(0, 1);
. . . f(x).d(0) . . .
F<F<double> > f(F<F<double> > x) {return 2*x*x*x;}
F<F<double> > x;
x.diff(0, 1);
x.diff(0, 1).diff(0,1);
. . . f(x).d(0).d(0) . . .
Slow and inconvenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
10 / 73
Static Overloading: FADBAD ++
double f(double x) {return 2*x*x*x;}
double x;
. . . f(x) . . .
F<double> f(F<double> x) {return 2*x*x*x;}
F<double> x;
x.diff(0, 1);
. . . f(x).d(0) . . .
F<F<double> > f(F<F<double> > x) {return 2*x*x*x;}
F<F<double> > x;
x.diff(0, 1);
x.diff(0, 1).diff(0,1);
. . . f(x).d(0).d(0) . . .
Slow and inconvenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
10 / 73
Static Overloading: FADBAD ++
double f(double x) {return 2*x*x*x;}
double x;
. . . f(x) . . .
F<double> f(F<double> x) {return 2*x*x*x;}
F<double> x;
x.diff(0, 1);
. . . f(x).d(0) . . .
F<F<double> > f(F<F<double> > x) {return 2*x*x*x;}
F<F<double> > x;
x.diff(0, 1);
x.diff(0, 1).diff(0,1);
. . . f(x).d(0).d(0) . . .
Slow and inconvenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
10 / 73
Static Overloading: FADBAD ++
double f(double x) {return 2*x*x*x;}
double x;
. . . f(x) . . .
F<double> f(F<double> x) {return 2*x*x*x;}
F<double> x;
x.diff(0, 1);
. . . f(x).d(0) . . .
F<F<double> > f(F<F<double> > x) {return 2*x*x*x;}
F<F<double> > x;
x.diff(0, 1);
x.diff(0, 1).diff(0,1);
. . . f(x).d(0).d(0) . . .
Slow and inconvenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
10 / 73
Static Overloading: FADBAD ++
double f(double x) {return 2*x*x*x;}
double x;
. . . f(x) . . .
F<double> f(F<double> x) {return 2*x*x*x;}
F<double> x;
x.diff(0, 1);
. . . f(x).d(0) . . .
F<F<double> > f(F<F<double> > x) {return 2*x*x*x;}
F<F<double> > x;
x.diff(0, 1);
x.diff(0, 1).diff(0,1);
. . . f(x).d(0).d(0) . . .
Slow and inconvenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
10 / 73
Static Overloading: FADBAD ++
double f(double x) {return 2*x*x*x;}
double x;
. . . f(x) . . .
F<double> f(F<double> x) {return 2*x*x*x;}
F<double> x;
x.diff(0, 1);
. . . f(x).d(0) . . .
F<F<double> > f(F<F<double> > x) {return 2*x*x*x;}
F<F<double> > x;
x.diff(0, 1);
x.diff(0, 1).diff(0,1);
. . . f(x).d(0).d(0) . . .
template <typename T>
T f(T x) {return 2*x*x*x;}
T x;
Slow and inconvenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
10 / 73
Static Overloading: FADBAD ++
double f(double x) {return 2*x*x*x;}
double x;
. . . f(x) . . .
F<double> f(F<double> x) {return 2*x*x*x;}
F<double> x;
x.diff(0, 1);
. . . f(x).d(0) . . .
F<F<double> > f(F<F<double> > x) {return 2*x*x*x;}
F<F<double> > x;
x.diff(0, 1);
x.diff(0, 1).diff(0,1);
. . . f(x).d(0).d(0) . . .
template <typename T>
T f(T x) {return 2*x*x*x;}
T x;
Slow and inconvenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
10 / 73
Static Overloading: FADBAD ++
double f(double x) {return 2*x*x*x;}
double x;
. . . f(x) . . .
F<double> f(F<double> x) {return 2*x*x*x;}
F<double> x;
x.diff(0, 1);
. . . f(x).d(0) . . .
F<F<double> > f(F<F<double> > x) {return 2*x*x*x;}
F<F<double> > x;
x.diff(0, 1);
x.diff(0, 1).diff(0,1);
. . . f(x).d(0).d(0) . . .
template <typename T>
T f(T x) {return 2*x*x*x;}
T x;
Slow and inconvenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
10 / 73
Our API for Functional Forward AD
−
⇁
Differential geometry bundles points Rn in a manifold with tangent vectors Rn
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
11 / 73
Our API for Functional Forward AD
bundle
−
⇁
−
⇁
: Rn × Rn → (Rn ⊲ Rn )
−
⇁
Differential geometry bundles points Rn in a manifold with tangent vectors Rn
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
11 / 73
Our API for Functional Forward AD
bundle
primal
tangent
−
⇁
−
⇁
: Rn × Rn → (Rn ⊲ Rn )
−
⇁
: (Rn ⊲ Rn ) → Rn
−
⇁
−
⇁
: (Rn ⊲ Rn ) → Rn
−
⇁
Differential geometry bundles points Rn in a manifold with tangent vectors Rn
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
11 / 73
Our API for Functional Forward AD
bundle
primal
tangent
j*
−
⇁
−
⇁
: Rn × Rn → (Rn ⊲ Rn )
−
⇁
: (Rn ⊲ Rn ) → Rn
−
⇁
−
⇁
: (Rn ⊲ Rn ) → Rn
−
⇁
−
⇁
: (Rn → Rm ) → ((Rn ⊲ Rn ) → (Rm ⊲ Rm ))
−
⇁
Differential geometry bundles points Rn in a manifold with tangent vectors Rn
j* maps a function to its push forward
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
11 / 73
Our API for Functional Forward AD
bundle
primal
tangent
j*
−
⇁
−
⇁
: Rn × Rn → (Rn ⊲ Rn )
−
⇁
: (Rn ⊲ Rn ) → Rn
−
⇁
−
⇁
: (Rn ⊲ Rn ) → Rn
−
⇁
−
⇁
: (Rn → Rm ) → ((Rn ⊲ Rn ) → (Rm ⊲ Rm ))
−
⇁
Differential geometry bundles points Rn in a manifold with tangent vectors Rn
j* maps a function to its push forward
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
11 / 73
Our API for Functional Forward AD
bundle
primal
tangent
j*
−
⇁
−
⇁
: Rn × Rn → (Rn ⊲ Rn )
−
⇁
: (Rn ⊲ Rn ) → Rn
−
⇁
−
⇁
: (Rn ⊲ Rn ) → Rn
−
⇁
−
⇁
: (Rn → Rm ) → ((Rn ⊲ Rn ) → (Rm ⊲ Rm ))
−
⇁
Differential geometry bundles points Rn in a manifold with tangent vectors Rn
j* maps a function to its push forward
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
11 / 73
Our API for Functional Forward AD
bundle
:
primal
:
⇁
⇁
τ ×−
τ → (τ ⊲ −
τ)
−
⇁
(τ ⊲ τ ) → τ
tangent
:
⇁
⇁
(τ ⊲ −
τ )→−
τ
j* :
⇁
⇁
(τ1 → τ2 ) → ((τ1 ⊲ −
τ1 ) → (τ2 ⊲ −
τ2 ))
−
⇁
Differential geometry bundles points Rn in a manifold with tangent vectors Rn
j* maps a function to its push forward
Generalize to arbitrary types
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
11 / 73
Our API for Functional Forward AD
bundle
:
primal
:
⇁
⇁
τ → (τ ⊲ −
τ)
τ ×−
−
⇁
(τ ⊲ τ ) → τ
tangent
:
⇁
⇁
τ )→−
τ
(τ ⊲ −
j* :
⇁
⇁
τ1 ) → (τ2 ⊲ −
τ2 ))
(τ1 → τ2 ) → ((τ1 ⊲ −
−
⇁
Differential geometry bundles points Rn in a manifold with tangent vectors Rn
j* maps a function to its push forward
Generalize to arbitrary types
What is the tangent of a discrete value or a function?
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
11 / 73
Our API for Functional Forward AD
bundle
:
primal
:
⇁
⇁
τ ×−
τ → (τ ⊲ −
τ)
−
⇁
(τ ⊲ τ ) → τ
tangent
:
⇁
⇁
(τ ⊲ −
τ )→−
τ
j* :
⇁
⇁
(τ1 → τ2 ) → ((τ1 ⊲ −
τ1 ) → (τ2 ⊲ −
τ2 ))
−
⇁
Differential geometry bundles points Rn in a manifold with tangent vectors Rn
j* maps a function to its push forward
Generalize to arbitrary types
What is the tangent of a discrete value or a function?
⇁
⇀
τ as −
τ
Can abbreviate τ ⊲ −
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
11 / 73
Our API for Functional Forward AD
bundle
primal
tangent
⇀
⇁
τ
: τ ×−
τ →−
−
⇀
: τ →τ
⇀
⇁
τ →−
: −
τ
⇀
⇀
τ1 → −
τ2 )
j* : (τ1 → τ2 ) → (−
−
⇁
Differential geometry bundles points Rn in a manifold with tangent vectors Rn
j* maps a function to its push forward
Generalize to arbitrary types
What is the tangent of a discrete value or a function?
⇀
⇁
τ
Can abbreviate τ ⊲ −
τ as −
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
11 / 73
Our API for Functional Forward AD
bundle
primal
⇁
⇀
: τ ×−
τ →−
τ
−
⇀
: τ →τ
⇀
⇁
tangent : −
τ →−
τ
−
→
⇀
⇀
τ1 → −
τ2 )
J : (τ1 → τ2 ) → (−
−
⇁
Differential geometry bundles points Rn in a manifold with tangent vectors Rn
j* maps a function to its push forward
Generalize to arbitrary types
What is the tangent of a discrete value or a function?
⇁
−
⇀
Can abbreviate τ ⊲ −
τ as
−
→τ
Sometimes write j* as J
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
11 / 73
Our API for Functional Forward AD
bundle
primal
tangent
⇁
⇀
: τ ×−
τ →−
τ
−
⇀
: τ →τ
⇀
⇁
: −
τ →−
τ
⇀
⇀
j* : (τ1 → τ2 ) → (−
τ1 → −
τ2 )
(define ((D f) x) (tangent ((j* f) (bundle x 1))))
−
⇁
Differential geometry bundles points Rn in a manifold with tangent vectors Rn
j* maps a function to its push forward
Generalize to arbitrary types
What is the tangent of a discrete value or a function?
⇁
−
⇀
Can abbreviate τ ⊲ −
τ as
−
→τ
Sometimes write j* as J
Convenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
11 / 73
Our API for Functional Forward AD
bundle
primal
tangent
⇁
⇀
: τ ×−
τ →−
τ
−
⇀
: τ →τ
⇀
⇁
: −
τ →−
τ
⇀
⇀
j* : (τ1 → τ2 ) → (−
τ1 → −
τ2 )
(define ((D f) x) (tangent ((j* f) (bundle x 1))))
(D f)
−
⇁
Differential geometry bundles points Rn in a manifold with tangent vectors Rn
j* maps a function to its push forward
Generalize to arbitrary types
What is the tangent of a discrete value or a function?
⇁
−
⇀
Can abbreviate τ ⊲ −
τ as
−
→τ
Sometimes write j* as J
Convenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
11 / 73
Our API for Functional Forward AD
bundle
primal
tangent
⇁
⇀
: τ ×−
τ →−
τ
−
⇀
: τ →τ
⇀
⇁
: −
τ →−
τ
⇀
⇀
j* : (τ1 → τ2 ) → (−
τ1 → −
τ2 )
(define ((D f) x) (tangent ((j* f) (bundle x 1))))
(D f)
(D (D f))
−
⇁
Differential geometry bundles points Rn in a manifold with tangent vectors Rn
j* maps a function to its push forward
Generalize to arbitrary types
What is the tangent of a discrete value or a function?
⇁
−
⇀
Can abbreviate τ ⊲ −
τ as
−
→τ
Sometimes write j* as J
Convenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
11 / 73
Our API for Functional Forward AD
bundle
primal
tangent
⇁
⇀
: τ ×−
τ →−
τ
−
⇀
: τ →τ
⇀
⇁
: −
τ →−
τ
⇀
⇀
j* : (τ1 → τ2 ) → (−
τ1 → −
τ2 )
(define ((D f) x) (tangent ((j* f) (bundle x 1))))
(D f)
(D (D f))
(D (lambda (x) . . . (D (lambda (y) . . .) . . .) . . .) . . .)
−
⇁
Differential geometry bundles points Rn in a manifold with tangent vectors Rn
j* maps a function to its push forward
Generalize to arbitrary types
What is the tangent of a discrete value or a function?
⇁
−
⇀
Can abbreviate τ ⊲ −
τ as
−
→τ
Sometimes write j* as J
Convenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
11 / 73
Our API for Functional Forward AD
bundle
primal
tangent
⇁
⇀
: τ ×−
τ →−
τ
−
⇀
: τ →τ
⇀
⇁
: −
τ →−
τ
⇀
⇀
j* : (τ1 → τ2 ) → (−
τ1 → −
τ2 )
(define ((D f) x) (tangent ((j* f) (bundle x 1))))
(D f)
(D (D f))
(D (lambda (x) . . . (D (lambda (y) . . .) . . .) . . .) . . .)
−
⇁
Differential geometry bundles points Rn in a manifold with tangent vectors Rn
j* maps a function to its push forward
Generalize to arbitrary types
What is the tangent of a discrete value or a function?
⇁
−
⇀
Can abbreviate τ ⊲ −
τ as
−
→τ
Sometimes write j* as J
What is (j* j*)?
Convenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
11 / 73
Our API for Functional Forward AD
bundle
primal
tangent
⇁
⇀
: τ ×−
τ →−
τ
−
⇀
: τ →τ
⇀
⇁
: −
τ →−
τ
⇀
⇀
j* : (τ1 → τ2 ) → (−
τ1 → −
τ2 )
(define ((D f) x) (tangent ((j* f) (bundle x 1))))
(D f)
(D (D f))
(D (lambda (x) . . . (D (lambda (y) . . .) . . .) . . .) . . .)
−
⇁
Differential geometry bundles points Rn in a manifold with tangent vectors Rn
j* maps a function to its push forward
Generalize to arbitrary types
What is the tangent of a discrete value or a function?
⇁
−
⇀
Can abbreviate τ ⊲ −
τ as
−
→τ
Sometimes write j* as J
What is (j* j*)?
Convenient
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
11 / 73
Our API for Functional Forward AD
bundle
primal
tangent
⇁
⇀
: τ ×−
τ →−
τ
−
⇀
: τ →τ
⇀
⇁
: −
τ →−
τ
⇀
⇀
j* : (τ1 → τ2 ) → (−
τ1 → −
τ2 )
(define ((D f) x) (tangent ((j* f) (bundle x 1))))
(D f)
(D (D f))
(D (lambda (x) . . . (D (lambda (y) . . .) . . .) . . .) . . .)
−
⇁
Differential geometry bundles points Rn in a manifold with tangent vectors Rn
j* maps a function to its push forward
Generalize to arbitrary types
What is the tangent of a discrete value or a function?
⇁
−
⇀
Can abbreviate τ ⊲ −
τ as
−
→τ
Sometimes write j* as J
What is (j* j*)?
Convenient and fast
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
11 / 73
A property
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
12 / 73
A property
x : Rn
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
12 / 73
A property
x : Rn
−
⇁
x : Rn
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
12 / 73
A property
x : Rn
−
⇁
x : Rn
Jeffrey Mark Siskind (Purdue/ECE)
f : Rn → Rm
AD of Functional Programs
Lecture Circuit 2009
12 / 73
A property
−
⇁
x : Rn
x : Rn
((J f ) x)[i, j] =
f : Rn → Rm
∂f (x)[i]
∂x[j]
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
12 / 73
A property
−
⇁
x : Rn
x : Rn
((J f ) x)[i, j] =
∂f (x)[i]
∂x[j]
f : Rn → Rm
((J f ) x) : Rm×n
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
12 / 73
A property
x : Rn
((J f ) x)[i, j] =
∂f (x)[i]
∂x[j]
−
⇁
x : Rn
f : Rn → Rm
((J f ) x) : Rm×n
((J f ) x) : Rn → Rm
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
L
Lecture Circuit 2009
12 / 73
A property
x : Rn
((J f ) x)[i, j] =
∂f (x)[i]
∂x[j]
−
⇁
x : Rn
f : Rn → Rm
((J f ) x) : Rm×n
((J f ) x) : Rn → Rm
L
⇁
(primal ((j* f ) (bundle x −
x ))) = (f x)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
12 / 73
A property
x : Rn
((J f ) x)[i, j] =
∂f (x)[i]
∂x[j]
−
⇁
x : Rn
f : Rn → Rm
((J f ) x) : Rm×n
((J f ) x) : Rn → Rm
L
⇁
(primal ((j* f ) (bundle x −
x ))) = (f x)
⇁
⇁
(tangent ((j* f ) (bundle x −
x ))) = ((J f ) x) × −
x
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
12 / 73
A property
x : Rn
((J f ) x)[i, j] =
∂f (x)[i]
∂x[j]
−
⇁
x : Rn
f : Rn → Rm
((J f ) x) : Rm×n
((J f ) x) : Rn → Rm
L
⇁
(primal ((j* f ) (bundle x −
x ))) = (f x)
⇁
⇁
(tangent ((j* f ) (bundle x −
x ))) = ((J f ) x) × −
x
−
⇁
−
⇁
(tangent ((j* f ) (bundle x x ))) = (((J f ) x) x )
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
12 / 73
A property
x : Rn
((J f ) x)[i, j] =
∂f (x)[i]
∂x[j]
−
⇁
x : Rn
f : Rn → Rm
((J f ) x) : Rm×n
((J f ) x) : Rn → Rm
L
⇁
(primal ((j* f ) (bundle x −
x ))) = (f x)
⇁
⇁
(tangent ((j* f ) (bundle x −
x ))) = ((J f ) x) × −
x
−
⇁
−
⇁
(tangent ((j* f ) (bundle x x ))) = (((J f ) x) x )
((j* f ) x) = (bundle (f (primal x)) (((J f ) (primal x)) (tangent x)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
12 / 73
A property
x : Rn
((J f ) x)[i, j] =
∂f (x)[i]
∂x[j]
−
⇁
x : Rn
f : Rn → Rm
((J f ) x) : Rm×n
((J f ) x) : Rn → Rm
L
⇁
(primal ((j* f ) (bundle x −
x ))) = (f x)
⇁
⇁
(tangent ((j* f ) (bundle x −
x ))) = ((J f ) x) × −
x
−
⇁
−
⇁
(tangent ((j* f ) (bundle x x ))) = (((J f ) x) x )
((j* f ) x) = (bundle (f (primal x)) (((J f ) (primal x)) (tangent x)))
rearrangement function: (∀i)(∃j)(f x)[i] = x[j]
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
12 / 73
A property
x : Rn
((J f ) x)[i, j] =
∂f (x)[i]
∂x[j]
−
⇁
x : Rn
f : Rn → Rm
((J f ) x) : Rm×n
((J f ) x) : Rn → Rm
L
⇁
(primal ((j* f ) (bundle x −
x ))) = (f x)
⇁
⇁
(tangent ((j* f ) (bundle x −
x ))) = ((J f ) x) × −
x
−
⇁
−
⇁
(tangent ((j* f ) (bundle x x ))) = (((J f ) x) x )
((j* f ) x) = (bundle (f (primal x)) (((J f ) (primal x)) (tangent x)))
rearrangement function: (∀i)(∃j)(f x)[i] = x[j]
L
f : Rn → Rm
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
12 / 73
A property
x : Rn
((J f ) x)[i, j] =
∂f (x)[i]
∂x[j]
−
⇁
x : Rn
f : Rn → Rm
((J f ) x) : Rm×n
((J f ) x) : Rn → Rm
L
⇁
(primal ((j* f ) (bundle x −
x ))) = (f x)
⇁
⇁
(tangent ((j* f ) (bundle x −
x ))) = ((J f ) x) × −
x
−
⇁
−
⇁
(tangent ((j* f ) (bundle x x ))) = (((J f ) x) x )
((j* f ) x) = (bundle (f (primal x)) (((J f ) (primal x)) (tangent x)))
rearrangement function: (∀i)(∃j)(f x)[i] = x[j]
L
f : Rn → Rm
0/1 matrix, every row has exactly one 1
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
12 / 73
A property
x : Rn
((J f ) x)[i, j] =
∂f (x)[i]
∂x[j]
−
⇁
x : Rn
f : Rn → Rm
((J f ) x) : Rm×n
((J f ) x) : Rn → Rm
L
⇁
(primal ((j* f ) (bundle x −
x ))) = (f x)
⇁
⇁
(tangent ((j* f ) (bundle x −
x ))) = ((J f ) x) × −
x
−
⇁
−
⇁
(tangent ((j* f ) (bundle x x ))) = (((J f ) x) x )
((j* f ) x) = (bundle (f (primal x)) (((J f ) (primal x)) (tangent x)))
rearrangement function: (∀i)(∃j)(f x)[i] = x[j]
L
f : Rn → Rm
0/1 matrix, every row has exactly one 1
((J f ) x)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
12 / 73
A property
x : Rn
((J f ) x)[i, j] =
∂f (x)[i]
∂x[j]
−
⇁
x : Rn
f : Rn → Rm
((J f ) x) : Rm×n
((J f ) x) : Rn → Rm
L
⇁
(primal ((j* f ) (bundle x −
x ))) = (f x)
⇁
⇁
(tangent ((j* f ) (bundle x −
x ))) = ((J f ) x) × −
x
−
⇁
−
⇁
(tangent ((j* f ) (bundle x x ))) = (((J f ) x) x )
((j* f ) x) = (bundle (f (primal x)) (((J f ) (primal x)) (tangent x)))
rearrangement function: (∀i)(∃j)(f x)[i] = x[j]
L
f : Rn → Rm
0/1 matrix, every row has exactly one 1
((J f ) x) =
∂f (x)[i]
∂x[j]
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
12 / 73
A property
x : Rn
((J f ) x)[i, j] =
∂f (x)[i]
∂x[j]
−
⇁
x : Rn
f : Rn → Rm
((J f ) x) : Rm×n
((J f ) x) : Rn → Rm
L
⇁
(primal ((j* f ) (bundle x −
x ))) = (f x)
⇁
⇁
(tangent ((j* f ) (bundle x −
x ))) = ((J f ) x) × −
x
−
⇁
−
⇁
(tangent ((j* f ) (bundle x x ))) = (((J f ) x) x )
((j* f ) x) = (bundle (f (primal x)) (((J f ) (primal x)) (tangent x)))
rearrangement function: (∀i)(∃j)(f x)[i] = x[j]
L
f : Rn → Rm
0/1 matrix, every row has exactly one 1
(
1 when (f x)[i] = x[j]
∂f (x)[i]
((J f ) x) = ∂x[j] =
0 otherwise
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
12 / 73
A property
x : Rn
((J f ) x)[i, j] =
∂f (x)[i]
∂x[j]
−
⇁
x : Rn
f : Rn → Rm
((J f ) x) : Rm×n
((J f ) x) : Rn → Rm
L
⇁
(primal ((j* f ) (bundle x −
x ))) = (f x)
⇁
⇁
(tangent ((j* f ) (bundle x −
x ))) = ((J f ) x) × −
x
−
⇁
−
⇁
(tangent ((j* f ) (bundle x x ))) = (((J f ) x) x )
((j* f ) x) = (bundle (f (primal x)) (((J f ) (primal x)) (tangent x)))
rearrangement function: (∀i)(∃j)(f x)[i] = x[j]
L
f : Rn → Rm
0/1 matrix, every row has exactly one 1
(
1 when (f x)[i] = x[j]
∂f (x)[i]
((J f ) x) = ∂x[j] =
=f
0 otherwise
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
12 / 73
A property
x : Rn
((J f ) x)[i, j] =
∂f (x)[i]
∂x[j]
−
⇁
x : Rn
f : Rn → Rm
((J f ) x) : Rm×n
((J f ) x) : Rn → Rm
L
⇁
(primal ((j* f ) (bundle x −
x ))) = (f x)
⇁
⇁
(tangent ((j* f ) (bundle x −
x ))) = ((J f ) x) × −
x
−
⇁
−
⇁
(tangent ((j* f ) (bundle x x ))) = (((J f ) x) x )
((j* f ) x) = (bundle (f (primal x)) (((J f ) (primal x)) (tangent x)))
rearrangement function: (∀i)(∃j)(f x)[i] = x[j]
L
f : Rn → Rm
0/1 matrix, every row has exactly one 1
(
1 when (f x)[i] = x[j]
∂f (x)[i]
((J f ) x) = ∂x[j] =
=f
0 otherwise
when f is a rearrangement function
((j* f ) x) = (bundle (f (primal x)) (f (tangent x)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
12 / 73
A property
x : Rn
((J f ) x)[i, j] =
∂f (x)[i]
∂x[j]
−
⇁
x : Rn
f : Rn → Rm
((J f ) x) : Rm×n
((J f ) x) : Rn → Rm
L
⇁
(primal ((j* f ) (bundle x −
x ))) = (f x)
⇁
⇁
(tangent ((j* f ) (bundle x −
x ))) = ((J f ) x) × −
x
−
⇁
−
⇁
(tangent ((j* f ) (bundle x x ))) = (((J f ) x) x )
((j* f ) x) = (bundle (f (primal x)) (((J f ) (primal x)) (tangent x)))
rearrangement function: (∀i)(∃j)(f x)[i] = x[j]
L
f : Rn → Rm
0/1 matrix, every row has exactly one 1
(
1 when (f x)[i] = x[j]
∂f (x)[i]
((J f ) x) = ∂x[j] =
=f
0 otherwise
when f is a rearrangement function
((j* f ) x) = (bundle (f (primal x)) (f (tangent x)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
12 / 73
A property
−
⇁
x : τ1
x : τ1
((J f ) x)[i, j] =
f : τ1 → τ2
L
∂f (x)[i]
∂x[j]
((J f ) x) : τ1 → τ2
⇁
(primal ((j* f ) (bundle x −
x ))) = (f x)
⇁
⇁
(tangent ((j* f ) (bundle x −
x ))) = ((J f ) x) × −
x
−
⇁
−
⇁
(tangent ((j* f ) (bundle x x ))) = (((J f ) x) x )
((j* f ) x) = (bundle (f (primal x)) (((J f ) (primal x)) (tangent x)))
rearrangement function: (∀i)(∃j)(f x)[i] = x[j]
L
f : τ1 → τ2
0/1 matrix, every row has exactly one 1
(
1 when (f x)[i] = x[j]
∂f (x)[i]
((J f ) x) = ∂x[j] =
=f
0 otherwise
when f is a rearrangement function
((j* f ) x) = (bundle (f (primal x)) (f (tangent x)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
12 / 73
A property
−
⇁
x : τ1
x : τ1
((J f ) x)[i, j] =
f : τ1 → τ2
L
∂f (x)[i]
∂x[j]
((J f ) x) : τ1 → τ2
⇁
(primal ((j* f ) (bundle x −
x ))) = (f x)
⇁
⇁
(tangent ((j* f ) (bundle x −
x ))) = ((J f ) x) × −
x
−
⇁
−
⇁
(tangent ((j* f ) (bundle x x ))) = (((J f ) x) x )
((j* f ) x) = (bundle (f (primal x)) (((J f ) (primal x)) (tangent x)))
rearrangement function: (∀i)(∃j)(f x)[i] = x[j]
L
f : τ1 → τ2
0/1 matrix, every row has exactly one 1
(
1 when (f x)[i] = x[j]
∂f (x)[i]
((J f ) x) = ∂x[j] =
=f
0 otherwise
when f is a rearrangement function
((j* f ) x) = (bundle (f (primal x)) (f (tangent x)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
12 / 73
What is the tangent of #t?
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
13 / 73
What is the tangent of #t?
−
⇁
What if we take #t = #f?
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
13 / 73
What is the tangent of #t?
−
⇁
What if we take #t = #f?
when f is a rearrangement function
((j* f ) x) = (bundle (f (primal x)) (f (tangent x)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
13 / 73
What is the tangent of #t?
−
⇁
What if we take #t = #f?
when f is a rearrangement function
((j* f ) x) = (bundle (f (primal x)) (f (tangent x)))
f : (#t x y) 7→ (#t x y)
Jeffrey Mark Siskind (Purdue/ECE)
but
f : (#f x y) 7→ (#f y x)
AD of Functional Programs
Lecture Circuit 2009
13 / 73
What is the tangent of #t?
−
⇁
What if we take #t = #f?
when f is a rearrangement function
((j* f ) x) = (bundle (f (primal x)) (f (tangent x)))
f : (#t x y) 7→ (#t x y)
but
f : (#f x y) 7→ (#f y x)
f is a rearrangement function
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
13 / 73
What is the tangent of #t?
−
⇁
What if we take #t = #f?
when f is a rearrangement function
((j* f ) x) = (bundle (f (primal x)) (f (tangent x)))
f : (#t x y) 7→ (#t x y)
but
f : (#f x y) 7→ (#f y x)
f is a rearrangement function
⇁
⇁
((j* f ) (bundle (#t x y) (#f −
x −
y )))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
13 / 73
What is the tangent of #t?
−
⇁
What if we take #t = #f?
when f is a rearrangement function
((j* f ) x) = (bundle (f (primal x)) (f (tangent x)))
f : (#t x y) 7→ (#t x y)
but
f : (#f x y) 7→ (#f y x)
f is a rearrangement function
⇁
⇁
((j* f ) (bundle (#t x y) (#f −
x −
y )))
−
⇁
−
⇁
= (bundle (#t x y) (#f y x ))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
13 / 73
What is the tangent of #t?
−
⇁
What if we take #t = #f?
when f is a rearrangement function
((j* f ) x) = (bundle (f (primal x)) (f (tangent x)))
f : (#t x y) 7→ (#t x y)
but
f : (#f x y) 7→ (#f y x)
f is a rearrangement function
⇁
⇁
((j* f ) (bundle (#t x y) (#f −
x −
y )))
−
⇁
−
⇁
= (bundle (#t x y) (#f y x ))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
13 / 73
What is the tangent of #t?
−
⇁
What if we take #t = #f?
when f is a rearrangement function
((j* f ) x) = (bundle (f (primal x)) (f (tangent x)))
f : (#t x y) 7→ (#t x y)
but
f : (#f x y) 7→ (#f y x)
f is a rearrangement function
⇁
⇁
((j* f ) (bundle (#t x y) (#f −
x −
y )))
−
⇁
−
⇁
= (bundle (#t x y) (#f y x ))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
13 / 73
What is the tangent of #t?
−
⇁
What if we take #t = #f?
when f is a rearrangement function
((j* f ) x) = (bundle (f (primal x)) (f (tangent x)))
f : (#t x y) 7→ (#t x y)
but
f : (#f x y) 7→ (#f y x)
f is a rearrangement function
⇁
⇁
((j* f ) (bundle (#t x y) (#f −
x −
y )))
−
⇁
−
⇁
= (bundle (#t x y) (#f y x ))
−
⇁
Problem avoided if we take #t = #t
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
13 / 73
What is (j* j*)?
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
14 / 73
What is (j* j*)?
when f is a rearrangement function
((j* f ) x) = (bundle (f (primal x)) (f (tangent x)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
14 / 73
What is (j* j*)?
when f is a rearrangement function
((j* f ) x) = (bundle (f (primal x)) (f (tangent x)))
bundle, primal, tangent, and j* are rearrangement functions
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
14 / 73
What is (j* j*)?
when f is a rearrangement function
((j* f ) x) = (bundle (f (primal x)) (f (tangent x)))
bundle, primal, tangent, and j* are rearrangement functions
((j* bundle) x)=(bundle (bundle (primal x)) (bundle (tangent x)))
((j* primal) x)=(bundle (primal (primal x)) (primal (tangent x)))
((j* tangent) x)=(bundle (tangent (primal x)) (tangent (tangent x)))
((j* j*) x)=(bundle (j* (primal x)) (j* (tangent x)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
14 / 73
A (Not So) Brief Tutorial on AD
z = g(f (x))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
15 / 73
A (Not So) Brief Tutorial on AD
z = g(f (x))
= (f ◦ g)(x)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
15 / 73
A (Not So) Brief Tutorial on AD
z = g(f (x))
= (f ◦ g)(x)
y = f (x)
z = g(y)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
15 / 73
A (Not So) Brief Tutorial on AD
z = g(f (x))
= (f ◦ g)(x)
y = f (x)
z = g(y)
dz dy
dz
=
dx
dy dx
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
15 / 73
A (Not So) Brief Tutorial on AD
z = g(f (x))
= (f ◦ g)(x)
y = f (x)
z = g(y)
dz dy
dz
=
dx
dy dx
D (f ◦ g) x = (D g y) × (D f x)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
15 / 73
A (Not So) Brief Tutorial on AD
f = f 1 ◦ · · · ◦ fn
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
16 / 73
A (Not So) Brief Tutorial on AD
f = f 1 ◦ · · · ◦ fn
x1 = f1 x0
..
.
xn = fn xn−1
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
16 / 73
A (Not So) Brief Tutorial on AD
f = f 1 ◦ · · · ◦ fn
x1 = f1 x0
..
.
xn = fn xn−1
J f x0 = (J fn xn−1 ) × · · · × (J f1 x0 )
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
16 / 73
A (Not So) Brief Tutorial on AD
f = f 1 ◦ · · · ◦ fn
x1 = f1 x0
..
.
xn = fn xn−1
J f x0 = (J fn xn−1 ) × · · · × (J f1 x0 )
(J f x0 )⊤ = (J f1 x0 )⊤ × · · · × (J fn xn−1 )⊤
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
16 / 73
A (Not So) Brief Tutorial on AD
−
⇁
Xn = J f x0
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
17 / 73
A (Not So) Brief Tutorial on AD
−
⇁
Xn = J f x0
= (J fn xn−1 ) × · · · ×(J f2 x1 )×(J f1 x0 )
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
17 / 73
A (Not So) Brief Tutorial on AD
−
⇁
Xn = J f x0
= (J fn xn−1 ) × · · · ×(J f2 x1 )×(J f1 x0 )
−
⇁
X1 = (J f1 x0 )
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
17 / 73
A (Not So) Brief Tutorial on AD
−
⇁
Xn = J f x0
= (J fn xn−1 ) × · · · ×(J f2 x1 )×(J f1 x0 )
−
⇁
X1 = (J f1 x0 )
−
⇁
−
⇁
X2 = (J f2 x1 ) × X1
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
17 / 73
A (Not So) Brief Tutorial on AD
−
⇁
Xn = J f x0
= (J fn xn−1 ) × · · · ×(J f2 x1 )×(J f1 x0 )
−
⇁
X1 = (J f1 x0 )
−
⇁
−
⇁
X2 = (J f2 x1 ) × X1
..
.
−
⇁
−−⇁
Xn = (J fn xn−1 ) × Xn−1
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
17 / 73
A (Not So) Brief Tutorial on AD
↽−
X0 = (J f x0 )⊤
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
18 / 73
A (Not So) Brief Tutorial on AD
↽−
X0 = (J f x0 )⊤
= (J f1 x0 )⊤ × · · · ×(J fn−1 xn−2 )⊤×(J fn xn−1 )⊤
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
18 / 73
A (Not So) Brief Tutorial on AD
↽−
X0 = (J f x0 )⊤
= (J f1 x0 )⊤ × · · · ×(J fn−1 xn−2 )⊤×(J fn xn−1 )⊤
↽−−−
Xn−1 = (J fn xn−1 )⊤
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
18 / 73
A (Not So) Brief Tutorial on AD
↽−
X0 = (J f x0 )⊤
= (J f1 x0 )⊤ × · · · ×(J fn−1 xn−2 )⊤×(J fn xn−1 )⊤
↽−−−
Xn−1 = (J fn xn−1 )⊤
↽−−−
↽−−−
Xn−2 = (J fn−1 xn−2 )⊤ × Xn−1
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
18 / 73
A (Not So) Brief Tutorial on AD
↽−
X0 = (J f x0 )⊤
= (J f1 x0 )⊤ × · · · ×(J fn−1 xn−2 )⊤×(J fn xn−1 )⊤
↽−−−
Xn−1 = (J fn xn−1 )⊤
↽−−−
↽−−−
Xn−2 = (J fn−1 xn−2 )⊤ × Xn−1
..
.
↽−
↽−
X0 = (J f1 x0 )⊤ × X1
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
18 / 73
A (Not So) Brief Tutorial on AD
−
⇁
X1
−
⇁
X2
−
⇁
Xn
= (J f1 x0 )
−
⇁
= (J f2 x1 ) × X1
..
.
−−⇁
= (J fn xn−1 ) × Xn−1
Jeffrey Mark Siskind (Purdue/ECE)
↽−−−
Xn−1
↽−−−
Xn−2
↽−
X0
= (J fn xn−1 )⊤
↽−−−
= (J fn−1 xn−2 )⊤ × Xn−1
..
.
↽−
= (J f1 x0 )⊤ × X1
AD of Functional Programs
Lecture Circuit 2009
19 / 73
A (Not So) Brief Tutorial on AD
−
⇁
X1
−
⇁
X2
−
⇁
Xn
= (J f1 x0 )
−
⇁
= (J f2 x1 ) × X1
..
.
−−⇁
= (J fn xn−1 ) × Xn−1
Jeffrey Mark Siskind (Purdue/ECE)
↽−−−
Xn−1
↽−−−
Xn−2
↽−
X0
= (J fn xn−1 )⊤
↽−−−
= (J fn−1 xn−2 )⊤ × Xn−1
..
.
↽−
= (J f1 x0 )⊤ × X1
AD of Functional Programs
Lecture Circuit 2009
19 / 73
A (Not So) Brief Tutorial on AD
−
⇁
X1
−
⇁
X2
−
⇁
Xn
= (J f1 x0 )
−
⇁
= (J f2 x1 ) × X1
..
.
−−⇁
= (J fn xn−1 ) × Xn−1
Jeffrey Mark Siskind (Purdue/ECE)
↽−−−
Xn−1
↽−−−
Xn−2
↽−
X0
= (J fn xn−1 )⊤
↽−−−
= (J fn−1 xn−2 )⊤ × Xn−1
..
.
↽−
= (J f1 x0 )⊤ × X1
AD of Functional Programs
Lecture Circuit 2009
19 / 73
A (Not So) Brief Tutorial on AD
−
−
⇁
x⇁
n = (J f x0 ) × x0
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
20 / 73
A (Not So) Brief Tutorial on AD
−
−
⇁
x⇁
n = (J f x0 ) × x0
= (J fn xn−1 ) × · · · × (J f1 x0 ) × −
x⇁
0
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
20 / 73
A (Not So) Brief Tutorial on AD
−
−
⇁
x⇁
n = (J f x0 ) × x0
= (J fn xn−1 ) × · · · × (J f1 x0 ) × −
x⇁
0
−
−
⇁
x⇁
1 = (J f1 x0 ) × x0
..
.
−
⇁
⇁
xn = (J fn xn−1 ) × −
x−
n−1
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
20 / 73
A (Not So) Brief Tutorial on AD
↽−
↽−
x0 = (J f x0 )⊤ × xn
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
21 / 73
A (Not So) Brief Tutorial on AD
↽−
↽−
x0 = (J f x0 )⊤ × xn
↽−
= (J f1 x0 )⊤ × · · · × (J fn xn−1 )⊤ × xn
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
21 / 73
A (Not So) Brief Tutorial on AD
↽−
↽−
x0 = (J f x0 )⊤ × xn
↽−
= (J f1 x0 )⊤ × · · · × (J fn xn−1 )⊤ × xn
↽−−−
↽−
xn−1 = (J fn xn−1 )⊤ × xn
..
.
↽−
↽−
x0 = (J f1 x0 )⊤ × x1
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
21 / 73
A (Not So) Brief Tutorial on AD
y=A×x
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
22 / 73
A (Not So) Brief Tutorial on AD
y=A×x
= f (x)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
22 / 73
A (Not So) Brief Tutorial on AD
y=A×x
= f (x)
−
−
⇁
x⇁
n = (J f x0 ) × x0
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
22 / 73
A (Not So) Brief Tutorial on AD
y=A×x
= f (x)
−
−
⇁
x⇁
n = (J f x0 ) × x0
−
⇁
= f x0 −
x⇁
0
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
22 / 73
A (Not So) Brief Tutorial on AD
y=A×x
= f (x)
−
−
⇁
x⇁
n = (J f x0 ) × x0
−
⇁
= f x0 −
x⇁
0
↽−
↽−
x0 = (J f x0 )⊤ × xn
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
22 / 73
A (Not So) Brief Tutorial on AD
y=A×x
= f (x)
−
−
⇁
x⇁
n = (J f x0 ) × x0
−
⇁
= f x0 −
x⇁
0
↽−
↽−
x0 = (J f x0 )⊤ × xn
↽
− ↽−
= f x0 xn
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
22 / 73
A (Not So) Brief Tutorial on AD
−
⇁
−
⇁
⇁
Xn [; j] = f x0 −
ej
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
23 / 73
A (Not So) Brief Tutorial on AD
−
⇁
−
⇁
⇁
Xn [; j] = f x0 −
ej
= (J f x0 )[; j]
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
23 / 73
A (Not So) Brief Tutorial on AD
−
⇁
−
⇁
⇁
Xn [; j] = f x0 −
ej
= (J f x0 )[; j]
↽−
↽
− ↽
−
X0 [; i] = f x0 ei
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
23 / 73
A (Not So) Brief Tutorial on AD
−
⇁
−
⇁
⇁
Xn [; j] = f x0 −
ej
= (J f x0 )[; j]
↽−
↽
− ↽
−
X0 [; i] = f x0 ei
= (J f x0 )⊤[; i]
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
23 / 73
A (Not So) Brief Tutorial on AD
−
⇁
−
⇁
⇁
Xn [; j] = f x0 −
ej
= (J f x0 )[; j]
↽−
↽
− ↽
−
X0 [; i] = f x0 ei
= (J f x0 )⊤[; i]
= (J f x0 )[i; ]
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
23 / 73
A (Not So) Brief Tutorial on AD
y = B × (A × x)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
24 / 73
A (Not So) Brief Tutorial on AD
y = B × (A × x)
= (B × A) × x
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
24 / 73
A (Not So) Brief Tutorial on AD
y = B × (A × x)
= (B × A) × x
= g(f (x))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
24 / 73
A (Not So) Brief Tutorial on AD
y = B × (A × x)
= (B × A) × x
= g(f (x))
= (f ◦ g)(x)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
24 / 73
A (Not So) Brief Tutorial on AD
y = B × (A × x)
= (B × A) × x
= g(f (x))
= (f ◦ g)(x)
−
⇁
−
⇁
(J fn xn−1 ) × · · · × (J f1 x0 ) = ( f1 x0 ) ◦ · · · ◦ ( fn xn−1 )
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
24 / 73
A (Not So) Brief Tutorial on AD
y = B × (A × x)
= (B × A) × x
= g(f (x))
= (f ◦ g)(x)
−
⇁
−
⇁
(J fn xn−1 ) × · · · × (J f1 x0 ) = ( f1 x0 ) ◦ · · · ◦ ( fn xn−1 )
↽
−
↽
−
(J f1 x0 )⊤ × · · · × (J fn xn−1 )⊤ = ( fn xn−1 ) ◦ · · · ◦ ( f1 x0 )
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
24 / 73
A (Not So) Brief Tutorial on AD
xi−1 = ( xi−1 [1] · · ·
xi−1 [Li ]
· · · xi−1 [Ri ] · · · xi−1 [m] )
y
yy
yy
y
yy
yy
y
|
y
xi = fi xi−1 = ( xi−1 [1] · · · ui xi−1 [Ri ] · · · xi−1 [Ri ] · · · xi−1 [m] )
x[Li ] := ui x[Ri ]
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
25 / 73
A (Not So) Brief Tutorial on AD
xi−1 = ( xi−1 [1] · · ·
xi−1 [Li ]
· · · xi−1 [Ri ] · · · xi−1 [m] )
y
yy
yy
y
yy
yy
y
|
y
xi = fi xi−1 = ( xi−1 [1] · · · ui xi−1 [Ri ] · · · xi−1 [Ri ] · · · xi−1 [m] )
x[Li ] := ui x[Ri ]
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
25 / 73
A (Not So) Brief Tutorial on AD
xi−1 = ( xi−1 [1] · · ·
xi−1 [Li ]
· · · xi−1 [Ri ] · · · xi−1 [m] )
y
yy
yy
y
yy
yy
y
|
y
xi = fi xi−1 = ( xi−1 [1] · · · ui xi−1 [Ri ] · · · xi−1 [Ri ] · · · xi−1 [m] )
x[Li ] := ui x[Ri ]
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
25 / 73
A (Not So) Brief Tutorial on AD
xi−1 = ( xi−1 [1] · · ·
xi−1 [Li ]
· · · xi−1 [Ri ] · · · xi−1 [Si ] · · · xi−1 [m] )
j
rr
jjjj
rrr jjjjjj
r
r
r
jjj
rrr jjjj
xrrrujjjj
xi = fi xi−1 = ( xi−1 [1] · · · bi (xi−1 [Ri ], xi−1 [Si ]) · · · xi−1 [Ri ] · · · xi−1 [Si ] · · · xi−1 [m] )
x[Li ] := b (x[Ri ], x[Si ])
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
26 / 73
A (Not So) Brief Tutorial on AD
xi−1 = ( xi−1 [1] · · ·
xi−1 [Li ]
· · · xi−1 [Ri ] · · · xi−1 [Si ] · · · xi−1 [m] )
j
rr
jjjj
rrr jjjjjj
r
r
r
jjj
rrr jjjj
xrrrujjjj
xi = fi xi−1 = ( xi−1 [1] · · · bi (xi−1 [Ri ], xi−1 [Si ]) · · · xi−1 [Ri ] · · · xi−1 [Si ] · · · xi−1 [m] )
x[Li ] := b (x[Ri ], x[Si ])
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
26 / 73
A (Not So) Brief Tutorial on AD
xi−1 = ( xi−1 [1] · · ·
xi−1 [Li ]
· · · xi−1 [Ri ] · · · xi−1 [Si ] · · · xi−1 [m] )
j
rr
jjjj
rrr jjjjjj
r
r
r
jjj
rrr jjjj
xrrrujjjj
xi = fi xi−1 = ( xi−1 [1] · · · bi (xi−1 [Ri ], xi−1 [Si ]) · · · xi−1 [Ri ] · · · xi−1 [Si ] · · · xi−1 [m] )
x[Li ] := b (x[Ri ], x[Si ])
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
26 / 73
A (Not So) Brief Tutorial on AD
Li
↓
Ri
↓
0
u′
1
..
.
1
Li →
1
..
.
Ri →
1
..
.
1
u′ = D ui xi−1 [Ri ]
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
27 / 73
A (Not So) Brief Tutorial on AD
Li
↓
Ri
↓
Si
↓
0
b′1
b′2
1
..
.
1
Li →
1
..
Ri →
.
1
..
.
Si →
1
..
.
1
b′1 = D1 bi (xi−1 [Ri ], xi−1 [Si ])
b′2 = D2 bi (xi−1 [Ri ], xi−1 [Si ])
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
28 / 73
A (Not So) Brief Tutorial on AD
−
⇁
−
⇁
⇁
xi = fi xi−1 −
x−
i−1
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
29 / 73
A (Not So) Brief Tutorial on AD
−
⇁
−
⇁
⇁
xi = fi xi−1 −
x−
i−1
⇁
= (J fi xi−1 ) × −
x−
i−1
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
29 / 73
A (Not So) Brief Tutorial on AD

















−
⇁
−
⇁
⇁
xi = fi xi−1 −
x−
i−1
⇁
= (J fi xi−1 ) × −
x−
i−1


−
⇁
1
x−
i−1 [1]


..
..


.
.


−
−
⇁

1
xi−1 [Li − 1] 


⇁


0
u′
u′ × −
x−
i−1 [Ri ] 

−
−
⇁


1
xi−1 [Li + 1]  = 


..
..


.
.


−
−
⇁


1
xi−1 [Ri ]




..


.
−
x−⇁[m]
i−1
  −−⇁
xi−1 [1]
  ..
  .
  −−⇁
  xi−1 [Li − 1]
  −−⇁
  xi−1 [Li ]
  −−⇁
  xi−1 [Li + 1]
 
  ..
  .
  −−⇁
  xi−1 [Ri ]
 
  ..
..
  .
.
−
⇁
x−
1
i−1 [m]

















u′ = D ui xi−1 [Ri ]
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
29 / 73
A (Not So) Brief Tutorial on AD

















−
⇁
−
⇁
⇁
xi = fi xi−1 −
x−
i−1
⇁
= (J fi xi−1 ) × −
x−
i−1


−
⇁
1
x−
i−1 [1]


..
..


.
.


−
−
⇁

1
xi−1 [Li − 1] 


⇁


0
u′
u′ × −
x−
i−1 [Ri ] 

−
−
⇁


1
xi−1 [Li + 1]  = 


..
..


.
.


−
−
⇁


1
xi−1 [Ri ]




..


.
−
x−⇁[m]
i−1
  −−⇁
xi−1 [1]
  ..
  .
  −−⇁
  xi−1 [Li − 1]
  −−⇁
  xi−1 [Li ]
  −−⇁
  xi−1 [Li + 1]
 
  ..
  .
  −−⇁
  xi−1 [Ri ]
 
  ..
..
  .
.
−
⇁
x−
1
i−1 [m]

















u′ = D ui xi−1 [Ri ]
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
29 / 73
A (Not So) Brief Tutorial on AD

















−
⇁
−
⇁
⇁
xi = fi xi−1 −
x−
i−1
⇁
= (J fi xi−1 ) × −
x−
i−1


−
⇁
1
x−
i−1 [1]


..
..


.
.


−
−
⇁

1
xi−1 [Li − 1] 


⇁


0
u′
u′ × −
x−
i−1 [Ri ] 

−
−
⇁


1
xi−1 [Li + 1]  = 


..
..


.
.


−
−
⇁


1
xi−1 [Ri ]




..


.
−
x−⇁[m]
i−1
  −−⇁
xi−1 [1]
  ..
  .
  −−⇁
  xi−1 [Li − 1]
  −−⇁
  xi−1 [Li ]
  −−⇁
  xi−1 [Li + 1]
 
  ..
  .
  −−⇁
  xi−1 [Ri ]
 
  ..
..
  .
.
−
⇁
1
x−
i−1 [m]

















u′ = D ui xi−1 [Ri ]
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
29 / 73
A (Not So) Brief Tutorial on AD
−
⇁
−
⇁
−−⇁
xi = fi xi−1 x
i−1
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
30 / 73
A (Not So) Brief Tutorial on AD
−
⇁
−
⇁
−−⇁
xi = fi xi−1 x
i−1
⇁
= (J fi xi−1 ) × −
x−
i−1
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
30 / 73
A (Not So) Brief Tutorial on AD






















−
⇁
−
⇁
−−⇁
xi = fi xi−1 x
i−1
⇁
= (J fi xi−1 ) × −
x−
i−1


−
⇁
1
x−
i−1 [1]


..
..


.
.


−
⇁


1
x−
i−1 [Li − 1]


⇁
−−⇁
′


0
b′1
b′1 × x−−
i−1 [Ri ] + b2 × xi−1 [Si ] 

−
⇁


1
x−
i−1 [Li + 1]




..
..
 = 
.
.


−
−
⇁


1
xi−1 [Ri ]




..


.


−
⇁


x−
i−1 [Si ]




..


.
−
−
⇁
xi−1 [m]
b′2
..
.
1
  −−⇁
xi−1 [1]
  ..
  .
  −−⇁
  xi−1 [Li − 1]
  −−⇁
  xi−1 [Li ]
  −−⇁
  xi−1 [Li + 1]
 
  ..
  .
  −−⇁
  xi−1 [Ri ]
 
  ..
  .
  −−⇁
  xi−1 [Si ]
 
  ..
..
  .
.
−
⇁
x−
1
i−1 [m]






















b′1 = D1 bi (xi−1 [Ri ], xi−1 [Si ])
b′2 = D2 bi (xi−1 [Ri ], xi−1 [Si ])
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
30 / 73
A (Not So) Brief Tutorial on AD






















−
⇁
−
⇁
−−⇁
xi = fi xi−1 x
i−1
⇁
= (J fi xi−1 ) × −
x−
i−1


−
⇁
1
x−
i−1 [1]


..
..


.
.


−
⇁


1
x−
i−1 [Li − 1]


⇁
−−⇁
′


0
b′1
b′1 × −
x−
i−1 [Ri ] + b2 × xi−1 [Si ] 

−
⇁


1
x−
i−1 [Li + 1]




..
..
 = 
.
.


−
−
⇁


1
xi−1 [Ri ]




..


.


−
⇁


x−
i−1 [Si ]




..


.
−
−
⇁
xi−1 [m]
b′2
..
.
1
  −−⇁
xi−1 [1]
  ..
  .
  −−⇁
  xi−1 [Li − 1]
  −−⇁
  xi−1 [Li ]
  −−⇁
  xi−1 [Li + 1]
 
  ..
  .
  −−⇁
  xi−1 [Ri ]
 
  ..
  .
  −−⇁
  xi−1 [Si ]
 
  ..
..
  .
.
−
⇁
x−
1
i−1 [m]






















b′1 = D1 bi (xi−1 [Ri ], xi−1 [Si ])
b′2 = D2 bi (xi−1 [Ri ], xi−1 [Si ])
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
30 / 73
A (Not So) Brief Tutorial on AD






















−
⇁
−
⇁
−−⇁
xi = fi xi−1 x
i−1
⇁
= (J fi xi−1 ) × −
x−
i−1


−
⇁
1
x−
i−1 [1]


..
..


.
.


−
⇁


1
x−
i−1 [Li − 1]


⇁
−−⇁
′


0
b′1
b′1 × −
x−
i−1 [Ri ] + b2 × xi−1 [Si ] 

−
⇁


1
x−
i−1 [Li + 1]




..
..
 = 
.
.


−
−
⇁


1
xi−1 [Ri ]




..


.


−
⇁


x−
i−1 [Si ]




..


.
−
−
⇁
xi−1 [m]
b′2
..
.
1
  −−⇁
xi−1 [1]
  ..
  .
  −−⇁
  xi−1 [Li − 1]
  −−⇁
  xi−1 [Li ]
  −−⇁
  xi−1 [Li + 1]
 
  ..
  .
  −−⇁
  xi−1 [Ri ]
 
  ..
  .
  −−⇁
  xi−1 [Si ]
 
  ..
..
  .
.
−
⇁
1
x−
i−1 [m]






















b′1 = D1 bi (xi−1 [Ri ], xi−1 [Si ])
b′2 = D2 bi (xi−1 [Ri ], xi−1 [Si ])
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
30 / 73
A (Not So) Brief Tutorial on AD
↽−−
↽
−
↽
−
xi−1 = fi xi−1 xi
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
31 / 73
A (Not So) Brief Tutorial on AD
↽−−
↽
−
↽
−
xi−1 = fi xi−1 xi
↽
−
= (J fi xi−1 )⊤ × xi
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
31 / 73
A (Not So) Brief Tutorial on AD
↽−−
↽
−
↽
−
xi−1 = fi xi−1 xi
↽
−
= (J fi xi−1 )⊤ × xi
 ↽
−
xi [1]
 ..
 .
 ↽
 −
 xi [Li − 1]
 0

 ↽
−
 xi [Li + 1]
 .
 .
 .
 ′ ↽
↽
−
 u × −
xi [Li ] + xi [Ri ]

 ..
 .
↽
−
xi [m]

















 = 

















1
..
.
1
0
1
..
u′
.
1
..
.
1
















 ↽
−
xi [1]
 .
 ..

−
 ↽
 xi [Li − 1]
 ↽
 −
xi [Li ]
 ↽
 −
x
 i [Li + 1]
 .
 ..

−
 ↽
 xi [Ri ]
 .
 .
 .
↽
−
xi [m]



















u′ = D ui xi−1 [Ri ]
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
31 / 73
A (Not So) Brief Tutorial on AD
↽−−
↽
−
↽
−
xi−1 = fi xi−1 xi
↽
−
= (J fi xi−1 )⊤ × xi
 ↽
−
xi [1]
 ..
 .
 ↽
 −
 xi [Li − 1]
 0

 ↽
−
 xi [Li + 1]
 .
 .
 .
 ′ ↽
↽
−
 u × −
xi [Li ] + xi [Ri ]

 ..
 .
↽
−
xi [m]

















 = 

















1
..
.
1
0
1
..
u′
.
1
..
.
1
















 ↽
−
xi [1]
 .
 ..

−
 ↽
 xi [Li − 1]
 ↽
 −
xi [Li ]
 ↽
 −
x
 i [Li + 1]
 .
 ..

−
 ↽
 xi [Ri ]
 .
 .
 .
↽
−
xi [m]



















u′ = D ui xi−1 [Ri ]
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
31 / 73
A (Not So) Brief Tutorial on AD
↽−−
↽
−
↽
−
xi−1 = fi xi−1 xi
↽
−
= (J fi xi−1 )⊤ × xi
 ↽
−
xi [1]
 ..
 .
 ↽
 −
 xi [Li − 1]
 0

 ↽
−
 xi [Li + 1]
 .
 .
 .
 ′ ↽
↽
−
 u × −
xi [Li ] + xi [Ri ]

 ..
 .
↽
−
xi [m]

















 = 

















1
..
.
1
0
1
..
u′
.
1
..
.
1
















 ↽
−
xi [1]
 .
 ..

−
 ↽
 xi [Li − 1]
 ↽
 −
xi [Li ]
 ↽
 −
 xi [Li + 1]
 .
 ..

−
 ↽
 xi [Ri ]
 .
 .
 .
↽
−
xi [m]



















u′ = D ui xi−1 [Ri ]
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
31 / 73
A (Not So) Brief Tutorial on AD
↽−−
↽
−
↽
−
xi−1 = fi xi−1 xi
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
32 / 73
A (Not So) Brief Tutorial on AD
↽−−
↽
−
↽
−
xi−1 = fi xi−1 xi
↽
−
= (J fi xi−1 )⊤ × xi
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
32 / 73
A (Not So) Brief Tutorial on AD
























↽−−
↽
−
↽
−
xi−1 = fi xi−1 xi
↽
−
= (J fi xi−1 )⊤ × xi

↽
−

xi [1]
1

..


..
.


.
↽
−




xi [Li − 1]
1




0
0


↽
−


1
xi [Li + 1]




..
..


.
.
 = 
↽
−
↽
−

b′1
1
b′1 × xi [Li ] + xi [Ri ] 




..
..


.
.


′
↽
−
↽
−


′
b
1
2

b2 × xi [Li ] + xi [Si ] 


..
..


.
.

↽
−
1
xi [m]






















 ↽
−
xi [1]
 .
 ..
 ↽
 −
 xi [Li − 1]
 ↽
 −
xi [Li ]
 ↽
 −
 xi [Li + 1]
 .
 ..
 ↽
 −
 xi [Ri ]
 .
 ..

 ↽
−
 xi [Si ]

 ..
 .
↽
−
xi [m]
























b′1 = D1 bi (xi−1 [Ri ], xi−1 [Si ])
b′2 = D2 bi (xi−1 [Ri ], xi−1 [Si ])
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
32 / 73
A (Not So) Brief Tutorial on AD
























↽−−
↽
−
↽
−
xi−1 = fi xi−1 xi
↽
−
= (J fi xi−1 )⊤ × xi

↽
−

xi [1]
1

..


..
.


.
↽
−




xi [Li − 1]
1




0
0


↽
−


1
xi [Li + 1]




..
..


.
.
 = 
↽
−
↽
−

b′1
1
b′1 × xi [Li ] + xi [Ri ] 




..
..


.
.


′
↽
−
↽
−


′
b
1
2

b2 × xi [Li ] + xi [Si ] 


..
..


.
.

↽
−
1
xi [m]






















 ↽
−
xi [1]
 .
 ..
 ↽
 −
 xi [Li − 1]
 ↽
 −
xi [Li ]
 ↽
 −
 xi [Li + 1]
 .
 ..
 ↽
 −
 xi [Ri ]
 .
 ..

 ↽
−
 xi [Si ]

 ..
 .
↽
−
xi [m]
























b′1 = D1 bi (xi−1 [Ri ], xi−1 [Si ])
b′2 = D2 bi (xi−1 [Ri ], xi−1 [Si ])
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
32 / 73
A (Not So) Brief Tutorial on AD
























↽−−
↽
−
↽
−
xi−1 = fi xi−1 xi
↽
−
= (J fi xi−1 )⊤ × xi

↽
−

xi [1]
1

..


..
.


.
↽
−




xi [Li − 1]
1




0
0


↽
−


1
xi [Li + 1]




..
..


.
.
 = 
↽
−
↽
−

b′1
1
b′1 × xi [Li ] + xi [Ri ] 




..
..


.
.


′
↽
−
↽
−


′
b
1
2

b2 × xi [Li ] + xi [Si ] 


..
..


.
.

↽
−
1
xi [m]






















 ↽
−
xi [1]
 .
 ..
 ↽
 −
 xi [Li − 1]
 ↽
 −
xi [Li ]
 ↽
 −
 xi [Li + 1]
 .
 ..
 ↽
 −
 xi [Ri ]
 .
 ..

 ↽
−
 xi [Si ]

 ..
 .
↽
−
xi [m]
























b′1 = D1 bi (xi−1 [Ri ], xi−1 [Si ])
b′2 = D2 bi (xi−1 [Ri ], xi−1 [Si ])
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
32 / 73
A (Not So) Brief Tutorial on AD
xn
= fn xn−1
HH
H
HH
H
Jeffrey Mark Siskind (Purdue/ECE)
−
x⇁
n
=
..
.
=
= f1 x0
..
.
−
⇁ −
f1 x0 x⇁
0
−
⇁
⇁
fn xn−1 −
x−
n−1
−
x⇁
1
x1
x1
−
x⇁
1
= f1 x0
−
⇁
= f1 x0 −
x⇁
0
..
.
xn
−
⇁
xn
= fn xn−1
−
⇁
⇁
= fn xn−1 −
x−
n−1
AD of Functional Programs
Lecture Circuit 2009
33 / 73
A (Not So) Brief Tutorial on AD
xn
= fn xn−1
HH
H
HH
H
Jeffrey Mark Siskind (Purdue/ECE)
−
x⇁
n
=
..
.
=
= f1 x0
..
.
−
⇁ −
f1 x0 x⇁
0
−
⇁
⇁
fn xn−1 −
x−
n−1
−
x⇁
1
x1
x1
−
x⇁
1
= f1 x0
−
⇁
= f1 x0 −
x⇁
0
..
.
xn
−
⇁
xn
= fn xn−1
−
⇁
⇁
= fn xn−1 −
x−
n−1
AD of Functional Programs
Lecture Circuit 2009
33 / 73
A (Not So) Brief Tutorial on AD
xn
= fn xn−1
HH
H
HH
H
Jeffrey Mark Siskind (Purdue/ECE)
−
x⇁
n
=
..
.
=
= f1 x0
..
.
−
⇁ −
f1 x0 x⇁
0
−
⇁
⇁
fn xn−1 −
x−
n−1
−
x⇁
1
x1
x1
−
x⇁
1
= f1 x0
−
⇁
= f1 x0 −
x⇁
0
..
.
xn
−
⇁
xn
= fn xn−1
−
⇁
⇁
= fn xn−1 −
x−
n−1
AD of Functional Programs
Lecture Circuit 2009
33 / 73
A (Not So) Brief Tutorial on AD
x1 = f1 x0
−
⇁
−
⇁
x1 = f1 x0 −
x⇁
0
..
.
xn = fn xn−1
−
⇁
−
⇁
⇁
xn = fn xn−1 −
x−
n−1
−
⇁ −
⇁
(x1 , −
x⇁
1 ) = ((f1 x0 ), ( f1 x0 x0 ))
..
.
−
⇁
−
⇁
⇁
(xn , xn ) = ((fn xn−1 ), ( fn xn−1 −
x−
n−1 ))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
34 / 73
A (Not So) Brief Tutorial on AD
x1 = f1 x0
..
.
xn = fn xn−1








−
⇀−
−
⇀

x⇀

1 = f1 x0


..
.


−
−
⇀ −⇀

⇀
xn = fn −
xn−1
−
⇀
⇁
x ≡ (x, −
x)
−
⇀−
−
⇁ ⇁
f ⇀
x ≡ ((f x), ( f x −
x ))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
35 / 73
A (Not So) Brief Tutorial on AD
xLi := ui xRi
xLi := bi (xRi , xSi )
⇀
−
−
⇀−
x⇀
Li := ui xRi
−
⇀
−
x⇀ := b (−
x⇀, −
x⇀)
Li
i
Ri
Si
−
⇀
⇁
x ≡ (x, −
x)
−
⇀
⇀
⇁
u −
x ≡ ((u x), ((D u x) × −
x ))
−
⇀ −
⇀
⇁
⇁
b (⇀
x1 , −
x2 ) ≡ ((b (x1 , x2 )), ((D1 b (x1 , x2 )) × −
x1 ) + ((D2 b (x1 , x2 )) × −
x2 )))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
36 / 73
A (Not So) Brief Tutorial on AD
H
xn
↽−−−
xn−1
↽−
x0
Jeffrey Mark Siskind (Purdue/ECE)
HH
H
HH
x1
xn = fn xn−1
↽−−−
↽
−
↽−
xn−1 = fn xn−1 xn
..
.
↽−
↽
− ↽−
x0 = f1 x0 x1
x1 = f1 x0
..
.
= f1 x0
..
.
= fn xn−1
↽
−
↽−
= fn xn−1 xn
..
.
↽
− ↽−
= f1 x0 x1
AD of Functional Programs
Lecture Circuit 2009
37 / 73
A (Not So) Brief Tutorial on AD
H
xn
↽−−−
xn−1
↽−
x0
Jeffrey Mark Siskind (Purdue/ECE)
HH
H
HH
x1
xn = fn xn−1
↽−−−
↽
−
↽−
xn−1 = fn xn−1 xn
..
.
↽−
↽
− ↽−
x0 = f1 x0 x1
x1 = f1 x0
..
.
= f1 x0
..
.
= fn xn−1
↽
−
↽−
= fn xn−1 xn
..
.
↽
− ↽−
= f1 x0 x1
AD of Functional Programs
Lecture Circuit 2009
37 / 73
A (Not So) Brief Tutorial on AD
H
xn
↽−−−
xn−1
↽−
x0
Jeffrey Mark Siskind (Purdue/ECE)
HH
H
HH
x1
xn = fn xn−1
↽−−−
↽
−
↽−
xn−1 = fn xn−1 xn
..
.
↽−
↽
− ↽−
x0 = f1 x0 x1
x1 = f1 x0
..
.
= f1 x0
..
.
= fn xn−1
↽
−
↽−
= fn xn−1 xn
..
.
↽
− ↽−
= f1 x0 x1
AD of Functional Programs
Lecture Circuit 2009
37 / 73
A (Not So) Brief Tutorial on AD
H
xn
↽−−−
xn−1
↽−
x0
Jeffrey Mark Siskind (Purdue/ECE)
HH
H
HH
x1
xn = fn xn−1
↽−−−
↽
−
↽−
xn−1 = fn xn−1 xn
..
.
↽−
↽
− ↽−
x0 = f1 x0 x1
x1 = f1 x0
..
.
= f1 x0
..
.
= fn xn−1
↽
−
↽−
= fn xn−1 xn
..
.
↽
− ↽−
= f1 x0 x1
AD of Functional Programs
Lecture Circuit 2009
37 / 73
A (Not So) Brief Tutorial on AD
H
xn
↽−−−
xn−1
↽−
x0
Jeffrey Mark Siskind (Purdue/ECE)
HH
H
HH
x1
xn = fn xn−1
↽−−−
↽
−
↽−
xn−1 = fn xn−1 xn
..
.
↽−
↽
− ↽−
x0 = f1 x0 x1
x1 = f1 x0
..
.
= f1 x0
..
.
= fn xn−1
↽
−
↽−
= fn xn−1 xn
..
.
↽
− ↽−
= f1 x0 x1
AD of Functional Programs
Lecture Circuit 2009
37 / 73
A (Not So) Brief Tutorial on AD
x1 = f1 x0
↽
−
↽
− ↽
−
x1 = λ x x0 ( f1 x0 x )
..
.
xn = fn xn−1
↽
−
↽
− ↽
−
xn = λ x xn−1 ( f1 x0 x )
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
38 / 73
A (Not So) Brief Tutorial on AD
x1 = f1 x0
↽
−
↽
− ↽
−
x1 = λ x x0 ( f1 x0 x )
..
.
xn = fn xn−1
↽
−
↽
− ↽
−
xn = λ x xn−1 ( f1 x0 x )
↽
−
↽
− ↽
−
(x1 , x1 ) = ((f1 x0 ), (λ x x0 ( f1 x0 x )))
..
.
↽
−
↽
− ↽
−
(xn , xn ) = ((fn xn−1 ), (λ x xn−1 ( f1 x0 x )))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
39 / 73
A (Not So) Brief Tutorial on AD
x1 = f1 x0
..
.
xn = fn xn−1








↼
−↼
↼
−

x−

1 = f1 x0


..
.


↼
↼
− −−

−
xn = fn ↼
xn−1
↼
x− ≡ (x, x)
↼
−↼
↽
− ↽
−
↽
−
f x− ≡ ((f x), (λ x x ( f x x )))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
40 / 73
A (Not So) Brief Tutorial on AD
↼
−
↽
−
↽
− ↽
−
f x ≡ begin x := λ x x ( f x x );
(f x) end
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
41 / 73
A (Not So) Brief Tutorial on AD
xLi := ui xRi
xLi := bi (xRi , xSi )

↽−
↽−

x−
x := λ[ ] begin xRi +:= (D ui ↼

Ri ) × xLi ;


↽−
xLi := 0;

x
[ ] end


 ↼
−
−:= u ↼
x
x
i Ri
 Li
↽−
↽−
↼
−

x−
x
:=
λ[
] begin xRi +:= (D1 bi (↼
Ri , xSi )) × xLi ;


↽−
↽−

↼
−


x−
xSi +:= (D2 bi (↼
Ri , xSi )) × xLi ;
↽−
xLi := 0;




x
[ ] end

 ↼
x−)
x−, ↼
x−:= b (↼
Li
i
Ri
Si
↼
x− ≡ x
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
42 / 73
A (Not So) Brief Tutorial on AD
xLi := ui xRi
xLi := bi (xRi , xSi )
 ↼
↼
−
x−
Li := ui xRi



..

.
↽
−
↽−

xRi +:= (D ui ↼
x−

Ri ) × xLi ;

 ↽−
xLi := 0
 ↼
−
↼
−
x
x−
 Li := bi (↼
Ri , xSi )



..


.

↽−
↽−
↼
−
xRi +:= (D1 bi (↼
x−
Ri , xSi )) × xLi ;

↽−
↽−

↼
−


xSi +:= (D2 bi (↼
x−
Ri , xSi )) × xLi ;


 ↽−
xLi := 0
↼
x− ≡ x
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
43 / 73
The Functional Reverse-Mode Transformation
xL1 := u1 xS1
..
.
xLn := un xSn
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
44 / 73
The Functional Reverse-Mode Transformation
xL1
xLn
..
.

:= u1 xS1 

:= un xSn
Jeffrey Mark Siskind (Purdue/ECE)



xL1










xLn



↽
−


x1







↽−
xm

↽
−


x

S


↽−n


xLn








↽−



xS1


 ↽−
xL1
:=
u1 xS1
:=
:=
..
.
un xSn
0
..
.
:=
0
↽−
+:= (D un xSn ) × xLn
:= 0
..
.
↽−
+:= (D u1 xS1 ) × xL1
:=
=0
AD of Functional Programs
Lecture Circuit 2009
45 / 73
The Functional Reverse-Mode Transformation
xL1
xLn
..
.

:= u1 xS1 

:= un xSn
Jeffrey Mark Siskind (Purdue/ECE)



xL1










xLn



↽
−


x1







↽−
xm

↽
−


x

S


↽−n


xLn








↽−



xS1


 ↽−
xL1
:=
u1 xS1
:=
:=
..
.
un xSn
0
..
.
:=
0
↽−
+:= (D un xSn ) × xLn
:= 0
..
.
↽−
+:= (D u1 xS1 ) × xL1
:=
=0
AD of Functional Programs
Lecture Circuit 2009
45 / 73
The Functional Reverse-Mode Transformation
xL1
xLn
..
.

:= u1 xS1 

:= un xSn
Jeffrey Mark Siskind (Purdue/ECE)



xL1










xLn



↽
−


x1







↽−
xm

↽
−


x

S


↽−n


xLn








↽−



xS1


 ↽−
xL1
:=
u1 xS1
:=
:=
..
.
un xSn
0
..
.
:=
0
↽−
+:= (D un xSn ) × xLn
:= 0
..
.
↽−
+:= (D u1 xS1 ) × xL1
:=
=0
AD of Functional Programs
Lecture Circuit 2009
45 / 73
The Functional Reverse-Mode Transformation
xL1
xLn
..
.

:= u1 xS1 

:= un xSn
Jeffrey Mark Siskind (Purdue/ECE)



xL1










xLn



↽
−


x1







↽−
xm

↽
−


x

S


↽−n


xLn








↽−



xS1


 ↽−
xL1
:=
u1 xS1
:=
:=
..
.
un xSn
0
..
.
:=
0
↽−
+:= (D un xSn ) × xLn
:= 0
..
.
↽−
+:= (D u1 xS1 ) × xL1
:=
=0
AD of Functional Programs
Lecture Circuit 2009
45 / 73
The Functional Reverse-Mode Transformation
x1
..
.
xn
= u1 xS1
= un xSn
Jeffrey Mark Siskind (Purdue/ECE)








































=
x1
..
.
xn =
↽
−
x0 :=
..
.
↽−−−
xn−1 :=
↽−
xSn +:=
..
.
↽−
xS1 +:=
AD of Functional Programs
u1 xS1
un xSn
0
0
↽
−
(D un xSn ) × xn
↽
−
(D u1 xS1 ) × x1
Lecture Circuit 2009
46 / 73
The Functional Reverse-Mode Transformation
x1
..
.
xn
= u1 xS1
= un xSn
Jeffrey Mark Siskind (Purdue/ECE)








































=
x1
..
.
xn =
↽
−
x0 :=
..
.
↽−−−
xn−1 :=
↽−
xSn +:=
..
.
↽−
xS1 +:=
AD of Functional Programs
u1 xS1
un xSn
0
0
↽
−
(D un xSn ) × xn
↽
−
(D u1 xS1 ) × x1
Lecture Circuit 2009
46 / 73
The Functional Reverse-Mode Transformation
−
↽
−
△ ↽
ui = λ x (D ui xSi ) × x
x1 = u1 xS1
..
.
xn = un xSn
Jeffrey Mark Siskind (Purdue/ECE)








































x1 =
..
.
u1 xS1
xn =
↽
−
x0 :=
..
.
↽−−−
xn−1 :=
↽−
xSn +:=
..
.
↽−
xS1 +:=
un xSn
0
AD of Functional Programs
0
↽
−
un xn
↽
−
u1 x1
Lecture Circuit 2009
47 / 73
The Functional Reverse-Mode Transformation
↽
−
↽
−
△
↼
u− x = ((u x), (λ x (D u x) × x ))
x1 = u1 xS1
..
.
xn = un xSn
Jeffrey Mark Siskind (Purdue/ECE)






(x1 , x1 )










(xn , xn )



↽
−


x0



















=
..
.
=
:=
..
.
↼
u−
1 xS1
↼
u−
n xSn
0
↽−−−
xn−1 :=
0
↽
−
↽−
xSn +:= xn xn
..
.
↽−
↽
−
xS1 +:= x1 x1
AD of Functional Programs
Lecture Circuit 2009
48 / 73
The Functional Reverse-Mode Transformation
↽
−
↽
−
△
↼
u− x = ((u x), (λ x (D u x) × x ))
x1 = u1 xS1
..
.
xn = un xSn
Jeffrey Mark Siskind (Purdue/ECE)






(x1 , x1 )










(xn , xn )



↽
−


x0



















=
..
.
=
:=
..
.
↼
u−
1 xS1
↼
u−
n xSn
0
↽−−−
xn−1 :=
0
↽
−
↽−
xSn +:= xn xn
..
.
↽−
↽
−
xS1 +:= x1 x1
AD of Functional Programs
Lecture Circuit 2009
48 / 73
The Functional Reverse-Mode Transformation
x1 = xR1 xS1
..
.
xn = xRn xSn
Jeffrey Mark Siskind (Purdue/ECE)








































(↼
x−
1 , x1 ) =
..
.
(↼
x−
,
x
)
=
n n
↽
−
x0 :=
..
.
↽−−−
xn−1 :=
↽−
xSn +:=
..
.
↽−
xS1 +:=
AD of Functional Programs
↼
xR−1 ↼
x−
S1
↼
xR−n ↼
x−
Sn
0
0
↽
−
xn xn
↽
−
x1 x1
Lecture Circuit 2009
49 / 73
The Functional Reverse-Mode Transformation
x1 = xR1 xS1
..
.
xn = xRn xSn
Jeffrey Mark Siskind (Purdue/ECE)








































(↼
x−
1 , x1 ) =
..
.
(↼
x−
,
x
)
=
n n
↽
−
x0 :=
..
.
↽−−−
xn−1 :=
↽−
xSn +:=
..
.
↽−
xS1 +:=
AD of Functional Programs
↼
xR−1 ↼
x−
S1
↼
x−
xR−n ↼
Sn
0
0
↽
−
xn xn
↽
−
x1 x1
Lecture Circuit 2009
49 / 73
The Functional Reverse-Mode Transformation
x1 = xR1 xS1
..
.
xn = xRn xSn
Jeffrey Mark Siskind (Purdue/ECE)








































(↼
x−
1 , x1 ) =
..
.
(↼
x−
,
x
)
=
n n
↽
−
x0 :=
..
.
↽−−−
xn−1 :=
↽−
xSn +:=
..
.
↽−
xS1 +:=
AD of Functional Programs
↼
xR−1 ↼
x−
S1
↼
x−
xR−n ↼
Sn
0
0
↽
−
xn xn
↽
−
x1 x1
Lecture Circuit 2009
49 / 73
The Functional Reverse-Mode Transformation
x1 = xR1 xS1
..
.
xn = xRn xSn
Jeffrey Mark Siskind (Purdue/ECE)








































(↼
x−
1 , x1 ) =
..
.
(↼
x−
,
x
)
=
n n
↽
−
x0 :=
..
.
↽−−−
xn−1 :=
↽−
xSn +:=
..
.
↽−
xS1 +:=
AD of Functional Programs
↼
xR−1 ↼
x−
S1
↼
x−
xR−n ↼
Sn
0
0
↽
−
xn xn
↽
−
x1 x1
Lecture Circuit 2009
49 / 73
The Functional Reverse-Mode Transformation
x1 = xR1 xS1
..
.
xn = xRn xSn
Jeffrey Mark Siskind (Purdue/ECE)








































(↼
x−
1 , x1 ) =
..
.
(↼
x−
,
x
)
=
n n
↽
−
x0 :=
..
.
↽−−−
xn−1 :=
↽−
xSn ⊕:=
..
.
↽−
xS1 ⊕:=
AD of Functional Programs
↼
xR−1 ↼
x−
S1
↼
x−
xR−n ↼
Sn
←
−−1
↼
x−
0 (J
0)
←
−
−−)
xn−1
0 ( J −1 ↼
↽
−
xn xn
↽
−
x1 x1
Lecture Circuit 2009
50 / 73
The Functional Reverse-Mode Transformation

△
λx0 let x1 = xR1 xS1 ;




..
.
△

xn = xRn xSn 



in xn end
Jeffrey Mark Siskind (Purdue/ECE)

△ ↼
−↼
−
↼
−

λ↼
x−

0 let ( x1 , x1 ) = xR1 xS1 ;



..


.




△ −↼
↼
−


( xn , xn ) = ↼
xRn x−
Sn



↽
−
↽
−△ ←
−

↼
−

x−
in ( xn , (λ xn let x0 = 0 ( J −1 ↼

0 );


..

.
↽−−− △ ←
−

−−);

xn−1
xn−1 = 0 ( J −1 ↼



↽
−
↽
−
△


xSn ⊕ = xn xn ;




..


.



↽− △ ↽
−



xS1 ⊕ = x1 x1


↽
−

in x0 end)) end
AD of Functional Programs
Lecture Circuit 2009
51 / 73
The Functional Reverse-Mode Transformation

△
λx0 let x1 = xR1 xS1 ;




..
.
△

xn = xRn xSn 



in xn end

△ ↼
−↼
−
↼
−

λ↼
x−

0 let ( x1 , x1 ) = xR1 xS1 ;



..


.




△ −↼
↼
−


( xn , xn ) = ↼
xRn x−
Sn



↽
−
↽
−△ ←
−

↼
−

x−
in ( xn , (λ xn let x0 = 0 ( J −1 ↼

0 );


..

.
↽−−− △ ←
−

−−);

xn−1
xn−1 = 0 ( J −1 ↼



↽
−
↽
−
△


xSn ⊕ = xn xn ;




..


.



↽− △ ↽
−



xS1 ⊕ = x1 x1


↽
−

in x0 end)) end
The above is a white lie. The truth is a lot more complicated.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
51 / 73
Modularity
∇f x
Jeffrey Mark Siskind (Purdue/ECE)
△
=
∂f (x)
∂f (x)
, . . . , ∂x
∂x1
n
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Modularity
△
∇f x
=
G RADIENT D ESCENT f x0
=
Jeffrey Mark Siskind (Purdue/ECE)
△
∂f (x)
∂f (x)
, . . . , ∂x
∂x1
n
. . . xi+1 := . . . ∇ f xi . . .
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Modularity
△
∇f x
=
G RADIENT D ESCENT f x0
=
argmin f
=
Jeffrey Mark Siskind (Purdue/ECE)
△
△
∂f (x)
∂f (x)
, . . . , ∂x
∂x1
n
. . . xi+1 := . . . ∇ f xi . . .
. . . G RADIENT D ESCENT f x0 . . .
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Modularity
△
∇f x
=
G RADIENT D ESCENT f x0
=
argmin f
=
N EUTRON F LUX r
=
Jeffrey Mark Siskind (Purdue/ECE)
△
△
△
∂f (x)
∂f (x)
, . . . , ∂x
∂x1
n
. . . xi+1 := . . . ∇ f xi . . .
. . . G RADIENT D ESCENT f x0 . . .
classified
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Modularity
△
∇f x
=
G RADIENT D ESCENT f x0
=
argmin f
=
N EUTRON F LUX r
=
D EVIATION r
=
Jeffrey Mark Siskind (Purdue/ECE)
△
△
△
△
∂f (x)
∂f (x)
, . . . , ∂x
∂x1
n
. . . xi+1 := . . . ∇ f xi . . .
. . . G RADIENT D ESCENT f x0 . . .
classified
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Modularity
△
∇f x
=
G RADIENT D ESCENT f x0
=
argmin f
=
N EUTRON F LUX r
=
D EVIATION r
=
r∗
=
△
△
△
△
△
∂f (x)
∂f (x)
, . . . , ∂x
∂x1
n
. . . xi+1 := . . . ∇ f xi . . .
. . . G RADIENT D ESCENT f x0 . . .
classified
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
△
∇f x
=
G RADIENT D ESCENT f x0
=
argmin f
=
N EUTRON F LUX r
=
D EVIATION r
=
r∗
=
△
△
△
△
△
−
⇀
−
⇀
⇁
⇁
( f x ⊲−
e1 ), . . . , ( f x ⊲ −
en )
. . . xi+1 := . . . ∇ f xi . . .
. . . G RADIENT D ESCENT f x0 . . .
classified
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
−
⇀
∇ f x
=
G RADIENT D ESCENT f x0
=
argmin f
=
N EUTRON F LUX r
=
D EVIATION r
=
r∗
=
△
△
△
△
△
△
−
⇀
−
⇀
⇁
⇁
( f x ⊲−
e1 ), . . . , ( f x ⊲ −
en )
. . . xi+1 := . . . ∇ f xi . . .
. . . G RADIENT D ESCENT f x0 . . .
classified
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
−
⇀
∇ f x
=
△
−
⇀
−
⇀
⇁
⇁
( f x ⊲−
e1 ), . . . , ( f x ⊲ −
en )
G RADIENT D ESCENT f x0
=
△
−
⇀
. . . xi+1 := . . . ∇ f xi . . .
argmin f
=
N EUTRON F LUX r
=
D EVIATION r
=
r∗
=
△
△
△
△
. . . G RADIENT D ESCENT f x0 . . .
classified
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
−
⇀
∇ f x
=
△
−
⇀
−
⇀
⇁
⇁
( f x ⊲−
e1 ), . . . , ( f x ⊲ −
en )
−
⇀
G RADIENT D ESCENT f x0
=
△
−
⇀
. . . xi+1 := . . . ∇ f xi . . .
argmin f
=
N EUTRON F LUX r
=
D EVIATION r
=
r∗
=
△
△
△
△
. . . G RADIENT D ESCENT f x0 . . .
classified
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
−
⇀
∇ f x
=
△
−
⇀
−
⇀
⇁
⇁
( f x ⊲−
e1 ), . . . , ( f x ⊲ −
en )
−
⇀
G RADIENT D ESCENT f x0
=
△
−
⇀
. . . xi+1 := . . . ∇ f xi . . .
argmin f
=
△
−
⇀
. . . G RADIENT D ESCENT f x0 . . .
N EUTRON F LUX r
=
D EVIATION r
=
r∗
=
△
△
△
classified
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
−
⇀
∇ f x
=
△
−
⇀
−
⇀
⇁
⇁
( f x ⊲−
e1 ), . . . , ( f x ⊲ −
en )
−
⇀
G RADIENT D ESCENT f x0
=
△
−
⇀
. . . xi+1 := . . . ∇ f xi . . .
−
⇀
argmin f
=
△
−
⇀
. . . G RADIENT D ESCENT f x0 . . .
N EUTRON F LUX r
=
D EVIATION r
=
r∗
=
△
△
△
classified
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
−
⇀
∇ f x
=
△
−
⇀
−
⇀
⇁
⇁
( f x ⊲−
e1 ), . . . , ( f x ⊲ −
en )
−
⇀
G RADIENT D ESCENT f x0
=
△
−
⇀
. . . xi+1 := . . . ∇ f xi . . .
−
⇀
argmin f
=
△
−
⇀
. . . G RADIENT D ESCENT f x0 . . .
N EUTRON F LUX r
=
D EVIATION r
=
△
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
r∗
=
△
−−−−−−−⇀
argmin D EVIATION
△
classified
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
−
⇀
∇ f x
=
△
−
⇀
−
⇀
⇁
⇁
( f x ⊲−
e1 ), . . . , ( f x ⊲ −
en )
−
⇀
G RADIENT D ESCENT f x0
=
△
−
⇀
. . . xi+1 := . . . ∇ f xi . . .
−
⇀
argmin f
=
△
−
⇀
. . . G RADIENT D ESCENT f x0 . . .
N EUTRON F LUX r
=
D EVIATION r
=
D EVIATION
r∗
△
△
ADIFOR
△
=
classified
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
−−−−−−−⇀
D EVIATION
−−−−−−−⇀
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
−
⇀
∇ f x
=
△
−
⇀
−
⇀
⇁
⇁
( f x ⊲−
e1 ), . . . , ( f x ⊲ −
en )
−
⇀
G RADIENT D ESCENT f x0
=
△
−
⇀
. . . xi+1 := . . . ∇ f xi . . .
−
⇀
argmin f
=
△
−
⇀
. . . G RADIENT D ESCENT f x0 . . .
N EUTRON F LUX r
=
N EUTRON F LUX
D EVIATION r
D EVIATION
r∗
△
ADIFOR
△
=
ADIFOR
△
=
classified
−−−−−−−−−−⇀
N EUTRON F LUX
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
−−−−−−−⇀
D EVIATION
−−−−−−−⇀
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
−
⇀
∇ f x
=
△
−
⇀
−
⇀
⇁
⇁
( f x ⊲−
e1 ), . . . , ( f x ⊲ −
en )
−
⇀
G RADIENT D ESCENT f x0
=
△
−
⇀
. . . xi+1 := . . . ∇ f xi . . .
−
⇀
argmin f
=
△
−
⇀
. . . G RADIENT D ESCENT f x0 . . .
N EUTRON F LUX r
=
N EUTRON F LUX
D EVIATION r
D EVIATION
r∗
△
ADIFOR
△
=
ADIFOR
△
=
classified
−−−−−−−−−−⇀
N EUTRON F LUX
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
−−−−−−−⇀
D EVIATION
−−−−−−−⇀
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
−
⇀
∇ f x
=
△
↼
−
... f x...
−
⇀
G RADIENT D ESCENT f x0
=
△
−
⇀
. . . xi+1 := . . . ∇ f xi . . .
−
⇀
argmin f
=
△
−
⇀
. . . G RADIENT D ESCENT f x0 . . .
N EUTRON F LUX r
=
N EUTRON F LUX
D EVIATION r
D EVIATION
r∗
△
ADIFOR
△
=
ADIFOR
△
=
classified
−−−−−−−−−−⇀
N EUTRON F LUX
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
−−−−−−−⇀
D EVIATION
−−−−−−−⇀
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
↼
−
∇ f x
=
△
↼
−
... f x...
−
⇀
G RADIENT D ESCENT f x0
=
△
−
⇀
. . . xi+1 := . . . ∇ f xi . . .
−
⇀
argmin f
=
△
−
⇀
. . . G RADIENT D ESCENT f x0 . . .
N EUTRON F LUX r
=
N EUTRON F LUX
D EVIATION r
D EVIATION
r∗
△
ADIFOR
△
=
ADIFOR
△
=
classified
−−−−−−−−−−⇀
N EUTRON F LUX
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
−−−−−−−⇀
D EVIATION
−−−−−−−⇀
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
↼
−
∇ f x
=
△
↼
−
... f x...
−
⇀
G RADIENT D ESCENT f x0
=
△
↼
−
. . . xi+1 := . . . ∇ f xi . . .
−
⇀
argmin f
=
△
−
⇀
. . . G RADIENT D ESCENT f x0 . . .
N EUTRON F LUX r
=
N EUTRON F LUX
D EVIATION r
D EVIATION
r∗
△
ADIFOR
△
=
ADIFOR
△
=
classified
−−−−−−−−−−⇀
N EUTRON F LUX
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
−−−−−−−⇀
D EVIATION
−−−−−−−⇀
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
↼
−
∇ f x
=
△
↼
−
... f x...
↼
−
G RADIENT D ESCENT f x0
=
△
↼
−
. . . xi+1 := . . . ∇ f xi . . .
−
⇀
argmin f
=
△
−
⇀
. . . G RADIENT D ESCENT f x0 . . .
N EUTRON F LUX r
=
N EUTRON F LUX
D EVIATION r
D EVIATION
r∗
△
ADIFOR
△
=
ADIFOR
△
=
classified
−−−−−−−−−−⇀
N EUTRON F LUX
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
−−−−−−−⇀
D EVIATION
−−−−−−−⇀
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
↼
−
∇ f x
=
△
↼
−
... f x...
↼
−
G RADIENT D ESCENT f x0
=
△
↼
−
. . . xi+1 := . . . ∇ f xi . . .
−
⇀
argmin f
=
△
↼
−
. . . G RADIENT D ESCENT f x0 . . .
N EUTRON F LUX r
=
N EUTRON F LUX
D EVIATION r
D EVIATION
r∗
△
ADIFOR
△
=
ADIFOR
△
=
classified
−−−−−−−−−−⇀
N EUTRON F LUX
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
−−−−−−−⇀
D EVIATION
−−−−−−−⇀
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
↼
−
∇ f x
=
△
↼
−
... f x...
↼
−
G RADIENT D ESCENT f x0
=
△
↼
−
. . . xi+1 := . . . ∇ f xi . . .
↼
−
argmin f
=
△
↼
−
. . . G RADIENT D ESCENT f x0 . . .
N EUTRON F LUX r
=
N EUTRON F LUX
D EVIATION r
D EVIATION
r∗
△
ADIFOR
△
=
ADIFOR
△
=
classified
−−−−−−−−−−⇀
N EUTRON F LUX
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
−−−−−−−⇀
D EVIATION
−−−−−−−⇀
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
↼
−
∇ f x
=
△
↼
−
... f x...
↼
−
G RADIENT D ESCENT f x0
=
△
↼
−
. . . xi+1 := . . . ∇ f xi . . .
↼
−
argmin f
=
△
↼
−
. . . G RADIENT D ESCENT f x0 . . .
N EUTRON F LUX r
=
N EUTRON F LUX
D EVIATION r
D EVIATION
r∗
△
ADIFOR
△
=
ADIFOR
△
=
classified
−−−−−−−−−−⇀
N EUTRON F LUX
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
−−−−−−−⇀
D EVIATION
↼−−−−−−−
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
↼
−
∇ f x
=
△
↼
−
... f x...
↼
−
G RADIENT D ESCENT f x0
=
△
↼
−
. . . xi+1 := . . . ∇ f xi . . .
↼
−
argmin f
=
△
↼
−
. . . G RADIENT D ESCENT f x0 . . .
N EUTRON F LUX r
=
N EUTRON F LUX
D EVIATION r
D EVIATION
r∗
△
ADIFOR
△
=
ADIFOR
△
=
classified
↼−−−−−−−−−−
N EUTRON F LUX
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
↼−−−−−−−
D EVIATION
↼−−−−−−−
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
↼
−
∇ f x
=
△
↼
−
... f x...
↼
−
G RADIENT D ESCENT f x0
=
△
↼
−
. . . xi+1 := . . . ∇ f xi . . .
↼
−
argmin f
=
△
↼
−
. . . G RADIENT D ESCENT f x0 . . .
N EUTRON F LUX r
=
N EUTRON F LUX
D EVIATION r
D EVIATION
r∗
△
TAPENADE
△
=
TAPENADE
△
=
classified
↼−−−−−−−−−−
N EUTRON F LUX
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
↼−−−−−−−
D EVIATION
↼−−−−−−−
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
↼
−
∇ f x
△
↼
−
... f x...
△
↼
−
. . . xi+1 := . . . ∇ f xi . . .
↼
−
. . . xi+1 := . . . ∇ f xi . . . H f xi . . .
↼
−
. . . G RADIENT D ESCENT f x0 . . .
=
↼
−
G RADIENT D ESCENT f x0
↼
−
N EWTONS M ETHOD f x0
↼
−
argmin f
=
N EUTRON F LUX r
=
N EUTRON F LUX
D EVIATION r
D EVIATION
r∗
=
△
=
△
△
TAPENADE
△
=
TAPENADE
△
=
classified
↼−−−−−−−−−−
N EUTRON F LUX
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
↼−−−−−−−
D EVIATION
↼−−−−−−−
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
↼
−
∇ f x
△
↼
−
... f x...
△
↼
−
. . . xi+1 := . . . ∇ f xi . . .
↼
−
. . . xi+1 := . . . ∇ f xi . . . H f xi . . .
↼
−
. . . N EWTONS M ETHOD f x0 . . .
=
↼
−
G RADIENT D ESCENT f x0
↼
−
N EWTONS M ETHOD f x0
↼
−
argmin f
=
N EUTRON F LUX r
=
N EUTRON F LUX
D EVIATION r
D EVIATION
r∗
=
△
=
△
△
TAPENADE
△
=
TAPENADE
△
=
classified
↼−−−−−−−−−−
N EUTRON F LUX
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
↼−−−−−−−
D EVIATION
↼−−−−−−−
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
↼
−
∇ f x
=
Hf x
=
△
△
↼
−
G RADIENT D ESCENT f x0
↼
−
N EWTONS M ETHOD f x0
↼
−
argmin f
=
N EUTRON F LUX r
=
N EUTRON F LUX
D EVIATION r
D EVIATION
r∗
↼
−
... f x...
△
=
△
=
△
△
TAPENADE
△
=
TAPENADE
△
=
↼
−
. . . xi+1 := . . . ∇ f xi . . .
↼
−
. . . xi+1 := . . . ∇ f xi . . . H f xi . . .
↼
−
. . . N EWTONS M ETHOD f x0 . . .
classified
↼−−−−−−−−−−
N EUTRON F LUX
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
↼−−−−−−−
D EVIATION
↼−−−−−−−
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
↼
−
∇ f x
=
△
↼
−
... f x...
Hf x
=
△
−
⇀
↼
−
... f ...x...
↼
−
G RADIENT D ESCENT f x0
↼
−
N EWTONS M ETHOD f x0
↼
−
argmin f
=
N EUTRON F LUX r
=
N EUTRON F LUX
D EVIATION r
D EVIATION
r∗
△
=
△
=
△
△
TAPENADE
△
=
TAPENADE
△
=
↼
−
. . . xi+1 := . . . ∇ f xi . . .
↼
−
. . . xi+1 := . . . ∇ f xi . . . H f xi . . .
↼
−
. . . N EWTONS M ETHOD f x0 . . .
classified
↼−−−−−−−−−−
N EUTRON F LUX
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
↼−−−−−−−
D EVIATION
↼−−−−−−−
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
↼
−
∇ f x
=
△
↼
−
... f x...
−
⇀
↼
−
H f x
=
△
−
⇀
↼
−
... f ...x...
↼
−
G RADIENT D ESCENT f x0
↼
−
N EWTONS M ETHOD f x0
↼
−
argmin f
=
N EUTRON F LUX r
=
N EUTRON F LUX
D EVIATION r
D EVIATION
r∗
△
=
△
=
△
△
TAPENADE
△
=
TAPENADE
△
=
↼
−
. . . xi+1 := . . . ∇ f xi . . .
↼
−
. . . xi+1 := . . . ∇ f xi . . . H f xi . . .
↼
−
. . . N EWTONS M ETHOD f x0 . . .
classified
↼−−−−−−−−−−
N EUTRON F LUX
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
↼−−−−−−−
D EVIATION
↼−−−−−−−
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
↼
−
∇ f x
=
△
↼
−
... f x...
−
⇀
↼
−
H f x
=
△
−
⇀
↼
−
... f ...x...
↼
−
G RADIENT D ESCENT f x0
↼
−
N EWTONS M ETHOD f x0
↼
−
argmin f
=
N EUTRON F LUX r
=
N EUTRON F LUX
D EVIATION r
D EVIATION
r∗
△
=
△
=
△
△
TAPENADE
△
=
TAPENADE
△
=
↼
−
. . . xi+1 := . . . ∇ f xi . . .
−
⇀
↼
−
↼
−
. . . xi+1 := . . . ∇ f xi . . . H f xi . . .
↼
−
. . . N EWTONS M ETHOD f x0 . . .
classified
↼−−−−−−−−−−
N EUTRON F LUX
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
↼−−−−−−−
D EVIATION
↼−−−−−−−
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
↼
−
∇ f x
=
△
↼
−
... f x...
−
⇀
↼
−
H f x
=
△
−
⇀
↼
−
... f ...x...
↼
−
G RADIENT D ESCENT f x0
−
⇀
↼
−↼
−
N EWTONS M ETHOD f f x0
↼
−
argmin f
=
N EUTRON F LUX r
=
N EUTRON F LUX
D EVIATION r
D EVIATION
r∗
△
=
△
=
△
△
TAPENADE
△
=
TAPENADE
△
=
↼
−
. . . xi+1 := . . . ∇ f xi . . .
−
⇀
↼
−
↼
−
. . . xi+1 := . . . ∇ f xi . . . H f xi . . .
↼
−
. . . N EWTONS M ETHOD f x0 . . .
classified
↼−−−−−−−−−−
N EUTRON F LUX
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
↼−−−−−−−
D EVIATION
↼−−−−−−−
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
↼
−
∇ f x
=
△
↼
−
... f x...
−
⇀
↼
−
H f x
=
△
−
⇀
↼
−
... f ...x...
↼
−
G RADIENT D ESCENT f x0
−
⇀
↼
−↼
−
N EWTONS M ETHOD f f x0
↼
−
argmin f
=
N EUTRON F LUX r
=
N EUTRON F LUX
D EVIATION r
D EVIATION
r∗
△
=
△
=
△
△
TAPENADE
△
=
TAPENADE
△
=
↼
−
. . . xi+1 := . . . ∇ f xi . . .
−
⇀
↼
−
↼
−
. . . xi+1 := . . . ∇ f xi . . . H f xi . . .
−
⇀
↼
−↼
−
. . . N EWTONS M ETHOD f f x0 . . .
classified
↼−−−−−−−−−−
N EUTRON F LUX
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
↼−−−−−−−
D EVIATION
↼−−−−−−−
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
↼
−
∇ f x
=
△
↼
−
... f x...
−
⇀
↼
−
H f x
=
△
−
⇀
↼
−
... f ...x...
↼
−
G RADIENT D ESCENT f x0
−
⇀
↼
−↼
−
N EWTONS M ETHOD f f x0
−
⇀
↼
−↼
−
argmin f f
=
N EUTRON F LUX r
=
N EUTRON F LUX
D EVIATION r
D EVIATION
r∗
△
=
△
=
△
△
TAPENADE
△
=
TAPENADE
△
=
↼
−
. . . xi+1 := . . . ∇ f xi . . .
−
⇀
↼
−
↼
−
. . . xi+1 := . . . ∇ f xi . . . H f xi . . .
−
⇀
↼
−↼
−
. . . N EWTONS M ETHOD f f x0 . . .
classified
↼−−−−−−−−−−
N EUTRON F LUX
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
↼−−−−−−−
D EVIATION
↼−−−−−−−
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
↼
−
∇ f x
=
△
↼
−
... f x...
−
⇀
↼
−
H f x
=
△
−
⇀
↼
−
... f ...x...
↼
−
G RADIENT D ESCENT f x0
−
⇀
↼
−↼
−
N EWTONS M ETHOD f f x0
−
⇀
↼
−↼
−
argmin f f
=
N EUTRON F LUX r
=
N EUTRON F LUX
D EVIATION r
D EVIATION
r∗
△
=
△
=
△
△
TAPENADE
△
=
TAPENADE
△
=
↼
−
. . . xi+1 := . . . ∇ f xi . . .
−
⇀
↼
−
↼
−
. . . xi+1 := . . . ∇ f xi . . . H f xi . . .
−
⇀
↼
−↼
−
. . . N EWTONS M ETHOD f f x0 . . .
classified
↼−−−−−−−−−−
N EUTRON F LUX
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
↼−−−−−−−
D EVIATION
−
⇀
↼
−−
−−
−−
−−
−−
−
−
↼−−−−−−− −
argmin D EVIATION D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Breaking Modularity
↼
−
∇ f x
=
△
↼
−
... f x...
−
⇀
↼
−
H f x
=
△
−
⇀
↼
−
... f ...x...
↼
−
G RADIENT D ESCENT f x0
−
⇀
↼
−↼
−
N EWTONS M ETHOD f f x0
−
⇀
↼
−↼
−
argmin f f
=
N EUTRON F LUX r
=
N EUTRON F LUX
↼−−−−−−−−−−
N EUTRON F LUX
D EVIATION r
D EVIATION
↼−−−−−−−
D EVIATION
r∗
△
=
△
=
△
△
TAPENADE
TAPENADE
△
=
TAPENADE
TAPENADE
△
=
↼
−
. . . xi+1 := . . . ∇ f xi . . .
−
⇀
↼
−
↼
−
. . . xi+1 := . . . ∇ f xi . . . H f xi . . .
−
⇀
↼
−↼
−
. . . N EWTONS M ETHOD f f x0 . . .
classified
↼−−−−−−−−−−
N EUTRON F LUX
−
−
⇀
↼
−−
−−
−−
−−
−−
−−
−−
−−
−
−
N EUTRON F LUX
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
↼−−−−−−−
D EVIATION
−−
⇀
↼
−−
−−
−−
−−
−−
−
−
D EVIATION
−
⇀
↼
−−
−−
−−
−−
−−
−
−
↼−−−−−−− −
argmin D EVIATION D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Restoring Modularity
△
∇f x
=
Hf x
=
G RADIENT D ESCENT f x0
=
N EWTONS M ETHOD f x0
△
△
△
=
△
argmin f
=
N EUTRON F LUX r
=
D EVIATION r
=
r∗
=
△
△
△
. . . xi+1 := . . . ∇ f xi . . .
. . . xi+1 := . . . ∇ f xi . . . H f xi . . .
. . . G RADIENT D ESCENT f x0 . . .
classified
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Restoring Modularity
△
∇f x
=
Hf x
=
G RADIENT D ESCENT f x0
=
N EWTONS M ETHOD f x0
→
−
→
−
⇁
⇁
(( J f ) x ⊲ −
e1 ), . . . , (( J f ) x ⊲ −
en )
△
△
△
=
△
argmin f
=
N EUTRON F LUX r
=
D EVIATION r
=
r∗
=
△
△
△
. . . xi+1 := . . . ∇ f xi . . .
. . . xi+1 := . . . ∇ f xi . . . H f xi . . .
. . . G RADIENT D ESCENT f x0 . . .
classified
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Restoring Modularity
△
∇f x
=
Hf x
=
G RADIENT D ESCENT f x0
=
N EWTONS M ETHOD f x0
←
−
. . . (J f ) x . . .
△
△
△
=
△
argmin f
=
N EUTRON F LUX r
=
D EVIATION r
=
r∗
=
△
△
△
. . . xi+1 := . . . ∇ f xi . . .
. . . xi+1 := . . . ∇ f xi . . . H f xi . . .
. . . G RADIENT D ESCENT f x0 . . .
classified
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Restoring Modularity
△
←
−
. . . (J f ) x . . .
△
→
− ←
−
. . . ( J ( J f )) . . . x . . .
∇f x
=
Hf x
=
G RADIENT D ESCENT f x0
=
N EWTONS M ETHOD f x0
△
△
=
△
argmin f
=
N EUTRON F LUX r
=
D EVIATION r
=
r∗
=
△
△
△
. . . xi+1 := . . . ∇ f xi . . .
. . . xi+1 := . . . ∇ f xi . . . H f xi . . .
. . . N EWTONS M ETHOD f x0 . . .
classified
((N EUTRON F LUX r) − N EUTRON F LUXcritical )2
argmin D EVIATION
Fermi, E. (1946). The Development of the first chain reaction pile.
Proceedings of the American Philosophy Society, 90:20–4.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
52 / 73
Having your cake and eating it too
Convenient
Fast
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
53 / 73
Having your cake and eating it too
Convenient
D formulated as a higher-order function in the language
Fast
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
53 / 73
Having your cake and eating it too
Convenient
D formulated as a higher-order function in the language
no arbitrary restrictions
applies to all data types and constructs in the language, including code
produced by D and even D itself
Fast
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
53 / 73
Having your cake and eating it too
Convenient
D formulated as a higher-order function in the language
no arbitrary restrictions
applies to all data types and constructs in the language, including code
produced by D and even D itself
higher-order derivatives
(D (D f))
Fast
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
53 / 73
Having your cake and eating it too
Convenient
D formulated as a higher-order function in the language
no arbitrary restrictions
applies to all data types and constructs in the language, including code
produced by D and even D itself
higher-order derivatives
(D (D f))
nesting
(D (lambda (. . .) . . . (D (lambda (. . .) . . .)) . . .))
Fast
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
53 / 73
Having your cake and eating it too
Convenient
D formulated as a higher-order function in the language
no arbitrary restrictions
applies to all data types and constructs in the language, including code
produced by D and even D itself
higher-order derivatives
(D (D f))
nesting
(D (lambda (. . .) . . . (D (lambda (. . .) . . .)) . . .))
Fast
D implemented by reflective transformation of environments and code
associated with closures
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
53 / 73
Having your cake and eating it too
Convenient
D formulated as a higher-order function in the language
no arbitrary restrictions
applies to all data types and constructs in the language, including code
produced by D and even D itself
higher-order derivatives
(D (D f))
nesting
(D (lambda (. . .) . . . (D (lambda (. . .) . . .)) . . .))
Fast
D implemented by reflective transformation of environments and code
associated with closures
compile away reflection with partial evaluation implemented by flow
analysis
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
53 / 73
Monovariant Flow Analysis: 0-CFA
(define (D f)
. . .)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
54 / 73
Monovariant Flow Analysis: 0-CFA
(define (D f)
. . .)
(D (lambda (x) 2x3 ))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
54 / 73
Monovariant Flow Analysis: 0-CFA
(define (D f:(λx 2x3 ))
. . .)
(D (lambda (x) 2x3 ))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
54 / 73
Monovariant Flow Analysis: 0-CFA
(define (D f:(λx 2x3 ))
. . .:(λx 6x2 ))
(D (lambda (x) 2x3 ))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
54 / 73
Monovariant Flow Analysis: 0-CFA
(define (D f:(λx 2x3 ))
. . .:(λx 6x2 ))
(D (lambda (x) 2x3 )):(λx 6x2 )
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
54 / 73
Monovariant Flow Analysis: 0-CFA
(define (D f:(λx 2x3 ))
. . .:(λx 6x2 ))
(D (lambda (x) 2x3 )):(λx 6x2 )
(D (lambda (x) 3x4 ))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
54 / 73
Monovariant Flow Analysis: 0-CFA
(define (D f:(λx 2x3 ) ∪ (λx 3x4 ))
. . .:(λx 6x2 ))
(D (lambda (x) 2x3 )):(λx 6x2 )
(D (lambda (x) 3x4 ))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
54 / 73
Monovariant Flow Analysis: 0-CFA
(define (D f:(λx 2x3 ) ∪ (λx 3x4 ))
. . .:(λx 6x2 ) ∪ (λx 12x3 ))
(D (lambda (x) 2x3 )):(λx 6x2 )
(D (lambda (x) 3x4 ))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
54 / 73
Monovariant Flow Analysis: 0-CFA
(define (D f:(λx 2x3 ) ∪ (λx 3x4 ))
. . .:(λx 6x2 ) ∪ (λx 12x3 ))
(D (lambda (x) 2x3 )):(λx 6x2 )
(D (lambda (x) 3x4 )):(λx 12x3 )
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
54 / 73
Monovariant Flow Analysis: 0-CFA
(define (D f:(λx 2x3 ) ∪ (λx 3x4 ))
. . .:(λx 6x2 ) ∪ (λx 12x3 ))
(D (lambda (x) 2x3 )):(λx 6x2 ) ∪ (λx 12x3 )
(D (lambda (x) 3x4 )):(λx 6x2 ) ∪ (λx 12x3 )
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
54 / 73
Monovariant Flow Analysis: 0-CFA
(define (D f:(λx 2x3 ) ∪ (λx 3x4 ))
. . .:(λx 6x2 ) ∪ (λx 12x3 ))
(D (lambda (x) 2x3 )):(λx 6x2 ) ∪ (λx 12x3 )
(D (lambda (x) 3x4 )):(λx 6x2 ) ∪ (λx 12x3 )
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
54 / 73
Monovariant Flow Analysis: 0-CFA
(define (D f)
. . .)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
54 / 73
Monovariant Flow Analysis: 0-CFA
(define (D f)
. . .)
(D (D (lambda (x) e2x )))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
54 / 73
Monovariant Flow Analysis: 0-CFA
(define (D f:(λx e2x ))
. . .)
(D (D (lambda (x) e2x )))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
54 / 73
Monovariant Flow Analysis: 0-CFA
(define (D f:(λx e2x ))
. . .:(λx 2e2x ))
(D (D (lambda (x) e2x )))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
54 / 73
Monovariant Flow Analysis: 0-CFA
(define (D f:(λx e2x ))
. . .:(λx 2e2x ))
(D (D (lambda (x) e2x )):(λx 2e2x ))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
54 / 73
Monovariant Flow Analysis: 0-CFA
(define (D f:(λx e2x ) ∪ (λx 2e2x ))
. . .:(λx 2e2x ))
(D (D (lambda (x) e2x )):(λx 2e2x ))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
54 / 73
Monovariant Flow Analysis: 0-CFA
(define (D f:(λx e2x ) ∪ (λx 2e2x ))
. . .:(λx 2e2x ) ∪ (λx 4e2x ))
(D (D (lambda (x) e2x )):(λx 2e2x ))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
54 / 73
Monovariant Flow Analysis: 0-CFA
(define (D f:(λx e2x ) ∪ (λx 2e2x ))
. . .:(λx 2e2x ) ∪ (λx 4e2x ))
(D (D (lambda (x) e2x )):(λx 2e2x ) ∪ (λx 4e2x ))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
54 / 73
Monovariant Flow Analysis: 0-CFA
(define (D f:(λx e2x ) ∪ (λx 2e2x ) ∪ (λx 4e2x ))
. . .:(λx 2e2x ) ∪ (λx 4e2x ))
(D (D (lambda (x) e2x )):(λx 2e2x ) ∪ (λx 4e2x ))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
54 / 73
Monovariant Flow Analysis: 0-CFA
(define (D f:(λx e2x ) ∪ (λx 2e2x ) ∪ (λx 4e2x ) ∪ . . .)
. . .:(λx 2e2x ) ∪ (λx 4e2x ) ∪ . . .)
(D (D (lambda (x) e2x )):(λx 2e2x ) ∪ (λx 4e2x ) ∪ . . .)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
54 / 73
Polyvariant Flow Analysis: k-CFA
with Bounded Context Sensitivity
(define (D f) . . .)
Shivers, III, O. G. (1991). Control-Flow Analysis of Higher-Order Languages
or Taming Lambda. Ph.D. thesis, CMU.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
55 / 73
Polyvariant Flow Analysis: k-CFA
with Bounded Context Sensitivity
(define (D f) . . .)
(define (g . . .) . . . (D (lambda (x) 2x3 )) . . .)
Shivers, III, O. G. (1991). Control-Flow Analysis of Higher-Order Languages
or Taming Lambda. Ph.D. thesis, CMU.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
55 / 73
Polyvariant Flow Analysis: k-CFA
with Bounded Context Sensitivity
(define (Dg f) . . .)
(define (g . . .) . . . (D (lambda (x) 2x3 )) . . .)
Shivers, III, O. G. (1991). Control-Flow Analysis of Higher-Order Languages
or Taming Lambda. Ph.D. thesis, CMU.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
55 / 73
Polyvariant Flow Analysis: k-CFA
with Bounded Context Sensitivity
(define (Dg f:(λx 2x3 )) . . .)
(define (g . . .) . . . (D (lambda (x) 2x3 )) . . .)
Shivers, III, O. G. (1991). Control-Flow Analysis of Higher-Order Languages
or Taming Lambda. Ph.D. thesis, CMU.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
55 / 73
Polyvariant Flow Analysis: k-CFA
with Bounded Context Sensitivity
(define (Dg f:(λx 2x3 )) . . .:(λx 6x2 ))
(define (g . . .) . . . (D (lambda (x) 2x3 )) . . .)
Shivers, III, O. G. (1991). Control-Flow Analysis of Higher-Order Languages
or Taming Lambda. Ph.D. thesis, CMU.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
55 / 73
Polyvariant Flow Analysis: k-CFA
with Bounded Context Sensitivity
(define (Dg f:(λx 2x3 )) . . .:(λx 6x2 ))
(define (g . . .) . . . (D (lambda (x) 2x3 )):(λx 6x2 ) . . .)
Shivers, III, O. G. (1991). Control-Flow Analysis of Higher-Order Languages
or Taming Lambda. Ph.D. thesis, CMU.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
55 / 73
Polyvariant Flow Analysis: k-CFA
with Bounded Context Sensitivity
(define (Dg f:(λx 2x3 )) . . .:(λx 6x2 ))
(define (g . . .) . . . (D (lambda (x) 2x3 )):(λx 6x2 ) . . .)
(define (h . . .) . . . (D (lambda (x) 3x4 )) . . .)
Shivers, III, O. G. (1991). Control-Flow Analysis of Higher-Order Languages
or Taming Lambda. Ph.D. thesis, CMU.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
55 / 73
Polyvariant Flow Analysis: k-CFA
with Bounded Context Sensitivity
(define (Dg f:(λx 2x3 )) . . .:(λx 6x2 ))
(define (Dh f) . . .)
(define (g . . .) . . . (D (lambda (x) 2x3 )):(λx 6x2 ) . . .)
(define (h . . .) . . . (D (lambda (x) 3x4 )) . . .)
Shivers, III, O. G. (1991). Control-Flow Analysis of Higher-Order Languages
or Taming Lambda. Ph.D. thesis, CMU.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
55 / 73
Polyvariant Flow Analysis: k-CFA
with Bounded Context Sensitivity
(define (Dg f:(λx 2x3 )) . . .:(λx 6x2 ))
(define (Dh f:(λx 3x4 )) . . .)
(define (g . . .) . . . (D (lambda (x) 2x3 )):(λx 6x2 ) . . .)
(define (h . . .) . . . (D (lambda (x) 3x4 )) . . .)
Shivers, III, O. G. (1991). Control-Flow Analysis of Higher-Order Languages
or Taming Lambda. Ph.D. thesis, CMU.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
55 / 73
Polyvariant Flow Analysis: k-CFA
with Bounded Context Sensitivity
(define (Dg f:(λx 2x3 )) . . .:(λx 6x2 ))
(define (Dh f:(λx 3x4 )) . . .:(λx 12x3 ))
(define (g . . .) . . . (D (lambda (x) 2x3 )):(λx 6x2 ) . . .)
(define (h . . .) . . . (D (lambda (x) 3x4 )) . . .)
Shivers, III, O. G. (1991). Control-Flow Analysis of Higher-Order Languages
or Taming Lambda. Ph.D. thesis, CMU.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
55 / 73
Polyvariant Flow Analysis: k-CFA
with Bounded Context Sensitivity
(define (Dg f:(λx 2x3 )) . . .:(λx 6x2 ))
(define (Dh f:(λx 3x4 )) . . .:(λx 12x3 ))
(define (g . . .) . . . (D (lambda (x) 2x3 )):(λx 6x2 ) . . .)
(define (h . . .) . . . (D (lambda (x) 3x4 )):(λx 12x3 ) . . .)
Shivers, III, O. G. (1991). Control-Flow Analysis of Higher-Order Languages
or Taming Lambda. Ph.D. thesis, CMU.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
55 / 73
Polyvariant Flow Analysis: k-CFA
with Bounded Context Sensitivity
(define ((compose n f) x)
(if (zero? n) x ((compose (- n 1) f) (f x))))
Shivers, III, O. G. (1991). Control-Flow Analysis of Higher-Order Languages
or Taming Lambda. Ph.D. thesis, CMU.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
55 / 73
Polyvariant Flow Analysis: k-CFA
with Bounded Context Sensitivity
(define ((compose n f) x)
(if (zero? n) x ((compose (- n 1) f) (f x))))
((compose k D) g)
Shivers, III, O. G. (1991). Control-Flow Analysis of Higher-Order Languages
or Taming Lambda. Ph.D. thesis, CMU.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
55 / 73
Polyvariant Flow Analysis: k-CFA
with Bounded Context Sensitivity
(define ((compose n f) x)
(if (zero? n) x ((compose (- n 1) f) (f x))))
((compose k D) g)
(define (Dcompose f:g) . . .)
Shivers, III, O. G. (1991). Control-Flow Analysis of Higher-Order Languages
or Taming Lambda. Ph.D. thesis, CMU.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
55 / 73
Polyvariant Flow Analysis: k-CFA
with Bounded Context Sensitivity
(define ((compose n f) x)
(if (zero? n) x ((compose (- n 1) f) (f x))))
((compose k D) g)
(define (Dcompose f:g) . . .)
(define (Dcompose:compose f:g′ ) . . .)
Shivers, III, O. G. (1991). Control-Flow Analysis of Higher-Order Languages
or Taming Lambda. Ph.D. thesis, CMU.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
55 / 73
Polyvariant Flow Analysis: k-CFA
with Bounded Context Sensitivity
(define ((compose n f) x)
(if (zero? n) x ((compose (- n 1) f) (f x))))
((compose k D) g)
(define (Dcompose f:g) . . .)
(define (Dcompose:compose f:g′ ) . . .)
(define (Dcompose:compose:compose f:g′′ ) . . .)
Shivers, III, O. G. (1991). Control-Flow Analysis of Higher-Order Languages
or Taming Lambda. Ph.D. thesis, CMU.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
55 / 73
Polyvariant Flow Analysis: k-CFA
with Bounded Context Sensitivity
(define ((compose n f) x)
(if (zero? n) x ((compose (- n 1) f) (f x))))
((compose k D) g)
(define (Dcompose f:g) . . .)
(define (Dcompose:compose f:g′ ) . . .)
(define (Dcompose:compose:compose f:g′′ ) . . .)
.
.
.
Shivers, III, O. G. (1991). Control-Flow Analysis of Higher-Order Languages
or Taming Lambda. Ph.D. thesis, CMU.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
55 / 73
Polyvariant Flow Analysis
with Unbounded Context Sensitivity
E :e×σ →v
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
56 / 73
Polyvariant Flow Analysis
with Unbounded Context Sensitivity
E :e×σ →v
v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei
σ ::= {x1 7→ v1 , . . .}
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
56 / 73
Polyvariant Flow Analysis
with Unbounded Context Sensitivity
E :e×σ →v
v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei
σ ::= {x1 7→ v1 , . . .}
v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei | R
σ ::= {x1 7→ v1 , . . .}
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
56 / 73
Polyvariant Flow Analysis
with Unbounded Context Sensitivity
E :e×σ →v
v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei
σ ::= {x1 7→ v1 , . . .}
E :e×σ →v
v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei | R
σ ::= {x1 7→ v1 , . . .}
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
56 / 73
Polyvariant Flow Analysis
with Unbounded Context Sensitivity
E :e×σ →v
v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei
σ ::= {x1 7→ v1 , . . .}
E :e×σ →v
v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei | R
σ ::= {x1 7→ v1 , . . .}
Memoize E indexed (by suitable equivalence relations on) e and σ.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
56 / 73
Polyvariant Flow Analysis
with Unbounded Context Sensitivity
E :e×σ →v
v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei
σ ::= {x1 7→ v1 , . . .}
E :e×σ →v
v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei | R
σ ::= {x1 7→ v1 , . . .}
Memoize E indexed (by suitable equivalence relations on) e and σ.
Not suitable for arbitrary (i.e., typical S CHEME, ML, H ASKELL, etc.) programs.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
56 / 73
Polyvariant Flow Analysis
with Unbounded Context Sensitivity
E :e×σ →v
v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei
σ ::= {x1 7→ v1 , . . .}
E :e×σ →v
v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei | R
σ ::= {x1 7→ v1 , . . .}
Memoize E indexed (by suitable equivalence relations on) e and σ.
Not suitable for arbitrary (i.e., typical S CHEME, ML, H ASKELL, etc.) programs.
Is suitable for F ORTRAN-like programs.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
56 / 73
Polyvariant Flow Analysis
with Unbounded Context Sensitivity
E :e×σ →v
v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei
σ ::= {x1 7→ v1 , . . .}
E :e×σ →v
v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei | R
σ ::= {x1 7→ v1 , . . .}
Memoize E indexed (by suitable equivalence relations on) e and σ.
Not suitable for arbitrary (i.e., typical S CHEME, ML, H ASKELL, etc.) programs.
Is suitable for F ORTRAN-like programs.
Necessary for migrating reflective source-code transformation to compile time.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
56 / 73
Polyvariant Flow Analysis
with Unbounded Context Sensitivity
E :e×σ →v
v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei
σ ::= {x1 7→ v1 , . . .}
E :e×σ →v
v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei | R
σ ::= {x1 7→ v1 , . . .}
Memoize E indexed (by suitable equivalence relations on) e and σ.
Not suitable for arbitrary (i.e., typical S CHEME, ML, H ASKELL, etc.) programs.
Is suitable for F ORTRAN-like programs.
Necessary for migrating reflective source-code transformation to compile time.
Side benefit: union-free
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
56 / 73
Polyvariant Flow Analysis
with Unbounded Context Sensitivity
E :e×σ →v
v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei
σ ::= {x1 7→ v1 , . . .}
E :e×σ →v
v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei | R
σ ::= {x1 7→ v1 , . . .}
Memoize E indexed (by suitable equivalence relations on) e and σ.
Not suitable for arbitrary (i.e., typical S CHEME, ML, H ASKELL, etc.) programs.
Is suitable for F ORTRAN-like programs.
Necessary for migrating reflective source-code transformation to compile time.
Side benefit: union-free
No tags, tag checking, tag dispatching, indirect calls
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
56 / 73
Polyvariant Flow Analysis
with Unbounded Context Sensitivity
E :e×σ →v
v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei
σ ::= {x1 7→ v1 , . . .}
E :e×σ →v
v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei | R
σ ::= {x1 7→ v1 , . . .}
Memoize E indexed (by suitable equivalence relations on) e and σ.
Not suitable for arbitrary (i.e., typical S CHEME, ML, H ASKELL, etc.) programs.
Is suitable for F ORTRAN-like programs.
Necessary for migrating reflective source-code transformation to compile time.
Side benefits: union-free, no cyclic abstract values
No tags, tag checking, tag dispatching, indirect calls
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
56 / 73
Polyvariant Flow Analysis
with Unbounded Context Sensitivity
E :e×σ →v
v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei
σ ::= {x1 7→ v1 , . . .}
E :e×σ →v
v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei | R
σ ::= {x1 7→ v1 , . . .}
Memoize E indexed (by suitable equivalence relations on) e and σ.
Not suitable for arbitrary (i.e., typical S CHEME, ML, H ASKELL, etc.) programs.
Is suitable for F ORTRAN-like programs.
Necessary for migrating reflective source-code transformation to compile time.
Side benefits: union-free, no cyclic abstract values
No tags, tag checking, tag dispatching, indirect calls
Allows complete unboxing: no allocation, reclamation, indirection
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
56 / 73
Game Theory
b1 . . .
a1
..
.
A
ai
..
.
B
bj
. . . bn
..
.
.
. . . PAYOFF(ai , bj ) . . .
..
..
.
.
..
am
von Neumann, J. and Morgenstern, O. (1944). Theory of Games and
Economic Behavior. Princeton University Press, Princeton, NJ.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
57 / 73
Game Theory
b1 . . .
a1
..
.
A
ai
..
.
B
bj
. . . bn
..
.
.
. . . PAYOFF(ai , bj ) . . .
..
..
.
.
..
am
max min PAYOFF(a, b)
a∈A b∈B
von Neumann, J. and Morgenstern, O. (1944). Theory of Games and
Economic Behavior. Princeton University Press, Princeton, NJ.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
57 / 73
Game Theory
...
Rm
..
.
a
..
.
Rn
b
...
..
.
.
. . . PAYOFF(a, b) . . .
..
..
.
.
..
max min PAYOFF(a, b)
a∈Rm b∈Rn
von Neumann, J. and Morgenstern, O. (1944). Theory of Games and
Economic Behavior. Princeton University Press, Princeton, NJ.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
57 / 73
Cathode Ray Tubes
Path of Charged Particle
10
potential: p(x; w)
=
8
6
4
ẍ(t)
=
− ∇x p(x)|x=x(t)
ẋ(t + ∆t)
=
ẋ(t) + ∆t ẍ(t)
x(t + ∆t)
=
x(t) + ∆t ẋ(t)
When: x1 (t + ∆t)
≤
0
let: ∆tf
=
−x1 (t)/ẋ1 (t)
tf
=
t + ∆tf
x(tf )
=
x(t) + ∆tf ẋ(t)
Error: E(w)
=
2
(0)
w(1)=0
w =-0.272
(2)
w(3)=-0.267
w(4)=-0.266
w =-0.266
0
0
2
4
6
8
kx − (10, 10 − w)k−1 + kx − (10, 0)k−1
Find:
x0 (tf )2
argmin E(w)
w
10
Sprague, C. S. and George, R. H. (1939). Cathode Ray Deflecting Electrode.
US Patent 2,161,437.
George, R. H. (1940). Cathode Ray Tube. US Patent 2,222,942.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
58 / 73
Performance Comparison
VLAD
S TALIN∇
F ORTRAN
ADIFOR
C ++
ML
H ASKELL
S CHEME
TAPENADE
FADBAD ++
MLTON
OC AML
SML / NJ
GHC
B IGLOO
C HICKEN
G AMBIT
I KARUS
L ARCENY
MIT S CHEME
MZC
M Z S CHEME
S CHEME ->C
SCMUTILS
S TALIN
FF
1.00
1.52
3.40
65.69
53.89
160.50
106.21
165.22
505.90
1120.37
444.13
192.07
726.59
1472.26
2073.26
2344.70
391.42
3321.20
208.10
particle
FR
RF
1.00
1.00
RR
1.00
88.88
340.35
182.45
16.08
147.91
105.04
28.06
263.66
185.15
761.40
2026.31
752.63
312.28
1108.18
2500.00
3434.64
4076.16
605.26
104.81
425.60
138.34
61.79
144.55
309.66
340.30
409.95
109.77
228.56
1872.85
256.30
114.87
270.14
591.36
655.83
843.68
198.43
366.08
51.84
91.86
FF
1.00
2.07
2.56
22.44
40.39
107.71
84.38
121.18
423.69
889.58
362.65
158.88
571.81
1243.26
2436.26
2000.89
324.95
2800.71
166.96
saddle
FR
RF
1.00
1.00
RR
1.00
51.21
156.33
106.01
1.86
6.75
3.55
2.67
13.51
6.31
440.25
1144.65
420.48
205.97
613.65
1428.57
1996.40
2332.43
328.84
15.77
35.73
14.08
6.75
19.14
51.36
72.45
80.78
12.74
24.59
68.94
23.87
11.40
29.77
79.10
150.02
134.00
18.28
212.93
7.68
11.40
not implemented but could implement
not implemented in existing tool
can’t implement
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
59 / 73
Gradient-Based Optimization
(define (e i n)
(if (zero? n)
’()
(cons (if (zero? i) 1.0 0.0)
(e (- i 1) (- n 1)))))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
60 / 73
Gradient-Based Optimization
(define (e i n)
(if (zero? n)
’()
(cons (if (zero? i) 1.0 0.0)
(e (- i 1) (- n 1)))))
(define ((gradient f) x)
(let ((n (length x)))
(map (lambda (i) (tangent ((j* f) (bundle x (e i n)))))
(iota n))))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
60 / 73
Gradient-Based Optimization
(define (e i n)
(if (zero? n)
’()
(cons (if (zero? i) 1.0 0.0)
(e (- i 1) (- n 1)))))
(define ((gradient f) x)
(let ((n (length x)))
(map (lambda (i) (tangent ((j* f) (bundle x (e i n)))))
(iota n))))
(define (gradient-ascent f x0 n eta)
(if (zero? n)
(list x0 (f x0) ((gradient f) x0))
(gradient-ascent f
(zip (lambda (xi gi) (+ xi (* eta gi)))
x0
((gradient f) x0))
(- n 1)
eta)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
60 / 73
Gradient-Based Optimization
(define ((gradient f) x) (cdr ((cdr ((*j f) (*j x))) 1.0)))
(define (gradient-ascent f x0 n eta)
(if (zero? n)
(list x0 (f x0) ((gradient f) x0))
(gradient-ascent f
(zip (lambda (xi gi) (+ xi (* eta gi)))
x0
((gradient f) x0))
(- n 1)
eta)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
60 / 73
Neural Networks
p
r
q
p
0
0
1
1
q
0
1
0
1
r
0
1
1
0
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning
representations by back-propagating errors. Nature, 323:533–6.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
61 / 73
Neural Networks in VLAD
(define ((sum-activities activities) bias ws)
((fold + bias) (zip * ws activities)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
62 / 73
Neural Networks in VLAD
(define ((sum-activities activities) bias ws)
((fold + bias) (zip * ws activities)))
(define (sum-layer activities ws-layer)
(map (sum-activities activities) ws-layer))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
62 / 73
Neural Networks in VLAD
(define ((sum-activities activities) bias ws)
((fold + bias) (zip * ws activities)))
(define (sum-layer activities ws-layer)
(map (sum-activities activities) ws-layer))
(define (sigmoid x) (/ 1 (+ (exp (- 0 x)) 1)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
62 / 73
Neural Networks in VLAD
(define ((sum-activities activities) bias ws)
((fold + bias) (zip * ws activities)))
(define (sum-layer activities ws-layer)
(map (sum-activities activities) ws-layer))
(define (sigmoid x) (/ 1 (+ (exp (- 0 x)) 1)))
(define ((forward-pass ws-layers) in)
(if (null? ws-layers)
in
((forward-pass (cdr ws-layers))
(map sigmoid (sum-layer in (car ws-layers))))))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
62 / 73
Neural Networks in VLAD
(define ((sum-activities activities) bias ws)
((fold + bias) (zip * ws activities)))
(define (sum-layer activities ws-layer)
(map (sum-activities activities) ws-layer))
(define (sigmoid x) (/ 1 (+ (exp (- 0 x)) 1)))
(define ((forward-pass ws-layers) in)
(if (null? ws-layers)
in
((forward-pass (cdr ws-layers))
(map sigmoid (sum-layer in (car ws-layers))))))
(define ((error-on-dataset dataset) ws-layers)
((fold + 0)
(map (lambda ((list in target))
(* 0.5 (magnitude-squared (v- ((forward-pass ws-layers) in) target))))
dataset)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
62 / 73
Neural Networks in VLAD
(define ((sum-activities activities) bias ws)
((fold + bias) (zip * ws activities)))
(define (sum-layer activities ws-layer)
(map (sum-activities activities) ws-layer))
(define (sigmoid x) (/ 1 (+ (exp (- 0 x)) 1)))
(define ((forward-pass ws-layers) in)
(if (null? ws-layers)
in
((forward-pass (cdr ws-layers))
(map sigmoid (sum-layer in (car ws-layers))))))
(define ((error-on-dataset dataset) ws-layers)
((fold + 0)
(map (lambda ((list in target))
(* 0.5 (magnitude-squared (v- ((forward-pass ws-layers) in) target))))
dataset)))
(gradient-descent (error-on-dataset ’(((0 0) (0))
((0 1) (1))
((1 0) (1))
((1 1) (0))))
’(((0 -0.284227 1.16054) (0 0.617194 1.30467))
((0 -0.084395 0.648461)))
1000.0
0.3)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
62 / 73
Performance Comparison
VLAD
S TALIN∇
F ORTRAN
ADIFOR
TAPENADE
C
ADIC
C ++
ADOL – C
C PPAD
FADBAD ++
ML
MLTON
OC AML
SML / NJ
H ASKELL
S CHEME
GHC
B IGLOO
C HICKEN
G AMBIT
I KARUS
L ARCENY
MIT S CHEME
MZC
M Z S CHEME
S CHEME ->C
SCMUTILS
S TALIN
backprop
Fs
Fv
1.00
11.84
2.68
11.35
4.33
16.33
3.93
12.34
3.89
42.15
98.96 33.15
73.94
157.75
142.71
577.45
1391.75
545.20
216.42
955.98
1900.04
2439.93
3477.86
484.24
4544.48
832.68
R
1.00
6.24
35.53
23.69
53.03
37.94
149.14
94.97
306.60
971.91
341.73
147.49
486.64
1141.22
1571.52
1866.28
233.75
367.84
not implemented but could implement
not implemented in existing tool
can’t implement
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
63 / 73
Probabilistic Lambda Calculus
P = if x0 then 0 else if x1 then 1 else 2
Koller, D., McAllester, D. , and Pfeffer, A. (1997). Effective Bayesian
Inference for Stochastic Programs. Proceedings of the 14th National
Conference on Artificial Intelligence (AAAI), pp. 740–7.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
64 / 73
Probabilistic Lambda Calculus
P = if x0 then 0 else if x1 then 1 else 2
Pr(x0 →
7 true) = p0
Pr(x1 →
7 true) = p1
Pr(x0 →
7 false) = 1 − p0
Pr(x1 →
7 false) = 1 − p1
Koller, D., McAllester, D. , and Pfeffer, A. (1997). Effective Bayesian
Inference for Stochastic Programs. Proceedings of the 14th National
Conference on Artificial Intelligence (AAAI), pp. 740–7.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
64 / 73
Probabilistic Lambda Calculus
P = if x0 then 0 else if x1 then 1 else 2
Pr(x0 →
7 true) = p0
Pr(x1 →
7 true) = p1
Pr(x0 →
7 false) = 1 − p0
Pr(x1 →
7 false) = 1 − p1
Pr(E(P) = 0|p0 , p1 ) = p0
Pr(E(P) = 1|p0 , p1 ) = (1 − p0 )p1
Pr(E(P) = 2|p0 , p1 ) = (1 − p0 )(1 − p1 )
Koller, D., McAllester, D. , and Pfeffer, A. (1997). Effective Bayesian
Inference for Stochastic Programs. Proceedings of the 14th National
Conference on Artificial Intelligence (AAAI), pp. 740–7.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
64 / 73
Probabilistic Lambda Calculus
P = if x0 then 0 else if x1 then 1 else 2
Pr(x0 →
7 true) = p0
Pr(x1 →
7 true) = p1
Pr(x0 →
7 false) = 1 − p0
Pr(x1 →
7 false) = 1 − p1
Pr(E(P) = 0|p0 , p1 ) = p0
Pr(E(P) = 1|p0 , p1 ) = (1 − p0 )p1
Pr(E(P) = 2|p0 , p1 ) = (1 − p0 )(1 − p1 )
Y
Pr(E(P) = v|p0 , p1 ) = p0 (1 − p0 )3 p1 (1 − p1 )2
v∈{0,1,2,2}
Koller, D., McAllester, D. , and Pfeffer, A. (1997). Effective Bayesian
Inference for Stochastic Programs. Proceedings of the 14th National
Conference on Artificial Intelligence (AAAI), pp. 740–7.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
64 / 73
Probabilistic Lambda Calculus
P = if x0 then 0 else if x1 then 1 else 2
Pr(x0 →
7 true) = p0
Pr(x1 →
7 true) = p1
Pr(x0 →
7 false) = 1 − p0
Pr(x1 →
7 false) = 1 − p1
Pr(E(P) = 0|p0 , p1 ) = p0
Pr(E(P) = 1|p0 , p1 ) = (1 − p0 )p1
Pr(E(P) = 2|p0 , p1 ) = (1 − p0 )(1 − p1 )
Y
Pr(E(P) = v|p0 , p1 ) = p0 (1 − p0 )3 p1 (1 − p1 )2
v∈{0,1,2,2}
argmax
p0 ,p1
Y
Pr(E(P) = v|p0 , p1 ) =
v∈{0,1,2,2}
1 1
,
4 3
Koller, D., McAllester, D. , and Pfeffer, A. (1997). Effective Bayesian
Inference for Stochastic Programs. Proceedings of the 14th National
Conference on Artificial Intelligence (AAAI), pp. 740–7.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
64 / 73
Probabilistic Prolog
p(0).
p(X):-q(X).
q(1).
q(2).
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
65 / 73
Probabilistic Prolog
Pr(p(0).) = p0
Pr(p(X):-q(X).) = 1 − p0
Pr(q(1).) = p1
Pr(q(2).) = 1 − p1
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
65 / 73
Probabilistic Prolog
Pr(p(0).) = p0
Pr(p(X):-q(X).) = 1 − p0
Pr(q(1).) = p1
Pr(q(2).) = 1 − p1
Pr(?-p(0).) = p0
Pr(?-p(1).) = (1 − p0 )p1
Pr(?-p(2).) = (1 − p0 )(1 − p1 )
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
65 / 73
Probabilistic Prolog
Pr(p(0).) = p0
Pr(p(X):-q(X).) = 1 − p0
Pr(q(1).) = p1
Pr(q(2).) = 1 − p1
Pr(?-p(0).) = p0
Pr(?-p(1).) = (1 − p0 )p1
Pr(?-p(2).) = (1 − p0 )(1 − p1 )
Y
Pr(?-q.) = p0 (1 − p0 )3 p1 (1 − p1 )2
q∈{p(0),p(1),p(2),p(2)}
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
65 / 73
Probabilistic Prolog
Pr(p(0).) = p0
Pr(p(X):-q(X).) = 1 − p0
Pr(q(1).) = p1
Pr(q(2).) = 1 − p1
Pr(?-p(0).) = p0
Pr(?-p(1).) = (1 − p0 )p1
Pr(?-p(2).) = (1 − p0 )(1 − p1 )
Y
Pr(?-q.) = p0 (1 − p0 )3 p1 (1 − p1 )2
q∈{p(0),p(1),p(2),p(2)}
argmax
p0 ,p1
Y
Pr(?-q.) =
q∈{p(0),p(1),p(2),p(2)}
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
1 1
,
4 3
Lecture Circuit 2009
65 / 73
Probabilistic Lambda Calculus
(define (evaluate expression environment)
(cond
((constant-expression? expression)
(singleton-tagged-distribution
(constant-expression-value expression)))
((variable-access-expression? expression)
(lookup-value
(variable-access-expression-variable expression) environment))
((lambda-expression? expression)
(singleton-tagged-distribution
(lambda (tagged-distribution)
(evaluate
(lambda-expression-body expression)
(cons (make-binding (lambda-expression-variable expression)
tagged-distribution)
environment)))))
(else (let ((tagged-distribution
(evaluate (application-argument expression)
environment)))
(map-tagged-distribution
(lambda (value) (value tagged-distribution))
(evaluate (application-callee expression) environment))))))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
66 / 73
Probabilistic Lambda Calculus
(define (evaluate expression environment)
(cond
((constant-expression? expression)
(singleton-tagged-distribution
(constant-expression-value expression)))
((variable-access-expression? expression)
(lookup-value
(variable-access-expression-variable expression) environment))
((lambda-expression? expression)
(singleton-tagged-distribution
(lambda (tagged-distribution)
(evaluate
(lambda-expression-body expression)
(cons (make-binding (lambda-expression-variable expression)
tagged-distribution)
environment)))))
(else (let ((tagged-distribution
(evaluate (application-argument expression)
environment)))
(map-tagged-distribution
(lambda (value) (value tagged-distribution))
(evaluate (application-callee expression) environment))))))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
66 / 73
Probabilistic Lambda Calculus
(define (evaluate expression environment)
(cond
((constant-expression? expression)
(singleton-tagged-distribution
(constant-expression-value expression)))
((variable-access-expression? expression)
(lookup-value
(variable-access-expression-variable expression) environment))
((lambda-expression? expression)
(singleton-tagged-distribution
(lambda (tagged-distribution)
(evaluate
(lambda-expression-body expression)
(cons (make-binding (lambda-expression-variable expression)
tagged-distribution)
environment)))))
(else (let ((tagged-distribution
(evaluate (application-argument expression)
environment)))
(map-tagged-distribution
(lambda (value) (value tagged-distribution))
(evaluate (application-callee expression) environment))))))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
66 / 73
Probabilistic Lambda Calculus
(define (evaluate expression environment)
(cond
((constant-expression? expression)
(singleton-tagged-distribution
(constant-expression-value expression)))
((variable-access-expression? expression)
(lookup-value
(variable-access-expression-variable expression) environment))
((lambda-expression? expression)
(singleton-tagged-distribution
(lambda (tagged-distribution)
(evaluate
(lambda-expression-body expression)
(cons (make-binding (lambda-expression-variable expression)
tagged-distribution)
environment)))))
(else (let ((tagged-distribution
(evaluate (application-argument expression)
environment)))
(map-tagged-distribution
(lambda (value) (value tagged-distribution))
(evaluate (application-callee expression) environment))))))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
66 / 73
Probabilistic Lambda Calculus
(define (evaluate expression environment)
(cond
((constant-expression? expression)
(singleton-tagged-distribution
(constant-expression-value expression)))
((variable-access-expression? expression)
(lookup-value
(variable-access-expression-variable expression) environment))
((lambda-expression? expression)
(singleton-tagged-distribution
(lambda (tagged-distribution)
(evaluate
(lambda-expression-body expression)
(cons (make-binding (lambda-expression-variable expression)
tagged-distribution)
environment)))))
(else (let ((tagged-distribution
(evaluate (application-argument expression)
environment)))
(map-tagged-distribution
(lambda (value) (value tagged-distribution))
(evaluate (application-callee expression) environment))))))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
66 / 73
Probabilistic Lambda Calculus
(define (evaluate expression environment)
(cond
((constant-expression? expression)
(singleton-tagged-distribution
(constant-expression-value expression)))
((variable-access-expression? expression)
(lookup-value
(variable-access-expression-variable expression) environment))
((lambda-expression? expression)
(singleton-tagged-distribution
(lambda (tagged-distribution)
(evaluate
(lambda-expression-body expression)
(cons (make-binding (lambda-expression-variable expression)
tagged-distribution)
environment)))))
(else (let ((tagged-distribution
(evaluate (application-argument expression)
environment)))
(map-tagged-distribution
(lambda (value) (value tagged-distribution))
(evaluate (application-callee expression) environment))))))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
66 / 73
Probabilistic Lambda Calculus
(define (evaluate expression environment)
(cond
((constant-expression? expression)
(singleton-tagged-distribution
(constant-expression-value expression)))
((variable-access-expression? expression)
(lookup-value
(variable-access-expression-variable expression) environment))
((lambda-expression? expression)
(singleton-tagged-distribution
(lambda (tagged-distribution)
(evaluate
(lambda-expression-body expression)
(cons (make-binding (lambda-expression-variable expression)
tagged-distribution)
environment)))))
(else (let ((tagged-distribution
(evaluate (application-argument expression)
environment)))
(map-tagged-distribution
(lambda (value) (value tagged-distribution))
(evaluate (application-callee expression) environment))))))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
66 / 73
Probabilistic Lambda Calculus
(define (evaluate expression environment)
(cond
((constant-expression? expression)
(singleton-tagged-distribution
(constant-expression-value expression)))
((variable-access-expression? expression)
(lookup-value
(variable-access-expression-variable expression) environment))
((lambda-expression? expression)
(singleton-tagged-distribution
(lambda (tagged-distribution)
(evaluate
(lambda-expression-body expression)
(cons (make-binding (lambda-expression-variable expression)
tagged-distribution)
environment)))))
(else (let ((tagged-distribution
(evaluate (application-argument expression)
environment)))
(map-tagged-distribution
(lambda (value) (value tagged-distribution))
(evaluate (application-callee expression) environment))))))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
66 / 73
Probabilistic Lambda Calculus
(define (evaluate expression environment)
(cond
((constant-expression? expression)
(singleton-tagged-distribution
(constant-expression-value expression)))
((variable-access-expression? expression)
(lookup-value
(variable-access-expression-variable expression) environment))
((lambda-expression? expression)
(singleton-tagged-distribution
(lambda (tagged-distribution)
(evaluate
(lambda-expression-body expression)
(cons (make-binding (lambda-expression-variable expression)
tagged-distribution)
environment)))))
(else (let ((tagged-distribution
(evaluate (application-argument expression)
environment)))
(map-tagged-distribution
(lambda (value) (value tagged-distribution))
(evaluate (application-callee expression) environment))))))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
66 / 73
Probabilistic Lambda Calculus
(define (evaluate expression environment)
(cond
((constant-expression? expression)
(singleton-tagged-distribution
(constant-expression-value expression)))
((variable-access-expression? expression)
(lookup-value
(variable-access-expression-variable expression) environment))
((lambda-expression? expression)
(singleton-tagged-distribution
(lambda (tagged-distribution)
(evaluate
(lambda-expression-body expression)
(cons (make-binding (lambda-expression-variable expression)
tagged-distribution)
environment)))))
(else (let ((tagged-distribution
(evaluate (application-argument expression)
environment)))
(map-tagged-distribution
(lambda (value) (value tagged-distribution))
(evaluate (application-callee expression) environment))))))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
66 / 73
Probabilistic Lambda Calculus
(gradient-ascent
(lambda (p)
(let ((tagged-distribution
(evaluate if x0 then 0 else if x1 then 1 else 2
(list Pr(x0 7→ true) = p0 Pr(x0 7→ false) = 1 − p0
Pr(x1 7→ true) = p1
Pr(x1 7→ false) = 1 − p1
. . .))))
(map-reduce
*
1.0
(lambda (value)
(likelihood value tagged-distribution))
’(0 1 2 2))))
’(0.5 0.5)
1000.0
0.1)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
67 / 73
Probabilistic Lambda Calculus
(gradient-ascent
(lambda (p)
(let ((tagged-distribution
(evaluate if x0 then 0 else if x1 then 1 else 2
(list Pr(x0 7→ true) = p0 Pr(x0 7→ false) = 1 − p0
Pr(x1 7→ true) = p1
Pr(x1 7→ false) = 1 − p1
. . .))))
(map-reduce
*
1.0
(lambda (value)
(likelihood value tagged-distribution))
’(0 1 2 2))))
’(0.5 0.5)
1000.0
0.1)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
67 / 73
Probabilistic Lambda Calculus
(gradient-ascent
(lambda (p)
(let ((tagged-distribution
(evaluate if x0 then 0 else if x1 then 1 else 2
(list Pr(x0 7→ true) = p0 Pr(x0 7→ false) = 1 − p0
Pr(x1 7→ true) = p1
Pr(x1 7→ false) = 1 − p1
. . .))))
(map-reduce
*
1.0
(lambda (value)
(likelihood value tagged-distribution))
’(0 1 2 2))))
’(0.5 0.5)
1000.0
0.1)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
67 / 73
Probabilistic Lambda Calculus
(gradient-ascent
(lambda (p)
(let ((tagged-distribution
(evaluate if x0 then 0 else if x1 then 1 else 2
(list Pr(x0 7→ true) = p0 Pr(x0 7→ false) = 1 − p0
Pr(x1 7→ true) = p1
Pr(x1 7→ false) = 1 − p1
. . .))))
(map-reduce
*
1.0
(lambda (value)
(likelihood value tagged-distribution))
’(0 1 2 2))))
’(0.5 0.5)
1000.0
0.1)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
67 / 73
Probabilistic Lambda Calculus
(gradient-ascent
(lambda (p)
(let ((tagged-distribution
(evaluate if x0 then 0 else if x1 then 1 else 2
(list Pr(x0 7→ true) = p0 Pr(x0 7→ false) = 1 − p0
Pr(x1 7→ true) = p1
Pr(x1 7→ false) = 1 − p1
. . .))))
(map-reduce
*
1.0
(lambda (value)
(likelihood value tagged-distribution))
’(0 1 2 2))))
’(0.5 0.5)
1000.0
0.1)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
67 / 73
Probabilistic Lambda Calculus
(gradient-ascent
(lambda (p)
(let ((tagged-distribution
(evaluate if x0 then 0 else if x1 then 1 else 2
(list Pr(x0 7→ true) = p0 Pr(x0 7→ false) = 1 − p0
Pr(x1 7→ true) = p1
Pr(x1 7→ false) = 1 − p1
. . .))))
(map-reduce
*
1.0
(lambda (value)
(likelihood value tagged-distribution))
’(0 1 2 2))))
’(0.5 0.5)
1000.0
0.1)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
67 / 73
Probabilistic Lambda Calculus
(gradient-ascent
(lambda (p)
(let ((tagged-distribution
(evaluate if x0 then 0 else if x1 then 1 else 2
(list Pr(x0 7→ true) = p0 Pr(x0 7→ false) = 1 − p0
Pr(x1 7→ true) = p1
Pr(x1 7→ false) = 1 − p1
. . .))))
(map-reduce
*
1.0
(lambda (value)
(likelihood value tagged-distribution))
’(0 1 2 2))))
’(0.5 0.5)
1000.0
0.1)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
67 / 73
Probabilistic Lambda Calculus
(gradient-ascent
(lambda (p)
(let ((tagged-distribution
(evaluate if x0 then 0 else if x1 then 1 else 2
(list Pr(x0 7→ true) = p0 Pr(x0 7→ false) = 1 − p0
Pr(x1 7→ true) = p1
Pr(x1 7→ false) = 1 − p1
. . .))))
(map-reduce
*
1.0
(lambda (value)
(likelihood value tagged-distribution))
’(0 1 2 2))))
’(0.5 0.5)
1000.0
0.1)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
67 / 73
Probabilistic Lambda Calculus
(gradient-ascent
(lambda (p)
(let ((tagged-distribution
(evaluate if x0 then 0 else if x1 then 1 else 2
(list Pr(x0 7→ true) = p0 Pr(x0 7→ false) = 1 − p0
Pr(x1 7→ true) = p1
Pr(x1 7→ false) = 1 − p1
. . .))))
(map-reduce
*
1.0
(lambda (value)
(likelihood value tagged-distribution))
’(0 1 2 2))))
’(0.5 0.5)
1000.0
0.1)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
67 / 73
Probabilistic Lambda Calculus
(gradient-ascent
(lambda (p)
(let ((tagged-distribution
(evaluate if x0 then 0 else if x1 then 1 else 2
(list Pr(x0 7→ true) = p0 Pr(x0 7→ false) = 1 − p0
Pr(x1 7→ true) = p1
Pr(x1 7→ false) = 1 − p1
. . .))))
(map-reduce
*
1.0
(lambda (value)
(likelihood value tagged-distribution))
’(0 1 2 2))))
’(0.5 0.5)
1000.0
0.1)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
67 / 73
Probabilistic Prolog
(define (proof-distribution term clauses)
(let ((offset . . .))
(map-reduce
append
’()
(lambda (clause)
(let ((clause (alpha-rename clause offset)))
(let loop ((p (clause-p clause))
(substitution (unify term (clause-term clause)))
(terms (clause-terms clause)))
(if (boolean? substitution)
’()
(if (null? terms)
(list (make-double p substitution))
(map-reduce
append
’()
(lambda (double)
(loop (* p (double-p double))
(append substitution (double-substitution double))
(rest terms)))
(proof-distribution
(apply-substitution substitution (first terms)) clauses)))))))
clauses)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
68 / 73
Probabilistic Prolog
(define (proof-distribution term clauses)
(let ((offset . . .))
(map-reduce
append
’()
(lambda (clause)
(let ((clause (alpha-rename clause offset)))
(let loop ((p (clause-p clause))
(substitution (unify term (clause-term clause)))
(terms (clause-terms clause)))
(if (boolean? substitution)
’()
(if (null? terms)
(list (make-double p substitution))
(map-reduce
append
’()
(lambda (double)
(loop (* p (double-p double))
(append substitution (double-substitution double))
(rest terms)))
(proof-distribution
(apply-substitution substitution (first terms)) clauses)))))))
clauses)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
68 / 73
Probabilistic Prolog
(define (proof-distribution term clauses)
(let ((offset . . .))
(map-reduce
append
’()
(lambda (clause)
(let ((clause (alpha-rename clause offset)))
(let loop ((p (clause-p clause))
(substitution (unify term (clause-term clause)))
(terms (clause-terms clause)))
(if (boolean? substitution)
’()
(if (null? terms)
(list (make-double p substitution))
(map-reduce
append
’()
(lambda (double)
(loop (* p (double-p double))
(append substitution (double-substitution double))
(rest terms)))
(proof-distribution
(apply-substitution substitution (first terms)) clauses)))))))
clauses)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
68 / 73
Probabilistic Prolog
(define (proof-distribution term clauses)
(let ((offset . . .))
(map-reduce
append
’()
(lambda (clause)
(let ((clause (alpha-rename clause offset)))
(let loop ((p (clause-p clause))
(substitution (unify term (clause-term clause)))
(terms (clause-terms clause)))
(if (boolean? substitution)
’()
(if (null? terms)
(list (make-double p substitution))
(map-reduce
append
’()
(lambda (double)
(loop (* p (double-p double))
(append substitution (double-substitution double))
(rest terms)))
(proof-distribution
(apply-substitution substitution (first terms)) clauses)))))))
clauses)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
68 / 73
Probabilistic Prolog
(define (proof-distribution term clauses)
(let ((offset . . .))
(map-reduce
append
’()
(lambda (clause)
(let ((clause (alpha-rename clause offset)))
(let loop ((p (clause-p clause))
(substitution (unify term (clause-term clause)))
(terms (clause-terms clause)))
(if (boolean? substitution)
’()
(if (null? terms)
(list (make-double p substitution))
(map-reduce
append
’()
(lambda (double)
(loop (* p (double-p double))
(append substitution (double-substitution double))
(rest terms)))
(proof-distribution
(apply-substitution substitution (first terms)) clauses)))))))
clauses)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
68 / 73
Probabilistic Prolog
(define (proof-distribution term clauses)
(let ((offset . . .))
(map-reduce
append
’()
(lambda (clause)
(let ((clause (alpha-rename clause offset)))
(let loop ((p (clause-p clause))
(substitution (unify term (clause-term clause)))
(terms (clause-terms clause)))
(if (boolean? substitution)
’()
(if (null? terms)
(list (make-double p substitution))
(map-reduce
append
’()
(lambda (double)
(loop (* p (double-p double))
(append substitution (double-substitution double))
(rest terms)))
(proof-distribution
(apply-substitution substitution (first terms)) clauses)))))))
clauses)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
68 / 73
Probabilistic Prolog
(define (proof-distribution term clauses)
(let ((offset . . .))
(map-reduce
append
’()
(lambda (clause)
(let ((clause (alpha-rename clause offset)))
(let loop ((p (clause-p clause))
(substitution (unify term (clause-term clause)))
(terms (clause-terms clause)))
(if (boolean? substitution)
’()
(if (null? terms)
(list (make-double p substitution))
(map-reduce
append
’()
(lambda (double)
(loop (* p (double-p double))
(append substitution (double-substitution double))
(rest terms)))
(proof-distribution
(apply-substitution substitution (first terms)) clauses)))))))
clauses)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
68 / 73
Probabilistic Prolog
(define (proof-distribution term clauses)
(let ((offset . . .))
(map-reduce
append
’()
(lambda (clause)
(let ((clause (alpha-rename clause offset)))
(let loop ((p (clause-p clause))
(substitution (unify term (clause-term clause)))
(terms (clause-terms clause)))
(if (boolean? substitution)
’()
(if (null? terms)
(list (make-double p substitution))
(map-reduce
append
’()
(lambda (double)
(loop (* p (double-p double))
(append substitution (double-substitution double))
(rest terms)))
(proof-distribution
(apply-substitution substitution (first terms)) clauses)))))))
clauses)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
68 / 73
Probabilistic Prolog
(define (proof-distribution term clauses)
(let ((offset . . .))
(map-reduce
append
’()
(lambda (clause)
(let ((clause (alpha-rename clause offset)))
(let loop ((p (clause-p clause))
(substitution (unify term (clause-term clause)))
(terms (clause-terms clause)))
(if (boolean? substitution)
’()
(if (null? terms)
(list (make-double p substitution))
(map-reduce
append
’()
(lambda (double)
(loop (* p (double-p double))
(append substitution (double-substitution double))
(rest terms)))
(proof-distribution
(apply-substitution substitution (first terms)) clauses)))))))
clauses)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
68 / 73
Probabilistic Prolog
(define (proof-distribution term clauses)
(let ((offset . . .))
(map-reduce
append
’()
(lambda (clause)
(let ((clause (alpha-rename clause offset)))
(let loop ((p (clause-p clause))
(substitution (unify term (clause-term clause)))
(terms (clause-terms clause)))
(if (boolean? substitution)
’()
(if (null? terms)
(list (make-double p substitution))
(map-reduce
append
’()
(lambda (double)
(loop (* p (double-p double))
(append substitution (double-substitution double))
(rest terms)))
(proof-distribution
(apply-substitution substitution (first terms)) clauses)))))))
clauses)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
68 / 73
Probabilistic Prolog
(define (proof-distribution term clauses)
(let ((offset . . .))
(map-reduce
append
’()
(lambda (clause)
(let ((clause (alpha-rename clause offset)))
(let loop ((p (clause-p clause))
(substitution (unify term (clause-term clause)))
(terms (clause-terms clause)))
(if (boolean? substitution)
’()
(if (null? terms)
(list (make-double p substitution))
(map-reduce
append
’()
(lambda (double)
(loop (* p (double-p double))
(append substitution (double-substitution double))
(rest terms)))
(proof-distribution
(apply-substitution substitution (first terms)) clauses)))))))
clauses)))
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
68 / 73
Probabilistic Prolog
(gradient-ascent
(lambda (p)
(let ((clauses (list Pr(p(0).) = p0
Pr(p(X):-q(X).) = 1 − p0
Pr(q(1).) = p1
Pr(q(2).) = 1 − p1 )))
(map-reduce
*
1.0
(lambda (query)
(likelihood (proof-distribution query clauses)))
’(p(0) p(1) p(2) p(2)))))
’(0.5 0.5)
1000.0
0.1)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
69 / 73
Probabilistic Prolog
(gradient-ascent
(lambda (p)
(let ((clauses (list Pr(p(0).) = p0
Pr(p(X):-q(X).) = 1 − p0
Pr(q(1).) = p1
Pr(q(2).) = 1 − p1 )))
(map-reduce
*
1.0
(lambda (query)
(likelihood (proof-distribution query clauses)))
’(p(0) p(1) p(2) p(2)))))
’(0.5 0.5)
1000.0
0.1)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
69 / 73
Probabilistic Prolog
(gradient-ascent
(lambda (p)
(let ((clauses (list Pr(p(0).) = p0
Pr(p(X):-q(X).) = 1 − p0
Pr(q(1).) = p1
Pr(q(2).) = 1 − p1 )))
(map-reduce
*
1.0
(lambda (query)
(likelihood (proof-distribution query clauses)))
’(p(0) p(1) p(2) p(2)))))
’(0.5 0.5)
1000.0
0.1)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
69 / 73
Probabilistic Prolog
(gradient-ascent
(lambda (p)
(let ((clauses (list Pr(p(0).) = p0
Pr(p(X):-q(X).) = 1 − p0
Pr(q(1).) = p1
Pr(q(2).) = 1 − p1 )))
(map-reduce
*
1.0
(lambda (query)
(likelihood (proof-distribution query clauses)))
’(p(0) p(1) p(2) p(2)))))
’(0.5 0.5)
1000.0
0.1)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
69 / 73
Probabilistic Prolog
(gradient-ascent
(lambda (p)
(let ((clauses (list Pr(p(0).) = p0
Pr(p(X):-q(X).) = 1 − p0
Pr(q(1).) = p1
Pr(q(2).) = 1 − p1 )))
(map-reduce
*
1.0
(lambda (query)
(likelihood (proof-distribution query clauses)))
’(p(0) p(1) p(2) p(2)))))
’(0.5 0.5)
1000.0
0.1)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
69 / 73
Probabilistic Prolog
(gradient-ascent
(lambda (p)
(let ((clauses (list Pr(p(0).) = p0
Pr(p(X):-q(X).) = 1 − p0
Pr(q(1).) = p1
Pr(q(2).) = 1 − p1 )))
(map-reduce
*
1.0
(lambda (query)
(likelihood (proof-distribution query clauses)))
’(p(0) p(1) p(2) p(2)))))
’(0.5 0.5)
1000.0
0.1)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
69 / 73
Probabilistic Prolog
(gradient-ascent
(lambda (p)
(let ((clauses (list Pr(p(0).) = p0
Pr(p(X):-q(X).) = 1 − p0
Pr(q(1).) = p1
Pr(q(2).) = 1 − p1 )))
(map-reduce
*
1.0
(lambda (query)
(likelihood (proof-distribution query clauses)))
’(p(0) p(1) p(2) p(2)))))
’(0.5 0.5)
1000.0
0.1)
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
69 / 73
Generated Code
static void f2679(double a_f2679_0,double a_f2679_1,double a_f2679_2,double a_f2679_3){
int t272381=((a_f2679_2==0.)?0:1);
double t272406;
double t272405;
double t272404;
double t272403;
double t272402;
if((t272381==0)){
double t272480=(1.-a_f2679_0);
double t272572=(1.-a_f2679_1);
double t273043=(a_f2679_0+0.);
double t274185=(t272480*a_f2679_1);
double t274426=(t274185+0.);
double t275653=(t272480*t272572);
double t275894=(t275653+0.);
double t277121=(t272480*t272572);
double t277362=(t277121+0.);
double t277431=(t277362*1.);
double t277436=(t275894*t277431);
double t277441=(t274426*t277436);
double t277446=(t273043*t277441);
...
double t1777107=(t1774696+t1715394);
double t1777194=(0.-t1745420);
double t1778533=(t1777194+t1419700);
t272406=a_f2679_0;
t272405=a_f2679_1;
t272404=t277446;
t272403=t1778533;
t272402=t1777107;}
else {. . .}
r_f2679_0=t272406;
r_f2679_1=t272405;
r_f2679_2=t272404;
r_f2679_3=t272403;
r_f2679_4=t272402;}
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
70 / 73
Performance Comparison
VLAD
ML
H ASKELL
S CHEME
S TALIN∇
MLTON
OC AML
SML / NJ
probabilisticlambda-calculus
F
R
1.00
1.00
106.45
124.95
215.73
538.68
197.75
272.45
probabilisticprolog
F
R
1.00
1.00
789.41
483.47
1207.13
1534.61
2448.02
1471.94
832.92
2305.98
879.88
437.46
1651.01
3491.10
5289.17
6235.78
682.15
6456.99
1240.73
14422.16
66948.70
24316.03
8242.92
25589.62
85819.57
154206.95
166129.12
10530.66
80100.23
22511.79
GHC
B IGLOO
C HICKEN
G AMBIT
I KARUS
L ARCENY
MIT S CHEME
MZC
M Z S CHEME
S CHEME ->C
SCMUTILS
S TALIN
1048.11
3283.00
1153.86
531.10
1673.22
4130.19
5929.14
7134.71
794.31
1137.41
8286.06
37792.84
13649.81
4845.86
14833.53
48335.38
83480.27
91630.70
5980.27
10986.43
not implemented but could implement, including F ORTRAN, C, and C ++
not implemented in existing tool
can’t implement
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
71 / 73
It is, of course, not excluded that the range of arguments or range of values of
a function should consist wholly or partly of functions. The derivative, as this
notion appears in the elementary differential calculus, is a familiar mathematical example of a function for which both ranges consist of functions.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
72 / 73
It is, of course, not excluded that the range of arguments or range of values of
a function should consist wholly or partly of functions. The derivative, as this
notion appears in the elementary differential calculus, is a familiar mathematical example of a function for which both ranges consist of functions.
(¶4)
Church, A. (1941). The Calculi of Lambda Conversion, Princeton University
Press, Princeton, NJ.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
72 / 73
It is, of course, not excluded that the range of arguments or range of values of
a function should consist wholly or partly of functions. The derivative, as this
notion appears in the elementary differential calculus, is a familiar mathematical example of a function for which both ranges consist of functions.
(¶4)
Church, A. (1941). The Calculi of Lambda Conversion, Princeton University
Press, Princeton, NJ.
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
72 / 73
Gottfried Leibniz
Jacob Bernoulli
Johann Bernoulli
Leonhard Euler
Joseph Louis Lagrange
Simeon Poisson
Michel Chasles
Hubert Anson Newton
Eliakim Hastings Moore
Oswald Veblen
Alonzo Church
Jeffrey Mark Siskind (Purdue/ECE)
AD of Functional Programs
Lecture Circuit 2009
73 / 73
Download