Automatic Differentiation of Functional Programs
or
Lambda the Ultimate Calculus

Jeffrey Mark Siskind
qobi@purdue.edu
School of Electrical and Computer Engineering
Purdue University

Lecture Circuit 2009

Part of this talk covers joint work with Barak A. Pearlmutter.

Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 1 / 73


The Essence

    (define (f x) $2x^3$)
    (define (f' x) $6x^2$)
    (define (g x) $\sin f(x)$)
    (define (g' x) $f'(x) \cos f(x)$)

    (D g)
    $\Longrightarrow$ (D $\langle \{f \mapsto \lambda x\, 2x^3\},\ \lambda x\, \sin f(x) \rangle$)
    $\Longrightarrow$ $\langle \{f \mapsto \lambda x\, 2x^3,\ f' \mapsto \lambda x\, 6x^2\},\ \lambda x\, f'(x) \cos f(x) \rangle$

    (map-closure f $\langle \{x_1 \mapsto v_1, \ldots\},\ e \rangle$) $\Longrightarrow$ $\langle \{x_1 \mapsto f(v_1), \ldots\},\ e \rangle$

need reflective transformation of closure bodies
want transformation done at compile time
need polyvariant flow analysis


Nesting

    (sqrt (sqrt x))
    (D (D f))
    (map (lambda (x) ... (map (lambda (y) ...) ...) ...) ...)
    (D (lambda (x) ... (D (lambda (y) ...) ...) ...) ...)
    $\max_x \min_y f(x, y)$


[Screenshot: New York Times article "Data Analysts Captivated by R's Power" (page title: "R, the Software, Finds Fans in Data Analysts - NYTimes.com"), http://www.nytimes.com/2009/01/07/technology/busines... Photo caption (Stuart Isett for The New York Times): "R first appeared in 1996, when the statistics professors Robert Gentleman, left, and Ross Ihaka released the code as a free software package."]


The Essence of Forward-Mode AD

$f(c + \varepsilon) = \frac{f(c)}{0!} + \frac{f'(c)}{1!}\varepsilon + \frac{f''(c)}{2!}\varepsilon^2 + \cdots + \frac{f^{(i)}(c)}{i!}\varepsilon^i + \cdots$

Taylor, B. (1715). Methodus Incrementorum Directa et Inversa. London.

To compute D f c: evaluate f at the term $c + \varepsilon$ to get a power series, extract the coefficient of $\varepsilon$, and multiply by 1! (a no-op).
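The recipe above can be sketched in Python by representing a power series in $\varepsilon$ as a list of coefficients and truncating every product at a fixed order. This is only an illustration under stated assumptions: the helper names `series_mul`, `f_series` and `derivative` are hypothetical and do not come from the talk, and the example function is the slide's $f(x) = 2x^3$.

```python
from math import factorial


def series_mul(a, b, order):
    """Multiply two power series in eps (coefficient lists), dropping terms above eps**order."""
    out = [0.0] * (order + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= order:
                out[i + j] += ai * bj
    return out


def f_series(x, order):
    """f(x) = 2 * x**3, evaluated on a truncated power series x."""
    two = [2.0] + [0.0] * order
    return series_mul(two, series_mul(x, series_mul(x, x, order), order), order)


def derivative(i, c, order=3):
    """i-th derivative of f at c: i! times the coefficient of eps**i in f(c + eps)."""
    x = [float(c), 1.0] + [0.0] * (order - 1)  # the series c + eps
    return factorial(i) * f_series(x, order)[i]


print(derivative(1, 2.0))  # coefficient of eps, times 1!
```

For the first derivative only the coefficients of $\varepsilon^0$ and $\varepsilon^1$ are ever needed, which is why truncating at first order suffices.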
Key idea: the output only needs to be a finite truncated power series $a + b\varepsilon$.
The input $c + \varepsilon$ is also a truncated power series.
So f can be given a nonstandard interpretation over truncated power series.
This preserves control flow: it augments the original values with derivatives.
(D f) is O(1) relative to f, in both space and time.


Arithmetic on Complex Numbers

$a + bi$
$i^2 = -1$

$(a_1 + b_1 i) + (a_2 + b_2 i) = (a_1 + a_2) + (b_1 + b_2) i$
$(a_1 + b_1 i) \times (a_2 + b_2 i) = (a_1 \times a_2) + (a_1 \times b_2 + a_2 \times b_1) i + (b_1 \times b_2) i^2$
$(a_1 + b_1 i) \times (a_2 + b_2 i) = (a_1 \times a_2 - b_1 \times b_2) + (a_1 \times b_2 + a_2 \times b_1) i$

$\langle a, b \rangle$
$\langle a_1, b_1 \rangle + \langle a_2, b_2 \rangle = \langle (a_1 + a_2), (b_1 + b_2) \rangle$
$\langle a_1, b_1 \rangle \times \langle a_2, b_2 \rangle = \langle (a_1 \times a_2 - b_1 \times b_2), (a_1 \times b_2 + a_2 \times b_1) \rangle$

Hamilton, W. R. (1837). Theory of conjugate functions, or algebraic couples; with a preliminary and elementary essay on algebra as the science of pure time. Transactions of the Royal Irish Academy, 17(1):293–422.


Arithmetic on Dual Numbers

$x + x'\varepsilon$
$\varepsilon^2 = 0$, but $\varepsilon \neq 0$

$(x_1 + x_1'\varepsilon) + (x_2 + x_2'\varepsilon) = (x_1 + x_2) + (x_1' + x_2')\varepsilon$
$(x_1 + x_1'\varepsilon) \times (x_2 + x_2'\varepsilon) = (x_1 \times x_2) + (x_1 \times x_2' + x_2 \times x_1')\varepsilon + (x_1' \times x_2')\varepsilon^2$
$(x_1 + x_1'\varepsilon) \times (x_2 + x_2'\varepsilon) = (x_1 \times x_2) + (x_1 \times x_2' + x_2 \times x_1')\varepsilon$
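The two dual-number rules can be written as a small Python class with overloaded `+` and `*`. This is a minimal sketch, not code from the talk: the class name `Dual`, the helper `lift` and the function `horner` are assumptions made for illustration. The `horner` loop is ordinary numeric code, and it runs unchanged on dual numbers, which illustrates the earlier point that the nonstandard interpretation preserves control flow.

```python
class Dual:
    """A dual number x + x' eps, where eps**2 == 0."""

    def __init__(self, primal, tangent=0.0):
        self.primal = primal
        self.tangent = tangent

    @staticmethod
    def lift(v):
        """Treat an ordinary number as a dual number with zero tangent."""
        return v if isinstance(v, Dual) else Dual(v)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # The eps**2 term (x1' * x2') is dropped because eps**2 == 0.
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)

    __rmul__ = __mul__

    def __repr__(self):
        return f"Dual({self.primal}, {self.tangent})"


def horner(coeffs, x):
    """Evaluate a polynomial (highest coefficient first); works on floats and on Dual."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc


eps = Dual(0.0, 1.0)
eps_squared = eps * eps                     # both components are zero
product = Dual(3.0, 1.0) * Dual(5.0, 2.0)   # (3 + eps)(5 + 2 eps) = 15 + 11 eps
cubic = horner([2.0, 0.0, 0.0, 0.0], Dual(2.0, 1.0))  # 2x^3 and its derivative at x = 2
print(eps_squared, product, cubic)
```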
$\langle x, x' \rangle$
$\langle x_1, x_1' \rangle + \langle x_2, x_2' \rangle = \langle (x_1 + x_2), (x_1' + x_2') \rangle$
$\langle x_1, x_1' \rangle \times \langle x_2, x_2' \rangle = \langle (x_1 \times x_2), (x_1 \times x_2' + x_2 \times x_1') \rangle$

Clifford, W. K. (1873). Preliminary Sketch of Bi-quaternions. Proceedings of the London Mathematical Society, 4:381–95.


Dynamic Overloading: SCMUTILS

    (define-structure bundle primal tangent)

    (define (primal p) (if (bundle? p) (bundle-primal p) p))

    (define (tangent p) (if (bundle? p) (bundle-tangent p) 0))

    (define +
     (let ((+ +))
      (lambda (x1 x2)
       (make-bundle (+ (primal x1) (primal x2))
                    (+ (tangent x1) (tangent x2))))))

    (define *
     (let ((+ +) (* *))
      (lambda (x1 x2)
       (make-bundle (* (primal x1) (primal x2))
                    (+ (* (primal x1) (tangent x2))
                       (* (tangent x1) (primal x2)))))))

    (define ((D f) x) (tangent (f (make-bundle x 1))))

    (define (f x) (* 2 (* x (* x x))))

    (D f)
    (D (D f))
    (D (lambda (x) ... (D (lambda (y) ...) ...) ...) ...)
The same overloading, scoped to the call of f with fluid-let:

    (define ((D f) x)
     (fluid-let ((+ (lambda (x1 x2)
                     (make-bundle (+ (primal x1) (primal x2))
                                  (+ (tangent x1) (tangent x2)))))
                 (* (lambda (x1 x2)
                     (make-bundle (* (primal x1) (primal x2))
                                  (+ (* (primal x1) (tangent x2))
                                     (* (tangent x1) (primal x2)))))))
      (tangent (f (make-bundle x 1)))))

Convenient but slow
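For readers who do not use Scheme, the overloading approach can be mirrored in Python roughly as follows. This is a sketch under assumptions rather than a port of SCMUTILS: the names `Bundle`, `plus` and `times` are hypothetical, the arithmetic is exposed as explicit functions instead of rebinding `+` and `*`, and the functions recurse into their components so that bundles may contain bundles. No attempt is made to distinguish perturbations introduced by different calls to `D`.

```python
from collections import namedtuple

# A primal value paired with its tangent, like (define-structure bundle primal tangent).
Bundle = namedtuple("Bundle", ["primal", "tangent"])


def primal(p):
    return p.primal if isinstance(p, Bundle) else p


def tangent(p):
    return p.tangent if isinstance(p, Bundle) else 0


def plus(x1, x2):
    if not isinstance(x1, Bundle) and not isinstance(x2, Bundle):
        return x1 + x2  # ordinary numbers
    return Bundle(plus(primal(x1), primal(x2)),
                  plus(tangent(x1), tangent(x2)))


def times(x1, x2):
    if not isinstance(x1, Bundle) and not isinstance(x2, Bundle):
        return x1 * x2  # ordinary numbers
    return Bundle(times(primal(x1), primal(x2)),
                  plus(times(primal(x1), tangent(x2)),
                       times(tangent(x1), primal(x2))))


def D(f):
    """Derivative operator: seed the tangent with 1 and read the tangent of the result."""
    return lambda x: tangent(f(Bundle(x, 1)))


def f(x):
    return times(2, times(x, times(x, x)))  # 2 x^3


print(D(f)(3), D(D(f))(3))
```

Because `D` takes and returns ordinary functions, it can be applied to its own result, as in `(D (D f))` on the slide. Every arithmetic operation pays for a run-time type test and an allocation.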
Preprocessor: ADIFOR and TAPENADE

          function f(x)
          double precision x, f
          f = 2.0d0*x*x*x
          end

    AD_TOP = f
    AD_IVARS = x
    AD_DVARS = f

          function gf(x, gx, gresult)
          double precision x, gx, gf, gresult
          gf = 2.0d0*x*x*x
          gresult = 6.0d0*x*x*gx
          end

    AD_TOP = gf
    AD_IVARS = x, gx
    AD_DVARS = gf, gresult

A second application to gf with the prefix g again yields clashing names (gx and gresult each appear twice):

          function ggf(x, gx, gx, ggx, gresult, ggresult, gresult)
          double precision x, gx, gx, ggx, ggf, gresult, gresult, ggresult
          ggf = 2.0d0*x*x*x
          gresult = 6.0d0*x*x*gx
          gresult = 6.0d0*x*x*gx
          ggresult = 6.0d0*x*x*ggx+12.0d0*x*gx*gx
          end

Setting a different prefix avoids the clash:

    AD_PREFIX = h

          function hgf(x, hx, gx, hgx, gresult, hgresult, hresult)
          double precision x, hx, gx, hgx, hgf, hresult, gresult, hgresult
          hgf = 2.0d0*x*x*x
          hresult = 6.0d0*x*x*hx
          gresult = 6.0d0*x*x*gx
          hgresult = 6.0d0*x*x*hgx+12.0d0*x*gx*hx
          end

Fast but inconvenient
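To make the role of the extra arguments concrete, here is a hand transcription of the generated routines into Python. It is not the output of ADIFOR or TAPENADE and it does not perform any source transformation itself. It only restates `gf` and `hgf` from the slide, returning the result variables as tuples (an assumption made because Python has no output parameters), so that the seeding of `gx`, `hx` and `hgx` can be tried out.

```python
def f(x):
    return 2.0 * x * x * x


def gf(x, gx):
    """First-order tangent code: returns (f(x), f'(x) * gx)."""
    result = 2.0 * x * x * x
    gresult = 6.0 * x * x * gx
    return result, gresult


def hgf(x, hx, gx, hgx):
    """Tangent code of gf: returns (f, hresult, gresult, hgresult).

    hx and gx are the two first-order perturbations of x; hgx is the
    second-order perturbation, zero when x itself is the independent variable.
    """
    result = 2.0 * x * x * x
    hresult = 6.0 * x * x * hx
    gresult = 6.0 * x * x * gx
    hgresult = 6.0 * x * x * hgx + 12.0 * x * gx * hx
    return result, hresult, gresult, hgresult


# Seeding gx = hx = 1 and hgx = 0 yields f, f', f' and f'' at x = 2.
print(gf(2.0, 1.0))
print(hgf(2.0, 1.0, 1.0, 0.0))
```

The caller has to know which arguments to seed and which results to read, and each additional derivative order means running the preprocessor again with new settings.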
Static Overloading: FADBAD++

double f(double x) {return 2*x*x*x;}
double x;
... f(x) ...

F<double> f(F<double> x) {return 2*x*x*x;}
F<double> x;
x.diff(0, 1);
... f(x).d(0) ...

F<F<double> > f(F<F<double> > x) {return 2*x*x*x;}
F<F<double> > x;
x.diff(0, 1);
x.diff(0, 1).diff(0,1);
... f(x).d(0).d(0) ...

template <typename T> T f(T x) {return 2*x*x*x;}
T x;

Slow and inconvenient
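The overloading idea can be sketched in Python. The class `Dual` below is a hypothetical stand-in for the role that F<T> plays on the slide; it is not the FADBAD++ API and supports only the addition and multiplication that f needs. Nesting one `Dual` inside another mirrors F<F<double> > and yields the second derivative.

```python
class Dual:
    """A value x paired with one derivative component d (illustrative only)."""

    def __init__(self, x, d=0.0):
        self.x = x  # primal part: a number or another Dual
        self.d = d  # derivative part: a number or another Dual

    def __add__(self, other):
        if not isinstance(other, Dual):
            other = Dual(other)
        return Dual(self.x + other.x, self.d + other.d)

    __radd__ = __add__

    def __mul__(self, other):
        if not isinstance(other, Dual):
            other = Dual(other)
        # Product rule: (a, a')(b, b') = (ab, ab' + a'b)
        return Dual(self.x * other.x, self.x * other.d + self.d * other.x)

    __rmul__ = __mul__


def f(x):
    return 2 * x * x * x


# First derivative at 3: seed the derivative slot with 1.
first = f(Dual(3.0, 1.0)).d

# Second derivative at 3: seed both nesting levels by hand.
x2 = Dual(Dual(3.0, 1.0), Dual(1.0, 0.0))
second = f(x2).d.d
print(first, second)
```

The caller still has to construct the nested seed and pick the right component of the result for each order, which matches the slide's verdict of inconvenient.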
Our API for Functional Forward AD

bundle : $\tau \times \overset{\rightharpoondown}{\tau} \to \overset{\rightharpoonup}{\tau}$
primal : $\overset{\rightharpoonup}{\tau} \to \tau$
tangent : $\overset{\rightharpoonup}{\tau} \to \overset{\rightharpoondown}{\tau}$
j* : $(\tau_1 \to \tau_2) \to (\overset{\rightharpoonup}{\tau_1} \to \overset{\rightharpoonup}{\tau_2})$

(define ((D f) x) (tangent ((j* f) (bundle x 1))))
(D f)
(D (D f))
(D (lambda (x) ... (D (lambda (y) ...) ...) ...) ...)

Differential geometry bundles points $\mathbb{R}^n$ in a manifold with tangent vectors $\overset{\rightharpoondown}{\mathbb{R}^n}$
j* maps a function to its push forward
Generalize to arbitrary types
What is the tangent of a discrete value or a function?
Can abbreviate $\tau \lhd \overset{\rightharpoondown}{\tau}$ as $\overset{\rightharpoonup}{\tau}$
Sometimes write j* as $\overrightarrow{J}$
What is (j* j*)?

Convenient and fast

A property

$x : \mathbb{R}^n \qquad \overset{\rightharpoondown}{x} : \mathbb{R}^n \qquad f : \mathbb{R}^n \to \mathbb{R}^m$
$((J\,f)\,x)[i,j] = \frac{\partial f(x)[i]}{\partial x[j]}$
$((J\,f)\,x) : \mathbb{R}^{m \times n}$
$((J\,f)\,x) : \mathbb{R}^n \xrightarrow{L} \mathbb{R}^m$

(primal ((j* f) (bundle x $\overset{\rightharpoondown}{x}$))) = (f x)
(tangent ((j* f) (bundle x $\overset{\rightharpoondown}{x}$))) = ((J f) x) × $\overset{\rightharpoondown}{x}$
(tangent ((j* f) (bundle x $\overset{\rightharpoondown}{x}$))) = (((J f) x) $\overset{\rightharpoondown}{x}$)
((j* f) x) = (bundle (f (primal x)) (((J f) (primal x)) (tangent x)))

rearrangement function: $(\forall i)(\exists j)\ (f\,x)[i] = x[j]$
$f : \mathbb{R}^n \xrightarrow{L} \mathbb{R}^m$
0/1 matrix, every row has exactly one 1
$((J\,f)\,x) = \frac{\partial f(x)[i]}{\partial x[j]} = \begin{cases} 1 & \text{when } (f\,x)[i] = x[j] \\ 0 & \text{otherwise} \end{cases} = f$

when f is a rearrangement function
((j* f) x) = (bundle (f (primal x)) (f (tangent x)))

With arbitrary types: $x : \tau_1$, $\overset{\rightharpoondown}{x} : \tau_1$, $f : \tau_1 \to \tau_2$, $((J\,f)\,x) : \tau_1 \xrightarrow{L} \tau_2$
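The property can be exercised on small vectors. In the sketch below, bundles are plain Python pairs and `j_star` takes a hand-written Jacobian function as a second argument. That extra argument is an assumption made for illustration only, since the j* of the talk needs nothing but f. The functions `f`, `jac_f`, `swap_dup` and `jac_swap_dup` are hypothetical examples.

```python
def bundle(x, dx):
    return (x, dx)


def primal(b):
    return b[0]


def tangent(b):
    return b[1]


def matvec(m, v):
    """Matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def j_star(f, jacobian):
    """Push forward built from the property:
    ((j* f) x) = (bundle (f (primal x)) (((J f) (primal x)) (tangent x)))."""
    def pushed(b):
        x, dx = primal(b), tangent(b)
        return bundle(f(x), matvec(jacobian(x), dx))
    return pushed


def f(v):
    """A non-linear example: (x, y) -> (x*y, x + y)."""
    x, y = v
    return [x * y, x + y]


def jac_f(v):
    x, y = v
    return [[y, x], [1.0, 1.0]]


def swap_dup(v):
    """A rearrangement function: every output is a copy of some input."""
    return [v[1], v[0], v[1]]


def jac_swap_dup(v):
    # 0/1 matrix, every row has exactly one 1.
    return [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]


out = j_star(f, jac_f)(bundle([2.0, 5.0], [1.0, 0.0]))
rearranged = j_star(swap_dup, jac_swap_dup)(bundle([2.0, 5.0], [0.1, 0.2]))
print(primal(out), tangent(out))
print(tangent(rearranged), swap_dup([0.1, 0.2]))
```

For `swap_dup`, multiplying by the 0/1 Jacobian gives the same result as applying the function directly to the tangent, which is the shortcut stated for rearrangement functions.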
What is the tangent of #t?

What if we take $\overset{\rightharpoondown}{\#t} = \#f$?

when f is a rearrangement function
((j* f) x) = (bundle (f (primal x)) (f (tangent x)))

f : (#t x y) ↦ (#t x y) but f : (#f x y) ↦ (#f y x)
f is a rearrangement function

((j* f) (bundle (#t x y) (#f $\overset{\rightharpoondown}{x}$ $\overset{\rightharpoondown}{y}$)))
= (bundle (#t x y) (#f $\overset{\rightharpoondown}{y}$ $\overset{\rightharpoondown}{x}$))

Problem avoided if we take $\overset{\rightharpoondown}{\#t} = \#t$

What is (j* j*)?

when f is a rearrangement function
((j* f) x) = (bundle (f (primal x)) (f (tangent x)))

bundle, primal, tangent, and j* are rearrangement functions

((j* bundle) x) = (bundle (bundle (primal x)) (bundle (tangent x)))
((j* primal) x) = (bundle (primal (primal x)) (primal (tangent x)))
((j* tangent) x) = (bundle (tangent (primal x)) (tangent (tangent x)))
((j* j*) x) = (bundle (j* (primal x)) (j* (tangent x)))
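The #t counterexample can be replayed directly. The sketch below models the triple (flag x y) as a Python tuple and applies the rearrangement rule, f on the primal and f on the tangent, under both choices for the tangent of #t. The helper name `push_rearrangement` and the numeric values are assumptions for illustration.

```python
def f(v):
    """The slide's rearrangement function: swap x and y only when the flag is false."""
    flag, x, y = v
    return (flag, x, y) if flag else (flag, y, x)


def push_rearrangement(f, primal, tangent):
    """Rearrangement rule: apply f separately to the primal and to the tangent."""
    return f(primal), f(tangent)


primal = (True, 2.0, 5.0)

# Choice 1: the tangent of #t is #f. f sees a false flag in the tangent and
# swaps it, although the primal was not swapped.
bad = push_rearrangement(f, primal, (False, 0.1, 0.2))

# Choice 2: the tangent of #t is #t. The tangent takes the same branch as the primal.
good = push_rearrangement(f, primal, (True, 0.1, 0.2))

print(bad)
print(good)
```

Under the first choice the tangents end up attached to the wrong components; under the second they stay aligned with x and y.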
A (Not So) Brief Tutorial on AD

$z = g(f(x)) = (f \circ g)(x)$ (here $f \circ g$ applies $f$ first, then $g$)
$y = f(x)$
$z = g(y)$
$\frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx}$
$D\,(f \circ g)\,x = (D\,g\,y) \times (D\,f\,x)$

$f = f_1 \circ \cdots \circ f_n$
$x_1 = f_1\,x_0$
$\vdots$
$x_n = f_n\,x_{n-1}$
$J\,f\,x_0 = (J\,f_n\,x_{n-1}) \times \cdots \times (J\,f_1\,x_0)$
$(J\,f\,x_0)^\top = (J\,f_1\,x_0)^\top \times \cdots \times (J\,f_n\,x_{n-1})^\top$

$\overset{\rightharpoondown}{X_n} = J\,f\,x_0 = (J\,f_n\,x_{n-1}) \times \cdots \times (J\,f_2\,x_1) \times (J\,f_1\,x_0)$
$\overset{\rightharpoondown}{X_1} = (J\,f_1\,x_0)$
$\overset{\rightharpoondown}{X_2} = (J\,f_2\,x_1) \times \overset{\rightharpoondown}{X_1}$
$\vdots$
$\overset{\rightharpoondown}{X_n} = (J\,f_n\,x_{n-1}) \times \overset{\rightharpoondown}{X_{n-1}}$

$\overset{\leftharpoondown}{X_0} = (J\,f\,x_0)^\top = (J\,f_1\,x_0)^\top \times \cdots \times (J\,f_{n-1}\,x_{n-2})^\top \times (J\,f_n\,x_{n-1})^\top$
$\overset{\leftharpoondown}{X_{n-1}} = (J\,f_n\,x_{n-1})^\top$
$\overset{\leftharpoondown}{X_{n-2}} = (J\,f_{n-1}\,x_{n-2})^\top \times \overset{\leftharpoondown}{X_{n-1}}$
$\vdots$
$\overset{\leftharpoondown}{X_0} = (J\,f_1\,x_0)^\top \times \overset{\leftharpoondown}{X_1}$

$\overset{\rightharpoondown}{X_1} = (J\,f_1\,x_0) \qquad \overset{\leftharpoondown}{X_{n-1}} = (J\,f_n\,x_{n-1})^\top$
$\overset{\rightharpoondown}{X_2} = (J\,f_2\,x_1) \times \overset{\rightharpoondown}{X_1} \qquad \overset{\leftharpoondown}{X_{n-2}} = (J\,f_{n-1}\,x_{n-2})^\top \times \overset{\leftharpoondown}{X_{n-1}}$
$\vdots$
$\overset{\rightharpoondown}{X_n} = (J\,f_n\,x_{n-1}) \times \overset{\rightharpoondown}{X_{n-1}} \qquad \overset{\leftharpoondown}{X_0} = (J\,f_1\,x_0)^\top \times \overset{\leftharpoondown}{X_1}$
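The two matrix recurrences can be run on a small chain. The sketch below uses three made-up integer Jacobians standing in for $J\,f_1\,x_0$, $J\,f_2\,x_1$ and $J\,f_3\,x_2$ (shapes 2×3, 2×2 and 1×2, all assumptions for illustration) and confirms that the reverse recurrence produces the transpose of what the forward recurrence produces.

```python
def matmul(a, b):
    """Matrix product on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


# Hypothetical stage Jacobians, listed in application order f1, f2, f3.
jacobians = [
    [[1, 2, 0], [0, 1, 3]],  # 2 x 3
    [[2, 0], [1, 1]],        # 2 x 2
    [[1, 4]],                # 1 x 2
]


def forward_accumulate(js):
    """X_1 = J_1, then X_i = J_i x X_{i-1}: multiply from the input side."""
    acc = js[0]
    for j in js[1:]:
        acc = matmul(j, acc)
    return acc


def reverse_accumulate(js):
    """X_{n-1} = J_n^T, then X_{i-1} = J_i^T x X_i: multiply from the output side."""
    acc = transpose(js[-1])
    for j in reversed(js[:-1]):
        acc = matmul(transpose(j), acc)
    return acc


forward = forward_accumulate(jacobians)
reverse = reverse_accumulate(jacobians)
print(forward)
print(reverse)
```

The forward accumulator carries one column per input throughout (three here), while the reverse accumulator carries one column per output (one here).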
A (Not So) Brief Tutorial on AD

$$\overset{\rightharpoondown}{x}_n = (\mathcal{J}\,f\,x_0) \times \overset{\rightharpoondown}{x}_0 = (\mathcal{J}\,f_n\,x_{n-1}) \times \cdots \times (\mathcal{J}\,f_1\,x_0) \times \overset{\rightharpoondown}{x}_0$$
$$\overset{\rightharpoondown}{x}_1 = (\mathcal{J}\,f_1\,x_0) \times \overset{\rightharpoondown}{x}_0$$
$$\vdots$$
$$\overset{\rightharpoondown}{x}_n = (\mathcal{J}\,f_n\,x_{n-1}) \times \overset{\rightharpoondown}{x}_{n-1}$$

A (Not So) Brief Tutorial on AD

$$\overset{\leftharpoondown}{x}_0 = (\mathcal{J}\,f\,x_0)^\top \times \overset{\leftharpoondown}{x}_n = (\mathcal{J}\,f_1\,x_0)^\top \times \cdots \times (\mathcal{J}\,f_n\,x_{n-1})^\top \times \overset{\leftharpoondown}{x}_n$$
$$\overset{\leftharpoondown}{x}_{n-1} = (\mathcal{J}\,f_n\,x_{n-1})^\top \times \overset{\leftharpoondown}{x}_n$$
$$\vdots$$
$$\overset{\leftharpoondown}{x}_0 = (\mathcal{J}\,f_1\,x_0)^\top \times \overset{\leftharpoondown}{x}_1$$

A (Not So) Brief Tutorial on AD

$$y = A \times x = f(x)$$
$$\overset{\rightharpoondown}{x}_n = (\mathcal{J}\,f\,x_0) \times \overset{\rightharpoondown}{x}_0 = \overset{\rightharpoondown}{f}\,x_0\,\overset{\rightharpoondown}{x}_0$$
$$\overset{\leftharpoondown}{x}_0 = (\mathcal{J}\,f\,x_0)^\top \times \overset{\leftharpoondown}{x}_n = \overset{\leftharpoondown}{f}\,x_0\,\overset{\leftharpoondown}{x}_n$$

A (Not So) Brief Tutorial on AD

$$\overset{\rightharpoondown}{X}_n[;j] = \overset{\rightharpoondown}{f}\,x_0\,\overset{\rightharpoondown}{e}_j = (\mathcal{J}\,f\,x_0)[;j]$$
$$\overset{\leftharpoondown}{X}_0[;i] = \overset{\leftharpoondown}{f}\,x_0\,\overset{\leftharpoondown}{e}_i = (\mathcal{J}\,f\,x_0)^\top[;i]$$
$$(\mathcal{J}\,f\,x_0)^\top[;i] = (\mathcal{J}\,f\,x_0)[i;]$$

A (Not So) Brief Tutorial on AD

$$y = B \times (A \times x) = (B \times A) \times x = g(f(x)) = (f \circ g)(x)$$
$$(\mathcal{J}\,f_n\,x_{n-1}) \times \cdots \times (\mathcal{J}\,f_1\,x_0) = (\overset{\rightharpoondown}{f}_1\,x_0) \circ \cdots \circ (\overset{\rightharpoondown}{f}_n\,x_{n-1})$$
$$(\mathcal{J}\,f_1\,x_0)^\top \times \cdots \times (\mathcal{J}\,f_n\,x_{n-1})^\top = (\overset{\leftharpoondown}{f}_n\,x_{n-1}) \circ \cdots \circ (\overset{\leftharpoondown}{f}_1\,x_0)$$

A (Not So) Brief Tutorial on AD

$$x_{i-1} = (\,x_{i-1}[1]\ \cdots\ x_{i-1}[L_i]\ \cdots\ x_{i-1}[R_i]\ \cdots\ x_{i-1}[m]\,)$$
$$x_i = f_i\,x_{i-1} = (\,x_{i-1}[1]\ \cdots\ u_i\,x_{i-1}[R_i]\ \cdots\ x_{i-1}[R_i]\ \cdots\ x_{i-1}[m]\,)$$
$$x[L_i] := u_i\,x[R_i]$$
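The column and row identities and the two composition orders above can be illustrated for the linear case $y = B \times (A \times x)$. This is a minimal sketch under stated assumptions: the matrices `A` and `B` are arbitrary examples, and `jvp_linear` and `vjp_linear` are hypothetical helper names standing in for $\overset{\rightharpoondown}{f}$ and $\overset{\leftharpoondown}{f}$ of a linear map.

```python
def matvec(a, v):
    """Matrix (list of rows) times vector."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def unit(n, k):
    """k-th basis vector of length n."""
    return [1.0 if i == k else 0.0 for i in range(n)]

def jvp_linear(a):
    """Forward derivative of x -> a x: (x0, dx) -> a dx."""
    return lambda x0: lambda dx: matvec(a, dx)

def vjp_linear(a):
    """Reverse derivative of x -> a x: (x0, dy) -> a^T dy."""
    return lambda x0: lambda dy: matvec(transpose(a), dy)

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # f: R^2 -> R^3
B = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.0]]    # g: R^3 -> R^2

def jvp_composite(x0):
    # Forward derivatives compose in the same order as the primal: f first, then g.
    return lambda dx: jvp_linear(B)(matvec(A, x0))(jvp_linear(A)(x0)(dx))

def vjp_composite(x0):
    # Reverse derivatives compose in the opposite order: g first, then f.
    return lambda dy: vjp_linear(A)(x0)(vjp_linear(B)(matvec(A, x0))(dy))

x0 = [0.5, -1.0]
columns = [jvp_composite(x0)(unit(2, j)) for j in range(2)]  # columns of B x A
rows = [vjp_composite(x0)(unit(2, i)) for i in range(2)]     # rows of B x A
```

Each basis tangent recovers one Jacobian column and each basis cotangent one row, so the full Jacobian costs one pass per input in forward mode and one pass per output in reverse mode.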
A (Not So) Brief Tutorial on AD

$$x_{i-1} = (\,x_{i-1}[1]\ \cdots\ x_{i-1}[L_i]\ \cdots\ x_{i-1}[R_i]\ \cdots\ x_{i-1}[S_i]\ \cdots\ x_{i-1}[m]\,)$$
$$x_i = f_i\,x_{i-1} = (\,x_{i-1}[1]\ \cdots\ b_i\,(x_{i-1}[R_i], x_{i-1}[S_i])\ \cdots\ x_{i-1}[R_i]\ \cdots\ x_{i-1}[S_i]\ \cdots\ x_{i-1}[m]\,)$$
$$x[L_i] := b_i\,(x[R_i], x[S_i])$$

A (Not So) Brief Tutorial on AD

The Jacobian of a unary step is the $m \times m$ identity matrix with row $L_i$ replaced:

$$(\mathcal{J}\,f_i\,x_{i-1})[j;k] = \begin{cases} u' & j = L_i,\ k = R_i \\ 1 & j = k \neq L_i \\ 0 & \text{otherwise} \end{cases} \qquad u' = \mathcal{D}\,u_i\,x_{i-1}[R_i]$$

A (Not So) Brief Tutorial on AD

The Jacobian of a binary step (with $L_i$, $R_i$, $S_i$ distinct, as drawn on the slide) is the $m \times m$ identity matrix with row $L_i$ replaced:

$$(\mathcal{J}\,f_i\,x_{i-1})[j;k] = \begin{cases} b'_1 & j = L_i,\ k = R_i \\ b'_2 & j = L_i,\ k = S_i \\ 1 & j = k \neq L_i \\ 0 & \text{otherwise} \end{cases}$$
$$b'_1 = \mathcal{D}_1\,b_i\,(x_{i-1}[R_i], x_{i-1}[S_i]), \qquad b'_2 = \mathcal{D}_2\,b_i\,(x_{i-1}[R_i], x_{i-1}[S_i])$$

A (Not So) Brief Tutorial on AD

$$\overset{\rightharpoondown}{x}_i = \overset{\rightharpoondown}{f}_i\,x_{i-1}\,\overset{\rightharpoondown}{x}_{i-1} = (\mathcal{J}\,f_i\,x_{i-1}) \times \overset{\rightharpoondown}{x}_{i-1}$$
$$\overset{\rightharpoondown}{x}_i[j] = \begin{cases} u' \times \overset{\rightharpoondown}{x}_{i-1}[R_i] & j = L_i \\ \overset{\rightharpoondown}{x}_{i-1}[j] & j \neq L_i \end{cases} \qquad u' = \mathcal{D}\,u_i\,x_{i-1}[R_i]$$
A (Not So) Brief Tutorial on AD

$$\overset{\rightharpoondown}{x}_i = \overset{\rightharpoondown}{f}_i\,x_{i-1}\,\overset{\rightharpoondown}{x}_{i-1} = (\mathcal{J}\,f_i\,x_{i-1}) \times \overset{\rightharpoondown}{x}_{i-1}$$
$$\overset{\rightharpoondown}{x}_i[j] = \begin{cases} b'_1 \times \overset{\rightharpoondown}{x}_{i-1}[R_i] + b'_2 \times \overset{\rightharpoondown}{x}_{i-1}[S_i] & j = L_i \\ \overset{\rightharpoondown}{x}_{i-1}[j] & j \neq L_i \end{cases}$$
$$b'_1 = \mathcal{D}_1\,b_i\,(x_{i-1}[R_i], x_{i-1}[S_i]), \qquad b'_2 = \mathcal{D}_2\,b_i\,(x_{i-1}[R_i], x_{i-1}[S_i])$$
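The per-step tangent updates above never need the full Jacobian: only slot $L_i$ of the tangent vector changes. A minimal sketch of such a forward sweep follows. The step encoding (tuples tagged `"unary"` or `"binary"`) and the two-step example program are assumptions made for illustration, not notation from the talk.

```python
import math

# Hypothetical straight-line program over a store x[0..3]:
#   x[2] := x[0] * x[1]
#   x[3] := sin x[2]
# Unary step:  ("unary",  L, u, Du, R)
# Binary step: ("binary", L, b, D1b, D2b, R, S)
PROGRAM = [
    ("binary", 2, lambda a, b: a * b, lambda a, b: b, lambda a, b: a, 0, 1),
    ("unary", 3, math.sin, math.cos, 2),
]

def forward_sweep(program, x0, tangent0):
    """Run the program, updating primal store x and tangent store t together."""
    x, t = list(x0), list(tangent0)
    for step in program:
        if step[0] == "unary":
            _, L, u, du, R = step
            # The right-hand side is evaluated before either slot is overwritten.
            x[L], t[L] = u(x[R]), du(x[R]) * t[R]
        else:
            _, L, b, d1, d2, R, S = step
            x[L], t[L] = (b(x[R], x[S]),
                          d1(x[R], x[S]) * t[R] + d2(x[R], x[S]) * t[S])
    return x, t
```

Seeding `tangent0` with a basis vector yields the derivative of every store slot with respect to that one input.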
A (Not So) Brief Tutorial on AD

$$\overset{\leftharpoondown}{x}_{i-1} = \overset{\leftharpoondown}{f}_i\,x_{i-1}\,\overset{\leftharpoondown}{x}_i = (\mathcal{J}\,f_i\,x_{i-1})^\top \times \overset{\leftharpoondown}{x}_i$$
$$\overset{\leftharpoondown}{x}_{i-1}[j] = \begin{cases} 0 & j = L_i \\ u' \times \overset{\leftharpoondown}{x}_i[L_i] + \overset{\leftharpoondown}{x}_i[R_i] & j = R_i \\ \overset{\leftharpoondown}{x}_i[j] & \text{otherwise} \end{cases} \qquad u' = \mathcal{D}\,u_i\,x_{i-1}[R_i]$$
A (Not So) Brief Tutorial on AD

$$\overset{\leftharpoondown}{x}_{i-1} = \overset{\leftharpoondown}{f}_i\,x_{i-1}\,\overset{\leftharpoondown}{x}_i = (\mathcal{J}\,f_i\,x_{i-1})^\top \times \overset{\leftharpoondown}{x}_i$$
$$\overset{\leftharpoondown}{x}_{i-1}[j] = \begin{cases} 0 & j = L_i \\ b'_1 \times \overset{\leftharpoondown}{x}_i[L_i] + \overset{\leftharpoondown}{x}_i[R_i] & j = R_i \\ b'_2 \times \overset{\leftharpoondown}{x}_i[L_i] + \overset{\leftharpoondown}{x}_i[S_i] & j = S_i \\ \overset{\leftharpoondown}{x}_i[j] & \text{otherwise} \end{cases}$$
$$b'_1 = \mathcal{D}_1\,b_i\,(x_{i-1}[R_i], x_{i-1}[S_i]), \qquad b'_2 = \mathcal{D}_2\,b_i\,(x_{i-1}[R_i], x_{i-1}[S_i])$$
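The transposed updates above run over the steps in reverse order and need the primal store as it was before each step. The sketch below is one simple way to arrange that, assuming the same hypothetical step encoding and example program as a forward sweep would use; it snapshots the whole store per step, which is wasteful but keeps the correspondence with the formulas visible.

```python
import math

# Hypothetical straight-line program over a store x[0..3]:
#   x[2] := x[0] * x[1]
#   x[3] := sin x[2]
PROGRAM = [
    ("binary", 2, lambda a, b: a * b, lambda a, b: b, lambda a, b: a, 0, 1),
    ("unary", 3, math.sin, math.cos, 2),
]

def reverse_sweep(program, x0, sens_n):
    """Forward pass recording x_{i-1} per step, then the transposed updates."""
    x, trace = list(x0), []
    for step in program:
        trace.append(list(x))          # snapshot of x_{i-1}
        if step[0] == "unary":
            _, L, u, du, R = step
            x[L] = u(x[R])
        else:
            _, L, b, d1, d2, R, S = step
            x[L] = b(x[R], x[S])
    sens = list(sens_n)                # sensitivities of the final store
    for step, prev in zip(reversed(program), reversed(trace)):
        if step[0] == "unary":
            _, L, u, du, R = step
            s, sens[L] = sens[L], 0.0  # slot L_i becomes 0
            sens[R] += du(prev[R]) * s
        else:
            _, L, b, d1, d2, R, S = step
            s, sens[L] = sens[L], 0.0
            sens[R] += d1(prev[R], prev[S]) * s
            sens[S] += d2(prev[R], prev[S]) * s
    return x, sens
```

Seeding `sens_n` with a basis vector for one output slot yields that output's derivative with respect to every input in a single reverse pass.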
A (Not So) Brief Tutorial on AD

The primal sweep followed by the tangent sweep (left) can be interleaved step by step (right):

$$\begin{aligned}
x_1 &= f_1\,x_0 \\
&\;\;\vdots \\
x_n &= f_n\,x_{n-1} \\
\overset{\rightharpoondown}{x}_1 &= \overset{\rightharpoondown}{f}_1\,x_0\,\overset{\rightharpoondown}{x}_0 \\
&\;\;\vdots \\
\overset{\rightharpoondown}{x}_n &= \overset{\rightharpoondown}{f}_n\,x_{n-1}\,\overset{\rightharpoondown}{x}_{n-1}
\end{aligned}
\qquad\Longrightarrow\qquad
\begin{aligned}
x_1 &= f_1\,x_0 \\
\overset{\rightharpoondown}{x}_1 &= \overset{\rightharpoondown}{f}_1\,x_0\,\overset{\rightharpoondown}{x}_0 \\
&\;\;\vdots \\
x_n &= f_n\,x_{n-1} \\
\overset{\rightharpoondown}{x}_n &= \overset{\rightharpoondown}{f}_n\,x_{n-1}\,\overset{\rightharpoondown}{x}_{n-1}
\end{aligned}$$

A (Not So) Brief Tutorial on AD

Each interleaved pair can be computed as a single tuple:

$$(x_1, \overset{\rightharpoondown}{x}_1) = ((f_1\,x_0), (\overset{\rightharpoondown}{f}_1\,x_0\,\overset{\rightharpoondown}{x}_0))$$
$$\vdots$$
$$(x_n, \overset{\rightharpoondown}{x}_n) = ((f_n\,x_{n-1}), (\overset{\rightharpoondown}{f}_n\,x_{n-1}\,\overset{\rightharpoondown}{x}_{n-1}))$$

A (Not So) Brief Tutorial on AD

$$x_1 = f_1\,x_0, \ \ldots, \ x_n = f_n\,x_{n-1} \qquad\Longrightarrow\qquad \overset{\rightharpoonup}{x}_1 = \overset{\rightharpoonup}{f}_1\,\overset{\rightharpoonup}{x}_0, \ \ldots$$
$$\overset{\rightharpoonup}{x}_n = \overset{\rightharpoonup}{f}_n\,\overset{\rightharpoonup}{x}_{n-1}$$
$$\overset{\rightharpoonup}{x} \equiv (x, \overset{\rightharpoondown}{x})$$
$$\overset{\rightharpoonup}{f}\,\overset{\rightharpoonup}{x} \equiv ((f\,x), (\overset{\rightharpoondown}{f}\,x\,\overset{\rightharpoondown}{x}))$$

A (Not So) Brief Tutorial on AD

$$x_{L_i} := u_i\,x_{R_i} \qquad\Longrightarrow\qquad \overset{\rightharpoonup}{x}_{L_i} := \overset{\rightharpoonup}{u}_i\,\overset{\rightharpoonup}{x}_{R_i}$$
$$x_{L_i} := b_i\,(x_{R_i}, x_{S_i}) \qquad\Longrightarrow\qquad \overset{\rightharpoonup}{x}_{L_i} := \overset{\rightharpoonup}{b}_i\,(\overset{\rightharpoonup}{x}_{R_i}, \overset{\rightharpoonup}{x}_{S_i})$$
$$\overset{\rightharpoonup}{x} \equiv (x, \overset{\rightharpoondown}{x})$$
$$\overset{\rightharpoonup}{u}\,\overset{\rightharpoonup}{x} \equiv ((u\,x), ((\mathcal{D}\,u\,x) \times \overset{\rightharpoondown}{x}))$$
$$\overset{\rightharpoonup}{b}\,(\overset{\rightharpoonup}{x}_1, \overset{\rightharpoonup}{x}_2) \equiv ((b\,(x_1, x_2)), (((\mathcal{D}_1\,b\,(x_1, x_2)) \times \overset{\rightharpoondown}{x}_1) + ((\mathcal{D}_2\,b\,(x_1, x_2)) \times \overset{\rightharpoondown}{x}_2)))$$

A (Not So) Brief Tutorial on AD

In reverse mode the whole primal sweep runs first, and the sensitivities are then propagated in the opposite order:

$$\begin{aligned}
x_1 &= f_1\,x_0 \\
&\;\;\vdots \\
x_n &= f_n\,x_{n-1} \\
\overset{\leftharpoondown}{x}_{n-1} &= \overset{\leftharpoondown}{f}_n\,x_{n-1}\,\overset{\leftharpoondown}{x}_n \\
&\;\;\vdots \\
\overset{\leftharpoondown}{x}_0 &= \overset{\leftharpoondown}{f}_1\,x_0\,\overset{\leftharpoondown}{x}_1
\end{aligned}$$
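The bundled pairs $(x, \overset{\rightharpoondown}{x})$ and the lifted primitives $\overset{\rightharpoonup}{u}$ and $\overset{\rightharpoonup}{b}$ defined above translate almost literally into code. A minimal sketch, in which `lift_unary`, `lift_binary`, and the example function `g` are illustrative names rather than anything from the talk:

```python
import math

def lift_unary(u, du):
    """Lift u to act on (primal, tangent) pairs: (u x, (D u x) * x_tangent)."""
    return lambda xp: (u(xp[0]), du(xp[0]) * xp[1])

def lift_binary(b, d1, d2):
    """Lift b to pairs: (b(x1, x2), D1 b * tangent1 + D2 b * tangent2)."""
    return lambda xp, yp: (
        b(xp[0], yp[0]),
        d1(xp[0], yp[0]) * xp[1] + d2(xp[0], yp[0]) * yp[1],
    )

sin_fwd = lift_unary(math.sin, math.cos)
mul_fwd = lift_binary(lambda a, b: a * b, lambda a, b: b, lambda a, b: a)

def g(x, y):
    """Forward-transformed version of g(x, y) = sin(x * y), built from lifted primitives."""
    return sin_fwd(mul_fwd(x, y))

# Differentiate with respect to the first argument: tangent 1 on x, 0 on y.
value, tangent = g((0.5, 1.0), (2.0, 0.0))
```

Because every primitive consumes and produces pairs, the transformed program has the same shape as the original, which is what makes forward mode straightforward to implement by overloading.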
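A (Not So) Brief Tutorial on AD

Here $\bar{x}_i$ denotes the backpropagator paired with $x_i$: a function that maps a sensitivity of $x_i$ to the sensitivity of $x_0$.

$$x_1 = f_1\,x_0, \qquad \bar{x}_1 = \lambda \overset{\leftharpoondown}{x}\ \ \bar{x}_0\,(\overset{\leftharpoondown}{f}_1\,x_0\,\overset{\leftharpoondown}{x})$$
$$\vdots$$
$$x_n = f_n\,x_{n-1}, \qquad \bar{x}_n = \lambda \overset{\leftharpoondown}{x}\ \ \bar{x}_{n-1}\,(\overset{\leftharpoondown}{f}_n\,x_{n-1}\,\overset{\leftharpoondown}{x})$$

A (Not So) Brief Tutorial on AD

$$(x_1, \bar{x}_1) = ((f_1\,x_0), (\lambda \overset{\leftharpoondown}{x}\ \ \bar{x}_0\,(\overset{\leftharpoondown}{f}_1\,x_0\,\overset{\leftharpoondown}{x})))$$
$$\vdots$$
$$(x_n, \bar{x}_n) = ((f_n\,x_{n-1}), (\lambda \overset{\leftharpoondown}{x}\ \ \bar{x}_{n-1}\,(\overset{\leftharpoondown}{f}_n\,x_{n-1}\,\overset{\leftharpoondown}{x})))$$

A (Not So) Brief Tutorial on AD

$$x_1 = f_1\,x_0, \ \ldots, \ x_n = f_n\,x_{n-1} \qquad\Longrightarrow\qquad \overset{\leftharpoonup}{x}_1 = \overset{\leftharpoonup}{f}_1\,\overset{\leftharpoonup}{x}_0, \ \ldots, \ \overset{\leftharpoonup}{x}_n = \overset{\leftharpoonup}{f}_n\,\overset{\leftharpoonup}{x}_{n-1}$$
$$\overset{\leftharpoonup}{x} \equiv (x, \bar{x})$$
$$\overset{\leftharpoonup}{f}\,\overset{\leftharpoonup}{x} \equiv ((f\,x), (\lambda \overset{\leftharpoondown}{x}\ \ \bar{x}\,(\overset{\leftharpoondown}{f}\,x\,\overset{\leftharpoondown}{x})))$$

A (Not So) Brief Tutorial on AD

With a single global backpropagator $\bar{x}$ that is updated imperatively:

$$\overset{\leftharpoonup}{f}\,x \equiv \mathbf{begin}\ \ \bar{x} := \lambda \overset{\leftharpoondown}{x}\ \ \bar{x}\,(\overset{\leftharpoondown}{f}\,x\,\overset{\leftharpoondown}{x});\ \ (f\,x)\ \ \mathbf{end}$$

A (Not So) Brief Tutorial on AD

The unary assignment $x_{L_i} := u_i\,x_{R_i}$ becomes

$$\begin{aligned}
&\bar{x} := \lambda[\,]\ \mathbf{begin}\ \ \overset{\leftharpoondown}{x}_{R_i} \mathrel{+}:= (\mathcal{D}\,u_i\,\overset{\leftharpoonup}{x}_{R_i}) \times \overset{\leftharpoondown}{x}_{L_i};\ \ \overset{\leftharpoondown}{x}_{L_i} := 0;\ \ \bar{x}\,[\,]\ \ \mathbf{end} \\
&\overset{\leftharpoonup}{x}_{L_i} := u_i\,\overset{\leftharpoonup}{x}_{R_i}
\end{aligned}$$

and the binary assignment $x_{L_i} := b_i\,(x_{R_i}, x_{S_i})$ becomes

$$\begin{aligned}
&\bar{x} := \lambda[\,]\ \mathbf{begin}\ \ \overset{\leftharpoondown}{x}_{R_i} \mathrel{+}:= (\mathcal{D}_1\,b_i\,(\overset{\leftharpoonup}{x}_{R_i}, \overset{\leftharpoonup}{x}_{S_i})) \times \overset{\leftharpoondown}{x}_{L_i}; \\
&\qquad\qquad\qquad\ \overset{\leftharpoondown}{x}_{S_i} \mathrel{+}:= (\mathcal{D}_2\,b_i\,(\overset{\leftharpoonup}{x}_{R_i}, \overset{\leftharpoonup}{x}_{S_i})) \times \overset{\leftharpoondown}{x}_{L_i};\ \ \overset{\leftharpoondown}{x}_{L_i} := 0;\ \ \bar{x}\,[\,]\ \ \mathbf{end} \\
&\overset{\leftharpoonup}{x}_{L_i} := b_i\,(\overset{\leftharpoonup}{x}_{R_i}, \overset{\leftharpoonup}{x}_{S_i})
\end{aligned}$$

where $\overset{\leftharpoonup}{x} \equiv x$.

A (Not So) Brief Tutorial on AD

Without the backpropagator, the forward statement runs in the forward sweep and the sensitivity updates run later, in the reverse sweep. For $x_{L_i} := u_i\,x_{R_i}$:

$$\begin{aligned}
&\overset{\leftharpoonup}{x}_{L_i} := u_i\,\overset{\leftharpoonup}{x}_{R_i} \\
&\quad\vdots \\
&\overset{\leftharpoondown}{x}_{R_i} \mathrel{+}:= (\mathcal{D}\,u_i\,\overset{\leftharpoonup}{x}_{R_i}) \times \overset{\leftharpoondown}{x}_{L_i};\ \ \overset{\leftharpoondown}{x}_{L_i} := 0
\end{aligned}$$

For $x_{L_i} := b_i\,(x_{R_i}, x_{S_i})$:

$$\overset{\leftharpoonup}{x}_{L_i} := b_i\,(\overset{\leftharpoonup}{x}_{R_i}, \overset{\leftharpoonup}{x}_{S_i})$$
$$\vdots$$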
$$\begin{aligned}
&\overset{\leftharpoondown}{x}_{R_i} \mathrel{+}:= (\mathcal{D}_1\,b_i\,(\overset{\leftharpoonup}{x}_{R_i}, \overset{\leftharpoonup}{x}_{S_i})) \times \overset{\leftharpoondown}{x}_{L_i}; \\
&\overset{\leftharpoondown}{x}_{S_i} \mathrel{+}:= (\mathcal{D}_2\,b_i\,(\overset{\leftharpoonup}{x}_{R_i}, \overset{\leftharpoonup}{x}_{S_i})) \times \overset{\leftharpoondown}{x}_{L_i}; \\
&\overset{\leftharpoondown}{x}_{L_i} := 0
\end{aligned}$$

where $\overset{\leftharpoonup}{x} \equiv x$.

The Functional Reverse-Mode Transformation

$$x_{L_1} := u_1\,x_{S_1}$$
$$\vdots$$
$$x_{L_n} := u_n\,x_{S_n}$$

The Functional Reverse-Mode Transformation

The program above is transformed into:

$$\begin{aligned}
x_{L_1} &:= u_1\,x_{S_1} \\
&\;\;\vdots \\
x_{L_n} &:= u_n\,x_{S_n} \\
\overset{\leftharpoondown}{x}_1 &:= 0 \\
&\;\;\vdots \\
\overset{\leftharpoondown}{x}_m &:= 0 \\
\overset{\leftharpoondown}{x}_{S_n} &\mathrel{+}:= (\mathcal{D}\,u_n\,x_{S_n}) \times \overset{\leftharpoondown}{x}_{L_n} \\
\overset{\leftharpoondown}{x}_{L_n} &:= 0 \\
&\;\;\vdots \\
\overset{\leftharpoondown}{x}_{S_1} &\mathrel{+}:= (\mathcal{D}\,u_1\,x_{S_1}) \times \overset{\leftharpoondown}{x}_{L_1} \\
\overset{\leftharpoondown}{x}_{L_1} &:= 0
\end{aligned}$$
The Functional Reverse-Mode Transformation

In single-assignment form, $x_1 = u_1\,x_{S_1},\ \ldots,\ x_n = u_n\,x_{S_n}$ is transformed into:

$$\begin{aligned}
x_1 &= u_1\,x_{S_1} \\
&\;\;\vdots \\
x_n &= u_n\,x_{S_n} \\
\overset{\leftharpoondown}{x}_0 &:= 0 \\
&\;\;\vdots \\
\overset{\leftharpoondown}{x}_{n-1} &:= 0 \\
\overset{\leftharpoondown}{x}_{S_n} &\mathrel{+}:= (\mathcal{D}\,u_n\,x_{S_n}) \times \overset{\leftharpoondown}{x}_n \\
&\;\;\vdots \\
\overset{\leftharpoondown}{x}_{S_1} &\mathrel{+}:= (\mathcal{D}\,u_1\,x_{S_1}) \times \overset{\leftharpoondown}{x}_1
\end{aligned}$$

The Functional Reverse-Mode Transformation

$$\bar{u}_i \triangleq \lambda \overset{\leftharpoondown}{x}\ \ (\mathcal{D}\,u_i\,x_{S_i}) \times \overset{\leftharpoondown}{x}$$

$$\begin{aligned}
x_1 &= u_1\,x_{S_1} \\
&\;\;\vdots \\
x_n &= u_n\,x_{S_n} \\
\overset{\leftharpoondown}{x}_0 &:= 0 \\
&\;\;\vdots \\
\overset{\leftharpoondown}{x}_{n-1} &:= 0 \\
\overset{\leftharpoondown}{x}_{S_n} &\mathrel{+}:= \bar{u}_n\,\overset{\leftharpoondown}{x}_n \\
&\;\;\vdots \\
\overset{\leftharpoondown}{x}_{S_1} &\mathrel{+}:= \bar{u}_1\,\overset{\leftharpoondown}{x}_1
\end{aligned}$$

The Functional Reverse-Mode Transformation

$$\overset{\leftharpoonup}{u}\,x \triangleq ((u\,x), (\lambda \overset{\leftharpoondown}{x}\ \ (\mathcal{D}\,u\,x) \times \overset{\leftharpoondown}{x}))$$

$$\begin{aligned}
(x_1, \bar{x}_1) &= \overset{\leftharpoonup}{u}_1\,x_{S_1} \\
&\;\;\vdots \\
(x_n, \bar{x}_n) &= \overset{\leftharpoonup}{u}_n\,x_{S_n} \\
\overset{\leftharpoondown}{x}_0 &:= 0 \\
&\;\;\vdots \\
\overset{\leftharpoondown}{x}_{n-1} &:= 0 \\
\overset{\leftharpoondown}{x}_{S_n} &\mathrel{+}:= \bar{x}_n\,\overset{\leftharpoondown}{x}_n \\
&\;\;\vdots \\
\overset{\leftharpoondown}{x}_{S_1} &\mathrel{+}:= \bar{x}_1\,\overset{\leftharpoondown}{x}_1
\end{aligned}$$

The Functional Reverse-Mode Transformation

When every step is a function application, $x_1 = x_{R_1}\,x_{S_1},\ \ldots,\ x_n = x_{R_n}\,x_{S_n}$, the transformed program is:

$$\begin{aligned}
(\overset{\leftharpoonup}{x}_1, \bar{x}_1) &= \overset{\leftharpoonup}{x}_{R_1}\,\overset{\leftharpoonup}{x}_{S_1} \\
&\;\;\vdots \\
(\overset{\leftharpoonup}{x}_n, \bar{x}_n) &= \overset{\leftharpoonup}{x}_{R_n}\,\overset{\leftharpoonup}{x}_{S_n} \\
\overset{\leftharpoondown}{x}_0 &:= 0 \\
&\;\;\vdots \\
\overset{\leftharpoondown}{x}_{n-1} &:= 0 \\
\overset{\leftharpoondown}{x}_{S_n} &\mathrel{+}:= \bar{x}_n\,\overset{\leftharpoondown}{x}_n \\
&\;\;\vdots \\
\overset{\leftharpoondown}{x}_{S_1} &\mathrel{+}:= \bar{x}_1\,\overset{\leftharpoondown}{x}_1
\end{aligned}$$
The Functional Reverse-Mode Transformation

Sensitivities are initialized to a zero of the appropriate shape and accumulated with $\oplus$:

$$\begin{aligned}
(\overset{\leftharpoonup}{x}_1, \bar{x}_1) &= \overset{\leftharpoonup}{x}_{R_1}\,\overset{\leftharpoonup}{x}_{S_1} \\
&\;\;\vdots \\
(\overset{\leftharpoonup}{x}_n, \bar{x}_n) &= \overset{\leftharpoonup}{x}_{R_n}\,\overset{\leftharpoonup}{x}_{S_n} \\
\overset{\leftharpoondown}{x}_0 &:= \mathbf{0}\,(\overleftarrow{\mathcal{J}}^{-1}\,\overset{\leftharpoonup}{x}_0) \\
&\;\;\vdots \\
\overset{\leftharpoondown}{x}_{n-1} &:= \mathbf{0}\,(\overleftarrow{\mathcal{J}}^{-1}\,\overset{\leftharpoonup}{x}_{n-1}) \\
\overset{\leftharpoondown}{x}_{S_n} &\mathrel{\oplus}:= \bar{x}_n\,\overset{\leftharpoondown}{x}_n \\
&\;\;\vdots \\
\overset{\leftharpoondown}{x}_{S_1} &\mathrel{\oplus}:= \bar{x}_1\,\overset{\leftharpoondown}{x}_1
\end{aligned}$$

The Functional Reverse-Mode Transformation

$$\lambda x_0\ \ \mathbf{let}\ \ x_1 \triangleq x_{R_1}\,x_{S_1};\ \ \ldots;\ \ x_n \triangleq x_{R_n}\,x_{S_n}\ \ \mathbf{in}\ \ x_n\ \ \mathbf{end}$$

is transformed into

$$\begin{aligned}
\lambda \overset{\leftharpoonup}{x}_0\ \ \mathbf{let}\ \ & (\overset{\leftharpoonup}{x}_1, \bar{x}_1) \triangleq \overset{\leftharpoonup}{x}_{R_1}\,\overset{\leftharpoonup}{x}_{S_1}; \\
& \quad\vdots \\
& (\overset{\leftharpoonup}{x}_n, \bar{x}_n) \triangleq \overset{\leftharpoonup}{x}_{R_n}\,\overset{\leftharpoonup}{x}_{S_n} \\
\mathbf{in}\ \ & (\overset{\leftharpoonup}{x}_n, (\lambda \overset{\leftharpoondown}{x}_n\ \ \mathbf{let}\ \ \overset{\leftharpoondown}{x}_0 \triangleq \mathbf{0}\,(\overleftarrow{\mathcal{J}}^{-1}\,\overset{\leftharpoonup}{x}_0); \\
& \qquad\qquad\qquad\quad\vdots \\
& \qquad\qquad\qquad \overset{\leftharpoondown}{x}_{n-1} \triangleq \mathbf{0}\,(\overleftarrow{\mathcal{J}}^{-1}\,\overset{\leftharpoonup}{x}_{n-1}); \\
& \qquad\qquad\qquad \overset{\leftharpoondown}{x}_{S_n} \mathrel{\oplus}\triangleq \bar{x}_n\,\overset{\leftharpoondown}{x}_n; \\
& \qquad\qquad\qquad\quad\vdots \\
& \qquad\qquad\qquad \overset{\leftharpoondown}{x}_{S_1} \mathrel{\oplus}\triangleq \bar{x}_1\,\overset{\leftharpoondown}{x}_1 \\
& \qquad\qquad\ \mathbf{in}\ \ \overset{\leftharpoondown}{x}_0\ \ \mathbf{end}))\ \ \mathbf{end}
\end{aligned}$$

The above is a white lie. The truth is a lot more complicated.
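The shape of the transformed function, a primal result paired with a backpropagator closure, can be sketched in a few lines. This sketch deliberately stays with the simpler unary-primitive form $x_i = u_i\,x_{S_i}$ and scalar values, so it sidesteps everything the "white lie" remark refers to (sensitivities of closures, the zero of a given shape, and so on). The names `lift_reverse` and `reverse_transform` are hypothetical.

```python
import math

def lift_reverse(u, du):
    """Reverse-lifted primitive: x -> (u x, backpropagator s -> (D u x) * s)."""
    return lambda x: (u(x), lambda s: du(x) * s)

def reverse_transform(steps):
    """steps[i-1] = (lifted u_i, S_i), meaning x_i = u_i x_{S_i} with S_i < i.

    Returns a function x_0 -> (x_n, backpropagator), where the backpropagator
    maps a sensitivity of x_n to the sensitivity of x_0.
    """
    def transformed(x0):
        xs, backprops = [x0], []
        for u_rev, S in steps:
            value, bp = u_rev(xs[S])
            xs.append(value)
            backprops.append(bp)

        def backpropagator(sens_n):
            sens = [0.0] * len(xs)      # sensitivities of x_0 .. x_{n-1} start at 0
            sens[-1] = sens_n
            for i in range(len(steps), 0, -1):
                S = steps[i - 1][1]
                sens[S] += backprops[i - 1](sens[i])
            return sens[0]

        return xs[-1], backpropagator
    return transformed

# Example program: x_1 = sin x_0, x_2 = exp x_1.
program = reverse_transform([
    (lift_reverse(math.sin, math.cos), 0),
    (lift_reverse(math.exp, math.exp), 1),
])
value, backprop = program(0.3)
derivative = backprop(1.0)
```

The forward sweep happens when `program` is called; the reverse sweep is deferred until the returned closure is applied, which is the sense in which the backpropagator is "the ultimate" continuation of the computation.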
Modularity

$$\nabla f\,x \triangleq \left( \frac{\partial f(x)}{\partial x_1}, \ldots, \frac{\partial f(x)}{\partial x_n} \right)$$
$$\text{\textsc{GradientDescent}}\ f\ x_0 \triangleq \ldots\ x_{i+1} := \ldots\ \nabla f\,x_i\ \ldots$$
$$\operatorname{argmin} f \triangleq \ldots\ \text{\textsc{GradientDescent}}\ f\ x_0\ \ldots$$
$$\text{\textsc{NeutronFlux}}\ r \triangleq \text{classified}$$
$$\text{\textsc{Deviation}}\ r \triangleq ((\text{\textsc{NeutronFlux}}\ r) - \text{\textsc{NeutronFlux}}_{\text{critical}})^2$$
$$r^* \triangleq \operatorname{argmin}\ \text{\textsc{Deviation}}$$

Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophical Society, 90:20–4.
Breaking Modularity

With forward mode, $\nabla$ must be handed the transformed function $\overset{\rightharpoonup}{f}$, and so must every caller up the chain:

$$\nabla\,\overset{\rightharpoonup}{f}\,x \triangleq (\overset{\rightharpoonup}{f}\,x \triangleleft \overset{\rightharpoondown}{e}_1), \ldots, (\overset{\rightharpoonup}{f}\,x \triangleleft \overset{\rightharpoondown}{e}_n)$$
$$\text{\textsc{GradientDescent}}\ \overset{\rightharpoonup}{f}\ x_0 \triangleq \ldots\ x_{i+1} := \ldots\ \nabla\,\overset{\rightharpoonup}{f}\,x_i\ \ldots$$
$$\operatorname{argmin} \overset{\rightharpoonup}{f} \triangleq \ldots\ \text{\textsc{GradientDescent}}\ \overset{\rightharpoonup}{f}\ x_0\ \ldots$$
$$\text{\textsc{NeutronFlux}}\ r \triangleq \text{classified} \qquad \text{\textsc{NeutronFlux}} \overset{\text{ADIFOR}}{\Longrightarrow} \overset{\rightharpoonup}{\text{\textsc{NeutronFlux}}}$$
$$\text{\textsc{Deviation}}\ r \triangleq ((\text{\textsc{NeutronFlux}}\ r) - \text{\textsc{NeutronFlux}}_{\text{critical}})^2 \qquad \text{\textsc{Deviation}} \overset{\text{ADIFOR}}{\Longrightarrow} \overset{\rightharpoonup}{\text{\textsc{Deviation}}}$$
$$r^* \triangleq \operatorname{argmin}\ \overset{\rightharpoonup}{\text{\textsc{Deviation}}}$$

Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophical Society, 90:20–4.
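The break in modularity can be seen in a small sketch: the optimizer below cannot accept an ordinary function, only one that has already been forward-transformed to work on (primal, tangent) pairs. Everything here is an illustrative assumption. In particular `flux_forward` is a made-up stand-in (the square of its argument), since the real one is "classified", the value of `CRITICAL` is arbitrary, and the transformed functions are written by hand where the slide has a preprocessor such as ADIFOR produce them.

```python
CRITICAL = 4.0  # hypothetical target flux

def flux_forward(rp):
    """Hand-transformed stand-in flux(r) = r * r on a (primal, tangent) pair."""
    r, dr = rp
    return (r * r, 2.0 * r * dr)

def deviation_forward(xs):
    """Hand-transformed deviation(r) = (flux(r) - CRITICAL) ** 2; xs is a list of pairs."""
    flux, dflux = flux_forward(xs[0])
    diff = flux - CRITICAL
    return (diff * diff, 2.0 * diff * dflux)

def gradient(f_forward, x):
    """One forward pass per input: bundle x with each basis tangent e_j in turn."""
    n = len(x)
    return [
        f_forward([(x[k], 1.0 if k == j else 0.0) for k in range(n)])[1]
        for j in range(n)
    ]

def gradient_descent(f_forward, x0, rate=0.01, steps=500):
    """Fixed-step descent; rate and steps are arbitrary choices for this example."""
    x = list(x0)
    for _ in range(steps):
        g = gradient(f_forward, x)
        x = [xi - rate * gi for xi, gi in zip(x, g)]
    return x

def argmin(f_forward, x0):
    return gradient_descent(f_forward, x0)

r_star = argmin(deviation_forward, [1.0])
```

Note that `argmin` is called with `deviation_forward`, not with a plain `deviation`: the caller has to know that derivatives are taken inside and has to supply the transformed function, which is exactly the leak of implementation detail the slide points at.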
N EUTRON F LUX r = N EUTRON F LUX D EVIATION r D EVIATION r∗ △ ADIFOR △ = ADIFOR △ = classified −−−−−−−−−−⇀ N EUTRON F LUX ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 −−−−−−−⇀ D EVIATION −−−−−−−⇀ argmin D EVIATION Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Breaking Modularity ↼ − ∇ f x = △ ↼ − ... f x... − ⇀ G RADIENT D ESCENT f x0 = △ − ⇀ . . . xi+1 := . . . ∇ f xi . . . − ⇀ argmin f = △ − ⇀ . . . G RADIENT D ESCENT f x0 . . . N EUTRON F LUX r = N EUTRON F LUX D EVIATION r D EVIATION r∗ △ ADIFOR △ = ADIFOR △ = classified −−−−−−−−−−⇀ N EUTRON F LUX ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 −−−−−−−⇀ D EVIATION −−−−−−−⇀ argmin D EVIATION Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Breaking Modularity ↼ − ∇ f x = △ ↼ − ... f x... − ⇀ G RADIENT D ESCENT f x0 = △ ↼ − . . . xi+1 := . . . ∇ f xi . . . − ⇀ argmin f = △ − ⇀ . . . G RADIENT D ESCENT f x0 . . . N EUTRON F LUX r = N EUTRON F LUX D EVIATION r D EVIATION r∗ △ ADIFOR △ = ADIFOR △ = classified −−−−−−−−−−⇀ N EUTRON F LUX ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 −−−−−−−⇀ D EVIATION −−−−−−−⇀ argmin D EVIATION Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Breaking Modularity ↼ − ∇ f x = △ ↼ − ... f x... ↼ − G RADIENT D ESCENT f x0 = △ ↼ − . . . xi+1 := . . . ∇ f xi . . . − ⇀ argmin f = △ − ⇀ . . . G RADIENT D ESCENT f x0 . . . 
N EUTRON F LUX r = N EUTRON F LUX D EVIATION r D EVIATION r∗ △ ADIFOR △ = ADIFOR △ = classified −−−−−−−−−−⇀ N EUTRON F LUX ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 −−−−−−−⇀ D EVIATION −−−−−−−⇀ argmin D EVIATION Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Breaking Modularity ↼ − ∇ f x = △ ↼ − ... f x... ↼ − G RADIENT D ESCENT f x0 = △ ↼ − . . . xi+1 := . . . ∇ f xi . . . − ⇀ argmin f = △ ↼ − . . . G RADIENT D ESCENT f x0 . . . N EUTRON F LUX r = N EUTRON F LUX D EVIATION r D EVIATION r∗ △ ADIFOR △ = ADIFOR △ = classified −−−−−−−−−−⇀ N EUTRON F LUX ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 −−−−−−−⇀ D EVIATION −−−−−−−⇀ argmin D EVIATION Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Breaking Modularity ↼ − ∇ f x = △ ↼ − ... f x... ↼ − G RADIENT D ESCENT f x0 = △ ↼ − . . . xi+1 := . . . ∇ f xi . . . ↼ − argmin f = △ ↼ − . . . G RADIENT D ESCENT f x0 . . . N EUTRON F LUX r = N EUTRON F LUX D EVIATION r D EVIATION r∗ △ ADIFOR △ = ADIFOR △ = classified −−−−−−−−−−⇀ N EUTRON F LUX ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 −−−−−−−⇀ D EVIATION −−−−−−−⇀ argmin D EVIATION Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Breaking Modularity ↼ − ∇ f x = △ ↼ − ... f x... ↼ − G RADIENT D ESCENT f x0 = △ ↼ − . . . xi+1 := . . . ∇ f xi . . . ↼ − argmin f = △ ↼ − . . . G RADIENT D ESCENT f x0 . . . 
N EUTRON F LUX r = N EUTRON F LUX D EVIATION r D EVIATION r∗ △ ADIFOR △ = ADIFOR △ = classified −−−−−−−−−−⇀ N EUTRON F LUX ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 −−−−−−−⇀ D EVIATION ↼−−−−−−− argmin D EVIATION Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Breaking Modularity ↼ − ∇ f x = △ ↼ − ... f x... ↼ − G RADIENT D ESCENT f x0 = △ ↼ − . . . xi+1 := . . . ∇ f xi . . . ↼ − argmin f = △ ↼ − . . . G RADIENT D ESCENT f x0 . . . N EUTRON F LUX r = N EUTRON F LUX D EVIATION r D EVIATION r∗ △ ADIFOR △ = ADIFOR △ = classified ↼−−−−−−−−−− N EUTRON F LUX ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 ↼−−−−−−− D EVIATION ↼−−−−−−− argmin D EVIATION Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Breaking Modularity ↼ − ∇ f x = △ ↼ − ... f x... ↼ − G RADIENT D ESCENT f x0 = △ ↼ − . . . xi+1 := . . . ∇ f xi . . . ↼ − argmin f = △ ↼ − . . . G RADIENT D ESCENT f x0 . . . N EUTRON F LUX r = N EUTRON F LUX D EVIATION r D EVIATION r∗ △ TAPENADE △ = TAPENADE △ = classified ↼−−−−−−−−−− N EUTRON F LUX ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 ↼−−−−−−− D EVIATION ↼−−−−−−− argmin D EVIATION Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Breaking Modularity ↼ − ∇ f x △ ↼ − ... f x... △ ↼ − . . . xi+1 := . . . ∇ f xi . . . ↼ − . . . xi+1 := . . . ∇ f xi . . . H f xi . . . ↼ − . . . G RADIENT D ESCENT f x0 . . . 
= ↼ − G RADIENT D ESCENT f x0 ↼ − N EWTONS M ETHOD f x0 ↼ − argmin f = N EUTRON F LUX r = N EUTRON F LUX D EVIATION r D EVIATION r∗ = △ = △ △ TAPENADE △ = TAPENADE △ = classified ↼−−−−−−−−−− N EUTRON F LUX ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 ↼−−−−−−− D EVIATION ↼−−−−−−− argmin D EVIATION Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Breaking Modularity ↼ − ∇ f x △ ↼ − ... f x... △ ↼ − . . . xi+1 := . . . ∇ f xi . . . ↼ − . . . xi+1 := . . . ∇ f xi . . . H f xi . . . ↼ − . . . N EWTONS M ETHOD f x0 . . . = ↼ − G RADIENT D ESCENT f x0 ↼ − N EWTONS M ETHOD f x0 ↼ − argmin f = N EUTRON F LUX r = N EUTRON F LUX D EVIATION r D EVIATION r∗ = △ = △ △ TAPENADE △ = TAPENADE △ = classified ↼−−−−−−−−−− N EUTRON F LUX ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 ↼−−−−−−− D EVIATION ↼−−−−−−− argmin D EVIATION Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Breaking Modularity ↼ − ∇ f x = Hf x = △ △ ↼ − G RADIENT D ESCENT f x0 ↼ − N EWTONS M ETHOD f x0 ↼ − argmin f = N EUTRON F LUX r = N EUTRON F LUX D EVIATION r D EVIATION r∗ ↼ − ... f x... △ = △ = △ △ TAPENADE △ = TAPENADE △ = ↼ − . . . xi+1 := . . . ∇ f xi . . . ↼ − . . . xi+1 := . . . ∇ f xi . . . H f xi . . . ↼ − . . . N EWTONS M ETHOD f x0 . . . classified ↼−−−−−−−−−− N EUTRON F LUX ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 ↼−−−−−−− D EVIATION ↼−−−−−−− argmin D EVIATION Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Breaking Modularity ↼ − ∇ f x = △ ↼ − ... f x... Hf x = △ − ⇀ ↼ − ... f ...x... 
↼ − G RADIENT D ESCENT f x0 ↼ − N EWTONS M ETHOD f x0 ↼ − argmin f = N EUTRON F LUX r = N EUTRON F LUX D EVIATION r D EVIATION r∗ △ = △ = △ △ TAPENADE △ = TAPENADE △ = ↼ − . . . xi+1 := . . . ∇ f xi . . . ↼ − . . . xi+1 := . . . ∇ f xi . . . H f xi . . . ↼ − . . . N EWTONS M ETHOD f x0 . . . classified ↼−−−−−−−−−− N EUTRON F LUX ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 ↼−−−−−−− D EVIATION ↼−−−−−−− argmin D EVIATION Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Breaking Modularity ↼ − ∇ f x = △ ↼ − ... f x... − ⇀ ↼ − H f x = △ − ⇀ ↼ − ... f ...x... ↼ − G RADIENT D ESCENT f x0 ↼ − N EWTONS M ETHOD f x0 ↼ − argmin f = N EUTRON F LUX r = N EUTRON F LUX D EVIATION r D EVIATION r∗ △ = △ = △ △ TAPENADE △ = TAPENADE △ = ↼ − . . . xi+1 := . . . ∇ f xi . . . ↼ − . . . xi+1 := . . . ∇ f xi . . . H f xi . . . ↼ − . . . N EWTONS M ETHOD f x0 . . . classified ↼−−−−−−−−−− N EUTRON F LUX ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 ↼−−−−−−− D EVIATION ↼−−−−−−− argmin D EVIATION Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Breaking Modularity ↼ − ∇ f x = △ ↼ − ... f x... − ⇀ ↼ − H f x = △ − ⇀ ↼ − ... f ...x... ↼ − G RADIENT D ESCENT f x0 ↼ − N EWTONS M ETHOD f x0 ↼ − argmin f = N EUTRON F LUX r = N EUTRON F LUX D EVIATION r D EVIATION r∗ △ = △ = △ △ TAPENADE △ = TAPENADE △ = ↼ − . . . xi+1 := . . . ∇ f xi . . . − ⇀ ↼ − ↼ − . . . xi+1 := . . . ∇ f xi . . . H f xi . . . ↼ − . . . N EWTONS M ETHOD f x0 . . . classified ↼−−−−−−−−−− N EUTRON F LUX ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 ↼−−−−−−− D EVIATION ↼−−−−−−− argmin D EVIATION Fermi, E. (1946). The Development of the first chain reaction pile. 
Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Breaking Modularity ↼ − ∇ f x = △ ↼ − ... f x... − ⇀ ↼ − H f x = △ − ⇀ ↼ − ... f ...x... ↼ − G RADIENT D ESCENT f x0 − ⇀ ↼ −↼ − N EWTONS M ETHOD f f x0 ↼ − argmin f = N EUTRON F LUX r = N EUTRON F LUX D EVIATION r D EVIATION r∗ △ = △ = △ △ TAPENADE △ = TAPENADE △ = ↼ − . . . xi+1 := . . . ∇ f xi . . . − ⇀ ↼ − ↼ − . . . xi+1 := . . . ∇ f xi . . . H f xi . . . ↼ − . . . N EWTONS M ETHOD f x0 . . . classified ↼−−−−−−−−−− N EUTRON F LUX ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 ↼−−−−−−− D EVIATION ↼−−−−−−− argmin D EVIATION Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Breaking Modularity ↼ − ∇ f x = △ ↼ − ... f x... − ⇀ ↼ − H f x = △ − ⇀ ↼ − ... f ...x... ↼ − G RADIENT D ESCENT f x0 − ⇀ ↼ −↼ − N EWTONS M ETHOD f f x0 ↼ − argmin f = N EUTRON F LUX r = N EUTRON F LUX D EVIATION r D EVIATION r∗ △ = △ = △ △ TAPENADE △ = TAPENADE △ = ↼ − . . . xi+1 := . . . ∇ f xi . . . − ⇀ ↼ − ↼ − . . . xi+1 := . . . ∇ f xi . . . H f xi . . . − ⇀ ↼ −↼ − . . . N EWTONS M ETHOD f f x0 . . . classified ↼−−−−−−−−−− N EUTRON F LUX ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 ↼−−−−−−− D EVIATION ↼−−−−−−− argmin D EVIATION Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Breaking Modularity ↼ − ∇ f x = △ ↼ − ... f x... − ⇀ ↼ − H f x = △ − ⇀ ↼ − ... f ...x... ↼ − G RADIENT D ESCENT f x0 − ⇀ ↼ −↼ − N EWTONS M ETHOD f f x0 − ⇀ ↼ −↼ − argmin f f = N EUTRON F LUX r = N EUTRON F LUX D EVIATION r D EVIATION r∗ △ = △ = △ △ TAPENADE △ = TAPENADE △ = ↼ − . . . xi+1 := . . . ∇ f xi . . . − ⇀ ↼ − ↼ − . . . xi+1 := . . . ∇ f xi . 
. . H f xi . . . − ⇀ ↼ −↼ − . . . N EWTONS M ETHOD f f x0 . . . classified ↼−−−−−−−−−− N EUTRON F LUX ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 ↼−−−−−−− D EVIATION ↼−−−−−−− argmin D EVIATION Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Breaking Modularity ↼ − ∇ f x = △ ↼ − ... f x... − ⇀ ↼ − H f x = △ − ⇀ ↼ − ... f ...x... ↼ − G RADIENT D ESCENT f x0 − ⇀ ↼ −↼ − N EWTONS M ETHOD f f x0 − ⇀ ↼ −↼ − argmin f f = N EUTRON F LUX r = N EUTRON F LUX D EVIATION r D EVIATION r∗ △ = △ = △ △ TAPENADE △ = TAPENADE △ = ↼ − . . . xi+1 := . . . ∇ f xi . . . − ⇀ ↼ − ↼ − . . . xi+1 := . . . ∇ f xi . . . H f xi . . . − ⇀ ↼ −↼ − . . . N EWTONS M ETHOD f f x0 . . . classified ↼−−−−−−−−−− N EUTRON F LUX ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 ↼−−−−−−− D EVIATION − ⇀ ↼ −− −− −− −− −− − − ↼−−−−−−− − argmin D EVIATION D EVIATION Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Breaking Modularity ↼ − ∇ f x = △ ↼ − ... f x... − ⇀ ↼ − H f x = △ − ⇀ ↼ − ... f ...x... ↼ − G RADIENT D ESCENT f x0 − ⇀ ↼ −↼ − N EWTONS M ETHOD f f x0 − ⇀ ↼ −↼ − argmin f f = N EUTRON F LUX r = N EUTRON F LUX ↼−−−−−−−−−− N EUTRON F LUX D EVIATION r D EVIATION ↼−−−−−−− D EVIATION r∗ △ = △ = △ △ TAPENADE TAPENADE △ = TAPENADE TAPENADE △ = ↼ − . . . xi+1 := . . . ∇ f xi . . . − ⇀ ↼ − ↼ − . . . xi+1 := . . . ∇ f xi . . . H f xi . . . − ⇀ ↼ −↼ − . . . N EWTONS M ETHOD f f x0 . . . classified ↼−−−−−−−−−− N EUTRON F LUX − − ⇀ ↼ −− −− −− −− −− −− −− −− − − N EUTRON F LUX ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 ↼−−−−−−− D EVIATION −− ⇀ ↼ −− −− −− −− −− − − D EVIATION − ⇀ ↼ −− −− −− −− −− − − ↼−−−−−−− − argmin D EVIATION D EVIATION Fermi, E. (1946). 
The Development of the first chain reaction pile. Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Restoring Modularity △ ∇f x = Hf x = G RADIENT D ESCENT f x0 = N EWTONS M ETHOD f x0 △ △ △ = △ argmin f = N EUTRON F LUX r = D EVIATION r = r∗ = △ △ △ . . . xi+1 := . . . ∇ f xi . . . . . . xi+1 := . . . ∇ f xi . . . H f xi . . . . . . G RADIENT D ESCENT f x0 . . . classified ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 argmin D EVIATION Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Restoring Modularity △ ∇f x = Hf x = G RADIENT D ESCENT f x0 = N EWTONS M ETHOD f x0 → − → − ⇁ ⇁ (( J f ) x ⊲ − e1 ), . . . , (( J f ) x ⊲ − en ) △ △ △ = △ argmin f = N EUTRON F LUX r = D EVIATION r = r∗ = △ △ △ . . . xi+1 := . . . ∇ f xi . . . . . . xi+1 := . . . ∇ f xi . . . H f xi . . . . . . G RADIENT D ESCENT f x0 . . . classified ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 argmin D EVIATION Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Restoring Modularity △ ∇f x = Hf x = G RADIENT D ESCENT f x0 = N EWTONS M ETHOD f x0 ← − . . . (J f ) x . . . △ △ △ = △ argmin f = N EUTRON F LUX r = D EVIATION r = r∗ = △ △ △ . . . xi+1 := . . . ∇ f xi . . . . . . xi+1 := . . . ∇ f xi . . . H f xi . . . . . . G RADIENT D ESCENT f x0 . . . classified ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 argmin D EVIATION Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Restoring Modularity △ ← − . . . (J f ) x . . 
. △ → − ← − . . . ( J ( J f )) . . . x . . . ∇f x = Hf x = G RADIENT D ESCENT f x0 = N EWTONS M ETHOD f x0 △ △ = △ argmin f = N EUTRON F LUX r = D EVIATION r = r∗ = △ △ △ . . . xi+1 := . . . ∇ f xi . . . . . . xi+1 := . . . ∇ f xi . . . H f xi . . . . . . N EWTONS M ETHOD f x0 . . . classified ((N EUTRON F LUX r) − N EUTRON F LUXcritical )2 argmin D EVIATION Fermi, E. (1946). The Development of the first chain reaction pile. Proceedings of the American Philosophy Society, 90:20–4. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 52 / 73 Having your cake and eating it too Convenient Fast Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 53 / 73 Having your cake and eating it too Convenient D formulated as a higher-order function in the language Fast Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 53 / 73 Having your cake and eating it too Convenient D formulated as a higher-order function in the language no arbitrary restrictions applies to all data types and constructs in the language, including code produced by D and even D itself Fast Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 53 / 73 Having your cake and eating it too Convenient D formulated as a higher-order function in the language no arbitrary restrictions applies to all data types and constructs in the language, including code produced by D and even D itself higher-order derivatives (D (D f)) Fast Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 53 / 73 Having your cake and eating it too Convenient D formulated as a higher-order function in the language no arbitrary restrictions applies to all data types and constructs in the language, including code produced by D and even D itself higher-order derivatives (D (D f)) nesting (D (lambda (. . .) . . . (D (lambda (. . .) . . .)) . . 
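The modular layering can be sketched in Python. This is only an illustration under stated assumptions: `Dual`, `gradient`, `gradient_descent`, and `argmin` are hypothetical names, the dual numbers are untagged (so this sketch is not safe for nested derivatives), `argmin` uses plain gradient descent rather than Newton's method, and `neutron_flux` is an invented stand-in, since the real one is "classified". The point is that each layer receives only `f` itself, never a pre-transformed version of it.

```python
class Dual:
    """Dual number primal + tangent*eps (untagged: no nested derivatives)."""

    def __init__(self, primal, tangent=0.0):
        self.primal, self.tangent = primal, tangent

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.primal + other.primal, self.tangent + other.tangent)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.primal - other.primal, self.tangent - other.tangent)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        other = Dual.lift(other)
        return Dual(self.primal * other.primal,
                    self.primal * other.tangent + self.tangent * other.primal)

    __rmul__ = __mul__


def gradient(f):
    """Forward-mode gradient: one pass per basis vector e_1, ..., e_n."""
    def grad_f(x):
        n = len(x)
        return [Dual.lift(f([Dual(xj, 1.0 if j == i else 0.0)
                             for j, xj in enumerate(x)])).tangent
                for i in range(n)]
    return grad_f


def gradient_descent(f, x0, rate=0.01, steps=200):
    """Takes f itself; the derivative is obtained inside, not by the caller."""
    x = list(x0)
    for _ in range(steps):
        x = [xi - rate * gi for xi, gi in zip(x, gradient(f)(x))]
    return x


def argmin(f, x0):
    return gradient_descent(f, x0)


CRITICAL = 7.0  # hypothetical critical flux


def neutron_flux(r):
    return 3.0 * r[0] + 1.0  # hypothetical stand-in for the classified model


def deviation(r):
    d = neutron_flux(r) - CRITICAL
    return d * d


r_star = argmin(deviation, [0.0])
```

Changing the AD mode or the optimizer here would touch only `gradient` or `argmin`; `deviation` and `neutron_flux` stay as written.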
Having your cake and eating it too

Convenient
- D formulated as a higher-order function in the language
- no arbitrary restrictions
  - applies to all data types and constructs in the language, including code produced by D and even D itself
  - higher-order derivatives: (D (D f))
  - nesting: (D (lambda (...) ... (D (lambda (...) ...)) ...))

Fast
- D implemented by reflective transformation of environments and code associated with closures
- compile away reflection with partial evaluation implemented by flow analysis
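A derivative operator that is an ordinary higher-order function can be sketched in Python with tagged dual numbers. This is a minimal illustration, not the reflective closure transformation the slide describes: `Dual`, `D`, and the helper names are assumptions, only `+` and `*` are supported, and each call to a derivative draws a fresh tag so that inner and outer perturbations stay distinct.

```python
import itertools

_fresh_tag = itertools.count()  # every derivative call gets its own tag


class Dual:
    """Tagged dual number: primal + tangent * eps_tag."""

    def __init__(self, tag, primal, tangent):
        self.tag, self.primal, self.tangent = tag, primal, tangent

    def __add__(self, other):
        return _add(self, other)

    __radd__ = __add__

    def __mul__(self, other):
        return _mul(self, other)

    __rmul__ = __mul__


def _tag(x):
    return x.tag if isinstance(x, Dual) else -1


def _primal(x, tag):
    return x.primal if _tag(x) == tag else x


def _tangent(x, tag):
    return x.tangent if _tag(x) == tag else 0.0


def _add(x, y):
    tag = max(_tag(x), _tag(y))  # the innermost active derivative
    return Dual(tag, _primal(x, tag) + _primal(y, tag),
                _tangent(x, tag) + _tangent(y, tag))


def _mul(x, y):
    tag = max(_tag(x), _tag(y))
    px, py = _primal(x, tag), _primal(y, tag)
    return Dual(tag, px * py, px * _tangent(y, tag) + _tangent(x, tag) * py)


def D(f):
    """Derivative of a scalar function, itself an ordinary function."""
    def derivative(x):
        tag = next(_fresh_tag)
        return _tangent(f(Dual(tag, x, 1.0)), tag)
    return derivative


def f(x):
    return 2.0 * x * x * x  # 2x^3


first = D(f)(3.0)       # 6x^2 at x = 3
second = D(D(f))(3.0)   # 12x at x = 3

# Nesting: the inner lambda closes over the outer variable x.
nested = D(lambda x: x * D(lambda y: x + y)(1.0))(1.0)
```

Without the tags, the inner derivative in the last line would also pick up the outer perturbation of `x` and the result would come out as 2 rather than 1.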
Monovariant Flow Analysis: 0-CFA

Each annotation after a colon is the set of abstract values the analysis has found flowing to that point.

(define (D f) ...)
(D (lambda (x) 2x³))

With only the first call site, f receives (λx 2x³) and D returns (λx 6x²):

(define (D f:(λx 2x³)) ...:(λx 6x²))
(D (lambda (x) 2x³)):(λx 6x²)

A second call site, (D (lambda (x) 3x⁴)), flows into the same f. The single binding becomes a union, and the union flows back to both call sites:

(define (D f:(λx 2x³) ∪ (λx 3x⁴)) ...:(λx 6x²) ∪ (λx 12x³))
(D (lambda (x) 2x³)):(λx 6x²) ∪ (λx 12x³)
(D (lambda (x) 3x⁴)):(λx 6x²) ∪ (λx 12x³)
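The merging can be mimicked with a toy model in Python. It is not a real control-flow analysis: the closures are plain strings, `DERIVATIVE_OF` is a hypothetical lookup table standing in for what D would compute, and `analyze` is an assumed helper. It only shows how keeping one binding for f per context decides which results reach which call site.

```python
from collections import defaultdict

# Hypothetical stand-in for what D computes on each abstract closure.
DERIVATIVE_OF = {"λx 2x³": "λx 6x²", "λx 3x⁴": "λx 12x³"}

# Two call sites of D and the closure each one passes.
CALL_SITES = {"site-1": "λx 2x³", "site-2": "λx 3x⁴"}


def analyze(context_of):
    """Return the abstract result seen at each call site.

    context_of maps a call site to the context in which the body of D is
    analyzed; call sites that share a context share one binding for f.
    """
    flows_into_f = defaultdict(set)
    for site, closure in CALL_SITES.items():
        flows_into_f[context_of(site)].add(closure)
    returned = {ctx: {DERIVATIVE_OF[c] for c in closures}
                for ctx, closures in flows_into_f.items()}
    return {site: returned[context_of(site)] for site in CALL_SITES}


monovariant = analyze(lambda site: "D")      # 0-CFA: a single context for D
per_call_site = analyze(lambda site: site)   # one context per call site
```

In `monovariant` both call sites see both derivatives, as on the slide; in `per_call_site` each call site sees only its own.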
(D (D (lambda (x) e²ˣ)))

The output of D flows back into its own input, so the union never stops growing:

(define (D f:(λx e²ˣ) ∪ (λx 2e²ˣ) ∪ (λx 4e²ˣ) ∪ ...) ...:(λx 2e²ˣ) ∪ (λx 4e²ˣ) ∪ ...)
(D (D (lambda (x) e²ˣ)):(λx 2e²ˣ) ∪ (λx 4e²ˣ) ∪ ...)

Polyvariant Flow Analysis: k-CFA with Bounded Context Sensitivity

One copy of D per calling context (the subscript names the context):

(define (D_g f:(λx 2x³)) ...:(λx 6x²))
(define (D_h f:(λx 3x⁴)) ...:(λx 12x³))
(define (g ...) ... (D (lambda (x) 2x³)):(λx 6x²) ...)
(define (h ...) ... (D (lambda (x) 3x⁴)):(λx 12x³) ...)

A bounded context is not enough when the nesting depth is a runtime value:

(define ((compose n f) x)
  (if (zero? n)
      x
      ((compose (- n 1) f) (f x))))

((compose k D) g)

(define (D_compose f:g) ...)
(define (D_compose:compose f:g′) ...)
(define (D_compose:compose:compose f:g″) ...)
...

Shivers, III, O. G. (1991). Control-Flow Analysis of Higher-Order Languages or Taming Lambda. Ph.D. thesis, CMU.
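The effect of bounding the context can be illustrated with a toy model in Python. The assumptions are loud ones: `contexts_for_D` is a hypothetical helper, contexts are modeled as call strings truncated to their last k entries, and the i-th application of D inside ((compose depth D) g) is taken to occur under i nested calls to compose and to receive the (i-1)-th derivative of g, written here with ASCII primes.

```python
def contexts_for_D(depth, k):
    """Group the arguments D receives by k-bounded call string."""
    table = {}
    for level in range(1, depth + 1):
        call_string = ("compose",) * level
        # Keep only the last k call sites; k = 0 gives a single context.
        context = call_string[-k:] if k > 0 else ()
        argument = "g" + "'" * (level - 1)
        table.setdefault(context, []).append(argument)
    return table


merged = contexts_for_D(3, 1)     # bound too small: everything shares a context
separate = contexts_for_D(3, 3)   # bound >= depth: one context per level
```

With `k = 1` the three arguments g, g', g'' land in one context and would be unioned, exactly the merging the polyvariant copies were meant to avoid. Any fixed k fails in the same way once the depth exceeds it.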
.} Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 56 / 73 Polyvariant Flow Analysis with Unbounded Context Sensitivity E :e×σ →v v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei σ ::= {x1 7→ v1 , . . .} E :e×σ →v v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei | R σ ::= {x1 7→ v1 , . . .} Memoize E indexed (by suitable equivalence relations on) e and σ. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 56 / 73 Polyvariant Flow Analysis with Unbounded Context Sensitivity E :e×σ →v v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei σ ::= {x1 7→ v1 , . . .} E :e×σ →v v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei | R σ ::= {x1 7→ v1 , . . .} Memoize E indexed (by suitable equivalence relations on) e and σ. Not suitable for arbitrary (i.e., typical S CHEME, ML, H ASKELL, etc.) programs. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 56 / 73 Polyvariant Flow Analysis with Unbounded Context Sensitivity E :e×σ →v v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei σ ::= {x1 7→ v1 , . . .} E :e×σ →v v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei | R σ ::= {x1 7→ v1 , . . .} Memoize E indexed (by suitable equivalence relations on) e and σ. Not suitable for arbitrary (i.e., typical S CHEME, ML, H ASKELL, etc.) programs. Is suitable for F ORTRAN-like programs. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 56 / 73 Polyvariant Flow Analysis with Unbounded Context Sensitivity E :e×σ →v v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei σ ::= {x1 7→ v1 , . . .} E :e×σ →v v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei | R σ ::= {x1 7→ v1 , . . .} Memoize E indexed (by suitable equivalence relations on) e and σ. Not suitable for arbitrary (i.e., typical S CHEME, ML, H ASKELL, etc.) programs. Is suitable for F ORTRAN-like programs. Necessary for migrating reflective source-code transformation to compile time. 
Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 56 / 73 Polyvariant Flow Analysis with Unbounded Context Sensitivity E :e×σ →v v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei σ ::= {x1 7→ v1 , . . .} E :e×σ →v v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei | R σ ::= {x1 7→ v1 , . . .} Memoize E indexed (by suitable equivalence relations on) e and σ. Not suitable for arbitrary (i.e., typical S CHEME, ML, H ASKELL, etc.) programs. Is suitable for F ORTRAN-like programs. Necessary for migrating reflective source-code transformation to compile time. Side benefit: union-free Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 56 / 73 Polyvariant Flow Analysis with Unbounded Context Sensitivity E :e×σ →v v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei σ ::= {x1 7→ v1 , . . .} E :e×σ →v v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei | R σ ::= {x1 7→ v1 , . . .} Memoize E indexed (by suitable equivalence relations on) e and σ. Not suitable for arbitrary (i.e., typical S CHEME, ML, H ASKELL, etc.) programs. Is suitable for F ORTRAN-like programs. Necessary for migrating reflective source-code transformation to compile time. Side benefit: union-free No tags, tag checking, tag dispatching, indirect calls Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 56 / 73 Polyvariant Flow Analysis with Unbounded Context Sensitivity E :e×σ →v v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei σ ::= {x1 7→ v1 , . . .} E :e×σ →v v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei | R σ ::= {x1 7→ v1 , . . .} Memoize E indexed (by suitable equivalence relations on) e and σ. Not suitable for arbitrary (i.e., typical S CHEME, ML, H ASKELL, etc.) programs. Is suitable for F ORTRAN-like programs. Necessary for migrating reflective source-code transformation to compile time. 
Side benefits: union-free, no cyclic abstract values No tags, tag checking, tag dispatching, indirect calls Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 56 / 73 Polyvariant Flow Analysis with Unbounded Context Sensitivity E :e×σ →v v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei σ ::= {x1 7→ v1 , . . .} E :e×σ →v v ::= #t | #f | () | R | (v1 , v2 ) | hσ, ei | R σ ::= {x1 7→ v1 , . . .} Memoize E indexed (by suitable equivalence relations on) e and σ. Not suitable for arbitrary (i.e., typical S CHEME, ML, H ASKELL, etc.) programs. Is suitable for F ORTRAN-like programs. Necessary for migrating reflective source-code transformation to compile time. Side benefits: union-free, no cyclic abstract values No tags, tag checking, tag dispatching, indirect calls Allows complete unboxing: no allocation, reclamation, indirection Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 56 / 73 Game Theory b1 . . . a1 .. . A ai .. . B bj . . . bn .. . . . . . PAYOFF(ai , bj ) . . . .. .. . . .. am von Neumann, J. and Morgenstern, O. (1944). Theory of Games and Economic Behavior. Princeton University Press, Princeton, NJ. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 57 / 73 Game Theory b1 . . . a1 .. . A ai .. . B bj . . . bn .. . . . . . PAYOFF(ai , bj ) . . . .. .. . . .. am max min PAYOFF(a, b) a∈A b∈B von Neumann, J. and Morgenstern, O. (1944). Theory of Games and Economic Behavior. Princeton University Press, Princeton, NJ. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 57 / 73 Game Theory ... Rm .. . a .. . Rn b ... .. . . . . . PAYOFF(a, b) . . . .. .. . . .. max min PAYOFF(a, b) a∈Rm b∈Rn von Neumann, J. and Morgenstern, O. (1944). Theory of Games and Economic Behavior. Princeton University Press, Princeton, NJ. 
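For the continuous game above, one way to see why nested derivatives arise is to write the inner minimization as a function of a. This is only a sketch, assuming PAYOFF is smooth and that the inner minimum is attained at a unique interior point; the symbols b^*(a) and \varphi are introduced here and do not appear on the slide.

```latex
\begin{align*}
b^*(a) &= \operatorname*{argmin}_{b \in \mathbb{R}^n} \mathrm{PAYOFF}(a, b),
  & \nabla_b\, \mathrm{PAYOFF}(a, b)\big|_{b = b^*(a)} &= 0, \\
\varphi(a) &= \mathrm{PAYOFF}(a, b^*(a)),
  & \max_{a \in \mathbb{R}^m} \min_{b \in \mathbb{R}^n} \mathrm{PAYOFF}(a, b)
    &= \max_{a \in \mathbb{R}^m} \varphi(a).
\end{align*}
```

If the inner minimum is found by gradient descent on b and the outer maximum by gradient ascent on a, each outer step differentiates \varphi, whose own definition runs an optimizer that takes derivatives. The derivative operator is then applied to code that already uses it.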
Cathode Ray Tubes

[Figure: Path of Charged Particle, both axes 0 to 10; trajectories for w^{(0)} = 0, w^{(1)} = -0.272, w^{(2)} = -0.267, w^{(3)} = -0.266, w^{(4)} = -0.266]

potential:
\[ p(x; w) = \|x - (10, 10 - w)\|^{-1} + \|x - (10, 0)\|^{-1} \]
\[ \ddot{x}(t) = -\nabla_x\, p(x)\big|_{x = x(t)} \]
\[ \dot{x}(t + \Delta t) = \dot{x}(t) + \Delta t\, \ddot{x}(t) \]
\[ x(t + \Delta t) = x(t) + \Delta t\, \dot{x}(t) \]
When: \( x_1(t + \Delta t) \le 0 \)
let:
\[ \Delta t_f = -x_1(t) / \dot{x}_1(t) \]
\[ t_f = t + \Delta t_f \]
\[ x(t_f) = x(t) + \Delta t_f\, \dot{x}(t) \]
Error:
\[ E(w) = x_0(t_f)^2 \]
Find:
\[ \operatorname*{argmin}_w E(w) \]

Sprague, C. S. and George, R. H. (1939). Cathode Ray Deflecting Electrode. US Patent 2,161,437.
George, R. H. (1940). Cathode Ray Tube. US Patent 2,222,942.

Performance Comparison

Benchmarks: particle and saddle, each with columns FF, FR, RF, RR.

    Language  Implementation  particle FF  saddle FF
    VLAD      Stalin∇                1.00       1.00
    Fortran   ADIFOR                 1.52       2.07
    Fortran   TAPENADE               3.40       2.56
    C++       FADBAD++              65.69      22.44
    ML        MLton                 53.89      40.39
    ML        OCaml                160.50     107.71
    ML        SML/NJ               106.21      84.38
    Haskell   GHC                  165.22     121.18
    Scheme    Bigloo               505.90     423.69
    Scheme    Chicken             1120.37     889.58
    Scheme    Gambit               444.13     362.65
    Scheme    Ikarus               192.07     158.88
    Scheme    Larceny              726.59     571.81
    Scheme    MIT Scheme          1472.26    1243.26
    Scheme    MzC                 2073.26    2436.26
    Scheme    MzScheme            2344.70    2000.89
    Scheme    Scheme->C            391.42     324.95
    Scheme    SCMUTILS            3321.20    2800.71
    Scheme    Stalin               208.10     166.96

FR, RF, and RR entries, in extraction order (row and column assignment not recoverable):
    particle: 1.00 1.00 1.00 88.88 340.35 182.45 16.08 147.91 105.04 28.06 263.66 185.15 761.40 2026.31 752.63 312.28 1108.18 2500.00 3434.64 4076.16 605.26 104.81 425.60 138.34 61.79 144.55 309.66 340.30 409.95 109.77 228.56 1872.85 256.30 114.87 270.14 591.36 655.83 843.68 198.43 366.08 51.84 91.86
    saddle: 1.00 1.00 1.00 51.21 156.33 106.01 1.86 6.75 3.55 2.67 13.51 6.31 440.25 1144.65 420.48 205.97 613.65 1428.57 1996.40 2332.43 328.84 15.77 35.73 14.08 6.75 19.14 51.36 72.45 80.78 12.74 24.59 68.94 23.87 11.40 29.77 79.10 150.02 134.00 18.28 212.93 7.68 11.40

Legend: not implemented but could implement; not implemented in existing tool; can't implement.

Gradient-Based Optimization

    (define (e i n)
      (if (zero? n)
          '()
          (cons (if (zero? i) 1.0 0.0)
                (e (- i 1) (- n 1)))))

Forward mode:

    (define ((gradient f) x)
      (let ((n (length x)))
        (map (lambda (i)
               (tangent ((j* f) (bundle x (e i n)))))
             (iota n))))

Reverse mode:

    (define ((gradient f) x)
      (cdr ((cdr ((*j f) (*j x))) 1.0)))

    (define (gradient-ascent f x0 n eta)
      (if (zero? n)
          (list x0 (f x0) ((gradient f) x0))
          (gradient-ascent
           f
           (zip (lambda (xi gi) (+ xi (* eta gi)))
                x0
                ((gradient f) x0))
           (- n 1)
           eta)))

Neural Networks

[Figure: network with inputs p and q and output r]

    p  q  r
    0  0  0
    0  1  1
    1  0  1
    1  1  0

Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323:533–6.
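The forward-mode gradient on the Gradient-Based Optimization slide relies on VLAD primitives (j*, bundle, tangent). The same idea can be imitated in portable Scheme with explicit dual numbers. The sketch below is only an illustration under that assumption: every helper name in it (make-dual, primal, d+, d*, range, gradient*) is invented here rather than taken from VLAD, it lifts only + and *, and it does not support nesting.

```scheme
;; Forward-mode AD sketch with explicit dual numbers (portable Scheme).
;; Hypothetical helpers; VLAD provides bundle, tangent and j* natively.

;; A dual number pairs a primal value with a tangent.
(define (make-dual p t) (cons p t))
(define (primal d) (if (pair? d) (car d) d))
(define (tangent d) (if (pair? d) (cdr d) 0.0))

;; Lifted arithmetic: sum rule and product rule.
(define (d+ a b)
  (make-dual (+ (primal a) (primal b))
             (+ (tangent a) (tangent b))))
(define (d* a b)
  (make-dual (* (primal a) (primal b))
             (+ (* (primal a) (tangent b))
                (* (tangent a) (primal b)))))

;; The i-th basis vector of length n, as on the slide.
(define (e i n)
  (if (zero? n)
      '()
      (cons (if (zero? i) 1.0 0.0)
            (e (- i 1) (- n 1)))))

;; The list (0 1 ... n-1); a local stand-in for iota.
(define (range n)
  (let loop ((i (- n 1)) (acc '()))
    (if (< i 0) acc (loop (- i 1) (cons i acc)))))

;; Gradient by n forward passes, one per basis direction.
(define (gradient* f)
  (lambda (x)
    (let ((n (length x)))
      (map (lambda (i)
             (tangent (f (map make-dual x (e i n)))))
           (range n)))))

;; Example: f(x, y) = x*y + x, whose gradient is (y + 1, x).
(define (f v)
  (let ((x (car v)) (y (cadr v)))
    (d+ (d* x y) x)))

;; At (3, 4) the two components are 5 and 3.
(display ((gradient* f) '(3.0 4.0)))
(newline)
```

Each of the n forward passes costs about as much as one evaluation of f, which is why the slide also gives a reverse-mode definition of gradient that needs only one pass.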
Neural Networks in VLAD

    (define ((sum-activities activities) bias ws)
      ((fold + bias) (zip * ws activities)))

    (define (sum-layer activities ws-layer)
      (map (sum-activities activities) ws-layer))

    (define (sigmoid x) (/ 1 (+ (exp (- 0 x)) 1)))

    (define ((forward-pass ws-layers) in)
      (if (null? ws-layers)
          in
          ((forward-pass (cdr ws-layers))
           (map sigmoid (sum-layer in (car ws-layers))))))

    (define ((error-on-dataset dataset) ws-layers)
      ((fold + 0)
       (map (lambda ((list in target))
              (* 0.5
                 (magnitude-squared
                  (v- ((forward-pass ws-layers) in) target))))
            dataset)))

    (gradient-descent
     (error-on-dataset
      '(((0 0) (0)) ((0 1) (1)) ((1 0) (1)) ((1 1) (0))))
     '(((0 -0.284227 1.16054) (0 0.617194 1.30467))
       ((0 -0.084395 0.648461)))
     1000.0
     0.3)

Performance Comparison

Benchmark: backprop, with columns Fs, Fv, R.

Rows, in order:
    VLAD: Stalin∇
    Fortran: ADIFOR, TAPENADE
    C: ADIC
    C++: ADOL-C, CppAD, FADBAD++
    ML: MLton, OCaml, SML/NJ
    Haskell: GHC
    Scheme: Bigloo, Chicken, Gambit, Ikarus, Larceny, MIT Scheme, MzC, MzScheme, Scheme->C, SCMUTILS, Stalin

Entries, in extraction order (row and column assignment not recoverable):
    Fs, Fv: 1.00 11.84 2.68 11.35 4.33 16.33 3.93 12.34 3.89 42.15 98.96 33.15 73.94 157.75 142.71 577.45 1391.75 545.20 216.42 955.98 1900.04 2439.93 3477.86 484.24 4544.48 832.68
    R: 1.00 6.24 35.53 23.69 53.03 37.94 149.14 94.97 306.60 971.91 341.73 147.49 486.64 1141.22 1571.52 1866.28 233.75 367.84

Legend: not implemented but could implement; not implemented in existing tool; can't implement.

Probabilistic Lambda Calculus

\[ P = \text{if } x_0 \text{ then } 0 \text{ else if } x_1 \text{ then } 1 \text{ else } 2 \]

\[ \Pr(x_0 \mapsto \text{true}) = p_0 \qquad \Pr(x_0 \mapsto \text{false}) = 1 - p_0 \]
\[ \Pr(x_1 \mapsto \text{true}) = p_1 \qquad \Pr(x_1 \mapsto \text{false}) = 1 - p_1 \]

\[ \Pr(E(P) = 0 \mid p_0, p_1) = p_0 \]
\[ \Pr(E(P) = 1 \mid p_0, p_1) = (1 - p_0)\, p_1 \]
\[ \Pr(E(P) = 2 \mid p_0, p_1) = (1 - p_0)(1 - p_1) \]

\[ \prod_{v \in \{0, 1, 2, 2\}} \Pr(E(P) = v \mid p_0, p_1) = p_0 (1 - p_0)^3\, p_1 (1 - p_1)^2 \]

Koller, D., McAllester, D., and Pfeffer, A. (1997). Effective Bayesian Inference for Stochastic Programs. Proceedings of the 14th National Conference on Artificial Intelligence (AAAI), pp. 740–7.
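The likelihood above separates into a factor in p_0 and a factor in p_1, so its maximizer over 0 < p_0, p_1 < 1 can be checked by hand. A short worked derivation follows; the log-likelihood symbol L is notation introduced here, not on the slide.

```latex
\begin{align*}
L(p_0, p_1) &= \log p_0 + 3 \log(1 - p_0) + \log p_1 + 2 \log(1 - p_1), \\
\frac{\partial L}{\partial p_0} &= \frac{1}{p_0} - \frac{3}{1 - p_0} = 0
  \;\Longrightarrow\; p_0 = \tfrac{1}{4}, \\
\frac{\partial L}{\partial p_1} &= \frac{1}{p_1} - \frac{2}{1 - p_1} = 0
  \;\Longrightarrow\; p_1 = \tfrac{1}{3}.
\end{align*}
```

Both second derivatives are negative on (0, 1), so this stationary point is the maximum. Gradient ascent on the likelihood is a numerical route to the same point.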
\[ \operatorname*{argmax}_{p_0, p_1} \prod_{v \in \{0, 1, 2, 2\}} \Pr(E(P) = v \mid p_0, p_1) = \left( \tfrac{1}{4}, \tfrac{1}{3} \right) \]

Probabilistic Prolog

    p(0).
    p(X):-q(X).
    q(1).
    q(2).

\[ \Pr(\texttt{p(0).}) = p_0 \qquad \Pr(\texttt{p(X):-q(X).}) = 1 - p_0 \]
\[ \Pr(\texttt{q(1).}) = p_1 \qquad \Pr(\texttt{q(2).}) = 1 - p_1 \]

\[ \Pr(\texttt{?-p(0).}) = p_0 \]
\[ \Pr(\texttt{?-p(1).}) = (1 - p_0)\, p_1 \]
\[ \Pr(\texttt{?-p(2).}) = (1 - p_0)(1 - p_1) \]

\[ \prod_{q \in \{\texttt{p(0)}, \texttt{p(1)}, \texttt{p(2)}, \texttt{p(2)}\}} \Pr(\texttt{?-}q\texttt{.}) = p_0 (1 - p_0)^3\, p_1 (1 - p_1)^2 \]

\[ \operatorname*{argmax}_{p_0, p_1} \prod_{q \in \{\texttt{p(0)}, \texttt{p(1)}, \texttt{p(2)}, \texttt{p(2)}\}} \Pr(\texttt{?-}q\texttt{.}) = \left( \tfrac{1}{4}, \tfrac{1}{3} \right) \]

Probabilistic Lambda Calculus

    (define (evaluate expression environment)
      (cond
       ((constant-expression? expression)
        (singleton-tagged-distribution
         (constant-expression-value expression)))
       ((variable-access-expression? expression)
        (lookup-value
         (variable-access-expression-variable expression) environment))
       ((lambda-expression? expression)
        (singleton-tagged-distribution
         (lambda (tagged-distribution)
           (evaluate
            (lambda-expression-body expression)
            (cons (make-binding (lambda-expression-variable expression)
                                tagged-distribution)
                  environment)))))
       (else
        (let ((tagged-distribution
               (evaluate (application-argument expression) environment)))
          (map-tagged-distribution
           (lambda (value) (value tagged-distribution))
           (evaluate (application-callee expression) environment))))))

Probabilistic Lambda Calculus

    (gradient-ascent
     (lambda (p)
       (let ((tagged-distribution
              (evaluate
               if x0 then 0 else if x1 then 1 else 2
               (list Pr(x0 ↦ true) = p0
                     Pr(x0 ↦ false) = 1 − p0
                     Pr(x1 ↦ true) = p1
                     Pr(x1 ↦ false) = 1 − p1
                     ...))))
         (map-reduce
          *
          1.0
          (lambda (value) (likelihood value tagged-distribution))
          '(0 1 2 2))))
     '(0.5 0.5)
     1000.0
     0.1)

Probabilistic Prolog

    (define (proof-distribution term clauses)
      (let ((offset ...))
        (map-reduce
         append
         '()
         (lambda (clause)
           (let ((clause (alpha-rename clause offset)))
             (let loop ((p (clause-p clause))
                        (substitution (unify term (clause-term clause)))
                        (terms (clause-terms clause)))
               (if (boolean? substitution)
                   '()
                   (if (null? terms)
                       (list (make-double p substitution))
                       (map-reduce
                        append
                        '()
                        (lambda (double)
                          (loop (* p (double-p double))
                                (append substitution (double-substitution double))
                                (rest terms)))
                        (proof-distribution
                         (apply-substitution substitution (first terms))
                         clauses)))))))
         clauses)))

Probabilistic Prolog

    (gradient-ascent
     (lambda (p)
       (let ((clauses (list Pr(p(0).) = p0
                            Pr(p(X):-q(X).) = 1 − p0
                            Pr(q(1).) = p1
                            Pr(q(2).) = 1 − p1)))
         (map-reduce
          *
          1.0
          (lambda (query)
            (likelihood (proof-distribution query clauses)))
          '(p(0) p(1) p(2) p(2)))))
     '(0.5 0.5)
     1000.0
     0.1)

Probabilistic Prolog (gradient-ascent (lambda (p) (let ((clauses (list Pr(p(0).) = p0 Pr(p(X):-q(X).) = 1 − p0 Pr(q(1).) = p1 Pr(q(2).)
= 1 − p1 ))) (map-reduce * 1.0 (lambda (query) (likelihood (proof-distribution query clauses))) ’(p(0) p(1) p(2) p(2))))) ’(0.5 0.5) 1000.0 0.1) Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 69 / 73 Probabilistic Prolog (gradient-ascent (lambda (p) (let ((clauses (list Pr(p(0).) = p0 Pr(p(X):-q(X).) = 1 − p0 Pr(q(1).) = p1 Pr(q(2).) = 1 − p1 ))) (map-reduce * 1.0 (lambda (query) (likelihood (proof-distribution query clauses))) ’(p(0) p(1) p(2) p(2))))) ’(0.5 0.5) 1000.0 0.1) Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 69 / 73 Probabilistic Prolog (gradient-ascent (lambda (p) (let ((clauses (list Pr(p(0).) = p0 Pr(p(X):-q(X).) = 1 − p0 Pr(q(1).) = p1 Pr(q(2).) = 1 − p1 ))) (map-reduce * 1.0 (lambda (query) (likelihood (proof-distribution query clauses))) ’(p(0) p(1) p(2) p(2))))) ’(0.5 0.5) 1000.0 0.1) Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 69 / 73 Probabilistic Prolog (gradient-ascent (lambda (p) (let ((clauses (list Pr(p(0).) = p0 Pr(p(X):-q(X).) = 1 − p0 Pr(q(1).) = p1 Pr(q(2).) = 1 − p1 ))) (map-reduce * 1.0 (lambda (query) (likelihood (proof-distribution query clauses))) ’(p(0) p(1) p(2) p(2))))) ’(0.5 0.5) 1000.0 0.1) Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 69 / 73 Probabilistic Prolog (gradient-ascent (lambda (p) (let ((clauses (list Pr(p(0).) = p0 Pr(p(X):-q(X).) = 1 − p0 Pr(q(1).) = p1 Pr(q(2).) = 1 − p1 ))) (map-reduce * 1.0 (lambda (query) (likelihood (proof-distribution query clauses))) ’(p(0) p(1) p(2) p(2))))) ’(0.5 0.5) 1000.0 0.1) Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 69 / 73 Probabilistic Prolog (gradient-ascent (lambda (p) (let ((clauses (list Pr(p(0).) = p0 Pr(p(X):-q(X).) = 1 − p0 Pr(q(1).) = p1 Pr(q(2).) 
= 1 − p1 ))) (map-reduce * 1.0 (lambda (query) (likelihood (proof-distribution query clauses))) ’(p(0) p(1) p(2) p(2))))) ’(0.5 0.5) 1000.0 0.1) Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 69 / 73 Generated Code static void f2679(double a_f2679_0,double a_f2679_1,double a_f2679_2,double a_f2679_3){ int t272381=((a_f2679_2==0.)?0:1); double t272406; double t272405; double t272404; double t272403; double t272402; if((t272381==0)){ double t272480=(1.-a_f2679_0); double t272572=(1.-a_f2679_1); double t273043=(a_f2679_0+0.); double t274185=(t272480*a_f2679_1); double t274426=(t274185+0.); double t275653=(t272480*t272572); double t275894=(t275653+0.); double t277121=(t272480*t272572); double t277362=(t277121+0.); double t277431=(t277362*1.); double t277436=(t275894*t277431); double t277441=(t274426*t277436); double t277446=(t273043*t277441); ... double t1777107=(t1774696+t1715394); double t1777194=(0.-t1745420); double t1778533=(t1777194+t1419700); t272406=a_f2679_0; t272405=a_f2679_1; t272404=t277446; t272403=t1778533; t272402=t1777107;} else {. . 
.} r_f2679_0=t272406; r_f2679_1=t272405; r_f2679_2=t272404; r_f2679_3=t272403; r_f2679_4=t272402;} Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 70 / 73 Performance Comparison VLAD ML H ASKELL S CHEME S TALIN∇ MLTON OC AML SML / NJ probabilisticlambda-calculus F R 1.00 1.00 106.45 124.95 215.73 538.68 197.75 272.45 probabilisticprolog F R 1.00 1.00 789.41 483.47 1207.13 1534.61 2448.02 1471.94 832.92 2305.98 879.88 437.46 1651.01 3491.10 5289.17 6235.78 682.15 6456.99 1240.73 14422.16 66948.70 24316.03 8242.92 25589.62 85819.57 154206.95 166129.12 10530.66 80100.23 22511.79 GHC B IGLOO C HICKEN G AMBIT I KARUS L ARCENY MIT S CHEME MZC M Z S CHEME S CHEME ->C SCMUTILS S TALIN 1048.11 3283.00 1153.86 531.10 1673.22 4130.19 5929.14 7134.71 794.31 1137.41 8286.06 37792.84 13649.81 4845.86 14833.53 48335.38 83480.27 91630.70 5980.27 10986.43 not implemented but could implement, including F ORTRAN, C, and C ++ not implemented in existing tool can’t implement Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 71 / 73 It is, of course, not excluded that the range of arguments or range of values of a function should consist wholly or partly of functions. The derivative, as this notion appears in the elementary differential calculus, is a familiar mathematical example of a function for which both ranges consist of functions. Jeffrey Mark Siskind (Purdue/ECE) AD of Functional Programs Lecture Circuit 2009 72 / 73 It is, of course, not excluded that the range of arguments or range of values of a function should consist wholly or partly of functions. The derivative, as this notion appears in the elementary differential calculus, is a familiar mathematical example of a function for which both ranges consist of functions. (¶4) Church, A. (1941). The Calculi of Lambda Conversion, Princeton University Press, Princeton, NJ. 
Gottfried Leibniz
Jacob Bernoulli
Johann Bernoulli
Leonhard Euler
Joseph Louis Lagrange
Simeon Poisson
Michel Chasles
Hubert Anson Newton
Eliakim Hastings Moore
Oswald Veblen
Alonzo Church
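As a cross-check on the Probabilistic Prolog gradient-ascent example, here is a minimal Python sketch; it is not the talk's VLAD code. It assumes that the likelihood of a query is the sum of the probabilities of its proofs, which is consistent with the Generated Code slide, where t277446 is the product of p0, (1 − p0) p1, and two copies of (1 − p0)(1 − p1). Under that assumption the objective is p0 (1 − p0)^3 p1 (1 − p1)^2, whose maximum over the unit square is at p0 = 1/4, p1 = 1/3. The names Dual, lift, gradient, and gradient_ascent, as well as the step size and iteration count, are illustrative choices, and the slide's arguments 1000.0 and 0.1 are not interpreted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Dual number val + tan*eps with eps**2 == 0 (forward-mode AD)."""
    val: float
    tan: float = 0.0

    def __add__(self, other):
        other = lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __sub__(self, other):
        other = lift(other)
        return Dual(self.val - other.val, self.tan - other.tan)

    def __rsub__(self, other):
        return lift(other) - self

    def __mul__(self, other):
        other = lift(other)
        # Product rule: (a + a'eps)(b + b'eps) = ab + (a b' + a' b) eps
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)

    __rmul__ = __mul__


def lift(x):
    """Promote a plain number to a Dual with zero tangent."""
    return x if isinstance(x, Dual) else Dual(float(x))


def likelihood(p0, p1):
    """Product of per-query likelihoods for the queries p(0) p(1) p(2) p(2).

    Assumption: each query's likelihood is the sum over its proofs;
    here every query has exactly one proof.
    """
    q_p0 = p0                    # p(0) via the fact p(0).
    q_p1 = (1 - p0) * p1         # p(1) via p(X):-q(X). and q(1).
    q_p2 = (1 - p0) * (1 - p1)   # p(2) via p(X):-q(X). and q(2).
    return q_p0 * q_p1 * q_p2 * q_p2


def gradient(f, point):
    """Forward-mode gradient: one pass per input, seeding its tangent with 1."""
    grad = []
    for i in range(len(point)):
        args = [Dual(x, 1.0 if j == i else 0.0) for j, x in enumerate(point)]
        grad.append(f(*args).tan)
    return grad


def gradient_ascent(f, start, step, iterations):
    """Fixed-step gradient ascent (hypothetical parameters, not the talk's)."""
    point = list(start)
    for _ in range(iterations):
        g = gradient(f, point)
        point = [x + step * gx for x, gx in zip(point, g)]
    return point


p0, p1 = gradient_ascent(likelihood, [0.5, 0.5], step=1.0, iterations=2000)
print(p0, p1)
```

Starting from (0.5, 0.5), the iterates move toward p0 = 0.25 and p1 ≈ 0.333, where the likelihood under the stated assumption equals 1/64.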