Towards a Modern Floating-Point Environment PhD thesis defense Olga Kupriianova PEQUAN team, CalSci Dept, LIP6, UPMC advisors: Jean-Claude Bajard, Christoph Lauter i Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 1/ 44 What is Floating-Point Arithmetic Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 2/ 44 What is Floating-Point Arithmetic Continuous R numbers 3/2 = 1.5 π = 3.141592653589 ... √ 2 = 1.414213562373 . . . Olga Kupriianova Discrete Floating point (FP) numbers 3/2 = 1.5e0 π √ = 3.14159e0 2 = 1.41421e0 Towards a Modern Floating-Point Environment Dec 11, 2015 2/ 44 What is Floating-Point Arithmetic Continuous R numbers 3/2 = 1.5 π = 3.141592653589 ... √ 2 = 1.414213562373 . . . Discrete Floating point (FP) numbers 3/2 = 1.5e0 π √ = 3.14159e0 2 = 1.41421e0 ± m0 .m1 m2 . . . mk−1 ·β E | {z } significand E is exponent, k precision, β radix Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 2/ 44 Rounding How to represent a real number in machine? Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 3/ 44 Rounding How to represent a real number in machine? π ≈ 3.14 e ≈ 2.71 π ≈ 3.141 e ≈ 2.718 π ≈ 3.1415 e ≈ 2.7182 π ≈ 3.14159 e ≈ 2.71828 π ≈ 3.141592 e ≈ 2.718281 π ≈ 3.1415926 e ≈ 2.7182818 Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 3/ 44 Rounding How to represent a real number in machine? π ≈ 3.14 e ≈ 2.71 π ≈ 3.141 e ≈ 2.718 π ≈ 3.1415 e ≈ 2.7182 π ≈ 3.14159 e ≈ 2.71828 π ≈ 3.141592 e ≈ 2.718281 π ≈ 3.1415926 e ≈ 2.7182818 ± m0 .m1 m2 . . . mk−1 ·β E , | {z } significand E is exponent, k precision Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 3/ 44 How to Round Rounding modes: Rounding Down (RD) Rounding Up (RU) Rounding toward Zero (RZ) Rounding to the Nearest : ties to even (RN), ties to away (RA) 3.14 Olga Kupriianova 3.15 Towards a Modern Floating-Point Environment Dec 11, 2015 4/ 44 How to Round Rounding modes: Rounding Down (RD) Rounding Up (RU) Rounding toward Zero (RZ) Rounding to the Nearest : ties to even (RN), ties to away (RA) x = 3.14159 3.14 Olga Kupriianova 3.15 Towards a Modern Floating-Point Environment Dec 11, 2015 4/ 44 How to Round Rounding modes: Rounding Down (RD) Rounding Up (RU) Rounding toward Zero (RZ) Rounding to the Nearest : ties to even (RN), ties to away (RA) x = 3.14159 3.14 RD(x) RZ(x) RN (x), RA(x) Olga Kupriianova 3.15 RU (x) Towards a Modern Floating-Point Environment Dec 11, 2015 4/ 44 How to Round Rounding modes: Rounding Down (RD) Rounding Up (RU) Rounding toward Zero (RZ) Rounding to the Nearest : ties to even (RN), ties to away (RA) x = 3.145 3.14 Olga Kupriianova 3.15 Towards a Modern Floating-Point Environment Dec 11, 2015 4/ 44 How to Round Rounding modes: Rounding Down (RD) Rounding Up (RU) Rounding toward Zero (RZ) Rounding to the Nearest : ties to even (RN), ties to away (RA) x = 3.145 3.14 RD(x) RZ(x) RN (x) Olga Kupriianova 3.15 RU (x) RA(x) Towards a Modern Floating-Point Environment Dec 11, 2015 4/ 44 Accuracy and Rounding Accuracy Mathematical x → x̂ in FP arithmetic Due to roundings, |x − x̂| > 0 2E m Olga Kupriianova 2E (m + 1) Towards a Modern Floating-Point Environment Dec 11, 2015 5/ 44 Accuracy and Rounding Accuracy Mathematical x → x̂ in FP arithmetic Due to roundings, |x − x̂| > 0 x 2E m Olga Kupriianova 2E (m + 1) Towards a Modern Floating-Point Environment Dec 11, 2015 5/ 44 Accuracy and Rounding Accuracy Mathematical x → x̂ in FP arithmetic Due to roundings, |x − x̂| > 0 x̂ 2E m Olga Kupriianova 2E (m + 1) Towards a Modern Floating-Point Environment Dec 11, 2015 5/ 44 Accuracy and Rounding Accuracy Mathematical x → x̂ in FP arithmetic Due to roundings, |x − x̂| > 0 RN RN (x) 2E m Olga Kupriianova x̂ 2E (m + 1) Towards a Modern Floating-Point Environment Dec 11, 2015 5/ 44 Accuracy and Rounding Accuracy Mathematical x → x̂ in FP arithmetic Due to roundings, |x − x̂| > 0 x̂ 2E m Olga Kupriianova 2E (m + 1) Towards a Modern Floating-Point Environment Dec 11, 2015 5/ 44 Accuracy and Rounding Accuracy Mathematical x → x̂ in FP arithmetic Due to roundings, |x − x̂| > 0 RN x̂ 2E m Olga Kupriianova 2E (m + 1) Towards a Modern Floating-Point Environment Dec 11, 2015 5/ 44 Accuracy and Rounding Accuracy Mathematical x → x̂ in FP arithmetic Due to roundings, |x − x̂| > 0 RN x̂ 2E m Olga Kupriianova 2E (m + 1) Towards a Modern Floating-Point Environment Dec 11, 2015 5/ 44 Accuracy and Rounding Accuracy Mathematical x → x̂ in FP arithmetic Due to roundings, |x − x̂| > 0 RN x̂ 2E m 2E (m + 1) Table Maker’s Dilemma (TMD) Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 5/ 44 Accuracy and Rounding Accuracy Mathematical x → x̂ in FP arithmetic Due to roundings, |x − x̂| > 0 RN x̂ 2E m 2E (m + 1) Table Maker’s Dilemma (TMD) How to determine the accuracy of the approximation x̂ Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 5/ 44 Floating-Point Environment IEEE754-1985: binary FP numbers, rounding modes, arithmetic operations In total ∼ 70 operations Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 6/ 44 Floating-Point Environment IEEE754-1985: binary FP numbers, rounding modes, arithmetic operations In total ∼ 70 operations IEEE754-2008 : decimal FP numbers and operations, recommendations for mathematical function implementations, heterogeneous operations: mix precisions In total ∼ 350 operations for binary arithmetic Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 6/ 44 Floating-Point Environment IEEE754-1985: binary FP numbers, rounding modes, arithmetic operations In total ∼ 70 operations IEEE754-2008 : decimal FP numbers and operations, recommendations for mathematical function implementations, heterogeneous operations: mix precisions In total ∼ 350 operations for binary arithmetic Current situation: decimal and binary worlds, intersect only in conversion ...next revisions (2018?): even more operations? Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 6/ 44 Floating-Point Environment IEEE754-1985: binary FP numbers, rounding modes, arithmetic operations In total ∼ 70 operations IEEE754-2008 : decimal FP numbers and operations, recommendations for mathematical function implementations, heterogeneous operations: mix precisions In total ∼ 350 operations for binary arithmetic Current situation: decimal and binary worlds, intersect only in conversion ...next revisions (2018?): even more operations? mix binary and decimals in one operation Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 6/ 44 Floating-Point Environment IEEE754-1985: binary FP numbers, rounding modes, arithmetic operations In total ∼ 70 operations IEEE754-2008 : decimal FP numbers and operations, recommendations for mathematical function implementations, heterogeneous operations: mix precisions In total ∼ 350 operations for binary arithmetic Current situation: decimal and binary worlds, intersect only in conversion ...next revisions (2018?): even more operations? mix binary and decimals in one operation provide more implementations of mathematical functions Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 6/ 44 Implementation of Mathematical Functions Why do we need more implementations of mathematical functions? Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 7/ 44 Implementation of Mathematical Functions Why do we need more implementations of mathematical functions? Mathematics Computer Science exp(x) = ex = lim n→∞ 1+ logb (x) = y; by = x Z x 2 erf(x) = e−t dt x n n Several implementations for each: exp - correctly rounded exp - faithfully rounded exp - with accuracy ≤ 2−45 ... 0 Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 7/ 44 Implementation of Mathematical Functions Why do we need more implementations of mathematical functions? Mathematics Computer Science exp(x) = ex = lim n→∞ 1+ logb (x) = y; by = x Z x 2 erf(x) = e−t dt x n n Several implementations for each: exp - correctly rounded exp - faithfully rounded exp - with accuracy ≤ 2−45 ... 0 The existing libraries give only few implementations per function Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 7/ 44 Contributions/Structure 1. Mixed-Radix Arithmetic and Arbitrary Precision Base Conversion 2. Automatic Generation of Mathematical Functions (Metalibm) Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 8/ 44 Mixed-Radix Arithmetic Radix conversion algorithm for Mixed-Radix Arithmetic Correctly-rounded conversion from decimal character sequence to a binary FP number (scanf analogue) Research for fused multiply-add operation xy ± z Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 9/ 44 Mixed-Radix Arithmetic Radix Conversion Two-steps algorithm: 1 2 Exponent computation is errorless Compute mantissa with specified accuracy Integer-based computations Memory consumption is known in advance Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 9/ 44 Mixed-Radix Arithmetic Radix conversion algorithm for Mixed-Radix Arithmetic Correctly-rounded conversion from decimal character sequence to a binary FP number (scanf analogue) Research for fused multiply-add operation xy ± z Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 9/ 44 Mixed-Radix Arithmetic Our scanf version User input is arbitrarily long No way to precompute the worst cases for the TMD Algorithm with integer computations, no memory allocations Two-ways scheme is used: fast easy rounding or slow hard Two conversions: decimal-binary and binary-decimal Gives correctly-rounded (CR) result for all the rounding modes Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 9/ 44 Mixed-Radix Arithmetic Radix conversion algorithm for Mixed-Radix Arithmetic Correctly-rounded conversion from decimal character sequence to a binary FP number (scanf analogue) Research for fused multiply-add operation (FMA) xy ± z Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 9/ 44 Mixed-Radix Arithmetic Mixed-radix FMA xy ± z 2−4 · 139 × 10−16 · 346 − 23 · 42 = 10E m Addition and multiplication within only one rounding Worst cases search: reduce the iterations number with the use of continued fraction theory and variable relations Iterations reduced at ∼ 99% Still about ∼ 16 000 000 000 iterations First results obtained in ∼ 10 weeks of computations Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 9/ 44 Code Generation for Mathematical Functions 1. Introduction & Motivation for Code Generation 2. Manual Function Implementation. Background 3. Metalibm: General Overview 4. Metalibm: Domain Splitting 5. Metalibm: Reconstruction for Vectorizable Code 6. Conclusion and Perspectives Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 10/ 44 Existing Mathematical Libraries (libm) Existing implementations Intel’s Math Kernel Libraries glibc libm AMD’s libm CRLibm by ENS Lyon ARM’s mathlib newlib libmcr by Sun OpenLibm for Julia ... Yeppp! Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 11/ 44 Existing Mathematical Libraries (libm) Existing implementations Intel’s Math Kernel Libraries glibc libm AMD’s libm CRLibm by ENS Lyon ARM’s mathlib newlib libmcr by Sun OpenLibm for Julia ... Yeppp! All these libms are static Only few implementations per function Limited dictionary of functions Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 11/ 44 Existing Mathematical Libraries (libm) Existing implementations Intel’s Math Kernel Libraries glibc libm AMD’s libm CRLibm by ENS Lyon ARM’s mathlib newlib libmcr by Sun OpenLibm for Julia ... Yeppp! All these libms are static Only few implementations per function Limited dictionary of functions Olga Kupriianova How to get exp(x) with 40 accurate bits Towards a Modern Floating-Point Environment Dec 11, 2015 11/ 44 Existing Mathematical Libraries (libm) Existing implementations Intel’s Math Kernel Libraries glibc libm AMD’s libm CRLibm by ENS Lyon ARM’s mathlib newlib libmcr by Sun OpenLibm for Julia ... Yeppp! All these libms are static Only few implementations per function Limited dictionary of functions Olga Kupriianova How to get exp(x) R x sin t with 40 accurate bits t dt with 25 bits 0 Towards a Modern Floating-Point Environment Dec 11, 2015 11/ 44 One Size Does not Fit All Current request Several performance/accuracy options (“quick and dirty”, faithful, correctly-rounded) Several portability options (SIMD instructions) Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 12/ 44 One Size Does not Fit All Current request Several performance/accuracy options (“quick and dirty”, faithful, correctly-rounded) Several portability options (SIMD instructions) Some people are still not happy More performance, less compliance degraded accuracy reduced domain Functions not from standard libm (e.g. Olga Kupriianova Rx 0 sin t t dt, Towards a Modern Floating-Point Environment dilogarithm) Dec 11, 2015 12/ 44 One Size Does not Fit All Current request Several performance/accuracy options (“quick and dirty”, faithful, correctly-rounded) Several portability options (SIMD instructions) Some people are still not happy More performance, less compliance degraded accuracy reduced domain Functions not from standard libm (e.g. Rx 0 sin t t dt, dilogarithm) Who is going to write all these variants? Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 12/ 44 Task Rough Computations 3 precisions from IEEE754 ∼ 50 functions in libm ∼ 5 accuracies for each precision ∼ 5 various approximations Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 13/ 44 Task Rough Computations 3 precisions from IEEE754 ∼ 50 functions in libm ∼ 5 accuracies for each precision ∼ 5 various approximations I have to implement 1300 functions... Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 13/ 44 Task Rough Computations 3 precisions from IEEE754 ∼ 50 functions in libm ∼ 5 accuracies for each precision ∼ 5 various approximations I have to implement 1300 functions... Each function implementation takes ∼ 1 man-month Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 13/ 44 Task Rough Computations 3 precisions from IEEE754 ∼ 50 functions in libm ∼ 5 accuracies for each precision ∼ 5 various approximations I have to implement 1300 functions... Each function implementation takes ∼ 1 man-month It will take me more then 100 years to rewrite the libm Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 13/ 44 Task Rough Computations 3 precisions from IEEE754 ∼ 50 functions in libm ∼ 5 accuracies for each precision ∼ 5 various approximations I have to implement 1300 functions... Each function implementation takes ∼ 1 man-month It will take me more then 100 years to rewrite the libm Generate parametrized implementations of mathematical functions Metalibm Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 13/ 44 Code Generation for Mathematical Functions 1. Introduction & Motivation for Code Generation 2. Manual Function Implementation. Background 3. Metalibm: General Overview 4. Metalibm: Domain Splitting 5. Metalibm: Reconstruction for Vectorizable Code 6. Conclusion and Perspectives Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 14/ 44 How to Implement a Mathematical Function Elementary functions have transcendental values exp(1) = 2.71828182845904523536028747135266 . . . log2 (10) = 3.32192809488736234787031942948939 . . . Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 15/ 44 How to Implement a Mathematical Function Elementary functions have transcendental values exp(1) = 2.71828182845904523536028747135266 . . . log2 (10) = 3.32192809488736234787031942948939 . . . Compute polynomial approximations Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 15/ 44 How to Implement a Mathematical Function Elementary functions have transcendental values exp(1) = 2.71828182845904523536028747135266 . . . log2 (10) = 3.32192809488736234787031942948939 . . . Compute polynomial approximations Lagrange, Newton, Chebyshev Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 15/ 44 How to Implement a Mathematical Function Elementary functions have transcendental values exp(1) = 2.71828182845904523536028747135266 . . . log2 (10) = 3.32192809488736234787031942948939 . . . Compute polynomial approximations Lagrange, Newton, Chebyshev error bounds Remez algorithm Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 15/ 44 Approximation Polynomials f = exp Compute different approximations on [0, 5] deg(p1 ) = 7, deg(p2 ) = 8, deg(p3 ) = 9 p1 p2 p3 0.005 0.004 0.003 0.002 0.001 0 -0.001 -0.002 -0.003 -0.004 -0.005 Olga Kupriianova 0 1 2 3 Towards a Modern Floating-Point Environment 4 5 Dec 11, 2015 16/ 44 Approximation Polynomials f = exp Compute different approximations on [0, 5] deg(p1 ) = 7, deg(p2 ) = 8, deg(p3 ) = 9 Larger domain, larger degree Conclusion: approximate only on small domains Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 16/ 44 Argument Reduction. Example Example Implement f (x) = ex Exploit the algebraic property ea+b = ea · eb x j ex = 2 log 2 = 2 Olga Kupriianova x log 2 m x j − x m · 2 log 2 log 2 = 2E · ex−E log 2 = 2E · er log(2) log(2) r∈ − , 2 2 Towards a Modern Floating-Point Environment Dec 11, 2015 17/ 44 Argument Reduction. Example Example Implement f (x) = ex Exploit the algebraic property ea+b = ea · eb P. Tang’s table-based method log(2) log(2) x m i/2t r∗ ∗ , e =2 ·2 ·e , r ∈ − 2 · 2t 2 · 2t m ∈ Z, Olga Kupriianova t ∈ Z, 0 ≤ i ≤ 2t − 1, t 2i/2 in table Towards a Modern Floating-Point Environment Dec 11, 2015 17/ 44 Argument Reduction. Does It Always Work? Useful algebraic properties: exp(a + b) = exp(a) · exp(b) log(ab) = log(a) + log(b) sin(a + 2πk) = sin(a) ... Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 18/ 44 Argument Reduction. Does It Always Work? Useful algebraic properties: exp(a + b) = exp(a) · exp(b) log(ab) = log(a) + log(b) sin(a + 2πk) = sin(a) ... How to implement Dickman’s function? uρ0 (u) + ρ(u − 1) = 0, Olga Kupriianova ρ(u) = 1 for 0 ≤ u ≤ 1 Towards a Modern Floating-Point Environment Dec 11, 2015 18/ 44 Argument Reduction. Does It Always Work? Useful algebraic properties: exp(a + b) = exp(a) · exp(b) log(ab) = log(a) + log(b) sin(a + 2πk) = sin(a) ... How to implement Dickman’s function? uρ0 (u) + ρ(u − 1) = 0, ρ(u) = 1 for 0 ≤ u ≤ 1 Piecewise-polynomial approximation Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 18/ 44 General Scheme x x Argument Reduction Reduced argument r Tabulation index i Polynomial Table Reconstruction f (x) Olga Kupriianova Domain Splitting x Subdomain index i Polynomial Table Reconstruction f (x) Towards a Modern Floating-Point Environment Dec 11, 2015 19/ 44 General Scheme x x Argument Reduction Reduced argument r Tabulation index i Polynomial Table Reconstruction f (x) Domain Splitting x Subdomain index i Polynomial Table Reconstruction f (x) And filtering of special cases Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 19/ 44 Code Generation for Mathematical Functions 1. Introduction & Motivation for Code Generation 2. Manual Function Implementation. Background 3. Metalibm: General Overview 4. Metalibm: Domain Splitting 5. Metalibm: Reconstruction for Vectorizable Code 6. Conclusion and Perspectives Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 20/ 44 Structure parameters Metalibm implementation function domain final accuracy max poly degree table size ... Olga Kupriianova Towards a Modern Floating-Point Environment C code for function Dec 11, 2015 21/ 44 What is Metalibm Academic Prototype Open-Source code generator for parametrized libm functions Part of ANR Metalibm Project Available at http://metalibm.org/ Objectives Push-button tool to implement functions f : R → R automatic argument reduction automatic polynomial approximation automatic domain splitting Support black-box functions no function dictionary specify function by an expression or external code Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 22/ 44 What is Metalibm Academic Prototype Open-Source code generator for parametrized libm functions Part of ANR Metalibm Project Available at http://metalibm.org/ Objectives Push-button tool to implement functions f : R → R automatic argument reduction automatic polynomial approximation automatic domain splitting Support black-box functions no function dictionary specify function by an expression or external code ⇒ use only C 3 functions that can be evaluated over intervals Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 22/ 44 Generation Levels parameters C 3 function domain final accuracy max poly degree table size Metalibm 1 implementation 2 C code for function 3 1 - Properties detection 2 - Domain splitting 3 - Approximation Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 23/ 44 Property Detection: Example on Exponential Task Generate f (x) on [a, b] with accuracy ε̄ Generation hypothesis f (x) is of type β x , unknown β Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 24/ 44 Property Detection: Example on Exponential Task Generate f (x) on [a, b] with accuracy ε̄ Generation hypothesis f (x) is of type β x , unknown β Finding the base β = exp log(f (ξ)) ξ , for some ξ ∈ [a, b] Decision of acceptance x [a,b] ε = fβ(x) − 1 ∞ Olga Kupriianova |ε| < |ε̄| Towards a Modern Floating-Point Environment Dec 11, 2015 24/ 44 Property Detection: Example on Exponential error 2*eps eps -eps -2eps 1.5e-14 1e-14 5e-15 0 -5e-15 -1e-14 -1.5e-14 -100 Olga Kupriianova -50 0 50 Towards a Modern Floating-Point Environment 100 Dec 11, 2015 25/ 44 Property Detection: Example on Exponential error 2*eps eps -eps -2eps 1.5e-14 1e-14 5e-15 0 -5e-15 -1e-14 -1.5e-14 -100 Olga Kupriianova -50 0 50 Towards a Modern Floating-Point Environment 100 Dec 11, 2015 25/ 44 Property Detection: Example on Exponential error 2*eps eps -eps -2eps 1.5e-14 1e-14 5e-15 0 -5e-15 -1e-14 -1.5e-14 -100 Olga Kupriianova -50 0 50 Towards a Modern Floating-Point Environment 100 Dec 11, 2015 25/ 44 Property Detection Currently detectable properties Exponential f (a + b) = f (a)f (b) Logarithmic f (ab) = f (a) + f (b) Periodic f (x + C) = f (x) Symmetric f (x) = f (−x) and f (x) = −f (x) Sinh-like family f (x) = β x − β −x Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 26/ 44 When property detection does not work Argument reduction Step is based on mathematical properties: na+b = na · nb , sin(x + 2π) = sin(x), log(a · b) = log(a) + log(b), . . . Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 27/ 44 When property detection does not work Argument reduction Step is based on mathematical properties: na+b = na · nb , sin(x + 2π) = sin(x), log(a · b) = log(a) + log(b), . . . What properties do we know for erf(x), or a black-box function? Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 27/ 44 When property detection does not work Argument reduction Step is based on mathematical properties: na+b = na · nb , sin(x + 2π) = sin(x), log(a · b) = log(a) + log(b), . . . What properties do we know for erf(x), or a black-box function? When argument reduction does not work Piecewise-polynomial approximation Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 27/ 44 When property detection does not work Argument reduction Step is based on mathematical properties: na+b = na · nb , sin(x + 2π) = sin(x), log(a · b) = log(a) + log(b), . . . What properties do we know for erf(x), or a black-box function? When argument reduction does not work Piecewise-polynomial approximation Algorithm to split the domain Reconstruction becomes an execution of if-statements Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 27/ 44 Code Generation for Mathematical Functions 1. Introduction & Motivation for Code Generation 2. Manual Function Implementation. Background 3. Metalibm: General Overview 4. Metalibm: Domain Splitting 5. Metalibm: Reconstruction for Vectorizable Code 6. Conclusion and Perspectives Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 28/ 44 Problem Statement Task: Split the domain [a, b] into I0 , I1 , . . . , IN Approximation degrees on Ik : dk ≤ dmax Olga Kupriianova Towards a Modern Floating-Point Environment function initial domain accuracy degree bound Dec 11, 2015 f [a, b] ε̄ dmax 29/ 44 Problem Statement Task: Split the domain [a, b] into I0 , I1 , . . . , IN Approximation degrees on Ik : dk ≤ dmax function initial domain accuracy degree bound f [a, b] ε̄ dmax Naive Solution: Split domain into N equal parts, N is large. Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 29/ 44 Problem Statement f = asin, [a, b] = [0, 0.85], dmax ≤ 8, ε̄ = 2−52 1.2 splitting asin(x) 9 degrees 8 1 7 0.8 6 5 0.6 4 0.4 3 2 0.2 1 0 0 45 intervals Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 29/ 44 Problem Statement f = asin, [a, b] = [0, 0.85], dmax ≤ 8, ε̄ = 2−52 1.2 splitting asin(x) 9 degrees 8 1 7 0.8 6 5 0.6 4 0.4 3 2 0.2 1 0 0 45 intervals Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 29/ 44 Problem Statement The quantity of intervals N → min Classic problem: having f , [a, b] and d find a polynomial p and accuracy ε We need: having f , [a, b], bounds ε̄ and dmax find I and polynomial p Solution: compute Remez polynomials and check the constraints? Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 29/ 44 Problem Statement Remez Test OK Olga Kupriianova Bisect Towards a Modern Floating-Point Environment Dec 11, 2015 29/ 44 Domain Splitting Theorem of de la Vallée-Poussin: a continuous function f on [a, b], its approximating polynomial p find the bounds for optimal error error bounds 6e-05 0 -6e-05 Olga Kupriianova 0 0.2 0.4 0.6 0.8 Towards a Modern Floating-Point Environment 1 Dec 11, 2015 30/ 44 Domain Splitting Interpolation on Chebyshev nodes Test OK Remez OK Olga Kupriianova Bisect Bisect Towards a Modern Floating-Point Environment Dec 11, 2015 30/ 44 Domain Splitting error bounds target acc 6e-05 0 -6e-05 0 0.2 0.4 0.6 0.8 1 Need a split Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 30/ 44 Domain Splitting error bounds target acc 6e-05 0 -6e-05 0 0.2 0.4 0.6 0.8 1 No split Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 30/ 44 Domain Splitting error bounds target acc 6e-05 0 -6e-05 0 0.2 0.4 0.6 0.8 1 Need Remez Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 30/ 44 Domain Splitting. Our method Bisection f = asin, [a, b] = [0, 0.85], dmax ≤ 8, ε̄ = 2−52 2 9 splitting asin(x) degrees 8 7 1.5 6 5 1 4 3 0.5 2 1 5 0.8 5 0.8 0.7 5 0.7 0.6 5 0.6 0.5 5 0.5 0.4 5 0.4 0.3 5 0.3 0.2 5 0.2 0.1 5 0.0 0 0.1 0 0 24 intervals Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 31/ 44 Domain Splitting. Our method Our optimized method f = asin, [a, b] = [0, 0.85], dmax ≤ 8, ε̄ = 2−52 2 9 splitting asin(x) degrees 8 7 1.5 6 5 1 4 3 0.5 2 1 5 0.8 5 0.8 0.7 5 0.7 0.6 5 0.6 0.5 5 0.5 0.4 5 0.4 0.3 5 0.3 0.2 5 0.2 0.1 5 0.0 0 0.1 0 0 18 intervals Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 31/ 44 Some results measure subdomains in bisection subdomains in our method subdomains saved coefficients saved memory saved (bytes) memory saved (%) name f1 f2 f3 f4 Olga Kupriianova f asin asin erf erf ε̄ 2−52 2−45 2−51 2−45 f1 24 18 25% 42 336 23% f2 15 10 30% 34 272 35% domain I [0, 0.85] [−0.75, 0.75] [−0.75, 0.75] [−0.75, 0.75] Towards a Modern Floating-Point Environment f3 9 5 44 % 30 240 38.5% f4 12 8 30% 28 224 45% dmax 8 8 9 7 Dec 11, 2015 32/ 44 Code Generation for Mathematical Functions 1. Introduction & Motivation for Code Generation 2. Manual Function Implementation. Background 3. Metalibm: General Overview 4. Metalibm: Domain Splitting 5. Metalibm: Reconstruction for Vectorizable Code 6. Conclusion and Perspectives Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 33/ 44 Problem Statement Reconstruction gets tricky with arbitrary domain splitting: Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 34/ 44 Problem Statement Reconstruction gets tricky with arbitrary domain splitting: f (x) ≈ pk (x), x ∈ Ik , k ∈ [0, N ] ∩ Z Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 34/ 44 Problem Statement Reconstruction gets tricky with arbitrary domain splitting: f (x) ≈ pk (x), x ∈ Ik , k ∈ [0, N ] ∩ Z ∀x ∈ [a, b] find k such that x ∈ Ik Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 34/ 44 Problem Statement Reconstruction gets tricky with arbitrary domain splitting: f (x) ≈ pk (x), x ∈ Ik , k ∈ [0, N ] ∩ Z f ≈ p2 (x) ∀x ∈ [a, b] find k such that x ∈ Ik x a Olga Kupriianova Towards a Modern Floating-Point Environment 01 2 3 4 5 6 7 8 9 Dec 11, 2015 b 34/ 44 Problem Statement /∗ compute i so that a[i] < x < a[i+1] ∗/ i=31; if (x < arctan_table[i][A].d) i−= 16; else i+=16; if (x < arctan_table[i][A].d) i−= 8; else i+= 8; if (x < arctan_table[i][A].d) i−= 4; else i+= 4; if (x < arctan_table[i][A].d) i−= 2; else i+= 2; if (x < arctan_table[i][A].d) i−= 1; else i+= 1; if (x < arctan_table[i][A].d) i−= 1; xmBihi = x−arctan_table[i][B].d; xmBilo = 0.0; Listing 1: Code sample for arctan function from crlibm library Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 34/ 44 Vectorizable Reconstruction f ≈ p2 (x) x Splitting of the domain [a, b]: a Olga Kupriianova 01 Towards a Modern Floating-Point Environment 2 3 4 5 6 Dec 11, 2015 7 8 9 b 35/ 44 Vectorizable Reconstruction To determine the subinterval we build a mapping function P (x) = k, x ∈ Ik 10 9 8 7 6 5 4 3 2 1 0 a Olga Kupriianova I0 I1 I2 I3 Towards a Modern Floating-Point Environment I4 I5 I6 I7 I8 Dec 11, 2015 I9 b 35/ 44 Vectorizable Reconstruction To determine the subinterval we build a mapping function P (x) = k, x ∈ Ik p(x) : bp(x)c = k, x ∈ Ik 10 9 8 7 6 5 4 3 2 1 0 a Olga Kupriianova I0 I1 I2 I3 Towards a Modern Floating-Point Environment I4 I5 I6 I7 I8 Dec 11, 2015 I9 b 35/ 44 Vectorizable Reconstruction To determine the subinterval we build a mapping function P (x) = k, x ∈ Ik p(x) : bp(x)c = k, x ∈ Ik 10 9 8 7 6 5 4 3 2 1 0 a Olga Kupriianova I0 I1 I2 I3 Towards a Modern Floating-Point Environment I4 I5 I6 I7 I8 Dec 11, 2015 I9 b 35/ 44 Vectorizable Reconstruction Interpolation polynomial with a posteriori condition check 10 9 8 7 6 5 4 3 2 1 0 a Olga Kupriianova I0 I1 I2 I3 Towards a Modern Floating-Point Environment I4 I5 I6 I7 I8 Dec 11, 2015 I9 b 35/ 44 Vectorizable Reconstruction mapping function P (x) = k, x ∈ Ik p(x) : bp(x)c = k, x ∈ Ik 10 p(x) 9 bp(x)c 8 7 6 5 4 3 2 1 0 I0 a I1 Olga Kupriianova I2 I3 I4 Towards a Modern Floating-Point Environment I5 I6 I7 I8 I9 b Dec 11, 2015 35/ 44 Example p 7 6 5 4 3 2 1 0 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 0 f = asin(x); dom = [-0.75; 0]; ε̄ = 2−48 ; dmax = 10; Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 36/ 44 A Posteriori Condition Check Fails p 7 6 5 4 3 2 1 0 -1.4 -1.2 -1 -0.8 -0.6 -0.4 -0.2 0 f = atan(x); dom = [-π/2; 0]; ε̄ = 2−40 ; dmax = 8; Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 37/ 44 How to Improve our Method Interval Arithmetic approach: 10 9 8 7 6 5 4 3 2 1 0 a Olga Kupriianova b Towards a Modern Floating-Point Environment Dec 11, 2015 38/ 44 How to Improve our Method Interval Arithmetic approach: 10 9 8 7 6 5 4 3 2 1 0 a Olga Kupriianova b Towards a Modern Floating-Point Environment Dec 11, 2015 38/ 44 How to Improve our Method Interval Arithmetic approach: 10 9 8 7 6 5 4 3 2 1 0 a Olga Kupriianova b Towards a Modern Floating-Point Environment Dec 11, 2015 38/ 44 How to Improve our Method Interval Arithmetic approach: Interval system of linear algebraic equations 10 9 8 7 6 5 4 3 2 1 0 a Olga Kupriianova c0 b0 1 a0 · · · an0 1 a1 · · · an c1 b1 1 .. .. . . .. · .. = .. . . . . . . cn bn 1 an · · · ann b Towards a Modern Floating-Point Environment Dec 11, 2015 38/ 44 How to Improve our Method Interval Arithmetic approach: Interval system of linear algebraic equations 10 9 8 7 6 5 4 3 2 1 0 c0 b0 1 a0 · · · an0 1 a1 · · · an c1 b1 1 .. .. . . .. · .. = .. . . . . . . cn bn 1 an · · · ann a b tolerance solution: united solution: Ξtol = c ∈ RN | ∀a ∈ a, ∀b ∈ b, Ac = b Ξuni = c ∈ RN | ∃a ∈ a, ∃b ∈ b, Ac = b S. Shary HdR thesis “Интервальные алгебраические задачи и их численное решение” Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 38/ 44 Achievements Generate the specified function versions in few minutes Detect algebraic properties automatically Use the improved (optimized) domain splitting algorithm Generation of the vectorizable implementations has started Extra bonuses: composite functions and a set of function variants Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 39/ 44 Code Generation for Mathematical Functions 1. Introduction & Motivation for Code Generation 2. Manual Function Implementation. Background 3. Metalibm: General Overview 4. Metalibm: Domain Splitting 5. Metalibm: Reconstruction for Vectorizable Code 6. Conclusion and Perspectives Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 40/ 44 Conclusion Paving the road for mixed-radix arithmetic: Novel algorithm for radix conversion Algorithm for conversion from decimal character sequence to a binary FP number Worst cases search for FMA Code generation for mathematical functions: Novel algorithm for domain splitting Novel approach to generate vectorizable implementations Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 41/ 44 Future Work Mixed-Radix arithmetic Finish the worst cases search for FMA Start its implementation Research on other arithmetic operations Obtain test/comparison results for our scanf Code generation of mathematical functions Filtering of special cases Improve vectorizable reconstruction: Interval arithmetic for a priori approach Overlapping intervals Connection between reconstruction and domain splitting Add more parameters Integrate with N.Brunie’s version Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 42/ 44 Perspectives Long-term Future Integrate Metalibm to the glibc/gcc Apply the similar algorithms for filter generation Formal proof for scanf algorithm Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 43/ 44 Questions N. Brunie, F. de Dinechin, O. Kupriianova, and C. Lauter. Code generators for mathematical functions. In ARITH22 - June 2015, Lyon, France. Proceedings, pp 66-73. Best paper award. O. Kupriianova and C. Q. Lauter. A domain splitting algorithm for the mathematical functions code generator. In Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, November, 2014, pp 1271-1275. O. Kupriianova and C. Q. Lauter. Replacing branches by polynomials in vectorizable elementary functions. 16th GAMM-IMACS International Symposium on Scientific Computing, Computer Arithmetic and Validated Numerics, September 2014, Würzburg, Germany. O. Kupriianova and C. Q. Lauter. Metalibm: A mathematical functions code generator. In Mathematical Software - ICMS 2014 - 4th International Congress, Seoul, South Korea, August 2014. Proceedings, pp 713–717. O. Kupriianova, C. Q. Lauter, and J.-M. Muller. Radix conversion for IEEE754-2008 mixed radix floating-point arithmetic. In Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, November 2013, pp 1134–1138. Olga Kupriianova Towards a Modern Floating-Point Environment Dec 11, 2015 44/ 44