An Open and Efficient Type Switch for C++

Yuriy Solodkyy • Gabriel Dos Reis • Bjarne Stroustrup
Texas A&M University
October 25, 2012
OOPSLA’12, Tucson, AZ
Partially supported by NSF grants: CCF-0702765, CCF-1043084, CCF-1150055
http://parasol.tamu.edu/mach7/

Traversing graphs with many kinds of nodes

- Functional programming: pattern matching
    - Simple and elegant
    - Fast
    - Closed: adding a new variant implies distributed code changes
- Object-oriented programming: visitors
    - Complicated, verbose, hard to teach and use
    - Slow compared to some alternatives
    - Semi-open: allows sub-classes, but restricts cases to distinguish them
- Our library: type switch
    - As simple and elegant as pattern matching
    - As fast as pattern matching and faster than visitors
    - Fully open to class extension

OOPSLA'12: An Open and Efficient Type Switch for C++ 2

Type Switch

- Type switch: a multi-way branch on an object's dynamic type

    switch (object)
    {
    case Type1: action1; // object is of Type1
    ...
    case Typen: actionn; // object is of Typen
    }

- Open type switch
    - No closed world assumption
    - We can add new classes and new functions without modifying existing code
    - Independent modular extension of class hierarchy, including at run-time

Functional-Style Pattern Matching

    exp ::= val | exp + exp | exp - exp | exp * exp | exp / exp

    type expr = Value of int
              | Plus  of expr * expr
              | Minus of expr * expr
              | ...
    ;;

    let rec eval e =
      match e with
        Value v     -> v
      | Plus (a,b)  -> (eval a)+(eval b)
      | Minus(a,b)  -> (eval a)-(eval b)
      | ...
    ;;

Pro
- Elegant
- Efficient
- Adding a new function is easy
    - Without modifying existing types or functions
Con
- Not for class hierarchies
    - Variants are closed & disjoint
    - Hierarchies are extensible
    - Hierarchies are multilevel
- Adding a variant modifies type

Object-Oriented Dynamic Lookup

    exp ::= val | exp + exp | exp - exp | exp * exp | exp / exp

    struct Expr { virtual int eval(); };

    struct Value : Expr {
        int eval() { return value; }
        int value;
    };

    struct Plus : Expr {
        int eval() {
            return e1.eval()+e2.eval();
        }
        Expr& e1; Expr& e2;
    };

Pro
- Modularity
- Encapsulation
- Adding a new subclass is easy
Con
- Adding a new function is hard
    - Requires modification to many classes in the hierarchy

Visitor Design Pattern

    struct Value; struct Plus; ...

    struct Visitor {
        virtual void visit(Value&)= 0;
        virtual void visit(Plus&) = 0;
        // ...
    };

    struct Expr {
        virtual void accept(Visitor&);
    };

    struct Value : Expr {
        void accept(Visitor& v)
            { v.visit(*this); }
    };
    // ...

Pro
- Adding a new function is easy
- Not too expensive
    - 2 virtual function calls
- Library solution
- Commonly used
Con
- Hard to teach and use
    - Control inversion
- Ugly
- Intrusive
- Specific to each hierarchy
    - Lots of boilerplate
- Hinders extensibility of classes
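To make the visitor trade-offs listed above concrete, here is a minimal, self-contained sketch of evaluating the expression hierarchy through a visitor. The names `EvalVisitor` and the free function `eval` are illustrative and do not come from the slides; only `Value` and `Plus` are modeled, and the members are `const`-qualified for convenience. Note how the result has to be passed back through visitor state (the control inversion) and how every class needs its own `accept` (the boilerplate).

```cpp
#include <cassert>

struct Value; struct Plus;

struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(const Value&) = 0;
    virtual void visit(const Plus&)  = 0;
};

struct Expr {
    virtual ~Expr() = default;
    virtual void accept(Visitor&) const = 0;   // 1st virtual call: selects the node type
};

struct Value : Expr {
    explicit Value(int v) : value(v) {}
    void accept(Visitor& v) const override { v.visit(*this); } // 2nd virtual call: selects the operation
    int value;
};

struct Plus : Expr {
    Plus(const Expr& a, const Expr& b) : e1(a), e2(b) {}
    void accept(Visitor& v) const override { v.visit(*this); }
    const Expr& e1; const Expr& e2;
};

// Hypothetical visitor implementing eval; the result travels through a member
// because visit() cannot return a value directly.
struct EvalVisitor : Visitor {
    int result = 0;
    void visit(const Value& x) override { result = x.value; }
    void visit(const Plus& x) override {
        EvalVisitor lhs, rhs;
        x.e1.accept(lhs);
        x.e2.accept(rhs);
        result = lhs.result + rhs.result;
    }
};

// Convenience wrapper hiding the visitor plumbing from callers.
int eval(const Expr& e) {
    EvalVisitor v;
    e.accept(v);
    return v.result;
}
```

Adding a new operation only requires a new `Visitor` subclass, but adding a new node type (say `Minus`) requires touching `Visitor` and every existing visitor, which is the extensibility cost the slide refers to.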
Functional Programming Notation

OCaml

    let rec eval e =
      match e with
        Value v       -> v
      | Plus  (a,b)   -> (eval a) + (eval b)
      | Minus (a,b)   -> (eval a) - (eval b)
      | Times (a,b)   -> (eval a) * (eval b)
      | Divide(a,b)   -> (eval a) / (eval b)
    ;;

Other functional languages have roughly similar syntax

Experimental C++ Notation

C++ with type switch library (Mach7)

    int eval(Expr& e)
    {
        Match(e)
          Case(Value&  x) return x.value;
          Case(Plus&   x) return eval(x.e1) + eval(x.e2);
          Case(Minus&  x) return eval(x.e1) - eval(x.e2);
          Case(Times&  x) return eval(x.e1) * eval(x.e2);
          Case(Divide& x) return eval(x.e1) / eval(x.e2);
        EndMatch
    }

- Logically equivalent to functional programming notation
- We could improve the syntax if we modified a compiler

Type Switch Design

Pro
- Easy to teach and use
    - Non-intrusive
    - No control inversion
- Extensible
    - Functions and classes
- Fast
- General
    - Not specific to hierarchy
- Library solution
Con
- Slower on the 1st call for each dynamic type passed

We use several industrial compilers for experimentation and measurement

Make Type Switching Fast

[Figure: cycles (y-axis, 0 to 140) versus case number (x-axis, 0 to 70) for Fast Dynamic Cast, Cohen's Algorithm, Binary Matrix, Visitors, Open Type Switch, and C switch on integers]

Key Features of Our Solution

- Memoization device
    - Maps types to execution paths
    - Uses dynamic types of objects
- Hash of dynamic type
    - A pointer to a virtual function table (v-table) identifies
        - Type of object
        - Sub-object offset
- High-quality hashing needed for performance
    - Experiments to achieve perfect and/or compact hashing
    - We use the structure of v-table pointers

Open but Inefficient Solution

    switch (object)
    {
    case Type1: action1;
    case Type2: action2;
    ...
    case Typen: actionn;
    }

is expressed as a cascade of dynamic casts:

    if (Type1* match=dynamic_cast<Type1*>(object)) { action1; } else
    if (Type2* match=dynamic_cast<Type2*>(object)) { action2; } else
    ...
    if (Typen* match=dynamic_cast<Typen*>(object)) { actionn; }

Examine a Type Only Once

Hypothetical statement: execute the 1st statement s_i whose predicate P_i is true

    switch (x)
    {
    case P1(x): s1;
    ...
    case Pn(x): sn;
    }

Memoize: the clause of the 1st successful predicate
Assume: functional behavior (the predicates depend only on x and have no side effects)

Generated code:

    typedef decltype(x) T;
    static hash_map<T,size_t> labels;
    switch (size_t& l = labels[x])
    {
    default: // we have not seen x yet
        if (P1(x)) { l = 1; case 1: s1; } else
        ...
        if (Pn(x)) { l = n; case n: sn; } else
            l = n+1;
    case n+1: // none is true on x
    }

Intermediate Solution

We use:
- source sub-object as a hash map key
- Pi(x) ≡ dynamic_cast<Typei*>(x) != 0
We map it to:
- offset to target sub-object
- jump target
On first entry:
- memoize this-pointer offset
- memoize jump target
On subsequent entries:
- jump to target
- adjust this-pointer

    struct info { ptrdiff_t offset; size_t target; };
    // -------- Case(Typei* t) --------
    if (auto t = dynamic_cast<Typei*>(x)) {
        n.offset = intptr_t(t) - intptr_t(x); // this-pointer adjustment in bytes
        n.target = i;
    case i:
        auto match = adjust_ptr<Typei>(x, n.offset);
        actioni;
    }

Structure of V-Table Pointers

$V = \{v_1, \dots, v_n\}$ is a set of v-table pointers passed through a given Match-statement. Example:

    V = 00000001100110010011000111011100
        00000001100110001111111000110100
        00000001100110010010111111111100
        00000001100110010011000001010100
        00000001100110010011000101110100
        00000001100110010010111110100100
        00000001100110010011000100010100
        --------------------------------
        000000011001100XXX1XXXXXXXXXX100   (X marks the bit positions that differ across V)

For each such set $V$ we:
- Use a cache of $2^k$ entries addressed by $H_{kl}(v) = \lfloor v / 2^{l} \rfloor \bmod 2^{k}$
- Choose $k$ and $l$ to minimize the number of conflicts in the cache
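The memoization device and the v-table hash from the preceding slides can be combined into a small hand-written sketch. This is not Mach7's implementation: it assumes, as a non-standard but common ABI property (Itanium and MSVC), that the v-table pointer is the first word of a polymorphic object; `vtbl_of`, `try_case`, `adjust`, the `cascades` counter, and the fixed parameters `K` and `L` are all hypothetical names and values chosen for illustration, whereas the library picks $k$ and $l$ per Match-statement.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct Expr  { virtual ~Expr() = default; };
struct Value : Expr { int value; explicit Value(int v) : value(v) {} };
struct Plus  : Expr { const Expr& e1; const Expr& e2;
                      Plus(const Expr& a, const Expr& b) : e1(a), e2(b) {} };
struct Minus : Expr { const Expr& e1; const Expr& e2;
                      Minus(const Expr& a, const Expr& b) : e1(a), e2(b) {} };

// ASSUMPTION (not guaranteed by the C++ standard): the v-table pointer is
// stored in the first word of a polymorphic object.
inline std::uintptr_t vtbl_of(const Expr& e) {
    std::uintptr_t v;
    std::memcpy(&v, static_cast<const void*>(&e), sizeof v);
    return v;
}

// Memoized outcome of the cascade for one v-table pointer.
struct info { std::uintptr_t vtbl = 0; std::ptrdiff_t offset = 0; std::size_t target = 0; };

constexpr unsigned K = 4, L = 3;   // hypothetical fixed parameters
inline std::size_t hash_kl(std::uintptr_t v) {          // H_kl(v) = (v / 2^l) mod 2^k
    return static_cast<std::size_t>((v >> L) & ((std::uintptr_t{1} << K) - 1));
}

int cascades = 0;   // counts how often the slow dynamic_cast cascade ran

// One clause of the cascade: on success, record this-pointer offset and jump target.
template <class T>
bool try_case(const Expr& e, info& n, std::size_t label) {
    if (auto* t = dynamic_cast<const T*>(&e)) {
        n.offset = reinterpret_cast<const char*>(t) - reinterpret_cast<const char*>(&e);
        n.target = label;
        return true;
    }
    return false;
}

// Re-applies a memoized offset instead of repeating the dynamic_cast.
template <class T>
const T& adjust(const Expr& e, std::ptrdiff_t offset) {
    return *reinterpret_cast<const T*>(reinterpret_cast<const char*>(&e) + offset);
}

int eval(const Expr& e) {
    static std::array<info, (1u << K)> cache;
    const std::uintptr_t v = vtbl_of(e);
    info& slot = cache[hash_kl(v)];
    if (slot.vtbl != v) {                 // first entry for this type, or a cache conflict
        ++cascades;
        info n; n.vtbl = v;
        const bool matched = try_case<Value>(e, n, 1) || try_case<Plus>(e, n, 2)
                          || try_case<Minus>(e, n, 3);
        if (!matched) n.target = 0;       // no clause applies
        slot = n;
    }
    const info n = slot;                  // copy: recursive calls may overwrite the slot
    switch (n.target) {                   // subsequent entries jump straight here
    case 1: return adjust<Value>(e, n.offset).value;
    case 2: { const Plus&  x = adjust<Plus>(e, n.offset);  return eval(x.e1) + eval(x.e2); }
    case 3: { const Minus& x = adjust<Minus>(e, n.offset); return eval(x.e1) - eval(x.e2); }
    default: return 0;
    }
}
```

With single inheritance the memoized offset is always zero; it is kept in the sketch because under multiple inheritance the cast is not a no-op and the offset is what lets later calls skip `dynamic_cast` entirely. A conflict in the cache only costs a repeated cascade, never a wrong answer, since each slot stores the v-table pointer it was filled for.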
Performance Evaluation

                         Open                            Closed
              G++/Lnx  G++/Win  Visual C++    G++/Lnx  G++/Win  Visual C++
    REP          16%      14%       1%         124%     122%     100%
    SEQ          56%      12%      48%         640%     467%      29%
    RND          56%       0%       9%         603%     470%      35%
    Forwarding
    REP          33%      22%       8%          53%      49%      24%
    SEQ          55%     233%     135%          86%     290%      48%
    RND          78%      25%       3%          88%      33%       8%

Each number represents a median of 101 experiments timing 1,000,000 dispatches each.
Legend (entries are marked accordingly on the slide): 9% = visitors faster by; 56% = type switch faster by.

[Figure: case analysis on leaf classes; bar chart for G++/Lnx, G++/Win, and Visual C++ over the REP, SEQ, and RND benchmarks; y-axis from -50% to 250%]

Performance Evaluation

[Figure: case analysis on base classes; same table, series, and axes as the previous slide]

Performance Evaluation

Our library code is approximately as fast as natively compiled functional code.
We will squeeze even more with a language solution.
[Figure: time in seconds (y-axis, 0.000 to 0.025) by language/encoding (x-axis): OCaml, Haskell, C++/Kind, C++/Closed, C++/Open]

Note: This is not a detailed language comparison, but we compare well with “the gold standard” for pattern matching

Real World Class Hierarchies

                                                                         PARENTS      CHILDREN
    LIB       LANGUAGE     CLASSES  PATHS  HEIGHT  ROOTS  LEAFS  BOTH   AVG   MAX    AVG   MAX
    DG2       SMALLTALK        534    534      11      2    381     1   1       1   3.48    59
    DG3       SMALLTALK       1356   1356      13      2    923     1   1       1   3.13   142
    ET+       C++              370    370       8     87    289    79   1       1   3.49    51
    GEO       EIFFEL          1318  13798      14      1    732     0   1.89   16   4.75   323
    JAV       JAVA             604    792      10      1    445     0   1.08    3   4.64   210
    LOV       EIFFEL           436   1846      10      1    218     0   1.72   10   3.55    78
    NXT       OBJECTIVE-C      310    310       7      2    246     1   1       1   4.81   142
    SLF       SELF            1801  36420      17     51   1134     0   1.05    9   2.76   232
    UNI       C++              613    633       9    147    481   117   1.02    2   3.61    39
    VA2       SMALLTALK       3241   3241      14      1   2582     0   1       1   4.92   249
    VA2       SMALLTALK       2320   2320      13      1   1868     0   1       1   5.13   240
    VW1       SMALLTALK        387    387       9      1    246     0   1       1   2.74    87
    VW2       SMALLTALK       1956   1956      15      1   1332     0   1       1   3.13   181
    OVERALLS                 15246  63963      17    298  10877   199   1.11   16   3.89   323

    % sub-hierarchies              1%    3%    5%   10%   20%   25%   50%   64%   100%
    with more than … classes      700   110    50    20    10     7     3     2      1

Efficiency of Hashing

4369 type switches: 87.5% of them rendered a hash function with no conflicts

[Figure: probability of conflict p (y-axis, 0.00 to 0.40) versus number of sub-objects n (x-axis, powers of two from 1 to 2048), one curve per m = 0, 1, 2, 3, 4, 5, 6, 7, ...]

    Conflicts          0       1      2      3      4      5      6     >6
    Type Switches   87.50%  5.58%  2.63%  0.87%  0.69%  0.69%  0.30%  1.76%

Related Work

- N. Wirth. Type extensions. 1988
- W. R. Cook. Object-oriented programming versus abstract data types. 1991
- P. Wadler. The expression problem. 1998
- N. Glew. Type dispatch for named hierarchical types. 1999
- M. Zenger, M. Odersky. Independently extensible solutions to the expression problem. 2005
- M. Homer, J. Noble, K. Bruce, A. Black, D. Pearce. Patterns as Objects in Grace. 2012

- N. H. Cohen. Type-extension type test can be performed in constant time. TOPLAS 1991
- Y. Caseau. Efficient handling of multiple inheritance hierarchies. OOPSLA 1993
- J. Vitek, R. N. Horspool, A. Krall. Efficient type inclusion tests. OOPSLA 1997
- Y. Zibin, J. Y. Gil. Efficient subtyping tests with PQ-encoding. OOPSLA 2001
- M. Gibbs, B. Stroustrup. Fast dynamic casting. SPE 2006
- R. Ducournau. Perfect hashing as an almost perfect subtype test. TOPLAS 2008

THANKS!

http://parasol.tamu.edu/mach7/

Mach7
- Matches the gold standard for notation
- Matches the gold standard for performance
- Handles both open and closed cases

Special Thanks To:
Xavier Leroy, Luc Maranget, Gregory Berkolaiko, Suhasini Subba Rao, Jaakko Järvi, Peter Pirkelbauer, Andrew Sutton, Abe Skolnik, Karel Driesen

Questions

Summary of Contributions

- Technique for type switching
    - On extensible hierarchical data types
    - Open by construction
    - Full support of general multiple inheritance of C++
- Efficiency
    - Similar to pattern matching
    - Close or better than visitors
    - Outperforms existing open approaches
- Library implementation
    - Notational convenience of pattern matching
    - No changes to the C++ object model
    - No computations or code generation at link or load time
- Unique partitioning of objects based on sub-objects
    - Suitable for other optimizations

Expression Problem

    exp ::= val | exp + exp | exp - exp | exp * exp | exp / exp

Functional Languages

    type expr = Value  of int
              | Plus   of expr * expr
              | Minus  of expr * expr
              | Times  of expr * expr
              | Divide of expr * expr
    ;;

    let rec eval e =
      match e with
        Value v       -> v
      | Plus  (a,b)   -> (eval a) + (eval b)
      | Minus (a,b)   -> (eval a) - (eval b)
      | Times (a,b)   -> (eval a) * (eval b)
      | Divide(a,b)   -> (eval a) / (eval b)
    ;;

- Easy to add new functions
- Adding new variants is intrusive

Object-Oriented Languages

    class Expr          { virtual int eval(); };
    class Value  : Expr { int value; };
    class Plus   : Expr { Expr &e1, &e2; };
    class Minus  : Expr { Expr &e1, &e2; };
    class Times  : Expr { Expr &e1, &e2; };
    class Divide : Expr { Expr &e1, &e2; };

    int Value::eval()  { return value; }
    int Plus ::eval()  { return e1.eval()+e2.eval(); }
    int Minus::eval()  { return e1.eval()-e2.eval(); }
    int Times::eval()  { return e1.eval()*e2.eval(); }
    int Divide::eval() { return e1.eval()/e2.eval(); }

- Easy to add new variants
- Adding new functions is intrusive

Problem of Type Switching in C++

Classes are:
- Extensible
    - Important: Separate compilation
    - Important: Dynamic linking
- Hierarchical
    - Multiple Inheritance
    - Up-, down- and cross-casts
    - Cast is not a no-op
    - Ambiguities

Existing approaches
- Closed world: jump tables
    - Unrealistic for modern C++ use
- Open world: constant-time subtype tests + decision trees
    - Most are not suitable for repeated multiple inheritance
    - Most require computations or runtime code generation at load time
    - Time increases with case number

[Figure: cycles (y-axis, 0 to 140) versus case number (x-axis, 0 to 90) for Fast Dynamic Cast, Cohen's Algorithm, Binary Matrix, Visitors, and Switch]

Uniqueness of V-Table Pointers

[Figure]

V-Table Pointers Facts

- Are unique per same static type only
    - can be shared with primary base class
- Can be many for same sub-object
    - e.g. numerous copies of the same v-table in DLLs
- May change during [de]construction
    - affects outcome of a type switch in constructors and destructors
    - is in line with C++ semantics for virtual function calls
- Are at fixed offset within the dynamic type
    - we can memoize offsets obtained on one instance and reapply them to another instance of the same dynamic type from the same sub-object

Visitors Comparison

Open
                  G++      G++      MS Visual C++ (PGO)    MS Visual C++ (w/o PGO)
                  Lnx      Win
                  x86-32   x86-32   x86-32    x86-64       x86-32    x86-64
    REP            16%      14%       1%       18%           2%       37%
    SEQ            56%      12%      48%       22%           2%       46%
    RND            56%       0%       9%       19%           5%       46%
    Forwarding
    REP            33%      22%       8%       17%          24%       36%
    SEQ            55%     233%     135%      135%         193%       32%
    RND            78%      25%       3%        4%          13%       23%

Closed
                  G++      G++      MS Visual C++ (PGO)    MS Visual C++ (w/o PGO)
                  Lnx      Win
                  x86-32   x86-32   x86-32    x86-64       x86-32    x86-64
    REP           124%     122%     100%       41%          76%       37%
    SEQ           640%     467%      29%       15%          30%       10%
    RND           603%     470%      35%       20%          32%        6%
    Forwarding
    REP            53%      49%      24%       11%          20%       36%
    SEQ            86%     290%      48%      139%          12%       24%
    RND            88%      33%       8%        1%          18%       16%

Each number represents a median of 101 experiments timing 1,000,000 dispatches each.
Legend (entries are marked accordingly on the slide): 18% = percentage visitors are faster by; 14% = percentage type switch is faster by.

Test machines:
- Lnx: Dell Dimension® desktop with Intel® Pentium® D (Dual Core) CPU at 2.80 GHz; 1GB of RAM; Fedora Core 13
    - G++ 4.4.5 executed with -O2; x86 binaries
- Win: Sony VAIO® laptop with Intel® Core™ i5 460M CPU at 2.53 GHz; 6GB of RAM; Windows 7 Pro
    - G++ 4.6.1 / MinGW executed with -O2; x86 binaries
    - MS Visual C++ 2010 Professional x86/x64 binaries with and without Profile-Guided Optimizations

[Figure: cycles per iteration (y-axis, 0 to 70) for Visitors and Open Type Switch on the REP, SEQ, and RND benchmarks]

Sizes of Class Hierarchies

[Figure: % of hierarchies with that number of classes (y-axis, 100.00% down to 0.01%, halving at each tick) versus classes in hierarchy (x-axis, 1 to 4096)]

Minimization of Conflicts

[Figure]

Effect of Conflicts Minimization

[Figure]
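As a rough illustration of what conflict minimization involves, the following sketch searches exhaustively for parameters $k$ and $l$ of the hash $H_{kl}(v) = \lfloor v / 2^{l} \rfloor \bmod 2^{k}$, using the seven sample v-table pointers shown on the "Structure of V-Table Pointers" slide. The search strategy, the bound `k_max`, and the names `hash_kl`, `conflicts`, `choose_params`, and `sample_vtbls` are assumptions made for this example; the slides do not say how the library performs the search.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// H_kl(v) = (v / 2^l) mod 2^k
inline std::size_t hash_kl(std::uint64_t v, unsigned k, unsigned l) {
    return static_cast<std::size_t>((v >> l) & ((std::uint64_t{1} << k) - 1));
}

// Number of pointers in V that land in a cache slot already taken by another one.
// V is assumed to contain distinct values.
std::size_t conflicts(const std::vector<std::uint64_t>& V, unsigned k, unsigned l) {
    std::vector<bool> used(std::size_t{1} << k, false);
    std::size_t c = 0;
    for (std::uint64_t v : V) {
        const std::size_t h = hash_kl(v, k, l);
        if (used[h]) ++c; else used[h] = true;
    }
    return c;
}

struct Params { unsigned k; unsigned l; std::size_t conflicts; };

// Hypothetical brute-force search: start from the smallest cache that can hold
// all of V, try every shift l, and grow the cache only while conflicts remain.
Params choose_params(const std::vector<std::uint64_t>& V, unsigned k_max = 8) {
    unsigned k_min = 0;
    while ((std::size_t{1} << k_min) < V.size()) ++k_min;
    Params best{k_min, 0, conflicts(V, k_min, 0)};
    for (unsigned k = k_min; k <= k_max && best.conflicts > 0; ++k)
        for (unsigned l = 0; l < 32; ++l) {
            const std::size_t c = conflicts(V, k, l);
            if (c < best.conflicts) best = {k, l, c};
        }
    return best;
}

// The seven v-table pointers from the slide, transcribed from binary to hex.
const std::vector<std::uint64_t> sample_vtbls = {
    0x019931DC, 0x0198FE34, 0x01992FFC, 0x01993054,
    0x01993174, 0x01992FA4, 0x01993114,
};
```

For the sample set, `choose_params(sample_vtbls)` settles on k = 3 and l = 5: the three bits above the five lowest already separate all seven pointers, so an 8-entry cache suffices with no conflicts. With l = 0 the same cache would have 6 conflicts, because every pointer ends in the bits 100.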