An Open and Efficient Type Switch for C++
Yuriy Solodkyy • Gabriel Dos Reis • Bjarne Stroustrup
Texas A&M University
October 25, 2012
OOPSLA’12, Tucson, AZ
Partially supported by NSF grants:
CCF-0702765, CCF-1043084, CCF-1150055
http://parasol.tamu.edu/mach7/
Traversing graphs with many kinds of nodes

Functional programming: pattern matching
  o Simple and elegant
  o Fast
  o Closed: adding a new variant implies distributed code changes

Object-oriented programming: visitors
  o Complicated, verbose, hard to teach and use
  o Slow compared to some alternatives
  o Semi-open: allows sub-classes, but restricts cases to distinguish them

Our library: type switch
  o As simple and elegant as pattern matching
  o As fast as pattern matching and faster than visitors
  o Fully open to class extension
OOPSLA'12: An Open and Efficient Type Switch for C++
Type Switch

Type Switch
  o A multi-way branch on an object's dynamic type

    switch (object)
    {
      case Type1: action1;   // object is of Type1
      ...
      case Typen: actionn;   // object is of Typen
    }

Open Type Switch
  o No closed world assumption
     • We can add new classes and new functions without modifying existing code
     • Independent modular extension of class hierarchy, including at run-time
Functional-Style Pattern Matching
exp ::= val | exp + exp | exp - exp | exp * exp | exp / exp
    type expr =
        Value of int
      | Plus  of expr * expr
      | Minus of expr * expr
      | ...
    ;;

    let rec eval e =
      match e with
        Value v     -> v
      | Plus (a,b)  -> (eval a) + (eval b)
      | Minus (a,b) -> (eval a) - (eval b)
      | ...
    ;;

Pro
– Elegant
– Efficient
– Adding a new function is easy
   • Without modifying existing types or functions

Con
– Not for class hierarchies
   • Variants are closed & disjoint
   • Hierarchies are extensible
   • Hierarchies are multilevel
– Adding a variant modifies type
Object-Oriented Dynamic Lookup
exp ::= val | exp + exp | exp - exp | exp * exp | exp / exp
    struct Expr {
      virtual int eval();
    };

    struct Value : Expr {
      int eval() { return value; }
      int value;
    };

    struct Plus : Expr {
      int eval() {
        return e1.eval()+e2.eval();
      }
      Expr& e1;
      Expr& e2;
    };

Pro
– Modularity
– Encapsulation
– Adding a new subclass is easy

Con
– Adding a new function is hard
   • requires modification to many classes in the hierarchy
Visitor Design Pattern
    struct Value; struct Plus; ...

    struct Visitor {
      virtual void visit(Value&)= 0;
      virtual void visit(Plus&) = 0;
      // ...
    };

    struct Expr {
      virtual void accept(Visitor&);
    };

    struct Value : Expr {
      void accept(Visitor& v)
      { v.visit(*this); }
    };
    // ...

Pro
– Adding a new function is easy
– Not too expensive
   • 2 virtual function calls
– Library solution
– Commonly used

Con
– Hard to teach and use
   • Control inversion
– Ugly
– Intrusive
– Specific to each hierarchy
   • Lots of boilerplate
– Hinders extensibility of classes
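To make the control inversion and the two virtual calls concrete, here is a minimal complete sketch of an evaluator written with the visitor pattern. It follows the slide's Expr, Value, Plus and Visitor names; EvalVisitor, the constructors, and the free function eval are assumptions added for illustration.

```cpp
struct Value;
struct Plus;

struct Visitor {
    virtual void visit(const Value&) = 0;
    virtual void visit(const Plus&)  = 0;
    virtual ~Visitor() = default;
};

struct Expr {
    virtual void accept(Visitor&) const = 0;   // first virtual call
    virtual ~Expr() = default;
};

struct Value : Expr {
    int value;
    explicit Value(int v) : value(v) {}
    void accept(Visitor& v) const override { v.visit(*this); }  // second virtual call
};

struct Plus : Expr {
    const Expr& e1;
    const Expr& e2;
    Plus(const Expr& a, const Expr& b) : e1(a), e2(b) {}
    void accept(Visitor& v) const override { v.visit(*this); }
};

// Hypothetical visitor: visit() returns void, so the result has to
// travel through visitor state. This is the control inversion.
struct EvalVisitor : Visitor {
    int result = 0;
    void visit(const Value& x) override { result = x.value; }
    void visit(const Plus& x) override {
        x.e1.accept(*this);
        const int lhs = result;
        x.e2.accept(*this);
        result = lhs + result;
    }
};

int eval(const Expr& e) {
    EvalVisitor v;
    e.accept(v);
    return v.result;
}
```

Adding a Minus class to this sketch would require touching Visitor and every existing visitor, which is the extensibility cost listed under Con.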
Functional Programming Notation
OCaml

    let rec eval e =
      match e with
        Value v      -> v
      | Plus  (a,b)  -> (eval a) + (eval b)
      | Minus (a,b)  -> (eval a) - (eval b)
      | Times (a,b)  -> (eval a) * (eval b)
      | Divide(a,b)  -> (eval a) / (eval b)
    ;;

Other functional languages have roughly similar syntax
Experimental C++ Notation
C++ with type switch library (Mach7)

    int eval(Expr& e) {
      Match(e)
        Case(Value&  x) return x.value;
        Case(Plus&   x) return eval(x.e1) + eval(x.e2);
        Case(Minus&  x) return eval(x.e1) - eval(x.e2);
        Case(Times&  x) return eval(x.e1) * eval(x.e2);
        Case(Divide& x) return eval(x.e1) / eval(x.e2);
      EndMatch
    }

Logically equivalent to functional programming notation
We could improve the syntax if we modified a compiler
Type Switch Design

Pro
– Easy to teach and use
   • Non-intrusive
   • No control inversion
– Extensible
   • Functions and classes
– Fast
– General
   • Not specific to hierarchy
– Library solution
   • We use several industrial compilers for experimentation and measurement

Con
– Slower on the 1st call for each dynamic type passed
Make Type Switching Fast
[Figure: cycles per dispatch (y axis, 0 to 140) against case number (x axis, 0 to 70) for Fast Dynamic Cast, Cohen's Algorithm, Binary Matrix, Visitors, Open Type Switch, and C switch on integers.]
Key Features of Our Solution

Memoization device
  o Maps types to execution paths
  o Uses dynamic types of objects

Hash of dynamic type
  o A pointer to a virtual function table (v-table) identifies
     • Type of object
     • Sub-object offset

High-quality hashing needed for performance
  o Experiments to achieve perfect and/or compact hashing
  o We use the structure of v-table pointers
Open but Inefficient Solution
    switch (object)
    {
      case Type1: action1;
      case Type2: action2;
      ...
      case Typen: actionn;
    }

    if (Type1* match=dynamic_cast<Type1*>(object)) { action1; } else
    if (Type2* match=dynamic_cast<Type2*>(object)) { action2; } else
    ...
    if (Typen* match=dynamic_cast<Typen*>(object)) { actionn; }
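A minimal runnable sketch of this cascade, assuming a small hypothetical Shape hierarchy that is not part of the talk. It stays open (the Triangle class can be added later without touching describe), but every call repeats the dynamic_cast tests in order, so the cost grows with the position of the matching clause.

```cpp
#include <string>

// Hypothetical hierarchy, used only for this illustration.
struct Shape    { virtual ~Shape() = default; };
struct Circle   : Shape {};
struct Square   : Shape {};
struct Triangle : Shape {};   // added later; describe() is unaware of it

// Sequential type tests: each clause costs one dynamic_cast until a match.
std::string describe(const Shape& s) {
    if (dynamic_cast<const Circle*>(&s)) {
        return "circle";
    } else if (dynamic_cast<const Square*>(&s)) {
        return "square";
    }
    return "unknown";   // no clause matched
}
```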
Examine a Type Only Once

Hypothetical Statement
  Execute the 1st statement si whose predicate Pi is true

    switch (x) {
      case P1(x): s1;
      ...
      case Pn(x): sn;
    }

Memoize
  Clause of the 1st successful predicate

Assume
  Functional behavior

Generated Code

    typedef decltype(x) T;
    static hash_map<T,size_t> labels;
    switch (size_t& l = labels[x])
    {
    default: // we have not seen x yet
      if (P1(x)) { l = 1; case 1: s1; }
      else
      ...
      if (Pn(x)) { l = n; case n: sn; }
      else
        l = n+1;
    case n+1: // none is true on x
    }
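The generated code can be tried out on plain integers. The sketch below is an illustration under assumptions, not the library's code: the predicates is_negative and is_even, the counter predicate_calls (instrumentation only), and the function classify are all hypothetical, and the map is keyed by the value of x rather than by a dynamic type.

```cpp
#include <cstddef>
#include <unordered_map>

int predicate_calls = 0;   // instrumentation only: counts predicate evaluations

// Hypothetical predicates; their result depends only on x (functional behavior).
bool is_negative(int x) { ++predicate_calls; return x < 0; }
bool is_even(int x)     { ++predicate_calls; return x % 2 == 0; }

int classify(int x) {
    // Maps each seen x to the clause that matched it; 0 means "not seen yet".
    static std::unordered_map<int, std::size_t> labels;
    std::size_t& l = labels[x];
    switch (l) {
    default:  // x has not been seen yet: test the predicates in order
        if (is_negative(x)) { l = 1; case 1: return -1; }
        else
        if (is_even(x))     { l = 2; case 2: return 2; }
        else
            l = 3;
        // fall through
    case 3:   // no predicate is true on x
        return 0;
    }
}
```

On the first call with a given x the predicates run in order and the matching clause number is stored; on later calls the switch jumps straight into the remembered clause, so predicate_calls does not grow.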
Intermediate Solution

We use:
– source sub-object as a hash map key
– Pi(x) ≡ dynamic_cast<Typei*>(x)!=0

We map it to:
– offset to target sub-object
– jump target

On first entry
– memoize this-pointer offset
– memoize jump target

On subsequent entries
– jump to target
– adjust this-pointer

    struct info
    {
      ptrdiff_t offset;
      size_t    target;
    };

    // -------- Case(Typei* t) --------
    if (auto t=dynamic_cast<Typei*>(x))
    {
      n.offset = intptr_t(t)-intptr_t(x);
      n.target = i;
    case i:
      auto match =
        adjust_ptr<Typei>(x,n.offset);
      actioni;
    }
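The offset memoization can be checked in isolation. The sketch below assumes a hypothetical hierarchy D : A, B and hypothetical helpers offset_between and offsets_are_reusable; adjust_ptr follows the name on the slide, but its body here is a guess. The offset is computed once with dynamic_cast on one object and reapplied to a second object of the same dynamic type. The pointer arithmetic relies on implementation-defined behavior, as does any technique of this kind.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical hierarchy with multiple inheritance.
struct A { virtual ~A() = default; int a = 1; };
struct B { virtual ~B() = default; int b = 2; };
struct D : A, B { int d = 3; };

// Byte distance from the source sub-object to the target sub-object.
template <class Target, class Source>
std::ptrdiff_t offset_between(const Target* t, const Source* s) {
    return static_cast<std::ptrdiff_t>(reinterpret_cast<std::intptr_t>(t) -
                                       reinterpret_cast<std::intptr_t>(s));
}

// Re-apply a memoized offset instead of running dynamic_cast again.
template <class Target, class Source>
Target* adjust_ptr(Source* s, std::ptrdiff_t offset) {
    return reinterpret_cast<Target*>(reinterpret_cast<char*>(s) + offset);
}

bool offsets_are_reusable() {
    D first, second;

    A* x1 = &first;
    B* t1 = dynamic_cast<B*>(x1);                      // slow path, taken once
    const std::ptrdiff_t off = offset_between(t1, x1); // memoized offset

    A* x2 = &second;                                   // same dynamic type, same sub-object
    B* fast = adjust_ptr<B>(x2, off);                  // fast path
    return fast == dynamic_cast<B*>(x2);
}
```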
Structure of V-Table Pointers

$V = \{v_1, \dots, v_n\}$ is a set of v-table pointers passed through a given Match-statement

    V =
      00000001100110010011000111011100
      00000001100110001111111000110100
      00000001100110010010111111111100
      00000001100110010011000001010100
      00000001100110010011000101110100
      00000001100110010010111110100100
      00000001100110010011000100010100
      --------------------------------
      000000011001100XXX1XXXXXXXXXX100    (X marks the bit positions that vary within V)

For each such set $V$ we:
  Use a cache of $2^k$ entries addressed by:

    $H_{kl}(v) = \dfrac{v}{2^{l}} \bmod 2^{k}$

  We choose $k$ and $l$ to minimize the number of conflicts in cache
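The hash and the choice of k and l can be sketched as follows. This is an illustration under assumptions, not the library's implementation: the division by 2^l is taken to be integer division (a right shift), and the names hash_kl, conflicts, Params, choose_parameters, as well as the search bounds, are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// H_kl(v) = (v / 2^l) mod 2^k: drop the l low bits, keep the next k bits.
std::size_t hash_kl(std::uintptr_t v, unsigned k, unsigned l) {
    return static_cast<std::size_t>((v >> l) & ((std::uintptr_t{1} << k) - 1));
}

// Number of pointers that land in a cache slot that is already occupied.
std::size_t conflicts(const std::vector<std::uintptr_t>& vtbls,
                      unsigned k, unsigned l) {
    std::vector<bool> used(std::size_t{1} << k, false);
    std::size_t count = 0;
    for (std::uintptr_t v : vtbls) {
        const std::size_t h = hash_kl(v, k, l);
        if (used[h]) ++count; else used[h] = true;
    }
    return count;
}

struct Params { unsigned k; unsigned l; std::size_t conflicts; };

// Exhaustive search over small k and l; the bounds are arbitrary for this sketch.
// Ties are resolved in favor of the smallest k, then the smallest l.
Params choose_parameters(const std::vector<std::uintptr_t>& vtbls,
                         unsigned max_k = 8, unsigned max_l = 16) {
    Params best{1, 0, conflicts(vtbls, 1, 0)};
    for (unsigned k = 1; k <= max_k; ++k)
        for (unsigned l = 0; l <= max_l; ++l) {
            const std::size_t c = conflicts(vtbls, k, l);
            if (c < best.conflicts) best = Params{k, l, c};
        }
    return best;
}
```

For the seven pointers shown on the slide, k = 5 and l = 2 already give seven distinct cache slots, so the search ends with zero conflicts.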
Performance Evaluation
                            Open                          Closed
                  G++     G++     Visual        G++     G++     Visual
                  Lnx     Win     C++           Lnx     Win     C++
             REP  16%     14%     1%            124%    122%    100%
             SEQ  56%     12%     48%           640%    467%    29%
             RND  56%     0%      9%            603%    470%    35%
 Forwarding  REP  33%     22%     8%            53%     49%     24%
             SEQ  55%     233%    135%          86%     290%    48%
             RND  78%     25%     3%            88%     33%     8%

Each number represents a median of 101 experiments timing 1,000,000 dispatches each
Legend (distinguished by styling on the slide):
  9% - visitors faster by
  56% - type switch faster by

[Figure: "Case analysis on leaf classes". Bar chart of relative performance (y axis, -50% to 250%) for the REP, SEQ and RND benchmarks; series G++/Lnx, G++/Win, Visual C++.]
Performance Evaluation
The table, measurement note and legend on this slide are the same as on the preceding Performance Evaluation slide; only the chart differs.

[Figure: "Case analysis on base classes". Bar chart of relative performance (y axis, -50% to 250%) for the REP, SEQ and RND benchmarks; series G++/Lnx, G++/Win, Visual C++.]
Performance Evaluation
Our library code is approximately as fast as natively compiled
functional code. We will squeeze even more with a language solution.
[Figure: bar chart of running time in seconds (y axis, 0.000 to 0.025) by language/encoding (x axis): OCaml, Haskell, C++/Kind, C++/Closed, C++/Open.]
Note: This is not a detailed language comparison, but we compare well with “the gold standard” for pattern matching
Real World Class Hierarchies
                                                                  PARENTS      CHILDREN
LIB  LANGUAGE     CLASSES  PATHS  HEIGHT  ROOTS  LEAFS  BOTH    AVG   MAX    AVG   MAX
DG2  SMALLTALK        534    534      11      2    381     1      1     1   3.48    59
DG3  SMALLTALK       1356   1356      13      2    923     1      1     1   3.13   142
ET+  C++              370    370       8     87    289    79      1     1   3.49    51
GEO  EIFFEL          1318  13798      14      1    732     0   1.89    16   4.75   323
JAV  JAVA             604    792      10      1    445     0   1.08     3   4.64   210
LOV  EIFFEL           436   1846      10      1    218     0   1.72    10   3.55    78
NXT  OBJECTIVE-C      310    310       7      2    246     1      1     1   4.81   142
SLF  SELF            1801  36420      17     51   1134     0   1.05     9   2.76   232
UNI  C++              613    633       9    147    481   117   1.02     2   3.61    39
VA2  SMALLTALK       3241   3241      14      1   2582     0      1     1   4.92   249
VA2  SMALLTALK       2320   2320      13      1   1868     0      1     1   5.13   240
VW1  SMALLTALK        387    387       9      1    246     0      1     1   2.74    87
VW2  SMALLTALK       1956   1956      15      1   1332     0      1     1   3.13   181
OVERALLS            15246  63963      17    298  10877   199   1.11    16   3.89   323

% sub-hierarchies           1%   3%   5%   10%  20%  25%  50%  64%  100%
with more than … classes    700  110  50   20   10   7    3    2    1
Efficiency of Hashing
4369 type switches
87.5% of them rendered a hash function with no conflicts

[Figure: probability of conflict p (y axis, 0.00 to 0.40) against number of sub-objects n (x axis, 1 to 2048, powers of two), with one curve per value of m = 0, 1, 2, 3, 4, 5, 6, 7, ...]

Conflicts       0       1      2      3      4      5      6      >6
Type Switches   87.50%  5.58%  2.63%  0.87%  0.69%  0.69%  0.30%  1.76%
Related Work

  N. Wirth. Type extensions. 1988
  W. R. Cook. Object-oriented programming versus abstract data types. 1991
  P. Wadler. The expression problem. 1998
  N. Glew. Type dispatch for named hierarchical types. 1999
  M. Zenger, M. Odersky. Independently extensible solutions to the expression problem. 2005
  M. Homer, J. Noble, K. Bruce, A. Black, D. Pearce. Patterns as Objects in Grace. 2012

  N. H. Cohen. Type-extension type tests can be performed in constant time. TOPLAS 1991
  Y. Caseau. Efficient handling of multiple inheritance hierarchies. OOPSLA 1993
  J. Vitek, R. N. Horspool, A. Krall. Efficient type inclusion tests. OOPSLA 1997
  Y. Zibin, J. Y. Gil. Efficient subtyping tests with PQ-encoding. OOPSLA 2001
  M. Gibbs, B. Stroustrup. Fast dynamic casting. SPE 2006
  R. Ducournau. Perfect hashing as an almost perfect subtype test. TOPLAS 2008
THANKS!
http://parasol.tamu.edu/mach7/
Mach7
   • Matches the gold standard for notation
   • Matches the gold standard for performance
   • Handles both open and closed cases

Questions

Special Thanks To:
  Xavier Leroy
  Luc Maranget
  Gregory Berkolaiko
  Suhasini Subba Rao
  Jaakko Järvi
  Peter Pirkelbauer
  Andrew Sutton
  Abe Skolnik
  Karel Driesen
Summary of Contributions

Technique for type switching
  o On extensible hierarchical data types
  o Open by construction
  o Full support of general multiple inheritance of C++

Efficiency
  o Similar to pattern matching
  o Close to or better than visitors
  o Outperforms existing open approaches

Library implementation
  o Notational convenience of pattern matching
  o No changes to the C++ object model
  o No computations or code generation at link or load time

Unique partitioning of objects based on sub-objects
  o Suitable for other optimizations
Expression Problem
exp ::= val | exp + exp | exp - exp | exp * exp | exp / exp
Functional Languages

    type expr = Value  of int
              | Plus   of expr * expr
              | Minus  of expr * expr
              | Times  of expr * expr
              | Divide of expr * expr ;;

    let rec eval e =
      match e with
        Value v      -> v
      | Plus  (a,b)  -> (eval a) + (eval b)
      | Minus (a,b)  -> (eval a) - (eval b)
      | Times (a,b)  -> (eval a) * (eval b)
      | Divide(a,b)  -> (eval a) / (eval b)
    ;;

  Easy to add new functions
  Adding new variants is intrusive

Object-Oriented Languages

    class Expr          { virtual int eval(); };
    class Value  : Expr { int value; };
    class Plus   : Expr { Expr &e1, &e2; };
    class Minus  : Expr { Expr &e1, &e2; };
    class Times  : Expr { Expr &e1, &e2; };
    class Divide : Expr { Expr &e1, &e2; };

    int Value::eval()  { return value; }
    int Plus ::eval()  { return e1.eval()+e2.eval(); }
    int Minus::eval()  { return e1.eval()-e2.eval(); }
    int Times::eval()  { return e1.eval()*e2.eval(); }
    int Divide::eval() { return e1.eval()/e2.eval(); }

  Easy to add new variants
  Adding new functions is intrusive
Problem of Type Switching in C++

Classes are:
– Extensible
   • Important: Separate compilation
   • Important: Dynamic linking
– Hierarchical
   • Multiple Inheritance
   • Up-, down- and cross-casts
   • Cast is not a no-op
   • Ambiguities

Existing approaches
– Closed world: jump tables
   • Unrealistic for modern C++ use
– Open world: constant-time subtype tests + decision trees
   • Most are not suitable for repeated multiple inheritance
   • Most require computations or runtime code generation at load time
   • Time increases with case number

[Figure: cycles (y axis, 0 to 140) against case number (x axis, 0 to 90) for Fast Dynamic Cast, Cohen's Algorithm, Binary Matrix, Visitors, and Switch.]
Uniqueness of V-Table Pointers
V-Table Pointers Facts

Are unique per same static type only
  o can be shared with primary base class

Can be many for same sub-object
  o e.g. numerous copies of the same v-table in DLLs

May change during [de]construction
  o affects outcome of a type switch in constructors and destructors
  o is in line with C++ semantics for virtual function calls

Are at fixed offset within the dynamic type
  o we can memoize offsets obtained on one instance and reapply them to another instance
     • of the same dynamic type
     • from the same sub-object
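The point about construction can be observed in standard C++ without inspecting v-table pointers directly, because typeid follows the same rule as virtual calls. In the sketch below (hypothetical classes Base and Derived and hypothetical helper functions), the dynamic type seen inside the Base constructor is Base, even though the complete object is a Derived; a type switch executed there would see the same thing.

```cpp
#include <typeinfo>

struct Base {
    const std::type_info* seen_in_ctor = nullptr;
    Base() {
        // During Base's constructor the dynamic type of *this is Base.
        seen_in_ctor = &typeid(*this);
    }
    virtual ~Base() = default;
};

struct Derived : Base {};

// True if the constructor of Base observed Base as the dynamic type.
bool constructor_saw_base() {
    Derived d;
    return *d.seen_in_ctor == typeid(Base);
}

// True if, after construction, the same object reports Derived.
bool object_is_derived() {
    Derived d;
    Base& b = d;
    return typeid(b) == typeid(Derived);
}
```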
Visitors Comparison
                      G++               MS Visual C++
                      Lnx     Win       PGO               w/o PGO
                      x86-32  x86-32    x86-32  x86-64    x86-32  x86-64
Open
             REP      16%     14%       1%      18%       2%      37%
             SEQ      56%     12%       48%     22%       2%      46%
             RND      56%     0%        9%      19%       5%      46%
 Forwarding  REP      33%     22%       8%      17%       24%     36%
             SEQ      55%     233%      135%    135%      193%    32%
             RND      78%     25%       3%      4%        13%     23%
Closed
             REP      124%    122%      100%    41%       76%     37%
             SEQ      640%    467%      29%     15%       30%     10%
             RND      603%    470%      35%     20%       32%     6%
 Forwarding  REP      53%     49%       24%     11%       20%     36%
             SEQ      86%     290%      48%     139%      12%     24%
             RND      88%     33%       8%      1%        18%     16%

Each number represents a median of 101 experiments timing 1,000,000 dispatches each
Legend (distinguished by styling on the slide):
  18% - percentage visitors are faster by
  14% - percentage type switch is faster by

Lnx: Dell Dimension® desktop with Intel® Pentium® D (Dual Core) CPU at 2.80 GHz; 1GB of RAM; Fedora Core 13
   • G++ 4.4.5 executed with -O2; x86 binaries
Win: Sony VAIO® laptop with Intel® Core™ i5 460M CPU at 2.53 GHz; 6GB of RAM; Windows 7 Pro.
   • G++ 4.6.1 / MinGW executed with -O2; x86 binaries
   • MS Visual C++ 2010 Professional x86/x64 binaries with and without Profile-Guided Optimizations

[Figure: cycles per iteration (y axis, 0 to 70) for Visitors and Open Type Switch on the REP, SEQ and RND benchmarks.]
Sizes of Class Hierarchies
[Figure: % of hierarchies with that number of classes (y axis, log scale, 0.01% to 100.00%) against classes in hierarchy (x axis, log scale, 1 to 4096).]
Minimization of Conflicts
Effect of Conflicts Minimization