Section 27.4.1
Composing Data Structures
771
The layout of a node is compact, and we can easily inline all performance-critical functions. What
we achieved by the slightly elaborate set of definitions was type safety and ease of
composition.
This elaboration delivers a performance advantage compared to every approach that introduces a
void* into the data structure or function interfaces. Such a use of void* disables valuable type-based
optimization techniques. Choosing a low-level (C-style) programming technique in the key parts of
a balanced binary tree implementation implies a significant run-time cost.
We passed the balancer as a separate template argument:
template<typename N, typename Balance>
struct Node_base : Balance {
// ...
};
template<typename Val, typename Balance>
struct Search_node
: public Node_base<Search_node<Val, Balance>, Balance>
{
// ...
};
Some find this clear, explicit, and general; others find it verbose and confusing. The alternative is
to make the balancer an implicit argument in the form of an associated type (a member type of
Search_node):
template<typename N>
struct Node_base : N::balance_type { // use N's balance_type
// ...
};
template<typename Val, typename Balance>
struct Search_node
: public Node_base<Search_node<Val,Balance>>
{
using balance_type = Balance;
// ...
};
This technique is heavily used in the standard library to minimize explicit template arguments.
The technique of deriving from a base class is very old. It was mentioned in the ARM (1989)
and is sometimes referred to as the Barton-Nackman trick after an early use in mathematical
software [Barton,1994]. Jim Coplien called it the curiously recurring template pattern (CRTP)
[Coplien,1995].
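As a minimal sketch of the pattern with the balancer passed as an explicit template argument, the following fills in the elided bodies under stated assumptions. The names No_balance, weight, val, insert(), and Int_node are hypothetical and introduced only for illustration, and the insertion shown does no rebalancing. The point is that the child links have the derived node type, so no void* and no casts appear in user code.

```cpp
#include <cassert>

// Hypothetical balancer: it contributes per-node data but, to keep the
// sketch short, performs no rebalancing.
struct No_balance {
    int weight = 1;
};

// Base class parameterized on the derived node type N and on the balancer.
template<typename N, typename Balance>
struct Node_base : Balance {
    N* left_child = nullptr;
    N* right_child = nullptr;

    // Plain (unbalanced) insertion; the links are N*, so the value is
    // reached without any cast from void*.
    void insert(N& n) {
        N& self = static_cast<N&>(*this);   // valid: N derives from Node_base<N,Balance>
        N*& slot = (n.val < self.val) ? left_child : right_child;
        if (slot)
            slot->insert(n);
        else
            slot = &n;
    }
};

template<typename Val, typename Balance>
struct Search_node : public Node_base<Search_node<Val, Balance>, Balance> {
    Val val;
    explicit Search_node(Val v) : val(v) { }
};

using Int_node = Search_node<int, No_balance>;
```

A node such as Int_node root{5} can then accept other nodes through root.insert(n), and root.left_child->val is usable directly because left_child is an Int_node*.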
27.4.2 Linearizing Class Hierarchies
The Search_node example from §27.4.1 uses its template to compress its representation and to
avoid using void*. The techniques are general and very useful. In particular, many programs that
deal with trees rely on them for type safety and performance. For example, the ''Internal Program
Representation'' (IPR) [DosReis,2011] is a general and systematic representation of C++ code as
772
Templates and Hierarchies
Chapter 27
typed abstract syntax trees. It uses template parameters as base classes extensively, both as an
implementation aid (implementation inheritance) and to provide abstract interfaces in the classical
object-oriented way (interface inheritance). The design addresses a difficult set of criteria, including compactness of nodes (there can be many millions of nodes), optimized memory management,
access speed (don't introduce unnecessary indirections or nodes), type safety, polymorphic interfaces, and generality.
The users see a hierarchy of abstract classes providing perfect encapsulation and clean functional interfaces representing the semantics of a program. For example, a variable is a declaration,
which is a statement, which is an expression, which is a node:
Var > Decl > Stmt > Expr > Node
Clearly, some generalization has been done in the design of the IPR because in ISO C++ statements
cannot be used as expressions.
In addition, there is a parallel hierarchy of concrete classes providing compact and efficient
implementations of the classes in the interface hierarchy:
impl::Var > impl::Decl > impl::Stmt > impl::Expr > impl::Node
In all, there are about 80 leaf classes (such as Var, If_stmt, and Multiply) and about 20 generalizations
(such as Decl, Unary, and impl::Stmt).
The first attempt of a design was a classical multiple-inheritance ''diamond'' hierarchy (using
solid arrows to represent interface inheritance and dotted arrows for implementation inheritance):
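[Figure: the diamond hierarchy. The interface classes Node, Expr, Stmt, Decl, and Var are paired with the implementation classes impl::Node, impl::Expr, impl::Stmt, impl::Decl, and impl::Var.]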
That worked but led to excessive memory overhead: the nodes were too large because of data
needed to navigate the virtual bases. In addition, programs were seriously slowed down by the
many indirections to access the many virtual bases in each object (§21.3.5).
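To make the shape of that first design concrete, here is a hypothetical two-level miniature of a diamond built with virtual bases. It is a sketch, not the IPR's code: the namespace diamond and the names Node_impl, Expr_impl, node_kind(), and type_code() are assumptions made for illustration.

```cpp
#include <cassert>

namespace diamond {    // hypothetical miniature, not the IPR's code
    // Interface hierarchy.
    struct Node {
        virtual int node_kind() const = 0;
        virtual ~Node() = default;
    };
    struct Expr : virtual Node {
        virtual int type_code() const = 0;
    };

    // Implementation hierarchy; each class implements one interface class
    // and reuses the implementation of the level above it.
    struct Node_impl : virtual Node {
        int node_kind() const override { return 1; }
    };
    struct Expr_impl : virtual Expr, Node_impl {
        int type_code() const override { return 7; }
    };
}
```

Every Expr_impl object contains a single shared Node subobject, and the implementation must locate it at run time. On typical implementations this costs extra per-object data and an extra indirection for each virtual base; the exact figures are implementation-dependent.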
The solution was to linearize the dual hierarchy so that no virtual bases were used:
[Figure: the linearized hierarchy over the same classes: Node, Expr, Stmt, Decl, and Var, and impl::Node, impl::Expr, impl::Stmt, impl::Decl, and impl::Var.]
For the full set of classes the chain of derivation becomes:
impl::Var >
impl::Decl<impl::Var> >
impl::Stmt<impl::Var> >
impl::Expr<impl::Var> >
impl::Node<impl::Var> >
ipr::Var >
ipr::Decl >
ipr::Stmt >
ipr::Expr >
ipr::Node
This is represented as a compact object with no internal ''management data'' except the single vptr
(§3.2.3, §20.3.2).
I will show how that is done. The interface hierarchy, defined in namespace ipr, is described
first. Starting from the bottom, a Node holds data used to optimize traversal and Node type identification (the code_category) and to ease storage of IPR graphs in files (the node_id). These are fairly
typical ''implementation details'' hidden from the users. What a user will know is that every node
in an IPR graph has a unique base of type Node and that this can be used to implement operations
using the visitor pattern [Gamma,1994] (§22.3):
struct ipr::Node {
const int node_id;
const Category_code category;
virtual void accept(Visitor&) const = 0; // hook for visitor classes
protected:
Node(Category_code);
};
Node is meant to be used as a base class only, so its constructor is protected. It also has a pure virtual
function, so it cannot be instantiated except as a base class.
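The following self-contained sketch shows how such a Node can serve as the hook for a visitor. It is an illustration under assumptions, not the IPR's definition: the namespace sketch, the two-value Category_code, the counter used to assign node_id, and the Count_vars visitor are all hypothetical.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {    // hypothetical miniature, not the IPR's code
    enum Category_code { unknown_cat, var_cat };

    struct Var;       // the only concrete node kind in this sketch

    struct Visitor {
        virtual void visit(const Var&) = 0;
        virtual ~Visitor() = default;
    };

    struct Node {
        const int node_id;
        const Category_code category;
        virtual void accept(Visitor&) const = 0;    // hook for visitor classes
        virtual ~Node() = default;
    protected:
        // Assumption: ids are handed out from a simple counter.
        explicit Node(Category_code c) : node_id(next_id()), category(c) { }
    private:
        static int next_id() { static int n = 0; return n++; }
    };

    // The pure virtual function makes Node abstract.
    static_assert(std::is_abstract<Node>::value, "Node is usable as a base class only");

    struct Var : Node {
        Var() : Node(var_cat) { }
        void accept(Visitor& v) const override { v.visit(*this); }
    };

    // A visitor that counts the Var nodes it is shown.
    struct Count_vars : Visitor {
        int vars = 0;
        void visit(const Var&) override { ++vars; }
    };
}
```

A caller holding only a const sketch::Node& can invoke accept() with a sketch::Count_vars and have the call dispatched to the Var overload of visit().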
An expression (Expr) is a Node that has a type:
struct ipr::Expr : Node {
virtual const Type& type() const = 0;
protected:
Expr(Category_code c) : Node(c) { }
};
Obviously, this is quite a generalization of C++ because it implies that even statements and types have types: it is an aim of the IPR to represent all of C++ without implementing all of C++'s irregularities and limitations.
A statement (Stmt) is an Expr that has a source file location and can be annotated with various
information:
struct ipr::Stmt : Expr {
virtual const Unit_location& unit_location() const = 0; // line in file
virtual const Source_location& source_location() const = 0; // file
virtual const Sequence<Annotation>& annotation() const = 0;
protected:
Stmt(Category_code c) : Expr(c) { }
};
A declaration (Decl) is a Stmt that introduces a name:
struct ipr::Decl : Stmt {
enum Specifier { /* storage class, virtual, access control, etc. */ };
virtual Specifier specifiers() const = 0;
virtual const Linkage& lang_linkage() const = 0;
virtual const Name& name() const = 0;
virtual const Region& home_region() const = 0;
virtual const Region& lexical_region() const = 0;
virtual bool has_initializer() const = 0;
virtual const Expr& initializer() const = 0;
// ...
protected:
Decl(Category_code c) : Stmt(c) { }
};
As you might expect, Decl is one of the central notions when it comes to representing C++ code.
This is where you find scope information, storage classes, access specifiers, initializers, etc.
Finally, we can define a class to represent a variable (Var) as a leaf class (most derived class) of
our interface hierarchy:
struct ipr::Var : Category<var_cat, Decl> {
};
Basically, Category is a notational aid with the effect of deriving Var from Decl and giving the Category_code used to optimize Node type identification:
template<Category_code Cat, typename T = Expr>
struct Category : T {
protected:
Category() : T(Cat) { }
};
Every data member is a Var. That includes global, namespace, local, and class static variables and
constants.
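The effect of Category can be tried in isolation with the sketch below. It assumes a reduced hierarchy without virtual functions and only two category codes; the namespace cat_sketch, the Other leaf class, and the as_var() helper are hypothetical names added for illustration.

```cpp
#include <cassert>
#include <type_traits>

namespace cat_sketch {    // hypothetical miniature, not the IPR's code
    enum Category_code { expr_cat, var_cat };   // assumed subset of codes

    struct Node {
        const Category_code category;
    protected:
        explicit Node(Category_code c) : category(c) { }
    };
    struct Expr : Node {
    protected:
        explicit Expr(Category_code c) : Node(c) { }
    };
    struct Decl : Expr {
    protected:
        explicit Decl(Category_code c) : Expr(c) { }
    };

    // Derive from T and pass the category code down to the Node constructor.
    template<Category_code Cat, typename T = Expr>
    struct Category : T {
    protected:
        Category() : T(Cat) { }
    };

    struct Var : Category<var_cat, Decl> { };
    struct Other : Category<expr_cat> { };      // uses the default T = Expr

    // Node type identification by comparing the stored code: an integer
    // comparison followed by a static_cast, with no dynamic_cast.
    inline const Var* as_var(const Node& n) {
        return n.category == var_cat ? static_cast<const Var*>(&n) : nullptr;
    }
}
```

Here as_var() returns the address of its argument for a Var and nullptr for any other node, which is one way a stored code can stand in for run-time type identification.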
Compared to representations you find in compilers, this interface is tiny. Except for some data
for optimizations in Node, this is just a set of classes with pure virtual functions. Note that it is a
single hierarchy with no virtual base classes. It is a straightforward object-oriented design. However, implementing this simply, efficiently, and maintainably is not easy, and IPR's solution is certainly not what an experienced object-oriented designer would first think of.
For each IPR interface class (in ipr), there is a corresponding implementation class (in impl).
For example:
template<typename T>
struct impl::Node : T {
using Interface = T; // make the template argument type available to users
void accept(ipr::Visitor& v) const override { v.visit(*this); }
};
The ''trick'' is to establish the correspondence between the ipr nodes and the impl nodes. In particular, the impl nodes must provide the necessary data members and override the abstract virtual functions in the ipr nodes. For impl::Node, we can see that if T is an ipr::Node or any class derived from
ipr::Node, then the accept() function is properly overridden.
Now, we can proceed to provide implementation classes for the rest of the ipr interface classes:
template<typename Interface>
struct impl::Expr : impl::Node<Interface> {
const ipr::Type* constraint; // constraint is the type of the expression
Expr() : constraint(0) { }
const ipr::Type& type() const override { return *util::check(constraint); }
};
If the Interface argument is an ipr::Expr or any class derived from ipr::Expr, then impl::Expr is an
implementation for ipr::Expr. We can make sure of that. Since ipr::Expr is derived from ipr::Node,
this implies that impl::Node gets the ipr::Node base class that it needs.
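The scheme can be checked end to end with a reduced, self-contained sketch. It is not the IPR's code: the namespaces mini_ipr and mini_impl, the three-level hierarchy, and the members kind(), type_code(), name(), code, and id are assumptions chosen to keep the example short.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

namespace mini_ipr {    // hypothetical stand-in for the interface hierarchy
    struct Node {
        virtual std::string kind() const = 0;
        virtual ~Node() = default;
    };
    struct Expr : Node {
        virtual int type_code() const = 0;
    };
    struct Var : Expr {
        virtual const std::string& name() const = 0;
    };
}

namespace mini_impl {   // hypothetical stand-in for the implementation hierarchy
    // Implements the Node part of whatever interface class T is supplied.
    template<typename T>
    struct Node : T {
        using Interface = T;
        std::string kind() const override { return "node"; }
    };

    // Implements the Expr part; the interface class is passed through to Node.
    template<typename Interface>
    struct Expr : Node<Interface> {
        int code = 0;
        int type_code() const override { return code; }
    };

    // Leaf class: a single chain Var -> Expr<> -> Node<> -> mini_ipr::Var -> ...
    struct Var : Expr<mini_ipr::Var> {
        std::string id;
        explicit Var(std::string s) : id(std::move(s)) { }
        const std::string& name() const override { return id; }
    };

    static_assert(std::is_base_of<mini_ipr::Expr, Var>::value,
                  "the implementation chain ends in the interface chain");
}
```

Because the derivation is a single chain with no virtual bases, a mini_impl::Var converts to a reference to any of the three interface classes with an ordinary derived-to-base conversion.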
In other words, we have managed to provide implementations for two (different) interface
classes. We can proceed in this manner: