Section 27.4.1 Composing Data Structures 771

The layout of a node is compact, and we can easily inline all performance-critical functions. What we achieved by the slightly elaborate set of definitions was type safety and ease of composition. This elaboration delivers a performance advantage compared to every approach that introduces a void* into the data structure or function interfaces. Such a use of void* disables valuable type-based optimization techniques. Choosing a low-level (C-style) programming technique in the key parts of a balanced binary tree implementation implies a significant run-time cost.

We passed the balancer as a separate template argument:

    template<typename N, typename Balance>
    struct Node_base : Balance {
        // ...
    };

    template<typename Val, typename Balance>
    struct Search_node
        : public Node_base<Search_node<Val, Balance>, Balance>
    {
        // ...
    };

Some find this clear, explicit, and general; others find it verbose and confusing. The alternative is to make the balancer an implicit argument in the form of an associated type (a member type of Search_node):

    template<typename N>
    struct Node_base : N::balance_type {    // use N's balance_type
        // ...
    };

    template<typename Val, typename Balance>
    struct Search_node
        : public Node_base<Search_node<Val,Balance>>
    {
        using balance_type = Balance;
        // ...
    };

This technique is heavily used in the standard library to minimize explicit template arguments.

The technique of deriving from a base class is very old. It was mentioned in the ARM (1989) and is sometimes referred to as the Barton-Nackman trick after an early use in mathematical software [Barton,1994]. Jim Coplien called it the curiously recurring template pattern (CRTP) [Coplien,1995].

27.4.2 Linearizing Class Hierarchies

The Search_node example from §27.4.1 uses its template to compress its representation and to avoid using void*. The techniques are general and very useful. In particular, many programs that deal with trees rely on them for type safety and performance.
For example, the “Internal Program Representation” (IPR) [DosReis,2011] is a general and systematic representation of C++ code as typed abstract syntax trees. It uses template parameters as base classes extensively, both as an implementation aid (implementation inheritance) and to provide abstract interfaces in the classical object-oriented way (interface inheritance). The design addresses a difficult set of criteria, including compactness of nodes (there can be many millions of nodes), optimized memory management, access speed (don’t introduce unnecessary indirections or nodes), type safety, polymorphic interfaces, and generality.

The users see a hierarchy of abstract classes providing perfect encapsulation and clean functional interfaces representing the semantics of a program. For example, a variable is a declaration, which is a statement, which is an expression, which is a node:

    Var -> Decl -> Stmt -> Expr -> Node

Clearly, some generalization has been done in the design of the IPR because in ISO C++ statements cannot be used as expressions.

In addition, there is a parallel hierarchy of concrete classes providing compact and efficient implementations of the classes in the interface hierarchy:

    impl::Var -> impl::Decl -> impl::Stmt -> impl::Expr -> impl::Node

In all, there are about 80 leaf classes (such as Var, If_stmt, and Multiply) and about 20 generalizations (such as Decl, Unary, and impl::Stmt).

The first attempt at a design was a classical multiple-inheritance “diamond” hierarchy (using solid arrows to represent interface inheritance and dotted arrows for implementation inheritance):

    [Figure: the interface classes Node, Expr, Stmt, Decl, and Var alongside the implementation classes impl::Node, impl::Expr, impl::Stmt, impl::Decl, and impl::Var]

That worked but led to excessive memory overhead: the nodes were too large because of data needed to navigate the virtual bases.
In addition, programs were seriously slowed down by the many indirections to access the many virtual bases in each object (§21.3.5).

The solution was to linearize the dual hierarchy so that no virtual bases were used:

    [Figure: the linearized hierarchy of Node, Expr, Stmt, Decl, and Var with impl::Node, impl::Expr, impl::Stmt, impl::Decl, and impl::Var]

For the full set of classes the chain of derivation becomes:

    impl::Var ->
        impl::Decl<impl::Var> ->
            impl::Stmt<impl::Var> ->
                impl::Expr<impl::Var> ->
                    impl::Node<impl::Var> ->
                        ipr::Var ->
                            ipr::Decl ->
                                ipr::Stmt ->
                                    ipr::Expr ->
                                        ipr::Node

This is represented as a compact object with no internal “management data” except the single vptr (§3.2.3, §20.3.2).

I will show how that is done. The interface hierarchy, defined in namespace ipr, is described first. Starting from the bottom, a Node holds data used to optimize traversal and Node type identification (the code_category) and to ease storage of IPR graphs in files (the node_id). These are fairly typical “implementation details” hidden from the users. What a user will know is that every node in an IPR graph has a unique base of type Node and that this can be used to implement operations using the visitor pattern [Gamma,1994] (§22.3):

    struct ipr::Node {
        const int node_id;
        const Category_code category;

        virtual void accept(Visitor&) const = 0;    // hook for visitor classes
    protected:
        Node(Category_code);
    };

Node is meant to be used as a base class only, so its constructor is protected. It also has a pure virtual function, so it cannot be instantiated except as a base class.
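The role of accept() as a visitor hook can be illustrated with a much-reduced sketch. Everything in namespace sketch is hypothetical and not part of the IPR: the two node kinds (Var and Lit), the Kind_namer visitor, the kind_of() helper, and the choice to pass node_id to the constructor are assumptions made only to keep the example self-contained.

```cpp
#include <string>
#include <type_traits>

namespace sketch {   // hypothetical names, not the IPR's own

struct Var;          // two illustrative node kinds
struct Lit;

// A visitor has one visit() overload per concrete node kind.
struct Visitor {
    virtual void visit(const Var&) = 0;
    virtual void visit(const Lit&) = 0;
    virtual ~Visitor() = default;
};

enum class Category_code { var_cat, lit_cat };

struct Node {
    const int node_id;
    const Category_code category;

    virtual void accept(Visitor&) const = 0;   // hook for visitor classes
    virtual ~Node() = default;
protected:
    // Assumption: the id is supplied by the derived class.
    Node(int id, Category_code c) : node_id{id}, category{c} {}
};

struct Var : Node {
    explicit Var(int id) : Node{id, Category_code::var_cat} {}
    void accept(Visitor& v) const override { v.visit(*this); }
};

struct Lit : Node {
    explicit Lit(int id) : Node{id, Category_code::lit_cat} {}
    void accept(Visitor& v) const override { v.visit(*this); }
};

// An operation added without touching the node classes.
struct Kind_namer : Visitor {
    std::string name;
    void visit(const Var&) override { name = "Var"; }
    void visit(const Lit&) override { name = "Lit"; }
};

std::string kind_of(const Node& n)
{
    Kind_namer k;
    n.accept(k);     // double dispatch: node kind, then visitor kind
    return k.name;
}

// The pure virtual function makes Node usable as a base class only.
static_assert(std::is_abstract<Node>::value, "Node cannot be instantiated directly");

} // namespace sketch
```

A caller holding only a const sketch::Node& can pass it to kind_of() and get the operation for the node's dynamic type, which is the kind of use the accept() hook is there to support.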
An expression (Expr) is a Node that has a type:

    struct ipr::Expr : Node {
        virtual const Type& type() const = 0;
    protected:
        Expr(Category_code c) : Node(c) { }
    };

Obviously, this is quite a generalization of C++ because it implies that even statements and types have types: it is an aim of the IPR to represent all of C++ without implementing all of C++’s irregularities and limitations.

A statement (Stmt) is an Expr that has a source file location and can be annotated with various information:

    struct ipr::Stmt : Expr {
        virtual const Unit_location& unit_location() const = 0;        // line in file
        virtual const Source_location& source_location() const = 0;    // file

        virtual const Sequence<Annotation>& annotation() const = 0;
    protected:
        Stmt(Category_code c) : Expr(c) { }
    };

A declaration (Decl) is a Stmt that introduces a name:

    struct ipr::Decl : Stmt {
        enum Specifier { /* storage class, virtual, access control, etc. */ };

        virtual Specifier specifiers() const = 0;
        virtual const Linkage& lang_linkage() const = 0;

        virtual const Name& name() const = 0;

        virtual const Region& home_region() const = 0;
        virtual const Region& lexical_region() const = 0;

        virtual bool has_initializer() const = 0;
        virtual const Expr& initializer() const = 0;

        // ...
    protected:
        Decl(Category_code c) : Stmt(c) { }
    };

As you might expect, Decl is one of the central notions when it comes to representing C++ code. This is where you find scope information, storage classes, access specifiers, initializers, etc.
Finally, we can define a class to represent a variable (Var) as a leaf class (most derived class) of our interface hierarchy:

    struct ipr::Var : Category<var_cat, Decl> {
    };

Basically, Category is a notational aid with the effect of deriving Var from Decl and giving the Category_code used to optimize Node type identification:

    template<Category_code Cat, typename T = Expr>
    struct Category : T {
    protected:
        Category() : T(Cat) { }
    };

Every data member is a Var. That includes global, namespace, local, and class static variables and constants.

Compared to representations you find in compilers, this interface is tiny. Except for some data for optimizations in Node, this is just a set of classes with pure virtual functions. Note that it is a single hierarchy with no virtual base classes. It is a straightforward object-oriented design. However, implementing this simply, efficiently, and maintainably is not easy, and IPR’s solution is certainly not what an experienced object-oriented designer would first think of.

For each IPR interface class (in ipr), there is a corresponding implementation class (in impl). For example:

    template<typename T>
    struct impl::Node : T {
        using Interface = T;    // make the template argument type available to users
        void accept(ipr::Visitor& v) const override { v.visit(*this); }
    };

The “trick” is to establish the correspondence between the ipr nodes and the impl nodes. In particular, the impl nodes must provide the necessary data members and override the abstract virtual functions in the ipr nodes. For impl::Node, we can see that if T is an ipr::Node or any class derived from ipr::Node, then the accept() function is properly overridden.
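To make the correspondence concrete, here is a minimal sketch of the same idea on a two-level interface hierarchy. The namespaces mini_ipr and mini_impl, the Name_collector visitor, and the visit_name() helper are hypothetical, and the sketch leaves out the Category_code constructor arguments that the real interface classes take.

```cpp
#include <string>
#include <type_traits>
#include <utility>

namespace mini_ipr {     // hypothetical, much-reduced interface hierarchy
    struct Var;

    struct Visitor {
        virtual void visit(const Var&) = 0;
        virtual ~Visitor() = default;
    };

    struct Node {
        virtual void accept(Visitor&) const = 0;
        virtual ~Node() = default;
    };

    struct Var : Node {
        virtual const std::string& name() const = 0;
    };
}

namespace mini_impl {    // hypothetical implementation layer
    // Derives from the interface it is given and overrides accept().
    template<typename T>
    struct Node : T {
        using Interface = T;
        void accept(mini_ipr::Visitor& v) const override { v.visit(*this); }
    };

    // Linear chain: Var -> Node<mini_ipr::Var> -> mini_ipr::Var -> mini_ipr::Node
    struct Var : Node<mini_ipr::Var> {
        explicit Var(std::string n) : id{std::move(n)} {}
        const std::string& name() const override { return id; }
    private:
        std::string id;
    };
}

// A visitor that records the name of the variable it visits.
struct Name_collector : mini_ipr::Visitor {
    std::string seen;
    void visit(const mini_ipr::Var& v) override { seen = v.name(); }
};

std::string visit_name(const mini_ipr::Node& n)
{
    Name_collector c;
    n.accept(c);
    return c.seen;
}

// The implementation class is reachable through the plain interface base.
static_assert(std::is_base_of<mini_ipr::Node, mini_impl::Var>::value,
              "mini_impl::Var implements the Node interface");
```

Because mini_impl::Node<T> inherits from T rather than from a fixed base, the whole chain is single inheritance, so no virtual bases are needed to share the interface.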
Now, we can proceed to provide implementation classes for the rest of the ipr interface classes:

    template<typename Interface>
    struct impl::Expr : impl::Node<Interface> {
        const ipr::Type* constraint;    // constraint is the type of the expression

        Expr() : constraint(0) { }

        const ipr::Type& type() const override { return *util::check(constraint); }
    };

If the Interface argument is an ipr::Expr or any class derived from ipr::Expr, then impl::Expr is an implementation for ipr::Expr. We can make sure of that. Since ipr::Expr is derived from ipr::Node, this implies that impl::Node gets the ipr::Node base class that it needs.

In other words, we have managed to provide implementations for two (different) interface classes. We can proceed in this manner: