2.3. Template Programming Used to Write POOMA

The preceding section presented template programming tools needed to read this book and write programs using the POOMA Toolkit. In this section, we present template programming techniques used to implement POOMA. We extend the correspondence between compile-time template programming constructs and run-time constructs started in the previous section. Reading this section is not necessary unless you wish to understand how POOMA is implemented.

In the previous section, we used a correspondence between run-time and compile-time programming constructs to introduce template programming concepts, which occur at compile time. See Table 2-1. In implementing POOMA, more constructs are used. We list these in Table 2-2.

Table 2-2. More Correspondences Between Run-Time and Compile-Time Constructs

programming constructrun timecompile time
valuesintegers, strings, objects, functions, …types, constant integers and enumerations, pointers and references to objects and functions, executable code, …
operations on valuesIntegral values support +, -, >, ==, …. String values support [], ==, ….Types may be declared and used. Constant integral and enumeration values can be combined using +, -, >, ==, …. There are no permitted operations on code.
values stored in a collectionAn object stores values.A traits class contains values describing a type.
extracting values from collectionsAn object's named values are extracted using the . operator. A class's nested types and classes are extracted using the :: operator.
control flow to choose among operationsif, while, goto, …template class specializations with pattern matching

The only compile-time values described in the previous section were types, but any compile-time constant can also be used. Integral literals, const variables, and other constructs can be used, but the main use is enumerations. An enumeration is a distinct integral type with named constants. For example, the Array declaration declares two separate enumerations:


template<int Dim, class T, class EngineTag>
class Array
{
public:
  typedef Engine<Dim, T, EngineTag> Engine_t;
  enum { dimensions = Engine_t::dimensions };
  enum { rank = Engine_t::dimensions };
…
The first enumeration declares the constant dimensions to be equal to the value of the dimensions within the Array's Engine. The second enumeration declares the constant rank to have the same value. Semantically, both indicate the dimensionality of the array's domain. Enumeration constants have integral values so they may be used wherever integers can be used. For example,

enum { dimensionPlusRank = dimensions + rank };
could be added to the Array declaration. Declaring an enumeration is a compile-time construct analogous to assigning an integral value to a variable at run time. Note that an enumerated constant's value cannot be changed.

Enumerations are frequently used in template programming because

The use of non-integral constant values such as floating-point numbers at compile time is restricted.

Other compile-time values include pointers to objects and functions, references to objects and functions, and executable code. For example, a pointer to a function sometimes is passed to a template function to perform a specific task. Even though executable code cannot be directly represented in a program, it is a compile-time value which the compiler uses. A simple example is a class that is created by template instantiation, e.g., pair<int>. Conceptually, the int template argument is substituted throughout the pair template class to produce a class definition. Although neither the programmer nor the user sees this class definition, it is represented inside the compiler, which can use and manipulate the code.

Through template programming, the compiler's optimizer can transform complicated code into much simpler code. In Section 7.3, we describe the complicated template code used to implement efficiently data-parallel operations. Although the template code is complicated, the compiler optimization frequently greatly simplifies it to yield simple, fast loops. We illustrate this with a simple template class:


template <bool complicatedCase>
struct usuallySimpleClass {
  usuallySimpleClass() {
    if (complicatedCase)
      i = do_some_very_complicated_computation();
    else
      i = 0;
  }
  int i;
};
The usuallySimpleClass has one boolean template parameter complicatedCase, which should be true only if the constructor must perform some very complicated, time-expensive computation. When instantiated with false, the compiler substitutes this value into the template class definition. Since the if statement's conditional is false, the compiler optimizer can eliminate the statement, yielding internal code similar to

struct usuallySimpleClass<false> {
  usuallySimpleClass() {
    i = 0;
  }
  int i;
};
The optimizer might further simplify the code by inlining the constructor's assignment. Because the resulting code is never displayed, the programmer does not know how simplified it is without investigating the resulting assembly code. C++ compilers that translate C++ code into C code may permit inspecting the resulting code. For example, using the - -keep_gen_c command-line option with the KAI C++ compiler creates a file containing the intermediate code. Unfortunately, reading and understanding the code is frequently difficult.

Each category of values supports a distinct set of operations. For example, the run-time category of integer values supports combination using + and - and comparison using > and ==. At run time, the category of strings can be compared using == and characters can be extracted using subscripts with the [] operator. Compile-time operations are more limited. Types may be declared and used. The sizeof operator yields the number of bytes to represent an object of the specified type. Enumerations, constant integers, sizeof expressions, and simple arithmetic and comparison operators such as + and == can form constant expressions that can be used at compile time. These values can initialize enumerations and integer constants and be used as template arguments. At compile time, pointers and references to objects and functions can be used as template arguments, while the category of executable code supports no operations. (The compiler's optimizer may simplify it, though.)

At run time, an object can store multiple values, each having its own name. For example, a pair<int> object p stores two ints named left_ and right_. The . operator extracts a named member from an object: p.left_. At compile time, a class can store multiple values, each having its own name. These are sometimes called traits classes. For example, implementing data-parallel operations requiring storing a tree of types. The ExpressionTraits<BinaryNode<Op, Left, Right>  > traits class stores the types of a binary node representing the operation of Op on left and right children. Its definition


template<class Op, class Left, class Right>
struct ExpressionTraits<BinaryNode<Op, Left, Right>  >
{
  typedef typename ExpressionTraits<Left>::Type_t  Left_t;
  typedef typename ExpressionTraits<Right>::Type_t Right_t;
  typedef typename
    CombineExpressionTraits<Left_t, Right_t>::Type_t Type_t;
};
consists of a class definition and internal type definitions. This traits class contains three values, all types and named Left_t, Right_t, and Type_t, representing the type of the left child, the right child, and the entire node, respectively. Many traits classes, such as this one, use internal type definitions to store values. No enumerations or constant values occur in this traits class, but other such classes include them. See Section 7.3 for more details regarding the implementation of data-parallel operators.

The example also illustrates using the :: operator to extract a member of a traits class. The type ExpressionTraits<Left> contains an internal type definition of Type_t. Using the :: operator extracts it: ExpressionTraits<Left>::Type_t. Enumerations and other values can also be extracted. For example, Array<2, int, Brick>::dimensions yields the dimension of the array's domain.

Control flow determines which code is used. At run time, control-flow statements such as if, while, and goto determine which statements to execute. Template programming uses two mechanisms: template class specializations and pattern matching. These are similar to control flow in functional programming languages. A template class specialization is a class definition specific to one or more template arguments. For example, the implementation for data-parallel operations uses the templated CreateLeaf. The default definition works for any template argument T:


template<class T>
struct CreateLeaf
{
  typedef Scalar<T> Leaf_t;
  …
};
The code is different for Expression specializations:

template<class T>
struct CreateLeaf<Expression<T>  >
{
  typedef typename Expression<T>::Expression_t Leaf_t;
  …
};
The latter code is only used when CreateLeaf's template argument is an Expression type.

Pattern matching of template arguments to template parameters determines which template code is used. The code associated with the match that is most specific is the one that is used. For example, CreateLeaf<int> uses the first, more general template class definition because the int template argument does not match Expression<T> for any value of T. On the other hand, CreateLeaf<Expression<int>  > uses the second definition because both the general and the specialized template parameters match so the more specialized ones are preferred. In this case, T equals int. CreateLeaf<Expression<Expression<int>  >  > also matches the more specialized definition with T equaling Expression<int>.

Control flow using template specializations and pattern matching is similar to switch statements. A switch statement has a condition and one or more pairs of case labels and associated code. The code associated with the the case label whose value matches the condition is executed. If no case label matches the condition, the default code, if present, is used. In template programming, instantiating a template, e.g.,


CreateLeaf<Expression<int>  >
serves as the condition. The set of template parameters for the indicated template class, e.g., CreateLeaf, are analogous to the case labels, and each has an associated definition. In our example, the set of template parameters are <class T> and <Expression<class T>  >. The "best match", if any, indicates the matching code that will be used. In our example, the <class T> parameter serves as the default label since it matches any arguments. If no set of template parameters match (which is impossible for our example) or if more than one set are best matches, the code is incorrect.

Functions as well as classes may be templated. All the concepts needed to understand function templates have already been introduced so we illustrate using an example. The templated function f takes one parameter of any type:


template <typename T>
void f(const T& t) { … }
A function template defines an unbounded set of related functions, all with the same name. Our example defines functions equivalent to f(const int&), f(const bool&), f(const int*&), …. Using a templated class definition with a static member function, we can define an equivalent function:

template <typename T>
class F {
  static void f(const T& t) { … }
};
Both the templated class and the templated function take the same template arguments, but the class uses a static member function. Thus, the notation to invoke it is slightly more verbose: F<T>::f(t).

The advantage of a function template is that it can be overloaded, particularly operator functions. For example, the + operator is overloaded to add two Arrays, which require template parameters to specify:


template <int D1,class T1,class E1,
             int D2,class T2,class E2>
// complicated return type omitted
operator+(const Array<D1,T1,E1> & l,
          const Array<D2,T2,E2> & r);
Without using function templates, it would not be possible to write expressions such as a1 + a2. Member functions can also be templated. This permits, for example, overloading of assignment operators defined within templated classes.

Function objects are frequently useful in run-time code. They consist of a function plus some additional storage and are usually implemented as structures with data members and a function call operator. Analogous classes can be used at compile time. Using the transformation introduced in the previous paragraph, we see that any function can be transformed into a class containing a static member function. Internal type definitions, enumerations, and static constant values can be added to the class. The static member function can use these values during its computation. The CreateLeaf structure, introduced above, illustrates this.


template<class T>
struct CreateLeaf
{
  typedef Scalar<T> Leaf_t;
  inline static Leaf_t make(const T& a)
    { return Scalar<T>(a); }
};
Thus, CreateLeaf<T>::make is a function with a complicated name and having access to the class member named Leaf_t. Unlike for function objects, the function's name within the class must be given a name.