POOMA: A C++ Toolkit for High-Performance Parallel Scientific Computing | ||
---|---|---|
Prev | Chapter 4. Overview of POOMA Concepts | Next |
Most POOMA programs use containers to store groups of values. POOMA containers are objects that store other objects such as numbers or vectors. They control allocation and deallocation of these stored objects and access to them. They are a generalization of C arrays, but POOMA containers are first-class objects so they can be used directly in expressions. They are also similar to C++ containers such as vector, list, and stack. See Table 4-2 for a summary of the containers.
This section describes many concepts, but one need not understand them all to begin programming with the POOMA Toolkit. First, we introduce the different POOMA's containers and describe how to choose an appropriate one for a particular task. Figure 4-1 indicates which concepts must be understood when declaring a particular container. All of these concepts are described in Section 4.1.2 and Section 4.1.3. Use this figure to decide which concepts in the former are relevant. Reading the latter section is necessary only if computing using multiple processors. The programs in the previous chapter illustrate many of these concepts.
Table 4-2 briefly describes the six POOMA containers. They are more fully described in the paragraphs below.
Table 4-2. POOMA Container Summary
Array | container mapping indices to values and that may be used in expressions |
DynamicArray | one-dimensional Array whose domain can be dynamically resized |
Field | container mapping indices to one or more values and residing in multidimensional space |
Tensor | multidimensional mathematical tensor |
TinyMatrix | two-dimensional mathematical matrix |
Vector | multidimensional mathematical vector |
A POOMA Array generalizes a C array and maps indices to values. Given an index or position in an Array's domain, it returns the associated value, either by returning a stored value or by computing it. The use of indices, which are usually ordered tuples, permits constant-time access although computing a particular value may require significant time. In addition to the functionality provided by C arrays, the Array class automatically handles memory allocation and deallocation, supports a wider variety of assignments, and can be used in expressions. For example, the addition of two arrays can be assigned to an array and the product of a scalar element and an array is permissible.
A POOMA DynamicArray extends Array capabilities to support a dynamically-changing domain but is restricted to only one dimension. When the DynamicArray is resized, its values are preserved.
A POOMA Field is an Array with spatial extent. Each domain consists of cells in one-, two-, or three-dimensional space. Although indexed similarly to Arrays, each cell may contain multiple values and multiple materials. A Field's mesh stores its spatial characteristics and can yield, e.g., the cell at a particular point, the distance between two cells, or a cell's normals. A Field should be used whenever geometric or spatial computations are needed, multiple values per index are desired, or a computation involves more than one material.
A Tensor implements a multidimensional mathematical tensor. Since it is a first-class type, it can be used in expressions such as adding two Tensors.
A TinyMatrix implements a two-dimensional mathematical matrix. Since it is a first-class type, it can be used in expressions such as assignments to matrices and multiplying matrices.
A Vector implements a multidimensional mathematical vector, which is an ordered tuple of components. Since it is a first-class type, it can be used in expressions such as adding two Vectors and multiplying a TinyMatrix and a Vector.
The data of an Array, DynamicArray, or Field can be accessed using more than one container by taking a view. A view of an existing container C is a container whose domain is a subset of C's domain. The subset can equal the original domain. A view acts like a reference in that changing any of the view's values also changes the original container's and vice versa. While users sometimes explicitly create views, they are perhaps more frequently created as temporaries in expressions. For example, if A is an Array and I is a domain, A(I) - A(I-1) uses two views to form the difference between adjacent values.
The two most commonly used POOMA containers are Arrays and Fields, while Vector, TinyMatrix, and Tensor represent mathematical objects. Table 4-3 contains a decision tree describing how to choose an appropriate container.
Table 4-3. Choosing a POOMA Container
If modeling mathematical entries, | use a Vector, TinyMatrix, or Tensor. |
If indices and values reside in multidimensional space ℜd, | use a Field. |
If there are multiple values per index, | use a Field. |
If there are multiple materials participating in the same computation, | use a Field. |
If the domain's size dynamically changes and is one-dimensional, | use a DynamicArray. |
Otherwise | use an Array. |
In the previous sections, we introduced the POOMA containers and described how to choose one appropriate for a given task. In this section, we describe the concepts involved in declaring them. Concepts specific to distributed computation are described in the next section.
Figure 4-1 illustrates the containers and the concepts involved in their declarations. The containers are listed in the top row. Lines connect these containers to the components necessary for their declarations. For example, an Array declaration requires an engine and a layout. These, in turn, can depend on other POOMA concepts. Declarations necessary only for distributed, or multiprocessor, computation are also indicated. Given a desired container, one can use this figure to determine the concepts needed to declare a particular container.
An engine stores and, if necessary, computes a container's values. A container has one or more engines. The separation of a container from its storage permits optimizing a program's space and time requirements. For example, a container returning the same value for all indices can use a constant engine, which need only store one value for the entire domain. A CompressibleBrick Engine reduces its space requirements to a constant whenever all its values are the same. The separation between a container and its engine also permits taking views of containers without copying storage.
A layout maps domain indices to the processors and computer memory used by a container's engines. See Figure 4-2. A program computes a container's values using these processors and memory. The layout specifies the processors and memory to use for each particular index. A container's layout for a uniprocessor implementation consists of its domain, the processor, and its memory. For a multiprocessor implementation, the layout maps portions of the domain to (possibly different) processors and memory.
A domain is a set of points on which a container can define values. There are several different types of domains. An interval consists of all integral points between two endpoints. It is frequently represented using mathematical interval notation [a,b]; it contains only the integral points, e.g., a, a+1, a+2, …, b. The concept is generalized to multiple dimensions by forming direct products of intervals, i.e., all the integral tuples in an n-dimensional space. For example, the two-dimensional containers in the previous chapter are defined on a two-dimensional domain with the both dimensions' spanning the interval [0,n). A domain need not contain all integral points between its endpoints. A stride indicates a regular spacing between points. A range is a subset of an interval of regularly-spaced points specified by a stride.
A Field's mesh maps domain indices to spatial values in ℜd such as distances between cells, edge lengths, and normals to cells. In other words, it provides a Field's spatial extent. See also Figure 4-2. Different mesh types may support different spatial values.
A mesh's corner position specifies the point in ℜd corresponding to the cell in the lower, left corner of its domain. Combining this, the domain, and the cell size can specify the mesh's map from indices to ℜd.
A mesh's cell size specifies the spatial dimensions of a Field cell, e.g., its width, height, and depth, in ℜd. Combining this, the domain, and the corner position can specify the mesh's map from indices to ℜd.
In the previous section, we introduced the important concepts for declaring containers for use on uniprocessor computers. When using multiprocessor computers, we augment these concepts with those for distributed computation. Reading this section is important only for running a program on multiple processors. Many of these concepts were introduced in Section 3.6 and Section 3.8. Figure 3-4 illustrates the POOMA distributed computation model. In this section, we concentrate on the concepts necessary to declare a distributed container.
As we noted in Section 3.6, a POOMA programmer must specify how each container's domain should be distributed among the available processors and memory spaces. Using this information, the toolkit automatically distributes the data among the available processors and handles any required communication among them. The three concepts necessary for declaring distributed containers are a partition, a guard layer, and a context mapper tag.
A partition specifies how to divide a container's domain into distributed pieces. For example, the partition illustrated in Figure 3-4 would divide a two-dimensional domain into three equally-sized pieces along the x-dimension and two equally-sized pieces along the y-dimension. Partitions can be independent of the size of container's domain. The example partition will work on any domain as long as the size of its x-dimension is a multiple of three. A domain is separated into disjoint patches.
A guard layer surrounds each patch with read-only values. An external guard layer specifies values surrounding the entire domain. Its presence eases computation along the domain's edges by permitting the same computations as for more-internal computations. An internal guard layer duplicates values from adjacent patches so communication with adjacent patches need not occur during a patch's computation. The use of guard layers is an optimization; using external guard layers eases programming and using internal guard layers reduces communication among processors. Their use is not required.
A context mapper indicates how a container's patches are mapped to processors and shared memory. For example, the DistributedTag indicates that the patches should be distributed among the processors so each patch occurs once in the entire computation. The ReplicatedTag indicates that the patches should be replicated among the processors so each processing unit has its own copy of all the patches. While it could be wasteful to have different processors perform the same computation, replicating a container can reduce possibly more expensive communication costs.