POOMA: A C++ Toolkit for High-Performance Parallel Scientific Computing
Chapter 3. A Tutorial Introduction
POOMA supports data-parallel Array accesses. Many algorithms are more easily expressed using data-parallel expressions. Also, the POOMA Toolkit can sometimes reorder the data-parallel computations to be more efficient or distribute them among various processors. In this section, we concentrate on the differences between the data-parallel implementation of Doof2d listed in Example 3-3 and the element-wise implementation listed in the previous section.
Example 3-3. Data-Parallel Array Implementation of Doof2d
    #include <iostream>        // has std::cout, ...
    #include <stdlib.h>        // has EXIT_SUCCESS
    #include "Pooma/Arrays.h"  // has POOMA's Array declarations

    // Doof2d: POOMA Arrays, data-parallel implementation

    int main(int argc, char *argv[])
    {
      // Prepare the POOMA library for execution.
      Pooma::initialize(argc,argv);

      // Ask the user for the number of averagings.
      long nuAveragings, nuIterations;
      std::cout << "Please enter the number of averagings: ";
      std::cin >> nuAveragings;
      nuIterations = (nuAveragings+1)/2;  // Each iteration performs two averagings.

      // Ask the user for the number n of values along one
      // dimension of the grid.
      long n;
      std::cout << "Please enter the array size: ";
      std::cin >> n;

      // Specify the arrays' domains [0,n) x [0,n).
      Interval<1> N(0, n-1);
      Interval<2> vertDomain(N, N);

      // Set up interior domains [1,n-1) x [1,n-1)
      // for computation.
      Interval<1> I(1,n-2);
      Interval<1> J(1,n-2);

      // Create the arrays.
      // The Array template parameters indicate 2 dimensions,
      // a 'double' value type, and ordinary 'Brick' storage.
      Array<2, double, Brick> a(vertDomain);
      Array<2, double, Brick> b(vertDomain);

      // Set up the initial conditions.
      // All grid values should be zero except for the
      // central value.
      a = b = 0.0;
      // Ensure all data-parallel computation finishes
      // before accessing a value.
      Pooma::blockAndEvaluate();
      b(n/2,n/2) = 1000.0;

      // In the average, weight elements with this value.
      const double weight = 1.0/9.0;

      // Perform the simulation.
      for (int k = 0; k < nuIterations; ++k) {
        // Read from b.  Write to a.
        a(I,J) = weight *
          (b(I+1,J+1) + b(I+1,J  ) + b(I+1,J-1) +
           b(I  ,J+1) + b(I  ,J  ) + b(I  ,J-1) +
           b(I-1,J+1) + b(I-1,J  ) + b(I-1,J-1));

        // Read from a.  Write to b.
        b(I,J) = weight *
          (a(I+1,J+1) + a(I+1,J  ) + a(I+1,J-1) +
           a(I  ,J+1) + a(I  ,J  ) + a(I  ,J-1) +
           a(I-1,J+1) + a(I-1,J  ) + a(I-1,J-1));
      }

      // Print out the final central value.
      Pooma::blockAndEvaluate();  // Ensure all computation has finished.
      std::cout << (nuAveragings % 2 ? a(n/2,n/2) : b(n/2,n/2)) << std::endl;

      // The arrays are automatically deallocated.

      // Tell the POOMA library execution has finished.
      Pooma::finalize();
      return EXIT_SUCCESS;
    }
Data-parallel expressions use containers and domain objects to indicate a set of parallel expressions. For example, in the program listed above, a(I,J) specifies the subset of the array a that omits the outermost elements. The array's vertDomain domain consists of the Cartesian product of {0, 1, 2, …, n-1} with itself, while I and J each specify {1, 2, …, n-2}. Thus, a(I,J) is the subset whose domain is the Cartesian product of {1, 2, …, n-2} with itself. It is called a view of an array. A view is itself an Array, with a domain and support for element access, but it shares its storage with a. Changing a value in a(I,J) also changes the same value in a. Changing a value in a also changes a(I,J), provided the value is not one of a's outermost elements. The expression b(I+1,J+1) indicates the subset of b whose domain is the Cartesian product of {2, 3, …, n-1} with itself, i.e., the same domain as a(I,J) but shifted up one unit and to the right one unit. Only an Interval's value, not its name, matters, so all uses of J in this program could be replaced by I without changing the semantics.
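The storage-sharing behavior of a view can be modeled in a few lines of plain C++. The sketch below does not use POOMA and is not how POOMA implements views: Grid, GridView, and interior are hypothetical names introduced only for illustration. It shows a window that owns no elements, so that writes through the window appear in the underlying grid and vice versa.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an n x n array (not POOMA's Array).
struct Grid {
  std::size_t n;
  std::vector<double> data;
  explicit Grid(std::size_t size) : n(size), data(size * size, 0.0) {}
  double& at(std::size_t i, std::size_t j) {
    assert(i < n && j < n);
    return data[i * n + j];
  }
};

// Hypothetical view: a square window onto a Grid's storage.
// It owns no elements; it stores only an offset and an extent.
struct GridView {
  Grid& grid;
  std::size_t first;   // first grid index covered in each dimension
  std::size_t extent;  // number of indices covered in each dimension
  // View indices are zero-based: (0,0) is the grid element (first,first).
  double& at(std::size_t i, std::size_t j) {
    assert(i < extent && j < extent);
    return grid.at(first + i, first + j);
  }
};

// Analogue of a(I,J) with I = J = {1, ..., n-2}: the interior of the grid.
GridView interior(Grid& g) {
  assert(g.n >= 3);
  return GridView{g, 1, g.n - 2};
}
```

With a 5 x 5 Grid named a, writing interior(a).at(0, 0) changes a.at(1, 1), and writing a.at(3, 3) is visible as interior(a).at(2, 2). Writing a.at(0, 0) is not visible through the view, because that element lies outside the interior.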
The statement assigning to a(I,J) illustrates that Arrays may participate in expressions. Each addend is a view of an array, which is itself an array. The views' indices are zero-based, so their sum can be formed by adding identically indexed elements of each array. For example, the lower-left element of the result equals the sum of the lower-left elements of the addend arrays. Figure 3-2 illustrates adding two arrays.
Figure 3-2. Adding Arrays
When adding arrays, values with the same indices, indicated by the small numbers adjacent to the arrays, are added.
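To make the meaning of the data-parallel assignment to a(I,J) concrete, the following sketch spells out the same nine-point average as explicit loops over plain C++ vectors. It does not use POOMA; Matrix and averageInterior are hypothetical names, and the loop order shown is only one of many orders a data-parallel implementation might choose.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // hypothetical n x n storage

// Element-wise equivalent of
//   a(I,J) = weight * (b(I+1,J+1) + ... + b(I-1,J-1));
// for I = J = {1, ..., n-2}. Reads from b, writes to the interior of a.
void averageInterior(Matrix& a, const Matrix& b) {
  const std::size_t n = b.size();
  assert(a.size() == n && n >= 3);
  const double weight = 1.0 / 9.0;
  for (std::size_t i = 1; i + 1 < n; ++i) {
    for (std::size_t j = 1; j + 1 < n; ++j) {
      double sum = 0.0;
      // di and dj run over 0, 1, 2, so (i - 1 + di, j - 1 + dj) visits the
      // nine neighbors; each pair corresponds to one shifted view of b.
      for (std::size_t di = 0; di < 3; ++di)
        for (std::size_t dj = 0; dj < 3; ++dj)
          sum += b[i - 1 + di][j - 1 + dj];
      a[i][j] = weight * sum;
    }
  }
}
```

Starting from a 5 x 5 b that is zero except for 1000.0 at the center, one call leaves 1000.0/9.0 in every interior element of a whose neighborhood contains the center, and leaves the outermost elements of a untouched.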
POOMA may reorder computations or distribute them among various processors, so the code calls Pooma::blockAndEvaluate before accessing individual values. Calling this function before reading an individual Array value ensures that all computations affecting that value have finished, i.e., that it has the correct value. The call is necessary only when accessing individual array elements. Data-parallel operations do not need it; for example, before printing an array, POOMA calls blockAndEvaluate itself.
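Why the listing calls Pooma::blockAndEvaluate between a = b = 0.0 and the assignment to b(n/2,n/2) can be illustrated with a toy model. The sketch below is an analogy only: DeferredEvaluator is a hypothetical class that queues whole-array operations and runs them later, which is an assumption about the general idea of deferred evaluation rather than a description of POOMA's scheduler.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical evaluator that queues whole-array operations instead of
// running them immediately (an analogy only; not POOMA's mechanism).
class DeferredEvaluator {
 public:
  // Record a whole-array operation; it does not run yet.
  void submit(std::function<void()> work) {
    pending_.push_back(std::move(work));
  }

  // Plays the role of Pooma::blockAndEvaluate() in this analogy:
  // returns only after every queued operation has run.
  void blockAndEvaluate() {
    for (auto& work : pending_) work();
    pending_.clear();
  }

  bool idle() const { return pending_.empty(); }

 private:
  std::vector<std::function<void()>> pending_;
};
```

In this model, if a fill-with-zero operation is submitted and a single element is then set to 1000.0 without first calling blockAndEvaluate, the queued fill later overwrites that element. Calling blockAndEvaluate first, as the listing does, makes the element-wise write safe.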