RE: [pooma-dev] timers and performance measurement under Linux
From: Julian C. Cummings
Subject: RE: [pooma-dev] timers and performance measurement under Linux
Date: Mon, 6 Aug 2001 11:20:56 -0700
Hi Jim,
It's probably a good idea to have an option for choosing between CPU time and wallclock time as the benchmark measurement, since they each have their own purpose. Normally CPU time is interesting for measuring on-node performance, and wallclock time is more useful for measuring parallel speedup.
As for threads, I think you can only measure CPU time while a thread is active on a processor. Doing anything else doesn't really make sense. So with clock(), you would want to call clock() when a thread first begins to work (not before the thread is actually launched!) and then call clock() again when the thread is done working (but before the thread exits). If each thread does a tiny amount of work, the resolution issue you mentioned will be a problem. The clock() function has different resolutions on different architectures. But normally the variation in performance measurement from one run to the next is far greater than the timer resolution anyway.
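Something like this, for example (a minimal sketch; doWork() and threadBody() are just hypothetical stand-ins, and the thread launch itself is omitted):

    #include <cstdio>
    #include <ctime>

    // Hypothetical stand-in for the kernel being measured.
    void doWork()
    {
        volatile double sum = 0.0;
        for (long i = 0; i < 10000000L; ++i)
            sum += 1.0 / (i + 1.0);
    }

    // Body executed by each thread.
    void threadBody()
    {
        // Start the clock only once the thread has actually begun
        // working, not when it is launched.
        std::clock_t start = std::clock();

        doWork();

        // Read the clock again when the work is done, but before
        // the thread exits.
        std::clock_t stop = std::clock();

        double cpuSeconds = double(stop - start) / double(CLOCKS_PER_SEC);
        std::printf("thread CPU time: %g s\n", cpuSeconds);
    }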
The gettimeofday() function is probably the best thing to use for wallclock time measurement. This is what we used in the old Timer class in Pooma r1. I haven't looked at your check-in yet, but hopefully you remembered to check for overflow in the microseconds counter and increment the seconds counter accordingly. Other than that, I remember that code as being pretty simple.
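A bare-bones version of that timer might look like this (just a sketch assuming a POSIX gettimeofday(), not the actual Pooma r1 code); the borrow on the microseconds field is the overflow handling I mean:

    #include <sys/time.h>

    // Bare-bones wallclock timer (illustrative, not the Pooma r1 Timer).
    class WallClock
    {
    public:
        void start() { gettimeofday(&t0_, 0); }

        // Seconds elapsed since start().
        double elapsed() const
        {
            timeval t1;
            gettimeofday(&t1, 0);
            long sec  = t1.tv_sec  - t0_.tv_sec;
            long usec = t1.tv_usec - t0_.tv_usec;
            if (usec < 0)       // microseconds underflowed:
            {                   // borrow one second
                usec += 1000000;
                sec  -= 1;
            }
            return sec + usec * 1.0e-6;
        }

    private:
        timeval t0_;
    };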
As for your comments on the PIII performance, I think what you are seeing is correct. The out-of-cache performance is not very good. You will see closer to optimal performance only when the problem size is in-cache, and the caches are much smaller than what we were used to on the SGI boxes. With an optimized C code kernel, you should be able to see the cache effect and stronger flops numbers for small problem sizes. (But of course, it gets harder to measure accurately, too.)

I'm not aware of any profiling tools from KAI, so I think prof/gprof is all there is, unless you know how to access the Pentium hardware counters. Quite a while ago, I fooled around with the performance analysis tools that come with Intel VTune. They were not terribly helpful, other than telling me the obvious: that all the time was being spent in the kernel loop. The real problem is that the optimization and inlining of the compiler were poor enough that there was little to do to improve performance. I remember being surprised to find that Blitz code optimized much better under VTune than Pooma code. I think the reasons had to do with the amount of template complexity in Pooma versus Blitz. Maybe things are better under newer versions of VTune, but Allan didn't seem too impressed with the newest VTune when I asked him about it. In any case, I sort of thought that VTune was not an important target for Pooma and its customers, so I haven't pursued it further.
Julian C.

Jim wrote:
How important is it that the timer used by the benchmark class
measure CPU time? I'm not sure how we've even been using this class for
parallel codes - in fact, I was thinking that back when we were just testing
with threads, we were measuring wall-clock time. But the default
implementation of Utilities/Clock.h uses the "clock()" call, which is supposed
to measure "elapsed processor time", though I'm not completely sure what that
means for a multithreaded program on a multi-processor. On the SGI we were
using the high-performance timers, which I believe access the CPU's hardware
performance registers, so I would think that that was CPU time as well.
Anyway, the problem is that clock() only has 10 ms granularity under Linux and
that's really crappy if you're trying to measure scaling with problem size.
I've written a version that uses gettimeofday, which has a granularity of
something like 15 microseconds under Linux (may depend on processor speed?),
but this is definitely not a measure of CPU time. For serial programs this
just means that you need to test on an unloaded machine to have the results
make sense. For parallel programs, this means that you need to interpret the
results differently. Anyway, I'm using this and it seems generally useful, so
I'd like to check it in, but was wondering if the clock-time versus CPU-time
measurement was a problem for anyone. I suppose we could put an accessor on
Clock that would allow the user to find out what type of time was being
presented, and Benchmark could examine this and calculate appropriately, if
necessary.
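Something like this minimal sketch is what I have in mind (the names and the compile-time switch here are just illustrative, not actual Pooma code):

    #include <ctime>
    #include <sys/time.h>

    // Illustrative only: Clock reports which kind of time value()
    // returns, so callers can interpret measurements appropriately.
    struct Clock
    {
        enum TimeType { CPUTime, WallClockTime };

    #if defined(USE_GETTIMEOFDAY)
        static TimeType timeType() { return WallClockTime; }
        static double value()
        {
            timeval t;
            gettimeofday(&t, 0);
            return t.tv_sec + t.tv_usec * 1.0e-6;
        }
    #else
        static TimeType timeType() { return CPUTime; }
        static double value()
        {
            return double(std::clock()) / double(CLOCKS_PER_SEC);
        }
    #endif
    };

Benchmark could then check Clock::timeType() and label or interpret its numbers accordingly.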
As a side comment, I'm using KCC on one of our 1 GHz PIII
Linux boxes. ABCTest (which is doing a simple non-stencil calculation, so this
is as fast as it gets) is only getting 45 MFlops for the C version, which is
the fastest. This seems pretty pathetic. What have other Linux users seen on
what types of boxes? Also, what is available on Linux for profiling? Is
prof/gprof it? Does KAI sell anything commercial? Does the PIII have any
hardware performance monitoring and is there access to it? Does VTune (under
Windows) inline sufficiently well to do performance testing there? (It allegedly
has good profiling tools, so it might be OK for the serial stuff, but I don't
know if it is capable of "inlining everything".)
Jim