RE: [pooma-dev] timers and performance measurement under Linux

From:	James Crotinger
Subject:	RE: [pooma-dev] timers and performance measurement under Linux
Date:	Mon, 6 Aug 2001 12:47:36 -0600

-----Original Message-----
From: Julian C. Cummings [mailto:address@hidden]
Sent: Monday, August 06, 2001 12:21 PM
To: James Crotinger; address@hidden
Subject: RE: [pooma-dev] timers and performance measurement under Linux

Julian: ---------------------------------------------------

The gettimeofday() function is probably the best thing to use for wallclock time measurement. This is what we used in the old Timer class in Pooma r1. I haven't looked at your check-in yet, but hopefully you remembered to check for overflow in the microseconds counter and increment the seconds counter accordingly. Other than that, I remember that code as being pretty simple.

-----------------------------------------------------------

I just did:

return tv.tv_sec + 1.e-6 * tv.tv_usec;

This mirrors what we are doing with clock_gettime. My interpretation of gettimeofday is that tv_usec should always be less than 1e6: it is supposed to return the number of seconds and microseconds since midnight (UTC) on Jan 1, 1970. I checked this under Linux, and tv_usec resets to zero every time tv_sec is incremented. So I don't see a reason to put our own (% 1000000) after it, and indeed if it were over 1000000 I'm not even sure how I'd interpret tv_sec.
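For reference, here is a minimal sketch of that conversion as a free-standing function. The name wallclockSeconds is made up for illustration and is not the actual POOMA Timer code; the assert just records the assumption that tv_usec arrives already normalized, which is what I observed under Linux.

```cpp
#include <cassert>
#include <sys/time.h>  // gettimeofday, struct timeval (POSIX)

// Hypothetical helper (not the actual POOMA code): wall-clock time in
// seconds since the Epoch, returned as a double.
double wallclockSeconds()
{
  timeval tv;
  gettimeofday(&tv, nullptr);

  // Assumption being relied on: tv_usec is already in [0, 1000000),
  // so no "% 1000000" is applied before converting.
  assert(tv.tv_usec >= 0 && tv.tv_usec < 1000000);

  return tv.tv_sec + 1.e-6 * tv.tv_usec;
}
```

An interval is then just the difference of two calls. Note that a double holding a value near 1e9 seconds resolves only a few tenths of a microsecond, so this representation is slightly coarser than the raw tv_usec field.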

Julian: ---------------------------------------------------
As for your comments on the PIII performance, I think what you are seeing is correct. The out-of-cache performance is not very good. You will see closer to optimal performance only when the problem size is in-cache, and the caches are much smaller than what we were used to on the SGI boxes. With an optimized C code kernel, you should be able to see the cache effect and stronger flops numbers for small problem sizes. (But of course, it gets harder to measure accurately, too.) I'm not aware of any profiling tools from KAI, so I think prof/gprof is all there is, unless you know how to access Pentium hardware counters.

-----------------------------------------------------------

Oh, this number is definitely memory bandwidth limited: there are three to four loads and two stores every trip through the loop, which does four flops (two multiplies and two adds). I get a peak C performance of about 390 MFlops for N = 60 or so. The peak POOMA II Brick performance is only about 115 MFlops at a slightly higher N, and then it drops off very rapidly to about 30 MFlops.
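To make the load/store/flop accounting concrete, here is a rough sketch. The loop below is a made-up stand-in with the same counts per trip (three loads, or four if b[i] is reloaded, two stores, two multiplies, two adds); it is not the actual benchmark kernel. The helper names, the 1-D layout, and the use of 8-byte doubles are likewise assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel: per element, loads a[i], b[i], d[i], stores a[i]
// and d[i], and does two multiplies and two adds (four flops).
void kernel(std::vector<double>& a, std::vector<double>& d,
            const std::vector<double>& b, double c1, double c2)
{
  assert(a.size() == b.size() && d.size() == b.size());
  const std::size_t n = b.size();
  for (std::size_t i = 0; i < n; ++i)
  {
    a[i] += c1 * b[i];
    d[i] += c2 * b[i];
  }
}

// MFlops for `sweeps` passes over n elements at four flops per element.
double mflops(std::size_t n, std::size_t sweeps, double seconds)
{
  assert(seconds > 0.0);
  return 4.0 * n * sweeps / (seconds * 1.e6);
}

// Memory traffic per flop, assuming double operands and four flops per trip.
double bytesPerFlop(int loads, int stores)
{
  return (loads + stores) * sizeof(double) / 4.0;
}
```

With these assumptions the loop moves 10 to 12 bytes per flop once the data no longer fits in cache, which is why the rate ends up tracking memory bandwidth rather than the floating-point units.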

I tried gprof with "KCC -pg" generated code this morning, and gprof crashed after about 10 minutes of crunching on the output of a run. Has anyone else out there seen this? I'm going to try compiling with gcc, but I'm not sure it generates good enough code for me to trust the profile results to guide me to the right optimizations.

Jim
