

Re: [gnugo-devel] engine/influence.c (and DFA)

From: Dave Denholm
Subject: Re: [gnugo-devel] engine/influence.c (and DFA)
Date: 03 Sep 2002 17:13:38 +0100

Marco Scheurer <address@hidden> writes:

> On Tuesday, September 3, 2002, at 05:10  pm, Arend Bayer wrote:
> > On 3 Sep 2002, Dave Denholm wrote:
> >
> >> Unrolling the loop
> >> reduced the time from 6 minutes to 5 minutes 37 seconds. (I had been
> >> wondering whether the measured speedup had been due to it doing less
> >> work, rather than due to the unrolling.)
> >
> > What I've been wondering: shouldn't we expect a modern compiler to do
> > the loop unrolling itself?

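Usually, yes, but in this case each iteration has terms that are
multiplied by 0 or +/-1. Once the loop is unrolled, those constants are
visible to the compiler, so it can simplify the expressions significantly
(dropping the zero terms and turning the +/-1 multiplications into plain
additions or subtractions). In the loop version, the constants are fetched
from a table, so the compiler has to generate code that performs each
multiplication in full.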
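To illustrate the point, here is a minimal sketch. It is not the actual
influence.c code: the direction tables dx/dy, the weights w and both
function names are hypothetical. The table-driven loop has to load each
coefficient and multiply by it; the hand-unrolled version makes the
0/+1/-1 values explicit, so the zero terms disappear and the remaining
ones become additions and subtractions. (Whether a given compiler can
already see through a static const table on its own depends on its
optimizer.)

```c
#include <assert.h>

/* Hypothetical 4-neighbour offsets: every entry is 0, +1 or -1. */
static const int dx[4] = { 1, 0, -1, 0 };
static const int dy[4] = { 0, 1, 0, -1 };

/* The unrolled version below hard-codes exactly four directions. */
static_assert(sizeof dx / sizeof dx[0] == 4, "expected 4 directions");

/* Table-driven loop: dx[k] and dy[k] are loaded from memory, so the
 * generated code performs the multiplications in full. */
static int weighted_sum_loop(int a, int b, const int w[4])
{
  int sum = 0;
  for (int k = 0; k < 4; k++)
    sum += w[k] * (dx[k] * a + dy[k] * b);
  return sum;
}

/* Hand-unrolled: with the constants spelled out, "* 0" terms vanish
 * and "* 1" / "* -1" turn into plain additions and subtractions. */
static int weighted_sum_unrolled(int a, int b, const int w[4])
{
  return w[0] * a    /* dx =  1, dy =  0 */
       + w[1] * b    /* dx =  0, dy =  1 */
       - w[2] * a    /* dx = -1, dy =  0 */
       - w[3] * b;   /* dx =  0, dy = -1 */
}
```

Both functions compute the same value; for example, with w = {3, 5, 7, 11},
a = 2 and b = 9 each returns -62.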

> Aren't there also cases where loop unrolling has a negative effect?

Cache effects, perhaps: the unrolled code is larger, so it puts more
pressure on the instruction cache.

> Speaking of micro optimizations and speed/space trade offs, I've been 
> wondering about these:
> - There are many two-dimensional arrays. Would it make sense to replace
> them with pointers to pointers? Two indirections are supposed to be faster
> than the arithmetic needed to access elements in a two-dimensional array.
> The syntax to read an element is the same, but some initialization code
> is needed.

It depends. If the row size is a power of two, the multiplication becomes
a shift, and the indexing may be done in a single instruction. ARM
certainly can, thanks to its shifted-register operands.

Again, there are cache effects: the extra memory access needed to load the
row pointer could be slower than the calculation if it causes a cache miss.
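For comparison, a minimal sketch of the two layouts. ROWS, COLS, grid,
row_ptr and the accessor names are hypothetical and are not taken from the
GNU Go sources. With a true 2D array, grid[i][j] needs i*COLS + j, which
becomes a shift when COLS is a power of two. With an array of row pointers,
p[i][j] reads the same way, but it first loads p[i] from memory, and that
load is the one that can miss the cache.

```c
#include <assert.h>

#define ROWS 19
#define COLS 32   /* hypothetical row size, padded to a power of two */

/* With a power-of-two row size, i * COLS can be compiled as i << 5. */
static_assert((COLS & (COLS - 1)) == 0, "COLS must be a power of two");

/* Option 1: a true two-dimensional array. */
static int grid[ROWS][COLS];

/* Option 2: a table of row pointers, used through an int **. */
static int *row_ptr[ROWS];

/* The initialization code the pointer-to-pointer scheme needs:
 * each row pointer is aimed at the matching row of the same storage. */
static void init_row_pointers(void)
{
  for (int i = 0; i < ROWS; i++)
    row_ptr[i] = grid[i];
}

/* Address arithmetic only: base + (i * COLS + j) * sizeof(int). */
static int get_2d(int i, int j)
{
  return grid[i][j];
}

/* Two indirections: load p[i], then index into that row. */
static int get_ptrptr(int i, int j)
{
  int **p = row_ptr;
  return p[i][j];
}
```

Which variant is faster depends on the CPU and on whether the row-pointer
table stays in cache, so timing both on a real workload is the only
reliable way to decide.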

> - Replace floats with doubles? The net effect probably depends on the 
> processor and the compiler, and is not likely to make a huge difference. 
> But maybe these floats could be declared as ggfloat, with an option to 
> typedef this as float or double.

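Yes, I was actually just about to try that. In traditional (K&R) C, float
calculations are performed in double, and the result is rounded when it
has to be stored into a float variable; ANSI C permits evaluation in float
but still allows extra precision. Chances are that all the intermediate
results stay in the x86 FPU registers, which hold extended precision, and
so never get stored anyway.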
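A minimal sketch of the ggfloat idea, assuming a hypothetical build switch
GG_USE_DOUBLE (this is not an existing GNU Go option), plus an illustrative
function attenuate() that is not taken from influence.c. It also prints
FLT_EVAL_METHOD from <float.h> (C99), which reports how the compiler
evaluates floating-point expressions: 0 means each type in its own
precision, 1 means float and double in double, and 2 means everything in
long double (typical of the x87 FPU); -1 means indeterminable.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Hypothetical switch: compile with -DGG_USE_DOUBLE to use doubles. */
#ifdef GG_USE_DOUBLE
typedef double ggfloat;
#else
typedef float ggfloat;
#endif

/* Illustrative only: divides strength by attenuation, steps times. */
static ggfloat attenuate(ggfloat strength, ggfloat attenuation, int steps)
{
  assert(attenuation != 0);
  for (int k = 0; k < steps; k++)
    strength /= attenuation;
  return strength;
}

/* Show which type was chosen and how the compiler evaluates float
 * expressions, since that is what decides whether floats buy anything. */
static void report_float_setup(void)
{
  printf("sizeof(ggfloat) = %zu, FLT_EVAL_METHOD = %d\n",
         sizeof(ggfloat), (int) FLT_EVAL_METHOD);
}
```

Switching every float in the engine over to ggfloat would then let the two
precisions be benchmarked against each other by rebuilding with or without
the flag.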

