[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[gnugo-devel] memory usage (was engine/influence.c (and DFA))

From: Dave Denholm
Subject: [gnugo-devel] memory usage (was engine/influence.c (and DFA))
Date: 10 Sep 2002 08:18:33 +0100

Arend Bayer <address@hidden> writes:

> Also we see that indeed memory is becoming more of a bottle-neck (not
> a surprise, of course): Also Athlon, but K6-2 400 MHz instead of
> Gunnar's (1,6 GHz?):
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls   s/call   s/call  name
>  10.88    111.74   111.74 136937536     0.00     0.00  scan_for_patterns
>   5.53    168.53    56.79 100769412     0.00     0.00  check_pattern_light
>   4.65    216.32    47.79   174590     0.00     0.00  compute_primary_domains
>   3.44    251.63    35.31    94773     0.00     0.00  do_push_owl

> Instead, one could substantially reduce sizeof(local_owl_data) (and thus
> both speedup do_push_owl and gain overall cache benefits) by
> - maintaining eyes in a separate list, i.e. not keeping the eye
>   information copied at every
> - further compressing the eye data (e.g. the 3bit information of the
>   eye color doesn't need a whole "int", etc.)
Further to this... I got a measurable speedup (3 to 5 %), by changing
some struct entries from int to short. Specifically, the lunch, attack and
defend entries in local_owl_data (since that's what do_push_owl deals
with), and the dfa data structures.

But this is probably architecture dependent. One approach would be
to define types for various things  (move_t to hold a move, etc),
and these can be tuned for various platforms.

I don't think x86 suffers too much from having sub-int types,
so the reduction in cache misses is an overall plus. But other
platforms are probably different. Need to take care to stop compiler
generating code to constantly sign-extend numbers when promoting to int.

Using short's in dfa could, of course, limit the number of patterns it
could match. However, if the next-state fields were relative rather
than absolute, then if I understand it correctly, one can probably
juggle nodes around so that they can always be reached. No problem
for ahead-of-time dfa's, but if they are to be generated / modified
at run time, such balancing could wipe out any benefits.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]