Re: [gnugo-devel] engine/influence.c (and DFA)
From: Evan Berggren Daniel
Subject: Re: [gnugo-devel] engine/influence.c (and DFA)
Date: Thu, 5 Sep 2002 21:53:59 -0400 (EDT)
On Thu, 5 Sep 2002, Arend Bayer wrote:
>
> On 5 Sep 2002, Dave Denholm wrote:
>
> > Gunnar Farneback <address@hidden> writes:
> >
> > > Dave wrote:
> > > > I was just having a look at influence.c, since that is where the
> > > > profile shows the most time being spent.
> > >
> > > On what platform and what exactly did you run? I just did a profile on
> > > an AMD Athlon running neurogo.tst and then the influence code doesn't
> > > take much time. There the pattern matching is at the top.
> > >
> > >   %    cumulative    self               self    total
> > >  time    seconds    seconds     calls  s/call   s/call  name
> > >  16.05     37.36      37.36 142878688    0.00     0.00  scan_for_patterns
> > >   6.27     51.96      14.60 104763600    0.00     0.00  check_pattern_light
> > >   5.08     63.79      11.83    179092    0.00     0.00  compute_primary_domains
> > >   3.85     72.76       8.97  64742873    0.00     0.00  fastlib
> > >   3.05     79.87       7.11  30039637    0.00     0.00  do_play_move
> > >   2.94     86.72       6.85     97175    0.00     0.00  do_push_owl
> (...)
> > > 1.46 128.20 3.39 157757 0.00 0.00 accumulate_influence
> > > [...]
> > >
> > > What we see here is probably that the neurogo test suite is rather owl
> > > intensive. For a more representative profiling one should probably
> > > replay a couple of entire games against varied opponents (self play
> > > just doesn't suffice). The most straightforward way to do that would
> > > be to have a GTP file simulating replay of a selection of games.
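As a sketch, a GTP replay file of the sort described (the SGF path and move numbers here are hypothetical; `loadsgf` and `reg_genmove` are standard GNU Go GTP commands) might look like:

```
loadsgf regression/games/some_game.sgf 30
reg_genmove white
loadsgf regression/games/some_game.sgf 31
reg_genmove black
```

Feeding such a file to `gnugo --mode gtp` would exercise the engine on positions from real games against varied opponents, giving a more representative profile than self play.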
>
> Also we see that memory is indeed becoming more of a bottleneck (no
> surprise, of course). This is also an AMD CPU, but a K6-2 at 400 MHz
> instead of Gunnar's Athlon (1.6 GHz?):
>
> Each sample counts as 0.01 seconds.
>   %    cumulative    self               self    total
>  time    seconds    seconds     calls  s/call   s/call  name
>  10.88    111.74     111.74 136937536    0.00     0.00  scan_for_patterns
>   5.53    168.53      56.79 100769412    0.00     0.00  check_pattern_light
>   4.65    216.32      47.79    174590    0.00     0.00  compute_primary_domains
>   3.44    251.63      35.31     94773    0.00     0.00  do_push_owl
>   3.18    284.33      32.70  18600234    0.00     0.00  order_moves
>   3.17    316.87      32.54  63926889    0.00     0.00  fastlib
>   2.47    342.19      25.32  29608725    0.00     0.00  do_play_move
>   2.28    365.56      23.37   2783429    0.00     0.00  search_persistent_reading_cache
>   2.23    388.43      22.87    156910    0.00     0.00  accumulate_influence
>   2.18    410.77      22.34  17404893    0.00     0.00  assimilate_string
> (...)
>
> That even makes me wonder whether the DFA is still a big speedup at all
> on CPUs as fast as Gunnar's.
>
> > But one new thought... do_push_owl() just pushes state ? But looking
> > at the structure, there is a lot of state to remember.
> >
> > The other owl functions don't appear anywhere significant in the profile.
> Well, compute_primary_domains is an owl function.
>
> > So what I'm thinking is : maybe the balance between saving state
> > and recalculating state is too far towards saving state. Maybe
> > if we saved less state, or spent more effort saving only what we need,
> > or made the owl stuff work slightly harder to get hold of its data,
> > then we'd get an overall speedup by making do_push_owl faster ?
>
> I recently realized that pushing the eye state (which is the biggest
> part of sizeof(local_owl_data)) is unnecessary, since it is completely
> recalculated from scratch anyway; I just haven't gotten around to doing
> anything about it yet.
>
> However, I wouldn't be sure that this would be a big speedup. Note that
> we cannot overwrite the current state (we still need the information for
> selecting the next move to try when we return from deeper recursion
> levels). A side effect of do_push_owl might be that the memory segment
> to which we copy the data is then in the CPU cache, where we need it
> anyway.
>
> (Sometime around 3.1.2x I killed pop_owl, which unnecessarily did the
> same amount of copying as do_push_owl is doing now. It turned out to be
> only a <1% speedup, although pop_owl had some 3-4% CPU time before.)
Sounds like what we have is a memory allocation issue -- i.e., most of the
time spent in that function is really spent on allocating memory.
Therefore, if we know at runtime how many owl_nodes we could possibly
have, would it make sense to allocate that many initially and then keep
track of them ourselves, instead of using malloc() and free()? Just a
thought; feel free to ignore it if the fact that I haven't looked at the
code means I have no clue ;)
> Instead, one could substantially reduce sizeof(local_owl_data) (and thus
> both speed up do_push_owl and gain overall cache benefits) by
> - maintaining the eyes in a separate list, i.e. not keeping a copy of
>   the eye information at every recursion level
> - further compressing the eye data (e.g. the 3-bit eye color
>   information doesn't need a whole "int", etc.)
Sounds good. We might want to make things #define-able, though, because in
some cases I think it is faster to move around and do arithmetic on
full-size values. Does this sound at all on target?
Thanks
Evan Daniel