Re: [Bug-gnubg] Windows development/test build released
From: Philippe Michel
Subject: Re: [Bug-gnubg] Windows development/test build released
Date: Sun, 17 Mar 2013 15:57:10 +0100 (CET)
User-agent: Alpine 2.00 (BSF 1167 2008-08-23)
On Sat, 16 Mar 2013, Neural Gnat wrote:
> I've just re-analysed a 1000-game money session that I did about a week ago
> with 2012's World Class versus Casual. This new test version has found 1839
> doubtful moves, 304 bad moves and 247 very bad moves, knocking the mainstream
> version down from Supernatural to World Class (-4.0).
This is a surprisingly large difference. I would expect the new version to
be better by about 1 (gnubg style error rate) or 0.5 (Snowie ER / XG PR).
On the other hand, if old has, say, an error rate of 4 vs. perfect play
and new has 3 due to different mistakes, they may well be 4 away from each
other.
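As a toy illustration of that last point (a sketch only, not how gnubg computes anything): assume every wrong decision costs the same hypothetical 0.5 error-rate units, and that the new version charges the old one for each position where the two choose differently. The names `LOSS` and `measured_gap` are made up for this example.

```python
LOSS = 0.5  # hypothetical cost of one wrong decision, in error-rate units


def measured_gap(old_wrong, new_wrong, loss=LOSS):
    """Error the new version would charge the old one in this toy model.

    Positions where both versions are wrong, or both are right, show no
    disagreement; only the symmetric difference is counted.
    """
    return loss * len(old_wrong ^ new_wrong)


old_wrong = set(range(0, 8))  # 8 mistakes: 4.0 vs. perfect play

# 6 mistakes each (3.0 vs. perfect play), with varying overlap with the old set.
same_mistakes = set(range(0, 6))    # all shared with old
some_shared = set(range(5, 11))     # 3 shared with old
none_shared = set(range(8, 14))     # none shared with old

for label, new_wrong in [("all shared", same_mistakes),
                         ("3 shared", some_shared),
                         ("none shared", none_shared)]:
    print(label, measured_gap(old_wrong, new_wrong))
```

Under these assumptions the measured gap ranges from 1.0 (new's mistakes are a subset of old's) to 7.0 (no mistakes in common), so a reading of 4 is compatible with true error rates of 4 and 3.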
> The question is, how do you determine which of those opinions are correct?
> Dare I mention XG? ;o)
Roll out the disagreements. Doing all of them would take time, of course,
but a few games' worth, or just the largest ones, should give some idea of
what is happening. Analysing these with XGR++ instead could be a reasonable
shortcut and would allow you to look at more of them in a given time.
> Another question is, how do I get these two versions playing each other? I
> tried the "socket" players a few years ago but, with no instructions, no
> result and no feedback from GnuBg, I soon gave up.
I don't know if the file is shipped in gnubg's Windows installation, but
the comments at the start of matchseries.py here should help:
http://cvs.savannah.gnu.org/viewvc/gnubg/gnubg/scripts/
But don't expect to play the two versions against each other and get
anything better than an anecdotal gross result. You would need a session
*much* longer than 1000 games for a statistically significant result, and
stock gnubg isn't suited to this. It keeps the whole session in memory and
would likely get slower and slower and crash at some point.
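To give a rough feel for "much longer", here is a back-of-the-envelope sketch using the usual normal approximation for independent games. The per-game standard deviation (3 points) and the edge (0.05 points per game) are placeholder assumptions chosen for illustration, not measured gnubg figures; `games_needed` is a made-up helper name.

```python
import math


def games_needed(edge, sigma, z=1.96):
    """Games required for an average edge to stand out from noise.

    edge  -- assumed true advantage, in points per game
    sigma -- assumed standard deviation of a single game's result, in points
    z     -- normal quantile (1.96 is roughly a two-sided 95% level)

    Solves z * sigma / sqrt(n) <= edge for n, assuming independent games.
    """
    return math.ceil((z * sigma / edge) ** 2)


# Placeholder inputs, not measurements.
print(games_needed(edge=0.05, sigma=3.0))
print(games_needed(edge=0.025, sigma=3.0))  # half the edge: about 4x the games
```

With these placeholder inputs the first call gives a figure well above ten thousand games, and the required length grows with the inverse square of the edge, which is why a small strength difference needs such a long session.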
You could still analyse the short session with XG at the highest level of
luck analysis you can afford and get a useful variance-reduced result from
a neutral third party.