gnugo-devel

[gnugo-devel] twin endgame match


From: alain Baeckeroot
Subject: [gnugo-devel] twin endgame match
Date: Fri, 3 Mar 2006 14:38:29 +0100
User-agent: KMail/1.9.1

Hi

Following Arend's advice, gg378 and twin-378 played an 85-game endgame match:
- twin: 26 wins (1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 3 5 7 10 14 15 21 25 28)
- GNU Go: 14 wins (-9 -3 -3 -3 -2 -2 -2 -1 -1 -1 -1 -1 -1 -1)
- 45 unchanged
The sum is +135, and the average over 85 games is +1.6.
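As a cross-check of the tally above, here is a minimal Python sketch that recomputes the game count, the total, and the per-game average from the margins exactly as they are listed in this message. The variable names are illustrative only, and the sign convention (positive = points gained by the twin) is an assumption read off the list. Note that the margins as typed add up to +117 (about +1.4 per game) rather than the quoted +135, so either a few margins were mistyped or the quoted sum came from the raw results rather than from this list.

```python
# Margins as listed in the message.
# Assumed convention: positive = twin gained points, negative = GNU Go gained.
twin_wins = [1] * 14 + [2, 2, 2, 3, 5, 7, 10, 14, 15, 21, 25, 28]
gnugo_wins = [-9, -3, -3, -3, -2, -2, -2] + [-1] * 7
unchanged = 45  # games with identical results

games = len(twin_wins) + len(gnugo_wins) + unchanged
total = sum(twin_wins) + sum(gnugo_wins)
average = total / games

print(games, total, round(average, 2))  # prints: 85 117 1.38
```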

_but_ when one looks at the attached plot of cumulative +PASS -FAIL versus 
game_status, the twin fails a lot of endgame tests (game_status>0.85). It is 
already a huge task to check the big failures, and I feel too lazy to 
investigate these 40 tests and more than 50 endgame regressions (and I am a 
very bad yose player ;-) 

By construction, the twin "knows" exactly how gg378 evaluates the game, so 
the twin may steal a big point before gg378 plays it, but it is still 
gnugo logic. So I wonder whether this endgame match is significant or whether 
it just reflects a systematic error.
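One part of the "is it significant" question can be checked directly: whether a 26 to 14 split could plausibly be chance alone. The sketch below is a plain two-sided sign test over the 40 decided games, ignoring the 45 unchanged ones; the function name is made up for illustration. It says nothing about the systematic bias discussed above, since a test like this cannot tell whether the twin wins only because it shares gg378's evaluation.

```python
from math import comb


def sign_test_p(wins_a: int, wins_b: int) -> float:
    """Two-sided exact sign test p-value; tied games are left out."""
    n = wins_a + wins_b
    k = max(wins_a, wins_b)
    # Probability of k or more heads in n fair coin flips.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double the tail for a two-sided test, capped at 1.
    return min(1.0, 2 * tail)


p_value = sign_test_p(26, 14)
print(f"two-sided p = {p_value:.3f}")
```

A small p-value here would only mean the win/loss split is unlikely under a fair coin; the sizes of the margins and the shared-evaluation issue would still need a separate look.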

In other words, a reliable endgame comparison would need another engine, 
one that is good at endgame, and would compare the results of both against 
the reference engine.

Am I right, or just paranoid?
Is such an engine available?

- Alain

PS: the plot includes all board sizes. It is not so flat when they are 
separated, but I cleaned up too much and erased the results, so ... I am 
re-running the regression tests :(

Attachment: twin4-d1.5_cumul+P-F_vs_gstatus.png
Description: PNG image

