RE: [Bug-gnubg] Suggestion for change of rates/ratings


From: Ian Shaw
Subject: RE: [Bug-gnubg] Suggestion for change of rates/ratings
Date: Thu, 30 Mar 2006 17:03:23 +0100

 

> -----Original Message-----
> From: Christian Anthon on 30 March 2006 15:10
> 
> It is well known that gnubg judges players more harshly than Snowie.
> Snowie error rates seem to be the de facto standard when judging 
> players. To see how gnubg and Snowie error rates compare, I 
> tried downloading 25 or so matches from Hardye's site, all 
> played between more or less world-class players and most of 
> them quite long. They were analysed using gnubg 2ply/2ply.
> 
> To summarise:
> 
> snowie_m  snowie_c   snowie  gnubg_m  gnubg_c   gnubg
>     3.91      1.66     5.58     9.36    21.51   11.25
> 
> So given these numbers I suggest that we adopt Snowie's 
> rate/rating groupings, but that we divide the gnubg 
> move-error and total-error rates by 2 and the cube error 
> rate by 4. That way it will be easy for people to understand 
> the numbers and to compare an intermediate Snowie player to 
> an intermediate gnubg player. Of course the move ratings and 
> the total ratings of gnubg will not be directly comparable. 
> But it would complicate things unnecessarily to make it any different.

If we think that it's desirable to make gnubg's ratings comparable to
Snowie's, then this isn't the right way to do it.
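
For reference, here is a small sketch of what the proposed divisors would
do to the summary figures quoted above. The dictionary keys and variable
names are mine, purely for illustration; nothing here is gnubg code.

```python
# Summary figures from the quoted message (averages over the analysed matches).
snowie = {"moves": 3.91, "cube": 1.66, "total": 5.58}
gnubg = {"moves": 9.36, "cube": 21.51, "total": 11.25}

# Divisors proposed in the quoted message: 2 for moves and total, 4 for cube.
divisors = {"moves": 2, "cube": 4, "total": 2}

# Apply each divisor to the matching gnubg figure.
scaled = {key: gnubg[key] / divisors[key] for key in gnubg}

for key in ("moves", "cube", "total"):
    print(f"{key:>5}: gnubg scaled {scaled[key]:.2f}  vs  snowie {snowie[key]:.2f}")
```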

The chequer play discrepancy arises because gnubg doesn't include forced
moves in the calculation, whereas Snowie includes all moves.

The cube play discrepancy arises because gnubg only includes actual or
(arbitrarily) close decisions in the calculation, whereas Snowie
includes all cube decisions.
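
A minimal sketch of how the choice of denominator alone moves the rate,
assuming a simple "total error divided by decisions counted" formula. The
Decision class, its fields, and the numbers are hypothetical and are not
taken from either bot's code; the real normalisation in each program may
differ in other details.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    error: float         # error on this decision, in arbitrary units (0.0 if best play)
    real_decision: bool  # False for a forced move or a cube decision that is not close


def error_rate(decisions, count_all):
    """Total error divided by the number of decisions counted.

    count_all=True  counts every decision (the Snowie-style denominator).
    count_all=False counts only real decisions (the gnubg-style denominator).
    """
    counted = [d for d in decisions if count_all or d.real_decision]
    if not counted:
        return 0.0
    return sum(d.error for d in counted) / len(counted)


# Hypothetical sample: two real decisions with errors, two forced moves.
sample = [
    Decision(20.0, True),
    Decision(0.0, False),
    Decision(0.0, False),
    Decision(10.0, True),
]

rate_all = error_rate(sample, count_all=True)         # 30.0 / 4 decisions
rate_real_only = error_rate(sample, count_all=False)  # 30.0 / 2 decisions
print(rate_all, rate_real_only)
```

The total error is the same in both cases; only the number of decisions it
is spread over changes, so the rate that skips forced moves comes out higher.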

But we already have an "equivalent Snowie rate" in the stats, which I
presume is calculated exactly the Snowie way. People can look at that if
they want to, and I do. The only difference should be the discrepancies
in the bots' analyses. (Aside: I read recently that Snowie has changed
to include non-contact chequer plays in its stats; I presume they've
improved their previously lousy bear-in play. What does the gnubg
Snowie implementation do?)

Gnubg calculates errors differently because the discussions on this list
concluded that forced moves don't count as decisions at all, just like
you can't make a cube error if the opponent holds the cube. This makes
more sense to me.

I would have more sympathy with making the cube error algorithm
identical to Snowie's; I can't recall the arguments in favour of the
current implementation. 

A case could be made that it is better to standardize across bots, and
Snowie is the de facto standard, even if it is less than ideal. Betamax,
anybody? I don't think that there is a mathematically right or wrong
way.

I wonder what BgBlitz does? And what Zbot will do? Perhaps we could even
persuade the next version of Snowie to go the gnubg way!

-- Ian



