[Bug-gnubg] more on bearoff databases
From: Joern Thyssen
Subject: [Bug-gnubg] more on bearoff databases
Date: Sun, 27 Oct 2002 10:06:58 +0000
User-agent: Mutt/1.4i
Hi all,
I'm in the process of generating one-sided bearoff databases for gnubg.
The generated bearoff databases contain the probabilities for bearing
off and the probabilities for bearing off more than one chequer (used
for calculating gammon probs).
The main problem is the size. My latest version stores only the
non-zero elements. However, for easy retrieval of the probabilities I
store an index at the head of the file; each index record is:
  position in file (4 bytes)
  + number of non-zero probs (1 byte) + index into array (1 byte)
  + number of non-zero gammon probs (1 byte) + index into gammon array (1 byte)
so the size of the file is
  #positions * 8 bytes + the number of non-zero elements
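As a rough C sketch of such an index record (the field names here are
made up for illustration, not the actual gnubg identifiers; the fields
would be written to disk one by one, so struct padding is not an issue):

    /* hypothetical 8-byte index record, matching the layout above */
    typedef struct _bearoffindex {
        unsigned int  nOffset;         /* position of the data in the file (4 bytes) */
        unsigned char nNonZero;        /* number of non-zero bear-off probs (1 byte) */
        unsigned char iNonZero;        /* index into the prob array, presumably the
                                          first non-zero entry (1 byte) */
        unsigned char nNonZeroGammon;  /* number of non-zero gammon probs (1 byte) */
        unsigned char iNonZeroGammon;  /* index into the gammon array (1 byte) */
    } bearoffindex;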
The results so far are:
 #pts   #positions     size (unc)     size (comp)   size (comp, bzip2 -9)
    6       54,264      6,945,792       1,467,518        959,264
    7      170,544     21,829,632       4,949,046      3,156,522
    8      490,314     62,760,192      15,264,056      9,643,342
    9    1,307,504    167,360,512      44,096,390     27,775,598
   10    3,268,760    418,401,280     118,913,118     73,642,861
   11    7,726,160    988,948,480     in progress
The column "size (unc)" is the uncompressed size: #positions * 2 * 32 * 2
bytes. The column "size (comp)" is the size when only the non-zero
elements are stored, plus the index. I also tried to bzip2 -9 the file;
the size of the resulting file is shown in the last column.
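For reference, the #positions column is just the number of ways to place
0-15 chequers on n points, i.e. the binomial coefficient C(n+15,15), and
each position takes 2 * 32 * 2 = 128 bytes uncompressed. A small
stand-alone snippet (illustrative only) that reproduces both columns:

    #include <stdio.h>

    /* number of bearoff positions with 0-15 chequers on nPoints points:
       the binomial coefficient C(nPoints + 15, 15) */
    static unsigned long long NumPositions( int nPoints ) {
        unsigned long long n = 1;
        int i;
        for ( i = 1; i <= nPoints; ++i )
            n = n * ( 15 + i ) / i;   /* exact: each step is itself a binomial coefficient */
        return n;
    }

    int main( void ) {
        int i;
        for ( i = 6; i <= 14; ++i )
            printf( "%2d pts: %10llu positions, %13llu bytes uncompressed\n",
                    i, NumPositions( i ), NumPositions( i ) * 128 );
        return 0;
    }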
In general, storing only the non-zero elements gives a saving of a
factor of about 4. Additional compression with bzip2 -9 gives at most
another factor of 2, so it appears that the entropy of the resulting
compressed files is already quite high. If I bzip2 -9 the uncompressed
418,401,280 byte 10-pt file I get a file of 65,771,118 bytes, that is,
only slightly smaller than bzip2'ing the comp. file.
Extrapolating from the table above we get:
 #pts   #positions      size (unc)    estimated size (comp)
   11    7,726,160      988,948,480      250 MB
   12   17,383,860    2,225,134,080      550 MB
   13   37,442,160    4,792,596,480     1200 MB
   14   77,558,760    9,927,521,280     2500 MB
So the practical limit appears to be the 12-pt or perhaps the 13-pt
database. If we want to go further we have to do something else:
1 eliminate "ridiculous" positions, e.g., 15 chequers on the 13 point.
For example, we may opt to store only positions with fewer than, say,
10 chequers on each point. However, I did some simulations, and in
general we can't save anything here. For example, for the 13-pt
database there are only 30k positions with more than 10 chequers on a
point, compared to the total of 37M positions. The same holds for the
other dbs. So this idea is DEAD!
2 approximate the roll distribution with something that requires
fewer parameters, e.g., a normal distribution (see the sketch after
this list). Any comments on this?
3 settle for the 13 point database and combine it with one-sided
rollouts.
4 think of another compression scheme.
(2) and (3) are probably the most promising ones.
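For (2), here is a minimal sketch of what the reconstruction could look
like, assuming we store only the mean and standard deviation of each
roll distribution and rebuild the 32 probabilities from a normal
approximation (the names and the exact binning are just illustration,
not a worked-out design):

    #include <math.h>

    /* rebuild a 32-entry roll distribution from its mean and standard
       deviation, assuming it is roughly normal: entry i gets the mass
       of N(mean, sigma^2) between i - 0.5 and i + 0.5 */
    static void ReconstructDist( float arProb[ 32 ], float mean, float sigma ) {
        float cdfLow = 0.0f, cdfHigh;
        int i;

        for ( i = 0; i < 32; ++i ) {
            cdfHigh = 0.5f * ( 1.0f + erff( ( i + 0.5f - mean ) /
                                            ( sigma * sqrtf( 2.0f ) ) ) );
            arProb[ i ] = cdfHigh - cdfLow;
            cdfLow = cdfHigh;
        }
    }

Storing just two numbers per distribution instead of up to 32 non-zero
entries would shrink the data part to a few bytes per position; the
question is whether the normal approximation is accurate enough, in
particular in the tails that matter for the gammon probs.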
Jørn
--
Joern Thyssen, PhD
Vendsysselgade 3, 3., DK-9000 Aalborg, Denmark
+45 9813 2791 (private) / +45 2818 0183 (mobile) / +45 9633 7036 (work)
Note: new mobile number!