emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

GC: cons sweeping and cons block size


From: Dmitry Antipov
Subject: GC: cons sweeping and cons block size
Date: Tue, 03 Jul 2007 19:16:11 +0400
User-agent: Thunderbird 1.5.0.7 (X11/20061008)

Here is another observation I've made around my previous stuff.

For the current cons sweeping method, the size of cons block (i.e. number of
conses per 'cons_block') is orthogonal to sweeping speed since the process
will scan over each bit of each cons block anyway. But this is not true for
the method I'm proposing.

Taking a convenient 32-bit hardware, consider 1K and 4K cons blocks. In the
first case, one cons block has room for 125 conses, and sweeping process for
a full block will do 3 fast and 1 slow bitmap scan. For 100000 conses, we will
allocate 800 cons blocks, and (assuming that all conses are marked) sweeping
will do 2400 fast and 800 slow scans. For the second case, one cons block has
room for ~500 conses, and sweeping process for a full block will do 15 fast and
1 slow bitmap scan. Again, for 100000 conses, we will allocate ~200 blocks and
sweeping will do ~3000 fast and ~200 slow scans.

Well-skilled eye will pay attention on the fact that non-full larger block
tends to be more fragmented than the smaller one, and this fragmentation
really reduces the fast/slow scans ratio for a larger block. This is true. But,
since 50% - 80% of a cons blocks are usually full and fast/slow scans ratio is
much better for a larger block, the potential slowdown for a non-full block is
expiated by the substantial speedup for the full one.

As usual, nothing is ideal. Since cons block must be aligned, malloc()'ing
1K-aligned 1K-block is cheaper than 4K-aligned 4K-block, and the latter 
allocation
probably tends to a higher heap fragmentation. On the other side, since malloc()
needs to maintain the number of blocks which is 4 times less, it's internal 
space
overhead is reduced.

Practically, here is a typical cons sweeping times for '(replace-string "a" 
"__A__")'
in src/ChangeLog, depends on the cons block size:

GC     1k       2k       4k       8k      16k
---------------------------------------------
 0    503      470      477      452      675
 1    581      490      488      448      428
 2    674      572      574      523      500
 3    538      416      418      352      362
 4    514      404      602      361      349
 5    597      466      448      408      353
 6    653      529      643      445      410
 7    697      554      518      458      443
 8    749      586      548      480      465
 9    892      617      564      500      482
 10   821      626      578      492      489
 11   857      644      581      510      495
 12   874      656      595      500      487
 13   906      679      608      501      481
 14   936      705      612      513      500
 15   986      737      648      537      549
 16   1040     761      674      552      539
 17   1106     885      712      671      564
 18   1214     848      746      711      586
 19   1239     910      783      652      616
 20   1317     937      833      684      657
 21   1381     992      875      710      681
 22   1458     1043     902      783      709
 23   1527     1095     960      790      752
 24   1615     1158     1021     838      792
 25   1800     1244     1077     892      844
 26   1822     1294     1145     944      888
 27   1921     1383     1201     987      939
 28   2033     1455     1259     1166     981
 29   2144     1626     1312     1089     1042
 30   2224     1578     1373     1219     1086
 31   2357     1696     1468     1221     1158
 32   2493     1796     1569     1301     1233
 33   3127     1860     1726     1338     1302
 34   2767     2000     1724     1446     1354
 35   2885     2113     1825     1599     1439

This table shows an effect described above quite well. Although the possible 
heap
fragmentation penalty caused by larger blocks needs to be investigated closer,
I believe that 4K cons blocks will be a better choice for the proposed sweeping
method than 1K ones.

Dmitry




reply via email to

[Prev in Thread] Current Thread [Next in Thread]