bug-guix

bug#24937: "deleting unused links" GC phase is too slow


From: Ludovic Courtès
Subject: bug#24937: "deleting unused links" GC phase is too slow
Date: Sat, 13 Nov 2021 17:56:52 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> I haven't done any analysis, just grabbed the result, but here is what
> it looks like for me:

A bit more than 35% of the deduplicated files are < 1KiB, and there is
not much to be gained by deduplicating them.

On IRC several people shared the results on their machine; several had
similar results, and one person had a lot more of those small files (50%
of deduplicated files were < 1KiB).

The chart (with a kinda bogus layout) below is perhaps more interesting:
it shows the contribution of files below a certain size to the overall
space savings.

[PNG image: "Contribution to space savings", contribution (%) vs. size (B)]

In a nutshell:

  • Files < 1KiB account for 0.3% of the space savings;

  • Files < 4KiB account for 2.5% of the space savings;

  • Files < 256KiB account for 42% of the space savings.

You can create this plot with:

--8<---------------cut here---------------start------------->8---
(make-scatter-plot #:title "Contribution to space savings"
                   #:write-to-png "/tmp/space-saving-contribution.png"
                   #:chart-width 1000
                   #:y-axis-label "contribution (%)"
                   #:x-axis-label "size (B)"
                   #:log-x-base 2
                   #:min-x 513
                   #:data
                   (let ((total (saved-space l)))
                     `(("contribution"
                        ,@(map (lambda (size)
                                 (cons size
                                       (/ (saved-space (filter (lambda (file)
                                                                 (< (deduplicated-file-size file)
                                                                    size))
                                                               l))
                                          total .01)))
                               (map (cut expt 2 <>)
                                    (iota 12 10 1)))))))
--8<---------------cut here---------------end--------------->8---

You can also compute individual points like this:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> (/ (saved-space (filter (lambda (file)
                                               (< (deduplicated-file-size file) 1024))
                                             l))
                        (saved-space l) 1.)
$60 = 0.0034284626558736746
scheme@(guile-user)> (/ (saved-space (filter (lambda (file)
                                               (< (deduplicated-file-size file) 4096))
                                             l))
                        (saved-space l) 1.)
$62 = 0.025190871178467848
scheme@(guile-user)> (/ (saved-space (filter (lambda (file)
                                               (< (deduplicated-file-size file) (expt 2 18)))
                                             l))
                        (saved-space l) 1.)
$65 = 0.42411104869782185
--8<---------------cut here---------------end--------------->8---

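Choosing a deduplication threshold of 2KiB or 4KiB (that is, no longer
deduplicating files below that size) would have a negligible impact on
disk usage on my machine: at 4KiB, only about 2.5% of the space savings
would be lost.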
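
For anyone who wants to try the threshold estimate without the rest of
the session, here is a minimal self-contained sketch.  The record type,
the ‘links’ field, the "(copies - 1) × size" savings model, the
‘savings-lost’ helper and the sample data are all assumptions made for
illustration; the actual ‘saved-space’ and ‘deduplicated-file-size’
used above may well be defined differently.

```scheme
;; Hypothetical stand-ins: the real `saved-space' and
;; `deduplicated-file-size' used above may be defined differently.
(use-modules (srfi srfi-1)    ; fold, filter
             (srfi srfi-9))   ; define-record-type

(define-record-type <deduplicated-file>
  (deduplicated-file size links)
  deduplicated-file?
  (size  deduplicated-file-size)    ; file size in bytes
  (links deduplicated-file-links))  ; assumed: number of identical copies

;; Assumed model: N identical copies save (N - 1) × size bytes.
(define (saved-space files)
  (fold (lambda (file total)
          (+ total
             (* (deduplicated-file-size file)
                (- (deduplicated-file-links file) 1))))
        0
        files))

;; Fraction of the savings that would be lost if files smaller than
;; THRESHOLD bytes were no longer deduplicated.
(define (savings-lost files threshold)
  (/ (saved-space (filter (lambda (file)
                            (< (deduplicated-file-size file) threshold))
                          files))
     (saved-space files)
     1.))

;; Made-up sample data, for illustration only.
(define sample
  (list (deduplicated-file 512 3)
        (deduplicated-file 2048 2)
        (deduplicated-file 1048576 2)))

(display (savings-lost sample 4096))
(newline)
```

Calling ‘savings-lost’ with 2048 and 4096 on a real list of
deduplicated files would give the two candidate thresholds side by
side.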

Thanks,
Ludo’.
