guix-devel

Re: distributed substitutes: file slicing


From: Attila Lendvai
Subject: Re: distributed substitutes: file slicing
Date: Wed, 21 Jun 2023 14:32:33 +0000

> I have a question / suggestion about the distributed substitutes
> project: would downloads be split into uniformly sized chunks or could
> the sizes vary?
> Specifically, in an extreme case where an update introduced a single
> extra byte at the beginning of a file, would that result in completely
> new chunks?


most (all?) distributed storage solutions have a chunker (including ERIS with 
its 32k chunks, or Swarm with 4k chunks), and the chunks are content-addressed, 
i.e. chunking also gives you deduplication at chunk granularity.

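if the file doesn't just grow at the end, but a couple of bytes get inserted 
or removed somewhere in the middle, then everything after that point shifts 
relative to the chunk boundaries, and chunk-level deduplication stops working 
from that point on. so yes, a single extra byte at the beginning of a file 
would result in completely new chunks.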
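
to illustrate, here is a minimal sketch of fixed-size chunking with 
content-addressed chunks. it is not the chunker of ERIS or Swarm; the 4096 
byte chunk size, the use of SHA-256, and the helper names are assumptions 
made only for this example.

```python
import hashlib

def fixed_chunks(data: bytes, size: int = 4096) -> list[str]:
    """Split data into fixed-size chunks and return each chunk's SHA-256 hex digest."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]

def shared(old: bytes, new: bytes, size: int = 4096) -> int:
    """Count the chunks of `new` whose hash already exists among the chunks of `old`."""
    known = set(fixed_chunks(old, size))
    return sum(1 for h in fixed_chunks(new, size) if h in known)

# Deterministic, non-repeating sample data: 2048 * 32 bytes = 16 chunks of 4096 bytes.
original = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(2048))

appended = original + b"x"                 # the file just grows at the end
cut = 8 * 4096 + 10                        # a position inside the ninth chunk
middle = original[:cut] + b"x" + original[cut:]
front = b"x" + original                    # one extra byte at the very beginning

print(shared(original, appended))  # all 16 old chunks are reused
print(shared(original, middle))    # only the 8 chunks before the insertion are reused
print(shared(original, front))     # nothing is reused
```

with this sample data the three cases reuse 16, 8 and 0 chunks respectively.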

IIRC rar was the first archiver that introduced a very fast deduplication 
algorithm that detected even non-aligned duplicate blocks of varying sizes. 
i don't think any distributed storage system has anything like that.
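
for comparison, one generic way to find duplicates that are not aligned to 
fixed offsets is content-defined chunking: chunk boundaries are placed where 
a rolling hash of the most recent bytes matches a pattern, so they move 
together with the content. the sketch below is not rar's algorithm and not 
something taken from any of the systems mentioned above; the gear table, 
the 32-byte effective window, and the `bits` parameter are assumptions for 
illustration only.

```python
import hashlib

# Hypothetical gear table: one pseudo-random 32-bit value per possible byte value.
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:4], "big")
        for b in range(256)]

def cdc_chunks(data: bytes, bits: int = 10) -> list[bytes]:
    """Cut after every byte where the top `bits` bits of the rolling hash are zero.

    The hash is kept to 32 bits and shifted left by one per byte, so it only
    depends on the last 32 bytes seen; boundaries follow the content, not the offset.
    """
    mask = ((1 << bits) - 1) << (32 - bits)
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h & mask == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks

def digests(chunks: list[bytes]) -> list[str]:
    """Content addresses (SHA-256 hex digests) of the given chunks."""
    return [hashlib.sha256(c).hexdigest() for c in chunks]

# Deterministic sample data (64 KiB) and the same data shifted by one byte.
sample = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(2048))
shifted = b"x" + sample

old_ids = set(digests(cdc_chunks(sample)))
new_ids = digests(cdc_chunks(shifted))
reused = sum(1 for d in new_ids if d in old_ids)
print(f"{reused} of {len(new_ids)} chunks reused after inserting one byte")
```

with boundaries chosen this way, only the chunk(s) around the inserted byte 
change; the chunks after the next boundary come out identical, which is the 
property that fixed-size chunking lacks. a real chunker would also enforce 
minimum and maximum chunk sizes.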


> An alternative I've been thinking about is this:
> find the store references in a file and split it along these references,
> optionally apply further chunking to the non-reference blobs.


chunking storage systems store only whole chunks, so every piece gets rounded 
up to a whole number of chunks, and splitting files into too many pieces can 
increase the wasted storage. more so with large chunks, less so with smaller 
ones.
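
a rough back-of-the-envelope sketch of that effect, assuming each piece is 
stored on its own and padded up to a whole number of chunks (metadata and 
any index structures are ignored; the file and piece sizes are made up):

```python
def stored_size(piece_sizes: list[int], chunk_size: int) -> int:
    """Bytes occupied when every piece is padded up to a whole number of chunks."""
    return sum(-(-n // chunk_size) * chunk_size for n in piece_sizes)

whole = [100_000]          # a 100,000 byte file stored as one piece
split = [10_000] * 10      # the same file split into ten pieces

for chunk_size in (32 * 1024, 4 * 1024):
    for name, pieces in (("whole", whole), ("split", split)):
        used = stored_size(pieces, chunk_size)
        print(chunk_size, name, used, "wasted:", used - sum(pieces))
```

with 32k chunks the waste goes from 31,072 bytes (whole) to 227,680 bytes 
(split); with 4k chunks it goes from 2,400 to 22,880 bytes.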


> It's probably best to do this at the NAR level??
> 
> Storing reference offsets is already something that we should be doing to
> speed other operations up, so this could tie in nicely with that.


if optimizing grafting is worth this amount of trouble, then maybe the best 
option is to extend the NAR format to store the mutable references in a 
separate table at the end of the file. that would speed up guix operations 
like grafting, and help any storage system that has deduplication, which 
includes some copy-on-write filesystems.
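
a minimal sketch of what such a table could contain, and how it could be 
used to rewrite references without rescanning the file. this is not the 
NAR format and not guix's grafting code; `reference_table` and `rewrite` 
are hypothetical helpers, and the only thing taken from guix is the 
`/gnu/store/<32 character hash>-name` shape of store file names.

```python
import re

# A store reference: "/gnu/store/" + 32 characters of the nix-base32 alphabet + "-".
STORE_REF = re.compile(rb"/gnu/store/([0-9a-df-np-sv-z]{32})-")

def reference_table(data: bytes) -> list[tuple[int, bytes]]:
    """Scan once and record (offset of the hash part, hash) for every reference."""
    return [(m.start(1), m.group(1)) for m in STORE_REF.finditer(data)]

def rewrite(data: bytes, table: list[tuple[int, bytes]],
            replacements: dict[bytes, bytes]) -> bytes:
    """Replace hashes in place using the recorded offsets; lengths must match."""
    out = bytearray(data)
    for offset, old in table:
        new = replacements.get(old)
        if new is not None:
            assert len(new) == len(old)
            out[offset:offset + len(old)] = new
    return bytes(out)

old_hash = b"a" * 32
new_hash = b"b" * 32
blob = b"#!/gnu/store/" + old_hash + b"-bash/bin/sh\n"

table = reference_table(blob)      # computed once, stored next to the content
grafted = rewrite(blob, table, {old_hash: new_hash})
print(table[0][0], grafted)
```

because the replacement has the same length as the original hash, only the 
bytes at the recorded offsets change, so a deduplicating store would need to 
re-store only the chunks that contain a reference.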

-- 
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“If you shut up truth and bury it under the ground, it will but grow, and 
gather to itself such explosive power that the day it bursts through it will 
knock down everything that stands in its way.”
        — Émile Zola (1840–1902)



