Re: Encoding for Robust Immutable Storage (ERIS) and Guile

From: pukkamustard
Subject: Re: Encoding for Robust Immutable Storage (ERIS) and Guile
Date: Thu, 10 Dec 2020 09:27:02 +0100
User-agent: mu4e 1.4.13; emacs 27.1

Hi Ludo,

Block size is fixed; did you consider content-defined block boundaries
and such?  Perhaps it doesn’t bring much though.

I looked into content-defined block boundaries using a "sliding hash"
(recompute a short hash for every byte read and place a boundary whenever
the hash is zero). This would allow a higher degree of de-duplication,
but I found it a bit "finicky" (and I was too impatient to tune and
tweak it :).
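
To make the "sliding hash" idea concrete, here is a minimal sketch. It is not part of ERIS; `window-size`, `boundary-mask` and the hash function are hypothetical choices made up for illustration. It recomputes a small hash over the last few bytes at every position and places a boundary when the low bits of the hash are zero. A real implementation would probably use a rolling hash that is updated incrementally rather than recomputed.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical tuning knobs, not taken from ERIS.
(define window-size 16)       ; bytes covered by the hash
(define boundary-mask #x3f)   ; boundary when the low 6 bits are zero

;; Hash of the (at most) WINDOW-SIZE bytes of BV ending at index I.
(define (window-hash bv i)
  (let loop ((j (max 0 (- i (- window-size 1)))) (h 0))
    (if (> j i)
        h
        (loop (+ j 1)
              (logand (+ (* h 31) (bytevector-u8-ref bv j))
                      #xffffffff)))))

;; Return the list of chunk end offsets (exclusive) for BV.
;; The end of the input always closes the last chunk.
(define (chunk-boundaries bv)
  (let ((len (bytevector-length bv)))
    (let loop ((i 0) (acc '()))
      (cond ((= i len)
             (reverse (if (or (zero? len)
                              (and (pair? acc) (= (car acc) len)))
                          acc
                          (cons len acc))))
            ((zero? (logand (window-hash bv i) boundary-mask))
             (loop (+ i 1) (cons (+ i 1) acc)))
            (else
             (loop (+ i 1) acc))))))
```

Because each boundary depends only on the preceding `window-size` bytes, inserting a byte at the start of the input shifts the later boundaries by one instead of changing every block, which is where the extra de-duplication comes from. The "finicky" part is picking the window, the mask and minimum/maximum chunk sizes.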

I settled on fixed block sizes, making the encoding faster and preventing
information leaks based on block size.

Another idea to increase de-duplication: when encoding a directory,
align files to the ERIS block size. This would allow de-duplication of
files across encoded images/directories.
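
As a rough illustration of the alignment idea (a sketch only; `align-up` and `aligned-offsets` are hypothetical helpers, and the block size is a parameter rather than a value taken from ERIS), each file would start at the next multiple of the block size, with padding filling the gap:

```scheme
;; Round N up to the next multiple of BLOCK-SIZE.
(define (align-up n block-size)
  (* block-size (quotient (+ n block-size -1) block-size)))

;; Given a list of file sizes in bytes, return the offset at which
;; each file would start if every file begins on a block boundary.
(define (aligned-offsets sizes block-size)
  (let loop ((sizes sizes) (offset 0) (acc '()))
    (if (null? sizes)
        (reverse acc)
        (loop (cdr sizes)
              (align-up (+ offset (car sizes)) block-size)
              (cons offset acc)))))

(aligned-offsets '(100 5000 4096) 4096)   ; => (0 4096 12288)
```

With this layout the same file always splits into the same blocks, no matter which files precede it in the image, at the cost of some padding per file.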

Maybe something like SquashFS already does such an alignment? That would
be cool...

The IPFS example is nice! There are bindings to the IPFS HTTP interface floating around for Guix; would be nice to converge on these bits.

Spelunking into wip-ipfs-substitutes is on my list! Will report back
on the adventure. :)

ERIS is still "experimental". This release is intended to initiate discussion and collect feedback from a wider circle. In particular I'd be interested in your thoughts on applications and the Guile API.

Do I get it right that the encoder currently keeps blocks in memory?

By default when using `(eris-encode content)`, yes. The blocks are
stored into an alist.

But the encoder is implemented as an SRFI-171 transducer that eagerly
emits (reduces) encoded blocks. So one could do this:

(eris-encode content #:block-reducer my-backend)

Where `my-backend` is a SRFI-171 reducer that takes care of the blocks as soon as they are ready. The IPFS example implements a reducer that stores blocks to IPFS. By default `eris-encode` just uses `rcons` from
`(srfi srfi-171)`.
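
For illustration, here is what such a reducer could look like. This is a sketch under assumptions: `make-hash-table-reducer` is a hypothetical name, and I assume each emitted block arrives as a `(reference . block)` pair, which the default alist suggests but which should be checked against the actual code. An SRFI-171 reducer is a procedure with three arities: no arguments (initial state), one argument (completion) and two arguments (one step).

```scheme
(use-modules (srfi srfi-171))

;; Hypothetical backend: store blocks in a hash table keyed by
;; reference. ASSUMPTION: each input is a (reference . block) pair.
(define (make-hash-table-reducer table)
  (case-lambda
    (() table)                    ; initial state
    ((table) table)               ; completion: nothing left to flush
    ((table ref+block)            ; step: store one block right away
     (hash-set! table (car ref+block) (cdr ref+block))
     table)))

;; Exercise the reducer without ERIS, using stand-in data.
(define store (make-hash-table))
(list-transduce (tmap identity)
                (make-hash-table-reducer store)
                '(("ref-1" . "block-1") ("ref-2" . "block-2")))
```

Something like `(eris-encode content #:block-reducer (make-hash-table-reducer store))` would then hand every block to the table as soon as it is ready. A reducer for a network or disk backend would have the same shape, with the write in the step clause and any flushing in the completion clause.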

The encoding transducer is stateful. But it only keeps references to
blocks in memory, and at most log(n) of them at any moment, where n is
the number of blocks to encode.
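
To give a feeling for how small log(n) is here, a back-of-the-envelope sketch. It assumes the blocks are arranged in a tree where every internal node holds up to `arity` references; `tree-levels` and the arity of 64 are illustrative values, not taken from the ERIS specification.

```scheme
;; Number of levels of internal nodes needed above N leaf blocks
;; when every internal node can reference up to ARITY children.
(define (tree-levels n arity)
  (let loop ((nodes n) (levels 0))
    (if (<= nodes 1)
        levels
        (loop (quotient (+ nodes arity -1) arity)
              (+ levels 1)))))

(tree-levels 1000000 64)   ; => 4
```

If the encoder keeps one partially filled node per level, the state grows with the number of levels rather than with the amount of content.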

The decoding interface currently looks like this:

(eris-decode->bytevector eris-urn
  (lambda (ref) (get-block-from-my-backend ref)))
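
The second argument is just a procedure from reference to block, so an in-memory backend can be very small. A minimal sketch, assuming the blocks are held in an alist that maps references to blocks (`make-alist-block-getter` is a hypothetical helper, and signalling an error for a missing block is my choice here):

```scheme
;; Return a lookup procedure over BLOCKS, an alist mapping
;; references to blocks. `assoc-ref' compares keys with `equal?',
;; so bytevector or string references both work.
(define (make-alist-block-getter blocks)
  (lambda (ref)
    (or (assoc-ref blocks ref)
        (error "block not found:" ref))))

;; Stand-in data instead of real ERIS blocks.
(define get-block
  (make-alist-block-getter '(("ref-1" . "block-1")
                             ("ref-2" . "block-2"))))
```

A call would then look like `(eris-decode->bytevector eris-urn get-block)`.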

Much room for improvement...

Do you have plans to provide an interface to the storage backend so one
can easily switch between in-memory, Datashards, IPFS, etc.?

Currently the interface is a bit "low-level": you provide a SRFI-171
reducer. This can definitely be improved and I'd be happy for ideas on
how to make this more ergonomic.

Thank you for your comments!
