Re: Encoding for Robust Immutable Storage (ERIS) and Guile
From: Ludovic Courtès
Subject: Re: Encoding for Robust Immutable Storage (ERIS) and Guile
Date: Fri, 11 Dec 2020 09:10:50 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
Hello pukkamustard!
pukkamustard <pukkamustard@posteo.net> skribis:
> I looked into block boundaries with a "sliding hash" (re-compute a short
> hash for every byte read and choose boundaries when hash is zero). This
> would allow a higher degree of de-duplication, but I found this to be a
> bit "finicky" (and myself too impatient to tune and tweak this :).
>
> I settled on fixed block sizes, making the encoding faster and preventing
> information leaks based on block size.
Yeah, sounds reasonable. (I evaluated the benefits of this and other
approaches years ago, FWIW: <https://hal.inria.fr/hal-00187069/en>.)
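For readers following along, here is a minimal Guile sketch of what splitting into fixed-size blocks might look like. The name `split-into-blocks` and the zero padding of the last block are assumptions made for illustration; this is not taken from the ERIS implementation, whose padding scheme may well differ.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: split BV into blocks of exactly BLOCK-SIZE bytes.
;; The last block is zero-padded (a simplification for this sketch), so
;; every emitted block has the same size regardless of the input length.
(define (split-into-blocks bv block-size)
  (let ((total (bytevector-length bv)))
    (let loop ((offset 0) (blocks '()))
      (if (>= offset total)
          (reverse blocks)
          (let ((block (make-bytevector block-size 0))
                (len (min block-size (- total offset))))
            ;; Copy LEN bytes starting at OFFSET into the fresh block.
            (bytevector-copy! bv offset block 0 len)
            (loop (+ offset block-size) (cons block blocks)))))))

;; Five bytes with a block size of 2 yield three equally sized blocks.
(define example-blocks
  (split-into-blocks (u8-list->bytevector '(1 2 3 4 5)) 2))
```

Since every block has the same length, an observer of the stored blocks learns nothing from their individual sizes, which is the point made above.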
> Another idea to increase de-duplication: When encoding a directory,
> align files to the ERIS block size. This would allow de-duplication of
> files across encoded images/directories.
I guess that’d work, indeed.
>> Do I get it right that the encoder currently keeps blocks in memory?
>
> By default when using `(eris-encode content)`, yes. The blocks are
> stored into an alist.
>
> But the encoder is implemented as an SRFI-171 transducer that eagerly
> emits (reduces) encoded blocks. So one could do this:
>
> (eris-encode content #:block-reducer my-backend)
>
> Where `my-backend` is a SRFI-171 reducer that takes care of the blocks
> as soon as they are ready. The IPFS example implements a reducer that
> stores blocks to IPFS. By default `eris-encode` just uses `rcons` from
> `(srfi srfi-171)`.
Ah, I see, that’s great! I’m not familiar with the transducer API so I
always have to think twice (or more) about what’s going on; the
flexibility it gives here is really nice.
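For anyone else who has to think twice about transducers, here is a rough sketch of what a custom reducer could look like. It assumes that each block reaches the reducer as a `(reference . data)` pair, which is a guess based on the alist mentioned above; `store` and `store-reducer` are hypothetical names, and the in-memory hash table stands in for a real backend.

```scheme
(use-modules (srfi srfi-171))

;; Hypothetical in-memory backend: block reference -> block data.
(define store (make-hash-table))

;; A SRFI-171 reducer is a procedure with three arities:
;;   ()               -> the initial result
;;   (result)         -> the completion step
;;   (result input)   -> the reduction step
;; This one stores each block as it arrives and counts the blocks stored.
(define store-reducer
  (case-lambda
    (() 0)
    ((count) count)
    ((count block)
     ;; Assumption: BLOCK is a (reference . data) pair.
     (hash-set! store (car block) (cdr block))
     (+ count 1))))

;; Exercise the reducer on its own with a plain list of fake blocks.
(define stored-count
  (list-transduce (tmap (lambda (block) block))
                  store-reducer
                  '(("ref-a" . "data-a") ("ref-b" . "data-b"))))
```

If the assumption about the block representation holds, such a reducer could presumably be passed as `#:block-reducer`, so that blocks leave memory as soon as they are produced.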
> The encoding transducer is stateful. But it only keeps references to
> blocks in memory and at most log(n) at any moment, where n is the
> number of blocks to encode.
>
> The decoding interface currently looks like this:
>
>   (eris-decode->bytevector eris-urn
>     (lambda (ref) (get-block-from-my-backend ref)))
OK.
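As an illustration of the block-fetching procedure on the decoding side, here is a small sketch backed by a hash table. The names `blocks`, `make-block-fetcher` and `fetch` are hypothetical, and the string references and data are placeholders rather than real ERIS references.

```scheme
;; Hypothetical in-memory backend: block reference -> block data.
(define blocks (make-hash-table))
(hash-set! blocks "ref-a" "data-a")

;; Return a one-argument procedure that looks up a block by reference
;; in TABLE and signals an error when the block is missing.
(define (make-block-fetcher table)
  (lambda (ref)
    (or (hash-ref table ref)
        (error "block not found:" ref))))

(define fetch (make-block-fetcher blocks))
```

A procedure like `fetch` has the same shape as the `(lambda (ref) ...)` argument in the quoted call, so swapping backends would amount to passing a different fetcher.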
>> Do you have plans to provide an interface to the storage backend so one
>> can easily switch between in-memory, Datashards, IPFS, etc.?
>
> Currently the interface is a bit "low-level" - provide a SRFI-171
> reducer. This can definitely be improved and I'd be happy for ideas on
> how to make this more ergonomic.
Maybe that’s all we need after all. Maybe what would be nice is a
couple of examples, like a high-level procedure or CLI that can insert
or fetch from either (say) a local GDBM database or IPFS. That would
illustrate integration with backends as well as the high-level API.
Thanks!
Ludo’.