guix-devel

Re: Disarchive database synchronization


From: Ludovic Courtès
Subject: Re: Disarchive database synchronization
Date: Mon, 20 Mar 2023 10:14:41 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)

Howdy Timothy!

Timothy Sample <samplet@ngyro.com> skribis:

> Ludovic Courtès <ludovic.courtes@inria.fr> writes:

[...]

>> For the remaining entries, it’s trickier.  Sometimes it’s just the
>> gzip compression parameters that differ, which could be addressed with a
>> little bit more work:
>>
>> $ file ffdc77f5e5cb2390b9309de63eb7be68d9fe631e898f4da6c04a8159daefc2c0.gz ../../disarchive/sha256/ffdc77f5e5cb2390b9309de63eb7be68d9fe631e898f4da6c04a8159daefc2c0.gz
>> ffdc77f5e5cb2390b9309de63eb7be68d9fe631e898f4da6c04a8159daefc2c0.gz: gzip compressed data, max compression, from Unix, original size modulo 2^32 446731
>> ../../disarchive/sha256/ffdc77f5e5cb2390b9309de63eb7be68d9fe631e898f4da6c04a8159daefc2c0.gz: gzip compressed data, max speed, from Unix, original size modulo 2^32 446731
>
> I’m not sure getting the compressed files to match matters.

No, it doesn’t matter, for sure; it’s just that it would have made it
easier to check for relevant differences between the two Disarchive
databases.
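
For illustration, here is a minimal sketch of how two such files could be compared while ignoring the gzip compression parameters: hash the decompressed payload rather than the compressed bytes. The function names are hypothetical and this is not part of Disarchive or Guix; it only assumes each file is an ordinary gzip stream.

```python
import gzip
import hashlib


def payload_digest(gz_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of the decompressed gzip payload."""
    return hashlib.sha256(gzip.decompress(gz_bytes)).hexdigest()


def same_payload(gz_a: bytes, gz_b: bytes) -> bool:
    """True when both gzip streams decompress to identical bytes,
    whatever compression level or header fields were used."""
    return payload_digest(gz_a) == payload_digest(gz_b)
```

Reading the two “.gz” files in binary mode and passing their contents to `same_payload` would report a match for a pair that differs only in “max compression” versus “max speed”.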

>> Sometimes it’s trickier:
>>
>> # diff -u <(gunzip -d < 0001f025c1425ffe36270a81cb091eade87dd8d29ac773735ae47e1a8c8066c9.gz) <(gunzip -d < ../../disarchive/sha256/0001f025c1425ffe36270a81cb091eade87dd8d29ac773735ae47e1a8c8066c9.gz)
>> --- /dev/fd/63  2023-03-14 16:13:21.635733426 +0100
>> +++ /dev/fd/62  2023-03-14 16:13:21.635733426 +0100
>> @@ -1,7 +1,7 @@
>>  (disarchive
>>    (version 0)
>>    (gzip-member
>> -    (name "webview-sys-0.6.2.tar.gz")
>> +    (name "rust-webview-sys-0.6.2.tar.gz")

[...]

> The name field is not used for data reconstruction.  It’s for human
> consumption (and it may have made some early examples of use at the
> command line easier to explain).  Here, the difference is based on the
> fact that Crate URIs are weird, and the Preservation of Guix code does
> not keep the origin file name.  Hence, the PoG version extracts the
> Crate name alone from the URI, and the Cuirass version uses the Guix
> package name with the “rust-” prefix.

OK.  Again I was looking at this from the perspective of determining
whether there were “relevant” differences between the two Disarchive
databases.  Looks like it would be quite some work to determine that
automatically.
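
As a rough sketch of what a first step could look like, one might drop the human-readable name before comparing two decompressed specifications. This assumes, based only on the diff excerpt above, that the field appears as a `(name "…")` form directly after `(gzip-member`; the helper names are hypothetical, and other differences (such as changes in how Disarchive describes tar headers) would still show up as mismatches.

```python
import re

# Assumption: the name field directly follows "(gzip-member", as in the
# diff excerpt.  Other "(name ...)" forms deeper in the specification are
# deliberately left untouched.
_GZIP_MEMBER_NAME = re.compile(
    r'(\(gzip-member\s+)\(name\s+"(?:[^"\\]|\\.)*"\)\s*'
)


def normalize_spec(text: str) -> str:
    """Remove the gzip-member name field from a specification's text."""
    return _GZIP_MEMBER_NAME.sub(r"\1", text, count=1)


def relevant_difference(spec_a: str, spec_b: str) -> bool:
    """True when two specifications differ in something other than
    the gzip-member name."""
    return normalize_spec(spec_a) != normalize_spec(spec_b)
```

A textual comparison like this is crude: it treats any whitespace or layout change as relevant, so it can only serve as a first filter.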

>> As Tim pointed out, Disarchive disassembly is not fully deterministic
>> and/or might change a bit over time as Disarchive evolves, and that’s
>> prolly what we’re seeing here.
>
> I honestly think this is a good thing.  My instincts tell me that we
> should excise all sources of ambiguity, like we’re trying to do in the
> big picture.  However, Disarchive will get better at describing things
> over time.  For instance, it doesn’t handle tar extension headers
> elegantly at the moment.  In the future, if I fix this, I might consider
> creating a “migrate” feature that improves existing specifications
> (e.g., converting the old, verbose representation of extension headers
> into the new representation).  In particular, I’ve left some warts in
> the software in order to ship it, and I would be sad to try and commit
> to those for the rest of time!

That makes a lot of sense!

> We might also add other resolver addresses besides SWHIDs....
>
> Maybe I’m missing some perspective, but I don’t think trying to commit
> to reproducible outputs for Disarchive makes sense.

Yes, I feel the same.

> P.S., we’ll have to do this dance again shortly, as I just computed
> 2,023 historical bzip2 specifications.  They’re not online yet, but
> they’ll be up when I publish the next PoG report – which should take less
> than a year this time!  :p

Woow, bzip2!  I was just now looking at a concrete disappearing-tarball
issue that involves bzip2:

  https://issues.guix.gnu.org/62071#8

Thank you!

Ludo’.


