help-guix

Re: persistent reproducibility ?


From: zimoun
Subject: Re: persistent reproducibility ?
Date: Wed, 22 Mar 2017 18:39:58 +0100

Hi!

On 21 March 2017 at 17:19, Ludovic Courtès <address@hidden> wrote:
> Hello!
>
> zimoun <address@hidden> skribis:
>
>> The typical research workflow is:
>>
>> - Alice proposes new method and/or algorithm, publishes a paper and
>> illustrates that by the software `foo'. Let the best case: Alice
>> provides a Guix "recipe", and all the material is stored in Github
>> (let say). This software `foo' depends on both `bar' and `baz', one
>> also in Github and the other one included in the Guix package tree.
>>
>> - It is easy for Bob to check out and experiment. Guix allows him to
>> straightforwardly build the bit identical `foo' (all dependencies
>> included). Nice!! Repeatability is there for free.
>>
>> - New features are added to `foo', `bar' and `baz'. All the codes
>> evolve, especially the research ones.
>>
>> - Now, Joe is implementing the Alice's method; science means
>> reproducible. And Joe would like to compare his implementation to the
>> Alice one provided by `foo'. However, how ? The `foo' "ecosystem" has
>> changed with the new features. Therefore, Joe has to navigate in the
>> Git tree of the Guix "recipe" of `foo', `bar', `baz' to be able to
>> produce the bit-identical `foo' used in the initial paper. I mean, it
>> is what I understand to do, and it does not seem reasonable.
>>
>>
>> My question is: does Guix provide any mechanism to build reproducible
>> software over the time ?
>
> To add to what Alex wrote, yes it’s possible, though there are UI gaps
> that we’ll be filling.  If you do a checkout of the Guix commit that
> Alice mentioned in the paper, you can build the exact same software as
> Alice.  ‘guix pull’ allows you to specify the Guix commit you’d like to
> use, but it’s not that convenient that it’s something we’d like to
> improve.

Thank you for your quick answer.

I was not aware that `guix pull' accepts a commit. I am going to try it.

If I understand both of your explanations correctly, my question
overlaps with the current discussion about Channels. I mean, in the
same way that people currently ship conda shell files, spack Python
files or something of that flavour with their projects, they will
replace them with Guile ones (even if Lisp is hard to sell ;-), and
there are open questions about how to glue these channels together.

One of the issues is that the Guix package tree will never include
some software, even if it is open source, because the authors apply
weird or non-GNU-compliant licences, or simply because they are not
motivated enough to push it. Even though I totally agree with the
paragraph about proprietary software in the paper you cited, that is
just a fact, in my humble opinion.


Therefore, what should be the "standard" way to pin external,
decentralised packages to a given version in their history? And
packages from the Guix repository too?


Well, if I understand both of your answers, the correct process should
be: Alice publishes a paper containing the exact version (commit hash,
revision number or origin hash) of both the source tree and the
recipe tree, together with their URIs, and then Joe "just" needs to
check out each of them (manually for now, or possibly through some
nice UI glue).
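
To make this concrete, here is a minimal Python sketch of the
information Alice would have to publish, and of a check that nothing
is missing. It is only an illustration and not a Guix feature: the
`Pin' record, the `missing_pins' helper, the example.org URLs and the
commit hashes are all hypothetical placeholders.

```python
import re
from dataclasses import dataclass
from typing import Optional

# A full Git commit id is 40 hexadecimal characters (SHA-1).
FULL_COMMIT = re.compile(r"^[0-9a-f]{40}$")


@dataclass(frozen=True)
class Pin:
    """One pinned tree: where it lives and which exact revision was used."""
    name: str
    url: str
    commit: Optional[str]  # None means the revision was never recorded


def missing_pins(pins):
    """Return the names of pins whose commit is absent or not a full hash."""
    return [p.name for p in pins
            if p.commit is None or not FULL_COMMIT.match(p.commit)]


# Hypothetical state at the time of the paper (placeholder URLs and hashes).
paper_pins = [
    Pin("guix-recipes", "https://example.org/guix.git", "a" * 40),
    Pin("foo", "https://example.org/alice/foo.git", "b" * 40),
    Pin("bar", "https://example.org/alice/bar.git", None),
]

print(missing_pins(paper_pins))  # ['bar']
```

Here `bar' is reported because its revision was not written down, which
is exactly the case where Joe can no longer rebuild the same `foo'.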

From my current knowledge of Guix, considering two moments t1 and t2
separated by, let us say, several months, a way to build at t2 the
bit-identical `foo' of t1 would be to provide at t1 a sort of manual
meta-package pointing to the exact versions through `origin' and
`git-fetch' (imagining that `hg-fetch' and `svn-fetch' would be added,
or not!). If one piece of this information about the exact versions is
not recorded at t1, it appears to me almost impossible to ensure that
the bit-identical `foo' can be built at t2.
If I understand correctly, this is what you describe, Alex, right?

Another option would be: Guix adds some metadata to the profile (a
kind of manifest) that automatically tracks this information at t1.
Then, using this profile at t2 eases the build process and ensures
that nothing has been forgotten.
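
As a rough illustration of that idea, the Python sketch below saves
the state at t1 as JSON and compares it with the state at t2. It does
not describe how Guix profiles actually work: the `dump_state' and
`changed_since' helpers, the URLs and the hashes are assumptions made
up for the example.

```python
import json


def dump_state(pins):
    """Serialise a {name: {"url": ..., "commit": ...}} mapping as stable JSON."""
    return json.dumps(pins, sort_keys=True, indent=2)


def changed_since(recorded, current):
    """Return the sorted names whose commit differs from the recorded state."""
    return sorted(
        name for name, pin in recorded.items()
        if current.get(name, {}).get("commit") != pin["commit"]
    )


# Hypothetical state recorded at t1 (placeholder URLs and hashes).
state_t1 = {
    "foo": {"url": "https://example.org/alice/foo.git", "commit": "b" * 40},
    "baz": {"url": "https://example.org/guix.git", "commit": "a" * 40},
}
saved = dump_state(state_t1)

# At t2, `foo' has moved on to a new commit.
state_t2 = json.loads(saved)
state_t2["foo"] = dict(state_t2["foo"], commit="c" * 40)

drift = changed_since(json.loads(saved), state_t2)
print(drift)  # ['foo']
```

With such a record, going back from t2 to t1 amounts to checking out
the recorded commit for every name listed in `drift'.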

Still from what I understand, Guix adds metadata (roughly, all the
propagated hashes) that allows one to verify bit-identity. However, I
am not sure whether it lacks metadata, stored in the profile or
elsewhere, that describes the "state" at time t (e.g., the exact
version by commit specification) and then allows reverting from t+1
to t.

Maybe `archive' or `pack' already does the job? Something able to
pack, at the source level, all the material necessary to reproduce a
bit-identical environment from scratch.


>
> Ricardo and I wrote about the kind of workflows you describe in
> <https://arxiv.org/abs/1506.02822>.  I hope you’ll find it interesting!

Yes! I found it really interesting! :-)
I already read it a long time ago, and discovered spack and then how
HiePACS uses it. Thanks!
Anyway, I am reading it again with new glasses.


>
>> Last, `foo' and `bar' are stored in two Github repositories. And they
>> should disappear.
>> ( I am not talking if it is good or not to use github, right now, it
>> just is used by many teams of researchers )
>>
>> Could we used the Software Heritage initiative to maintain a kind of
>> persistency ?
>> https://www.softwareheritage.org
>
> Definitely!  Software Heritage does not expose it yet, but when it does,
> we can add it as a fallback mirror in our (guix git-download) module.
> (I’ve discussed our use case with the Software Heritage folks a few
> times, so they’re aware of it.  ;-))

Really nice !!

>
> I think the big picture is (where each arrow means “depends on”):
>
>   repro science -> repro software environments -> stable archive
>
> Things like ReScience can do the first step, Guix can do the second one,
> and Software Heritage does the third one.

"There is a long way to go, but the road is Free..." ;-)

>
> Thanks for sharing your use case!

Thanks for all this positive energy :-)
and thanks for giving me a reason to learn Scheme/Guile because of Guix

--
simon

>
> Ludo’.


