guix-devel

Re: Concerns/questions around Software Heritage Archive


From: Ian Eure
Subject: Re: Concerns/questions around Software Heritage Archive
Date: Sat, 16 Mar 2024 12:06:27 -0700
User-agent: mu4e 1.8.13; emacs 28.2


Christopher Baines <mail@cbaines.net> writes:

Ian Eure <ian@retrospec.tv> writes:

Hi Guixy people,

I’d never heard of SWH before I started hacking on Guix last fall, and it struck me as rather a good idea. However, I’ve seen some things
lately which have soured me on them.

They appear to be using the archive to build LLMs:
https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/

I was also distressed to see how poorly they treated a developer who
wished to update their name:
https://cohost.org/arborelia/post/4968198-the-software-heritag
https://cohost.org/arborelia/post/5052044-the-software-heritag

GPL’d software I’ve created has been packaged for Guix, which I assume means it’s been included in SWH. While I’m dealing with their (IMO: unethical) opt-out process, I likely also need to stop new copies from
being uploaded again in the future.

Is there a way to indicate, in a Guix package, that it should *never*
be included in SWH?

Not currently, and I don't really see the point in such a mechanism. If you really never want them to store your code, then you need to license
it accordingly (and not make it free software).


I don’t want my code in SWH *because* it’s free. A primary use of LLMs is laundering freely licensed software into proprietary, commercial projects through "AI" code completion and generation. Any Free software in an LLM training set can and will be used in violation of its license, without a clear path for the author to seek recourse. I deleted my code off Github and abandoned it completely for this exact reason, and am deeply irked to be going through this nonsense again.

A more salient question may be: Is there a process within Guix (either the program or the organization) which uploads source to SWH? Or does it rely on SWH fetching the source independently?

If the latter, my problem is likely solved by blocking SWH at my network edge and opting out of their archive (or trying to) and the downstream training models they’ve already put it in. If the former, the only control I currently have to protect my license is removing packages from Guix which contain it. I don’t want that outcome.

Noting also that the path here seems to be SWH->huggingface->bigcode training set, and the opt-out process for the training set appears to be a complete sham. To opt out, you must create a GitHub issue; only one opt-out has *ever* been processed, and there are 200+ sitting there, many with no response for nearly a year[1]. I want no part of any of this.


Is there a way to tell Guix to never download source from SWH?

Also no, and it's probably best to do this at the network level on your
systems/network if you want this to be the case.


I’ll investigate this, though I’d prefer if there was a way to configure source mirrors in the Guix daemon.


Skipping back to this though:

I was also distressed to see how poorly they treated a developer who
wished to update their name:
https://cohost.org/arborelia/post/4968198-the-software-heritag
https://cohost.org/arborelia/post/5052044-the-software-heritag

This is probably worth thinking about as Guix is in a similar situation regarding publishing source code, and people potentially wanting to change historical source code both in things Guix packages and Guix
itself.

As with Software Heritage, there are cryptographic implications for
rewriting the Git history and modifying source tarballs or nars that
contain source code.

We have 17TiB of compressed source code and built software stored for bordeaux.guix.gnu.org now and we should probably work out how to handle people asking for things to be removed or changed (for any and all
reasons).

It's probably worth working out our position on this in advance of
someone asking.


Yes, I agree that Guix needs a better solution for this.

Thanks,

 — Ian

[1]: https://github.com/bigcode-project/opt-out-v2/issues


