guix-devel


From: Simon Tournier
Subject: Re: Guidelines for pre-trained ML model weight binaries (Was re: Where should we put machine learning model parameters?)
Date: Tue, 30 May 2023 15:15:22 +0200

Hi Ludo,

On Fri., 26 May 2023 at 17:37, Ludovic Courtès <ludo@gnu.org> wrote:

>> Well, I do not know if we have reached a conclusion.  From my point of
>> view, both can be included *if* their licenses are compatible with Free
>> Software – including the weights (pre-trained model) as licensed data.
>
> We discussed it in 2019:
>
>   https://issues.guix.gnu.org/36071

Your concern in this thread was:

        My point is about whether these trained neural network data are
        something that we could distribute per the FSDG.

        https://issues.guix.gnu.org/36071#3-lineno21

and we discussed this specific concern for the leela-zero package.
Quoting three messages:

        Perhaps we could do the same, but I’d like to hear what others think.

        Back to this patch: I think it’s fine to accept it as long as the
        software necessary for training is included.

        The whole link is worth a click since there seems to be a ‘server
        component’ involved as well.

        https://issues.guix.gnu.org/36071#3-lineno31
        https://issues.guix.gnu.org/36071#5-lineno52
        https://issues.guix.gnu.org/36071#6-lineno18


And I am raising the same concern for packages that use weights.  We
could discuss them case by case; instead, I think it is important to
sketch guidelines about weights, because they would help us decide what
to do with neural networks such as “Leela Chess Zero” [1] or others (see
below).

1: https://issues.guix.gnu.org/63088


> This LWN article on the debate that then took place in Debian is
> insightful:
>
>   https://lwn.net/Articles/760142/

As pointed out in #36071 mentioned above, this LWN article is a digest
of a Debian discussion, and the raw material (the arguments themselves)
is also worth a look:

https://lists.debian.org/debian-devel/2018/07/msg00153.html


> To me, there is no doubt that neural networks are a threat to user
> autonomy: hard to train by yourself without very expensive hardware,
> next to impossible without proprietary software, plus you need that huge
> amount of data available to begin with.

Regarding the “others” mentioned above, please note that GNU Backgammon,
already packaged in Guix under the name ‘gnubg’, raises similar
questions. :-)

Quoting the webpage [2]:

        Tournament match and money session cube handling and cubeful
        play. All governed by underlying cubeless money game based
        neural networks.


As Russ Allbery points out [3], similarly to what I tried to do in this
thread, it seems hard to distinguish data resulting from pre-processing
such as training from data that simply results from well-fitted
parameters.


2: https://www.gnu.org/software/gnubg/
3: https://lwn.net/Articles/760199/


> As a project, we don’t have guidelines about this though.  I don’t know
> if we can come up with general guidelines or if we should, at least as a
> start, look at things on a case-by-case basis.

Without guidelines to help us decide, it is harder to review #63088 [1],
which asks for the inclusion of lc0, and it is hard to know what to do
about GNU Backgammon.

In these specific cases, what do we do? :-)


Cheers,
simon


