Automation of SWH save (was: Cuirass: "lint -c archival"?)

guix-devel

From:	zimoun
Subject:	Automation of SWH save (was: Cuirass: "lint -c archival"?)
Date:	Fri, 25 Sep 2020 10:56:53 +0200

Hi,

On Thu, 24 Sep 2020 at 21:06, Christopher Baines <mail@cbaines.net> wrote:
> zimoun <zimon.toutoune@gmail.com> writes:

> So, my understanding is that Software Heritage is a potential store for
> source material for Guix packages. I think the majority of builds
> Cuirass does are because inputs change, rather than the source of a
> package.

To be precise, Software Heritage stores only upstream source code.
Their API entry point for "save" takes the URL of a Git, Mercurial, or
Subversion repository, and they then ingest the content that this very
URL serves.
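
As a rough illustration of what such a "save" request targets, here is a
minimal sketch that only builds the request URL and does not contact the
archive.  The endpoint layout (/api/1/origin/save/<visit type>/url/<origin
URL>/) is an assumption based on my reading of the SWH web API and should
be checked against their documentation; the name save_request_url and the
example repository URL are made up for illustration.

```python
# Sketch: build the URL of a Software Heritage "save" request.
# Assumption: the endpoint layout below matches the public SWH web API;
# verify it against the SWH documentation before relying on it.

SWH_API = "https://archive.softwareheritage.org/api/1"
VISIT_TYPES = {"git", "hg", "svn"}  # Git, Mercurial, Subversion


def save_request_url(visit_type: str, origin_url: str) -> str:
    """Return the URL to POST to in order to request a save of origin_url."""
    if visit_type not in VISIT_TYPES:
        raise ValueError(f"unsupported visit type: {visit_type!r}")
    return f"{SWH_API}/origin/save/{visit_type}/url/{origin_url}/"


if __name__ == "__main__":
    # Hypothetical upstream repository, for illustration only.
    print(save_request_url("git", "https://example.org/foo.git"))
```

The point is that the only input is the upstream URL and the kind of
repository; nothing built by Guix is part of the request.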

And it is not necessary to build the package to send a "save" request:
"guix lint -c archival foo" sends the request for the git-reference
source of a Guix package.

Note that Guix does not send the result of "guix build -S" but the
real upstream URL.

> I'm not sure hooking this up to Cuirass would make the most sense,
> because of the above point.
>
> Also, unfortunately, the Guix Data Service doesn't have the ideal data
> for this, as it doesn't really store the package source information in
> the way that would be useful for this.

Somehow, the GDS has this information because it reports Lint Warnings
(for example [1]: bottom "no lint warnings").  However, if I read
correctly, you added the option "--no-network" to only use the linters
which do not require network access.

Does the GDS run the linters by itself or does it use the log from Cuirass?

[1] 
<https://data.guix.gnu.org/revision/c385bd69ad407f608e3da3156fed0ac915574313/package/git/2.28.0>

BTW, please consider patch #43261 [2], which fixes an issue in the
current implementation of "--no-network". :-)

[2] <http://issues.guix.gnu.org/issue/43261>

> Personally though (and I'm rather biased), I think the Guix Data Service
> might still be an approach. If you take the view on this that the
> Software Heritage is a means to a store item (which I think is right?),
> the Guix Data Service knows about those store items (like [1]).
>
> 1: 
> https://data.guix.gnu.org/gnu/store/5h4dz6ild4fkida5yfv5fhh59vfd8hvk-python-boolean.py-3.6-checkout

Currently, Guix does not provide machinery to send its source
substitutes.  I am not convinced it makes sense to do so.  The model I
am imagining is:

 - short term:
    + a script runs as a cron job to lint all the packages, say once
      per day (packages will be missed but it is better than what we
      currently have)
    + try to implement the save request for hg and svn (I am working
      on it if no one beats me :-))
 - middle term: add a hook (Cuirass or GDS) to trigger an action if the
   package passes.
 - long term: SWH ingests everything via sources.json
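
The short-term item could look roughly like the sketch below, which a
daily cron job would call.  It assumes one "guix lint -c archival PACKAGE"
invocation per package and that a non-zero exit status means the
invocation failed; the helper names (lint_command, run, lint_all) are
invented for this example, and how the list of package names is obtained
is left out.

```python
# Sketch of the short-term plan: run the archival checker on every package.
# Assumptions: one "guix lint -c archival PACKAGE" call per package, and a
# non-zero exit status is treated as a failure.  Helper names are made up.
import subprocess
from typing import Callable, Iterable, List


def lint_command(package: str) -> List[str]:
    """Command line that runs the archival checker on one package."""
    return ["guix", "lint", "-c", "archival", package]


def run(command: List[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(command, check=False).returncode


def lint_all(packages: Iterable[str],
             runner: Callable[[List[str]], int] = run) -> List[str]:
    """Lint each package; return the names whose lint command failed."""
    failed = []
    for package in packages:
        if runner(lint_command(package)) != 0:
            failed.append(package)
    return failed
```

Passing the runner as a parameter keeps the loop testable without Guix
installed, and leaves room to add a delay between calls later.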

Somehow, sending all the source substitutes should be done once, at the
transition from short to middle term.  Currently, SWH ingests all the
tarballs (via sources.json) and a few git-reference packages: the ones
for which the packager or reviewer ran "guix lint -c archival".  I am
proposing to automate this instead of relying on the goodwill of a
packager or reviewer. :-)

Well, from a wider point of view, the hook could send a save request to
SWH, or we could also imagine that the hook does whatever it wants with
the result (store item): push it somewhere, or disassemble the tarball
(if any) and save it to the database (to then be able to fetch from SWH).

Note that the long term does not depend on the Guix side but on the
SWH side.  So the term could be shorter. :-)

Does this make sense?

> To make the information actionable though, it would be necessary to
> store more information about the sources for packages in the Guix Data
> Service database.
>
> This is much more work than just using the existing linter, but it does
> have the advantage that you'd be able to look at coverage statistics and
> things like that, which the checker doesn't really afford.

Yes.

In summary, SWH limits the number of requests per hour (10 save
requests and 120 query requests), so it is currently impossible to
automate the saving mechanism.  I am proposing to ask them to change
this rate limit for one specific trusted machine (for example, if I
understand correctly, the Nix and Debian projects are doing so).
Therefore, the questions are:

 - which machine?
 - what is the automation process? (see above)
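
To give an order of magnitude for why the current limit blocks
automation, here is a small sketch that computes how many one-hour
windows are needed to submit a given number of save requests at 10 per
hour.  The count of 1000 git-reference packages is purely hypothetical,
not a measured figure.

```python
# Sketch: time needed to submit save requests under an hourly rate limit.
import math

SAVE_REQUESTS_PER_HOUR = 10  # limit quoted in this message


def hours_needed(n_requests: int,
                 per_hour: int = SAVE_REQUESTS_PER_HOUR) -> int:
    """Number of one-hour windows needed to send n_requests."""
    if per_hour <= 0:
        raise ValueError("per_hour must be positive")
    return math.ceil(n_requests / per_hour)


if __name__ == "__main__":
    # Hypothetical number of git-reference packages.
    print(hours_needed(1000))  # 100 windows, a bit more than 4 days
```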

WDYT?

All the best,
simon
