guix-patches

[bug#53818] Improving updaters and ‘guix refresh’


From: zimoun
Subject: [bug#53818] Improving updaters and ‘guix refresh’
Date: Thu, 17 Feb 2022 12:17:14 +0100

Hi,

On Thu, 17 Feb 2022 at 11:35, Ludovic Courtès <ludo@gnu.org> wrote:

>>>   • guix refresh $(guix package -A ^emacs- | cut -f1)
>>
>> This one is interesting. This illustrates that the UI is, from my point
>> of view, a bit lacking. It would be a nice improvement to add a regexp
>> mechanism built-in, like in "guix search".
>
> Makes sense, we can do that.

I agree the UI is not nice.  At the command line, I never read the
complete output of “guix package -A”; I always pipe it through “cut
-f1”.  I think this full display is only useful for third-party tools;
the only one I have in mind is emacs-guix.  So, are we maintaining this
CLI output for backward compatibility when we could change both?

Something more useful as output would be:

   name version synopsis

Whatever. :-)
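For illustration, here is a minimal sketch of the pipe-and-cut dance the
current output forces on users.  The sample line below is made up, but
it has the same tab-separated shape as real “guix package -A” output
(name, version, outputs, location):

```shell
# Hypothetical sample line in the shape of `guix package -A` output;
# extracting just the name and version requires cut(1).
printf 'emacs-magit\t3.3.0\tout\tgnu/packages/emacs-xyz.scm:1234:2\n' \
    | cut -f1,2
# → emacs-magit	3.3.0
```

A “name version synopsis” default output would make this post-processing
unnecessary for the common interactive case.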

Even the internal etc/completion/bash/guix has to pipe:

--8<---------------cut here---------------start------------->8---
_guix_complete_available_package ()
{
    local prefix="$1"
    if [ -z "$_guix_available_packages" ]
    then
        # Cache the complete list because it rarely changes and makes
        # completion much faster.
        _guix_available_packages="$(${COMP_WORDS[0]} package -A 2> /dev/null \
                                    | cut -f1)"
    fi
    COMPREPLY+=($(compgen -W "$_guix_available_packages" -- "$prefix"))
}
--8<---------------cut here---------------end--------------->8---


Last, I am not convinced that “guix search” would help here, because:

  1. the output requires piping through recsel,
  2. it is much slower than “package -A” [1].


1: <https://issues.guix.gnu.org/39258#119>
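To make point 1 concrete: “guix search” emits recutils-style records,
so extracting fields means something like “guix search '^emacs-' |
recsel -p name,version”.  Since recsel (from GNU recutils) may not be
installed, the sketch below simulates its field selection on one
made-up record with awk:

```shell
# Hypothetical recutils-style record as printed by `guix search`;
# selecting the name and version fields, as `recsel -p name,version`
# would, here simulated with awk for illustration.
printf 'name: emacs-magit\nversion: 3.3.0\nsynopsis: Emacs interface for Git\n' \
    | awk -F': ' '$1 == "name" || $1 == "version" { print $2 }'
# → emacs-magit
# → 3.3.0
```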


>>>   • guix refresh -m packages-i-care-about.scm
>>
>> Yes, obviously, this is nice, too.  However, it doesn't scale if you
>> need to specify 1000+ packages.

[...]

>> In any case, this fails after reporting status of around 50 packages,
>> with this time:
>>
>>   real       0m41,881s
>>   user       0m12,155s
>>   sys        0m0,726s
>
> How does it fail?  If it’s the GitHub rate limit, then there’s only one
> answer: you have to provide a token.

Let's mimic a collection of 1000+ packages I care about.  Consider this
manifest for packages using r-build-system only…

--8<---------------cut here---------------start------------->8---
(use-modules (guix packages)
             (gnu packages)
             (guix build-system r))

(packages->manifest
 (fold-packages (lambda (package result)
                  (if (eq? (package-build-system package) r-build-system)
                      (cons package result)
                      result))
                '()))
--8<---------------cut here---------------end--------------->8---

…it hits the GitHub token issue…

--8<---------------cut here---------------start------------->8---
gnu/packages/bioconductor.scm:6034:13: 1.66.0 is already the latest version of r-plgem
gnu/packages/bioconductor.scm:6011:13: 1.22.0 is already the latest version of r-rots
gnu/packages/bioconductor.scm:12614:2: warning: 'bioconductor' updater failed to determine available releases for r-fourcseq
Backtrace:
          13 (primitive-load "/home/simon/.config/guix/current/bin/guix")

[...]

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
Error downloading release information through the GitHub
API. This may be fixed by using an access token and setting the environment
variable GUIX_GITHUB_TOKEN, for instance one procured from
https://github.com/settings/tokens

real    10m27.306s
user    4m14.077s
sys     0m12.467s
--8<---------------cut here---------------end--------------->8---

…when most R packages come from CRAN or Bioconductor archives.


Basically, ~5000 packages come from GitHub, which represents ~25% of the
overall total.  Therefore, one needs to be really lucky to update many
packages without hitting the GitHub rate limit.


Yes, large collections of packages cannot be updated easily.  Somehow,
it is an upstream issue and it is hard to fix… except by duplicating
upstream or providing a token. :-)


Well, using the external centralized Repology service is a first step
toward updating at scale, no?  A second step could be to have this
feature included in the Data Service; but we have other fish to fry
first, IMHO. :-)


>> Assuming I don't get the "rate limit exceeded" error, at this rate, it
>> would take more than 15 minutes to check all the packages in
>> "emacs-xyz.scm". This is a bit long.
>>
>> I don't see how this could reasonably be made faster without relying on
>> an external centralized service doing the checks regularly (e.g., once
>> a day) before the user actually requests them.
>
> Maybe you’re right, but before jumping to the conclusion, we have to
> investigate a bit.  Like I wrote, the ‘gnu’ updater for instance fetches
> a single file that remains in cache afterwards—the cost is constant.

Repology acts as this “external centralized service”, no?  On one hand,
it is a practical solution, especially by being fast enough.  On the
other hand, it reports a few false positives (say 4%, to give an idea).

Nicolas, considering the complexity of packages and their origins, do
you think it would be possible to do better (fast and accurate) than
Repology at scale?


>>>   • guix refresh -m packages-i-care-about.scm
>>
>> Yes, obviously, this is nice, too.  However, it doesn't scale if you
>> need to specify 1000+ packages.
>
> You can use ‘fold-packages’ and have three lines that return a manifest
> of 10K packages if you want it.

Yes, see example above.


>>> If not, what kind of selection mechanism could help?  ‘-s’ currently
>>> accepts only two values, but we could augment it.
>>
>> Besides regexp matching, it may be useful to filter packages per module,
>> or source file name. Package categories is a bit awkward, tho, and
>> probably not satisfying.
>
> We can add options to make it more convenient, but it’s already
> possible:

Since these features are advanced, why not keep the CLI simple and
instead rely on manifest files for complex filtering?
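For instance, filtering by source file name fits the same
fold-packages pattern as the r-build-system manifest above.  This is a
hypothetical sketch, assuming the location-file accessor from (guix
diagnostics) on package-location records:

--8<---------------cut here---------------start------------->8---
;; Hypothetical manifest selecting packages by the file that defines
;; them; the regexp and module imports are assumptions, not a tested
;; recipe.
(use-modules (guix packages)
             (guix diagnostics)
             (gnu packages)
             (ice-9 regex))

(define (defined-in? package file-regexp)
  ;; Return true when PACKAGE's definition file matches FILE-REGEXP.
  (let ((location (package-location package)))
    (and location
         (string-match file-regexp (location-file location)))))

(packages->manifest
 (fold-packages (lambda (package result)
                  (if (defined-in? package "emacs-xyz\\.scm$")
                      (cons package result)
                      result))
                '()))
--8<---------------cut here---------------end--------------->8---

One could then run “guix refresh -m” on such a file, keeping the CLI
surface unchanged.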


>>> I realize this is going off-topic, but let’s see if we can improve the
>>> existing infrastructure to make it more convenient.

[...]

> I think we have nice infrastructure but you raise important
> shortcomings.  What Xinglu Chen did might in fact be one way to address
> it, and there may also be purely UI issues that we could address.

All the points raised here are important but appear to me orthogonal to
the patch series. :-)


Cheers,
simon




