guix-patches
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[bug#50384] [PATCH v2] Optimise search-patch (reducing I/O)


From: Maxime Devos
Subject: [bug#50384] [PATCH v2] Optimise search-patch (reducing I/O)
Date: Sun, 05 Sep 2021 21:48:22 +0200
User-agent: Evolution 3.34.2

Ludovic Courtès schreef op zo 05-09-2021 om 00:04 [+0200]:
> Maxime Devos <maximedevos@telenet.be> skribis:
> 
> > +(define-syntax search-patch
> > +  (lambda (s)
> > +    "Search the patch FILE-NAME and compute its hash at expansion time
> > +if possible.  Return #f if not found."
> > +    (syntax-case s ()
> > +      ((_ file-name)
> > +       (string? (syntax->datum #'file-name))
> > +       ;; FILE-NAME is a constant string, so the hash can be computed
> > +       ;; in advance.
> > +       (let ((patch (try-search-patch (syntax->datum #'file-name))))
> > +         (if patch
> > +             #`(%local-patch-file file-name #,(file-hash* patch #:select? 
> > true))
> > +             (begin
> > +               (warning (source-properties->location
> > +                         (syntax-source #'file-name))
> > +                        (G_ "~a: patch not found at expansion time")
> > +                        (syntax->datum #'ile-name))
> > +               #'(%search-patch file-name)))))
> > +      ;; FILE-NAME is variable, so the hash cannot be pre-computed.
> > +      ((_ file-name) #'(%search-patch file-name))
> > +      ;; search-patch is being used used in a construct like
> > +      ;; (map search-patch ...).
> > +      (id (identifier? #'id) #'%search-patch))))
> 
> It’s clever… but also a bit evil, in that it changes the semantics of
> package files in a surprising way.  Modifying foo.patch without
> recompiling foo.scm would lead you to still use the old foo.patch, which
> can be rather off-putting and error-prone IMO.

I added two patches adding (limited) dependency tracking to compile-all.scm.
If a patch file is now modified or deleted, the corresponding package modules
will be recompiled.  This should remove the ‘evilness’ I think.

> To address this, ‘local-file’ could store the inode/mtime + computed
> store file name (rather than the SHA256).  ‘local-file-compiler’ would
> check whether the actual file has matching inode/mtime before returning
> the computed store file name.  Problem is that inode/mtime are
> guaranteed to differ once you’ve run “make install”.  :-/

An additional problem is that 'local-file-compiler' would have to 'stat'
the file even if it is already in the store, undoing the (fairly limited?)
performance gains of this patch series.

The dependency tracking avoids this.

> Intuitively, I’d have imagined a cache populated at run time; it would
> map, say, file name/inode/mtime to a store file name.  ‘add-to-store’
> (or some wrapper above it) would check the cache and return the store
> file name directly, unless ‘valid-path?’ says it no longer exists.
> Downside is that this would be a per-user cache and you’d still pay the
> cost until it’s warm.  Advantage is that you could easily tell whether
> it’s stale.
> 
> Thoughts?

Intuitively, I'd have imagined doing as much as possible at compilation time.
The cost at compilation is only paid once (or, more correctly, at every
"guix pull"), while if you delay things until runtime, you need to check the
caches.

With this patch series (+ the two patches mentioned previously), the ‘cache’
is always fresh (though possibly not warm: the patch might not yet be in the
store).

Ludovic Courtès schreef op za 04-09-2021 om 23:47 [+0200]:
> Hi!
> 
> Some initial comments…
> 
> Maxime Devos <maximedevos@telenet.be> skribis:
> 
> > +++ b/guix/gexp.scm
> > @@ -531,13 +531,37 @@ appears."
> >  (define-gexp-compiler (local-file-compiler (file <local-file>) system 
> > target)
> >   [...]
> > +       (if sha256
> > +           (let ((path (fixed-output-path name sha256 #:recursive? 
> > recursive?)))
> > +             ;; If the hash is known in advance and the store already has 
> > the
> > +             ;; item, there is no need to intern the file.
> > +             (if (file-exists? path)
> > +                 (mbegin %store-monad
> > +                   ;; Tell the GC that PATH will be used, such that it 
> > won't
> > +                   ;; be deleted.
> > +                   ((store-lift add-temp-root) path)
> > +                   ;; The GC could have deleted the item before 
> > add-temp-root
> > +                   ;; completed, so check again if PATH exists.
> > +                   (if (file-exists? path)
> > +                       (return path)
> > +                       ;; If it has been removed, fall-back interning.
> > +                       (intern)))
> > +                 ;; If PATH does not yet exist, fall back to interning.
> > +                 (intern)))
> > +           (intern))))))
> 
> ‘file-exists?’ won’t work when talking to a remote store (e.g.,
> GUIX_DAEMON_SOCKET=ssh://…).
> 
> ‘add-temp-root’ doesn’t throw if the given store item does not exist.
> So it could be written like this:
> 
>   (if sha256
>       (mbegin %store-monad
>         (add-temp-root* item)
>         (if (valid-path?* item)
>             (return item)
>             (intern)))
>       (intern))

Done in the v2.

> But then, we’d add one RPC for every ‘add-to-store’ RPC corresponding to
> a patch (you can set “GUIX_PROFILING=rpc” to see the numbers), which is
> not great.
> 
> Ludo’.

Note that 'intern' is only called if the patch isn't yet in the store.
In practice, the patch is almost always already in the store.  For example,
suppose I have a few packages in my profile.  As the packages are in my
profile, they had to have their derivation computed at some point, so the
corresponding patches had to be interned.

If I now run "guix pull && guix package -u", when computing the derivation
of the updated profile, most required patches are already in the store,
because patches don't change often.

Likewise, if I run "guix environment guix" in one terminal, then in another,
then in yet another ... possibly for the first invocation, some patches need
to be interned, but for the other invocations, it's already in the store.

Because fixed-output-path is now called more often, I've added a patch
optimising (guix base32).

Let's compare the numbers again!  This time, I've run

echo powersave |sudo tee 
/sys/devices/system/cpu/cpufreq/policy{0,1,2,3}/scaling_governor

to make sure the CPU frequency doesn't change.  On a hot (disk) cache:

# After the patch series
time GUIX_PROFILING="rpc gc" ./the-optimised-guix/bin/guix build -d --no-grafts 
pigx

Remote procedure call summary: 5949 RPCs
  built-in-builders              ...     1
  add-to-store                   ...     3
  add-to-store/tree              ...    26
  add-temp-root                  ...   195
  valid-path?                    ...   195
  add-text-to-store              ...  5529
Garbage collection statistics:
  heap size:        93.85 MiB
  allocated:        312.04 MiB
  GC times:         17
  time spent in GC: 3.34 seconds (25% of user time)

# averaged over three runs
real    0m14,035s
user    0m14,138s
sys     0m0,650s

# Before the patch series
time GUIX_PROFILING="rpc gc" ./the-unoptimised-guix/bin/guix build -d 
--no-grafts pigx
/gnu/store/fq6x8d2vcm6sbjkimg7g8kcgb4c5xv1b-pigx-0.0.3.drv
Remote procedure call summary: 5749 RPCs
  built-in-builders              ...     1
  add-to-store/tree              ...    26
  add-to-store                   ...   193
  add-text-to-store              ...  5529
Garbage collection statistics:
  heap size:        93.85 MiB
  allocated:        325.24 MiB
  GC times:         18
  time spent in GC: 3.66 seconds (26% of user time)

real    0m13,700s
user    0m14,051s
sys     0m0,658s

So on a hot disk cache, there doesn't appear to be any improvement
(except for ‘time spent in GC’ -- presumably that's due to the optimisations
to guix/base32.scm).

What about a cold cache?

# After the patch series

sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
./the-optimised-guix/bin/guix --help
time GUIX_PROFILING="rpc gc" ./the-optimised-guix/bin/guix build -d --no-grafts 
pigx
/gnu/store/fq6x8d2vcm6sbjkimg7g8kcgb4c5xv1b-pigx-0.0.3.drv
Remote procedure call summary: 5949 RPCs
  built-in-builders              ...     1
  add-to-store                   ...     3
  add-to-store/tree              ...    26
  add-temp-root                  ...   195
  valid-path?                    ...   195
  add-text-to-store              ...  5529
Garbage collection statistics:
  heap size:        93.85 MiB
  allocated:        312.03 MiB
  GC times:         17
  time spent in GC: 3.37 seconds (23% of user time)

real    1m39,178s
user    0m14,557s
sys     0m0,990s

# Before the patch series
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
./the-unoptimised-guix/bin/guix --help
time GUIX_PROFILING="rpc gc" ./the-unoptimised-guix/bin/guix build -d 
--no-grafts pigx

Remote procedure call summary: 5749 RPCs
  built-in-builders              ...     1
  add-to-store/tree              ...    26
  add-to-store                   ...   193
  add-text-to-store              ...  5529
Garbage collection statistics:
  heap size:        93.85 MiB
  allocated:        325.25 MiB
  GC times:         18
  time spent in GC: 3.63 seconds (25% of user time)

real    1m42,100s
user    0m14,690s
sys     0m1,127s

It seems that if the disk cache is cold, the time-to-derivation decreases
a little by this patch series.  Much less than I had hoped for though; I'll
have to look into other areas for interesting performance gains ...

Greetings,
Maxime.

Attachment: v2-0001-build-self-Implement-basic-hash-algorithm.patch
Description: Text Data

Attachment: v2-0002-guix-hash-Extract-file-hashing-procedures.patch
Description: Text Data

Attachment: v2-0003-gexp-Allow-computing-the-hash-of-the-local-file-i.patch
Description: Text Data

Attachment: v2-0004-gexp-Allow-overriding-the-absolute-file-name.patch
Description: Text Data

Attachment: v2-0005-packages-Compute-the-hash-of-patches-in-advance-w.patch
Description: Text Data

Attachment: v2-0006-compile-all-compile-Keep-track-of-dependencies-of.patch
Description: Text Data

Attachment: v2-0007-packages-Add-patches-to-the-dependency-list-of-pa.patch
Description: Text Data

Attachment: v2-0008-gexp-Do-not-intern-if-the-file-is-already-in-the-.patch
Description: Text Data

Attachment: v2-0009-base32-Reduce-GC-pressure-in-make-bytevector-base.patch
Description: Text Data

Attachment: signature.asc
Description: This is a digitally signed message part


reply via email to

[Prev in Thread] Current Thread [Next in Thread]