guix-science

[PATCH] Add draft post "CRAN, a practical example for being reproducible


From: Lars-Dominik Braun
Subject: [PATCH] Add draft post "CRAN, a practical example for being reproducible at large scale using GNU Guix".
Date: Tue, 6 Dec 2022 08:53:22 +0100

Hi Simon, hi all,

attached is my draft post for hpc.guix.info regarding guix-cran.

Thanks,
Lars

* drafts/reproducible-cran.md: New file.
---
 drafts/reproducible-cran.md | 195 ++++++++++++++++++++++++++++++++++++
 1 file changed, 195 insertions(+)
 create mode 100644 drafts/reproducible-cran.md

diff --git a/drafts/reproducible-cran.md b/drafts/reproducible-cran.md
new file mode 100644
index 0000000..c759b02
--- /dev/null
+++ b/drafts/reproducible-cran.md
@@ -0,0 +1,195 @@
+# CRAN, a practical example for being reproducible at large scale using GNU Guix
+
+GNU Guix provides scripts, called importers, that turn packages from
+language-specific repositories such as [PyPI](https://pypi.org/)
+for Python, [crates.io](https://crates.io/) for Rust and
+[CRAN](https://cran.r-project.org/) for R into Guix package recipes.
+
+An example workflow for the CRAN package
+[zoid](https://CRAN.R-project.org/package=zoid), which is not available
+in Guix proper, would look like this:
+
+1. Import the package, together with any dependencies missing from Guix
+   (`-r` for recursive), into `manifest.scm`.
+
+   ```console
+   $ guix import cran -r zoid > manifest.scm
+   ```
+2. Edit `manifest.scm` to import the required modules and return a
+   usable manifest containing the package and R itself.
+
+   ```scheme
+   (use-modules (guix packages)
+                (guix download)
+                (guix licenses)
+                (guix build-system r)
+                (gnu packages cran)
+                (gnu packages statistics))
+   
+   (define-public r-zoid …)
+   
+   (packages->manifest (list r-zoid r))
+   ```
+3. Run your code.
+
+   ```console
+   guix shell -m manifest.scm -- R -e 'library(zoid)'
+   ```
+
+Although Guix hints at which modules are missing when loading an
+incomplete manifest, adding all of them to the manifest file by hand can
+be quite tedious.
+
+For R specifically, the R package
+[guix.install](https://CRAN.R-project.org/package=guix.install) automates
+this import. It also uses `guix import`, but references dependencies by
+package specification, like `(specification->package "r-bh")`, so no
+extra logic is needed to figure out the correct module imports. It then
+extends Guix’s package search path to include the newly written file
+`~/.Rguix/packages.scm`, installs the package into the default Guix
+profile at `~/.guix-profile` and adds this profile to R’s search path.
+
+While this approach works well for individual users, Guix installations
+with a larger user base, for instance institution-wide ones, would
+benefit from having the entire CRAN package collection available by
+default, with pre-built substitutes to speed up installation. Reproducing
+environments would also take fewer steps if the package recipes were
+available to everyone by default.
+
+## Introducing guix-cran
+
+GNU Guix provides a mechanism called “channels”,
+which can extend the package collection in Guix
+proper. [guix-cran](https://github.com/guix-science/guix-cran) does
+exactly that: It provides all CRAN packages missing in Guix proper in
+a channel and has all of the properties mentioned above. It can be
+installed globally via `/etc/guix/channels.scm` and packages can be
+pre-built on a central server.
+
+As of commit `cc7394098f306550c476316710ccad20a510fa4b`, guix-cran
+provides 17431 packages. Of these, 95% build successfully, and only 0.5%
+of the successful builds are not reproducible according to
+`guix build --check`. Old package versions can also be used via
+`guix time-machine`, similar to what
+[MRAN](https://mran.microsoft.com/documents/rro/reproducibility) offers.
+However, the channel’s history currently spans only about two months.
+
+Creating and updating guix-cran is [fully
+automated](https://github.com/guix-science/guix-cran-scripts) and happens
+without any human intervention, so improvements to the already very good
+CRAN importer directly improve the channel’s quality. The channel itself
+is always in a usable state, because updates are tested with `guix pull`
+before they are committed and pushed. However, some packages may fail to
+build or work because build or runtime dependencies, usually undeclared
+ones, are missing. Better dependency auto-detection in the CRAN importer
+could reduce these failures.
+
+Building the channel derivation is currently very slow, most likely due
+to Guile performance issues, and putting all packages into a single file
+would be even slower. Creating one module file per package is not
+possible either: with more than 17000 packages, that would exceed the
+[limit of 8192 loadable
+modules](https://www.mail-archive.com/guile-devel@gnu.org/msg16244.html).
+As a compromise, packages are split into files by the first letter of
+their name, so each package can still be referenced deterministically
+from its name alone.
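+
+The first-letter split can be illustrated with a small shell sketch. The
+helper `module_for` below is hypothetical and not part of guix-cran; it
+assumes that modules are named `(guix-cran packages <letter>)` and that
+`<letter>` is the lowercased first character of the CRAN name, i.e. the
+Guix package name without its `r-` prefix. Both are assumptions made for
+illustration only.
+
+```bash
+#!/bin/sh
+# Hypothetical helper: print the module that would define a package
+# under the ASSUMED naming scheme (guix-cran packages <letter>).
+module_for() {
+  name=${1#r-}   # strip Guix's "r-" prefix, if present
+  # Take the first character of the CRAN name and lowercase it.
+  first=$(printf '%s' "$name" | cut -c1 | tr '[:upper:]' '[:lower:]')
+  printf '(guix-cran packages %s)\n' "$first"
+}
+
+module_for r-zoid   # prints: (guix-cran packages z)
+```
+
+Because the result depends only on the name, a manifest written in Scheme
+could import just the one module it needs, e.g.
+`(use-modules (guix-cran packages z))` under the naming assumed here.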
+
+The channel is not signed, because all changes are automated anyway.
+
+## Usage
+ 
+Using guix-cran requires the following steps:
+
+1. Create `channels.scm`:
+
+   ```scheme
+   (cons
+     (channel
+       (name 'guix-cran)
+       (url "https://github.com/guix-science/guix-cran.git"))
+     %default-channels)
+   ```
+2. Create `manifest.scm`:
+
+   ```scheme
+   (specifications->manifest '("r-zoid" "r"))
+   ```
+3. Run:
+
+   ```console
+   guix time-machine -C channels.scm -- shell -m manifest.scm -- R -e 'library(zoid)'
+   ```
+
+For true reproducibility, it’s necessary to pin the channels to a
+specific commit by running
+
+```console
+guix time-machine -C channels.scm -- describe -f channels > channels.pinned.scm
+```
+
+once and using `channels.pinned.scm` instead of `channels.scm` from there on.
+
+## Appendix
+
+Ludovic Courtès, Simon Tournier and Ricardo Wurmus provided valuable
+feedback on the draft of this post.
+
+The channel statistics above can be reproduced using the following
+channel file (`channels.scm`):
+
+```scheme
+(list
+  (channel
+    (name 'guix)
+    (url "https://git.savannah.gnu.org/git/guix.git")
+    (branch "master")
+    (commit
+      "4781f0458de7419606b71bdf0fe56bca83ace910")
+    (introduction
+      (make-channel-introduction
+        "9edb3f66fd807b096b48283debdcddccfea34bad"
+        (openpgp-fingerprint
+          "BBB0 2DDF 2CEA F6A8 0D1D  E643 A2A0 6DF2 A33A 54FA"))))
+  (channel
+    (name 'guix-cran)
+    (url "https://github.com/guix-science/guix-cran.git")
+    (branch "master")
+    (commit
+      "cc7394098f306550c476316710ccad20a510fa4b")))
+```
+
+And the following Scheme code to obtain a list of all packages provided
+by guix-cran (`list-packages.scm`):
+
+```scheme
+(use-modules (guix discovery)
+             (gnu packages)
+             (guix modules)
+             (guix utils)
+             (guix packages))
+(let* ((modules (all-modules (%package-module-path)))
+       (packages (fold-packages
+                   (lambda (p accum)
+                     (let ((mod (file-name->module-name (location-file (package-location p)))))
+                       (if (member (car mod) '(guix-cran))
+                         (cons p accum)
+                         accum)))
+                   '() modules)))
+  (for-each (lambda (p) (format #t "~a~%" (package-name p))) packages))
+```
+
+And this Bash script:
+
+```bash
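+#!/bin/bash
+
+# Build a profile from the pinned channels and use its guix below.
+guix pull -p guix-profile -C channels.scm
+export GUIX_PROFILE="$(pwd)/guix-profile"
+source guix-profile/etc/profile
+# List the names of all packages provided by guix-cran.
+guix repl list-packages.scm > packages
+# Build each package (GC root in builds/), then rebuild it with --check.
+mkdir -p builds
+parallel -j 4 'rm -f builds/{} && guix build --no-grafts --timeout=300 -r builds/{} -q {} 2>&1 && guix build --no-grafts --timeout=300 --check -q {} 2>&1' < packages | tee build.log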
+
+echo "total" && wc -l packages
+echo "success" && sort -u build.log | grep '^/gnu/store' | wc -l
+echo "failure" && sort -u build.log | grep 'failed$' | wc -l
+echo "non-reproducible" && sort -u build.log | grep 'differs$' | wc -l
+```
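+
+The script reports absolute counts, whereas the post quotes percentages.
+The following sketch derives them the way the text states them: buildable
+packages relative to all packages, and non-reproducible builds relative
+to successful ones. The function name `build_stats` is hypothetical; it
+reuses the `grep` patterns of the script above on its `packages` and
+`build.log` files.
+
+```bash
+#!/bin/sh
+# Hypothetical helper: build_stats PACKAGES BUILD_LOG
+# Prints the share of buildable packages and, among those, the share
+# whose `guix build --check` run reported differing outputs.
+build_stats() {
+  total=$(wc -l < "$1")
+  success=$(sort -u "$2" | grep -c '^/gnu/store')
+  differs=$(sort -u "$2" | grep -c 'differs$')
+  awk -v t="$total" -v s="$success" -v d="$differs" 'BEGIN {
+    printf "buildable: %.1f%%\n", (t > 0 ? 100 * s / t : 0)
+    printf "non-reproducible: %.1f%%\n", (s > 0 ? 100 * d / s : 0)
+  }'
+}
+```
+
+After sourcing it, run `build_stats packages build.log`. Like the script,
+it counts unique store paths rather than packages, so a package with
+several outputs would be counted more than once.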
+
-- 
2.38.1

Attachment: signature.asc
Description: PGP signature

