09/45: reppar: Write about limitations.
From: Ludovic Courtès
Subject: 09/45: reppar: Write about limitations.
Date: Tue, 09 Jun 2015 12:37:01 +0000
civodul pushed a commit to branch master
in repository maintenance.
commit bbc84361df12ba4723cb6f755d2c47bd44992a36
Author: Ludovic Courtès <address@hidden>
Date: Fri May 29 18:38:12 2015 +0200
reppar: Write about limitations.
---
doc/reppar-2015/outline.org | 14 +++---
doc/reppar-2015/reproducible-hpc.skb | 76 ++++++++++++++++++++++++++++------
2 files changed, 70 insertions(+), 20 deletions(-)
diff --git a/doc/reppar-2015/outline.org b/doc/reppar-2015/outline.org
index 947f474..2e3833f 100644
--- a/doc/reppar-2015/outline.org
+++ b/doc/reppar-2015/outline.org
@@ -139,6 +139,13 @@
+ binaries become non-portable
+ tweaking the recipe of say, ATLAS, means rebuilding a large part
of the DAG
+ - no proprietary software
+ + common in HPC (GPUs, linear algebra)
+ + but this is a strength: reproducible science cannot be built on
+ black boxes, and experimentation needs the ability to fiddle with
+ the software
+ - no "virtual dependencies" like "mpi", "runtime system" à la Spack
+ - no command-line interface (yet) to tweak the DAG à la Spack
- software "archeology" is limited
+ reusing specific, old versions of compilers or libraries means
rewriting those recipes (they may have never existed in Guix
@@ -147,13 +154,6 @@
+ daemon, substitutes, network access, etc.
- numerical reproducibility? (cf. "Designing Bit-Reproducible Portable
High-Performance Applications")
- - no proprietary software
- + common in HPC (GPUs, linear algebra)
- + but this is a strength: reproducible science cannot be built on
- black boxes, and experimentation needs the ability to fiddle with
- the software
- - no "virtual dependencies" like "mpi", "runtime system" à la Spack
- - no command-line interface (yet) to tweak the DAG à la Spack
* Conclusion
diff --git a/doc/reppar-2015/reproducible-hpc.skb b/doc/reppar-2015/reproducible-hpc.skb
index 4c9f3bd..32393c1 100644
--- a/doc/reppar-2015/reproducible-hpc.skb
+++ b/doc/reppar-2015/reproducible-hpc.skb
@@ -501,24 +501,74 @@ is by writing a function that recursively adjusts the package labeled
(p [No matter how complex the transformations are, a package
object unambiguously represents a reproducible build process.]))
- (section :title [Going Further] ;active papers
+ (section :title [Going Further] ;active papers + gexps
:ident "active"))
(chapter :title [Limitations and Challenges]
:ident "limitations"
- (p [Nix and Guix address many of the reproducibility issues
-encountered in package deployment, and Guix provides APIs and a
-programming environment aiming to facilitate the development of package
-variants as is useful in HPC. Yet, to our knowledge, neither Guix nor
-Nix are widely deployed on HPC systems. An obvious reason that limits
-adoption is the requirement to have the build daemon run with root
-privileges,(---)without which it would not be able to use the Linux
-kernel container facilities that allow it to isolate build processes and
-maximize build reproducibility. System administrators are wary of
-installing privileged daemons, and so HPC system users trade
-reproducibility for practical approaches.])
- )
+ (p (emph [Privileged daemon.]) [ Nix and Guix address many of the
+reproducibility issues encountered in package deployment, and Guix
+provides APIs and a programming environment aiming to facilitate the
+development of package variants as is useful in HPC. Yet, to our
+knowledge, neither Guix nor Nix is widely deployed on HPC systems. An
+obvious reason that limits adoption is the requirement to have the build
+daemon run with root privileges,(---)without which it would not be able
+to use the Linux kernel container facilities that allow it to isolate
+build processes and maximize build reproducibility. System
+administrators are wary of installing privileged daemons, and so HPC
+system users trade reproducibility for practical approaches.])
+
+ (p (emph [Cluster setup.])[ All the ,(tt [guix]) commands are
+actually clients of the daemon. In a typical cluster setup, system
+administrators may want to run a single daemon on one specific node and
+to share ,(tt [/gnu/store]) among all the nodes. At the time of
+writing, Guix does not yet allow communication with a remote daemon.
+For this reason, Guix users at the MDC are required to manage their
+profiles from a specific node; other nodes can use the profiles, but not
+modify them. Allowing the ,(tt [guix]) commands to communicate with a
+remote daemon will address this issue.])
+ (p [In a typical cluster setup, compute nodes completely lack
+access to the Internet. Yet, the daemon needs to be able to download
+source code tarballs or pre-built binaries from external servers. Thus,
+the daemon must run on a node with Internet access, which could be
+contrary to the policy on some clusters.])
+
+ (p (emph [Remaining non-determinism.])[ Despite the use of
+isolated containers to run build processes, there are still a few sources
+of non-determinism that can impede reproducibility. In particular,
+details about the operating system kernel and the hardware being used
+can ``leak'' to build processes. For example, the kernel Linux provides
+system calls such as ,(tt [uname]) and file system interfaces such as
+,(tt [/proc/cpuinfo]) that leak information about the host; independent
+builds on different hosts could lead to different results if this
+information is used. Likewise, the ,(tt [cpuid]) instruction leaks
+hardware details.])
+ (p [Fortunately, few software packages depend on this information.
+Yet, the proportion of packages depending on it is higher in the HPC
+world. A notable example is the ATLAS linear algebra system, which
+fine-tunes itself based on details about the CPU micro-architecture.
+Similarly, profile-guided optimization (PGO), where the compiler
+optimizes code based on a profile gathered in a previous run, undermines
+reproducibility. Running build processes in full-blown virtual machines
+would help address some of these issues, but with a potentially
+significant impact on build performance, and possibly preventing
+important optimization techniques in the HPC context.])
+
+ (p (emph [Proprietary software.])[ GNU,(~)Guix does not provide
+proprietary software packages. Unfortunately, proprietary software is
+still relatively common in HPC, be it linear algebra libraries or GPU
+support. Yet, we see it as a strength more than a limitation. Often,
+these ``black boxes'' inherently limit reproducibility,(---)how is one
+going to reproduce a software environment if they are not given the
+right to run the software in the first place? What if the software
+depends on the ability to ``call home'' to function at all? More
+importantly, we view reproducible software environments and reproducible
+science as a tool towards the goal of improved and shared knowledge;
+developers who deny the freedom to study and modify their code work
+against this goal.])
+
+ (p (bold [FIXME: Anything else?])))
(chapter :title [Related Work] :ident "related")