

Packaging Slurm (Was: Nss libraries not found when using guix pack)

From: Jean-Christophe HAESSIG
Subject: Packaging Slurm (Was: Nss libraries not found when using guix pack)
Date: Tue, 15 Mar 2022 10:16:21 +0000

On 08/03/2022 11:40, Ludovic Courtès wrote:
> Salut Jean-Christophe,  :-)

>> I jumped that hurdle with LD_PRELOAD, but this is not an acceptable fix
>> of course.
> Yeah, I did something similar in the past:
> Maybe we could have a package transformation option, say
> ‘--with-nss-plugins=…’, that would wrap binaries to have LD_LIBRARY_PATH
> pointing to the chosen NSS plugins.
> Not pretty, but I’m afraid this is hardly avoidable.
> Thoughts?
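For what it's worth, here is roughly what such a wrapper could look like. This is only a sketch: `make_nss_wrapper` and any paths you pass it are hypothetical, and it assumes the NSS plugins (e.g. libnss_sss.so.2) were built against the same glibc as the wrapped program. glibc loads NSS modules at run time by library name, so the normal LD_LIBRARY_PATH search applies to them.

```shell
#!/bin/sh
# Sketch: generate a wrapper script that prepends a directory of NSS
# plugins to LD_LIBRARY_PATH, then execs the real binary with its arguments.
# make_nss_wrapper is a hypothetical helper, not an existing Guix tool.
make_nss_wrapper() {
    real_binary=$1   # program to wrap
    nss_dir=$2       # directory holding libnss_*.so.2 built for the same glibc
    wrapper=$3       # path of the wrapper script to create
    # $nss_dir and $real_binary are expanded now; the escaped \$ parts are
    # written literally so they are evaluated when the wrapper runs.
    cat > "$wrapper" <<EOF
#!/bin/sh
LD_LIBRARY_PATH="$nss_dir\${LD_LIBRARY_PATH:+:\$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH
exec "$real_binary" "\$@"
EOF
    chmod +x "$wrapper"
}
```

Usage would be something like `make_nss_wrapper /path/to/slurmdbd /path/to/nss/lib ./slurmdbd-wrapped`. Prepending rather than replacing keeps any LD_LIBRARY_PATH the caller already set.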

I don't really know what the implications of this would be. I continued
exploring packaging Slurm with Guix and deploying it on Debian.
I feel what I'm trying to do is slightly outside the scope of what Guix
intends: I used guix pack with the -R and -RR options, but these are
designed to let regular users run software from Guix packages. When the
software is meant to be run by root, things seem to go awry: I got
errors because the program tries to switch users and groups.

mount("none", "/tmp/guix-exec-C6ZnPc", "tmpfs", 0, NULL) = 0
clone(child_stack=NULL, flags=CLONE_NEWNS|CLONE_NEWUSER|SIGCHLD) = 4061
openat(AT_FDCWD, "/proc/4061/setgroups", O_WRONLY) = 3
write(3, "deny\0", 5)                   = 5
close(3)                                = 0
getuid()                                = 0

and later:

[pid  4061] newfstatat(5, "", {st_mode=S_IFREG|0644, st_size=10406312, ...}, AT_EMPTY_PATH) = 0
[pid  4061] setgroups(2, [3000, 51692]) = -1 EPERM (Operation not permitted)
[pid  4061] poll([{fd=2, events=POLLOUT}], 1, 5000) = 1 ([{fd=2,
[pid  4061] newfstatat(2, "", {st_mode=S_IFIFO|0600, st_size=0, ...},
[pid  4061] write(2, "slurmdbd: fatal: Failed to set s"..., 89slurmdbd: fatal: Failed to set supplementary groups, initgroups: Operation not

When the program is run directly as its final system user, it starts
correctly; it still complains about not being able to change its groups,
but doesn't crash:

slurmdbd: Not running as root. Can't drop supplementary groups

I only got this to work with -RR. With -R I got other permission errors
about not being able to set up subuid/subgid. The system is Debian 10.9
with kernel 4.19. I expected containers to be readily available, and
since I didn't know whether the errors came from what the program tries
to do as root, I haven't checked thoroughly yet.
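Two things may explain this, as far as I understand. First, packs built with -R rely on user namespaces, while -RR falls back to PRoot when they are unavailable; Debian 10 kernels carry a Debian-specific sysctl, kernel.unprivileged_userns_clone, which is 0 by default, so -R would fail for a non-root user there. Second, the trace above shows the wrapper writing "deny" to /proc/PID/setgroups; per user_namespaces(7), that disables setgroups(2) inside the namespace, which matches the EPERM from initgroups. A minimal sketch to check the first point (`userns_report` is a hypothetical helper; its optional argument stands in for /proc so it can be tried on fake files):

```shell
#!/bin/sh
# Sketch: report whether user namespace creation looks restricted.
# 'guix pack -R' needs user namespaces; '-RR' can fall back to PRoot.
userns_report() {
    proc=${1:-/proc}
    debian_knob=$proc/sys/kernel/unprivileged_userns_clone  # Debian-specific sysctl
    max_ns=$proc/sys/user/max_user_namespaces                # upstream limit
    if [ -r "$debian_knob" ] && [ "$(cat "$debian_knob")" = 0 ]; then
        echo "disabled (kernel.unprivileged_userns_clone=0)"
    elif [ -r "$max_ns" ] && [ "$(cat "$max_ns")" = 0 ]; then
        echo "disabled (user.max_user_namespaces=0)"
    else
        # Other mechanisms (e.g. LSM policies) could still block them.
        echo "no restriction found"
    fi
}
```

Running `userns_report` without arguments inspects the live /proc. Note that root can create user namespaces even when the Debian knob is 0, so this only matters for the unprivileged case.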

>> However, I did that only to realize that Slurm in guix is compiled
>> without mysql support, so I'll need to change the package, which I
>> have never done.

I managed to compile it with MySQL support thanks to input from others,
for which I'm grateful.

> This would be a welcome change, though it would have a noticeable impact
> on the closure size:
> --8<---------------cut here---------------start------------->8---
> $ guix size slurm |tail -1
> total: 134.7 MiB
> $ guix size slurm mariadb |tail -1
> total: 421.4 MiB
> --8<---------------cut here---------------end--------------->8---

I don't know whether this changes anything, but AFAIK mariadb is a
dependency of slurmdbd only. Debian has separate binary packages for the
accounting daemon (slurmdbd), the controller daemon (slurmctld) and the
compute node daemon (slurmd), but they are all built from a single
source package.

Since only one host runs slurmdbd, not having to bundle the MariaDB
libraries on all the other nodes would reduce the bill, if it is
possible to cherry-pick binaries like that in Guix.
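To put a number on that bill, here is a small sketch that compares two `guix size ... | tail -1` totals (`closure_delta` is a hypothetical name, and it assumes both totals are in MiB). With the figures quoted above it gives about 287 MiB per node; that is an upper bound, since `guix size slurm mariadb` counts the whole mariadb package rather than only the client library the MySQL plugin needs. Guix packages can also have several outputs (selected as PACKAGE:OUTPUT), so a separate output for slurmdbd might be one way to do the cherry-picking; I haven't tried it.

```shell
# Sketch: difference and ratio between two 'guix size' totals (MiB assumed).
closure_delta() {
    printf '%s\n%s\n' "$1" "$2" |
        awk '$1 == "total:" { t[++n] = $2 }
             END { printf "%.1f MiB more (x%.2f)\n", t[2] - t[1], t[2] / t[1] }'
}

# Totals quoted above: slurm alone vs. slurm plus mariadb.
closure_delta "total: 134.7 MiB" "total: 421.4 MiB"
# prints: 286.7 MiB more (x3.13)
```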

>> I wanted to use Slurm from Guix because Debian does not provide every 
>> possible Slurm version. This can be a problem when a Slurm cluster must 
>> be upgraded without shutting it down completely. I hoped to gain some 
>> independence from my host distribution but it appears that won't be so 
>> simple...

> Interesting.  From our earlier discussion, this sounds like quite an
> endeavor, but I’d be curious to know what the stumbling blocks are and
> how we can overcome them!

For the time being, I'm still confident it can be done somehow, at least
temporarily to enable a smooth upgrade. There are some minor hurdles,
e.g. Debian decided to change the paths under /etc, /var and the like to
slurm-llnl. I managed to build several versions from git, but I'm still
stuck on 18.08, which doesn't compile because of "multiple definition
of 'opt'". The only explanation I can think of is that something in the
toolchain is too recent for that Slurm version.

I guess running Guix System would remove many of these problems, but
I'm not ready for that; and since I'm interested in the shared-software
use case for a cluster, the "battle for /gnu/store" issue would remain
anyway.


