Packaging Slurm (Was: Nss libraries not found when using guix pack)
From: Jean-Christophe HAESSIG
Date: Tue, 15 Mar 2022 10:16:21 +0000
On 08/03/2022 11:40, Ludovic Courtès wrote:
> Hi Jean-Christophe, :-)
Hi,
>> I jumped that hurdle with LD_PRELOAD, but this is not an acceptable fix
>> of course.
>
> Yeah, I did something similar in the past:
>
> https://lists.gnu.org/archive/html/guix-devel/2020-08/msg00168.html
>
> Maybe we could have a package transformation option, say
> ‘--with-nss-plugins=…’, that would wrap binaries to have LD_LIBRARY_PATH
> pointing to the chosen NSS plugins.
>
> Not pretty, but I’m afraid this is hardly avoidable.
>
> Thoughts?
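The proposed `--with-nss-plugins=…` option does not exist yet. To get a rough idea of what its wrappers could look like, here is a minimal POSIX shell sketch. The function name `make_nss_wrapper` and its arguments are hypothetical. At build time, Guix's own `wrap-program` from `(guix build utils)` already produces similar environment-setting wrappers.

```shell
#!/bin/sh
# Hypothetical sketch of what a "--with-nss-plugins" wrapper could do:
# write WRAPPER, a script that prepends the given NSS plugin directories
# to LD_LIBRARY_PATH and then execs REAL-BINARY with the same arguments.
# Assumes the paths contain no double quotes, backslashes or '$'.
# Usage: make_nss_wrapper WRAPPER REAL-BINARY PLUGIN-DIR...
make_nss_wrapper() {
    if [ "$#" -lt 3 ]; then
        echo "usage: make_nss_wrapper WRAPPER REAL-BINARY PLUGIN-DIR..." >&2
        return 1
    fi
    nss_wrapper=$1
    nss_real=$2
    shift 2

    # Join the plugin directories with ':' and drop the trailing one.
    nss_dirs=$(printf '%s:' "$@")
    nss_dirs=${nss_dirs%:}

    # Unquoted heredoc: $nss_dirs and $nss_real are expanded now, while the
    # escaped \$ references are left for the wrapper to expand at run time.
    cat > "$nss_wrapper" <<EOF
#!/bin/sh
export LD_LIBRARY_PATH="$nss_dirs\${LD_LIBRARY_PATH:+:\$LD_LIBRARY_PATH}"
exec "$nss_real" "\$@"
EOF
    chmod +x "$nss_wrapper"
}
```

Usage would look like `make_nss_wrapper ./slurmdbd "$SLURM/sbin/slurmdbd" "$NSS_LIBDIR"`. Here `SLURM` and `NSS_LIBDIR` are placeholders for the unpacked Slurm tree and the directory holding the wanted `libnss_*.so.2` plugins. One downside of this approach is that `LD_LIBRARY_PATH` is inherited by every child process the daemon spawns.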
I don't really know what the implications of this would be. Meanwhile, I continued exploring packaging Slurm with Guix and deploying it on Debian.
I feel that what I'm trying to do is slightly outside the scope of what Guix is meant for: I used guix pack with the -R and -RR options, but these are designed to let regular users run software from Guix packages. When the software is meant to be run by root, things seem to go awry. I got errors because the program tries to switch user and groups:
--------------
mount("none", "/tmp/guix-exec-C6ZnPc", "tmpfs", 0, NULL) = 0
clone(child_stack=NULL, flags=CLONE_NEWNS|CLONE_NEWUSER|SIGCHLD) = 4061
openat(AT_FDCWD, "/proc/4061/setgroups", O_WRONLY) = 3
write(3, "deny\0", 5) = 5
close(3) = 0
getuid() = 0
--------------
and later:
--------------
[pid 4061] newfstatat(5, "", {st_mode=S_IFREG|0644, st_size=10406312, ...}, AT_EMPTY_PATH) = 0
[pid 4061] setgroups(2, [3000, 51692]) = -1 EPERM (Operation not permitted)
[pid 4061] poll([{fd=2, events=POLLOUT}], 1, 5000) = 1 ([{fd=2, revents=POLLOUT}])
[pid 4061] newfstatat(2, "", {st_mode=S_IFIFO|0600, st_size=0, ...}, AT_EMPTY_PATH) = 0
[pid 4061] write(2, "slurmdbd: fatal: Failed to set s"..., 89
slurmdbd: fatal: Failed to set supplementary groups, initgroups: Operation not permitted
--------------
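The first trace hints at what goes wrong. The -RR wrapper creates a new user namespace (CLONE_NEWUSER) and writes "deny" to /proc/4061/setgroups. According to user_namespaces(7), once a namespace's setgroups file says "deny", setgroups(2) fails with EPERM inside it. That would match the initgroups failure in the second trace. The helper below prints the relevant /proc entries of a running process, which is one way to check this. Its name, `userns_info`, is made up for this example.

```shell
#!/bin/sh
# Show the user-namespace settings of a process (default: the caller).
# With "setgroups: deny", setgroups(2), and thus initgroups(3), fails
# with EPERM inside that namespace. Needs Linux >= 3.19.
userns_info() {
    ui_pid=${1:-self}
    for ui_file in uid_map gid_map setgroups; do
        # Squeeze the column padding and join multi-line maps with ';'.
        ui_value=$(tr -s ' ' < "/proc/$ui_pid/$ui_file" | paste -s -d ';' -)
        printf '%s: %s\n' "$ui_file" "$ui_value"
    done
}
```

For instance, run `userns_info "$(pidof slurmdbd)"` while a single packed daemon is running. If the wrapper's namespace is the culprit, the last line should read `setgroups: deny`.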
When the program is run directly under its final system user account, it starts correctly; it still complains about not being able to fiddle with groups, but doesn't crash:
slurmdbd: Not running as root. Can't drop supplementary groups
I only got this to work with -RR; -R gave me other permission errors about not being able to set up subuid/subgid. The system is Debian 10.9 with kernel 4.19. I expected containers to be readily available, and since I didn't know whether the errors could come from what the program tries to do as root, I haven't checked thoroughly yet.
>> However, I did that only to realize that Slurm in guix is compiled
>> without mysql support, so I'll need to change the package, which I
>> have never done.
I managed to compile Slurm with MySQL support thanks to input from others, whom I thank.
> This would be a welcome change, though it would have a noticeable impact
> on the closure size:
>
> --8<---------------cut here---------------start------------->8---
> $ guix size slurm |tail -1
> total: 134.7 MiB
> $ guix size slurm mariadb |tail -1
> total: 421.4 MiB
> --8<---------------cut here---------------end--------------->8---
I don't know if this changes anything, but AFAIK MariaDB is a dependency of slurmdbd only. Debian has separate binary packages for the accounting daemon (slurmdbd), the controller daemon (slurmctld) and the compute node daemon (slurmd), but they all come from a single source package.
Since only one host runs slurmdbd, not having to ship the MariaDB libraries to all the other nodes would reduce the bill, if it is possible to cherry-pick binaries like that in Guix.
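Guix does support packages with several outputs (glib's `bin` output, for example), which is the usual way to split a package like this. The first step would be to find which files of the MySQL-enabled build actually link against the MariaDB client library. The helper below is a hypothetical sketch for that (`linked_against` is not an existing tool). I would expect the hits to be Slurm plugins such as the accounting_storage/mysql one rather than the daemons themselves, but I haven't checked.

```shell
#!/bin/sh
# List the ELF files under DIR (shared objects and executables) whose
# DT_NEEDED entries match the extended regular expression PATTERN.
# Uses readelf from binutils, which, unlike ldd, never runs the file.
# Usage: linked_against DIR PATTERN
linked_against() {
    la_dir=$1
    la_pattern=$2
    find "$la_dir" -type f \( -name '*.so*' -o -perm -100 \) |
        while IFS= read -r la_file; do
            if readelf -d "$la_file" 2>/dev/null | grep 'NEEDED' |
                grep -Eq -- "$la_pattern"; then
                printf '%s\n' "$la_file"
            fi
        done
}
```

For example, run `linked_against "$PREFIX" 'mariadb|mysql'`, where `PREFIX` is a placeholder for the store directory of the MySQL-enabled Slurm build.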
>> I wanted to use Slurm from Guix because Debian does not provide every
>> possible Slurm version. This can be a problem when a Slurm cluster must
>> be upgraded without shutting it down completely. I hoped to gain some
>> independence from my host distribution but it appears that won't be so
>> simple...
> Interesting. From our earlier discussion, this sounds like quite an
> endeavor, but I’d be curious to know what the stumbling blocks are and
> how we can overcome them!
For the time being, I'm still confident it can be done somehow, at least temporarily, to enable a smooth upgrade. There are some minor hurdles, e.g. Debian decided to use slurm-llnl in its paths under /etc, /var and the like. I managed to build several versions from git, but I'm still stuck on 18.08, which fails to compile because of "multiple definition of 'opt'". The only explanation I can think of is that something is too recent for that Slurm version.
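One possible explanation for the 18.08 failure, not verified against the Slurm sources: GCC 10 changed its default from -fcommon to -fno-common. In C, a global variable defined without `extern` in several translation units then becomes a "multiple definition" link error. The self-contained script below reproduces that class of error with a variable named `opt`. The file names are arbitrary.

```shell
#!/bin/sh
# Two translation units that both define the global "opt" without
# "extern" (a tentative definition), as older C code often did in headers.
tmp=$(mktemp -d)
printf 'int opt;\nint main(void) { return opt; }\n' > "$tmp/a.c"
printf 'int opt;\n' > "$tmp/b.c"

# -fno-common (the default since GCC 10): the two definitions clash at link time.
if cc -fno-common "$tmp/a.c" "$tmp/b.c" -o "$tmp/prog" 2> "$tmp/err"; then
    nocommon=links
else
    nocommon=fails
fi

# -fcommon restores the old behaviour: both are merged into one common symbol.
if cc -fcommon "$tmp/a.c" "$tmp/b.c" -o "$tmp/prog" 2>/dev/null; then
    common=links
else
    common=fails
fi

printf '%s\n' "-fno-common: $nocommon, -fcommon: $common"
```

If this turns out to be the cause, adding -fcommon to CFLAGS (e.g. `./configure CFLAGS='-O2 -g -fcommon'` in an autoconf tree) should work around it for old Slurm releases.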
I guess running Guix System would remove many problems, but I'm not ready for that. And since I'm interested in the shared-software use case for a cluster, the "battle for /gnu/store" issue would still remain.
Thanks,
JC