help-guix

Re: Packaging Slurm


From: Ludovic Courtès
Subject: Re: Packaging Slurm
Date: Thu, 17 Mar 2022 19:25:11 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Hello,

Jean-Christophe HAESSIG <haessigj@igbmc.fr> skribis:

> I don't really know what the implications of this would be. I continued 
> exploring packaging Slurm with Guix and deploying it on Debian.
> I feel what I'm trying to do is slightly out of scope of Guix's intent:
> I used guix pack with various options -R, -RR but these are made to 
> enable regular users to run software from guix packages. When the 
> software is intended to be run by root, things seem to go awry. I had 
> errors because the program tries to switch user and groups.
>
> --------------
> mount("none", "/tmp/guix-exec-C6ZnPc", "tmpfs", 0, NULL) = 0
> clone(child_stack=NULL, flags=CLONE_NEWNS|CLONE_NEWUSER|SIGCHLD) = 4061
> openat(AT_FDCWD, "/proc/4061/setgroups", O_WRONLY) = 3
> write(3, "deny\0", 5)                   = 5
> close(3)                                = 0
> getuid()                                = 0
> --------------
>
> and later:
>
> --------------
> [pid  4061] newfstatat(5, "", {st_mode=S_IFREG|0644, st_size=10406312, 
> ...}, AT_EMPTY_PATH) = 0
> [pid  4061] setgroups(2, [3000, 51692]) = -1 EPERM (Operation not permitted)
> [pid  4061] poll([{fd=2, events=POLLOUT}], 1, 5000) = 1 ([{fd=2, 
> revents=POLLOUT}])
> [pid  4061] newfstatat(2, "", {st_mode=S_IFIFO|0600, st_size=0, ...}, 
> AT_EMPTY_PATH) = 0
> [pid  4061] write(2, "slurmdbd: fatal: Failed to set s"..., 89slurmdbd: 
> fatal: Failed to set supplementary groups, initgroups: Operation not 
> permitted
> --------------

Can you try with:

  GUIX_EXECUTION_ENGINE=fakechroot ./bin/slurmdbd …

assuming you’re using a -RR pack?
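If the fakechroot engine does the trick, the variable can be set in a small wrapper so that it applies only to the program started from the pack. A minimal sketch, assuming the -RR pack was built with -S /bin=bin and extracted under a directory named by PACK_DIR; both PACK_DIR and its /opt/slurm-pack default are hypothetical, as is the function name:

```shell
#!/bin/sh
# Sketch: run a program from an unpacked `guix pack -RR` tarball with the
# fakechroot execution engine.  PACK_DIR and /opt/slurm-pack are
# hypothetical; point them at wherever the tarball was extracted.
PACK_DIR=${PACK_DIR:-/opt/slurm-pack}

run_from_pack() {
    prog=$1
    shift
    if [ ! -x "$PACK_DIR/bin/$prog" ]; then
        echo "run_from_pack: $PACK_DIR/bin/$prog not found" >&2
        return 127
    fi
    # The variable is set for this one command only.  With the default or
    # userns engines the program runs in an unprivileged user namespace
    # when one is available; fakechroot does not create a user namespace.
    GUIX_EXECUTION_ENGINE=fakechroot "$PACK_DIR/bin/$prog" "$@"
}
```

For example, run_from_pack slurmdbd -D would start slurmdbd in the foreground from the pack.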

> When the program is directly run with its final system user account, it 
> starts correctly, still complains about not being able to fiddle with 
> groups but doesn't crash:
>
> slurmdbd: Not running as root. Can't drop supplementary groups
>
> I only got this to work with -RR; -R gave me other permission errors
> about not being able to set up subuid/subgid. The system is Debian 10.9
> with kernel 4.19. I expected container support to be readily available,
> and since I didn't know whether the errors came from what the program
> tries to do as root, I haven't checked thoroughly yet.

Yeah, presumably things running in an unprivileged user namespace (this
is the case with -R and also with GUIX_EXECUTION_ENGINE=userns) can’t
call setgroups(2).
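The strace quoted above shows the pack wrapper writing "deny" to /proc/4061/setgroups right after clone(CLONE_NEWUSER). According to the user_namespaces(7) man page, once that file reads "deny", setgroups(2), and therefore initgroups(3), fails with EPERM inside the namespace, even for a process that is UID 0 there. A small diagnostic along these lines could confirm it; the function name is made up, and treating a missing file as "not denied" is an assumption for kernels older than 3.19, which lack the file:

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if setgroups(2) is disabled for the process
# whose /proc/PID/setgroups file is given (default: the current shell).
# The file contains "allow" or "deny".  Function name is hypothetical.
setgroups_denied() {
    file=${1:-/proc/self/setgroups}
    # No readable file (e.g. pre-3.19 kernel): assume not denied.
    [ -r "$file" ] || return 1
    [ "$(cat "$file")" = deny ]
}
```

For instance, on a system that allows unprivileged user namespaces, unshare --user --map-root-user cat /proc/self/setgroups should print "deny", since util-linux's unshare performs the same write when mapping root.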

>> This would be a welcome change, though it would have a noticeable impact
>> on the closure size:
>> 
>> --8<---------------cut here---------------start------------->8---
>> $ guix size slurm |tail -1
>> total: 134.7 MiB
>> $ guix size slurm mariadb |tail -1
>> total: 421.4 MiB
>> --8<---------------cut here---------------end--------------->8---
>
> I don't know if this could change anything, but AFAIK mariadb is a
> dependency of slurmdbd only. Debian has separate packages for the
> accounting daemon (slurmdbd), the controller daemon (slurmctld) and the
> compute-node daemon (slurmd), but there is still one source package.

Here we could perhaps have a separate output for slurmdbd:

  https://guix.gnu.org/manual/devel/en/html_node/Packages-with-Multiple-Outputs.html
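The "noticeable impact" quoted earlier can be put in numbers: mariadb adds 421.4 - 134.7 = 286.7 MiB to Slurm's closure, roughly a 3.1x increase. That is about what users who never run slurmdbd would avoid if slurmdbd and its mariadb dependency lived in a separate output, assuming nothing else in the main output refers to mariadb. A sketch of the arithmetic over the last line of guix size output (the function name is hypothetical, and both lines are assumed to use the same unit):

```shell
#!/bin/sh
# Sketch: difference between two `guix size ... | tail -1` lines such as
# "total: 134.7 MiB".  Assumes both lines use the same unit.
# The function name is hypothetical.
closure_delta() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { size[NR] = $2; unit = $3 }
        END { printf "%.1f %s\n", size[2] - size[1], unit }'
}
```

Usage: closure_delta "$(guix size slurm | tail -1)" "$(guix size slurm mariadb | tail -1)".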

[...]

> For the time being, I'm still confident it can be done somehow, at least
> temporarily, to enable a smooth upgrade. There are some minor hurdles,
> e.g. Debian decided to change the paths in etc, var and the like to
> slurm-llnl. I managed to build several versions from git, but I'm still
> stuck on 18.08, which doesn't compile because of "multiple definition
> of 'opt'". The only explanation I can think of is that something is too
> recent with respect to that Slurm version.

FWIW I recently fixed that build error in Guix:

  https://git.savannah.gnu.org/cgit/guix.git/commit/?id=dd98dc42fe8d898bbdf8b3f988120a81bb145f77
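A frequent cause of "multiple definition" link errors in older C code is that GCC 10 and later default to -fno-common. Assuming that is also what breaks Slurm 18.08 here (the commit above remains the authoritative fix), the toy reproduction below, which is not Slurm code, shows the difference; the function name is hypothetical:

```shell
#!/bin/sh
# Toy reproduction of a "multiple definition" link error (not Slurm code).
# Assumption: Slurm 18.08 fails for the same reason, i.e. GCC >= 10
# defaulting to -fno-common.
demo_fcommon() {
    cc=${CC:-cc}
    dir=$(mktemp -d) || return 1
    # Both files contain a tentative definition of the global `opt`,
    # as happens when a header says "int opt;" without "extern".
    printf 'int opt;\nint main(void) { return opt; }\n' > "$dir/a.c"
    printf 'int opt;\n' > "$dir/b.c"

    # -fno-common: each `opt` is a strong definition, so the link fails.
    if "$cc" -fno-common "$dir/a.c" "$dir/b.c" -o "$dir/prog" 2>/dev/null; then
        printf '%s\n' 'with -fno-common: linked'
    else
        printf '%s\n' 'with -fno-common: multiple definition'
    fi

    # -fcommon: tentative definitions become common symbols and are merged.
    if "$cc" -fcommon "$dir/a.c" "$dir/b.c" -o "$dir/prog" 2>/dev/null; then
        printf '%s\n' 'with -fcommon: linked'
    else
        printf '%s\n' 'with -fcommon: failed'
    fi
    rm -rf "$dir"
}
```

If that turns out to be the cause, a manual build from git could try ./configure CFLAGS="-g -O2 -fcommon".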

> I guess running Guix System would remove many problems but I'm not ready 
> for that and since I'm interested in the shared software use case for a 
> cluster, there would still remain the "battle for /gnu/store" issue.

Where “battle for /gnu/store” is the chicken-and-egg problem when booting,
right?  (That is, if /gnu/store is on NFS, then how do you boot?)

HTH,
Ludo’.


