Re: Packaging Slurm
From: Ludovic Courtès
Subject: Re: Packaging Slurm
Date: Thu, 17 Mar 2022 19:25:11 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
Hello,
Jean-Christophe HAESSIG <haessigj@igbmc.fr> skribis:
> I don't really know what the implications of this would be. I continued
> exploring packaging Slurm with Guix and deploying it on Debian.
> I feel what i'm trying to do is slightly out of scope of Guix's intent :
> I used guix pack with various options -R, -RR but these are made to
> enable regular users to run software from guix packages. When the
> software is intended to be run by root, things seem to go awry. I had
> errors because the program tries to switch user and groups.
>
> --------------
> mount("none", "/tmp/guix-exec-C6ZnPc", "tmpfs", 0, NULL) = 0
> clone(child_stack=NULL, flags=CLONE_NEWNS|CLONE_NEWUSER|SIGCHLD) = 4061
> openat(AT_FDCWD, "/proc/4061/setgroups", O_WRONLY) = 3
> write(3, "deny\0", 5) = 5
> close(3) = 0
> getuid() = 0
> --------------
>
> and later :
>
> --------------
> [pid 4061] newfstatat(5, "", {st_mode=S_IFREG|0644, st_size=10406312,
> ...}, AT_EMPTY_PATH) = 0
> [pid 4061] setgroups(2, [3000, 51692]) = -1 EPERM (Operation not permitted)
> [pid 4061] poll([{fd=2, events=POLLOUT}], 1, 5000) = 1 ([{fd=2,
> revents=POLLOUT}])
> [pid 4061] newfstatat(2, "", {st_mode=S_IFIFO|0600, st_size=0, ...},
> AT_EMPTY_PATH) = 0
> [pid 4061] write(2, "slurmdbd: fatal: Failed to set s"..., 89slurmdbd:
> fatal: Failed to set supplementary groups, initgroups: Operation not
> permitted
> --------------
Can you try with:
GUIX_EXECUTION_ENGINE=fakechroot ./bin/slurmdbd …
assuming you’re using a -RR pack?
> When the program is directly run with its final system user account, it
> starts correctly, still complains about not being able to fiddle with
> groups but doesn't crash:
>
> slurmdbd: Not running as root. Can't drop supplementary groups
>
> I only got this to work with -RR. -R got me other permission errors
> about not being able to setup subuid/subgid. System is Debian 10.9 with
> kernel 4.19. I expected containers to be well available and didn't know
> if the errors could come from what the program tries to do as root so I
> didn't check thoroughly yet.
Yeah, presumably things running in an unprivileged user namespace (this
is the case with -R and also with GUIX_EXECUTION_ENGINE=userns) can’t
call setgroups(2).
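As a rough illustration of the point above (a sketch only, not part of Guix or Slurm), the "deny" that the strace output shows being written to /proc/4061/setgroups can be read back from a process. On Linux 3.19 and later that file holds either "allow" or "deny", and once it says "deny", setgroups(2) fails with EPERM inside that user namespace. The function names `setgroups_policy` and `describe` are made up for this example.

```python
from pathlib import Path


def setgroups_policy(pid="self"):
    """Return the content of /proc/<pid>/setgroups ('allow' or 'deny').

    Returns None when the file is missing or unreadable (for example on
    kernels older than 3.19 or on non-Linux systems).
    """
    try:
        return Path(f"/proc/{pid}/setgroups").read_text().strip()
    except OSError:
        return None


def describe(policy):
    """Hypothetical helper: turn the policy string into a short explanation."""
    if policy == "deny":
        return "setgroups(2) is disabled in this user namespace and fails with EPERM"
    if policy == "allow":
        return "setgroups(2) is not disabled here; it still requires CAP_SETGID"
    return "policy unknown: /proc/<pid>/setgroups could not be read"


if __name__ == "__main__":
    # Report the policy for the current process.
    print(describe(setgroups_policy()))
```

Running this from inside the wrapped program's namespace, or passing the PID of the wrapped process (4061 in the trace above), would be expected to report "deny" when the userns engine is in use.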
>> This would be a welcome change, though it would have a noticeable impact
>> on the closure size:
>>
>> --8<---------------cut here---------------start------------->8---
>> $ guix size slurm |tail -1
>> total: 134.7 MiB
>> $ guix size slurm mariadb |tail -1
>> total: 421.4 MiB
>> --8<---------------cut here---------------end--------------->8---
>
> I don't know if this could change anything but AFAIK mariadb is a
> dependency of slurmdbd only. Debian has separate packages for the
> accounting daemon, the controller daemon (slurmctld) and the client
> (slurmd) but there still is one source package.
Here we could have a separate output maybe:
https://guix.gnu.org/manual/devel/en/html_node/Packages-with-Multiple-Outputs.html
[...]
> For the time being, I'm still confident it can be done somehow, at least
> temporarily to enable a smooth upgrade. There are some minor hurdles
> e.g. Debian decided to change the paths in etc, var and the like to
> slurm-llnl. I managed to build several versions from git, I'm still
> blocked with 18.08 which doesn't compile because of "multiple definition
> of 'opt'". Only thing I can think of is something is too recent wrt
> slurm version.
FWIW I recently fixed that build error in Guix:
https://git.savannah.gnu.org/cgit/guix.git/commit/?id=dd98dc42fe8d898bbdf8b3f988120a81bb145f77
> I guess running Guix system would remove many problems but I'm not ready
> for that and since I'm interested in the shared software use case for a
> cluster, there would still remain the "battle for /gnu/store" issue.
Where “battle for /gnu/store” is the chicken-and-egg problem when booting,
right? (That is, if /gnu/store is on NFS, then how do you boot?)
HTH,
Ludo’.