SSSD, Kerberized NFSv4 and Bacula

My "guix secrets" tool provides a command-line interface to maintain a "secrets database" (/etc/guix/secrets.db) that's only accessible to root. It can contain simple passwords, arbitrary text (for instance X.509 certificates in PEM format) and binary data.

The problem with the standard activation service is that it runs early in the boot process and runs all activation actions in a seemingly arbitrary order, with no way to declare real dependencies between them. Any failure could prevent the system from booting up completely.

I created a new "activation-tree-service-type", which is currently experimental and in the middle of some refactoring. It creates a separate one-shot Shepherd service for each activation action, and you can declare dependencies between them.

Since it uses normal Shepherd services under the hood, an action could, for instance, depend on user-homes and the network being up, so that even if it fails, you could still SSH in and use GNU Emacs to fix the issue.

Conversely, any Shepherd service can also depend on some of these actions; the various Bacula services, for instance, do.
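
For readers less familiar with Shepherd, here is a rough sketch of what one such activation action could look like when written directly as a plain one-shot "shepherd-service" record from (gnu services shepherd). This is not the activation-tree-service-type API itself: the provision name and the directory it creates are hypothetical, it assumes the system provides the 'networking service, and only the standard "requirement" and "one-shot?" fields carry the dependency idea.

```scheme
;; Sketch only: one activation action as a plain one-shot Shepherd service.
;; The provision name and the directory below are hypothetical.
(shepherd-service
  (documentation "Hypothetical activation action: create a state directory.")
  (provision '(activate-example-state))
  ;; Do not start before home directories exist and the network is up.
  (requirement '(user-homes networking))
  ;; Run START once; dependents may start after it succeeds, but the
  ;; service itself does not stay running.
  (one-shot? #t)
  (start #~(lambda _
             (mkdir-p "/var/lib/example")   ;mkdir-p from (guix build utils)
             #t)))
```

A Bacula daemon's own shepherd-service could then list 'activate-example-state in its requirement field, so that it is never started before the action has succeeded.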

Then I created "service-accounts-service-type", which extends the standard account creation so that it can also create the home directory, the run and PID directories, and the log file. It's mostly used under the hood.
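
To illustrate what creating the home, run and PID directories and the log file involves, here is a minimal, self-contained Guile sketch using only core procedures ("mkdir", "chown", "chmod", "open-file"). The procedure names and the chosen modes are hypothetical and not taken from service-accounts-service-type. It assumes the parent directories already exist and that it runs as root (or as the account owner itself).

```scheme
;; Sketch only: names and modes are hypothetical, not the real service's.
;; Parent directories are assumed to exist (mkdir is not recursive).

(define (ensure-directory dir uid gid mode)
  "Create DIR unless it exists, then set its owner to UID:GID and its mode."
  (unless (file-exists? dir)
    (mkdir dir))
  (chown dir uid gid)
  (chmod dir mode))

(define (ensure-log-file file uid gid mode)
  "Create FILE if needed without truncating it, then set owner and mode."
  (close-port (open-file file "a"))   ;"a" creates the file but keeps content
  (chown file uid gid)
  (chmod file mode))

(define (setup-service-account-files home run-dir log-file uid gid)
  "Prepare the per-account files; safe to call again on every boot."
  (ensure-directory home uid gid #o700)
  (ensure-directory run-dir uid gid #o755)
  (ensure-log-file log-file uid gid #o640))
```

Because every step is idempotent, a procedure like this can run as a one-shot service on each boot without clobbering an existing log file.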
Finally, "secrets-service-type" depends on all of the above to do its work.

It takes a template file (typically interned in the store) containing special "tokens" that tell it which keys to look up in the secrets database.

It uses the above-mentioned service-accounts-service-type to specify where the substituted configuration file should be installed, ensuring that the directory has been set up with appropriate permissions.

It then replaces the special tokens in the template file with the actual secrets. For instance, "@password:foo@" would be replaced with the password entry called "foo". For arbitrary text or binary data, the template would contain something like "@blob:data@"; this is replaced with the full path name of a file to which the actual data is written.
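
The substitution step can be sketched in a few lines of Guile with "regexp-substitute/global" from (ice-9 regex). This is a minimal illustration, not the actual secrets-service-type code: the secrets database is replaced by an in-memory association list, the blob directory "/run/secrets" is a hypothetical choice, and the blob data is not actually written out; the sketch only computes the file name that would be substituted.

```scheme
(use-modules (ice-9 regex)              ;regexp-substitute/global
             (ice-9 match))

;; Hypothetical in-memory stand-in for /etc/guix/secrets.db.
;; Each entry is (NAME KIND . VALUE).
(define %example-secrets
  '(("foo"  password . "s3cret")
    ("data" blob     . "example PEM data")))

;; Hypothetical directory that blob contents would be written to.
(define %blob-directory "/run/secrets")

;; Matches @password:NAME@ and @blob:NAME@; group 1 is the kind,
;; group 2 the entry name.
(define %token-pattern "@(password|blob):([A-Za-z0-9_.-]+)@")

(define (lookup-secret secrets kind name)
  "Return the value of entry NAME, which must be of KIND."
  (match (assoc name secrets)
    ((_ entry-kind . value)
     (if (eq? entry-kind kind)
         value
         (error "secret has the wrong kind:" name entry-kind)))
    (#f (error "no such secret:" name))))

(define (substitute-secrets template secrets)
  "Return TEMPLATE with every token replaced: passwords by their value,
blobs by the name of the file their data would be written to."
  (regexp-substitute/global
   #f %token-pattern template
   'pre
   (lambda (m)
     (let ((kind (string->symbol (match:substring m 1)))
           (name (match:substring m 2)))
       (case kind
         ((password)
          (lookup-secret secrets 'password name))
         ((blob)
          (lookup-secret secrets 'blob name) ;fail early if it is missing
          (string-append %blob-directory "/" name)))))
   'post))
```

For example, (substitute-secrets "password = @password:foo@\n" %example-secrets) yields "password = s3cret\n". A missing entry, or one of the wrong kind, raises an error instead of silently producing a broken configuration file.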

* * * *

All of the above was mostly working by early August; just one problem remained:

I do not want to store any of the actual data inside the VM, but rather use a folder on the NAS itself. Even the PostgreSQL database lives on an NFS-mounted volume. The problem is quite simply that Synology's Virtual Machine Manager software does not provide any way of exporting or importing volumes; you cannot even move them between VMs. And I really don't want to tie my data to the lifecycle of the VM.

Using traditional NFS (either version 2 or 3) worked perfectly fine, and since this is a very locked-down environment, encrypting the NFS traffic isn't really needed: an attacker who got access to either the NAS or the VM running inside it would already have all the data anyway.

However, I wanted to give it a try regardless and see whether I could get SSSD working with GNU Guix.

And this is where the nightmares began!

First, I had to make a few changes to GNU Guix itself, most of which I'd like to upstream. The code is in my public GitLab repo, but it's a bit of a mess right now, and I'll need at least a day or two to clean it up. I also ran into a couple of questions and issues.

GNU Guix currently packages nfs-utils 2.4.3, whereas the latest version is 2.6.3. We don't need to upgrade, but I would like to backport one change, which affects a single function. It is needed for idmap-daemon to work with arbitrary plugins.

In nfs-utils 2.4.3, the plugin search path is hard-coded, and since that hard-coded path lies inside the store, other packages can't add anything to it.

In later versions, this was changed to first try loading the plugin from the library search path, and only then fall back to the hard-coded default.
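
The lookup order described above can be modeled in Guile with the core "search-path" and "parse-path" procedures. This is only a model of the behaviour, not the nfs-utils code (which is C and goes through dlopen); "locate-plugin" and its arguments are hypothetical names, and the LIBRARY-PATH argument plays the role of LD_LIBRARY_PATH.

```scheme
;; Sketch only: models "library search path first, hard-coded directory
;; last".  locate-plugin is a hypothetical name, not an nfs-utils function.
(define (locate-plugin method library-path fallback-directory)
  "Return the file name of METHOD.so, looking in each directory of
LIBRARY-PATH (a colon-separated string such as LD_LIBRARY_PATH, or #f)
first and in FALLBACK-DIRECTORY last; return #f if it is not found."
  (let ((file (string-append method ".so")))
    (or (and library-path
             ;; search-path returns the first readable match, or #f.
             (search-path (parse-path library-path) file))
        (let ((fallback (string-append fallback-directory "/" file)))
          (and (file-exists? fallback) fallback)))))
```

In 2.4.3 only the fallback branch exists, which is why a plugin installed in another package's store directory can never be found there.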

Once nfs-utils is patched, rpc.idmapd needs to be started with LD_LIBRARY_PATH set to the plugin directories, similar to how it's done for nscd.

I added a few new fields to idmap-service-type and nfs-service-type for this.
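
At the Shepherd level, setting LD_LIBRARY_PATH for rpc.idmapd could look roughly like the following sketch. It uses the standard "#:environment-variables" keyword of "make-forkexec-constructor", not the new idmap-service-type fields mentioned above. It assumes an sssd package built with nfsidmap support that installs its plugin under lib/libnfsidmap; that location is an assumption, since the current Guix package lacks this support. Requirements such as the rpc_pipefs mount are omitted for brevity.

```scheme
;; Sketch only: the plugin directory inside the sssd package is assumed.
(shepherd-service
  (documentation "NFSv4 ID mapping daemon with the sssd plugin available.")
  (provision '(idmap-daemon))
  (start #~(make-forkexec-constructor
            (list (string-append #$nfs-utils "/sbin/rpc.idmapd")
                  "-f")                 ;stay in the foreground
            ;; Prepend the plugin directory to the default environment.
            #:environment-variables
            (cons (string-append "LD_LIBRARY_PATH="
                                 #$sssd "/lib/libnfsidmap")
                  (default-environment-variables))))
  (stop #~(make-kill-destructor)))
```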

It also looks like you can't instantiate idmap-service-type without nfs-service-type due to what seems to be a bug.

Its service type currently uses

    (extend (lambda (config values) (first values)))

which fails when the list of extension values is empty. Replacing that with last-extension-or-cfg (from "(gnu home services xdg)") fixes the issue.
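
The difference is easy to see in isolation. This self-contained Guile snippet contrasts the current extend procedure with a fallback modeled on last-extension-or-cfg; the latter is redefined locally to keep the example runnable, so its body may differ from the Guix original. Note that the two also differ when there are several extensions: one keeps the first value, the other the last.

```scheme
(use-modules (srfi srfi-1))             ;first, last

;; What idmap-service-type currently does: take the first extension value.
;; This raises an error when EXTENSIONS is empty.
(define (extend/first config extensions)
  (first extensions))

;; Modeled on last-extension-or-cfg from (gnu home services xdg):
;; fall back to CONFIG when nothing extends the service.
(define (last-extension-or-cfg config extensions)
  (if (null? extensions)
      config
      (last extensions)))
```

Here (last-extension-or-cfg 'default '()) returns default, whereas (extend/first 'default '()) signals an error, which matches the failure when idmap-service-type is instantiated without nfs-service-type extending it.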

The sssd package is currently built without nfsidmap support and has its sysconfdir set to /etc.

Was there a particular reason for this? I suppose nfsidmap support was disabled because it previously did not work?

As for its sysconfdir: there isn't anything confidential in sssd.conf, so I would rather have it interned in the store if possible. This requires a small patch to sssd, though, to disable its permission checks on the config file.

The realmd package currently does not compile on GNU Guix master. All that's needed is a small fix to the configure script: master uses a newer version of glibc, which no longer has "__res_querydomain" in -lresolv; that function is now called "res_querydomain" and lives in glibc itself.
To make realmd actually work, it needs a configuration file.

Could we either move it from (gnu packages admin) into (gnu packages sssd), or add a "realmd-sssd" package with a standard configuration file? A very simple config file will work fine, but it needs to contain the store paths of adcli, sssd and sss_cache.

These are the parts I have working so far. You can join the domain, acquire Kerberos tickets and mount the network share, with the server granting access according to the current user's Kerberos credentials. You also don't need to copy keytabs or anything else around for that, as would be required with Samba. This is just really cool.

However, here's where the problems start:

I couldn't figure out how to use gssproxy. Setting that environment variable doesn't seem to do anything: I ran the various daemons under strace, and nothing ever attempted to use the proxy. I then looked at the mit-krb5, nfs-utils and gss-daemon source code and couldn't find any reference to that environment variable either.

Is it possible that Fedora / Red Hat use some custom patches in their distribution?

In the end, I worked around this by installing client keytabs for my service principals, using my secrets service.

That works great for local accounts, but domain accounts gave me quite a headache!

Let's say "storage" is a domain account. "getent passwd storage" works. As root, I can run "chown storage foo" on a local file system, and "ls -l foo" then shows me the correct owner.

On the mounted network share, root is mapped to the machine credential, so I have to create and chown things on the server. After some starting and restarting of nscd, sssd and gss-daemon, file permissions also show up correctly in "ls -l".

As root, "su storage" also works (once I have created the home directory), and so does "su -s /bin/sh storage -c id".

In Guile, (getent "storage") works as well.

However, it fails when I put that inside a G-Exp to run it as part of a one-shot Shepherd service. Yet from that same G-Exp, I can open a pipe to "su -s /bin/sh storage -c /gnu/store/...-coreutils-../bin/id", and that works.

One would assume that (getent) won't work inside a G-Exp because it doesn't have access to NSCD / SSSD.

But then why does running "su" with (invoke) inside that same G-Exp work fine?

My gut feeling is that this "su pipe" trick might not be the most reliable thing to depend on.

The reason I need the domain account's UID is to put the Kerberos client keytab into "/var/krb5/user/<UID>/client.keytab". Maybe there's a way to use the username instead? I ran strace on gss-daemon, and it currently only looks in that <UID> directory.
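
For reference, here is a Guile sketch of the two lookups discussed above, plus the keytab location. It is not a recommendation: the fallback runs "id -u NAME" in a child process (assumed to be on PATH; inside a G-Exp you would use the full store file name of coreutils' id instead), which, like the "su" pipe, performs the NSS lookup in a fresh process. Whether that is reliable at boot time is exactly the open question. All procedure names are hypothetical.

```scheme
(use-modules (ice-9 popen)              ;open-pipe*, close-pipe
             (ice-9 rdelim))            ;read-line

(define (lookup-uid/nss name)
  "Look NAME up through NSS in the current process; return #f on failure."
  (catch #t
    (lambda () (passwd:uid (getpwnam name)))
    (lambda args #f)))

(define (lookup-uid/subprocess name)
  "Return NAME's UID as printed by `id -u NAME' in a child process, or #f."
  (let* ((port   (open-pipe* OPEN_READ "id" "-u" name))
         (line   (read-line port))
         (status (close-pipe port)))
    (and (eqv? 0 (status:exit-val status))
         (string? line)
         (string->number line))))

(define (lookup-uid name)
  "Try the in-process lookup first, then fall back to the child process."
  (or (lookup-uid/nss name)
      (lookup-uid/subprocess name)))

(define (client-keytab-file uid)
  "Return the per-UID client keytab location described above."
  (string-append "/var/krb5/user/" (number->string uid) "/client.keytab"))
```

With a UID in hand, (client-keytab-file 1000) gives "/var/krb5/user/1000/client.keytab", which is where the secrets service would install the keytab.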

PostgreSQL... yeah, this is where it gets interesting!

The first question is which user account to use, and whether to create a local or a domain account.

Using a local "postgres" account seems to be the most robust option. Any access to the mounted network share will be mapped to whichever Kerberos principal I place in the "client.keytab".

Either way, the local "root" user will not have any access to the data directory, and the local "postgres" user will only have access to it once SSSD is up and running and the share is mounted.

I have an "activation-tree-service-type" action to mount the share once SSSD is ready and that seems to be working fine on system boot.

However, for PostgreSQL I'd probably have to provide my own service that uses the same activation logic: it must not create the data directory at all, and it should create the local state directory, PID directory and log file only once the user's UID is known (trivial for a local "postgres" account, but more complicated for domain accounts).

Finally, each of Bacula's service accounts also needs a client keytab installed, and the services must be started in the correct order.

* * * *

Here, I start to wonder whether it's even worth the hassle. To summarize, using Kerberized NFSv4 requires all of the following:

Some patches to GNU Guix (most of which can probably be upstreamed regardless).
Complicated activation actions to put client keytabs in the correct places, with the correct permissions.
A strict order in which services need to be started on system boot.
Manually creating directories on the server with the right owner and permissions.
Manually running "samba-tool domain exportkeytab <keytab> --principal=<service-user>" for each service user, copying the keytabs over and adding them to "guix secrets".
There will be quite a few of these, as I have set up Bacula with strict privilege separation, even using different Storage Daemons for different backups, each running as a distinct user account.
A custom PostgreSQL service.

Whereas with plain, unencrypted NFSv3, I can:

Use GNU Guix master as-is.
Have my activation-tree-service-type create all the service accounts, their directories and everything else with appropriate permissions.
Only run "guix secrets" locally, without needing to SSH into the server and run things as root there.
Have much simpler activation logic.

Bacula is something that I would really like to get running and most of my work so far has been to make that happen in a clean and stable manner.

However, I am strongly leaning towards declaring the entire SSSD endeavor a failed experiment and not pursuing it any further.

If there is any interest on your part, I'd gladly polish up my Guix changes and submit them as a series of patches. I was actually planning to have that done by the end of this week, but SSSD took far more time than I had anticipated.

Has anybody else had similar experiences, and what would you recommend?

I'm about to head out for a longer weekend, going on a bit of a road trip to visit some friends, so this is a good point for me to take a break and come back fresh next week.

Looking forward to hearing back from you, and have a wonderful weekend,

Martin Baulig

From:	Martin Baulig
Subject:	SSSD, Kerberized NFSv4 and Bacula
Date:	Thu, 24 Aug 2023 19:55:05 +0000