Re: SSSD, Kerberized NFSv4 and Bacula OFF TOPIC PRAISE
From: jbranso
Subject: Re: SSSD, Kerberized NFSv4 and Bacula OFF TOPIC PRAISE
Date: Fri, 25 Aug 2023 21:43:38 +0000
August 24, 2023 3:57 PM, "Martin Baulig" <martin@baulig.is> wrote:
> Hello,
>
> About 2–3 months ago, I got an initial prototype of Bacula working on GNU
> Guix. I had the Bacula
> Director, two separate Storage Daemons and the Baculum web interface running
> in a GNU Guix VM on my
> Synology NAS.
I had to look it up. Apparently Bacula is a system for backing up computers
over a network. Sounds cool!
https://en.wikipedia.org/wiki/Bacula
> At some point, I would really love to upstream these changes, but it's quite
> a complex
> configuration - and I also had to do quite a few refactorings and clean-ups
> for this to pass my
> personal quality standards.
>
> One issue I had to deal with is that Bacula heavily relies upon clear-text
> passwords in its various
> configuration files. To communicate between its different components, it uses
> TLS with Client
> Certificates in addition to passwords. So in addition to writing clear-text
> passwords into various
> configuration files, the X509 private keys, DH parameters, etc. also need to
> be installed into
> appropriate directories.
>
> I came up with quite an elegant solution for this problem - and introduced
> three new services and
> an extension.
>
> * My "guix secrets" tool provides a command-line interface to maintain a
> "secrets database"
> (/etc/guix/secrets.db) that's only accessible to root. It can contain simple
> passwords, arbitrary
> text (like for instance X509 certificates in PEM format) and binary data.
I know Guix has been looking for a way to support services that need
passwords in their configuration files. This sounds like it could work!
> * The problem with the standard activation service is that it runs early in
> the boot process and
> all activation actions are run in a seemingly random order; there isn't a way
> to declare any real
> dependencies. Any failures could possibly prevent the system from fully
> booting up.
>
> I created a new "activation-tree-service-type" - currently experimental and a
> bit in a refactoring
> stage. It creates a separate one-shot Shepherd service for each activation
> action, and you can
> declare dependencies between them.
>
> Since it's using normal Shepherd services underneath the hood, you could for
> instance depend on
> user-homes and the network being up, so you could SSH in and use GNU Emacs to
> fix any issues.
>
> And any arbitrary Shepherd service could also depend on some of these actions
> - such as for
> instance the various Bacula services.
>
> * Then I created "service-accounts-service-type" that extends the standard
> account creation with
> the ability to also create home directories, run and PID directories and the
> log-file. It's mostly
> used under the hood.
>
> * Finally, "secrets-service-type" depends on all of the above to do its work.
>
> It takes a template file - which is typically interned in the store -
> containing special "tokens"
> that tell it which keys to look up from the secrets database.
>
> It uses the above mentioned service-accounts-service-type to specify where
> the substituted
> configuration file should be installed, ensuring that the directory has been
> set up with
> appropriate permissions.
>
> And then it substitutes the special tokens from the template file with the
> actual secrets. For
> instance "@password:foo@" would be substituted with a password entry called
> "foo". For arbitrary
> text or binary data, the template would contain something like "@blob:data@"
> - this will be
> substituted with the full path name of a file where the actual data will be
> written to.
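To check my understanding of the token substitution, here is a minimal sketch
in Python (again, not your implementation). The two token forms come from your
description. The allowed key characters, the 0600 file mode and naming the blob
file after its key are my assumptions:

```python
import os
import re

# Assumed token grammar: @password:KEY@ or @blob:KEY@.
TOKEN = re.compile(r"@(password|blob):([A-Za-z0-9_.-]+)@")


def render(template, secrets, blob_dir):
    """Substitute secret tokens in `template` using the `secrets` mapping."""

    def replace(match):
        kind, key = match.group(1), match.group(2)
        value = secrets[key]  # KeyError if the secret is missing
        if kind == "password":
            return value
        # Blob: write the data to a private file and substitute its path.
        path = os.path.join(blob_dir, key)
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        with os.fdopen(fd, "wb") as out:
            out.write(value if isinstance(value, bytes) else value.encode())
        return path

    return TOKEN.sub(replace, template)
```

A missing key fails loudly instead of leaving a literal token in the installed
configuration file, which is probably what you want for a backup system.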
>
> * * * *
>
> All of the above was mostly working by early August, just one problem
> remained:
>
> I do not want to store any of the actual data inside the VM, but rather use a
> folder on the NAS
> itself. Even the PostgreSQL database lives on a NFS-mounted volume. The
> problem is quite simply
> that Synology's Virtual Machine Manager software does not provide any way of
> exporting or importing
> volumes. You cannot even move them between VMs. And I really don't want to
> tie my data to the
> lifecycle of the VM.
>
> Using traditional NFS (either version 2 or 3) worked perfectly fine and since
> this is a very
> locked-down environment, encrypting the NFS traffic really isn't needed.
> After all, an attacker who
> got access to either the NAS or the VM running inside it would already have
> all the data anyway.
>
> However, I wanted to give it a try regardless and see whether I could get
> SSSD working with GNU
> Guix.
>
> And this is where the nightmares began!
>
> Firstly, I had to make a few changes to GNU Guix itself, most of which I'd
> like to upstream. The
> code is in my public GitLab repo, but it's a bit of a mess right now, and
> I'll need at least a day
> or two to clean it up. But I also ran across a couple of questions and issues.
>
> * GNU Guix is currently using nfs-utils 2.4.3, whereas 2.6.3 is currently the
> latest version. We
> don't need to upgrade, but I would like to backport one change, affecting a
> single function. This
> is needed for idmap-daemon to work with arbitrary plugins.
>
> Back in nfs-utils 2.4.3, the plugin search path was hard-coded - and since
> that hard-coded path
> will be inside the store, other packages can't add anything to it.
>
> In later versions, this was changed to attempt to load the plugin from the
> library search path
> first, prior to falling back to the hard-coded default.
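So the lookup order becomes "search path first, hard-coded default last". A
small Python model of just that ordering, as I understand it. The real code
presumably leaves the search to the dynamic loader. The function name and the
explicit directory walk are mine:

```python
import os


def find_plugin(name, default_dir, env=None):
    """Return the first existing plugin file, or None if there is none.

    Directories from LD_LIBRARY_PATH are tried first; `default_dir`
    (standing in for the hard-coded store path) is the fallback.
    """
    env = os.environ if env is None else env
    search = [d for d in env.get("LD_LIBRARY_PATH", "").split(":") if d]
    for directory in search + [default_dir]:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

With only the hard-coded directory, as in 2.4.3, the loop would have a single
entry that no other package can extend.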
>
> * Once nfs-utils is patched, rpc.idmapd then needs to be started with
> LD_LIBRARY_PATH set to the
> plugin directories - similar to how it's done with nscd.
>
> I added a few new fields to idmap-service-type and nfs-service-type for this.
>
> It also looks like you can't instantiate idmap-service-type without
> nfs-service-type due to what
> seems to be a bug.
>
> It's currently using
>> (extend (lambda (config values) (first values)))
> which fails if there isn't any previous value. Replacing that with
> last-extension-or-cfg (from
> "(gnu home services xdg)") fixes that issue.
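For anyone following along, the failure mode is easy to model outside Guile.
A Python analogue (the function names are mine, and I am assuming
last-extension-or-cfg picks the last extension and falls back to the
configuration, as its name suggests):

```python
def extend_first(config, values):
    """Analogue of (first values): fails when there are no extensions."""
    return values[0]  # raises IndexError on an empty list


def extend_last_or_config(config, values):
    """Use the last extension if any, otherwise keep the configuration."""
    return values[-1] if values else config
```

Note that the fix also switches from the first to the last extension, which
only matters when more than one service extends idmap-service-type.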
>
> * The sssd package is currently built without nfsidmap support and
> has its sysconfdir
> set to /etc.
>
> Was there a particular reason for this? I suppose nfsidmap support was
> disabled because it
> previously did not work?
>
> As for its sysconfdir - there isn't really anything confidential in the
> sssd.conf file, so I would
> rather have that interned in the store if possible. This requires a little
> patch to sssd, though,
> to disable its permission checks on the config file.
>
> * For the realmd package - it currently does not compile on GNU/Guix master.
> All that's needed is a
> small fix to the configure script. GNU/Guix master uses a newer version of
> GNU Glibc - there is no
> "__res_querydomain" in -lresolv anymore, that's now called "res_querydomain"
> and is in glibc.
>
> * To make realmd actually work, it needs a configuration file.
>
> Could we possibly either move it from (gnu packages admin) into (gnu packages
> sssd), or add a
> "realmd-sssd" package with a standard configuration file? A very simple
> config file will work fine,
> but it needs to contain the store paths of adcli, sssd and sss_cache.
>
> These are the parts that I got working so far. You can join the domain,
> acquire Kerberos tickets,
> mount the network share - and access is handled by the server according to
> the current user's
> Kerberos credentials. You also don't need to copy around any keytabs or
> anything for that, as would
> be required with Samba. This is just really cool.
>
> However, here's where the problems start:
>
> * I couldn't figure out how to use gssproxy - setting that environment
> variable doesn't seem to be
> doing anything. I ran the various daemons with strace and nothing was ever
> attempting to use the
> proxy. Then, I looked at the mit-krb5 source code as well as the nfs-utils
> and gss-daemon source
> code and couldn't find any reference to that environment variable either.
>
> Is it possible that Fedora / Red Hat is using some custom patches in their
> distribution?
>
> * I finally worked around that by installing client keytabs for my service
> principals, using my
> secrets service.
>
> Works great for local accounts, but using domain accounts gave me quite a bit
> of a headache!
>
> Let's say "storage" is a domain account. I can do "getent passwd storage" and
> it works. I can do
> "chown storage foo" on a local file system as root and then "ls -l foo"
> shows me the correct
> owner.
>
> On the mounted network share, root is mapped to the machine credential, so I
> have to create and
> chown things on the server. After a bit of starting / restarting nscd, sssd
> and gss-daemon, file
> permissions will also show up correctly in "ls -l".
>
> I can also do "su storage" as root and that works (after I create the home
> directory); "su -s
> /bin/sh storage -c id" works fine.
>
> * In guile, I can also do (getent "storage") and that works.
>
> However, it fails when I put that inside a G-Exp - to run it as part of a
> one-shot Shepherd
> service. I can open a pipe to "su -s /bin/sh storage -c
> /gnu/store/...-coreutils-../bin/id" and
> that works.
>
> One would assume that (getent) won't work inside a G-Exp because it doesn't
> have access to NSCD /
> SSSD.
>
> But why can I (invoke) "su" inside that same G-Exp and it works fine?
>
> My gut feeling tells me that this "su pipe" thing might not be the most
> reliable thing to depend
> on.
>
> The reason I need the domain account's UID is to put the Kerberos client
> keytab into
> "/var/krb5/user/<UID>/client.keytab". Maybe there's a way to use the username
> instead? I ran an
> "strace" on the gss-daemon and it currently only looks in that <UID>
> directory.
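Just to spell out the lookup you need: map the user name to a UID, then build
the path. A Python sketch, where the `lookup` parameter is my addition so the
resolver can be swapped out. With the default it goes through NSS like getent
does, so I would expect it to hit the same problem wherever NSCD / SSSD is not
reachable:

```python
import pwd


def client_keytab_path(username, lookup=pwd.getpwnam):
    """Return the per-UID client keytab path for `username`.

    `lookup` must return an object with a `pw_uid` attribute.
    """
    uid = lookup(username).pw_uid
    return f"/var/krb5/user/{uid}/client.keytab"
```

If the "su ... -c id" pipe is the only resolver that works inside the G-Exp,
it could be passed in as `lookup` without touching the rest.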
>
> * PostgreSQL - ... yeah, here it is getting interesting!
>
> The first question here is which user account to use - and whether to create
> a local or domain
> account.
>
> It seems like using a local "postgres" account might be the most robust thing
> to do. Any access to
> the mounted network share will be mapped to whichever Kerberos principal I
> place in the
> "client.keytab".
>
> Either way, the local "root" user will not have any access to the data
> directory - and the local
> "postgres" user will only have access to it once SSSD is up and running and
> it's mounted.
>
> I have an "activation-tree-service-type" action to mount the share once SSSD
> is ready and that
> seems to be working fine on system boot.
>
> However, for PostgreSQL, I'd probably have to provide my own service that
> uses the same activation
> logic - not create the data directory at all, create the local state and pid
> directory and log-file
> once we have the user's UID (which is trivial for a local "postgres" account,
> but more complicated
> for domain accounts).
>
> * Finally, each of Bacula's service accounts then also needs a client keytab
> installed, and the services need to be started
> in the correct order.
>
> * * * *
>
> Here, I start to wonder whether it's even worth the hassle. To summarize, to
> use Kerberized NFSv4,
> all of the following is needed:
>
> * Some patches to GNU Guix (most of which can probably be upstreamed
> regardless).
> * Complicated activation actions, to put client keytabs in the correct
> places, with the correct
> permissions.
> * Strict, particular order in which services need to be started up on system
> boot.
> * Manually creating directories on the server with the right owner and
> permissions.
> * Manually running "samba-tool domain exportkeytab
> --principal=<service-user>" for each service
> user, copying them over and adding them to "guix secrets".
> * There will be quite a few as I have set up Bacula with strict privilege
> separation, even using
> different Storage Daemons for different backups, each running as a distinct
> user account.
> * Custom PostgreSQL service.
>
> Whereas with just using unencrypted NFSv3, I can:
>
> * Use GNU Guix master as-is.
> * Have my activation-tree-service-type create all the service accounts, their
> directories and
> everything with appropriate permissions.
> * Only run "guix secrets" locally, without the need to SSH into the server
> and run stuff as root
> there.
> * Have much simpler activation logic.
>
> Bacula is something that I would really like to get running and most of my
> work so far has been to
> make that happen in a clean and stable manner.
>
> However, I am strongly leaning towards declaring the entire SSSD endeavor a
> failed experiment and
> not pursuing it any further.
>
> If there is any interest on your part, then I'd gladly polish up my
> Guix changes and submit
> them as a series of patches. I was actually planning to have that done by the
> end of this week, but
> then SSSD took far more time than I had anticipated.
>
> Has anybody else had similar experiences, or do you have any
> recommendations?
>
> I'm about to head out for a longer weekend, going on a bit of a road trip to
> visit some friends, so
> this is a great point for me to take a break and then come back fresh next week.
>
> Looking forward to hearing back from you and have a wonderful weekend,
>
> Martin Baulig
Congrats Martin! This whole email looks awesome!