grub-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] i386-pc: build verifiers API as module


From: Colin Watson
Subject: Re: [PATCH v2] i386-pc: build verifiers API as module
Date: Tue, 23 Mar 2021 13:27:15 +0000
User-agent: Mutt/1.10.1 (2018-07-13)

On Tue, Mar 23, 2021 at 12:37:20PM +0100, Javier Martinez Canillas wrote:
> On 3/23/21 5:16 AM, Michael Chang wrote
> > Afterall, keeping existing running system to survive update (NOT new
> > install) is really an important thing as many can't afford that to
> > happen. If we can make it any better to reduce the cost please consider
> > to do it. It doesn't conflict with the purpose to stop the short mbr gap
> > support, given we all know the broken system can be avoided in the first
> > place ...
> 
> My take on this is that we are conflating the need for distros to prevent an
> update to break their users' existing installs, with a new GRUB release that
> ships a set of CVE fixes (among other things).
> 
> I do agree that distros should avoid breaking users' installs, but don't think
> that upstream should keep supporting the 63 sectors embedding are size forever
> just to facilitate that. Otherwise it is a race to the bottom, since GRUB will
> have to pile up workarounds and massage the code just to keep this constraint.

We don't need to take this out of proportion, though: the actual
workarounds being discussed here are not that complicated, and certainly
*far* less complicated than some of the existing workarounds that exist
only for i386-pc.

> In the past, we posted other patches that we had been carrying for a long time
> downstream (i.e: "kern: Make grub_error() more verbose" [0]) and were told it
> couldn't be accepted due increasing the core.img size. It's really hard to add
> new stuff by keeping this constraint, without having a lot of ifdefery in 
> GRUB.

This has been an issue in some form or another for as long as I've been
working on GRUB.  It's not clear to me why it's particularly urgent to
break things now, when patches to make things work again exist and
aren't of unreasonable complexity compared to the rest of GRUB.  (To be
clear, I would be perfectly happy to be told that particular details of
some particular patch are problematic; it's the blanket refusal even for
simple and innocuous patches that don't add complexity or in some cases
might even remove complexity that I find unreasonable.)

I agree that the size constraints can be difficult at times, but well,
that's boot loaders for you; they're a constrained environment.  After
all, as I understand it, the entire system of a core GRUB image with
loadable modules originally came into existence due to this kind of size
constraint.

For those pointing to the documentation change as justification of
unilaterally ignoring certain constraints, I'd say that the number of
users who'll actually have seen that in advance of their systems being
broken is likely to be a pretty good approximation to zero.  I support
the basic idea of the documentation change, but it needs a *much* longer
deprecation cycle.  Yes, that may be inconvenient for us as GRUB
developers.

Zooming out a bit, it's in the interest of the whole free software
ecosystem for security updates to be reliable and not break existing use
cases.  I don't think any of us want users to be any more hesitant about
installing security updates than they already are: the world is
generally better off if we can deploy security updates as quickly as
possible.

The argument I'm making here is, I think, the same sort of argument by
which Torvalds famously has a very low tolerance for userspace
regressions in the Linux kernel.  The argument there runs something like
this: perhaps userspace is being objectively unreasonable, but
nevertheless, it exists in the real world and nobody is helped by having
to solve n-dimensional simultaneous equations in order to work out
whether they can install a given new kernel version.  Maintaining an
intolerance of regressions reduces the friction for installing updates.

> Also, most partitions tools have been aligning to a MiB boundary for quite 
> some
> time by now. So I don't believe is the correct trade-off for upstream to 
> support
> the 31 KiB embedding gap, in detriment of the 1 MiB case that's much more 
> common.

Right, I remember working on changes to improve alignment handling as
far back as 2010.  That's a good thing, but unfortunately the nature of
the beast here is that old mistakes stick around for a long time.

> Yes, it's a bummer to have to carry downstream patches (and we have a bunch in
> our distro) just to support existing users. But it's up to each distro to 
> decide
> what's the proper action to take for their supported OS releases (e.g: rebase 
> to
> the latest 2.06 version with some patches, only backport security fixes, etc).
> 
> For this particular case, it might be better for distros to just revert commit
> 9e95f45ceee ("verifiers: Move verifiers API to kernel image") instead of 
> making
> it conditional for i386-pc, adding complexity to the GRUB upstream code IMO.

That would also mean skipping or substantially modifying your lockdown
patch that followed it, which requires great care.  I did something like
this in various forms for our security updates because there wasn't much
choice there, but I'm not keen on it as a long-term solution.

In the long term, we do seem to want to have the verifiers API in the
kernel image at least for EFI platforms, don't we?  So reverting that
patch entirely seems like a bad move, and Michael's approach seems a
reasonable compromise.

> As mentioned, not every upstream change may work for all distros. For example,
> we had to revert e346414725a ("templates: Disable the os-prober by default")
> since otherwise users that don't have anything set in their /etc/default/grub
> wouldn't have entries for other OS in their GRUB config file anymore.

I'm very substantially less concerned about distro divergence in things
like grub-mkconfig.  While it can introduce confusion, it's typically
much less security-critical.


Let me be clear here: I'm no newcomer to the business of maintaining
large distro patch stacks for GRUB, far from it.  I've been doing it for
over a decade and I'm well aware of the trade-offs.  Often it's either
the right thing to do or at least the path of least resistance.  Some of
the patches I shepherd make it upstream, some never really do, some need
significant reworking; that's life.  I'm not expecting Debian to get to
zero upstream patches against GRUB before I hit retirement age, and I
don't think that's the point here.

Rather, I don't believe that the existence of distros as intermediaries
means that upstream can simply declare a whole class of regressions to
be a distro problem, with a mere handful of months of notice in the form
of a documentation change in an unreleased version before (no doubt
unintentionally) landing patches that break some of the systems in
question, and not expect at least some pushback about that.

Finally, I also notice that the person who sent the patch at the head of
this thread is also the same person who sent
https://lists.gnu.org/archive/html/grub-devel/2020-03/msg00059.html, so
it seems quite ironic to reject the patch on that basis ...

Thanks,

-- 
Colin Watson (he/him)                              [cjwatson@debian.org]



reply via email to

[Prev in Thread] Current Thread [Next in Thread]