qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] hw/i386: place setup_data at fixed place in memory


From: Ard Biesheuvel
Subject: Re: [PATCH v2] hw/i386: place setup_data at fixed place in memory
Date: Thu, 4 Aug 2022 12:26:00 +0200

On Thu, 4 Aug 2022 at 11:25, Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Thu, Aug 04, 2022 at 10:58:36AM +0200, Laszlo Ersek wrote:
> > On 08/04/22 09:03, Michael S. Tsirkin wrote:
> > > On Thu, Aug 04, 2022 at 02:44:11AM +0200, Jason A. Donenfeld wrote:
> > >> The boot parameter header refers to setup_data at an absolute address,
> > >> and each setup_data refers to the next setup_data at an absolute address
> > >> too. Currently QEMU simply puts the setup_datas right after the kernel
> > >> image, and since the kernel_image is loaded at prot_addr -- a fixed
> > >> address knowable to QEMU apriori -- the setup_data absolute address
> > >> winds up being just `prot_addr + a_fixed_offset_into_kernel_image`.
> > >>
> > >> This mostly works fine, so long as the kernel image really is loaded at
> > >> prot_addr. However, OVMF doesn't load the kernel at prot_addr, and
> > >> generally EFI doesn't give a good way of predicting where it's going to
> > >> load the kernel. So when it loads it at some address != prot_addr, the
> > >> absolute addresses in setup_data now point somewhere bogus, causing
> > >> crashes when EFI stub tries to follow the next link.
> > >>
> > >> Fix this by placing setup_data at some fixed place in memory, relative
> > >> to real_addr, not as part of the kernel image, and then pointing the
> > >> setup_data absolute address to that fixed place in memory. This way,
> > >> even if OVMF or other chains relocate the kernel image, the boot
> > >> parameter still points to the correct absolute address.
> > >>
> > >> Fixes: 3cbeb52467 ("hw/i386: add device tree support")
> > >> Reported-by: Xiaoyao Li <xiaoyao.li@intel.com>
> > >> Cc: Paolo Bonzini <pbonzini@redhat.com>
> > >> Cc: Richard Henderson <richard.henderson@linaro.org>
> > >> Cc: Peter Maydell <peter.maydell@linaro.org>
> > >> Cc: Michael S. Tsirkin <mst@redhat.com>
> > >> Cc: Daniel P. Berrangé <berrange@redhat.com>
> > >> Cc: Gerd Hoffmann <kraxel@redhat.com>
> > >> Cc: Ard Biesheuvel <ardb@kernel.org>
> > >> Cc: linux-efi@vger.kernel.org
> > >> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> > >
> > > Didn't read the patch yet.
> > > Adding Laszlo.
> >
> > As I said in
> > <8bcc7826-91ab-855e-7151-2e9add88025a@redhat.com">http://mid.mail-archive.com/8bcc7826-91ab-855e-7151-2e9add88025a@redhat.com>,
> > I don't believe that the setup_data chaining described in
> > <https://www.kernel.org/doc/Documentation/x86/boot.rst> can be made work
> > for UEFI guests at all, with QEMU pre-populating the links with GPAs.
> >
> > However, rather than introducing a new info channel, or reusing an
> > existent one (ACPI linker/loader, GUID-ed structure chaining in pflash),
> > for the sake of this feature, I suggest simply disabling this feature
> > for UEFI guests. setup_data chaining has not been necessary for UEFI
> > guests for years (this is the first time I've heard about it in more
> > than a decade), and the particular use case (provide guests with good
> > random seed) is solved for UEFI guests via virtio-rng / EFI_RNG_PROTOCOL.
> >
> > ... Now, here's my problem: microvm, and Xen.
> >
> > As far as I can tell, QEMU can determine (it already does determine)
> > whether the guest uses UEFI or not, for the "pc" and "q35" machine
> > types. But not for microvm or Xen!
> >
> > Namely, from pc_system_firmware_init() [hw/i386/pc_sysfw.c], we can
> > derive that
> >
> >   pflash_cfi01_get_blk(pcms->flash[0])
> >
> > returning NULL vs. non-NULL stands for "BIOS vs. UEFI". Note that this
> > is only valid after the inital part of pc_system_firmware_init() has run
> > ("Map legacy -drive if=pflash to machine properties"), but that is not a
> > problem, given the following call tree:
>
> I don't beleve that's a valid check anymore since Gerd introduced the
> ability to load UEFI via -bios, for UEFI builds without persistent
> variables. ( a8152c4e4613c70c2f0573a82babbc8acc00cf90 )
>

I think there is a fundamental flaw in the QEMU logic where it adds
setup_data nodes to the *file* representation of the kernel image.

IOW, loading the kernei image at address x, creating setup_data nodes
in memory relative to x and then invoking the kernel image directly
(as kexec does as well, AIUI) is perfectly fine.

Managing a file system abstraction and a generic interface to load its
contents, and using it to load the kernel image anywhere in memory is
also fine, and OVMF -kernel relies on this.

It is the combination of the two that explodes, unsurprisingly. Making
inferences about which kind of firmware is invoking the file loading
interface, and gating this behavior accordingly just papers over the
problem.

Jason and I have been discussing this over IRC, and there are
essentially 3 places we could address this:
1) in the Linux/x86 EFI stub, which has a 'pure EFI' entrypoint and a
'EFI handover protocol' entrypoint, and we could simply wipe the
setup_data pointer in the former;
2) in OVMF's kernel loader, where we could 'fix up' the setup_data field
3) in QEMU, which [as I laid out above] does something it shouldn't be doing.

I strongly object to 2), as it would involve teaching OVMF's 'pure
EFI' boot path about the intricacies of struct boot_params etc, which
defeats the purpose of having a generic EFI boot path (Note that much
of my work on the Linux/EFI subsystem over the past years has been
focused on getting rid of all the arch-specific hacks and layering
violations and making as much of the EFI code in Linux arch-agnostic
and adhere to EFI principles and APIs)

Given that neither SETUP_DTB nor SETUP_RNG_SEED are particularly
relevant for pure EFI boot, clearing setup_data in the pure EFI
entrypoint in the EFI stub is not unreasonable, but not very future
proof, given that it limits how we might use setup_data for other
purposes in the future (although one might argue we could just drop
code again from the stub when new uses of setup_data are introduced
into the kernel)

However, our conclusion was that it is really QEMU that is doing
something strange here, so IMHO, fixing QEMU is really the most
suitable approach.

-- 
Ard.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]