From: Peter Maydell
Subject: Re: race condition in display device caused by run_on_cpu() dropping the iothread lock
Date: Mon, 15 Aug 2022 14:02:47 +0100

On Mon, 15 Aug 2022 at 12:22, Gerd Hoffmann <kraxel@redhat.com> wrote:
>
> On Mon, Aug 01, 2022 at 02:23:55PM +0100, Peter Maydell wrote:
> > I've been debugging a segfault in the raspi3b display device, and I've
> > tracked it down to a race condition, but I'm not sure what the right
> > way to fix it is...
> >
> > The race is that a vCPU thread is handling a guest register write that
> > says "resize the framebuffer", which it implements by calling
> > qemu_console_resize().
>
> [ back online after vacation ]
>
> Easiest is probably to not instantly resize the display surface but
> let the update handler do that on the next display refresh.

I feel like this will fix the immediate crash but doesn't
address the wider underlying problem. (For instance, if the
user does something with the UI at just the wrong moment, that
can probably get in during the we-dropped-the-iothread-lock window.)
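
For illustration, the deferred-resize idea quoted above could look roughly
like the sketch below. This is a standalone model, not QEMU code: FbState,
fb_guest_set_size() and fb_update_display() are hypothetical names, and it
assumes both functions run with the same lock held. The register-write path
only records the requested geometry; the update handler applies the most
recent request on the next refresh.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical display device state; none of these names are QEMU APIs. */
typedef struct {
    int cur_w, cur_h;          /* geometry of the surface currently in use */
    int pending_w, pending_h;  /* geometry most recently requested by the guest */
    bool resize_pending;       /* set by the register write, cleared on refresh */
    int resize_count;          /* number of real resizes, for illustration only */
} FbState;

/* Register-write path (vCPU thread): record the request, do not resize yet. */
static void fb_guest_set_size(FbState *s, int w, int h)
{
    assert(w > 0 && h > 0);
    s->pending_w = w;
    s->pending_h = h;
    s->resize_pending = true;
}

/* Periodic display update handler: apply the latest request, if any. */
static void fb_update_display(FbState *s)
{
    if (s->resize_pending) {
        s->resize_pending = false;
        if (s->pending_w != s->cur_w || s->pending_h != s->cur_h) {
            s->cur_w = s->pending_w;
            s->cur_h = s->pending_h;
            /* A real device would resize its console surface here. */
            s->resize_count++;
        }
    }
    /* ... draw the frame using cur_w x cur_h ... */
}
```

Several register writes between two refreshes collapse into a single resize,
so intermediate geometries never reach the UI. As noted above, this only
moves the resize out of the vCPU thread; it does not by itself close the
window in which the iothread lock is dropped.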

> Many display devices do that anyway because often multiple register
> updates are needed to perform a resize and you don't want your UI
> window to run through all the temporary states ...
>
> Alternative: The DisplaySurface is backed by pixman images, which are
> reference counted.  Some QEMU code that depends on the backing store
> staying around while not holding the iolock works with the pixman image
> directly, because it can just take a reference to keep the image from
> being freed while it is in use.
>
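
A minimal sketch of the reference-holding pattern described above, using a
hypothetical Image type rather than pixman itself (in pixman the equivalent
calls are pixman_image_ref() and pixman_image_unref()). The assumption here
is that the reader takes its reference while the lock is still held, so the
owner dropping its own reference later cannot free the pixels under it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted image; not the pixman API. */
typedef struct {
    atomic_int refcount;
    unsigned char *pixels;
} Image;

/* Allocate a zero-filled image with an initial reference held by the caller. */
static Image *image_new(size_t size)
{
    Image *img = malloc(sizeof(*img));
    if (!img) {
        return NULL;
    }
    img->pixels = calloc(1, size);
    if (!img->pixels) {
        free(img);
        return NULL;
    }
    atomic_init(&img->refcount, 1);
    return img;
}

/* Take an extra reference; the image stays alive until it is dropped. */
static Image *image_ref(Image *img)
{
    atomic_fetch_add(&img->refcount, 1);
    return img;
}

/* Drop one reference. Returns 1 if this was the last one and the image was freed. */
static int image_unref(Image *img)
{
    if (atomic_fetch_sub(&img->refcount, 1) == 1) {
        free(img->pixels);
        free(img);
        return 1;
    }
    return 0;
}
```

A reader would call image_ref() before the lock can be dropped, use the
pixels, and call image_unref() when done; a resize in between only drops the
owner's reference, and the memory is freed by whichever side lets go last.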
> >  * memory_region_snapshot_and_clear_dirty() ends up calling run_on_cpu(),
> >    which briefly drops the iothread lock.
>
> Oh.  Is that new?

Since commit 9458a9a1df1a4 in 2018.

> > How is this intended to work? I feel like if run_on_cpu() silently
> > drops the iothread lock this probably invalidates a lot of assumptions
> > that QEMU code makes, especially in this kind of setup where
> > the code making the assumptions is several layers in the callstack
> > above whatever it is that ends up calling run_on_cpu()...
>
> Indeed.  The display update code paths using dirty bitmap snapshots
> certainly don't expect that.

Yeah. The problem is that to fix the bug that 9458a9a1df1a4 is
trying to fix we really do have to allow guest code to run,
because we need to make sure that the TCG CPU thread has
finished writing to RAM and got out of the generated code
block, otherwise the dirty flag won't be consistent.
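
To make the hazard concrete, here is a single-threaded model of a caller
whose cached state is invalidated by a callee that drops the lock. Everything
in it is hypothetical (World, helper_dropping_lock(), the generation counter);
the "other thread" is simulated by a callback that runs inside the unlocked
window. The recheck shown is one generic defensive pattern, not something
proposed in this thread or a description of what QEMU does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of shared state protected by one big lock. */
typedef struct {
    bool locked;                      /* true while "we" hold the lock */
    int generation;                   /* bumped whenever the surface is replaced */
    void (*in_window)(void *opaque);  /* models another thread running while unlocked */
    void *opaque;
} World;

/* What a concurrent resize would do: replace the surface. */
static void surface_replace(void *opaque)
{
    World *w = opaque;
    w->generation++;
}

/* Models a helper several layers down that briefly drops the lock. */
static void helper_dropping_lock(World *w)
{
    assert(w->locked);
    w->locked = false;
    if (w->in_window) {
        w->in_window(w->opaque);      /* anything may happen in this window */
    }
    w->locked = true;
}

/* Returns true if state cached before the call is still valid afterwards. */
static bool update_with_recheck(World *w)
{
    int seen = w->generation;         /* caller caches state under the lock */
    helper_dropping_lock(w);
    return w->generation == seen;     /* stale if the surface was replaced */
}
```

The caller cannot tell from the helper's signature that the lock was dropped,
which is why code several layers up the call stack ends up with stale
pointers unless it rechecks or holds its own reference.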

-- PMM


