From: Alexander Duyck
Subject: Re: [RFC PATCH] mpqemu: Remove unlock/lock of iothread in mpqemu-link send and recv functions
Date: Tue, 24 May 2022 07:58:50 -0700

On Mon, May 23, 2022 at 3:56 PM Jag Raman <jag.raman@oracle.com> wrote:
>
>
>
> > On May 23, 2022, at 11:09 AM, Alexander Duyck <alexander.duyck@gmail.com> wrote:
> >
> > From: Alexander Duyck <alexanderduyck@fb.com>
> >
> > When I run Multi-process QEMU with an e1000 as the remote device and SMP
> > enabled, I see the combination lock up and become unresponsive. The QEMU
> > build is a fairly standard x86_64-softmmu setup. After doing some digging I
> > tracked the lockup down to what appears to be a race between the mpqemu-link
> > msg_send and msg_receive functions and the reacquisition of the lock.
> >
> > I am assuming the issue is some sort of lock inversion though I haven't
> > identified exactly what the other lock involved is yet. For now removing
> > the logic to unlock the iothread and then reacquire the lock seems to
> > resolve the issue. I am assuming the releasing of the lock was some form of
> > optimization but I am not certain so I am submitting this as an RFC.
>
> Hi Alexander,
>
> We are working on moving away from Multi-process QEMU to a vfio-user based
> approach. The vfio-user patches are under review. I believe we would drop
> the Multi-process support once vfio-user is merged.
>
> We release the lock here while communicating with the remote process via the
> QIOChannel. It is to prevent lockup of the VM in case the QIOChannel hangs.
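>
> For reference, this is roughly the shape of that code (paraphrased from
> mpqemu_msg_send() in hw/remote/mpqemu-link.c; error handling is trimmed,
> and "send", "fds" and "nfds" stand in for locals built earlier in the
> function):
>
>     bool mpqemu_msg_send(MPQemuMsg *msg, QIOChannel *ioc, Error **errp)
>     {
>         bool iolock = qemu_mutex_iothread_locked();
>         bool iothread = qemu_in_iothread();
>         bool ret = false;
>
>         /*
>          * Skip the unlock when running in an IOThread or a coroutine;
>          * otherwise drop the iothread lock so a hung QIOChannel doesn't
>          * wedge the whole VM while we block in the write.
>          */
>         if (iolock && !iothread && !qemu_in_coroutine()) {
>             qemu_mutex_unlock_iothread();
>         }
>
>         if (!qio_channel_writev_full_all(ioc, send, G_N_ELEMENTS(send),
>                                          fds, nfds, errp)) {
>             ret = true;
>         }
>
>         /* ... and reacquire it before returning to the caller. */
>         if (iolock && !iothread && !qemu_in_coroutine()) {
>             qemu_mutex_lock_iothread();
>         }
>
>         return ret;
>     }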
>
> I was able to reproduce this issue at my end. There is a deadlock between
> "mpqemu_msg_send() -> qemu_mutex_lock_iothread()" and
> "mpqemu_msg_send_and_await_reply() -> QEMU_LOCK_GUARD(&pdev->io_mutex)”.
>
> From what I can tell, as soon as one vcpu thread drops the iothread lock,
> another thread running mpqemu_msg_send_and_await_reply() holds on to it.
> That prevents the first thread from completing. Attaching backtrace below.
>
> To avoid the deadlock, I think we should drop both the iothread lock and
> io_mutex and reacquire them in the correct order - first the iothread lock
> and then io_mutex. Given that multiprocess QEMU would be dropped in the near
> future, I suppose we don't have to proceed further along these lines.
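>
> To spell out the inversion (a schematic call trace, not real code; the
> thread labels are only for illustration):
>
>     /* vcpu thread A, iothread lock held */
>     mpqemu_msg_send_and_await_reply()
>       QEMU_LOCK_GUARD(&pdev->io_mutex);   /* A takes io_mutex           */
>       mpqemu_msg_send()
>         qemu_mutex_unlock_iothread();     /* A drops the iothread lock  */
>         qio_channel_writev_full_all(...);
>         qemu_mutex_lock_iothread();       /* A blocks: B now holds it   */
>
>     /* vcpu thread B, grabs the iothread lock A just dropped */
>     mpqemu_msg_send_and_await_reply()
>       QEMU_LOCK_GUARD(&pdev->io_mutex);   /* B blocks: A holds io_mutex */
>
> Neither thread can make progress from that point.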
>
> I tested your patch, and that fixes the e1000 issue at my end also. I believe
> we could adopt it.

Hi Jag,

I will go take a look at the vfio-user patches. I hadn't been
following the list lately, so I didn't realize things were headed in
that direction. It may work out better for me depending on what is
enabled, as I was looking at working with PCIe config and MSI-X anyway,
so if vfio-user supports that already then it may save me some work.

What you describe as the issue doesn't sound too surprising; as I had
mentioned, disabling SMP also solved the problem for me. It is likely
due to the fact that the e1000 has separate threads running for
handling stats and such, and then the main thread handling the
networking. I would think we cannot drop the io_mutex, though, as it is
needed to enforce the ordering of the request send and the reply
receive. Rather, if we need to drop the iothread lock, I would think it
might be better to drop it before acquiring the io_mutex, or to at
least wait until after we release the io_mutex before reacquiring it,
roughly like the sketch below.
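
To make that concrete, here is a rough, untested sketch of the ordering I
have in mind for mpqemu_msg_send_and_await_reply() in hw/remote/proxy.c
(reply validation is simplified, and it assumes mpqemu_msg_send() and
mpqemu_msg_recv() no longer touch the iothread lock themselves):

    static uint64_t mpqemu_msg_send_and_await_reply(MPQemuMsg *msg,
                                                    PCIProxyDev *pdev,
                                                    Error **errp)
    {
        MPQemuMsg msg_reply = {0};
        uint64_t ret = UINT64_MAX;
        bool iolock = qemu_mutex_iothread_locked();

        /*
         * Drop the iothread lock *before* taking io_mutex, so that no
         * thread ever waits on io_mutex while holding the iothread lock.
         */
        if (iolock) {
            qemu_mutex_unlock_iothread();
        }

        WITH_QEMU_LOCK_GUARD(&pdev->io_mutex) {
            /* io_mutex still pairs each request with its reply */
            if (mpqemu_msg_send(msg, pdev->ioc, errp) &&
                mpqemu_msg_recv(&msg_reply, pdev->ioc, errp) &&
                mpqemu_msg_valid(&msg_reply)) {
                ret = msg_reply.data.u64;
            }
        }

        /* Reacquire the iothread lock only after io_mutex is released. */
        if (iolock) {
            qemu_mutex_lock_iothread();
        }

        return ret;
    }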

If you want I can resubmit my patch for acceptance, but it sounds like
the code will be deprecated soon so I am not sure there is much point.
I will turn my focus to vfio-user and see if I can get it up and
running for my use case.

Thanks,

- Alex


