Re: PCI memory sync question (kvm,dpdk,e1000,packet stalled)


From: Ding Hui
Subject: Re: PCI memory sync question (kvm,dpdk,e1000,packet stalled)
Date: Mon, 23 May 2022 20:47:03 +0800
User-agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.14.0

Hi ASM,

I think I have hit almost exactly the same issue as you (QEMU e1000 with a tap backend, guest running DPDK), in our own environment rather than in your GitHub project.

It seems that the descriptor (not only the status field; we also traced buffer_addr and csum) suddenly reverts to its old value after the guest DPDK driver sets status to 0.

So I searched for the issue and found the bug report at https://bugs.launchpad.net/qemu/+bug/1853123 and this mailing list thread.

Can you tell me what your workaround is, and did you ever find the root cause?

Thanks a lot

On 2020/1/2 19:05, Stefan Hajnoczi wrote:
On Mon, Dec 30, 2019 at 01:10:15PM +0300, ASM wrote:
It could be a bug in QEMU's e1000 emulation - maybe it's not doing
things in the correct order and causes a race condition with the DPDK
polling driver - or it could be a bug in the DPDK e1000 driver regarding
the order in which the descriptor ring and RX Head/Tail MMIO registers
are updated.


What did I understand:
* The DPDK and kernel drivers work with the ring in a similar way: they
don't analyze Head, they check STATUS. This is a bit strange but
completely correct driver behavior: if the driver writes a value to
memory, it expects that value to stay written. The problem is
definitely not in the DPDK or kernel drivers.
* The problem appears on KVM, but does not appear on TCG.
* It looks most like a bug in the QEMU e1000 emulation: the emulation
reads and writes some of the same memory, at the same time, as the
DPDK driver.


As I understand it, KVM explicitly prohibits such concurrent access to
shared memory. It seems obvious that we need to protect (e.g. with RCU)
all the STATUS fields of all buffers. There can be a lot of buffers,
and they can be scattered throughout memory.

I don't agree with this reasoning because these descriptor rings are
designed to support simultaneous access by the driver and the device (if
done with care!).  What QEMU and the driver are attempting is typical.

The important thing for safe shared memory access is that both the
driver and the device know who is allowed to write to a memory location
at a point in time.  As long as both the driver and device don't write
to the same location it is possible to safely use the data structure.

The typical pattern that a driver or device uses to publish information
is:

   1. Fill in all fields but don't set the STATUS bit yet.
   2. Memory barrier or other memory ordering directive to ensure that
      fields have been written.  This operation might be implicit.
   3. Set the STATUS bit in a separate operation.
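
Here is a minimal C sketch of that device-side publish pattern. The
descriptor layout and the smp_wmb() macro are illustrative assumptions
for this email, not QEMU's actual definitions:

   #include <stdint.h>

   /* Illustrative e1000-style RX descriptor; field names are assumed. */
   struct rx_desc {
       uint64_t buffer_addr;
       uint16_t length;
       uint16_t csum;
       volatile uint8_t status;
   };

   /* Stand-in store barrier; real code would use its platform's macro. */
   #define smp_wmb() __atomic_thread_fence(__ATOMIC_RELEASE)

   static void publish_desc(struct rx_desc *d, uint64_t addr,
                            uint16_t len, uint16_t csum)
   {
       d->buffer_addr = addr;            /* 1. fill in all fields      */
       d->length = len;
       d->csum = csum;
       smp_wmb();                        /* 2. order field stores      */
       d->status |= 0x01;                /* 3. set DD bit separately   */
   }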

On the read side there should be:

   1. Load the STATUS field and check the bit.  If it's clear then try
      again.
   2. Memory barrier or other memory ordering directive to ensure that
      the other fields are not loaded before the STATUS field.  This
      operation might be implicit.
   3. Load all the fields.
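
And a matching sketch of the read side, reusing the same illustrative
struct and an assumed load barrier:

   /* Stand-in load barrier; real code would use its platform's macro. */
   #define smp_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)

   static int try_consume_desc(struct rx_desc *d, struct rx_desc *out)
   {
       if (!(d->status & 0x01))          /* 1. DD bit still clear?     */
           return 0;                     /*    not ready, try again    */
       smp_rmb();                        /* 2. order the STATUS load   */
       *out = *d;                        /* 3. then load all fields    */
       return 1;
   }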

Can you explain why the STATUS fields need to be protected?  My
understanding is that the STATUS field is owned by the device at this
point in time.  The device writes to the STATUS field and the driver
polls it - this is safe.

I think the issue is bugs in the code.  The DPDK code looks questionable
(https://git.dpdk.org/dpdk/tree/drivers/net/e1000/em_rxtx.c#n715):

   volatile struct e1000_rx_desc *rxdp;
   ...
   rxdp = &rx_ring[rx_id];
   status = rxdp->status;              /* load STATUS first */
   if (! (status & E1000_RXD_STAT_DD))
       break;
   rxd = *rxdp;                        /* full descriptor load; the CPU
                                          may hoist this above the
                                          STATUS load */

Although there is a conditional branch on rxdp->status, the CPU may load
*rxdp before loading rxdp->status.  These pointers are volatile but
that's not enough to prevent the CPU from reordering memory accesses
(the compiler emits regular load instructions without memory ordering
directives).

This is probably not the bug that you're seeing, but it suggests there
are more problems.
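
If that reordering ever needs to be ruled out, one possible (untested)
adjustment would be an explicit read barrier between the STATUS check
and the full descriptor load; rte_smp_rmb() is DPDK's portable read
barrier from rte_atomic.h. A sketch, mirroring the excerpt above:

   #include <rte_atomic.h>

   rxdp = &rx_ring[rx_id];
   status = rxdp->status;
   if (! (status & E1000_RXD_STAT_DD))
       break;
   rte_smp_rmb();   /* keep the *rxdp load after the status load */
   rxd = *rxdp;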

Stefan



--
Thanks,
- Ding Hui


