qemu-block

Re: [RFC 1/2] block: Split padded I/O vectors exceeding IOV_MAX


From: Hanna Czenczek
Subject: Re: [RFC 1/2] block: Split padded I/O vectors exceeding IOV_MAX
Date: Thu, 16 Mar 2023 10:43:38 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.7.1

On 15.03.23 19:25, Eric Blake wrote:
On Wed, Mar 15, 2023 at 01:13:29PM +0100, Hanna Czenczek wrote:
When processing vectored guest requests that are not aligned to the
storage request alignment, we pad them by adding head and/or tail
buffers for a read-modify-write cycle.

The guest can submit I/O vectors up to IOV_MAX (1024) in length, but
with this padding, the vector can exceed that limit.  As of
4c002cef0e9abe7135d7916c51abce47f7fc1ee2 ("util/iov: make
qemu_iovec_init_extended() honest"), we refuse to pad vectors beyond the
limit, instead returning an error to the guest.

To the guest, this appears as a random I/O error.  We should not return
an I/O error to the guest when it issued a perfectly valid request.

Before 4c002cef0e9abe7135d7916c51abce47f7fc1ee2, we just made the vector
longer than IOV_MAX, which generally seems to work (because the guest
assumes a smaller alignment than we really have, file-posix's
raw_co_prw() will generally see bdrv_qiov_is_aligned() return false and
thus emulate the request, so the IOV_MAX limit does not matter).  However,
that does not seem exactly great.
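
(For concreteness, a minimal sketch of the limit problem described above – the helper below is invented purely for illustration and is not the actual QEMU code:)

/* Illustration only -- not the actual QEMU helper. */
#include <limits.h>    /* IOV_MAX (1024 on Linux) */
#include <stdbool.h>

/*
 * Padding a vectored request can add one head and one tail entry, so a
 * guest vector that already uses all IOV_MAX entries would grow to 1026.
 */
static bool padded_vector_fits(int guest_niov, bool need_head, bool need_tail)
{
    int padded_niov = guest_niov + (need_head ? 1 : 0) + (need_tail ? 1 : 0);

    return padded_niov <= IOV_MAX;
}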

I see two ways to fix this problem:
1. We split such long requests into two requests.
2. We join some elements of the vector into new buffers to make it
    shorter.

I am wary of (1), because it seems like it may have unintended side
effects.

(2) on the other hand seems relatively simple to implement, with
hopefully few side effects, so this patch does that.
Agreed that approach 2 is more conservative.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2141964
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
  block/io.c | 139 ++++++++++++++++++++++++++++++++++++++++++++++++++---
  util/iov.c |   4 --
  2 files changed, 133 insertions(+), 10 deletions(-)
+/*
+ * If padding has made the IOV (`pad->local_qiov`) too long (more than IOV_MAX
+ * elements), collapse some elements into a single one so that it adheres to the
+ * IOV_MAX limit again.
+ *
+ * If collapsing, `pad->collapse_buf` will be used as a bounce buffer of length
+ * `pad->collapse_len`.  `pad->collapsed_qiov` will contain the previous entries
+ * (before collapsing), so that bdrv_padding_destroy() can copy the bounce
+ * buffer content back for read requests.
+ *
+ * Note that we will not touch the padding head or tail entries here.  We cannot
+ * move them to a bounce buffer, because for RMWs, both head and tail expect to
+ * be in an aligned buffer with scratch space after (head) or before (tail) to
+ * perform the read into (because the whole buffer must be aligned, but head's
+ * and tail's lengths naturally cannot be aligned, because they provide padding
+ * for unaligned requests).  A collapsed bounce buffer for multiple IOV elements
+ * cannot provide such scratch space.
+ *
+ * Therefore, this function collapses the first IOV elements after the
+ * (potential) head element.
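
(As an aside, a rough sketch of the "aligned buffer with scratch space" layout described for the head entry above – simplified, with an invented helper, not code from the patch:)

/* Illustration only -- simplified, not code from the patch. */
#include <stdlib.h>
#include <sys/uio.h>

/*
 * The head padding entry exposes only the unaligned head bytes, but its
 * allocation spans a whole alignment unit, so the RMW read has aligned
 * scratch space right after those bytes.  A bounce buffer shared with
 * other guest entries could not provide that layout.
 */
static struct iovec make_head_entry(size_t align, size_t head_len)
{
    void *buf = aligned_alloc(align, align);    /* one full aligned unit */
    struct iovec head = {
        .iov_base = buf,        /* padding bytes at the start of the unit */
        .iov_len  = head_len,   /* only head_len bytes go into the qiov */
    };

    return head;
}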
It looks like you blindly pick the first one or two non-padding iovs
at the front of the array.  Would it be any wiser (in terms of less
memmove() action or even a smaller bounce buffer) to pick iovs at the
end of the array, and/or a sequential search for the smallest
neighboring iovs?  Or is that a micro-optimization that costs more
than it saves?

Right, I didn’t think of picking near the end.  That makes sense indeed!  If not for performance, at least it allows dropping the non-trivial comment for the memmove().

As for searching for the smallest buffers, I’m not sure.  I think it can make sense performance-wise – iterating over 1024 elements will probably pay off quickly when the buffer sizes actually differ.  My main concern is that it would be more complicated, and I just don’t think that’s worth it for such a rare case.
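
(For illustration, the mechanics of collapsing two entries at the end into one bounce entry might look roughly like the sketch below – the names are made up, it ignores the tail padding entry that would have to stay last, and the real patch additionally keeps the original entries around so reads can be copied back:)

/* Illustration only -- not the patch code. */
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Merge the last two entries of an iovec array into a single bounce-buffer
 * entry, shrinking the array by one.  For writes, the data must be copied
 * into the bounce buffer up front; for reads, it has to be copied back into
 * the original entries once the request completes (not shown here).
 */
static void collapse_last_two(struct iovec *iov, unsigned *niov, bool is_write)
{
    struct iovec *a = &iov[*niov - 2];
    struct iovec *b = &iov[*niov - 1];
    size_t len = a->iov_len + b->iov_len;
    char *bounce = malloc(len);

    if (is_write) {
        memcpy(bounce, a->iov_base, a->iov_len);
        memcpy(bounce + a->iov_len, b->iov_base, b->iov_len);
    }

    a->iov_base = bounce;
    a->iov_len = len;
    *niov -= 1;
}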

Would it be any easier to swap the order of padding vs. collapsing?
That is, we already know the user is giving us a long list of iovs; if
it is 1024 elements long, and we can detect that padding will be
needed, should we collapse before padding instead of padding, finding
that we now have 1026, and memmove'ing back into 1024?

I’d prefer that, but it’s difficult.  We need the temporary QIOV (pad->local_qiov) so we can merge entries, but this is only created by qemu_iovec_init_extended().

We can try to move the collapsing into qemu_iovec_init_extended(), but it would still be a bit awkward, and would probably blow up that function’s interface (it’s in util/iov.c, so we can’t really use the BdrvRequestPadding object there directly).  I think, in the end, not much would change functionally, so I’d rather keep the order as it is (unless someone has a good idea here).

But logic-wise, your patch looks correct to me.

Reviewed-by: Eric Blake <eblake@redhat.com>

Thanks!

Hanna



