qemu-devel

Re: [PATCH] nbd/client: Request larger block status by default


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH] nbd/client: Request larger block status by default
Date: Wed, 22 Sep 2021 10:28:17 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.14.0

21.09.2021 23:00, Eric Blake wrote:
On Tue, Sep 21, 2021 at 10:12:02PM +0300, Vladimir Sementsov-Ogievskiy wrote:
21.09.2021 21:08, Eric Blake wrote:
On Tue, Sep 21, 2021 at 08:25:11PM +0300, Vladimir Sementsov-Ogievskiy wrote:
21.09.2021 19:17, Eric Blake wrote:
Now that commit 5a1cfd21 has clarified that a driver's block_status
can report larger *pnum than in the original request, we can take
advantage of that in the NBD driver.  Rather than limiting our request
to the server based on the maximum @bytes our caller mentioned, we
instead ask for as much status as possible (the minimum of our 4G
limit or the rest of the export); the server will still only give us
one extent in its answer (because we are using NBD_CMD_FLAG_REQ_ONE),
but now the block layer's caching of data areas can take advantage of
cases where the server gives us a large answer to avoid the need for
future NBD_CMD_BLOCK_STATUS calls.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
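As a rough illustration of the change described in the commit message, here is a hypothetical Python sketch (not the actual QEMU C code; the exact value of the 4G limit constant is an assumption) contrasting the old and new request lengths:

```python
# Hypothetical sketch of the widened NBD block-status request:
# instead of clamping to the caller's @bytes, ask for
# min(~4G protocol limit, rest of the export).

NBD_MAX_BLOCK_STATUS = (1 << 32) - (1 << 16)  # assumed ~4G limit, aligned down

def status_request_length(offset, caller_bytes, export_size):
    """Return (old, new) lengths for NBD_CMD_BLOCK_STATUS at `offset`."""
    remaining = export_size - offset
    # Old behavior: limited by what the caller asked for.
    old = min(caller_bytes, remaining)
    # Patched behavior: ask for as much status as the protocol allows.
    new = min(NBD_MAX_BLOCK_STATUS, remaining)
    return old, new
```

With NBD_CMD_FLAG_REQ_ONE still set, the server returns a single extent either way; the wider request merely lets that one extent describe more of the export.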


I remember we already discussed that, but can't find.

The problem is that it's not for free:

In the server code, in blockstatus_to_extents, we loop through the disk, trying
to merge extents of the same type.

With a fully allocated qcow2 image, we'll have to load all L2 tables and process
them to merge all the block status into one big "allocated" extent.
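The merging loop being described can be modeled roughly like this (a simplified sketch, not the actual blockstatus_to_extents implementation):

```python
# Simplified model of server-side extent merging: walk the reported
# extents in order, coalescing adjacent runs with identical status flags.
# With a fully allocated image, every extent has the same flags, so the
# loop touches the whole disk to produce one merged extent.

def merge_extents(extents):
    """extents: list of (length, flags) tuples. Merge equal-flag runs."""
    merged = []
    for length, flags in extents:
        if merged and merged[-1][1] == flags:
            prev_len, prev_flags = merged[-1]
            merged[-1] = (prev_len + length, prev_flags)
        else:
            merged.append((length, flags))
    return merged
```

The cost concern is that producing each input tuple may require loading another L2 table, so an unbounded merge over a large request is not free.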


We don't have to loop that far.  The NBD protocol allows the server to
stop looping at whatever point makes sense, as long as it makes
progress.

Maybe we need an additional negotiation flag, to allow the BLOCK_STATUS command 
with the NBD_CMD_FLAG_REQ_ONE flag to return an extent larger than requested when 
that information is available for free?

That's already the case when FLAG_REQ_ONE is not present.  The reason
that REQ_ONE clamps things at the requested limit is that older
qemu had a bug where it rejected the server sending extra information,
even when that info was free.


That's one possibility.  Another does not add anything to the NBD
protocol, but instead limits the code that loops over block
status to determine a larger "allocated" answer, so that it
stops looping after a finite number of extents have been merged
together.
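That capped loop might look like the following sketch (hypothetical names and cap value, not actual QEMU code):

```python
# Sketch of a bounded merging loop: keep extending the answer while the
# status flags match, but give up after a fixed number of lookups so a
# fully allocated image doesn't force scanning every L2 table.

MAX_MERGE_ITERATIONS = 16  # arbitrary illustrative cap

def merged_length(lookup, offset, limit=MAX_MERGE_ITERATIONS):
    """lookup(offset) -> (length, flags). Merge while flags match."""
    length, flags = lookup(offset)
    total = length
    for _ in range(limit - 1):
        next_len, next_flags = lookup(offset + total)
        if next_len == 0 or next_flags != flags:
            break  # status changed or end of export: stop merging
        total += next_len
    return total, flags
```

The open question in the thread is precisely how to choose the cap: too low and the answer stays small, too high and the server wastes time on information the client may never use.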


In this case we should answer the question: when do we stop looping? I'm not sure 
we can simply drop the loop:

For example, for compressed clusters, bdrv_co_block_status() will return them 
one by one, and sending them one by one over the wire when the user requested a 
large range would be inefficient.
Or should we change the block-status behavior for compressed clusters? And maybe 
add a flag to block_status() saying that we are not interested in valid_offset, so 
it can return an extent corresponding to the whole L2 table chunk (if all entries 
are allocated, but not consecutive)?

Currently, bdrv_co_block_status() takes 'bool want_zero' that says
what the client wants.  Maybe it's worth expanding that into an enum
or bitmask to allow finer-grained client requests (the notion of
whether valid_offset matters to the caller IS relevant for deciding
when to clamp vs. loop).
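Such an expanded request might be modeled as a bitmask along these lines (all names invented for illustration; this is not the existing QEMU API, which today passes only `bool want_zero`):

```python
# Hypothetical finer-grained replacement for 'bool want_zero': a bitmask
# of what the block-status caller actually needs.
from enum import IntFlag

class BdrvStatusHints(IntFlag):
    WANT_ZERO = 1       # caller cares about the zero/data distinction
    WANT_OFFSET = 2     # caller needs a valid_offset mapping
    WANT_ALLOCATED = 4  # caller only asks "is this range allocated?"

def may_widen_answer(hints):
    # If the caller doesn't need valid_offset, clusters that are allocated
    # but not contiguous on disk can still be merged into one extent.
    return not (hints & BdrvStatusHints.WANT_OFFSET)
```

Under this scheme a server answering NBD_CMD_BLOCK_STATUS could drop WANT_OFFSET and merge a whole L2 table's worth of non-consecutive allocated clusters into one extent.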

I think sooner or later we'll do it anyway.




Hmm. So, if we don't update the spec, we'll have to "fix" the implementation. That 
actually means we should update the spec anyway, at least to note: "clients tend 
to request large regions in the hope that the server will not spend too much time 
serving them, but will instead return a shorter answer"..

I'm really hoping we don't have to tweak the NBD spec on this one, but
rather improve the quality of implementation in qemu.


And you'll never have a guarantee that some other (non-qemu) NBD server will 
not try to satisfy the whole request in one go.

That's true, but the NBD spec has always tried to encourage servers to
provide more information when it was free, but to give up early if it
gets too expensive.  It's a judgment call on where that line lies, and
may indeed be different between different servers.

Hmm. Okay.

Still, if we go the way this patch suggests, we'll need to at least improve our 
server implementation of NBD_CMD_BLOCK_STATUS around qcow2.




In other words:

1. We want the block status of some region.
2. If some free information about a larger region is available, we are happy to 
cache it.

With your solution, we just request a much larger region, so we lose the 
information from [1]. That means the server can't tell how much of the requested 
region is really needed, i.e. if it does some additional work to return more 
information (still within the boundaries of the request), will that be:
  - good work that minimizes network traffic,
OR
  - extra work that wastes server time on information the client will cache but 
probably never use (or even lose soon, as our cache is very simple)?

With an additional negotiation flag we don't lose [1], i.e. how much the client 
wants now.


So, to me, modifying the protocol looks nicer..

Another approach is to do the request without NBD_CMD_FLAG_REQ_ONE and handle 
several extents.

_This_ idea is nicer.  It allows the client to request an actual
length it is interested in now, but allows the server to give extra
information back if it is free.  And it works without changing the NBD
protocol or existing qemu server; it is a client-only change, just
like this patch tried to be, but may have nicer performance
implications.
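The client-side idea being praised here could be sketched as follows (assumed behavior with invented names, not the real client code): request only the range needed now, but cache every extra extent the server volunteers.

```python
# Sketch of a client that drops NBD_CMD_FLAG_REQ_ONE: the request length
# stays what the caller asked for, so the server knows the real need,
# but any additional extents in the reply are cached for later.

def handle_block_status_reply(cache, offset, extents):
    """extents: list of (length, flags) covering `offset` onward."""
    for length, flags in extents:
        cache[(offset, length)] = flags  # simplistic cache, keyed by range
        offset += length
    return offset  # end of the region the reply described
```

This keeps signal [1] intact (the server sees the caller's true range) while still benefiting whenever the server's answer happens to extend further.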



Are you optimizing for some concrete scenario?

Not at this point, so much as observing the effects of commit 5a1cfd21
and seeing if we should update our behavior to match.

For v2, I'll try switching to just dropping our artificial REQ_ONE
limitations from the client.  We are still throwing away a lot of
useful information because we don't cache anything beyond the first
extent returned by the server.  At worst, maybe it will require adding
a tuning knob in the QAPI when creating the NBD client, to decide which
of the two approaches to favor for a given client's connection to a
particular server.


Great!

--
Best regards,
Vladimir


