qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v5 07/10] block: introduce preallocate filter


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH v5 07/10] block: introduce preallocate filter
Date: Wed, 26 Aug 2020 12:15:47 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.11.0

26.08.2020 11:52, Max Reitz wrote:
On 26.08.20 08:49, Vladimir Sementsov-Ogievskiy wrote:
25.08.2020 18:11, Max Reitz wrote:
On 21.08.20 16:11, Vladimir Sementsov-Ogievskiy wrote:
It's intended to be inserted between format and protocol nodes to
preallocate additional space (expanding protocol file) on writes
crossing EOF. It improves performance for file-systems with slow
allocation.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
   docs/system/qemu-block-drivers.rst.inc |  26 +++
   qapi/block-core.json                   |  20 +-
   block/preallocate.c                    | 291 +++++++++++++++++++++++++
   block/Makefile.objs                    |   1 +
   4 files changed, 337 insertions(+), 1 deletion(-)
   create mode 100644 block/preallocate.c

[...]

diff --git a/block/preallocate.c b/block/preallocate.c
new file mode 100644
index 0000000000..bdf54dbd2f
--- /dev/null
+++ b/block/preallocate.c
@@ -0,0 +1,291 @@
+/*
+ * preallocate filter driver
+ *
+ * The driver performs preallocate operation: it is injected above
+ * some node, and before each write over EOF it does additional
preallocating
+ * write-zeroes request.
+ *
+ * Copyright (c) 2020 Virtuozzo International GmbH.
+ *
+ * Author:
+ *  Sementsov-Ogievskiy Vladimir <vsementsov@virtuozzo.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+
+#include "qapi/error.h"
+#include "qemu/module.h"
+#include "qemu/option.h"
+#include "qemu/units.h"
+#include "block/block_int.h"
+
+
+typedef struct BDRVPreallocateState {
+    int64_t prealloc_size;
+    int64_t prealloc_align;
+
+    /*
+     * Filter is started as not-active, so it doesn't do any
preallocations nor
+     * requires BLK_PERM_RESIZE on its child. This is needed to
create filter
+     * above another node-child and then do bdrv_replace_node to
insert the
+     * filter.

A bit weird, but seems fair.  It’s basically just a cache for whether
this node has a writer on it or not.

Apart from the weirdness, I don’t understand the “another node-child”.
Say you have “format” -> “proto”, and then you want to insert
“prealloc”.  Wouldn’t you blockdev-add prealloc,file=proto and then
blockdev-replace format.file=prealloc?

Yes something like this. Actually, I'm about inserting the filter
automatically from block layer code, but your reasoning is about same
thing and is better.

So what is that “other node-child”?

Hmm, just my misleading wording. At least '-' in wrong place. Just
"other node child", or "child of another node"..

OK.

+     *
+     * Filter becomes active the first time it detects that its
parents have
+     * BLK_PERM_RESIZE on it.
+     * Filter becomes active forever: it doesn't lose active status
if parents
+     * lose BLK_PERM_RESIZE, otherwise we'll not be able to shrink
the file on
+     * filter close.

Oh, the file is shrunk?  That wasn’t clear from the documentation.

Hm.  Seems like useful behavior.  I just wonder if keeping the
permission around indefinitely makes sense.  Why not just truncate it
when the permission is revoked?

How? Parent is closed earlier, so on close we will not have the
permission. So, we force-keep the permission up to our close().

I thought that preallocate_child_perm() would be invoked when the parent
is detached, so we could do the truncate there, before relinquishing
preallocate’s RESIZE permission.


Hmm, I can check it. I just a bit afraid of doing something serious like 
truncation in .bdrv_child_perm() handler, which doesn't seem to imply such 
usage.


+static void preallocate_close(BlockDriverState *bs)
+{
+    BDRVPreallocateState *s = bs->opaque;
+
+    if (s->active && s->data_end >= 0 &&
+        bdrv_getlength(bs->file->bs) > s->data_end)
+    {
+        bdrv_truncate(bs->file, s->data_end, true,
PREALLOC_MODE_OFF, 0, NULL);

Now that I think more about it...  What if there are other writers on
bs->file?  This may throw data away.

Good point. Actually, if bs->file has other writing parents, the logic
of the filter
around "data_end" is broken. So we must unshare WRITE and RESIZE
permissions.

That’s certainly a heavy hammer, but it’d work.

  Maybe BDS.wr_highest_offset can
help to make a better call?

Anyway, we'll have to use wr_highest_offset of the filter not the child
node
(in the child wr_highest_offset will track preallocations as well),

That’s true.

so we want to unshare WRITE/RESIZE permissions.

[...]

+static int coroutine_fn
+preallocate_co_truncate(BlockDriverState *bs, int64_t offset,
+                        bool exact, PreallocMode prealloc,
+                        BdrvRequestFlags flags, Error **errp)
+{
+    BDRVPreallocateState *s = bs->opaque;
+    int ret = bdrv_co_truncate(bs->file, offset, exact, prealloc,
flags, errp);
+
+    /* s->data_end may become negative here, which means unknown
data end */
+    s->data_end = bdrv_getlength(bs->file->bs);

What would be the problem with just setting data_end = offset?  We only
use data_end to truncate down on close, so basically repeat the
bdrv_co_truncate() call here, which seems correct.

We can use offset only if ret >= 0 && exact is true..

Well, on close, we ignore any errors from truncate() anyway.  So even if
we used exact=false here, it shouldn’t matter.

Here we handle truncate from user. If I understand "exact" correctly the 
following is possible:

1. parent does truncate 1M -> 2M, exact=false
2. driver decides to truncate to 5M (exact condition is sutisfied)
3. but we remember s->data_end = 2M, and on close will shrink to 2M

Doesn't seem a real problem.. So, we just consider the preallocation done by 
driver as our preallocation to be dropped on close().

Could there be problems on user shrink?

1. parent does truncate 2M -> 1M, exact=false
2. driver decides to do notihng (why not)
3. on close() we'll shrink to 1M..

Again, seems no real problems.

But in case when bdrv_co_truncate failed, we should call bdrv_getlength anyway, as 
we don't know the result of failed truncation. Or we can just set s->data_end = 
-1, and not care too much about fail-scenarios.

So, probably we can do
s->data_end = ret < 0 : ret : offset;

But I just feel safer with additional bdrv_getlength().


But simpler is
just call
bdrv_getlength() anyway.

Certainly simpler than thinking about potential edge cases, true.

Max



--
Best regards,
Vladimir



reply via email to

[Prev in Thread] Current Thread [Next in Thread]