qemu-devel

Re: [PATCH v2 5/8] qemu-img: rebase: avoid unnecessary COW operations


From: Hanna Czenczek
Subject: Re: [PATCH v2 5/8] qemu-img: rebase: avoid unnecessary COW operations
Date: Tue, 19 Sep 2023 12:46:37 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0

On 15.09.23 18:20, Andrey Drobyshev wrote:
When rebasing an image from one backing file to another, we need to
compare data from the old and new backings.  If the diff between that data
happens to be unaligned to the target cluster size, we might end up
doing partial writes, which would lead to copy-on-write and additional I/O.

Consider the following simple case (virtual_size == cluster_size == 64K):

base <-- inc1 <-- inc2

qemu-io -c "write -P 0xaa 0 32K" base.qcow2
qemu-io -c "write -P 0xcc 32K 32K" base.qcow2
qemu-io -c "write -P 0xbb 0 32K" inc1.qcow2
qemu-io -c "write -P 0xcc 32K 32K" inc1.qcow2
qemu-img rebase -f qcow2 -b base.qcow2 -F qcow2 inc2.qcow2

While doing the rebase, we'll write half of the cluster to inc2, and the
block layer will have to read the second half of the same cluster from the
backing image inc1 while doing this write operation, even though the whole
cluster was already read earlier to perform the data comparison.

In order to avoid these unnecessary I/O cycles, let's make sure every
write request is aligned to the overlay subcluster boundaries.  Using the
subcluster size is universal: for images which don't have subclusters this
size equals the cluster size, so in any case we end up aligning
to the smallest unit of allocation.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
---
  qemu-img.c | 76 ++++++++++++++++++++++++++++++++++++++++--------------
  1 file changed, 56 insertions(+), 20 deletions(-)

Looks good, I like the changes from v1!  Two minor things:

diff --git a/qemu-img.c b/qemu-img.c
index fcd31d7b5b..83950af42b 100644
--- a/qemu-img.c
+++ b/qemu-img.c

[...]

@@ -3844,33 +3861,48 @@ static int img_rebase(int argc, char **argv)
                  }
              }
+            /*
+             * At this point we know that the region [offset; offset + n)
+             * is unallocated within the target image.  This region might be
+             * unaligned to the target image's (sub)cluster boundaries, as
+             * old backing may have smaller clusters (or have subclusters).
+             * We extend it to the aligned boundaries to avoid CoW on
+             * partial writes in blk_pwrite().
+             */
+            n += offset - QEMU_ALIGN_DOWN(offset, write_align);
+            offset = QEMU_ALIGN_DOWN(offset, write_align);
+            n += QEMU_ALIGN_UP(offset + n, write_align) - (offset + n);
+            n = MIN(n, size - offset);
+            assert(!bdrv_is_allocated(unfiltered_bs, offset, n, &n_alloc) &&
+                   n_alloc == n);
+
+            /*
+             * Much like the with the target image, we'll try to read as much

s/the with the/with the/

+             * of the old and new backings as we can.
+             */
+            n_old = MIN(n, MAX(0, old_backing_size - (int64_t) offset));
+            if (blk_new_backing) {
+                n_new = MIN(n, MAX(0, new_backing_size - (int64_t) offset));
+            }

If we don’t have a check for blk_old_backing (old_backing_size is 0 if blk_old_backing is NULL), why do we have a check for blk_new_backing (new_backing_size is 0 if blk_new_backing is NULL)?

(Perhaps because the previous check was `offset >= new_backing_size || !blk_new_backing`, i.e. included exactly such a check – but I don’t think it’s necessary, new_backing_size will be 0 if blk_new_backing is NULL.)

+
              /*
               * Read old and new backing file and take into consideration that
               * backing files may be smaller than the COW image.
               */
-            if (offset >= old_backing_size) {
-                memset(buf_old, 0, n);
-                buf_old_is_zero = true;
+            memset(buf_old + n_old, 0, n - n_old);
+            if (!n_old) {
+                old_backing_eof = true;
              } else {
-                if (offset + n > old_backing_size) {
-                    n = old_backing_size - offset;
-                }
-
-                ret = blk_pread(blk_old_backing, offset, n, buf_old, 0);
+                ret = blk_pread(blk_old_backing, offset, n_old, buf_old, 0);
                  if (ret < 0) {
                      error_report("error while reading from old backing file");
                      goto out;
                  }
              }
-            if (offset >= new_backing_size || !blk_new_backing) {
-                memset(buf_new, 0, n);
-            } else {
-                if (offset + n > new_backing_size) {
-                    n = new_backing_size - offset;
-                }
-
-                ret = blk_pread(blk_new_backing, offset, n, buf_new, 0);
+            memset(buf_new + n_new, 0, n - n_new);
+            if (blk_new_backing && n_new) {

Same as above, I think we can drop the blk_new_backing check, just so that both blocks (for old and new) look the same.

(Also, the memset() already has to trust that n_new is 0 if blk_new_backing is NULL, so the check doesn’t make much sense from that perspective either, and makes the memset() look wrong.)

+                ret = blk_pread(blk_new_backing, offset, n_new, buf_new, 0);
                  if (ret < 0) {
                      error_report("error while reading from new backing file");
                      goto out;



