[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2] qcow2: Skip copy-on-write when allocating a zero cluster
From: |
Kevin Wolf |
Subject: |
Re: [PATCH v2] qcow2: Skip copy-on-write when allocating a zero cluster |
Date: |
Thu, 10 Sep 2020 14:04:39 +0200 |
Am 09.09.2020 um 21:23 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 27.08.2020 17:53, Alberto Garcia wrote:
> > Since commit c8bb23cbdbe32f5c326365e0a82e1b0e68cdcd8a when a write
> > request results in a new allocation QEMU first tries to see if the
> > rest of the cluster outside the written area contains only zeroes.
> >
> > In that case, instead of doing a normal copy-on-write operation and
> > writing explicit zero buffers to disk, the code zeroes the whole
> > cluster efficiently using pwrite_zeroes() with BDRV_REQ_NO_FALLBACK.
> >
> > This improves performance very significantly but it only happens when
> > we are writing to an area that was completely unallocated before. Zero
> > clusters (QCOW2_CLUSTER_ZERO_*) are treated like normal clusters and
> > are therefore slower to allocate.
> >
> > This happens because the code uses bdrv_is_allocated_above() rather
> > bdrv_block_status_above(). The former is not as accurate for this
> > purpose but it is faster. However in the case of qcow2 the underlying
> > call does already report zero clusters just fine so there is no reason
> > why we cannot use that information.
> >
> > After testing 4KB writes on an image that only contains zero clusters
> > this patch results in almost five times more IOPS.
> >
> > Signed-off-by: Alberto Garcia <berto@igalia.com>
> > ---
> > v2:
> > - Add new, simpler API: bdrv_is_unallocated_or_zero_above()
> >
> > include/block/block.h | 2 ++
> > block/io.c | 24 ++++++++++++++++++++++++
> > block/qcow2.c | 37 +++++++++++++++++++++----------------
> > 3 files changed, 47 insertions(+), 16 deletions(-)
> >
> > diff --git a/include/block/block.h b/include/block/block.h
> > index 6e36154061..477a72b2e9 100644
> > --- a/include/block/block.h
> > +++ b/include/block/block.h
> > @@ -496,6 +496,8 @@ int bdrv_is_allocated(BlockDriverState *bs, int64_t
> > offset, int64_t bytes,
> > int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
> > bool include_base, int64_t offset, int64_t
> > bytes,
> > int64_t *pnum);
> > +int bdrv_is_unallocated_or_zero_above(BlockDriverState *bs, int64_t offset,
> > + int64_t bytes);
> > bool bdrv_is_read_only(BlockDriverState *bs);
> > int bdrv_can_set_read_only(BlockDriverState *bs, bool read_only,
> > diff --git a/block/io.c b/block/io.c
> > index ad3a51ed53..94faa3f9d7 100644
> > --- a/block/io.c
> > +++ b/block/io.c
> > @@ -2557,6 +2557,30 @@ int bdrv_block_status(BlockDriverState *bs, int64_t
> > offset, int64_t bytes,
> > offset, bytes, pnum, map, file);
> > }
> > +/*
> > + * Check @bs (and its backing chain) to see if the range defined
> > + * by @offset and @bytes is unallocated or known to read as zeroes.
> > + * Return 1 if that is the case, 0 otherwise and -errno on error.
> > + * This test is meant to be fast rather than accurate so returning 0
> > + * does not guarantee non-zero data.
> > + */
> > +int bdrv_is_unallocated_or_zero_above(BlockDriverState *bs, int64_t offset,
> > + int64_t bytes)
>
> _above prefix for block-status functions usually assume existing of "base"
> parameter, otherwise, it's not clear "above what?"
>
> Also, actually the caller doesn't care about it it allocated or not. It only
> wants to know is it read-as-zero or not. So, probably better name is
> bdrv_is_zero_fast()
>
> > +{
> > + int ret;
> > + int64_t pnum = bytes;
> > +
> > + ret = bdrv_common_block_status_above(bs, NULL, false, offset,
> > + bytes, &pnum, NULL, NULL);
> > +
> > + if (ret < 0) {
> > + return ret;
> > + }
> > +
> > + return (pnum == bytes) &&
> > + ((ret & BDRV_BLOCK_ZERO) || (!(ret & BDRV_BLOCK_ALLOCATED)));
>
> Note that some protocol drivers returns unallocated status when it doesn't
> read-as-zero, so in general, we can't use this function as is_zero.
>
> I think, that better to keep only (pnum == bytes) && (ret & BDRV_BLOCK_ZERO)
> here
Ah, very good, you already mentioned the main points I had, and more. (I
didn't realise that using BDRV_BLOCK_ALLOCATED is actually wrong, just
that it's more complicated than necessary.)
What I would like to add is that a bdrv_co_is_zero_fast() would be even
better. is_zero_cow() isn't marked as such yet, but semantically it's a
coroutine_fn, so we have no reason to go through the synchronous
wrappers.
> , and to make it work correctly improve bdrv_co_block_status like this:
>
> diff --git a/block/io.c b/block/io.c
> index ad3a51ed53..33b2e91bcd 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2408,15 +2408,15 @@ static int coroutine_fn
> bdrv_co_block_status(BlockDriverState *bs,
> if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
> ret |= BDRV_BLOCK_ALLOCATED;
> - } else if (want_zero && bs->drv->supports_backing) {
> - if (bs->backing) {
> + } else if (bs->drv->supports_backing) {
> + if (bs->backing && want_zero) {
> BlockDriverState *bs2 = bs->backing->bs;
> int64_t size2 = bdrv_getlength(bs2);
> if (size2 >= 0 && offset >= size2) {
> ret |= BDRV_BLOCK_ZERO;
> }
> - } else {
> + } else if (!bs->backing) {
> ret |= BDRV_BLOCK_ZERO;
> }
> }
>
> - we can always add ZERO flag to backing-supporting formats if range is
> unallocated and there is no backing file.
This makes sense to me, though it should be a separate patch. This one
wouldn't become incorrect without it, but it would be less effective.
Kevin