qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 0/1] qcow2: Skip copy-on-write when allocating a zero cluster


From: Kevin Wolf
Subject: Re: [PATCH 0/1] qcow2: Skip copy-on-write when allocating a zero cluster
Date: Mon, 17 Aug 2020 17:53:07 +0200

Am 17.08.2020 um 17:31 hat Alberto Garcia geschrieben:
> On Mon 17 Aug 2020 12:10:19 PM CEST, Kevin Wolf wrote:
> >> Since commit c8bb23cbdbe / QEMU 4.1.0 (and if the storage backend
> >> allows it) writing to an image created with preallocation=metadata
> >> can be slower (20% in my tests) than writing to an image with no
> >> preallocation at all.
> >
> > A while ago we had a case where commit c8bb23cbdbe was actually
> > reported as a major performance regression, so it's a big "it
> > depends".
> >
> > XFS people told me that they consider this code a bad idea. Just
> > because it's a specialised "write zeroes" operation, it's not
> > necessarily fast on filesystems. In particular, on XFS, ZERO_RANGE
> > causes a queue drain with O_DIRECT (probably hurts cases with high
> > queue depths) and additionally even a page cache flush without
> > O_DIRECT.
> >
> > So in a way this whole thing is a two-edged sword.
> 
> I see... on ext4 the improvements are clearly visible. Are we not
> detecting this for xfs? We do have an s->is_xfs flag.

My understanding is that XFS and ext4 behave very similar in this
respect. It's not a clear loss on XFS either, some cases are improved.
But cases that get a performance regression exist, too. It's a question
of the workload, the file system state (e.g. fragmentation of the image
file) and the storage.

So I don't think checking for a specific filesystem is going to improve
things.

> >> a) shall we include a warning in the documentation ("note that this
> >> preallocation mode can result in worse performance")?
> >
> > To be honest, I don't really understand this case yet. With metadata
> > preallocation, the clusters are already marked as allocated, so why
> > would handle_alloc_space() even be called? We're not allocating new
> > clusters after all?
> 
> It's not called, what happens is what you say below:
> 
> > Or are you saying that ZERO_RANGE + pwrite on a sparse file (= cluster
> > allocation) is faster for you than just the pwrite alone (= writing to
> > already allocated cluster)?
> 
> Yes, 20% faster in my tests (4KB random writes), but in the latter case
> the cluster is already allocated only at the qcow2 level, not on the
> filesystem. preallocation=falloc is faster than preallocation=metadata
> (preallocation=off sits in the middle).

Hm, this feels wrong. Doing more operations should never be faster than
doing less operations.

Maybe the difference is in allocating 64k at once instead of doing a
separate allocation for every 4k block? But with the extent size hint
patches to file-posix, we should allocate 1 MB at once by default now
(if your test image was newly created). Can you check whether this is in
effect for your image file?

Kevin




reply via email to

[Prev in Thread] Current Thread [Next in Thread]