qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/2] file-posix: fix Linux alignment probing when EIO is retu


From: Eric Biggers
Subject: Re: [PATCH 1/2] file-posix: fix Linux alignment probing when EIO is returned
Date: Thu, 3 Nov 2022 16:26:12 +0000

On Thu, Nov 03, 2022 at 10:52:43AM +0100, Kevin Wolf wrote:
> Am 02.11.2022 um 03:49 hat Eric Biggers geschrieben:
> > On Tue, Nov 01, 2022 at 07:27:16PM -0700, Eric Biggers wrote:
> > > On Tue, Nov 01, 2022 at 03:00:30PM -0400, Stefan Hajnoczi wrote:
> > > > Linux dm-crypt returns errno EIO from unaligned O_DIRECT pread(2) calls.
> > > 
> > > Citation needed.  For direct I/O to block devices, the kernel's block 
> > > layer
> > > checks the alignment before the I/O is actually submitted to the 
> > > underlying
> > > block device.  See
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/block/fops.c?h=v6.1-rc3#n306
> > > 
> > > > Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1290
> > > 
> > > That "bug" seems to be based on a misunderstanding of the kernel source 
> > > code,
> > > and not any actual testing.
> > > 
> > > I just tested it, and the error code is EINVAL.
> > > 
> > 
> > I think I see what's happening.  The kernel code was broken just a few 
> > months
> > ago, in v6.0 by the commit "block: relax direct io memory alignment"
> > (https://git.kernel.org/linus/b1a000d3b8ec582d).  Now the block layer lets 
> > DIO
> > through when the user buffer is only aligned to the device's dma_alignment. 
> >  But
> > a dm-crypt device has a dma_alignment of 512 even when the crypto sector 
> > size
> > (and thus also the logical block size) is 4096.  So there is now a case 
> > where
> > misaligned DIO can reach dm-crypt, when that shouldn't be possible.
> > 
> > It also means that STATX_DIOALIGN will give the wrong value for
> > stx_dio_mem_align in the above case, 512 instead of 4096.  This is because
> > STATX_DIOALIGN for block devices relies on the dma_alignment.
> 
> In other words, STATX_DIOALIGN is unusable from the start because we
> don't know whether the information it returns is actually correct? :-/

That's a silly point of view.  STATX_DIOALIGN has only been in a released kernel
for a few weeks (v6.0), the bug is only in one edge case, and it will get fixed
quickly and backported to v6.0.y where users of 6.0 will get it.

Basically every Linux API has been broken at one point or the other, but things
get fixed.  Direct I/O itself has been totally broken on some filesystems, so by
this argument why are you even using direct I/O?  Actually, why are you even
using Linux?  Just use one of those operating systems with no bugs instead.

Also note that your alternative is probing, which is super broken because many
filesystems fall back to buffered I/O for misaligned direct I/O instead of
returning an error.

So please cooperate with getting things fixed properly instead of continuing to
use broken workarounds.

> > I'll raise this on the linux-block and dm-devel mailing lists.  It
> > would be nice if people reported kernel bugs instead of silently
> > working around them...
> 
> I wasn't involved in this specific one, but in case you're wondering why
> I wouldn't have reported it either...
> 
> On one hand, I agree with you because I want bugs in my code reported,
> too, but on the other hand, it has also happened to me before that
> you're treated as the stupid userland developer who doesn't know how
> the kernel works and who should better have kept his ideas of how it
> should work to himself - which is not exactly encouraging to report
> things when you can just deal with the way they are. I wouldn't have
> been able to tell whether in the mind of the respective maintainers,
> -EINVAL is required behaviour or whether that was just a totally
> unreasonable assumption on our side. Erring on the safe side, I'll give
> up an assumption that obviously doesn't match reality.

The error code is documented in the read(2) and write(2) man pages.  So clearly
either the kernel was wrong or the man page was wrong.  Either way it needed to
be reported.

- Eric



reply via email to

[Prev in Thread] Current Thread [Next in Thread]