Re: Segfault in hw/scsi/scsi-disk.c caused by null pointer

I have not made much progress on the segfault, but I now know that the number of detaches probably does not matter: the crash seems to occur in the attach path, not the detach path.

I adapted my change to be a bit more sane. I think it might make sense in general: something is clearly wrong, yet the code can somehow be reached, and in that case we probably just want to stop instead of pretending everything is okay.

So the following change also works for us, causing no segfaults:

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index efee6739f9..7273cd6c3d 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -775,6 +775,15 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf)
         return -1;
     }
+    /* Avoid null-pointers leading to segfaults below */
+    if (!s->version) {
+        return -1;
+    }
+    if (!s->vendor) {
+        return -1;
+    }
     /* PAGE CODE == 0 */
     buflen = req->cmd.xfer;
     if (buflen > SCSI_MAX_INQUIRY_LEN) {
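To illustrate what the guard does, here is a minimal standalone sketch. It is not QEMU code: `FakeDisk` and `fake_emulate_inquiry` are hypothetical names made up for this example, and the buffer layout only loosely follows the standard INQUIRY response (vendor in bytes 8..15, revision in bytes 32..35). The assumption is that the crash comes from string functions being handed a NULL `vendor` or `version` pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the disk state; not the real SCSIDiskState. */
typedef struct {
    const char *vendor;
    const char *version;
} FakeDisk;

/*
 * Fill a 36-byte INQUIRY-like buffer.
 * Returns 0 on success, -1 if the request cannot be served.
 */
static int fake_emulate_inquiry(const FakeDisk *s, uint8_t *outbuf, size_t len)
{
    size_t n;

    assert(outbuf != NULL);
    if (s == NULL || len < 36) {
        return -1;
    }

    /* The guard: strlen(NULL) below would be undefined behaviour. */
    if (!s->vendor || !s->version) {
        return -1;
    }

    memset(outbuf, 0, len);

    /* Vendor identification: bytes 8..15, space padded. */
    memset(&outbuf[8], ' ', 8);
    n = strlen(s->vendor);
    memcpy(&outbuf[8], s->vendor, n > 8 ? 8 : n);

    /* Revision level: bytes 32..35, at most 4 characters. */
    n = strlen(s->version);
    memcpy(&outbuf[32], s->version, n > 4 ? 4 : n);

    return 0;
}
```

Calling `fake_emulate_inquiry()` with a `FakeDisk` whose `vendor` is NULL returns -1 instead of crashing. Note that a check like this only catches pointers that are exactly NULL: if the object was freed and its memory reused, the fields could hold non-NULL garbage and the check would not help.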

I am still hoping for feedback from anyone who is familiar with hw/scsi. Hopefully this reaches someone who can shed some light on it.

Cheers and enjoy your weekend,

Denis

On 9 Aug 2022, at 18:51, Peter Maydell <peter.maydell@linaro.org> wrote:

On Tue, 9 Aug 2022 at 17:26, Denis Krienbühl <denis@href.ch> wrote:
On 9 Aug 2022, at 18:15, Peter Maydell <peter.maydell@linaro.org> wrote:
My wild guess is that there's a race condition somewhere such
that when you're doing this huge amount of detaches, very rarely
a disk is detached and deleted but this INQUIRY request is
incorrectly still sent to the disk (which being a freed object,
might be overwritten with other stuff). But that is purely a guess.

So.. should this be something I create a bug report for?

If you can repro this on current head-of-git, or at least on
the most recent release, then yes, file a bug report.

The best I can currently do is start to log what’s going on. Since
I’m not at all familiar with SCSI and this code-base, do you have
any tips on what I should log to maybe find out where this
race condition occurs?

Or if there’s any kind of documentation I could read to understand
better what is going on in the hw/scsi subsystem and how I should
navigate the code. After reading your explanation we’ll probably
look for other workarounds, but I would love to understand what’s
going on.

Paolo and Fam are the SCSI subsystem maintainers. They might know
whether this sounds like a bug that's already been fixed at some
point, or have other suggestions.

Context (ie link to the start of this thread on the list archive):
https://lists.gnu.org/archive/html/qemu-discuss/2022-08/msg00011.html

thanks
-- PMM

From:	Denis Krienbühl
Subject:	Re: Segfault in hw/scsi/scsi-disk.c caused by null pointer
Date:	Fri, 12 Aug 2022 16:41:40 +0200