qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] hw/nvme: reattach subsystem namespaces on hotplug


From: Klaus Jensen
Subject: Re: [PATCH] hw/nvme: reattach subsystem namespaces on hotplug
Date: Fri, 24 Sep 2021 09:29:09 +0200

On Sep 24 08:05, Hannes Reinecke wrote:
> On 9/23/21 10:09 PM, Klaus Jensen wrote:
> > On Sep  9 13:37, Hannes Reinecke wrote:
> > > On 9/9/21 12:47 PM, Klaus Jensen wrote:
> > > > On Sep  9 11:43, Hannes Reinecke wrote:
> > > > > With commit 5ffbaeed16 ("hw/nvme: fix controller hot unplugging")
> > > > > namespaces get moved from the controller to the subsystem if one
> > > > > is specified.
> > > > > That keeps the namespaces alive after a controller hot-unplug, but
> > > > > after a controller hotplug we have to reconnect the namespaces
> > > > > from the subsystem to the controller.
> > > > > 
> > > > > Fixes: 5ffbaeed16 ("hw/nvme: fix controller hot unplugging")
> > > > > Cc: Klaus Jensen <k.jensen@samsung.com>
> > > > > Signed-off-by: Hannes Reinecke <hare@suse.de>
> > > > > ---
> > > > >   hw/nvme/subsys.c | 8 +++++++-
> > > > >   1 file changed, 7 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
> > > > > index 93c35950d6..a9404f2b5e 100644
> > > > > --- a/hw/nvme/subsys.c
> > > > > +++ b/hw/nvme/subsys.c
> > > > > @@ -14,7 +14,7 @@
> > > > >   int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
> > > > >   {
> > > > >       NvmeSubsystem *subsys = n->subsys;
> > > > > -    int cntlid;
> > > > > +    int cntlid, nsid;
> > > > >       for (cntlid = 0; cntlid < ARRAY_SIZE(subsys->ctrls); cntlid++) {
> > > > >           if (!subsys->ctrls[cntlid]) {
> > > > > @@ -29,12 +29,18 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error 
> > > > > **errp)
> > > > >       subsys->ctrls[cntlid] = n;
> > > > > +    for (nsid = 0; nsid < ARRAY_SIZE(subsys->namespaces); nsid++) {
> > > > > +        if (subsys->namespaces[nsid]) {
> > > > > +            nvme_attach_ns(n, subsys->namespaces[nsid]);
> > > > > +        }
> > > > 
> > > > Thanks Hannes! I like it, keeping things simple.
> > > > 
> > > > But we should only attach namespaces that have the shared property or
> > > > have ns->attached == 0. Non-shared namespaces may already be attached to
> > > > another controller in the subsystem.
> > > > 
> > > 
> > > Well ... I tried to avoid that subject, but as you brought it up:
> > > There is a slightly tricky issue in fabrics, in that the 'controller' is
> > > independent from the 'connection'.
> > > The 'shared' bit in the CMIC setting indicates that the subsystem may
> > > have more than one _controller_. It doesn't talk about how many
> > > _connections_ a controller may support; that then is the realm of
> > > dynamic or static controllers, which we don't talk about :-).
> > > Sufficient to say, linux only implements the dynamic controller model,
> > > so every connection will be to a different controller.
> > > But you have been warned :-)
> > > 
> > > However, the 'CMIC' setting is independent on the 'NMIC' setting (ie the
> > > 'shared' bit in the namespace).
> > > Which leads to the interesting question on how exactly should one handle
> > > non-shared namespaces in subsystems for which there are multiple
> > > controllers. Especially as the NSID space is per subsystem, so each
> > > controller will be able to figure out if there are blanked-out namespaces.
> > > So to make this a sane setup I would propose to set the 'shared' option
> > > automatically whenever the controller has the 'subsys' option set.
> > > And actually, I would ditch the 'shared' option completely, and make it
> > > dependent on the 'subsys' setting for the controller.
> > > Much like we treat the 'CMIC' setting nowadays.
> > > That avoids lots of confusions, and also make the implementation _way_
> > > easier.
> > > 
> > 
> > I see your point. Unfortunately, there is no easy way to ditch shared=
> > now. But I think it'd be good enough to attach to shared automatically,
> > but not to "shared=off".
> > 
> > I've already ditched the shared parameter on my RFC refactor series and
> > always having the namespaces shared.
> > 
> 
> Okay.
> 
> > > > However...
> > > > 
> > > > The spec says that "The attach and detach operations are persistent
> > > > across all reset events.". This means that we should track those events
> > > > in the subsystem and only reattach namespaces that were attached at the
> > > > time of the "reset" event. Currently, we don't have anything mapping
> > > > that state. But the device already has to take some liberties with
> > > > regard to stuff that is considered persistent by the spec (SMART log
> > > > etc.) since we do not have any way to store stuff persistently across
> > > > qemu invocations, so I think the above is an acceptable compromise.
> > > > 
> > > Careful. 'attach' and 'detach' are MI (management interface) operations.
> > > If and how many namespaces are visible to any given controllers is
> > > actually independent on that; and, in fact, controllers might not even
> > > implement 'attach' or 'detach'.
> > > But I do agree that we don't handle the 'reset' state properly.
> > > 
> > 
> > The Namespace Attachment command has nothing to do with MI? And the QEMU
> > controller does support attaching and detaching namespaces.
> > 
> 
> No, you got me wrong. Whether a not a namespace is attached is independent
> of any 'Namespace attachment' command.
> Case in point: the subsystem will be starting up with namespace already
> attached, even though no-one issued any namespace attachment command.
> 

Right, understood.

> > > > A potential (as good as it gets) fix would be to keep a map/list of
> > > > "persistently" attached controllers on the namespaces and re-attach
> > > > according to that when we see that controller joining the subsystem
> > > > again. CNTLID would be the obvious choice for the key here, but problem
> > > > is that we cant really use it since we assign it sequentially from the
> > > > subsystem, which now looks like a pretty bad choice. CNTLID should have
> > > > been a required property of the nvme device when subsystems are
> > > > involved. Maybe we can fix up the CNTLID assignment to take the serial
> > > > into account (we know that is defined and *should* be unique) and not
> > > > reuse CNTLIDs. This limits the subsystem to NVME_MAX_CONTROLLERS unique
> > > > controllers, but I think that is fair enough.
> > > > 
> > > > Sigh. Need to think this through.
> > > > 
> > > Well, actually there is an easy way out. I do agree that we need to make
> > > the 'cntlid' a property of the controller. And once it's set we can
> > > track it properly, eg by having per-cntlid nsid lists in the subsystem.
> > > But if it's not set we can claim that we'll be allocating a new
> > > controller across reboots (which is actually what we're doing), making
> > > us spec compliant again :-)
> > > 
> > 
> > That is a decent solution, but we still can't track it across reboots. I
> > think we should just with your patch (but for now, only automatically
> > attach shared namespaces).
> > 
> But that's the point.
> We don't _need_ to track it.
> We only need to track it when it's specified via a (yet-to-be-added) cntlid
> parameter, but then it's trivial.
> If it's not specified we will allocate a new one and don't need to track
> stuff. That's even permitted by the NVMe spec; v2.0 section 3.1.1 has:
> 
> While allocation of static controllers to hosts are expected to be durable
> (so that hosts can expect to form associations to the same controllers
> repeatedly (e.g., after each host reboot)), the NVM subsystem may remove the
> host allocation of a controller that is not in use at any time for
> implementation specific reasons
> (e.g., controller resource reclamation, subsystem reconfiguration).
> 

Awesome. Thanks for pointing me towards that wording.

> > Would that be acceptable to you? It would require your to add shared=on
> > on your single-namespace test set up since unfortunately, shared=on is
> > not the default. However, we can consider changing that default in 6.2.
> > 
> Yes, for an interim solution it's okay.
> The important bit is to get hotplug up and running again for NVMe; we do
> have some testcases for patches in upstream linux which I would like to
> forward to our QA Team.
> 

Cool. I've sent out a series with your patch and an additional patch
that changes the default for 'shared'.

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]