qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] hw/nvme: reattach subsystem namespaces on hotplug


From: Klaus Jensen
Subject: Re: [PATCH] hw/nvme: reattach subsystem namespaces on hotplug
Date: Thu, 23 Sep 2021 22:09:36 +0200

On Sep  9 13:37, Hannes Reinecke wrote:
> On 9/9/21 12:47 PM, Klaus Jensen wrote:
> > On Sep  9 11:43, Hannes Reinecke wrote:
> >> With commit 5ffbaeed16 ("hw/nvme: fix controller hot unplugging")
> >> namespaces get moved from the controller to the subsystem if one
> >> is specified.
> >> That keeps the namespaces alive after a controller hot-unplug, but
> >> after a controller hotplug we have to reconnect the namespaces
> >> from the subsystem to the controller.
> >>
> >> Fixes: 5ffbaeed16 ("hw/nvme: fix controller hot unplugging")
> >> Cc: Klaus Jensen <k.jensen@samsung.com>
> >> Signed-off-by: Hannes Reinecke <hare@suse.de>
> >> ---
> >>  hw/nvme/subsys.c | 8 +++++++-
> >>  1 file changed, 7 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/nvme/subsys.c b/hw/nvme/subsys.c
> >> index 93c35950d6..a9404f2b5e 100644
> >> --- a/hw/nvme/subsys.c
> >> +++ b/hw/nvme/subsys.c
> >> @@ -14,7 +14,7 @@
> >>  int nvme_subsys_register_ctrl(NvmeCtrl *n, Error **errp)
> >>  {
> >>      NvmeSubsystem *subsys = n->subsys;
> >> -    int cntlid;
> >> +    int cntlid, nsid;
> >>  
> >>      for (cntlid = 0; cntlid < ARRAY_SIZE(subsys->ctrls); cntlid++) {
> >>          if (!subsys->ctrls[cntlid]) {
> >> @@ -29,12 +29,18 @@ int nvme_subsys_register_ctrl(NvmeCtrl *n, Error 
> >> **errp)
> >>  
> >>      subsys->ctrls[cntlid] = n;
> >>  
> >> +    for (nsid = 0; nsid < ARRAY_SIZE(subsys->namespaces); nsid++) {
> >> +        if (subsys->namespaces[nsid]) {
> >> +            nvme_attach_ns(n, subsys->namespaces[nsid]);
> >> +        }
> > 
> > Thanks Hannes! I like it, keeping things simple.
> > 
> > But we should only attach namespaces that have the shared property or
> > have ns->attached == 0. Non-shared namespaces may already be attached to
> > another controller in the subsystem.
> > 
> 
> Well ... I tried to avoid that subject, but as you brought it up:
> There is a slightly tricky issue in fabrics, in that the 'controller' is
> independent from the 'connection'.
> The 'shared' bit in the CMIC setting indicates that the subsystem may
> have more than one _controller_. It doesn't talk about how many
> _connections_ a controller may support; that then is the realm of
> dynamic or static controllers, which we don't talk about :-).
> Sufficient to say, linux only implements the dynamic controller model,
> so every connection will be to a different controller.
> But you have been warned :-)
> 
> However, the 'CMIC' setting is independent on the 'NMIC' setting (ie the
> 'shared' bit in the namespace).
> Which leads to the interesting question on how exactly should one handle
> non-shared namespaces in subsystems for which there are multiple
> controllers. Especially as the NSID space is per subsystem, so each
> controller will be able to figure out if there are blanked-out namespaces.
> So to make this a sane setup I would propose to set the 'shared' option
> automatically whenever the controller has the 'subsys' option set.
> And actually, I would ditch the 'shared' option completely, and make it
> dependent on the 'subsys' setting for the controller.
> Much like we treat the 'CMIC' setting nowadays.
> That avoids lots of confusions, and also make the implementation _way_
> easier.
> 

I see your point. Unfortunately, there is no easy way to ditch shared=
now. But I think it'd be good enough to attach to shared automatically,
but not to "shared=off".

I've already ditched the shared parameter on my RFC refactor series and
always having the namespaces shared.

> > However...
> > 
> > The spec says that "The attach and detach operations are persistent
> > across all reset events.". This means that we should track those events
> > in the subsystem and only reattach namespaces that were attached at the
> > time of the "reset" event. Currently, we don't have anything mapping
> > that state. But the device already has to take some liberties with
> > regard to stuff that is considered persistent by the spec (SMART log
> > etc.) since we do not have any way to store stuff persistently across
> > qemu invocations, so I think the above is an acceptable compromise.
> > 
> Careful. 'attach' and 'detach' are MI (management interface) operations.
> If and how many namespaces are visible to any given controllers is
> actually independent on that; and, in fact, controllers might not even
> implement 'attach' or 'detach'.
> But I do agree that we don't handle the 'reset' state properly.
> 

The Namespace Attachment command has nothing to do with MI? And the QEMU
controller does support attaching and detaching namespaces.

> > A potential (as good as it gets) fix would be to keep a map/list of
> > "persistently" attached controllers on the namespaces and re-attach
> > according to that when we see that controller joining the subsystem
> > again. CNTLID would be the obvious choice for the key here, but problem
> > is that we cant really use it since we assign it sequentially from the
> > subsystem, which now looks like a pretty bad choice. CNTLID should have
> > been a required property of the nvme device when subsystems are
> > involved. Maybe we can fix up the CNTLID assignment to take the serial
> > into account (we know that is defined and *should* be unique) and not
> > reuse CNTLIDs. This limits the subsystem to NVME_MAX_CONTROLLERS unique
> > controllers, but I think that is fair enough.
> > 
> > Sigh. Need to think this through.
> > 
> Well, actually there is an easy way out. I do agree that we need to make
> the 'cntlid' a property of the controller. And once it's set we can
> track it properly, eg by having per-cntlid nsid lists in the subsystem.
> But if it's not set we can claim that we'll be allocating a new
> controller across reboots (which is actually what we're doing), making
> us spec compliant again :-)
> 

That is a decent solution, but we still can't track it across reboots. I
think we should just with your patch (but for now, only automatically
attach shared namespaces).

Would that be acceptable to you? It would require your to add shared=on
on your single-namespace test set up since unfortunately, shared=on is
not the default. However, we can consider changing that default in 6.2.

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]