qemu-block

Re: [PATCH 1/5] vhost-user-blk: Don't reconnect during initialisation


From: Kevin Wolf
Subject: Re: [PATCH 1/5] vhost-user-blk: Don't reconnect during initialisation
Date: Wed, 28 Apr 2021 19:31:13 +0200

On 28.04.2021 at 18:52, Raphael Norwitz wrote:
> Given what you've shown with the use-after-free, I agree the changes
> need to be reverted. I'm a little uneasy about removing the reconnect
> logic from the device realization completely though.
> 
> On Thu, Apr 22, 2021 at 07:02:17PM +0200, Kevin Wolf wrote:
> > This is a partial revert of commits 77542d43149 and bc79c87bcde.
> > 
> > Usually, an error during initialisation means that the configuration was
> > wrong. Reconnecting won't make the error go away, but will just turn the
> > error condition into an endless loop. Avoid this and return errors
> > again.
> > 
> 
> Is that necessarily true? As I understand it, the main use cases for
> device reconnect are to allow a device backend to be restarted after a
> failure, or to allow the backend to be upgraded without restarting the
> guest. I agree that misconfiguration could be a common cause of a device
> backend crashing at realize time, but couldn't there be others? Maybe
> transient memory pressure?
> 
> Especially in the case where one process is connecting to many different
> vhost-user-blk instances, I could imagine power-ons and incoming
> migrations racing with backend restarts quite frequently. Should
> these cases cause failures?
> 
> We can still hit the infinite looping case you describe post-realize.
> Why should we treat pre-realize differently?

I think there is one main difference between realize() and later
operation, which is that we can actually deliver an error to the user
during realize(). When we're just starting QEMU and processing the CLI
arguments, failure is very obvious, and in the context of QMP
device-add, the client is actively waiting for a result, too.

Later on, there is no good way to communicate an error (writes to stderr
just end up in some log file at best, and QAPI events can be missed), and
even if there were, we would still have to do something with the guest
until the user/management application actually reacts to the error. The
latter is not a problem during realize() because the device hasn't even
been plugged in yet.

So I think there are good reasons why it could make sense to distinguish
initialisation and operation.
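
To illustrate what I mean (just a rough sketch of the pattern, not code
from this patch; the exact call sites differ): during realize(), a failure
can be propagated straight back through errp, so -device or device-add
fails cleanly before the guest ever sees the device:

    /* Rough sketch only, not the actual realize() code from this patch */
    static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
    {
        VirtIODevice *vdev = VIRTIO_DEVICE(dev);
        VHostUserBlk *s = VHOST_USER_BLK(vdev);
        Error *err = NULL;

        /* ... usual setup of virtqueues etc. ... */

        if (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
            /* The caller of realize() sees the failure directly and the
             * device is never plugged in, so there is no guest-visible
             * state to clean up. */
            error_propagate(errp, err);
            return;
        }

        /* ... */
    }

Once the device is operational, the best we could do is an error_report()
that ends up in a log, or a QAPI event that the client may never look at.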

> > Additionally, calling vhost_user_blk_disconnect() from the chardev event
> > handler could result in use-after-free because none of the
> > initialisation code expects that the device could just go away in the
> > middle. So removing the call fixes crashes in several places.
> > 
> > For example, using a num-queues setting that is incompatible with the
> > backend would result in a crash like this (dereferencing dev->opaque,
> > which is already NULL):
> > 
> >  #0  0x0000555555d0a4bd in vhost_user_read_cb (source=0x5555568f4690, condition=(G_IO_IN | G_IO_HUP), opaque=0x7fffffffcbf0) at ../hw/virtio/vhost-user.c:313
> >  #1  0x0000555555d950d3 in qio_channel_fd_source_dispatch (source=0x555557c3f750, callback=0x555555d0a478 <vhost_user_read_cb>, user_data=0x7fffffffcbf0) at ../io/channel-watch.c:84
> >  #2  0x00007ffff7b32a9f in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
> >  #3  0x00007ffff7b84a98 in g_main_context_iterate.constprop () at /lib64/libglib-2.0.so.0
> >  #4  0x00007ffff7b32163 in g_main_loop_run () at /lib64/libglib-2.0.so.0
> >  #5  0x0000555555d0a724 in vhost_user_read (dev=0x555557bc62f8, msg=0x7fffffffcc50) at ../hw/virtio/vhost-user.c:402
> >  #6  0x0000555555d0ee6b in vhost_user_get_config (dev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost-user.c:2133
> >  #7  0x0000555555d56d46 in vhost_dev_get_config (hdev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost.c:1566
> >  #8  0x0000555555cdd150 in vhost_user_blk_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcf90) at ../hw/block/vhost-user-blk.c:510
> >  #9  0x0000555555d08f6d in virtio_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcff0) at ../hw/virtio/virtio.c:3660
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  hw/block/vhost-user-blk.c | 54 ++++++++++-----------------------------
> >  1 file changed, 13 insertions(+), 41 deletions(-)
> > 
> > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> > index f5e9682703..e824b0a759 100644
> > --- a/hw/block/vhost-user-blk.c
> > +++ b/hw/block/vhost-user-blk.c
> > @@ -50,6 +50,8 @@ static const int user_feature_bits[] = {
> >      VHOST_INVALID_FEATURE_BIT
> >  };
> >  
> > +static void vhost_user_blk_event(void *opaque, QEMUChrEvent event);
> > +
> >  static void vhost_user_blk_update_config(VirtIODevice *vdev, uint8_t *config)
> >  {
> >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > @@ -362,19 +364,6 @@ static void vhost_user_blk_disconnect(DeviceState *dev)
> >      vhost_dev_cleanup(&s->dev);
> >  }
> >  
> > -static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
> > -                                 bool realized);
> > -
> > -static void vhost_user_blk_event_realize(void *opaque, QEMUChrEvent event)
> > -{
> > -    vhost_user_blk_event(opaque, event, false);
> > -}
> > -
> > -static void vhost_user_blk_event_oper(void *opaque, QEMUChrEvent event)
> > -{
> > -    vhost_user_blk_event(opaque, event, true);
> > -}
> > -
> >  static void vhost_user_blk_chr_closed_bh(void *opaque)
> >  {
> >      DeviceState *dev = opaque;
> > @@ -382,12 +371,11 @@ static void vhost_user_blk_chr_closed_bh(void *opaque)
> >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> >  
> >      vhost_user_blk_disconnect(dev);
> > -    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL,
> > -            vhost_user_blk_event_oper, NULL, opaque, NULL, true);
> > +    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
> > +                             NULL, opaque, NULL, true);
> >  }
> >  
> > -static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
> > -                                 bool realized)
> > +static void vhost_user_blk_event(void *opaque, QEMUChrEvent event)
> >  {
> >      DeviceState *dev = opaque;
> >      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > @@ -401,17 +389,7 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
> >          }
> >          break;
> >      case CHR_EVENT_CLOSED:
> > -        /*
> > -         * Closing the connection should happen differently on device
> > -         * initialization and operation stages.
> > -         * On initalization, we want to re-start vhost_dev initialization
> > -         * from the very beginning right away when the connection is closed,
> > -         * so we clean up vhost_dev on each connection closing.
> > -         * On operation, we want to postpone vhost_dev cleanup to let the
> > -         * other code perform its own cleanup sequence using vhost_dev data
> > -         * (e.g. vhost_dev_set_log).
> > -         */
> > -        if (realized && !runstate_check(RUN_STATE_SHUTDOWN)) {
> > +        if (!runstate_check(RUN_STATE_SHUTDOWN)) {
> >              /*
> >               * A close event may happen during a read/write, but vhost
> >               * code assumes the vhost_dev remains setup, so delay the
> > @@ -431,8 +409,6 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
> >               * knowing its type (in this case vhost-user).
> >               */
> >              s->dev.started = false;
> > -        } else {
> > -            vhost_user_blk_disconnect(dev);
> >          }
> >          break;
> >      case CHR_EVENT_BREAK:
> > @@ -490,31 +466,27 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
> >      s->vhost_vqs = g_new0(struct vhost_virtqueue, s->num_queues);
> >      s->connected = false;
> >  
> > -    qemu_chr_fe_set_handlers(&s->chardev,  NULL, NULL,
> > -                             vhost_user_blk_event_realize, NULL, (void *)dev,
> > -                             NULL, true);
> > -
> > -reconnect:
> >      if (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
> >          error_report_err(err);
> >          goto virtio_err;
> >      }
> >  
> > -    /* check whether vhost_user_blk_connect() failed or not */
> > -    if (!s->connected) {
> > -        goto reconnect;
> > +    if (vhost_user_blk_connect(dev) < 0) {
> > +        qemu_chr_fe_disconnect(&s->chardev);
> > +        goto virtio_err;
> >      }
> > +    assert(s->connected);
> 
> Maybe a good compromise here would be to retry some small number of
> times (or even just once) so that cases like daemon upgrades and
> recoverable crashes racing with power-ons and incoming migrations
> don't result in failures?
> 
> As a more general solution, we could have a user-defined parameter to
> specify the number of repeated connection failures to allow, both pre-
> and post-realize, before bringing QEMU down. Thoughts?

Retrying once or even a small number of times sounds reasonable enough.
At first I thought it wouldn't help because restarting the daemon might
take some time, but with qemu_chr_fe_wait_connected() we already wait
until we can successfully connect in each iteration, and we would only
retry for errors that happen afterwards.

But I think what we really want to do before retrying is distinguishing
errors that are actually related to the connection itself from errors
that relate to the content of the communication (i.e. invalid requests
or configuration).
In fact, I think the error I hit wasn't even produced on the remote
side, it came from QEMU itself. Making vhost_user_blk_connect() return
-EAGAIN in the right cases and reconnecting only there sounds much
better than just blindly retrying.

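Roughly the shape I have in mind (a sketch for discussion only; the
-EAGAIN classification inside vhost_user_blk_connect() and the retry
limit don't exist yet and are hypothetical):

    /* Sketch, in vhost_user_blk_device_realize();
     * 'retries' and the -EAGAIN return are hypothetical. */
    int retries = 3;    /* arbitrary small limit, just for illustration */
    int ret;

    do {
        if (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
            error_report_err(err);
            goto virtio_err;
        }
        ret = vhost_user_blk_connect(dev);
        if (ret < 0) {
            /* drop the failed connection before a possible retry */
            qemu_chr_fe_disconnect(&s->chardev);
        }
    } while (ret == -EAGAIN && retries-- > 0);

    if (ret < 0) {
        goto virtio_err;
    }
    assert(s->connected);

That way we only keep looping for failures that are plausibly transient
and still fail realize() for everything else.
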
Of course, adjusting error reporting so that we can distinguish these
cases will probably touch many more places than would be appropriate for
this patch, so I'd suggest that we indeed just revert the reconnection
during realize() in this series, but then try to add some more selective
retry logic back on top of it (without using the event handler, which
only made everything more complicated in a function that waits
synchronously anyway).

Kevin



