qemu-devel

Re: [PATCH 00/33] migration: capture error reports into Error object


From: Daniel P . Berrangé
Subject: Re: [PATCH 00/33] migration: capture error reports into Error object
Date: Thu, 4 Feb 2021 19:09:27 +0000
User-agent: Mutt/1.14.6 (2020-07-11)

On Thu, Feb 04, 2021 at 06:22:49PM +0000, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > Due to its long-term heritage, most of the migration code just invokes
> > 'error_report' when problems hit. This was fine for HMP, since the
> > messages get redirected from stderr into the HMP console. It is not
> > OK for QMP, because the errors will not be fed back to the QMP client.
> > 
> > This wasn't a terrible real-world problem with QMP so far because
> > live migration happens in the background, so at least on the target side
> > there is no QMP command that needs to capture errors from the incoming
> > migration. It is a problem on the source side, but it doesn't hit
> > frequently as the source side has fewer failure scenarios. Nonetheless,
> > on both sides it would be desirable if 'query-migrate' could report
> > errors correctly. With the introduction of the load-snapshot QMP
> > commands, the need for error reporting becomes more pressing.
> > 
> > Wiring up good error reporting is a large and difficult job, which
> > this series does NOT complete. The focus here has been on converting
> > all methods in savevm.c which have an 'int' return value capable of
> > reporting errors. This covers most of the infrastructure for controlling
> > the migration state serialization / protocol.
> > 
> > The remaining part that lacks error reporting is the callbacks in
> > the VMStateDescription struct, which can return failure codes but have
> > no "Error **errp" parameter. Thinking about how this might be dealt with
> > in future, a big-bang conversion is likely non-viable. We'll probably
> > want to introduce a duplicate set of callbacks with the "Error **errp"
> > parameter and convert impls in batches, eventually removing the
> > original callbacks. I don't intend to do that myself in the immediate
> > future.
> > 
> > IOW, this patch series probably solves 50% of the problem, but we
> > still do need the rest to get ideal error reporting.
> > 
> > In doing this savevm conversion I noticed a bunch of places which
> > see and then ignore errors. I only fixed the one or two of them that
> > were clearly dubious. In other places in savevm.c, where it seemed
> > probably OK to ignore errors, I've left the code using error_report()
> > on the basis that those are really warnings. Perhaps they could
> > be changed to warn_report() instead.
> > 
> > There are a lot of patches here, but I felt it was easier to review
> > for correctness if I converted one function at a time. The series
> > does not necessarily have to be reviewed/applied in one go.
> 
> After this series, what do my errors look like, and where do they end
> up?
> Do I get my nice backtrace showing that device failed, then that was
> part of that one...

It hasn't modified any of the VMStateDescription callbacks, so any
of the per-device logic that was printing errors will still be using
error_report to the console as before.
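The cover letter's idea of adding a parallel set of Error-aware callbacks and converting implementations in batches could look roughly like the sketch below. It is purely illustrative: `Error` and `error_setg()` are simplified stand-ins for QEMU's Error API so the example compiles on its own, and `DemoStateDesc`, `post_load_errp` and `call_post_load()` are hypothetical names, not existing QEMU code. The dispatcher prefers the new hook and wraps a legacy bare return code in a generic message for devices that have not been converted yet.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for QEMU's Error object (not the real implementation). */
typedef struct Error {
    char msg[256];
} Error;

static void error_setg(Error **errp, const char *fmt, ...)
{
    va_list ap;

    if (!errp) {
        return;                 /* caller ignores error details */
    }
    assert(*errp == NULL);      /* never overwrite an existing error */
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();                /* give up on out-of-memory */
    }
    va_start(ap, fmt);
    vsnprintf((*errp)->msg, sizeof((*errp)->msg), fmt, ap);
    va_end(ap);
}

/* Hypothetical per-device description carrying both callback flavours. */
typedef struct DemoStateDesc {
    const char *name;
    int (*post_load)(void *opaque, int version_id);            /* legacy */
    int (*post_load_errp)(void *opaque, int version_id,
                          Error **errp);                       /* new */
} DemoStateDesc;

/* Prefer the Error-aware hook; fall back to the legacy one and wrap
 * its bare return code so callers always get an Error on failure. */
static int call_post_load(const DemoStateDesc *desc, void *opaque,
                          int version_id, Error **errp)
{
    int ret;

    if (desc->post_load_errp) {
        return desc->post_load_errp(opaque, version_id, errp);
    }
    if (!desc->post_load) {
        return 0;
    }
    ret = desc->post_load(opaque, version_id);
    if (ret < 0) {
        error_setg(errp, "post_load failed for '%s': %d", desc->name, ret);
    }
    return ret;
}

/* An unconverted device: can only return an errno-style code. */
static int legacy_post_load(void *opaque, int version_id)
{
    (void)opaque;
    return version_id > 1 ? -EINVAL : 0;
}

/* A converted device: can say exactly what went wrong. */
static int converted_post_load(void *opaque, int version_id, Error **errp)
{
    (void)opaque;
    if (version_id > 1) {
        error_setg(errp, "unsupported version %d", version_id);
        return -EINVAL;
    }
    return 0;
}

static const DemoStateDesc legacy_dev = {
    .name = "legacy-dev", .post_load = legacy_post_load,
};
static const DemoStateDesc converted_dev = {
    .name = "converted-dev", .post_load_errp = converted_post_load,
};
```

Once every device provides the new hook, the legacy branch and the `post_load` field in such a scheme could simply be deleted, which corresponds to the "eventually removing the original callbacks" step.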

The errors that have changed (at this stage) are only the higher-level
ones in the generic part of the code. Where those errors mentioned a
device name/ID, they still do.

In some of the parts I've modified, multiple error_reports will have
been collapsed into one error_setg(), but the ones that are eliminated
are high-level generic messages with no useful info, so I don't think
losing those is a problem per se.
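As a rough illustration of what collapsing several error_reports into one error_setg() means, the sketch below contrasts the two styles. `Error` and `error_setg()` are simplified stand-ins for QEMU's Error API so that it compiles on its own, and `find_section()`, `load_vm_old()` and `load_vm_new()` are hypothetical helpers, not functions from savevm.c. The old style prints the detailed message and then a generic "Error -22" line; the new style hands the single detailed message back to the caller while keeping the negative errno return value.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for QEMU's Error object (not the real implementation). */
typedef struct Error {
    char msg[256];
} Error;

static void error_setg(Error **errp, const char *fmt, ...)
{
    va_list ap;

    if (!errp) {
        return;                 /* caller ignores error details */
    }
    assert(*errp == NULL);      /* never overwrite an existing error */
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();                /* give up on out-of-memory */
    }
    va_start(ap, fmt);
    vsnprintf((*errp)->msg, sizeof((*errp)->msg), fmt, ap);
    va_end(ap);
}

/* Hypothetical lookup: only section "known-dev", instance 0 exists. */
static int find_section(const char *idstr, unsigned instance)
{
    return (strcmp(idstr, "known-dev") == 0 && instance == 0) ? 0 : -1;
}

/* Old style: two messages on stderr, neither reaches a QMP client. */
static int load_vm_old(const char *idstr, unsigned instance)
{
    int ret = 0;

    if (find_section(idstr, instance) < 0) {
        fprintf(stderr, "Unknown savevm section or instance '%s' %u\n",
                idstr, instance);
        ret = -EINVAL;
    }
    if (ret < 0) {
        fprintf(stderr, "Error %d while loading VM state\n", ret);
    }
    return ret;
}

/* New style: one detailed message returned through errp. */
static int load_vm_new(const char *idstr, unsigned instance, Error **errp)
{
    if (find_section(idstr, instance) < 0) {
        error_setg(errp, "Unknown savevm section or instance '%s' %u",
                   idstr, instance);
        return -EINVAL;
    }
    return 0;
}
```

Keeping the int return alongside `errp` lets unconverted callers keep checking `ret < 0` while converted callers use the message.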

The example that I tested was the case where we load a snapshot
under a different config from the one we saved it with. This is the
scenario that gave the non-deterministic ordering in the iotest you
disabled from my previous series.

In that case, we changed from:

  qemu-system-x86_64: Unknown savevm section or instance '0000:00:02.0/virtio-rng' 0. Make sure that your current VM setup matches your saved VM setup, including any hotplugged devices
  {"return": [{"current-progress": 1, "status": "concluded", "total-progress": 1, "type": "snapshot-load", "id": "load-err-stderr", "error": "Error -22 while loading VM state"}]}

to:

  {"return": [{"current-progress": 1, "status": "concluded", "total-progress": 1, "type": "snapshot-load", "id": "load-err-stderr", "error": "Unknown savevm section or instance '0000:00:02.0/virtio-rng' 0. Make sure that your current VM setup matches your saved VM setup, including any hotplugged devices"}]}

From an HMP loadvm POV, this means that instead of seeing

  (hmp)  loadvm foo
  Unknown savevm section or instance '0000:00:02.0/virtio-rng' 0. Make sure that your current VM setup matches your saved VM setup, including any hotplugged devices
  Error -22 while loading VM state

you will only see the detailed error message:

  (hmp)  loadvm foo
  Unknown savevm section or instance '0000:00:02.0/virtio-rng' 0. Make sure that your current VM setup matches your saved VM setup, including any hotplugged devices

In this case I think losing the "Error -22 while loading VM state"
message is fine, as it didn't add value IMHO.
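The HMP output shrinks to one line because, once the error travels as an object, the caller decides how to present it. The sketch below shows the same kind of Error consumed two ways: printed once for an HMP-style command, or stored as the "error" string of a QMP-style job. `Error`, `error_setg()`, `DemoJob`, `hmp_report()` and `job_complete_with_error()` are hypothetical simplified stand-ins, not QEMU's actual HMP or job code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for QEMU's Error object (not the real implementation). */
typedef struct Error {
    char msg[256];
} Error;

static void error_setg(Error **errp, const char *fmt, ...)
{
    va_list ap;

    if (!errp) {
        return;                 /* caller ignores error details */
    }
    assert(*errp == NULL);      /* never overwrite an existing error */
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();                /* give up on out-of-memory */
    }
    va_start(ap, fmt);
    vsnprintf((*errp)->msg, sizeof((*errp)->msg), fmt, ap);
    va_end(ap);
}

/* Hypothetical job record; "error" mirrors the member of the same name
 * in the snapshot-load job output quoted in this message. */
typedef struct DemoJob {
    const char *id;
    char error[256];            /* empty string means "no error" */
} DemoJob;

/* HMP-style consumer: print the one detailed message, then free it.
 * No extra generic "Error -22 ..." line is emitted. */
static void hmp_report(FILE *out, Error *err)
{
    fprintf(out, "%s\n", err->msg);
    free(err);
}

/* QMP-style consumer: keep the message so a later query can return it. */
static void job_complete_with_error(DemoJob *job, Error *err)
{
    snprintf(job->error, sizeof(job->error), "%s", err->msg);
    free(err);
}
```

Both consumers take ownership of the Error and free it, so the producer never has to know which front end it is serving.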


If we get around to converting the VMStateDescription callbacks to
take an error object, then I think we may need to stack the error
message from the callback with the higher-level message.
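QEMU's Error API provides `error_prepend()` for this kind of stacking: the low-level code sets the specific message and each caller adds its context in front. The sketch below mimics that pattern with minimal self-contained reimplementations (`Error`, `error_setg()` and `error_prepend()` here are simplified stand-ins, not QEMU's code), and `check_features()` / `load_device_section()` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for QEMU's Error object (not the real implementation). */
typedef struct Error {
    char msg[256];
} Error;

static void error_setg(Error **errp, const char *fmt, ...)
{
    va_list ap;

    if (!errp) {
        return;                 /* caller ignores error details */
    }
    assert(*errp == NULL);      /* never overwrite an existing error */
    *errp = calloc(1, sizeof(**errp));
    if (!*errp) {
        abort();                /* give up on out-of-memory */
    }
    va_start(ap, fmt);
    vsnprintf((*errp)->msg, sizeof((*errp)->msg), fmt, ap);
    va_end(ap);
}

/* Stand-in for error_prepend(): put a formatted prefix in front of
 * an existing message; does nothing if no error was set. */
static void error_prepend(Error *const *errp, const char *fmt, ...)
{
    va_list ap;
    char prefix[256];
    char old[256];

    if (!errp || !*errp) {
        return;
    }
    va_start(ap, fmt);
    vsnprintf(prefix, sizeof(prefix), fmt, ap);
    va_end(ap);
    snprintf(old, sizeof(old), "%s", (*errp)->msg);
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s%s", prefix, old);
}

/* Hypothetical device-level check that knows the precise problem. */
static int check_features(unsigned features, Error **errp)
{
    if (features & ~0x3u) {
        error_setg(errp, "unsupported feature bits 0x%x", features & ~0x3u);
        return -EINVAL;
    }
    return 0;
}

/* Hypothetical generic loader that adds which section was involved. */
static int load_device_section(const char *idstr, unsigned features,
                               Error **errp)
{
    int ret = check_features(features, errp);

    if (ret < 0) {
        error_prepend(errp, "Failed to load section '%s': ", idstr);
    }
    return ret;
}
```

With this shape, a failure reads like "Failed to load section 'virtio-rng': unsupported feature bits 0x4", so the device detail and the generic context survive together. Real QEMU code doing this may also need `ERRP_GUARD()` so the prefix is applied when the caller passed `&error_fatal`; the stand-in ignores that case.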

Do you have any familiar/good examples of error message stacking I
can look at? I should be able to say whether they would be impacted
by this series or not. If they are, then hopefully I only threw away
the fairly useless high-level messages, like the "Error -22" message
above.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



