qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 00/33] migration: capture error reports into Error object


From: Daniel P . Berrangé
Subject: Re: [PATCH 00/33] migration: capture error reports into Error object
Date: Tue, 16 Feb 2021 09:30:52 +0000
User-agent: Mutt/2.0.5 (2021-01-21)

On Mon, Feb 15, 2021 at 07:01:28PM +0000, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Mon, Feb 15, 2021 at 06:38:05PM +0000, Dr. David Alan Gilbert wrote:
> > > One thing to check, and I *think* you're OK, but we have one place where
> > > we actually check the error number:
> > > 
> > > migration.c:
> > > 3414 static MigThrError migration_detect_error(MigrationState *s)
> > > ...
> > > 3426     /* Try to detect any file errors */
> > > 3427     ret = qemu_file_get_error_obj(s->to_dst_file, &local_error);
> > > 3428     if (!ret) {
> > > 3429         /* Everything is fine */
> > > 3430         assert(!local_error);
> > > 3431         return MIG_THR_ERR_NONE;
> > > 3432     }
> > > 3433 
> > > 3434     if (local_error) {
> > > 3435         migrate_set_error(s, local_error);
> > > 3436         error_free(local_error);
> > > 3437     }
> > > 3438 
> > > 3439     if (state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) {
> > > 3440         /*
> > > 3441          * For postcopy, we allow the network to be down for a
> > > 3442          * while. After that, it can be continued by a
> > > 3443          * recovery phase.
> > > 3444          */
> > > 3445         return postcopy_pause(s);
> > > 3446     } else {
> > > 
> > > This is to go into postcopy pause if the network connection broke (but
> > > not if for example a device moaned about being in an invalid state)
> > > 
> > > If I read this correctly, file errors are still being preserved - is
> > > that correct?
> > 
> > Yes, in places where QemuFile is reporting an actual I/O error I've
> > tried to preserve that. Only removed setting of fake I/O errors. So
> > if anything, we ought to get more accurate at detecting the recoverable
> > scenarios once we fully cleanup errors.
> 
> OK, good.

One scenario to possibly check though is that in a few places we used
error_report_err() but didn't immediately return an error code back to
the caller, instead carrying on doing other calls. It is possible that
we thus reported an error about bad data, and then later hit the EIO
check for QemuFile.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




reply via email to

[Prev in Thread] Current Thread [Next in Thread]