
Re: [PATCH v2 28/29] migration: Add direct-io parameter


From: Fabiano Rosas
Subject: Re: [PATCH v2 28/29] migration: Add direct-io parameter
Date: Mon, 30 Oct 2023 19:51:34 -0300

Daniel P. Berrangé <berrange@redhat.com> writes:

> On Wed, Oct 25, 2023 at 11:32:00AM -0300, Fabiano Rosas wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>> 
>> > On Tue, Oct 24, 2023 at 04:32:10PM -0300, Fabiano Rosas wrote:
>> >> Markus Armbruster <armbru@redhat.com> writes:
>> >> 
>> >> > Fabiano Rosas <farosas@suse.de> writes:
>> >> >
>> >> >> Add the direct-io migration parameter that tells the migration code to
>> >> >> use O_DIRECT when opening the migration stream file whenever possible.
>> >> >>
>> >> >> This is currently only used for the secondary channels of fixed-ram
>> >> >> migration, which can guarantee that writes are page aligned.
>> >> >>
>> >> >> However the parameter could be made to affect other types of
>> >> >> file-based migrations in the future.
>> >> >>
>> >> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
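
For context: the parameter is set like any other migration parameter; a
minimal sketch, assuming the QAPI member keeps the name "direct-io"
from this series:

  {"execute": "migrate-set-parameters",
   "arguments": {"direct-io": true}}
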
>> >> >
>> >> > When would you want to enable @direct-io, and when would you want to
>> >> > leave it disabled?
>> >> 
>> >> That depends on a performance analysis. You'd generally leave it
>> >> disabled unless there's some indication that the operating system is
>> >> having trouble draining the page cache.
>> >
>> > That's not the usage model I would suggest.
>> >
>> 
>> Hehe I took a shot at answering but I really wanted to say "ask Daniel".
>> 
>> > The biggest value of the page cache comes when it holds data that
>> > will be repeatedly accessed.
>> >
>> > When you are saving/restoring a guest to file, that data is used
>> > once only (assuming there's a large gap between save & restore).
>> > By using the page cache to save a big guest we essentially purge
>> > the page cache of most of its existing data that is likely to be
>> > reaccessed, to fill it up with data never to be reaccessed.
>> >
>> > I usually describe save/restore operations as trashing the page
>> > cache.
>> >
>> > IMHO, mgmt apps should request O_DIRECT always unless they expect
>> > the save/restore operation to run in quick succession, or if they
>> > know that the host has oodles of free RAM such that existing data
>> > in the page cache won't be trashed, or
>> 
>> Thanks, I'll try to incorporate this into some kind of doc in the next
>> version.
>> 
>> > if the host FS does not support O_DIRECT of course.
>> 
>> Should we try to probe for this and inform the user?
>
> qemu_open_internal() will already produce a nice error message. If it
> gets EINVAL when using O_DIRECT, it'll retry without O_DIRECT, and if
> that works, it'll report "filesystem does not support O_DIRECT".
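
(For readers following along: the fallback is roughly the function
below; a paraphrase of util/osdep.c from memory, so names and details
may differ. open_sketch() is a made-up name:)

  static int open_sketch(const char *name, int flags, mode_t mode,
                         Error **errp)
  {
      int fd = open(name, flags, mode);

      if (fd == -1 && errno == EINVAL && (flags & O_DIRECT)) {
          /* Retry without O_DIRECT so a bad path can be told apart
           * from a filesystem that lacks O_DIRECT support. */
          fd = open(name, flags & ~O_DIRECT, mode);
          if (fd != -1) {
              close(fd);
              error_setg(errp, "filesystem does not support O_DIRECT");
              return -1;
          }
      }
      return fd;
  }
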
>
> Having said that I see a problem with /dev/fdset handling, because
> we're only validating O_ACCMODE and that excludes O_DIRECT.
>
> If the mgmt app passes an FD with O_DIRECT already set, then it
> won't work for VMstate saving which is unaligned.
>
> If the mgmt app passes an FD without O_DIRECT set, then we are
> not setting O_DIRECT for the multifd RAM threads.

I could use some advice on how to solve this situation. The fdset code
in monitor/fds.c and the add-fd command don't seem to be usable outside
the original use case of passing fds with different open flags.

There are several problems, the biggest one being that there's no way to
manipulate the set of file descriptors aside from asking for duplication
of an fd that matches a particular set of flags.

That doesn't work for us because the two fds we need (one for the main
channel, another for the secondary channels) will have the same open
flags, so the fdset code will always return the first one it finds in
the set.
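
To make the problem concrete, the lookup is roughly the loop below
(paraphrased from monitor/fds.c from memory, so details may differ);
only O_ACCMODE is compared, so with identical access modes the first
fd always wins:

  QLIST_FOREACH(mon_fdset_fd, &mon_fdset->fds, next) {
      int mon_fd_flags = fcntl(mon_fdset_fd->fd, F_GETFL);

      if (mon_fd_flags == -1) {
          return -1;
      }
      /* only the access mode is compared, so two fds opened with
       * the same O_ACCMODE are indistinguishable at this point */
      if ((flags & O_ACCMODE) == (mon_fd_flags & O_ACCMODE)) {
          fd = mon_fdset_fd->fd;   /* first match wins */
          break;
      }
  }
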

Another problem (or feature) of the fdset code is that when
qmp_remove_fd() is called, it only actually removes the fd if the VM
runstate is RUNNING, which means the migration code cannot reliably
remove the fds after use. We need to be able to remove them to make
sure we use the correct fds in a subsequent migration.

I see some paths forward:

1) Require the user to put the fds in separate fdsets.

  This would be the easiest to handle in the migration code, but we
  would have to come up with a special file: URL syntax to accommodate
  more than one fdset. Perhaps "file:/dev/fdsets/1,2"? (See the QMP
  sketch after this list.)

2) Require the two fds in the same fdset and separate them ourselves.

  This would require extending the fdset code to allow more ways of
  manipulating the fdset. There are two options here:

  a) Implement a pop() operation in the fdset code. We receive the
     fdset, pop one fd from it and put it in a new fdset. I did some
     experimentation with this by having an fd->present flag and just
     skipping the fd during query-fdsets and
     monitor_fdset_dup_fd_add(). It works, but it's convoluted.

  b) Add support for removing the original fd when given the dup()ed
     fd. The list of duplicated fds is currently by-fdset and not
     by-original-fd, so this would require a larger code change.

3) Design a whole new URI.

  Here, there are the usual benefits and drawbacks of doing something
  from scratch, with the added drawback of dissociating from the file:
  URI, which is already well tested and easy to use when doing
  QEMU-only migration.
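
For option 1, the management app's side might look like the sketch
below. The add-fd commands are existing QMP (each carries an fd via
SCM_RIGHTS over the monitor socket); the "file:/dev/fdsets/1,2" URL is
the hypothetical part:

  {"execute": "add-fd", "arguments": {"fdset-id": 1}}
      (fd for the main channel, no O_DIRECT needed)
  {"execute": "add-fd", "arguments": {"fdset-id": 2}}
      (fd for the secondary channels, opened with O_DIRECT)
  {"execute": "migrate",
   "arguments": {"uri": "file:/dev/fdsets/1,2"}}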


With the three options above, there's still the issue of removing the
fd. I think the original commit [1] might have been mistaken in adding
the runstate_is_running() check for *both* the "removed = true" clause
and the "fd was never duplicated" clause. But it's hard to tell since
this whole feature is a bit opaque to me.

[1] ebe52b592d ("monitor: Prevent removing fd from set during init")
    https://gitlab.com/qemu-project/qemu/-/commit/ebe52b592d
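
For reference, my reading of the current cleanup logic, paraphrased
from memory (so the exact conditions may differ):

  /* sketch of monitor_fdset_cleanup(); paraphrased, not verbatim */
  QLIST_FOREACH_SAFE(mon_fdset_fd, &mon_fdset->fds, next,
                     mon_fdset_fd_next) {
      if ((mon_fdset_fd->removed ||              /* "removed = true" */
           QLIST_EMPTY(&mon_fdset->dup_fds)) &&  /* never duplicated */
          runstate_is_running()) {
          /* runstate_is_running() gates *both* clauses, which is
           * what makes removal unreliable for migration */
          close(mon_fdset_fd->fd);
          QLIST_REMOVE(mon_fdset_fd, next);
          g_free(mon_fdset_fd);
      }
  }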

All in all, I'm inclined to consider the first option, unless someone
has a better idea. Assuming we can figure out the removal issue, that
is.

Thoughts?


