qemu-devel

Re: towards a workable O_DIRECT outmigration to a file


From: Claudio Fontana
Subject: Re: towards a workable O_DIRECT outmigration to a file
Date: Thu, 18 Aug 2022 20:13:36 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.4.0

On 8/18/22 18:31, Dr. David Alan Gilbert wrote:
> * Claudio Fontana (cfontana@suse.de) wrote:
>> On 8/18/22 14:38, Dr. David Alan Gilbert wrote:
>>> * Nikolay Borisov (nborisov@suse.com) wrote:
>>>> [adding Juan and David to cc as I had missed them. ]
>>>
>>> Hi Nikolay,
>>>
>>>> On 11.08.22 г. 16:47 ч., Nikolay Borisov wrote:
>>>>> Hello,
>>>>>
>>>>> I'm currently looking into implementing a 'file:' uri for migration save
>>>>> in qemu. Ideally the solution will be O_DIRECT compatible. I'm aware of
>>>>> the branch https://gitlab.com/berrange/qemu/-/tree/mig-file. In the
>>>>> process of brainstorming how a solution would look, a couple of
>>>>> questions transpired that I think warrant wider discussion in the
>>>>> community.
>>>
>>> OK, so this seems to be a continuation with Claudio and Daniel and co as
>>> of a few months back.  I'd definitely be leaving libvirt sides of the
>>> question here to Dan, and so that also means definitely looking at that
>>> tree above.
>>
>> Hi Dave, yes, Nikolay is trying to continue on the qemu side.
>>
>> We have something working with libvirt for our short term needs which offers
>> good performance, but it is clear that that simple solution is barred from
>> being merged into upstream libvirt.
>>
>>
>>>
>>>>> First, implementing a solution which is self-contained within qemu would
>>>>> be easy enough (famous last words), but the gist is one only has to care
>>>>> about the format within qemu. However, I'm being told that what libvirt
>>>>> does is prepend its own custom header to the resulting saved file, then
>>>>> slipstreams the migration stream from qemu. Now with the solution that I
>>>>> envision I intend to keep all write-related logic inside qemu, this
>>>>> means there's no way to incorporate the logic of libvirt. The reason I'd
>>>>> like to keep the write process within qemu is to avoid an extra copy of
>>>>> data between the two processes (qemu outgoing migration and libvirt),
>>>>> with the current fd approach qemu is passed an fd, data is copied
>>>>> between qemu/libvirt and finally the libvirt_iohelper writes the data.
>>>>> So the question which remains to be answered is how would libvirt make
>>>>> use of this new functionality in qemu? I was thinking something along
>>>>> the lines of :
>>>>>
>>>>> 1. Qemu writes its migration stream to a file, ideally on a filesystem
>>>>> which supports reflink - xfs/btrfs
>>>>>
>>>>> 2. Libvirt writes its header to a separate file
>>>>> 2.1 Reflinks qemu's stream right after its header
>>>>> 2.2 Writes its trailer
>>>>>
>>>>> 3. Unlink() qemu's file, now only libvirt's file remains on-disk.
>>>>>
>>>>> I wouldn't call this solution hacky though it definitely leaves some
>>>>> bitter aftertaste.
>>>
>>> Wouldn't it be simpler to tell libvirt to write its header, then tell
>>> qemu to append everything?
>>
>> I would think so as well. 
>>
>>>
>>>>> Another solution would be to extend the 'fd:' protocol to allow multiple
>>>>> descriptors (for multifd) support to be passed in. The reason dup()
>>>>> can't be used is because in order for multifd to be supported it's
>>>>> required to be able to write to multiple, non-overlapping regions of the
>>>>> file. And duplicated fd's share their offsets etc. But that really seems
>>>>> more or less hacky. Alternatively, pwrite() calls could be used
>>>>> to write to non-overlapping regions in the file. Any feedback is
>>>>> welcome.
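
To make the dup() versus pwrite() point concrete, here is a minimal Python
sketch (not QEMU code). It assumes a hypothetical fixed region size CHUNK per
channel and leaves O_DIRECT out, since that would additionally require aligned
buffers, lengths and offsets. The function names are made up for illustration.

```python
import os
import tempfile

CHUNK = 4096  # hypothetical per-channel region size


def shared_offset_demo(path):
    """Show that dup()'d descriptors share a single file offset."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    fd2 = os.dup(fd)
    try:
        os.write(fd, b"A" * 4)
        # fd2 refers to the same open file description, so its offset moved too.
        return os.lseek(fd2, 0, os.SEEK_CUR)
    finally:
        os.close(fd2)
        os.close(fd)


def write_regions(path, payloads):
    """Write each channel's payload at its own explicit offset via pwrite()."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for channel, data in enumerate(payloads):
            # pwrite() takes the offset as an argument and ignores the shared one.
            os.pwrite(fd, data, channel * CHUNK)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    offset_seen = shared_offset_demo(os.path.join(tmp, "dup.bin"))
    out = os.path.join(tmp, "regions.bin")
    write_regions(out, [b"a" * CHUNK, b"b" * CHUNK, b"c" * CHUNK])
    with open(out, "rb") as f:
        blob = f.read()
```

Because pwrite() never touches the shared offset, several writers could use
the same descriptor (or dup()'d copies of it) as long as their regions do not
overlap.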
>>>
>>> I do like the idea of letting fd: take multiple fd's.
>>
>> Fine in my view. I think we will then still need a helper process in libvirt
>> to merge the data into a single file, no?
>> In case the libvirt multifd to single file multithreaded helper I proposed
>> before is helpful as a reference, you could reuse/modify those patches.
> 
> Eww that's messy isn't it.
> (You don't fancy a huge sparse file do you?)
> 
>> Maybe this new way will be acceptable to libvirt,
>> i.e. avoiding the multifd code -> socket, but still merging the data from the
>> multiple fds into a single file?
> 
> It feels to me like the problem here is really what we want is something
> closer to a dump than the migration code; you don't need all that
> overhead of the code to deal with live migration bitmaps and dirty pages

Well yes, you are right, we don't care about live migration bitmaps and dirty
pages, but we don't incur any of that anyway since (at least for what I have in
mind, virsh save and restore) the VM is stopped.

> that aren't going to happen.
> Something that just does a nice single write(2) (for each memory
> region);
> and then ties the device state on.

Ultimately yes. It's the same thing though: whether we trigger it via migrate
fd: or via another non-migration-related mechanism, any approach would work.
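
As a rough illustration of that shape (header written first, then one write per
memory region, then the device state tacked on), here is a small Python sketch.
It is not QEMU or libvirt code: the function names, the header bytes, the
region names and the 4096-byte ALIGN value are all assumptions made up for the
example, and O_DIRECT itself is omitted.

```python
import os
import tempfile

ALIGN = 4096  # assumed alignment; the real requirement depends on the device


def pad(data, align=ALIGN):
    """Pad data with zero bytes up to the next multiple of align."""
    return data + b"\0" * (-len(data) % align)


def manager_write_header(fd, header):
    """Stand-in for libvirt: write a padded header, leaving the offset after it."""
    os.write(fd, pad(header))


def vmm_append(fd, regions, device_state):
    """Stand-in for qemu: one write per memory region, then the device state."""
    layout = []
    for name, data in regions:
        # Record where each region starts so a reader could find it again.
        layout.append((name, os.lseek(fd, 0, os.SEEK_CUR), len(data)))
        os.write(fd, pad(data))
    os.write(fd, device_state)
    return layout


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "saved.img")
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        manager_write_header(fd, b"HYPOTHETICAL-HEADER")
        layout = vmm_append(
            fd,
            [("ram0", b"\x11" * 8192), ("ram1", b"\x22" * 100)],
            b"devstate",
        )
        total = os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)
```

Padding the header and each region to ALIGN keeps every region start aligned,
which is the property an O_DIRECT writer would need.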

Ciao,

C

> 
> Dave
> 
>>>
>>> Dave
>>>
>>
>> Thanks for your comments,
>>
>> Claudio
>>>>>
>>>>>
>>>>> Regards,
>>>>> Nikolay
>>>>
>>



