
Re: [PATCH v4 10/19] migration: Postcopy preemption enablement


From: manish.mishra
Subject: Re: [PATCH v4 10/19] migration: Postcopy preemption enablement
Date: Sat, 14 May 2022 00:23:44 +0530
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.9.0


On 12/05/22 9:52 pm, Peter Xu wrote:
> Hi, Manish,
>
> On Wed, May 11, 2022 at 09:24:28PM +0530, manish.mishra wrote:
> > @@ -1962,9 +2038,17 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
> >       RAMBlock  *block;
> >       ram_addr_t offset;
> > +again:
> >       block = unqueue_page(rs, &offset);
> > -    if (!block) {
> > +    if (block) {
> > +        /* See comment above postcopy_preempted_contains() */
> > +        if (postcopy_preempted_contains(rs, block, offset)) {
> > +            trace_postcopy_preempt_hit(block->idstr, offset);
> > +            /* This request is dropped */
> > +            goto again;
> > +        }
> >
> > If we continuously keep getting new post-copy requests, is it possible this
> > case can starve a post-copy request which is in precopy preemption?
> I didn't fully get your thoughts, could you elaborate?
>
> Here we're checking against the case where the postcopy requested page is
> exactly the one that we have preempted in previous precopy sessions.  If
> true, we drop this postcopy page and continue with the rest.
>
> When there'll be no postcopy requests pending then we'll continue with the
> precopy page, which is exactly the request we've dropped.
>
> Why we did this is actually in the comment above postcopy_preempted_contains(),
> and quoting from there:
>
> /*
>  * This should really happen very rarely, because it means when we were sending
>  * during background migration for postcopy we're sending exactly the page that
>  * some vcpu got faulted on on dest node.  When it happens, we probably don't
>  * need to do much but drop the request, because we know right after we restore
>  * the precopy stream it'll be serviced.  It'll slightly affect the order of
>  * postcopy requests to be serviced (e.g. it'll be the same as we move current
>  * request to the end of the queue) but it shouldn't be a big deal.  The most
>  * important thing is we can _never_ try to send a partial-sent huge page on the
>  * POSTCOPY channel again, otherwise that huge page will get "split brain" on
>  * two channels (PRECOPY, POSTCOPY).
>  */
>
> [...]

Hi Peter, what I meant here is that we fall back to precopy sending only when
there is no post-copy request left, so if there is some workload which is
continuously generating new post-copy fault requests, it may take very long
before we resume on the precopy channel. So basically the precopy channel may
have a post-copy request pending for a very long time in this case? Earlier,
as it was FCFS, there was a guarantee that a post-copy request would be served
after a bounded amount of time.
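
Just to make the concern concrete, here is a toy model (my own sketch, not
QEMU code, and the names are made up): the preempted precopy page is only
re-sent once the post-copy request queue drains, so a steady stream of new
faults keeps pushing it back.

/* Toy model, not QEMU code: a steady stream of faults keeps deferring
 * the preempted precopy page. */
#include <stdbool.h>
#include <stdio.h>

static int pending_faults = 5;      /* pretend vCPUs keep faulting */

static bool postcopy_request_pending(void)
{
    if (pending_faults > 0) {
        pending_faults--;
        return true;
    }
    return false;
}

int main(void)
{
    int tick = 0;

    while (postcopy_request_pending()) {
        /* an urgent page requested by the destination always wins */
        printf("tick %d: send post-copy page\n", tick++);
    }
    /* only now is the dropped (preempted) request serviced again */
    printf("tick %d: resume preempted precopy page\n", tick);
    return 0;
}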

> > @@ -2211,7 +2406,16 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
> >           return 0;
> >       }
> > +    if (migrate_postcopy_preempt() && migration_in_postcopy()) {
> >
> > I see why there is only one extra channel: multiFD is not supported for
> > postcopy. Peter, any particular reason for that?
> We used one channel not because multifd is not enabled - if you read into
> the series the channels are separately managed because they're servicing
> different goals.  It's because I don't really know whether multiple
> channels would be necessary, because postcopy requests should not be the
> major channel that pages will be sent, kind of a fast-path.
>
> One of the major goals of this series is to avoid interruptions made to
> postcopy urgent pages due to sending of precopy pages.  One extra channel
> already serviced it well, so I just stopped there as the initial version.
> I actually raised that question myself too in the cover letter in the todo
> section; I think we can always evaluate the possibility of that in the
> future without major reworks (but we may need another parameter to specify
> the number of threads just like multifd).

> because postcopy requests should not be the major channel that pages will be sent, kind of a fast-path.

Yes, agree Peter, but in the worst-case scenario it is possible we may have to
transfer the full memory of the VM via post-copy requests? So in that case we
may require a higher number of threads. But agreed, there cannot be a binding
with the number of multiFD channels, as multiFD uses a 256KB buffer size while
here we may have to use 4KB in the small-page case, so there can be a big
difference in throughput limits. Also, the smaller the buffer size, the higher
the CPU usage, so it needs to be decided carefully. Some rough numbers on that
are below.
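
As a back-of-envelope illustration (my own numbers, not from the patch), the
number of send operations needed to fill a 10 Gbps link differs by 64x between
4KB pages and 256KB multiFD packets, which is where the extra per-send CPU
cost of small buffers comes from:

/* Rough sketch, not QEMU code: send operations per second needed to
 * saturate a 10 Gbps link with 4 KiB pages vs 256 KiB multiFD packets. */
#include <stdio.h>

int main(void)
{
    const double link_bytes_per_sec = 10e9 / 8.0;   /* 10 Gbps NIC */
    const double small_page = 4.0 * 1024;           /* 4 KiB small page */
    const double multifd_packet = 256.0 * 1024;     /* 256 KiB multiFD packet */

    printf("4 KiB sends/s to fill the link:   %.0f\n",
           link_bytes_per_sec / small_page);
    printf("256 KiB sends/s to fill the link: %.0f\n",
           link_bytes_per_sec / multifd_packet);
    return 0;
}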

It must be very bad without multiFD; we have seen we cannot utilise a NIC
beyond 10 Gbps without multiFD. If it is something on the TODO list, can we
help with that?
> Yes, that should be on Juan's todo list (in the cc list as well), and
> AFAICT he'll be happy if anyone would like to take items off the list.
> We can further discuss it somewhere.
>
> One thing to mention is that I suspect the thread models will still need to
> be separate even if multifd joins the equation.  I mean, IMHO multifd
> threads take chunks of pages and send things in bulk, while if you read
> into this series the postcopy preempt threads send pages one by one and asap.
> The former cares about throughput and the latter about latency.  When we
> design the mix of postcopy+multifd it'll be great if we also keep this in
> mind so hopefully it'll make postcopy+multifd+preempt easier in the end.
yes, got it, thanks
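
To put the latency point in rough numbers for myself (my own back-of-envelope,
not from the series): on a shared 10 Gbps channel, a 4 KiB urgent page that has
to wait behind a bulk chunk pays the chunk's full wire time, which is exactly
the head-of-line blocking a dedicated preempt channel avoids. The 256 KiB and
2 MiB sizes below are illustrative assumptions.

/* Rough sketch, not QEMU code: wire-time delay seen by a 4 KiB urgent page
 * when it queues behind larger data on the same 10 Gbps channel. */
#include <stdio.h>

int main(void)
{
    const double link_bytes_per_sec = 10e9 / 8.0;   /* 10 Gbps NIC */
    const double page = 4.0 * 1024;                 /* 4 KiB urgent page */
    const double chunk = 256.0 * 1024;              /* bulk packet */
    const double huge = 2.0 * 1024 * 1024;          /* 2 MiB huge page */

    printf("urgent page alone:        %.1f us\n",
           page / link_bytes_per_sec * 1e6);
    printf("behind a 256 KiB packet:  %.1f us\n",
           (chunk + page) / link_bytes_per_sec * 1e6);
    printf("behind a 2 MiB huge page: %.1f us\n",
           (huge + page) / link_bytes_per_sec * 1e6);
    return 0;
}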

Thanks,



