Re: [PATCH v4 10/19] migration: Postcopy preemption enablement


From: Peter Xu
Subject: Re: [PATCH v4 10/19] migration: Postcopy preemption enablement
Date: Thu, 12 May 2022 12:22:19 -0400

Hi, Manish,

On Wed, May 11, 2022 at 09:24:28PM +0530, manish.mishra wrote:
> > @@ -1962,9 +2038,17 @@ static bool get_queued_page(RAMState *rs, 
> > PageSearchStatus *pss)
> >       RAMBlock  *block;
> >       ram_addr_t offset;
> > +again:
> >       block = unqueue_page(rs, &offset);
> > -    if (!block) {
> > +    if (block) {
> > +        /* See comment above postcopy_preempted_contains() */
> > +        if (postcopy_preempted_contains(rs, block, offset)) {
> > +            trace_postcopy_preempt_hit(block->idstr, offset);
> > +            /* This request is dropped */
> > +            goto again;
> > +        }
> If we continuously keep getting new postcopy requests, is it possible that
> this case starves the postcopy request which is in precopy preemption?

I didn't fully get your thoughts, could you elaborate?

Here we're checking against the case where the postcopy-requested page is
exactly the one that we preempted in a previous precopy session.  If so, we
drop this postcopy request and continue with the rest.

Once there are no postcopy requests pending, we'll continue with the precopy
page, which is exactly the request we've just dropped.

The reason we do this is explained in the comment above
postcopy_preempted_contains(); quoting from there:

/*
 * This should really happen very rarely, because it means that when we were
 * sending during background migration for postcopy, we were sending exactly
 * the page that some vcpu faulted on on the dest node.  When it happens, we
 * probably don't need to do much but drop the request, because we know that
 * right after we restore the precopy stream it'll be serviced.  It'll
 * slightly affect the order of postcopy requests to be serviced (e.g. it'll
 * be the same as if we moved the current request to the end of the queue),
 * but it shouldn't be a big deal.  The most important thing is that we can
 * _never_ try to send a partially-sent huge page on the POSTCOPY channel
 * again, otherwise that huge page will get "split brain" on two channels
 * (PRECOPY, POSTCOPY).
 */
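
For illustration, here is a rough pseudo-C sketch of the kind of check being
described.  The struct and function names are made up for this example (they
are not the patch's actual fields), and it ignores the huge-page granularity
the real code has to take into account; RAMBlock and ram_addr_t are the usual
QEMU types used in the quoted hunk above.

    /* Made-up state for this sketch: remember which page the precopy
     * channel was preempted in the middle of, so a postcopy request for
     * the same page can be dropped instead of being re-sent (and thus
     * split across two channels). */
    typedef struct {
        RAMBlock   *preempted_block;    /* block of the half-sent page */
        ram_addr_t  preempted_offset;   /* offset within that block */
    } PreemptState;

    static bool preempt_state_contains(PreemptState *ps, RAMBlock *block,
                                       ram_addr_t offset)
    {
        /* True only when the requested page is exactly the preempted one */
        return ps->preempted_block == block && ps->preempted_offset == offset;
    }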

[...]

> > @@ -2211,7 +2406,16 @@ static int ram_save_host_page(RAMState *rs, 
> > PageSearchStatus *pss)
> >           return 0;
> >       }
> > +    if (migrate_postcopy_preempt() && migration_in_postcopy()) {
> 
> I see why there is only one extra channel: multiFD is not supported for
> postcopy.  Peter, any particular reason for that?

We used one channel not because multifd isn't enabled - if you read into the
series, the channels are managed separately because they service different
goals.  It's because I don't really know whether multiple channels would be
necessary: postcopy requests should not be the major path through which
pages are sent, it's more of a fast path.

One of the major goals of this series is to avoid interruptions to urgent
postcopy pages caused by the sending of precopy pages.  One extra channel
already services that well, so I just stopped there for the initial version.
I actually raised that question myself in the cover letter's todo section;
I think we can always evaluate the possibility in the future without major
rework (but we may need another parameter to specify the number of threads,
just like multifd).
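
A minimal sketch of that idea, with all names invented for this example
(the series keeps its own state and helpers): urgent, postcopy-requested
pages go out on the dedicated preempt channel, everything else stays on the
normal precopy stream.

    /* Made-up container just for this sketch */
    typedef struct {
        QEMUFile *precopy_file;   /* the existing precopy stream */
        QEMUFile *preempt_file;   /* the one extra postcopy-preempt channel */
    } ChannelPair;

    static QEMUFile *pick_channel(ChannelPair *c, bool urgent)
    {
        /* Urgent (postcopy-requested) pages bypass the possibly-busy
         * precopy stream and go out on the dedicated channel. */
        return urgent ? c->preempt_file : c->precopy_file;
    }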

> 
> As it must be very bad without multiFD (we have seen that we cannot
> utilise a NIC above 10 Gbps without it), if it is something on the TODO
> list, can we help with that?

Yes, that should be on Juan's todo list (he's in the cc list as well), and
AFAICT he'll be happy if anyone would like to take items off that list.
We can discuss it further somewhere.

One thing to mention is that I suspect the thread models will still need to
be separate even if multifd joins the equation.  IMHO multifd threads take
chunks of pages and send them in bulk, while (if you read into this series)
the postcopy preempt thread sends pages one by one, as soon as possible.
The former cares about throughput, the latter about latency.  When we design
the mix of postcopy+multifd it'll be great if we also keep this in mind, so
hopefully it'll make postcopy+multifd+preempt easier in the end.
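
To illustrate the contrast, here is pure pseudo-C with every helper name
invented for this example (neither loop is actual QEMU code):

    /* multifd-style: accumulate a chunk of pages, then send them in one
     * go; optimized for throughput. */
    while (more_dirty_pages()) {
        batch_add_page(&pkt, next_dirty_page());
        if (batch_is_full(&pkt)) {
            batch_send(&pkt);                /* one write covers many pages */
        }
    }

    /* preempt-style: push each requested page out immediately; optimized
     * for the latency seen by the faulting vCPU. */
    while (has_urgent_request()) {
        send_one_page(next_urgent_page());   /* one write per page */
        channel_flush();                     /* don't let it sit in a buffer */
    }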

Thanks,

-- 
Peter Xu



