qemu-devel

Re: [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving"


From: William Roche
Subject: Re: [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase
Date: Wed, 20 Sep 2023 14:11:35 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.15.0

Thank you Zhijian for your feedback.

So I'll try to push this change today.

Cheers,
William.


On 9/20/23 12:04, Zhijian Li (Fujitsu) wrote:


On 15/09/2023 19:31, William Roche wrote:
On 9/15/23 05:13, Zhijian Li (Fujitsu) wrote:


I'm okay with "RDMA isn't touched".
BTW, could you share the program or hack you use to poison the page, so that
I can take a look at the RDMA part later when I'm free?

I'm not sure it's appropriate to ack a part the patch doesn't touch. Anyway:
Acked-by: Li Zhijian <lizhijian@fujitsu.com> # RDMA


Thanks.
As you asked for a procedure to inject memory errors into a running VM,
I've attached to this email the source code (mce_process_react.c) of a
program that will help to target the error injection in the VM.


I just tried your hwpoison program and ran an RDMA migration. The migration
failed, but fortunately the source side is still alive :).

(qemu) Failed to register chunk!: Bad address
Chunk details: block: 0 chunk index 671 start 139955096518656 end 139955097567232 host 139955096518656 local 139954392924160 registrations: 636
qemu-system-x86_64: cannot get lkey
qemu-system-x86_64: rdma migration: write error! -22
qemu-system-x86_64: RDMA is in an error state waiting migration to abort!
qemu-system-x86_64: failed to save SaveStateEntry with id(name): 2(ram): -22
qemu-system-x86_64: Early error. Sending error.


Since RDMA migration currently transfers guest memory in chunks (1M by
default), we may need to do one of the following:

option 1: reduce the chunk size to 1 page
option 2: handle the chunk containing the hwpoisoned page specially
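
To illustrate why a single poisoned page affects a whole chunk, here is a
minimal sketch of the chunk arithmetic, checked against the numbers in the log
above. It is not QEMU's code: the helper names are hypothetical, the 4 KiB page
size is an assumption, and reading "local" as the base host address of the RAM
block is an assumption that happens to be consistent with the logged values.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 1 MiB chunks, as stated above; the log agrees,
 * since end - start == 1 << 20. */
#define CHUNK_SHIFT   20
#define CHUNK_SIZE    (UINT64_C(1) << CHUNK_SHIFT)
/* Assumption: 4 KiB host pages. */
#define PAGE_SIZE_4K  UINT64_C(4096)

/* Index of the chunk that contains host address `host`, for a RAM block
 * whose first byte is at `block_start`. (Hypothetical helper.) */
static inline uint64_t chunk_index(uint64_t block_start, uint64_t host)
{
    assert(host >= block_start);
    return (host - block_start) >> CHUNK_SHIFT;
}

/* Host address of the first byte of chunk number `index`. */
static inline uint64_t chunk_start(uint64_t block_start, uint64_t index)
{
    return block_start + (index << CHUNK_SHIFT);
}

/* Number of pages registered together when one chunk is registered. */
static inline uint64_t pages_per_chunk(void)
{
    return CHUNK_SIZE / PAGE_SIZE_4K;
}

/* Nonzero if a poisoned page at `poisoned` lies in the same chunk as
 * `host`, i.e. registering that chunk would also cover the bad page. */
static inline int chunk_contains_poison(uint64_t block_start, uint64_t host,
                                        uint64_t poisoned)
{
    return chunk_index(block_start, host) ==
           chunk_index(block_start, poisoned);
}
```

With the logged values, chunk_index(139954392924160, 139955096518656) gives
671, which matches "chunk index 671". Under the 4 KiB assumption a chunk
covers 256 pages, so one poisoned page makes registration of the other 255
fail with it. Option 1 would shrink that unit to a single page, at the cost
of many more registrations.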

However, since another migration protocol can be used instead, it is also
acceptable to leave this issue unfixed for now.

Tested-by: Li Zhijian <lizhijian@fujitsu.com>

Thanks
Zhijian


