qemu-devel

From: Daniel P. Berrangé
Subject: Re: [PATCH] util: NUMA aware memory preallocation
Date: Wed, 11 May 2022 10:19:59 +0100
User-agent: Mutt/2.1.5 (2021-12-30)

On Tue, May 10, 2022 at 08:55:33AM +0200, Michal Privoznik wrote:
> When allocating large amounts of memory the task is offloaded
> onto threads. These threads then use various techniques to
> allocate the memory fully (madvise(), writing into the memory).
> However, these threads are free to run on any CPU, which becomes
> problematic on NUMA machines because it may happen that a thread
> is running on a distant node.
> 
> Ideally, this is something that a management application would
> resolve, but we are nowhere close to that. Firstly, memory
> allocation happens before the monitor socket is even available. But
> okay, that's what -preconfig is for. But then the problem is that
> 'object-add' would not return until all memory is preallocated.
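
(For illustration only: the preallocation pattern described in the
quoted text boils down to something like the following sketch, using
plain pthreads. All names are hypothetical, not QEMU's actual code.)

  #include <pthread.h>
  #include <stddef.h>
  #include <unistd.h>

  struct touch_job {
      char *base;
      size_t len;
      size_t pagesz;
  };

  static void *touch_pages(void *opaque)
  {
      struct touch_job *job = opaque;

      /* One write per page is enough to make the kernel back it. */
      for (size_t off = 0; off < job->len; off += job->pagesz) {
          job->base[off] = 0;
      }
      return NULL;
  }

  /* Split the buffer across nthreads touch threads; the remainder
   * of len / nthreads is ignored for brevity. */
  static void prealloc_mem(char *buf, size_t len, int nthreads)
  {
      pthread_t tids[nthreads];
      struct touch_job jobs[nthreads];
      size_t pagesz = sysconf(_SC_PAGESIZE);
      size_t chunk = len / nthreads;

      for (int i = 0; i < nthreads; i++) {
          jobs[i] = (struct touch_job){ buf + i * chunk, chunk, pagesz };
          /* Nothing constrains where these threads run -- that is
           * exactly the NUMA problem the patch is addressing. */
          pthread_create(&tids[i], NULL, touch_pages, &jobs[i]);
      }
      for (int i = 0; i < nthreads; i++) {
          pthread_join(tids[i], NULL);
      }
  }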

Is the delay to 'object-add' actually a problem ?

Currently we're cold plugging the memory backends, so prealloc
happens before QMP is available. So we have a delay immediately
at startup. Switching to -preconfig plus 'object-add' would
not be making the delay worse, merely moving it ever so slightly
later.

From the POV of an application using libvirt, this is the same.
virDomainCreate takes 1 hour, regardless of whether the 1 hour
allocation delay is before QMP or in -preconfig 'object-add'
execution.

> Long story short, a management application has no way of learning
> the TIDs of the allocator threads, so it can't make them run NUMA-aware.

This feels like the key issue. The preallocation threads are
invisible to libvirt, regardless of whether we're doing coldplug
or hotplug of memory-backends. Indeed the threads are invisible
to all of QEMU, except the memory backend code.

Conceptually we need one or more explicit worker threads that we
can assign CPU affinity to, and then QEMU can place jobs on them.
I/O threads serve this role, but are limited to blockdev work. We
need a generalization of I/O threads, for arbitrary jobs that
QEMU might want to farm out to specific NUMA nodes.
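
(A rough sketch of what such a generalized worker could look like,
assuming plain pthreads and a trivial job queue; all names are
hypothetical, and QEMU's real IOThread machinery is considerably
more involved.)

  /*
   * Hypothetical worker thread with fixed CPU affinity. Jobs
   * submitted to it run on the worker's CPUs, so memory they
   * touch is allocated on the matching NUMA node (first-touch).
   */
  #define _GNU_SOURCE
  #include <pthread.h>
  #include <sched.h>
  #include <stdlib.h>

  typedef void (*job_fn)(void *opaque);

  struct job {
      job_fn fn;
      void *opaque;
      struct job *next;
  };

  struct worker {
      pthread_t thread;
      pthread_mutex_t lock;
      pthread_cond_t cond;
      struct job *queue;
  };

  static void *worker_run(void *opaque)
  {
      struct worker *w = opaque;

      for (;;) {
          pthread_mutex_lock(&w->lock);
          while (!w->queue) {
              pthread_cond_wait(&w->cond, &w->lock);
          }
          struct job *j = w->queue;
          w->queue = j->next;
          pthread_mutex_unlock(&w->lock);

          j->fn(j->opaque);     /* e.g. one preallocation chunk */
          free(j);
      }
      return NULL;
  }

  static struct worker *worker_new(const cpu_set_t *cpus)
  {
      struct worker *w = calloc(1, sizeof(*w));

      pthread_mutex_init(&w->lock, NULL);
      pthread_cond_init(&w->cond, NULL);
      pthread_create(&w->thread, NULL, worker_run, w);
      /* Pin the worker before any jobs are queued. */
      pthread_setaffinity_np(w->thread, sizeof(*cpus), cpus);
      return w;
  }

  static void worker_submit(struct worker *w, job_fn fn, void *opaque)
  {
      struct job *j = malloc(sizeof(*j));

      *j = (struct job){ fn, opaque, NULL };
      pthread_mutex_lock(&w->lock);
      j->next = w->queue;       /* LIFO; fine for a sketch */
      w->queue = j;
      pthread_cond_signal(&w->cond);
      pthread_mutex_unlock(&w->lock);
  }

The point is only that affinity is a property of the worker, so the
placement of any job, preallocation included, follows from which
worker it is submitted to.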

In a guest spanning multiple host NUMA nodes, libvirt would
have to configure one or more worker threads for QEMU, learn
their TIDs, then add the memory backends in -preconfig, which
would farm out preallocation to the worker threads, with
job placement matching each worker's affinity.
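
(And a sketch of the affinity step the management application would
perform once it learns a worker's TID, assuming libnuma; the helper
is hypothetical, not actual libvirt code.)

  /*
   * Hypothetical helper: restrict a single thread (by TID) to the
   * CPUs of one host NUMA node. Link with -lnuma.
   */
  #define _GNU_SOURCE
  #include <numa.h>
  #include <sched.h>
  #include <sys/types.h>

  static int pin_tid_to_node(pid_t tid, int node)
  {
      struct bitmask *cpus = numa_allocate_cpumask();
      cpu_set_t set;
      int ret = -1;

      CPU_ZERO(&set);
      if (numa_node_to_cpus(node, cpus) == 0) {
          for (unsigned int i = 0; i < cpus->size; i++) {
              if (numa_bitmask_isbitset(cpus, i)) {
                  CPU_SET(i, &set);
              }
          }
          /* On Linux, sched_setaffinity() on a TID affects only
           * that one thread, not the whole process. */
          ret = sched_setaffinity(tid, sizeof(set), &set);
      }
      numa_free_cpumask(cpus);
      return ret;
  }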


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



