
Re: [PATCH] coroutine: cap per-thread local pool size


From: Stefan Hajnoczi
Subject: Re: [PATCH] coroutine: cap per-thread local pool size
Date: Tue, 19 Mar 2024 13:55:10 -0400

On Tue, Mar 19, 2024 at 01:43:32PM +0000, Daniel P. Berrangé wrote:
> On Mon, Mar 18, 2024 at 02:34:29PM -0400, Stefan Hajnoczi wrote:
> > The coroutine pool implementation can hit the Linux vm.max_map_count
> > limit, causing QEMU to abort with "failed to allocate memory for stack"
> > or "failed to set up stack guard page" during coroutine creation.
> > 
> > This happens because per-thread pools can grow to tens of thousands of
> > coroutines. Each coroutine causes 2 virtual memory areas to be created.
> 
> This sounds quite alarming. What usage scenario justifies creating
> so many coroutines?

The coroutine pool hides creation and deletion latency. The pool
initially has a modest size of 64, but each virtio-blk device increases
the pool size by num_queues * queue_size (256) / 2.

The issue shows up with large SMP guests (i.e. large num_queues) that
have multiple virtio-blk devices.
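
For a sense of scale (illustrative numbers; 256 is the default
queue_size, and on recent QEMU num_queues typically matches the vCPU
count): a 64-vCPU guest with 4 virtio-blk devices pools up to

  per device: 64 * 256 / 2 = 8,192 coroutines
  4 devices:  4 * 8,192    = 32,768 coroutines -> 65,536 VMAs
                                                  (2 VMAs per coroutine)

which is already past a 65k-ish vm.max_map_count.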

> IIUC, coroutine stack size is 1 MB, and so tens of thousands of
> coroutines implies 10's of GB of memory just on stacks alone.
> 
> > Eventually vm.max_map_count is reached and memory-related syscalls fail.
> 
> On my system max_map_count is 1048576, quite a lot higher than
> 10's of 1000's. Hitting that would imply ~500,000 coroutines and
> ~500 GB of stacks !

Fedora recently increased the limit to 1048576. Before that it was the
kernel default of 65530, which most other distros still use.

Regarding why QEMU might have 65k coroutines pooled: the existing
coroutine pool algorithm is per-thread, so if the max pool size is 15k
but you have 4 IOThreads, then up to 4 x 15k coroutines in total can be
sitting in pools. This patch addresses that by setting a small fixed
size (256) on the per-thread pools and letting a shared global pool
hold the rest.
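
To make the shape of that concrete, here is a minimal sketch (not the
actual patch -- the names, the list layout, and the sizes are
illustrative assumptions): allocation tries the small per-thread pool
first, then the lock-protected global pool, and only then creates a
new coroutine; freeing goes the other way.

  /* Sketch only: illustrative names and sizes, not the real patch. */
  #include <pthread.h>
  #include <stdlib.h>

  #define LOCAL_POOL_MAX 256       /* small fixed per-thread cap */

  typedef struct Coroutine {
      struct Coroutine *next;
      /* stack, context, ... */
  } Coroutine;

  static __thread Coroutine *local_pool;
  static __thread unsigned local_pool_size;

  static pthread_mutex_t global_pool_lock = PTHREAD_MUTEX_INITIALIZER;
  static Coroutine *global_pool;
  static unsigned global_pool_size;
  static unsigned global_pool_max = 16384; /* derived from max_map_count */

  static Coroutine *coroutine_get(void)
  {
      Coroutine *co = local_pool;

      if (co) {                    /* fast path: no locking */
          local_pool = co->next;
          local_pool_size--;
          return co;
      }

      pthread_mutex_lock(&global_pool_lock);
      co = global_pool;
      if (co) {
          global_pool = co->next;
          global_pool_size--;
      }
      pthread_mutex_unlock(&global_pool_lock);

      return co ? co : calloc(1, sizeof(*co)); /* slow path: allocate */
  }

  static void coroutine_put(Coroutine *co)
  {
      if (local_pool_size < LOCAL_POOL_MAX) {  /* prefer the local pool */
          co->next = local_pool;
          local_pool = co;
          local_pool_size++;
          return;
      }

      pthread_mutex_lock(&global_pool_lock);
      if (global_pool_size < global_pool_max) {
          co->next = global_pool;
          global_pool = co;
          global_pool_size++;
          co = NULL;
      }
      pthread_mutex_unlock(&global_pool_lock);

      free(co);  /* both pools full: actually release the coroutine */
  }

The key property is that unbounded growth moves from N per-thread
pools to one shared pool with a single hard cap.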

> 
> > diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
> > index 5fd2dbaf8b..2790959eaf 100644
> > --- a/util/qemu-coroutine.c
> > +++ b/util/qemu-coroutine.c
> 
> > +static unsigned int get_global_pool_hard_max_size(void)
> > +{
> > +#ifdef __linux__
> > +    g_autofree char *contents = NULL;
> > +    int max_map_count;
> > +
> > +    /*
> > +     * Linux processes can have up to max_map_count virtual memory areas
> > +     * (VMAs). mmap(2), mprotect(2), etc fail with ENOMEM beyond this limit. We
> > +     * must limit the coroutine pool to a safe size to avoid running out of
> > +     * VMAs.
> > +     */
> > +    if (g_file_get_contents("/proc/sys/vm/max_map_count", &contents, NULL,
> > +                            NULL) &&
> > +        qemu_strtoi(contents, NULL, 10, &max_map_count) == 0) {
> > +        /*
> > +         * This is a conservative upper bound that avoids exceeding
> > +         * max_map_count. Leave half for non-coroutine users like library
> > +         * dependencies, vhost-user, etc. Each coroutine takes up 2 VMAs so
> > +         * halve the amount again.
> > +         */
> > +        return max_map_count / 4;
> 
> That's 256,000 coroutines, which still sounds incredibly large
> to me.

Any ideas for tweaking this heuristic?

> 
> > +    }
> > +#endif
> > +
> > +    return UINT_MAX;
> 
> Why UINT_MAX as a default ?  If we can't read procfs, we should
> assume some much smaller sane default IMHO, that corresponds to
> what current linux default max_map_count would be.

This line is not Linux-specific. I don't know if other OSes have an
equivalent to max_map_count.

I agree with defaulting to 64k-ish on Linux.
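
Something along these lines, perhaps (a sketch; 65530 is the stock
Linux default, and whether /4 is the right divisor is the open
question above):

  static unsigned int get_global_pool_hard_max_size(void)
  {
  #ifdef __linux__
      g_autofree char *contents = NULL;
      int max_map_count;

      if (g_file_get_contents("/proc/sys/vm/max_map_count", &contents, NULL,
                              NULL) &&
          qemu_strtoi(contents, NULL, 10, &max_map_count) == 0) {
          return max_map_count / 4;
      }

      /* procfs unavailable: assume the stock Linux default */
      return 65530 / 4;
  #else
      return UINT_MAX; /* no known max_map_count equivalent elsewhere */
  #endif
  }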

Stefan

> 
> > +}
> > +
> > +static void __attribute__((constructor)) qemu_coroutine_init(void)
> > +{
> > +    qemu_mutex_init(&global_pool_lock);
> > +    global_pool_hard_max_size = get_global_pool_hard_max_size();
> >  }
> > -- 
> > 2.44.0
> > 
> > 
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> 
