emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Opportunistic GC


From: Pip Cet
Subject: Re: Opportunistic GC
Date: Wed, 10 Mar 2021 19:38:08 +0000

On Wed, Mar 10, 2021 at 5:18 PM Matt Armstrong <matt@rfc20.org> wrote:
> Pip Cet <pipcet@gmail.com> writes:
> > On Mon, Mar 8, 2021 at 3:37 AM Stefan Monnier <monnier@iro.umontreal.ca> 
> > wrote:
> >> I've been running with the code below recently to try and see if
> >> opportunistic GC can be worth the trouble.
> >
> > Just a random idea: What if we exploit the fact most people have more
> > than one CPU core these days, garbage-collect in a separate fork()ed
> > process, then do only the sweeping in the main process?
> >
> > Immediate advantage: we'd never mark objects in the "real" Emacs
> > process, making debugging easier.
> >
> > My implementation proposal would be to pipe(), fork(), run GC, then
> > send through the pipe the Lisp_Objects which can be collected by the
> > original Emacs.
> >
> > For me, it was a bit difficult to see that this would indeed be safe,
> > but I'm now pretty convinced it would be: objects that are unreachable
> > in the child Emacs cannot become reachable in the parent Emacs (they
> > might show up on the stack, but that's a false positive).
> >
> > We'd have to be careful not to run two "real" GC processes at the same
> > time.
>
> Hi Pip,
>
> Neat idea.

Thank you.

> Are there examples of other GC implementations using this approach?

I'm not aware of any, but that doesn't mean much.

> Perhaps this has been tried before, elsewhere, and we could learn a lot
> from that.

I'm certain it has.

> I looked for myself and found
> https://dlang.org/blog/2019/07/22/symmetry-autumn-of-code-experience-report-porting-a-fork-based-gc/
> which talk about this idea in the context of the D run time.  This page
> mentions a different idea: doing the mark pass with multiple threads.
> This might be worth exploring; the two ideas are composable with
> each other.

Absolutely. What they're not easily composable with is a copying
garbage collector, but Emacs might not require one of those as much as
other applications do.

Again, I think this is, most of all, cheap in terms of implementation
time. Not as cheap as I thought initially, I confess, but I'm willing
to argue that this might be a good thing: we need to be clearer about
what's GC and what isn't, and move other clean-up tasks out of the
garbage collector.

There's also an additional problem with running the GC "fully"
asynchronously, which I define as collecting only some of the objects
it returns as unreachable before starting the next cycle. I'm not sure
we ever really want to do that, but I thought it would be nice if we
could. The problem is that A and B may both be unreachable, A might
reference B, but only B might get collected, leaving a dangling
pointer in A. Then A might become maybe-reachable again, by being on
the stack as a false positive, and we end up seeing nasty corruption
bugs. This actually did happen to me while playing with this patch, so
at least I learned my "don't play with snakes" lesson.

That means we need to either assume that the forked thread will always
terminate properly, and abort otherwise, or we need to wait for it to
produce all of its output before starting to act on it. The latter is
problematic because GC should be about freeing memory, not allocating
more memory to begin with. The former is problematic because it makes
things like automated nice level adjustments for the worker process
much more difficult: instead of going "oh, we were being too nice,
kill the thread and start over", we'd have to un-nice it, probably to
a much higher priority, once we conclude it's taking too long.

Pip



reply via email to

[Prev in Thread] Current Thread [Next in Thread]