bug#39962: 27.0.90; Crash in Emacs 27.0.90

bug-gnu-emacs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#39962: 27.0.90; Crash in Emacs 27.0.90

From:	Eli Zaretskii
Subject:	bug#39962: 27.0.90; Crash in Emacs 27.0.90
Date:	Fri, 13 Mar 2020 11:39:54 +0200

> From: Pip Cet <pipcet@gmail.com>
> Date: Thu, 12 Mar 2020 20:36:10 +0000
> Cc: pieter-l@vanoostrum.org, 39962@debbugs.gnu.org, eggert@cs.ucla.edu
> 
> > > It doesn't affect visible behavior of any callers, except in the case
> > > where the previous behavior was buggy.
> >
> > I guess we have different notions of "visible"
> 
> Please say something about your notion of "visible". It doesn't affect
> any of the existing C callers of valid_lisp_object_p. Are you talking
> about printing valid_lisp_object_p(x) in a debugger, and not getting
> the expected value? Or something else?

I'm talking about the behavior documented in the commentary.

> > and "buggy".
> 
> It avoids segfaults or random memory corruption. How is that not "buggy"?

That's not the issue here.  You said the proposed change didn't change
the behavior "except where it was buggy"; I'm saying that it changes
the behavior unrelated to this bug, where previous behavior was not
buggy by any measure.

> > > I'm most certainly not changing the semantics of live_buffer, if
> > > that's what you're worried about. I am changing the semantics of
> > > live_buffer_p, which is an internal function, and my initial patch
> > > also changed the return value of valid_lisp_object_p, to another value
> > > that would be treated equivalently. If there are objections to that,
> > > we can easily distinguish the two cases.
> >
> > I actually don't understand why we need to make such a change.
> 
> Which change? Treating the two cases differently? Because the garbage
> collector needs to know whether an object is reachable, not whether
> it's still a live buffer.

So why does it not consider the buffer reachable in this case?  The
call to live_buffer_p is just one attempt to identify it as reachable;
there are (or at least should be) others.

> valid_lisp_object_p is currently documented to return 2 for a killed
> buffer and 1 for a live buffer, which is weird since they're both
> valid. It also returns 1 for some fake objects which aren't actually
> valid:
> 
> (gdb) p current_thread->m_current_buffer
> $3 = (struct buffer *) 0x555556694b10
> (gdb) p valid_lisp_object_p(0x555556694b15)
> $4 = 1
> (gdb) p valid_lisp_object_p(0x555556694b25)
> $5 = 1

Why do you consider this incorrect?  The Emacs GC is "conservative",
which means it doesn't collect anything that _might_ be a valid Lisp
object.  In what ways does the above violate that contract?

> If a buffer has been killed but is reachable only through
> mark_maybe_object, we fail to mark it.
> 
> We should mark it. In fact, whether a buffer object is marked should
> depend only on whether it's reachable, not whether it's "live" in some
> other sense.
> 
> That's all my patch does.

Your patch modifies the notion of whether a buffer is "live", on the
assumption that this is the root cause of the failure to mark it.  But
do we have any evidence that this is the root cause?  Because if not,
your patch might just be a band-aid, and the real root cause will
still be out there, even if we apply the patch.

Moreover, by disregarding the indication of a killed buffer, doesn't
your patch cause us not to GC killed buffers even though they are
unreachable, or at least create a danger that we would?  Buffers are
objects that are created and killed a lot in any Emacs session, so
failing to GC them would mean memory leaks.

The way to understand what happened in your test case is to figure out
how come the buffer was not found to be reachable via any other
approach the GC makes.  For example, shouldn't we have this buffer
somewhere on the stack? if so, how come stack marking didn't find it?
And if we don't have it on the stack, why not?

> > > > The problem you are trying to solve is rare
> > >
> > > I think it would become much less rare with lexical binding in effect,
> > > at least when the code's byte-compiled.
> >
> > That remains to be seen.
> 
> How about we put out the fire rather than waiting to see whether it
> causes any damage?

The disagreement is whether there's fire, not whether we should put it
out if there is.  You've shown that you can start a fire if you want,
but not that the fire is already out there, burning.  E.g., I see no
reason for some Lisp program to do what your test case does, it simply
makes no sense.

> And, if we can agree to do so, what would you like a patch which is
> actually meant for inclusion into the emacs-27 branch (my previous
> patch wasn't, obviously) to look like?

If it isn't clear, I'm saying that your proposed patch is not
necessarily TRT for master, either.  I'd like to see more analysis of
what exactly happens in that case, and why, along the above-mentioned
lines, before I make up my mind.

> > > > since this code was with us since 20 years ago without
> > > > anyone bumping into it,
> > >
> > > That we know of. They might have just accrued it to random Emacs crashes.
> >
> > Then again, they might not.  We don't really have any evidence to that
> > effect, all we know is that the code survived virtually intact since
> > the day it was written.
> 
> I have no idea what you're trying to get at here.

I'm saying that we have no evidence on which to base the arguments, so
I suggest to drop this part of the dispute as counter-productive.

[Prev in Thread]

Current Thread

[Next in Thread]

bug#39962: 27.0.90; Crash in Emacs 27.0.90, (continued)

Prev by Date: bug#36672: 27.0.50; NS build: Creating child frame leads to empty space
Next by Date: bug#39911: 27.0.90; error in wisent-total-conflicts if no source filename
Previous by thread: bug#39962: 27.0.90; Crash in Emacs 27.0.90
Next by thread: bug#39962: 27.0.90; Crash in Emacs 27.0.90
Index(es):
- Date
- Thread