Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal

emacs-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal

From:	Eli Zaretskii
Subject:	Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
Date:	Tue, 20 Dec 2022 17:16:18 +0200

> From: Po Lu <luangruo@yahoo.com>
> Cc: emacs-devel@gnu.org
> Date: Tue, 20 Dec 2022 09:39:13 +0800
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> From: Po Lu <luangruo@yahoo.com>
> >> Cc: emacs-devel@gnu.org
> >> Date: Mon, 19 Dec 2022 09:56:40 +0800
> >> 
> >> Eli Zaretskii <eliz@gnu.org> writes:
> >> 
> >> > But AFAIK it is _always_ safe for unblock_input to signal.  When do
> >> > you think it isn't?
> >> 
> >> In almost all of the helper functions where it is called:
> >> tear_down_x_back_buffer, set_up_x_back_buffer, et cetera.
> >
> > I don't think I understand why.  Let's take tear_down_x_back_buffer,
> > for example: why isn't it safe to call unblock_input from it?
> > tear_down_x_back_buffer is called:
> >
> >   . from x_set_inhibit_double_buffering, which is a frame parameter
> >     handler, so it is only called from the Lisp thread
> >   . from x_free_frame_resources, which AFAIU is only called from the
> >     Lisp thread, when we delete a frame
> >
> > So what is the problem here? what am I missing?
> 
> If unblock_input signals inside, then the double buffering state could
> be left inconsistent.  For example, the back buffer picture could be
> freed, but the back buffer itself may be left present, which can lead to
> crashes later on.  That it is called from the only thread present does
> not make it safe.
> 
> More generally, C code does not expect unblock_input to signal!

That is definitely incorrect, not in the general form you are
presenting it.  unblock_input processes input events, and those can
legitimately signal errors.  Other places in C call other functions
that can signal.  C code which cannot allow control to escape it via
non-local exits must use the record_unwind_protect machinery to set up
suitable unwinders.

I very much hope that all your work on new X features and in
particular on the new input events and XInput2 was not based on the
above wrong assumption -- that unblock_input can never signal (or
should never signal).

But we moved far away from the original issue, and are in fact
discussing a very far (and much less important, from my POV) tangent.
Let's not lose our focus, okay?  The original issue was whether we can
have code which calls handle_one_xevent and other similar functions,
which can call unblock_input and thus can signal an error, from
threads other than the main thread, or from callbacks we install that
are then called by toolkits from a non-main thread.  By contrast, the
past several messages were discussing whether unblock_input is allowed
to signal even if called from the main thread.  And that is an
entirely different issue.

It is a different issue because my point is simple: calls to
handle_one_xevent etc. from non-main threads _cannot_ be made safe in
Emacs, no matter what you do.  So it's an absolute no-no.  By
contrast, if we call unblock_input from the main thread, and the code
which calls it doesn't expect signals, all we need is either to
restructure the caller, or use record_unwind_protect.  Thus, even if
you are absolutely right about all of the cases you show below -- and
I'm not convinced you are right in all of them, but even if you are --
even if all of them are indeed unsafe in the face of Lisp errors, all
it means that we must examine them one by one and make them safe by
one of the above mentioned techniques.

So I would like to return to the main issue: calls to
handle_one_xevent from non-main threads.  That must not happen in
Emacs, or else the corresponding build will be fragile and can crash
given the opportune conditions.  If you agree with that, let's discuss
how to make those configurations safer.  If you don't agree, please
explain why.

> Being called from the Lisp thread does not make it safe to signal!

Not by itself, it doesn't.  But it definitely makes it fixable, so
that it _can_ be safe.  By contrast, calling the Lisp machine from
another thread _cannot_ be fixed.  It's a time bomb that will
definitely go off.

> If the last unblock_input signals, match (allocated by FcFontMatch)
> leaks.

Yes, code which doesn't expect errors might leak descriptors or memory
or other resources.  But resource leaks are very rarely fatal, unless
you leak a lot of them: modern machines have enough memory and file
descriptors to let us leak them for many days before disaster
strikes.  By contrast, a single error signaled from the wrong thread
will immediately crash Emacs because it longjmp's to the wrong stack!

> >> > You cannot request that.  note_mouse_highlight examines properties,
> >> > and that can always signal, because properties are Lisp creatures and
> >> > can invoke Lisp.
> >> 
> >> Could you please explain how?
> >
> > Which part is unclear?
> 
> Which properties involve Lisp execution, and why the evaluation of that
> Lisp cannot be put inside internal_catch_all.

You've just seen that: we examine text properties and overlays in a
buffer whose restriction could have changed behind our backs, what
with today's proliferation of calls into Lisp from very deep and
low-level functions.  We call Lisp in keyboard.c and in xdisp.c and in
insdel.c and in window.c.

> > These calls will all have to go, then.  Maybe we should use some
> > simplified, stripped down version of handle_one_xevent, which doesn't
> > invoke processing that can access Lisp data and the global state.  Why
> > would a toolkit menu widget need to call note_mouse_highlight, which
> > is _our_ function that implements _our_ features, about which GTK (or
> > any other toolkit) knows absolutely nothing??
> 
> The toolkit menu mostly demands that Emacs set the cursor to the
> appropriate cursor for whatever the pointer is under.

I cannot parse this: "set the cursor to the appropriate cursor"?

> The X server can also demand Emacs expose parts of the frame at any
> time, due to window configuration changes.

We already handle this, by setting a flag in the SIGIO handler, and
then calling expose_frame when it is safe.  What are the problems you
see here if we queue these events instead of processing them
immediately?

> > If it's impossible to do safely, it will not work.  It already doesn't
> > work in popup menus on w32, for similar reasons.  That's a minor
> > inconvenience to users, and a much smaller catastrophe than making
> > these unsafe calls.
> 
> But it worked since the beginning of 2006 or so, when xmenu.c was made
> to call handle_one_xevent.

"Worked" in the sense that we maybe didn't hear about problems, or
heard too few of them?  That doesn't mean the problems don't exist,
just that people don't come immediately running here and complain.

Besides, PGTK is very young, definitely not around since 2006.  Give
it some time.

> Anyway, the "minor inconvenience" can be avoided by simply wrapping
> note_mouse_highlight in internal_catch_all.  Why do you think that
> would be too harsh?

You cannot use that machinery reliably in a non-main thread anyway.
For starters, you have no idea what kind of stack do those threads
get.  They aren't our threads, so we don't know anything about them.
Using internal_catch_all would mean we'd need to run unwinders, and
there's no guarantee they could be run on those other threads.  For
example, they cannot safely access the state of the Lisp machine,
because they run on other threads asynchronically.

Besides, note_mouse_highlight is called from many places that don't
need any internal_catch_all.  If some caller cannot allow non-local
exits, it should call internal_catch_all by itself.  But that's
another tangent.

> > And I responded saying that there should be no "mouse stuck" because
> > both XI_Motion and XI_Leave will be queued and processed in the order
> > they arrived.  What does this have to do with the safety of calling
> > clear_mouse_face, or lack thereof?
> >
> > I'm saying that there's no need to process these events immediately,
> > when the event calls into the Lisp machine.  We can, and in many cases
> > already do, queue the event and process it a bit later, and in a
> > different, safer, context.
> 
> But then as a result the mouse face will not be dismissed when entering
> a menu.

Again, I don't understand: we don't display mouse-face on menus, only
on buffer text, on the mode line, and on the tool bar (if we implement
it in Emacs, not use the toolkit's tool bar).

> And we will also lose the valuable development feature of using
> mouse-high to determine how wedged a stuck Emacs really is.

Now you lost me.  What is that feature, and how is it related?

> > But it's already "in keyboard.c"!  See the part that begins with
> >
> >   /* Try generating a mouse motion event.  */
> >   else if (some_mouse_moved ())
> >
> > which ends with
> >
> >       if (!NILP (x) && NILP (obj))
> >     obj = make_lispy_movement (f, bar_window, part, x, y, t);
> >
> > Then we process these "lispy movements" as input.
> 
> That isn't related to mouse-highlight, and only happens when track-mouse
> is non-nil.

Those are details.  My point is that we do process _some_ mouse events
in the main loop, and it works.

> > Not normally, no.  But if some Lisp program sets bad text properties
> > or overlays, it could signal.  We can never assume in Emacs that some
> > Lisp machine code never signals.
> 
> Then the simple solution would be to catch those signals inside
> note_mouse_highlight.  What could be the problem with that?

It's wrong, that's the problem.

And again, this is a tangent.  The main issue is that this code cannot
be called from a non-main thread, period.  Let's focus on that and
solve those problems first.  You made me very worried by telling we do
this nonchalantly and for a long time.

[Prev in Thread]

Current Thread

[Next in Thread]

Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Po Lu, 2022/12/18
- Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Eli Zaretskii, 2022/12/19
  - Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Po Lu, 2022/12/19
    - Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Eli Zaretskii <=
    - Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Po Lu, 2022/12/20
    - Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Eli Zaretskii, 2022/12/21
    - Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Po Lu, 2022/12/21
    - Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Eli Zaretskii, 2022/12/22
    - Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Po Lu, 2022/12/22

Prev by Date: Re: [NonGNU ELPA] New package: Denote-Refs
Next by Date: Re: master 09b5f00613: ; Fix calls to treesit functions
Previous by thread: Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
Next by thread: Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
Index(es):
- Date
- Thread