Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal

emacs-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal

From:	Po Lu
Subject:	Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
Date:	Wed, 21 Dec 2022 09:01:01 +0800
User-agent:	Gnus/5.13 (Gnus v5.13)

Eli Zaretskii <eliz@gnu.org> writes:

> That is definitely incorrect, not in the general form you are
> presenting it.  unblock_input processes input events, and those can
> legitimately signal errors.  Other places in C call other functions
> that can signal.  C code which cannot allow control to escape it via
> non-local exits must use the record_unwind_protect machinery to set up
> suitable unwinders.
>
> I very much hope that all your work on new X features and in
> particular on the new input events and XInput2 was not based on the
> above wrong assumption -- that unblock_input can never signal (or
> should never signal).
>
> But we moved far away from the original issue, and are in fact
> discussing a very far (and much less important, from my POV) tangent.
> Let's not lose our focus, okay?  The original issue was whether we can
> have code which calls handle_one_xevent and other similar functions,
> which can call unblock_input and thus can signal an error, from
> threads other than the main thread, or from callbacks we install that
> are then called by toolkits from a non-main thread.  By contrast, the
> past several messages were discussing whether unblock_input is allowed
> to signal even if called from the main thread.  And that is an
> entirely different issue.
>
> It is a different issue because my point is simple: calls to
> handle_one_xevent etc. from non-main threads _cannot_ be made safe in
> Emacs, no matter what you do.  So it's an absolute no-no.  By
> contrast, if we call unblock_input from the main thread, and the code
> which calls it doesn't expect signals, all we need is either to
> restructure the caller, or use record_unwind_protect.  Thus, even if
> you are absolutely right about all of the cases you show below -- and
> I'm not convinced you are right in all of them, but even if you are --
> even if all of them are indeed unsafe in the face of Lisp errors, all
> it means that we must examine them one by one and make them safe by
> one of the above mentioned techniques.
>
> So I would like to return to the main issue: calls to
> handle_one_xevent from non-main threads.  That must not happen in
> Emacs, or else the corresponding build will be fragile and can crash
> given the opportune conditions.  If you agree with that, let's discuss
> how to make those configurations safer.  If you don't agree, please
> explain why.
>
>> Being called from the Lisp thread does not make it safe to signal!
>
> Not by itself, it doesn't.  But it definitely makes it fixable, so
> that it _can_ be safe.  By contrast, calling the Lisp machine from
> another thread _cannot_ be fixed.  It's a time bomb that will
> definitely go off.
>
>> If the last unblock_input signals, match (allocated by FcFontMatch)
>> leaks.
>
> Yes, code which doesn't expect errors might leak descriptors or memory
> or other resources.  But resource leaks are very rarely fatal, unless
> you leak a lot of them: modern machines have enough memory and file
> descriptors to let us leak them for many days before disaster
> strikes.  By contrast, a single error signaled from the wrong thread
> will immediately crash Emacs because it longjmp's to the wrong stack!
>
>> >> > You cannot request that.  note_mouse_highlight examines properties,
>> >> > and that can always signal, because properties are Lisp creatures and
>> >> > can invoke Lisp.
>> >> 
>> >> Could you please explain how?
>> >
>> > Which part is unclear?
>> 
>> Which properties involve Lisp execution, and why the evaluation of that
>> Lisp cannot be put inside internal_catch_all.
>
> You've just seen that: we examine text properties and overlays in a
> buffer whose restriction could have changed behind our backs, what
> with today's proliferation of calls into Lisp from very deep and
> low-level functions.  We call Lisp in keyboard.c and in xdisp.c and in
> insdel.c and in window.c.
>
>> > These calls will all have to go, then.  Maybe we should use some
>> > simplified, stripped down version of handle_one_xevent, which doesn't
>> > invoke processing that can access Lisp data and the global state.  Why
>> > would a toolkit menu widget need to call note_mouse_highlight, which
>> > is _our_ function that implements _our_ features, about which GTK (or
>> > any other toolkit) knows absolutely nothing??
>> 
>> The toolkit menu mostly demands that Emacs set the cursor to the
>> appropriate cursor for whatever the pointer is under.
>
> I cannot parse this: "set the cursor to the appropriate cursor"?
>
>> The X server can also demand Emacs expose parts of the frame at any
>> time, due to window configuration changes.
>
> We already handle this, by setting a flag in the SIGIO handler, and
> then calling expose_frame when it is safe.  What are the problems you
> see here if we queue these events instead of processing them
> immediately?
>
>> > If it's impossible to do safely, it will not work.  It already doesn't
>> > work in popup menus on w32, for similar reasons.  That's a minor
>> > inconvenience to users, and a much smaller catastrophe than making
>> > these unsafe calls.
>> 
>> But it worked since the beginning of 2006 or so, when xmenu.c was made
>> to call handle_one_xevent.
>
> "Worked" in the sense that we maybe didn't hear about problems, or
> heard too few of them?  That doesn't mean the problems don't exist,
> just that people don't come immediately running here and complain.
>
> Besides, PGTK is very young, definitely not around since 2006.  Give
> it some time.
>
>> Anyway, the "minor inconvenience" can be avoided by simply wrapping
>> note_mouse_highlight in internal_catch_all.  Why do you think that
>> would be too harsh?
>
> You cannot use that machinery reliably in a non-main thread anyway.
> For starters, you have no idea what kind of stack do those threads
> get.  They aren't our threads, so we don't know anything about them.
> Using internal_catch_all would mean we'd need to run unwinders, and
> there's no guarantee they could be run on those other threads.  For
> example, they cannot safely access the state of the Lisp machine,
> because they run on other threads asynchronically.
>
> Besides, note_mouse_highlight is called from many places that don't
> need any internal_catch_all.  If some caller cannot allow non-local
> exits, it should call internal_catch_all by itself.  But that's
> another tangent.
>
>> > And I responded saying that there should be no "mouse stuck" because
>> > both XI_Motion and XI_Leave will be queued and processed in the order
>> > they arrived.  What does this have to do with the safety of calling
>> > clear_mouse_face, or lack thereof?
>> >
>> > I'm saying that there's no need to process these events immediately,
>> > when the event calls into the Lisp machine.  We can, and in many cases
>> > already do, queue the event and process it a bit later, and in a
>> > different, safer, context.
>> 
>> But then as a result the mouse face will not be dismissed when entering
>> a menu.
>
> Again, I don't understand: we don't display mouse-face on menus, only
> on buffer text, on the mode line, and on the tool bar (if we implement
> it in Emacs, not use the toolkit's tool bar).
>
>> And we will also lose the valuable development feature of using
>> mouse-high to determine how wedged a stuck Emacs really is.
>
> Now you lost me.  What is that feature, and how is it related?
>
>> > But it's already "in keyboard.c"!  See the part that begins with
>> >
>> >   /* Try generating a mouse motion event.  */
>> >   else if (some_mouse_moved ())
>> >
>> > which ends with
>> >
>> >       if (!NILP (x) && NILP (obj))
>> >    obj = make_lispy_movement (f, bar_window, part, x, y, t);
>> >
>> > Then we process these "lispy movements" as input.
>> 
>> That isn't related to mouse-highlight, and only happens when track-mouse
>> is non-nil.
>
> Those are details.  My point is that we do process _some_ mouse events
> in the main loop, and it works.
>
>> > Not normally, no.  But if some Lisp program sets bad text properties
>> > or overlays, it could signal.  We can never assume in Emacs that some
>> > Lisp machine code never signals.
>> 
>> Then the simple solution would be to catch those signals inside
>> note_mouse_highlight.  What could be the problem with that?
>
> It's wrong, that's the problem.
>
> And again, this is a tangent.  The main issue is that this code cannot
> be called from a non-main thread, period.  Let's focus on that and
> solve those problems first.  You made me very worried by telling we do
> this nonchalantly and for a long time.

There is some kind of misunderstanding here.  You seem to be talking
about calling Lisp from another thread, which is a big no-no in my book
as well.  However, the problem is nowhere near as drastic.

Under the GTK builds, there is only a single thread.  The event loop
runs from the main thread.  Those calls to note_mouse_highlight *are*
being done from the main thread.  The problem is that it is unsafe to
signal in the main thread when handle_one_xevent is being called by
GLib, because GLib does the equivalent of this:

  static bool inside_callback;

  assert (!inside_callback);
  inside_callback = true;
  [call handle_one_xevent]
  inside_callback = false;

If handle_one_xevent signals, then inside_callback will never be set to
false.  As a result, the next time GLib enters its own event dispatch
code, it will abort, leading to the crash seen here.

No second thread is involved!

[Prev in Thread]

Current Thread

[Next in Thread]

Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Po Lu, 2022/12/18
- Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Eli Zaretskii, 2022/12/19
  - Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Po Lu, 2022/12/19
    - Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Eli Zaretskii, 2022/12/20
    - Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Po Lu <=
    - Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Eli Zaretskii, 2022/12/21
    - Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Po Lu, 2022/12/21
    - Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Eli Zaretskii, 2022/12/22
    - Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Po Lu, 2022/12/22

Prev by Date: Re: master 09b5f00613: ; Fix calls to treesit functions
Next by Date: Re: scratch/comp-static-data ec88bbd1bf 7/9: Correctly build builtin syms string while hashing abi.
Previous by thread: Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
Next by thread: Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
Index(es):
- Date
- Thread