[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
From: |
Po Lu |
Subject: |
Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal |
Date: |
Wed, 21 Dec 2022 09:01:01 +0800 |
User-agent: |
Gnus/5.13 (Gnus v5.13) |
Eli Zaretskii <eliz@gnu.org> writes:
> That is definitely incorrect, not in the general form you are
> presenting it. unblock_input processes input events, and those can
> legitimately signal errors. Other places in C call other functions
> that can signal. C code which cannot allow control to escape it via
> non-local exits must use the record_unwind_protect machinery to set up
> suitable unwinders.
>
> I very much hope that all your work on new X features and in
> particular on the new input events and XInput2 was not based on the
> above wrong assumption -- that unblock_input can never signal (or
> should never signal).
>
> But we moved far away from the original issue, and are in fact
> discussing a very far (and much less important, from my POV) tangent.
> Let's not lose our focus, okay? The original issue was whether we can
> have code which calls handle_one_xevent and other similar functions,
> which can call unblock_input and thus can signal an error, from
> threads other than the main thread, or from callbacks we install that
> are then called by toolkits from a non-main thread. By contrast, the
> past several messages were discussing whether unblock_input is allowed
> to signal even if called from the main thread. And that is an
> entirely different issue.
>
> It is a different issue because my point is simple: calls to
> handle_one_xevent etc. from non-main threads _cannot_ be made safe in
> Emacs, no matter what you do. So it's an absolute no-no. By
> contrast, if we call unblock_input from the main thread, and the code
> which calls it doesn't expect signals, all we need is either to
> restructure the caller, or use record_unwind_protect. Thus, even if
> you are absolutely right about all of the cases you show below -- and
> I'm not convinced you are right in all of them, but even if you are --
> even if all of them are indeed unsafe in the face of Lisp errors, all
> it means that we must examine them one by one and make them safe by
> one of the above mentioned techniques.
>
> So I would like to return to the main issue: calls to
> handle_one_xevent from non-main threads. That must not happen in
> Emacs, or else the corresponding build will be fragile and can crash
> given the opportune conditions. If you agree with that, let's discuss
> how to make those configurations safer. If you don't agree, please
> explain why.
>
>> Being called from the Lisp thread does not make it safe to signal!
>
> Not by itself, it doesn't. But it definitely makes it fixable, so
> that it _can_ be safe. By contrast, calling the Lisp machine from
> another thread _cannot_ be fixed. It's a time bomb that will
> definitely go off.
>
>> If the last unblock_input signals, match (allocated by FcFontMatch)
>> leaks.
>
> Yes, code which doesn't expect errors might leak descriptors or memory
> or other resources. But resource leaks are very rarely fatal, unless
> you leak a lot of them: modern machines have enough memory and file
> descriptors to let us leak them for many days before disaster
> strikes. By contrast, a single error signaled from the wrong thread
> will immediately crash Emacs because it longjmp's to the wrong stack!
>
>> >> > You cannot request that. note_mouse_highlight examines properties,
>> >> > and that can always signal, because properties are Lisp creatures and
>> >> > can invoke Lisp.
>> >>
>> >> Could you please explain how?
>> >
>> > Which part is unclear?
>>
>> Which properties involve Lisp execution, and why the evaluation of that
>> Lisp cannot be put inside internal_catch_all.
>
> You've just seen that: we examine text properties and overlays in a
> buffer whose restriction could have changed behind our backs, what
> with today's proliferation of calls into Lisp from very deep and
> low-level functions. We call Lisp in keyboard.c and in xdisp.c and in
> insdel.c and in window.c.
>
>> > These calls will all have to go, then. Maybe we should use some
>> > simplified, stripped down version of handle_one_xevent, which doesn't
>> > invoke processing that can access Lisp data and the global state. Why
>> > would a toolkit menu widget need to call note_mouse_highlight, which
>> > is _our_ function that implements _our_ features, about which GTK (or
>> > any other toolkit) knows absolutely nothing??
>>
>> The toolkit menu mostly demands that Emacs set the cursor to the
>> appropriate cursor for whatever the pointer is under.
>
> I cannot parse this: "set the cursor to the appropriate cursor"?
>
>> The X server can also demand Emacs expose parts of the frame at any
>> time, due to window configuration changes.
>
> We already handle this, by setting a flag in the SIGIO handler, and
> then calling expose_frame when it is safe. What are the problems you
> see here if we queue these events instead of processing them
> immediately?
>
>> > If it's impossible to do safely, it will not work. It already doesn't
>> > work in popup menus on w32, for similar reasons. That's a minor
>> > inconvenience to users, and a much smaller catastrophe than making
>> > these unsafe calls.
>>
>> But it worked since the beginning of 2006 or so, when xmenu.c was made
>> to call handle_one_xevent.
>
> "Worked" in the sense that we maybe didn't hear about problems, or
> heard too few of them? That doesn't mean the problems don't exist,
> just that people don't come immediately running here and complain.
>
> Besides, PGTK is very young, definitely not around since 2006. Give
> it some time.
>
>> Anyway, the "minor inconvenience" can be avoided by simply wrapping
>> note_mouse_highlight in internal_catch_all. Why do you think that
>> would be too harsh?
>
> You cannot use that machinery reliably in a non-main thread anyway.
> For starters, you have no idea what kind of stack do those threads
> get. They aren't our threads, so we don't know anything about them.
> Using internal_catch_all would mean we'd need to run unwinders, and
> there's no guarantee they could be run on those other threads. For
> example, they cannot safely access the state of the Lisp machine,
> because they run on other threads asynchronically.
>
> Besides, note_mouse_highlight is called from many places that don't
> need any internal_catch_all. If some caller cannot allow non-local
> exits, it should call internal_catch_all by itself. But that's
> another tangent.
>
>> > And I responded saying that there should be no "mouse stuck" because
>> > both XI_Motion and XI_Leave will be queued and processed in the order
>> > they arrived. What does this have to do with the safety of calling
>> > clear_mouse_face, or lack thereof?
>> >
>> > I'm saying that there's no need to process these events immediately,
>> > when the event calls into the Lisp machine. We can, and in many cases
>> > already do, queue the event and process it a bit later, and in a
>> > different, safer, context.
>>
>> But then as a result the mouse face will not be dismissed when entering
>> a menu.
>
> Again, I don't understand: we don't display mouse-face on menus, only
> on buffer text, on the mode line, and on the tool bar (if we implement
> it in Emacs, not use the toolkit's tool bar).
>
>> And we will also lose the valuable development feature of using
>> mouse-high to determine how wedged a stuck Emacs really is.
>
> Now you lost me. What is that feature, and how is it related?
>
>> > But it's already "in keyboard.c"! See the part that begins with
>> >
>> > /* Try generating a mouse motion event. */
>> > else if (some_mouse_moved ())
>> >
>> > which ends with
>> >
>> > if (!NILP (x) && NILP (obj))
>> > obj = make_lispy_movement (f, bar_window, part, x, y, t);
>> >
>> > Then we process these "lispy movements" as input.
>>
>> That isn't related to mouse-highlight, and only happens when track-mouse
>> is non-nil.
>
> Those are details. My point is that we do process _some_ mouse events
> in the main loop, and it works.
>
>> > Not normally, no. But if some Lisp program sets bad text properties
>> > or overlays, it could signal. We can never assume in Emacs that some
>> > Lisp machine code never signals.
>>
>> Then the simple solution would be to catch those signals inside
>> note_mouse_highlight. What could be the problem with that?
>
> It's wrong, that's the problem.
>
> And again, this is a tangent. The main issue is that this code cannot
> be called from a non-main thread, period. Let's focus on that and
> solve those problems first. You made me very worried by telling we do
> this nonchalantly and for a long time.
There is some kind of misunderstanding here. You seem to be talking
about calling Lisp from another thread, which is a big no-no in my book
as well. However, the problem is nowhere near as drastic.
Under the GTK builds, there is only a single thread. The event loop
runs from the main thread. Those calls to note_mouse_highlight *are*
being done from the main thread. The problem is that it is unsafe to
signal in the main thread when handle_one_xevent is being called by
GLib, because GLib does the equivalent of this:
static bool inside_callback;
assert (!inside_callback);
inside_callback = true;
[call handle_one_xevent]
inside_callback = false;
If handle_one_xevent signals, then inside_callback will never be set to
false. As a result, the next time GLib enters its own event dispatch
code, it will abort, leading to the crash seen here.
No second thread is involved!
- Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Po Lu, 2022/12/18
- Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Eli Zaretskii, 2022/12/19
- Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Po Lu, 2022/12/19
- Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Eli Zaretskii, 2022/12/20
- Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal,
Po Lu <=
- Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Eli Zaretskii, 2022/12/21
- Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Po Lu, 2022/12/21
- Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Eli Zaretskii, 2022/12/22
- Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Po Lu, 2022/12/22