bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#12471: Avoid some signal-handling races, and simplify.


From: Eli Zaretskii
Subject: bug#12471: Avoid some signal-handling races, and simplify.
Date: Thu, 20 Sep 2012 20:44:51 +0300

> Date: Wed, 19 Sep 2012 23:07:21 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: bug-gnu-emacs@gnu.org, lekktu@gmail.com
> 
> On 09/19/2012 09:18 AM, Eli Zaretskii wrote:
> 
> > Why is it a good idea to replace this macro with a function?
> 
> The function makes it easier to control access to volatile variables,
> that is, to access them in a minimal number of times to lessen the
> number of race conditions.   This requires a local variable.

I don't understand.  The only volatile variable that is involved is
interrupt_input_blocked, which is a global variable.  How does a
function help in this regard?

> >   -  kill (getpid (), fatal_error_code);
> >   +  raise (fatal_error_code);
> >
> > If there are good reasons to prefer 'raise' to 'kill' in this context,
> 
> On POSIX hosts, the change is helpful because it fixes some
> race conditions.  raise (SIGABRT) is equivalent to
> pthread_kill (pthread_self (), SIGABRT), and we do want the
> signal to be delivered to the main thread rather than to
> some other thread (which 'kill' might do).

In that case, can we use 'pthread_kill' as you show above, instead of
'raise'?  I'd like to avoid redefining 'raise' on Windows; see below.

> > can we please special-case SIGABRT here and call 'emacs_abort'
> > directly, at least for hosts that cannot produce the backtrace?
> > Otherwise, assertion violations will not end up in 'emacs_abort',
> > AFAICS.
> 
> Sorry, I'm not following.  By "assertion violations" do you
> mean eassert failure?  If so, currently:
> 
>   eassert fails.
>   'die' prints "Emacs fatal error" with some details.
>   fatal_error_backtrace resets SIGABRT to SIG_DFL
>   fatal_error_backtrace outputs a backtrace if available
>   fatal_error_backtrace unblocks SIGABRT
>   fatal_error_backtrace uses 'kill' with SIGABRT to terminate Emacs
> 
> so emacs_abort is never called.  Under the proposed patch, the
> actions do basically the same thing:
> 
>   eassert fails
>   'die' resets SIGABRT to SIG_DFL
>   'die' prints "Emacs fatal error" with some details.
>   terminate_due_to_signal resets SIGABRT to SIG_DFL
>   terminate_due_to_signal outputs a backtrace if available
>   terminate_due_to_signal unblocks SIGABRT
>   terminate_due_to_signal uses 'raise' with SIGABRT to terminate Emacs
> 
> emacs_abort is not called in either the old code or the new.

It does, on Windows.  (I thought other platforms shared that, but
perhaps I was wrong.)  On Windows, 'kill' is redefined to call
'emacs_abort', when its first arg is Emacs's own PID and the second
arg is SIGABRT.

(Originally, 'die' called 'abort', which on Windows was redirected to
'w32_abort'.  Then calls to 'abort' were changed to call
'emacs_abort', and then 'w32_abort' was renamed to 'emacs_abort'.  But
then, when you made 'die' call fatal_error_backtrace, that made Emacs
on Windows exit silently upon assertion violations.  So I changed
'sys_kill' to call 'emacs_abort' on Windows.)

If you leave 'raise' there, it will invoke 'abort' from the system
library on Windows, which is not what we want.  And I don't want to
redefine yet another library function, if it can be avoided.

Incidentally, I think this feature of producing a backtrace on a fatal
error will be hated by both users and Emacs maintainers, but that's
another subject for another thread.

> >   -  handle_on_main_thread (sig, handle_interrupt_signal);
> >   +  deliver_process_signal (sig, handle_interrupt_signal);
> >    }
> >
> > Do we really need this double indirection?
> 
> Yes, because Gnome/gtk+ threads shouldn't execute handle_interrupt_signal
> at the same time that the Emacs main thread is running.  That signal
> handler doesn't do an atomic action, and if its actions are done in
> parallel with the Emacs main thread, bad things could happen.
> There's commentary to this effect in deliver_process_signal.

We are miscommunicating.  I meant why we need a bunch of functions,
called deliver_FOO_signal, whose body is a single line that calls
handle_on_main_thread (now deliver_process_signal, which is even a
more confusing name, IMO -- the previous one at least hinted at what
it did).  Then handle_on_main_thread does some thread-related magic
and/or calls the _real_ handler for the signal.  Thus "double
indirection": instead of calling the handler, we call a function that
calls another function, which calls the real handler.  (And btw, some
of these handlers are also one-liners: they call
fatal_error_backtrace, now renamed to terminate_due_to_signal, yet
another function.  Can you say "triple indirection"?)

This makes the code hard to understand, especially since these
functions are spread all over in several different source files.

Can we do better than that?  Ideally, there should be a single
function, the signal handler, which does everything.  Then a given
signal will have just two places to look up: the place where you call
sigaction to set up the handler, and another one where the handler is
implemented.

As for the FORWARD_SIGNAL_TO_MAIN_THREAD stuff, you could make each
signal handler invoke a function that does just this part.

Or maybe it would work to set up things so that no non-main threads
ever get any signals -- is there some pthread_* function that can
setup a callback to run each time a thread is created? if so, perhaps
we could use such a callback to block signals in each thread.

If any of that can be done, I think the signal-related code will
really become much easier to study, understand, and maintain.

> > Why is IEEE_FLOATING_POINT, which detects the representation of FP
> > numbers, related to SIGFPE?
> 
> Most hosts nowadays use IEEE floating point, so they do not trap on
> exceptions and don't need an exception handler.

Well, SIGFPE could happen for reasons entirely unrelated to math
calculations and functions.  A hardware problem, for example.

> The exception handler is problematic (it longjmps out of a signal
> handler, which leads to some real problems) and should be omitted
> unless it's really needed.

Given that SIGFPE will not happen due to calculations, how about
changing its handler to call fatal_error_backtrace, rather than signal
a Lisp-level error?





reply via email to

[Prev in Thread] Current Thread [Next in Thread]