Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal

emacs-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal

From:	Po Lu
Subject:	Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
Date:	Tue, 20 Dec 2022 09:39:13 +0800
User-agent:	Gnus/5.13 (Gnus v5.13)

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Po Lu <luangruo@yahoo.com>
>> Cc: emacs-devel@gnu.org
>> Date: Mon, 19 Dec 2022 09:56:40 +0800
>> 
>> Eli Zaretskii <eliz@gnu.org> writes:
>> 
>> > But AFAIK it is _always_ safe for unblock_input to signal.  When do
>> > you think it isn't?
>> 
>> In almost all of the helper functions where it is called:
>> tear_down_x_back_buffer, set_up_x_back_buffer, et cetera.
>
> I don't think I understand why.  Let's take tear_down_x_back_buffer,
> for example: why isn't it safe to call unblock_input from it?
> tear_down_x_back_buffer is called:
>
>   . from x_set_inhibit_double_buffering, which is a frame parameter
>     handler, so it is only called from the Lisp thread
>   . from x_free_frame_resources, which AFAIU is only called from the
>     Lisp thread, when we delete a frame
>
> So what is the problem here? what am I missing?

If unblock_input signals inside, then the double buffering state could
be left inconsistent.  For example, the back buffer picture could be
freed, but the back buffer itself may be left present, which can lead to
crashes later on.  That it is called from the only thread present does
not make it safe.

More generally, C code does not expect unblock_input to signal!  See
this piece of call_process in callproc.c:

        {
          if (!specpdl_ref_valid_p (tempfile_index))
            record_deleted_pid (pid, Qnil);
          else
            {
              eassert (1 < nargs);
              record_deleted_pid (pid, args[1]);
              clear_unwind_protect (tempfile_index);
            }
          synch_process_pid = 0;
        }
    }

  unblock_child_signal (&oldset);
  unblock_input (); <=======

  if (pid < 0)
    report_file_errno (CHILD_SETUP_ERROR_DESC, Qnil, child_errno);

  /* Close our file descriptors, except for callproc_fd[CALLPROC_PIPEREAD]
     since we will use that to read input from.  */
  for (i = 0; i < CALLPROC_FDS; i++)
    if (i != CALLPROC_PIPEREAD && 0 <= callproc_fd[i])
      {
        emacs_close (callproc_fd[i]);
        callproc_fd[i] = -1;
      }
  emacs_close (filefd);
  clear_unwind_protect (specpdl_ref_add (count, -1));

If unblock_input signals, filefd will not be closed.

Or this piece of x_create_bitmap_mask in image.c:

  pixmap = image_bitmap_pixmap (f, id);
  width = x_bitmap_width (f, id);
  height = x_bitmap_height (f, id);

  block_input ();
  ximg = XGetImage (FRAME_X_DISPLAY (f), pixmap, 0, 0, width, height,
                    ~0, ZPixmap);

  if (!ximg)
    {
      unblock_input ();
      return;
    }

  result = x_create_x_image_and_pixmap (f, width, height, 1, &mask_img, &mask);

  unblock_input (); <======================
  if (!result)
    {
      XDestroyImage (ximg);
      return;
    }

If unblock_input signals, then ximg (which is potentially a lot of data)
leaks.

> But unblock_input is only called from the Lisp thread, at least in the
> w32 build.  And it should only be called from the Lisp thread.  It
> cannot be safely called from any other thread.

Being called from the Lisp thread does not make it safe to signal!
Consider the following example code:

  int fd;
  char *buffer;

  fd = emacs_open (...);

  if (fd == -1)
    return;

  buffer = xmalloc (100);

  block_input ();
  do_something_with_buffer_and_fd (fd, buffer);
  unblock_input ();

  /* more activity.  */
  close (fd);
  xfree (buffer);

Notice how fd and bufferare not closed if unblock_input signals.  Now
look at ftcrfont_open:

  block_input ();

  pat = ftfont_entity_pattern (entity, pixel_size);
  FcConfigSubstitute (NULL, pat, FcMatchPattern);
  FcDefaultSubstitute (pat);
  match = FcFontMatch (NULL, pat, &result);
  ftfont_fix_match (pat, match);

  FcPatternDestroy (pat);
  font_face = cairo_ft_font_face_create_for_pattern (match);
  if (!font_face
      || cairo_font_face_status (font_face) != CAIRO_STATUS_SUCCESS)
    {
      unblock_input ();
      FcPatternDestroy (match);
      return Qnil;
    }
  cairo_matrix_t font_matrix, ctm;
  cairo_matrix_init_scale (&font_matrix, pixel_size, pixel_size);
  cairo_matrix_init_identity (&ctm);

#ifdef HAVE_PGTK
  cairo_font_options_t *options = xsettings_get_font_options ();
#else
  cairo_font_options_t *options = cairo_font_options_create ();
#endif
#ifdef USE_BE_CAIRO
  if (be_use_subpixel_antialiasing ())
    cairo_font_options_set_antialias (options, CAIRO_ANTIALIAS_SUBPIXEL);
#endif
  cairo_scaled_font_t *scaled_font
    = cairo_scaled_font_create (font_face, &font_matrix, &ctm, options);
  cairo_font_face_destroy (font_face);
  cairo_font_options_destroy (options);
  unblock_input ();

If the last unblock_input signals, match (allocated by FcFontMatch)
leaks.

>> > You cannot request that.  note_mouse_highlight examines properties,
>> > and that can always signal, because properties are Lisp creatures and
>> > can invoke Lisp.
>> 
>> Could you please explain how?
>
> Which part is unclear?

Which properties involve Lisp execution, and why the evaluation of that
Lisp cannot be put inside internal_catch_all.

> These calls will all have to go, then.  Maybe we should use some
> simplified, stripped down version of handle_one_xevent, which doesn't
> invoke processing that can access Lisp data and the global state.  Why
> would a toolkit menu widget need to call note_mouse_highlight, which
> is _our_ function that implements _our_ features, about which GTK (or
> any other toolkit) knows absolutely nothing??

The toolkit menu mostly demands that Emacs set the cursor to the
appropriate cursor for whatever the pointer is under.  The X server can
also demand Emacs expose parts of the frame at any time, due to window
configuration changes.

> If it's impossible to do safely, it will not work.  It already doesn't
> work in popup menus on w32, for similar reasons.  That's a minor
> inconvenience to users, and a much smaller catastrophe than making
> these unsafe calls.

But it worked since the beginning of 2006 or so, when xmenu.c was made
to call handle_one_xevent.  Anyway, the "minor inconvenience" can be
avoided by simply wrapping note_mouse_highlight in internal_catch_all.
Why do you think that would be too harsh?

> I don't see how this is related, sorry.  You asked:
>
>> For example, if an XI_Motion event arrives and is queued, and
>> then a subsequent XI_Leave event arrives before that event has a chance
>> to be processed ``in our own safe context'', note_mouse_highlight will
>> be called after the mouse has left the frame, leading to stuck mouse
>> highlight.
>
> And I responded saying that there should be no "mouse stuck" because
> both XI_Motion and XI_Leave will be queued and processed in the order
> they arrived.  What does this have to do with the safety of calling
> clear_mouse_face, or lack thereof?
>
> I'm saying that there's no need to process these events immediately,
> when the event calls into the Lisp machine.  We can, and in many cases
> already do, queue the event and process it a bit later, and in a
> different, safer, context.

But then as a result the mouse face will not be dismissed when entering
a menu.  And we will also lose the valuable development feature of using
mouse-high to determine how wedged a stuck Emacs really is.

> But it's already "in keyboard.c"!  See the part that begins with
>
>   /* Try generating a mouse motion event.  */
>   else if (some_mouse_moved ())
>
> which ends with
>
>       if (!NILP (x) && NILP (obj))
>       obj = make_lispy_movement (f, bar_window, part, x, y, t);
>
> Then we process these "lispy movements" as input.

That isn't related to mouse-highlight, and only happens when track-mouse
is non-nil.  track-mouse is nil because it requires calling
XTmouse_position on all mouse movement, which is a very expensive
operation (with 15 km between me and the X server, it takes 90
milliseconds!)

> Not normally, no.  But if some Lisp program sets bad text properties
> or overlays, it could signal.  We can never assume in Emacs that some
> Lisp machine code never signals.

Then the simple solution would be to catch those signals inside
note_mouse_highlight.  What could be the problem with that?  We could
even log the signals to *Messages*, like with-demoted-errors.

[Prev in Thread]

Current Thread

[Next in Thread]

Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Po Lu, 2022/12/18
- Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Eli Zaretskii, 2022/12/19
  - Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Po Lu <=
    - Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Eli Zaretskii, 2022/12/20
    - Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Po Lu, 2022/12/20
    - Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Eli Zaretskii, 2022/12/21
    - Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Po Lu, 2022/12/21
    - Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Eli Zaretskii, 2022/12/22
    - Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal, Po Lu, 2022/12/22

Prev by Date: Re: master 09b5f00613: ; Fix calls to treesit functions
Next by Date: Re: Adding the `prescient` packages to NonGNU ELPA?
Previous by thread: Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
Next by thread: Re: bug#60144: 30.0.50; PGTK Emacs crashes after signal
Index(es):
- Date
- Thread