bug-gnu-emacs

bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x


From: Eli Zaretskii
Subject: bug#57789: Emacs 28.1 clone build with native compilation crashes on s390x
Date: Thu, 15 Sep 2022 10:10:59 +0300

> From: Rob Browning <rlb@defaultvalue.org>
> Cc: 57789@debbugs.gnu.org
> Date: Wed, 14 Sep 2022 15:19:24 -0500
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Please run the crashing command under GDB, and when it segfaults,
> > produce the C-level and Lisp-level backtrace, and post them here.
> 
> Starting from scratch with the emacs-28.1 commit I can reproduce the
> failure when building via
> 
>   ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation
> 
> It crashes with the same segfault repeatably, i.e. if you run make
> again, it crashes again on the previously mentioned "... -l comp -f
> batch-byte+native-compile international/titdic-cnv.el" invocation.  That
> crash output is attached below.
> 
> After adjusting the Makefile.in invocation so I could run it with gdb in
> exactly the same environment once it's failing on that command, I
> captured the backtrace and included it below.

Thanks.  The backtrace indicates that the crash is in GC.  This
probably means we have some fundamental problem on that architecture.
Andrea, any advice for how to investigate?

Does the build of the same code with the same options sans
"--with-native-compilation" succeed, or does it also crash with
similar symptoms?  If the build without native-compilation succeeds,
my first question would be how mature and stable libgccjit is on that
platform.  Perhaps take this up with GCC's libgccjit developers.

> With respect to the Lisp-level backtrace, I imagined you probably meant
> an xbacktrace?  If so (and assuming I'm guessing right about how I
> should do that), I haven't figured out how to arrange sourcing the
> src/.gdbinit from the src/Makefile.in command.

You can source it manually from the GDB prompt, when the segfault
happens, and then invoke xbacktrace manually, can't you?
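
A minimal sketch of that sequence, wrapped in a shell function so it
can be pasted in place of the failing command.  The function name
run_under_gdb and the GDB override variable are hypothetical and not
part of Emacs; the only things taken from this thread are src/.gdbinit
and the xbacktrace command it defines.  Without debug symbols the
output may well be incomplete.

```shell
# Hypothetical helper (not part of Emacs): run a command under GDB and,
# once the program stops (e.g. on SIGSEGV), source Emacs's src/.gdbinit
# and print both the C-level and the Lisp-level backtrace.
#
# Usage: run_under_gdb /path/to/emacs/src COMMAND [ARGS...]
# The GDB variable is an assumption of this sketch; it only exists so
# that a different debugger binary can be substituted.
run_under_gdb() {
  srcdir=$1
  shift
  "${GDB:-gdb}" \
    -ex run \
    -ex "source $srcdir/.gdbinit" \
    -ex bt \
    -ex xbacktrace \
    --args "$@"
}
```

GDB stays at its prompt after the -ex commands have run, so 'ptr' and
the registers can still be inspected by hand afterwards.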

> It looked like it might be because there were no debug symbols, so I
> tried adding a CFLAGS=-g3 to the end of the ./configure, but that caused
> the crash to disappear entirely.

Too bad, it means we have a heisenbug on our hands, which will make it
even harder to debug (as if debugging crashes in GC were not hard
enough already).

What happens if you modify this variable:

  (defcustom native-comp-debug (if (eq 'windows-nt system-type) 1 0)

to have the value 1 or even zero, and then rebuild from scratch?  Does
the build succeed then?

> Finally (and this was just a random guess based on previous experiences,
> particularly with programs like guile that play (normal, traditional)
> tricks with pointers/coercions/etc.) I noticed that emacs doesn't
> specify -fno-strict-aliasing, and unless all the C code has been written
> with that in mind, I assume that might open a window allowing the
> optimizer to introduce undesirable changes.  So I added a
> CFLAGS=-fno-strict-aliasing to the end of the ./configure command, and
> then the build and tests worked fine (twice in a row):
> 
>   ./configure --prefix=/home/rlb/opt/emacs-tmp --with-native-compilation \
>     CFLAGS=-fno-strict-aliasing
> 
> Of course that's not remotely conclusive, but if all of the C code
> wasn't written with strict-aliasing in mind, then I wondered if it might
> make sense to consider adding -fno-strict-aliasing as a default option.

I don't know enough about this.  Perhaps Andrea or Paul could comment.

> Also, even if that ends up being desirable, I'm not sure it'll be
> sufficient.  That is, I suspect I might want to run the full build/check
> with -fno-strict-aliasing in a loop for a bit to make sure the clean
> build/check is reliable, since I think I may have seen some test crashes
> (not the build crash) on one earlier run with that option, but I'm not
> sure that was a clean attempt.

Yes, running the full test suite would be the logical next step.
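
One way to run the build/check in a loop, as a rough sketch only.  The
helper name repeat_until_fail is hypothetical; the configure options
in the usage comment are the ones quoted above, and using "make
distclean" to get back to a clean tree is an assumption (a fresh clone
per iteration would be a stricter test).

```shell
# Hypothetical helper: run COMMAND up to N times and stop at the first
# failure, so an intermittent crash is not hidden by later good runs.
#
# Usage: repeat_until_fail N COMMAND [ARGS...]
repeat_until_fail() {
  n=$1
  shift
  i=1
  while [ "$i" -le "$n" ]; do
    echo "=== run $i of $n ===" >&2
    if ! "$@"; then
      echo "failed on run $i" >&2
      return 1
    fi
    i=$((i + 1))
  done
  echo "all $n runs passed" >&2
}

# Example (not executed here; assumes the current directory is the
# Emacs source tree and that it has already been configured once):
#
#   repeat_until_fail 10 sh -c '
#     make distclean &&
#     ./configure --prefix=/home/rlb/opt/emacs-tmp \
#       --with-native-compilation CFLAGS=-fno-strict-aliasing &&
#     make && make check'
```

Stopping at the first failure keeps the failing tree around for
inspection instead of overwriting it with the next iteration.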

> Program received signal SIGSEGV, Segmentation fault.
> mark_object (arg=<optimized out>) at alloc.c:6809
> 6809            if (symbol_marked_p (ptr))
> (gdb) backtrace
> #0  mark_object (arg=<optimized out>) at alloc.c:6809

Any idea what caused the SIGSEGV here?  Was 'ptr' an invalid pointer
for some reason, and if so, what exactly makes it invalid?

Thanks.




