[Bug ld/22727] New: [2.30, 2.31 regression] Thousands of EH-related execution failures on Solaris/SPARC

From: ro at gcc dot gnu.org
Subject: [Bug ld/22727] New: [2.30, 2.31 regression] Thousands of EH-related execution failures on Solaris/SPARC
Date: Thu, 18 Jan 2018 08:26:48 +0000


            Bug ID: 22727
           Summary: [2.30, 2.31 regression] Thousands of EH-related
                    execution failures on Solaris/SPARC
           Product: binutils
           Version: 2.31 (HEAD)
            Status: NEW
          Severity: normal
          Priority: P2
         Component: ld
          Assignee: unassigned at sourceware dot org
          Reporter: ro at gcc dot gnu.org
                CC: hjl.tools at gmail dot com
  Target Milestone: 2.30
              Host: sparc-sun-solaris2.11
            Target: sparc-sun-solaris2.11
             Build: sparc-sun-solaris2.11

When trying the binutils 2.30 branch on Solaris 11/SPARC with gcc mainline, I
saw ca. 8500 testsuite regressions, most or all of them related to EH failures.  E.g.

FAIL: g++.dg/cpp0x/bad_array_new1.C  -std=c++11 execution test

When I replace the newly-built libstdc++.so.6 with the system one, the test
executes fine.
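Any C++ throw goes through __cxa_throw and therefore __cxa_get_globals, so a tiny program should be enough to reproduce this outside the testsuite. The following is a hypothetical sketch modeled on the dbx backtrace further down, where f() is called with 0xffffffff (i.e. -1). It is not the actual contents of bad_array_new1.C, and f() and throws_bad_array_new_length() are made-up names.

```cpp
#include <cassert>
#include <new>

// Hypothetical reproducer: with n == -1 the array new-expression cannot be
// satisfied, so C++11 and later throw std::bad_array_new_length
// (via __cxa_throw_bad_array_new_length -> __cxa_throw -> __cxa_get_globals).
void* f(int n) {
    return new int[n];
}

// Returns true if f(n) threw std::bad_array_new_length, false otherwise.
bool throws_bad_array_new_length(int n) {
    try {
        delete[] static_cast<int*>(f(n));
    } catch (const std::bad_array_new_length&) {
        return true;
    }
    return false;
}
```

When linked against the gld 2.30-linked libstdc++.so.6, this would presumably die inside __cxa_get_globals instead of returning true.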

gdb is led completely astray by this failure:

Thread 2 received signal SIGILL, Illegal instruction.
[Switching to Thread 1 (LWP 1)]
0xff17383c in standard_subs ()
   from ../../../sparc-sun-solaris2.11/libstdc++-v3/src/.libs/libstdc++.so.6
(gdb) where
#0  0xff17383c in standard_subs ()
   from ../../../sparc-sun-solaris2.11/libstdc++-v3/src/.libs/libstdc++.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) display/i $pc
1: x/i $pc
=> 0xff17383c <standard_subs+4>:        ldq  [ %o5 + -1688 ], %f62

but dbx shows a clearer picture:

signal SEGV (no mapping at the fault address) in (unknown) at 0xff17380c
0xff17380c: npos+0x36110:       ld       [%i3 - 2924], %f31
Current function is __cxa_throw
   80     __cxa_eh_globals *globals = __cxa_get_globals ();
(dbx) where
  [1] 0xff17380c(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xff17380c 
=>[2] __cxa_throw(obj = 0x25888, tinfo = 0x20cd8, dest = 0x20eb8), line 80 in
  [3] __cxa_throw_bad_array_new_length(), line 42 in "eh_aux_runtime.cc"
  [4] f(0xffffffff, 0x1, 0x0, 0xfeed0200, 0x0, 0x0), at 0x10a48 
  [5] main(0x1, 0xffbfea14, 0xffbfea1c, 0x0, 0x0, 0x107c0), at 0x10a74
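The faulting call is inside __cxa_get_globals, which returns the calling thread's exception-handling globals. Below is a minimal sketch of that pattern, assuming a function-local thread-local object with internal linkage. Only its shape comes from the report: it follows the relocation symbol (anonymous namespace)::get_global()::global shown further down. eh_globals_sketch, its fields and get_globals_sketch() are hypothetical stand-ins, not libstdc++'s actual definitions.

```cpp
#include <cassert>

// Hypothetical stand-in for __cxa_eh_globals; field names are illustrative.
struct eh_globals_sketch {
    void* caught_exceptions;
    unsigned int uncaught_exceptions;
};

namespace {
// One zero-initialized object per thread, mirroring the relocation symbol
// (anonymous namespace)::get_global()::global.  Built with -fPIC into a
// shared object, the compiler may reach it through the local-dynamic TLS
// model: a call to __tls_get_addr for this module's TLS block, then a
// link-time offset (the sethi/add/call and sethi/xor/add sequences in the
// disassembly below).
eh_globals_sketch* get_global() {
    static thread_local eh_globals_sketch global;
    return &global;
}
}  // namespace

// Hypothetical analogue of __cxa_get_globals(): return this thread's copy.
extern "C" eh_globals_sketch* get_globals_sketch() { return get_global(); }
```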

Comparing the code of __cxa_get_globals in the new (gld 2.30.0-linked) libstdc++.so.6

   0xff0722f8 <__cxxabiv1::__cxa_get_globals()>:        save  %sp, -96, %sp
   0xff0722fc <__cxxabiv1::__cxa_get_globals()+4>:      sethi  %hi(0), %g1
   0xff072300 <__cxxabiv1::__cxa_get_globals()+8>:      add  %g1, 4, %g1    !
   0xff072304 <__cxxabiv1::__cxa_get_globals()+12>:     sethi  %hi(0x105c00),
   0xff072308 <__cxxabiv1::__cxa_get_globals()+16>:     call  0xff06f4a0
   0xff07230c <__cxxabiv1::__cxa_get_globals()+20>:     add  %l7, 0xf8, %l7 !
   0xff072310 <__cxxabiv1::__cxa_get_globals()+24>:     sethi  %hi(0), %i0
   0xff072314 <__cxxabiv1::__cxa_get_globals()+28>:     add  %l7, %g1, %o0
   0xff072318 <__cxxabiv1::__cxa_get_globals()+32>:     call  0xff17380c
   0xff07231c <__cxxabiv1::__cxa_get_globals()+36>:     xor  %i0, 0, %i0
   0xff072320 <__cxxabiv1::__cxa_get_globals()+40>:     add  %o0, %i0, %i0
   0xff072324 <__cxxabiv1::__cxa_get_globals()+44>:     rett  %i7 + 8
   0xff072328 <__cxxabiv1::__cxa_get_globals()+48>:     nop 

with a gld 2.29-linked libstdc++.so.6:

   0xff072240 <__cxxabiv1::__cxa_get_globals()>:        save  %sp, -96, %sp
   0xff072244 <__cxxabiv1::__cxa_get_globals()+4>:      sethi  %hi(0), %g1
   0xff072248 <__cxxabiv1::__cxa_get_globals()+8>:      add  %g1, 4, %g1    !
   0xff07224c <__cxxabiv1::__cxa_get_globals()+12>:     sethi  %hi(0x105c00),
   0xff072250 <__cxxabiv1::__cxa_get_globals()+16>:     call  0xff06f3e8
   0xff072254 <__cxxabiv1::__cxa_get_globals()+20>:     add  %l7, 0x1b0, %l7    ! 0x105db0
   0xff072258 <__cxxabiv1::__cxa_get_globals()+24>:     sethi  %hi(0), %i0
   0xff07225c <__cxxabiv1::__cxa_get_globals()+28>:     add  %l7, %g1, %o0
   0xff072260 <__cxxabiv1::__cxa_get_globals()+32>:     call  0xff17a1c0
   0xff072264 <__cxxabiv1::__cxa_get_globals()+36>:     xor  %i0, 0, %i0
   0xff072268 <__cxxabiv1::__cxa_get_globals()+40>:     add  %o0, %i0, %i0
   0xff07226c <__cxxabiv1::__cxa_get_globals()+44>:     rett  %i7 + 8
   0xff072270 <__cxxabiv1::__cxa_get_globals()+48>:     nop 

shows that what used to be a call to __tls_get_addr (0xff17a1c0) is now a call
to a bogus address (0xff17380c), causing the SEGV/SIGILL.
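These call targets can be checked by hand from the SPARC CALL encoding. Bits 31..30 are 01 and the low 30 bits hold a word displacement, so target = PC + 4 * disp30. Here is a small sketch that encodes and decodes that field. It assumes 32-bit addresses, and the helper names are made up.

```cpp
#include <cassert>
#include <cstdint>

// SPARC CALL (format 1): op = 01 in bits 31..30, disp30 in bits 29..0.
// target = pc + 4 * sign_extend(disp30).  With 32-bit addresses the sign
// extension falls out of unsigned wrap-around modulo 2^32.
constexpr std::uint32_t kCallOp = 0x40000000u;
constexpr std::uint32_t kDisp30Mask = 0x3fffffffu;

// Encode "call target" for an instruction at pc (both 4-byte aligned).
std::uint32_t encode_call(std::uint32_t pc, std::uint32_t target) {
    return kCallOp | (((target - pc) >> 2) & kDisp30Mask);
}

// Recover the absolute target of a CALL word located at pc.
std::uint32_t call_target(std::uint32_t pc, std::uint32_t insn) {
    return pc + ((insn & kDisp30Mask) << 2);  // wraps modulo 2^32
}

// True if the word has the CALL op field (01).
bool is_call(std::uint32_t insn) { return (insn >> 30) == 1u; }
```

With the addresses above, the 2.29 link needs a displacement of +0x107f60 bytes (word 0x40041fd8) to reach 0xff17a1c0. The 2.30 link's target 0xff17380c corresponds to +0x1014f4 bytes (word 0x4004053d). The unrelocated 40 00 00 00 in the object file is simply disp30 = 0, a call to itself, which is why objdump prints "call 20 <__cxa_get_globals+0x20>".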

Disassembling that function in the corresponding object file gives:

Disassembly of section .text.__cxa_get_globals:

00000000 <__cxa_get_globals>:
   0:   9d e3 bf a0     save  %sp, -96, %sp
   4:   03 00 00 00     sethi  %hi(0), %g1
   8:   82 00 60 00     add  %g1, 0, %g1        ! 0 <__cxa_get_globals>
   c:   2f 00 00 00     sethi  %hi(0), %l7
  10:   40 00 00 00     call  10 <__cxa_get_globals+0x10>
  14:   ae 05 e0 00     add  %l7, 0, %l7        ! 0 <__cxa_get_globals>
  18:   31 00 00 00     sethi  %hi(0), %i0
  1c:   90 05 c0 01     add  %l7, %g1, %o0
  20:   40 00 00 00     call  20 <__cxa_get_globals+0x20>
  24:   b0 1e 20 00     xor  %i0, 0, %i0
  28:   b0 02 00 18     add  %o0, %i0, %i0
  2c:   81 cf e0 08     return  %i7 + 8
  30:   01 00 00 00     nop 

with an R_SPARC_TLS_LDM_CALL relocation applied to the call at offset 0x20:

Relocation Section:  .rela.text.__cxa_get_globals
  index  type                   offset value     addend  section / symbol
    [7]  R_SPARC_TLS_LDM_CALL     0x20     0          0  .text.__cxa_get_globals (anonymous namespace)::get_global()::global
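Note that the relocation names the TLS variable, not __tls_get_addr. In the SPARC local-dynamic sequence, R_SPARC_TLS_LDM_CALL refers to the variable, and when building a shared object the linker is expected to bind the call itself to __tls_get_addr. It should treat the call as if it were an R_SPARC_WPLT30 against that symbol. Below is a toy model of that rule. The relocation type names are real SPARC ELF names. Everything else is hypothetical and not BFD code: the structs, resolve_call_target() and the addresses.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Toy model of call-target resolution when linking a shared object.
enum class RelocType { WPLT30, TLS_GD_CALL, TLS_LDM_CALL };

struct Reloc {
    RelocType type;
    std::uint32_t offset;  // offset of the call within its section
    std::string symbol;    // symbol named by the relocation
};

// Addresses (e.g. PLT entries) the linker has assigned to callees.
using SymbolTable = std::map<std::string, std::uint32_t>;

// A TLS_GD_CALL/TLS_LDM_CALL reloc names the TLS variable, but the call
// must go to __tls_get_addr; a plain WPLT30 goes to the named symbol.
std::uint32_t resolve_call_target(const Reloc& r, const SymbolTable& syms) {
    std::string callee =
        (r.type == RelocType::WPLT30) ? r.symbol : std::string("__tls_get_addr");
    auto it = syms.find(callee);
    if (it == syms.end())
        throw std::runtime_error("unresolved call target: " + callee);
    return it->second;
}
```

Fed the 2.29 address of __tls_get_addr, the model resolves the call at offset 0x20 to 0xff17a1c0, which is the target the 2.29-linked library actually uses.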

A reghunt identified this patch as the culprit:

The first bad revision is:
changeset:   92065:a2a79207e4f5
user:        H.J. Lu <address@hidden>
date:        Mon Oct 16 03:49:54 2017 -0700
summary:     ELF: Call check_relocs after opening all inputs


