bug-binutils

[Bug binutils/31454] New: Add constant tracking to disassembly (objdump -d, gdb disas)


From: jakub at redhat dot com
Subject: [Bug binutils/31454] New: Add constant tracking to disassembly (objdump -d, gdb disas)
Date: Thu, 07 Mar 2024 10:26:11 +0000

https://sourceware.org/bugzilla/show_bug.cgi?id=31454

            Bug ID: 31454
           Summary: Add constant tracking to disassembly (objdump -d, gdb
                    disas)
           Product: binutils
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: binutils
          Assignee: unassigned at sourceware dot org
          Reporter: jakub at redhat dot com
  Target Milestone: ---

Consider
unsigned foo (void) { return 0xdeadbeefU; }
unsigned long long bar (void) { return 0xdeadbeefcafebabeULL; }
static int p;
int *baz (void) { return &p; }
int main () {}
When compiled and linked on x86_64 with -O2 -fpic, objdump -d and gdb's
disassemble already do some immediate visualization to help users read the
code:
0000000000401140 <baz>:
  401140:       48 8d 05 d9 2e 00 00    lea    0x2ed9(%rip),%rax        # 404020 <__TMC_END__>
  401147:       c3                      ret    
or
Dump of assembler code for function baz:
   0x0000000000401140 <+0>:     lea    0x2ed9(%rip),%rax        # 0x404020 <p>
   0x0000000000401147 <+7>:     ret    
Here the disassembler knows how to handle lea with an immediate displacement
and (%rip): it adds 0x2ed9 to the address of the end of the instruction
(0x401147) and prints the resulting address, perhaps with a symbolic rendering
of it, in the comment.
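
A minimal sketch of that arithmetic, for illustration only. The helper name
rip_relative_target is hypothetical and is not a binutils or gdb function; the
instruction length (7 bytes) and displacement are taken from the dump above.

```c
#include <stdint.h>

/* Hypothetical helper: resolve a RIP-relative operand.
   insn_addr is the address of the instruction, insn_len its length in
   bytes, and disp the signed 32-bit displacement from the encoding.  */
static uint64_t
rip_relative_target (uint64_t insn_addr, unsigned insn_len, int32_t disp)
{
  /* While an instruction executes, %rip points at the next instruction,
     so the displacement is relative to the end of this one.  */
  return insn_addr + insn_len + (uint64_t) (int64_t) disp;
}
```

For the lea in baz, rip_relative_target (0x401140, 7, 0x2ed9) gives 0x404020,
which is the address shown in the comment.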
The 0xdeadbeef and 0xdeadbeefcafebabe immediates are clearly shown in the
assembly, so there is no need to help users reading that.
Now, let's try the same on other arches, e.g. aarch64:
  400140:       5297dde0        mov     w0, #0xbeef                     // #48879
  400144:       72bbd5a0        movk    w0, #0xdead, lsl #16
in foo,
  400160:       d29757c0        mov     x0, #0xbabe                     // #47806
  400164:       f2b95fc0        movk    x0, #0xcafe, lsl #16
  400168:       f2d7dde0        movk    x0, #0xbeef, lsl #32
  40016c:       f2fbd5a0        movk    x0, #0xdead, lsl #48
in bar and
  400180:       f00000e0        adrp    x0, 41f000 <baz+0x1ee80>
  400184:       913fa000        add     x0, x0, #0xfe8
in baz.  It would be helpful if the disassembler could propagate constants
through the small set of instructions that are usually involved in creating
constants in GPRs.  For each GPR, it would remember whether the register holds
a known constant and, if so, the constant's value.  At the start of a function
(new symbol?) it would reset this knowledge, and maybe also reset it at
possible conditional/unconditional jump destinations within the same function
(though computing those might require another pass through the instructions).
When a handled instruction sets a GPR to a constant, remember that constant.
When all the inputs of a handled instruction have known constant values, try
to evaluate the instruction, remember the resulting constant, and show it in a
comment as in the lea case above: the immediate plus its symbolic rendering,
if any.  And when an unhandled instruction sets or clobbers some GPR (or might
do so), forget the value of that register.
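
The bookkeeping described above can be sketched as follows. This is only an
illustration over an imaginary, already-decoded instruction form: struct insn,
enum op_kind, const_step and NUM_GPR are all hypothetical names, not anything
that exists in binutils or gdb, and resetting at jump destinations is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_GPR 16  /* assumption: register count of an imaginary target */

/* Hypothetical decoded instruction kinds.  */
enum op_kind { OP_FUNC_START, OP_LOAD_IMM, OP_ADD_IMM, OP_OTHER };

struct insn { enum op_kind kind; unsigned dst, src; uint64_t imm; };

/* Per-GPR knowledge: is the value a known constant, and which one.  */
struct const_state { bool known[NUM_GPR]; uint64_t value[NUM_GPR]; };

/* Apply one instruction to the state.  Returns true when dst now holds a
   known constant that could be printed in a comment.  */
static bool
const_step (struct const_state *s, const struct insn *i)
{
  assert (i->dst < NUM_GPR && i->src < NUM_GPR);
  switch (i->kind)
    {
    case OP_FUNC_START:           /* new symbol: nothing is known */
      memset (s, 0, sizeof *s);
      return false;
    case OP_LOAD_IMM:             /* handled: sets dst to a constant */
      s->known[i->dst] = true;
      s->value[i->dst] = i->imm;
      return true;
    case OP_ADD_IMM:              /* handled: evaluate if the input is known */
      if (!s->known[i->src])
        break;
      s->value[i->dst] = s->value[i->src] + i->imm;
      s->known[i->dst] = true;
      return true;
    case OP_OTHER:                /* unhandled: dst is clobbered */
      break;
    }
  s->known[i->dst] = false;       /* forget the destination register */
  return false;
}
```

A disassembler loop would call const_step once per instruction and append a
comment whenever it returns true.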
So, for foo above, remember that w0 is set to 0xbeef, evaluate the movk
instruction to find that the result is 0xdeadbeef, and tell the user.  Ditto
for bar.  Similarly, for baz, remember the value set by adrp and handle the
add too, printing 41ffe8 <p> there.
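
As a worked example for the aarch64 dumps above, here is a rough sketch that
decodes only the move-wide (movn/movz/movk), adrp and 64-bit add-immediate
encodings. The names a64_consts and a64_track are hypothetical, and the bit
masks reflect my reading of the A64 encoding rather than anything taken from
the opcodes library. As a simplification, any other instruction forgets all
registers, which is more conservative than forgetting only the clobbered ones.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tracker state for x0..x30 (register 31 is not tracked).  */
struct a64_consts { bool known[31]; uint64_t value[31]; };

/* Feed one instruction word located at address pc.  Returns true and
   stores the value in *result when the destination is a known constant.  */
static bool
a64_track (struct a64_consts *s, uint64_t pc, uint32_t insn, uint64_t *result)
{
  unsigned rd = insn & 0x1f, rn = (insn >> 5) & 0x1f;
  bool have = false;
  uint64_t v = 0;

  if ((insn & 0x1f800000) == 0x12800000)        /* movn / movz / movk */
    {
      unsigned opc = (insn >> 29) & 3;
      unsigned shift = ((insn >> 21) & 3) * 16;
      uint64_t imm = (uint64_t) ((insn >> 5) & 0xffff) << shift;

      if (opc == 0)                             /* movn */
        { v = ~imm; have = true; }
      else if (opc == 2)                        /* movz */
        { v = imm; have = true; }
      else if (opc == 3 && rd != 31 && s->known[rd])  /* movk keeps other bits */
        { v = (s->value[rd] & ~(0xffffULL << shift)) | imm; have = true; }
      if (!(insn >> 31))                        /* w register: upper half is 0 */
        v &= 0xffffffffu;
    }
  else if ((insn & 0x9f000000) == 0x90000000)   /* adrp */
    {
      int64_t off = (int64_t) ((((insn >> 5) & 0x7ffff) << 2)
                               | ((insn >> 29) & 3));
      if (off & 0x100000)                       /* sign-extend 21 bits */
        off -= 0x200000;
      v = (pc & ~0xfffULL) + (uint64_t) (off * 4096);
      have = true;
    }
  else if ((insn & 0xff800000) == 0x91000000)   /* add xd, xn, #imm */
    {
      uint64_t imm = (insn >> 10) & 0xfff;
      if (insn & (1u << 22))                    /* optional lsl #12 */
        imm <<= 12;
      if (rn != 31 && s->known[rn])
        { v = s->value[rn] + imm; have = true; }
    }
  else
    {
      memset (s, 0, sizeof *s);                 /* unhandled: forget everything */
      return false;
    }

  if (rd == 31)                                 /* xzr or sp: not tracked */
    return false;
  s->known[rd] = have;
  s->value[rd] = v;
  if (have)
    *result = v;
  return have;
}
```

Feeding it the two words of foo (5297dde0, 72bbd5a0) yields 0xbeef and then
0xdeadbeef; the four words of bar end at 0xdeadbeefcafebabe; and the adrp/add
pair of baz at 0x400180 yields 0x41f000 and then 0x41ffe8.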
Now, repeat this on other arches, powerpc{,64,64le}, sparc{,64}, ...
On s390x, one can also see that it loads some constants from
.rodata/.data.rel.ro* and similar sections, those too would be nice to track
and print.
This would save users from having to scratch their heads interpreting the
instructions, or from having to run the code to find out what it actually
computes.
In gdb, one sometimes disassembles just part of a function, not the whole one.
I think it would be perfectly fine to start such a block with a state in which
nothing is known and to print only what is discovered within that block.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

