[Bug gas/30336] New: The GNU Assembler has bugs in Intel syntax


From: soomink at kaist dot ac.kr
Subject: [Bug gas/30336] New: The GNU Assembler has bugs in Intel syntax
Date: Wed, 12 Apr 2023 01:51:30 +0000

https://sourceware.org/bugzilla/show_bug.cgi?id=30336

            Bug ID: 30336
           Summary: The GNU Assembler has bugs in Intel syntax
           Product: binutils
           Version: unspecified
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: gas
          Assignee: unassigned at sourceware dot org
          Reporter: soomink at kaist dot ac.kr
  Target Milestone: ---

Hi, I'm Soomin Kim from KAIST SoftSec Lab.

We are reporting two x86-64 assembler bugs, both related to Intel assembly
syntax. We found them while renaming the labels of toy assembly programs.

--------------------------------

The first bug:
```
$ cat ./variant1.s
.intel_syntax noprefix
.text
or:
ret
call or
$ as -msyntax=intel -o ./variant1.o ./variant1.s
./variant1.s: Assembler messages:
./variant1.s:5: Error: invalid use of operator "or"
```
`GNU as` rejects this program because of the token `or`. Note that this program
was derived from the assembly program below by changing only the label name:
```
$ cat ./normal1.s
.intel_syntax noprefix
.text
LABEL:
ret
call LABEL
$ as -msyntax=intel -o ./normal1.o ./normal1.s
```
Unlike `variant1.s`, this program assembles without error. It was hard for me
to find any explanation on the Internet of why the name `or` matters. For
example, the Wikipedia page on x86 assembly
(https://en.wikipedia.org/wiki/X86_assembly_language) lists several keywords
but does not include `or`.

Surprisingly, `or` does not cause a problem in AT&T syntax. Consider the
program below:
```
$ cat ./variant2.s
.text
or:
ret
call or
$ as -msyntax=att -o ./variant2.o ./variant2.s
```
We believe this is a bug in `GNU as` because (1) the AT&T version is accepted
by `GNU as`, and (2) there is no reason to reject this case: other uses of `or`
(as an instruction mnemonic, for example) cannot apply to the operand of a
`call` instruction, and the label `or` is clearly defined.

--------------------------------

The second bug:
```
$ cat ./variant1.s
.intel_syntax noprefix

.data
rsp:
.long 1
.long 2
.long 3
.long 4

.text
lea rax, [rsp] // rsp here is intended to refer to a pointer in .data section
$ as -msyntax=intel -o ./variant1.o ./variant1.s
$ objdump -d ./variant1.o
./variant1.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <.text>:
   0:   48 8d 04 24             lea    (%rsp),%rax
```
This bug is similar to the first one but differs in an important respect: the
program is accepted, and its meaning changes. The original assembly program
makes this easier to see.
```
$ cat ./normal1.s
.intel_syntax noprefix

.data
LABEL:
.long 1
.long 2
.long 3
.long 4

.text
lea rax, [LABEL]
$ as -msyntax=intel -o ./normal1.o ./normal1.s
```
The original program loads the address of LABEL into the register `rax`.
However, after we rename the label to `rsp`, which is an existing register
name, the resulting program has different semantics: the binary code emitted
by `GNU as` copies the value stored in the register `rsp` to `rax`.

The problem here is that the operand is ambiguous between the label `rsp` and
the register `rsp`, yet `GNU as` silently picks one of them (here, the
register) without any diagnostic, so the program behaves in an unintended way.

Likewise, this issue does not occur with AT&T syntax, where register names
require a `%` prefix. Consider the code below:
```
$ cat ./variant2.s
.data
rsp:
.long 1
.long 2
.long 3
.long 4

.text
leaq (rsp), %rax
$ as -msyntax=att -o ./variant2.o ./variant2.s
$ objdump -d ./variant2.o

./variant2.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <.text>:
   0:   48 8d 04 25 00 00 00 00         lea    0x0,%rax
```
The label `rsp` is successfully transformed into a relocation entry in the
object file.
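
To make the difference between the two encodings explicit, the bytes from the
two `objdump` listings above can be decoded by hand. The following is a minimal
Python sketch that we add for illustration only; the helper name
`decode_lea_operand` is hypothetical, and it handles only the one addressing
form that appears in this report (REX.W `lea` with mod=00 and a SIB byte).
```python
def decode_lea_operand(code: bytes) -> str:
    """Describe the memory operand of a 64-bit `lea` encoded with a SIB byte."""
    rex, opcode, modrm, sib = code[0], code[1], code[2], code[3]
    assert rex == 0x48 and opcode == 0x8D, "expected REX.W + LEA (48 8d)"
    mod, rm = modrm >> 6, modrm & 7
    assert mod == 0 and rm == 4, "sketch only handles mod=00 with a SIB byte"
    index, base = (sib >> 3) & 7, sib & 7
    assert index == 4, "sketch only handles the 'no index register' case"
    if base == 5:
        # With mod=00, base=101 means "no base register": a 32-bit
        # displacement follows (zero here, to be filled in by a relocation).
        disp = int.from_bytes(code[4:8], "little", signed=True)
        return f"absolute disp32 = {disp:#x}"
    regs = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
    return f"base register = {regs[base]}"

# Bytes copied from the two disassembly listings in this report.
intel_result = decode_lea_operand(bytes.fromhex("488d0424"))
att_result = decode_lea_operand(bytes.fromhex("488d042500000000"))
print(intel_result)  # Intel-syntax variant: operand is the register rsp
print(att_result)    # AT&T-syntax variant: operand is a symbol address
```
The only difference is the SIB byte (`24` versus `25`): the former selects the
register `rsp` as the base, while the latter selects a 32-bit absolute address
with no base register.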

--------------------------------

We have shown two different situations where a label name confuses `GNU as`.
We find them interesting because it is hard to say strictly that `GNU as` is
wrong.

We think there are two possibilities:
(1) Intel syntax forbids using such a reserved name (an operator, opcode, or
    register name) as a label, or
(2) `GNU as` just mishandles the label.

In one sense, the problem is the ambiguity of Intel syntax itself (due to the
absence of an official Intel assembly syntax manual). For decades, many
assemblers have been developed ad hoc without any standard, so it is hard to
decide which tokens to allow or reject, or which interpretation to choose.

On the other hand, `GNU as` needs to handle both cases, because they hurt its
usability and correctness. A user might want to write a function named `or`,
only to have it rejected by `GNU as`. A user might want to load a data pointer
named `rsp`, but the resulting program loads the stack pointer instead,
contrary to the user's intention.

We suggest that `GNU as` should assemble the first case, and should *not*
assemble the second case, or should at least emit a warning for it.
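
Until the assembler itself diagnoses such names, a user-side check is possible.
The following is a minimal Python sketch, not part of `GNU as`; the function
name `risky_labels` is hypothetical, and the two name sets are small,
non-exhaustive samples that we assume for illustration (they are not taken from
the binutils sources). Names are compared case-insensitively as a conservative
choice.
```python
import re

# Assumed, non-exhaustive samples of names that may clash in Intel syntax.
INTEL_OPERATORS = {"and", "or", "xor", "not", "shl", "shr", "mod"}
REGISTERS = {"rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp"}

# A label definition at the start of a line, e.g. "or:" or "LABEL:".
LABEL_RE = re.compile(r"^\s*([A-Za-z_.$][\w.$]*):")

def risky_labels(source: str) -> list:
    """Return (line number, label, kind) for labels that clash with a name."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = LABEL_RE.match(line)
        if not match:
            continue
        name = match.group(1)
        lowered = name.lower()
        if lowered in INTEL_OPERATORS:
            findings.append((lineno, name, "operator"))
        elif lowered in REGISTERS:
            findings.append((lineno, name, "register"))
    return findings

# The two variants from this report, reduced to their essential lines.
first_case = ".intel_syntax noprefix\n.text\nor:\nret\ncall or\n"
second_case = ".intel_syntax noprefix\n.data\nrsp:\n.long 1\n.text\nlea rax, [rsp]\n"
first_findings = risky_labels(first_case)
second_findings = risky_labels(second_case)
print(first_findings)
print(second_findings)
```
Such a check only flags label definitions; it does not decide which
interpretation the assembler should choose.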
