Native compilation - specific optimisation surely possible?
From: Alan Mackenzie
Subject: Native compilation - specific optimisation surely possible?
Date: Sun, 2 Jan 2022 10:20:02 +0000
Hello, Emacs.
The following very short function:
;; -*- lexical-binding: t -*-
(defun comp-test-55 (x)
  (unless (integerp x)
    x))
byte compiles to:
byte code for comp-test-55:
doc: ...
args: (arg1)
0 dup
1 integerp
2 not
3 goto-if-nil-else-pop 1
6 dup
7:1 return
and then, on an amd64 machine, native compiles to (annotations added by
me):
00000000000012c0 <F636f6d702d746573742d3535_comp_test_55_0>:
Setup of the function:
12c0: 55 push %rbp
12c1: 53 push %rbx
12c2: 48 89 fb mov %rdi,%rbx
12c5: 48 83 ec 08 sub $0x8,%rsp
12c9: 48 8b 05 18 2d 00 00 mov 0x2d18(%rip),%rax # 3fe8
<freloc_link_table@@Base-0x240>
12d0: 48 8b 28 mov (%rax),%rbp
fixnump:
12d3: 8d 47 fe lea -0x2(%rdi),%eax
12d6: a8 03 test $0x3,%al
12d8: 75 26 jne 1300
<F636f6d702d746573742d3535_comp_test_55_0+0x40>
12da: 48 8b 05 ff 2c 00 00 mov 0x2cff(%rip),%rax # 3fe0
<d_reloc@@Base-0x220>
12e1: 48 8b 78 10 mov 0x10(%rax),%rdi
Nil in %rdi?:
12e5: 31 f6 xor %esi,%esi
12e7: ff 95 c0 27 00 00 call *0x27c0(%rbp) `eq'
<========================
12ed: 48 85 c0 test %rax,%rax
12f0: 48 0f 45 c3 cmovne %rbx,%rax
Tear down of the function:
12f4: 48 83 c4 08 add $0x8,%rsp
12f8: 5b pop %rbx
12f9: 5d pop %rbp
12fa: c3 ret
12fb: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
bignump:
1300: 8d 47 fb lea -0x5(%rdi),%eax
1303: a8 07 test $0x7,%al
1305: 74 09 je 1310
<F636f6d702d746573742d3535_comp_test_55_0+0x50>
1307: 31 ff xor %edi,%edi
1309: eb da jmp 12e5
<F636f6d702d746573742d3535_comp_test_55_0+0x25>
130b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
pseudovectorp:
1310: be 02 00 00 00 mov $0x2,%esi
1315: ff 55 08 call *0x8(%rbp)
1318: 84 c0 test %al,%al
131a: 75 be jne 12da
<F636f6d702d746573742d3535_comp_test_55_0+0x1a>
131c: 31 ff xor %edi,%edi
131e: eb c5 jmp 12e5
<F636f6d702d746573742d3535_comp_test_55_0+0x25>
The input parameter x (or arg1) is passed into the function in the
register %rdi. integerp is coded successively as fixnump followed (if
necessary) by bignump. The fixnump is coded beautifully in three
instructions.
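
For anyone following the disassembly, the two tag tests can be sketched
in C. This is only a sketch, not the Emacs source: the names lisp_word,
tag_is_fixnum and tag_is_vectorlike are hypothetical, and the tag values
(low two bits equal to 2, low three bits equal to 5) are simply read off
the lea/test pairs at 12d3 and 1300.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a tagged Lisp word; not the Emacs type. */
typedef uintptr_t lisp_word;

/* Mirrors `lea -0x2(%rdi),%eax; test $0x3,%al':
   subtract the assumed fixnum tag (2), then check that the low
   two bits of the result are zero. */
static bool tag_is_fixnum(lisp_word w)
{
    return ((w - 2u) & 3u) == 0;
}

/* Mirrors `lea -0x5(%rdi),%eax; test $0x7,%al':
   subtract the assumed vectorlike tag (5), then check that the low
   three bits of the result are zero.  Only when this holds does the
   code at 1310 go on to make the pseudovector call. */
static bool tag_is_vectorlike(lisp_word w)
{
    return ((w - 5u) & 7u) == 0;
}
```

Subtracting the tag before masking lets a single mask test both "the
tag bits match" and "nothing else is set in that field", which is why
each check needs only a lea, a test and a branch.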
I don't understand what's happening at 12da. It seems that the address
of a stack pointer is being loaded into %rax, from which the result of
`fixnump' (which was already in %rax) is loaded into %rdi.
But my main point is the compilation of the `not' instruction at 12e5.
The operand to `not' is in %rdi. It is coded up as (eq %rdi nil) by
loading 0 (nil) into %rsi at 12e5, then making a function call to `eq'
at 12e7.
Surely the overhead of the function call for `eq' makes this a candidate
for optimisation? `not' could be coded up in two instructions (test
%rdi,%rdi followed by a conditional jump or, faster, the cmovne which is
already there).
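
To make the suggestion concrete, here is a minimal C sketch of `not'
done inline. It assumes, as the xor at 12e5 suggests, that nil is the
all-zero word. The names lisp_word, WORD_NIL, inline_not and
select_unless are hypothetical and are not taken from the Emacs source
or from the native compiler.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a tagged Lisp word; not the Emacs type. */
typedef uintptr_t lisp_word;

/* Assumption: nil is represented by all-zero bits. */
#define WORD_NIL ((lisp_word) 0)

/* `not', i.e. (eq w nil), as a plain word comparison with no call. */
static inline bool inline_not(lisp_word w)
{
    return w == WORD_NIL;
}

/* The tail of comp-test-55: given x and the result of the integerp
   test, return x when that result is nil, and nil otherwise.  A C
   compiler is free to emit this as a test plus a conditional move. */
static inline lisp_word select_unless(lisp_word x, lisp_word integerp_result)
{
    return inline_not(integerp_result) ? x : WORD_NIL;
}
```

With the comparison written out like this there is nothing left to
call: the zero test and the select are the whole of the operation.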
`not' is presumably a common opcode in byte compiled functions. `eq'
surely more so. So why are we coding these up as function calls?
Andrea?
--
Alan Mackenzie (Nuremberg, Germany).