Native compilation - specific optimisation surely possible?

emacs-devel

From:	Alan Mackenzie
Subject:	Native compilation - specific optimisation surely possible?
Date:	Sun, 2 Jan 2022 10:20:02 +0000

Hello, Emacs.

The following very short function:


;; -*- lexical-binding: t -*-
(defun comp-test-55 (x)
  (unless (integerp x)
    x))


byte compiles to:


byte code for comp-test-55:
  doc:   ...
    args: (arg1)
    0       dup
    1       integerp
    2       not
    3       goto-if-nil-else-pop 1
    6       dup
    7:1     return


and then, on an amd64 machine, native compiles to (annotations added by
me):



00000000000012c0 <F636f6d702d746573742d3535_comp_test_55_0>:
Setup of the function:
    12c0:       55                      push   %rbp
    12c1:       53                      push   %rbx
    12c2:       48 89 fb                mov    %rdi,%rbx
    12c5:       48 83 ec 08             sub    $0x8,%rsp
    12c9:       48 8b 05 18 2d 00 00    mov    0x2d18(%rip),%rax        # 3fe8 <freloc_link_table@@Base-0x240>
    12d0:       48 8b 28                mov    (%rax),%rbp
fixnump:
    12d3:       8d 47 fe                lea    -0x2(%rdi),%eax
    12d6:       a8 03                   test   $0x3,%al
    12d8:       75 26                   jne    1300 <F636f6d702d746573742d3535_comp_test_55_0+0x40>

    12da:       48 8b 05 ff 2c 00 00    mov    0x2cff(%rip),%rax        # 3fe0 <d_reloc@@Base-0x220>
    12e1:       48 8b 78 10             mov    0x10(%rax),%rdi
Nil in %rdi?:
    12e5:       31 f6                   xor    %esi,%esi
    12e7:       ff 95 c0 27 00 00       call   *0x27c0(%rbp)      `eq' <========================
    12ed:       48 85 c0                test   %rax,%rax
    12f0:       48 0f 45 c3             cmovne %rbx,%rax
Tear down of the function:
    12f4:       48 83 c4 08             add    $0x8,%rsp
    12f8:       5b                      pop    %rbx
    12f9:       5d                      pop    %rbp
    12fa:       c3                      ret    
    12fb:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
bignump:
    1300:       8d 47 fb                lea    -0x5(%rdi),%eax
    1303:       a8 07                   test   $0x7,%al
    1305:       74 09                   je     1310 <F636f6d702d746573742d3535_comp_test_55_0+0x50>

    1307:       31 ff                   xor    %edi,%edi
    1309:       eb da                   jmp    12e5 <F636f6d702d746573742d3535_comp_test_55_0+0x25>
    130b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
pseudovectorp:
    1310:       be 02 00 00 00          mov    $0x2,%esi
    1315:       ff 55 08                call   *0x8(%rbp)
    1318:       84 c0                   test   %al,%al
    131a:       75 be                   jne    12da <F636f6d702d746573742d3535_comp_test_55_0+0x1a>
    131c:       31 ff                   xor    %edi,%edi
    131e:       eb c5                   jmp    12e5 <F636f6d702d746573742d3535_comp_test_55_0+0x25>

The input parameter x (or arg1) is passed into the function in the
register %rdi.  integerp is coded successively as fixnump followed (if
necessary) by bignump.  The fixnump is coded beautifully in three
instructions.
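
For readers who prefer C to assembler, the two tag tests in the listing
can be sketched roughly as below.  This is only an illustration: the
names (lisp_word, sketch_fixnump, sketch_vectorlike_tag_p) are made up,
and the tag layout is an assumption read off the constants in the
listing, namely that fixnums carry binary 10 in their low two bits and
that tag 5 in the low three bits marks a vector-like object which may
then turn out to be a bignum.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a tagged Lisp word; not Emacs's own type.  */
typedef uintptr_t lisp_word;

/* Assumption taken from the listing: a fixnum has binary 10 in its low
   two bits.  This mirrors "lea -0x2(%rdi),%eax; test $0x3,%al".  */
static bool
sketch_fixnump (lisp_word x)
{
  return ((x - 2) & 3) == 0;
}

/* Assumption taken from the listing: tag 5 in the low three bits means
   "vector-like".  This mirrors "lea -0x5(%rdi),%eax; test $0x7,%al".
   Only when this succeeds does the listing go on to call a helper (at
   1315) to find out whether the object really is a bignum.  */
static bool
sketch_vectorlike_tag_p (lisp_word x)
{
  return ((x - 5) & 7) == 0;
}
```

Subtracting the expected tag and masking checks the low bits with one
arithmetic instruction and one test, which is why the fixnump case needs
no call at all.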

I don't understand what's happening at 12da.  It seems that the address
of a stack pointer is being loaded into %rax, from which the result of
`fixnump' (which was already in %rax) is loaded into %rdi.  

But my main point is the compilation of the `not' instruction at 12e5.
The operand to `not' is in %rdi.  It is coded up as (eq %rdi nil) by
loading 0 (nil) into %rsi at 12e5, then making a function call to `eq'
at 12e7.

Surely the overhead of the function call for `eq' makes this a candidate
for optimisation?  `not' could be coded up in two instructions (test
%rdi,%rdi followed by a conditional jump or, faster, the cmovne which is
already there).
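
To make the comparison concrete, here is a rough C sketch of the two
shapes of `not'.  Everything in it is hypothetical: the names are
invented, nil is assumed to be the all-zero word (as the xor at 12e5
suggests), and SKETCH_T is an arbitrary non-zero placeholder rather than
the real representation of t.

```c
#include <stdint.h>

/* Hypothetical stand-in for a tagged Lisp word; not Emacs's own type.  */
typedef uintptr_t lisp_word;

/* Assumptions: nil is the zero word; SKETCH_T is an arbitrary non-zero
   placeholder for t.  */
#define SKETCH_NIL ((lisp_word) 0)
#define SKETCH_T   ((lisp_word) 0x30)

/* An out-of-line `eq', standing in for the function reached at 12e7.  */
static lisp_word
sketch_eq (lisp_word a, lisp_word b)
{
  return a == b ? SKETCH_T : SKETCH_NIL;
}

/* Stand-in for a slot in a table of function pointers.  */
static lisp_word (*sketch_eq_ptr) (lisp_word, lisp_word) = sketch_eq;

/* The shape seen in the listing: `not' as an indirect call to `eq'
   with nil as the second argument.  */
static lisp_word
sketch_not_via_call (lisp_word x)
{
  return sketch_eq_ptr (x, SKETCH_NIL);
}

/* The shape suggested above: compare against nil inline, which a C
   compiler can turn into a test plus a conditional move or jump.  */
static lisp_word
sketch_not_inline (lisp_word x)
{
  return x == SKETCH_NIL ? SKETCH_T : SKETCH_NIL;
}
```

Both functions return the same value for every input; the only
difference is whether an indirect call sits on the path.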

`not' is presumably a common opcode in byte compiled functions.  `eq'
surely more so.  So why are we coding these up as function calls?

Andrea?

-- 
Alan Mackenzie (Nuremberg, Germany).
