From: Anton Ertl
Subject: Re: [gforth] Performance anomality with dynamic superinstructions on MIPSel
Date: Sun, 23 Mar 2014 16:29:22 +0100
User-agent: Mutt/1.5.18 (2008-05-17)

On Sat, Mar 22, 2014 at 03:25:42PM +0100, David Kuehling wrote:
> >>>>> "Bernd" == Bernd Paysan <address@hidden> writes:
> 
> > Am Samstag, 22. März 2014, 07:24:55 schrieb David Kuehling:
> >> I'm using a recent gforth revision from git (6ec9915f6277de) and
> >> noticed that running gforth --dynamic produces pretty extreme
> >> performance degradation [..]
> 
> > How does this affect other microbenchmarks, e.g. onebench.fs? And:
> > SEE-CODE <word> shows the dynamically generated code; could you
> > provide that for the microbenchmark above?
> 
> Ahh, SEE-CODE does a nice job.  The disassembly for the full
> code-sequences of my recursive micro-benchmark for gforth-fast with and
> w/o --dynamic is listed below.  Looks like there is a problem with the
> CALL code sequence generated for calls into colon-definitions:
> 
>   gforth-fast --dynamic
>   : test1 ;
>   : test2 test1 ;
>   see-code test2
> 
>   $2BB725B0 call
>   $2BB725B4 <test1> 
>   ( $2BFC9FA8 ) 3 16 0 addu,
>   ( $2BFC9FAC ) 16 0 16 lw,
>   ( $2BFC9FB0 ) 2 18 0 addu,
>   ( $2BFC9FB4 ) 3 3 4 addiu,
>   ( $2BFC9FB8 ) 18 18 -4 addiu,
>   ( $2BFC9FBC ) 16 16 4 addiu,
>   ( $2BFC9FC0 ) 3 -4 2 sw,
>   ( $2BFC9FC4 ) 2 -4 16 lw,
>   ( $2BFC9FC8 ) $7C03E83B , ( illegal inst ) 
>   ( $2BFC9FCC ) 4 -32680 28 lw,
>   ( $2BFC9FD0 ) 30 3 0 addu,
>   ( $2BFC9FD4 ) 4 4 30 addu,
>   ( $2BFC9FD8 ) 3 2 0 addu,
>   ( $2BFC9FDC ) 4 256 29 sw,
>   ( $2BFC9FE0 ) 3 jr,
>   ( $2BFC9FE4 ) 1 1 0 or,
>   $2BB725B8 ;s ok
> 
> Compare this against the disassembly of CALL:
> see call:
> 
>   Code call  
>   ( $403C34 ) 3 16 0 addu,
>   ( $403C38 ) 16 0 16 lw,
>   ( $403C3C ) 2 18 0 addu,
>   ( $403C40 ) 3 3 4 addiu,
>   ( $403C44 ) 18 18 -4 addiu,
>   ( $403C48 ) 16 16 4 addiu,
>   ( $403C4C ) 3 -4 2 sw,
>   ( $403C50 ) 2 -4 16 lw,
>   ( $403C54 ) 3 2 0 addu,
>   ( $403C58 ) 3 jr,
>   ( $403C5C ) 1 1 0 or,
>   end-code
> 
> Instead of NEXT the code in test2 holds some nonsense, starting with
> invalid instruction $7C03E83B .
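To make the difference between the two listings concrete, here is a minimal Python sketch that compares them instruction by instruction. The instruction strings are copied from the listings above; the function name is made up for this example, and the only outside fact used is that MIPS32 instructions are 4 bytes each.

```python
# Instruction text copied from the two listings above (addresses dropped).
STATIC_CALL = [
    "3 16 0 addu,", "16 0 16 lw,", "2 18 0 addu,", "3 3 4 addiu,",
    "18 18 -4 addiu,", "16 16 4 addiu,", "3 -4 2 sw,", "2 -4 16 lw,",
    "3 2 0 addu,", "3 jr,", "1 1 0 or,",
]
DYNAMIC_CALL = [
    "3 16 0 addu,", "16 0 16 lw,", "2 18 0 addu,", "3 3 4 addiu,",
    "18 18 -4 addiu,", "16 16 4 addiu,", "3 -4 2 sw,", "2 -4 16 lw,",
    "$7C03E83B ,", "4 -32680 28 lw,", "30 3 0 addu,", "4 4 30 addu,",
    "3 2 0 addu,", "4 256 29 sw,", "3 jr,", "1 1 0 or,",
]

INSN_BYTES = 4  # every MIPS32 instruction is 4 bytes long


def common_prefix_len(a, b):
    """Count how many leading instructions the two listings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


shared = common_prefix_len(STATIC_CALL, DYNAMIC_CALL)
static_tail = len(STATIC_CALL) - shared
dynamic_tail = len(DYNAMIC_CALL) - shared

print("shared body instructions:", shared)
print("static tail bytes:       ", static_tail * INSN_BYTES)
print("dynamic tail bytes:      ", dynamic_tail * INSN_BYTES)
```

The shared prefix is the body of CALL; everything after it is dispatch code, so the comparison isolates exactly the part that differs between the two engines.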

Looks to me like the dispatch code that is appended to the code for
the call does some additional stuff.  (In this particular case we could
actually take the whole CALL instead of cutting the dispatch part off
and appending some other dispatch code, but some gcc versions replace
the NEXT at the end of the word with a direct jump to dispatch code
elsewhere, and then that does not work.)  You can look at where the
parts are with

gforth-fast --debug

The output contains something like:

Compiled with gcc-4.3.2
goto * 0x804b539 0x804fec9 len=12
...
call            0-0   11 0x804b6f0 0x80501f0 0x804b6f0 len= 26 rest= 3 send=1

This means that the fragment appended every time there is something
other than a fall-through is 12 bytes long (len=12), whereas the normal
dispatch code at the end of CALL is 3 bytes long (rest=3), IIRC what the
numbers mean.  You can find the appended fragment at 0x804b539 (or
0x804fec9), and the code for CALL at 0x804b6f0 or 0x80501f0 (the third
address printed repeats the first).
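
As a rough illustration of reading these numbers, here is a minimal Python sketch that pulls the `len=`, `rest=` and `send=` fields out of the two sample lines above and computes how many bytes longer the appended fragment is than the primitive's own dispatch code. The interpretation of the fields follows the (explicitly uncertain) reading given above; the function names and regular expressions are made up for this example and are not part of gforth.

```python
import re

# Sample lines copied from the --debug output quoted above.
GOTO_LINE = "goto * 0x804b539 0x804fec9 len=12"
PRIM_LINE = ("call            0-0   11 0x804b6f0 0x80501f0 0x804b6f0 "
             "len= 26 rest= 3 send=1")


def parse_fields(line):
    """Return the name=value pairs of a debug line as a dict of ints."""
    return {key: int(val) for key, val in re.findall(r"(\w+)=\s*(\d+)", line)}


def parse_addresses(line):
    """Return all hexadecimal addresses on a debug line, in order."""
    return [int(tok, 16) for tok in re.findall(r"0x[0-9a-fA-F]+", line)]


goto = parse_fields(GOTO_LINE)
prim = parse_fields(PRIM_LINE)

# Assumption: "rest" is the size of the primitive's own dispatch code and
# the "goto *" len is the size of the fragment appended in its place.
extra_bytes = goto["len"] - prim["rest"]

print("appended dispatch fragment:", goto["len"], "bytes")
print("original dispatch code:    ", prim["rest"], "bytes")
print("extra bytes per dispatch:  ", extra_bytes)
print("distinct CALL addresses:   ",
      sorted({hex(a) for a in parse_addresses(PRIM_LINE)}))
```

The difference does not depend on whether `len=` for the primitive includes the `rest=` bytes, which is why the sketch compares only the two dispatch sizes.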

And on the machine where I did this, the code generated for a
dynamic CALL is also longer than the code of CALL itself.

Maybe we need another round of playing around with gcc to find out how
to make it produce a short copyable "goto *".  Or we might change the
copying code to copy the whole thing if the NEXT part is relocatable.
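
The second option can be sketched roughly as follows. This is a toy Python model, not gforth's actual copying code: the `Primitive` record, the `emit` function, and the idea of testing relocatability by comparing two copies of the code located at different addresses are all assumptions made for illustration. The sizes come from the debug output above, under the further assumption that `len=` does not include the `rest=` bytes.

```python
from dataclasses import dataclass


@dataclass
class Primitive:
    """Toy model of a primitive's machine code (hypothetical layout)."""
    body: bytes    # code up to, but not including, the dispatch (NEXT)
    rest: bytes    # the primitive's own dispatch code
    rest2: bytes   # the same dispatch code taken from a second copy


def rest_is_relocatable(prim):
    # Assumption: if both copies of the dispatch code are byte-identical
    # although they sit at different addresses, the code contains no
    # address-dependent parts and can be copied as is.
    return prim.rest == prim.rest2


def emit(prim, goto_fragment):
    """Return the bytes to copy for a primitive that ends a basic block."""
    if rest_is_relocatable(prim):
        return prim.body + prim.rest      # copy the whole primitive
    return prim.body + goto_fragment      # cut NEXT off, append "goto *"


# Lengths follow the debug output above; the byte values are dummies.
goto_fragment = b"\xAA" * 12
relocatable = Primitive(b"\x01" * 26, b"\x02" * 3, b"\x02" * 3)
not_relocatable = Primitive(b"\x01" * 26, b"\x02\x10\x00", b"\x02\x20\x00")

print("relocatable NEXT:    ", len(emit(relocatable, goto_fragment)), "bytes")
print("non-relocatable NEXT:", len(emit(not_relocatable, goto_fragment)), "bytes")
```

In this model the whole-primitive copy is used only when it is safe, and the appended fragment remains the fallback for the gcc versions that turn NEXT into a jump elsewhere.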

> Don't know why that instruction doesn't
> SIGILL, but maybe it's a non-standard/undocumented instruction on
> Loongson2f.  The binutils also don't know anything about that opcode:
> 
>   echo -e "\x3b\xe8\x03\x7c" > /tmp/inst  
>   objdump -D -EL -b binary -m mips:loongson_2f /tmp/inst 
>   [..]
>    0:   7c03e83b        0x7c03e83b

Since this code was originally generated with gas, the binutils do
know about this.  To use gdb (i.e., binutils) for the disassembly, use

' disasm-gdb is discode

before doing the SEE-CODE or SEE.
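
For a quick look without any disassembler, the word can also be split into fields by hand. The following Python sketch assumes the standard MIPS32 R-type field layout (opcode, rs, rt, rd, shamt, funct). Read that way, the word has opcode 0x1F (SPECIAL3) and function code 0x3B, which in the MIPS32 Release 2 encoding is RDHWR, here with rt=3 and rd=29. That reading is an inference from the bit pattern, not something confirmed in this thread, and the helper name is made up for this example.

```python
import struct

WORD = 0x7C03E83B  # the instruction word SEE-CODE could not decode


def decode_rtype(word):
    """Split a 32-bit MIPS instruction word into its R-type fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs":     (word >> 21) & 0x1F,
        "rt":     (word >> 16) & 0x1F,
        "rd":     (word >> 11) & 0x1F,
        "shamt":  (word >> 6) & 0x1F,
        "funct":  word & 0x3F,
    }


fields = decode_rtype(WORD)
for name, value in fields.items():
    print(f"{name:6} = {value:#04x}")

# Assumption: MIPS32 Release 2 encodes RDHWR as opcode SPECIAL3 (0x1F)
# with function code 0x3B.
if fields["opcode"] == 0x1F and fields["funct"] == 0x3B:
    print(f"looks like: rdhwr ${fields['rt']}, ${fields['rd']}")

# The little-endian byte order used in the echo command above.
print(struct.pack("<I", WORD).hex())
```

If that reading is correct, one possible explanation for the missing SIGILL is that Linux emulates RDHWR $29 (the thread pointer) in its reserved-instruction handler on CPUs that lack the instruction; that would also make each execution expensive. This is offered as a hypothesis only.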

- anton


