guile-devel

Re: wip-rtl return location


From: Andy Wingo
Subject: Re: wip-rtl return location
Date: Fri, 03 Aug 2012 10:24:30 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.1 (gnu/linux)

Heya Mark,

Thanks for the comments :-)

So the current calling sequence is:

  CALL:
       call f
  MVRA:
       truncate
  RA:
       return

In interpreted code it's fine that we return to a different address, as
returns are all indirect jumps anyway.  For compiled code, the "call"
instruction won't be just one instruction -- in RTL it's a
macro-instruction that also handles shuffling the arguments into place,
whereas with native compilation you would handle the shuffling then do
the call.  So for native compilation you could put the MVRA before the
call because the call would have fixed width.
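To make the CALL / MVRA / RA sequence above concrete, here is a toy Python model of a call site that wants exactly one value. It is only a sketch: `run_call` and the trace labels are hypothetical names, not anything in wip-rtl, and it assumes that a single-valued return lands on RA directly while any other return lands on MVRA, truncates, and falls through. In the real sequence the callee picks the return address; the `if` below merely stands in for that choice.

```python
def run_call(callee, args):
    """Simulate CALL / MVRA / RA for a call site that wants one value.

    Hypothetical model: `callee` returns a tuple holding all of its values.
    """
    values = callee(*args)  # CALL: "call f"
    trace = []
    if len(values) != 1:
        # The callee returned to MVRA: truncate to the first value.
        trace.append("MVRA")
        if not values:
            raise ValueError("zero values returned to a single-value site")
        values = values[:1]
    # RA: single-valued returns arrive here without touching MVRA.
    trace.append("RA")
    return values[0], trace


if __name__ == "__main__":
    print(run_call(lambda x: (x + 1,), (1,)))
    print(run_call(lambda a, b: (a, b), (1, 2)))
```

Note that the single-valued path never executes the truncate step, which is the property the two-address scheme is buying.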

Anyway, on to your point:

On Fri 03 Aug 2012 04:29, Mark H Weaver <address@hidden> writes:

> I wonder if it might be better to avoid this branch misprediction by
> always returning to the same address.  Upon return, a special register
> would contain N-1, where N is the number of return values.  The first
> few return values would also be stored in registers (hopefully at least
> two), and if necessary the remaining values would be stored elsewhere,
> perhaps on the stack or in a list or vector pointed to by another
> register.

It's a good idea for the native-compilation case.  I don't think the
overhead of the conditional jump in interpreted code would be worth it,
though.  Dunno.  Probably wouldn't matter?

> In the common case where a given call site expects a small constant
> number of return values, the compiler could emit a statically-predicted
> conditional branch to verify that N-1 is the expected value (usually
> zero), and then generate code that expects to find the return values in
> the appropriate registers.

But here's the rub: this introduces a conditional branch into the
calling sequence where there was no conditional branch before (for the
single-valued case, which is empirically the majority of cases).
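For comparison, a rough Python sketch of the single-return-address convention Mark describes: the callee hands back N-1 in a count register, the first few values in registers, and the rest elsewhere, and the call site checks the count. The names `pack_return` and `call_expecting_one`, the choice of two value registers, and the slow path that truncates on a mismatch are all assumptions made for illustration; the proposal does not specify them.

```python
NUM_VALUE_REGS = 2  # assumption: two return values travel in registers


def pack_return(*values):
    """Model the callee side: (N-1, register values, spilled values)."""
    count_minus_one = len(values) - 1
    regs = values[:NUM_VALUE_REGS]
    spilled = values[NUM_VALUE_REGS:]  # would live on the stack, a list, etc.
    return count_minus_one, regs, spilled


def call_expecting_one(callee, *args):
    """Model a call site that expects exactly one value."""
    count_minus_one, regs, _spilled = callee(*args)
    if count_minus_one != 0:
        # This test is the extra conditional branch at every call site.
        if count_minus_one < 0:
            raise ValueError("zero values returned to a single-value site")
        # Assumed slow path: drop the extra values.
    return regs[0]


if __name__ == "__main__":
    print(pack_return(1, 2, 3))
    print(call_expecting_one(lambda: pack_return(42)))
```

The `if count_minus_one != 0` test runs on every return, including the single-valued ones, which is the cost being weighed here against always returning to one address.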

Apparently the Intel Core architecture ignores static branch
predictions:

  http://www.agner.org/optimize/microarchitecture.pdf

So it would increase pressure on the branch target buffer.  OTOH the
dynamic predictions would almost always hit, in either case.

> On some architectures, it might also make sense for the callee to set
> the processor's "zero?" condition code as if N-1 had been tested, to
> allow for a shorter check in the common single-value case.
>
> Of course, the calling convention can be chosen independently for each
> instruction set architecture / ABI.
>
> What do you think?

I think it's definitely worth exploring.  I would be OK with it, and
receiving results in registers would be good.

In the context of what we do with the bytecode (as opposed to calling
convention optimizations that we will do with native code), WDYT about
the bytecode calling convention I outlined above?

Cheers,

Andy
-- 
http://wingolog.org/


