emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Ideal performance of ELisp


From: Lynn Winebarger
Subject: Re: Ideal performance of ELisp
Date: Wed, 17 Aug 2022 08:41:45 -0400

On Tue, Aug 16, 2022 at 1:22 PM Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
> > BTW, I was being serious - if there's a way to write a simple jump to
> > do case-dispatching for that trampolining while loop, I'd definitely
> > look at making semantic produce such automata so the native compiler,
> > in particular, could optimize the result properly.
>
> AFAIK a (pcase x ('foo ..) ('bar ...) ...) should be compiled to
> a `switch` bytecode which uses a hash-table lookup to find the
> destination target to jump to.
>
> Not sure how well it works for very large tables where we risk bumping
> into the 32K limit of bytecode jumps.

That did occur to me.
Perhaps one way to address that limitation without completely
overhauling the VM would be to introduce a form of segmented memory
for byte strings.  It would require modification of the byte-code
vector during the activation frame of exec_byte_code, but I believe
there is a scheme that would ensure the byte-code object would be
returned to its non-modified state before either executing a funcall
or returning.
It would require one new byte-code op, which I will call "trampoline",
and a vector of byte-code strings (or "code segments") either at a
known location in the constants vector, or as an additional entry in
the byte-code vector itself.  I'd lean toward the latter because it
would be easy to discriminate byte-code vectors that are  allowed to
use the trampoline instruction without imposing additional space
overhead for existing byte-vectors.
The first entry in the code segment vector would be the "real"  or
"entry" byte-string, and no funcalls or returns would be performed
except when that code segment was active.
The trampoline instruction would two arguments off the operand stack -
a code segment, and an offset in the code segment - set the byte
string of the code vector to the indicated code segment, then set the
pc to the offset to perform the jump.
A typical entry byte-string for an automata might take a label and an
arbitrary number of arguments, look up the label in a dispatch table,
and trampoline to that label with the rest of the arguments on the
stack. The following instructions might raise an error to indicate the
label was invalid.  The rest of the entry byte string would contain a
block that would expect the operand stack to have the form [ fxn,
arg0, .., argn, cont-segment,cont-offset ], then perform a funcall
using all the elements on the operand stack but the bottom two, then
swap the return value to the bottom of the operand stack and
trampoline to the continuation label.  Another block would be required
that would just perform the return operation.  There might be another
for dealing with stack unwinding, I don't know the details of how that
works.
At any rate, I believe that would be a feasible scheme that would
allow the byte-compiler to (a) overcome the address space limit, and
(b) allow implementation of efficient tail recursion in simple cases
like dispatching to locally-bound closures that are closed over the
same lexical environment.  The compiler could presumably do (b) now if
there wasn't the concern for the limited address space.

Such an instruction would definitely make writing an efficient automata easier.

Lynn



reply via email to

[Prev in Thread] Current Thread [Next in Thread]