emacs-devel

Re: native compilation units


From: Stefan Monnier
Subject: Re: native compilation units
Date: Mon, 13 Jun 2022 13:15:21 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)

> To be clear, I'm trying to first understand what Andrea means by "safe".
> I'm assuming it means the result agrees with whatever the byte
> compiler and VM would produce for the same code.

Not directly.  It means that it agrees with the intended semantics.
That semantics is sometimes accidentally defined by the actual
implementation in the Lisp interpreter or the bytecode compiler, but
that's secondary.

The semantic issue is that if you call

    (foo bar baz)

it normally (when `foo` is a global function) means you're calling the
function contained in the `symbol-function` of the `foo` symbol *at the
time of the function call*.  So compiling this to jump directly to the
code that happens to be contained there during compilation (or the code
which the compiler expects to be there at that point) is unsafe in
the sense that you don't know whether that symbol's `symbol-function`
will really have that value when we get to executing that function call.
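
To make the late-binding point concrete, here is a minimal sketch.  The
names `my-foo` and `my-caller` are made up for illustration, and `fset`
stands in for any redefinition (advice, reloading a file, etc.).

```elisp
;; Hypothetical global function and a caller that refers to it by name.
(defun my-foo (a b)
  (list 'old a b))

(defun my-caller ()
  ;; The call goes through the `symbol-function' of `my-foo'
  ;; at the time `my-caller' runs, not at the time it was defined.
  (my-foo 1 2))

(defvar my-result-before (my-caller))   ; => (old 1 2)

;; Replace the function stored in the symbol's function cell.
(fset 'my-foo (lambda (a b) (list 'new a b)))

(defvar my-result-after (my-caller))    ; => (new 1 2)
```

A compiler that had hard-wired the first definition into `my-caller`
would still return `(old 1 2)` the second time, which is the unsafe
behavior described above.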

The use of `cl-flet` (or `cl-labels`) circumvents this problem since the
call to `foo` is now to a lexically-scoped function `foo`, so the
compiler knows that the code that is called is always that same one
(there is no way to modify it between the compilation time and the
runtime).
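
A small sketch of the `cl-flet` case, again with hypothetical names
(`my-bar`, `my-flet-caller`): the local definition is what gets called,
whatever happens to the global function cell in the meantime.

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

;; Hypothetical global function with the same name as the local one below.
(defun my-bar (x)
  (list 'global x))

(defun my-flet-caller ()
  ;; Inside this body, `my-bar' refers to the lexically scoped
  ;; function, not to the symbol's function cell.
  (cl-flet ((my-bar (x) (list 'local x)))
    (my-bar 1)))

;; Redefining the global `my-bar' cannot affect the call above.
(fset 'my-bar (lambda (x) (list 'redefined x)))

(defvar my-flet-result (my-flet-caller))   ; => (local 1)
```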

> I doubt I'm bringing up topics or ideas that are new to you.  But if
> I do make use of semantic/wisent, I'd like to know the result can be
> fast (modulo garbage collection, anyway).

It's also "modulo enough work on the compiler (and potentially some
primitive functions) to make the code fast".

> I've been operating under the assumption that
>
>    - Compiled code objects should be first class in the sense that
>    they can be serialized just by using print and read.  That seems to
>    have been important historically, and was true for byte-code
>    vectors for dynamically scoped functions.  It's still true for
>    byte-code vectors of top-level functions, but is not true for
>    byte-code vectors for closures (and hasn't been for at least
>    a decade, apparently).

It's also true for byte-compiled closures, although, inevitably, this
holds only for closures that capture only serializable values.
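
As a rough illustration, the following sketch round-trips a
byte-compiled closure through `prin1-to-string` and `read`.  The
variable names are hypothetical, and binding `lexical-binding` around
`byte-compile` is just one way to get a lexically scoped compilation
outside of a file.

```elisp
(require 'bytecomp)

;; Hypothetical factory: returns a closure that captures N.
(defvar my-adder-factory
  (let ((lexical-binding t))
    (byte-compile '(lambda (n) (lambda (x) (+ x n))))))

;; A byte-code closure capturing the (serializable) value 5.
(defvar my-add5 (funcall my-adder-factory 5))

;; Serialize with print, deserialize with read.
(defvar my-add5-copy (read (prin1-to-string my-add5)))

(defvar my-add5-copy-result (funcall my-add5-copy 10))   ; => 15
```

A closure capturing something without a read syntax (a buffer, a
marker, a process) could not be read back this way.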

> But I see that closures are being implemented by calling an ordinary
> function that side-effects the "constants" vector.

I don't think that's the case.  Where do you see that?
The constants vector is implemented as a normal vector, so strictly
speaking it is mutable, but the compiler will never generate code that
mutates it, AFAIK, so you'd have to write ad-hoc code that digs inside
a byte-code closure and mutates the constants vector for that to happen
(and I don't know of such code out in the wild).

> OTOH, prior to commit
> https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=d0c47652e527397cae96444c881bf60455c763c1
> it looks like the closures were constructed at compile time rather than by
> side-effect,

No, this commit changes only the *way* they're constructed, not *when*.
Both before and after it, the result is a constants vector which is not
side-effected (every byte-code closure gets its own fresh
constants vector).
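
One way to observe this, sketched under the assumption that slot 2 of a
byte-code object holds its constants vector (the names below are
hypothetical):

```elisp
(require 'bytecomp)

;; Hypothetical factory, compiled with lexical binding.
(defvar my-adder-factory-2
  (let ((lexical-binding t))
    (byte-compile '(lambda (n) (lambda (x) (+ x n))))))

;; Two closures created by the same compiled code.
(defvar my-add1 (funcall my-adder-factory-2 1))
(defvar my-add2 (funcall my-adder-factory-2 2))

;; Compare the two constants vectors for identity.  If each closure
;; gets a fresh vector, this is nil.
(defvar my-constants-shared-p
  (eq (aref my-add1 2) (aref my-add2 2)))

;; Each closure still sees its own captured value.
(defvar my-add-results
  (list (funcall my-add1 10) (funcall my-add2 10)))   ; => (11 12)
```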

> Wedging closures into the byte-code format that works for dynamic scoping
> could be made to work with shared structures, but you'd need to modify
> print to always capture shared structure (at least for byte-code vectors),
> not just when there's a cycle.

It already does.

> The approach that's been implemented only works at run-time when
> there's shared state between closures, at least as far as I can tell.

There can be problems if two *toplevel* definitions are serialized and
they share common objects, indeed.  The byte-compiler may fail to
preserve the shared structure in that case, IIRC.  I have some vague
recollection of someone bumping into that limitation at some point, but
it should be easy to circumvent.

> Then I think the current approach is suboptimal.  The current
> byte-code representation is analogous to the a.out format.
> Because the .elc files run code on load you can put an arbitrary
> amount of infrastructure in there to support an implementation of
> compilation units with exported compile-time symbols, but it puts
> a lot more burden on the compiler and linker/loader writers than just
> being explicit would.

I think the practical performance issues with ELisp code are very far
removed from these problems.  Maybe some day we'll have to face them,
but we still have a long way to go.

>> You explicitly write `(require 'cl-lib)` but I don't see any
>>
>>     -*- lexical-binding:t -*-
>>
>> anywhere, so I suspect you forgot to add those cookies that are needed
>> to get proper lexical scoping.
>
> Ok, wow, I really misread the NEWS for 28.1 where it said
> The 'lexical-binding' local variable is always enabled.

Are you sure?  How do you do that?
Some of the errors you showed seem to point very squarely towards the
code being compiled as dyn-bound ELisp.
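
A quick probe for which binding discipline a form actually gets could
look like the sketch below.  The names `my-binding-mode` and `my-probe`
are hypothetical; the second argument of `eval` selects lexical
(non-nil) or dynamic (nil) binding.

```elisp
(defun my-binding-mode (lexical)
  "Evaluate a probe form with LEXICAL binding on or off; return what it sees."
  (eval '(let ((f (let ((my-probe 'lexical))
                    (lambda () my-probe))))
           ;; Under lexical binding F captured its own `my-probe'.
           ;; Under dynamic binding F sees whatever binding is
           ;; current when it is called, i.e. this one.
           (let ((my-probe 'dynamic))
             (funcall f)))
        lexical))

(my-binding-mode t)     ; => lexical
(my-binding-mode nil)   ; => dynamic
```

Putting the same probe form at the top of a file and evaluating it
shows whether the file's `lexical-binding` cookie took effect.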


        Stefan



