emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH, RFC] Macros expansion changes, robust symbol macros, and con


From: Daniel Colascione
Subject: Re: [PATCH, RFC] Macros expansion changes, robust symbol macros, and constant propagation
Date: Tue, 17 Sep 2013 16:43:55 -0700
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:17.0) Gecko/20130801 Thunderbird/17.0.8

On 9/17/13 2:52 PM, Stefan Monnier wrote:> Thanks, Daniel,
>
> I'm not happy about all of it, but I do like parts of it.

Thanks. The patch isn't quite finished. I appreciate your looking at
this early version.

>>  - removes Fmacroexpand and implements macroexpand-1 and macroexpand
in lisp
>
> Sounds good (indeed for gv.el we want macroexpand-1).  Do you have some
> measurement of the performance impact?

On my machine, macroexpand is about 30% slower than the old version,
depending on environment size.  The savings from the dontexpand tag
should make up for it, and macro expansion isn't a hot path anyway.  I
might close the gap by implementing memq-car in C.

>>  - adds a facility for telling macroexpand-all not to re-expand macros
>>    (effectively, providing for a variable macro expansion order)
>
> The current code already lets you do "variable macro expansion order"
> since calling the macroexpander redundantly will simply have no effect
> (other than waste time).  So the need to prevent re-expansion
> is only performance, AFAIK.

No, it's a correctness issue. This comment in the old lexical
symbol-macrolet code (commented out in head) hints at the problem:

;; This binding should hide its symbol-macro,
;; but given the way macroexpand-all works, we
;; can't prevent application of `env' to the
;; sub-expressions, so we need to α-rename this
;; variable instead.

The current macroexpand-all strategy is to call macroexpand on the head
of each list in the put, then recursively macroexpand-all all the
arguments, using the current macro environment for both kinds of
expansion. It's not possible to tell macroexpand-all not to perform the
latter operation; nor is it possible to tell macroexpand-all to use a
different macro environment when expanding subsequent parts of the same
form.

Today, if a macro wants to expand its arguments using a macro
environment different from the one with which it was invoked, it calls
macroexpand-all internally and returns a fully expanded form, the
reasoning being that macroexpand-all's argument-expansion logic is
idempotent on fully expanded forms. That doesn't work when it comes to
symbol macro shadowing, as the author of the comment above figured out.
Maybe an example would help:

Say we have this form:

(symbol-macrolet ((x 4)) (let ((x 5)) x))

When we expand this form, `symbol-macrolet` adds
`macroexp--sm-macroexpand-1' to the environment as a macro expansion
hook (so it acts like a compiler macro for every form), pushes the
symbol macro "x => 4" onto the macro environment as well, then calls
`macroexpand-all' on the arguments.  Next, we expand (while still inside
the first `symbol-macrolet` call) the `let', parse its binding list, and
realize that we want to expand its body with the "x => 4" macro
shadowed.  `macroexp--sm-macroexpand-1' pushes a new binding, "x =>
nil", onto the environment and expand the body, `x', using
`macroexpand-all'. Now the macro expansion hook sees `x', consults the
environment, sees "x => nil", and decides to leave `x' as `x'.  The
expansion hook processes `let' into `(let ((x 5)) x)' as expected.

The trouble is using today's macro-expansion strategy, `macroexpand-all'
will recursively expand the `let' body.  It sees `x' and invokes the
expansion hook, which consults the environment, sees the binding "x =>
4", and expands `x' to `4'.  `macroexpand-all' then returns `(let ((x
5)) 4', which is not exactly what we wanted.

The existing, commented-out symbol-macro code tries to avoid this
outcome by renaming `x' to a different symbol (say `y') so that we never
see the "x => 4" binding .  But this approach leads to incorrect
behavior when `x' is dynamically bound; other problems are that it is
inefficient and confuses the debugger.

My patch solves this problem a different way: we have
`macroexp--sm-macroexpand-1' return `(list dontexpand ((let ((x 5)) x)'.
 `macroexpand-all` strips the dontexpand tag, but remembers not to
expand `x', so `macroexp--sm-macroexpand-1' never runs on `x' with
environment "x => 4", and we produce the correct overall form.

Repeated expansion per se isn't a problem so long as we do it in the
correct macro environment --- and remember, the macro expansion hook is
part of the environment, so correct code can't lose it.  If we expand
`x' with environment ["x => nil" "x => 4"
"expand-with-macroexp--sm-macroexpand-1"], we correctly expand `x' to
`x'.  If we expand `(let ((x 5)) x)' with environment ["x => 4"
"expand-with-macroexp--sm-macroexpand-1"], we end up invoking the
expansion hook and still correctly expand to `(let ((x 5)) x'.  And, of
course, if we expand
`(symbol-macrolet ((x 4)) (let ((x 5)) x))' with the empty environment,
we get the correct result.

This approach solves the problem generally and completely, and without
any symbol rewriting.  That it's also an efficiency win for unrelated
code is gravy.

> BTW, I see you sometimes strip the macroexpand-already-expanded tag.
> Can't this lead again to repeated expansion?

Yes, but only in a harmless way.  See above.

>
>>  - re-implements symbol-macros using this new facility
>>    * no more problems with EQ symbol names
>
> Good.
>
>>    * shadowable and non-shadowable (let becomes letf) varieties
>
> Not sure how important that is.

I'd prefer to always use the shadowable variant and ditch the automatic
conversion to `letf', but see bug#12119.  I don't want to break existing
code.

>>    * macros expanded only once, so no need to rename symbols
>
> I don't know what renaming this refers to.  Can you give details.

See the discussion above.

>>  - adds symbol-macros to the core, in macroexp
>
> I don't think I like this: CL's symbol-macro is *very* rarely used, so
> for now I'd rather keep it in cl-macs.el.

Ah, but we use symbol macros to do constant propagation, and we want
constant propagation to be in the core!

>
>>  - adds variable-capture analysis to macroexp
>
> That's a fairly large function, which re-implements an analysis largely
> already performed in cconv.el (tho it might not be completely
> straightforward to reuse that code for your purpose).

We need this analysis for optimizing let bindings.  cconv's analysis is
similar in some respects, and I looked into using it, but cconv's
analysis does both more (closure candidates) and less (reference
analysis) than what we need --- besides, it has a very inconvenient
interface.

> It's only used for cl--expr-contains, so it should be in cl-macs.el, not
> in macroexp.el.  And I'm really not sure this complexity is warranted
> for those uses: does it fix actual bugs?

We'll be using it for other tasks too.  Also, any unsafe code
transformation is a bug.  If we have a robust analysis engine, why not
use it everywhere?

>>  - changes cl-defsubst so that it works like regular defsubst
>
> Doesn't this break setf on defstruct slots?

Yes, but I fixed it by restoring the old explicit-accessor code.  Having
gv depend on compiler macros is scary.  A compiler macro should be pure
optimization.  Code shouldn't depend on compiler macros for correctness.

>>  - fixes various cl-lib bugs using the new macrexp features and
symbol-macros
>
> Did you keep track of the bugs it fixes?  It would be good to know.

I'll come up with a list when I actually merge the change.

>> +(defun byte-optimize-do-constant-propagation (let-form)
>> +  (let* ((bindings (cadr let-form))
>> +         (body (cddr let-form))
>> +         (const-bindings)
>> +         (nonconst-bindings))
>> +    ;; Split bindings into const, nonconst sets.  We'll collapse
>> +    ;; redundant nonconst bindings in the `symbol-macrolet'.
>> +    (let ((bc bindings))
>> +      (while bc
>> +        (let* ((binding (pop bc))
>> +               (var (or (car-safe binding) binding))
>> +               (val (and (consp binding) (cadr binding))))
>> +          (cond ((and (not (special-variable-p var))
>> +                      (macroexp-const-p val))
>> +                 (push (list var val) const-bindings))
>> +                ((and (eq (car let-form) 'let*)
>> +                      const-bindings
>> +                      nonconst-bindings)
>> +                 ;; Handle the rest of the let* forms as a child;
>> +                 ;; we'll combine the nested let*s later.
>> +                 (setq body `((let* (,binding ,@bc) ,@body)))
>> +                 (setq bc nil))
>> +                (t (push binding nonconst-bindings))))))
>> +    (if (not const-bindings)
>> +        form
>> +      `(,(car let-form)
>> +        ,nonconst-bindings
>> +        ,@(macroexp-unprogn
>> +           (macroexpand-all
>> +            `(symbol-macrolet-shadowable
>> +                 ,(nreverse const-bindings)
>> +               ,@body)))))))
>
> How does this handle the case where the var is first bound to a constant
> and later setq'd?

Oops. That code didn't make it into the patch. The intent is to use
macroexp-analyze-free-variables to make mutated variables ineligible for
this optimization.

>>              (push (list temp) cl--loop-bindings)
>>              (setq form `(if (setq ,temp ,cond)
>> -                                ,@(cl-subst temp 'it form))))
>> +                                (symbol-macrolet-shadowable ((it ,temp))
>> +                                  ,@form))))
>
> Why not (setq form `(let ((it ,cond)) (when it ,@form)))?

In dynamically bound code, we want the `it' symbol to be lexically
scoped.  I suppose it doesn't matter all that much, but if we have
robust symbol-macro machinery, why not use it?

>
>> +(defalias 'cl-letf 'letf)
>> +(defalias 'cl-letf* 'letf*)
>
> I don't think I want to have letf in the core

I really don't like letf.  That said, we need it in the core if we have
symbol macros in the core and we want to preserve existing
symbol-macrolet behavior.  We already have gv support in the core, and
the letf implementation is small.  I don't think adding letf to the core
should be a problem.

> (but I'd like to change
> cl-letf so that in the case of simple-variable bindings, those should be
> dynamically scoped, even in lexical-binding code).

I'm not sure this change makes sense.  It'd break existing code and
break optimizations that depend on letf boiling down to regular let in
simple cases.  I'm not sure what problem it would solve either.  If
users need to do something like this, they can use letf on (symbol-value
variable) or use a symbol macro to the same effect.  If
we do have a letf, it should act just like let does for forms that let
accepts.

>> +(defun cl--struct-setf-expander (x name accessor pred-form pos)
>
> Aha!  I wonder if I don't rely on cl-defsubst's previous behavior
> elsewhere as well.
>
>> +                (push `(define-setf-expander ,accessor (cl-x)
>
> You can't use define-setf-expander here since it requires cl.el.

Thanks for the reminder.  The original code used define-setf-expander,
so I kept it.  I'll write it a different way.

>
>> -                                               nil)
>> +                                               (and unsafe (list
argn argv)))
>
> Why undo this recent change?

Copypasta from a slightly stale local copy.  Thanks for spotting the error.

>
>> +    (`(lambda ,arglist . ,body)
>> +     (macroexp-analyze-free-variables
>
> This form should never appear in macroexpanded code.

It can appear in macroexpanded subtrees.  Try (macroexpand-all '(setq
foo (lambda (x) (1+ x)))).

>
>> +    (`(closure ,cells ,arglist . ,body)
>
> And this one shouldn't either.

Hrm --- what if somebody substitutes a function by value into a form,
and that function's value happens to be a closure form?

>
>> +(defconst macroexp--sm-environment-tag
>> +  (if (boundp 'macroexp--sm-environment-tag)
>> +      (symbol-value 'macroexp--sm-environment-tag)
>> +    (make-symbol "--macroexp--sm-environment-tag--"))
>
> Why not
>
>    (defvar macroexp--sm-environment-tag
>      (make-symbol "--macroexp--sm-environment-tag--")

Because if you hit C-M-x with point inside (say, to update the
docstring), you define macroexp--sm-environment-tag to a new value and
wreak havoc with any stored macro environments.

> Also, IIUC we could just use `:symbol-macro' instead.

I was wondering about that.  The point of all this is to try to make
sure that we never accidentally confuse these tags with real functions
or with things macros could actually return.  Using a keyword symbol
instead of an uninterned one, since we'll never have a function with a
keyword-symbol name --- right?

>> +    (`(lambda ,arglist . ,_)
>> +     ;; Lambda arguments always shadow symbol macros
>
> Again, such a form should never appear in macroexpanded code.

They do for me.

>> +    (`(function (lambda . ,_))
>> +     ;; macroexpand-all has special logic to detect lambda in function
>> +     ;; position, so we need a special case here too.
>
> I don't understand the comment.

Without this clause, we don't match the other lambda case (because
symbol-macrolet has a transformation for lambda) and miss some shadowing.

>> +(defconst macroexpand-already-expanded
>> +  (if (boundp 'macroexpand-already-expanded)
>> +      (symbol-value 'macroexpand-already-expanded)
>> +    (make-symbol "--macroexpand-already-expanded--"))
>
> Again: why not a defvar?  Also, we could define it as an interned
> well-known symbol.  E.g. macroexp--already-expanded.

In this case, I think an interned well-known symbol will work.  I'd
prefer making it a keyword symbol, though, in order to highlight its
special effects.  It turns out that evalling a form that starts with a
keyword symbol works just fine as long as the symbol has a function
definition, and ours will.

> Also, all that macroexpand-1 code should ideally be in macroexp.el
> rather than in subr.el.  Maybe that introduces bootstrapping problems,
> but IIRC it can be made to work (I played around with such a thing
> during the cl-lib.el work).

I'll try to make it work.  The cyclic dependencies on macroexp make me
nervous, though.  What's so bad about putting things in subr?

>> +;; It may seem odd to macro expansion hooks in the macro environment
>> +;; instead of dynamically-binding a hypothetical
>> +;; macroexpand-1-functions variable.
>
> Indeed.
>
>> +;; The reason we do it this way is
>> +;; so that we can expand macros from outside the dynamic extent of a
>> +;; form that introduces a macro expansion hook --- e.g.,
>> +;; `cl-symbol-macrolet'.
>
> I don't understand the argument.  Do you have a concrete/detailed
> example where it's needed/useful/better?

The macro environment should hold everything you need to expand a form
in a given lexical environment.  By making the correct interpretation of
the macro environment depend on a dynamic variable outside the
environment itself, you introduce the possibility of subtle error ---
you effectively give macro environments dynamic extent!  If we can give
our environments indefinite extent, why not? It's less confusing that
way.  See http://clhs.lisp.se/Issues/iss229_w.htm.

>> +;; We can store other state in the environment as well: all that's
>> +;; required for compatibility with naive users of the macro
>> +;; environment is to ensure that no car of any cons in the macro
>> +;; environment refers to a form we might try to expand.  That's why
>> +;; all the macro tags should be uninterned.
>
> Keywords should work as well.

Agreed.

>> +(defun memq-car (key alist)
>> +  "Find cons with car KEY in ALIST.
>> +Like `assq', except that instead of returning
>> +the cons cell whose car is `eq' to KEY, it returns
>> +the cons cell whose car is that cons cell and whose
>> +cdr is the rest of the alist."
>
> Let's leave it close to its users, with a macroexp- prefix for now.

What if I want to implement it in C after all?

Attachment: signature.asc
Description: OpenPGP digital signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]