emacs-devel

Thoughts on the buffer positions in the byte compiler's warning messages


From: Alan Mackenzie
Subject: Thoughts on the buffer positions in the byte compiler's warning messages.
Date: Sun, 18 Sep 2016 15:23:03 +0000
User-agent: Mutt/1.5.24 (2015-08-30)

Hello, Emacs.

The byte compiler reporting wrong positions in its warning messages is a
long-standing problem.  See bugs #2681, #8774, #9109, #22288, #24128,
#24449.  #24449 and #2681 have recently been fixed.

The compiler's difficulty comes from how it reads the source code.  It
actually _reads_ it (in the lisp sense) then gets to work on the lisp
form produced, rather than reading (in the file access sense) one line
at a time and processing that, the way typical compilers do.

So, how does the byte compiler produce any position information at all?
It does so because the reader, in addition to producing the lisp form,
also produces a linear alist of the positions each symbol it encountered
was found at.  So, if the form were:

    (defun foo (bar)
      (baz))

the alist (called read-symbol-positions-list) would look something
like:

    ((defun . 1) (foo . 7) (bar . 12) (baz . 20))

This alist is the sole source of information the compiler has to link
symbols in the form being compiled with source positions.  It does this
(in function byte-compile-set-symbol-position, which takes a single
argument, a symbol) by searching this alist for the NEXT occurrence of
the desired symbol.  So that, for example, if there were a warning
concerning "(baz)", that function would search forward from the "current
position", find (baz . 20) in read-symbol-positions-list, and from 20 it
calculates the pertinent line and column positions.
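
The forward search described above can be sketched in a few lines.
This is a simplified model under stated assumptions, not the real
byte-compile-set-symbol-position: the names my-symbol-positions,
my-last-position, my-next-symbol-position and my-offset-to-line-column
are all hypothetical, and the alist is the literal one from the example
rather than anything produced by the reader.

```elisp
(require 'cl-lib)

(defvar my-symbol-positions
  '((defun . 1) (foo . 7) (bar . 12) (baz . 20))
  "Stand-in for `read-symbol-positions-list', in source order.")

(defvar my-last-position 0
  "Hypothetical cursor: offset of the most recently located symbol.")

(defun my-next-symbol-position (sym)
  "Return the offset of the next occurrence of SYM after the cursor.
Move the cursor there.  Return nil, leaving the cursor alone, if SYM
does not occur again."
  (let ((entry (cl-find-if (lambda (e)
                             (and (eq (car e) sym)
                                  (> (cdr e) my-last-position)))
                           my-symbol-positions)))
    (when entry
      (setq my-last-position (cdr entry)))))

(defun my-offset-to-line-column (source offset)
  "Return (LINE COLUMN) for the 0-based OFFSET into the string SOURCE.
LINE counts from 1 and COLUMN from 0."
  (with-temp-buffer
    (insert source)
    (goto-char (1+ offset))             ; buffer positions start at 1
    (list (line-number-at-pos) (current-column))))
```

With the cursor at 0, (my-next-symbol-position 'baz) returns 20, and
my-offset-to-line-column maps offset 20 of the two-line example to
line 2, column 3.  A subsequent (my-next-symbol-position 'foo) returns
nil, because the cursor is already past offset 7.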

Not surprisingly, it often gets things wrong.  For example, if a warning
message is output before byte-compile-set-symbol-position has been
called for the pertinent symbol, the line and column output will be that
of the previous symbol.  This happens in bug #8774, where in:

 1  (defun fix-page-breaks ()
 2    "Fix page breaks in SAS 6 print files."
 3    (interactive)
 4    (save-excursion
 5      (goto-char (point-min))
 6      (if (looking-at "\f") (delete-char 1))
 7      (replace-regexp "^\\(.+\\)\f" "\\1\n\f\n")
 8      (goto-char (point-min))
 9      (replace-regexp "^\f\\(.+\\)" "\f\n\\1")
10      (goto-char (point-min))))

the output messages are:

    ~/eglen.el:6:28:Warning: `replace-regexp' is for interactive use only; use
        `re-search-forward' and `replace-match' instead.
    ~/eglen.el:7:6:Warning: `replace-regexp' is for interactive use only; use
        `re-search-forward' and `replace-match' instead.

Note the positions - 6:28 points at "delete-char", and 7:6, apparently
correct, points at "replace-regexp".  Trouble is, both are wrong: the
first message should point at 7:6, and the second at 9:6.  This would
actually be fairly easy to fix: centralise the call to
byte-compile-set-symbol-position in byte-compile-form, and at the same
time remove it from the direct error-checking functions.

The problem with this whole mechanism is that it is strictly
left-to-right.  Once the "current position" has passed a symbol, there
is no going back to it.  This works, more or less, with straight code.
Where a form is first transformed (by the byte code optimiser, macro
expansion, closure conversion, or whatever) and then compiled, the
"current position" becomes foggy indeed.  The macro expander has its own
routines for outputting messages (which I don't understand at the
moment), but even so, it sometimes gets things wrong.

######################################################################### 

I've been trying to come up with a general solution to these problems.
What I have at the moment, which is rather vague, amounts to this:

After the reader has produced the form to be compiled and
read-symbol-positions-list, we combine these to produce a @dfn{shadow
form} with the same shape as the form, but where there's a symbol in the
form, there is a corresponding list in the shadow form, noting the
corresponding "position" in the form, and onto which warning/error
messages can be pushed.  These can then be output at the end of the
compilation.

The info in the shadow form will allow the correct node corresponding to
one in the form to be found, thus correct line/column numbers in
messages are assured for normal code.  Possibly a hash table will serve
somehow to speed up searches.
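
One possible reading of the shadow form idea, as a rough sketch only:
every name below (my-node, my-make-shadow, my-shadow-messages) is
hypothetical, and the sketch assumes that a depth-first walk, car before
cdr, meets symbols in the same order as the reader recorded them, which
holds for plain list forms like the example but has not been checked for
every reader construct.

```elisp
(require 'cl-lib)

;; A shadow node stands in for one symbol occurrence in the form.
(cl-defstruct my-node symbol offset messages)

(defvar my--remaining nil
  "Unconsumed (SYMBOL . OFFSET) pairs while a shadow form is built.")

(defun my--shadow (form)
  "Return the shadow of FORM, consuming entries of `my--remaining'."
  (cond
   ((consp form)
    ;; Shadow the car before the cdr, i.e. in source order.
    (let ((head (my--shadow (car form))))
      (cons head (my--shadow (cdr form)))))
   ((and form (symbolp form))
    ;; Claim the earliest position still recorded for this symbol.
    (let ((entry (assq form my--remaining)))
      (setq my--remaining (remq entry my--remaining))
      (make-my-node :symbol form :offset (cdr entry))))
   (t form)))

(defun my-make-shadow (form positions)
  "Return a structure shaped like FORM with a `my-node' per symbol.
POSITIONS is an alist of (SYMBOL . OFFSET) pairs in source order."
  (let ((my--remaining positions))
    (my--shadow form)))

(defun my-shadow-messages (shadow)
  "Return all (OFFSET . MESSAGE) pairs stored in SHADOW, in source order."
  (cond
   ((my-node-p shadow)
    (mapcar (lambda (msg) (cons (my-node-offset shadow) msg))
            (reverse (my-node-messages shadow))))
   ((consp shadow)
    (append (my-shadow-messages (car shadow))
            (my-shadow-messages (cdr shadow))))))
```

A warning about "(baz)" would be pushed onto the messages slot of the
node found at (car (nth 3 shadow)), and my-shadow-messages would report
it against offset 20 once compilation has finished.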

For transformed code (macro invocations, optimised forms, etc.), things
become more difficult.  However, these transformations leave most of the
cons cells in the form unchanged, merely rearranging them somewhat.  So
the "pointers" in the shadow form will continue to be associated with
them, enabling accurate warning messages even here.
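
To illustrate why surviving cons cells help, here is a small sketch
that keys an `eq' hash table on the cons cells of a form, then looks
those cells up again after a rewrite.  It is only one way the hash table
mentioned earlier might be used.  my-index-cells and my-swap-branches
are hypothetical names, and the rewrite is a toy stand-in for a real
optimiser or macro.

```elisp
(defun my-index-cells (form positions)
  "Return an `eq' hash table mapping cons cells of FORM to offsets.
A cell is indexed when its car is a symbol with an unclaimed entry in
POSITIONS, an alist of (SYMBOL . OFFSET) pairs in source order."
  (let ((table (make-hash-table :test 'eq))
        (remaining positions)
        (pending (list form)))
    ;; Depth-first walk with an explicit stack, car before cdr.
    (while pending
      (let ((cell (pop pending)))
        (when (consp cell)
          (let ((entry (and (car cell)
                            (symbolp (car cell))
                            (assq (car cell) remaining))))
            (when entry
              (puthash cell (cdr entry) table)
              (setq remaining (remq entry remaining))))
          ;; Push the cdr first so that the car is popped first.
          (push (cdr cell) pending)
          (push (car cell) pending))))
    table))

(defun my-swap-branches (form)
  "Rewrite (if (not C) A B) as (if C B A), reusing the cells of A and B."
  (pcase form
    (`(if (not ,c) ,a ,b) (list 'if c b a))
    (_ form)))
```

For the source text "(if (not a) (foo) (bar))" the positions would be
((if . 1) (not . 5) (a . 9) (foo . 13) (bar . 19)).  After
my-swap-branches, the cells for "(bar)" and "(foo)" are the very same
objects, so the table still yields 19 and 13 for them.  The outer list
is freshly consed and the bare symbol a now sits in a new cell, so
neither is found in the table: positions survive only where the
original cells do.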

Obviously, this mechanism would cause the byte compiler to run more
slowly.  Whether this is significant would be down to experience.

Comments?

-- 
Alan Mackenzie (Nuremberg, Germany).


