Update 2 on Bytecode Offset tracking

Another update on my Summer of Code project, improving ELisp traceback information.

On the C side (saving the bytecode execution offset so that mapbacktrace can pass it to Lisp), I’ve submitted a patch with the offset-tracking code to bug-gnu-emacs. This is the version that updates the thread’s offset only immediately before a Bcall op in exec_byte_code.
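
To illustrate the idea of saving the offset only at call ops, here is a toy stack machine in Emacs Lisp. It is not the C patch: the opcodes, the variable toy-current-offset and the function toy-exec are all made up for this sketch, and a single global stands in for the per-thread offset.

```elisp
(defvar toy-current-offset nil
  "Offset of the most recent call op run by `toy-exec' (hypothetical).")

(defun toy-exec (ops)
  "Run OPS, a vector of (OPCODE . ARG) pairs, on a small stack machine.
Only the made-up opcodes `const', `add' and `call' are supported.
Return the value left on top of the stack."
  (let ((stack nil)
        (pc 0))
    (while (< pc (length ops))
      (pcase-let ((`(,opcode . ,arg) (aref ops pc)))
        (pcase opcode
          ('const (push arg stack))
          ('add (push (+ (pop stack) (pop stack)) stack))
          ('call
           ;; The only place the offset is saved: right before the call,
           ;; so the callee (or a backtrace taken inside it) can see it.
           (setq toy-current-offset pc)
           (push (funcall arg (pop stack)) stack))
          (_ (error "Unknown opcode: %S" opcode))))
      (setq pc (1+ pc)))
    (car stack)))

;; A callee that reports the offset it was called from.
(defun toy-report (x)
  (list x toy-current-offset))

(toy-exec (vector '(const . 1) '(const . 2) '(add) '(call . toy-report)))
;; => (3 3)
```

The non-call ops never touch toy-current-offset, which is what keeps the bookkeeping cheap in this model.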

On the Lisp side (turning the offset into a source code position), right now I have a very rudimentary modified byte-compiler which compiles expressions annotated with source code data (in the form of a cl-struct). The code is available in the project repository.

The main entry point is a function source-map-byte-compile-definition, similar to byte-compile, which:

- takes a string or sexp containing a function definition
- reads it into a source-map-expression struct containing source code information (and the original sexp)
- passes the struct through the compilation process, maintaining an alist associating source-map-expressions to LAP.
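
The steps above can be sketched with a toy compiler. This is only an assumption-laden illustration: toy-expr, toy-annotate, toy-compile and the toy-* ops are hypothetical names, the struct records printed text instead of real source positions, and only variables, constants and two-argument calls to + and * are handled.

```elisp
(require 'cl-lib)

;; Hypothetical stand-in for the annotated-expression struct; the slot
;; names are invented for this sketch.
(cl-defstruct (toy-expr (:constructor toy-expr-create))
  sexp   ; the form, with its arguments wrapped in `toy-expr' structs
  text)  ; printed representation, standing in for source positions

(defun toy-annotate (form)
  "Wrap FORM and, recursively, its arguments in `toy-expr' structs."
  (toy-expr-create
   :sexp (if (consp form)
             (cons (car form) (mapcar #'toy-annotate (cdr form)))
           form)
   :text (prin1-to-string form)))

(defun toy-compile (expr)
  "Return an alist of (TOY-EXPR . OP) for EXPR, in emission order."
  (let ((sexp (toy-expr-sexp expr)))
    (cond
     ((symbolp sexp) (list (cons expr (list 'toy-varref sexp))))
     ((atom sexp) (list (cons expr (list 'toy-constant sexp))))
     (t
      ;; Compile the arguments first, then emit the op for the call itself.
      (append (cl-mapcan #'toy-compile (cdr sexp))
              (list (cons expr
                          (list (pcase (car sexp)
                                  ('+ 'toy-plus)
                                  ('* 'toy-mult)
                                  (_ (error "Unsupported function: %S"
                                            (car sexp))))))))))))

(mapcar (lambda (entry) (cons (toy-expr-text (car entry)) (cdr entry)))
        (toy-compile (toy-annotate '(* (+ arg 2) 3))))
;; => (("arg" toy-varref arg) ("2" toy-constant 2) ("(+ arg 2)" toy-plus)
;;     ("3" toy-constant 3) ("(* (+ arg 2) 3)" toy-mult))
```

Each emitted op stays paired with the expression that produced it, which is the property the alist is meant to preserve.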

The alist is converted to a vector, stored in a bytecode-source-map struct and added to the symbol’s bytecode-source-map property. There are functions to retrieve the code for a specific offset in the function, or to fetch an alist of the code and LAP. For example:

(let ((lexical-binding t))
  (source-map-byte-compile-definition
   '(defalias 'plus2-times3 #'(lambda (arg) (* (+ arg 2) 3)))))

(disassemble #'plus2-times3)
;; byte code for plus2-times3:
;;   doc:   ...
;;   args: (arg1)
;; 0       dup
;; 1       constant  2
;; 2       plus
;; 3       constant  3
;; 4       mult
;; 5       return

(bytecode-source-map 'plus2-times3 0) ;; => "arg"
(bytecode-source-map 'plus2-times3 1) ;; => "2"
(bytecode-source-map 'plus2-times3 2) ;; => "(+ arg 2)"
(bytecode-source-map 'plus2-times3 3) ;; => "3"
(bytecode-source-map 'plus2-times3 4) ;; => "(* (+ arg 2) 3)"

(source-map-bytecomp-annotated-lap 'plus2-times3)
;; (("arg" byte-dup)
;;  ("2" byte-constant 2 . 0)
;;  ("(+ arg 2)" byte-plus . 0)
;;  ("3" byte-constant 3 . 1)
;;  ("(* (+ arg 2) 3)" byte-mult . 0)
;;  ("(* (+ arg 2) 3)" byte-return . 0))
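
For readers who want a feel for the storage and lookup side, here is a minimal sketch of keeping such an alist as a vector on a symbol property. The names toy-source-map, toy-source-map-store and toy-source-map-lookup are hypothetical, the real bytecode-source-map struct may be laid out differently, and the sketch assumes every op occupies exactly one byte so that the list index equals the bytecode offset.

```elisp
(require 'cl-lib)

;; Hypothetical container for this sketch only.
(cl-defstruct (toy-source-map (:constructor toy-source-map-create))
  entries) ; vector of (SOURCE-TEXT . OP), indexed by bytecode offset

(defun toy-source-map-store (symbol alist)
  "Store ALIST of (SOURCE-TEXT . OP) as a vector on SYMBOL's plist.
Assumes one byte per op, so an entry's index is its offset."
  (put symbol 'toy-source-map
       (toy-source-map-create :entries (vconcat alist))))

(defun toy-source-map-lookup (symbol offset)
  "Return the source text recorded for OFFSET in SYMBOL's map, or nil."
  (let* ((smap (get symbol 'toy-source-map))
         (entries (and smap (toy-source-map-entries smap))))
    (when (and entries (< -1 offset (length entries)))
      (car (aref entries offset)))))

(toy-source-map-store 'toy-fn
                      '(("arg" byte-dup)
                        ("2" byte-constant 2 . 0)
                        ("(+ arg 2)" byte-plus . 0)))
(toy-source-map-lookup 'toy-fn 2) ;; => "(+ arg 2)"
(toy-source-map-lookup 'toy-fn 9) ;; => nil
```

With a vector, a lookup is a single aref once the offset is known.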

It’s quite limited as it is. byte-optimize is disabled, and it assumes the expression is fully macroexpanded. cconv also isn’t supported yet, so only simple lexically-scoped functions work. If you’d like to try it out, there are instructions in the repository’s README for running the ERT tests (basically cd source-mapping, ./run-tests.sh).

It isn’t that much slower than byte-compile in simple tests, but there’s no way to get a realistic idea of performance while only simple expressions are supported. Since arefs are pretty fast, I’m guessing the struct slot accesses wouldn’t slow down execution too much, but creating so many records certainly uses a lot of memory.
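
One rough way to check the guess about slot access cost is a micro-benchmark along these lines. The struct toy-point and the variables are hypothetical, and when the forms are evaluated interpreted the timings are dominated by evaluator overhead, so the file should be byte-compiled for numbers that mean anything.

```elisp
(require 'benchmark)
(require 'cl-lib)

;; Hypothetical struct used only to compare access costs.
(cl-defstruct (toy-point (:constructor toy-point-create)) x y)

(defvar toy-bench-vec (vector 1 2))
(defvar toy-bench-pt (toy-point-create :x 1 :y 2))

(defun toy-bench-access (repetitions)
  "Time REPETITIONS plain arefs and struct slot reads.
Each result is a list (ELAPSED GC-COUNT GC-TIME) from `benchmark-run'."
  (list :aref (benchmark-run repetitions (aref toy-bench-vec 0))
        :slot (benchmark-run repetitions (toy-point-x toy-bench-pt))))

(toy-bench-access 100000)
```

This only measures access time; it says nothing about the memory cost of allocating the records.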

I’d like to know what others think of this: if there are inherent flaws in this approach, if there’s a simpler or more obvious way to integrate this into the byte-compilation process, or any other comments.

-Zach

From:	Zach Shaftel
Subject:	Update 2 on Bytecode Offset tracking
Date:	Tue, 28 Jul 2020 15:19:24 -0400
User-agent:	mu4e 1.4.10; emacs 28.0.50