emacs-devel

regexp font-lock highlighting


From: martin rudalics
Subject: regexp font-lock highlighting
Date: Mon, 30 May 2005 10:41:25 +0200
User-agent: Mozilla Thunderbird 1.0 (Windows/20041206)

The recent modification of `lisp-font-lock-keywords-2' to highlight
subexpressions of regexps has two minor bugs:

(1) If you attempt to write the regexp to match the string "\\)" as
    "\\\\\\\\)" the last three chars of that regexp are highlighted with
    `font-lock-comment-face'.

(2) If the region enclosed by the arguments START and END of
    `font-lock-fontify-keywords-region' contains one of "\\(", "\\|",
    "\\)" within a comment, doc-string, or key definition, all
    subsequent occurrences within a normal string are _not_ highlighted.
    `font-lock-fontify-keywords-region' goes to START when it evaluates
    your lambda, decides that the expression should not get highlighted
    since it has the wrong face, and wrongly concludes that no such
    expression exists up to END.
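
A minimal sketch of the remedy for (2), assuming nothing beyond standard
text-property and search functions: a matcher has to keep searching after
a hit with the wrong face instead of giving up.  The names
`my-string-face-p' and `my-search-in-string' are hypothetical and exist
only for this illustration.

```elisp
;; Hypothetical helpers, not part of font-lock.
(defun my-string-face-p (pos)
  "Return non-nil if the `face' property at POS includes `font-lock-string-face'."
  (let ((face (get-text-property pos 'face)))
    (if (listp face)
        ;; The property may be a list of faces (or nil).
        (memq 'font-lock-string-face face)
      (eq face 'font-lock-string-face))))

(defun my-search-in-string (regexp bound)
  "Search forward for REGEXP up to BOUND, skipping hits outside strings.
Return non-nil, with the match data set, if a hit inside a string is found."
  (let (found)
    ;; Continue past hits with the wrong face until BOUND is reached.
    (while (and (not found)
                (re-search-forward regexp bound t))
      (setq found (my-string-face-p (1- (point)))))
    found))
```

Used as a font-lock matcher, such a function returns nil only when no hit
inside a string exists before BOUND, so a hit in a comment no longer hides
later hits in strings.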

The following lambda should avoid these problems:

       ((lambda (bound)
          (catch 'found
            (while (re-search-forward
                    "\\(\\\\\\\\\\)\\(?:\\(\\\\\\\\\\)\\|\\([(|)]\\)\\(\\?:\\)?\\)"
                    bound t)
              (unless (match-beginning 2)
                (let ((face (get-text-property (1- (point)) 'face)))
                  (when (or (and (listp face)
                                 (memq 'font-lock-string-face face))
                            (eq 'font-lock-string-face face))
                    (throw 'found t)))))))
        ;; Should we introduce a lowlight face for this?
        ;; Ideally that would retain the color, dimmed.
        (1 'font-lock-comment-face prepend)
        (3 'bold prepend)
        (4 font-lock-type-face prepend t))
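
To see how the second group deals with (1), here is a small sketch that
runs the same regexp over plain text and leaves out the face test.  The
names `my-regexp-construct-re' and `my-regexp-constructs-in' are
hypothetical; only the regexp itself is taken from the lambda above.

```elisp
;; Hypothetical names; the regexp is copied from the lambda above.
(defconst my-regexp-construct-re
  "\\(\\\\\\\\\\)\\(?:\\(\\\\\\\\\\)\\|\\([(|)]\\)\\(\\?:\\)?\\)"
  "Matcher regexp for \"\\\\(\", \"\\\\|\" and \"\\\\)\" in source text.")

(defun my-regexp-constructs-in (text)
  "Return the grouping characters the matcher would accept in TEXT.
TEXT is buffer text as it appears in a source file.  The face test of
the real matcher is left out, so every remaining hit counts."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let (hits)
      (while (re-search-forward my-regexp-construct-re nil t)
        ;; Group 2 matches a second pair of backslashes, that is, an
        ;; escaped backslash rather than the start of a construct.
        (unless (match-beginning 2)
          (push (match-string 3) hits)))
      (nreverse hits))))

;; Source text \\(a\\|b\\) contains three constructs:
;; (my-regexp-constructs-in "\\\\(a\\\\|b\\\\)")  ; => ("(" "|" ")")
;; Source text \\\\\\\\) from (1) contains none:
;; (my-regexp-constructs-in "\\\\\\\\\\\\\\\\)")  ; => nil
```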



Moreover I don't think that anything is "broken" in the following:

       ;; Underline innermost grouping, so that you can more easily see what
       ;; belongs together.  2005-05-12: Font-lock can go into an
       ;; unbreakable endless loop on this -- something's broken.
       
       ;;("[\\][\\][(]\\(?:\\?:\\)?\\(\\(?:[^\\\"]+\\|[\\]\\(?:[^\\]\\|[\\][^(]\\)\\)+?\\)[\\][\\][)]"
       ;; 1 'underline prepend)

I believe that the regexp search started by
`font-lock-fontify-keywords-region' begins backtracking, and this can
take hours in more complicated cases.  Anyway, regexps are not suited to
finding the innermost grouping.  If you are willing to pay for two
additional buffer-local variables such as

(defvar regexp-left-paren nil
  "Position of innermost unmatched \"\\\\(\".
The value of this variable is valid iff `regexp-left-paren-end' equals the
upper bound of the region `font-lock-fontify-keywords-region' currently
investigates.")
(make-variable-buffer-local 'regexp-left-paren)

(defvar regexp-left-paren-end 0
  "Buffer position indicating whether the value of `regexp-left-paren' is valid.
If the value of this variable equals the value of the upper bound of the region
investigated by `font-lock-fontify-keywords-region' the current value of
`regexp-left-paren' is valid.")
(make-variable-buffer-local 'regexp-left-paren-end)

the following modification of the above lambda expression should handle
this problem:

       ((lambda (bound)
          (catch 'found
            (while (re-search-forward
                    "\\(\\\\\\\\\\)\\(?:\\(\\\\\\\\\\)\\|\\(\\((\\)\\|\\(|\\)\\|\\()\\)\\)\\)"
                    bound t)
              (when (match-beginning 3)
                (let ((face (get-text-property (1- (point)) 'face)))
                  (when (or (and (listp face)
                                 (memq 'font-lock-string-face face))
                            (eq 'font-lock-string-face face))
                    (cond
                     ((match-beginning 4) ; \\(
                      (setq regexp-left-paren (match-end 4))
                      (setq regexp-left-paren-end bound)
                      (set-match-data
                       (append (butlast (match-data) 2)
                               (list (point-min-marker) (point-min-marker)))))
                     ((match-beginning 5) ; \\|
                      (set-match-data
                       (append (butlast (match-data) 4)
                               (list (point-min-marker) (point-min-marker)))))
                     ((match-beginning 6) ; \\)
                      (set-match-data
                       (append (butlast (match-data) 6)
                               (if (= regexp-left-paren-end bound)
                                   (list (copy-marker regexp-left-paren)
                                         (match-beginning 6))
                                 (list (point-min-marker) (point-min-marker)))))
                      (setq regexp-left-paren nil)
                      (setq regexp-left-paren-end 0)))
                    (throw 'found t)))))))
        ;; Should we introduce a lowlight face for this?
        ;; Ideally that would retain the color, dimmed.
        (1 'font-lock-comment-face prepend)
        (3 'bold prepend)
        (4 'underline prepend))

I have tried this on some elisp files that made the original solution
choke and did not encounter any problems.  Note that I removed the
"\\(\\?:\\)?" since I find it distracting to put yet another face here.
If you believe that you _really_ need it you will have to reinsert it,
but in that case you have to adjust the match-data cropping as well.  (I
do have to modify match-data since redisplay wants valid buffer
positions for highlighting.)
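
The cropping itself can be sketched in isolation.  The helper
`my-replace-trailing-groups' is a hypothetical name; it only wraps the
`butlast' / `set-match-data' idiom used in the lambda above.  As far as I
can tell `match-data' omits trailing groups that did not match, which is
why the lambda drops 2, 4 or 6 elements depending on the alternative.

```elisp
;; Hypothetical helper illustrating the cropping idiom used above.
(defun my-replace-trailing-groups (count beg end)
  "Drop the last COUNT groups of the match data and append one group BEG..END."
  (set-match-data
   (append (butlast (match-data) (* 2 count))
           (list beg end))))

;; After (string-match "\\(a\\)\\(b\\)" "ab") the match data is
;; (0 2 0 1 1 2).  Replacing the last group by an empty one at 0:
;;
;;   (my-replace-trailing-groups 1 0 0)
;;   (match-data)  ; => (0 2 0 1 0 0)
```

If "\\(\\?:\\)?" is reinserted as an additional group, the element counts
passed to `butlast' have to change accordingly.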



Finally, I would use three distinct font-lock faces for regexps:

- One face for highlighting the "\\"s which by default should inherit
  from `font-lock-string-face' with a dimmed foreground - I'm using
  Green4 for strings and PaleGreen3 for the "\\"s.  Anyone who doesn't
  like the highlighting could revert to `font-lock-string-face'.

- One face for highlighting the "(", "|" and ")" in these expressions.
  I find `bold' good here but again would leave it to the user whether
  she wants to turn this highlighting off.  Moreover, such a face would
  let paren-highlighting _never_ match a paren having that face with a
  paren having another face.  Consequently, paren-matching could finally
  provide more trustworthy information within regular expressions.

- One face for highlighting the innermost grouping.  Basically,
  `underline' is not bad here but appears a bit noisy in multiline
  expressions or things like

  (concat "\\("
          some-string
          "\\)")

  I'm using a background which is slightly darker than the default
  background and gives regular expressions a very distinctive
  appearance.  Anyway, users should be allowed to turn highlighting off
  by using the default face.
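
A rough sketch of what the three faces could look like.  The face names
are hypothetical, and the background color "gray90" is only an assumed
stand-in for "slightly darker than the default background"; PaleGreen3
and `bold' are the choices mentioned above.

```elisp
;; Hypothetical face names; colors other than PaleGreen3 are assumptions.
(defface my-regexp-backslash
  '((t :inherit font-lock-string-face :foreground "PaleGreen3"))
  "Face for the backslashes of a regexp grouping construct."
  :group 'font-lock-faces)

(defface my-regexp-construct
  '((t :inherit bold))
  "Face for the \"(\", \"|\" and \")\" of a regexp grouping construct."
  :group 'font-lock-faces)

(defface my-regexp-group
  ;; Assumes a light default background.
  '((t :background "gray90"))
  "Face for the innermost regexp grouping."
  :group 'font-lock-faces)
```

The highlighting specs would then refer to these faces, for example
(1 'my-regexp-backslash prepend) in place of
(1 'font-lock-comment-face prepend), and a user could customize any of
them to look like `font-lock-string-face' or `default' to turn that part
of the highlighting off.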




