[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Scan of regexp mistakes
From: |
Paul Eggert |
Subject: |
Scan of regexp mistakes |
Date: |
Fri, 8 Mar 2019 09:13:04 -0800 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.5.1 |
On 3/5/19 7:06 AM, Mattias Engdegård wrote:
>
> I can run it periodically but would surely forget. Should I put the trawler
> in the Emacs source tree (if so, where?), in ELPA, or elsewhere?
Stefan mentioned one possibility. Though even then I daresay it'd be
helpful if you ran it periodically, just as I periodically run
admin/merge-gnulib. (If you don't run it, it's likely nobody else will....)
> - (re-search-forward "^\\(\s+=+\s?+\\)+\n")
> + (re-search-forward "^\\(\s+=+\s+\\)+\n")
> ^^^
> Are you sure this shouldn't be `\s*'
No, good point. I'll change it to that. See attached patch.
> - "[a-z0-9$%(*-=?[_][^<>\")!;:,{}\n\t @]*"
> + "[a-z$%(*-=?[_][^<>\")!;:,{}\n\t @]*"
>
> You kept the rather odd range `*-=' which comprises `*+,-./0123456789:;<='.
> Is it supposed to be that way?
Goodness knows what is intended here, as this is some ad hoc variant of
RFC 5322/2822/822 and I don't know which variant. Might as well spell
out the range in a more-conventional way, though. The attached patch
replaces *-= with *+./= (as ,:;< aren't allowed unquoted in RFC 5322
atoms) and puts - and 0-9 elsewhere; this should be closer to what was
wanted and should be clearer anyway. I was unable to track down whatever
suggestion was made by Felix Wiemann long ago, and so removed that
comment (since the regexp no longer matches his suggestion anyway).
> - (while (re-search-forward "\\ce[»\\.\\?]\\|«\\ce" nil t)
> + (while (re-search-forward "\\ce[»\\.?]\\|«\\ce" nil t)
>
> Should `\' really be kept in the set of characters? It looks like it was only
> included as an attempt to escape `.' and `?'.
Yes, probably. Fixed in the attached.
> searching for A-z uncovers more suspect regexps, some of which aren't found
> by the trawler.
I wonder where those all came from? I attempted to fix them in the attached.
> Here is another one in the same file (line 33), but that wasn't found by the
> trawler:
>
> (replace-regexp-in-string "[\000-\032\177<>#%\"{}|\\^[]`%?;]"
>
> That \032 doesn't look right (number base confusion?), and it looks like it's
> meant as a single character alternative but it isn't, given the misplaced `]'.
The regexp has other troubles. It doesn't include !$'()*+,/:@&= (all of
which are reserved characters according to RFC 3986), and it has
duplicate %. The attached patch fixes the % and puts in a FIXME about
the other chars.
> diff --git a/lisp/org/org-mobile.el b/lisp/org/org-mobile.el
> index 1ff6358403..83dcc7b0d1 100644
> --- a/lisp/org/org-mobile.el
> +++ b/lisp/org/org-mobile.el
> @@ -845,11 +845,11 @@ If BEG and END are given, only do this in that region."
> (cl-incf cnt-error)
> (throw 'next t))
> (move-marker bos-marker (point))
> - (if (re-search-forward "^** Old value[ \t]*$" eos t)
> + (if (re-search-forward "^\\*\\* Old value[ \t]*$" eos t)
>
> Shouldn't this start with "^\\**", or does it have to be exactly two
> asterisks?
>
> (setq old (buffer-substring
> (1+ (match-end 0))
> (progn (outline-next-heading) (point)))))
> - (if (re-search-forward "^** New value[ \t]*$" eos t)
> + (if (re-search-forward "^\\*\\* New value[ \t]*$" eos t)
>
> Idem.
"\\**" would be safer, yes. Fixed in the attached.
> - ((string-match "^//\\(.?*\\)/\\(<.*>\\)$" path)
> + ((string-match "^//\\(.*\\)/\\(<.*>\\)$" path)
>
> Another repetition-of-repetition. Sure it shouldn't be `*?' instead? It looks
> likely, since there is a `/' following that would be eaten by the `.*' given
> half a chance.
The comment on the next line says "Planner has the id after the final
slash", which implies that the first .* should indeed be greedy.
>
> diff --git a/lisp/progmodes/fortran.el b/lisp/progmodes/fortran.el
> index be272c0922..c1a267f4c5 100644
> --- a/lisp/progmodes/fortran.el
> +++ b/lisp/progmodes/fortran.el
> @@ -2052,7 +2052,7 @@ If ALL is nil, only match comments that start in column
> > 0."
> (when (<= (point) bos)
> (move-to-column (1+ fill-column))
> ;; What is this doing???
> - (or (re-search-forward "[\t\n,'+-/*)=]" eol t)
> + (or (re-search-forward "[-\t\n,'+./*)=]" eol t)
>
> Where did the . come from? Don't you think that `+-/*' were meant to include
> those four symbols only?
I couldn't figure out what the code was doing (note the comment...) so
decided to preserve the semantics of the old regexp. But you're right,
"." is likely not intended there. I removed it in the attached.
> ;; Regexp bug in XEmacs disallows ][ inside [], and wants + last
> -
> "\\s-*\\.\\(\\(address@hidden|---]\\|[][]\\|\\\\[()|]\\)+\\)\\s-*(\\(.*\\))\\s-*\\(,\\|)\\s-*;\\)")
> +
> "\\s-*\\.\\(\\(address@hidden|[][]\\|\\\\[()|]\\)+\\)\\s-*(\\(.*\\))\\s-*\\(,\\|)\\s-*;\\)")
> (setq rep (match-string-no-properties 3))
> (goto-char (match-end 0))
> (setq tpl-wild-list
>
> Are you sure that | shouldn't be there too? Or is this some kind of XEmacs
> idiom?
>
You're right. Wilson Snyder later stepped in and fixed that string. Nice
to have a real expert in the house.
0001-More-regexp-corrections-and-tweaks.patch
Description: Text Data
- Scan of regexp mistakes, Mattias Engdegård, 2019/03/03
- Re: Scan of regexp mistakes, Paul Eggert, 2019/03/04
- Re: Scan of regexp mistakes, Clément Pit-Claudel, 2019/03/04
- Re: Scan of regexp mistakes, Mattias Engdegård, 2019/03/05
- Re: Scan of regexp mistakes, Mattias Engdegård, 2019/03/09
- Re: Scan of regexp mistakes, Paul Eggert, 2019/03/09
- Re: Scan of regexp mistakes, Mattias Engdegård, 2019/03/09