

Regexps and strings once again

From: Lars Magne Ingebrigtsen
Subject: Regexps and strings once again
Date: Mon, 15 Sep 2014 01:27:51 +0200
User-agent: Gnus/5.130012 (Ma Gnus v0.12) Emacs/24.4.50 (gnu/linux)

(Skip to 1) if you're not interested in why I started thinking about
this now.)

I was just fiddling around with a DOM traversal library (i.e., "document
object model", or something -- HTML traversal, like), and it has
functions for finding nodes by various criteria, like IDs.  So there are
functions like `dom-by-id' that take a DOM fragment and an ID and
return the matching nodes.

I wrote the function as taking a regexp.  And I find what I'm doing
wrong 90% of the time when using it is that I expect an exact match, but
instead I'm getting all matching nodes.

This reminded me of this pretty general problem once again.  We have
oodles of functions in Emacs that do matching either on exact(ish)
strings, or regexps, and then we have an optional parameter that says
whether we want to interpret the string as an exact string or a
regexp.
It's kinda annoying, especially when the function defaults to the
interpretation you don't want.  And you have to remember which optional
parameter you're supposed to set.

So:  Here's yet another suggestion for how to deal with regexps in a
more general way in Emacs.  Or rather two.

1) New Special Syntax

A while ago, there was some suggestion about introducing a special
syntax for string literals, and it didn't really go anywhere, because
introducing a new syntax to Emacs is kinda a big deal.  But let's just
suggest it anyway:

(dom-by-id dom #/I (can)?haz new syntax/)

And see!  Perl Regexp syntax as well!  No more backslashitis!

Anyway, I assume that everybody would want this, but that it's too much
work for anybody to actually commit to.

2) Cheat; i.e., introduce a convention

What if we just mark a string as a regexp?

(dom-by-id dom (regexp "I \\(couldn't\\)?haz new syntax"))

It would basically just put a text property on the string, and functions
like `dom-by-id' would just do

(if (regexp-p match)
    (string-match match id)
  (string= match id))

Of course, both `regexp' and the proposed new syntax could compile the
regexp and return a regexp object and stuff if we wanted to be more
efficient...  But the regexp cache is already quite efficient, isn't it?
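
To make the convention in 2) concrete, here is a minimal sketch of how
the marking and the check might look.  All three names are hypothetical
(the `my-' prefix is only there to avoid suggesting they exist in
Emacs), and the text property name is an assumption as well.  One known
wrinkle: an empty string has no characters to carry a text property, so
an empty regexp can't be marked this way.

```elisp
;; Hypothetical helpers; none of these are part of Emacs.

(defun my-regexp (string)
  "Return a copy of STRING marked as a regexp via a text property."
  (propertize string 'my-regexp t))

(defun my-regexp-p (object)
  "Return non-nil if OBJECT is a string marked by `my-regexp'."
  (and (stringp object)
       (> (length object) 0)
       (get-text-property 0 'my-regexp object)))

(defun my-id-match-p (match id)
  "Return non-nil if ID matches MATCH.
MATCH is used as a regexp if it was marked by `my-regexp';
otherwise it is compared to ID exactly."
  (if (my-regexp-p match)
      (and (string-match-p match id) t)
    (string= match id)))
```

With this, (my-id-match-p "foo" "foobar") does an exact comparison,
while (my-id-match-p (my-regexp "foo") "foobar") does a regexp match,
and callers never have to remember an optional parameter.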

(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no
