emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: "Raw" string literals for elisp


From: Alan Mackenzie
Subject: Re: "Raw" string literals for elisp
Date: Wed, 8 Sep 2021 11:30:35 +0000

Hello, Anna.

Just as a matter of context, I implemented C++ raw strings, and recently
enhanced the code also to handle other CC Mode derived languages such as
C# and Vala.

On Tue, Sep 07, 2021 at 21:49:33 -0400, Anna Glasgall wrote:
> [My previous message appears to have been eaten, or at least it's not
> showing up in the archive; resending from a different From: address.
> Apologies for any duplication]

> Hello Emacs developers,

> I've long been annoyed by the number of backslashes needed when using
> string literals in elisp for certain things (regexes, UNC paths, etc),
> so I started work on a patch (WIP attached) to implement support for
> "raw" string literals, a la Python r-strings. These are string literals
> that work exactly like normal string literals, with the exception that
> backslash escapes (except for \") are not processed; \ may freely
> appear in the string without need to escape. I've made good progress,
> but unfortunately I've run into a roadblock and am not sure what to do
> next.

One not so small point.  How do you put a backslash as the _last_
character in a raw string?

If this is difficult, it may well be worth comparing other languages
with raw strings.  C++ Mode has a complicated system of identifiers at
each end of the raw string (I'm sure you know this).  C# represents a "
inside a multi-line string as "".  Vala (and, I believe, Python) have
triple quote delimters """ and cannot represent three quotes in a row
inside the multi-line string.

It is probably worth while stating explicitly that Elisp raw strings can
be continued across line breaks without having to escape the \n.

> I've successfully taught the elisp reader (read1 in lread.c) how to
> read r-strings. I thought I had managed to make lisp-mode/elisp-mode
> happy by allowing "r" to be a prefix character (C-x C-e and the
> underlying forward-sexp/backward-sexp seemed to work fine at first),
> but realized that I ran into trouble with strings containing the
> sequence of characters '\\"'.

> The reader correctly reads r"a\\"" as a string containing the sequence
> of characters 'a', '\', '"', and M-: works. Unfortunately, if I try
> sexp-based navigation or e.g. C-x C-e, it falls apart. The parser in
> syntax.c, which afaict is what lisp-mode is using to try and find sexps
> in buffer text, doesn't seem to know what to do with this expression.
> I've spent some time staring at syntax.c, but I must confess that I'm
> entirely defeated in terms of what changes need to be made here to
> teach this other parser about prefixed strings in where the prefix has
> meaning that affects the interpretation of the characters between
> string fences.

You probably want to use syntax-table text properties.  See the page
"Syntax Properties" in the Elisp manual.  In short, you would put, say,
a "punctuation" property on most backslashes to nullify their normal
action.  Possibly, you might want such a property on a double quote
inside the string.  You might also want a property on the linefeeds
inside a raw string.  With these properties, C-M-n and friends will work
properly.

Bear in mind that you will also need to apply and remove these
properties as the user changes the Lisp text, for example by removing a
\ before a ".  There is an established mechanism in Emacs for this sort
of action (which CC Mode doesn't use) which I would advise you to use.

> I've attached a copy of my WIP patch; it's definitely not near final
> code quality and doesn't have documentation yet, all of which I would
> take care of before submitting for inclusion. I also haven't filled out
> the copyright assignment paperwork yet, but should this work reach a
> point where it was likely to be accepted, I'd be happy to do that.

Thanks!

> I'd very much appreciate some pointers on what to try next here, or
> some explanation of how syntax.c/syntax.el works beyond what's in the
> reference manual. If this is a fool's errand I'm tilting at here, I'd
> also appreciate being told that before I sink more time into it :)

It is definitely NOT a fool's errand.  There may be some resistance to
the idea of raw strings from traditionalists, but I hope not.  It would
be worth your while really to understand the section in the Elisp manual
on syntax and all the things it can (and can't) do.

Help is always available on emacs-devel.

You're going to have quite a bit of Lisp programming to do.  For
example, font-lock needs to be taught how to fontify a raw string.

But at the end of the exercise, you will have learnt so much about Emacs
that you will qualify as a fully fledged contributor.  :-)

> thanks,

> Anna Glasgall

-- 
Alan Mackenzie (Nuremberg, Germany).



reply via email to

[Prev in Thread] Current Thread [Next in Thread]