emacs-devel

Re: "Raw" string literals for elisp


From: Anna Glasgall
Subject: Re: "Raw" string literals for elisp
Date: Wed, 08 Sep 2021 10:27:17 -0400
User-agent: Evolution 3.40.0-1

On Wed, 2021-09-08 at 11:30 +0000, Alan Mackenzie wrote:
> Hello, Anna.
> 
> Just as a matter of context, I implemented C++ raw strings, and
> recently enhanced the code also to handle other CC Mode derived
> languages such as C# and Vala.
> 

Great, I'll definitely take a look at that.

> On Tue, Sep 07, 2021 at 21:49:33 -0400, Anna Glasgall wrote:
> > [My previous message appears to have been eaten, or at least it's
> > not showing up in the archive; resending from a different From:
> > address. Apologies for any duplication]
> 
> > Hello Emacs developers,
> 
> > I've long been annoyed by the number of backslashes needed when
> > using string literals in elisp for certain things (regexes, UNC
> > paths, etc), so I started work on a patch (WIP attached) to
> > implement support for "raw" string literals, a la Python r-strings.
> > These are string literals that work exactly like normal string
> > literals, with the exception that backslash escapes (except for \")
> > are not processed; \ may freely appear in the string without need
> > to escape. I've made good progress, but unfortunately I've run into
> > a roadblock and am not sure what to do next.
> 
> One not so small point.  How do you put a backslash as the _last_
> character in a raw string?

That is an excellent question. I'll need to take a look at how some
other languages handle that :/

Thanks for giving me another test case!
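
For comparison, here is a minimal sketch of how Python's own r-strings
treat this case, since they were the model for the patch. It only
describes Python's behaviour, not what the elisp patch does or should
do; the helper name `parses` is made up for illustration. The sample
literals are built as text and compiled at run time so that the failing
one does not break the script itself.

```python
def parses(source):
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

# Source text r"a\" : the backslash-quote pair does not close the
# literal, so the string is unterminated and compilation fails.
trailing = r'r"a\"'

# Source text r"a\"" : the literal is accepted, and both the backslash
# and the quote end up in the value.
kept = eval(r'r"a\""')

# One workaround for a trailing backslash: let the compiler concatenate
# an ordinary literal that holds the backslash.
workaround = eval(r'r"C:\dir" "\\"')

print(parses(trailing))
print(list(kept))
print(workaround)
```

So in Python a raw string cannot end in a single backslash at all, and
the usual answer is to append it from an ordinary string.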

> 
> If this is difficult, it may well be worth comparing other languages
> with raw strings.  C++ Mode has a complicated system of identifiers
> at each end of the raw string (I'm sure you know this).  C#
> represents a " inside a multi-line string as "".  Vala (and, I
> believe, Python) have triple quote delimiters """ and cannot
> represent three quotes in a row inside the multi-line string.
> 
> It is probably worth while stating explicitly that Elisp raw strings
> can be continued across line breaks without having to escape the \n.
> 
> > I've successfully taught the elisp reader (read1 in lread.c) how to
> > read r-strings. I thought I had managed to make
> > lisp-mode/elisp-mode happy by allowing "r" to be a prefix character
> > (C-x C-e and the underlying forward-sexp/backward-sexp seemed to
> > work fine at first), but realized that I ran into trouble with
> > strings containing the sequence of characters '\\"'.
> 
> > The reader correctly reads r"a\\"" as a string containing the
> > sequence of characters 'a', '\', '"', and M-: works. Unfortunately,
> > if I try sexp-based navigation or e.g. C-x C-e, it falls apart. The
> > parser in syntax.c, which afaict is what lisp-mode is using to try
> > and find sexps in buffer text, doesn't seem to know what to do with
> > this expression. I've spent some time staring at syntax.c, but I
> > must confess that I'm entirely defeated in terms of what changes
> > need to be made here to teach this other parser about prefixed
> > strings in which the prefix has meaning that affects the
> > interpretation of the characters between string fences.
> 
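
To make the disagreement concrete, here is a toy model of the two
scanners. It is not the code in lread.c or syntax.c. It assumes the raw
rule is "a backslash is special only when immediately followed by a
double quote" and that the ordinary rule is "a backslash escapes
whatever follows it". The function names are hypothetical.

```python
BACKSLASH = "\\"
QUOTE = '"'

def close_escaped(text, start):
    """Index of the closing quote when a backslash escapes any next
    character (the ordinary string rule)."""
    i = start
    while i < len(text):
        if text[i] == BACKSLASH:
            i += 2              # skip the backslash and what it escapes
        elif text[i] == QUOTE:
            return i
        else:
            i += 1
    return -1

def close_raw(text, start):
    """Index of the closing quote when only backslash-quote is special
    (the assumed raw-string rule)."""
    i = start
    while i < len(text):
        if text[i] == BACKSLASH and text[i + 1:i + 2] == QUOTE:
            i += 2              # escaped quote, still inside the string
        elif text[i] == QUOTE:
            return i
        else:
            i += 1              # any other backslash is literal
    return -1

literal = r'r"a\\""'            # the buffer text r"a\\""
body_start = 2                  # first character after the opening quote

print(close_escaped(literal, body_start))
print(close_raw(literal, body_start))
```

Under these assumptions the ordinary rule closes the string one
character early, at the first of the two trailing quotes, which leaves
a stray quote behind to open a new string. That is consistent with sexp
navigation falling apart while the reader is happy.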
> You probably want to use syntax-table text properties.  See the page
> "Syntax Properties" in the Elisp manual.  In short, you would put,
> say, a "punctuation" property on most backslashes to nullify their
> normal action.  Possibly, you might want such a property on a double
> quote inside the string.  You might also want a property on the
> linefeeds inside a raw string.  With these properties, C-M-n and
> friends will work properly.
> 
> Bear in mind that you will also need to apply and remove these
> properties as the user changes the Lisp text, for example by
> removing a \ before a ".  There is an established mechanism in Emacs
> for this sort of action (which CC Mode doesn't use) which I would
> advise you to use.
> 

It was unclear to me how much additional processing during typing would
be acceptable here as opposed to just running the existing C code.
Hopefully native compilation support will to some extent nullify any
penalty from adding additional logic in Lisp here?
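
As a rough illustration of how little work the marking step might be,
here is a toy sketch of deciding which backslashes would need the
"punctuation" override. It models the idea only and does not touch the
real Emacs text-property machinery. It assumes the same raw rule as
above (only backslash-quote stays special), and both function names are
hypothetical.

```python
BACKSLASH = "\\"
QUOTE = '"'

def positions_to_neutralize(text, start):
    """Indices of the backslashes, from `start` up to the closing
    quote, that a raw string would treat as literal characters."""
    positions = []
    i = start
    while i < len(text) and text[i] != QUOTE:
        if text[i] == BACKSLASH and text[i + 1:i + 2] == QUOTE:
            i += 2                      # backslash-quote keeps its escape
        else:
            if text[i] == BACKSLASH:
                positions.append(i)     # would get the override
            i += 1
    return positions

def close_with_overrides(text, start, neutralized):
    """Ordinary backslash-escaping scan that ignores neutralized
    backslashes, standing in for a scanner that honours the override."""
    i = start
    while i < len(text):
        if text[i] == BACKSLASH and i not in neutralized:
            i += 2
        elif text[i] == QUOTE:
            return i
        else:
            i += 1
    return -1

literal = r'r"a\\""'                    # the buffer text r"a\\""
marks = positions_to_neutralize(literal, 2)

print(marks)
print(close_with_overrides(literal, 2, set(marks)))
```

With the first backslash neutralized, the ordinary scan finds the same
closing quote as the raw rule. It is a single linear pass over the
string, and the marks would have to be recomputed whenever the text
between the fences changes.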

> > I've attached a copy of my WIP patch; it's definitely not near
> > final code quality and doesn't have documentation yet, all of which
> > I would take care of before submitting for inclusion. I also
> > haven't filled out the copyright assignment paperwork yet, but
> > should this work reach a point where it was likely to be accepted,
> > I'd be happy to do that.
> 
> Thanks!
> 
> > I'd very much appreciate some pointers on what to try next here, or
> > some explanation of how syntax.c/syntax.el works beyond what's in
> > the reference manual. If this is a fool's errand I'm tilting at
> > here, I'd also appreciate being told that before I sink more time
> > into it :)
> 
> It is definitely NOT a fool's errand.  There may be some resistance
> to the idea of raw strings from traditionalists, but I hope not.  It
> would be worth your while really to understand the section in the
> Elisp manual on syntax and all the things it can (and can't) do.
> 
> Help is always available on emacs-devel.
> 
> You're going to have quite a bit of Lisp programming to do.  For
> example, font-lock needs to be taught how to fontify a raw string.
> 

I am already moderately familiar with writing elisp at this point, but
yes, I still have a lot to learn :)

> But at the end of the exercise, you will have learnt so much about
> Emacs that you will qualify as a fully fledged contributor.  :-)
> 

thanks,

Anna


> > thanks,
> 
> > Anna Glasgall
> 




