emacs-devel

Re: [PATCH] Interpret #r"..." as a raw string


From: Daniel Brooks
Subject: Re: [PATCH] Interpret #r"..." as a raw string
Date: Fri, 26 Feb 2021 16:39:05 -0800
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Sat, 27 Feb 2021 03:18:57 +0900 (JST)
>> From: Naoya Yamashita <conao3@gmail.com>
>> 
>> I write a patch to allow Emacs reader interpret raw string.
>
> What is a "raw string", and how does it differ from regular Lisp
> strings?
>
> Thanks.

Many languages have multiple string types because they simplify the
process of writing strings that contain quotation characters,
backslashes, or other syntax such as interpolation.

Think of sh, where double–quoted strings allow substitutions, while
single–quoted strings do not. The single–quoted strings are similar to
raw strings. Or Perl, where similar but more complex rules apply,
including strings that look like q{foo} and can be delimited by any
punctuation characters. Or Raku, which allows Unicode punctuation as
delimiters such as q«foo». Or Rust, where r"foo" is a raw string that
can be delimited not just by double quotes, but also by double quotes
plus an arbitrary number of # characters, as in r#"foo"#.
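
To make the sh case concrete, here is a small sketch (the variable names
are just ones I made up for illustration). The same text is written once
in double quotes and once in single quotes; only the first has $name
substituted:

```shell
#!/bin/sh
name=world

# Double quotes: $name is substituted.
interpolated="hello, $name"

# Single quotes: the text is taken exactly as written.
literal='hello, $name'

# printf '%s\n' prints each argument on its own line without
# interpreting backslashes, unlike echo in some shells.
printf '%s\n' "$interpolated" "$literal"
```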

For example, suppose I am writing a shell script and I want to print out
an HTML anchor:

    echo "<a href=\"https://example.com/\">click here for an example</a>"

vs:

    echo '<a href="https://example.com/">click here for an example</a>'

The single–quoted string is nicer because I don’t have to escape the
quotes. Of course, HTML also allows me to use single quotes in place of
double quotes (and with no change of the semantics of the HTML), so
changing them would also be an option. Perhaps an even better example
would be a shell script that emits elisp, where strings must be
double–quoted.
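
As a rough sketch of that last case, assume a script that wants to emit
a call to string-match with a regexp in it (the variable names and the
regexp are made up for illustration). The two forms below should hold
the same text, but the one in double quotes needs every inner quote
escaped and every backslash doubled again:

```shell
#!/bin/sh

# Double quotes: inner quotes need a backslash, and each backslash
# that elisp should see has to be written twice.
escaped="(string-match \"\\\\(foo\\\\|bar\\\\)\" \"foo\")"

# Single quotes: the elisp is written exactly as it should come out.
raw='(string-match "\\(foo\\|bar\\)" "foo")'

# printf '%s\n' is used instead of echo because some shells' echo
# would interpret the backslashes a second time.
printf '%s\n' "$escaped"
printf '%s\n' "$raw"
```

Both lines of output should be the same elisp form. Note that the elisp
itself still needs its own doubled backslashes inside the regexp, which
is the layer of escaping a raw string would take away.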

Of course the primary difference between single– and double–quoted
strings in Shell and Perl is interpolation, rather than escape
characters. In Raku this is extended so that there are half a dozen
different features that can be independently turned on or off for any
given quoted item. Q"foo" is a raw string. q"foo" adds the backslash
escape mechanism for concisely representing various characters such as
tabs, newlines, and so on. qq"foo" adds interpolation on top of
escaping. qw"foo bar" and qqw"foo bar" add word splitting, so that you
get not a single string but a list of the words in the string. qx"foo"
is like the backtick syntax in Shell; it runs the quoted item in a
subshell. qqx"foo" does interpolation on it before running it in the
subshell. Heredocs allow for multiline strings. All of these forms allow
you to use arbitrary punctuation characters as delimiters. Then there is
a whole thing with adjectives where you can pick and choose those
features using an even more uniform syntax. And finally regexes are yet
more fun on top of all of that. Raku even has an unquoting mechanism
that is rather similar to the Lisp unquote; it allows the nesting of
different string types.

Most languages don’t go to this extreme, but in languages that have raw
strings they are a way to turn off complicated features that you don’t
want to use in every instance.

As written, Naoya’s raw string patch allows the user to turn off string
escaping, but not to choose alternative delimiters (which has little or
no precedent in elisp) or to turn off string interpolation (which isn’t
built into the elisp syntax, but is instead implemented by library
functions such as format).

Naoya, your patch looks fairly good to my unpractised eye, but you might
consider adding an error message for malformed expressions such as
#r'foo', where the character after the r isn’t a double quote character.

Probably best to start thinking about how to document the syntax in the
elisp manual too.

Personally, I quite like the idea. Raw strings are useful for a lot more
than just regular expressions.

db48x


