On 26 Sep 2021, at 06:49, Werner LEMBERG <wl@gnu.org> wrote:
The idea here is different: it applies to identifiers, and only in the
input syntax; it does not change the internal semantics at all.
It is convenient not to have to type a backslash when a command is used.
Really? I highly doubt that. In particular, what about lyrics
mode?
The idea would be to change the file lexer.ll by adding U and
UCOMMAND:
A [a-zA-Z\200-\377]
U [\200-\377]
AA {A}|_
N [0-9]
ANY_CHAR (.|\n)
SYMBOL {A}([-_]{A}|{A})*
COMMAND \\{SYMBOL}
UCOMMAND {U}{SYMBOL}
Then, in selected places (that is, at the context switches), add a {UCOMMAND} rule:
{COMMAND} {
  return scan_escaped_word (YYText_utf8 () + 1);
}
{UCOMMAND} {
  return scan_escaped_word (YYText_utf8 ());
}
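To make the difference between the two rules concrete, here is a minimal sketch in Python rather than flex. It translates the definitions to byte-level regular expressions (octal \200-\377 is 0x80-0xFF) and returns the name each rule would hand to scan_escaped_word. The function command_name is hypothetical and exists only for this illustration; it does not model flex's longest-match selection or the start conditions in lexer.ll.

```python
import re

# Byte-level translations of the flex definitions (assumption: the lexer
# sees raw UTF-8 bytes, so \200-\377 corresponds to 0x80-0xFF).
A = rb"[a-zA-Z\x80-\xff]"
U = rb"[\x80-\xff]"
SYMBOL = A + rb"(?:[-_]" + A + rb"|" + A + rb")*"

COMMAND = re.compile(rb"\\" + SYMBOL)   # \\{SYMBOL}
UCOMMAND = re.compile(U + SYMBOL)       # {U}{SYMBOL}


def command_name(token: bytes):
    """Hypothetical helper: the name a rule would pass on, or None."""
    if COMMAND.fullmatch(token):
        return token[1:]    # mirrors YYText_utf8 () + 1: skip the backslash
    if UCOMMAND.fullmatch(token):
        return token        # no backslash to skip, pass the whole text
    return None
```

With this sketch, command_name(rb"\relative") yields b"relative", a token starting with a non-ASCII byte is returned unchanged, and a plain ASCII word such as b"relative" is not treated as a command at all.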
You might provide an MR; maybe it gets accepted. I still doubt that it
would be a good idea.
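There is a conflict in some contexts between {SYMBOL} and {COMMAND}, so it may
not work. With the definitions above, U is a subset of A, so every {UCOMMAND}
match is also a {SYMBOL} match. To get a regular COMMAND syntax, commands should
start with something that SYMBOL cannot start with.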
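The overlap can be checked with a small sketch, again in Python with byte-level translations of the flex definitions. The helper matching_patterns is hypothetical; it only reports which of the two patterns accept a whole token and says nothing about which rule lexer.ll would actually pick in a given start condition.

```python
import re

# Same translations as in the flex definitions: A includes all bytes
# 0x80-0xFF, and U is exactly that high-byte range.
A = rb"[a-zA-Z\x80-\xff]"
U = rb"[\x80-\xff]"
SYMBOL_SRC = A + rb"(?:[-_]" + A + rb"|" + A + rb")*"

SYMBOL = re.compile(SYMBOL_SRC)
UCOMMAND = re.compile(U + SYMBOL_SRC)


def matching_patterns(token: bytes):
    """Hypothetical helper: names of the patterns that match the token."""
    names = []
    if SYMBOL.fullmatch(token):
        names.append("SYMBOL")
    if UCOMMAND.fullmatch(token):
        names.append("UCOMMAND")
    return names
```

A word beginning with a non-ASCII character, such as "éclat" encoded as UTF-8, is accepted by both patterns at the same length. In flex, when two rules match text of equal length, the rule listed first in the file wins, so the outcome would depend on rule order wherever both are active.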
Otherwise you might replace the function YYText_utf8 with proper UTF-8
patterns, a variation of:
/* UTF-8 character with valid Unicode code point. */
utf8char [\x09\x0A\x0D\x20-\x7E]|[\xC2-\xDF][\x80-\xBF]|\xE0[\xA0-\xBF][\x80-\xBF]|[\xE1-\xEC\xEE\xEF]([\x80-\xBF]{2})|\xED[\x80-\x9F][\x80-\xBF]|\xF0[\x90-\xBF]([\x80-\xBF]{2})|[\xF1-\xF3]([\x80-\xBF]{3})|\xF4[\x80-\x8F]([\x80-\xBF]{2})
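As a way to experiment with such a pattern outside the lexer, here is a sketch of the same alternation in Python, with the F0 branch written as [\x90-\xBF]. The names UTF8_CHAR and is_valid_utf8_text are made up for this example. Note that, like the flex pattern, the ASCII branch admits only tab, LF, CR and the printable range, so other control bytes are rejected.

```python
import re

# One UTF-8 encoded character with a valid Unicode code point,
# branch by branch.
UTF8_CHAR = re.compile(
    rb"[\x09\x0A\x0D\x20-\x7E]"               # ASCII: tab, LF, CR, printable
    rb"|[\xC2-\xDF][\x80-\xBF]"               # 2 bytes (C0, C1 are overlong)
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"           # 3 bytes, excluding overlongs
    rb"|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}"    # 3 bytes, general case
    rb"|\xED[\x80-\x9F][\x80-\xBF]"           # 3 bytes, excluding surrogates
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"        # 4 bytes, excluding overlongs
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"            # 4 bytes, general case
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2}"        # 4 bytes, up to U+10FFFF
)

# A whole input is accepted if it is a sequence of such characters.
UTF8_TEXT = re.compile(rb"(?:" + UTF8_CHAR.pattern + rb")*")


def is_valid_utf8_text(data: bytes) -> bool:
    """Hypothetical helper: True if data consists only of utf8char matches."""
    return UTF8_TEXT.fullmatch(data) is not None
```

For instance, the UTF-8 encoding of "é𝄞" is accepted, while the overlong sequence C0 80 and the encoded surrogate ED A0 80 are rejected.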