bug-gnu-emacs

bug#44173: 28.0.50; gdb-mi mangles strings with octal escapes


From: Mattias Engdegård
Subject: bug#44173: 28.0.50; gdb-mi mangles strings with octal escapes
Date: Fri, 23 Oct 2020 19:31:47 +0200

On 23 Oct 2020, at 16:44, Eli Zaretskii <eliz@gnu.org> wrote:

> There's nothing special in the text "\303" that says it must be an
> octal escape.  They are just 4 ASCII characters.

The grammar uses the name 'c-string', so it is reasonable to assume that most 
of the lexical conventions of C strings are obeyed.

Perhaps you mean that \ooo can occur outside c-string productions? If so, 
please say where you have seen it, or show evidence of it being produced.

The possibilities are limited. For example, stream records may use an unquoted 
(newline-terminated) string in place of a c-string, but I haven't seen any 
evidence of this in practice and it appears that gdb-mi.el does not handle that 
case either.

The only other possibility in the grammar would be inside 'variable' 
productions (field keys) which are unquoted, but those only come from a small 
set of fixed names.

To be clear: the exact encoding of non-ASCII bytes (whether present literally 
or as octal escapes in c-string tokens) is unclear, and I do not attempt to 
solve that problem here and now. This is about more fundamental parsing and 
lexing problems.

Namely: handling octal escapes in gdb-gdbmi-marker-filter is doing it at the 
wrong level. Moreover, this substitution, when performed, is not correct since 
it ignores the context: the 4-byte (excluding quotes) string "\\377", that is, 
an escaped backslash followed by the digits 377, then appears to the user as 
the 2-byte string "\\377", where the '\377' sequence is painted in a distinct 
colour and really is the raw byte 0xff. A backslash followed by that byte is 
not a valid C character escape sequence.
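
To make the context problem concrete, here is a minimal sketch in Python (not 
Emacs Lisp, and not the gdb-mi.el code). The names naive_substitute and 
decode_c_string are hypothetical. The first function imitates a blanket 
replacement of \ooo sequences; the second scans a single quoted c-string token 
left to right, so an escaped backslash is consumed before the digits that 
follow it are looked at. The set of single-character escapes handled is an 
assumption based on ordinary C conventions.

```python
import re

# Single-character C escapes (assumed set; unknown escapes map to themselves).
_SIMPLE = {ord("n"): 0x0A, ord("t"): 0x09, ord("r"): 0x0D, ord("a"): 0x07,
           ord("b"): 0x08, ord("f"): 0x0C, ord("v"): 0x0B, ord("e"): 0x1B}


def naive_substitute(text: bytes) -> bytes:
    """Replace every backslash + three octal digits, ignoring context."""
    return re.sub(rb"\\([0-7]{3})",
                  lambda m: bytes([int(m.group(1).decode("ascii"), 8) & 0xFF]),
                  text)


def decode_c_string(token: bytes) -> bytes:
    """Decode one quoted c-string token by scanning it left to right."""
    if len(token) < 2 or token[:1] != b'"' or token[-1:] != b'"':
        raise ValueError("not a quoted c-string token")
    body = token[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        c = body[i]
        if c != 0x5C:                      # ordinary byte, copy as-is
            out.append(c)
            i += 1
            continue
        i += 1                             # skip the backslash
        if i >= len(body):
            raise ValueError("dangling backslash")
        c = body[i]
        if 0x30 <= c <= 0x37:              # octal escape: one to three digits
            j = i
            while j < len(body) and j < i + 3 and 0x30 <= body[j] <= 0x37:
                j += 1
            out.append(int(body[i:j].decode("ascii"), 8) & 0xFF)
            i = j
        else:                              # \\, \", \n, ...
            out.append(_SIMPLE.get(c, c))
            i += 1
    return bytes(out)


wire = b'"\\\\377"'                  # the token "\\377" as it appears in the output
print(naive_substitute(wire))        # backslash followed by the raw byte 0xff
print(decode_c_string(wire))         # backslash, '3', '7', '7'
```

With the token "\\377", the blanket replacement yields a backslash followed by 
the byte 0xff, whereas the left-to-right scan yields the original four bytes.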

Finally, the JSON mess is evidently the wrong way to go since it does not take 
care of strings properly -- and heaven knows what else, since 
gdb-jsonify-buffer works at the wrong level (a pattern here) by doing regexp 
replacement on the whole text prior to parsing.
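
As a rough illustration of parsing at the right level, the following Python 
sketch reads a comma-separated sequence of 'result' productions 
(variable "=" value, where a value is a c-string, a tuple or a list) and 
decodes escapes only while inside a c-string token. It is not the gdb-mi.el 
code; parse_results and its helpers are hypothetical names, tuples become 
dicts (so a repeated key would overwrite the earlier one), and string values 
are returned as raw bytes with no character decoding attempted.

```python
_SIMPLE = {ord("n"): 0x0A, ord("t"): 0x09, ord("r"): 0x0D}  # assumed subset


def parse_results(text: bytes) -> dict:
    """Parse 'var=value,var=value,...' into nested dicts, lists and bytes."""
    pos = 0

    def peek() -> bytes:
        return text[pos:pos + 1]

    def expect(ch: bytes) -> None:
        nonlocal pos
        if peek() != ch:
            raise ValueError(f"expected {ch!r} at offset {pos}")
        pos += 1

    def c_string() -> bytes:
        nonlocal pos
        expect(b'"')
        out = bytearray()
        while pos < len(text):
            c = text[pos]
            pos += 1
            if c == 0x22:                       # closing quote
                return bytes(out)
            if c != 0x5C:                       # ordinary byte
                out.append(c)
                continue
            if pos >= len(text):
                break
            c = text[pos]
            if 0x30 <= c <= 0x37:               # octal escape, 1-3 digits
                end = pos
                while (end < len(text) and end < pos + 3
                       and 0x30 <= text[end] <= 0x37):
                    end += 1
                out.append(int(text[pos:end].decode("ascii"), 8) & 0xFF)
                pos = end
            else:                               # \\, \", \n, ...
                out.append(_SIMPLE.get(c, c))
                pos += 1
        raise ValueError("unterminated c-string")

    def variable() -> str:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in b'=,{}[]"':
            pos += 1
        if pos == start:
            raise ValueError(f"expected a variable name at offset {start}")
        return text[start:pos].decode("ascii")

    def result():
        name = variable()
        expect(b"=")
        return name, value()

    def value():
        ch = peek()
        if ch == b'"':
            return c_string()
        if ch == b"{":
            return sequence(b"{", b"}", as_dict=True)
        if ch == b"[":
            return sequence(b"[", b"]", as_dict=False)
        raise ValueError(f"expected a value at offset {pos}")

    def sequence(open_ch: bytes, close_ch: bytes, as_dict: bool):
        expect(open_ch)
        items = []
        while peek() != close_ch:
            # Lists may hold bare values or results; tuples hold results only.
            bare = not as_dict and peek() in (b'"', b"{", b"[")
            items.append(value() if bare else result())
            if peek() != b",":
                break
            expect(b",")
        expect(close_ch)
        return dict(items) if as_dict else items

    top = {}
    while True:
        key, val = result()
        top[key] = val
        if peek() != b",":
            break
        expect(b",")
    if pos != len(text):
        raise ValueError(f"trailing data at offset {pos}")
    return top


print(parse_results(b'name="s",value="\\\\377",children=[{exp="\\303\\266"}]'))
```

Because the caller receives each string together with the key it belongs to, 
any later decision about how to present non-ASCII bytes can take that context 
into account.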

> That doesn't really answer my question, though, about the use case
> that causes such a string to be in the program.  Without a use case, I
> could tell you to set gdb-mi-decode-strings non-nil and be done with
> it.

Why a string is in a program is irrelevant; the user does not necessarily know 
that. If Emacs says that a string contains six decimal digits when it really 
just contains two (nonzero) bytes, then that is a lie no matter what.

> Well, I know that several possible ways exist, but each one of them
> loses in some situations.  You say "the code receiving the parse tree
> could decide", but will that code have information to make that
> decision correctly?  And if you must decide in the parser, how would
> you suggest to make the decision to avoid making incorrect decisions?

Again, that problem is outside the scope of this bug, but I think we can agree 
that it is easier, or at least no more difficult, to make a correct decision 
knowing the context of the string than not.

Let's see what a value/result parser can do and work from there.





