bug-gnu-emacs

bug#44173: 28.0.50; gdb-mi mangles strings with octal escapes


From: Eli Zaretskii
Subject: bug#44173: 28.0.50; gdb-mi mangles strings with octal escapes
Date: Fri, 23 Oct 2020 17:44:39 +0300

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Fri, 23 Oct 2020 16:21:55 +0200
> Cc: 44173@debbugs.gnu.org
> 
> On 23 Oct 2020, at 15:19, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> > The basic ambiguity, AFAIR, is what is described last here: a string
> > reported by GDB could include literal \nnn sequences, which are not
> > non-ASCII characters that GDB/MI converts to octal escapes.  The
> > information which was which is lost once we receive the GDB/MI output.
> 
> So you mean that GDB would produce the value "\303" that does not
> stand for a string containing the single byte octal 303?

Yes.

> When does this occur?

There's nothing special in the text "\303" that says it must be an
octal escape.  They are just 4 ASCII characters.

Moreover, even if it is an escape, it is not clear how to interpret
it: as a raw byte or as a character encoded in some 8-bit encoding.
As you probably know, GDB has settings that control how strings it
reports are encoded, so the same string can be reported in different
forms.
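
To make the three readings concrete, here is a minimal sketch in Python (not Emacs Lisp, and not what gdb-mi.el does); the variable names are purely illustrative. It takes the two bytes named by the C literal "\303\266" and shows that the result depends entirely on which interpretation is chosen:

```python
# The two bytes that the C literal "\303\266" denotes.
raw = bytes([0o303, 0o266])

# Reading 1: the bytes are UTF-8 text (one character, o with diaeresis).
as_utf8 = raw.decode("utf-8")

# Reading 2: the bytes are text in an 8-bit encoding such as Latin-1
# (two characters, one per byte).
as_latin1 = raw.decode("latin-1")

# Reading 3: nothing is an escape; the text is just ASCII characters.
as_text = r"\303\266"

print(len(as_utf8), len(as_latin1), len(as_text))
```

Nothing in the received text itself says which of the three lengths is the intended one.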

> > AFAIU, this bug's root cause is the way we solved the ambiguity, which
> > basically assumes one of the possible interpretations should be
> > preferred to another, because it is more popular/useful.
> 
> Then we disagree. The code doesn't do the right thing if gdb-mi-decode-strings
> is nil, unless you by 'ambiguity' mean that GDB sometimes inserts a spurious
> backslash that should be ignored. When gdb-mi-decode-strings is non-nil, it is
> sometimes wrong as well.

The ambiguity is whether gdb-mi-decode-strings should be nil or
non-nil.  We have it nil by default because non-ASCII strings and file
names are rare, but when you are debugging a program that uses such
strings, you had better set it non-nil.  And when some of your
program's source files use non-ASCII characters, you _must_ set it
non-nil, otherwise "M-x gdb" will not find the source files it needs
to visit as you step through the program.

> > Let me turn the tables and ask how you got that string you show
> > in the original report?
> 
> A program in the C language containing the local declaration
> 
>   char *s = "\303\266";
> 
> produces nonsense in the 'Locals' window when debugged. It doesn't matter 
> what the string means; I would have been happy with gdb/emacs interpreting it 
> as utf-8, latin-1 or just raw bytes presented in octal or hex.

That doesn't really answer my question, though, about the use case
that causes such a string to be in the program.  Without a use case, I
could tell you to set gdb-mi-decode-strings non-nil and be done with
it.

> > And what will then happen to non-ASCII strings and file names reported
> > by GDB?  How will our parser solve that?
> 
> The parser can either leave the strings as undecoded unibyte strings -- that 
> is, "\303\266" would be a 2-char string -- or decode them according to 
> gdb-mi-decode-strings, in which case it might become a 1-char multibyte 
> string. In the former case, the code receiving the parse tree could decide 
> what to do with the strings and how to display them, perhaps on a 
> case-by-case basis.

Well, I know that several possible ways exist, but each one of them
loses in some situations.  You say "the code receiving the parse tree
could decide", but will that code have information to make that
decision correctly?  And if you must decide in the parser, how would
you suggest making that decision without getting some cases wrong?
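
For reference, the two parser behaviours proposed in the quoted text could be sketched roughly as follows. This is a hedged illustration in Python, not the gdb-mi.el parser: `unescape_octal`, `parse_value` and the `coding` argument are hypothetical names, and the sketch assumes ASCII input in which every \nnn sequence is an octal escape (it does not handle escaped backslashes), which is precisely the assumption under discussion.

```python
import re
from typing import Optional, Union

# A backslash followed by three octal digits, limited to values 0..255.
_OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})")


def unescape_octal(text: str) -> bytes:
    """Replace each \\nnn octal escape with the single byte it names."""
    data = text.encode("ascii")  # assumption: the input is pure ASCII
    return _OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), data)


def parse_value(text: str, coding: Optional[str] = None) -> Union[bytes, str]:
    """Return raw bytes when coding is None, else decode with that coding."""
    raw = unescape_octal(text)
    if coding is None:
        return raw  # option 1: leave the decision to the caller
    return raw.decode(coding)  # option 2: decide in the parser


print(parse_value(r"\303\266"))           # 2 undecoded bytes
print(parse_value(r"\303\266", "utf-8"))  # 1 decoded character
```

Either way, something still has to supply the right `coding`, or know that none applies, and the sketch does not say where that information would come from.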
