bug-gettext

[bug #65053] spurious "hexadecimal escape sequence out of range" warning


From: Vaclav Slavik
Subject: [bug #65053] spurious "hexadecimal escape sequence out of range" warnings
Date: Sun, 31 Dec 2023 07:42:20 -0500 (EST)

Follow-up Comment #4, bug#65053 (group gettext):


> This is why I cannot ignore the problem with FreeBSD and Solaris.

You're obviously the best authority on what your past self meant, but are you
sure we're talking about the same change? For reference, this is the commit:
https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=commit;h=0c0345632aedfb254b69f72cce268728113edf2e
(and the subsequent analogous commit for Vala:
https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=commit;h=dd9a098416034695ad3aed8c22631b9ffaa82d58).

The commit seems very clear that it really is only about actual overflow. The
warning message says that. It uses the same wording as a C compiler for actual
overflow. The commit message says so (with some extra explanation). The code
in phase7_getc() clearly does that: it compares the value to 0x100. There's no
indication anywhere that it is related to portability issues with Solaris or
FreeBSD, and every indication that it is about 8-bit character overflow.

By your explanation, the code should warn about any non-ASCII \x sequences,
i.e. >= 0x80, as they too are locale-dependent on these platforms. But it
doesn't do that and happily accepts them. That, too, seems inconsistent with
the check being about portability.
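
To make the difference between the two thresholds concrete, here is a minimal sketch in C. It is not the gettext source: parse_hex_escape(), overflows_narrow_char() and is_non_ascii() are hypothetical helpers written only for illustration. The one detail taken from the discussion is that the existing check compares against 0x100.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper, not gettext code: accumulate the hex digits that
   follow "\x" and return their value.  Saturates at ULONG_MAX instead of
   wrapping around on very long digit runs.  */
static unsigned long
parse_hex_escape (const char *digits)
{
  unsigned long value = 0;

  for (; *digits != '\0'; digits++)
    {
      int c = (unsigned char) *digits;
      unsigned int d;

      if (c >= '0' && c <= '9')
        d = c - '0';
      else if (c >= 'a' && c <= 'f')
        d = c - 'a' + 10;
      else if (c >= 'A' && c <= 'F')
        d = c - 'A' + 10;
      else
        break;                  /* first non-hex character ends the escape */

      if (value > (ULONG_MAX >> 4))
        return ULONG_MAX;       /* would overflow: saturate */
      value = (value << 4) | d;
    }
  return value;
}

/* An 8-bit overflow check: the value does not fit in one byte.  */
static bool
overflows_narrow_char (unsigned long value)
{
  return value >= 0x100;
}

/* What a locale-portability check would presumably test instead:
   anything outside ASCII.  */
static bool
is_non_ascii (unsigned long value)
{
  return value >= 0x80;
}
```

With these helpers, "\xE9" passes the overflow check but would be flagged by a non-ASCII check, while "\x20AC" is flagged by both. A check that only fires at 0x100 and above therefore behaves like the first predicate, not the second.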

I expected this to be a simple case of phase7_getc() being ignorant of wide
characters and just having to use a value different from 0x100 for the check.
I mean, the comment at the relevant invocation of the function even says so
("We could worry about the 'L' before wide character constants, but ignoring
it has no effect unless one of the keywords is "L"."). That was true before
0c0345632aedfb254b69f72cce268728113edf2e, but it no longer is.
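
As a rough illustration of what "a value different from 0x100" could look like, here is a sketch of a range check that depends on the literal kind. Everything in it is hypothetical: the enum, the function names and, in particular, the wide-literal limit of 0x110000 (the Unicode code point range) are assumptions, since the real width of wchar_t depends on the target platform, and the current parser does not track the 'L' prefix at all.

```c
#include <stdbool.h>

/* Hypothetical: the parser would have to remember whether the literal
   was introduced by an 'L' prefix.  */
enum literal_kind
{
  LITERAL_NARROW,               /* "..." or '...'   */
  LITERAL_WIDE                  /* L"..." or L'...' */
};

/* First value that no longer fits in one element of the literal.
   0x100 matches the existing narrow check; 0x110000 is an assumed
   limit for wide literals, chosen here only for illustration.  */
static unsigned long
hex_escape_limit (enum literal_kind kind)
{
  return kind == LITERAL_WIDE ? 0x110000UL : 0x100UL;
}

/* True if a "\x" escape with this value should trigger the
   "hexadecimal escape sequence out of range" warning.  */
static bool
hex_escape_out_of_range (unsigned long value, enum literal_kind kind)
{
  return value >= hex_escape_limit (kind);
}
```

Under these assumptions, L"\x20AC" would no longer warn, while "\x20AC" in a narrow literal still would.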

Also, because the parsing code is shared, it warns on _character_ literals,
which are never going to be passed to a gettext function and are therefore a
non-issue for portability.

> Maybe I should change the warning "hexadecimal escape sequence out of range"
> to "hexadecimal escape in wide-char literal is unsupported; use \u instead of
> \x if you meant to designate a Unicode character" ?

If you think this check is correct and should stay as-is, then I'd say yes,
the message definitely could be clearer, because as it is now, it is very
misleading. A bogus warning with an obvious workaround is a mere annoyance;
one with no obvious way to silence it, which looks like a bug to a user, is
much worse.

But doing what you suggest is going to be non-trivial and probably not worth
the effort. The parser doesn't differentiate between narrow and wide literals.
It does correctly warn about narrow-char overflows, and arguably that's what
it should continue to say there. It also accepts some \x escapes in wide-char
literals (although that's arguably OK; it doesn't have to warn about every
instance...).

I suppose the minimal clarifying change would be something like "hexadecimal
escape sequence out of range; use \u in wide-char literal if you meant to
designate a Unicode character"? Though you could rightfully complain that such
wording steers towards using wchar_t...


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?65053>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/



