From: Frank Heckenbach
Subject: Re: Corrupted multibyte characters in command substitutions fixes may be worse than problem.
Date: Mon, 07 Feb 2022 12:21:09 +0100
> In the case of bash with an environment having LC_CTYPE set to C.UTF-8
> or en_US.UTF-8, read:
> 0xC3 (len=1), i.e. Ã ('A' with tilde in a legacy 8-bit Latin-compatible
> charset), but invalid if bash processes the environment setting of
> en_US.UTF-8.
>
> Should bash process it as legacy input or as invalid UTF-8?
> Either way, what should it return? A UTF-8 char (hex 0xC3 0x83)
> transcoded from the Latin value of A-tilde, or keep the binary value
> the same (return 0xC3)? Should it return a warning message? If it does,
> should it return NUL for the returned value because the input was
> erroneous?
Assuming Latin-1 when nothing in the environment points to it seems
questionable. It might just as well be a Cyrillic character in
ISO-8859-5 or whatever.
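To make the ambiguity concrete, here is a small Python sketch (Python is used only for illustration; none of this is bash code). It decodes the single byte 0xC3 under a few assumed charsets. The charset list is an arbitrary choice for the example, not something bash consults. Latin-1 yields A with tilde, ISO-8859-5 yields a Cyrillic capital U, and UTF-8 rejects the lone lead byte outright.

```python
import unicodedata

# Hypothetical list of charsets to try; chosen only for this illustration.
CANDIDATES = ("latin-1", "iso-8859-5", "utf-8")


def interpretations(raw: bytes) -> dict:
    """Map each candidate charset to the decoded text, or None if invalid."""
    result = {}
    for charset in CANDIDATES:
        try:
            result[charset] = raw.decode(charset)
        except UnicodeDecodeError:
            result[charset] = None  # not a valid sequence in this charset
    return result


byte = b"\xc3"
for charset, text in interpretations(byte).items():
    if text is None:
        print(f"{charset:>10}: invalid")
    else:
        # Print code point and name so the output stays plain ASCII.
        print(f"{charset:>10}: U+{ord(text):04X} {unicodedata.name(text)}")

# Assuming Latin-1 and transcoding to UTF-8 turns one byte into two.
print(byte.decode("latin-1").encode("utf-8").hex(" "))  # c3 83
```

Nothing in the byte itself says which row is "right"; only an outside assumption about the charset picks one.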
Email filters were mentioned. Emails may use charsets different from
the current environment -- even several different ones within a mail
(I've sent such mails myself). So if bash were to "fix" input
depending on the environment, even writing a pass-through filter
would require parsing the Content-Type headers and changing the
environment accordingly (or else, use an 8-bit clean charset
throughout).
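A rough Python sketch of the pass-through case, assuming a hypothetical mail body that mixes UTF-8, Latin-1 and ISO-8859-5 text. A byte-oriented filter and a Latin-1 round trip (Latin-1 maps all 256 byte values one-to-one, so it is 8-bit clean) both reproduce the input exactly. A hypothetical "fixing" filter that assumes UTF-8 and replaces invalid bytes does not. All function names here are made up for the example.

```python
def passthrough(data: bytes) -> bytes:
    """8-bit clean: bytes in, the same bytes out, no charset assumed."""
    return data


def via_latin1(data: bytes) -> bytes:
    """Round trip through Latin-1, which covers every byte value 0x00-0xFF."""
    return data.decode("latin-1").encode("latin-1")


def assume_utf8(data: bytes) -> bytes:
    """Hypothetical 'fixing' filter: invalid UTF-8 bytes become U+FFFD."""
    return data.decode("utf-8", errors="replace").encode("utf-8")


# One body, three charsets, as can happen within a single mail.
mail = (
    "Gr\u00fc\u00dfe ".encode("utf-8")
    + "Gr\u00fc\u00dfe ".encode("latin-1")
    + "\u041f\u0440\u0438\u0432\u0435\u0442".encode("iso-8859-5")
)

print(passthrough(mail) == mail)  # True
print(via_latin1(mail) == mail)   # True
print(assume_utf8(mail) == mail)  # False: Latin-1 and ISO-8859-5 parts replaced
```

The environment-driven filter would only get this right if it first parsed each part's Content-Type and switched charsets per part.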
So I don't think bash should change the input (unintentionally, as
with the original bug, or intentionally, as discussed here) unless and
until it needs to do charset-dependent operations.
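One way to picture "leave the bytes alone until a charset-dependent operation needs them" is this Python sketch. It is an assumption-laden analogy, not how bash works. The value is kept as raw bytes and is decoded only inside a character-counting function. Python's `surrogateescape` error handler stands in for one possible policy for invalid bytes: each one counts as a single character and maps back to the identical byte. The UTF-8 default and the function names are hypothetical.

```python
def char_count(data: bytes, encoding: str = "utf-8") -> int:
    """Charset-dependent operation: decode only here, for counting."""
    # Each undecodable byte becomes one lone surrogate, so it counts as 1.
    return len(data.decode(encoding, errors="surrogateescape"))


def restore(text: str, encoding: str = "utf-8") -> bytes:
    """Map decoded text back to the exact original bytes."""
    return text.encode(encoding, errors="surrogateescape")


value = b"A\xc3\x83 \xc3"  # valid UTF-8 "A", A-tilde, space, then a stray 0xC3
print(char_count(value))   # 4
print(restore(value.decode("utf-8", "surrogateescape")) == value)  # True
```

Since the stored value is never rewritten, passing it through untouched still yields the original bytes. Only the count depends on the assumed charset.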