bug-bash

From: Alex fxmbsw7 Ratchev
Subject: Re: Corrupted multibyte characters in command substitutions fixes may be worse than problem.
Date: Sun, 6 Feb 2022 23:11:43 +0100

i just have a small question here
dropping null bytes is no friend of mine, and i understand bash
skips them instead of processing them, so the null bytes are gone,
which is not of much use

can't these \0 bytes at least be encoded as \u0 when a utf8 locale is
used, instead of being dropped ? <the two utf 8 bytes> and a null, ... just
prefix the utf 8 encoding chars to the null
and then they'd maybe safely still be there

just asking..
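
For reference, a minimal sketch of the behaviour in question, assuming a bash that drops null bytes in command substitution (recent versions also print a warning on stderr). The hex round trip through od is only one possible way to keep the bytes around as text; it is not the \u0 encoding asked about above, and the variable names are made up for illustration:

```shell
#!/usr/bin/env bash
# Command substitution cannot carry NUL bytes: bash drops them.
# Recent versions also warn on stderr; the group redirect silences that.
{ raw=$(printf 'a\0b'); } 2>/dev/null
printf 'length after substitution: %d\n' "${#raw}"

# Hypothetical workaround: carry the bytes as hex text instead of raw bytes.
hex=$(printf 'a\0b' | od -An -v -tx1 | tr -d '[:space:]')
printf 'hex form: %s\n' "$hex"
```

The hex string survives because it contains no null byte itself; turning it back into raw bytes would have to happen outside a shell variable, for example by writing straight to a file or pipe.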

On Sun, Feb 6, 2022 at 6:38 PM Chet Ramey <chet.ramey@case.edu> wrote:
>
> On 2/5/22 9:41 PM, L A Walsh wrote:
>
> > That's debatable, BTW, as I was reminded of a similar
> > passthrough of what one might call 'invalid input' w/o warning,
> > resulting in code that worked in a specific circumstance to a change
> > in bash issuing a warning that resulted in breaking code, that, at that
> > point, worked as expected.
>
> Memory is a tricky thing. This statement -- you've made it twice -- got me
> wondering what you might be referring to, so I went digging.
>
>
> > Specifically, it involved reading a value typically in the range
> > 50 <=x <=150 from an active file (like a value from /proc that varies
> > based on OS internal values) where the data was stored in a
> > quad, or Little-Endian DWORD value, so the value was in the
> > 2 least significant bytes with the most significant bytes following
> > (in a higher position) in memory, like:
> > Byte# => 00 01 02 03, for value 100 decimal:
> > hex   => 64 00 00 00
> >
> > The working code expected to see 0x64 followed by 0x00 which it
> > used as string terminator.
> >
> > Chet "fixed" this silent use of 0x00 as a string terminator to no longer
> > ignore it, but have bash issue a warning message, which caused the
> > "read < fn" to fail and return 0 instead of the ascii character 'd', which
> > the program had interpreted as the DPI value of the user's screen.
>
> So it seems like you've conflated two different things. The first is the
> command substitution warning about dropping NULL bytes from 2016:
>
> https://lists.gnu.org/archive/html/bug-bash/2016-09/msg00015.html
>
> which I talked about a couple of days ago:
>
> https://lists.gnu.org/archive/html/bug-bash/2022-02/msg00054.html
>
> The second is a change from back in 2011 (bash-4.2 days) that changed bash
> to drop NULL bytes in the read builtin:
>
> https://lists.gnu.org/archive/html/bug-bash/2011-11/msg00136.html
>
> One of my messages in that thread contains a quickie survey of other
> shells' behavior here. The change is in line with what other shells do.
>
>
> > It took some debugging and hack arounds to find another way to access
> > the data.  So what some might have called silent data corruption because
> > bash silently passed through the nul terminated datum as a string
> > terminator, my program took as logical behavior.  I complained about
> > the change,
>
> Where? Since this is the opposite of what happened in the command
> substitution case, I'm assuming you mean the read change from 2011. You
> didn't participate in the original discussion, and I'm just not inclined
> to go digging around the archives for it.
>
> > remarking that if bash was going to sanitize returned values
> > (in that case checking for what should have been an ascii value with NUL
> > not being in the allowed value of string characters), that bash might
> > also be saddled with checking for invalid Unicode sequences and warning about
> > them as well, regardless of the source of the corruption, some programs
> > might expect to get a raw byte sequence rather than some encoded form
> > with the difference in interpretation causing noticeable bugs.
>
> You might actually have said something like this at some point.
>
> I'd prefer to think your memory has conflated these two things, and that
> this is how you remember it. That's better than the alternative.
>
> --
> ``The lyf so short, the craft so long to lerne.'' - Chaucer
>                  ``Ars longa, vita brevis'' - Hippocrates
> Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/
>
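
As an aside on the little-endian DWORD case quoted above (bytes 64 00 00 00 for decimal 100): a rough sketch of reading such a value without depending on how read or command substitution treats null bytes. It assumes od is available; the temporary file stands in for the real data source, which the thread does not name, and the variable names are hypothetical:

```shell
#!/usr/bin/env bash
# Stand-in for the real file: four bytes, little-endian, value 100.
tmp=$(mktemp)
printf '\x64\x00\x00\x00' > "$tmp"

# od prints each of the first four bytes as an unsigned decimal number,
# so no raw NUL byte ever has to pass through a shell variable.
read -r b0 b1 b2 b3 < <(od -An -v -tu1 -N4 "$tmp")

# Reassemble the 32-bit value, least significant byte first.
value=$(( b0 | (b1 << 8) | (b2 << 16) | (b3 << 24) ))
printf 'value: %d\n' "$value"

rm -f "$tmp"
```

Unlike treating the first byte as a character, this also works when the value does not fit in one byte.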


