bug-bash



From: Alex fxmbsw7 Ratchev
Subject: Re: Corrupted multibyte characters in command substitutions fixes may be worse than problem.
Date: Sun, 6 Feb 2022 23:16:04 +0100

I'm sorry, I didn't realize it would just prefix the null byte, which still
leaves a null byte in the output, so it won't work.
cheers
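A minimal Python sketch of why the prefix idea fails. It assumes standard UTF-8 and models a consumer that stops at the first 0x00 byte. `prefixed_nul` and `survives_c_string` are hypothetical helpers for illustration, not anything bash provides, and the `0xC2` prefix byte is arbitrary.

```python
def utf8_of_nul() -> bytes:
    """Return the standard UTF-8 encoding of U+0000."""
    return "\u0000".encode("utf-8")

def prefixed_nul(prefix: bytes) -> bytes:
    """Hypothetical 'prefix the NUL' scheme from the thread: prefix + 0x00."""
    return prefix + utf8_of_nul()

def survives_c_string(data: bytes) -> bytes:
    """Model a consumer that treats 0x00 as a string terminator."""
    return data.split(b"\x00", 1)[0]

encoded = prefixed_nul(b"\xc2")                     # arbitrary illustrative prefix
print(utf8_of_nul())                                # b'\x00'
print(b"\x00" in encoded)                           # True: the NUL is still there
print(survives_c_string(b"ab" + encoded + b"cd"))   # b'ab\xc2'
```

Keeping the byte and adding a prefix cannot help, because U+0000 is itself the single byte 0x00 in UTF-8. Avoiding the byte entirely would need a non-standard representation, such as the overlong 0xC0 0x80 form used by Java's "modified UTF-8", which strict UTF-8 decoders reject.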

On Sun, Feb 6, 2022 at 11:11 PM Alex fxmbsw7 Ratchev <fxmbsw7@gmail.com> wrote:
>
> I just have a small question here.
> The dropping of null bytes is no friend of mine, and I understand you're
> there to skip them instead of processing them, which results in the null
> bytes being gone, which is not of much use.
>
> Can't these \0 bytes be encoded, at least when a UTF-8 locale is used, as
> \u0 instead of being dropped? <the two UTF-8 bytes> and a null, ... just
> prefix the UTF-8 encoding chars to the null,
> and maybe they'd still be safely there.
>
> Just asking.
>
> On Sun, Feb 6, 2022 at 6:38 PM Chet Ramey <chet.ramey@case.edu> wrote:
> >
> > On 2/5/22 9:41 PM, L A Walsh wrote:
> >
> > > That's debatable, BTW, as I was reminded of a similar
> > > passthrough, without warning, of what one might call 'invalid input':
> > > code that worked in a specific circumstance broke when bash changed
> > > to issue a warning, even though that code had, up to that point,
> > > worked as expected.
> >
> > Memory is a tricky thing. This statement -- you've made it twice -- got me
> > wondering what you might be referring to, so I went digging.
> >
> >
> > > Specifically, it involved reading a value typically in the range
> > > 50 <= x <= 150 from an active file (like a value from /proc that varies
> > > based on OS internal values) where the data was stored in a
> > > quad, or Little-Endian DWORD value, so the value was in the
> > > 2 least significant bytes with the most significant bytes following
> > > (in a higher position) in memory, like:
> > > Byte# => 00 01 02 03, for value 100 decimal:
> > > hex   => 64 00 00 00
> > >
> > > The working code expected to see 0x64 followed by 0x00, which it
> > > used as a string terminator.
> > > Chet "fixed" this silent use of 0x00 as a string terminator to no longer
> > > ignore it, but have bash issue a warning message, which caused the
> > > "read < fn" to fail and return 0 instead of the ASCII character 'd', which
> > > the program had interpreted as the DPI value of the user's screen.
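The byte layout described above can be checked with a short Python sketch. It assumes an unsigned 32-bit little-endian value (`struct` format `"<I"`) and models the old behavior as "stop at the first NUL"; the variable names are illustrative only.

```python
import struct

# A value of 100 stored as an unsigned 32-bit little-endian integer ("DWORD").
raw = struct.pack("<I", 100)
print(raw.hex(" "))              # 64 00 00 00

# A reader that treats 0x00 as a string terminator sees only the first byte.
as_c_string = raw.split(b"\x00", 1)[0]
print(as_c_string)               # b'd'  (0x64 is ASCII 'd')
print(ord(as_c_string))          # 100

# Decoding all four bytes as a number recovers the same value directly.
(value,) = struct.unpack("<I", raw)
print(value)                     # 100
```

Decoding the bytes as a number, rather than relying on 0x00 acting as a string terminator, does not depend on how the shell treats NUL bytes.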
> >
> > So it seems like you've conflated two different things. The first is the
> > command substitution warning about dropping NULL bytes from 2016:
> >
> > https://lists.gnu.org/archive/html/bug-bash/2016-09/msg00015.html
> >
> > which I talked about a couple of days ago:
> >
> > https://lists.gnu.org/archive/html/bug-bash/2022-02/msg00054.html
> >
> > The second is a change from back in 2011 (bash-4.2 days) that changed bash
> > to drop NULL bytes in the read builtin:
> >
> > https://lists.gnu.org/archive/html/bug-bash/2011-11/msg00136.html
> >
> > One of my messages in that thread contains a quickie survey of other
> > shells' behavior here. The change is in line with what other shells do.
> >
> >
> > > It took some debugging and hack-arounds to find another way to access
> > > the data.  So what some might have called silent data corruption, because
> > > bash silently passed through the NUL-terminated datum as a string
> > > terminator, my program took as logical behavior.  I complained about
> > > the change,
> >
> > Where? Since this is the opposite of what happened in the command
> > substitution case, I'm assuming you mean the read change from 2011. You
> > didn't participate in the original discussion, and I'm just not inclined
> > to go digging around the archives for it.
> >
> > > remarking that if bash was going to sanitize returned values
> > > (in that case checking for what should have been an ASCII value, with NUL
> > > not being in the allowed set of string characters), then bash might
> > > also be saddled with checking for invalid Unicode sequences and warning
> > > about them as well, since, regardless of the source of the corruption,
> > > some programs might expect to get a raw byte sequence rather than some
> > > encoded form, with the difference in interpretation causing noticeable bugs.
> >
> > You might actually have said something like this at some point.
> >
> > I'd prefer to think your memory has conflated these two things, and that
> > this is how you remember it. That's better than the alternative.
> >
> > --
> > ``The lyf so short, the craft so long to lerne.'' - Chaucer
> >                  ``Ars longa, vita brevis'' - Hippocrates
> > Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/
> >


