bug-bash
Re: Corrupted multibyte characters in command substitutions fixes may be


From: Alex fxmbsw7 Ratchev
Subject: Re: Corrupted multibyte characters in command substitutions fixes may be worse than problem.
Date: Sun, 6 Feb 2022 23:19:23 +0100

A replacement sequence for null bytes would, to me, be a solution to the
null byte problem. No, I didn't understand all the posts in this thread; I am
just concerned with null bytes not being dropped.
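
For illustration, one workaround that keeps null bytes through a command
substitution is to encode the data before the shell captures it. This is a
minimal sketch, assuming od and tr are available; encode_hex and decode_hex
are names made up for this example and are not part of bash:

```bash
#!/usr/bin/env bash
# Hypothetical helpers for this sketch; they are not part of bash itself.

# Encode stdin as a string of hex digits, two per input byte.
encode_hex() {
    od -An -v -tx1 | tr -d '[:space:]'
}

# Write the bytes described by a hex string to stdout.
decode_hex() {
    local hex=$1 i
    for ((i = 0; i < ${#hex}; i += 2)); do
        printf "\\x${hex:i:2}"
    done
}

# Direct capture: the shell drops the null byte (newer bash versions also
# print a warning, which is silenced here).
{ plain=$(printf 'a\0b'); } 2>/dev/null

# Encoded capture: the null byte survives as the digits "00".
hex=$(printf 'a\0b' | encode_hex)

echo "plain length: ${#plain}"
echo "hex: $hex"

# Decoding to a pipe or file restores the original bytes, null included.
decode_hex "$hex" | od -An -v -tx1
```

The decoded bytes can only go to a pipe or a file; a shell variable still
cannot hold the null byte itself, which is why the data stays hex-encoded
while it lives in a variable.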

On Sun, Feb 6, 2022 at 11:16 PM Alex fxmbsw7 Ratchev <fxmbsw7@gmail.com> wrote:
>
> I'm sorry, I didn't realize this would just add a prefix to the null byte,
> which still uses a null byte, so it won't work.
> Cheers
>
> On Sun, Feb 6, 2022 at 11:11 PM Alex fxmbsw7 Ratchev <fxmbsw7@gmail.com> wrote:
> >
> > I just have a small question here.
> > I'm no friend of the dropping of null bytes. I understand the code
> > skips them instead of processing them, which means the null bytes are
> > gone, and that is not of much use.
> >
> > Can't these \0 bytes, at least when a UTF-8 locale is used, be encoded as
> > \u0 instead of being dropped? <the two UTF-8 bytes> and a null, ... just
> > prefix the UTF-8 encoding bytes to the null,
> > and they'd maybe safely still be there.
> >
> > Just asking.
> >
> > On Sun, Feb 6, 2022 at 6:38 PM Chet Ramey <chet.ramey@case.edu> wrote:
> > >
> > > On 2/5/22 9:41 PM, L A Walsh wrote:
> > >
> > > > That's debatable, BTW. I was reminded of a similar case where bash
> > > > passed through what one might call 'invalid input' w/o warning, and
> > > > code relied on that in a specific circumstance. A later change made
> > > > bash issue a warning, which broke code that, up to that
> > > > point, had worked as expected.
> > >
> > > Memory is a tricky thing. This statement -- you've made it twice -- got me
> > > wondering what you might be referring to, so I went digging.
> > >
> > >
> > > > Specifically, it involved reading a value typically in the range
> > > > 50 <= x <= 150 from an active file (like a value from /proc that varies
> > > > based on OS internal values) where the data was stored in a
> > > > quad, or little-endian DWORD value, so the value was in the
> > > > 2 least significant bytes with the most significant bytes following
> > > > (in a higher position) in memory, like:
> > > > Byte# => 00 01 02 03, for value 100 decimal:
> > > > hex   => 64 00 00 00
> > > >
> > > > The working code expected to see 0x64 followed by 0x00 which it
> > > > used as string terminator.
> > > >
> > > > Chet "fixed" this silent use of 0x00 as a string terminator to no longer
> > > > ignore it, but have bash issue a warning message, which caused the
> > > > "read < fn" to fail and return 0 instead of the ASCII character 'd',
> > > > which the program had interpreted as the DPI value of the user's screen.
> > >
> > > So it seems like you've conflated two different things. The first is the
> > > command substitution warning about dropping NULL bytes from 2016:
> > >
> > > https://lists.gnu.org/archive/html/bug-bash/2016-09/msg00015.html
> > >
> > > which I talked about a couple of days ago:
> > >
> > > https://lists.gnu.org/archive/html/bug-bash/2022-02/msg00054.html
> > >
> > > The second is a change from back in 2011 (bash-4.2 days) that changed bash
> > > to drop NULL bytes in the read builtin:
> > >
> > > https://lists.gnu.org/archive/html/bug-bash/2011-11/msg00136.html
> > >
> > > One of my messages in that thread contains a quickie survey of other
> > > shells' behavior here. The change is in line with what other shells do.
> > >
> > >
> > > > It took some debugging and hack arounds to find another way to access
> > > > the data.  So what some might have called silent data corruption because
> > > > bash silently passed through the nul terminated datum as a string
> > > > terminator, my program took as logical behavior.  I complained about
> > > > the change,
> > >
> > > Where? Since this is the opposite of what happened in the command
> > > substitution case, I'm assuming you mean the read change from 2011. You
> > > didn't participate in the original discussion, and I'm just not inclined
> > > to go digging around the archives for it.
> > >
> > > > remarking that if bash was going to sanitize returned values
> > > > (in that case checking for what should have been an ASCII value, with NUL
> > > > not being among the allowed string characters), then bash might
> > > > also be saddled with checking for invalid Unicode sequences and warning
> > > > about them as well, regardless of the source of the corruption. Some
> > > > programs might expect to get a raw byte sequence rather than some encoded
> > > > form, with the difference in interpretation causing noticeable bugs.
> > >
> > > You might actually have said something like this at some point.
> > >
> > > I'd prefer to think your memory has conflated these two things, and that
> > > this is how you remember it. That's better than the alternative.
> > >
> > > --
> > > ``The lyf so short, the craft so long to lerne.'' - Chaucer
> > >                  ``Ars longa, vita brevis'' - Hippocrates
> > > Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/
> > >


