From: Chet Ramey
Subject: Re: Corrupted multibyte characters in command substitutions fixes may be worse than problem.
Date: Sun, 6 Feb 2022 12:38:02 -0500
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.4.1

On 2/5/22 9:41 PM, L A Walsh wrote:

> That's debatable, BTW, as I was reminded of a similar
> passthrough of what one might call 'invalid input' w/o warning,
> resulting in code that worked in a specific circumstance to a change
> in bash issuing a warning that resulted in breaking code, that, at that
> point, worked as expected.

Memory is a tricky thing. This statement -- you've made it twice -- got me
wondering what you might be referring to, so I went digging.

> Specifically, it involved reading a value typically in the range
> 50 <= x <= 150 from an active file (like a value from /proc that varies
> based on OS internal values) where the data was stored in a
> quad, or Little-Endian DWORD value, so the value was in the
> 2 least significant bytes with the most significant bytes following
> (in a higher position) in memory, like:
> Byte# => 00 01 02 03, for value 100 decimal:
> hex   => 64 00 00 00
>
> The working code expected to see 0x64 followed by 0x00 which it
> used as string terminator.
>
> Chet "fixed" this silent use of 0x00 as a string terminator to no longer
> ignore it, but have bash issue a warning message, which caused the
> "read < fn" to fail and return 0 instead of the ascii character 'd', which
> the program had interpreted as the DPI value of the user's screen.
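
For illustration only (this is not from the original thread): a minimal sketch of decoding such a 4-byte little-endian value without depending on how the shell treats 0x00 inside a string. The temporary file is a hypothetical stand-in for the real data source, which the message does not name. od extracts the bytes as numbers, and shell arithmetic assembles them least significant byte first.

```shell
# Hypothetical stand-in for the real file: the bytes 64 00 00 00 (100 decimal).
tmp=$(mktemp)
printf '\144\000\000\000' > "$tmp"

# Dump the first four bytes as unsigned decimal numbers, one per byte.
# Unquoted on purpose: word splitting puts each byte into $1..$4.
set -- $(od -An -tu1 -N4 "$tmp")

# Little-endian layout: byte 0 is the least significant.
value=$(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
echo "$value"

rm -f "$tmp"
```

Because the bytes are handled as numbers rather than as string data, the result does not change with the shell's NUL handling.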

So it seems like you've conflated two different things. The first is the
command substitution warning about dropping NULL bytes from 2016:


which I talked about a couple of days ago:


The second is a change from back in 2011 (bash-4.2 days) that changed bash
to drop NULL bytes in the read builtin:


One of my messages in that thread contains a quickie survey of other
shells' behavior here. The change is in line with what other shells do.
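
A small sketch of the two behaviors being distinguished, assuming a reasonably recent bash (at least the versions mentioned above). The sample file and variable names are hypothetical. In both cases the NUL bytes do not end up in the variable; the command substitution case is the one that produces a warning.

```shell
# Hypothetical sample data: 'd' (0x64) followed by three NUL bytes and a newline.
tmp=$(mktemp)
printf 'd\000\000\000\n' > "$tmp"

# read builtin: the NUL bytes are dropped, so the variable holds just "d".
IFS= read -r from_read < "$tmp"

# Command substitution: the NUL bytes are dropped as well. The warning is
# written by the shell itself, so it is discarded via the brace group.
{ from_subst=$(cat "$tmp"); } 2>/dev/null

printf 'read: %s\nsubst: %s\n' "$from_read" "$from_subst"

rm -f "$tmp"
```

Dropping the `2>/dev/null` makes the warning visible on stderr, which is a quick way to tell which of the two code paths a script is hitting.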

> It took some debugging and hack arounds to find another way to access
> the data.  So what some might have called silent data corruption because
> bash silently passed through the nul terminated datum as a string
> terminator, my program took as logical behavior.  I complained about
> the change,

Where? Since this is the opposite of what happened in the command
substitution case, I'm assuming you mean the read change from 2011. You
didn't participate in the original discussion, and I'm just not inclined
to go digging around the archives for it.

> remarking that if bash was going to sanitize returned values
> (in that case checking for what should have been an ascii value with NUL
> not being in the allowed value of string characters), that bash might
> also be saddled with checking for invalid Unicode sequences and warning about
> them as well, regardless of the source of the corruption, some programs
> might expect to get a raw byte sequence rather than some encoded form
> with the difference in interpretation causing noticeable bugs.

You might actually have said something like this at some point.

I'd prefer to think your memory has conflated these two things, and that
this is how you remember it. That's better than the alternative.

``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/
