Re: Some byte combinations affect UTF-8 string reading

bug-bash

From:	Chet Ramey
Subject:	Re: Some byte combinations affect UTF-8 string reading
Date:	Tue, 26 Feb 2019 14:57:29 -0500
User-agent:	Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Thunderbird/60.5.1

On 2/25/19 5:42 PM, Olga Ustuzhanina wrote:
> On Mon, 25 Feb 2019 12:59:38 -0800
> L A Walsh <bash@tlinx.org> wrote:
> 
>> In this case, the decode of \xc2 doesn't swallow the following
>> character.
> 
> I want to clarify that \xc2 (and other characters in the range
> mentioned above) can only swallow a \0. Other characters are
> unaffected.

The other characters wouldn't be treated as a delimiter either. The \0
is `swallowed' because it's the C string terminator.

The \0 gets added to the input string but is not treated as a delimiter,
since it's part of the invalid multibyte sequence. Then the next character
is read; that \0 is treated as a delimiter, and the input string, including
the embedded \0, is assigned to the variable. The embedded \0 then acts as
a normal C string terminator, since variable values can't contain NULs.

(This is why read discards \0 unless it's a delimiter. It would terminate
the value assigned to the variable.)
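
A minimal sketch of the NUL handling described above, assuming bash and
plain ASCII input (so no multibyte decoding is involved). The variable
names joined, first, line and field are illustrative only. The first
command leaves the delimiter at newline, so read should drop the \0; the
second passes -d '' to make \0 the delimiter, so read should stop there.

```shell
#!/usr/bin/env bash
# Delimiter is newline here, so read silently discards the embedded NUL
# and the two letters end up adjacent in the variable.
joined=$(printf 'a\0b\n' | { read -r line; printf '%s' "$line"; })

# With -d '' the delimiter is NUL, so read stops at the first \0 and
# only the text before it is assigned.
first=$(printf 'a\0b\0' | { read -r -d '' field; printf '%s' "$field"; })

printf 'joined=%s first=%s\n' "$joined" "$first"
```

With a bash that behaves as described, this should print
"joined=ab first=a". It does not exercise the \xc2 case from this thread,
whose result depends on the bash version and locale.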

Bash-4.4 returned different results because it didn't attempt to validate
multibyte characters as it read them unless it was reading a fixed number
of characters.

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/

Current Thread

Some byte combinations affect UTF-8 string reading, Olga Ustuzhanina, 2019/02/25
- Re: Some byte combinations affect UTF-8 string reading, Chet Ramey, 2019/02/25
  - Re: Some byte combinations affect UTF-8 string reading, L A Walsh, 2019/02/25
    - Re: Some byte combinations affect UTF-8 string reading, Olga Ustuzhanina, 2019/02/25
    - Re: Some byte combinations affect UTF-8 string reading, Chet Ramey <=
    - Re: Some byte combinations affect UTF-8 string reading, Grisha Levit, 2019/02/25
