Re: Some byte combinations affect UTF-8 string reading

bug-bash

From:	Olga Ustuzhanina
Subject:	Re: Some byte combinations affect UTF-8 string reading
Date:	Tue, 26 Feb 2019 05:42:08 +0700

On Mon, 25 Feb 2019 12:59:38 -0800
L A Walsh <bash@tlinx.org> wrote:

> In this case, the decode of \xc2 doesn't swallow the following
> character.

I want to clarify that \xc2 (and the other bytes in the range
mentioned above) can only swallow a following \0. Any other
following byte is unaffected.
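A minimal sketch for checking this on a given build. It assumes bash and a UTF-8 locale (for example, running it as `LC_ALL=en_US.UTF-8 bash check.sh`; the locale name, the file name and the `count_records` helper are my own choices, not from this thread). It counts the NUL-terminated records that `read -d ''` returns when \xc2 is followed by NULs versus by an ordinary byte:

```shell
#!/usr/bin/env bash
# Sketch only. Assumption: run under a UTF-8 locale, since the problem
# was reported for UTF-8 reading. count_records is a hypothetical helper.

# Print how many NUL-terminated records `read -d ''` returns from stdin.
count_records() {
  local rec n=0
  while IFS= read -r -d '' rec; do
    n=$((n + 1))
  done
  printf '%d\n' "$n"
}

# \xc2 followed by four NULs: four records if no NUL is swallowed.
printf 'c2, 4 NULs:     '; printf '\xc2\0\0\0\0' | count_records
# \xc2 followed by a non-NUL byte, then three NULs: three records.
printf 'c2, x, 3 NULs:  '; printf '\xc2x\0\0\0' | count_records
# ASCII baseline: four records.
printf 'a, 4 NULs:      '; printf 'a\0\0\0\0' | count_records
```

An unaffected bash should report 4, 3 and 4. A build with the problem described here would report one record fewer in the first case only.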

> 
> But in 4.4.12, using IFS='':
> 
> ntc() {  while IFS='' read -r input; do printf "$input;" ; done ; }

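Looks like `-d ''` is necessary to get `read` to process anything.
Without it, `read` waits for a newline, the input contains none, so
`read` fails at end of input and the loop body never runs: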

$ ntc() {  while IFS='' read -r  input; do printf "$input;" ; done ; }
$ printf "\xc2\0\0\0\0" | ntc | xxd

$ ntc() {  while IFS='' read -r -d '' input; do printf "$input;" ; done ; }
$ printf "\xc2\0\0\0\0" | ntc | xxd
00000000: c23b 3b3b                                .;;;

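The output above has only three `;` for four NUL-terminated records,
so the \0 right after \xc2 was swallowed. On bash 4.4.19 I get
different output, with all four `;` present: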

$ ntc() {  while IFS='' read -r -d ''  input; do printf "$input;" ; done ; }
$ printf "\xc2\0\0\0\0" | ntc | xxd
00000000: c23b 3b3b 3b                             .;;;;
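To compare versions without depending on `xxd`, here is a sketch of the same test that prints the bash version next to the bytes produced. It assumes `od` is available (it is specified by POSIX), and it changes `ntc` to pass each record to `printf '%s;'` instead of using it as the format string, so a `%` or `\` in the data cannot alter the output:

```shell
#!/usr/bin/env bash
# Sketch only: same ntc test as above, with the record passed as an
# argument rather than as the printf format string.
ntc() {
  local input
  while IFS='' read -r -d '' input; do
    printf '%s;' "$input"
  done
}

printf 'bash %s, LC_ALL=%s\n' "$BASH_VERSION" "${LC_ALL-unset}"
# od -An -tx1 prints the bytes in hex without offsets (POSIX od).
printf '\xc2\0\0\0\0' | ntc | od -An -tx1
```

On a bash that keeps every record, the last line should show the bytes `c2 3b 3b 3b 3b`.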

