Re: Some byte combinations affect UTF-8 string reading
From: Olga Ustuzhanina
Subject: Re: Some byte combinations affect UTF-8 string reading
Date: Tue, 26 Feb 2019 05:42:08 +0700
On Mon, 25 Feb 2019 12:59:38 -0800
L A Walsh <bash@tlinx.org> wrote:
> In this case, the decode of \xc2 doesn't swallow the following
> character.
I want to clarify that \xc2 (and the other bytes in the range
mentioned above) can only swallow a following \0. Other characters
are unaffected.
>
> But in 4.4.12, using IFS='':
>
> ntc() { while IFS='' read -r input; do printf "$input;" ; done ; }
Looks like `-d ''` is necessary to get `read` to process anything:
$ ntc() { while IFS='' read -r input; do printf "$input;" ; done ; }
$ printf "\xc2\0\0\0\0" | ntc | xxd
$ ntc() { while IFS='' read -r -d '' input; do printf "$input;" ; done ; }
$ printf "\xc2\0\0\0\0" | ntc | xxd
00000000: c23b 3b3b .;;;
On bash 4.4.19 I get different output:
$ ntc() { while IFS='' read -r -d '' input; do printf "$input;" ; done ; }
$ printf "\xc2\0\0\0\0" | ntc | xxd
00000000: c23b 3b3b 3b .;;;;
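For comparing versions, it may be easier to count the records that `read -d ''` returns instead of inspecting the hex dump. The following is only a sketch: `count_nul_records` is a hypothetical helper name, it assumes bash (for `read -d`), and it deliberately avoids passing the input to `printf` as a format string. The count printed for the \xc2 input is expected to vary between bash versions, so no expected value is given here.

```shell
# Hypothetical helper (not from the original thread): count the
# NUL-delimited records that `read -d ''` sees on stdin.
count_nul_records() {
    local record count=0
    # IFS= and -r keep each record verbatim; -d '' makes NUL the delimiter.
    # A trailing record with no terminating NUL is not counted, because
    # `read` returns non-zero when it hits EOF before the delimiter.
    while IFS= read -r -d '' record; do
        count=$((count + 1))
    done
    printf '%s\n' "$count"
}

# Same input as the tests above: one \xc2 byte followed by four NULs.
printf '\xc2\0\0\0\0' | count_nul_records
```

Running the same line under each bash build (and, if relevant, under both a UTF-8 locale and LC_ALL=C) should make any swallowed \0 show up as a lower count.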