Re: Locale not Obeyed by Parameter Expansion with Pattern Substitution
From: Stephane Chazelas
Subject: Re: Locale not Obeyed by Parameter Expansion with Pattern Substitution
Date: Mon, 18 Nov 2019 20:46:26 +0000
User-agent: NeoMutt/20171215
2019-11-17 01:25:31 -0800, Chris Carlen:
[...]
> # write 'REVERSE PILCROW SIGN' to B, then repeat as above:
> printf -v B '\u204B'
> set -- ${B//?()/ }
> echo "${@@Q}" #-> $'\342' $'\201' $'\213'
>
> # NOTE: Since there is only one character (under the UTF-8 locale),
> # this should have set only the first positional parameter with the
> # character REVERSE PILCROW SIGN, not split it into bytes (AFAIK).
[...]
Yes, the question is where to resume searching after a match of
an empty string in ${var//pattern/replacement}.
Note that it's even worse in ksh93, from which bash copied that
syntax:
$ A=$'\u2048\u2048' ksh93 -c 'printf "%q\n" "${A//?()/:}"'
$':\u[2048]:\x81:\x88:\u[2048]:\x81:\x88:'
(here with ksh93u+)
Then there's the question of what
${B/$'\201'/}
should do. Should that $'\201' match the middle byte (0x81) of the
UTF-8 encoding of U+204B (\342 \201 \213)?
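To make the overlap concrete, here is a minimal sketch that dumps the
UTF-8 bytes of the two characters used in this thread. The helper name
dump is only illustrative, and the characters are written as octal
escapes so the result does not depend on the current locale:

```shell
# Print the bytes produced by a printf format string as plain hex.
# "dump" is a hypothetical helper name, not a standard utility.
dump() { printf "$1" | od -An -tx1 | tr -d ' \t\n'; }

hex_2048=$(dump '\342\201\210')   # U+2048 QUESTION EXCLAMATION MARK
hex_204b=$(dump '\342\201\213')   # U+204B REVERSED PILCROW SIGN

echo "U+2048: $hex_2048"
echo "U+204B: $hex_204b"
```

Both encodings contain 0x81 (octal 201) as their second byte, which is
why a pattern consisting of that lone byte can end up matching inside
a character.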
It seems to me that zsh's approach is best:
$ A=$'\u2048\201\u2048' zsh -c "printf '%q\n' \"\${A//$'\201'/:}\""
⁈:⁈
That is, replace that \201 byte except where it is part of a
properly encoded character.
Compare with:
$ A=$'\u2048\201\u2048' bash -c "printf '%q\n' \"\${A//$'\201'/:}\""
$'\342:\210:\342:\210'
$ A=$'\u2048\201\u2048' ksh93 -c "printf '%q\n' \"\${A//$'\201'/:}\""
$'\u[2048]:\x88:\u[2048]:\x88'
(or with yash, which can't deal with that \201 byte at all, as the
byte on its own can't form a valid character).
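For reference, the bash result above is what plain byte-wise matching
would give. A rough sketch, assuming a bash binary on PATH and using
LC_ALL=C to force single-byte semantics (the variable names A, stray
and result are only illustrative):

```shell
# U+2048, a stray 0x81 byte, then U+2048 again, built from octal escapes.
A=$(printf '\342\201\210\201\342\201\210')
stray=$(printf '\201')
export A stray

# In the C locale every byte is its own character, so each 0x81 is
# replaced, including the ones inside the two encoded characters.
result=$(LC_ALL=C bash -c 'printf %s "${A//$stray/:}"' | od -An -tx1 | tr -d ' \t\n')
echo "$result"
```

The hex dump shows a ":" (0x3a) wherever a 0x81 byte used to be, i.e.
the same bytes as the $'\342:\210:\342:\210' that bash printed above.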
--
Stephane