bug-bash

Re: UTF-8 printf string formating problem


From: Pádraig Brady
Subject: Re: UTF-8 printf string formating problem
Date: Mon, 07 Apr 2014 14:14:28 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 04/06/2014 12:56 PM, Dan Douglas wrote:
> On Sunday, April 06, 2014 01:24:58 PM Jan Novak wrote:
>> To solve this problem I suppose to add "wide" switch to printf
>> or to add "%S" format   (similarly to wprintf(3) )
> 
> ksh93 already has this feature using the "L" modifier:
> 
> ksh -c "printf '%.3Ls\n' $'\u2605\u2605\u2605\u2605\u2605'"
> ★★★
> bash -c "printf '%.3Ls\n' $'\u2605\u2605\u2605\u2605\u2605'"
>
> 
> Also, zsh does this by default with no special option. I tend to lean towards 
> going by character anyway because that's what most shell features such as 
> "read -N" do, and most work directly involving the shell is with text not 
> binary data.

So we can count bytes, chars or cells (graphemes).

Thinking about it a bit more, I think a shell-level printf
should treat its arguments as text in the current encoding and count cells.
In the edge case where you want to deal in bytes, you can do:
  LC_ALL=C printf ...
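
To make the byte-counting case concrete, here is a minimal sketch. The helper name first_bytes is hypothetical, and it assumes the string is UTF-8 encoded, where "★" (U+2605) occupies 3 bytes. With the C locale forced, the precision in %.Ns can only count bytes:

```shell
# Hypothetical helper (not part of any shell): print the first N bytes
# of a string by forcing the C locale for this one printf call.
first_bytes() {
    # $1 = number of bytes to keep, $2 = the string
    LC_ALL=C printf "%.${1}s\n" "$2"
}

# 'A' is 1 byte and each '★' is 3 bytes in UTF-8,
# so 4 bytes cover exactly "A" plus one star.
first_bytes 4 'A★★★'   # prints A★
```

Note that a byte count that lands in the middle of a multibyte character (for example 3 here) would emit a truncated, invalid sequence, which is why this is only suitable for the edge case of binary or byte-oriented data.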

I see that ksh behaves as I would expect and counts cells,
though it requires the explicit L modifier:
  $ ksh -c "printf '%.3Ls\n' $'a\u0301\u2605\u2605\u2605'"
  á★★
  $ ksh -c "printf '%.3Ls\n' $'A\u2605\u2605\u2605'"
  A★
  $ ksh -c "printf '%.3Ls\n' $'AA\u2605\u2605\u2605'"
  A

zsh seems to just count characters:
  $ zsh -c "printf '%.3Ls\n' $'a\u0301\u2605\u2605\u2605'"
  á★
  $ zsh -c "printf '%.3s\n' $'a\u0301\u2605\u2605\u2605'"
  á★
  $ zsh -c "printf '%.3Ls\n' $'A\u2605\u2605\u2605'"
  A★★

GNU awk seems to just count characters:
  $ awk 'BEGIN{printf "%.3s\n", "A★★★"}'
  A★★

I see that dash reports an invalid directive for each of %ls, %Ls and %S.

Pity there is no consensus here.
Personally I would go for:
  printf '%3s' 'blah'          # count cells
  printf '%3Ls' 'blah'         # count chars
  LANG=C printf '%3Ls' 'blah'  # count bytes
  LANG=C printf '%3s' 'blah'   # count bytes

Pádraig.


