help-gawk

Re: How to represent NBSP in gawk regex?


From: Eli Zaretskii
Subject: Re: How to represent NBSP in gawk regex?
Date: Mon, 21 Feb 2022 19:25:42 +0200

> Date: Mon, 21 Feb 2022 11:04:14 -0600 (CST)
> From: "Neil R. Ormos" <ormos-gnulists17@ormos.org>
> 
>   gawk 'BEGIN{print length("\xc2\xa0") }'
> 
> prints 1 in a UTF-8 locale, showing that Gawk recognizes the run of bytes as 
> a single character.

It isn't Gawk, it's the underlying C library, which is why the above
only works reliably in a locale whose codeset is UTF-8: the relevant
library routines change their behavior depending on the locale.

> Although both David's and Wolfgang's solutions work, I wonder if there is a 
> more portable way to represent the character that is not nailed-up for a 
> specific character set.

What do you mean by "character set" in this context?  That term is
overloaded, and it's easy to become confused unless we define
precisely what we mean.
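As an illustration of why the term needs pinning down: NBSP is a single character, Unicode code point U+00A0, but the bytes that represent it depend on the encoding (codeset). This sketch assumes the standard `od` and `iconv` utilities are available. It prints the UTF-8 bytes of NBSP and then the ISO-8859-1 bytes of the same character.

```shell
#!/bin/sh
# NBSP is code point U+00A0; only its byte representation varies.

# UTF-8 encodes U+00A0 as two bytes: c2 a0
printf '\302\240' | od -An -tx1

# ISO-8859-1 (Latin-1) encodes the same character as one byte: a0
printf '\302\240' | iconv -f UTF-8 -t ISO-8859-1 | od -An -tx1
```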

> Surely a UTF-256 is on the horizon.

I very much doubt that.  I think UTF-8 is here to stay for a very long
time.
