help-gawk

Re: How to make gawk aware of unicode characters?


From: Wolfgang Laun
Subject: Re: How to make gawk aware of unicode characters?
Date: Sat, 7 Jan 2023 07:22:31 +0100

$ cat uring.awk
BEGIN { print "Jenůfa";  uring = sprintf( "%c", 0x016F); }
/ů/ { print "literal: " $1; }
$1 ~ uring { print "variable: " $1; }

$ gawk -f uring.awk  </dev/null | gawk -f uring.awk
Jenůfa
literal: Jenůfa
variable: Jenůfa

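Apparently it is not possible to create a string or regex for this character
using "\x016F": gawk reads at most two hex digits after "\x", so that escape
yields the byte 0x01 followed by the literal text "6F". This is an annoying
gap in the handling of Unicode characters. One can, however, copy Unicode
characters from a web page and paste them in as literals, as in /ů/ above.
(I'm using gawk 5.1.0 on Linux.)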

Wolfgang

On Fri, 6 Jan 2023 at 17:06, Peng Yu <pengyu.ut@gmail.com> wrote:

> I am not sure how to do it specifically after reading that chapter.
> Could you please provide some working code for my simple example?
>
> On 1/6/23, david kerns <david.t.kerns@gmail.com> wrote:
> > see chapter 13  - https://www.gnu.org/software/gawk/manual/gawk.html
> >
> > On Fri, Jan 6, 2023 at 7:56 AM Peng Yu <pengyu.ut@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I use the following code to match no-break space.
> >>
> >> $ awk -e '/§\xc2\xa03/ { print }' <<< '§ 3'
> >> § 3
> >>
> >> However, the Unicode code point is U+00A0, not \xc2\xa0 (which is its
> >> UTF-8 encoding). Obviously, gawk
> >> treats the input as a stream of bytes instead of Unicode characters.
> >> Is there a way to let gawk be aware of Unicode characters so that I
> >> can write something like \u00a0 as in many other languages?
> >>
> >> https://www.utf8-chartable.de/unicode-utf8-table.pl?names=2&utf8=string-literal&unicodeinhtml=hex
> >>
> >> --
> >> Regards,
> >> Peng
> >>
> >>
> >
>
>
> --
> Regards,
> Peng
>
>

-- 
Wolfgang Laun

