Minor utf32-to-utf8 bug

bug-bash

From:	István Pásztor
Subject:	Minor utf32-to-utf8 bug
Date:	Sun, 10 Nov 2019 14:07:49 +0000

Hi

The encoding of six-byte UTF-8 sequences is buggy. Unicode today requires
UTF-8 sequences of at most 4 bytes, but if we handle 5- and 6-byte sequences
too, then let's do it the right way.

The attached patch was created using a fresh master clone
(d894cfd104086ddf68c286e67a5fb2e02eb43b7b).

I'm writing a tool (pxargs, an xargs variant) that can accept input strings
in shell-quoted format, and I used the bash manual and source code as
references. I haven't compiled the latest bash sources to confirm the bug,
but it is likely to affect ANSI-C quoted strings like $'\U7fffffff'.
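
For reference, here is a minimal sketch of how the original (pre-RFC 3629) UTF-8 scheme lays out sequences of 1 to 6 bytes. It is not the bash code and not the attached patch; the function name utf32_to_utf8_sketch is hypothetical, and the sketch assumes the caller provides a buffer of at least 6 bytes. It does not reject surrogates or values above U+10FFFF, since the point is only to show the 5- and 6-byte forms.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not bash's function): encode a 31-bit value using
 * the original UTF-8 scheme, which allows sequences of up to 6 bytes.
 * Writes the bytes to out (at least 6 bytes) and returns the length,
 * or 0 if wc does not fit in 31 bits.
 */
size_t utf32_to_utf8_sketch(uint32_t wc, unsigned char *out)
{
    /* Lead-byte prefix for each sequence length (index = length). */
    static const unsigned char lead[] = {
        0x00, 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC
    };
    size_t len, i;

    if (wc < 0x80) {
        out[0] = (unsigned char)wc;     /* plain ASCII, one byte */
        return 1;
    } else if (wc < 0x800) {
        len = 2;
    } else if (wc < 0x10000) {
        len = 3;
    } else if (wc < 0x200000) {
        len = 4;
    } else if (wc < 0x4000000) {
        len = 5;
    } else if (wc < 0x80000000u) {
        len = 6;
    } else {
        return 0;                       /* more than 31 bits */
    }

    /* Fill continuation bytes from the end: 10xxxxxx, 6 bits each. */
    for (i = len - 1; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (wc & 0x3F));
        wc >>= 6;
    }

    /* The remaining high bits go into the lead byte after its prefix. */
    out[0] = (unsigned char)(lead[len] | wc);
    return len;
}
```

With this layout, 0x7fffffff encodes as the six bytes FD BF BF BF BF BF: the lead byte carries the 0xFC prefix plus the top bit, and each of the five continuation bytes carries six set bits.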

Best Regards,
Istvan Pasztor

six_bytes_long_utf8_sequences_d894cfd104086ddf68c286e67a5fb2e02eb43b7b.patch
Description: Binary data

Current Thread

Minor utf32-to-utf8 bug, István Pásztor <=
- Re: Minor utf32-to-utf8 bug, Chet Ramey, 2019/11/11