bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
From: Lars Ingebrigtsen
Subject: bug#48324: 27.2; hexl-mode duplicates the UTF-8 BOM
Date: Mon, 04 Jul 2022 12:34:29 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)
Eli Zaretskii <eliz@gnu.org> writes:
> I see that it's actually 6 bytes _including_ the BOM. So I think this
> is confusing: if we are going to return a string with the BOM, we
> should not count the BOM as part of the LENGTH bytes. Because if I
> requested to get characters which fit into N bytes, I should get those
> N bytes of payload. Or maybe we should have an optional argument to
> control whether LENGTH includes or excludes the BOM.
If the caller has asked for a max number of bytes in a coding system
that includes a BOM, then the BOM has to be counted -- otherwise the
bytes won't fit into whatever field the protocol they're using limits
the string to.
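
To make the counting concrete, here is a rough sketch in Python (not Emacs
Lisp, and not the function under discussion). The helper name
`limit_encoded` is made up for illustration, and Python's "utf-16" codec
stands in for a coding system that always emits a BOM:

```python
def limit_encoded(text: str, max_bytes: int, encoding: str = "utf-16") -> bytes:
    """Encode the longest prefix of `text` that fits in `max_bytes`.

    The BOM, if the codec emits one, is counted against the limit,
    because it occupies space in the destination field like any other byte.
    Hypothetical helper, for illustration only.
    """
    # Try ever shorter prefixes until the encoded form fits.
    for end in range(len(text), -1, -1):
        encoded = text[:end].encode(encoding)
        if len(encoded) <= max_bytes:
            return encoded
    # Not even the bare BOM fits in the field.
    return b""


result = limit_encoded("abcdef", 6)
# 2 bytes of BOM plus 2 characters at 2 bytes each fill the 6-byte field.
print(len(result), result.decode("utf-16"))
```

With a 6-byte limit, only two characters of payload survive, which is the
behaviour described in the quoted text: 6 bytes _including_ the BOM.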
However, utf-16 is in a slightly special situation here, since the byte
order is often implied, and people use utf-16 instead of
utf-16be-with-signature (or something), and utf-16 (in Emacs) is defined
to have a BOM. (And we don't have a -without-signature variant, do we?)
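
As an illustration of the distinction (again in Python, whose codec names
and defaults are not the same as Emacs's coding systems), the generic
"utf-16" codec writes a BOM while the explicit-endianness codecs do not.
The helper `has_bom` is a made-up name for this sketch:

```python
import codecs


def has_bom(encoding: str) -> bool:
    """Report whether encoding a one-character string starts with a UTF-16 BOM.

    Hypothetical helper, for illustration only.
    """
    data = "a".encode(encoding)
    return data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE))


for name in ("utf-16", "utf-16-be", "utf-16-le"):
    # Print the codec name, whether a BOM is present, and the encoded size.
    print(name, has_bom(name), len("a".encode(name)))
```

The two-byte size difference between the variants is exactly the BOM that
a byte limit would have to account for.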
> In any case, we should mention this aspect in the doc string, I think.
Yes. But should we have -without-signature variants for utf-16? Then
the doc string could recommend using that if the caller wants BOM-less
bytes.
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no