bug-gnu-emacs

bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE


From: Eli Zaretskii
Subject: bug#40407: [PATCH] slow ENCODE_FILE and DECODE_FILE
Date: Sun, 05 Apr 2020 16:28:13 +0300

> From: Mattias Engdegård <mattiase@acm.org>
> Date: Sun, 5 Apr 2020 12:14:59 +0200
> Cc: 40407@debbugs.gnu.org
> 
> > I think in the use case where we return a copy, we should make sure
> > the return value is unibyte when encoding and multibyte when decoding.
> 
> I'm not necessarily opposed to the suggestion, but why not return a unibyte 
> string in both cases, simplifying the code?

For compatibility with what happens now:

  (multibyte-string-p (decode-coding-string "abc" 'utf-8)) => t

> In addition, some operations (aref) are faster on unibyte. Either way, it's 
> nothing that a caller could rely on, is there? (In particular when taking 
> NOCOPY into account.)

That is true, of course, but many/most of our strings are multibyte
nowadays, even if they are ASCII.  Suddenly getting a unibyte string
instead would be surprising, I think, even if no one should depend on
it not happening.  (NOCOPY case is different: then it's the caller's
responsibility to deal with the issue.)  So I'd rather we produced a
multibyte string when "decoding" by copying.

> +/* Whether a (unibyte) string only contains chars in the 0..127 range.  */

One subtle point regarding this comment: I'd remove the "unibyte"
part, because (1) you apply this test to multibyte strings as well,
and (2) strings encoded in iso-2022 will look "pure-ASCII", but they
aren't.  The latter subtlety doesn't interfere with the caller,
because iso-2022 is not ASCII-compatible, but it's something I'd
mention in the comment, lest someone use this function for some
other use case.
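
To illustrate the point about the comment, here is a minimal sketch,
not the code from the patch.  The name bytes_all_7bit_p and the sample
byte arrays are hypothetical, made up for this example.  It shows that
a plain "all bytes in 0..127" test also accepts bytes laid out the way
ISO-2022-JP lays them out (escape sequences plus 7-bit two-byte
codes), so the test alone cannot tell such text from real ASCII.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if every byte in S[0..LEN-1] is in the range 0..127.
   This works on raw bytes, so it does not care whether the string is
   unibyte or multibyte.  Note that a true result does not prove the
   text is ASCII: output of a 7-bit stateful encoding such as
   ISO-2022-JP also passes, although it stands for non-ASCII
   characters.  (Hypothetical helper, not the function in the patch.)  */
bool
bytes_all_7bit_p (const unsigned char *s, size_t len)
{
  assert (s != NULL || len == 0);
  for (size_t i = 0; i < len; i++)
    if (s[i] & 0x80)
      return false;
  return true;
}

/* Plain ASCII text "abc".  */
const unsigned char plain_ascii[] = { 'a', 'b', 'c' };

/* Bytes in the shape ISO-2022-JP uses: ESC $ B switches to a two-byte
   Japanese character set, two 7-bit byte pairs follow, and ESC ( B
   switches back to ASCII.  Every byte is below 128.  */
const unsigned char iso2022jp_sample[] = {
  0x1B, '$', 'B', 0x46, 0x7C, 0x4B, 0x5C, 0x1B, '(', 'B'
};

/* UTF-8 encoding of U+00E9; both bytes have the high bit set.  */
const unsigned char utf8_sample[] = { 0xC3, 0xA9 };
```

With these samples the helper returns true for both plain_ascii and
iso2022jp_sample, and false for utf8_sample, which is why the comment
should warn against reusing such a test outside callers that deal
only with ASCII-compatible encodings.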

The patch is OK otherwise.  Thanks.




