bug#12291: [rev 109796] wrong UTF-8 handling

bug-gnu-emacs

From:	Kenichi Handa
Subject:	bug#12291: [rev 109796] wrong UTF-8 handling
Date:	Mon, 03 Sep 2012 09:59:22 +0900

In article <83bohrqr83.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > Date: Tue, 28 Aug 2012 21:22:26 +0200 (CEST)
> > From: Werner LEMBERG <wl@gnu.org>
> > Cc: 12291@debbugs.gnu.org, smithcu@gvsu.edu
> > 
> > > I think the correct behaviour on reading such a file by utf-8 is to
> > > treat each byte as raw-byte.
> > 
> > Maybe.  I'm not sure how Emacs should behave in reading such files.

> We can either read them as raw bytes, or convert them to U+FFFD.  The
> former sounds like a more useful behavior to me, FWIW.

What would be converted to U+FFFD?  Each byte, or the whole
byte sequence?

Anyway, we can't simply convert them to U+FFFD, because that
would change the file contents just by reading and writing
the file.  We could add post-read-conversion and
pre-write-conversion functions to the utf-8 coding system:
the former would perform the conversion and attach text
properties recording the original bytes, and the latter
would revert the conversion using those text properties.
But is it worth doing that?

I think converting each invalid byte to raw-byte is simpler
and equally useful.
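
The round-trip concern can be illustrated with a small sketch.
It is written in Python, not Emacs Lisp, and only stands in
for the two behaviours under discussion: Python's "replace"
error handler plays the role of converting to U+FFFD, and its
"surrogateescape" handler plays the role of keeping each
invalid byte as a raw byte.  The sample bytes and variable
names are made up for illustration; none of this is Emacs's
actual decoder.

```python
# Sample input: valid UTF-8 ("café") followed by two bytes (0xFF, 0xFE)
# that can never occur in well-formed UTF-8. Illustrative data only.
data = b"caf\xc3\xa9 \xff\xfe end"

# Behaviour 1: replace every undecodable byte with U+FFFD.
replaced = data.decode("utf-8", errors="replace")
roundtrip_replaced = replaced.encode("utf-8")

# Behaviour 2: keep every undecodable byte as its own placeholder
# (a lone surrogate in Python) so it can be written back unchanged.
escaped = data.decode("utf-8", errors="surrogateescape")
roundtrip_escaped = escaped.encode("utf-8", errors="surrogateescape")

# Reading and writing with U+FFFD alters the file contents;
# the per-byte placeholder approach preserves them exactly.
lossy = roundtrip_replaced != data
lossless = roundtrip_escaped == data

print("U+FFFD round trip changes bytes:", lossy)
print("raw-byte round trip preserves bytes:", lossless)
```

In this sketch the original bytes cannot be recovered from
the U+FFFD version without extra bookkeeping, which is the
job the text properties would have to do.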

---
Kenichi Handa
handa@gnu.org
