bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#12291: [rev 109796] wrong UTF-8 handling


From: Kenichi Handa
Subject: bug#12291: [rev 109796] wrong UTF-8 handling
Date: Mon, 03 Sep 2012 09:59:22 +0900

In article <83bohrqr83.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:

> > Date: Tue, 28 Aug 2012 21:22:26 +0200 (CEST)
> > From: Werner LEMBERG <wl@gnu.org>
> > Cc: 12291@debbugs.gnu.org, smithcu@gvsu.edu
> > 
> > > I think the correct behaviour on reading such a file by utf-8 is to
> > > treat each byte as raw-byte.
> > 
> > Maybe.  I'm not sure how Emacs should behave in reading such files.

> We can either read them as raw bytes, or convert them to u+FFFD.  The
> former sounds like a more useful behavior to me, FWIW.

What to convert to U+FFFD?  Each byte, or the byte sequence?

Anyway, we can't simply convert them to U+FFFD because it
results in change of file contents just by reading and
writing.  We can add post-read-conversion and
pre-write-conversion functions to the conding system utf-8
to perform the conversion (and adding text properties for
reverting) and reverting (using the text properties attached
at the time of reading).  But, is it worth doing that?

I think converting each invalid byte to raw-byte is simpler
and equally useful.

---
Kenichi Handa
handa@gnu.org





reply via email to

[Prev in Thread] Current Thread [Next in Thread]