bug#12291: [rev 109796] wrong UTF-8 handling
From: Kenichi Handa
Subject: bug#12291: [rev 109796] wrong UTF-8 handling
Date: Mon, 03 Sep 2012 09:59:22 +0900
In article <83bohrqr83.fsf@gnu.org>, Eli Zaretskii <eliz@gnu.org> writes:
> > Date: Tue, 28 Aug 2012 21:22:26 +0200 (CEST)
> > From: Werner LEMBERG <wl@gnu.org>
> > Cc: 12291@debbugs.gnu.org, smithcu@gvsu.edu
> >
> > > I think the correct behaviour on reading such a file by utf-8 is to
> > > treat each byte as raw-byte.
> >
> > Maybe. I'm not sure how Emacs should behave in reading such files.
> We can either read them as raw bytes, or convert them to u+FFFD. The
> former sounds like a more useful behavior to me, FWIW.
Convert what to U+FFFD? Each byte, or the whole byte sequence?
Anyway, we can't simply convert them to U+FFFD because that
would change the file contents just by reading and writing
the file. We could add post-read-conversion and
pre-write-conversion functions to the coding system utf-8:
the former would perform the conversion and add text
properties for reverting, and the latter would revert it
using the text properties attached at the time of reading.
But is it worth doing that?
I think converting each invalid byte to a raw byte is simpler
and equally useful.
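
The round-trip concern can be illustrated outside Emacs. The
following is only a sketch in Python, not Emacs code: it assumes
Python's "replace" and "surrogateescape" error handlers as rough
stand-ins for "convert to U+FFFD" and "treat each byte as raw-byte",
and the sample bytes are made up for the illustration.

```python
# Illustration only: Python's codec error handlers used as an analogy
# for the two behaviours discussed above. This is not Emacs code.

# Hypothetical file contents; 0xFF never occurs in well-formed UTF-8.
data = b"abc\xffdef"

# Behaviour 1: turn each invalid byte into U+FFFD while reading.
replaced = data.decode("utf-8", errors="replace")
# Writing the text back cannot recover the original byte.
lossy_roundtrip = replaced.encode("utf-8")

# Behaviour 2: keep each invalid byte as a distinct placeholder
# (a lone surrogate here, playing the role of a raw byte).
kept = data.decode("utf-8", errors="surrogateescape")
# Writing the text back restores the original bytes exactly.
exact_roundtrip = kept.encode("utf-8", errors="surrogateescape")

print(lossy_roundtrip == data)  # file contents changed by read + write
print(exact_roundtrip == data)  # file contents preserved
```

With the replacement behaviour the single byte 0xFF comes back as the
three-byte encoding of U+FFFD, so the file changes just by reading and
writing it; with the raw-byte style behaviour the bytes survive.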
---
Kenichi Handa
handa@gnu.org