Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding UTF
From: Eli Zaretskii
Subject: Re: [Emacs-diffs] master db828f6: Don't rely on defaults in decoding UTF-8 encoded Lisp files
Date: Sun, 27 Sep 2015 10:27:58 +0300
> Cc: address@hidden, address@hidden, address@hidden
> From: Paul Eggert <address@hidden>
> Date: Sat, 26 Sep 2015 13:32:33 -0700
>
> Eli Zaretskii wrote:
> > The relevant statistics for Emacs is of source files, not of HTML
> > pages.
>
> Sure, and source files are how this thread got started: nowadays in GNU
> projects they're typically UTF-8 regardless of system locale settings, and
> Emacs should be better about supporting this typical situation. UTF-8 is
> common partly because source files are shared widely via the Internet, on
> sites like Savannah.
>
> The days of lonely hackers writing code in their own private Shift-JIS
> directories are largely over. Of course Emacs can still support such users,
> but the default should be tailored to what's more typical nowadays.

Emacs supports the typical situation quite well already, definitely so
in a typical (i.e. UTF-8) locale. The issue at hand is not how to
support the typical situation, it's whether that typical situation is
the _only_ situation that matters, so much so that we can ignore the
locale-derived defaults.

In any case, I said we needed _statistics_, i.e. numbers, not just
impressions and opinions.

I don't know how to find a representative set of C sources, not even
for European locales. I looked at the C files of GNU projects from
the last few years on my main development system, which is probably not
very representative. There are more than 142,000 C files there.
Using the 'file' utility, I found that about 1.8% of them are UTF-8
encoded and about 0.2% are ISO-8859 encoded (the vast majority are
US-ASCII, of course). That's still more than 250 ISO-8859 encoded files.
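
For anyone who wants to repeat a survey like this on their own tree, here is a minimal sketch in Python. It is not the 'file' utility and does not reproduce its heuristics: it only tests whether each file's bytes decode as ASCII or as UTF-8, and lumps everything else into one "other-8bit" bucket, so it cannot tell ISO-8859 from, say, Shift-JIS. The names classify, survey and report, and the choice of the ".c" suffix, are assumptions made for illustration.

```python
import os
from collections import Counter


def classify(data: bytes) -> str:
    """Return a coarse encoding label for the raw bytes of one file."""
    if not data:
        return "empty"
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: some legacy 8-bit or multibyte encoding,
        # or binary data. This sketch does not try to guess which.
        return "other-8bit"


def survey(root: str, suffixes=(".c",)) -> Counter:
    """Walk root and count files with the given suffixes by label."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffixes):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    counts[classify(f.read())] += 1
            except OSError:
                counts["unreadable"] += 1
    return counts


def report(counts: Counter) -> None:
    """Print each label with its count and its share of the total."""
    total = sum(counts.values())
    if total == 0:
        print("no matching files")
        return
    for label, n in counts.most_common():
        print(f"{label:12s} {n:8d} {100.0 * n / total:6.2f}%")


# Example (the path is a placeholder):
#   report(survey("/path/to/sources"))
```

Decoding strictly as UTF-8 is a reasonable test because text in a legacy 8-bit encoding that contains non-ASCII characters is very unlikely to form valid UTF-8 sequences by accident.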

I've also looked at the *.po files in the latest releases of GNU Make,
Gawk, Texinfo, and Binutils, and I find that between 20% and 25% of
such files still use non-UTF-8 encodings. I see similar figures for
the txi-*.tex files that came with Texinfo 6.0. Presumably, that
follows the default conventions of the respective locales.
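
One way to get such a figure for *.po files is to read the charset each file declares in its header entry, on the Content-Type line. The sketch below assumes the usual gettext header form, "Content-Type: text/plain; charset=NAME\n"; the function names are made up for illustration, and this is not necessarily how the numbers above were obtained.

```python
import re

# The charset name ends at the literal backslash of the trailing "\n"
# that PO header lines carry inside their quotes.
CHARSET_RE = re.compile(rb"charset=([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def po_declared_charset(data: bytes):
    """Return the lowercased charset declared in a PO file, or None."""
    for line in data.splitlines():
        if b"Content-Type:" in line:
            match = CHARSET_RE.search(line)
            if match:
                return match.group(1).decode("ascii").lower()
    return None


def po_is_utf8(data: bytes) -> bool:
    """True if the PO file declares UTF-8 as its charset."""
    return po_declared_charset(data) in ("utf-8", "utf8")
```

Note that this trusts the declaration and does not verify that the file's bytes really are in the declared encoding. Untranslated template files, which carry the placeholder "charset=CHARSET", come out as non-UTF-8 here.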

So, while I agree with you that UTF-8 encoded files are the majority
among non-ASCII files (and Emacs development aligns itself with that
fact very well), the non-UTF-8 minority, even in the POSIX world, is
still significant enough that we cannot possibly ignore it.