bug#33796: 27.0.50; Use utf-8 is all our Elisp files
From: Paul Eggert
Subject: bug#33796: 27.0.50; Use utf-8 is all our Elisp files
Date: Wed, 19 Dec 2018 09:54:40 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.1
> I'm not really sure who to ask about this.
You can ask me (:-). Although I can't read east-Asian languages, I do
have significant experience with CJK text: my previous (15-year) job
was at a company whose customers were almost all CJK, where CJK
internationalization was essential, and where I regularly dealt with
weird encodings and displays. And this one is an easy call: for
maintaining these particular files, UTF-8 is an improvement and this
patch should go in.
To take just one example, titdic-cnv.el: people who are seriously
maintaining it and who need to read the Chinese text will almost surely
have their environment set up to display UTF-8 Chinese text well
already. Furthermore, if you take a look at all the changes made to this
file in the last decade, here are the statistics:
  edits  contributor
     15  Author: Paul Eggert <eggert@cs.ucla.edu>
     10  Author: Glenn Morris <rgm@gnu.org>
      2  Author: Stefan Monnier <monnier@iro.umontreal.ca>
      2  Author: Juanma Barranquero <lekktu@gmail.com>
      1  Author: Phillip Lord <phillip.lord@russet.org.uk>
      1  Author: Kenichi Handa <handa@m17n.org>
      1  Author: Andreas Schwab <schwab@linux-m68k.org>
Only one edit was made by a CJK user, and Handa's edit involved only
ASCII characters. Switching this file to UTF-8 would not have made any
of our maintenance more difficult in the last decade.
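
A tally like the one above can be reproduced from 'git log' output. The following is only a sketch of one way to do it, not necessarily how the numbers above were obtained; the function names are made up for illustration, and the "10 years ago" window is an assumption standing in for "the last decade".

```python
import subprocess
from collections import Counter


def tally_authors(log_text):
    """Count 'Author: ...' lines in git-log text, most frequent first."""
    counts = Counter(
        line.strip()
        for line in log_text.splitlines()
        # Commit messages are indented by 'git log', so only real
        # header lines start with "Author: " in column 0.
        if line.startswith("Author: ")
    )
    return counts.most_common()


def author_log(path, since="10 years ago"):
    """Return the raw 'git log' output for one file (hypothetical helper)."""
    result = subprocess.run(
        ["git", "log", f"--since={since}", "--", path],
        check=True,
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # tolerate author names that are not valid UTF-8
    )
    return result.stdout


def print_tally(path):
    """Print one 'edits contributor' row per author for the given file."""
    for author, edits in tally_authors(author_log(path)):
        print(f"{edits:5d}  {author}")
```

Calling print_tally with the path of the file in question, from inside a checkout of the repository, prints rows in the same "edits contributor" shape as the table.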
Conversely, I commonly use tools like 'git grep' to look for issues in
the code. These tools mishandle non-UTF-8 files, so I see mojibake
on my screen. It will therefore be a significant win for me
(and I suspect others) when we switch these files to UTF-8.
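
To illustrate the kind of mojibake meant here, the sketch below encodes a short Japanese string as EUC-JP and then reads the bytes back as if they were UTF-8, which is roughly what happens when a UTF-8 terminal or tool is handed a legacy-encoded file. The helper names are hypothetical, and this is a simplified model rather than a description of what 'git grep' does internally.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def as_seen_by_utf8_tool(data: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for bytes that do not fit."""
    return data.decode("utf-8", errors="replace")


text = "日本語"
euc_jp_bytes = text.encode("euc_jp")  # legacy encoding of the same text
utf8_bytes = text.encode("utf-8")

# The EUC-JP bytes are not valid UTF-8, so a UTF-8 reader shows
# replacement characters instead of the original text.
garbled = as_seen_by_utf8_tool(euc_jp_bytes)
print(is_valid_utf8(euc_jp_bytes), is_valid_utf8(utf8_bytes))
print(repr(garbled))
```

A check like is_valid_utf8 is also a cheap way to list which files in a tree would still trip up UTF-8-only tools.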
To try to answer Stefan's questions:
> - Do those people who edit those files really care about the difference?
No, almost always: see above.
> utf-8 is becoming standard even in the CJK world so
> maybe the change is not that terrible (or at least, users have gotten
> used to lowering their expectations in this respect).
Yes, that’s happened. I looked for recent reports about this, and it
appears that the controversy is mostly over. For example,
<https://gihyo.jp/lifestyle/serial/01/ganshiki-soushi/0069> (dated 2015)
lamented the demise of Japanese Knoppix and said that Plamo Linux had
problems with EUC-JP and suggested users switch to UTF-8. More recently
<https://qiita.com/tenforward/items/5e353f290f0b401139cb> (dated this
year) says that the choice of EUC-JP or UTF-8 is user-specific for Plamo
Linux, and that applications like Firefox have problems with EUC-JP so
discretion is advised if you choose EUC-JP. If even hardcore holdouts
like Plamo are folding....
> - If the change is indeed problematic, can we adjust it by using
> a file-global language tag?
I hope that’s not necessary, but it’d be OK if we have to do it.
> - If that's not sufficient, can we use a scheme like that
> of etc/HELLO but to keep the files directly usable as Elisp (so as to
> have our cake and eat it too)?
etc/HELLO is pretty much a disaster for me now, as I can’t use any tool
other than Emacs to look at it, and even Emacs screws up if I do
something like 'M-x grep RET hello etc/HELLO RET'. I’d rather not extend
this disaster to other files.
PS. One minor suggestion for your patch: please also update the list of
files in admin/notes/unicode to remove mention of the files in question.
PPS. How about also converting etc/tutorials/TUTORIAL.ja,
lisp/leim/quail/hanja-jis.el, lisp/leim/quail/japanese.el,
lisp/leim/quail/py-punct.el, and lisp/leim/quail/pypunct-b5.el?
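
For what it's worth, the mechanical part of such a conversion is small. The sketch below re-encodes one file in place; the function name is hypothetical, and the caller must supply the file's actual current encoding as a Python codec name, since the encodings of the files listed above are not stated here. It does not touch any in-file "coding:" cookie, which would need updating separately, and doing the conversion inside Emacs may be safer for encodings that Python's codecs do not model exactly.

```python
from pathlib import Path


def convert_to_utf8(path, source_encoding):
    """Re-encode a file in place from source_encoding to UTF-8.

    Decoding is strict, so a wrong guess about source_encoding raises
    UnicodeDecodeError before anything is written.
    """
    p = Path(path)
    raw = p.read_bytes()
    text = raw.decode(source_encoding)
    # Write bytes directly so line endings are left exactly as they were.
    p.write_bytes(text.encode("utf-8"))
    return text
```

Because the decode step is strict, running this with the wrong source encoding usually fails loudly rather than silently writing garbage, though encodings that accept any byte sequence (such as Latin-1) would defeat that check.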