
[Monotone-devel] I18n, filename encoding, text file encoding


From: Christof Petig
Subject: [Monotone-devel] I18n, filename encoding, text file encoding
Date: Fri, 28 Nov 2003 11:20:43 +0100
User-agent: Mozilla/5.0 (X11; U; Linux ppc; de-AT; rv:1.5) Gecko/20031110 Debian/1.5-3

I recently spent some time thinking about a decent scheme for internationalization support in SCM (source code management).

To make local use possible now, monotone should allow binary filenames, that is, names containing spaces as well as high-bit characters in whatever encoding the importer uses. The lack of this is what currently prevents me from dumping CVS. A later migration to UTF-8 etc. could be done using rename operations.

IIRC a lot of people have not yet migrated to UTF-8 for file names. (To be honest, the file names on all computers I know of are still 8859-1.) And I cannot see _all_ people using UTF-8 for file names in the near future (that is, within about five years). Western Europe tends to stick with 8859-1/15, and I suspect CJK (East Asian) users will stick to one of their encodings. But to support real international development (or even development between developers using different file name encodings, which is likely during the transition to UTF-8), monotone might support file name encoding conversion when translating between the checked-out version and the database version.
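To illustrate the kind of conversion meant here, a minimal Python sketch follows. It is not monotone code: the function names `to_db_name` and `to_local_name` are hypothetical, and it assumes the database would store names as UTF-8 while each working copy declares its own local encoding.

```python
def to_db_name(raw_name: bytes, local_encoding: str) -> bytes:
    """Re-encode a working-copy file name as UTF-8 for storage.

    Hypothetical helper; assumes the database side uses UTF-8.
    Raises UnicodeDecodeError if raw_name is not valid in local_encoding.
    """
    return raw_name.decode(local_encoding).encode("utf-8")


def to_local_name(db_name: bytes, local_encoding: str) -> bytes:
    """Re-encode a stored UTF-8 file name for the local file system.

    Raises UnicodeEncodeError if a character cannot be represented
    in local_encoding (e.g. a CJK name checked out on an 8859-1 system).
    """
    return db_name.decode("utf-8").encode(local_encoding)


# An ISO 8859-1 name with an e-acute (byte 0xE9) becomes two bytes in UTF-8.
stored = to_db_name(b"caf\xe9.txt", "iso-8859-1")
restored = to_local_name(stored, "iso-8859-1")
```

The failure case in `to_local_name` is the hard part of this option: a name that one developer can create may simply have no representation in another developer's encoding.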

This leads me to another problem: sometimes file name encoding and file content encoding are not the same. I have a lot of UTF-8 encoded files for gtk2 projects while I still use an ISO 8859 encoding for file names.

People using different code sets may also want to co-edit text files. I suspect this is far more likely to occur among CJK users.

So several things are possible:
- stay away from this can of worms and make the people on a project agree on a name and content encoding [while accepting "illegal in UTF-8" sequences for the poor people still using different code sets]. Assume UTF-8 names for a Windows client (which clearly has native Unicode file names).

- tackle file name conversion and stay away from content conversion

- go for full encoding transparency. [Someone from CJK should comment on whether this is interesting]

Clearly confusing, but 8-bit transparency would already cover my actual needs.
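As a rough illustration of 8-bit transparency, the Python sketch below treats file names as opaque bytes and only checks whether they happen to be valid UTF-8. The helper names are hypothetical and nothing here reflects monotone's actual implementation; the `surrogateescape` error handler is a Python facility, used only to show that "illegal in UTF-8" names can survive a round trip unchanged.

```python
def is_valid_utf8(name: bytes) -> bool:
    """Report whether a raw file name is a legal UTF-8 sequence."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def name_to_text(name: bytes) -> str:
    """Decode a raw name, smuggling undecodable bytes through as
    lone surrogates so that no information is lost."""
    return name.decode("utf-8", errors="surrogateescape")


def text_to_name(text: str) -> bytes:
    """Inverse of name_to_text: restores the original bytes exactly."""
    return text.encode("utf-8", errors="surrogateescape")


legacy = b"caf\xe9.txt"  # ISO 8859-1 bytes, not valid UTF-8
roundtrip = text_to_name(name_to_text(legacy))
```

With this approach a project can store such names today and convert them to UTF-8 later through renames, as suggested above.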

   Christof

PS: There's a misspelling in http://www.venge.net/monotone/self-hosting.html:

$ monotone --db=monotone.db lscerts manifest dcc23
should read
$ monotone --db=monotone.db ls certs manifest dcc23




