RE: CVS and unicode

From: Arthur Barrett
Subject: RE: CVS and unicode
Date: Thu, 8 Sep 2005 08:39:23 +1000


>>> In CVS a Unicode file has to be a Binary file (-kb) - which prevents
>>> merging, diffs, etc etc.  If you do not define it as -kb then
>>> eventually the file will be corrupted.
>This is completely wrong and lacks any technical substance. 

Firstly, don't mistake me for any Unicode/UTF-8/UTF-16 guru - I was
simply trying to answer the question in a helpful way.

This time I'm just trying to clear up a couple of things about what the
CVSNT for Linux/Unix/Windows (free / GPL) implementation of Unicode
support can and can't do, based on Christian's comments.  I hope the
information is helpful to those following the discussion.

> Now on the core. UTF-8 files needn't be binary files, in fact, if you
> want normal CVS behaviour in the way you're used to it for ASCII text
> files, they mustn't be binary files.

Yes.  And that was the point of my original reply.  But you've certainly
worded it better.

> Differences occur with extended encodings like ISO-8859-x (e.g.
> ISO-8859-1 or ISO-8859-15 etc.) or Windows CP-* (e.g. Windows
> CP-1252). In these

With CVSNT the file will be checked in/out in UCS-2 (or UTF-16) encoding
and internally stored as UTF-8 by the server.  You can also use an
extended encoding: any encoding supported by the client-side iconv
library can be used.  This allows you to specify that a file uses
ISO-8859-1 and have it converted (by iconv) to the locale used by the
current client.  This way a single user can check out 10 files that each
use a different extended encoding without having to change their locale
environment variable for each file (and work out what to change it to).
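
To make the idea concrete, here is a rough sketch in Python of the kind of
round trip described above.  It is not CVSNT code: the helper names
to_repository and to_client are made up for illustration, and Python's own
codecs stand in for iconv.  It only shows that files arriving in different
encodings can share one UTF-8 stored form and be converted back per file.

```python
# Illustration only: hypothetical helpers, not part of CVSNT or CVS.
# Python's built-in codecs stand in for the client-side iconv library.

def to_repository(client_bytes: bytes, file_encoding: str) -> bytes:
    """Convert a file from its declared encoding to the UTF-8 stored form."""
    return client_bytes.decode(file_encoding).encode("utf-8")

def to_client(repo_bytes: bytes, client_encoding: str) -> bytes:
    """Convert the UTF-8 stored form to the encoding wanted on checkout."""
    return repo_bytes.decode("utf-8").encode(client_encoding)

text = "caf\u00e9\n"                      # "café" plus a newline
latin1_file = text.encode("iso-8859-1")   # one file is ISO-8859-1
utf16_file = text.encode("utf-16-le")     # another is UTF-16 (little endian, no BOM)

# Both files end up with the same UTF-8 bytes on the server side.
stored_from_latin1 = to_repository(latin1_file, "iso-8859-1")
stored_from_utf16 = to_repository(utf16_file, "utf-16-le")

# Each file comes back in its own encoding, with no locale change in between.
latin1_again = to_client(stored_from_latin1, "iso-8859-1")
utf16_again = to_client(stored_from_utf16, "utf-16-le")
```

Because the stored form is plain UTF-8 text, line-based operations on the
server side can work on it like any other text file.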

> The Unicode thingy in CVSNT is just a hack to work around operating
> issues regarding MS Windows.

No (but it helps this too) - see your own next comment.

> UTF-16 in fact can be problematic. Normal keyword substitution is
> likely to fail at least with some older versions of CVS.

Not just keyword substitution, but merges, diffs, line endings, etc.
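
A small sketch in Python of why byte-oriented tools trip over UTF-16 (my own
illustration, not taken from any CVS source; the function name
convert_line_endings is hypothetical).  In UTF-16 every ASCII character is
followed by a zero byte, so a byte search for a keyword misses it, and a
byte-level LF to CRLF rewrite shifts everything after it by one byte.

```python
# Illustration only: shows byte-level handling of UTF-16 going wrong.

text = "$Id$\nsecond line\n"
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")    # little endian, no BOM

# A byte search for the keyword works on UTF-8 but not on UTF-16,
# because UTF-16 stores "$Id$" as b"$\x00I\x00d\x00$\x00".
keyword = b"$Id$"
found_in_utf8 = keyword in utf8_bytes
found_in_utf16 = keyword in utf16_bytes

# A naive byte-level LF -> CRLF conversion inserts a single byte and
# misaligns every 16-bit unit that follows.
naive = utf16_bytes.replace(b"\n", b"\r\n")
naive_text = naive.decode("utf-16-le", errors="replace")
expected_text = text.replace("\n", "\r\n")

def convert_line_endings(data: bytes, encoding: str) -> bytes:
    """Decode first, change line endings on characters, then re-encode."""
    return data.decode(encoding).replace("\n", "\r\n").encode(encoding)

safe = convert_line_endings(utf16_bytes, "utf-16-le")
```

Decoding before touching the text, as in convert_line_endings, keeps the
result valid; working on raw bytes does not.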

All versions of CVS other than CVSNT need to treat UTF-16 files as

> uses wchar instead of char for keyword substitution. UTF-16 isn't in 
> widespread use, so I didn't care about that yet.

UTF-16 is the native internal representation of text in the NT-based
versions of Windows (NT/2000/XP/2003) and in the Java and .NET bytecode
environments, as well as in Mac OS X's Cocoa and Core Foundation

If anyone is after more information please contact the CVSNT newsgroup
because this is the limit of my knowledge:

Also the CVSNT manual may be of some help:


Arthur Barrett
