

Re: CVS and unicode

From: ai26
Subject: Re: CVS and unicode
Date: Sun, 11 Sep 2005 00:22:13 +0200

In a message of Sat, 10 Sep 2005 17:52:09 +0200
Received on Sat, 10 Sep 2005 18:07:17 +0200

Christian Hujer <address@hidden> wrote to address@hidden

>On Saturday, 10 September 2005 16:04, Spiro Trikaliotis wrote:
>> Hello Christian,
>> * On Sat, Sep 10, 2005 at 12:38:19PM +0200 Christian Hujer wrote:
>> > Currently, CVS has extremely tolerant behaviour regarding binary files
>> > which were accidentally added as text files. As long as they do not contain
>> > keywords (like $Id...$), they are extremely likely to still be handled
>> > conveniently.
>> This is true for Unix based systems, but not for systems where CR/LF is
>> the usual line ending. Checking in a binary file from a Windows system,
>> you have very good chances to break it if there is a CR/LF anywhere
>> inside of it.
>> For non-CR/LF machines, checking in binary files without -kb does
>> not do any harm even if there are keywords ($Id$, for example) inside of
>> it. CVS checks them in "as-is" and only expands the keywords on
>> checkout. Thus, if you forgot to specify -kb on checkin, just set the
>> state afterwards with cvs admin and check the file out again.
>> As I said, this is NOT true for CR/LF based systems.
>It's even true for CRLF. The CRLF byte sequences are:


You are still missing the point, even with Spiro's explanation.  A
non-Unix cvs *client* converts every CR/LF sequence in the sandbox
file to a plain LF in the ,v repository file on checkin.  Therefore, a
binary file not checked in with -kb will lose every 0x0D that precedes
a 0x0A.  And you can't restore them, since you don't know which 0x0A was
preceded by a 0x0D and which one wasn't.  It's a binary file, after all.

Of course, if your binary never had 0x0D 0x0A sequences, then you are fine
with admin -kb, but you generally can't assume they don't occur.
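To make the information loss concrete, here is a minimal Python sketch.  It
does not call cvs at all; the two helper functions are hypothetical
stand-ins that merely model the text-mode translation described above
(CR/LF to LF on checkin, LF back to CR/LF on checkout), on the assumption
that the translation is a plain byte-level substitution.

```python
def client_checkin_text(data: bytes) -> bytes:
    """Hypothetical model of a text-mode checkin on a CR/LF client:
    every 0x0D 0x0A pair is stored as a single 0x0A."""
    return data.replace(b"\r\n", b"\n")


def client_checkout_text(data: bytes) -> bytes:
    """Hypothetical model of a text-mode checkout on the same client:
    every 0x0A is written back as 0x0D 0x0A."""
    return data.replace(b"\n", b"\r\n")


# Two different "binary files": each has one CR/LF pair and one bare LF,
# but in different positions.
original = bytes([0x01, 0x0D, 0x0A, 0x02, 0x0A, 0x03])
other = bytes([0x01, 0x0A, 0x02, 0x0D, 0x0A, 0x03])

stored = client_checkin_text(original)
restored = client_checkout_text(stored)

# Both inputs collapse to the same stored bytes, so nothing can tell
# afterwards which 0x0A used to have a 0x0D in front of it.
same_stored = client_checkin_text(other) == stored
round_trip_ok = restored == original

print(stored.hex(" "))
print(same_stored, round_trip_ok)
```

Under these assumptions, both inputs end up as the same stored bytes and
the round trip does not reproduce the original, which is why setting -kb
after the fact cannot repair such a file.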

>ASCII: 0x0D 0x0A.
>UTF-8: 0x0D 0x0A.
>UTF-16 LE: 0x0D 0x00 0x0A 0x00.
>UTF-16 BE: 0x00 0x0D 0x00 0x0A.
>CVS will not interfere with any of these.
>The UTF-16LE sequence will be split within the LF char. But since the next line
>will be split at exactly the same point, this is not a problem for line
>Also, CVS behaves very well when using CR/LF (though I regard CR/LF as
>deprecated for various other reasons), independent of the encoding (at
>least those encodings discussed here).
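The byte sequences quoted above can be checked with a short Python sketch.
It only uses the standard codecs; the extra test for an adjacent 0x0D 0x0A
pair is an illustration that assumes a purely byte-level line-ending
translation, which is an assumption about the client and not something
taken from the cvs sources.

```python
crlf = "\r\n"
encodings = ["ascii", "utf-8", "utf-16-le", "utf-16-be"]

# Encode the two-character CR/LF string in each encoding under discussion.
encoded = {name: crlf.encode(name) for name in encodings}

# Does the encoded form contain 0x0D immediately followed by 0x0A?
adjacent = {name: b"\r\n" in raw for name, raw in encoded.items()}

for name in encodings:
    print(f"{name:10} {encoded[name].hex(' '):12} adjacent pair: {adjacent[name]}")
```

In the ASCII and UTF-8 forms the two bytes are adjacent, while in both
UTF-16 forms a 0x00 byte sits between them.  None of this changes the
situation for arbitrary binary data, where an adjacent 0x0D 0x0A can
appear anywhere.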

