info-cvs
Re: CVS and unicode


From: Christian Hujer
Subject: Re: CVS and unicode
Date: Sun, 11 Sep 2005 15:03:37 +0200
User-agent: KMail/1.7.1

Hi,

On Sunday, 11 September 2005 00:22, address@hidden wrote:
> In a message of Sat, 10 Sep 2005 17:52:09 +0200
> Received on Sat, 10 Sep 2005 18:07:17 +0200
>
> Christian Hujer <address@hidden> wrote to address@hidden
>
> >On Saturday, 10 September 2005 16:04, Spiro Trikaliotis wrote:
> >> Hello Christian,
> >>
> >> * On Sat, Sep 10, 2005 at 12:38:19PM +0200 Christian Hujer wrote:
> >> > Currently, CVS has extremely tolerant behaviour regarding binary files
> >> > which were accidentally added as text files. As long as they do not
> >> > contain keywords (like $Id...$), they are extremely likely to still be
> >> > handled conveniently.
> >>
> >> This is true for Unix based systems, but not for systems where CR/LF is
> >> the usual line ending. Checking in a binary file from a Windows system,
> >> you have very good chances to break it if there is a CR/LF anywhere
> >> inside of it.
> >>
> >> For non-CR/LF machines, checking in binary files without -kb does
> >> not do any harm even if there are keywords ($Id$, for example) inside of
> >> it. CVS checks them in "as-is" and only expands the keywords on
> >> checkout. Thus, if you forgot -kb on checkin, just set the
> >> state afterwards with cvs admin and check the file out again.
> >>
> >> As I said, this is NOT true for CR/LF based systems.
> >
> >It's even true for CRLF. The CRLF byte sequences are:
>
> Christian,
>
> you are still missing the point even with Spiro's explanation.  A
> non-Unix cvs *client* will convert any CR/LF sequence of the sandbox
> file into a plain LF in the ,v repository file on checkin.  Therefore, a
> binary file not checked in with -kb will lose every 0x0D that precedes
> a 0x0A.  And you can't restore them since you don't know which 0x0A was
> preceded by a 0x0D and which one wasn't.  It's a binary file after all.
>
> Of course if your binary never had 0x0D 0x0A sequences then you are fine
> with admin -kb but you generally can't assume they don't occur.
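
The loss described in the quoted text can be illustrated with a minimal
sketch. It assumes a client that maps CR/LF to LF on checkin and LF back
to CR/LF on checkout; `checkin_text_mode` and `checkout_text_mode` are
hypothetical stand-ins for that behaviour, not actual CVS code. The
sample bytes are the PNG file signature, which contains both a CR/LF
pair and a bare LF.

```python
def checkin_text_mode(data: bytes) -> bytes:
    """Hypothetical model of a converting client: CR/LF becomes LF."""
    return data.replace(b"\r\n", b"\n")


def checkout_text_mode(data: bytes) -> bytes:
    """Hypothetical model of the reverse step: every LF becomes CR/LF."""
    return data.replace(b"\n", b"\r\n")


# PNG signature: 0x89 'P' 'N' 'G' CR LF 0x1A LF
original = b"\x89PNG\r\n\x1a\n"

stored = checkin_text_mode(original)    # the 0x0D before the first 0x0A is dropped
restored = checkout_text_mode(stored)   # both 0x0A bytes now get a 0x0D

print(original.hex(" "))
print(stored.hex(" "))
print(restored.hex(" "))
print("round trip intact:", restored == original)
```

After the checkin step the two 0x0A bytes are indistinguishable, so no
checkout rule can tell which one originally had a 0x0D in front of it.
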
I've seen many non-Unix CVS clients that do not convert CR/LF sequences.
I work on several projects whose software breaks when it encounters CR/LF;
it only accepts NL (LF). The files are text files, they are checked in
without -kb, and when they are checked out on Windows, no CR is added.
(For example, the Daimonin MMORPG data files: if the client has an optional
CR/LF conversion setting, they must be checked out on Windows with it set
to LF only. Neither the Java editor nor the server allows CR in the data
files.)

This conversion is a setting. It does not apply to non-Unix CVS clients in
general, but only to specific CVS clients on the few operating systems that
use CR/LF instead of LF. It is also a setting that can be enabled or
disabled.

So if users use the wrong settings for CR/LF conversion, that's not the 
fault / problem of CVS or Unicode.

True, UTF-16 might contain byte sequences where diff might fail in general 
(see my other post).
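
As a small illustration of why UTF-16 data is fragile under byte-oriented
text handling (a sketch only; it models a CR/LF-to-LF conversion with a
plain byte replacement and does not reproduce what diff or CVS actually
do): the single character U+0D0A (MALAYALAM LETTER UU) encodes in
big-endian UTF-16 as exactly the two bytes 0x0D 0x0A.

```python
# U+0D0A is one character, but its UTF-16BE encoding is the byte pair CR LF.
text = "\u0d0a"
encoded = text.encode("utf-16-be")

# Assumed model of a line-ending conversion applied to the raw bytes.
converted = encoded.replace(b"\r\n", b"\n")

try:
    converted.decode("utf-16-be")
    decodes = True
except UnicodeDecodeError:
    # One byte is left over, which is not a complete UTF-16 code unit.
    decodes = False

print(encoded.hex(" "), "->", converted.hex(" "), "| still decodes:", decodes)

# A real line feed in UTF-16 is two bytes as well, so byte-wise tools that
# look for a lone 0x0A also see a stray 0x00 next to it.
print("\n".encode("utf-16-le").hex(" "))
```
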

But CR/LF is an OS-specific problem that's best handled in the text editor.
Even on Windows there are text editors capable of using NL only (gvim,
UltraEdit, Emacs, most IDEs and various others).

CVS is a version control system, not a text processor. Let version control 
issues be handled by version control, text processing issues by text editors.


Christian



