
From: graydon hoare
Subject: [Monotone-devel] Re: Looking at the code affected in bug 9752 leaves a weird taste...
Date: Tue, 17 Aug 2004 22:19:47 -0400
User-agent: Opera M2/7.53 (Linux, build 737)

On Wed, 18 Aug 2004 02:15:31 +0200 (CEST), Richard Levitte - VMS Whacker 
<address@hidden> wrote:

From looking at the lua hook get_linesep_conv, it looks like it's
possible to customize the standard line ending in the database.
However, I think that would be superbly dangerous, when considering
that several databases coming from people upholding different line
ending standards would be synced together...  The thought makes me
cringe.

we had some discussion about this long ago, and if memory serves the
consensus which came out of that was that there isn't always a single
"standard line ending" for a project. sometimes you are storing a
mixture of files, some intended for windows and some for linux;
sometimes you want a checkout to be byte-for-byte identical to the
way it's stored and sometimes you want to have the VCS massage things
to fit your current platform. so we left it open to customization.

however, that was a conservative consensus, and it has clearly failed
in practise. so let's work out what it ought to do, not worry too much
about what the current (wrong) code does.

Then, looking at the function split_into_lines(), it's obvious that it
hasn't been designed with CRLF in mind.  For example, with the line
"foo\r\n", it will end up with the tokens, "foo" and "", interpreted
as two lines.  This is the actual cause of the trouble detected in bug
9752.  Actually, the boost token functions aren't enough to be able to
detect CRLF, LF and CR line endings, so something like
boost::char_separator, but taking a vector of strings instead of a
string of separators is needed.

good. or better yet, since this is a relatively well-understood task,
perhaps we can just code it as a manual loop over the string. keep in
mind that transforms.cc is some of the oldest code in monotone, and my
taste for doing things the "clever boost way" has sort of waned as
work has proceeded over the past couple years.
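
for illustration, a minimal sketch of what such a manual loop could look like. this is not the code in transforms.cc; the name split_lines and the choice to drop terminators are assumptions made for the example. it treats "\r\n", a lone "\n" and a lone "\r" each as one line terminator, so "foo\r\n" yields the single line "foo".

```cpp
#include <string>
#include <vector>

// Hypothetical helper, NOT monotone's actual split_into_lines().
// Splits on "\r\n", "\n" or a lone "\r"; terminators are dropped.
std::vector<std::string> split_lines(const std::string &in)
{
  std::vector<std::string> out;
  std::string::size_type begin = 0;
  std::string::size_type i = 0;
  while (i < in.size())
    {
      if (in[i] == '\r' || in[i] == '\n')
        {
          out.push_back(in.substr(begin, i - begin));
          // treat "\r\n" as a single terminator, not two
          if (in[i] == '\r' && i + 1 < in.size() && in[i + 1] == '\n')
            ++i;
          ++i;
          begin = i;
        }
      else
        ++i;
    }
  // trailing text without a terminator still counts as a line
  if (begin < in.size())
    out.push_back(in.substr(begin));
  return out;
}
```

note that an empty line between two terminators is kept as an empty string, while a terminator at the very end of the input does not produce an extra empty line.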

Unfortunately, just changing the internals of split_into_lines() isn't
enough.  It really needs to get an external specification of what the
line endings should be on the local system, for each file.

does it? or does it simply need to have the line endings preserved as
*part of* the lines it returns? I'm not sure which is right. it depends
a lot on who calls it. perhaps we should make a list and see what each
caller expects.
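
to make the "part of the lines" option concrete, here is a rough sketch under assumed names (split_lines_keep_eol and join_lines are hypothetical, not existing monotone functions). each returned line carries whatever terminator it had, so joining the lines back together reproduces the input byte for byte, and no external specification of the line ending is needed just to split.

```cpp
#include <string>
#include <vector>

// Hypothetical variant: every returned line keeps its own terminator
// ("\r\n", "\n" or "\r"); a final unterminated line is returned as-is.
std::vector<std::string> split_lines_keep_eol(const std::string &in)
{
  std::vector<std::string> out;
  std::string::size_type begin = 0;
  std::string::size_type i = 0;
  while (i < in.size())
    {
      if (in[i] == '\r' || in[i] == '\n')
        {
          // swallow the "\n" of a "\r\n" pair into the same line
          if (in[i] == '\r' && i + 1 < in.size() && in[i + 1] == '\n')
            ++i;
          ++i;
          out.push_back(in.substr(begin, i - begin));
          begin = i;
        }
      else
        ++i;
    }
  if (begin < in.size())
    out.push_back(in.substr(begin));
  return out;
}

// Concatenating the pieces gives back the original bytes.
std::string join_lines(const std::vector<std::string> &lines)
{
  std::string out;
  for (const std::string &l : lines)
    out += l;
  return out;
}
```

whether this is the right contract still depends on the callers: code that compares lines would then see "foo\r\n" and "foo\n" as different lines.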

This takes me back to get_linesep_conv.  Why, exactly, should it be
able to specify what the line separator should be in the database?
Why isn't get_system_linesep enough, especially if it can be given a
file name as argument?

probably the decision was made as a mirror of the charset conversion
choice: to avoid forcing users to store their files in monotone in a
particular encoding, if the external encoding will always differ.

that's actually important with charsets, because charset coding can be
lossy: if I force everyone to use UTF-8 internally, they might actually
lose data when going from system -> monotone -> system. the same is
not really true for line separators though. or perhaps it is. I don't
actually know. anyone familiar with this want to shed some light?
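
one case that can be checked mechanically is a file with mixed line endings. the sketch below assumes a hypothetical convert_eol helper (not a monotone function) that rewrites every terminator to a single chosen one. a file that uses CRLF throughout survives system -> LF -> system, but a file mixing CRLF and LF does not, because the conversion forgets which line had which ending.

```cpp
#include <string>

// Hypothetical helper: rewrite every "\r\n", lone "\r" or lone "\n"
// in `in` to the terminator given in `eol`.
std::string convert_eol(const std::string &in, const std::string &eol)
{
  std::string out;
  for (std::string::size_type i = 0; i < in.size(); ++i)
    {
      if (in[i] == '\r')
        {
          // "\r\n" counts as one terminator
          if (i + 1 < in.size() && in[i + 1] == '\n')
            ++i;
          out += eol;
        }
      else if (in[i] == '\n')
        out += eol;
      else
        out += in[i];
    }
  return out;
}
```

with this helper, "a\r\nb\r\n" converted to LF and back to CRLF is unchanged, while "a\r\nb\n" comes back as "a\r\nb\r\n". so the round trip is only lossless when the file was uniform to begin with.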

-graydon



