info-cvs

Re: Spaces added ... and line endings in general


From: David H. Thornley
Subject: Re: Spaces added ... and line endings in general
Date: Wed, 24 Jan 2001 09:52:03 -0600

James Lyon wrote:
> 
> ES> How should a *single* CVS executable "accept/deal with" all of
> ES> the following, which it *must* do if it's to defend itself
> ES> against the kinds of abuse you want to throw at it?
> ES>   - Unix format: <LF>
> ES>   - DOS format:  <CR><LF>
> ES>   - Mac format:  <CR>
> ES>   - Files in which some lines use one of the above conventions,
> ES>     and some use another (because you edited a DOS-format file in
> ES>     vi on a Unix box, and didn't religiously type the ^v^m's)
> ES>   - Unix-format files that contain <CR>s as actual formatting
> ES>     characters -- perhaps even at the ends of lines, for doing
> ES>     overstriking, so looking specifically for <CR><LF> is unsafe
> ES>   - Record-oriented formats which use length words and have no
> ES>     terminator at all.  This is old mainframe stuff -- dying, but
> ES>     alas not dead yet.  (For an example, see below.)
> 
On the dinosaur I worked on for over a decade, lines were terminated
by two characters' worth of binary zeros.  (They're retiring it now,
but its use did overlap the CVS era by quite a few years.)
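To make the quoted list concrete, here is a minimal Python sketch that counts CRLF pairs, bare LFs and bare CRs in a byte buffer. It reports which convention the buffer appears to use, or "mixed". The function name `classify_line_endings` is hypothetical and not part of CVS. The sketch cannot tell a Mac-format line from a Unix overstrike CR. It also has no idea about a two-zero-byte terminator like the one described above and simply reports "none" for such data, which rather makes the point.

```python
from collections import Counter


def classify_line_endings(data: bytes) -> str:
    """Guess which end-of-line convention a byte buffer uses (hypothetical helper).

    Counts CRLF pairs, bare LFs and bare CRs.  This is only a heuristic:
    a Unix file that uses CR for overstriking (the fifth case in the
    quoted list) looks exactly like Mac-format lines.
    """
    counts = Counter()
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b == 0x0D:  # CR
            if i + 1 < n and data[i + 1] == 0x0A:
                counts["crlf"] += 1  # CR LF pair: DOS style
                i += 2
                continue
            counts["cr"] += 1  # CR not followed by LF
        elif b == 0x0A:  # LF not preceded by CR
            counts["lf"] += 1
        i += 1

    kinds = [k for k in ("lf", "crlf", "cr") if counts[k]]
    if not kinds:
        return "none"  # no CR/LF terminators at all (e.g. record-oriented data)
    if len(kinds) > 1:
        return "mixed"  # e.g. a DOS file edited in vi on a Unix box
    return {"lf": "unix", "crlf": "dos", "cr": "mac"}[kinds[0]]
```

For example, `classify_line_endings(b"a\r\nb\n")` returns `"mixed"`, and a buffer of records separated by `b"\x00\x00"` returns `"none"`.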

> The request was to handle only the line-terminated approach, and not
> the record-orientated approach.
> 
That was a request.  It isn't the only possible request.  Since it
is a request to change CVS, it has to be looked at globally.

> You just treat *any* <LF> *or* <CR> as a line terminator with the
> single *exception* that any <LF> that is found immediately following a
> <CR> is skipped without treating it as a line terminator.
> 
Which violates the fifth case above.  I don't know how often \r
is used for formatting, but the fact that it's got its own escape
sequence in C suggests that somebody's probably using it for
something, or at least that's a possibility you've got to
consider.

In short, this is a change that has the potential to really
screw up somebody's files somewhere.  I don't think this is
something to do lightly.
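The splitting rule proposed in the quoted message can be sketched in Python as follows. The name `split_lines_lenient` is hypothetical. Feeding it a Unix line that uses a CR for overstriking shows the problem: one logical line comes back as two.

```python
def split_lines_lenient(data: bytes) -> list[bytes]:
    """Split on any LF or CR, except that an LF directly after a CR is
    not treated as a second terminator (the rule proposed above)."""
    lines = []
    start = 0
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b == 0x0D or b == 0x0A:  # CR or LF ends the current line
            lines.append(data[start:i])
            # A CR immediately followed by LF counts as a single terminator.
            if b == 0x0D and i + 1 < n and data[i + 1] == 0x0A:
                i += 1
            start = i + 1
        i += 1
    if start < n:
        lines.append(data[start:])  # trailing text with no terminator
    return lines
```

`split_lines_lenient(b"bold\rbold\n")` returns `[b"bold", b"bold"]`, although on a Unix system that input is a single overstruck line.  Python's own `bytes.splitlines()` applies the same rule and gives the same result.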

> This simple algorithm is *very* effective except when a "formatting"
> <CR> is used with something following it other than an <LF>. But
> that's logically ambiguous anyway and so you have to tell "it" what to
> do with such situations.
> 
Um, no, that's not logically ambiguous.  In a Unix text file,
the \r is very well defined.  If you don't know whether it's a
Unix text file or not, then you've got an ambiguity problem.

> Having said all that, the real answer is to use utilities like
> unix2dos and dos2unix to make sure your files are fixed before using
> them in the particular environment... so the problem is evaded in the
> first place.
> 
Right.  Alternatively, if you know your files contain no \n or \r
characters other than line endings, you can run a filter to make
sure the files have the right line-ending conventions.  This is something
that should be done on a local basis, since you can be sure of
your own file contents but not everybody else's.
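A local filter along those lines might look like the following Python sketch. The function `normalize_to_lf` and its `allow_bare_cr` flag are hypothetical. They are not options of CVS or of the real dos2unix utility. The filter converts CRLF pairs to LF. By default it refuses to touch a file that contains a CR outside a CRLF pair, because only someone who knows the file can say whether that CR is a line ending or content.

```python
def normalize_to_lf(data: bytes, allow_bare_cr: bool = False) -> bytes:
    """Convert CRLF line endings to LF (hypothetical local filter).

    A CR that is not followed by LF is ambiguous: it might be a Mac-style
    line ending or a deliberate formatting character.  Unless the caller
    passes allow_bare_cr=True, such a CR raises ValueError instead of
    being guessed at.  With allow_bare_cr=True, bare CRs are left as-is.
    """
    if not allow_bare_cr:
        # Removing every CRLF pair leaves only the bare CRs behind.
        if b"\r" in data.replace(b"\r\n", b""):
            raise ValueError("bare CR found; refusing to guess what it means")
    return data.replace(b"\r\n", b"\n")
```

The strict default reflects the point made above: the filter is safe to run on files whose contents you know, not as a blanket policy.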

(And in reply to an earlier email:  a three-l lllama is in fact
a pretty bad fire in Brooklyn.)

(Explanation for those who need it:  in the US, fires are often
measured in number of alarms, where a one-alarmer is a routine
fire, and a five-alarmer is catastrophic.  In the traditional
Brooklyn accent, "three-alarmer" would be pronounced much like
"three-l lama".  I suppose this is what I get for trying to
use an accent- and jargon-based joke in an international forum.)

-- 
David H. Thornley                          Software Engineer
at CES International, Inc.:  address@hidden or (763)-694-2556
at home: (612)-623-0552 or address@hidden or
http://www.visi.com/~thornley/david/


