
Re: [Monotone-devel] Re: Bug in CRLF conversions


From: Larry Hastings
Subject: Re: [Monotone-devel] Re: Bug in CRLF conversions
Date: Thu, 02 Feb 2006 11:37:53 -0800
User-agent: Thunderbird 1.5 (Windows/20051201)


FWIW, the Perforce documentation says they handle EOL translations like this:
  • When you add a file, you can explicitly specify what "type" it is.  Valid types in the current version are "binary", "text", "symlink", "unicode", and two types for those funny Macintosh resource files ("apple" and "resource").  The "type" is part of the per-file metadata stored in Perforce.
  • The "text" file type translates line endings; they don't say how.  "unicode" means "store the file in UTF-8 in the repository, and translate to the local code page upon sync/checkout".  (I bet "unicode" does EOL conversion too, but they don't say.)
  • If you don't explicitly specify a type, Perforce looks in the "typemap".  This is a per-depot text file, mapping wildcard filename specifications to types.  It is empty by default.
  • If no "typemap" entry matches the file, Perforce guesses the type by examining the first 8192 bytes.  If it discovers "nontext" bytes, it uses "binary"; otherwise it uses "text".  A "nontext" byte is any byte > 127 (in other words, one with its high bit set).
  • You can change a file's type at any time.
  • There are also type modifiers, like "+k" (perform keyword expansion, like $Date:$), "+x" (set execute bit), and more.
All this information was gleaned from publicly available documentation on Perforce's website.  The main page of interest is the documentation on "file types", here:
    http://www.perforce.com/perforce/doc.052/manuals/cmdref/o.ftypes.html
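
To make the lookup order described above concrete (explicit type first, then the typemap, then a guess from the file contents), here is a minimal Python sketch.  It is an illustration under stated assumptions, not Perforce's actual implementation: the function names, the typemap represented as a list of (pattern, type) pairs, and the use of fnmatch-style wildcards are all hypothetical.

```python
from fnmatch import fnmatch

SAMPLE_SIZE = 8192  # leading bytes examined, per the documentation summarized above


def guess_type(data: bytes) -> str:
    """Guess "binary" if any sampled byte has its high bit set, else "text"."""
    sample = data[:SAMPLE_SIZE]
    return "binary" if any(b > 127 for b in sample) else "text"


def resolve_type(filename, data, explicit=None, typemap=()):
    """Resolve a file type: explicit type, then typemap, then content guess.

    `typemap` is a hypothetical list of (wildcard pattern, type) pairs.
    """
    if explicit is not None:
        return explicit
    for pattern, file_type in typemap:
        if fnmatch(filename, pattern):
            return file_type
    return guess_type(data)
```

For example, under this sketch a file whose first 8192 bytes are all ASCII is classified as "text" even if later bytes are not, unless an explicit type or a typemap entry such as ("*.pdf", "binary") says otherwise.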


My notes on this:
  • I'm surprised at their "nontext" heuristic; before I saw the documentation, I was guessing they'd look for characters < 32 that weren't valid whitespace characters.  A random sampling of binary files on my hard disk shows plenty of zeros in the first 256 bytes.
  • Their documentation mentions that some PDFs fail the file type guesser.  PDFs store comments first, and some wordy PDFs have > 8k of ASCII comments.  Though they ship an empty typemap, they do have a list of "recommended" entries which includes "any file ending with pdf -> binary".
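
For comparison, the alternative heuristic guessed at in the first note (flag bytes < 32 that are not whitespace) could be sketched as follows.  This is only an illustration: the function name, the 256-byte default sample, and the choice of tab, LF, FF, and CR as the "valid whitespace" set are assumptions, not anything taken from Perforce.

```python
# Control bytes treated as ordinary whitespace (an assumed set): tab, LF, FF, CR.
ALLOWED_CONTROL = frozenset({9, 10, 12, 13})


def looks_binary_by_control_chars(data: bytes, sample_size: int = 256) -> bool:
    """Return True if the sample contains a control byte that is not whitespace."""
    sample = data[:sample_size]
    return any(b < 32 and b not in ALLOWED_CONTROL for b in sample)
```

Unlike the high-bit test, this check flags a file that starts with zero bytes, and it does not flag text that merely contains bytes > 127.  A PDF with a long run of ASCII comments at the start would still slip past it.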

I assert that no solution will do the right thing by default for everyone at any time.  But a conservative default, combined with the ability to adjust the transformation on a file-by-file basis at any time, should be Good Enough.

Cheers,


larry
