monotone-devel



From: graydon hoare
Subject: [Monotone-devel] Re: Looking at the code affected in bug 9752 leaves a weird taste...
Date: Wed, 18 Aug 2004 11:59:13 -0400
User-agent: Opera M2/7.53 (Linux, build 737)

On Wed, 18 Aug 2004 13:41:28 +0100, Bruce Stephens <address@hidden> wrote:

> > I'm not sure why we would need to keep track of anything but the
> > local line-ending.  Really.  Text is text.  Text is a vector of
> > lines, basically.  Line ending isn't really something that should be
> > considered part of the text per se.
>
> Almost always, you're right: text is text, and you want it with the
> client's native conventions.  However, now and again (because of some
> broken tool, or broken process) it does matter: some tool needs the
> file with some specific line-ending convention, regardless of the
> platform's convention.  Such cases are sufficiently rare, though, (or
> seem so to me---there may be domains where the situation is much more
> common) that it wouldn't be a disaster if they were ignored entirely.

I'm afraid I can't concur with this picture of text, though it's appealing;
there are lots of files which "are text" and which don't do what you'd like.
for this discussion there are two important categories:

 1. those which treat any byte 0x0a as LF and 0x0d as CR
    (ISO-8859-x, GB, KOI, EUC, UTF-8)

 2. those which do not
    (ISO-2022-x, EBCDIC, UTF-16, UCS-4)
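A minimal Python sketch of why the two categories behave differently under byte-level EOL conversion. The helper name `naive_lf_to_crlf` and the sample string are made up for illustration; this is not code from monotone.

```python
# Hypothetical helper: a byte-level EOL conversion that treats every
# 0x0a byte as a line feed, which is only safe for category 1 encodings.
def naive_lf_to_crlf(data: bytes) -> bytes:
    return data.replace(b"\n", b"\r\n")

# U+010A (LATIN CAPITAL LETTER C WITH DOT ABOVE) followed by a real newline.
text = "a\u010ab\n"
expected = "a\u010ab\r\n"

# Category 1 (UTF-8): the byte 0x0a never appears inside another character,
# so the byte-level conversion round-trips cleanly.
utf8_result = naive_lf_to_crlf(text.encode("utf-8")).decode("utf-8")

# Category 2 (UTF-16): U+010A is stored as the bytes 0x0a 0x01 (little-endian),
# so the same conversion inserts a stray 0x0d and shifts every later code unit.
utf16_bytes = text.encode("utf-16-le")
utf16_result = naive_lf_to_crlf(utf16_bytes).decode("utf-16-le")

print(utf8_result == expected)    # conversion preserved the text
print(utf16_result == expected)   # conversion corrupted the text
```

In the UTF-16 case the output still decodes without error, so the damage is silent: the characters after the first 0x0a byte simply become different characters.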

what we're discussing has implications at the character set and encoding
level, and I'm not comfortable forcing users to store their "text" files as
a particular character set, because such conversions can be lossy and
unwanted.

iow, it is not enough to say "since most users use text files most of the
time, we should freely munge EOLs unless users say --type=binary"; what
you are actually saying is that "since most users use text type #1 files
most of the time, we should freely munge EOLs unless users say
--type=binary". I find that .. plausible .. but a bit less convincing.

when we last discussed this, iirc people clearly said they wanted to
retain control of the issue. so it needs to be possible for a particular
user to precisely specify which conversions are done on which charsets,
and when to do none at all. a hook needs to be involved.

that said, if the hook interprets things it finds in .mt-attrs, and a
default hook interprets a "binary" flag as meaning no-conversion, and
converts everything else, then it may be possible to make this
user-friendly in the general case without losing precision in the
special case.
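A rough sketch of what such a default hook could look like, written in Python purely for illustration. The function name `default_eol_hook`, the attribute dictionary, and the `native_eol` parameter are all assumptions; this is not monotone's actual hook interface or the real .mt-attrs format, and the byte-level replacement is only valid for category 1 encodings.

```python
# Hypothetical default hook: 'attrs' stands in for whatever was parsed
# out of .mt-attrs for one file; 'native_eol' is the local convention.
def default_eol_hook(attrs: dict, data: bytes, native_eol: bytes = b"\n") -> bytes:
    # A "binary" flag means: do no conversion at all.
    if attrs.get("binary") == "true":
        return data
    # Everything else is converted: normalize CRLF to LF first,
    # then rewrite LF to the local line ending.
    normalized = data.replace(b"\r\n", b"\n")
    return normalized.replace(b"\n", native_eol)

# General case: a file with mixed endings is converted to the local convention.
print(default_eol_hook({}, b"a\r\nb\n", native_eol=b"\r\n"))

# Special case: a file flagged binary passes through byte for byte.
print(default_eol_hook({"binary": "true"}, b"a\r\nb\n", native_eol=b"\r\n"))
```

A user who needs different behaviour for a particular charset would replace this function with their own, which is where the precision in the special case comes from.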

(I should point out that while it sounds nice on paper, you cannot
 reliably infer text-ness or particular charset encoding; you can infer
 when something is *definitely* outside UTF-8 or ASCII, but not the other
 way)
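The one-way nature of that inference can be sketched in Python as follows. The function names are made up for illustration; the only library call relied on is `bytes.decode`, which raises `UnicodeDecodeError` on invalid input.

```python
# Hypothetical checks: each can only prove a negative.
def definitely_not_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True    # invalid byte sequence: certainly not UTF-8
    return False       # decodes cleanly, but that proves nothing

def definitely_not_ascii(data: bytes) -> bool:
    return any(b >= 0x80 for b in data)

# ISO-8859-1 "café" ends in the lone byte 0xe9, which is invalid UTF-8.
latin1 = "caf\u00e9".encode("latin-1")
print(definitely_not_utf8(latin1))        # True

# The UTF-8 encoding of "é" is 0xc3 0xa9, which passes the UTF-8 check
# but is equally valid ISO-8859-1 (two characters), so the check
# cannot say which charset the author meant.
ambiguous = "\u00e9".encode("utf-8")
print(definitely_not_utf8(ambiguous))     # False
print(ambiguous.decode("latin-1"))        # Ã©
```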

-graydon



