
Re: [Monotone-devel] sketch of i18n specification


From: Nathaniel Smith
Subject: Re: [Monotone-devel] sketch of i18n specification
Date: Thu, 20 Nov 2003 22:48:50 -0800
User-agent: Mutt/1.5.4i

On Thu, Nov 20, 2003 at 09:50:47AM -0500, graydon hoare wrote:
> Nathaniel Smith <address@hidden> writes:
> 
> > On Tue, Nov 18, 2003 at 11:58:30AM -0500, graydon hoare wrote:
> > ...
> > >   - if a file has the persistent attribute "charset", its value will
> > >     be used instead of the LC_CTYPE locale setting.
> > 
> > That's kind of weird.  If I always want a file to be in some
> > particular character set, why would I turn on character conversion in
> > the first place?
> 
> *shrug* I thought some people might find themselves unhappy with
> things inferred from LC_CTYPE (or any $ENV setting); just thought I'd
> give a way to override it.

That seems like the sort of thing you want to use a hook for, not a
file attribute?  LC_CTYPE is per-user; surely the overriding
mechanism should also be per-user?  (In fact, one could just make the
default hook return $LC_CTYPE.)
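Here is a minimal C++ sketch of that default hook. It assumes a hypothetical hook type, charset_hook, that maps a working-copy path to a charset name. The default follows the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG) and takes the codeset from the part of the locale name after the dot. The names and the ASCII fallback are assumptions for illustration, not monotone's API.

```cpp
#include <cstdlib>
#include <functional>
#include <initializer_list>
#include <string>

// Hypothetical hook type: given a working-copy path, return the charset
// used when converting that file on checkout and commit.
using charset_hook = std::function<std::string(const std::string& path)>;

// Extract the codeset from a POSIX locale name such as "en_US.UTF-8" or
// "ja_JP.eucJP@mod". Names without a codeset ("C", "POSIX", "en_US")
// fall back to "ASCII" (an assumption made for this sketch).
std::string charset_from_locale(const std::string& locale) {
    const std::string::size_type dot = locale.find('.');
    if (dot == std::string::npos) return "ASCII";
    const std::string::size_type at = locale.find('@', dot);
    const std::string codeset =
        (at == std::string::npos) ? locale.substr(dot + 1)
                                  : locale.substr(dot + 1, at - dot - 1);
    return codeset.empty() ? "ASCII" : codeset;
}

// Default hook: ignore the path and follow the usual POSIX precedence,
// LC_ALL first, then LC_CTYPE, then LANG.
std::string default_charset(const std::string& /*path*/) {
    for (const char* var : {"LC_ALL", "LC_CTYPE", "LANG"}) {
        const char* value = std::getenv(var);
        if (value != nullptr && *value != '\0') return charset_from_locale(value);
    }
    return "ASCII";
}
```

A user who wants something other than the locale's charset would install their own charset_hook in place of default_charset. That keeps the override per-user, like LC_CTYPE itself, rather than per-file.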

> > Is there any benefit to storing files in the repository with non-UTF8
> > encodings, but still allowing character conversion?  (I guess for
> > those people whose LC_CTYPEs matched the repository encoding, this
> > would make sha1sum work again.)
> 
> I suppose there might be. Some people claim (I've only heard it from
> Japanese people, but maybe there are others) that their character
> sets aren't all very well represented in Unicode. Perhaps the UTF-8
> transform would be unacceptably lossy to them.
>
> Do you think it would be better *not* to specify the internal form,
> but let a hook / attribute select it? in other words, specify an
> incoming and an outgoing mapping function for each file?

Hrm, I'm really not sure.  Letting a hook pick the internal form might
be simpler and closer to what people expect; on the other hand, you end
up with more worry about lossy round-trips, which are (supposed to be)
impossible with Unicode.  (Though I'm not sure how Unicode normalization
and the like would affect SHA1s.)  And simply saying that everything in
the database is Unicode could also be considered simpler.
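The round-trip worry can be made concrete with a small sketch. It assumes a hypothetical charset_mapping struct that holds the incoming and outgoing functions graydon describes, and it uses Latin-1 and UTF-8 as example encodings. Storing Latin-1 text as UTF-8 always round-trips. Storing UTF-8 text as Latin-1 does not, because code points above U+00FF have no Latin-1 representation.

```cpp
#include <functional>
#include <string>

// Hypothetical per-file mapping: "incoming" turns working-copy bytes into
// the stored form, "outgoing" turns stored bytes back into working-copy form.
struct charset_mapping {
    std::function<std::string(const std::string&)> incoming;
    std::function<std::string(const std::string&)> outgoing;
};

// Latin-1 -> UTF-8. Every Latin-1 byte is a code point U+0000..U+00FF,
// so this direction never loses information.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// UTF-8 -> Latin-1. Code points above U+00FF have no Latin-1 form and are
// replaced by '?', so this direction can lose information.
// Assumes the input is valid UTF-8.
std::string utf8_to_latin1(const std::string& in) {
    std::string out;
    std::string::size_type i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        if (lead < 0x80) {
            out += static_cast<char>(lead);
            ++i;
            continue;
        }
        const int len = (lead >= 0xF0) ? 4 : (lead >= 0xE0) ? 3 : 2;
        unsigned long cp = lead & (0x7F >> len);  // payload bits of the lead byte
        for (int j = 1; j < len && i + j < in.size(); ++j)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + j]) & 0x3F);
        out += (cp <= 0xFF) ? static_cast<char>(cp) : '?';
        i += len;
    }
    return out;
}

// A mapping is safe for a file only if outgoing(incoming(x)) == x,
// i.e. commit followed by checkout gives back the same bytes.
bool round_trips(const charset_mapping& m, const std::string& working_copy) {
    return m.outgoing(m.incoming(working_copy)) == working_copy;
}
```

For example, if the mapping's incoming direction is utf8_to_latin1, round_trips fails for the euro sign (U+20AC), which comes back as '?'. With the Unicode internal form this failure cannot happen.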

Erk, here's a nasty case:  what happens if I do a checkout with one
value for LC_CTYPE, then through mishap or oversight commit using a
different value?  (Maybe the checkout happens from cron and uses a
different environment or something, I dunno, insert story here.)
Character set conversion could go really screwy.  I guess this means
that MT/ needs to keep track of what character set each checked out
file is in... though I'm not sure how this would interact with
concurrent modifications to .mt-attrs, or how new files would be
handled.
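One way MT/ could track this is sketched below. It assumes a hypothetical checkout_charsets table that is filled in at checkout. At commit time, the recorded charset wins over the current environment, and a mismatch can be reported. The fallback for files with no record (use the current setting) is purely an assumption, because how new files should be handled is still open. The sketch also ignores the .mt-attrs concurrency question.

```cpp
#include <map>
#include <string>

// Hypothetical table kept under MT/: the charset each file was converted
// into when it was written to the working copy.
class checkout_charsets {
public:
    void record(const std::string& path, const std::string& charset) {
        charsets_[path] = charset;
    }

    // At commit time, convert back using the charset recorded at checkout,
    // not whatever the current environment says. Files with no record
    // (e.g. newly added ones) fall back to the current setting; that policy
    // is an assumption made for this sketch.
    std::string charset_for_commit(const std::string& path,
                                   const std::string& current) const {
        const auto it = charsets_.find(path);
        return it == charsets_.end() ? current : it->second;
    }

    // True if the environment now disagrees with what was used at checkout,
    // which deserves at least a warning.
    bool mismatch(const std::string& path, const std::string& current) const {
        const auto it = charsets_.find(path);
        return it != charsets_.end() && it->second != current;
    }

private:
    std::map<std::string, std::string> charsets_;
};
```

In the cron scenario, charset_for_commit would return the charset used at checkout, and mismatch would flag that the commit-time LC_CTYPE differs.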

This is all getting a bit murky; I'm tempted to just punt for now?
Line ending conversion hooks mean that monotone will have the basic
infrastructure for this built in, and a particular canonical
implementation can be added later once there are actual i18n-using
users?  The fact that no other VCS (AFAIK) has this feature also tends
to make me wary of diving in and picking one; there's no experience,
and possibly not much demand...

> > >   - subject to character set and line ending conversion unless
> > >     overridden by a hook.
> > 
> > Err, so I can't put binary data in a cert?  What about test result
> > files and the like?
> 
> ... "unless overridden by a hook"

Yeah, I saw that, I just didn't understand it :-).

So the idea is to have a hook like bool binary_data_okay(string cert_name)?
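The sketch below shows how such a hook might gate cert creation. It assumes the hook has the signature suggested above and that the default refuses binary data for every cert name. is_valid_utf8 is a plain validator written for this example. It rejects overlong forms, surrogates, and code points above U+10FFFF. None of these names are existing monotone functions.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Plain UTF-8 validator written for this sketch. Rejects stray continuation
// bytes, truncated sequences, overlong forms, UTF-16 surrogates, and
// code points above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (c < 0x80) { ++i; continue; }
        else if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
        else return false;  // continuation bytes, 0xC0/0xC1 (always overlong), 0xF5-0xFF
        if (i + len > s.size()) return false;  // truncated sequence
        for (std::size_t j = 1; j < len; ++j) {
            const unsigned char cc = static_cast<unsigned char>(s[i + j]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        i += len;
    }
    return true;
}

// Hypothetical hook matching the signature floated in the thread:
// may certs with this name carry non-UTF-8 (binary) data?
using binary_data_okay_hook = std::function<bool(const std::string& cert_name)>;

// Assumed default: no cert may hold binary data unless a hook says so.
bool default_binary_data_okay(const std::string& /*cert_name*/) { return false; }

// Refuse to create a cert whose value is not valid UTF-8, unless the hook
// allows binary data for that cert name.
bool may_create_cert(const std::string& name, const std::string& value,
                     const binary_data_okay_hook& binary_data_okay) {
    return is_valid_utf8(value) || binary_data_okay(name);
}
```

A project that does want binary test-result certs could pass a hook such as [](const std::string& n) { return n == "testresult"; }. The cert name "testresult" is made up for illustration.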

> keep in mind, the current idea I'm gravitating towards for tests is
> not to encode their results into certs per-se, but only the approval
> or disapproval of an external rule ("no regressions since baseline
> XXX").

Ah, I can see that.

> (this item is here mostly because I'd like for it to remain possible
>  to view most cert values -- especially changelog entries -- on a
>  terminal)

Fair enough.  What does this mean exactly, in practice?  We refuse to
create certs whose contents are not valid UTF-8, unless allowed by
hook?  We refuse to import certs whose contents etc.?  We refuse to
display certs whose contents etc.?

(The last is probably not a bad idea anyway; terminal control codes
are scary things.)
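For the display case, here is a minimal sketch that escapes control bytes before a cert value is printed. It assumes the value is otherwise shown verbatim. Newline and tab pass through unchanged. Other C0 controls (including ESC) and DEL are rewritten as \xNN. C1 controls and encoding problems are deliberately left alone. escape_for_terminal is a hypothetical name.

```cpp
#include <cstdio>
#include <string>

// Escape bytes a terminal might act on before printing a cert value.
// Newline and tab pass through; other C0 controls (0x00-0x1F, including
// ESC) and DEL (0x7F) become a visible \xNN. Sketch only: C1 controls
// and invalid UTF-8 are not handled.
std::string escape_for_terminal(const std::string& value) {
    std::string out;
    for (const char ch : value) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if ((c < 0x20 && c != '\n' && c != '\t') || c == 0x7F) {
            char buf[5];  // backslash, 'x', two hex digits, terminating NUL
            std::snprintf(buf, sizeof buf, "\\x%02X", static_cast<unsigned>(c));
            out += buf;
        } else {
            out += ch;
        }
    }
    return out;
}
```

With this in place, a changelog entry containing ESC [2J prints as the literal text \x1B[2J instead of clearing the screen.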

-- Nathaniel

-- 
Eternity is very long, especially towards the end.
  -- Woody Allen



