Re: [Monotone-devel] sketch of i18n specification
From: Nathaniel Smith
Subject: Re: [Monotone-devel] sketch of i18n specification
Date: Thu, 20 Nov 2003 22:48:50 -0800
User-agent: Mutt/1.5.4i

On Thu, Nov 20, 2003 at 09:50:47AM -0500, graydon hoare wrote:
> Nathaniel Smith <address@hidden> writes:
>
> > On Tue, Nov 18, 2003 at 11:58:30AM -0500, graydon hoare wrote:
> > ...
> > > - if a file has the persistent attribute "charset", its value will
> > > be used instead of the LC_CTYPE locale setting.
> >
> > That's kind of weird. If I always want a file to be in some
> > particular character set, why would I turn on character conversion in
> > the first place?
>
> *shrug* I thought some people might find themselves unhappy with
> things inferred from LC_CTYPE (or any $ENV setting); just thought I'd
> give a way to override it.

That seems like the sort of thing you want to use a hook for, not a
file attribute? LC_CTYPE is per-user; surely the overriding
mechanism should also be per-user? (In fact, one could just make the
default hook return $LC_CTYPE.)

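To make the "default hook returns $LC_CTYPE" idea concrete, here is a rough sketch in Python. The names charset_from_locale and default_charset_hook are made up for illustration and are not part of monotone, and the UTF-8 fallback for the C/POSIX locale is an arbitrary assumption. The lookup order (LC_ALL, then LC_CTYPE, then LANG) follows the usual POSIX precedence.

```python
import os


def charset_from_locale(locale_name, fallback="UTF-8"):
    """Return the codeset part of a POSIX locale name like 'ja_JP.EUC-JP@mod'.

    The fallback value is an assumption made for this sketch.
    """
    if not locale_name or locale_name in ("C", "POSIX"):
        return fallback
    # Drop an optional '@modifier' suffix, then keep what follows the '.'.
    base = locale_name.split("@", 1)[0]
    if "." not in base:
        return fallback
    return base.split(".", 1)[1] or fallback


def default_charset_hook(environ=None):
    """Hypothetical default hook: derive the charset from the environment."""
    env = os.environ if environ is None else environ
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return charset_from_locale(value)
    return "UTF-8"


def my_charset_hook(environ=None):
    """What a per-user override might look like: ignore the environment."""
    return "Shift_JIS"
```

A user unhappy with what gets inferred from the environment would then replace the hook in their own configuration, rather than attaching an attribute to the file that everyone else inherits.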
> > Is there any benefit to storing files in the repository with non-UTF8
> > encodings, but still allowing character conversion? (I guess for
> > those people whose LC_CTYPEs matched the repository encoding, this
> > would make sha1sum work again.)
>
> I suppose there might be. some people claim -- I've only heard it from
> japanese people, but maybe there are others -- that their character
> sets aren't all very well represented in unicode. perhaps the UTF-8
> transform would be unacceptably lossy to them.
>
> do you think it would be better *not* to specify the internal form,
> but let a hook / attribute select it? in other words, specify an
> incoming and an outgoing mapping function for each file?

Hrm, I'm really not sure. It might be simpler and closer to what
people expect; on the other hand, you end up with more worry about
lossy round-trips, which are (supposed to be) impossible with Unicode.
(Though I dunno about normalization and such affecting SHA1s.) And
simply saying that everything in the database is Unicode could also be
considered simpler.
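
The normalization worry is easy to demonstrate. The sketch below (Python, helper names invented here) shows two strings that render identically but hash differently because one is in composed form (NFC) and the other decomposed (NFD). It also includes a trivial check one could run to see whether a given byte string survives a decode/encode round trip in some charset.

```python
import hashlib
import unicodedata


def sha1_of_utf8(text):
    """SHA1 of the UTF-8 encoding of text, as a hex string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def round_trips(raw, charset):
    """True if decoding raw from charset and re-encoding restores the bytes."""
    return raw.decode(charset).encode(charset) == raw


# 'e with acute' as a single code point (NFC form).
composed = "caf\u00e9"
# The same visible text as 'e' followed by a combining acute accent (NFD).
decomposed = unicodedata.normalize("NFD", composed)

# Same text to a human reader, different bytes, hence different hashes.
hashes_differ = sha1_of_utf8(composed) != sha1_of_utf8(decomposed)
```

So if anything between the working copy and the database ever normalizes text, the SHA1 changes even though nothing visible did. That argues for treating the UTF-8 bytes as opaque once converted.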

Erk, here's a nasty case: what happens if I do a checkout with one
value for LC_CTYPE, then through mishap or oversight commit using a
different value? (Maybe the checkout happens from cron and uses a
different environment or something, I dunno, insert story here.)
Character set conversion could go really screwy. I guess this means
that MT/ needs to keep track of what character set each checked out
file is in... though I'm not sure how this would interact with
concurrent modifications to .mt-attrs, or how new files would be
handled.
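
To illustrate the failure and the bookkeeping fix, here is a small Python sketch. Everything in it is hypothetical: the dict stands in for whatever MT/ would record, the function names are invented, and falling back to the environment's charset for files with no record is just one guess at how new files might be handled. It does not address concurrent changes to .mt-attrs at all.

```python
def checkout(stored_utf8, charset, bookkeeping, path):
    """Convert stored UTF-8 to the user's charset and remember which one."""
    bookkeeping[path] = charset
    return stored_utf8.decode("utf-8").encode(charset)


def commit(working_bytes, bookkeeping, path, env_charset):
    """Convert working-copy bytes back to UTF-8 for storage.

    Uses the charset recorded at checkout; only files with no record
    (for example newly added ones) fall back to the current environment.
    """
    charset = bookkeeping.get(path, env_charset)
    return working_bytes.decode(charset).encode("utf-8")


stored = "caf\u00e9".encode("utf-8")
bookkeeping = {}
working = checkout(stored, "utf-8", bookkeeping, "README")

# Without bookkeeping: commit trusts a different LC_CTYPE (here Latin-1)
# and silently stores mojibake instead of the original text.
naive = working.decode("latin-1").encode("utf-8")

# With bookkeeping: the recorded charset wins over the environment.
careful = commit(working, bookkeeping, "README", "latin-1")
```

Note that the naive path raises no error at all; the corruption is silent. The opposite mismatch (checkout as Latin-1, commit as UTF-8) would at least fail loudly with a decoding error.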

This is all getting a bit murky; I'm tempted to just punt for now?
Line ending conversion hooks mean that monotone will have the basic
infrastructure for this built in, and a particular canonical
implementation can be added later once there are actual i18n-using
users? The fact that no other VCS (AFAIK) has this feature also tends
to make me wary of diving in and picking one; there's no experience,
and possibly not much demand...

> > > - subject to character set and line ending conversion unless
> > > overridden by a hook.
> >
> > Err, so I can't put binary data in a cert? What about test result
> > files and the like?
>
> ... "unless overridden by a hook"

Yeah, I saw that, I just didn't understand it :-).
The idea is to have bool binary_data_okay(string cert_name)?

> keep in mind, the current idea I'm gravitating towards for tests is
> not to encode their results into certs per-se, but only the approval
> or disapproval of an external rule ("no regressions since baseline
> XXX").

Ah, I can see that.

> (this item is here mostly because I'd like for it to remain possible
> to view most cert values -- especially changelog entries -- on a
> terminal)

Fair enough. What does this mean exactly, in practice? We refuse to
create certs whose contents are not valid UTF-8, unless allowed by
hook? We refuse to import certs whose contents etc.? We refuse to
display certs whose contents etc.?

(The last is probably not a bad idea anyway; terminal control codes
are scary things.)
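
For concreteness, the create/import check and the display filter could look roughly like this in Python. The hook name binary_data_okay is the one floated earlier in this message; its default of "never" and the other function names are assumptions for the sake of the sketch, not a proposal for the actual interface.

```python
import unicodedata


def binary_data_okay(cert_name):
    """Hypothetical hook; this default allows binary data in no cert."""
    return False


def cert_value_acceptable(cert_name, value, hook=binary_data_okay):
    """Accept a cert value (bytes) if the hook allows it or it is valid UTF-8."""
    if hook(cert_name):
        return True
    try:
        value.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def safe_for_terminal(text):
    """Replace control characters (Unicode category Cc) with '?'.

    Newline and tab are kept so that changelog entries stay readable.
    """
    return "".join(
        ch if ch in "\n\t" or unicodedata.category(ch) != "Cc" else "?"
        for ch in text
    )
```

Valid UTF-8 alone is not enough for the display case: an escape sequence is perfectly valid UTF-8. So the terminal filter is a separate step from the validity check, and probably wants to run regardless of what the hook says.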
-- Nathaniel
--
Eternity is very long, especially towards the end.
-- Woody Allen