Re: [Monotone-devel] sketch of i18n specification


From: graydon hoare
Subject: Re: [Monotone-devel] sketch of i18n specification
Date: 20 Nov 2003 11:17:20 -0500
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2

Ori Berger <address@hidden> writes:

> This list should also include '[','{',']','}' (things that need to be
> escaped in shell) if I understand the rationale. Also, Windows needs to
> escape "," (comma) in the shell.

hm, ok. I think the intention was more that some filesystem APIs
actually *interpret* said characters, but I just took the list from
the boost general_name definition. ',' is pretty useful actually,
since RCS/CVS files have ",v" on the end of them.

> This, together with the FIXME line above raises the question: What
> does one do when a filename _does_ contain said characters? Refusing
> to take it into monotone is a possible solution, but I think there's a
> better idea: Use a standard escaping mechanism, e.g., the URL %xx
> escaping. This way, the file can be manipulated, tracked,
> etc. However, if you try to check it out, you'll get a legal file name.

well, there are two reasons to prohibit funny characters:

 - you're worried they will do something funny on the filesystem
 - they're structurally incompatible with manifests

so far the only characters we've found which are structurally
incompatible with manifests are EOL and NUL characters, and as njs
pointed out, tools like md5sum and sha1sum already have an
(undocumented) standard way of handling those. reading the source,
they only escape '\' and '\n', and only prohibit '\0'. so, I
guess we can follow that convention. it can be buried in the manifest
i/o functions.
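
a rough sketch of that convention, in python for brevity. the
function names are made up for illustration and are not monotone
code; the leading '\' on the line mirrors what the coreutils tools
emit when a name needed escaping:

```python
from typing import Tuple


def escape_name(name: str) -> Tuple[str, bool]:
    """Escape a path for a manifest line; return (escaped, was_changed)."""
    if "\0" in name:
        # NUL cannot be represented at all, so it is rejected outright.
        raise ValueError("NUL is not allowed in a manifest path")
    # Backslash first, so the backslashes added for newlines are not doubled.
    escaped = name.replace("\\", "\\\\").replace("\n", "\\n")
    return escaped, escaped != name


def unescape_name(escaped: str) -> str:
    """Inverse of escape_name; rejects unknown or dangling escapes."""
    out = []
    i = 0
    while i < len(escaped):
        ch = escaped[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(escaped):
            raise ValueError("dangling backslash in manifest path")
        nxt = escaped[i + 1]
        if nxt == "n":
            out.append("\n")
        elif nxt == "\\":
            out.append("\\")
        else:
            raise ValueError("unknown escape: \\" + nxt)
        i += 2
    return "".join(out)


def format_line(digest: str, name: str) -> str:
    """Build one manifest line, flagging escaped names with a leading backslash."""
    escaped, changed = escape_name(name)
    prefix = "\\" if changed else ""
    return f"{prefix}{digest}  {escaped}"
```

names without '\' or newline pass through unchanged, so ordinary
manifests look the same as before.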

for other "prohibited" characters, perhaps it's enough just to warn
loudly (or fail) when they occur, and let the user explicitly permit
them with a command-line flag or hook. let the user shoot themselves
in the foot if they want, but make them pull the trigger.
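
as a sketch of that policy (python, with hypothetical names; the
character set below is only a placeholder taken from this thread,
since the real list isn't settled):

```python
import warnings

# Placeholder set only: the brackets and braces mentioned in this thread.
SUSPECT_CHARS = frozenset("[]{}")


def check_name(name, suspect=SUSPECT_CHARS, allow_suspect=False):
    """Fail on suspect characters unless the caller explicitly opts in."""
    found = sorted(set(name) & set(suspect))
    if not found:
        return name
    if not allow_suspect:
        # Default: refuse, and say which characters triggered the refusal.
        raise ValueError(f"suspect characters {found} in {name!r}")
    # The user pulled the trigger; still make noise about it.
    warnings.warn(f"accepting {name!r} despite suspect characters {found}")
    return name
```

the `allow_suspect` argument stands in for whatever flag or hook
ends up carrying the user's explicit permission.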

(interestingly, the manifests we read aren't quite like sha1sum's
 yet, since we haven't decided what to do about text vs. binary
 files. sha1sum marks "binary" files with a * in the second whitespace
 column after the digest, on platforms which differentiate
 text/binary)

> Now, add a certificate for the file that certifies a translation back
> to "standard" chars, for a given operating system. Thus, only if I
> trust a key that certifies a file as "unix-shell-safe", will it
> translate back to native chars on Unix; And only if I trust a key that
> certifies a file as "win32-shell-safe", will it translate back to
> native chars on Windows.

no, this is more complex than I want to implement (also certs are no
good for this because you'd have to re-cert on each change to the
file, and certs attach to file contents not path names).

> The Windows case can be similarly fixed: If the same directory
> contains two names that map to the same case-insensitive name, both
> should be given "win32-alternate-name", or neither will be checked
> out.

yeah. perhaps it suffices to have a hook called
"prohibited_name(name)" and a pair of encoder/decoder hooks for
massaging names, as it appears we're heading with character code
modification.
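
to make the encoder/decoder idea concrete, here's a sketch in python
using the %xx escaping Ori suggested, plus a check for names that
collide on a case-insensitive filesystem. `encode_name`,
`decode_name` and `case_collisions` are hypothetical stand-ins, not
actual hooks:

```python
from collections import defaultdict
from urllib.parse import quote, unquote


def encode_name(name: str) -> str:
    """Encoder: %xx-escape everything except unreserved characters and '/'."""
    return quote(name, safe="/")


def decode_name(stored: str) -> str:
    """Decoder: undo encode_name."""
    return unquote(stored)


def case_collisions(names):
    """Group paths that would map to the same name when case is ignored."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

a checkout on Windows could refuse (or rename) anything that
`case_collisions` reports, while other platforms ignore it.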

> >   - as an abbreviation, setting the persistent attribute "text" with
> >     value "true" will enable both character and line ending conversion.
> Don't do that. What if text, lineconv and charconv disagree? "Text"
> could be a command line abbreviation, but it shouldn't be a real
> certificate, I think.

ok. like I said, if we're heading down the road of "a pair of
encoder/decoder functions" it won't matter anyways. the only reason
manifests are any different from file contents is that sha1sum
calculates them too.

> >   - file SHA1 values are calculated from the internal form.
> Need to add to the doc that sha1sum won't be able to verify a manifest
> in this case. That's a reasonable price for cross-platform text files.

yeah, this all needs to be spelled out in ugly detail in the manual.
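
a small python sketch of why sha1sum stops agreeing. it assumes the
internal form is UTF-8 with LF line endings, which is my assumption
for illustration and not something fixed in this thread; the
function names are made up too:

```python
import hashlib


def to_internal(data: bytes, charset: str, eol: str) -> bytes:
    """Decoder: external bytes -> assumed internal form (UTF-8, LF)."""
    text = data.decode(charset).replace(eol, "\n")
    return text.encode("utf-8")


def to_external(data: bytes, charset: str, eol: str) -> bytes:
    """Encoder: internal form -> bytes in the platform's charset and EOL."""
    text = data.decode("utf-8").replace("\n", eol)
    return text.encode(charset)


def internal_sha1(data: bytes, charset: str, eol: str) -> str:
    """SHA1 of the internal form, independent of the on-disk encoding."""
    return hashlib.sha1(to_internal(data, charset, eol)).hexdigest()
```

the same text stored as latin-1/CRLF on one machine and UTF-8/LF on
another gets one `internal_sha1`, but running sha1sum over the
converted on-disk bytes gives a different digest.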

> What about programs that want to run monotone and parse the output?
> There should be some "-machinereadable" switch that makes all output
> suitable for use by other programs.

certainly, or else a hook. or else just tell people to set LC_CTYPE to
some ostensibly neutral locale name like "C" for machine interfacing.
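
for the locale route, a wrapper script might look something like
this python sketch. `neutral_env` and `run_for_parsing` are made-up
helper names; LC_ALL is dropped because it takes precedence over
LC_CTYPE when set:

```python
import os
import subprocess


def neutral_env(base=None):
    """Copy the environment and force LC_CTYPE to the "C" locale."""
    env = dict(os.environ if base is None else base)
    # LC_ALL overrides the individual LC_* variables, so remove it.
    env.pop("LC_ALL", None)
    env["LC_CTYPE"] = "C"
    return env


def run_for_parsing(argv):
    """Run a command under the neutral locale and return its raw stdout bytes."""
    result = subprocess.run(
        argv, env=neutral_env(), capture_output=True, check=True
    )
    return result.stdout
```

the caller gets bytes back and decides how to decode them, rather
than trusting whatever the user's locale would have produced.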

> > 6. cert values:
> >   - subject to character set and line ending conversion unless
> >     overridden by a hook.
> I'm not sure I understand this. When and how would the hook be applied?
> 
> Is binary zero a legal character in a cert value?

the hook would be applied when we're printing the cert value to the
screen, and also when reading input from a user, as a cert
value. think changelogs.

binary zero (or other bytes) would be legal if the hook said "don't do
character conversion on this, it's a binary file".

again though, this is fuzzy. maybe a pair of encoder/decoder functions
is the right route. it's certainly the most flexible.

-graydon




