[Arx-users] Re: [Fwd: i18n and file systems]
Wed, 21 Dec 2005 16:00:14 -0800 (PST)
Kevin Smith <address@hidden> wrote:
> I haven't had a chance to read your repo II email, but I thought this
> was an interesting and relevant proposal on the bzr list. It hasn't yet
> drawn significant support or opposition. Feel free to move this to the
> arx list if you think it's appropriate there.
Thanks. I had seen this when it came out. I think that bzr has more
problems with filenames because it stores the weave in file names that
match the file in the working copy. ArX stores everything as
changesets, so this does not come up as much. As for dealing with
cases where a file cannot be created on a particular filesystem, I
think ArX will already give a (cryptic) error message. It most likely
will die when unpacking the tarball or when creating a file during
> -------- Original Message --------
> Subject: i18n and file systems
> Date: Tue, 13 Dec 2005 16:46:58 +1100
> From: Robert Collins <address@hidden>
> To: address@hidden <address@hidden>
> CC: Alexander Belchenko <address@hidden>
> Hi, in debugging a recent problem with jbailey's automatic debian
> packages we found an interesting problem.
> When LANG=C, the test suite fails to pass:
> ERROR: test_commit_template (bzrlib.tests.test_msgeditor.MsgEditorTest)
> log from this test:
> Traceback (most recent call last):
> line 40, in test_commit_template
> working_tree = self.make_uncommitted_tree()
> TypeError: make_uncommitted_tree() takes no arguments (1 given)
> http://people.ubuntu.com/~robertc/baz2.0/tests-no-locale should trigger
> this on everyone's system.
> Martin is currently disabling the specific test when it can't run (which
> is appropriate here).
> But it raises an interesting discussion we've kind of ignored. Firstly
> the background:
> Some file systems/platforms are unicode through and through - no matter
> what your terminal encoding is, the file system can still represent and
> return a unicode path. (Whether python figures this out and uses the
> appropriate APIs is a good question). Examples are NTFS (on win32)
> (IIRC), and HFS+ (with MacOSX). Let's call this unicode safe.
> Other file systems are 'code page' file systems - they essentially store
> just a byte string, and your user-space translation rule determines what
> that looks like. For instance, Linux's APIs are all just byte-strings;
> the actual meaning of any file path segment is all in the eye of the
> user. On Linux, try creating a unicode file name in a utf16 locale or a
> utf8 locale, and then switching to the other (or to something not even
> unicode, like one of the iso8859-x locales). Examples: all Linux-mounted
> file systems and VFAT are all I know about offhand. Let's call this
> unicode-sometimes.
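To illustrate the "unicode-sometimes" case above, here is a minimal Python
sketch (not taken from bzr or ArX) of how one stored byte string reads as two
different names depending on the encoding the user's locale implies. The file
name and the two encodings are assumptions picked for the example.

```python
# A name as the user typed it in a UTF-8 locale ('résumé.txt').
name = "r\u00e9sum\u00e9.txt"

# A byte-string file system API only ever stores these bytes.
stored = name.encode("utf-8")

# The same bytes, interpreted under two different locale encodings.
as_utf8 = stored.decode("utf-8")        # what the original user sees
as_latin1 = stored.decode("iso8859-1")  # what an ISO 8859-1 user sees

print(as_utf8 == name)    # the round trip is exact in the original locale
print(as_latin1 == name)  # the other locale sees a different name
```

Nothing on disk changed between the two reads; only the user-space
translation rule did.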
> There's a final category, which is platforms that cannot represent
> unicode in file paths at all - where the locale is non-unicode and you
> have a code page style file system api. Buggah. Let's call this non-unicode.
> Now, when you access a URL or use something like FTP, it gets even
> trickier, because the encoding of the file being served by (say) apache
> may not match the one that the user who wrote the file was using. This
> leads to URLs that cannot be predicted, and other such fun.
> Now for the interesting bits :).
> Firstly, I think we should be aiming to ensure that *no matter what*,
> files that bzr creates are named such that all such environments
> described above pun the filename as having the same value. That's
> essentially 7 bit ascii (the places this breaks are sufficiently far
> between IME that we can ignore them).
> At the moment we *may* do that but we should go further:
> * We should write tests that check that regardless of revision-id
> value, or file-id value, the stores do not request non-ascii characters
> of paths from the transport layer. (Volunteers sought!) This involves
> teaching the stores to escape for the transport as part of the
> id->filename mapping *before* the url encoding is put on.
> That means that no matter where it is, a .bzr dir and its contents will
> look the same to us, so we are insulated from the coding effects.
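The id->filename escaping proposed above could look something like the
following Python sketch. It is only an illustration under stated assumptions:
`escape_id` and `unescape_id` are hypothetical names, not bzr or ArX functions,
and the choice of safe characters (lowercase letters, digits, and "-_.", so
that case-insensitive file systems are also covered) is my own guess rather
than anything from the proposal.

```python
import string

# Hypothetical safe set: bytes that every file system discussed should
# store and return unchanged.
SAFE = set((string.ascii_lowercase + string.digits + "-_.").encode("ascii"))


def escape_id(identifier: str) -> str:
    """Map an arbitrary id to a 7-bit ASCII file name (hypothetical helper)."""
    out = []
    for byte in identifier.encode("utf-8"):
        if byte in SAFE:
            out.append(chr(byte))
        else:
            # Everything else, including '%' itself, becomes %xx.
            out.append("%%%02x" % byte)
    return "".join(out)


def unescape_id(name: str) -> str:
    """Invert escape_id (hypothetical helper)."""
    raw = bytearray()
    i = 0
    while i < len(name):
        if name[i] == "%":
            raw.append(int(name[i + 1:i + 3], 16))
            i += 3
        else:
            raw.append(ord(name[i]))
            i += 1
    return raw.decode("utf-8")


print(escape_id("file-\u00e9"))
```

Because the escaping happens before any URL encoding, a transport only ever
sees ASCII names, and the mapping can be reversed without knowing the locale.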
> Secondly, the working tree is controlled by the users content, and there
> are many ways this can be broken: they can change their locale between
> runs of bzr; they can try to branch a branch that has unicode file names
> on a non-unicode platform. I think we can catch most of these errors and
> For instance, if status sees some big % of files disappear, and a large
> number of unknowns, it could try a couple of the unknowns and recode the
> relative path - that might just become valid known paths. Likewise if
> you branch a branch that needs unicode support on a non-unicode platform
> we should give a good error.
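The recoding idea for status can be sketched roughly as below. This is not
bzr's implementation; `recover_paths` and its parameters are hypothetical, and
it assumes that the missing paths come from the inventory as unicode strings
while the unknown paths were decoded from the file system with the current
locale's encoding.

```python
def recover_paths(missing, unknown, current_encoding, guesses):
    """Match unknown paths to missing ones by recoding (hypothetical helper).

    missing: paths recorded in the tree that no longer appear on disk
    unknown: paths on disk, as decoded with current_encoding
    guesses: encodings the tree may originally have been written with
    """
    missing = set(missing)
    matches = {}
    for path in unknown:
        try:
            # Recover the raw bytes the file system actually holds.
            raw = path.encode(current_encoding)
        except UnicodeEncodeError:
            continue
        for encoding in guesses:
            try:
                candidate = raw.decode(encoding)
            except UnicodeDecodeError:
                continue
            if candidate in missing:
                matches[path] = candidate
                break
    return matches


known = "r\u00e9sum\u00e9.txt"
# The name as it appears when UTF-8 bytes are read in an ISO 8859-1 locale.
garbled = known.encode("utf-8").decode("iso8859-1")
found = recover_paths([known], [garbled, "notes.txt"], "iso8859-1", ["utf-8"])
print(found == {garbled: known})
```

A real check would only be attempted when a large share of files went missing
at once, as suggested above, since trying every encoding on every unknown
path is guesswork.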
> If someone wants to do up a wiki page and track the status of this, or
> even better start some tests, that would rock!
> Alexander - I explicitly copied you because I think you probably have
> the most complex setup of a bzr contributor at the moment, and are ideal
> to provide input/testing into this.
> GPG key available at: <http://www.robertcollins.net/keys.txt>.
- [Arx-users] Re: [Fwd: i18n and file systems],
Walter Landry <=