From: Tom Lord
Subject: Re: [Gnu-arch-users] Increasing the filename space (Or: begging for trouble?)
Date: Tue, 3 Feb 2004 08:12:15 -0800 (PST)

    > From: Christian Thäter <address@hidden>

    >> static int
    >> contains_illegal_character (char * filename)
    [....]

    > The function above will be changed (or maybe completely
    > removed).

I'd appreciate it if it were parameterized by a regexp that can be set
in =tagging-method.  The default should not change.  The main idea is
that tree-lint really is useful as a lint-like tool for filenames.  The
secondary idea is that this is an upward-compatible way to make the
change.
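
A rough sketch of what "parameterized by a regexp" could look like,
using POSIX regcomp/regexec.  The function name, the default pattern,
and the return convention are all made up for illustration; this is
not tla's code, and how the pattern would be read out of
=tagging-method is left out entirely.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical default: any byte outside a conservative set is flagged.
 * This is an assumed pattern, NOT the character list tla actually uses. */
static const char default_illegal_re[] = "[^A-Za-z0-9+,._=%/-]";

/* Returns 1 if FILENAME matches the "illegal" regexp, 0 if it does not,
 * and -1 if the regexp does not compile.  Passing a NULL PATTERN selects
 * the built-in default, so existing trees see no change in behavior. */
static int
filename_is_illegal (const char * filename, const char * pattern)
{
  regex_t re;
  int status;

  assert (filename != NULL);
  if (!pattern)
    pattern = default_illegal_re;

  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;

  status = regexec (&re, filename, 0, NULL, 0);
  regfree (&re);
  return status == 0;
}
```

With no pattern supplied, "hello world.c" is flagged because of the
space; a tree that sets a looser pattern such as "[*?]" would accept it.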

As mtr put it:

   >> 3) If my assumption about the intention of the =tagging-method regexps
   >>    is right, then _those_regexps_ should be the controlling variables,
   >>    _not_ a statically compiled and very restrictive set of characters.


You remark:

   > A first review release will be available within the next few
   > days.

I realize that the bulk of this is a global change that really wants
merging all at once, but it sure would be swell to break it up into
independently useful and independently testable parts and merge them
in increments, a few days apart, both to simplify review and to get
test-by-using at finer granularity.

For example, it might make sense to do:

   ~ illegal_character generalization
   ~ hackerlab parts and escape-on-writing
     (should, functionally, be a no-op -- or able to generate
      momentarily unreadable revisions for users who change
      the illegal-character set too early)
   ~ unescape-on-parsing  (enables the whole shebang)

(All of those can hopefully go in a single preX release -- I'm just
talking about staggering them into the head revision between preX
releases.)
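
To make the escape-on-writing / unescape-on-parsing split concrete,
here is a minimal sketch.  The \xHH escape form, the "safe" byte set,
and both function names are assumptions for the sake of the example;
they are not the format or the hackerlab interfaces under discussion.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed safe set: printable ASCII except space and backslash. */
static int
is_safe_byte (unsigned char c)
{
  return (c > 0x20 && c < 0x7f && c != '\\');
}

/* Escape-on-writing.  Returns a malloc'd string, or NULL if malloc fails. */
static char *
escape_filename (const char * name)
{
  static const char hex[] = "0123456789abcdef";
  char * out;
  char * p;

  assert (name != NULL);
  out = malloc (4 * strlen (name) + 1);  /* worst case: all bytes escaped */
  if (!out)
    return NULL;
  for (p = out; *name; ++name)
    {
      unsigned char c = (unsigned char) *name;
      if (is_safe_byte (c))
        *p++ = (char) c;
      else
        {
          *p++ = '\\';
          *p++ = 'x';
          *p++ = hex[c >> 4];
          *p++ = hex[c & 0xf];
        }
    }
  *p = 0;
  return out;
}

static int
hex_value (char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Unescape-on-parsing.  Returns a malloc'd string, or NULL on a malformed
 * escape, an escaped NUL byte, or allocation failure. */
static char *
unescape_filename (const char * text)
{
  char * out;
  char * p;
  int hi, lo;

  assert (text != NULL);
  out = malloc (strlen (text) + 1);      /* output is never longer */
  if (!out)
    return NULL;
  p = out;
  while (*text)
    {
      if (*text != '\\')
        {
          *p++ = *text++;
          continue;
        }
      if (text[1] != 'x'
          || (hi = hex_value (text[2])) < 0
          || (lo = hex_value (text[3])) < 0
          || (hi == 0 && lo == 0))
        {
          free (out);
          return NULL;
        }
      *p++ = (char) (hi * 16 + lo);
      text += 4;
    }
  *p = 0;
  return out;
}
```

For names that contain only safe bytes the escaper returns the name
unchanged, which is why landing it first should be a functional no-op.
Only once the parser side is merged do names with other bytes
round-trip.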


   > unless you use utf-8 or a similar international encoding your
   > repository will be unportable, because it depends on the actual used
   > encoding scheme for such letters, which might be different at someone
   > who tries to check it out.  Note: utf-8 is not supported in tla yet.
   [....]
   > in short: having international characters is more problematic
   > than just a crude workaround at one place. Usually programming
   > sourcefile names should be 7-bit ascii anyways. There is a need
   > for files like documentation and so on. Unfortunately tla can't
   > handle that currently and fixing that in the way you suggested
   > might break archives/compatibility.

Yeah, it's quite a mess.

I think arch should simply stick to the stream-of-bytes view of
filenames.  That enables UTF-8 filenames to operate perfectly well.

It does mean that index files in changesets and similar data files
will be permanently ugly for UTF-8 filenames.

It means that patch logs, the output of `inventory', etc. will have a
peculiar encoding form -- users are free to put UTF-8 into headers or
body content that they write, but filenames will look odd.  That
problem, however, can be fixed by a thin layer of transcoding on code
that spews these things for human consumption.
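
One possible shape for that thin layer, as a sketch only: filenames
stay raw bytes internally, and a display routine copies well-formed
UTF-8 through untouched while showing anything else as \xHH.  The
function names and the \xHH display form are invented here, and the
output is for reading only (it is not meant to be parsed back).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Length of the well-formed UTF-8 sequence starting at S, or 0 if the
 * bytes at S do not start one (or S points at the terminating NUL). */
static size_t
utf8_sequence_length (const unsigned char * s)
{
  size_t n, i;
  unsigned long cp;

  if (s[0] < 0x80)
    return s[0] ? 1 : 0;
  else if ((s[0] & 0xe0) == 0xc0) { n = 2; cp = s[0] & 0x1f; }
  else if ((s[0] & 0xf0) == 0xe0) { n = 3; cp = s[0] & 0x0f; }
  else if ((s[0] & 0xf8) == 0xf0) { n = 4; cp = s[0] & 0x07; }
  else
    return 0;

  for (i = 1; i < n; ++i)
    {
      if ((s[i] & 0xc0) != 0x80)         /* also stops at a NUL */
        return 0;
      cp = (cp << 6) | (s[i] & 0x3f);
    }
  /* Reject overlong encodings, surrogates, and out-of-range values. */
  if ((n == 2 && cp < 0x80) || (n == 3 && cp < 0x800)
      || (n == 4 && cp < 0x10000))
    return 0;
  if ((cp >= 0xd800 && cp <= 0xdfff) || cp > 0x10ffff)
    return 0;
  return n;
}

/* Writes a human-readable form of NAME into OUT (OUT_SIZE bytes).
 * Returns 0 on success, -1 if OUT is too small. */
static int
filename_for_display (const char * name, char * out, size_t out_size)
{
  const unsigned char * s;
  size_t used = 0;

  assert (name != NULL && out != NULL);
  s = (const unsigned char *) name;
  while (*s)
    {
      size_t n = utf8_sequence_length (s);
      if (n == 1 && (*s < 0x20 || *s == 0x7f))
        n = 0;                           /* show control bytes escaped */
      if (n > 0)
        {
          if (used + n >= out_size)
            return -1;
          memcpy (out + used, s, n);
          used += n;
          s += n;
        }
      else
        {
          if (used + 4 >= out_size)
            return -1;
          snprintf (out + used, 5, "\\x%02x", (unsigned int) *s);
          used += 4;
          s += 1;
        }
    }
  if (used >= out_size)
    return -1;
  out[used] = 0;
  return 0;
}
```

A UTF-8 name comes out byte-for-byte as stored, while the same name in
Latin-1 (a lone 0xe4 byte) is shown as "Th\xe4ter", so the odd-looking
form only appears where the bytes are not valid UTF-8.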

-t



