[Gnu-arch-users] i18n filenames vs. escaping


From: Tom Lord
Subject: [Gnu-arch-users] i18n filenames vs. escaping
Date: Tue, 3 Feb 2004 10:38:38 -0800 (PST)

So, really, the question comes down to how much effort 
people want to put into working on internationalized filename support
in arch sooner rather than later.

The question is relevant because of the way that it interacts with the
work on spaces-in-filenames.


* the "later" plan

  Simply retain the restriction that filenames must not 
  contain non-ASCII characters.


* the "sooner" plan

  Arch doesn't perform file system access directly -- rather, it
  all goes through the `vu_' functions in libhackerlab.

  `vu_' allows filenames to go through a translation phase before
  being passed to functions like `open' or returned from functions
  like `readdir'.

  The right way to support extended character sets in arch is to 
  write `vu_' modules which will translate between the native 
  encodings of a local filesystem and UTF-8.   arch should use 
  UTF-8 filenames internally, in changesets, in log files, in 
  index files, etc.
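
  To make the translation idea concrete, here is a rough sketch in
  Python rather than C.  It is not the `vu_' API: the function names
  and the `native_encoding' parameter are hypothetical, and a real
  module would hook into `vu_' around calls like `open' and `readdir'.
  It only shows the two directions of conversion between a local
  filesystem encoding and UTF-8.

```python
# Hypothetical sketch of a filename translation layer; these names are
# illustrative and are not part of libhackerlab.

def native_to_utf8(name: bytes, native_encoding: str) -> bytes:
    """Convert a name as returned by the filesystem (e.g. from readdir)
    into the UTF-8 form used internally."""
    return name.decode(native_encoding).encode("utf-8")


def utf8_to_native(name: bytes, native_encoding: str) -> bytes:
    """Convert an internal UTF-8 name into the form expected by the
    local filesystem (e.g. before calling open)."""
    return name.decode("utf-8").encode(native_encoding)


if __name__ == "__main__":
    latin1_name = "caf\u00e9.txt".encode("latin-1")
    internal = native_to_utf8(latin1_name, "latin-1")
    # The round trip should give back the original native bytes.
    assert utf8_to_native(internal, "latin-1") == latin1_name
```

  When the local encoding is already UTF-8, both functions return
  their input unchanged for any valid name, which is the trivial case.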

  This has two immediate implications:

  a) The escaping code, if it is not restricted to ASCII characters,
     should assume a UTF-8 encoding, not an
     ASCII-plus-unspecified-8-bit-extension encoding.

  b) Volunteers are needed to begin writing the `vu_' layers.
     For the most part -- for those using UTF-8 locally -- this
     should be utterly trivial.   I might be willing to work on this
     myself but I'd prefer to find a volunteer.
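
  Implication (a) can be illustrated with a small Python sketch.  The
  escape syntax here (`\xNN' for spaces and control characters) is
  made up for the example and is not the syntax arch uses; the
  function name and the `allow_utf8' flag are likewise hypothetical.
  The point is only the difference between treating input as UTF-8
  and keeping a hard 7-bit limit.

```python
# Hypothetical escaping sketch; the \xNN syntax is an assumption made
# for illustration, not arch's actual escape format.

def escape_filename(name: bytes, allow_utf8: bool = True) -> str:
    """Escape spaces, backslashes and control characters in a filename.

    With allow_utf8=True the input must be valid UTF-8 and non-ASCII
    characters pass through unescaped.  With allow_utf8=False any
    non-ASCII byte is rejected (the 7-bit restriction).
    Invalid input raises UnicodeDecodeError in both modes.
    """
    text = name.decode("utf-8" if allow_utf8 else "ascii")
    out = []
    for ch in text:
        code = ord(ch)
        if ch == "\\":
            out.append("\\\\")
        elif ch == " " or code < 0x20 or code == 0x7F:
            out.append("\\x%02x" % code)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(escape_filename(b"my file.txt"))
    print(escape_filename("caf\u00e9 menu".encode("utf-8")))
```

  Because the UTF-8 mode decodes whole characters before escaping, a
  multi-byte sequence is never split by the escaper, which is what an
  ASCII-plus-unspecified-8-bit assumption cannot guarantee.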


In light of this:

1) The ideal is that someone wants to help work on the `vu_' translation 
   module and that the escaping code assumes UTF-8 from day 1.

2) Acceptable, I suppose, is that the escaping code retains the
   hard-coded 7-bit limitation.  This will mean more work (to fix
   escaping) when UTF-8 filename support is added later and also means
   that the escaping code doesn't give us "extended character set
   filenames for free".

-t



