
[bug-gnulib] Re: ISSLASH on Woe32


From: Tor Lillqvist
Subject: [bug-gnulib] Re: ISSLASH on Woe32
Date: Thu, 28 Apr 2005 04:18:21 +0300

>> the byte 0x5C occurs as second byte of some multibyte characters. If such a
>> character is used inside a directory name, code that uses ISSLASH does not
>> work correctly. All gnulib modules that use ISSLASH are affected.
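A minimal C sketch of that failure, assuming the Shift_JIS (CP932) code page, where the kanji 表 is encoded as the bytes 0x95 0x5C. The lead-byte ranges (0x81 to 0x9F and 0xE0 to 0xFC) are hard-coded here as an assumption; real Woe32 code would ask the active code page instead, for example with IsDBCSLeadByte(). Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: CP932 (Shift_JIS) lead-byte ranges, hard-coded for illustration. */
static int sjis_is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Naive scan: every 0x5C byte counts as a directory separator. */
const char *last_slash_naive(const char *path)
{
    const char *last = NULL;
    assert(path != NULL);
    for (; *path != '\0'; path++)
        if (*path == '/' || *path == '\\')
            last = path;
    return last;
}

/* DBCS-aware scan: a lead byte and its trail byte are skipped together,
   so a trail byte of 0x5C is never mistaken for a separator. */
const char *last_slash_sjis(const char *path)
{
    const char *last = NULL;
    assert(path != NULL);
    while (*path != '\0') {
        if (sjis_is_lead_byte((unsigned char)*path) && path[1] != '\0') {
            path += 2;
            continue;
        }
        if (*path == '/' || *path == '\\')
            last = path;
        path++;
    }
    return last;
}
```

For the bytes "dir\" followed by 表 and "x", the naive scan reports the trail byte of 表 as the last separator, while the DBCS-aware scan correctly reports the real backslash after "dir".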

>Could this also be a problem on Unix systems using multibyte encoded
>(UTF-8) filesystems, if not now then in the future? 

Nope. Unix kernels/filesystems don't care at all what encoding the
file names are in. Encodings are handled in userspace. The only thing
that matters is that a '/' (0x2F) or '\0' byte can't be part of a
directory entry name. I don't think this is going to change.

>Maybe some (future) Unix systems support multi-byte encoded filenames
>containing 0x3F in the second+ byte of a multi-byte character.

0x3F is '?'. You mean '/', 0x2F? I very much doubt that.
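Why UTF-8 is safe in this respect can be checked mechanically: every byte of a multibyte UTF-8 sequence is 0x80 or above, so neither '/' (0x2F) nor '\0' (nor 0x5C, unlike Shift_JIS) can appear inside one. The following C sketch uses a hypothetical hand-written encoder, not a library routine, and verifies the property for every Unicode scalar value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode scalar value as UTF-8 and
   return the number of bytes written (1 to 4). */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Count ASCII-range bytes (0x00..0x7F) that occur inside multibyte
   encodings; a result of 0 means '/', '\0' and '\\' are always safe. */
unsigned long count_ascii_in_multibyte(void)
{
    unsigned long bad = 0;
    for (uint32_t cp = 0; cp <= 0x10FFFF; cp++) {
        unsigned char buf[4];
        size_t n;
        if (cp >= 0xD800 && cp <= 0xDFFF)
            continue;                      /* surrogates are not encodable */
        n = utf8_encode(cp, buf);
        if (n == 1)
            continue;                      /* plain ASCII */
        for (size_t i = 0; i < n; i++)
            if (buf[i] < 0x80)
                bad++;
    }
    return bad;
}
```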

>It's probably best to choose one internal representation of pathnames
>and stick to it, but any representation other than single 'char' is a
>lot of work, as you say!

Well, UTF-8 is the one that causes the fewest problems for legacy code,
IMHO, even though it is a variable-length multi-byte representation. As
long as the code doesn't try to do things like split file names at
arbitrary byte offsets within a path component, or case-convert
single bytes, legacy code just works.
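As a sketch of what splitting at an arbitrary byte offset breaks, the hypothetical helper below shortens a UTF-8 string to at most `limit` bytes without cutting a character in half, by backing up over continuation bytes (those of the form 10xxxxxx). It assumes the input is valid UTF-8. Splitting at a '/' byte never needs this treatment, since 0x2F cannot occur inside a multibyte sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the largest n <= limit such that the first
   n bytes of s end on a character boundary. Assumes s is valid UTF-8. */
size_t utf8_safe_cut(const char *s, size_t limit)
{
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    if (limit >= len)
        return len;
    /* s[limit] is the first byte that would be dropped. If it is a
       continuation byte, the cut falls inside a character: back up. */
    while (limit > 0 && ((unsigned char)s[limit] & 0xC0) == 0x80)
        limit--;
    return limit;
}
```

For "café" encoded as UTF-8 (5 bytes, with é as 0xC3 0xA9), a 4-byte limit yields 3, dropping the whole é rather than leaving a dangling 0xC3.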

The issues that need work when porting to Windows are the obvious ones:
accepting both '/' and '\\' as directory separator and handling the
multitude of roots. On Unix leading slash(es) indicate an absolute
pathname, while on Windows it can be any of \, X:\, \\server\share\,
\\?\X:\ or \\?\UNC\server\share\, where the backslashes in most
cases(?) can also be slashes. I haven't checked whether freely mixing
slashes and backslashes as in monstrosities like
//?\UNC/server\share\dir/file.foo would actually work, though.
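A minimal C sketch of recognizing the roots listed above, written as a hypothetical `win_root_length()` that returns the length of the root prefix (0 for a relative path). It accepts '/' and '\\' interchangeably everywhere, which is an assumption: Microsoft's documentation says \\?\ paths are passed through without normalization, so Windows itself may reject slashes there. Drive-relative paths such as "X:foo" and device paths such as \\.\ are deliberately not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* ASSUMPTION: both separators are accepted in every position. */
#define IS_SEP(c) ((c) == '/' || (c) == '\\')

/* Return a pointer to the first separator or NUL at or after p. */
static const char *skip_component(const char *p)
{
    while (*p != '\0' && !IS_SEP(*p))
        p++;
    return p;
}

/* p points at "server<sep>share...": return the length of
   "server<sep>share<sep>" (final separator optional), or 0 if malformed. */
static size_t unc_tail_length(const char *p)
{
    const char *q = skip_component(p);        /* end of server name */
    const char *share;

    if (q == p || !IS_SEP(*q))
        return 0;
    share = q + 1;
    q = skip_component(share);                /* end of share name */
    if (q == share)
        return 0;
    if (IS_SEP(*q))
        q++;
    return (size_t)(q - p);
}

/* Hypothetical: length of the root prefix of a Woe32 path, 0 if relative. */
size_t win_root_length(const char *path)
{
    assert(path != NULL);
    if (IS_SEP(path[0]) && IS_SEP(path[1])) {
        if (path[2] == '?' && IS_SEP(path[3])) {
            const char *p = path + 4;
            if (isalpha((unsigned char)p[0]) && p[1] == ':' && IS_SEP(p[2]))
                return 7;                     /* \\?\X:\ */
            if (strncmp(p, "UNC", 3) == 0 && IS_SEP(p[3])) {
                size_t n = unc_tail_length(p + 4);
                return n != 0 ? 8 + n : 0;    /* \\?\UNC\server\share\ */
            }
            return 0;
        }
        {
            size_t n = unc_tail_length(path + 2);
            return n != 0 ? 2 + n : 0;        /* \\server\share\ */
        }
    }
    if (isalpha((unsigned char)path[0]) && path[1] == ':' && IS_SEP(path[2]))
        return 3;                             /* X:\ */
    if (IS_SEP(path[0]))
        return 1;                             /* \ (root of current drive) */
    return 0;
}
```

With this sketch, the mixed-separator example //?\UNC/server\share\dir/file.foo yields a root of 21 characters, "//?\UNC/server\share\".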

I've never seen the \\?\ or \\?\UNC\server\share\ cases handled
in any Open Source code, and I certainly haven't bothered with them myself
either... But they are legal in the Unicode Win32 API (and in fact in
a sense they are the "canonical" way to specify absolute pathnames,
according to the docs), so if one is a perfectionist, one should. The
docs say that for normal paths the max length is 259 (drive letter,
colon, backslash, 256 chars), but if you prefix with \\?\, the Unicode
version of the API permits a path length of 32767.
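The conversion to the extended form can be sketched as follows. The hypothetical `to_extended_path()` rewrites X:\... as \\?\X:\... and \\server\share\... as \\?\UNC\server\share\..., turning slashes into backslashes because the extended form is not normalized. It assumes the input is already absolute and free of "." and ".." components; the result would still have to be converted to UTF-16 before being handed to a wide-char function such as CreateFileW().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: rewrite an absolute drive or UNC path into the "\\?\"
   form. Returns 0 on success, -1 if the path is relative, already
   extended, or does not fit in out (outsz bytes). */
int to_extended_path(const char *path, char *out, size_t outsz)
{
    const char *prefix;
    const char *rest;
    int n;

    assert(path != NULL && out != NULL);
    if (((path[0] >= 'A' && path[0] <= 'Z') || (path[0] >= 'a' && path[0] <= 'z'))
        && path[1] == ':' && (path[2] == '\\' || path[2] == '/')) {
        prefix = "\\\\?\\";            /* X:\...  ->  \\?\X:\... */
        rest = path;
    } else if ((path[0] == '\\' || path[0] == '/')
               && (path[1] == '\\' || path[1] == '/') && path[2] != '?') {
        prefix = "\\\\?\\UNC\\";       /* \\server\share\...  ->  \\?\UNC\server\share\... */
        rest = path + 2;
    } else {
        return -1;
    }
    n = snprintf(out, outsz, "%s%s", prefix, rest);
    if (n < 0 || (size_t)n >= outsz)
        return -1;
    /* The extended form is passed through unnormalized: backslashes only. */
    for (char *p = out; *p != '\0'; p++)
        if (*p == '/')
            *p = '\\';
    return 0;
}
```

A caller might apply this only when a path exceeds the 259-character limit, leaving short paths in their ordinary form.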

>Maybe the wrapper functions could avoid converting to and from UTF-16
>if they are running on WinME and earlier.

That's what GLib does:

int
g_open (const gchar *filename,
        int          flags,
        int          mode)
{
#ifdef G_OS_WIN32
  if (G_WIN32_HAVE_WIDECHAR_API ())
    {
      wchar_t *wfilename = g_utf8_to_utf16 (filename, -1, NULL, NULL, NULL);
      int retval;
      int save_errno;
      
      if (wfilename == NULL)
        {
          errno = EINVAL;
          return -1;
        }

      retval = _wopen (wfilename, flags, mode);
      save_errno = errno;

      g_free (wfilename);

      errno = save_errno;
      return retval;
    }
  else
    {    
      gchar *cp_filename = g_locale_from_utf8 (filename, -1, NULL, NULL, NULL);
      int retval;
      int save_errno;

      if (cp_filename == NULL)
        {
          errno = EINVAL;
          return -1;
        }

      retval = open (cp_filename, flags, mode);
      save_errno = errno;

      g_free (cp_filename);

      errno = save_errno;
      return retval;
    }
#else
  return open (filename, flags, mode);
#endif
}

G_WIN32_HAVE_WIDECHAR_API() is a run-time test for NT-based
Windows. wchar_t is a 16-bit unsigned type on Windows (in the Microsoft
C library, to be precise), and wchar_t strings are UTF-16. _wopen() is the
wide-char variant of open() in the C library. g_locale_from_utf8()
converts to the system codepage, which on Windows is either
single-byte or single/double-byte. (Unfortunately there is no UTF-8
codepage. Or actually, there is (65001), but it can't be the system
codepage.)
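For readers on Unix, where wchar_t is usually 32 bits, here is a minimal C sketch of the UTF-8 to UTF-16 step that g_utf8_to_utf16() performs in the code above. `utf8_to_utf16_units` is a hypothetical stand-in, not GLib's implementation: it uses uint16_t for the code units, encodes code points above U+FFFF as surrogate pairs, and rejects malformed input by returning -1, analogous to the NULL return that makes g_open() fail with EINVAL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: decode UTF-8 s into UTF-16 code units in out (capacity
   outcap units, including the terminating 0). Returns the number of units
   written, excluding the terminator, or -1 on invalid input or overflow. */
long utf8_to_utf16_units(const char *s, uint16_t *out, size_t outcap)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    assert(s != NULL && out != NULL);
    while (*p != '\0') {
        uint32_t cp;
        int extra;

        if (*p < 0x80)                     { cp = *p;        extra = 0; }
        else if (*p >= 0xC2 && *p <= 0xDF) { cp = *p & 0x1F; extra = 1; }
        else if (*p >= 0xE0 && *p <= 0xEF) { cp = *p & 0x0F; extra = 2; }
        else if (*p >= 0xF0 && *p <= 0xF4) { cp = *p & 0x07; extra = 3; }
        else return -1;                    /* invalid lead byte */
        p++;
        for (int i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)
                return -1;                 /* bad or missing continuation byte */
            cp = (cp << 6) | (*p & 0x3F);
        }
        /* Reject overlong forms, surrogates and values beyond U+10FFFF. */
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000)
            || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return -1;
        if (cp < 0x10000) {
            if (n + 1 >= outcap)
                return -1;
            out[n++] = (uint16_t)cp;
        } else {
            if (n + 2 >= outcap)
                return -1;
            cp -= 0x10000;                 /* split into a surrogate pair */
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    if (outcap == 0)
        return -1;
    out[n] = 0;
    return (long)n;
}
```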

>Anyway, Windows 95/98/ME is history. Not a platform worth caring about any
>more.

I agree. Unfortunately, "customers" think differently. There are still
lots of people struggling along with Win98 and using GTK+-based
software like GIMP or GAIM. Perhaps GTK+ 2.8 will no longer run on
Win9x/ME.

Cheers,
--tml




