bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] canonicalize-lgpl: Canonicalize casing too for MinGW.


From: Jan Nieuwenhuizen
Subject: Re: [PATCH v2] canonicalize-lgpl: Canonicalize casing too for MinGW.
Date: Fri, 10 Dec 2021 07:59:54 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Bruno Haible writes:

Hello Bruno,

Thanks for your quick and thougtful response.

TL;DR: Okay, well too bad.  I guess we and everyone else has to patch
their client code, then?  For the archives: a version 2 attached that
fixes two bugs.

>> * lib/canonicalize-lgpl.c (filesystem_name)[__MINGW32__]: New static
>> function.
>> (realpath_stk)[__MINGW32__]: Use it to return correct canonicalized
>> casing.
>
> I don't think this is desirable, because

Oh, too bad!  :-)

> 1) The 'realpath' function that canonicalize-lgpl.c implements is
>    specified to return "an absolute pathname that resolves to the same
>    directory entry, whose resolution does not involve '.', '..', or symbolic
>    links." [1] There is no guarantee in the spec that it prefers lowercase,
>    uppercase, or the case of the existing directory entry.

Yeah...but wasn't that documentation written long before someone could
have thought of the enormity of a "file system" that doesn't respect
casing?  That's what the name of the function suggests to me.

> 2) If we wanted to make this function consistent on all platforms, we would
>    also need to handle
>      - Linux with mounted VFAT file systems,
>      - macOS with case-insensitive HFS+,
>      - different locales on Windows (e.g. to recognize that 'ä' and 'Ä' are
>        equivalent in Windows installations with Western locales).
>    And, on macOS with HFS+, also the Unicode canonicalization (NFC vs. NFD).

If that is what we would really like, someone could work on fixing those
too, no need for this patch to fix all bugs at once?

> 3) By doing this, the function would be slowed down significantly. The
>    scandir() call that you added reads all directory entries of a certain
>    directory.

Yes, this is probably the biggest problem with this patch, although I
thought it might just have been alright: If you care a lot about speed,
you aren't using Windows anyway and I understood that STAT is already
pretty slow there.  On the other hand, in favor of this patch: If
correctness isn't all that important, I can think of a patch decreases
correctness and will dramatically speed up canonicalize_file_name ;-)

> What exactly do you want to do?

Maybe I patched the wrong function, I didn't know about realpath.  We
are using canonicalize-path in Guile, which uses canonicalize_file_name.

I think of a canonical form of something as a unique, authoritative
representation.  As long as canonicalize_file_name can return both
"/foo" and "/FOO", the name of the function is sloppy/wrong, or its
implementation (IMHO).  I guessed the latter, hence this "fix".

We are using the canonical form as an automatic include guard, to not
include the same file twice.  A user reported a bug: In a large project
they used an import of a file with alternative casing.  I was ready to
say: "Well, don't do that", but...yeah.

Instead of patching our client code and having everyone else who targets
Windows find this out and patch their client code, I wanted to help
and fix it in the Right Place(TM).

> If you want to look at the file name of
> an existing directory entry, let your program use scandir().
>
> Additionally,
>
>> +  int select_base (struct dirent const* entry)
>> +  {
>> +    return strcasecmp (entry->d_name, base) == 0;
>> +  }
>
> I would refrain from adding code that requires GCC and does not work with 
> MSVC.

Right, I will try to remember that.  That's pretty terrible here, we
would need a global (or thread local storage) I guess..  Didn't fix this
in v2, as this isn't going in anyway.

Greetings,
Janneke

>From e5e7e10927d5c20bf4ba0a20c7343b389bd192a2 Mon Sep 17 00:00:00 2001
From: "Jan (janneke) Nieuwenhuizen" <janneke@gnu.org>
Date: Thu, 9 Dec 2021 18:39:33 +0100
Subject: [PATCH v2] canonicalize-lgpl: Canonicalize casing too for MinGW.
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=UTF-8

* lib/canonicalize-lgpl.c (filesystem_name)[__MINGW32__]: New static
function.
(realpath_stk)[__MINGW32__]: Use it to return correct canonicalized
casing.  Also for relative file names, replace any '\\' with '/'.
* tests/test-canonicalize-lgpl.c (main)[__MINGW32__]: Test it.
---
 lib/canonicalize-lgpl.c        | 51 ++++++++++++++++++++++++++++++++++
 tests/test-canonicalize-lgpl.c | 12 ++++++++
 2 files changed, 63 insertions(+)

diff --git a/lib/canonicalize-lgpl.c b/lib/canonicalize-lgpl.c
index 92e9639720..895c335b68 100644
--- a/lib/canonicalize-lgpl.c
+++ b/lib/canonicalize-lgpl.c
@@ -41,6 +41,12 @@
 #include <intprops.h>
 #include <scratch_buffer.h>
 
+#if __MINGW32__
+#include <ctype.h>
+#include <dirent.h>
+#include <libgen.h>
+#endif
+
 #ifdef _LIBC
 # include <shlib-compat.h>
 # define GCC_LINT 1
@@ -180,6 +186,33 @@ get_path_max (void)
   return path_max < 0 ? 1024 : path_max <= IDX_MAX ? path_max : IDX_MAX;
 }
 
+#if __MINGW32__
+/* Return the basename of NAME as found on the filesystem, which may
+   or may not canonicalize the casing, or NULL if not found.  */
+static char *
+filesystem_name (char const *name)
+{
+  char base_buf[PATH_MAX];
+  strcpy (base_buf, name);
+  char *base = basename (base_buf);
+
+  int select_base (struct dirent const* entry)
+  {
+    return strcasecmp (entry->d_name, base) == 0;
+  }
+
+  char dir_buf[PATH_MAX];
+  strcpy (dir_buf, name);
+  char *dir = dirname (dir_buf);
+
+  struct dirent **name_list;
+  int i = scandir (dir, &name_list, select_base, NULL);
+  if (i == 1)
+    return name_list[0]->d_name;
+  return NULL;
+}
+#endif
+
 /* Act like __realpath (see below), with an additional argument
    rname_buf that can be used as temporary storage.
 
@@ -251,12 +284,25 @@ realpath_stk (const char *name, char *resolved,
             goto error_nomem;
           rname = rname_buf->data;
         }
+#if __MINGW32__
+      char *p = rname;
+      while (*p)
+        {
+          if (ISSLASH (*p))
+            *p = '/';
+          p++;
+        }
+#endif
       dest = __rawmemchr (rname, '\0');
       start = name;
       prefix_len = FILE_SYSTEM_PREFIX_LEN (rname);
     }
   else
     {
+#if __MINGW32__
+      if (HAS_DEVICE (rname))
+        rname[0] = toupper (rname[0]);
+#endif
       dest = __mempcpy (rname, name, prefix_len);
       *dest++ = '/';
       if (DOUBLE_SLASH_IS_DISTINCT_ROOT)
@@ -322,6 +368,11 @@ realpath_stk (const char *name, char *resolved,
             {
               buf = link_buffer.data;
               idx_t bufsize = link_buffer.length;
+#if __MINGW32__
+              char *fname = filesystem_name (rname);
+              if (fname)
+                strcpy (rname + strlen (rname) - strlen (fname), fname);
+#endif
               n = __readlink (rname, buf, bufsize - 1);
               if (n < bufsize - 1)
                 break;
diff --git a/tests/test-canonicalize-lgpl.c b/tests/test-canonicalize-lgpl.c
index c0a5a55150..afc637b073 100644
--- a/tests/test-canonicalize-lgpl.c
+++ b/tests/test-canonicalize-lgpl.c
@@ -279,6 +279,18 @@ main (void)
     free (result2);
   }
 
+#if __MINGW32__
+  /* Check that \\ are changed into / and casing is canonicalized. */
+  {
+    int fd = creat (BASE "/MinGW", 0600);
+    ASSERT (0 <= fd);
+    ASSERT (close (fd) == 0);
+
+    char *result = canonicalize_file_name (BASE "\\mingw");
+    ASSERT (strcmp (result, BASE "/MinGW"));
+    free (result);
+  }
+#endif
 
   /* Cleanup.  */
   ASSERT (remove (BASE "/droot") == 0);
-- 
Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org
Freelance IT http://JoyofSource.com | Avatar® http://AvatarAcademy.com

-- 
Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org
Freelance IT http://JoyofSource.com | Avatar® http://AvatarAcademy.com

reply via email to

[Prev in Thread] Current Thread [Next in Thread]