bug-global

Unicode file names on macOS


From: Bernd Rellermeyer
Subject: Unicode file names on macOS
Date: Sat, 26 Jun 2021 10:07:51 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0

I have the following problem with Unicode file names on macOS.

On macOS with an APFS file system, file names are stored as Unicode but are not normalized.  That means that on the file system, a file name can be in either normalization form C (NFC) or normalization form D (NFD).  Most applications have a normalization layer and use NFC internally, but save files in NFD.  Apparently this is not the case for some command line applications like GNU Global.

To reproduce: open the macOS TextEdit application, type ``#define nfd;`` and save the file as a plain text file named ``ä.c`` in an empty directory.  Now change into that directory on the command line and enter ``gtags`` and ``global -f ä.c``.  GNU Global tells you that ``ä.c`` is not a source file.  Now enter ``ls`` on the command line, copy the file name from the output and paste it as an argument to ``global -f``.  This time the file is found by GNU Global.

What happens is that the file name is in NFD on the file system, but in NFC when it is typed on the command line.  When copying the output of ``ls ä.c`` as an argument to ``global -f``, GNU Global again tells you that ``ä.c`` is not a source file.  The ``ls`` command apparently has a normalization layer that makes it accept file names in either normalization form as input.  The ``ls`` command with no arguments prints file names in the normalization form they have on the file system, whereas typing ``ls ä.c`` on the command line prints the file name in NFC, independent of its normalization form on the file system.

Files created on the command line itself are saved with file names in NFC.  After typing ``echo "#define nfc;" > ä.c``, ``gtags`` and ``global -f ä.c`` on the command line, the file name is in NFC on the file system.
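To illustrate why the two spellings of ``ä.c`` do not compare equal, here is a small Python sketch (not part of GNU Global, and the variable names are only illustrative).  It builds the same file name in both normalization forms with the standard ``unicodedata`` module and shows that the strings and their UTF-8 bytes differ, although they look identical when printed.

```python
import unicodedata

# "\u00e4" is the precomposed character "ä" (NFC).
name_nfc = unicodedata.normalize("NFC", "\u00e4.c")
# NFD decomposes it into "a" followed by U+0308 COMBINING DIAERESIS.
name_nfd = unicodedata.normalize("NFD", name_nfc)

# The two names render the same but are different code point sequences.
print(name_nfc == name_nfd)             # False
print(len(name_nfc), len(name_nfd))     # 3 4

# A byte-wise comparison of the UTF-8 encodings therefore fails as well.
print(name_nfc.encode("utf-8").hex())   # c3a42e63
print(name_nfd.encode("utf-8").hex())   # 61cc882e63
```

A program that compares file names byte by byte, without normalizing first, will treat these as two different names.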

I think GNU Global should also have some kind of normalization layer so that it accepts both normalization forms as input.
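As a rough idea of what such a layer could do, here is a minimal Python sketch.  It is only an assumption about one possible approach: GNU Global is written in C and I have not looked at how it stores or compares paths, so ``normalize_key`` and ``lookup`` are hypothetical names, not existing functions.  The idea is to compare names under a single normalization form (NFC here) while still returning the path exactly as it is stored.

```python
import unicodedata
from typing import Iterable, Optional


def normalize_key(path: str) -> str:
    """Map a path to a comparison key that is the same for NFC and NFD input."""
    return unicodedata.normalize("NFC", path)


def lookup(query: str, indexed_paths: Iterable[str]) -> Optional[str]:
    """Return the stored path matching the query in either form, or None.

    The stored spelling is returned unchanged, so it can still be used
    to open the file under the name it has on the file system.
    """
    by_key = {normalize_key(p): p for p in indexed_paths}
    return by_key.get(normalize_key(query))


# Hypothetical index holding one file whose name is stored in NFD.
indexed = ["./a\u0308.c"]

# A query typed in NFC still finds the NFD entry.
print(lookup("./\u00e4.c", indexed) == "./a\u0308.c")   # True
# An unrelated name is not found.
print(lookup("./b.c", indexed))                         # None
```

Normalizing only the comparison key, rather than rewriting the stored names, leaves the files on disk untouched.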

Kind regards

Bernd Rellermeyer



