Unicode file names on macOS
From: Bernd Rellermeyer
Subject: Unicode file names on macOS
Date: Sat, 26 Jun 2021 10:07:51 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0
I have the following problem with Unicode file names on macOS.
On macOS with an APFS file system, file names are Unicode encoded, but
not normalized. That means that on the file system, file names can be
either in normalization form C or in normalization form D. Most
applications have a normalization layer and use NFC internally, but save
files in NFD. Apparently this is not the case for some command line
applications like GNU Global. Open the macOS TextEdit application, type
``#define nfd;`` and save the file as a plain text file with name
``ä.c`` in an empty directory. Now change into that directory on the
command line and enter ``gtags`` and ``global -f ä.c``. GNU Global
tells you that ``ä.c`` is not a source file. Now enter ``ls`` on the
command line, copy the file name from the output and paste it as an
argument to ``global -f``. This time the file is found by GNU Global.
What happens is that the file name is in NFD on the file system, but in
NFC when typing the file name on the command line. When copying the
output of ``ls ä.c`` as an argument to ``global -f``, GNU Global again
tells you that ``ä.c`` is not a source file. The ``ls`` command
apparently has a normalization layer that makes it accept file names in
either normalization form as input. The ``ls`` command with no
arguments prints file names in their normalization form on the file
system, whereas typing ``ls ä.c`` on the command line prints the file
name in NFC, independent of its normalization form on the file system.
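To make the difference between the two forms visible, here is a small
sketch in Python. It is only an illustration and has nothing to do with
the GNU Global sources. It uses the standard ``unicodedata`` module, and
the variable names are my own.

```python
import unicodedata

# "ä.c" written with a precomposed ä (U+00E4), i.e. the NFC spelling.
name = "\u00e4.c"

nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)

# NFC keeps ä as one code point; NFD splits it into "a" followed by
# the combining diaeresis U+0308.
print([hex(ord(ch)) for ch in nfc])  # ['0xe4', '0x2e', '0x63']
print([hex(ord(ch)) for ch in nfd])  # ['0x61', '0x308', '0x2e', '0x63']

# The UTF-8 byte sequences differ, so a plain byte-wise comparison of
# the two spellings fails although both display as the same name.
print(nfc.encode("utf-8"))  # b'\xc3\xa4.c'
print(nfd.encode("utf-8"))  # b'a\xcc\x88.c'
print(nfc == nfd)           # False
```

A program that compares file names byte by byte therefore treats the
two spellings as different names.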
On the command line itself, files are saved with file names in NFC.
When typing ``echo "#define nfc;" > ä.c``, ``gtags`` and ``global -f
ä.c`` on the command line, the file name is in NFC on the file system.
I think GNU Global should also have some kind of normalization layer
that accepts both normalization forms as input.
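A minimal sketch of what I mean by a normalization layer, again in
Python and only as an illustration: GNU Global is written in C, and I do
not know how it looks up file names internally. The helper names
``match_normalized`` and ``resolve`` are hypothetical. The idea is to
normalize both the requested name and the stored names to one form
before comparing, and to return the spelling that is actually stored.

```python
import os
import unicodedata


def match_normalized(requested, candidates):
    """Return the candidate equal to `requested` up to normalization, else None."""
    wanted = unicodedata.normalize("NFC", requested)
    for candidate in candidates:
        if unicodedata.normalize("NFC", candidate) == wanted:
            return candidate  # keep the spelling that is actually stored
    return None


def resolve(directory, requested):
    """Hypothetical lookup: match `requested` against the entries of `directory`."""
    return match_normalized(requested, os.listdir(directory))


# An NFC name typed on the command line matches an NFD name on disk.
print(match_normalized("\u00e4.c", ["a\u0308.c", "b.c"]) == "a\u0308.c")  # True
```

Normalizing to NFC for the comparison is an arbitrary choice here; NFD
would work just as well, as long as both sides use the same form.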
Kind regards
Bernd Rellermeyer