bug-global

Re: Unicode file names on macOS


From: Shigio YAMAGUCHI
Subject: Re: Unicode file names on macOS
Date: Mon, 28 Jun 2021 12:47:30 +0900

Hello,
> I think, GNU Global should also have a kind of normalization layer to
> accept both normalization forms as input.

Rather, isn't this a bug in TextEdit?
It seems that TextEdit converts the single character 'ä' (c3a4) in the file
name into two characters: 'a' (61) + the combining diaeresis U+0308 (cc88).
This doesn't make sense, since you can use 'ä' (c3a4) directly as part of a
file name on APFS (I have tested this on macOS 10.15.7). Unless there is a
reason for the conversion, I think it is a bug.
What do you think?
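
For reference, the two byte sequences above can be reproduced with a minimal
Python sketch. It is not part of GNU Global and uses only the standard
``unicodedata`` module; the variable names are illustrative.

```python
import unicodedata

# Precomposed form (NFC): a single code point, U+00E4.
nfc = unicodedata.normalize("NFC", "\u00e4")
# Decomposed form (NFD): 'a' (U+0061) followed by combining diaeresis (U+0308).
nfd = unicodedata.normalize("NFD", "\u00e4")

print(nfc.encode("utf-8").hex())  # c3a4
print(nfd.encode("utf-8").hex())  # 61cc88

# The two strings render identically but differ byte for byte,
# so a plain string comparison treats them as different names.
print(nfc == nfd)  # False
```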

Regards,
Shigio

On Sat, Jun 26, 2021 at 10:13 PM Bernd Rellermeyer
<bernd.rellermeyer@arcor.de> wrote:
>
> I have the following problem with Unicode file names on macOS.
>
> On macOS with an APFS file system, file names are Unicode encoded, but
> not normalized.  That means that on the file system, file names can be
> either in normalization form C or in normalization form D.  Most
> applications have a normalization layer and use NFC internally, but save
> files in NFD.  Apparently this is not the case for some command line
> applications like GNU Global.  Open the macOS TextEdit application, type
> ``#define nfd;`` and save the file as a plain text file with name
> ``ä.c`` in an empty directory.  Now change into that directory on the
> command line and enter ``gtags`` and ``global -f ä.c``.  GNU Global
> tells you that ``ä.c`` is not a source file.  Now enter ``ls`` on the
> command line, copy the file name from the output and paste it as an
> argument to ``global -f``.  This time the file is found by GNU Global.
> What happens is that the file name is in NFD on the file system, but in
> NFC when typing the file name on the command line. When copying the
> output of ``ls ä.c`` as an argument to ``global -f``, GNU Global again
> tells you that ``ä.c`` is not a source file.  The ``ls`` command
> apparently has a normalization layer that makes it accept file names in
> either normalization form as input.  The ``ls`` command with no
> arguments prints file names in their normalization form on the file
> system, whereas typing ``ls ä.c`` on the command line prints the file
> name in NFC, independent of its normalization form on the file system.
> On the command line itself, files are saved with file names in NFC.
> When typing ``echo "#define nfc;" > ä.c``, ``gtags`` and ``global -f
> ä.c`` on the command line, the file name is in NFC on the file system.
>
> I think, GNU Global should also have a kind of normalization layer to
> accept both normalization forms as input.
>
> Kind regards
>
> Bernd Rellermeyer
>
>
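
The kind of normalization layer proposed in the quoted message could look
roughly like the following Python sketch. This is only an illustration under
stated assumptions: ``find_indexed_name`` is a hypothetical helper, not a
GNU Global function, and the list of stored names stands in for whatever the
tag database actually records.

```python
import unicodedata

def find_indexed_name(query, indexed_names, form="NFC"):
    """Return the stored name equal to `query` after normalization, or None.

    Hypothetical helper: both sides are brought to the same normalization
    form before comparing, so NFC and NFD spellings match each other.
    """
    wanted = unicodedata.normalize(form, query)
    for name in indexed_names:
        if unicodedata.normalize(form, name) == wanted:
            return name
    return None

stored = ["a\u0308.c"]   # name as saved in NFD (decomposed)
typed = "\u00e4.c"       # name as typed on the command line in NFC

print(typed in stored)                                  # False
print(find_indexed_name(typed, stored) == stored[0])    # True
```

Returning the stored spelling rather than the normalized query means the
caller can still open the file under the exact name recorded in the index.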


-- 
Shigio YAMAGUCHI <shigio@gnu.org>
PGP fingerprint:
26F6 31B4 3D62 4A92 7E6F  1C33 969C 3BE3 89DD A6EB


