bug-make

Re: [PATCH] Use UTF-8 active code page for Windows host.


From: Eli Zaretskii
Subject: Re: [PATCH] Use UTF-8 active code page for Windows host.
Date: Sun, 19 Mar 2023 19:01:55 +0200

> From: Costas Argyris <costas.argyris@gmail.com>
> Date: Sun, 19 Mar 2023 16:34:54 +0000
> Cc: bug-make@gnu.org, psmith@gnu.org
> 
> > OK, but how is the make.exe you produced built?
> 
> I actually did what you suggested but was somewhat confused by the
> result.  Usually I do this with 'ldd', but both msvcrt.dll and ucrtbase.dll
> show up in 'ldd make.exe' output, and I wasn't sure what to think of it.
> 
> However, your approach with objdump gives fewer results and only
> lists msvcrt.dll, not ucrtbase.dll:
> 
> C:\Users\cargyris\temp>objdump -p make.exe | grep "DLL Name:"
>         DLL Name: ADVAPI32.dll
>         DLL Name: KERNEL32.dll
>         DLL Name: msvcrt.dll
>         DLL Name: USER32.dll
> 
> So I guess MSVCRT is enough, i.e. no need for UCRT.

Yes, thanks.
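
For anyone who wants to script the check above rather than eyeball the
grep output, here is a minimal sketch.  It only parses text that
'objdump -p' has already produced; the function names and the SAMPLE
string are hypothetical, with the sample copied from the listing
quoted above.

```python
import re

def imported_dlls(objdump_output):
    """Return the DLL names listed in `objdump -p` text, in order."""
    # Each import shows up on its own line as "DLL Name: <file>".
    return re.findall(r"^\s*DLL Name:\s*(\S+)", objdump_output,
                      flags=re.MULTILINE)

def links_against(objdump_output, dll):
    """True if `dll` is among the imports (DLL names are case-insensitive)."""
    wanted = dll.lower()
    return any(name.lower() == wanted
               for name in imported_dlls(objdump_output))

# Sample text copied from the objdump listing quoted in this message.
SAMPLE = """\
        DLL Name: ADVAPI32.dll
        DLL Name: KERNEL32.dll
        DLL Name: msvcrt.dll
        DLL Name: USER32.dll
"""

if __name__ == "__main__":
    print(imported_dlls(SAMPLE))
    print("MSVCRT:", links_against(SAMPLE, "msvcrt.dll"))
    print("UCRT:  ", links_against(SAMPLE, "ucrtbase.dll"))
```

This looks only at the import table of the executable itself, which is
presumably why it reports fewer DLLs than 'ldd', which also follows
the dependencies of the imported DLLs.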

> > If you use, in a Makefile, file names with non-ASCII
> > characters outside of the current ANSI codepage, does Make succeed in
> > recognizing files mentioned in the Makefile whose letter-case is
> > different from what is seen in the file system?
> 
> I think it does, here is the experiment:
> 
> C:\Users\cargyris\temp>ls ❎
>  src.c
> 
> There is only src.c in that folder.
> 
> Makefile utf8.mk is UTF-8 encoded and has this content that
> checks for the existence of:
> 
> ❎\src.c
> ❎\src.C
> ❎\src.cs
> 
> where ❎ is outside the ANSI codepage (1252).

That's not a good experiment, IMO: the only non-ASCII character here
is U+274E, which has no case variants.  And the characters whose
letter-case you tried to change are all ASCII, so their case
conversions are unaffected by the locale.
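
To illustrate the point about case variants, here is a small sketch.
It uses Python's built-in Unicode case mappings, which are NOT what
the Windows C runtime does; it only shows which characters have case
variants at all.  The helper name and the choice of Cyrillic 'я'
(which is outside codepage 1252) are assumptions made for
illustration.

```python
def has_case_variants(ch):
    """True if Unicode defines a different upper- or lower-case form."""
    return ch.upper() != ch or ch.lower() != ch

CROSS_MARK = "\u274e"   # the character used in the experiment
CYRILLIC_YA = "\u044f"  # 'я', a cased letter outside codepage 1252

if __name__ == "__main__":
    for ch in (CROSS_MARK, CYRILLIC_YA, "c"):
        print("U+%04X has case variants: %s"
              % (ord(ch), has_case_variants(ch)))
```

A more telling experiment would therefore use a file whose name
contains a letter such as 'я', and refer to it from the Makefile as
'Я' (or the other way around).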

> If I understand this correctly, both src.c and src.C should be found,
> but not src.cs (just to show a negative case as well).

In addition, I'm not sure Make actually compares file names anywhere;
I think it just calls 'stat', and that is of course case-insensitive
(because the filesystem itself is case-insensitive at the base level).

My guess would be that only characters within the locale, defined by
the ANSI codepage, are supported by locale-aware functions in the C
runtime.  That's because this is what happens even if you use "wide"
Unicode APIs and/or functions like _wcsicmp that accept wchar_t
characters: they all support only the characters of the current locale
set by 'setlocale'.  I don't expect that to change just because UTF-8
is used on the outside: internally, everything is converted to UTF-16,
i.e. to the Windows flavor of wchar_t.
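
As a rough sketch of that internal conversion, the following shows
which UTF-16 code units a UTF-8 byte string turns into.  It uses
Python's codecs rather than the Windows APIs, so it only illustrates
the encoding step, not any locale-dependent behavior of the C runtime;
the function name is hypothetical.

```python
import struct

def utf16_units(utf8_bytes):
    """Decode UTF-8 bytes and return the equivalent UTF-16 code units."""
    text = utf8_bytes.decode("utf-8")
    raw = text.encode("utf-16-le")  # little-endian, no byte-order mark
    # Unpack the raw bytes as a sequence of unsigned 16-bit integers.
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

if __name__ == "__main__":
    # U+274E is 3 bytes in UTF-8 but a single 16-bit unit in UTF-16.
    print([hex(u) for u in utf16_units(b"\xe2\x9d\x8e")])
    # A character beyond the BMP needs two units (a surrogate pair).
    print([hex(u) for u in utf16_units("\U0001F600".encode("utf-8"))])
```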

> > Btw, there's one aspect where Make on MS-Windows will probably fall
> > short of modern Posix systems: the display of non-ASCII characters on
> > the screen.
> 
> Indeed, some thoughts on that:
> 
> 1) As you know, this only affects the visual aspect of the logs, not the
> inner workings of Make.  This could confuse users because they would
> be seeing "errors" on the screen, without there being any real errors.
> Perhaps a mention in the doc or release notes could remedy that.
> 
> 2) To some extent (maybe even completely, I don't know) this can be
> mitigated by using PowerShell instead of the classic Command Prompt.
> This seems to be working in this case at least:

This could be just sheer luck: PowerShell uses a font that supports
that particular character.  The basic problem here is that "Command
Prompt" windows don't allow configuring more than one font for
displaying characters, and a single font can never support more than a
few scripts.  If PowerShell doesn't allow more than a single font in
its windows, it will suffer from the same problem.

> If anything, it could be worth a mention in the doc.

Yes, of course.


