Re: Unicode string literals

bug-gnulib

From:	Daniel Richard G.
Subject:	Re: Unicode string literals
Date:	Fri, 01 May 2020 20:11:28 -0400
User-agent:	Cyrus-JMAP/3.3.0-dev0-351-g9981f4f-fmstable-20200421v1

Hi everyone, I've been watching this discussion.

On Fri, 2020 May  1 18:52-04:00, Bruno Haible wrote:
> 
> Yes, this is unlikely. In a world where people routinely do a "git pull" from
> upstream repositories and send patches or pull requests upstream, every
> automated downstream manipulation of the source code - even as small as
> transforming CR/LF to LF - becomes a PITA.

Much agreed.

For what it's worth, I'll mention the following points:

* XLC on z/OS does not appear to support u8"..." strings, either in my
  tests or in the documentation I've searched. The most I can confirm is
  support for u"..." (UTF-16) and U"..." (UTF-32) literals.

* When source code is brought in to a z/OS system for compilation, it is
  typically blanket-converted to e.g. IBM-1047 (which maps one-to-one to
  Latin-1) as the first step. Same for scripts and other files
  (binary blobs become a headache, yes). It is possible to coerce XLC to
  compile C source in ASCII encoding, but this never happens in
  practice, because the shell/make interpreters will choke on ASCII
  input well before that point.

* Having UTF-8 characters in a source file is awkward anyway, because
  the z/OS user environment itself does not support multibyte
  encodings. The typical (EBCDIC) encodings used are all single-byte.
  UTF-EBCDIC exists, but it is not a thing on z/OS.

* The general assumption is that programs running on z/OS may process
  UTF-8 data (multibyte functions are provided, iconv knows about UTF-8,
  etc.), but their interaction with the user environment is entirely
  through a single-byte encoding.

* Obviously, the set of users who interact with a mainframe directly
  through a Unix shell is very small, which is why encoding support in
  the z/OS user environment feels like a throwback to 1999.

* I'm not aware of many cases where string-literal encodings have been
  an issue on z/OS; the immediate example that comes to mind is
  gnulib/tests/test-iconv-utf.c, which requires test strings to be
  ASCII-encoded. You can see the use of XLC's "#pragma convert()" there.
  But routine scenarios, like getopt() option letters, don't need to do
  anything special to work as intended.

* If there are any tricky encoding-related issues you are trying to
  solve, I'm of course happy to try out proposed solutions :-)
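
To make the "#pragma convert()" point above concrete, here is a minimal
sketch (not the actual gnulib code) of keeping one test string
ASCII-encoded when the rest of the source is compiled as EBCDIC. The
guard condition, the "ISO8859-1" code page argument, and the names
ascii_input, expected_bytes and literal_is_ascii() are assumptions made
for this example; the exact pragma form should be checked against the
XLC documentation.

```c
#include <string.h>

/* Assumption: on an EBCDIC build with IBM's compiler, ask for the
   literals that follow to be encoded as ISO 8859-1, whose first 128
   code points coincide with ASCII.  Other compilers skip this.  */
#if defined __IBMC__ && 'A' != 0x41
# pragma convert("ISO8859-1")
# define CONVERT_ENABLED 1
#endif

/* Hypothetical test input that must reach the code under test as the
   ASCII bytes 0x41 0x53 0x43 0x49 0x49.  */
static const char ascii_input[] = "ASCII";

#ifdef CONVERT_ENABLED
/* Restore the previous literal encoding for the rest of the file.  */
# pragma convert(pop)
#endif

/* The same bytes spelled with hex escapes, which do not depend on the
   encoding of the source file.  */
static const char expected_bytes[] = "\x41\x53\x43\x49\x49";

/* Return 1 if ascii_input was compiled to ASCII bytes, 0 otherwise.  */
int
literal_is_ascii (void)
{
  return sizeof ascii_input == sizeof expected_bytes
         && memcmp (ascii_input, expected_bytes,
                    sizeof expected_bytes) == 0;
}
```

A test program could call literal_is_ascii() from main() and fail if it
returns 0. On an ASCII platform the pragma lines are never compiled, so
the check passes trivially; the hex-escape spelling is the fallback for
compilers that have no such pragma.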

--Daniel

-- 
Daniel Richard G. || address@hidden
My ASCII-art .sig got a bad case of Times New Roman.
