help-bash

Re: any plans for command substitution that preserves trailing newlines?


From: Koichi Murase
Subject: Re: any plans for command substitution that preserves trailing newlines?
Date: Tue, 1 Jun 2021 11:55:13 +0900

On Tue, 1 Jun 2021 at 10:34, Christoph Anton Mitterer <calestyo@scientia.net> wrote:
> Problem is that the trailing x solution doesn't seem to work reliably,
> see https://unix.stackexchange.com/a/383390/474076, section "About a
> trailing x.".

The solution seems to be given there as well: temporarily set LC_ALL=C
(though it is pointed out that this doesn't work with yash).
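For reference, a minimal sketch of the sentinel approach combined with
the temporary LC_ALL=C. The function "produce" is a hypothetical
stand-in for whatever command you want to capture; it is not from the
StackExchange answer. Note that the exit status of the command is lost
in this form.

```shell
# Hypothetical command whose output ends in two newlines.
produce() { printf 'data\n\n'; }

# Append a sentinel "x" so that command substitution has no trailing
# newlines left to strip.
out=$(produce; printf x)

# Remember whether LC_ALL was set, and its value, so it can be restored.
had_lc_all=${LC_ALL+set}
saved_lc_all=${LC_ALL-}

# In the C locale every byte is one character, so ${out%x} removes
# exactly the sentinel byte and nothing else.
LC_ALL=C
out=${out%x}

# Restore the previous state of LC_ALL.
if [ -n "$had_lc_all" ]; then
    LC_ALL=$saved_lc_all
else
    unset LC_ALL
fi
```

After this, "$out" holds the output of produce including both trailing
newlines.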

> Their idea is basically that adding an x might yield a new character
> (e.g. when added to a previously invalid UTF-8 string), and then cannot
> be removed afterwards.

There is no problem in UTF-8, where "x" never appears as a trailing
byte of a multibyte character. The StackExchange answer you linked to
mentions the character encodings BIG5, GB18030, and BIG5HKSCS.
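The UTF-8 part can be checked at the byte level: trailing
(continuation) bytes in UTF-8 always have the bit pattern 10xxxxxx,
i.e. \x80-\xBF, and "x" is \x78. A small sketch; the function name
is_utf8_trailing_byte is made up for illustration.

```shell
# Succeeds if the byte value $1 (0-255) has the bit pattern 10xxxxxx,
# i.e. could be a UTF-8 continuation (trailing) byte.
is_utf8_trailing_byte() {
    [ "$(( $1 & 0xC0 ))" -eq "$(( 0x80 ))" ]
}

# Numeric value of the ASCII character "x" (0x78).
x_byte=$(printf '%d' "'x")

if is_utf8_trailing_byte "$x_byte"; then
    echo "x can end a multibyte character"
else
    echo "x is always a character of its own"
fi
```

The same check fails for every ASCII byte, which is why the choice of
sentinel doesn't matter in a UTF-8 locale.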

> But I couldn't reproduce their problems and for me the sentinel value
> just worked, though I only tried this in a UTF-8 locale.

As I've written already, UTF-8 doesn't have a problem.

> Can someone (Chet?) confirm that the solution with adding *any*
> character and removing it later on works (i.e. with any locale and any
> valid variable content, which is, AFAIU, anything but NUL)?

Do you count misencoded strings as "valid" variable content? As long
as the data is correctly encoded in the current LC_CTYPE, it should
always work as expected.

> Or does this work with just some characters like claimed in some posts
> on stackoverflow?

Another StackExchange answer says that "x" is affected but "." isn't
(as far as its author tested on Debian, FreeBSD, and Solaris), but
this is not really a robust statement. In theory, ISO/IEC 2022
encodings allow the meanings of C0 (\x00-\x1F), GL (\x21-\x7E), C1
(\x80-\x9F), and GR (\xA0-\xFF) to be changed by locking-shift escape
sequences. In particular, every bit combination (i.e. byte) in GL,
which contains ASCII "." and "x", can be used as a trailing byte of
94^n character sets (as in LC_CTYPE=ja_JP.ISO-2022-JP). The only two
bit combinations that are unaffected by the ISO/IEC 2022 shifts are SP
(space, \x20) and DEL (^? or \x7F). In practice, though, full ISO/IEC
2022 encodings have hardly been used as user locales, because most
utilities have problems dealing with such context-dependent encoding
schemes.
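To make the regions concrete, here is a small sketch that classifies a
byte value by its ISO/IEC 2022 code region. The function name
iso2022_region is invented for this example, and I take GR to be the
whole range \xA0-\xFF of the 8-bit code table.

```shell
# Print the ISO/IEC 2022 code region of the byte value $1 (0-255).
iso2022_region() {
    if   [ "$1" -le 31 ];  then echo C0    # \x00-\x1F
    elif [ "$1" -eq 32 ];  then echo SP    # \x20, fixed
    elif [ "$1" -le 126 ]; then echo GL    # \x21-\x7E
    elif [ "$1" -eq 127 ]; then echo DEL   # \x7F, fixed
    elif [ "$1" -le 159 ]; then echo C1    # \x80-\x9F
    else                        echo GR    # \xA0-\xFF (assumed range)
    fi
}

# Both candidate sentinels fall into GL, so neither is safe in theory.
iso2022_region "$(printf '%d' "'x")"
iso2022_region "$(printf '%d' "'.")"
```

Both calls print GL; only the values 32 and 127 yield SP and DEL.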

> Does anyone know whether this is just a feature of bash or works in any
> sh compatible shell?

In the StackExchange answer you provided, it is mentioned that it
fails with zsh (though it is also reported in the comment that zsh
doesn't fail). It is also mentioned that the LC_ALL workaround doesn't
work in yash.

--
Koichi


