help-bash

Re: any plans for command substitution that preserves trailing newlines?


From: Chet Ramey
Subject: Re: any plans for command substitution that preserves trailing newlines?
Date: Thu, 27 Jan 2022 10:39:16 -0500
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.4.1

On 1/26/22 9:39 PM, Christoph Anton Mitterer wrote:
On Wed, 2022-01-26 at 19:04 -0500, Chet Ramey wrote:
But AFAIU, these shells would then violate POSIX in that aspect?!

No, there's no requirement. POSIX lists LC_ALL, LC_COLLATE, and LC_CTYPE
in the `environment variables' section of the `sh' description, saying they
affect the shell's behavior. That's the standard description for
environment variables that affect setlocale().

They're also listed in 2.5.3 Shell Variables[0]:

The following variables shall affect the execution of the shell:

which I'd have interpreted as "during runtime"?

Sure, if you inherit them from the environment. There's no requirement that
the shell update its idea of the locale based on assignments. I'd argue
that it's a quality of implementation issue, but the shells that don't are
just as convinced the shell doesn't need to do it.


It further says:

e.g.:
LC_COLLATE
     Determine the behavior of range expressions, equivalence classes,
     and multi-character collating elements within pattern matching.
LC_ALL
     The value of this variable overrides the LC_* variables and LANG,
     as described in XBD Environment Variables.

Since that's just defined to override the others, it's why I thought
restoring the original state of LC_ALL would be perfectly enough (in
the sense of: no more override -> back to whatever was there before)

I think you missed the part below where I agreed with that, as long as you
just do it around the expansion that strips the final byte.

If you want to modify it earlier, you have to check whether or not it's
exported because you'll affect the execution environment of programs the
shell invokes.


Having read your last paragraph, I think we might just have a big
misunderstanding.

Maybe.

AFAIU, there is a subtle difference between the LANG/LC_* shell
variables on the one side and setlocale(), or rather the process's
real "internal" locale state, on the other side.

I think the difference is in what the system considers to be the "default
locale."



- With the shell variables, both are stored, the default/overriding
   LANG/LC_ALL as well as the "real" categories LC_* (all but ALL).

- With setlocale() however, LC_ALL means basically just to go over each
   "real" category and set them,... so only the "real" categories are
   stored and internally LC_ALL isn't kept.

Right so far?

I see what you mean, for some value of "kept."


Now for any shell (that supports locales in a proper/sane way):

- When the shell starts it sets its default locale (for each category)
   in some implementation-defined manner, e.g. by calling
   setlocale(LC_ALL, "").
   But it doesn't have to set the real values into any of the LANG/LC_*
   shell variables. If any of these is there, then it is because it was
   in the environment.

Correct.

   At least glibc seems to only use LC_ALL, LC_* and LANG (in that
   order) for that, so most likely some combination of them *is*
   actually set in the environment and thus also as shell variable.

Not always.

   I couldn't find what happens when for a category, no value can be
   determined (e.g. LANG, LC_ALL and LC_CTYPE unset)... but I guess it
   falls back to "C"?!

"the empty string "" (which denotes the native environment)"

   So if no shell variable LANG/LC_* exists, the locale should be C?!

Not necessarily. If you don't do anything -- no setlocale() call -- the
locale starts as "C" and stays there. But if you use "" as the locale
argument to setlocale(), you get the "native environment" in the absence
of any environment variables. You could, for instance, set that native
environment, or at least a native preferred language, in some preference
pane.

- Whenever one sets/unsets any LANG/LC_* (shell) variable, a shell
   has to call setlocale(category, localevalue).

   And for localevalue it has to use the right value from the *shell*
   variables:
   1st from LC_ALL
   2nd from LC_<the one that was set>
   3rd from LANG

   (*) When LC_ALL was unset, it has to do it for every "real" category,
   with 2nd and 3rd.

Right so far.

   (unless it updated its own environment before then it could simply
   use "")?

Nobody does that.


If that's still right, then I fail to construct a case, where simply
setting LC_ALL before the stripping and restoring it right afterwards
wouldn't work (no function scope, no `local` variables):

As I said: "The only thing you really need to do is to set and reset LC_ALL
around the single assignment statement that removes the last byte from the
string."

You don't need to mess with setting LC_ALL to anything earlier in the
script, and you don't need to worry about hypotheticals like the shell
doing some character conversion on assignment. Nor do you need to worry
about the effect of adding a byte to some incomplete multibyte character.
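
For illustration, the advice above could be sketched roughly as follows. This is a minimal sketch, not code from the thread: the `produce` function and the variable names `out`, `saved_lc_all`, and `lc_all_was_set` are made up for the example, and the sentinel byte `x` is an arbitrary choice.

```sh
# Hypothetical command whose output ends in several newlines.
produce() { printf 'line one\n\n\n'; }

# Append a sentinel byte so command substitution finds no trailing
# newlines to strip. The command runs under the unmodified locale.
out=$(produce; printf x)

# Remember whether LC_ALL was set, and its value if so.
if [ -n "${LC_ALL+set}" ]; then
    saved_lc_all=$LC_ALL
    lc_all_was_set=1
else
    lc_all_was_set=0
fi

# Set LC_ALL only around the single assignment that removes the sentinel.
LC_ALL=C
out=${out%x}

# Restore the previous state right away; no other command ran in between.
if [ "$lc_all_was_set" -eq 1 ]; then
    LC_ALL=$saved_lc_all
else
    unset LC_ALL
fi

# Show the bytes that were kept, trailing newlines included.
printf '%s' "$out" | od -c
```

Because nothing is executed between the assignment to LC_ALL and its restoration, the export state never matters here: no child process can observe the temporary value.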

I skimmed your message to the austin-group mailing list, and I don't
really see any of these concerns as making a difference.

But if you do set LC_ALL earlier in the script, as one of your previous
examples showed, you need to understand the effects.


Or is there a portable way to query the internal locale state of
a shell?

Not from outside the shell, no.

Well, I mean from inside.

Sure. Interrogate the state of the relevant shell variables and apply the
appropriate precedence rules. If none are set, run `locale' and parse its
output, for example:

locale | sed -n 's/^LANG="\(.*\)"/\1/p'

That will give you a pretty good idea of the native environment.
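
A rough sketch of that lookup from inside the shell, assuming the usual precedence (LC_ALL, then the category's own variable, then LANG) and treating an empty value like an unset one. The helper names `effective_locale` and `native_lang` are hypothetical, and the sed expression is a variation on the one above that accepts the LANG value with or without surrounding quotes, since `locale` implementations differ on that point.

```sh
# effective_locale CATEGORY
# Print the locale name that the shell variables imply for one category
# (for example LC_CTYPE). Returns 1 if no relevant variable is set,
# 2 if CATEGORY is not a plausible LC_* category name.
# Note: plain sh has no local variables, so the helper's variables are global.
effective_locale() {
    category=$1
    case $category in
        LC_ALL|LC_*[!A-Z_]*) return 2 ;;   # reject LC_ALL and odd characters
        LC_?*) ;;                          # looks like a category name
        *) return 2 ;;
    esac
    # Indirect lookup of the category variable; the name was validated above.
    eval "category_value=\${$category-}"
    if [ -n "${LC_ALL-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$category_value" ]; then
        printf '%s\n' "$category_value"
    elif [ -n "${LANG-}" ]; then
        printf '%s\n' "$LANG"
    else
        return 1
    fi
}

# native_lang: fall back to asking the locale utility for LANG.
native_lang() {
    locale | sed -n 's/^LANG="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p'
}

# Example use: prefer the shell variables, otherwise ask `locale'.
effective_locale LC_CTYPE || native_lang
```

This only reflects what the variables say. As discussed earlier in the thread, a shell that does not call setlocale() on assignment may internally be in a different state.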


Exactly. But if LC_ALL was in the environment when this shell instance
started, you'll be modifying the locale that child processes will see.
So now you have to remember the export state and restore that too.

Uhm, I found no portable way to get the export state.

Parse the output of `export'.
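
One way this might look, as a sketch only: `is_exported` is a hypothetical helper, and it assumes each exported variable starts a line of the form `export NAME=...` (the POSIX `export -p` format) or `declare -x NAME=...` (what bash prints outside POSIX mode). Other attribute combinations, and values containing newlines that happen to resemble such a line, are not handled.

```sh
# is_exported NAME
# Succeed if NAME is listed as an exported variable by `export -p'.
is_exported() {
    export -p | grep -Eq "^(export|declare -x) $1(=|\$)"
}

# Example: record the export state of LC_ALL before touching it.
if is_exported LC_ALL; then
    lc_all_exported=1
else
    lc_all_exported=0
fi
```

The recorded flag can then decide whether to run `export LC_ALL` again after the old value has been put back.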


But is that even necessary? If the (shell) variable was unset before
(which I can remember), then it's not necessary to remember its export
state.
If it was set before, then I just assign its old value and the export
status will remain the same?

Unless you want to modify it earlier but not have that modified state
affect child processes.


And the trick of:
local LC_ALL=C
in some function, shouldn't work either, cause it would also set all
locale categories shell-wide?

It will, but they'll be restored when the function returns. That's what I
meant by letting the shell do it for you.

How does that restoring work?

By the local LC_ALL going away and the one from the outer scope
effectively being set, which causes setlocale() again as above?

Yes. But as I said, you don't really need it as long as you restrict
setting and resetting/unsetting LC_ALL to the single assignment statement.
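
For comparison, the function-scoped variant could be sketched like this. It relies on `local`, which bash and several other shells provide but POSIX does not specify. The function name `capture` and its underscore-prefixed variables are invented for the example, and the caller must pass a valid variable name other than `_name` or `_out`.

```sh
# capture VARNAME COMMAND [ARGS...]
# Run COMMAND and store its output, trailing newlines included, in the
# variable named VARNAME.
capture() {
    local _name="$1" _out
    shift
    # Run the command under the caller's locale, with a sentinel byte
    # appended so that no trailing newline is stripped.
    _out=$("$@"; printf x)
    # From here on LC_ALL is C, but only inside this function; the outer
    # value (or unset state) comes back when the function returns.
    local LC_ALL=C
    # Strip the sentinel and assign to the caller's variable.
    eval "$_name=\${_out%x}"
}

# Example use:
capture result printf 'a\n\n'
printf '%s' "$result" | od -c
```

Declaring the local LC_ALL only after the command substitution keeps the temporary C locale away from the command being captured.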


--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/


