help-bash

Re: any plans for command substitution that preserves trailing newlines?


From: Christoph Anton Mitterer
Subject: Re: any plans for command substitution that preserves trailing newlines?
Date: Thu, 27 Jan 2022 03:39:09 +0100
User-agent: Evolution 3.42.2-1

On Wed, 2022-01-26 at 19:04 -0500, Chet Ramey wrote:
> > But AFAIU, these shells would then violate POSIX in that aspect?!
> 
> No, there's no requirement. POSIX lists LC_ALL, LC_COLLATE, and
> LC_CTYPE in the `environment variables' section of the `sh'
> description, saying they affect the shell's behavior. That's the
> standard description for environment variables that affect
> setlocale().

They're also listed in 2.5.3 Shell Variables[0]:

> The following variables shall affect the execution of the shell:

which I'd have interpreted as "during runtime"?

It further says, e.g.:
> LC_COLLATE
>     Determine the behavior of range expressions, equivalence classes,
>     and multi-character collating elements within pattern matching.
> LC_ALL
>     The value of this variable overrides the LC_* variables and LANG,
>     as described in XBD Environment Variables.

Since that's just defined to override the others, I thought that
restoring the original state of LC_ALL would be perfectly sufficient
(in the sense of: no more override -> back to whatever was there
before).



Having read your last paragraph, I think we might just have a big
misunderstanding.




AFAIU, there is a subtle difference between the LANG/LC_* shell
variables on the one side and setlocale(), or rather the process's
real "internal" locale state, on the other side.


- With the shell variables, both are stored: the default/overriding
  LANG/LC_ALL as well as the "real" categories LC_* (all but LC_ALL).

- With setlocale(), however, LC_ALL basically just means to go over
  each "real" category and set it, so only the "real" categories are
  stored and internally LC_ALL isn't kept.

Right so far?


Now for any shell (that supports locales in a proper/sane way):

- When the shell starts, it sets its default locale (for each category)
  in some implementation-defined manner, e.g. by calling
  setlocale(LC_ALL, "").
  But it doesn't have to set the real values into any of the LANG/LC_*
  shell variables. If any of these is there, then only because it was
  in the environment.

  At least glibc seems to only use LC_ALL, LC_* and LANG (in that
  order) for that, so most likely some combination of them *is*
  actually set in the environment and thus also as shell variable.
  I couldn't find what happens when for a category, no value can be
  determined (e.g. LANG, LC_ALL and LC_CTYPE unset)... but I guess it
  falls back to "C"?!
  So if no shell variable LANG/LC_* exists, the locale should be C?!

- Whenever one sets/unsets any LANG/LC_* (shell) variable, a shell
  has to call setlocale(category, localevalue).

  And for localevalue it has to use the right value from the *shell*
  variables:
  1st from LC_ALL
  2nd from LC_<the one that was set>
  3rd from LANG

  (*) When LC_ALL was unset, it has to do it for every "real" category,
  with 2nd and 3rd.

  (Unless it updated its own environment beforehand; then it could
  simply use ""?)
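
The lookup order above can be sketched as a small shell function. This
is only an illustration of the precedence, not what any shell does
internally; the name effective_locale is made up here, and the final
fallback to "C" is an assumption (POSIX leaves the default
implementation-defined, glibc appears to use "C").

```shell
# Hypothetical helper: print the value that would apply to one "real"
# category, following the order LC_ALL > LC_<category> > LANG.
# Empty values are treated like unset ones.
effective_locale() {
    category=$1                         # e.g. LC_CTYPE
    if [ -n "${LC_ALL-}" ]; then
        printf '%s\n' "$LC_ALL"         # 1st: LC_ALL overrides everything
        return
    fi
    eval "value=\${$category-}"         # 2nd: the category's own variable
    if [ -n "$value" ]; then
        printf '%s\n' "$value"
    elif [ -n "${LANG-}" ]; then
        printf '%s\n' "$LANG"           # 3rd: LANG as the default
    else
        printf '%s\n' C                 # assumption: fall back to "C"
    fi
}
```

With LANG=foo and LC_CTYPE=bar set, `effective_locale LC_CTYPE` would
print bar and `effective_locale LC_COLLATE` would print foo.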


If that's still right, then I fail to construct a case where simply
setting LC_ALL before the stripping and restoring it right afterwards
wouldn't work (no function scope, no `local` variables):

1) Example 1, none of LANG/LC_* was set
- any internal category is C
- I set LC_ALL=C => any internal category is set to C
- I restore (i.e. unset LC_ALL) => all categories will be still C

2) Example 2, LANG=foo, LC_CTYPE=bar
- ctype internal category=bar, any other internal category is foo 
- I set LC_ALL=C => any internal category is set to C
- I restore LC_ALL (unset) => LANG and LC_CTYPE shell vars are still
  there, and the internal state should be set back to exactly what it
  was before

3) Example 3, LANG=foo, LC_CTYPE=bar, LC_ALL=baz
- any internal category is baz 
- I set LC_ALL=C => any internal category is set to C
- I restore LC_ALL (=baz) => LANG and LC_CTYPE shell vars are still
  there, but again overridden by LC_ALL, and the internal state should
  be set back to exactly what it was before


In all cases, when the shell invoked setlocale() with the right value,
it should be back to where it was before?

Unless it wouldn't do (*) above, but why not?
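
To make the three examples concrete, here is a minimal sketch of the
set/restore sequence I have in mind. The sentinel character "x" and the
variable names (output, had_lc_all, old_lc_all) are just placeholders
for illustration, and it assumes the shell re-runs setlocale() as in
(*) whenever LC_ALL is assigned or unset.

```shell
# Capture output including its trailing newlines by appending a
# sentinel byte that command substitution will not strip.
output=$(printf 'line\n\n'; printf x)

# Remember whether LC_ALL was set at all, and its value if so.
if [ -n "${LC_ALL+set}" ]; then
    had_lc_all=1
    old_lc_all=$LC_ALL
else
    had_lc_all=0
fi

LC_ALL=C               # make the pattern removal operate on bytes
output=${output%x}     # strip only the sentinel

# Restore the previous state: either the old value or "unset".
if [ "$had_lc_all" -eq 1 ]; then
    LC_ALL=$old_lc_all
else
    unset LC_ALL
fi
```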


So where's the situation in which, when not using a `local` variable
in a function, any of the other unrelated LC_* would need to be
restored?


> > Or is there a portable way to query the internal locale state of
> > a shell?
> 
> Not from outside the shell, no.

Well, I mean from inside.


> Exactly. But if LC_ALL was in the environment when this shell
> instance started, you'll be modifying the locale that child
> processes will see. So now you have to remember the export state
> and restore that too.

Uhm, I found no portable way to get the export state.

But is that even necessary? If the (shell) variable was unset before
(which I can remember), then it's not necessary to remember its export
state.
If it was set before, then I just assign its old value and the export
status will remain the same?
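
A quick check of that assumption, using a made-up variable DEMO_VAR
instead of LC_ALL so that the locale isn't touched: a plain assignment
to an already exported variable keeps the export attribute, so a child
process sees the restored value.

```shell
export DEMO_VAR=original      # hypothetical stand-in for an exported LC_ALL
old=$DEMO_VAR                 # remember the value only, not the export state

DEMO_VAR=temporary            # the temporary override
DEMO_VAR=$old                 # restore by plain assignment, no `export`

# Ask a child process what it sees; "unset" would mean not exported.
child_view=$(sh -c 'printf %s "${DEMO_VAR-unset}"')
printf '%s\n' "$child_view"   # prints: original
```

What this does not cover is restoring via unset followed by a new
assignment; unset drops the export attribute, so that path would need
an explicit export again.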



> > And the trick of:
> > local LC_ALL=C
> > in some function, shouldn't work either, cause it would also set
> > all locale categories shell-wide?
> 
> It will, but they'll be restored when the function returns. That's
> what I meant by letting the shell do it for you.

How does that restoring work?

By the local LC_ALL going away and the one from the outer scope
effectively being set, which causes setlocale() again as above?

If (*) above is done, then calling setlocale() for each category
should turn everything back.


But if (*) wasn't done... how would it restore any of the other LC_*?
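
For reference, this is roughly how I understand the `local` variant. It
is only a sketch: `local` isn't in POSIX (bash, dash and others provide
it), the function name strip_sentinel and the global variable result
are made up, and whether the other categories really come back after
the return is exactly the question above.

```shell
# Hypothetical helper: remove a trailing sentinel "x" from $1 and store
# the outcome in the global variable `result` (a command substitution
# around the function would strip the trailing newlines again).
strip_sentinel() {
    local LC_ALL=C      # override only for the duration of this call
    result=${1%x}       # byte-wise removal of the sentinel
}

raw=$(printf 'text\n\n'; printf x)
strip_sentinel "$raw"
# The outer LC_ALL (set or unset) is visible again at this point.
```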



> The only thing you really need to do is to set and reset LC_ALL
> around the single assignment statement that removes the last byte
> from the string. If you have a shell that understands locale
> variables, that will do the right thing. If you don't, well, then
> that shell probably performs all its word expansion operations on
> bytes anyway.

Do you mean "locale variables" (i.e. LANG/LC_*) or "local variables"
(i.e. local to a function)?


Cheers,
Chris.



[0] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_05_03


