bug-bash

Re: Should the readline *-meta flags reset when $LANG changes?


From: Chet Ramey
Subject: Re: Should the readline *-meta flags reset when $LANG changes?
Date: Thu, 11 Aug 2022 15:22:00 -0400
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.1.2

On 8/10/22 10:59 PM, Koichi Murase wrote:
On Wed, Aug 10, 2022 at 23:21, Chet Ramey <chet.ramey@case.edu> wrote:
Does it mean custom values of these readline variables will be lost
every time LANG or LC_{CTYPE,ALL} is changed even if a user or program
intentionally sets them up?

It means those settings will now mirror the locale.

We often temporarily change LANG or LC_* to perform some binary
operations [such as counting the number of bytes of data and safely
removing trailing x from the result of $(command;printf x)].
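As a minimal sketch of the two byte-oriented operations mentioned above: forcing the C locale makes `${#var}` count bytes and makes the sentinel removal operate on a single byte. The function names `byte_length` and `capture_exact`, and the use of `REPLY` to return results, are hypothetical choices for illustration, not taken from this thread. The sketch assumes a shell in which `local` scopes the locale variable to the function, as bash does.

```shell
# Count bytes instead of characters by switching to the C locale
# for the duration of the function only.
byte_length() {
  local LC_ALL=C              # reverts automatically when the function returns
  REPLY=${#1}
}

# Capture a command's output exactly, including trailing newlines.
# The sentinel "x" keeps $(...) from stripping them; it is then
# removed under the C locale so it is treated as a single byte.
capture_exact() {
  REPLY=$("$@"; printf x)
  local LC_ALL=C
  REPLY=${REPLY%x}
}
```

For example, after `capture_exact printf 'a\n\n'`, calling `byte_length "$REPLY"` leaves 3 in `REPLY`, because both trailing newlines survive.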

Do you often do this in interactive shells?

Yes, though I don't mean that I type these kinds of commands directly
on the command line; I use them in functions called through
`bind -x'.  Also, the above cases (counting bytes and removing
trailing x) are just examples; I set locale variables for various
purposes in the actual code.  For example, I often type and run
commands of the form

   LANG=C some-commands-or-functions

to get the default error messages that are not locale-specific (I
could use LC_MESSAGES=C instead, but LANG=C is easier for me to
type).  I normally use LANG=ja_JP.UTF-8, so commands output error
messages in Japanese by default.  This is not useful when I want to
search for a solution on the internet, because there is almost no
information about the Japanese error messages.
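A small sketch of why the prefix form is convenient: an assignment placed before an external command applies only to that command's environment, so the calling shell keeps its own locale. The child `sh -c` here is only a stand-in for `some-commands-or-functions`, and the variable names are illustrative. Note that `LC_ALL`, if set, takes precedence over both `LANG` and `LC_MESSAGES`.

```shell
before=${LANG-}

# The child process sees LANG=C ...
seen=$(LANG=C sh -c 'printf %s "$LANG"')

# ... while the calling shell's LANG is left as it was.
after=${LANG-}

if [ "$before" = "$after" ]; then
  printf 'child saw LANG=%s; parent LANG unchanged\n' "$seen"
fi
```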

So let's talk through these, since it doesn't seem like these things will
be affected by the realistically available solutions.


Often enough to make a difference?

My `bind -x' functions use `LC_ALL=' and `LC_CTYPE=C' for every
keystroke, for example, in combination with `builtin read'.  They also
use `LC_ALL=' for other purposes on almost every keystroke.  Some vi
bindings also use `LC_CTYPE=C'.  My completion functions also change
`LC_ALL' and `LC_CTYPE'.  For example, `LC_CTYPE=C' is used in
calculating a PJW hash code of a given string.  I haven't carefully
checked, but there are probably other cases of changing `LC_CTYPE'.
Also, `LC_ALL=' is used everywhere.

So you're using `read -e'? Otherwise, these suggest that solution 4 is
most appropriate.


Across multiple calls to readline?

I think I am missing the point.  What does ``multiple calls to
readline'' mean?  Is the situation different from a single call to
readline?

It informs the solution. If I choose option 4, for instance, none of these
matter. They will all happen as part of a single call to readline, and the
normal shell execution will ensure that the modified locale variables are
temporary.
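To illustrate the point about temporary changes, here is a sketch of a `bind -x' handler that alters the locale only within its own scope. The function name `my_widget`, the variable `widget_bytes`, and the key sequence are hypothetical; `READLINE_LINE` is the variable bash provides to `bind -x' commands. Because the locale variables are declared `local`, they revert when the function returns, within the same call to readline().

```shell
my_widget() {
  # Clear LC_ALL so that LC_CTYPE takes effect, then force byte semantics.
  # Both assignments are undone when the function returns.
  local LC_ALL= LC_CTYPE=C
  widget_bytes=${#READLINE_LINE}   # length of the line buffer in bytes
}

# Install the binding only when line editing is available (interactive bash).
case $- in
  *i*) bind -x '"\C-xb": my_widget' ;;
esac
```

After `my_widget` runs, `LC_ALL` and `LC_CTYPE` in the user's session hold whatever values they had before the keystroke.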


Hmm, I think I first need to make it clear that the behavior of my
code, which is supposed to be sourced in an interactive session by
users, is unaffected by these readline settings.

OK.

I just do not want
to break or change the existing user settings inside the functions
that I provide.  The behavior of my functions is unaffected (except
for « bind -x '"\M-x":....'  » which is affected by `convert-meta',
for which I already implemented a workaround) because it doesn't try
to communicate with readline inside a single call of `bind -x'.  The
problem is that, with the new automatic adjustment of these readline
variables, the settings by users can be lost after using `LC_ALL=' or
`LC_CTYPE=C' inside my functions.

Only if those functions recursively call readline() (which is a bad idea
anyway) somehow, or leave the modified settings in the user's environment
for the next call to readline(). This is the point of my question.


I believe this is a general problem for writers of Bash
configurations. `bash_completion' also uses `LC_CTYPE=C' and
`LC_ALL=C'.  The behavior of such configurations itself will be
unaffected by the change of readline settings, but they need to
implement special treatment to preserve the user settings if the user
settings will be lost by changing locales.

This scenario is not relevant with option 4, unless bash-completion leaves
its modified LC_CTYPE and LC_ALL settings in the user's environment after
the call to readline() completes. If it did, I imagine people would have
complained by now.


And, if the change is intended to be temporary, why would you not
want the relevant readline variables to reflect the locale when you
were finished?

Because I would not like to break the users' settings.  In general, a
third-party Bash configuration should not overwrite the users'
settings unless the configuration actually needs to change them.

So that argues against option 3, and in favor of option 4.


Also, if these readline variables would be cleared every time, it
seems to me that these readline variables would be effectively
unconfigurable and would lose the point of their existence, or we
could not touch LANG or LC_* at all after the initial setup.

The one caveat we would have to add is to tell users they have to
restore custom values of these readline variables if they change LC_ALL,
LC_CTYPE, or LANG from one call to readline to the next. They're already
auto-set when readline starts up, before reading the readline init file.
For instance, if you set

LANG=C
bind 'set output-meta on'

in a bash startup file, you would have to run (or maybe you wouldn't, but
you'd have to think about it) the appropriate bind commands if you later
executed

LANG=en_US.UTF-8
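If the behavior ends up as described here, a startup file could keep its custom values in one place and re-apply them after changing the locale. This is only a sketch: `reapply_meta_settings` and `set_lang` are hypothetical helper names, and the single setting shown is the one from the example above, not a recommendation.

```shell
# Re-apply the readline settings this user wants, whatever the locale default.
reapply_meta_settings() {
  case $- in
    *i*) bind 'set output-meta on' ;;   # needs line editing (interactive bash)
  esac
}

# Change LANG, then restore the custom readline values.
set_lang() {
  LANG=$1
  reapply_meta_settings
}
```

With these helpers, `set_lang en_US.UTF-8` would replace the bare `LANG=en_US.UTF-8` assignment.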

I agree that we should somehow change the current behavior, in which
the default values of the *-meta settings are determined by the locale
at Bash startup, but the proposed change will break the opposite
scenario while it solves Alan's scenario.

The locale-appropriate values are already determined when readline is
called the first time. That's not going to change.


The combination (UTF-8 & 7bit-mode) doesn't make much sense, so we
might force (UTF-8 & 8bit-mode) for UTF-8 or similar for multibyte
character encodings with 8-bit bytes.  [ Note: Here, 7bit/8bit-mode
means « convert-meta on/off » and « {input,output}-meta off/on »,
respectively. ] However, on the opposite side of the single-byte
character encoding (e.g. for C), I think combinations (C & 7bit-mode)
and (C & 8bit-mode) are both possible, so users can still set «
convert-meta off » or « {output-meta,input-meta,meta-flag} on ».

That is the existing startup behavior.

Where I think we're converging is to use option 4, and -- as long as
LC_ALL/LC_CTYPE/LANG don't change -- not modifying these variables when
readline() is called. I can document that these variables are dependent on
the current locale, and if the locale changes, those variables will need
to be adjusted. If the locale doesn't change between calls to readline(),
you don't need to do anything.


Maybe a large change should be considered for bash-5.3, but I still
think a three-state design is one possible implementation that is a
true superset of the previous behavior:

I already said I was not going to make that change while we're at this
point in the release process.


If these readline variables should always be uniquely determined by the
current locale and users should never set them to the opposite
value, I think another option might be just to remove these
readline variables (though I'm not sure if this really makes sense):

I'm not going to do either of these things.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/



