bug-bash
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: bash leaks the old var when using =~ in a function with local BASH_R


From: Chet Ramey
Subject: Re: bash leaks the old var when using =~ in a function with local BASH_REMATCH
Date: Thu, 2 Jun 2022 16:23:28 -0400
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Thunderbird/91.9.0

On 5/28/22 4:08 PM, Emanuele Torre wrote:

Bash Version: 5.1
Patch Level: 16
Release Status: release

Description:
        If `[[ $str =~ $re ]]' is executed from a function in which
        `BASH_REMATCH' is local, bash will "leak" the old *global*
        `BASH_REMATCH' variable.

         This happens because in `sh_regmatch()', bash calls these two
        functions:

            unbind_variable_noref ("BASH_REMATCH");
            rematch = make_new_array_variable ("BASH_REMATCH");

         `unbind_variable_noref()' will unbind and `free()' the first
         variable it can find named "BASH_REMATCH" (giving priority to
        local variables).

         While "BASH_REMATCH" will add a new variable named
        "BASH_REMATCH" to the global variables.

Thanks for the report. This is an issue new in bash-5.1; before that, the
BASH_REMATCH variable was readonly and you couldn't create a local
variable with that name in the way you do here. You could create the
variable yourself and add attributes before you performed any regexp
matches, but your settings would be discarded and the variable set to be
readonly the first time you did.

I changed it in bash-5.1 so the value could be saved and restored during
traps (specifically the DEBUG trap).

Fix:
        The obvious fix is to use, instead of `unbind_variable_noref()',
        a similar function that uses `global_variables' instead of
        `shell_variables'.

        That will remove the "variable leak", but it is still not great:
        declaring a local `BASH_REMATCH' makes it impossible to access
        the matches of `[[ $str =~ $re ]]' because bash will set the
        global `BASH_REMATCH' instead of the local one, and
         `"${BASH_REMATCH[@]}"' will expand to local `BASH_REMATCH'.

I doubt there's very much code out there that tries to make BASH_REMATCH
local, since it throws an error before bash-5.1. So let's talk through a
change that's as backwards-compatible as possible.

Clearly there's a mismatch between allowing the local variable to be
created (and unbinding it) and setting it in the global scope. As you say,
the easiest thing to do is to act exclusively on the global scope and
warn the user -- at least in the documentation -- that local copies of the
variable aren't going to be used. That would be the closest to preserving
the pre-bash-5.1 behavior.


         I think allowing `BASH_REMATCH' to be local-ised should be
         considered: it would be nice. (Also, it's a little confusing
        that `MAPFILE', `REPLY', `COPROC', etc. can be localised, but
        `BASH_REMATCH' cannot.)

Those are all default variables that can be overridden by names the user
chooses; that's one difference.


        bash will currently (once the unbind part is fixed) try to
        remove the global `BASH_REMATCH' and replace it with a brand new
        array variable that contains the matches.

The idea is that there is one BASH_REMATCH instance, and the shell always
sets it to reflect the match status of the most recent regexp match. It was
easy to make that guarantee when it was readonly.


        It could instead replace the local `BASH_REMATCH' variable with
        a new local array variable (if a local `BASH_REMATCH' variable
        of any type was present.)

         I am not sure if bash has any specific reason to use this
        technique instead of just using `find_or_make_array_variable()'
        like other features in bash that use arrays do.

Bash uses this technique all over the place for variables that are set as a
side effect of performing some other action. COMP_WORD, COMP_LINE,
COMP_POINT, COMP_CWORD, POSIXLY_CORRECT, IGNOREEOF, etc. Anything that's
supposed to be immune from something weird the user does. When the variable
was readonly, this was also easy to guarantee.


        I think bash could just make `[[ $str =~ $re ]]' use
         `find_or_make_array_variable()' like other bash features that
         use arrays do; If the variable that already exists has
         incompatible attributes (i.e. -A and -r) it could just print an
         error message (while still returning 0/1 depending on the result
         of the match, BASH_REMATCH not being settable should not
        influence the exit status of `[[ $str =~ $re ]]'), or simply not
        set BASH_REMATCH silently.
         This would also allow to use attributes like -l, -u with
        BASH_REMATCH (`declare -l BASH_REMATCH') which may be useful.
That is less backwards compatible than other options, but you could make an
argument that it's more consistent with the new non-readonly behavior.

One thing about BASH_REMATCH is that it's always set to whatever regexec
returns in the `matches' array, whether or not the match itself succeeds.
The return value from regexec only affects $?. It's always behaved this
way.

It's not clear that allowing other attributes to affect what gets put into
BASH_REMATCH is desirable: it's supposed to be exactly what you get from
regexec.

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    chet@case.edu    http://tiswww.cwru.edu/~chet/



reply via email to

[Prev in Thread] Current Thread [Next in Thread]