
Re: how to (safely) escape an arbitrary string for use in PS1


From: Koichi Murase
Subject: Re: how to (safely) escape an arbitrary string for use in PS1
Date: Sat, 22 May 2021 08:23:22 +0900

On Sat, May 22, 2021 at 6:13, Christoph Anton Mitterer <calestyo@scientia.net> wrote:
>
> On Fri, 2021-05-21 at 07:44 +0900, Koichi Murase wrote:
> > As I have mentioned, there are two unescaping. The first processing
> > is
> > \u to usernames, \$ to # or $, \\ to \, etc. The second processing
> > (promptvars) includes \\ to \, \$ to $, \` to `, and \" to " as well
> > as expansions of ${}, $(), $(()). You can understand it step-by-step.
> Oh, so it (un)escapes \\ twice? Okay I didn't get that at first.
> Seems odd, why does it do that?
>
> Anyway,.. a bit sad, that this cannot be used safely in bash :-(
>
>
>
>
>
> My next idea would have been to at least do the following:
>
> a) *If* the string to be added to PS1 contains none of the "dangerous"
> characters,... include it directly.
>
> b) If it does, include the variable that contains the string (and let
> it expand)... of course this would still break, if someone then turns
> off promptvars.
>
>
>
> For a) I played a bit around with what's better... match for all
> characters which are safe, or match for all which are not.
>
> All which are safe, could be something like (plus maybe a few more like
> _ . and so on):
>   [[ "$str" =~ ^[[:alnum:]]*$ ]]
>
> where I think it wouldn't even matter, that this is subject to
> nocasematch.
>
> But the problem here is, that this still doesn't get much unicode and
> that unicode even works only if a proper LC_CTYPE is active.
> Even if I:
>    ( LC_ALL=C.UTF-8; [[ "$str1" =~ ^[[:alnum:]]*$ ]]; echo $?)
> that is, set it explicitly in a subshell, there is no guarantee that the
> locale is really there, and there doesn't seem to be any easily
> catchable error, if it's not (something goes to stderr, but the
> actual test may still yield 0).
>
>
>
> For b) I'd have done something like:
> ( shopt -u nocasematch ; LC_ALL=C ; [[ "$str" =~ [\$\`\'\"\\] ]] ) ; echo $?
>
> [...]
>
> The shopt is probably irrelevant, cause I anyway match only characters
> without case.
> The LC_ALL is probably irrelevant, too.

I think so.

> What I don't understand:
> - whether I use \$ or just $ in the bracket seems irrelevant
> - if I put quotes around the whole regexp, like "[\$\`\'\"\\]" it
>   doesn't work
> - while bash manual says:
>      "Bracket expressions in regular expressions must be treated
>       carefully, since normal quoting characters lose their meanings
>       between brackets."
>   I seem to need to quote ' and " inside the bracket expression
> - interestingly: [\$\`\'\"\] (i.e. not double-\ at the end ... does
>   seem to be parsed, but doesn't match \ ... shouldn't that be a syntax
>   error?

First, note that 1) there is double unescaping here as well: quote
removal by Bash, followed by escape-sequence processing in the regex
engine. 2) To complicate matters further, since Bash 3.2 the strings
obtained by quote removal are escaped again for the regex engine so
that they are treated literally there. This behavior can be turned
off by `shopt -s compat31'. See Bash Reference Manual 3.2.5.2:

https://www.gnu.org/software/bash/manual/bash.html#Conditional-Constructs
> Any part of the pattern may be quoted to force the quoted
> portion to be matched as a string.

3) Also, note that Bash uses ERE-style regex (Extended Regular
Expressions) from POSIX <regex.h>, in which special characters such
as \ $ ^ ? + *, etc. have no special meaning inside a bracket
expression and are treated literally there, e.g., [\n] in ERE matches
a backslash '\' or the letter 'n'. The sentence you quoted from the
manual, "Bracket expressions ... between brackets.", describes this
fact.
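
To see point 3) in isolation, here is a minimal sketch. The helper name
ere_match is made up for this illustration; the regex is passed through
a variable so that Bash's own quoting rules stay out of the picture and
the regex engine receives [\n] verbatim.

```shell
#!/usr/bin/env bash
# ere_match is a hypothetical helper, not something from Bash itself.
# It succeeds if string $1 matches the ERE in $2.
ere_match() {
  local str=$1 regex=$2
  [[ $str =~ $regex ]]
}

re='[\n]'   # single quotes: the variable holds the four characters [ \ n ]

ere_match 'n'   "$re" && echo "letter n: match"
ere_match '\'   "$re" && echo "backslash: match"
ere_match $'\n' "$re" || echo "newline: no match"
```

Inside the bracket expression the backslash is an ordinary member of
the set, so \n is not read as a newline escape.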

- [\$\`\"\'\\] will be recognized as [<quoted $`"'\>] by Bash, and
escaped for regular expressions as [$`"'\] (actually, there is no need
of escaping because none of $`"'\ have special meaning in bracket
expressions in ERE). Then, [$`"'\] will be passed to the regex engine.

> - whether I use \$ or just $ in the bracket seems irrelevant

- When you use just "$", the combination of "$\" in $\`... doesn't
form a parameter expansion so "$" is treated literally. So,
[$\`\"\'\\] will be recognized as [$<quoted `"'\>] and then escaped as
[$`"'\] before it is sent to the regex engine.

> - if I put quotes around the whole regexp, like "[\$\`\'\"\\]" it
>   doesn't work

- It will be recognized as <quoted [$`\'"\]> by Bash, and then escaped
as \[\$`\\'"\\\] before it is sent to the regex engine.
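
A small sketch of that difference, assuming Bash 3.2 or later with
compat31 unset. The sample strings are made up; only the quoting of
the right-hand side changes between the tests.

```shell
#!/usr/bin/env bash
str='a$b'

# Unquoted brackets: a real bracket expression, so the "$" in str matches.
[[ $str =~ [\$] ]] && echo "unquoted: match"

# Whole regex quoted: every character is literal, so "[" and "]" are no
# longer bracket syntax and str does not match.
[[ $str =~ "[\$]" ]] || echo "quoted: no match"

# The quoted form only matches text that literally contains [$].
[[ 'x[$]y' =~ "[\$]" ]] && echo "quoted: matches the literal text"
```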

> - while bash manual says:
>      "Bracket expressions in regular expressions must be treated
>       carefully, since normal quoting characters lose their meanings
>       between brackets."
>   I seem to need to quote ' and " inside the bracket expression

The above sentence in the manual describes the regular expression as
it is passed to the regex engine. Separately from that, ' and " still
have to be quoted for Bash syntax, because Bash parses the word before
the regex engine sees it.

> - interestingly: [\$\`\'\"\] (i.e. not double-\ at the end ... does
>   seem to be parsed, but doesn't match \ ... shouldn't that be a syntax
>   error?

- Oh, this example is interesting. I guess [\$\`\'\"\] is treated as
[<quoted $`'"]> and "escaped" as [$`'"], but it's just a naive guess.

----

Since you seem to care about the various possible shopt settings, I
recommend always storing the regular expression in a variable and
using that variable on the right-hand side of the =~ operator. This
way, the regular expression behaves the same whether compat31 is set
or unset ("shopt -s/-u compat31").

local regex="[\$\`'\"\\]"
if [[ $str =~ $regex ]]; then
[...]

In this case, though, we can simply use a glob pattern (operator ==)
instead of a regular expression.

if [[ $str == *[\$\`\'\"\\]* ]]; then
[...]
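
As a rough check that the two forms agree, both tests can be wrapped
in functions and run over a few sample strings. The function names and
the samples below are made up for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around the two tests shown above.
has_special_regex() {
  local regex="[\$\`'\"\\]"   # the variable holds: [$`'"\]
  [[ $1 =~ $regex ]]
}

has_special_glob() {
  [[ $1 == *[\$\`\'\"\\]* ]]
}

for s in 'plain' 'a$b' 'a`b' "it's" 'say "hi"' 'back\slash'; do
  if has_special_regex "$s"; then r=yes; else r=no; fi
  if has_special_glob  "$s"; then g=yes; else g=no; fi
  printf '%-12s regex=%s glob=%s\n' "$s" "$r" "$g"
done
```

The glob version needs no variable because the compat31 setting only
affects how the right-hand side of =~ is treated.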

----

> And the bigger question:
> Which characters do I actually need to check for in order to be safe
> (i.e. to determine whether an arbitrary string might be subject to any
> expansions/substitutions/etc. and can thus not directly be used inside
> PS1)?
> - $ and ` are clear
> - I guess I won't need to check for ( ) { } | ; & && || as there should
>   be no pipes, subshells, etc. possible when PS1 is evaluated
> - I'm not so sure about whether I need to check for \ ' "
>   I've included them now cause I feel they seem evil ;-)

For safety (against code injection), I think excluding $ and ` is
enough, but if one wants to get the same results with `promptvars'
on/off, \ should also be excluded.

if [[ $str == *['$`\']* ]]; then
  # b)
  global_var1=$str
  PS1='...${global_var1}...'
else
  # a)
  PS1="...${str}..."
fi
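
The branch above can be wrapped in a function and checked without an
interactive shell. This is only a sketch: set_prompt_with and
prompt_payload are made-up names, the angle brackets stand in for the
"..." parts, and the check relies on the ${PS1@P} expansion, which
needs Bash 4.4 or later and the default `promptvars' setting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: put an arbitrary string into PS1 between < and >.
set_prompt_with() {
  local str=$1
  if [[ $str == *['$`\']* ]]; then
    # b) risky characters: PS1 only references a global variable, whose
    #    value is inserted at prompt time and not expanded a second time
    prompt_payload=$str
    PS1='<${prompt_payload}> '
  else
    # a) nothing risky: embed the text directly
    PS1="<${str}> "
  fi
}

set_prompt_with '$(echo pwned)'
printf '%s\n' "${PS1@P}"   # the command substitution is shown, not run

set_prompt_with 'main'
printf '%s\n' "${PS1@P}"
```

Note that case b) still depends on `promptvars' being on; with it
off, the prompt would show the text ${prompt_payload} itself.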

--
Koichi


