bug-gawk


From: Ed Morton
Subject: Re: variable assignment with -v behavior changed for string that looks like a strongly typed regexp
Date: Fri, 16 Apr 2021 08:00:24 -0500
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.9.1

Good to hear there is a workaround, thanks. It still means there's a magic pattern, `@/.../`, that changes the behavior of `-v var=value`:

awk -v s='@/\t/' 'BEGIN{print s}'
\t

such that:

a) it gets interpreted by awk as something other than the string you wrote,
b) you can't just print the contents of the variable and see what you set it to,
c) it doesn't interpret `\t`, for example, as a literal tab like all other uses of `-v var=value`, and
d) it breaks backward compatibility and portability.

Fortunately `@/.../` isn't going to be a common string, so I doubt it'll come up often in real usage, but it's still an ugly bump and I do wish you'd reconsider implementing it some other way, e.g. only taking effect when used as `-v s='\@/\t/'`, which is otherwise undefined and so you can interpret it however you like. That would also establish a pattern you could use if/when any other such magic strings become useful for other purposes: start them all with an escaped @ or some other character that is currently taken literally.

But whatever you decide is fine, of course, I just wanted to make you aware of the issue.

    Ed.

On 4/16/2021 2:34 AM, arnold@skeeve.com wrote:
$ gawk --version
GNU Awk 5.1.0, API: 3.0 (GNU MPFR 4.0.1, GNU MP 6.1.2)

$ gawk -v s='\x40/str/' 'BEGIN {
print s, typeof(s)
}'
@/str/ string

'nuff said.
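Applied to the tab-separated case from the original report, the `\x40` workaround might look like the following minimal sketch. It assumes gawk 5.1 (the version shown above) installed as `gawk`, and relies on what the example above demonstrates: the `@/.../` check looks at the value as typed, so spelling the `@` as `\x40` keeps the value an ordinary string, and the usual `-v` escape processing then turns `\x40` into `@` and `\t` into a tab.

```shell
#!/bin/sh
# Sketch, assuming gawk 5.1 or later is installed as "gawk".
# \x40 is the hex escape for '@', so the raw -v text does not start with
# "@/" and is not treated as a strongly typed regexp; -v escape processing
# then yields the four characters @ / TAB /.
printf 'foo@/\t/bar\nfoo/\t/bar\n' |
  gawk -v str='\x40/\t/' 'BEGIN { print typeof(str) } index($0, str)'
```

If that reading of the behavior is right, this prints `string` followed by only the `foo@/<tab>/bar` line.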

Ed Morton <mortoneccc@comcast.net> wrote:

I just came across this behavior which I think is odd if not plain
wrong. Let's say I have input data separated by `/<tab>/` and I want to
find lines that contain one such separator (I know there are other ways
to do that, that's not the point):

     printf 'foo/\t/bar\n'
     foo/    /bar

     printf 'foo/\t/bar\n' | awk -v str='/\t/' 'BEGIN{print typeof(str) ": <" str ">"} index($0,str)'
     string: </      />
     foo/    /bar

Now let's say the separator is `@/<tab>/`:

     printf 'foo@/\t/bar\n'
     foo@/   /bar

     printf 'foo@/\t/bar\n' | awk -v str='@/\t/' 'BEGIN{print typeof(str), str} index($0,str)'
     regexp \t

     printf 'foo@/\t/bar\n' | awk -v str='@/\\t/' 'BEGIN{print typeof(str), str} index($0,str)'
     regexp \\t

     printf 'foo@/\t/bar\n' | awk -v str=$'@/\t/' 'BEGIN{print typeof(str), str} index($0,str)'
     regexp
     foo@/   /bar

None of that is intuitive, especially if you're not even aware of the
strongly typed regexp gawk extension, none of it functions as you'd
expect if you simply wanted to use a string that happened to be `@/\t/`,
and it would break existing code that relied on a string simply being a
string and `-v` interpreting escape sequences.

There are existing constructs that don't interpret escape sequences
(e.g. populating a variable from ARGV[] or ENVIRON[]) so it's not clear
why the behavior of `-v` changed to NOT interpret them when awk thought
the string being passed is a strongly typed regexp. I also don't see an
obvious way to turn off that behavior, e.g. by escaping the `@` (a single
backslash works but produces a warning, while two backslashes produce no
warning but don't work):

     printf 'foo@/\t/bar\n' | awk -v str='\@/\t/' 'BEGIN{print typeof(str), str} index($0,str)'
     awk: warning: escape sequence `\@' treated as plain `@'
     string @/       /
     foo@/   /bar

     printf 'foo@/\t/bar\n' | awk -v str='\\@/\t/' 'BEGIN{print typeof(str), str} index($0,str)'
     string \@/      /
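The ENVIRON[] and ARGV[] routes mentioned above can already carry such a string through untouched. A minimal sketch, assuming a POSIX sh and any POSIX awk (the shell variable `sep` is just an illustrative name): since neither ENVIRON[] nor ARGV[] values get escape-sequence processing, the shell builds the literal tab with `printf`, and awk receives exactly those four characters. As far as I can tell, gawk only applies the `@/.../` check to command-line assignments, so this should also sidestep the regexp typing.

```shell
#!/bin/sh
# Sketch: build the separator @/<TAB>/ in the shell, since neither
# ENVIRON[] nor ARGV[] values get awk escape-sequence processing.
sep=$(printf '@/\t/')

# 1) Via the environment: the prefix assignment exports sep to awk only.
printf 'foo@/\t/bar\nfoo/\t/bar\n' |
  sep="$sep" awk 'index($0, ENVIRON["sep"])'

# 2) Via ARGV: copy ARGV[1], then blank it so awk does not try to open
#    it as an input file and reads standard input instead.
printf 'foo@/\t/bar\nfoo/\t/bar\n' |
  awk 'BEGIN { str = ARGV[1]; ARGV[1] = "" } index($0, str)' "$sep"
```

Each pipeline should print only the `foo@/<tab>/bar` line. The ARGV[] form avoids touching the environment, while the ENVIRON[] form keeps the awk program free of argument bookkeeping.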

I would think that if you wanted to allow assigning strongly typed regexp
constants to variables using `-v`, then using `-v str='\@/.../'`, i.e.
starting with an escaped `@`, would be a better way to go, since no
existing scripts will pass a value starting with `\@` (because that would
have produced the usual escape sequence warning). The extension would then
be something people can turn on if they want it, rather than something
that's on by default, has surprising effects like disabling escape
sequence interpretation, and breaks existing behavior.

      Ed.


