Subject: Re: Concatenation of strongly typed regexp constants
From: Stuart Ferguson
Date: Mon, 30 Jan 2023 17:02:54 +0000

Sorry, I put the comments with the print statement outputs in the
wrong places in the code sample. Fixed below:
On Mon, 30 Jan 2023 at 16:19, Stuart Ferguson <stuart.fergs@gmail.com> wrote:
>
> Hello.
>
> This email is not a request for help, but a suggestion.
>
> One of the fairly recent features of gawk that I like very much is the
> inclusion of strongly typed regexp constants. Aside from the benefits
> set out in section 6.1.2.2 of the gawk user's manual, strongly typed
> regexp constants can be easier to write than dynamic regexp
> strings, because escape sequences need no doubled backslashes.
> As an example, the following two assignments describe the same
> regexp:
>
> regex_string = "\\[\\s+\\w+\\s+\\]"
> strong_regex = @/\[\s+\w+\s+\]/
>
> I have found it useful to construct complicated regular expressions by
> combining sub-elements, where each regexp sub-element may be useful in
> its own right. The code below shows an admittedly simplistic example:
>
> BEGIN {
>     regx = @/[a-z]+\s+[a-z]+\s*/
>     regy = @/[0-9]+\s+[0-9]+\s*/
>
>     strx = stry = strxy = "cat dog 123 456"
>
>     print typeof(regx)        # > regexp
>     print typeof(regy)        # > regexp
>     print typeof(regx regy)   # > string
>
>     sub(regx, "replaced ", strx)
>     sub(regy, "replaced ", stry)
>     sub(regx regy, "replaced ", strxy)
>
>     print strx    # > replaced 123 456
>     print stry    # > cat dog replaced
>     print strxy   # > replaced
> }
>
> The concatenation of regx and regy, both of which are regexp
> variables, produces a string. This is as expected according to section
> 6.1.2.2 of the gawk user's manual. Furthermore, the concatenated
> regexp string works precisely as intended in the sub() function.
>
> Nevertheless, my suggestion is as follows: if a concatenation involves
> only regexp typed variables, the result should be regexp typed. Hence:
>
> regxy = regx regy
>
> should produce variable regxy with type "regexp".
>
> Section 6.1.2.2 of the user's manual shows that regexp variables used
> as the third argument to a sub() function retain their type. It seems
> to me useful and consistent that concatenation of regexp variables
> should achieve the same end.
>
> Cheers
>
> Stuart