help-gawk

Re: Concatenation of strongly typed regexp constants


From: Stuart Ferguson
Subject: Re: Concatenation of strongly typed regexp constants
Date: Mon, 30 Jan 2023 17:02:54 +0000

Sorry, I put the comments with the print statement outputs in the
wrong places in the code sample. Fixed below:

On Mon, 30 Jan 2023 at 16:19, Stuart Ferguson <stuart.fergs@gmail.com> wrote:
>
> Hello.
>
> This email is not a request for help, but a suggestion.
>
> One of the fairly recent features of gawk that I like very much is the
> inclusion of strongly typed regexp constants. Aside from the benefits
> set out in Section 6.1.2.2 of the gawk user's manual, strongly typed
> regexp constants can be easier to write than dynamic regexp
> strings, because escape sequences do not need doubled backslashes.
> As an example, the following two assignments describe the same
> regular expression:
>
> regex_string = "\\[\\s+\\w+\\s+\\]"
> strong_regex = @/\[\s+\w+\s+\]/
>
> I have found it useful to construct complicated regular expressions by
> combining sub-elements, where each regexp sub-element may be useful in
> its own right. The code below shows an admittedly simplistic example:
>
> BEGIN {
>   regx = @/[a-z]+\s+[a-z]+\s*/
>   regy = @/[0-9]+\s+[0-9]+\s*/
>
>   strx = stry = strxy = "cat dog 123 456"
>
>   print typeof(regx)    # > regexp
>   print typeof(regy)    # > regexp
>   print typeof(regx regy)    # > string
>
>   sub(regx, "replaced ", strx)
>   sub(regy, "replaced ", stry)
>   sub(regx regy, "replaced ", strxy)
>
>   print strx    # > replaced 123 456
>   print stry    # > cat dog replaced
>   print strxy    # > replaced
> }
>
> The concatenation of regx and regy, both of which are regexp
> variables, produces a string. This is as expected according to Section
> 6.1.2.2 of the gawk user's manual. Furthermore, the concatenated
> regexp string works precisely as intended in the sub() function.
>
> Nevertheless, my suggestion is as follows: if a concatenation involves
> only regexp typed variables, the result should be regexp typed. Hence:
>
> regxy = regx regy
>
> should produce variable regxy with type "regexp".
>
> Section 6.1.2.2 of the user's manual shows that regexp variables used
> as the third argument to a sub() function retain their type. It seems
> to me useful and consistent that concatenation of regexp variables
> should achieve the same end.
>
> Cheers
>
> Stuart


