help-gawk

Concatenation of strongly typed regexp constants


From: Stuart Ferguson
Subject: Concatenation of strongly typed regexp constants
Date: Mon, 30 Jan 2023 16:19:15 +0000

Hello.

This email is not a request for help, but a suggestion.

One of the fairly recent features of gawk that I like very much is
strongly typed regexp constants. Aside from the benefits set out in
section 6.1.2.2 of the gawk user's manual, strongly typed regexp
constants can be easier to write than dynamic regexp strings, because
escape sequences do not need doubled backslashes. As an example, the
following two assignments describe the same regular expression:

regex_string = "\\[\\s+\\w+\\s+\\]"
strong_regex = @/\[\s+\w+\s+\]/

I have found it useful to construct complicated regular expressions by
combining sub-elements, where each regexp sub-element may be useful in
its own right. The code below shows an admittedly simplistic example:

BEGIN {
  regx = @/[a-z]+\s+[a-z]+\s*/
  regy = @/[0-9]+\s+[0-9]+\s*/

  strx = stry = strxy = "cat dog 123 456"

  print typeof(regx)    # > regexp
  print typeof(regy)    # > regexp
  print typeof(regx regy)    # > string

  sub(regx, "replaced ", strx)
  sub(regy, "replaced ", stry)
  sub(regx regy, "replaced ", strxy)

  print strx     # > replaced 123 456
  print stry     # > cat dog replaced
  print strxy    # > replaced
}

The concatenation of regx and regy, both of which are regexp
variables, produces a string. This is as expected according to section
6.1.2.2 of the gawk user's manual: in a string context, a regexp-typed
value is converted to its original regexp text. Furthermore, the
concatenated string works precisely as intended as a dynamic regexp in
the sub() function.

Nevertheless, my suggestion is as follows: if a concatenation involves
only regexp-typed variables, the result should also be regexp-typed.
Hence:

regxy = regx regy

should produce variable regxy with type "regexp".

Section 6.1.2.2 of the user's manual shows that a regexp variable used
as the third argument to sub() retains its type after the
substitution. It seems to me both useful and consistent that
concatenating regexp variables should likewise yield a regexp.
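
To illustrate the sub() behaviour referred to above, here is a minimal
sketch. It assumes a gawk release that supports strongly typed regexps
and keeps the type across sub(), as the manual describes; the variable
name and the pattern are made up for illustration only.

```awk
# Sketch only: the name "re" and the pattern are hypothetical.
BEGIN {
  re = @/[a-z]+\s+cat/      # a regexp-typed variable
  print typeof(re)          # type before the substitution

  sub(/cat/, "dog", re)     # rewrite part of the regexp text in place
  print typeof(re), re      # per the manual, the type should be unchanged

  # The modified value is then used as the pattern in a match.
  if ("big dog" ~ re)
    print "matches"
}
```

If concatenation behaved the same way, a combined regexp-typed variable
could be built directly rather than through an edit of this kind.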

Cheers

Stuart


