Concatenation of strongly typed regexp constants
From: Stuart Ferguson
Subject: Concatenation of strongly typed regexp constants
Date: Mon, 30 Jan 2023 16:19:15 +0000
Hello.
This email is not a request for help, but a suggestion.
One fairly recent gawk feature that I like very much is strongly typed
regexp constants. Aside from the benefits set out in section 6.1.2.2 of
the gawk user's manual, strongly typed regexp constants can be easier to
write than dynamic regexp strings, because escape sequences do not need
doubled backslashes. For example, the following two assignments yield
equivalent regexps:
    regex_string = "\\[\\s+\\w+\\s+\\]"
    strong_regex = @/\[\s+\w+\s+\]/
I have found it useful to construct complicated regular expressions by
combining sub-elements, each of which may be useful in its own right.
The code below shows an admittedly simplistic example:
    BEGIN {
        regx = @/[a-z]+\s+[a-z]+\s*/
        regy = @/[0-9]+\s+[0-9]+\s*/
        strx = stry = strxy = "cat dog 123 456"
        print typeof(regx)                   # > regexp
        print typeof(regy)                   # > regexp
        print typeof(regx regy)              # > string
        sub(regx, "replaced ", strx)         # strx: replaced 123 456
        sub(regy, "replaced ", stry)         # stry: cat dog replaced
        sub(regx regy, "replaced ", strxy)   # strxy: replaced
        print strx
        print stry
        print strxy
    }
The concatenation of regx and regy, both of which are regexp variables,
produces a string. This is the documented behavior (section 6.1.2.2 of
the gawk user's manual). Furthermore, the concatenated string works
exactly as intended when used as a regexp in sub().
Nevertheless, my suggestion is this: if a concatenation involves only
regexp-typed variables, the result should also be regexp-typed. Hence:

    regxy = regx regy

should produce a variable regxy of type "regexp".
Section 6.1.2.2 of the user's manual shows that a regexp variable used
as the third argument to sub() retains its type. It seems to me both
useful and consistent for concatenation of regexp variables to do the
same.
Cheers
Stuart