+prohibit_doubled_words_ = \
+ the then in an on if is it but for or at and do to
+# expand the regex before running the check to avoid using expensive captures
+prohibit_doubled_word_expanded_ = \
+ $(shell echo $(prohibit_doubled_words_) | sed -r 's/\b(\S+)\b/\1\\s\+\1/g')
I bet GNU make has builtins that could do this without forking a
$(shell). This stage leaves the variable containing:
the\s+the then\s+then ...
Maybe something like:
$(join $(prohibit_doubled_words_),$(addprefix \s+,$(prohibit_doubled_words_)))
prohibit_doubled_word_RE_ ?= \
- /\b(then?|[iao]n|i[fst]|but|f?or|at|and|[dt]o)\s+\1\b/gims
+ /\b(?:$(subst $(space),|,$(prohibit_doubled_word_expanded_)))\b/gims
Whichever way it is built, you want to end up with the perl regex below
(with a plain + after \s, since \+ would match a literal plus in perl):
/\b(?:the\s+the|then\s+then|...)\b/gims
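For reference, a minimal stand-alone sketch of the fork-free variant, assuming GNU make. The variable names mirror the patch; the `empty` and `space` definitions and the `show-re` target are added here purely for illustration and are not part of the patch (the real makefile may already define `space` elsewhere).

```make
# Hypothetical stand-alone file, e.g. sketch.mk (name is illustrative).
prohibit_doubled_words_ = \
  the then in an on if is it but for or at and do to

# A variable holding exactly one space, for use with $(subst).
# Assumption: the real makefile already provides an equivalent.
empty :=
space := $(empty) $(empty)

# addprefix yields "\s+the \s+then ..."; join then pairs each word with
# its prefixed twin, giving "the\s+the then\s+then ...".
# The + is left unescaped: in perl, \+ would match a literal plus sign.
prohibit_doubled_word_expanded_ = \
  $(join $(prohibit_doubled_words_),$(addprefix \s+,$(prohibit_doubled_words_)))

# Turn the space-separated list into one non-capturing alternation.
prohibit_doubled_word_RE_ ?= \
  /\b(?:$(subst $(space),|,$(prohibit_doubled_word_expanded_)))\b/gims

# Illustration only: print the resulting regex for inspection.
show-re: ; @printf '%s\n' '$(prohibit_doubled_word_RE_)'
```

Running `make -f sketch.mk show-re` should print a regex beginning `/\b(?:the\s+the|then\s+then|`, with no $(shell) or sed involved.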
prohibit_doubled_word_ = \
-e 'while ($(prohibit_doubled_word_RE_))' \
$(perl_filename_lineno_text_)
At any rate, I doubt my make fine-tuning matters, and you are definitely
correct that avoiding back-references makes perl regexes more efficient.