bug-gnulib

Re: [PATCH] regex: fix backreference matching


From: Egor Ignatov
Subject: Re: [PATCH] regex: fix backreference matching
Date: Tue, 29 Jun 2021 11:51:13 +0300

Well, then I have a few questions about matching and capturing
groups.

1. "ab" -> "^(a*)*(.)"
So, from your test case I can assume that:
regs[0] = (0, 2]
regs[1] = (0, 1]
regs[2] = (1, 2]

But if we add a backref at the end:
2. "ab" -> "^(a*)*(.)\1"
check_matching matches the whole string "ab". This means that the
first group consumed 'a', but in fact it has to be empty;
otherwise the backref could not match later on.
What is the correct match here? Is check_matching wrong, and should
the match be only "a", captured by the 2nd group (as it would be
with "^(a*)(.)\1")? Or should set_regs check for this and shrink
the match?

Next,
3. "aaba" -> "^(a*)*(.)\1"
Again check_matching matches "aaba". Then the first group
is "a", so where does the 2nd 'a' go?
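
For reference, the three cases above can be run through the public
POSIX interface with a small program like the sketch below. It only
reports whatever registers the installed regex implementation
returns; it does not say which answer is correct. The helper names
match_case and show_case and the NREGS constant are made up for this
illustration, and backreferences in extended regexps are assumed to
be available as an extension, as they are in glibc.

```c
#include <regex.h>
#include <stdio.h>

/* Whole match plus two groups; NREGS is a name chosen for this sketch.  */
#define NREGS 3

/* Compile PATTERN as an extended regex and run it on SUBJECT.
   Returns 0 on a match, REG_NOMATCH on no match, and -1 if the
   pattern does not compile.  On a match, REGS holds the registers.  */
int
match_case (const char *pattern, const char *subject, regmatch_t regs[NREGS])
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  int rc = regexec (&re, subject, NREGS, regs, 0);
  regfree (&re);
  return rc;
}

/* Print the registers for one case as start..end byte offsets.
   A group that did not take part in the match is reported as -1..-1.  */
void
show_case (const char *pattern, const char *subject)
{
  regmatch_t regs[NREGS];
  int rc = match_case (pattern, subject, regs);

  printf ("\"%s\" -> \"%s\":", subject, pattern);
  if (rc != 0)
    {
      puts (rc == -1 ? " compile error" : " no match");
      return;
    }
  putchar ('\n');
  for (int i = 0; i < NREGS; i++)
    printf ("  regs[%d] = %d..%d\n", i,
            (int) regs[i].rm_so, (int) regs[i].rm_eo);
}
```

Calling show_case ("^(a*)*(.)", "ab"), show_case ("^(a*)*(.)\\1", "ab")
and show_case ("^(a*)*(.)\\1", "aaba") from main prints the registers
for cases 1 to 3.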

In PCRE2 they save an empty string for optional groups like
"(a*)*". I assume this is because a capturing group saves its
last match, and the empty string matches. So in this case they
would match only "aab".

So please tell me how all 3 cases should match. This will
help me fix the initial issue with backrefs and implement the
correct matching.

Thanks.

--
Egor



