bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate


From: Dimitry Andric
Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
Date: Mon, 27 Mar 2023 19:54:27 +0200

On 27 Mar 2023, at 11:14, Koen Claessen <koen@chalmers.se> wrote:
> 
> Running the command:
> 
>  echo a | grep -E -w '((()|a)|())*'
> 
> does not terminate, and uses a LOT of processor time, for all versions of
> grep I have tried.
> 
> This is the smallest case that could be found; simplifying anything in the
> input and/or expression leads to correct behavior.

Reproducible with GNU grep 3.7 on Ubuntu 22:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   0:08.32 grep -E 
-w ((()|a)|())*

It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), 
and it is that implementation which ends up in an endless loop.

It loops between lines 1415, 1417 and 1443, but idx and cur_node never change 
from 0:

  1378  static reg_errcode_t
  1379  __attribute_warn_unused_result__
  1380  set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t 
nmatch,
  1381            regmatch_t *pmatch, bool fl_backtrack)
  1382  {
...
  1415    for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
  1416      {
  1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
  1418
  1419        if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
  1420            || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
...
  1442        /* Proceed to next node.  */
  1443        cur_node = proceed_next_node (mctx, nmatch, pmatch, 
prev_idx_match,
  1444                                      &idx, cur_node,
  1445                                      &eps_via_nodes, fs);
  1446
  1447        if (__glibc_unlikely (cur_node < 0))
...
  1465          }
  1466      }

-Dimitry

P.S.: Interestingly this does not reproduce with BSD grep, which returns 
immediately with "a".

Attachment: signature.asc
Description: Message signed with OpenPGP


reply via email to

[Prev in Thread] Current Thread [Next in Thread]