bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: dfa is slower than regex in some cases


From: Norihiro Tanaka
Subject: Re: dfa is slower than regex in some cases
Date: Sun, 03 Dec 2017 14:23:50 +0900

I attached a test case on this mail.

On Sun, 03 Dec 2017 14:22:23 +0900
Norihiro Tanaka <address@hidden> wrote:

> Hi,
> 
> A case that dfa is slower than regex on bug-gawk mailing list is
> discussed 4 months ago.
> 
> http://lists.gnu.org/archive/html/bug-gawk/2017-07/msg00002.html
> 
> Although gawk is fixed to not use dfa in some cases, it looks like a
> temporary treatment. (11d4ea518166ffbc0c2fe85d090723e8f299486c)
> 
> --------
> $ env LC_ALL=C ./gawk --version
> GNU Awk 4.2.0, API: 2.0 (GNU MPFR 3.1.5, GNU MP 4.3.2)
> Copyright (C) 1989, 1991-2017 Free Software Foundation.
> 
> This program is free software; you can redistribute it and/or modify
> it under the terms of the GNU General Public License as published by
> the Free Software Foundation; either version 3 of the License, or
> (at your option) any later version.
> 
> This program is distributed in the hope that it will be useful,
> but WITHOUT ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> GNU General Public License for more details.
> 
> You should have received a copy of the GNU General Public License
> along with this program. If not, see http://www.gnu.org/licenses/.
> $ time -p env LC_ALL=C ./gawk -f test.awk test.inp >/dev/null
> real 1.38
> user 1.17
> sys 0.11
> $ time -p env LC_ALL=C GAWK_NO_DFA=1 ./gawk -f test.awk test.inp >/dev/null
> real 0.51
> user 0.43
> sys 0.07
> --------
> 
> This problem seems to be due to dfa.c.  In fact, grep is also slower
> than not to use dfa in the same cases.
> 
> --------
> $ env LC_ALL=C src/grep --version
> grep (GNU grep) 3.1
> Copyright (C) 2017 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> Written by Mike Haertel and others, see 
> <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
> $ time -p env LC_ALL=C src/grep -X gawk -f test.pat test.inp >/dev/null
> real 1.06
> user 0.97
> sys 0.08
> --------
> 
>   # grep does not have control by GREP_NO_DFA environment, so I changed
>   # as following and rebuilded, and run the same test as above.
> 
> ========
> diff --git a/lib/dfa.c b/lib/dfa.c
> index 2b9c80e..f234b88 100644
> --- a/lib/dfa.c
> +++ b/lib/dfa.c
> @@ -3474,21 +3474,7 @@ dfacomp (char const *s, size_t len, struct dfa *d, 
> bool searchflag)
>    dfaparse (s, len, d);
>    dfassbuild (d);
> 
> -  if (dfa_supported (d))
> -    {
> -      dfaoptimize (d);
> -      dfaanalyze (d, searchflag);
> -    }
> -  else
> -    {
> -      d->dfaexec = dfaexec_noop;
> -    }
> -
> -  if (d->superset)
> -    {
> -      d->fast = true;
> -      dfaanalyze (d->superset, searchflag);
> -    }
> +  d->dfaexec = dfaexec_noop;
>  }
> 
>  /* Free the storage held by the components of a dfa.  */
> ========
> 
> --------
> $ time -p env LC_ALL=C src/grep -X gawk -f test.pat test.inp >/dev/null
> real 0.43
> user 0.37
> sys 0.05
> --------
> 
> However, I do not still know why dfa is slower than regex in this case.
> 
> Thanks,
> Norihiro
> 

--
 田中 紀洋 (Norihiro TANAKA)
 E-mail : address@hidden

Attachment: test.inp
Description: Binary data

Attachment: test.awk
Description: Binary data

Attachment: test.pat
Description: Binary data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]