bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time

bug-gnu-emacs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time

From:	Eli Zaretskii
Subject:	bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time
Date:	Mon, 27 Sep 2021 08:14:09 +0300

> Date: Mon, 27 Sep 2021 00:43:05 +0000
> From: Gregory Heytings <gregory@heytings.org>
> cc: 50733@debbugs.gnu.org, dgutov@yandex.ru, mardani29@yahoo.es
> 
> >> rg O_CREAT takes 0.18 seconds
> >> gid O_CREAT takes 0.02 seconds
> >> rg O.?CREAT takes 0.18 seconds
> >> gid O.?CREAT takes 0.93 seconds
> >> rg O.*CREAT takes 0.19 seconds
> >> gid O.*CREAT takes 1.73 seconds
> >>
> >> Isn't idutils the one that doesn't scale?
> >
> > No.  You compare apples with oranges.
> 
> No.  I compare apples with apples.  I compare regexp searches in a code 
> base with regexp searches in a code base.

There are different implementations of regexp searches in a code
base.  Are you saying that they all are "apples" for the purpose of
comparing scalability?  Then why not compare IDUtils with Grep?

To get back to the issue at hand: we are talking (or at least I was
talking) about scalability of an algorithm, not about some particular
implementation of the algorithm.  Ripgrep is a multithreaded program,
whereas idutils is single-threaded.  So for a fair comparison of
scalability of these two main ideas: file-based search vs DB search,
you need at the very least to limit ripgrep to a single thread.  And
then you need to run each program on code bases of various sizes,
preferably those which differ by orders of magnitude or close to that,
and see their O(n) behavior.  And exclude from your comparison
command-line options that require IDUtils to access the files in
addition to the DB.  That would be at least an approximation to
comparing apples to apples.  For an even better approximation, one
should analyze the implementations of regexp search in each program
and see if there aren't any significant differences there, or in any
other aspects of the implementation that could significantly affect
the speed.  If there are such aspects, then the study of scalability
should either account for the differences or implement the same regexp
etc. algorithms in both programs before comparing them.

But frankly, I don't understand why this all would be needed at all,
because it should be absolutely clear that searching the files in the
filesystem will always scale worse than reading a well-indexed DB.
IDUtils is an example of the latter, and it beats many utilities that
search the files, including ripgrep, as long as it doesn't need to
access the files themselves.  But even if it doesn't always beat them
(which you didn't yet demonstrate), it just means the ideas of its
design should be taken further and/or implemented better, that's all.

> Because this is a thread about regexp searches in a code base.

As should be clear from what I wrote at the beginning of this
sub-thread, it is about scalability of the various algorithms for such
searches, not about specific programs:

> > Btw, I don't understand why we focus on general-purpose text-searching 
> > tools for these features.  Why not focus on packages like ID Utils 
> > instead, they are so much faster.  Daniel, could you time the same 
> > search in that large tree when xref-search-program is 'gid'?  (You'd 
> > need to run 'mkid' first, to create the ID database, but that is 
> > one-time, and is very fast.)  As I told many times, I think this is the 
> > future: program language sensitive tools that use a precomputed DB.
> 
> It should now be clear that idutils is not "so much faster", it is 
> marginally faster in one case, and slower in all other cases.  And it 
> doesn't do what project-find-regexp needs, because it ignores (most, but 
> not all) tokens in comments (oh, BTW, including tokens in comments has 
> been on its TODO for at least 20 years).  Creating the ID database is also 
> not "very fast", and the ID database cannot be updated incrementally (oh, 
> BTW, incremental database updates has been on its TODO list for at least 
> 20 years).  In short, it's an outdated tool, that isn't maintained 
> anymore, and that can't be the future.

I said that such tools are the future, not that IDUtils itself is
necessarily the future (though it could be, if someone picks up its
development).  Again, this is about looking for the best tools for
this job, and I still stand by my opinion: focusing only on
general-purpose search tools is sub-optimal.  We should be on the
lookout for the other kind, and support those of them that exist at
least as well as we support Grep and its ilk.

[Prev in Thread]

Current Thread

[Next in Thread]

bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time, (continued)

Prev by Date: bug#50811: 28.0.50; Misleading Docstring for read-string function
Next by Date: bug#50837: 27.2; ccrypt encryption prompt isn't recognized by comint-password-prompt-regex
Previous by thread: bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time
Next by thread: bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time
Index(es):
- Date
- Thread