bug-gnu-emacs
bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time


From: Gregory Heytings
Subject: bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time
Date: Mon, 27 Sep 2021 00:43:05 +0000


Out of curiosity, because of your "it doesn't scale" remark, I just compared the efficiency of ripgrep and idutils on the latest Linux kernel tarball (1.4 GB in 78464 files):

mkid takes 31 seconds

rg O_CREAT takes 0.18 seconds
gid O_CREAT takes 0.02 seconds
rg O.?CREAT takes 0.18 seconds
gid O.?CREAT takes 0.93 seconds
rg O.*CREAT takes 0.19 seconds
gid O.*CREAT takes 1.73 seconds

Isn't idutils the one that doesn't scale?

No.  You compare apples with oranges.


No. I compare apples with apples. I compare regexp searches in a code base with regexp searches in a code base. Because this is a thread about regexp searches in a code base. It's you who started talking about oranges instead, namely searching for identifiers in a code base.

The only case in which idutils is faster (if one does not take into account the time spent building the database, and if one considers it okay to ignore some matches in comments) is a search for a plain identifier; from a user's viewpoint, getting an answer in 0.2 seconds on such a big code base is as good as getting an answer in 0.02 seconds. In all other cases, that is, whenever a regexp is used, it's slower, much slower --- and regexps are what project-find-regexp is all about.

See what I mean?  Even when it's better, it's worse.  Perfect reasoning.


Perfect reading. Nowhere did I say that it's worse when it's better. I said that from a user viewpoint, a tool that is 155 ms faster in one (and only one) case, and slower in all other cases, is worse, and that from a user viewpoint this single "155 ms faster case" does not matter enough to justify the use of a more complex tool.
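To put the database build time in perspective, here is a rough break-even sketch in Python. It uses only the Linux kernel figures quoted in this message (31 seconds for mkid, 180 ms for rg and 25 ms for gid on a plain identifier); the function name is made up for illustration, and the calculation assumes every search is a plain-identifier search, the one case where gid wins.

```python
# Figures quoted in this message for the Linux kernel tree.
MKID_BUILD_MS = 31_000  # one-time "mkid" run
RG_PLAIN_MS = 180       # rg O_CREAT
GID_PLAIN_MS = 25       # gid O_CREAT


def break_even_searches(build_ms, slow_ms, fast_ms):
    """Searches needed before the per-search saving repays the build cost.

    Hypothetical helper, not part of any tool discussed here.
    """
    saving = slow_ms - fast_ms
    if saving <= 0:
        raise ValueError("the 'fast' tool is not faster; the cost is never repaid")
    # Integer ceiling division: a partial search still counts as one search.
    return -(-build_ms // saving)


print(break_even_searches(MKID_BUILD_MS, RG_PLAIN_MS, GID_PLAIN_MS))  # 200
```

Under those assumptions, about 200 plain-identifier searches are needed before the 155 ms saving pays back a single mkid run, and any regexp search in between pushes that number up.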

Note that Emacs takes some time (55 ms for a search for O_CREAT on the Emacs trunk) to read, process and display the output, which must be taken into account to calculate the perceived difference between search tool candidates.
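A small sketch of what "perceived difference" means here, using the Emacs trunk numbers from this message (10 ms for gid and 25 ms for rg on O_CREAT, plus 55 ms of Emacs overhead). It assumes the 55 ms overhead is the same whichever tool produced the output, which this message does not measure separately; the helper names are illustrative only.

```python
EMACS_OVERHEAD_MS = 55  # read, process and display the output (O_CREAT, Emacs trunk)


def perceived_ms(tool_ms, overhead_ms=EMACS_OVERHEAD_MS):
    """Time the user waits: tool run time plus Emacs-side overhead (assumed constant)."""
    return tool_ms + overhead_ms


def slowdown(slow_ms, fast_ms):
    """How many times longer the slower search takes."""
    return slow_ms / fast_ms


gid_ms, rg_ms = 10, 25  # plain identifier on the Emacs trunk

raw = slowdown(rg_ms, gid_ms)                                   # 2.5
perceived = slowdown(perceived_ms(rg_ms), perceived_ms(gid_ms))  # 80 / 65

print(f"raw: {raw:.2f}x, perceived: {perceived:.2f}x")
```

On these figures a 2.5x difference between the tools shrinks to roughly 1.2x once the fixed overhead is added (80 ms against 65 ms).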

Some more detailed numbers:

1. on Emacs' trunk (4759 files, 174 MB)

gid O_CREAT : 10 ms
gid O[A-Z_]CREAT : 75 ms
gid O.?CREAT : 70 ms
gid O.*CREAT : 70 ms

rg O_CREAT : 25 ms
rg O[A-Z_]CREAT : 25 ms
rg O.?CREAT : 25 ms
rg O.*CREAT : 25 ms

rg -w O_CREAT : 30 ms
rg -w O[A-Z_]CREAT : 30 ms
rg -w O.?CREAT : 30 ms
rg -w O.*CREAT : 30 ms

2. on the latest Linux kernel tarball (78464 files, 1.4 GB)

gid O_CREAT : 25 ms
gid O[A-Z_]CREAT : 1375 ms
gid O.?CREAT : 930 ms
gid O.*CREAT : 1730 ms

rg O_CREAT : 180 ms
rg O[A-Z_]CREAT : 185 ms
rg O.?CREAT : 185 ms
rg O.*CREAT : 185 ms

rg -w O_CREAT : 185 ms
rg -w O[A-Z_]CREAT : 190 ms
rg -w O.?CREAT : 190 ms
rg -w O.*CREAT : 190 ms
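This message does not say how the timings above were taken. For anyone who wants to produce comparable numbers, a minimal Python sketch follows. It assumes that rg and gid are on PATH, that it is run from the top of the source tree, and that mkid has already been run there; the function names and the choice of a median over five runs are assumptions of this sketch, not the method used for the figures above.

```python
import shutil
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Median wall-clock time of `argv` in milliseconds over `runs` runs."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is captured and discarded; a non-zero exit status
        # (e.g. "no match") is not treated as an error.
        subprocess.run(argv, stdin=subprocess.DEVNULL,
                       capture_output=True, check=False)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def benchmark(commands, runs=5):
    """Time each full command line; skip programs that are not installed."""
    results = {}
    for argv in commands:
        if shutil.which(argv[0]) is None:
            print(f"skipping {argv[0]}: not found on PATH", file=sys.stderr)
            continue
        results[" ".join(argv)] = time_command(argv, runs)
    return results


# Example call (not executed here), run from the top of the tree:
# benchmark([["gid", "O.?CREAT"],
#            ["rg", "O.?CREAT", "."],
#            ["rg", "-w", "O.?CREAT", "."]])
```

Wall-clock medians like these include process start-up and depend on whether the files are already in the page cache, so warm and cold runs should not be mixed.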

I initially reacted to your paragraph:


Btw, I don't understand why we focus on general-purpose text-searching tools for these features. Why not focus on packages like ID Utils instead? They are so much faster. Daniel, could you time the same search in that large tree when xref-search-program is 'gid'? (You'd need to run 'mkid' first, to create the ID database, but that is one-time, and is very fast.) As I have said many times, I think this is the future: programming-language-sensitive tools that use a precomputed DB.


It should now be clear that idutils is not "so much faster": it is marginally faster in one case, and slower in all other cases. And it doesn't do what project-find-regexp needs, because it ignores (most, but not all) tokens in comments (oh, BTW, including tokens in comments has been on its TODO list for at least 20 years). Creating the ID database is also not "very fast", and the ID database cannot be updated incrementally (oh, BTW, incremental database updates have been on its TODO list for at least 20 years). In short, it's an outdated tool that isn't maintained anymore and that can't be the future.




