From: Gregory Heytings
Subject: bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time
Date: Mon, 27 Sep 2021 00:43:05 +0000

>> Out of curiosity, because of your "it doesn't scale" remark, I just compared the efficiency of ripgrep and idutils on the latest Linux kernel tarball (1.4 GB in 78464 files):
>>
>> mkid takes 31 seconds
>> rg O_CREAT takes 0.18 seconds
>> gid O_CREAT takes 0.02 seconds
>> rg O.?CREAT takes 0.18 seconds
>> gid O.?CREAT takes 0.93 seconds
>> rg O.*CREAT takes 0.19 seconds
>> gid O.*CREAT takes 1.73 seconds
>>
>> Isn't idutils the one that doesn't scale?
>
> No. You compare apples with oranges.

No. I compare apples with apples. I compare regexp searches in a code base with regexp searches in a code base. Because this is a thread about regexp searches in a code base. It's you who started talking about oranges instead, namely searching for identifiers in a code base.

>> The only case in which idutils is faster (if one does not take the time that was spent to build the database into account, and if one considers that it's okay to ignore some matches in comments) is a plain identifier; from a user viewpoint getting an answer in 0.2 seconds on such a big code base is as good as getting an answer in 0.02 seconds. It's slower, much slower in all other cases, whenever a regexp is used --- which is what project-find-regexp is all about.
>
> See what I mean? Even when it's better, it's worse. Perfect reasoning.

Perfect reading. Nowhere did I say that it's worse when it's better. I said that from a user viewpoint, a tool that is 155 ms faster in one (and only one) case, and slower in all other cases, is worse, and that from a user viewpoint this single "155 ms faster case" does not matter enough to justify the use of a more complex tool.
Note that Emacs takes some time (55 ms for a search for O_CREAT on the Emacs trunk) to read, process and display the output, which must be taken into account to calculate the perceived difference between search tool candidates.
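
To make the "perceived difference" concrete, here is a minimal sketch of the arithmetic using the Emacs trunk timings for a plain O_CREAT search given in this message (gid 10 ms, rg 25 ms). It assumes the 55 ms Emacs overhead applies equally to both tools, which the message does not state explicitly; the names `perceived_ms`, `TOOL_MS` and `EMACS_OVERHEAD_MS` are purely illustrative.

```python
# Timings in milliseconds for a plain O_CREAT search on the Emacs trunk,
# taken from the numbers quoted in this message.
EMACS_OVERHEAD_MS = 55  # time Emacs spends reading, processing and displaying output
TOOL_MS = {"gid": 10, "rg": 25}


def perceived_ms(tool_ms, overhead_ms=EMACS_OVERHEAD_MS):
    """Total wait seen by the user: tool run time plus Emacs overhead.

    Assumption: the overhead is the same whichever tool produced the output.
    """
    return tool_ms + overhead_ms


perceived = {name: perceived_ms(ms) for name, ms in TOOL_MS.items()}
raw_ratio = TOOL_MS["rg"] / TOOL_MS["gid"]
perceived_ratio = perceived["rg"] / perceived["gid"]

print(perceived)
print(f"raw ratio: {raw_ratio:.2f}, perceived ratio: {perceived_ratio:.2f}")
```

Under that assumption, a 2.5x difference between the raw tool timings shrinks to roughly 1.23x (80 ms against 65 ms) once the display overhead is included.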
Some more detailed numbers:

1. on Emacs' trunk (4759 files, 174 MB)

   gid O_CREAT         :   10 ms
   gid O[A-Z_]CREAT    :   75 ms
   gid O.?CREAT        :   70 ms
   gid O.*CREAT        :   70 ms
   rg O_CREAT          :   25 ms
   rg O[A-Z_]CREAT     :   25 ms
   rg O.?CREAT         :   25 ms
   rg O.*CREAT         :   25 ms
   rg -w O_CREAT       :   30 ms
   rg -w O[A-Z_]CREAT  :   30 ms
   rg -w O.?CREAT      :   30 ms
   rg -w O.*CREAT      :   30 ms

2. on the latest Linux kernel tarball (78464 files, 1.4 GB)

   gid O_CREAT         :   25 ms
   gid O[A-Z_]CREAT    : 1375 ms
   gid O.?CREAT        :  930 ms
   gid O.*CREAT        : 1730 ms
   rg O_CREAT          :  180 ms
   rg O[A-Z_]CREAT     :  185 ms
   rg O.?CREAT         :  185 ms
   rg O.*CREAT         :  185 ms
   rg -w O_CREAT       :  185 ms
   rg -w O[A-Z_]CREAT  :  190 ms
   rg -w O.?CREAT      :  190 ms
   rg -w O.*CREAT      :  190 ms

I initially reacted to your paragraph:

> Btw, I don't understand why we focus on general-purpose text-searching tools for these features. Why not focus on packages like ID Utils instead, they are so much faster. Daniel, could you time the same search in that large tree when xref-search-program is 'gid'? (You'd need to run 'mkid' first, to create the ID database, but that is one-time, and is very fast.) As I told many times, I think this is the future: program language sensitive tools that use a precomputed DB.

It should now be clear that idutils is not "so much faster": it is marginally faster in one case, and slower in all other cases. And it doesn't do what project-find-regexp needs, because it ignores (most, but not all) tokens in comments (oh, BTW, including tokens in comments has been on its TODO list for at least 20 years). Creating the ID database is also not "very fast", and the ID database cannot be updated incrementally (oh, BTW, incremental database updates have been on its TODO list for at least 20 years). In short, it's an outdated tool that isn't maintained anymore, and that can't be the future.
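
As a rough illustration of why the database build time matters, the sketch below computes how many searches are needed before the 31-second mkid run on the Linux kernel tarball is paid back, using the gid and rg timings from this message. It assumes the database is built once and never rebuilt, which is optimistic given that it cannot be updated incrementally; the helper names `saving_ms` and `break_even_searches` are hypothetical.

```python
MKID_BUILD_MS = 31_000  # mkid on the Linux kernel tarball, from this message

# (gid_ms, rg_ms) per pattern on the Linux kernel tarball, from this message.
QUERIES = {
    "O_CREAT": (25, 180),
    "O[A-Z_]CREAT": (1375, 185),
    "O.?CREAT": (930, 185),
    "O.*CREAT": (1730, 185),
}


def saving_ms(gid_ms, rg_ms):
    """Time saved per search by using gid instead of rg (negative if gid is slower)."""
    return rg_ms - gid_ms


def break_even_searches(build_ms, saving_per_search_ms):
    """Searches needed to amortize the build, or None if gid never catches up."""
    if saving_per_search_ms <= 0:
        return None
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-build_ms // saving_per_search_ms)


for pattern, (gid_ms, rg_ms) in QUERIES.items():
    s = saving_ms(gid_ms, rg_ms)
    print(f"{pattern:14} saving {s:6d} ms  break-even: {break_even_searches(MKID_BUILD_MS, s)}")
```

With these figures, only the plain identifier search ever recovers the build cost, and only after 200 searches against an unchanged database; for the three regexp patterns gid is slower per search, so the build cost is never recovered.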