bug-gnu-emacs

bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time


From: Dmitry Gutov
Subject: bug#50733: 28.0.1; project-find-regexp can block Emacs for a long time
Date: Thu, 23 Sep 2021 02:09:16 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.13.0

On 23.09.2021 00:58, Daniel Martín wrote:
Dmitry Gutov <dgutov@yandex.ru> writes:

IIRC you are using macOS. I received another report recently that
find/grep based tooling, and project-find-regexp in particular, are
pretty slow on that OS.

Yes, this is on macOS.


When you say "block for a long time", how long are we talking about?

To try it, evaluate

   (benchmark 1 '(project-find-regexp "new-collection"))

I usually work on a monorepo with ~67000 tracked files (many of them big
binary files).  Here's what I get when using ripgrep as the xref search
program:

Elapsed time: 36.087181s (8.067474s in 22 GCs)

Thanks for testing. Did the switch to ripgrep help much?

I wonder if we should advertise this setting and recommendation more prominently, at least until we get auto-detection.
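
For reference, a minimal sketch of the setting in question, assuming Emacs 28 or later and that the rg executable is installed and on the PATH (neither is checked here):

```elisp
;; In the init file: have Xref's file searches use ripgrep instead of grep.
;; `xref-search-program' is the user option meant above; its default is `grep'.
(setq xref-search-program 'ripgrep)
```

The same option can also be changed with M-x customize-variable.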

Running the same search with ripgrep from the command line takes around
6 seconds.

Is that with an SSD?

Your project sounds respectable. The torvalds-linux repo I have checked out here also has about 70000 files, but I guess your files are bigger.

Another benchmark to try is

   (benchmark 1 '(project-files (project-current)))

Elapsed time: 1.590223s (0.432372s in 1 GCs)

That's a while (I wonder if you find 'project-find-file' usable with this kind of performance), but still better than I might have expected.

Here's an ELisp profile of the first benchmark:

         8696  78% - command-execute
         8696  78%  - call-interactively
         8493  76%   - funcall-interactively
         8480  76%    - eval-expression
         8479  76%     - eval
         8479  76%      - project-find-regexp
         8227  74%       - xref--show-xrefs
         8227  74%        - xref--show-xref-buffer
         5584  50%         - #<compiled 0x140b5a40100bafc6>
         5584  50%          - apply
         5584  50%           - project--find-regexp-in-files
         5574  50%            - xref-matches-in-files
         3016  27%             - xref--convert-hits
         3000  27%              - mapcan
         2992  27%               - #<compiled -0x6cdcd56218925c3>
         2734  24%                - xref--collect-matches
         2094  18%                 - xref--collect-matches-1
          800   7%                  + xref-make-match
          774   7%                  + xref-make-file-location
          104   0%                   xref--find-file-buffer
           80   0%                   file-remote-p
           51   0%                   xref--regexp-syntax-dependent-p
          906   8%             + xref--process-file-region
          331   2%               sort
         1413  12%         + xref--analyze
         1230  11%         + xref--show-common-initialize
          249   2%       + project-files
            3   0%       + project-current
            9   0%    + minibuffer-complete
            4   0%    + execute-extended-command
          203   1%   + byte-code
         2314  20% - ...
         2314  20%    Automatic GC
           27   0% + timer-event-handler

When you have a lot of matches, at some point Lisp overhead is going to show up. E.g., the searches seem almost instantaneous with up to several thousand matches here, but with tens or hundreds of thousands of matches, yes, I have to wait.

Help with optimizations in that area (around/in xref-matches-in-files and xref--convert-hits) is welcome, but I'm not sure how much more we can squeeze out.
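
For anyone who wants to look into it, a profile like the one above can be collected with the built-in profiler. This is only a sketch: the search string is the one from the benchmark, and it should be run from inside a project so that project-current finds it.

```elisp
(require 'profiler)

;; Sample CPU usage while the search runs, then show the call tree.
(profiler-start 'cpu)
(project-find-regexp "new-collection")
(profiler-report)
(profiler-stop)
```

In the report buffer, the + and - markers expand and collapse entries, as in the listing above.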

The search time is reduced when I use a more specific search term,
presumably because the number of results is lower and the Elisp
post-processing takes less time.  Here's what I got, for example, when I
search for something with results from only one file:

Elapsed time: 6.859815s (0.864738s in 2 GCs)

Comparing this with the time taken by the same query from the command
line (6.5s) shows that the Elisp post-processing time is probably
negligible in this scenario.

It's a good result. A little suspicious, though: given that project-find-regexp calls project-files first, and the latter takes 1.5s, the difference should be about that much. But I guess rg also needs to traverse the directory tree, and spends some time doing that too.

As for what else can be done: again, if someone wants to investigate an asynchronous/nonblocking API for Xref (or using threads), that would be welcome. The case when most of the time is spent in the subprocess is a good match for it. But I don't think we'll manage this for the upcoming release.
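
To illustrate only the general idea of not blocking while the subprocess runs, here is a rough sketch using make-process. It is not an Xref API and not a proposed design: the function name my/async-rg and its callback convention are made up for this example, and it assumes rg is on the PATH.

```elisp
;; -*- lexical-binding: t -*-

(defun my/async-rg (regexp dir callback)
  "Search DIR for REGEXP with rg without blocking Emacs.
CALLBACK is called with the list of output lines once rg exits.
Hypothetical helper for illustration; not part of Xref or project.el."
  (let ((default-directory (file-name-as-directory dir)))
    (make-process
     :name "my-async-rg"
     :buffer (generate-new-buffer " *my-async-rg*")
     ;; Pass "." explicitly so rg searches the directory, not stdin.
     :command (list "rg" "--line-number" "--no-heading" "--color=never"
                    "-e" regexp ".")
     :connection-type 'pipe
     :noquery t
     :sentinel
     (lambda (proc _event)
       ;; The sentinel runs on every status change; act once rg is done.
       (unless (process-live-p proc)
         (let ((buf (process-buffer proc)))
           (funcall callback
                    (with-current-buffer buf
                      (split-string (buffer-string) "\n" t)))
           (kill-buffer buf)))))))

;; Usage: report the number of matching lines when the search finishes.
;; (my/async-rg "new-collection" default-directory
;;              (lambda (lines) (message "%d matching lines" (length lines))))
```

Emacs stays responsive while rg runs; the Lisp post-processing of a large result set would still happen in the callback and would still take time.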

Another thing you can do is set up additional ignores for the project. If those big binary files are not something you are interested in searching or touching, you could add ignore entries for them. When the vc project backend is in use (the default), this is currently done via .dir-locals.el: the variable is project-vc-ignores, and its value is a list of strings that should be globs. See its docstring and the explanation in project-ignores's docstring.

Note that ignores also affect project-find-file.
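
A minimal sketch of such a .dir-locals.el, placed in the project root. The two patterns are placeholders for illustration, not taken from the project discussed here; per the project-ignores docstring, a leading "./" roots an entry and a trailing "/" matches directories only.

```elisp
;; .dir-locals.el
;; Ignore every *.bin file, plus a top-level assets/ directory
;; (both patterns are hypothetical examples).
((nil . ((project-vc-ignores . ("*.bin" "./assets/")))))
```

Buffers that were already open may need to be revisited before the new value takes effect.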




