Re: Should project delegate project-find-regexp?
From: Dmitry Gutov
Subject: Re: Should project delegate project-find-regexp?
Date: Mon, 18 Apr 2022 06:01:37 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.7.0
On 08.04.2022 11:40, Joel Reicher wrote:
> Dmitry Gutov <dgutov@yandex.ru> writes:
>> On 07.04.2022 14:48, Joel Reicher wrote:
>>> It seems to me that, at least in the case of git, 'git grep' offers a superior
>>> implementation to anything offered by the generic implementation of
>>> project-find-regexp.
>> Last I checked, there was no way to make 'git grep' search in
>> untracked files.
> There's a --untracked option, at least now.
Thanks, that works. And we could try to support it. "ignore patterns"
would require some code duplication, but that's doable. Not "error
patterns", sorry, that was a typo.
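For anyone who wants to see the difference for themselves, here is a minimal sketch that builds a scratch repository and runs both searches. The function name untracked_demo and the file names are made up for illustration; it assumes git and mktemp are available.

```shell
# Hypothetical demo: compare 'git grep' with and without --untracked
# in a throwaway repository.
untracked_demo() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        git init -q .
        printf 'follow symlinks\n' >tracked.txt
        printf 'more symlinks\n' >untracked.txt
        # Only tracked.txt is added to the index.
        git add tracked.txt
        echo "tracked only:"
        git grep -l -e symlinks
        echo "with --untracked:"
        git grep -l --untracked -e symlinks
    )
    rm -rf "$repo"
}
```

Calling untracked_demo should list untracked.txt only under the second heading.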
But I've benchmarked searching through a large project (200000 files),
and the results seem mixed.
--untracked does slow it down noticeably.
Examples:
$ time git grep -z -e symlinks >/dev/null
________________________________________________________
Executed in 1,11 secs fish external
usr time 2,16 secs 720,00 micros 2,16 secs
sys time 3,65 secs 192,00 micros 3,65 secs
$ time git grep -z --untracked -e symlinks >/dev/null
________________________________________________________
Executed in 1,81 secs fish external
usr time 2,42 secs 0,00 micros 2,42 secs
sys time 4,00 secs 938,00 micros 4,00 secs
At the same time, if I pipe the results of 'git ls-files' to ripgrep:
$ time git ls-files -z -c -o --exclude-standard | xargs -0 rg --null \
    --no-messages -g '!*/' -nH -e symlinks >/dev/null
________________________________________________________
Executed in 2,50 secs fish external
usr time 2,91 secs 1,40 millis 2,90 secs
sys time 3,02 secs 0,37 millis 3,02 secs
...it looks a little worse. But what if I add some forced parallelism?
$ time git ls-files -z -c -o --exclude-standard | xargs -0 -P8 rg --null \
    --no-messages -g '!*/' -nH -e symlinks >/dev/null
________________________________________________________
Executed in 1,08 secs fish external
usr time 4,03 secs 1,50 millis 4,03 secs
sys time 3,60 secs 0,42 millis 3,60 secs
...it shows better performance. Unfortunately, using the -P argument of
xargs for grepping is problematic because of synchronization problems
(the output of the parallel processes can get interleaved), but I've
written about this on ripgrep's issue tracker
(https://github.com/BurntSushi/ripgrep/issues/273#issuecomment-1100792783),
and we might get such a feature there natively someday.
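One way to sidestep the interleaving, sketched here purely as an illustration, is to have every parallel batch write to its own temporary file and concatenate the files once all workers have finished. The helper name parallel_search, the batch size of 100 and the 4 workers are arbitrary assumptions, and plain grep stands in for rg so the sketch runs without ripgrep installed.

```shell
# Hypothetical helper: parallel_search PATTERN
# Reads a NUL-separated file list on stdin, as produced by 'git ls-files -z'.
parallel_search() {
    pattern=$1
    outdir=$(mktemp -d) || return 1
    # Each batch writes to a private file, so concurrent workers never
    # share an output stream.
    xargs -0 -n 100 -P 4 sh -c '
        out=$(mktemp "$1/part.XXXXXX") || exit 1
        pat=$2
        shift 2
        grep -nH -e "$pat" -- "$@" >"$out" 2>/dev/null
        # A batch without matches is not an error for our purposes.
        exit 0
    ' sh "$outdir" "$pattern"
    # Emit the per-batch results one after another.
    cat "$outdir"/part.* 2>/dev/null
    rm -rf "$outdir"
}
```

It would be used as: git ls-files -z -c -o --exclude-standard | parallel_search symlinks. The price is that no results appear until every batch is done, which is why native support in the search program would still be preferable.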
YMMV, but on this machine this seems to demonstrate that 'git grep'
isn't always better. And its '--threads' argument doesn't seem to make
any difference.
Now, the default searcher (grep) is a little slower than ripgrep, but at
least we have a faster option present.
Now, when it comes to Emacs, we also lose a fair amount of time on
parsing the list of files internally (the output of 'git ls-files')
before sending it to 'xargs rg' or 'xargs grep'.
There are a few ways to deal with this. Maybe we'd have a generic
function which constructs the shell command for listing the files (which
we'd simply concatenate when constructing the shell command for search).
Or we'd have 'project-files' return some opaque value with a bunch of
accessors which would allow parsing the list of files lazily, and simply
reuse the output buffer as input without parsing it (this would save
~500ms in my measurements in this scenario). Or we'd cache the list of
files, and cut the whole 1s with that.
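Outside Emacs, the last two ideas can be sketched in shell: keep the NUL-separated output of 'git ls-files' in a file and feed it to the search program as-is, without ever parsing it. The helper name cached_search and the cache-file argument are hypothetical, grep stands in for whichever search program is configured, and nothing here invalidates the cache when files are added or removed, which a real implementation would have to handle.

```shell
# Hypothetical helper: cached_search CACHE_FILE PATTERN
# CACHE_FILE holds NUL-separated paths; it is filled on first use only.
cached_search() {
    cache=$1
    pattern=$2
    if [ ! -s "$cache" ]; then
        # Same file listing as in the benchmarks above.
        git ls-files -z -c -o --exclude-standard >"$cache" || return 1
    fi
    # The cached bytes go straight to xargs; no per-file parsing happens.
    xargs -0 grep -nH -e "$pattern" -- <"$cache"
}
```

A second call with the same cache file skips the listing step entirely, which is where the saving would come from.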
We've discussed some of this before (like the caching thing) but so far
it's up in the air.
But given the possibility of being able to choose a faster search
program, I'm not sure about making the search a project method (which
would lock such projects into one search implementation). I'd rather try
to work on other inefficiencies first.
Do try installing ripgrep, though. The search program is configured
through the xref-search-program defcustom.