emacs-devel

Re: A project-files implementation for Git projects


From: Dmitry Gutov
Subject: Re: A project-files implementation for Git projects
Date: Thu, 3 Oct 2019 16:19:04 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0

On 03.10.2019 11:33, Tassilo Horn wrote:

>> +(cl-defmethod project-files ((project (head vc)) &optional dirs)
>> +  (cl-mapcan
>> +   (lambda (dir)
>> +     (let (backend)
>> +       (if (and (file-equal-p dir (cdr project))
>> +                (setq backend (vc-responsible-backend dir))
>> +                nil
>                   ^^^

> So this disables the VC operation.  I've removed it, and the speed
> improvement is good here.  This is my test case (the Emacs repository):

Yes, sorry. Used this for comparative testing and forgot to take it out.

The Emacs repository is the one I've mostly tested on as well.

> --8<---------------cut here---------------start------------->8---
> (require 'cl-lib)   ; for cl-set-difference
> (require 'project)  ; for project-current and project--files-in-directory
> (let* ((dir "~/Repos/el/emacs")
>        (p (project-current nil dir))
>        f1 f2)
>   ;; Time 10 runs of each listing method.
>   (let ((t1 (benchmark-run 10
>               (setq f1 (project-files p))))
>         (t2 (benchmark-run 10
>               (setq f2 (project--files-in-directory
>                         dir (project--dir-ignores p dir))))))
>     (message "Files: %d (VC) vs. %d (find)" (length f1) (length f2))
>     ;; benchmark-run returns (ELAPSED GC-COUNT GC-ELAPSED).
>     (message "VC) Elapsed time: %fs (%fs in %d GCs)"
>              (car t1) (nth 2 t1) (nth 1 t1))
>     (message "Find) Elapsed time: %fs (%fs in %d GCs)"
>              (car t2) (nth 2 t2) (nth 1 t2)))
>   ;; Report the files that only one of the two methods returned.
>   (let ((d1 (cl-set-difference f1 f2 :test #'string=))
>         (d2 (cl-set-difference f2 f1 :test #'string=)))
>     (message "Files found by VC but not by find:")
>     (dolist (f d1)
>       (message "  %s" f))
>     (message "Files found by find but not by VC:")
>     (dolist (f d2)
>       (message "  %s" f))))
> --8<---------------cut here---------------end--------------->8---

> Here is the output:
>
> --8<---------------cut here---------------start------------->8---
> VC) Elapsed time: 1.379560s (0.308720s in 6 GCs)
> Find) Elapsed time: 4.397054s (0.200695s in 4 GCs)
> Files found by VC but not by find:
>    /home/horn/Repos/el/emacs/doc/lispintro/cons-1.pdf
>    /home/horn/Repos/el/emacs/doc/lispintro/cons-2.pdf
>    /home/horn/Repos/el/emacs/doc/lispintro/cons-2a.pdf
>    /home/horn/Repos/el/emacs/doc/lispintro/cons-3.pdf
>    /home/horn/Repos/el/emacs/doc/lispintro/cons-4.pdf
>    /home/horn/Repos/el/emacs/doc/lispintro/cons-5.pdf
>    /home/horn/Repos/el/emacs/doc/lispintro/drawers.pdf
>    /home/horn/Repos/el/emacs/doc/lispintro/lambda-1.pdf
>    /home/horn/Repos/el/emacs/doc/lispintro/lambda-2.pdf
>    /home/horn/Repos/el/emacs/doc/lispintro/lambda-3.pdf
>    /home/horn/Repos/el/emacs/etc/refcards/Makefile
>    /home/horn/Repos/el/emacs/etc/refcards/gnus-logo.pdf
>    /home/horn/Repos/el/emacs/lib/_Noreturn.h
>    /home/horn/Repos/el/emacs/lib/stdalign.in.h
>    /home/horn/Repos/el/emacs/lib/stddef.in.h
>    /home/horn/Repos/el/emacs/lib/stdint.in.h
>    /home/horn/Repos/el/emacs/lib/stdio-impl.h
>    /home/horn/Repos/el/emacs/lib/stdio.in.h
>    /home/horn/Repos/el/emacs/lib/stdlib.in.h
>    /home/horn/Repos/el/emacs/m4/__inline.m4
>    /home/horn/Repos/el/emacs/test/data/xdg/mimeinfo.cache
>    /home/horn/Repos/el/emacs/test/lisp/progmodes/flymake-resources/Makefile
>    /home/horn/Repos/el/emacs/test/manual/etags/Makefile
>    /home/horn/Repos/el/emacs/test/manual/etags/make-src/Makefile
>    /home/horn/Repos/el/emacs/test/manual/indent/Makefile

The difference is that the 'find'-based method does not support whitelist entries ('!' lines in .gitignore) yet.

When it does, that might make its performance slightly worse, but probably not in the gtk or gnulib repos, whose ignore lists are short.
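
For illustration, a minimal shell sketch of what ignoring whitelist entries does. It assumes a scratch repository whose .gitignore contains '*.pdf' followed by '!keep.pdf', and a 'find' command that simply prunes the positive pattern, which is a simplification of whatever a find-based lister actually generates. The names list_with_git, list_with_find and make_demo_repo are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: compare Git's file list with a naive 'find' that
# prunes an ignored pattern but knows nothing about '!' (whitelist) lines.

# Tracked files plus untracked, non-ignored files, relative to DIR ($1).
list_with_git() {
  (cd "$1" && git ls-files -c -o --exclude-standard) | LC_ALL=C sort
}

# Regular files under DIR ($1), skipping .git and anything whose name matches
# the glob PATTERN ($2).  Re-inclusion ('!') rules are never consulted.
list_with_find() {
  (cd "$1" && find . -path ./.git -prune -o -name "$2" -prune -o -type f -print) |
    sed 's|^\./||' | LC_ALL=C sort
}

# Scratch repository: *.pdf is ignored, but keep.pdf is whitelisted again.
make_demo_repo() {
  repo=$(mktemp -d)
  git -C "$repo" init -q >/dev/null
  printf '*.pdf\n!keep.pdf\n' >"$repo/.gitignore"
  touch "$repo/a.txt" "$repo/keep.pdf" "$repo/other.pdf"
  echo "$repo"
}
```

On the repository from make_demo_repo, list_with_git prints .gitignore, a.txt and keep.pdf, because Git honors the '!' line, while list_with_find with '*.pdf' prints only .gitignore and a.txt.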

> Files found by find but not by VC:
>    /home/horn/Repos/el/emacs/aclocal.m4
>    /home/horn/Repos/el/emacs/config.status
>    /home/horn/Repos/el/emacs/configure
>    /home/horn/Repos/el/emacs/info/dir
> --8<---------------cut here---------------end--------------->8---

> Then I did it on a clean checkout of the gtk repository and got this
> result:
>
> --8<---------------cut here---------------start------------->8---
> Files: 4774 (VC) vs. 4774 (find)
> VC) Elapsed time: 1.721054s (0.461112s in 9 GCs)
> Find) Elapsed time: 0.634624s (0.152549s in 3 GCs)
> Files found by VC but not by find:
> Files found by find but not by VC:
> nil
> --8<---------------cut here---------------end--------------->8---
>
> So here, Git has been much slower than find!

Interesting! I haven't seen that result before, but it sounds plausible. In my experience, it's the ignore rules that make 'find' slower: each one is passed to it as another exclusion to test, while Git optimizes that matching somehow. So on projects with few ignore rules, 'find' could well be faster.

I've also tried the gtk repo, and I see the same performance ratio, although here 'git ls-files' runs faster in gtk than in the Emacs repo (and 'find' is twice as fast still).
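
A rough sketch of why the number of ignore rules matters to 'find', assuming the rules end up as '-name' tests combined with '-prune' (a simplification; the real find-based lister builds its own arguments). Each rule is one more test that 'find' may evaluate for every entry it visits. The function name find_excluding is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under DIR ($1), pruning every
# entry whose name matches one of the remaining arguments (shell globs).
# Each pattern adds one more '-name' test that find may evaluate for every
# file and directory it visits, so the work grows with the number of rules.
find_excluding() {
  dir=$1
  shift
  if [ "$#" -eq 0 ]; then
    find "$dir" -type f -print
    return
  fi
  # Append "-name P1 -o -name P2 ... -o -name PN" after the original
  # patterns, then shift the originals away.  The for loop's word list is
  # expanded once, before the loop starts, so 'set --' inside it is safe.
  n=$#
  first=yes
  for pat in "$@"; do
    if [ "$first" = yes ]; then
      set -- "$@" -name "$pat"
      first=no
    else
      set -- "$@" -o -name "$pat"
    fi
  done
  shift "$n"
  find "$dir" \( "$@" \) -prune -o -type f -print
}
```

For example, find_excluding . .git '*.o' '*.elc' lists the files under the current directory while skipping .git, object files and compiled Lisp files.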

> And again with gnulib:
>
> --8<---------------cut here---------------start------------->8---
> Files: 9936 (VC) vs. 9936 (find)
> VC) Elapsed time: 3.444869s (0.902124s in 16 GCs)
> Find) Elapsed time: 1.380269s (0.285082s in 5 GCs)
> Files found by VC but not by find:
> Files found by find but not by VC:
> --8<---------------cut here---------------end--------------->8---
>
> Again Git was slower.  What my gtk and gnulib repositories have in
> common is that they are clean, i.e., no build artifacts which would be
> matched by the exclude args passed to find...

gtk has only one .gitignore entry, and gnulib has 8, all fairly simple.

So, what should we do here? Maybe:

1. Implement whitelist rules support for 'find'.

2. Add a defcustom project-vc-list-files-method? It could have a value 'auto' that checks the backend and the Git version (and maybe whether 'find' is available). Other possible values would be 'find' and 'vc'.
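
The decision an 'auto' value would make can be sketched outside Emacs as a shell function. This is only a sketch under assumptions: pick_list_method, version_ge and MIN_GIT_VERSION are hypothetical names, "inside a Git work tree" stands in for "the backend is Git", the 2.0.0 threshold is a placeholder (no required Git version has been named here), and it assumes 'vc' is preferred when Git qualifies, falling back to 'find' when that is installed.

```shell
#!/bin/sh
# Hypothetical sketch of an 'auto' choice between the 'vc' and 'find'
# listing methods.  MIN_GIT_VERSION is a placeholder threshold.
MIN_GIT_VERSION=${MIN_GIT_VERSION:-2.0.0}

# Succeed if dotted version $1 >= $2.  Non-numeric components
# (e.g. "windows" in "2.42.0.windows.1") compare as 0.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 0
  }'
}

# Print the method for directory $1: 'vc' if it is inside a Git work tree
# and Git is new enough, else 'find' if find is available.
pick_list_method() {
  if git -C "$1" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    have=$(git --version | awk '{ print $3 }')
    if version_ge "$have" "$MIN_GIT_VERSION"; then
      echo vc
      return 0
    fi
  fi
  if command -v find >/dev/null 2>&1; then
    echo find
    return 0
  fi
  return 1  # neither method looks usable
}
```

Raising MIN_GIT_VERSION above the installed Git makes the same repository fall back to 'find', which is the kind of switch an 'auto' value would perform.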

If you have time, could you compare the performance of 'find' and 'git ls-files' on the command line? When simply redirecting the output to a file, I'm seeing a different result:

$ bash -c "time git ls-files >test"

real    0m0,011s
user    0m0,005s
sys     0m0,006s

$ bash -c "time find . >test2"

real    0m0,026s
user    0m0,008s
sys     0m0,018s

That could indicate some inefficiency in processing the output in Emacs.
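
One thing worth keeping in mind with the two commands above: 'find .' also descends into .git and prints directories as well as files, so it does more work and produces more lines than 'git ls-files'. A somewhat fairer comparison, sketched below under the assumption that only regular files outside .git matter, prunes .git and prints only regular files. count_both and time_both are hypothetical helper names, and time_both relies on bash's 'time' keyword.

```shell
#!/bin/bash
# Hypothetical helpers for a like-for-like command-line comparison in DIR ($1).
# 'find' prunes .git and prints only regular files; it still applies no
# ignore rules, so untracked or ignored files make its count larger.

# Print the number of entries each command lists.
count_both() {
  git_count=$(cd "$1" && git ls-files | wc -l)
  find_count=$(cd "$1" && find . -path ./.git -prune -o -type f -print | wc -l)
  echo "git: $((git_count)) find: $((find_count))"
}

# Time each listing on its own, discarding the output (bash 'time' keyword).
time_both() {
  (cd "$1" && time git ls-files >/dev/null)
  (cd "$1" && time find . -path ./.git -prune -o -type f -print >/dev/null)
}
```

Running both in the Emacs checkout would help tell whether the gap seen from within Emacs comes from the commands themselves or from processing their output.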


