emacs-devel

TRAMP problem with large repositories


From: Philippe Vaucher
Subject: TRAMP problem with large repositories
Date: Wed, 11 Dec 2019 20:46:11 +0100

Hello,

Sorry if this is not the right place to post; feel free to redirect me as needed.

While helping someone with a projectile issue (https://github.com/bbatsov/projectile/issues/1480), I found that when `shell-command-to-string` runs `git ls-files -zco --exclude-standard` over TRAMP on a repository with 85K files, it takes forever to complete.

Here's a stacktrace:

https://user-images.githubusercontent.com/81829/70549675-72b07f00-1b29-11ea-90f6-91fe0c36b0f4.png

We see that `tramp-wait-for-output` calls `tramp-wait-for-regexp`, which calls `tramp-check-for-regexp`. Looking at the source:

(defun tramp-wait-for-output (proc &optional timeout)
  "Wait for output from remote command."
  (unless (buffer-live-p (process-buffer proc))
    (delete-process proc)
    (tramp-error proc 'file-error "Process `%s' not available, try again" proc))
  (with-current-buffer (process-buffer proc)
    (let* (;; Initially, `tramp-end-of-output' is "#$ ".  There might
	   ;; be leading escape sequences, which must be ignored.
	   ;; Busyboxes built with the EDITING_ASK_TERMINAL config
	   ;; option send also escape sequences, which must be
	   ;; ignored.
	   (regexp (format "[^#$\n]*%s\\(%s\\)?\r?$"
			   (regexp-quote tramp-end-of-output)
			   tramp-device-escape-sequence-regexp))
	   ;; Sometimes, the commands do not return a newline but a
	   ;; null byte before the shell prompt, for example "git
	   ;; ls-files -c -z ...".
	   (regexp1 (format "\\(^\\|\000\\)%s" regexp))
	   (found (tramp-wait-for-regexp proc timeout regexp1)))
      ... snip ...
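
To make the regexp construction concrete, here is a rough Python translation. This is not TRAMP code: the real values of `tramp-end-of-output` and `tramp-device-escape-sequence-regexp` are computed at runtime, so the ones below are simplified stand-ins.

```python
import re

# Hypothetical stand-ins; TRAMP computes the real values at runtime.
END_OF_OUTPUT = "#$ "                    # simplified tramp-end-of-output
ESCAPE_SEQ = r"\x1b\[[0-9;]*[A-Za-z]"    # simplified escape-sequence pattern

# Mirrors (format "[^#$\n]*%s\\(%s\\)?\r?$" ...): the prompt marker,
# optionally followed by a terminal escape sequence and a carriage return.
regexp = r"[^#$\n]*%s(?:%s)?\r?$" % (re.escape(END_OF_OUTPUT), ESCAPE_SEQ)

# Mirrors (format "\\(^\\|\000\\)%s" regexp): the prompt may follow a
# NUL byte instead of a newline, as with "git ls-files -z".
regexp1 = r"(?:^|\x00)%s" % regexp

buf = "file1\x00file2\x00" + END_OF_OUTPUT
print(bool(re.search(regexp1, buf, re.MULTILINE)))  # → True
```

So the prompt is recognized even when the last "line" of output ends in a NUL rather than a newline, which is exactly the `git ls-files -z` case from the comment.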

My understanding is that it loops, reading a bit of the command's output, then tries to match end-of-line (or '\0') followed by the prompt, and repeats until the process dies or a match is found. Because the command returns a huge amount of output (85K files), this read-match-repeat cycle takes all the CPU, compared to reading the whole output in one go and then checking the regexp once.
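
If that reading is right, the cost can be modeled outside Emacs. Here is a hedged Python sketch of the two strategies (the function names are mine, not TRAMP's); the point is only that re-scanning the whole accumulated buffer after every chunk does quadratic work in total:

```python
import re

PROMPT = re.escape("#$ ")  # simplified stand-in for the prompt regexp

def incremental_scan(chunks, pattern):
    """Re-check the entire accumulated buffer after each chunk,
    roughly what the accept-output/check-regexp loop does."""
    buf, scanned = "", 0
    for chunk in chunks:
        buf += chunk
        scanned += len(buf)          # each check walks the buffer again
        if re.search(pattern, buf):
            break
    return scanned

def one_shot_scan(chunks, pattern):
    """Read everything first, then check the regexp once."""
    buf = "".join(chunks)
    return len(buf)

# Simulated "git ls-files -z" output: many NUL-separated names, prompt last.
chunks = ["file%d\x00" % i for i in range(2000)] + ["#$ "]
print(incremental_scan(chunks, PROMPT) // one_shot_scan(chunks, PROMPT))
```

With 2000 chunks the incremental strategy scans roughly a thousand times more characters than the one-shot strategy, and the gap grows linearly with the number of chunks, which would match the behavior seen with 85K files.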

My questions are the following:
  1. Did I understand the problem correctly? Is this a known issue?
  2. Can something be done about it, or would it require too much refactoring / a faster implementation?

Kind regards,
Philippe
