guix-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Outreachy] 'guix git log' slowness?


From: zimoun
Subject: Re: [Outreachy] 'guix git log' slowness?
Date: Fri, 29 Jan 2021 01:04:12 +0100

Hi Magali,

(It is a slightly edited version I sent you.  Since the aim of Outreachy
is also to interact with Community, let enjoy the French proverb: «more
crazy people we are, more fun we have». :-))

On Thu, 28 Jan 2021 at 00:53, Magali <magalilemes00@gmail.com> wrote:

> Another thing is that the command is a bit slower than 'git log' itself.
> Thoughts on how that could be improved?

The command is “slow”.  A first quick analysis about the meaning of “slow”.

Basically, I have run twice:

--8<---------------cut here---------------start------------->8---
for n in 60000 10000 5000 1000 500 100 50 10 5 1;
do
   time ./pre-inst-env guix git log --oneline \
       | head -n $n > /dev/null ;
done
--8<---------------cut here---------------end--------------->8---

to have kind of warm cache.  And again for the equivalent Git command.
Then, bit of Emacs edit processing to transform the output in the buffer
of ’M-x shell’ to something in the Python file:

tguix = np.array([2.871, …
tgit  = np.array([0.013, …

(Well, to be correct, it is not twice but a couple of times to have an
average.)

Let normalize by removing the additive constant and run a classic
linear regression:

  t ~ B n ^ a => log(t) ~ a log(n) + b

where ’a’ and ’b’ have to be estimated.

Well, I am surprised by ’a ~ 0.5’ which means that “guix git log” runs
with a sublinear time complexity.  However, “git log” is linear.  Maybe
I am doing something wrong.

Initially I thought the tree was badly walked but a quick:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix git log --channel-cache-path
guix
  
/home/simon/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
$ git -C 
/home/simon/.cache/guix/checkouts/pjmkglp4t7znuugeurpurzikxq3tnlaywmisyr27shj7apsnalwq
 \
      log --oneline | wc -l
72791
$ ./pre-inst-env guix git log --oneline | wc -l
72791
--8<---------------cut here---------------end--------------->8---

shows it is correct.  Hum?!  Well, I suspect noise on the data and the
normalization is bad here.  Running the experiment in a batch of 10
times then averaging them should give an analysis more meaningful.  Hey,
that’s a quick one. :-)

The conclusion here, it scales well enough… for now.

Therefore, the real the question is about the «additive constant».  On
my machine, it is ~2.9s and this is where there is room of improvement,
I guess.

I exported ’get-commits’ to have it in the REPL and I did:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix repl
GNU Guile 3.0.5
Copyright (C) 1995-2021 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guix-user)> ,use(guix scripts git log)
scheme@(guix-user)> (define (compute) (begin (get-commits) 'ok))
scheme@(guix-user)> ,time (compute)
$1 = ok
;; 2.533936s real time, 3.099156s run time.  0.901027s spent in GC.
scheme@(guix-user)>
--8<---------------cut here---------------end--------------->8---

And I let you run “,profile (compute)”.  Well, this function should be
optimized.  IMHO.

Initially, I thought about “stream” but I do no think it is the issue
here.  Well, I think that the ’repo’ is open at each commit when
folding.  Instead, it should be open before, keep alive and close at the
end of the ’fold’.  Somehow.  WDYT?

I will give a closer look to ’commit-closure’ because I am not convinced
it is useful here, neither.


Bonus, attached the Python script and the plot. :-)


Cheers,
simon

# coding: utf-8


#
import numpy as np
import matplotlib.pyplot as plt

n = np.array([1., 5., 10., 50., 100., 500., 1000., 5000., 10000., 60000.])
tguix = np.array([2.871, 3.006, 2.987, 2.892, 2.991, 3.038, 3.088, 3.315, 
3.565, 5.634])
tgit  = np.array([0.013, 0.006, 0.013, 0.016, 0.014, 0.029, 0.047, 0.12, 0.233, 
1.283])

normalize = lambda x: x - x[0]
logify    = lambda x: np.log10(x[3:])

x = logify(n)
y = logify(normalize(tguix))
z = logify(normalize(tgit))

X = np.concatenate((x.reshape((len(x), 1)), np.ones((len(x), 1))), axis=1)

guix, res, rank, s = np.linalg.lstsq(X, y,  rcond=None)
git,  res, rank, s = np.linalg.lstsq(X, z, rcond=None)

predict = lambda sol: sol[0] * x + sol[1]

plt.plot(x, y, 'bo', label='guix')
plt.plot(x, z, 'ro', label='git')
plt.plot(x, predict(guix), 'bx-')
plt.plot(x, predict(git),  'rx-')
plt.legend(loc='lower right')
plt.xlabel("log(N) where N in `| head -n N`")
plt.ylabel("log(real time)")
plt.show()

Attachment: perf.png
Description: scale.png


reply via email to

[Prev in Thread] Current Thread [Next in Thread]