gnu-arch-users
From: Tom Lord
Subject: Re: cvs2arch (was Re: [Gnu-arch-users] an hack.. one night long)
Date: Sat, 23 Aug 2003 10:13:05 -0700 (PDT)


    > From: wave++ <address@hidden>

    > I wrote:
    > > Another point is speed. While continuously committing, tla slows
    > > down considerably. I know you can archive-cache, but this may be
    > > a problem (even space-wise).

    > I modified cvs2arch a bit to produce some stats about the
    > conversion in progress.

    >   http://www.yuv.info/~wavexx/hacks/cvs2arch-perf.png

    > I managed to place all the useful information in the same graph.
    > We have:

    >   red: the time (in milliseconds) needed to update the source using cvs.
    >   green: the time needed by tla to commit the tree.
    >   blue: the size (in kilobytes) of the whole arch working tree (includes
    >         {arch} size).
    >   magenta: the size of the sources (same as blue, but without {arch}).

    > On the X axis we have the current patchset being worked on.

    > As we see, cvs times tend to decrease. This is probably thanks to
    > the reversed-patch format that RCS uses. tla instead tends to
    > take a time that's roughly linear in the patchset number (not
    > really affected by the sources: note the spike at patchset #50).

Your understanding of tla performance is imperfect -- commit
operations are not _normally_ dominated by any costs proportional to
the patch number.  An exception: if a _bug_ (in either cvs2arch or
tla) has caused your pristine trees to stop functioning properly,
that's a different story -- and that _may_ be what we're seeing here.

Here's my take on the graph; I see three plausible explanations as
starting points for investigation:

First: 

Looking only at the part of the graph to the right of patch 50, tla is
slowing down linearly at almost exactly the same rate as the
space-on-disk size of the tree is growing.   This strongly suggests
that tla time is dominated here by the time it takes to read two
copies of the tree off the disk.

At first I thought this explanation was implausible because although
the tree size is growing, in absolute terms, it's not large.  Odds are
your cache is much larger.  But in this case -- you're alternating tla
operations with cvs operations.  Those cvs operations are, each time,
reading much of the entire history of your tree from the RCS files.
So, there's plausibly plenty of competition there for cache space.

But if that's the case, note that the inode-signature optimization
will help a lot with that -- and there's even a first cut at that
optimization in the patch queue already.
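A sketch of the idea behind that optimization (my own illustration in
Python, not tla code; the names `inode_signature` and `changed_files`
are hypothetical): compare a cheap per-file stat signature first, and
only read the contents of files whose signature changed.

```python
import os

def inode_signature(path):
    # A cheap identity for a file: (device, inode, size, mtime).
    # If none of these changed, assume the contents didn't either.
    st = os.stat(path)
    return (st.st_dev, st.st_ino, st.st_size, st.st_mtime)

def changed_files(paths, cached_sigs):
    # Return only the files whose signature differs from the cache,
    # so a commit can diff just those instead of reading the whole
    # tree.  cached_sigs is updated in place.
    suspects = []
    for p in paths:
        sig = inode_signature(p)
        if cached_sigs.get(p) != sig:
            suspects.append(p)
            cached_sigs[p] = sig
    return suspects
```

With a warm signature cache, an unchanged tree costs one stat() per
file rather than two full content reads -- which is exactly the term
that dominates in the I/O-bound explanation above.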


Second:

Another problem with the I/O-bound explanation -- and the thing that
makes me wonder about a bug in cvs2arch or tla -- is indeed the spikes
in source and tree size around patch-50.  There is no similar spike in
tla speed, and tla appears to be linear from about patch-10 onward.
Now _maybe_ that's just an artifact of the I/O subsystem smoothing
things out a bit -- or maybe your commits aren't able to use an
up-to-date pristine tree for some reason (and that would be a bug,
either in cvs2arch or tla).

If the commit operation isn't able to use (and then update) a pristine
tree, then it must build one from scratch on every commit.  That will,
indeed, take time proportional to the number of preceding changes in
this case (since I presume you aren't archive-caching in there or
updating a revision library).  It should not, however, be happening.
If that's what's happening, then either cvs2arch is doing something in
a dumb way, or there's a tla bug.  It would be handy if you could look
into whether or not that's the case.

(I doubt that's the case here, though.  If it were, I'd expect the
arch graph to be much _steeper_.  I wouldn't rule it out, though --
some of the effects of this kind of bug might be hidden by caching
effects.)
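To make the rebuild-from-scratch worry concrete, here's a toy cost
model (units arbitrary, names mine -- nothing here is tla internals):
with a usable pristine tree a commit's cost stays flat; without one,
every commit also pays to replay all the earlier patches.

```python
def commit_cost(n, tree_read, patch_apply, pristine_ok=True):
    # Toy model: a commit always reads two copies of the tree
    # (working copy + pristine).  If the pristine tree can't be
    # reused, it must be rebuilt by replaying the n - 1 earlier
    # patches -- that's the term that grows with the patch number.
    base = 2 * tree_read
    if pristine_ok:
        return base
    return base + (n - 1) * patch_apply
```

In the broken case the curve climbs steeply with n, which is why I'd
expect the graph to look much worse than it does if this bug were the
whole story.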


Third: 

Perhaps you have plenty of cache and pristines are functioning
reasonably -- tla is being dominated not by I/O, but by the user-time
costs of comparing the files.    If that's the case, here too the
inode-signature foo will fix it.




    > arch @ patch 280 is roughly 8 times slower than arch @ patch 5 in this
    > case.

Which is a good reason to suspect that the correlation with tree size
is more interesting than the correlation with patch number.   It's not
275 times slower, right?

Look at the point at which the arch time crosses the "500" mark on the
Y axis.  The tree size is, what, about 900K at that point?  At the far
right of the graph, arch time is at "3250" and tree size is 4600K.

So, over a couple of hundred revisions, the tree size grew steadily by
x5.1 and arch speed slowed steadily by x6.5 -- which is pretty
consistent with the I/O bottleneck that the inode hack will fix, and
that alternating cvs with tla operations will highlight (as compared
to what you might expect during normal development use).
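Those back-of-envelope ratios are easy to check (the numbers below are
the approximate graph readings quoted above):

```python
# Approximate readings from the graph, as quoted in the text.
early_time, early_size = 500, 900     # ms, KB at the "500" crossing
late_time, late_size = 3250, 4600     # ms, KB at the far right

size_growth = late_size / early_size  # roughly 5.1x
slowdown = late_time / early_time     # 6.5x
```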

But, again, it would be worth double-checking that both your script
and tla are behaving correctly wrt. pristine trees.

-t




