[Monotone-devel] Re: results of mercurial user survey


From: Graydon Hoare
Subject: [Monotone-devel] Re: results of mercurial user survey
Date: Sat, 29 Apr 2006 10:47:51 -0700
User-agent: Thunderbird 1.5.0.2 (Windows/20060308)

Bruce Stephens wrote:

> I was just doing a quick estimate, and I think it's likely that the
> SHA1 and RSA cost for checking everything in the current venge.net
> repository is a minute or two rather than an hour or two.
>
> If monotone were to give up verification, then it would have to be
> because that would avoid some other aspects of work: reconstructing
> files, reversing deltas, or whatever.

Two points:

First, boring though it feels, please stick to using profiles; do not make up performance stories. The profiles sometimes mention SHA1, but they almost always also mention things which account for a lot more time than it does: missed inlining opportunities, combinatorial explosions, bad buffering, pessimistic cache behavior, etc. Please stick to what the profiles tell you.

Second, there is no specific part of monotone which you can point to and say "this is where we do verification"; the concept is spread all through the program's design. And it's really not so much that we "verify"; as Nathaniel pointed out, the things specifically marked as "sanity checking" or "verifying" code rarely dominate any profiles.

However, there's a kernel of truth in here: the fact is that we "do work" in between the network and the disk. What work?

  - Selecting the right information to send.
  - Transforming from the format we store in to the format we send.
  - Transforming back to the format to store in.
  - Integrating the received information into a uniform store.
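
To make those four steps concrete, here is a minimal sketch in Python. It is not monotone's netsync code. The names (select, to_wire, from_wire, integrate, put, sync), the zlib-compressed dict used as a "store", and the (id, raw bytes) wire form are all assumptions made up for illustration.

```python
import hashlib
import zlib

# Hypothetical storage format: SHA1 hex id -> zlib-compressed content.

def put(store, raw):
    """Add raw content to a store and return its id."""
    item_id = hashlib.sha1(raw).hexdigest()
    store[item_id] = zlib.compress(raw)
    return item_id

def select(store, wanted_ids):
    """Step 1: pick only the items the peer asked for and that we hold."""
    return [i for i in wanted_ids if i in store]

def to_wire(store, item_id):
    """Step 2: storage format -> transmission format (id, raw bytes)."""
    return item_id, zlib.decompress(store[item_id])

def from_wire(item_id, raw):
    """Step 3: transmission format -> storage format, checking the id on the way."""
    if hashlib.sha1(raw).hexdigest() != item_id:
        raise ValueError("content does not match its id")
    return zlib.compress(raw)

def integrate(store, item_id, packed):
    """Step 4: merge into the receiver's single uniform store."""
    store.setdefault(item_id, packed)

def sync(sender, receiver, wanted_ids):
    """Run all four steps for every requested item."""
    for item_id in select(sender, wanted_ids):
        wire_id, raw = to_wire(sender, item_id)
        integrate(receiver, wire_id, from_wire(wire_id, raw))
```

Every item passes through all four functions, so the cost is per item and sits between the network and the disk no matter how cheap any single step is.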

These design decisions are deeply embedded in the program. The storage format is intended not to leak out. I'm confident that we can make the existing structure a fair bit faster -- there is still a lot to tune -- but without extensive redesign there will be a limit to the speed, and it will be a lower limit than our competitors'. The reason is simple: our competitors decided to use the opposite design:

  - Their transmission format is identical to their storage format.
  - Their storage units are pre-separated into bundles representing the
    types of transmission you might like to make.

These decisions mean that their networking often reduces to something like sendfile(). The decisions also imply some negatives:

  - They are forced to separate branches into separate locations, and
    cannot easily do fine-grained access control or mix branches the
    way we can.
  - By rarely reconstructing the storage format, they are more
    likely to let global or structural inconsistencies sit without
    noticing them.
  - By coupling the storage and transmission formats, they make it
    harder to adjust one without adjusting the other. We have more
    flexibility there.
  - Since we're synthesizing the storage format on the fly anyways,
    we can do things like repacking and rearranging the delta graph
    as we write.
  - Their repositories typically contain lots of files, rather than
    our single sqlite file.

You might, by analogy, think of it as the difference between a CGI-driven website and one serving static content. Which is better? The CGI-driven site can do more stuff, and do more *detailed* stuff, because it has more logic in it. The static site can serve the fixed set of pages it has much faster. Can you make a slow CGI run faster? Often. But seldom as fast as a static site. The logic of sendfile() is hard to beat.
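
For contrast with the transform-and-integrate path, here is a minimal sketch of the "static" side, assuming a bundle that already sits on disk in its transmission format. The function names and the demo payload are hypothetical. Python's socket.sendfile() uses os.sendfile() where the platform offers it and falls back to ordinary send() otherwise.

```python
import os
import socket
import tempfile

def serve_stored_file(conn, path):
    """Send a stored file as-is: no parsing, no re-encoding."""
    with open(path, "rb") as f:
        return conn.sendfile(f)  # returns the number of bytes sent

def demo():
    """Push a small pre-built file across a local socket pair."""
    payload = b"pretend this is a pre-built bundle\n"
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    server, client = socket.socketpair()
    try:
        sent = serve_stored_file(server, tmp.name)
        server.close()  # signals end-of-stream to the reader
        received = b""
        while chunk := client.recv(4096):
            received += chunk
        return sent, received, payload
    finally:
        client.close()
        os.unlink(tmp.name)
```

The server side has no per-item logic at all, which is the point of the analogy: there is nothing left to tune.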

There is some work -- called "monotone dumb" -- to make monotone have an "externalization form" which can be retrieved at sendfile() speed. It will carry some of the same limitations as our competitors' designs, but maybe those limitations will prove acceptable. The difficulty lies in the fact that the monotone *client* will still need to integrate the externalized information into its database. None of the normal monotone commands know how to work with such externalized forms. They all expect there to be a database. So the client will remain a bottleneck in such a situation, though only "half a bottleneck" compared to today.
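
A rough sketch of that remaining client-side half, under loud assumptions: the length-prefixed record layout, the "items" table, and the function names below are invented for illustration and say nothing about what "monotone dumb" actually stores or how monotone's database schema looks.

```python
import hashlib
import sqlite3
import struct

def write_bundle(items):
    """Hypothetical externalized form: repeated (4-byte length, payload) records."""
    return b"".join(struct.pack(">I", len(p)) + p for p in items)

def read_bundle(blob):
    """Walk the records of a fetched bundle, yielding each payload."""
    pos = 0
    while pos < len(blob):
        (n,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        yield blob[pos:pos + n]
        pos += n

def integrate_bundle(db, blob):
    """Client-side work that stays: parse, hash, insert into the database."""
    db.execute("CREATE TABLE IF NOT EXISTS items (id TEXT PRIMARY KEY, data BLOB)")
    with db:  # one transaction for the whole bundle
        for payload in read_bundle(blob):
            item_id = hashlib.sha1(payload).hexdigest()
            db.execute("INSERT OR IGNORE INTO items VALUES (?, ?)",
                       (item_id, payload))
    return db.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

Fetching the bundle can be as cheap as a file copy, but integrate_bundle() still touches every record, which is why the client stays a bottleneck.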

-graydon




