
Re: [Monotone-devel] Question on layering


From: William Uther
Subject: Re: [Monotone-devel] Question on layering
Date: Thu, 22 Feb 2007 09:12:36 +1100

I'm also going to re-order things a little :)

On 21/02/2007, at 8:28 PM, Nathaniel Smith wrote:

On Wed, Feb 21, 2007 at 09:42:42AM +1100, William Uther wrote:
It was to get around a firewall that I was using the ssh:// based
syncing (until I found out it isn't recommended for general use).

Eh, it should work fine (modulo bugs! :-)), it's just not as scalable
as the "real" server mode.

  2) it is less scalable, because pushing requires a write lock on the
     database (while the netsync server is happy to simultaneously
     multiplex arbitrarily many readers and writers).  Removing this
     restriction would be awesome for all sorts of reasons, but I have
     _no_ clue how to do it.

What happens at the moment when you're running mtn serve on a db _and_ using that same db to check out, etc.? What happens if you mtn serve something twice? (I just started two servers on the same db and they both started - I didn't try to sync to them.) Is this a case of "don't do that"?
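
Concretely, what I tried was something like this (db name, ports, and branch pattern are all made up):

    mtn --db=test.db serve --bind=localhost:4691 'my.branch*' &
    mtn --db=test.db serve --bind=localhost:4692 'my.branch*' &
    # both servers start without complaint; I haven't tried syncing to them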

My next thought is to write something like sshns:// (ssh+netsync), which uses ssh to tunnel to the remote machine and then connects to a netsync server that is already running there. It would be really cool if the OpenSSH folks would merge these changes:

http://www.25thandclement.com/~william/projects/streamlocal.html

Then you could start a single netsync server on a unix domain socket, and forward to it. Let ssh/unix handle all the auth.
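
For example, something like this (ports and socket path are illustrative; the second form assumes the streamlocal patch is applied):

    # today: tunnel a TCP port to a netsync server already running remotely
    ssh -fN -L 4691:localhost:4691 user@remote
    mtn sync localhost 'my.branch*'

    # with the streamlocal patch: forward the local port to a unix domain
    # socket on the remote machine, so the server never listens on TCP
    ssh -fN -L 4691:/var/run/mtn-netsync.sock user@remote
    mtn sync localhost 'my.branch*'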

  i) anonymous pull over http(s).

I don't understand this area very well, but I think there is a problem in that 'pull' isn't integrated into the normal client.

The main hurdle to pushing this into the mainline binary is that
all HTTP and SSL libraries suck :-(.  Curl seems the best, and last I
checked it couldn't even do pipelining.  (And don't get me started on
SSL libraries...)

It might be worth just picking one and using it. The best is the enemy of the good?

  iii) get ssh:// access working as a 1st class system.

I think part of what is needed here is testing.  I'll try to
write some test cases.  I was thinking of implementing an "sh://"
protocol, which is just like "ssh://" but local.  That should be
possible to write tests for.

Do you mean like the file:// protocol that we already have? :-) (It
just spawns a child mtn process and talks to it over stdio.)

er, yeah.  Like that :).
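
(For concreteness, that looks something like this -- the paths and pattern are made up:)

    # sync against another local database; mtn spawns a child mtn process
    # serving that db and speaks netsync to it over stdin/stdout
    mtn --db=mine.db sync file:///home/me/other.db 'my.branch*'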

There are basically two ways that ssh:// is not first-class:
  1) it might have bugs.  If you can trigger them reproducibly, then
     awesome :-).  (Though like I think I mentioned, it would probably
     be easier to just rewrite things.)

At the moment netsync is working reliably, so I'm assuming the problem is with the piping to a child process on Windows.

Features for interacting with current hosting services:

  iv) A format for storing a monotone repository in a subversion
server.

I think you're mixing up two distinct things here -- mirroring between
a mtn history and a svn history, and using an svn server as a generic
file store for something like mtn-dumb.

Yes. Deliberately. Imagine you want to use SourceForge as a host for a monotone repository. You're going to want to have both. Although that does suggest a solution:

Use the hypothetical mtn_svn to sync to a local monotone db on SourceForge, and then use a hook there to push back into another branch of the same repository via the n.v.m.dumb tools.

Then you can use a normal svn client on the svn branch, and a normal mtn client on the mtn branch (to check out, at least).
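
To make that concrete, here's a very rough sketch of the hook -- everything in it is hypothetical (mtn_svn doesn't exist yet, and "dumb-push" is just a stand-in for whatever interface the n.v.m.dumb tools end up with):

    #!/bin/sh
    # hypothetical hook on the hosting box, run after mtn_svn has synced
    # incoming changes into the project's monotone database
    DB=/srv/project/monotone.db
    # push the db back into a separate area of the same svn repository
    # in dumb (plain-file) format
    dumb-push --db=$DB svn://localhost/svnroot/project/mtn-store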

So likely both parts of this have to be external.  This is probably
not _too_ hard to do, though.

Not highest on my list of priorities though. It's just a "that would be nice". It would help migrate open source projects to mtn - you could use standard hosting services.

  vi) You need partial pull as a transparent cache.

Hrm, simple things first :-).  Getting partial pull to work at _all_
is highly non-trivial, and requires giving up some nice properties of
the existing design... (in particular, for the first time ever, we
will be receiving data we cannot exhaustively consistency-check.
Sadness.)

The design on the wiki is a local cache, except it's one that you have
to explicitly backfill if you want it.  (Many common operations will
happen to touch history farther back, which makes implicit backfilling
quite problematic -- every time the user hit 'log' or 'annotate' or
whatever, we'd have to go load the whole history over the network.)
So the idea is that commands stop when they hit your local history
horizon, but you can always push your horizon back farther if you want
to, at least as far back as the horizon on the server you are talking
to.

I guess I was assuming that writing it as a cache might be easier than the horizon stuff. It doesn't change the logic, it just changes where you get your data from: if it isn't local, grab it over the network. I would expect log and annotate to hit the network and be ugly; that is normal for a centralised VCS (which is what you're emulating here). As long as they grab the data in order and output as they go, the user can stop the process once they've gone back as far as they need, without having grabbed extra data. In that case it would be just like any centralised system - you have to have a net connection for log.

But I certainly bow to greater wisdom here. :)

  vii) convenience commands.  mtn clone == mtn pull --partial && mtn
co (and puts the partial repository in the _MTN directory of the
working copy).  mtn pcmp == mtn pull && mtn commit && mtn merge &&
mtn push.  mtn pu == mtn pull && mtn up

All of these can be done pretty quickly (even as local scripts).
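
For instance, 'mtn pu' as a local script is nearly a one-liner (this sketch assumes you've already pulled once, so 'mtn pull' can reuse the saved default server and pattern):

    #!/bin/sh
    # mtn-pu: pull from the default server, then update the workspace;
    # stop immediately if the pull fails
    set -e
    mtn pull
    mtn up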

I am doubtful that we will ever make partial pull the default.  In
most cases you don't need it, and making it _not_ the default gives a
level of robustness that... umm... basically no other computer program
I have ever used had.  We _likes_ our reliability around here, we
does.

Hrm. I like your reliability too. That's why I'm here rather than annoying the DARCS people :).

However, an mtn working copy is 17M, and my mtn.db (with n.v.m.* in it) is 112M. You don't want to be hitting new users with 112M when they could be downloading 17M. Most clone commands are going to come from first-time users who want to try the bleeding-edge source, not from developers who want the whole history.

How about making --partial the default for pulls and clones, and full transfer the default for syncs? That captures both the "I'm just testing the bleeding edge" and the "I'm actually a developer" use cases. You clone the first time and get a partial db. The first time you sync, you get the rest.

I'm not sure we can get an ideal UI for clone ATM -- the problem is
that somehow you need to specify both a branch (to check out) and a
branch pattern (for future pulls -- people really should be using
'foo.bar*' style patterns in the vast majority of cases), and that
makes everything ugly.  I _hope_ that policy branches will help with
this, we'll have to see, and they're a ways off in any case...

I don't grok 'policy branches' yet. I was assuming you'd just do a pull on the one branch. If you wanted, you could add a * to the end to get the pattern. :)

I was going to start with iii), v) and vii).

Cool :-).  Let us know if you have any more questions... we try to be
pretty friendly, and the mailing list and IRC are both good ways to
chat.

You've been very friendly so far.  Is good :).

Another suggestion: In the log, when a revision is a merge, if either of the parent revisions is from a different branch, then tell me which branch it is from.

Finally, is there a good place to list these suggestions? I looked on the wiki for a page with "suggest" in the title, but couldn't find one.

Cheers,

Will       :-}





