
Re: [Monotone-devel] Question on layering


From: Nathaniel Smith
Subject: Re: [Monotone-devel] Question on layering
Date: Wed, 21 Feb 2007 01:28:30 -0800
User-agent: Mutt/1.5.13 (2006-08-11)

On Wed, Feb 21, 2007 at 09:42:42AM +1100, William Uther wrote:
> It was to get around a firewall that I was using the ssh:// based
> syncing (until I found out it isn't recommended for general use).

Eh, it should work fine (modulo bugs! :-)), it's just not as scalable
as the "real" server mode.

> Here are some features I'd like to see.  I know I'm new here, so feel
> free to ignore these comments :), but then again maybe a new user
> perspective is useful.  I'll try to scratch some of these itches
> myself :).  Possible implementations are discussed at the end.
> 
> Features I'd like to see for setting up your own repos without lots  
> of admin help:

Re-ordering your email for convenience in replying :-):

>   i) anonymous pull over http(s).  By http(s), I'm referring to an  
> already established server.  i.e. I should be able to stick a  
> directory of "monotone files" somewhere already published, and people  
> can pull from it (efficiently).  I don't want to have to plug in to  
> apache, or start a monotone-specific server.  The "pull from http"  
> needs to be in the standard client.  The "export to directory as  
> repository" needn't be in the standard client.
>
>   i) "serve over http".  It seems like this is pretty close to  
> happening.  There is the n.v.m.dumb stuff.  This doesn't have a good  
> readme at the moment.  I've had a brief glance, but I haven't spent  
> the time to figure it out properly.  Not understanding it very well,  
> I think there is a problem in that 'pull' isn't integrated into the  
> normal client, and isn't in a form where it could easily be  
> integrated.  Good proof of concept though.

The main hurdle to pushing this into the mainline binary is that
all HTTP and SSL libraries suck :-(.  Curl seems the best, and last I
checked it couldn't even do pipelining.  (And don't get me started on
SSL libraries...)

Ironically, pushing is actually easier -- the sftp protocol is
well-enough designed that it's easy to just talk to an sftp server
directly.

The rest is just SMOP stuff.

>   ii) anonymous push over email attachments.  Something like "mtn  
> push-email repos > attachment".  Then you email the attachment.  The  
> receiver uses "mtn read < attachment".  The push-email command needs  
> to be in the standard client.  The mtn doing the read will probably  
> be the server from i (doesn't need to be the standard client).
>   ii) "mtn push-email" should figure out what to push just like a  
> normal push, but then write the packets to std-out rather than  
> netsync.  That doesn't seem too hard (he says waving hands and not  
> knowing the code at all :) ).

Yes, this would be great, and you're right, it should be perfectly
straightforward.  (The hardest bit would be that you have to break the
netsync protocol a bit, because ATM there's no way to _tell_ the
server "hey, I'm just going to pretend to push but not actually,
okay?".  But netsync is going to break a few times between now and 1.0
anyway, just to get it into a long-term maintainable state, so
*shrug*.)

>   iii) get ssh:// access working as a 1st class system.
>   iii) I think part of what is needed here is testing.  I'll try to  
> write some test cases.  I was thinking of implementing an "sh://"  
> protocol, which is just like "ssh://" but local.  That should be  
> possible to write tests for.

Do you mean like the file:// protocol that we already have? :-) (It
just spawns a child mtn process and talks to it over stdio.)
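
The "spawn a child and talk to it over stdio" pattern can be sketched
roughly as below.  This is only an illustration: the child here is a
stand-in Python echo loop, not a real mtn invocation, and the
line-based "ack" exchange is made up for the example (it is not the
netsync protocol).  The names CHILD_SOURCE and talk_to_child are
hypothetical.

```python
import subprocess
import sys

# Stand-in child process: echoes each input line back with an "ack " prefix.
# Assumption: a real transport would launch mtn here instead.
CHILD_SOURCE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write('ack ' + line)\n"
    "    sys.stdout.flush()\n"
)


def talk_to_child(messages):
    """Spawn a child and exchange one line per message over its stdio pipes."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    replies = []
    try:
        for message in messages:
            child.stdin.write(message + "\n")
            child.stdin.flush()  # make sure the child sees the line now
            replies.append(child.stdout.readline().rstrip("\n"))
    finally:
        child.stdin.close()  # EOF ends the child's read loop
        child.wait()
        child.stdout.close()
    return replies
```

Because the parent only ever sees two pipes, a test can swap in any
local program for the child, which is what makes this style of
transport convenient to exercise without a network.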

There are basically two ways that ssh:// is not first-class:
  1) it might have bugs.  If you can trigger them reproducibly, then
     awesome :-).  (Though like I think I mentioned, it would probably
     be easier to just rewrite things.)
  2) it is less scalable, because pushing requires a write lock on the
     database (while the netsync server is happy to simultaneously
     multiplex arbitrarily many readers and writers).  Removing this
     restriction would be awesome for all sorts of reasons, but I have
     _no_ clue how to do it.

> Features for interacting with current hosting services:
> 
>   iv) A format for storing a monotone repository in a subversion  
> server.  This could just be i) as a first pass, but in general I'd  
> like to be able to do a full sync with just the normal mtn client  
> (possibly only when configured to link against the svn client libs).   
> Bonus points if you can do it so that ordinary svn clients still work.
>   iv) A normal subversion server has a fixed directory layout with  
> "branches/", "tags/" and "trunk/".  If you link with the svn  
> libraries, then you could use them to access the server, add another  
> directory there, "mtn/".  That would hold a n.v.m.dumb tree.  The  
> tricky part is also looking for changes in trunk/ since the last  
> change to mtn/ and moving them across into the mtn/ repository.  It  
> wouldn't be too hard in a special program, like mtn_cvs, but you'd  
> really want it in the normal client so that changes would be synced  
> with every sync :).

I think you're mixing up two distinct things here -- mirroring between
a mtn history and a svn history, and using an svn server as a generic
file store for something like mtn-dumb.

Talking to a subversion server in interesting ways _probably_ requires
linking to the svn library stack.  This is very problematic; the svn
libraries have a really hairy dependency chain a mile deep.  Also,
their license is not GPL compatible.  (It is basically "BSD plus some
trivial obnoxious restrictions including a pseudo-advertising
clause".  It would be really nice if they changed this, and I think
they may even do copyright assignments to make that possible, but I'm
not holding my breath.)

So likely both parts of this have to be external.  This is probably
not _too_ hard to do, though.

>   v) mtn_cvs :)
>   v) well, that's already going really well. :)  Just needs to be  
> tested and merged.  I've already got some extra test cases here.   
> I'll post them in a bit.

Yep :-).

> Other things that would have put me off monotone if I wasn't a VCS  
> junkie:
> 
>   vi) You need partial pull.  This should be seamless.  By that I  
> mean that the 'partial' repository should have a pointer to another  
> 'parent' repos, and so on.  When one db doesn't have the data you  
> need, it should seamlessly fall back to the parent, and the parent's  
> parent, etc.  Think of the repository as a local cache, rather than a  
> full repository.  In fact, it would be nice if it were possible to  
> _optionally_ drop data from the local cache if there are newer  
> revisions and it hasn't been accessed recently.  It should also be  
> possible to say "always push data to my parent rather than me".   
> Finally, you might even want to specify different parents for  
> different branches.
>   vi) There is a partial-pull branch, but it looks like it has just  
> started.  It also seems from the wiki that the concept is more  
> "partial-pull once and lose history" rather than "use a local db as a  
> cache for a remote db", but I may have misunderstood.  I prefer the  
> "hierarchy of caches" approach.

Hrm, simple things first :-).  Getting partial pull to work at _all_
is highly non-trivial, and requires giving up some nice properties of
the existing design... (in particular, for the first time ever, we
will be receiving data we cannot exhaustively consistency-check.
Sadness.)

The design on the wiki is a local cache, except it's one that you have
to explicitly backfill if you want it.  (Many common operations will
happen to touch history farther back, which makes implicit backfilling
quite problematic -- every time the user hits 'log' or 'annotate' or
whatever, we'd have to go load the whole history over the network.)
So the idea is that commands stop when they hit your local history
horizon, but you can always push your horizon back farther if you want
to, at least as far back as the horizon on the server you are talking
to.
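
A toy model of that "stop at the horizon, backfill explicitly" idea,
assuming history is a plain dict from revision id to parent ids and
the local database is just a set of ids.  Both representations, and
the names log_until_horizon and backfill, are invented for this
sketch; this is not how the partial-pull branch stores anything.

```python
def log_until_horizon(parents, local, start):
    """Walk ancestors of `start`, stopping wherever local history ends.

    parents: dict mapping revision id -> list of parent ids (hypothetical)
    local:   set of revision ids present in the local database
    Returns (visited, horizon): locally visited revisions in walk order,
    and the set of missing revisions where the walk had to stop.
    """
    visited, horizon, seen = [], set(), set()
    stack = [start]
    while stack:
        rev = stack.pop()
        if rev in seen:
            continue
        seen.add(rev)
        if rev not in local:
            horizon.add(rev)  # would need a network fetch; stop here
            continue
        visited.append(rev)
        stack.extend(parents.get(rev, []))
    return visited, horizon


def backfill(local, horizon):
    """Explicitly push the horizon back one step (stands in for a fetch)."""
    return local | horizon
```

With a linear history a <- b <- c <- d and only c and d stored
locally, a walk from d visits d and c and reports b as the horizon;
after one explicit backfill the same walk reaches b and stops at a.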

A separate issue is the convenience of auto-syncing at various times
(e.g., pushing at commit time).  There was a little discussion of this
at the summit, but it was inconclusive; I still tend to think it'd be
a nice bit of UI, just need to iron out details.  (How to detect/react
gracefully when the network is down, etc.)

>   vii) convenience commands.  mtn clone == mtn pull --partial && mtn  
> co (and puts the partial repository in the _MTN directory of the  
> working copy).  mtn pcmp == mtn pull && mtn commit && mtn merge &&  
> mtn push.  mtn pu == mtn pull && mtn up
>   vii) All of these can be done pretty quickly (even as local scripts).

I am doubtful that we will ever make partial pull the default.  In
most cases you don't need it, and making it _not_ the default gives a
level of robustness that... umm... basically no other computer program
I have ever used had.  We _likes_ our reliability around here, we
does.

I'm not sure we can get an ideal UI for clone ATM -- the problem is
that somehow you need to specify both a branch (to check out) and a
branch pattern (for future pulls -- people really should be using
'foo.bar*' style patterns in the vast majority of cases), and that
makes everything ugly.  I _hope_ that policy branches will help with
this, we'll have to see, and they're a ways off in any case...
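
To make the branch-versus-pattern distinction concrete, here is a
small sketch that uses Python's fnmatch as a rough approximation of
glob-style branch patterns.  It is an assumption that this matches
monotone's own pattern syntax closely enough for illustration, and the
branch names and the branches_matching helper are made up.

```python
from fnmatch import fnmatchcase


def branches_matching(pattern, known_branches):
    """Return the branches a glob-style pattern would select, sorted."""
    return sorted(b for b in known_branches if fnmatchcase(b, pattern))


# Hypothetical branch names on some server.
KNOWN = ["foo.bar", "foo.bar.win32", "foo.baz"]
```

Here branches_matching("foo.bar*", KNOWN) selects both "foo.bar" and
"foo.bar.win32", so a later sub-branch is picked up by future pulls,
while a checkout still needs exactly one branch name such as
"foo.bar".  That is why a clone command ends up wanting both inputs.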

"pu" would be covered automatically if we started doing automatic
syncs at strategic times, this being a natural strategic time :-).
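
Until then, the "pu" and "pcmp" chains quoted above are easy to do as
a local script.  A minimal sketch, assuming mtn is on the PATH and
that the bare commands (no server, pattern, or commit message
arguments) are acceptable in your setup; CHAINS and run_chain are
names invented for this example, and the runner is injectable so the
logic can be tried without touching a real database.

```python
import subprocess

# Each chain mirrors the "a && b && c" sequences from the quoted proposal.
CHAINS = {
    "pu": [["mtn", "pull"], ["mtn", "update"]],
    "pcmp": [
        ["mtn", "pull"],
        ["mtn", "commit"],
        ["mtn", "merge"],
        ["mtn", "push"],
    ],
}


def run_chain(name, run=subprocess.call):
    """Run each command in order; stop at the first failure, like `&&`.

    `run` takes an argv list and returns an exit status.
    """
    for argv in CHAINS[name]:
        status = run(argv)
        if status != 0:
            return status  # later steps are skipped
    return 0
```

Calling run_chain("pu") would run the pull and then the update, and a
failed pull (say, because the network is down) leaves the workspace
untouched.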

> I was going to start with iii), v) and vii).

Cool :-).  Let us know if you have any more questions... we try to be
pretty friendly, and the mailing list and IRC are both good ways to
chat.

-- Nathaniel

-- 
"Of course, the entire effort is to put oneself
 Outside the ordinary range
 Of what are called statistics."
  -- Stephen Spender



