monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] Cvssync status


From: Nathaniel Smith
Subject: Re: [Monotone-devel] Cvssync status
Date: Wed, 21 Sep 2005 03:59:00 -0700
User-agent: Mutt/1.5.9i

On Mon, Sep 19, 2005 at 01:35:07PM +0200, Christof Petig wrote:
> Nathaniel Smith schrieb:
> > On Sun, Sep 18, 2005 at 01:23:07AM +0200, Christof Petig wrote:
> > 
> >>So I would consider it ready for inclusion into mainline (to get more
> >>testing by other people)
> 
> It had been ready for review.
> 
> > Cool!  Looking over the code:
> > 
> > = General issues =
> > 
> > What are these files?:
> >  - WhyThisWay
> >  - CVS_prot
> >  - pipe_test_client.c
> >  - tests.cvs/*
> 
> These should remain in the side branch (deleted in mainline). I deleted
> WhyThisWay, pipe_test_client.c was a planned pipe test for Win32 and was
> not yet used :-(, CVS_prot still holds some notes needed for branch
> support implementation. And tests.cvs predates tests/t_*.at . I deleted
> all but CVS_prot and pipe_test_client.c .
> 
> > What does the cvs_admin command do?
> 
> ... looking into the code ...
> 
> monotone cvs_admin manifest <revision>
> 
> gives you each CVS revision and path name of a manifest. [I mostly used
> this for debugging, it's handy to have. But if you think it unnecessary
> I will drop it (or otherwise document it).]

Hmm, nothing wrong with having debugging commands, but should probably
give _some_ description somewhere, and put it in the "debug" section
of help output.  Maybe name it something that will better indicate to
average end users that they don't want it, like "cvssync_debug" or
something.

> > Can you give more details on how you reconstruct whole tree-states
> > from CVS's file-state information?  I ask in particular because I
> > notice code that looks like the old cvs importer, but the cvs importer
> > was recently rewritten because the tree state reconstruction stuff
> > turned out to be fundamentally broken.  (Graydon is the person to talk
> > to for details.)
> 
> I will document it, but I did only take cvs_import as a design pattern,
> my code is not identical.

Cool.

> > == Formatting ==
> > 
> > Please format code appropriatly (= GNU style).  Really.  I know it
> > seems trivial, and I hate to be annoying about it.  But it's
> > important, and it _is_ trivial, so I hope this isn't a big deal.
> > Code on mainline needs to be easily maintainable by anyone.  
> 
> I agree, but changing this by hand seems a dull task. Can you recommend
> a program to do this task? [I will see if astyle can convert it easily.]

I don't know anything other than astyle or indent.  Sorry.

> > = cvs_client.cc =
> > There are some commented out headers in cvs_client.cc that can
> > probably be deleted, as well.
> > 
> > Why do cvs_client::writestr() (and friends) use zlib directly, instead
> > of via Botan?
> 
> I need _stream_ compression and decompression (for -zN accessing cvs)
> and did not find that possible with botan. I'd like to get proven wrong
> here.

> > I am dubious about the workaround in cvs_client.cc:push_back2insert_cl
> > where you make a struct with members marked mutable to avoid some
> > warning from the compiler.  Are you sure you're not just silencing the
> > warning while preserving the brokenness the compiler was trying to
> > warn you against?  What exactly is going on here?
> 
> For the record: I was trying to avoid a temporary variable for the
> adaptor (which mandates specifying the exact type of the adaptor) by
> passing a temporary adaptor (which manipulates a non temporary
> container)). So I cheated by making a const object (which gives no
> warning on passing by reference to strtok) manipulate the mutable
> reference. ["Use a hack in _one_ place to make it more easy to use this
> adaptor"]
> 
> I decided to give up tricking here by adding another temporary in one place.

Okay.

It looks like this stringtok thing should just be designed to take an
output iterator, and then you could pass a std::inserter or a
std::back_inserter depending on whether it should write into a set or
a sequence...

> > 
> > +// the creator function (so you don't need to specify the type
> > ^^ this comment is missing a close paren
> > 
> > +// "  AA  " s=2 e=3
> > ^^ this comment could probably be more helpful to the naive reader
> > :-).
> 
> better this way?
> 
> // "  AA  " gives start=2 end=3, so substr(2,1) is correct

Thanks.

> > Why are some methods in cvs_client capitalized and some not?
> 
> Because I like creativity ;-) ?  No good reason I suspect.

Hmm.  Creativity is bad. ;-)

> > Your putenv/getenv stuff in timezone2time_t does not look portable.
> > It is simply not true that WIN32 has putenv while everywhere else has
> > setenv...
> 
> I agree. But making this function portable to platforms I don't know
> anything about was beyond my interest (to be honest). All I want to do
> here is convert a broken down timestamp in UTC (!!!) to a time_t.
> Perhaps boost::date can offer something here? (I did not find anything)
> 
> To be honest I was glad when this function worked on Win32 and Linux
> (but has unwanted side effects on Win32 [it does not restore the
> originial TZ]).

Ah, there's a much better trick -- take a time_t, run gmtime, run
mktime, compare the resulting time_t to the original time_t.  This
gives you the timezone offset; just apply this correction factor to
the result of running mktime on your original time.

Example code, with more detailed description:
  http://lists.debian.org/deity/2002/04/msg00082.html

> > In monname2month, would it have been so horrible to just have a 12-way
> > if (x == "Jan") ... else if (x == "Feb") ... etc.? :-)
> 
> No, but it's faster this way ;-)

Err... is this a bottleneck, then? :-)

> > Generally, it'd be nice if there were a few more comments in
> > cvs_client.cc explaining what's going on, for anyone trying to find
> > their way around that has spent less time staring at the CVS protocol
> > docs (such as they are).  Again a question of, mainline code is public
> > code that needs to be publically maintainable...
> 
> Agreed. But writing comments to an imaginative audience is no fun
> (especially if you have to redesign some times because of CVS server flaws).

Yah.  Hopefully you don't expect to have to redesign again now?

> > This is especially important for any needed quirks/workarounds that
> > were only discovered through testing against real servers; e.g.,
> > there's a comment:
> > +// cater for encountered bugs ...
> > +// actually this code should not be necessary with the current options
> > +// without -C some revisions interacted with each other badly
> > Reading this, I have no idea what it means, except I'm scared to touch
> > the next code, because I have no idea how to even tell whether my
> > changes broke anything.
> 
> The CVS server has some flaws! Each revision has it's own set. What
> started as a nice idea ended in a nightmare. Some commands work as
> advertized on 1.11, some on 1.12. I can not fix every running CVS server
> in the world so I had to get creative.

Yeah, I understand this part.  It's what would make me terrified to
touch the code, because all I know is that there are mysterious Weird
Bugs out there that are being worked around, but I have no idea what
they actually are, and thus can't tell what parts of the code are
workaround, which are themselves bugs, etc.

The only way that I know to deal with this sort of situation is to
build up communal knowledge -- every time someone finds a weirdness,
write it down, so the next guy will know too.  That way, while you
never really know whether your code is correct, you at least know that
each version is _more_ correct than anything that came before it,
because you can see that you still handle all the old cases, plus the
new one.

> An especially the fact that the CVS server was designed to handle
> exactly one command per connection (gasp!) is deadly to anyone trying to
> contact it efficiently.

Oh, ew.  How does one do anything efficient at all, then?

> > This file really needs a big comment at the top giving a rough
> > overview of how things work.  Similar to the ones in, e.g., paths.hh,
> > or netsync.cc, etc.
> > 
> > = cvs_repository.cc =
> > 
> > This file also needs at least one big huge comment explaining how it
> > all works.

> > +#ifdef WIN32
> > +#define sleep(x) _sleep(x)
> > +#endif
> > ^^ Ick.  If this is a problem, put a sleep() in platform.hh.  New code
> > should, basically, never ever say "#ifdef WIN32".
> 
> This damned Win32 platform compatibility has caused more than half of
> the work on this branch. I'm not sure I want to invest more to make a
> platform work that I don't need at all.

Well, that makes sense... though this particular fix is pretty
trivial.

In the broad scheme of things, I'm not quite sure what to say.  I
don't really want to drop more things on our few valiant win32
maintainers; they already do a heroic enough job keeping things
running at all... but perhaps someone else will find this
functionality intriguing enough to fix it up.

In any case, basically, it's fine if the cross-platform support is a
bit shaky for an experimental feature, but this will probably have to
be fixed up at some point before it becomes "non-experimental"...

> > +{ // I assume that at least TAB is uncommon in path names - even on Windows
> > +  std::string content=host+":"+root+"\t"+module+"\n";
> > ^^ not quite sure what this means, but it makes me nervous.  Can you
> > elaborate?
> 
> Yes, I used to separate root and module by a space. Space in pathnames
> is a common disease on windows :-(. Since ; : etc are not uncommon in
> strange path names either I decided that nobody in his sane mind would
> use tabs in CVS path names and used that as the new separator.

Oh, okay.  So, making sure I understand here, the idea is -- we have
these certs that describe how to match up a given monotone revision
with a CVS revision.  These certs are primarily for machine reading,
but all else being equal, it'd be nice if humans could read it too.

Therefore, \t is chosen as a delimiter.  If a CVS repo directory (or
module name or branch name, but I doubt(?) this is possible) ever
contains a tab, then we will dump some corrupted data that we
cannot read?  (What is 'host', by the way?  Could it ever contain a
colon, say if we were talking to the CVS repo via ssh?)

I'm a little leery of the guess-and-pray method of lossless string
quoting.  The safest character to use as a split delimiter is \0 --
though, of course, that breaks textual display.  Your other usual
options are length-prefixing (e.g., djb's "netstrings"), or some sort
of proper quoting -- maybe basic_io would be the best thing to use
here, it has proper quoting already.  It's also _much_ easier to
figure out from looking at -- I can't figure out what the full
cert_cvs() format is from looking at the code, never mind looking at
some arbitrary strings strung together by tabs...

But I guess you're worried about size?  If so, probably worth running
the numbers to see what the overhead would be...


Is there any checking in the code for the possibility of multiple
conflicting cvs certs?  Is there a test to make sure that this case is
properly detected and handled/reported?

> > + // because some manifests might have been absolute (not delta encoded)
> > +  // we possibly did not notice removes. check for them
> > ^^ what is this about?  (whether a manifest is delta-encoded or not
> > should be completely hidden anywhere except database.cc, should it?)
> 
> This talks about CVS manifests. And since CVS manifests are stored in
> certificates

Ah.  Missed that.  It's a bit confusing ATM...

> > Why are all of cvs_edge's fields marked 'mutable'?  There are hardly
> > _any_ good reasons to use 'mutable', certainly none that don't also
> > have explanatary comments...
> 
> Will have to take a look ...

Thanks.

> > = options.hh =
> > 
> > +#define OPT_SINCE 99 // use a custom number ...
> > ^^ huh?
> 
> I had to change this number too often on merges, so I decided to take
> one from an unused domain. Should get 37 once included. Or better can we
> already reserve a number for netsync in mainline ;-) ?

Ah, of course.  I'd rather not go editing mainline right now to
reserve a spot, it adds non-local state and easy enough to fix up
after cvssync is merged in...

> > = netxx_pipe.cc =
> > 
> > Would it make more sense to add these into netxx itself?  (E.g., make
> > Probe itself handle pipes?)  Normally modifying included libraries is
> > a bit distasteful, but netxx is orphaned upstream and we already have
> > some local enhancements included.
> 
> So pushing it into netxx seems logical. But PipeCompatibleProbe is
> designed to be minimally invasive, so inclusion might recommend a
> rewrite of both (which is beyond my current interest).

Nod.

> > We already require a unix shell environment on windows for building
> > and testing, so your test can assume the existence of "ls" there.
> > 
> > Perhaps it would be better to use 'cat' (with no args) as the program
> > spawned for the test, then you could test writing to stdin, reading
> > from stdout, probe blocking until a timeout when there is no output,
> > maybe others?
> 
> Perhaps the above mentioned pipe_test_client.c program is a good idea
> since it can test for binary transparency (i fear problems with line
> endings), input output dependance etc. I simply ran out of motivation
> for the Win32 platform.

I don't understand why you can't do just as much with 'cat' -- see if
it losslessly pases \n, \r\n, \r?

Such a test would be just as important on POSIX as Win32, really.

> > (And maybe also "echo", to test command line passing to the spawned
> > program?)
> > 
> > Your command-line quoting for win32 is wrong (as your comment notes).
> > win32/process.cc already has correct win32 quoting code, duplicating
> > this finicky task is quite silly, we should find a way not to do that.
> 
> I did not want to make netxx/* candidate code rely on win32/*. But this
> might be a better idea.

Yeah, I'm not quite sure what the best layout would be, but it can't
be that hard to find some sort of reasonable one...

> > = Overall =
> > 
> > This is a big chunk of code, something like 10% of the total size of
> > monotone, and it attempts a very complex and error-prone task, that if
> > it goes wrong can corrupt people's production CVS repos and confuse
> > users utterly.  Merging it to mainline means the development team as a
> > whole assuming a responsibility to support and maintain it, so this is
> > a big risk to take.
> 
> The worst problems I have encountered in daily work is that cvs_pull was
> unable to fetch a revision from the CVS server because of some protocol
> oddity. Theoretically monotone could go amok and commit unwanted (or
> broken) revisions (or insane changelogs) but it never touches the CVS
> files directly, so this should be reversible.

Yes.  But IMO we need to be _extremely_ cautious here.  There is no
sin worse than losing/corrupting data, and many users are not
prepared/capable of backing out unwanted changes to CVS, even if they
are reversible in principle.  (I know _I_ wouldn't feel very
comfortable doing that.)

Or if that's not convincing, consider that cvssync's selling point is
sneaking monotone into groups from the grassroots, and dumping a
bunch of bad data into the shared repo would give all the other
non-monotone-using developers a _great_ first impression of monotone's
reliability etc....

> >   3) understandable code -- this mostly means enough commentary and
> >      explanation in the code that we can figure out roughly what's
> >      going on.
> >   4) really good docs, clearly laying out the use cases, relation
> >      between cvs_import and cvssync, limitations of cvssync, big fat
> >      EXPERIMENTAL: MAY EAT HISTORY warnings all over the place until
> >      well battle-tested, etc.
> 
> I thought that the docs should be in a good shape for a starting point.
> This was the reason for my post (I should have cleaned up the code
> first, though)

Yes, definitely; and appreciated -- as I mentioned later, just think
could use a bit more expansion to make sure we answer all of people's
questions.

> >   5) a good set of test cases, covering (at least) all the major
> >      possibilities
> 
> This is nearly impossible since most problems arise against different
> (buggy) versions of CVS. And writing a CVS bug emulator is not appealing.

Erm, yeah, I can see that.  People will run the test suite with
different local versions of CVS installed, though, at least...

> > Not all of this needs to happen before it gets into mainline for
> > early-adopter testing.  (1) and (2) are necessities.  (4) doesn't have
> > to be perfect, but I think needs a bit more work (I know I _will_ have
> > to field support questions on this code, even if marked EXPERIMENTAL
> > and with you doing all actual hacking...).  (3) and (5) are not as
> > critical so long as the code is marked experimental, but do need to be
> > fixed sooner or later (preferably sooner, to make the code more
> > accessible to others wanting to improve it).  All code in mainline
> > needs to either fulfill (3) and (5) eventually or be removed, as a
> > rule of thumb, and until cvssync meets this I personally wouldn't feel
> > confident in committing to support it in the long term.

> > Overall, while I've been focusing a lot on negative things, I also
> > want to say thanks a lot for working on this!  It's clearly a huge
> > investment of effort, and your dedication is really impressive, and
> > appreciated.  Now let's see if we can figure out how to get it over
> > this hump...
> 
> Thank you for such a thorough review of my code (before I had reviewed
> it <blush>). I hope we can address the problems, most likely some of the
> code has to get restructured once git + other VCS synchronization go
> mainline (SVN seems most interesting next).

Yes, interesting point... maybe we should get the current stuff stable
before tackling that ;-).

> Can we please agree to tackle win32 after inclusion in mainline? I don't
> want to spend hours on that again.

Okay.  In that EXPERIMENTAL warning we need to add, need to note that
support may be variable on different platforms.

> I will contact Graydon after cleaning up and commenting the code. This
> way it is easier to talk about file-state -> tree-state conversion. We
> might also want to share some test code. Note that I currently ignore
> tags (I planned to separate CVS tags from monotone revisions) and side
> branches.

Okay.

Cheers,
-- Nathaniel

-- 
So let us espouse a less contested notion of truth and falsehood, even
if it is philosophically debatable (if we listen to philosophers, we
must debate everything, and there would be no end to the discussion).
  -- Serendipities, Umberto Eco




reply via email to

[Prev in Thread] Current Thread [Next in Thread]