Re: resolving ambiguity in action stamps

From: Eric S. Raymond
Subject: Re: resolving ambiguity in action stamps
Date: Sun, 14 Sep 2014 13:12:11 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

Stephen J. Turnbull <address@hidden>:
> Eric S. Raymond writes:
>  > It is still a technical fact that no git translation containing SHA1s
>  > can be built without passing through a VCS-independent representation
>  > of commit refs on the way.
> Fact!?  I would use the bzr revid, and insert the revid, SHA1 pair
> after I commit each new revision in git on Pass 1.  What am I missing?

For one thing, variant forms of commit reference.  Somewhere in there
we'll need the equivalent of a canonicalization pass for the references.

If you go the database-of-pairs route, what you're actually doing is
temporarily creating a VCS-independent reference ID that mimics a
bzr reference number.  A subtle point, I know - but in principle
there's no actual win in the database-of-pairs that you wouldn't also
get from unique inline reference cookies generated in an intermediate
form of the stream.

In practice, the way my toolkit works, I basically have to have
something like a revision-stamp inline in intermediate versions (that
is, the database-of-pairs approach is out) even if it's massaged into
a SHA1 in the final version.  This is because my tools are an ecology
of import-stream processors built on the assumption that the stream
captures all relevant metadata.
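To make the two-pass idea concrete, here is a rough sketch of the final
massage step: inline action stamps in intermediate stream versions get
rewritten to SHA1s once the git pass has produced them.  The stamp format
and function name are my illustration, not the actual toolkit code.

```python
import re

# Hypothetical action-stamp form: ISO 8601 UTC date + '!' + committer
# address.  The real cookie format in the toolkit may differ.
STAMP_RE = re.compile(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z![\w.@+-]+')

def resolve_stamps(comment, stamp_to_sha1):
    """Final-pass rewrite: replace inline action stamps with git SHA1s.

    Stamps with no known SHA1 are left in place, so intermediate
    stream files stay valid even before the git pass has run.
    """
    return STAMP_RE.sub(
        lambda m: stamp_to_sha1.get(m.group(0), m.group(0)),
        comment)
```

The point of leaving unknown stamps untouched is exactly the portability
argument above: the stream file remains a complete, VCS-independent record
at every stage.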

Your instinct may be to come back that this approach is too limiting,
but there are very good reasons for it (beginning with the cross-VCS
portability of the stream files) and 22KLOC of algorithmically dense
tool code built around those reasons.  If you want a high-quality
conversion in reasonable time rather than an open-ended R&D project,
your odds of doing better are effectively nil.

> Speaking of databases: since AFAIK you're basically done creating git
> blobs and trees (ie, except for new commits to the public repo), I
> assume you are using a pre-primed object db when you run your
> conversion?  If not, you should get a 20% speed up or so.  You might
> be able to get a lot more speed up if you could just work with bzr log
> and git filter-branch.  (That's a pretty crazy idea and quite possibly
> not at all worth the work even if possible.  But let me throw it out
> there....)

It's not crazy, but it is too much work.  I'd effectively have to throw
away the rest of my tools.

>  > > Actually, I disagree.  It would be a really good thing if they
>  > > are precise.  Do you really want to put anybody through the
>  > > trouble of translating randomized format cookies, which may point
>  > > to any of several commits, again?  Then revising their scripts
>  > > every time a new variant shows up?
>  > 
>  > It has yet to be demonstrated that this is a problem in a real use
>  > case.  And, actually, I already checked this; the Emacs history
>  > doesn't have any version-stamp collisions in actually referenced
>  > revisions.
> That's not what I'm talking about.  I'm talking about
> 2014/09/address@hidden vs. 2014-09-15/address@hidden vs.
> 9/15/2014!esr vs. ....  People *will* handwrite those references,
> precisely because they're more or less human-readable.

Engineering is tradeoffs.  Readability (which is a good thing)
comes at this price.
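Handwritten variants of the kind just listed can be normalized mechanically.
A minimal sketch, with the accepted formats assumed from the examples above
(the real canonicalization pass would also have to handle the
address part of a stamp):

```python
from datetime import datetime

# Assumed handwritten date forms, per the variants quoted above.
FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%m/%d/%Y")

def canonical_date(text):
    """Normalize a handwritten reference date to ISO 8601."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError("unrecognized date form: %r" % text)
```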

>  > > Existence proof comes before characterization, please.
> Ie, I suppose you don't get any collisions in referenced revisions.
> But we know that there could be.  Maybe "almost correct" is good
> enough for you, but I think Emacs deserves better from its VCS.  Worse
> is not better when best already exists.

Engineering is tradeoffs.  "Best" by what metric? Readability and 
portability are not trivial features.
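Whether collisions actually occur is cheap to check directly, which is how
the claim about the Emacs history above can be verified.  A sketch of such
a scan over (timestamp, committer) pairs; the git log format string is one
plausible way to feed it, not necessarily what my tools do:

```python
from collections import Counter

def stamp_collisions(commits):
    """Return action stamps claimed by more than one commit.

    commits: iterable of (timestamp, committer) pairs, e.g. parsed
    from `git log` output with a committer-date/email format string.
    """
    counts = Counter("%s!%s" % (ts, who) for ts, who in commits)
    return sorted(s for s, n in counts.items() if n > 1)
```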

One significant disadvantage of building in SHA1s that I haven't mentioned
yet is that they make references brittle.  Editing any commit's metadata
invalidates all hashes downstream of it.

Yes, this is a real problem which I have experienced before in big messy
conversions like this one!  So, we put up a shiny brand-new repo - and
a few days (or weeks, or months) later someone spots a conversion bug
that has to be fixed.

It might be easy for you to say "oh, we just regenerate all the commit
references, then".  Actually doing that is a nasty, picky job even
with best-in-class tools like mine, especially on a repo this size.
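The brittleness is inherent in how git names objects: an object's SHA1 is a
hash over its entire content, and a commit's content embeds its parents'
SHA1s, so one metadata edit cascades through every descendant.  The hashing
rule itself fits in a few lines (this follows git's documented object
format; the helper name is mine):

```python
import hashlib

def git_object_sha1(kind, body):
    """SHA1 of a git object: hash of 'type SP size NUL' + body,
    exactly as git computes object names."""
    header = b"%s %d\x00" % (kind.encode("ascii"), len(body))
    return hashlib.sha1(header + body).hexdigest()
```

Since a commit body contains its parent hashes, regenerating one early
commit forces new names for the whole downstream cone of history - and
stales every reference anyone has written against the old names.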

I'm not sure anyone on this list but me properly groks the complexity scale
of this conversion when they talk so casually about changing how
it's done.  To get some idea, fetch


and skim all 1018 lines of it - which doesn't count 2.5 Klines of program-
generated stuff included.

When I said this was the biggest, nastiest conversion I've ever done, I
wasn't kidding.  Nothing else has even come close.
                <a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
