Re: Revision control

bug-hurd

From:	Arne Babenhauserheide
Subject:	Re: Revision control
Date:	Sat, 28 Jun 2008 12:25:36 +0200
User-agent:	KMail/1.9.9
On Saturday, 28 June 2008 04:24:50, olafBuddenhagen@gmx.net wrote:
> Well, as you seem to have used Mercurial much more and longer, that
> isn't really surprising :-) If I learned Mercurial now, I'd most likely
> hold exactly the opposite opinion... That doesn't really tell much.

Please tell me how it works for you, once you have to use it. 

It would be interesting for me to see your experience from the Git side of 
DVCSs. 

(yes I know that sounded silly ;) )

> Honestly, how often did you actually need to GC Git repositories so
> far?... :-)

I haven't had to yet, but what I have heard from other people is about once a
month for a moderately active project, and some claim to do it once a day.

It would be interesting to have stats about garbage collecting the Linux
kernel.

> Actually, this is much easier with a sanitized history. It's always
> clear when and why a change was introduced -- 

It is only clear that it happened during "implementing feature x", but not 
that it happened during "hacking that damn network table - quick fix to get 
it working". 

> while otherwise, with 
> omission and later amends, mistakes and later reverts, prototypes and
> later cleanups, experiments and later switching approaches, it is much
> harder to make sense of it. It's just useless data getting in the way.

You just let your system search through a batch of changesets at once; then it
looks the same to the reviewers as modified history, but it offers more
information if you need it.

But while we're at it: I really like the approach of applying one clear summary
to a set of changes, and that wasn't easy in Mercurial until now.

So I took a dive into hacking Mercurial for the first time yesterday. 

It took me some time to get familiar with the code structure, but after some
dabbling and discussing on IRC to refine my concept, I came up with the group
extension:

-> http://selenic.com/pipermail/mercurial/2008-June/019884.html

It is still only a proof of concept, but it works for me and supplies the same
functionality (hiding away steps which aren't useful for most people) while
preserving history.

What it does:
- Lets me put changesets into a group.
- Hides all grouped changesets from the log and shows their groups instead.
- Lets me look only at groups.
- Lets me group groups, too.

I can now say 
hg commit -m "A" 
-> rev 1
hg commit -m "B"
-> rev 2
hg commit -m "C"
-> rev 3
hg commit -m "undo B"
-> rev 4

hg group -m "AC" 1 2 3 4


And instead of many small commit messages which can't convey a bigger picture, 
people can see all the changes I grouped with one clear description. 

In the long run the concept can allow groups to be overridden, removed,
modified and inspected (show all contained changesets).

You got me quite enthusiastic about the concept of a more readable history :)
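
To make the idea concrete, here is a minimal sketch of such a grouped log as a plain Python model. It is not the code of the extension and does not use the Mercurial API; the names Changeset, Group, revs and grouped_log, the log line format, and the ungrouped rev 5 ("D") are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Changeset:
    rev: int
    message: str


@dataclass
class Group:
    message: str
    members: list  # Changeset objects or nested Group objects


def revs(item):
    """Return every revision number contained in a changeset or group."""
    if isinstance(item, Changeset):
        return [item.rev]
    return [r for member in item.members for r in revs(member)]


def grouped_log(changesets, groups):
    """Return log lines (oldest first) with grouped changesets hidden
    behind their outermost group; ungrouped changesets are shown as-is."""
    # A group that is a member of another group is not shown on its own.
    nested = {id(m) for g in groups for m in g.members if isinstance(m, Group)}
    top_level = [g for g in groups if id(g) not in nested]

    owner = {}  # rev -> outermost group hiding that rev
    for g in top_level:
        for r in revs(g):
            owner[r] = g

    lines, shown = [], set()
    for cs in changesets:
        g = owner.get(cs.rev)
        if g is None:
            lines.append(f"{cs.rev}: {cs.message}")
        elif id(g) not in shown:
            shown.add(id(g))
            contained = revs(g)
            lines.append(f"group [{min(contained)}..{max(contained)}]: {g.message}")
    return lines


# Revs 1-4 follow the example above; rev 5 is a hypothetical later commit.
history = [
    Changeset(1, "A"),
    Changeset(2, "B"),
    Changeset(3, "C"),
    Changeset(4, "undo B"),
    Changeset(5, "D"),
]
ac = Group("AC", history[:4])
for line in grouped_log(history, [ac]):
    print(line)
```

The history itself is never rewritten in this model: revs(ac) still lists every contained changeset, which corresponds to the "inspect" operation, and a Group can itself be a member of another Group.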

And I was happy to see that writing Mercurial extensions (which can change 
every aspect of the user interaction) isn't that hard. 

More precisely: I could do it, so it can't be that hard ;-)

> I don't think that when Linus designed Git, he was thinking "it must be
> easy to change history". Rather, I suspect that the flexible design of
> Git just made changing history easy as a byproduct, along with many
> other things considered uncommon up till then. 

Which is something I am really glad about (but it could also be connected to Git
being decentralized, advanced _and_ free software, so people knew they could
dabble).

> Well, such a discussion will always tend to touch more general question.
> However, when I was pointing out that it doesn't matter for the Hurd
> repository, I was obviously not trying to make a statement about GC in
> general, but to come back to the actual topic at hand :-)

Then thanks for that. 

Words often miss the intentions and backgrounds, even though those are often the
most important part of a message :-)

I miss Telepathy for that :) 

> By requiring to learn more stuff from the beginning, Git on the other
> hand makes every user into an expert -- thus indirectly helps to use the
> tool really efficiently...

Or makes users frustrated or afraid to touch anything they don't know. 

This is what often happens when people work on a system where they don't 
understand how their commands affect the system. 

--- short philosophical insertion ---
The "system" here means "the changes", not "the storage". The changes are what
you get as the result. The storage is just the technical solution and only
matters where it affects (that is, limits or forces me to modify) my
interactions with the changes.

 o <- me
 |      <- my interactions
[-] <- changes
 |      <- interactions of changes and storage
[-] <- storage

The storage is only of interest where it changes my and others' interactions
with the changes. (Yes, I actually drew this on paper to visualize what
version tracking systems offer me, so I could clear up my thoughts :) )

Naturally, the storage necessarily affects them at every stage, but that should
be limited to the speed of actions (and even that as little as possible), and it
should not force me to take a certain action. But that's only my personal
philosophy. Since yours is different, I assume you reach a different
conclusion -> again different priorities, and again, I think that is a good
thing :)
We just have to be aware of our different priorities. That is one of the
results a discussion can always bring, and I think it's worth more than most
people think.
--- end philosophical insertion ---

Mercurial makes reading up on stuff very easy, so it isn't an uphill battle, 
but rather the joy of learning easily (mainly due to the very good hgbook). 
Instead of "bah, I don't know how to do this", it rather is "cool, I can do 
that, too!"

And that's where the benefits are, and they are pretty clear, I think. 

And you only need the basic usage in most cases, so you almost only need to
read up on things if you really try something new.

You can't know every nuance from the beginning without losing a vast amount of 
time. 

--- I don't know how much the following applies to Hurd contributors. It's one 
of my general reasons to prefer Mercurial ---

I learned about that on a much smaller scale from my wife, who works in a bank.
Some of her colleagues are simply computer DAUs (dumbest assumable user -
dümmster anzunehmender User), who only do exactly what they were told to do,
because they fear that they could break something if they do something wrong.

I know this sounds like an argument to "force every user to learn", but the
opposite is true: As soon as the basic usage is (or just feels) so
complicated that a user gives up on understanding it, trying to "force people
to become experts" turns them into DAUs instead, who don't feel comfortable in
their own environment.

And everyone is conservative or afraid to fiddle somewhere, and it can hit you
in the damnedest of areas when you just happen to have a bad day when you start
learning. Keeping the frustration potential as low as possible helps reduce the
chance of that, though.

> the other variant. Or perhaps it is considered less annoying as a
> default action. (Committing too much by accident if "-a" was default
> would be worse than the command simply failing if forgetting the "-a" as
> it is now. It's actually something that bothered me with CVS quite
> often.)

At least for me, having to add the -a wouldn't help me remember.

After having added it about 20 times, my hands would do it automatically
without me even noticing it.

And unlike with Mercurial, I'd have to listen to the Git people
saying "well, you added the -a switch, so it's your problem; we worked to
avoid it", but sadly not in a way which works with humans as I know them :)

> I am sure that with most *any* kind of workflow, "-a" is more often used
> than not. So it is *not* optimizing for a particular workflow. 

It is optimizing, just not done consciously. 
And optimizing for a workflow not used by most people. 

> Rather, 
> it's simply a manifestation of Git's interface being very direct;
> avoiding abstraction where it's not important to have it. It's a
> manifestation, in fact, of *not* optimizing for specific workflows.

I would argue against that :) 

With Mercurial I have good aliases to start with, and when I happen to work at
the computer of a colleague, I can use the same commands instantly, and the
same goes for him working on my computer.

What I think is: Where most workflows differ, it's useful to tell the
user to just create his own aliases.

But in most places the workflows will just be very similar or the same, or 
there will be a few competing workflows which a huge percentage of developers 
use. 

And consciously offering precreated aliases for efficiently working in these 
workflows leads to good usability and a tool you can learn very quickly. 

Git, on the other hand, also has these easy commands, but to me they don't look
consciously added; they look historically grown.

> > But checkout doesn't work the same in subversion, which is what I was
> > used to before switching to Mercurial.

> I really don't care about Subversion. I can see some little merit in
> trying to be similar to CVS, because that's the least common
> denominator, what has been used for ages, what almost everybody knows.
> Subversion on the other hand is nothing else but just the least useful
> one among the newer systems.

While this isn't really the point here, I want to object. 

Subversion might not be as advanced as Git or Mercurial, but it is the system
many large software projects switched to and were very happy with, and it
seems good enough that it is easy to switch onward from it.

I like Subversion, even though I definitely prefer Mercurial and Git (I
wouldn't use Subversion at home, but it seems a huge step ahead of CVS).

> > "svn checkout" gets a repository onto your disk (like "git clone"/"hg
> > clone") and "svn update" updates the data (like "git checkout"/"hg
> > update").
> >
> > But "svn update" and "cvs update" both update the working directory.
>
> And what about switching branches? In CVS, this is usually done with
> "cvs update" as well, although the action is actually more similar to an
> initial checkout...

hg update <branch>

the same as 

hg update <tag>

and 

hg update <revision number, hex or short>

They all update the working directory to some specific state (and in the case
of a branch, that state doesn't just differ linearly but has gone "sideways",
too).

> "git-checkout" requires getting used to, but it's a fact that it is more
> consistent and logical, and helps in the long run. Which IMHO is true
> for many things in the Git UI.

In that aspect it seems similar to "hg update", but "git checkout" has some 
nasty pitfalls. One of them already ate several hours of my time - I already 
talked about that one in here.

> > How is $ git checkout
> >
> > the same as $ git checkout .
> >
> > The former says what I changed, the latter gives me changes I pulled
> > beforehand.
>
> Actually, they do *almost* the same: Both check out files from the
> repository to the working copy, but abort and warn if there are local
> changes. The only difference is handling of files missing in the working
> copy: The first treats them as changes and fails, while the second just
> checks them out.

> It happens that in your specific situation, because of this slight
> difference, the one command failed with a warning, while the other did
> work. That doesn't make them fundamentally different actions.

What happened was: 
"git checkout" worked quite well. It didn't warn or anything. It just 
said "there are missing files, sucker." 

And it does the same for changed files, etc. 

"git checkout ." on the other hand gave me back the files. 

Technically these two might be quite similar, but from the user interaction 
standpoint, they are vastly different. 

> > If a tool doesn't feel familiar after 15 min (or rather after some
> > hours), then I will have to wrap myself around the tool, and it is
> > likely to be inefficient on the long run, even though I might not even
> > notice it anymore, because I got used to it.
>
> This is a baseless claim. 

It is based on my personal experience. 
And please mind the "likely" :) 

I don't claim that it has to be inefficient in the long run.

> "I don't know whether things will get better if they change; I only know
> that they must change if they are to get better." -- Georg
> Christoph Lichtenberg

That's what I added with the second part: If it proves to be bad in the long
term, it isn't good either. But a tool should manage both, else it's very
likely that it skews my perception.

And not everything different is better, though it's almost always useful to try
different things to find out where to change next.

And that quotation also says you should try Mercurial :)

> I doubt the others on this list are still reading this thread :-)

*gg* 

> > To finish it, I created a small side by side comparison of Git and
> > Mercurial which I hope I managed to keep neutral.
>
> [...]
>
> >     Hg      |       vs              |       Git
>
> [...]
>
> >     +       | documentation |
>
> I don't agree here. IMHO the Git documentation is perfectly good.

Then this is likely skewed by my own experience. Let's add it to 
the "different opinion" list :) 

> >             = Usability =
> >
> >     Hg      |       vs              |       Git

[...]

> >     +       | just works    |
>
> I don't fully agree on your conclusions, but more importantly, I don't
> agree at all to what constitutes "usability" in your list :-)

Could you create a different list, so we can merge (or get two lists for
different sets of priorities)?

> And what is "just works" supposed to mean, anyways?...

That means: it hasn't yet surprised me with unexpected behaviour, which is
something I have also heard from many other Mercurial users.

> The main committers over the past few years have been Thomas Schwinge
> and Samuel Thibault.
>
> Thomas set up the Git-based wiki, so I guess his preference is clear :-)
>
> I asked Samuel now, and he said he doesn't care, though on further
> questioning he admitted a preference for Mercurial.
>
> Neal Walfield has been committing a lot recently in the hurd-l4 module.
> As this is totally separate though, there is no reason why it must use
> the same VCS... (I have no idea about his preference.)
>
> We also have the GSoC students presently, which are divided as well.
>
> So this leaves us still in the same place I fear...

Damn! :-) 

But at least we are now better informed about why we are in this place ;-)

*gg*

Aside from being funny, this makes it easier to know when we can safely move
on.

I can't think of a useful finish right now, so I'll just leave it at this... 

Or, maybe this one fits: When free tools which programmers use get spread 
enough (and there isn't something in place to stop people from contributing), 
it will almost always come down to philosophy and basic concepts in the end, 
because everything else can be fixed by throwing person-hours at it :) 

Maybe next someone will invent incremental garbage collection for Git (can be
done at each pull), just like Mercurial gets simpler rebasing at the moment, 
and step by step the projects will push each other onward. I hope they will 
exist side by side for a long time, so they can surpass any proprietary VCS 
in every aspect. 

Having friendly competition can be a huge gain for a project (if it's
friendly), since ideas in each project also bear fruit in the other one. :)

Best wishes, 
Arne
-- 
Being unpolitical
means being political
without noticing it.
- Arne Babenhauserheide ( http://draketo.de )

-- Weblog: http://blog.draketo.de
-- Infinite Hands: http://infinite-hands.draketo.de - singing a part of the 
history of free software. 
-- Ein Würfel System: http://1w6.org - simply clean (role-playing) rules

-- My public key (PGP/GnuPG):
http://draketo.de/inhalt/ich/pubkey.txt