Re: [Monotone-devel] Proposal for human readable revision IDs


From: Thomas Haas
Subject: Re: [Monotone-devel] Proposal for human readable revision IDs
Date: Tue, 06 Sep 2005 15:57:36 +0200
User-agent: Mozilla Thunderbird 1.0 (Windows/20041206)

Nathaniel

Thanks for your feedback and questions.

First, I like the idea of using hashes for identifying things like
files, manifests, and revisions. I am also fully aware of all intrinsic
and desired properties of hashes, especially the fact that two related
things, e.g. two consecutive revisions, produce completely unrelated
identifiers.

And this last property (related things resulting in unrelated
identifiers) makes identifiers difficult for me to use. It is very hard
for me to browse monotone log and understand the ancestry relationships
between revisions. I always need to copy & paste revision identifiers
elsewhere and compare them.

Also, I found it very hard to drill down to a single file using revision
identifiers, manifest identifiers, and finally file identifiers. The
issue is not the concept, but the fact that my brain is very bad at
comparing, remembering, and recognizing 40-character hex strings. Tags
help to some extent, but are limited to revisions and cannot be assigned
to files or manifests.

I understand that ambiguous identifiers can cause problems, such as the
attack you describe, but -- as you note as well -- the problem already
exists in part because users shorten identifiers when communicating.
Additionally, Alice would have the same problem if she used tags and
some careless user did not verify the source of the tag.

So, what would I like?
I would like to be able to easily grasp, navigate, and manipulate my
repository. I believe a (graphical) front end with easy navigation
would already solve half of the problem. I do not mind the command line
tool, but find the current implementation unpleasant to use: the
culprits are the chatty "monotone log" output and the huge identifiers,
which I have a hard time processing, that is, recognizing, matching, and
comparing.

Note: All identifiers bother me, not only revisions.

Proposal:
--  I believe I would be perfectly happy if all identifiers were
aliased with an incremental number. The first file would be "1", the
first manifest would be "1", and the first revision would be "1" as
well. A prefix or postfix F for file, M for manifest, and R for revision
would make monotone output more readable and lead to F:1 for the first
file, M:1 for the first manifest, and R:1 for the first revision. (Note:
so far I do not care about the concrete syntax; F:1, 1F, and F-1 are all
more or less fine with me.)

--  The alias is stored in the database but not shared with others. (I
can think of ways to sync aliases semi-safely, but I am happy with
purely local aliases -- for the time being.)

--  An option (with a configurable default) shall define how monotone
prints identifiers: alias, hash, or alias and hash.


--  All monotone input shall accept either the alias or the hash. If
deemed necessary, an option defines whether aliases, hashes, or both
are accepted as input.
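
To make the proposal concrete, here is a minimal Python sketch of such a
local alias table. It is not monotone code: the class AliasTable, its
method names, and the F:1 / M:1 / R:1 syntax are assumptions made up for
illustration, and the table lives in memory where a real implementation
would store it in the database.

```python
import re

# Hypothetical one-letter prefixes for the three kinds of identifier.
KINDS = {"F": "file", "M": "manifest", "R": "revision"}
HASH_RE = re.compile(r"[0-9a-f]{40}")
ALIAS_RE = re.compile(r"([FMR]):([1-9][0-9]*)")


class AliasTable:
    """Local, per-database mapping between aliases like 'R:1' and hashes."""

    def __init__(self):
        # kind -> full hashes in the order they were first seen
        self._by_kind = {kind: [] for kind in KINDS}
        # (kind, full hash) -> alias string
        self._alias_of = {}

    def register(self, kind, full_hash):
        """Return the alias of full_hash, assigning the next number if new."""
        if kind not in KINDS:
            raise ValueError(f"unknown kind: {kind!r}")
        if not HASH_RE.fullmatch(full_hash):
            raise ValueError(f"not a 40-character hex hash: {full_hash!r}")
        key = (kind, full_hash)
        if key not in self._alias_of:
            self._by_kind[kind].append(full_hash)
            self._alias_of[key] = f"{kind}:{len(self._by_kind[kind])}"
        return self._alias_of[key]

    def resolve(self, text):
        """Accept either a full hash or an alias; return the full hash."""
        if HASH_RE.fullmatch(text):
            return text
        match = ALIAS_RE.fullmatch(text)
        if match is None:
            raise ValueError(f"neither a hash nor an alias: {text!r}")
        kind, number = match.group(1), int(match.group(2))
        hashes = self._by_kind[kind]
        if number > len(hashes):
            raise KeyError(f"no such alias in this database: {text}")
        return hashes[number - 1]

    def show(self, kind, full_hash, style="alias"):
        """Format an identifier as 'alias', 'hash', or 'both'."""
        alias = self.register(kind, full_hash)
        if style == "alias":
            return alias
        if style == "hash":
            return full_hash
        if style == "both":
            return f"{alias} ({full_hash})"
        raise ValueError(f"unknown output style: {style!r}")
```

Because numbers are handed out in the order a database first sees each
hash, R:2 in my database need not be R:2 in yours, which is why the
aliases stay local and the full hash is always accepted as input.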


Regards
- tom


Nathaniel Smith wrote:
> On Tue, Sep 06, 2005 at 10:17:32AM +0200, Thomas Haas wrote:
> 
>>Although I have been using monotone for a while, I cannot get used to
>>reading, comparing, and certainly not typing monotone's revision
>>identifiers. 40 bytes of hex is too complicated for me. Also the rather
>>long revision, manifest and file identifiers use up a lot of precious
>>real estate in various outputs, such as ls, cat, or log.
> 
> 
> Thanks, we really appreciate getting feedback on things like this.
> 
> 
>>RFC 1751 (http://www.faqs.org/rfcs/rfc1751.html) describes a method for
>>representing 128 bit keys as a list of English words. While easier to
>>read, RFC 1751 does not solve the problem of real estate.
> 
> 
> It's also rather rude to use such a language-specific list.  See also:
>   http://lists.gnu.org/archive/html/monotone-devel/2004-12/txtN5iN4PaFhU.txt
> 
> 
>>Using the alphabet (A-Z, a-z) and the ten digits would reduce the real
>>estate from 40 to 27 characters.
>>
>>Additionally, or alternatively, a different, persistent representation
>>of the various identifiers could ease the use of monotone. E.g.
>>revisions could be counted as in Subversion, or monotone could limit
>>itself to displaying identifiers as short as possible while still
>>unique (the result could be fed into monotone complete to get the full
>>identifier).
> 
> 
> We've previously resisted using truncated identifiers in the UI,
> because of a possible attack:
>   1) Alice, a well-known and trusted developer, says "Hey, everyone,
>      rev abcdef fixes your problems, try it"
>   2) But Alice forgets to sync abcdef to the server!
>   3) Mallory notices this, and inserts a security hole, and, since
>      abcdef is only a few bits of hash, he spends a few minutes of
>      computer time fiddling whitespace in his security hole
>      back-and-forth so that its truncated hash is also abcdef.
>   4) Mallory sneaks this version onto the server somehow (perhaps he
>      legitimately has push access, but because of his name no-one
>      trusts his certs without review).  He doesn't sign it.
>   5) Bob sees Alice's message, does a pull, checks out abcdef, and
>      happily builds and runs it and gives it to all his friends,
>      because Alice said it was awesome, and he trusts Alice.  Because
>      Alice forgot to actually push her version, he doesn't even get
>      any warning about using an ambiguous prefix.
> 
> On the other hand -- in practice, people often write truncated
> revision ids anyway, so it's arguable that this problem basically
> exists now.
> 
> Truncated hashes also raise some mildly annoying issues -- what do you
> do if some joker _does_ generate a conflict, how do you present that?
> (just one hash in a table mysteriously being longer than the rest, I
> guess?)  Does checking for ambiguity every time you write out a hash
> unduly impact performance?  And so on...
> 
> Truncated hashes do have the strong advantage over other methods that they
> avoid introducing a new namespace, so that everyone has to constantly
> keep track of which namespace to use in which circumstance.
> 
> 
>>Personally, I would prefer some short, readable identifier valid only
>>for my database.
> 
> 
> What does "short, readable" mean?  Is 1932 more readable than
> 7349c052?  Or are you thinking of CVS/BK-style 23.2.17.8 type things?
> 
> How would you expect the UI to work?  In particular, how would you
> expect to match up local and global identifiers when necessary?
> Should output always include both?
> 
> Is it possible you could give more details on exactly why you want
> these things?  This is a very serious question; the more we know about
> how people are thinking about and planning to use features, the better
> we can do at designing them... is it just that you find 1932 easier to
> type than 73c9be, for instance?
> 
> -- Nathaniel
> 
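
For reference, two of the ideas in the quoted exchange can be checked
with a short Python sketch: the "40 to 27 characters" figure for a
62-symbol alphabet, and the "as short as possible, while still unique"
display form. The function names and the minimum prefix length of 4 are
assumptions for illustration; nothing here is taken from monotone.

```python
def encoded_length(bits, alphabet_size):
    """Smallest number of symbols that can represent any `bits`-bit value."""
    length = 0
    # Integer arithmetic avoids floating-point rounding at exact boundaries.
    while alphabet_size ** length < 2 ** bits:
        length += 1
    return length


def shortest_unique_prefixes(hashes, minimum=4):
    """Map each hash to its shortest prefix that no other known hash shares.

    Quadratic in the number of hashes; fine for a sketch, not for a
    large database.
    """
    result = {}
    for h in hashes:
        n = minimum
        while n < len(h) and any(
            other != h and other.startswith(h[:n]) for other in hashes
        ):
            n += 1
        result[h] = h[:n]
    return result


# A 160-bit hash needs 40 hex digits, or 27 symbols from A-Z, a-z, 0-9.
hex_length = encoded_length(160, 16)
base62_length = encoded_length(160, 62)

# Shortened (8-character) example values keep the demonstration readable.
prefixes = shortest_unique_prefixes(["abcdef12", "abcd9900", "12345678"])
```

Note that a prefix is only unique with respect to the hashes the local
database already knows, which is exactly the gap the Alice/Mallory
scenario above relies on.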





