


From: Oren Ben-Kiki
Subject: Re: [Monotone-devel] Re: user-friendly hash formats, redux
Date: Sat, 4 Dec 2004 13:42:47 +0200
User-agent: KMail/1.7.1

On Saturday 04 December 2004 09:40, Nathaniel Smith wrote:
> Bibblebabble gives 8 bits/5-letter "word".  Not quite as high
> density, but pretty good.  And with tab-completion, the problem is
> more recognition than recollection, and recognition for this sort of
> id coding should be especially good.

$0.02 worth: you can nearly double the density by using phonetically 
distinct syllables, { B D F H J K L M R S T V Y } x { a e i o u }, 
giving things like "YeRuDa". That is 13 * 5 = 65 syllables, or about 6 
bits per two-letter syllable: 3 bits per letter, as dense as octal 
coding. Plus, if you read it over the phone, there is a very good 
chance the other person will not mix it up.
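A minimal sketch of the syllable encoding, assuming ids are hex strings: every 6 bits of the id picks one syllable. Dropping "Yu" to get from 65 down to 64 syllables is an arbitrary choice made here, as are the function and variable names. The id is padded with zero bits at the end, so the first syllable always encodes the first 6 bits of the id, the second the next 6, and so on.

```python
CONSONANTS = "BDFHJKLMRSTVY"  # 13 consonants
VOWELS = "aeiou"              # 5 vowels

# 13 * 5 = 65 syllables; keep 64 so each one carries exactly 6 bits.
# Dropping the last one ("Yu") is an arbitrary choice for this sketch.
SYLLABLES = [c + v for c in CONSONANTS for v in VOWELS][:64]


def hex_to_syllables(hex_id: str) -> str:
    """Encode a hex id as syllables, 6 bits per syllable, most significant first."""
    value = int(hex_id, 16)
    nbits = len(hex_id) * 4
    count = -(-nbits // 6)        # ceil(nbits / 6) syllables
    value <<= count * 6 - nbits   # pad with zero bits at the end
    return "".join(
        SYLLABLES[(value >> (6 * i)) & 0x3F] for i in reversed(range(count))
    )


print(hex_to_syllables("fff"))  # YoYo
```

A full 40-digit SHA-1 id comes out as 27 syllables; in practice you would only show the first few.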

> > The reason this keeps coming up is that the hex codes are probably
> > the most immediately intimidating thing about Monotone.
> ...
> there is a reasonable argument to be made that pronounceable
> identifiers are significantly better wrt human factors.

As a newcomer, I must say the hexadecimal ids do look pretty scary.

> A few weeks 
> ago I was talking to Derek on IRC, and we were talking about what had
> happened in part of the revision graph, and I was getting completely
> lost trying to keep track of 3-4 different identities at once;

This is an example of needing to "say it over the phone", as it were. 
Making ids "easy to remember" is a different requirement, which is best 
solved by using words (in your favorite native language).

Since only 64 of those syllables are used (6 bits each), you can easily 
set up a language-specific mapping from each syllable to a word in your 
favorite language. Say, in English, 'YeRuDa' would be 'Yes Root Dad', 
while in Hebrew it might be 'Yesh Ruach Dag' (Exists Wind Fish). This 
makes ids easy to remember and is not English-centric. You could even 
read your language's words aloud to someone who speaks a different 
language, and he would still get the syllables right. More to the 
point, you could read him the words for those syllables in _his_ 
language, increasing the chances he gets it right.

> I haven't been making any argument for Bibblebabble, though, because
> I don't know how strong these human factor effects are.  "When the
> going gets tough, the tough get empirical" -- my plan was to put
> together a little app to test this, that people could run on
> themselves and send me the results and I could make pretty graphs and
> figure out whether I should make an argument or not.

What would such a program's results tell you about how people actually 
use ids: how often they type one, say one aloud, or have to memorize 
one? Perhaps you could add a logging ability to the normal monotone 
commands, tracking how often people enter ids and at what length. You'd 
have to discount scripts somehow, and it still wouldn't tell you about 
saying ids out loud.

> > I would also suggest inserting a period after the word where, at
> > the moment, the revision is uniquely identified in the current
> > state of the repository.  I expect it would usually appear after
> > the second word even in big repositories.
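Marking where a revision becomes unique does not strictly need a trie: if the ids are kept in a sorted list, the id sharing the longest prefix with a given one is always one of its immediate neighbours. The sketch below assumes plain hex strings and uses Python's bisect module; the helper names are hypothetical, and the same neighbour idea should carry over to syllable or word sequences.

```python
from bisect import bisect_left


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def unique_prefix_len(target: str, sorted_ids: list[str]) -> int:
    """Length of the shortest prefix of target that no other id starts with."""
    i = bisect_left(sorted_ids, target)
    longest = 0
    if i > 0:  # predecessor in sorted order
        longest = common_prefix_len(target, sorted_ids[i - 1])
    # successor: skip target itself if it is already in the list
    j = i + 1 if i < len(sorted_ids) and sorted_ids[i] == target else i
    if j < len(sorted_ids):
        longest = max(longest, common_prefix_len(target, sorted_ids[j]))
    return min(longest + 1, len(target))


def mark_unique(target: str, sorted_ids: list[str]) -> str:
    """Insert a period right after the shortest unique prefix."""
    k = unique_prefix_len(target, sorted_ids)
    return target[:k] + "." + target[k:]


ids = sorted(["a3f901", "a3f9c2", "b71e00", "0c55aa"])
for rev in ids:
    print(mark_unique(rev, ids))
```

With these four made-up ids it prints 0.c55aa, a3f90.1, a3f9c.2 and b.71e00. Each lookup is a binary search plus two prefix comparisons.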

Here's where some theory helps. Since ids are "random", the answer is a 
formula that depends only on the number of ids. Off the top of my head, 
on average you should need 2n bits if you have 2^n ids. (...a quick hop 
to Wikipedia at http://en.wikipedia.org/wiki/Birthday_paradox, and some 
playing with formulas...). Hmmm, it seems you might get away with a bit 
or two less.
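A quick way to sanity-check the "about 2n bits" estimate is to simulate it. This is an illustration of my own, not anything in monotone: it draws 2^n random 160-bit values (standing in for SHA-1 ids) and finds the shortest bit prefix at which all of them are distinct. Once the ids are sorted, the longest prefix shared by any two of them occurs between neighbours, so only adjacent pairs need checking. The function names, seed and trial count are arbitrary.

```python
import random


def lcp_bits(a: int, b: int, width: int) -> int:
    """Number of leading bits shared by two width-bit integers."""
    # The highest differing bit is at position (a ^ b).bit_length() - 1,
    # so every bit above it is shared.
    return width - (a ^ b).bit_length()


def bits_to_make_unique(ids: list[int], width: int = 160) -> int:
    """Shortest prefix length, in bits, at which all ids are distinct."""
    s = sorted(ids)
    # In sorted order the longest shared prefix is always between neighbours.
    longest = max(lcp_bits(a, b, width) for a, b in zip(s, s[1:]))
    return longest + 1


def median_bits(n_ids: int, trials: int = 51, width: int = 160, seed: int = 0) -> int:
    """Median of bits_to_make_unique over several random id sets."""
    rng = random.Random(seed)
    results = sorted(
        bits_to_make_unique([rng.getrandbits(width) for _ in range(n_ids)], width)
        for _ in range(trials)
    )
    return results[len(results) // 2]


for exp in (10, 14):
    print(f"2^{exp} ids: median {median_bits(2 ** exp)} bits (2n = {2 * exp})")
```

The printed medians depend on the random draws, so compare them against 2n rather than expecting an exact match.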

So, suppose you have around 1K ids in your database (2^10), which is 
reasonable for a small project. You need around 18-20 bits for a unique 
id, which translates to 3-4 syllables (6-8 characters) or two to three 
Bibblebabble words (10-15 characters).

Suppose you have 256K ids (2^18). That's huge; you probably migrated 
from sqlite to something else by then :-) At any rate, you need 34-36 
bits for a unique id, which translates to 6 syllables (12 characters) 
or 4-5 Bibblebabble words (20 - 25 characters).

Six syllables, in two groups of three, is something a human can handle. 
Not only that, translating each to a word gives you a short "story", 
which is an excellent memorization technique.

In contrast, dealing with four or five 5-letter noise words seems way 
too much. Bibblebabble just doesn't scale well.

> Huh, that's a neat idea -- nonintrusive and simple.  Should be
> relatively easy to make efficient, too, if we keep around a trie of
> existing revision ids.

Sounds neat. As a first approximation you could take the number of bits 
needed to count the ids (log2 of the id count), double it, and divide by 
6 to get the number of syllables (rounding up). Maybe add a syllable for 
extra safety. This would give you a "pretty safe" prefix without the 
need for any additional data structures.
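The rule of thumb above, written out as a small function. This is only a sketch: `prefix_units` is a made-up name, 6 bits per syllable comes from the 64-syllable scheme, and 8 bits per word is the Bibblebabble figure quoted earlier.

```python
def prefix_units(num_ids: int, bits_per_unit: int, extra: int = 0) -> int:
    """Rule of thumb: double the bits needed to count the ids, divide by the
    bits each unit (syllable or word) carries, round up, add a safety margin."""
    bits_to_count = max(num_ids - 1, 1).bit_length()  # ceil(log2(num_ids))
    total_bits = 2 * bits_to_count
    return -(-total_bits // bits_per_unit) + extra    # ceiling division


for n in (2 ** 10, 2 ** 18):
    print(
        f"{n} ids: {prefix_units(n, 6)} syllables, "
        f"{prefix_units(n, 8)} Bibblebabble words"
    )
```

This prints 4 syllables / 3 words for 1024 ids and 6 syllables / 5 words for 262144 ids. Because it always rounds up, it lands at the top of the ranges estimated earlier; pass `extra=1` for the additional safety syllable.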

Have fun,

 Oren Ben-Kiki



