monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] Re: user-friendly hash formats, redux


From: Nathaniel Smith
Subject: Re: [Monotone-devel] Re: user-friendly hash formats, redux
Date: Sat, 4 Dec 2004 05:52:31 -0800
User-agent: Mutt/1.5.6i

On Sat, Dec 04, 2004 at 01:42:47PM +0200, Oren Ben-Kiki wrote:
> On Saturday 04 December 2004 09:40, Nathaniel Smith wrote:
> > Bibblebabble gives 8 bits/5-letter "word".  Not quite as high
> > density, but pretty good.  And with tab-completion, the problem is
> > more recognition than recollection, and recognition for this sort of
> > id coding should be especially good.
> 
> 0.02$ worth: You can nearly double the density by using phonetically 
> distinct syllables: { B D F H J K L M R S T V Y } x { a e i o u } - 
> things like "YeRuDa". You get 3 bits per letter (13 * 5 = 65), which is 
> as good as octal coding. Plus, if you read it over the phone, you stand 
> a very good chance that the other guy will not mix it up.

Ah, but you still have problems.  There are both within-language
phonetic processes: in English, "ada" and "ata" are pronounced
identically -- and cross-language issues: Japanese doesn't have a
distinct "L" and "R", to take one famous example.  I'm sure
BibbleBabble isn't perfect, but it at least tries to deal with the
worst of this problems :-).

Also, I misremembered; BB actually gets 16 bits/5-letter "word".  I
knew it corresponded nicely to chunks of hex, but forgot exactly how.
So it actually is higher density than your coding, at 3.2 bits per
letter.

But since people seem to be strangely interested in the details of
designing such a system, I'll attach the BB README file; I'm sure
improvements can be made...  (Note that there are references to and
critiques of several related systems in there.)

> > A few weeks 
> > ago I was talking to Derek on IRC, and we were talking about what had
> > happened in part of the revision graph, and I was getting completely
> > lost trying to keep track of 3-4 different identities at once;
> 
> This is an example of needing to "say it over the phone", as it were. 
> Making ids "easy to remember" is a different requirement, which is best 
> solved by using words (in your favorite native language).
>
> Given there are only 64 used syllables, you can easily set up a 
> language-specific mapping from each to one of your favorite language's 
> words. Say, in English, 'YeRuDa' would be 'Yes Root Dad', while in 
> Hebrew it might be 'Yesh Ruach Dag' (Exists Wind Fish). This makes it 
> easy to remember and is not English-centric. You could even read aloud 
> your language's words to someone using a different language and he'll 
> still get the syllables right. More to the point, you could read him 
> the syllables in _his_ language, increasing the chances he gets it 
> right.

Not exactly.  Transmission over an audio medium is one problem, sure.
But it's actually not a very difficult one --- you can use something
like the "phonetic alphabet" (Alpha/Bravo/Charlie/Delta/...) and get 3
bits per word just out of hex.  Any sort of pronounceable encoding
will work fine.

Keeping 4 different distinct ids in working memory is different --
probably much more important -- and aided by chunking of any kind.

Making up language-specific mappings as a mnemonic or as a sort of
extended standin for the phonetic alphabet seems handy, but should
work about the same whatever encoding you use.

> > I haven't been making any argument for Bibblebabble, though, because
> > I don't know how strong these human factor effects are.  "When the
> > going gets tough, the tough get empirical" -- my plan was to put
> > together a little app to test this, that people could run on
> > themselves and send me the results and I could make pretty graphs and
> > figure out whether I should make an argument or not.
> 
> How does such a program results tell you about how people use ids - how 
> often they type, say or have to memorize one? Perhaps if you added a 
> logging ability to the normal monotone commands, tracking how often 
> people entered ids and at what length... you'd have to discount scripts 
> somehow, and it still won't tell you about actually saying ids out 
> loud.

Ah, well, it doesn't.  The proper way to do this is well studied; the
field of Human-Computer Interaction is an industry.  It's easy, stick
people in an eye-tracker, and stick them down at a machine that's
recording their mouse movements and keypresses and doing screen
captures, and analyze the data in terms of various models...

The poor man's version I have in mind, though, just tests things like
recall span, recognition span, typing speed, etc. -- cognitive
processes that we can be pretty sure are important.

> > > I would also suggest inserting a period after the word where, at
> > > the moment, the revision is uniquely identified in the current
> > > state of the repository.  I expect it would usually appear after
> > > the second word even in big repositories.
> 
> Here's where some theory does good. Since ids are "random", that's a 
> formula depending only on the number of ids. Off the top of my head, on 
> average you should need 2n bits if you have 2^n ids. (...a quick hop to 
> Wikipedia at http://en.wikipedia.org/wiki/Birthday_paradox, and some 
> playing with formulas...). Hmmm, it seems you might get away with a bit 
> or two less.
> 
> So, suppose you have around 1K ids in your database (2^10) - reasonable 
> for a small project. You need around 18-20 bits for a unique id, which 
> translates to 3-4 syllables (8 characters) or two Bibblebabble words 
> (10 characters).
> 
> Suppose you have 256K ids (2^18). That's huge; you probably migrated 
> from sqlite to something else by then :-) At any rate, you need 34-36 
> bits for a unique id, which translates to 6 syllables (12 characters) 
> or 4-5 Bibblebabble words (20 - 25 characters).
> 
> Six syllables, in two groups of three, is something a human can handle. 
> Not only that, translating each to a word gives you a short "story", 
> which is an excellent memorization technique.
> 
> In contrast, dealing with four or five 5-letter noise words seem way too 
> much. Bibblebabble just doesn't scale well.

Except that I lied, and BibbleBabble gets 16 bits/word, so your
calculations suggest that 2 words (10 characters) should be enough for
just about anything :-).

> > Huh, that's a neat idea -- nonintrusive and simple.  Should be
> > relatively easy to make efficient, too, if we keep around a trie of
> > existing revision ids.
> 
> Sounds neat. As a first approximation you could count the number of ids, 
> double the number of bits required for one, and divide by 6 to get the 
> number of syllables (rounding up). Maybe add a syllable for extra 
> safety. This would give you a "pretty safe" prefix without the need for 
> any additional data structures.

Good point.

-- Nathaniel

-- 
"If you can explain how you do something, then you're very very bad at it."
  -- John Hopfield




reply via email to

[Prev in Thread] Current Thread [Next in Thread]