monotone-devel

Re: [Monotone-devel] Hash collisions resiliency


From: William Uther
Subject: Re: [Monotone-devel] Hash collisions resiliency
Date: Thu, 14 Apr 2005 09:49:25 +1000


On Wed, 13 Apr 2005 15:31:46 -0700, Nathan Myers <address@hidden> wrote:

On Thu, Apr 14, 2005 at 08:10:21AM +1000, William Uther wrote:
  I'm a little confused here.  People are saying that hash collisions
are hard to detect - that you'd have to do a full-text check of all the
files.  That doesn't seem right.  Couldn't you just have a table of
hashes in the database.

No.  Suppose you add a file to your repository, and I add one to
mine.  Unbeknownst to us, they hash the same.  When we reconcile our
repositories, there are now two files with the same hash.

Yes, and later in my email I wrote:

   Of course, you might have a distributed hash collision - two
machines, each with a distinct file, but the files have the
same hash.  This would not be detected until the data was sent between
the machines, but it could be trivially detected at that point.

Of course, as I elaborate below, this detection is not as trivial as I thought in all cases...

On Wed, 13 Apr 2005 23:39:53 +0100, Bruce Stephens <address@hidden> wrote:

William Uther <address@hidden> writes:

   Couldn't you just have a table of hashes in the database.  When
   you add a new hash, you add it to the table.  If it is already
   there, *THEN* you do a full-text check of the two files with the
   matching hashes.

Which we already have, of course (which is the whole point).  There
are several tables, but that's OK: it's not a problem if a file
version happens to have the same hash as a changeset, I think?

That's what I thought, which is why I was surprised that people were saying there was a problem with hash collisions.

   If the two files actually are the same, then you're ok.  If the
   two files are different, you have a problem, *and* you have
   detected it.

I suspect the problem is that things with the same hash as something
that's already there are very common: most commits change a few files,
and you know which files have changed by checking their hashes.  And
checking the contents instead is likely to be expensive enough that
you wouldn't want to do it.

When looking for changed files in your working copy, you don't want to look at the file contents at all in the common case. You should use file timestamps - only check file contents or hashes when a timestamp changes or you are asked to do an explicit check.
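A minimal sketch of that timestamp shortcut (in Python; the function and cache layout are mine for illustration, not monotone's actual code):

```python
import hashlib
import os


def changed_files(paths, cache):
    """Return the paths whose content hash differs from the cached hash.

    `cache` maps path -> (mtime, sha1 hex digest).  A file whose
    timestamp is unchanged is assumed unmodified and is never read,
    so the expensive hash only runs when the mtime has moved.
    """
    changed = []
    for path in paths:
        mtime = os.stat(path).st_mtime
        cached = cache.get(path)
        if cached is not None and cached[0] == mtime:
            continue  # common case: timestamp unchanged, skip hashing
        with open(path, "rb") as f:
            digest = hashlib.sha1(f.read()).hexdigest()
        if cached is None or cached[1] != digest:
            changed.append(path)
        cache[path] = (mtime, digest)
    return changed
```

The second call with an unchanged file returns nothing without reading the file at all, which is the whole point of the shortcut.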

However, over a network, when syncing two databases, you will have this issue. If the hashes of two different diffs, one on each machine, happen to be the same, then each machine might believe it already has the other's diff simply through the exchange of hashes, without the diffs themselves ever being transferred.
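That failure mode can be demonstrated with a toy hash weak enough to collide on purpose (the dict-of-hashes "protocol" here is a stand-in for illustration; monotone's netsync and SHA-1 are far more involved):

```python
def toy_hash(data):
    # Deliberately weak hash (byte sum mod 256) so a collision is easy
    # to construct; with SHA-1 a collision is merely possible, not easy.
    return sum(data) % 256


# Two machines each hold one diff, indexed by its hash, as in a
# sync-by-hash-exchange protocol.
diff_a = b"delete line 3"
diff_b = b"3 enil eteled"  # different content, same bytes in a different order
assert diff_a != diff_b
assert toy_hash(diff_a) == toy_hash(diff_b)  # the collision

machine_a = {toy_hash(diff_a): diff_a}
machine_b = {toy_hash(diff_b): diff_b}

# Each side compares advertised hashes against its own table and
# concludes it is missing nothing, so the diffs are never transferred.
missing_on_a = [h for h in machine_b if h not in machine_a]
missing_on_b = [h for h in machine_a if h not in machine_b]
assert missing_on_a == [] and missing_on_b == []
# ...yet the two databases still hold different content.
```

Nothing in the hash exchange alone can tell the two cases apart; only transferring (or comparing) the actual bytes would reveal the divergence.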

I don't think this is a big issue - rsync has the same problem. Moreover, a hash collision here will lead to strange things happening in the recently synced information, which is probably near head and about to be used, so the corruption will be discovered quickly. It isn't going to silently corrupt old revisions that rarely get checked, right?
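For completeness, the table-of-hashes check proposed earlier in the thread - full-text comparison only when a hash is already present - can be sketched like this (names are mine, not monotone's schema):

```python
import hashlib


def add_file(store, content):
    """Insert content keyed by its hash, comparing full text only on a hit.

    `store` maps sha1 hex digest -> file contents.  Returns True on
    success; raises only in the genuine-collision case (same hash,
    different bytes), which is the only time the expensive full-text
    comparison actually runs.
    """
    digest = hashlib.sha1(content).hexdigest()
    existing = store.get(digest)
    if existing is None:
        store[digest] = content  # common case: new hash, no comparison needed
        return True
    if existing == content:
        return True              # same file already stored: harmless
    raise RuntimeError("hash collision detected for " + digest)
```

Within a single database this detects a collision at insert time; as noted above, it is the hash-only network exchange where the check never gets a chance to run.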

Cheers,

Will          :-}

--
Dr William Uther                           National ICT Australia
Phone: +61 2 9385 6357               Computer Science and Engineering
Email: address@hidden          University of New South Wales
Web: http://www.cse.unsw.edu.au/~willu/     Sydney, Australia




