[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Monotone-devel] Hash collisions resiliency
From: |
William Uther |
Subject: |
Re: [Monotone-devel] Hash collisions resiliency |
Date: |
Thu, 14 Apr 2005 09:49:25 +1000 |
On Wed, 13 Apr 2005 15:31:46 -0700, Nathan Myers <address@hidden>
wrote:
On Thu, Apr 14, 2005 at 08:10:21AM +1000, William Uther wrote:
I'm a little confused here. People are saying that hash collisions
are hard to detect - that you'd have to do a full-text check of all
the
files. That doesn't seem right. Couldn't you just have a table of
hashes in the database.
No. Suppose you add a file to your repository, and I add one to
mine. Unbeknownst to us, they has the same. When we reconcile our
repositories, now there are two files with the same hash.
yes, and later in my email I wrote:
Of course, you might have a distributed hash collision - two
machines, each of which with a distinct file, but the files have the
same hash. This would not be detected until the data was sent between
the machines, but it could be trivially detected at that point.
Of course, as I elaborate below, this detection is not as trivial as I
thought in all cases...
On Wed, 13 Apr 2005 23:39:53 +0100, Bruce Stephens
<address@hidden> wrote:
William Uther <address@hidden> writes:
Couldn't you just have a table of hashes in the database. When
you add a new hash, you add it to the table. If it is already
there, *THEN* you do a full-text check of the two files with the
matching hashes.
Which we already have, of course (which is the whole point). There
are several tables, but that's OK: it's not a problem if a file
version happens to have the same hash as a changeset, I think?
That's what I thought. Which is why I was surprised that people were
saying there was a problem of hash collisions.
If the two files actually are the same, then you're ok. If the
two files are different, you have a problem, *and* you have
detected it.
I suspect the problem is that things with the same hash as something
that's already there are very common: most commits change a few files,
and you know which files have changed by checking their hashes. And
checking the contents instead is likely to be expensive enough that
you wouldn't want to do it.
When looking for changed files in your working copy you don't want to
have to look at the file contents at all in the common case. You
should be used file timestamps - only check file contents or hashes
when the timestamps change or you are asked to do an explicit check.
However over a network, when syncing two databases, you will have this
issue. If the hashes on two different diffs, one on each machine,
happen to be the same, then the machines might believe that each of
them has the same diff as the other simply through the exchange of
hashes without ever exchanging the diffs themselves.
I don't think this is a big issue - rsync has the same issue.
Moreover, a hash collision here will lead to strange things happening
in the recently synced information - which is probably near head, and
about to be used where the corruption will be discovered. It isn't
going to silently corrupt old revisions which rarely get checked,
right?
Cheers,
Will :-}
--
Dr William Uther National ICT Australia
Phone: +61 2 9385 6357 Computer Science and Engineering
Email: address@hidden University of New South Wales
Web: http://www.cse.unsw.edu.au/~willu/ Sydney, Australia
- [Monotone-devel] Re: Hash collisions resiliency, (continued)
- [Monotone-devel] Re: Hash collisions resiliency, Peter Simons, 2005/04/15
- Re: [Monotone-devel] Re: Hash collisions resiliency, Jon Bright, 2005/04/15
- Re: [Monotone-devel] Re: Hash collisions resiliency, Nathan Myers, 2005/04/15
- [Monotone-devel] Re: Hash collisions resiliency, Peter Simons, 2005/04/15
- Re: [Monotone-devel] Re: Hash collisions resiliency, Nathaniel Smith, 2005/04/15
- Re: [Monotone-devel] Re: Hash collisions resiliency, Nathaniel Smith, 2005/04/16
Re: [Monotone-devel] Hash collisions resiliency, Richard Li, 2005/04/13
Re: [Monotone-devel] Hash collisions resiliency, William Uther, 2005/04/13
Re: [Monotone-devel] Hash collisions resiliency,
William Uther <=