[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: hash resizing bug
From: |
Jim Meyering |
Subject: |
Re: hash resizing bug |
Date: |
Wed, 17 Jun 2009 23:24:11 +0200 |
Eric Blake wrote:
> According to Eric Blake on 6/17/2009 12:21 PM:
>> Before the rehash, about 20% of the buckets were empty. But after changing
>> the
>> prime factor, the hasher function scatters better, and _every single bucket_
>> is
>> occupied. Which means ALL subsequent calls to hash_insert will trigger an
>> overflow, rather than inserting into a bucket head. And since the resize
>> checking logic in hash_rehash is only triggered after insertion into a bucket
>> head, the hash table has effectively been locked into a fixed size that will
>> never grow again, no matter how many hash_insert calls are made, even though
>> the table is certainly exceeding the requested threshold of 20% empty
>> buckets.
>> Ouch.
>
> I'm now playing with this simple code motion patch, also available at:
> http://repo.or.cz/w/gnulib/ericb.git
> $ git pull git://repo.or.cz/gnulib/ericb.git master
>
> --
> Don't work too hard, make some time for fun as well!
>
> Eric Blake address@hidden
>>From d3edc0d1e302af613e8b56396f1f9d0bec15a264 Mon Sep 17 00:00:00 2001
> From: Eric Blake <address@hidden>
> Date: Wed, 17 Jun 2009 13:01:41 -0600
> Subject: [PATCH] hash: check for resize before insertion
>
> * lib/hash.c (hash_insert): Check whether bucket usage exceeds
> threshold before insertion, so that a pathological hash_rehash
> that fills every bucket can still trigger another rehash.
>
> Signed-off-by: Eric Blake <address@hidden>
> ---
> ChangeLog | 5 +++++
> lib/hash.c | 52 ++++++++++++++++++++++++++--------------------------
> 2 files changed, 31 insertions(+), 26 deletions(-)
>
> diff --git a/ChangeLog b/ChangeLog
> index 1f57b7e..38c1d45 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,5 +1,10 @@
> 2009-06-17 Eric Blake <address@hidden>
>
> + hash: check for resize before insertion
> + * lib/hash.c (hash_insert): Check whether bucket usage exceeds
> + threshold before insertion, so that a pathological hash_rehash
> + that fills every bucket can still trigger another rehash.
> +
> hash: manage bucket overflows more efficiently
> * lib/hash.c [USE_OBSTACK]: Delete; the optimizations provided by
> an obstack have been folded in to mainline code.
> diff --git a/lib/hash.c b/lib/hash.c
> index 9d78990..e4310b5 100644
> --- a/lib/hash.c
> +++ b/lib/hash.c
> @@ -999,32 +999,6 @@ hash_insert (Hash_table *table, const void *entry)
> if ((data = hash_find_entry (table, entry, &bucket, false)) != NULL)
> return data;
>
> - /* ENTRY is not matched, it should be inserted. */
> -
> - if (bucket->data)
> - {
> - struct hash_entry *new_entry = allocate_entry (table);
> -
> - if (new_entry == NULL)
> - return NULL;
> - if (table->overflow_size <= new_entry - table->overflow)
> - abort ();
> -
> - /* Add ENTRY in the overflow of the bucket. */
> -
> - new_entry->data = (void *) entry;
> - new_entry->next = bucket->next;
> - bucket->next = new_entry - table->overflow;
> - table->n_entries++;
> - return (void *) entry;
> - }
> -
> - /* Add ENTRY right in the bucket head. */
> -
> - bucket->data = (void *) entry;
> - table->n_entries++;
> - table->n_buckets_used++;
> -
> /* If the growth threshold of the buckets in use has been reached, increase
> the table size and rehash. There's no point in checking the number of
> entries: if the hashing function is ill-conditioned, rehashing is not
> @@ -1057,6 +1031,32 @@ hash_insert (Hash_table *table, const void *entry)
> }
> }
You can't use a pre-rehash "bucket" here (after rehash),
since it is no longer valid.
Considering the cost of rehashing (and its relative infrequency),
one more hash_find_entry won't even show up on the radar.
> + /* ENTRY is not matched, it should be inserted. */
> +
> + if (bucket->data)
> + {
> + struct hash_entry *new_entry = allocate_entry (table);
> +
> + if (new_entry == NULL)
> + return NULL;
> + if (table->overflow_size <= new_entry - table->overflow)
> + abort ();
> +
> + /* Add ENTRY in the overflow of the bucket. */
> +
> + new_entry->data = (void *) entry;
> + new_entry->next = bucket->next;
> + bucket->next = new_entry - table->overflow;
> + table->n_entries++;
> + return (void *) entry;
> + }
> +
> + /* Add ENTRY right in the bucket head. */
> +
> + bucket->data = (void *) entry;
> + table->n_entries++;
> + table->n_buckets_used++;
> +
> return (void *) entry;
> }
- Re: [PATCH] hash: declare some functions with the warn_unused_result attribute, (continued)
- Re: [PATCH] hash: declare some functions with the warn_unused_result attribute, Jim Meyering, 2009/06/17
- hash_rehash allocatino failure (was: [PATCH] hash: declare some functions with the warn_unused_result attribute), Eric Blake, 2009/06/17
- Re: hash_rehash allocatino failure, Jim Meyering, 2009/06/17
- Re: hash_rehash allocatino failure, Eric Blake, 2009/06/18
- Re: hash_rehash allocation failure, Eric Blake, 2009/06/19
- hash resizing bug (was: [PATCH] hash: declare some functions with the warn_unused_result attribute), Eric Blake, 2009/06/17
- Re: hash resizing bug, Eric Blake, 2009/06/17
- Re: hash resizing bug, Eric Blake, 2009/06/17
- Re: hash resizing bug, Jim Meyering, 2009/06/17
- Re: hash resizing bug, Eric Blake, 2009/06/17
- Re: hash resizing bug,
Jim Meyering <=
- Re: hash resizing bug, Eric Blake, 2009/06/17
- Re: hash resizing bug, Jim Meyering, 2009/06/18
- Re: hash resizing bug, Jim Meyering, 2009/06/17
- Re: hash resizing bug, Eric Blake, 2009/06/18
- Re: hash resizing bug, Jim Meyering, 2009/06/18
- Re: hash resizing bug, Jim Meyering, 2009/06/18
- Re: hash resizing bug, Jim Meyering, 2009/06/18
- Re: hash resizing bug, Eric Blake, 2009/06/18
- Re: hash resizing bug, Jim Meyering, 2009/06/18
- Re: hash resizing bug, Eric Blake, 2009/06/18