guile-user

Re: Resizing hash tables in Guile


From: Mikael Djurfeldt
Subject: Re: Resizing hash tables in Guile
Date: Thu, 13 Feb 2003 10:35:40 +0100
User-agent: Gnus/5.090008 (Oort Gnus v0.08) Emacs/21.2 (i386-pc-linux-gnu)

Roland Orre <address@hidden> writes:

> On Wed, 2003-02-12 at 18:53, Mikael Djurfeldt wrote:
>> I've been thinking that maybe we should continue the move and let the
>> resizing tables entirely replace the fixed size ones.
[...]

> I think that this is a good idea, but I would like the optional size
> argument to still be there as an initial guess at the number of
> items to expect
[...]

Agreed.

> Of course this additional level of abstraction is nice as it also gives
> a future option to use trees if the user e.g. gives an optional
> comparison operator.
>
> By the way, Joris van der Hoeven mentioned that this type of resizing
> hash table is faster than trees, which have a time complexity
> of O(log n) plus rebalancing time if they are kept balanced.  Do you
> have any numbers showing how much faster, and if possible under what
> conditions?  The reshuffling time will grow at least as O(n) as the
> size of our linear tables increases, if I have understood correctly.

I doubt that we'll ever use trees under the hash table API, the reason
being that the user expects O(1) for hash table operations.  Even
though O(log n) grows slowly, it is substantially different for large
enough data sets.
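To put rough numbers on that difference (a sketch in Python, since no Guile code appears in this thread; the "one probe" figure for hash tables is an idealization):

```python
import math

# Rough illustration, not a benchmark: a balanced tree needs about
# log2(n) comparisons per lookup, while a hash table with a sensible
# load factor needs a small constant number of probes regardless of n.
for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"n={n:>13,}: tree ~{math.log2(n):.0f} comparisons, "
          f"hash table ~constant")
```

Even though 30 comparisons at a billion entries is modest, it is a real multiple of the constant-time work a hash lookup does, which is what users of the hash table API expect.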

Regarding reshuffling time: Yes, reshuffling means that not every
individual operation is O(1), but the operations *are* O(1) on
average.  You can see this with a trick sometimes used in algorithm
run-time analysis called "amortization":

The idea is that for every operation on the hash table you "pay" some
extra virtual time, and the sum of this time can then be used to
account for the reshuffling time.  In my implementation, the hash
table is roughly doubled for each rehash.  Before rehashing occurs,
you have inserted N elements.  This has cost you less than cN seconds.
Rehashing is O(2N) = O(N), so we can say it will cost us less than dN
seconds.  If we now pay off d seconds per operation in advance, and
note that the argument above holds equally well for each rehashing
point, we realize that each operation costs less than c + d seconds on
average.  This means that, on average, operations are O(1).
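The doubling argument above can be checked numerically with a toy model (Python, not Guile's actual implementation; the initial size and unit costs are made up for illustration):

```python
# Toy model of a doubling hash table: count one unit of work per
# ordinary insertion, plus one unit per element moved during a rehash.
# The average work per insertion stays bounded -- amortized O(1).

def insert_n(n, initial_size=8):
    """Insert n items; return (total_work, work_per_item)."""
    size = initial_size
    count = 0
    work = 0
    for _ in range(n):
        work += 1          # cost of one ordinary insertion
        count += 1
        if count > size:   # load threshold exceeded: double and rehash
            work += count  # rehashing moves every stored item once
            size *= 2
    return work, work / n

for n in (1_000, 100_000, 1_000_000):
    total, per_item = insert_n(n)
    print(f"n={n:>9,}: {per_item:.2f} work units per insertion")
```

Because the rehash costs form a geometric series (each rehash moves about twice as many items as the previous one, but happens half as often per item), the per-item average stays below a small constant no matter how large n grows.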

Mikael



