Re: Resizing hash tables in Guile
From: Joris van der Hoeven
Subject: Re: Resizing hash tables in Guile
Date: Thu, 13 Feb 2003 15:24:32 +0100 (MET)
> > Regarding reshuffling time: Yes, reshuffling means that not every operation
> > is O(1), but it *does* mean that they are O(1) on average. You can
> > understand this by a trick sometimes used in algorithm run-time
> > analysis called "amortization":
> >
> > The idea is that for every operation on the hash table you "pay" some
> > extra virtual time, and the sum of this time can then be used to
> > account for the reshuffling time. In my implementation, the hash
> > table is roughly doubled for each rehash. Before rehashing occurs,
> > you have inserted N elements. This has cost you less than cN seconds.
> > Rehashing is O(2N) = O(N), so we can say it will cost us less than dN
> > seconds. If we now pay off d seconds per operation in advance, and
> > note that the argument above holds equally well for each rehashing
> > point, we realize that each operation costs less than c + d seconds on
> > average. This means that, on average, operations are O(1).
>
> Inserts are, but lookups aren't necessarily.
Neither is necessarily O(1), because inserting requires a lookup too.
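
The amortization argument quoted above can be checked with a small count. This is only a sketch under stated assumptions: the table starts with one bucket, doubles whenever the element count reaches the bucket count, and a rehash costs one "move" per stored element. The function name `count_rehash_moves` is made up for illustration and is not part of Guile.

```python
def count_rehash_moves(n_inserts, initial_buckets=1):
    """Count element moves caused by rehashing during n_inserts insertions.

    Assumed policy (not taken from Guile): the table doubles as soon as
    the number of stored elements equals the number of buckets.
    """
    buckets = initial_buckets
    size = 0
    moves = 0
    for _ in range(n_inserts):
        if size == buckets:
            # Rehash: every element already stored is moved exactly once.
            moves += size
            buckets *= 2
        size += 1
    return moves


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        moves = count_rehash_moves(n)
        print(n, moves, moves / n)
```

Under these assumptions the total number of moves stays below 2N for N insertions, so the rehashing cost per insertion is bounded by a constant. As noted in the reply, this only bounds the resizing work; it says nothing about how long each bucket is.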
> Lookups being O(1) requires uniformity of bucket sizes.
> Worst case hash table lookup time is still O(N).
You can also store a binary search tree in each of the buckets,
if you think that your hash function is bad.
> And good hashing functions are still hard to write.
I do not really agree. A good hash algorithm for lists (or strings),
which I use in TeXmacs, is to rotate the 32 bit integer hash values of
each of the members by a prime number like 3, 5, 7 or 11 and progressively
take the exclusive or. This seems to lead to bucket sizes as
predicted by probability theory, even for hash tables of size 2^p.
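
One possible reading of this rotate-and-xor scheme is sketched below. It is not the actual TeXmacs code: the choice to rotate the running value (rather than each member's hash), the rotation amount 5, and the use of character codes as the per-member hash values are all assumptions made for illustration, as are the names `rotate_left_32` and `combine_hash`.

```python
MASK_32 = 0xFFFFFFFF


def rotate_left_32(value, shift):
    """Rotate a 32-bit unsigned integer left by `shift` bits (0 < shift < 32)."""
    value &= MASK_32
    return ((value << shift) | (value >> (32 - shift))) & MASK_32


def combine_hash(members, shift=5, member_hash=ord):
    """Hash a sequence by rotating the running value and xor-ing in each member.

    `member_hash` maps one member to an integer; `ord` is used here so
    that strings can be hashed character by character (an assumption).
    """
    h = 0
    for m in members:
        h = rotate_left_32(h, shift) ^ (member_hash(m) & MASK_32)
    return h


if __name__ == "__main__":
    table_bits = 4  # table of size 2^p with p = 4
    for word in ("ab", "ba", "guile", "texmacs"):
        h = combine_hash(word)
        print(word, h, h & ((1 << table_bits) - 1))
```

Because the running value is rotated before each xor, the result depends on the order of the members, so "ab" and "ba" hash differently. Whether the bucket sizes match the theoretical prediction for a given data set is something to measure rather than assume.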
> People overestimate log(N) and overuse O(). When comparing an O(1)
> algorithm to an O(log(N)) algorithm, it really comes down to the
> actual functions involved, and actual problem size, not just the
> asymptotic behavior. 2^32 is over 4,000,000,000.
A factor of 10 is still a factor of 10, though
(2^10 ~~ 1000, so log_2(N) is already about 10 for N = 1000).
> With this many
> items, log(N) is still just 32, so an O(log(N)) algorithm will still
> beat an O(1) algorithm if it's really log_2(N) vs 32.
Yes, but the O(1) is really *table lookup* multiplied by a small
constant here, so this is *fast*. You may adjust the small constant
by choosing an appropriate threshold for "size/nr buckets".
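
To make the "size/nr buckets" threshold concrete, here is a minimal sketch of a chained hash table that doubles its bucket vector once the load exceeds a chosen limit. The class name `ResizingTable`, the parameter `max_load`, and the doubling policy are assumptions for illustration; this is not the Guile implementation under discussion.

```python
class ResizingTable:
    """Chained hash table that doubles when size / buckets exceeds max_load."""

    def __init__(self, buckets=4, max_load=2.0):
        self.buckets = [[] for _ in range(buckets)]
        self.size = 0
        self.max_load = max_load  # the tunable "size/nr buckets" threshold

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def get(self, key, default=None):
        # Table lookup plus a scan of one (hopefully short) bucket.
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))
        self.size += 1
        if self.size / len(self.buckets) > self.max_load:
            self._grow()

    def _grow(self):
        # Double the bucket vector and redistribute every entry.
        old = self.buckets
        self.buckets = [[] for _ in range(2 * len(old))]
        for bucket in old:
            for k, v in bucket:
                self._bucket(k).append((k, v))


if __name__ == "__main__":
    t = ResizingTable()
    for i in range(100):
        t.put(i, i * i)
    print(t.get(7), t.size, len(t.buckets))
```

A smaller `max_load` gives shorter buckets at the cost of more memory and more frequent rehashing; a larger one does the opposite. With a well-behaved hash function the average bucket length stays close to the load, which is the small constant referred to above.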
> Also, if a person's relying on O(1) for hash table performance, it might be
> not because they need that on average, but because they need an upper
> bound on the operation time, in which case automatic resizing would
> also violate this, even though it maintains O(1) on average.
This is a more serious drawback of standard hash tables, but,
as I said before, we already have garbage collection in Guile anyway...
> Trees also sort the data for you, which hash tables don't give you.
But you need a comparison operation for that,
which may be even less natural than a hash function.
> So, ideally, one would have a hash table object with & without
> resizing, and various sorts of tree (AVL, red/black, B*, etc) objects.
> insert and delete and map would be methods that work on all of the
> above, with map on trees returning the data in sorted order. For that
> matter, insert & delete might as well also work on lists...
Agreed: ideally, we have everything :^)