[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: new libihash implementation
From: |
Marcus Brinkmann |
Subject: |
Re: new libihash implementation |
Date: |
Sat, 28 Feb 2004 20:50:15 +0100 |
User-agent: |
Wanderlust/2.10.1 (Watching The Wheels) SEMI/1.14.6 (Maruoka) FLIM/1.14.6 (Marutamachi) APEL/10.6 Emacs/21.3 (i386-pc-linux-gnu) MULE/5.0 (SAKAKI) |
Hi,
I really like my new ihash implementation. This currently uses
unsigned int as key, although I guess we might need to use a uintptr_t
or whatever it's called. Anyway, I am using a typedef so this is a
transparent change that can be done anytime.
Is anybody opposed to have the new ihash implementation in the tree?
Do I need to repost it because it not only escaped your memory by now,
but also email folders? :)
Thanks,
Marcus
At Wed, 27 Aug 2003 23:34:49 +0200,
Marcus Brinkmann wrote:
>
> Hi,
>
> as promised, I rewrote libihash, at least most parts of it. This
> implementation is lightly tested, thanks to Marco Gerards, I hope more
> testing follows so that we can be confident about it.
>
> This implementation has the following new features:
>
> * Everything put into the hurd_ namespace.
> * Uses unsigned int (I still have to change that to uint32_t) instead of a
> signed type.
> * Proper typedefs for all important user visible data types.
> * Keys and values are interleaved in a single array, to bring them close to
> each other. This should be easier on the memory cache. Also, calloc is
> used instead malloc with initializer loop - so we don't necessarily need
> to touch all pages immediately (depends on calloc implementation of
> course).
> * No memory is allocated for location pointers. Instead, they are required
> to be always in a constant offset from the value pointer. This is always
> the case in the Hurd. Also location pointers are optional (you decide if
> to use them an initialization/creation time). This cuts down memory usage
> by 33%.
> * struct hurd_ihash can now officially be inlined in other structs, and even
> statically initialized.
> * New prime numbers generated of the form 4*i + 3. Instead of the whacky
> i*k mod p probing, we use quadratic probing (k +/- i^2). Together with
> the primes being 3 mod 4, this guarantees that all indices in the table
> are tried. Previously this was only guaranteed if the key was not 0 mod p.
> By hashing 3, 6, 7, 14, 17, 34, ... you could force libihash to allocate
> a larger hash table with every second item! although there were lots of
> free spaces. I think I actually saw this happen with proc (which would
> hash all numbers 1, 2, 3, 4, etc and thus grow the hash table up to
> whatever the largest PID is, instead of only up to the number of living
> processes).
> * In addition, hash tables perform badly if their load factor is high. So
> I implemented a maximum load factor, that makes libihash to allocate a
> bigger table if the number of used entries gets above a certain percentage.
> The default is 80%, which is recommended by text books on hashing
> algorithms.
> * Iterating through the hash is now done by a macro instead of a function
> with callback function. Callback functions are slow. Macros are fast.
> The macro definition is a bit obscene, but it works well ;)
> * Adding the same key twice actually does replace the old entry correctly
> with calling the cleanup handler. This is thoroughly broken in the
> current implementation.
>
> There is still room for improvement:
> * I didn't list all changes in the ihash changelog entry, but just said
> "rewritten". I could and probably should fill the details in, though.
> * Although _add replaces an existing entry, this is usually not needed and
> makes _add perform badly because usually this means the cost for an
> unsuccessful search are added. However, to change that I would have to
> verify carefully that no user expects this behaviour, or if they do, to
> change user code to use a new _replace function or just to not rely on it.
> Usually code checks with _find anyway if the key exists.
> I verified most of the Hurd code, but not very carefully, it seems
> feasible. Auth might be an exception, but auth is broken anyway, for
> example, I think it will do the wrong thing if you call authenticate with
> the same rendezvous port twice (either in the user side or in the server
> side). Auth should have cleanup handlers that cancel the pending
> authentication if an existing key is added.
> * A minimum load could be added to free memory. If this is an improvement
> depends on the usage pattern though.
> * An alternative hashing algorithm could use buckets for better deletion (so
> you don't accumulate deleted entries over time). Probably only needed if we
> don't change _add to not replace by default.
>
> I want to postpone this for later, because I don't want to touch auth, and
> also libports seemed peculiar (not to speak of rpctrace and the other users).
>
> Ok to apply? Of course we will have to wait for Roland to come back.
>
> Thanks,
> Marcus
- Re: new libihash implementation,
Marcus Brinkmann <=