Re: new libihash implementation

hurd-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: new libihash implementation

From:	Marcus Brinkmann
Subject:	Re: new libihash implementation
Date:	Sat, 28 Feb 2004 20:50:15 +0100
User-agent:	Wanderlust/2.10.1 (Watching The Wheels) SEMI/1.14.6 (Maruoka) FLIM/1.14.6 (Marutamachi) APEL/10.6 Emacs/21.3 (i386-pc-linux-gnu) MULE/5.0 (SAKAKI)

Hi,

I really like my new ihash implementation.  This currently uses
unsigned int as key, although I guess we might need to use a uintptr_t
or whatever it's called.  Anyway, I am using a typedef so this is a
transparent change that can be done anytime.

Is anybody opposed to have the new ihash implementation in the tree?
Do I need to repost it because it not only escaped your memory by now,
but also email folders? :)

Thanks,
Marcus

At Wed, 27 Aug 2003 23:34:49 +0200,
Marcus Brinkmann wrote:
> 
> Hi,
> 
> as promised, I rewrote libihash, at least most parts of it.  This
> implementation is lightly tested, thanks to Marco Gerards, I hope more
> testing follows so that we can be confident about it.
> 
> This implementation has the following new features:
> 
> * Everything put into the hurd_ namespace.
> * Uses unsigned int (I still have to change that to uint32_t) instead of a
>   signed type.
> * Proper typedefs for all important user visible data types.
> * Keys and values are interleaved in a single array, to bring them close to
>   each other.  This should be easier on the memory cache.  Also, calloc is
>   used instead malloc with initializer loop - so we don't necessarily need
>   to touch all pages immediately (depends on calloc implementation of
>   course).
> * No memory is allocated for location pointers.  Instead, they are required
>   to be always in a constant offset from the value pointer.  This is always
>   the case in the Hurd.  Also location pointers are optional (you decide if
>   to use them an initialization/creation time).  This cuts down memory usage
>   by 33%.
> * struct hurd_ihash can now officially be inlined in other structs, and even
>   statically initialized.
> * New prime numbers generated of the form 4*i + 3.  Instead of the whacky
>   i*k mod p probing, we use quadratic probing (k +/- i^2).  Together with
>   the primes being 3 mod 4, this guarantees that all indices in the table
>   are tried.  Previously this was only guaranteed if the key was not 0 mod p.
>   By hashing 3, 6, 7, 14, 17, 34, ... you could force libihash to allocate
>   a larger hash table with every second item! although there were lots of
>   free spaces.  I think I actually saw this happen with proc (which would
>   hash all numbers 1, 2, 3, 4, etc and thus grow the hash table up to
>   whatever the largest PID is, instead of only up to the number of living
>   processes).
> * In addition, hash tables perform badly if their load factor is high.  So
>   I implemented a maximum load factor, that makes libihash to allocate a
>   bigger table if the number of used entries gets above a certain percentage.
>   The default is 80%, which is recommended by text books on hashing
>   algorithms.
> * Iterating through the hash is now done by a macro instead of a function
>   with callback function.  Callback functions are slow.  Macros are fast.
>   The macro definition is a bit obscene, but it works well ;)
> * Adding the same key twice actually does replace the old entry correctly
>   with calling the cleanup handler.  This is thoroughly broken in the
>   current implementation.
> 
> There is still room for improvement:
> * I didn't list all changes in the ihash changelog entry, but just said
>   "rewritten".  I could and probably should fill the details in, though.
> * Although _add replaces an existing entry, this is usually not needed and
>   makes _add perform badly because usually this means the cost for an
>   unsuccessful search are added.  However, to change that I would have to
>   verify carefully that no user expects this behaviour, or if they do, to
>   change user code to use a new _replace function or just to not rely on it.
>   Usually code checks with _find anyway if the key exists.
>   I verified most of the Hurd code, but not very carefully, it seems
>   feasible.  Auth might be an exception, but auth is broken anyway, for
>   example, I think it will do the wrong thing if you call authenticate with
>   the same rendezvous port twice (either in the user side or in the server
>   side).  Auth should have cleanup handlers that cancel the pending
>   authentication if an existing key is added.
> * A minimum load could be added to free memory.  If this is an improvement
>   depends on the usage pattern though.
> * An alternative hashing algorithm could use buckets for better deletion (so
>   you don't accumulate deleted entries over time).  Probably only needed if we
>   don't change _add to not replace by default.
> 
> I want to postpone this for later, because I don't want to touch auth, and
> also libports seemed peculiar (not to speak of rpctrace and the other users).
> 
> Ok to apply?  Of course we will have to wait for Roland to come back.
> 
> Thanks,
> Marcus

[Prev in Thread]

Current Thread

[Next in Thread]

Re: new libihash implementation, Marcus Brinkmann <=
- Re: new libihash implementation, Roland McGrath, 2004/02/28

Next by Date: Re: new libihash implementation
Next by thread: Re: new libihash implementation
Index(es):
- Date
- Thread