bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC] Adding a real HashTable implementation to gnulib


From: Bruno Haible
Subject: Re: [RFC] Adding a real HashTable implementation to gnulib
Date: Sun, 02 Dec 2018 14:41:01 +0100
User-agent: KMail/5.1.3 (Linux/4.4.0-138-generic; KDE/5.18.0; x86_64; ; )

Hi,

Darshit Shah wrote:
> I recently tried to use the hash table implementation in gnulib which resides
> in the "hash" module. However, I quickly realised that the hash table in 
> gnulib
> seems to be what is otherwise popularly known as a hash set, i.e., it supports
> storing and retrieving just values from the structure. 
> 
> On the other hand, a hash table is usually expected to have a key->value
> mapping that is stored.

I agree that the gnulib 'hash' module is just a particular case, and
probably the module name is not very descriptive.

> Within GNU Wget, we have a fairly portable version of a hash table implemented
> which I think would be a good addition for gnulib. What do you think?

There's not only the one from wget
  https://git.savannah.gnu.org/gitweb/?p=wget.git;a=blob;f=src/hash.h
  https://git.savannah.gnu.org/gitweb/?p=wget.git;a=blob;f=src/hash.c

but also the one from gettext
  
https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gnulib-local/lib/hash.h
  
https://git.savannah.gnu.org/gitweb/?p=gettext.git;a=blob;f=gnulib-local/lib/hash.c

and the one from glib
  https://gitlab.gnome.org/GNOME/glib/blob/master/glib/ghash.h
  https://gitlab.gnome.org/GNOME/glib/blob/master/glib/ghash.c

and the one from libxml
  https://gitlab.gnome.org/GNOME/libxml2/blob/master/include/libxml/hash.h
  https://gitlab.gnome.org/GNOME/libxml2/blob/master/hash.c

and the ones from CLN
  https://www.ginac.de/CLN/cln.git/?p=cln.git;a=tree;f=src/base/hash

and many more.

The implementation you are proposing is an "open-addressed table with linear
probing collision resolution". To me that is unacceptable. When I used Kyoto
Common Lisp (KCL) many years ago, I got an endless loop during a hash table
access, and it was precisely because of this open-addressed table structure.
I don't want a code which requires careful setting of parameters in order
not to run into an endless loop. Instead better have a code that cannot run
into an endless loop *by design*.

The hash_string function that you propose shifts by 5 bits at each step;
I suspect that it has the same problem as the one I tested and discussed in
https://haible.de/bruno/hashfunc.html .

For Gnulib, I would want a generic container, i.e. a "map", like we have
"list" and "ordered set" already (modules 'list' and 'oset'). Other GNU
maintainers have reported that they like this approach.

However, this will still not fit all possible needs because there are
special cases that people want to see optimized:
  - The case when the key is a string; additionally when the key is
    allocated in an obstack and there is no remove.
  - The struniq function (as in localename.c).

Then, what about extra requirements?
  - The existing gnulib 'hash' module is pretty unique: it keeps statistics.
    But is anyone really using this feature?
  - malloc vs. xmalloc.
  - Multithread-safety should IMO not be considered as an extra requirement.
    This is better done in application logic, because typically in the scope
    of the lock the application will do more than just the hash table lookup.

Bruno




reply via email to

[Prev in Thread] Current Thread [Next in Thread]