bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] hamt: New module.


From: Marc Nieper-Wißkirchen
Subject: Re: [PATCH] hamt: New module.
Date: Sat, 3 Apr 2021 22:32:22 +0200

Hi Bruno,

thank you very much for all the helpful feedback. I have incorporated your suggestions, and you can find an updated diff below.

Thanks,

Marc

diff --git a/doc/containers.texi b/doc/containers.texi
index 15c915b93..9bc1ae43d 100644
--- a/doc/containers.texi
+++ b/doc/containers.texi
@@ -35,6 +35,9 @@ log
 Gnulib provides several generic container data types.  They can be used
 to organize collections of application-defined objects.
 
+@node Ordinary containers
+@subsection Ordinary container data types
+
 @multitable @columnfractions .15 .5 .1 .1 .15
 @headitem Data type
 @tab Details
@@ -599,6 +602,58 @@ For C++, Gnulib provides a C++ template class for each of these container data t
 @tab @code{"gl_omap.hh"}
 @end multitable
 
+@node Specialized containers
+@subsection Specialized container data types
+
+The @code{hamt} module implements the hash array mapped trie (HAMT) data
+structure.  This is a data structure that contains (key, value) pairs.
+Lookup of a (key, value) pair given the key is an O(1) operation,
+assuming a good hash function for the keys is employed.
+
+The HAMT data structure is useful when you want modifications (additions
+of pairs, removal, value changes) to be visible only to some part of
+your program, whereas other parts of the program continue to use the
+unmodified HAMT.  The HAMT makes this possible in a space-efficient
+manner: the modified and the unmodified HAMT share most of their
+allocated memory.  It is also time-efficient: Every such modification
+is O(1) on average, again assuming a good hash function for the keys.
+
+A HAMT can be used whenever an ordinary hash table would be used.  It
+does however, provide non-destructive updating operations without the
+need to copy the whole container  On the other hand, a hash table is
+simpler so that its performance may be better when persistence is not
+needed.
+
+For example, a HAMT can be used to model the dynamic environment in a
+LISP interpreter.  Updating a value in the dynamic environment of one
+continuation frame would not modify values in earlier frames.
+
+To use the module, include @code{hamt.h} in your code.  The public
+interface is documented in that header file.  You have to provide a hash
+function and an equivalence relation, which defines key equality.  The
+module includes a test file @code{test-hamt.c}, which demonstrates how
+the API can be used.
+
+In the current implementation, each inner node of the HAMT can store up
+to @math{32 = 2^5} entries and subtries.  Whenever a collision between
+the initial bits of the hash values of two entries would happen, the
+next @math{5} bits of the hash values are examined and the two entries
+pushed down one level in the trie.
+
+HAMTs have the same average access times as hash tables but grow and
+shrink dynamically, so they use memory more economically and do not have
+to be periodically resized.
+
+They were described and analyzed in @cite{Phil Bagwell (2000). Ideal
+Hash Trees (Report). Infoscience Department, École Polytechnique
+Fédérale de Lausanne.}
+
+HAMTs are well-suited to a persistent data structure, which means that
+each updating operation (like inserting, replacing, or removing an
+element) returns a new HAMT while leaving the original one intact.  This
+is achieved through structure sharing, which is even safe in the
+presence of multiple threads when the used C compiler supports atomics.
+
 @ifnottex
 @unmacro log
 @end ifnottex
diff --git a/lib/hamt.h b/lib/hamt.h
index 55ee964dd..25a0ad9f9 100644
--- a/lib/hamt.h
+++ b/lib/hamt.h
@@ -78,10 +78,13 @@ _GL_INLINE_HEADER_BEGIN
 /************/
 
 /* A hamt stores pointers to elements.  Each element has to be a
-   struct whose initial member is of the type Hamt_entry.  An element
-   is conceptually owned by a hamt as soon as it is inserted.  It will
-   be automatically freed as soon as the last hamt containing it is
-   freed.  */
+   struct whose initial member is of the type Hamt_entry.  You need to
+   define this struct yourself.  It will typically contain an
+   Hamt_entry, a key, and, optionally, a value.
+
+   An element is conceptually owned by a hamt as soon as it is
+   inserted.  It will be automatically freed as soon as the last hamt
+   containing it is freed.  */
 typedef struct
 {
 #if GL_HAMT_THREAD_SAFE

Am Sa., 3. Apr. 2021 um 21:55 Uhr schrieb Bruno Haible <bruno@clisp.org>:
Hi Marc,

> what do you think of the attempt attached below?

I believe that technical documentation of a feature should be written
to answer the questions in this order:
  1. What is it? (Short summary)
  2. When would I use it? (Use cases)
  3. How do I use it?
  4. Further details.

The idea is to get the reader understand rapidly whether they want to
use the feature or not. That is the purpose of parts 1 and 2.

For part 1:

> +The @code{hamt} module provides a persistant version of persistent hash
> +array mapped tries (HAMTs).

"A persistant version of persistent ..." ?

The term "persistent" is only explained much later.

> A HAMT is an array mapped trie in which
> +elements are stored according to the initial bits of their hash values.

This is a technical detail, to be mentioned later.

I would start in a different way. How about this?

  The @code{hamt} module implements the hash array mapped trie (HAMT)
  data structure.  This is a data structure that contains (key, value)
  pairs (or just plain keys, if no value is needed).  Lookup of a
  (key, value) pair given the key is an O(1) operation, assuming a
  good hash function for the keys is employed.

  The HAMT data structure is useful when you want modifications
  (additions of pairs, removal, value changes) to be visible only to
  some part of your program, whereas other parts of the program
  continue to use the unmodified HAMT.  The HAMT makes this possible
  in a space-efficient manner: the modified and the unmodified HAMT
  share most of their allocated memory.  It is also time-efficient:
  Every such modification is O(1) on average, again assuming a good
  hash function for the keys

For part 2:

> +A HAMT can be used whenever an ordinary hash table would be used.  It
> +does however, provide non-destructive updating operations without the
> +need to copy the whole container  On the other hand, a hash table is
> +simpler so that its performance may be better when persistence is not
> +needed.
> +
> +For example, a HAMT can be used to model the dynamic environment in a
> +LISP interpreter.  Updating a value in the dynamic environment of one
> +continuation frame would not modify values in earlier frames.

This is good; this should come right after part 1.

Now part 3, right after part 2:

> +To use the module, include @code{hamt.h} in your code.  The public
> +interface is documented in that header file.

OK.

And the rest are details that can come after part 3:

> +In the current implementation, each inner node of the HAMT can store up
> +to @math{32 = 2^5} elements and subtries.  Whenever a collision between
> +the initial bits of the hash values of two elements happens, the next
> +@math{5} bits of the hash values are examined and the two elements
> +pushed down one level in the trie.
> +
> +HAMTs have the same average access times as hash tables but grow and
> +shrink dynamically, so they use memory more economically and do not have
> +to be periodically resized.
> +
> +They were described and analyzed in @cite{Phil Bagwell (2000). Ideal
> +   Hash Trees (Report). Infoscience Department, École Polytechnique
> +   Fédérale de Lausanne.}
> +
> +HAMTs are well-suited to a persistent data structure, which means that
> +each updating operation (like inserting, replacing, or removing an
> +element) returns a new HAMT while leaving the original one intact.  This
> +is achieved through structure sharing, which is even safe in the
> +presence of multiple threads when the used C compiler supports atomics.
> +

By the way, I just starred at the Hamt_entry struct and wondered how to
make use of it. The test class shows it. How about extending the comment
a bit:

   Each element has to be a
   struct whose initial member is of the type Hamt_entry.
->
   Each element has to be a
   struct whose initial member is of the type Hamt_entry.  You need to
   define this struct yourself.  It will typically contain a
   Hamt_entry, a key, and optionally a value.

Bruno


reply via email to

[Prev in Thread] Current Thread [Next in Thread]