[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] hamt: New module.

From: Bruno Haible
Subject: Re: [PATCH] hamt: New module.
Date: Sat, 03 Apr 2021 21:55:45 +0200
User-agent: KMail/5.1.3 (Linux/4.4.0-206-generic; KDE/5.18.0; x86_64; ; )

Hi Marc,

> what do you think of the attempt attached below?

I believe that technical documentation of a feature should be written
to answer the questions in this order:
  1. What is it? (Short summary)
  2. When would I use it? (Use cases)
  3. How do I use it?
  4. Further details.

The idea is to get the reader understand rapidly whether they want to
use the feature or not. That is the purpose of parts 1 and 2.

For part 1:

> +The @code{hamt} module provides a persistant version of persistent hash
> +array mapped tries (HAMTs).

"A persistant version of persistent ..." ?

The term "persistent" is only explained much later.

> A HAMT is an array mapped trie in which
> +elements are stored according to the initial bits of their hash values.

This is a technical detail, to be mentioned later.

I would start in a different way. How about this?

  The @code{hamt} module implements the hash array mapped trie (HAMT)
  data structure.  This is a data structure that contains (key, value)
  pairs (or just plain keys, if no value is needed).  Lookup of a
  (key, value) pair given the key is an O(1) operation, assuming a
  good hash function for the keys is employed.

  The HAMT data structure is useful when you want modifications
  (additions of pairs, removal, value changes) to be visible only to
  some part of your program, whereas other parts of the program
  continue to use the unmodified HAMT.  The HAMT makes this possible
  in a space-efficient manner: the modified and the unmodified HAMT
  share most of their allocated memory.  It is also time-efficient:
  Every such modification is O(1) on average, again assuming a good
  hash function for the keys

For part 2:

> +A HAMT can be used whenever an ordinary hash table would be used.  It
> +does however, provide non-destructive updating operations without the
> +need to copy the whole container  On the other hand, a hash table is
> +simpler so that its performance may be better when persistence is not
> +needed.
> +
> +For example, a HAMT can be used to model the dynamic environment in a
> +LISP interpreter.  Updating a value in the dynamic environment of one
> +continuation frame would not modify values in earlier frames.

This is good; this should come right after part 1.

Now part 3, right after part 2:

> +To use the module, include @code{hamt.h} in your code.  The public
> +interface is documented in that header file.


And the rest are details that can come after part 3:

> +In the current implementation, each inner node of the HAMT can store up
> +to @math{32 = 2^5} elements and subtries.  Whenever a collision between
> +the initial bits of the hash values of two elements happens, the next
> +@math{5} bits of the hash values are examined and the two elements
> +pushed down one level in the trie.
> +
> +HAMTs have the same average access times as hash tables but grow and
> +shrink dynamically, so they use memory more economically and do not have
> +to be periodically resized.
> +
> +They were described and analyzed in @cite{Phil Bagwell (2000). Ideal
> +   Hash Trees (Report). Infoscience Department, École Polytechnique
> +   Fédérale de Lausanne.}
> +
> +HAMTs are well-suited to a persistent data structure, which means that
> +each updating operation (like inserting, replacing, or removing an
> +element) returns a new HAMT while leaving the original one intact.  This
> +is achieved through structure sharing, which is even safe in the
> +presence of multiple threads when the used C compiler supports atomics.
> +

By the way, I just starred at the Hamt_entry struct and wondered how to
make use of it. The test class shows it. How about extending the comment
a bit:

   Each element has to be a
   struct whose initial member is of the type Hamt_entry.
   Each element has to be a
   struct whose initial member is of the type Hamt_entry.  You need to
   define this struct yourself.  It will typically contain a
   Hamt_entry, a key, and optionally a value.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]