bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [New module] Persistent Hash Array Mapped Tries (HAMTs)


From: Bruno Haible
Subject: Re: [New module] Persistent Hash Array Mapped Tries (HAMTs)
Date: Sun, 11 Oct 2020 03:28:33 +0200
User-agent: KMail/5.1.3 (Linux/4.4.0-189-generic; KDE/5.18.0; x86_64; ; )

Hi Marc,

Some comments:

* GL_HAMT_THREAD_SAFE can be defined to 1 not only if
    __STDC_VERSION__ >= 201112 && ! defined __STD_NO_ATOMICS__
  but also if
    __GNUC__ + (__GNUC_MINOR >= 9) > 4
  (see the other mail).

* Still '#ifdef GL_HAMT_THREAD_SAFE'.

* Typo s/comparision/comparison/

* Hamt_processor: The comment does not say what the return value means. Maybe it
  would be clearer if this type was moved down to the "Iteration" section.

* hamt_lookup: If the caller is allowed to modify the payload stored in the
  returned entry, this function should return a 'Hamt_entry *', not a
  'const Hamt_entry *'. Just like 'strchr' and 'strstr' return a 'char *',
  not a 'const char *'.

* In trienode_count, you don't need count_one_bits_l, only count_one_bits.
  It's OK to assume that 'uint32_t' and 'unsigned int' are the same type.

* If subtrie_lookup was inlined in entry_lookup, you could get rid of the
  forward declaration of entry_lookup, and compilers would be more likely
  to turn the recursion of entry_lookup into a loop (tail-recursion
  elimination).

* If subtrie_do_while was inlined in entry_do_while, you could get rid of the
  forward declaration of entry_do_while.

* It would be useful to be able to construct modules 'hamt-set' and 'hamt-map',
  analogous to 'hash-set' and 'hash-map', but the hamt API currently has two
  issues that block it:

  1) The iterator API is such that you cannot write the iteration using a loop;
     it involves a function call that is active during the entire iteration.
     This is also problematic for two other reasons:

       - Creating a closure in C (unlike Scheme or Lisp) is tedious hand-work.
         (The GNU C extension that allows local functions inside functions [1]
         is not a solution, because
           - it is unportable,
           - it forces an executable stack on many platforms, which is
             considered bad for security.)

       - In Common Lisp, iteration through a hash table is available not only
         through a function-call interface (MAPHASH [2]) but also through an
         iteration-loop interface (WITH-HASH-TABLE-ITERATOR [3], LOOP [4]).

         For example, in GNU clisp, LOOP expands to

         [1]> (macroexpand-1 '(loop for x being each hash-key of h do (foo x)))
         (MACROLET ((LOOP-FINISH NIL (SYSTEM::LOOP-FINISH-ERROR)))
          (BLOCK NIL
           (LET ((#:WHTI-3214 (SYSTEM::HASH-TABLE-ITERATOR H)))
            (MULTIPLE-VALUE-BIND (#:MORE?3215 #:HASH-KEY-3216 #:HASH-VALUE-3217)
             (SYSTEM::HASH-TABLE-ITERATE #:WHTI-3214)
             (DECLARE (IGNORE #:HASH-VALUE-3217))
             (LET ((X NIL))
              (LET NIL
               (MACROLET ((LOOP-FINISH NIL '(GO SYSTEM::END-LOOP)))
                (TAGBODY SYSTEM::BEGIN-LOOP (UNLESS #:MORE?3215 (LOOP-FINISH))
                 (SETQ X #:HASH-KEY-3216) (PROGN (PROGN (FOO X)))
                 (MULTIPLE-VALUE-SETQ
                  (#:MORE?3215 #:HASH-KEY-3216 #:HASH-VALUE-3217)
                  (SYSTEM::HASH-TABLE-ITERATE #:WHTI-3214))
                 (GO SYSTEM::BEGIN-LOOP) SYSTEM::END-LOOP
                 (MACROLET
                  ((LOOP-FINISH NIL (SYSTEM::LOOP-FINISH-WARN)
                    '(GO SYSTEM::END-LOOP)))))))))))) ;

         You can see that there is an initial function call HASH-TABLE-ITERATOR
         upfront, and then a function call to HASH-TABLE-ITERATE in each round
         of the loop.

         This principle makes it possible to iterate e.g. through a hash table
         and a list in parallel.

  2) We have a dilemma regarding use of malloc vs. xmalloc. While modules
     meant for use in programs can readily use xmalloc, modules meant for use
     (also) in libraries cannot use xmalloc, since a library is not supposed
     to terminate the program, even when memory is tight.

     The solution we found for this dilemma is that the *-set and *-map modules
     use just malloc, and we have thin wrapper modules (xset, xmap) that do
     call xalloc_die() in case of out-of-memory.

     Unfortunately, the API of many of the functions need to be adjusted to
     cope with the possibility of an ENOMEM failure. That is tedious work, and
     the code does not look so pretty afterwards... But I see no other way to
     make it fit for use in libraries.

Bruno

[1] https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Nested-Functions.html
[2] 
http://www.ai.mit.edu/projects/iiip/doc/CommonLISP/HyperSpec/Body/fun_maphash.html
[3] 
http://www.ai.mit.edu/projects/iiip/doc/CommonLISP/HyperSpec/Body/mac_with-hash_ble-iterator.html
[4] 
http://www.ai.mit.edu/projects/iiip/doc/CommonLISP/HyperSpec/Body/sec_6-1-2-1-6.html




reply via email to

[Prev in Thread] Current Thread [Next in Thread]