bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [New module] Persistent Hash Array Mapped Tries (HAMTs)


From: Marc Nieper-Wißkirchen
Subject: Re: [New module] Persistent Hash Array Mapped Tries (HAMTs)
Date: Sun, 11 Oct 2020 10:20:37 +0200

Am So., 11. Okt. 2020 um 03:28 Uhr schrieb Bruno Haible <bruno@clisp.org>:
>
> Hi Marc,
>
> Some comments:
>
> * GL_HAMT_THREAD_SAFE can be defined to 1 not only if
>     __STDC_VERSION__ >= 201112 && ! defined __STD_NO_ATOMICS__
>   but also if
>     __GNUC__ + (__GNUC_MINOR >= 9) > 4

Fixed.

>   (see the other mail).
>
> * Still '#ifdef GL_HAMT_THREAD_SAFE'.

Fixed.

> * Typo s/comparision/comparison/

Fixed.

> * Hamt_processor: The comment does not say what the return value means. Maybe 
> it
>   would be clearer if this type was moved down to the "Iteration" section.

Moved.

> * hamt_lookup: If the caller is allowed to modify the payload stored in the
>   returned entry, this function should return a 'Hamt_entry *', not a
>   'const Hamt_entry *'. Just like 'strchr' and 'strstr' return a 'char *',
>   not a 'const char *'.

Unless the caller knows what it does, modifying the payload is not a
good idea because the entry is shared between different hamts. If the
caller really wants to modify the payload, it has to do an explicit
type cast (which is safe).

> * In trienode_count, you don't need count_one_bits_l, only count_one_bits.
>   It's OK to assume that 'uint32_t' and 'unsigned int' are the same type.

Okay.

> * If subtrie_lookup was inlined in entry_lookup, you could get rid of the
>   forward declaration of entry_lookup, and compilers would be more likely
>   to turn the recursion of entry_lookup into a loop (tail-recursion
>   elimination).

I would prefer to have smaller functions. Are there any relevant
compilers who do not inline the function at higher recursion levels?

> * If subtrie_do_while was inlined in entry_do_while, you could get rid of the
>   forward declaration of entry_do_while.

> * It would be useful to be able to construct modules 'hamt-set' and 
> 'hamt-map',
>   analogous to 'hash-set' and 'hash-map', but the hamt API currently has two
>   issues that block it:
>
>   1) The iterator API is such that you cannot write the iteration using a 
> loop;
>      it involves a function call that is active during the entire iteration.
>      This is also problematic for two other reasons:
>
>        - Creating a closure in C (unlike Scheme or Lisp) is tedious hand-work.
>          (The GNU C extension that allows local functions inside functions [1]
>          is not a solution, because
>            - it is unportable,
>            - it forces an executable stack on many platforms, which is
>              considered bad for security.)
>
>        - In Common Lisp, iteration through a hash table is available not only
>          through a function-call interface (MAPHASH [2]) but also through an
>          iteration-loop interface (WITH-HASH-TABLE-ITERATOR [3], LOOP [4]).
>
>          For example, in GNU clisp, LOOP expands to
>
>          [1]> (macroexpand-1 '(loop for x being each hash-key of h do (foo 
> x)))
>          (MACROLET ((LOOP-FINISH NIL (SYSTEM::LOOP-FINISH-ERROR)))
>           (BLOCK NIL
>            (LET ((#:WHTI-3214 (SYSTEM::HASH-TABLE-ITERATOR H)))
>             (MULTIPLE-VALUE-BIND (#:MORE?3215 #:HASH-KEY-3216 
> #:HASH-VALUE-3217)
>              (SYSTEM::HASH-TABLE-ITERATE #:WHTI-3214)
>              (DECLARE (IGNORE #:HASH-VALUE-3217))
>              (LET ((X NIL))
>               (LET NIL
>                (MACROLET ((LOOP-FINISH NIL '(GO SYSTEM::END-LOOP)))
>                 (TAGBODY SYSTEM::BEGIN-LOOP (UNLESS #:MORE?3215 (LOOP-FINISH))
>                  (SETQ X #:HASH-KEY-3216) (PROGN (PROGN (FOO X)))
>                  (MULTIPLE-VALUE-SETQ
>                   (#:MORE?3215 #:HASH-KEY-3216 #:HASH-VALUE-3217)
>                   (SYSTEM::HASH-TABLE-ITERATE #:WHTI-3214))
>                  (GO SYSTEM::BEGIN-LOOP) SYSTEM::END-LOOP
>                  (MACROLET
>                   ((LOOP-FINISH NIL (SYSTEM::LOOP-FINISH-WARN)
>                     '(GO SYSTEM::END-LOOP)))))))))))) ;
>
>          You can see that there is an initial function call 
> HASH-TABLE-ITERATOR
>          upfront, and then a function call to HASH-TABLE-ITERATE in each round
>          of the loop.
>
>          This principle makes it possible to iterate e.g. through a hash table
>          and a list in parallel.

In Scheme, it is even more relevant because you have call/cc so any
iteration can be suspended and restored any number of times.

I thought this could be solved by storing a "next" pointer in the
payload but this doesn't work with hamts sharing entries... So I will
add some iterator interface as well. Preferably one that can also be
used to implement Hamts in Scheme.

>   2) We have a dilemma regarding use of malloc vs. xmalloc. While modules
>      meant for use in programs can readily use xmalloc, modules meant for use
>      (also) in libraries cannot use xmalloc, since a library is not supposed
>      to terminate the program, even when memory is tight.

GMP more or less has this behavior.

>      The solution we found for this dilemma is that the *-set and *-map 
> modules
>      use just malloc, and we have thin wrapper modules (xset, xmap) that do
>      call xalloc_die() in case of out-of-memory.
>
>      Unfortunately, the API of many of the functions need to be adjusted to
>      cope with the possibility of an ENOMEM failure. That is tedious work, and
>      the code does not look so pretty afterwards... But I see no other way to
>      make it fit for use in libraries.

The bigger problem is that it mustn't make the code slower for the
99,99999% of the use cases where there is either enough memory or
calling xalloc_die is the only reasonable action.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]