[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC] Adding a real HashTable implementation to gnulib

From: Tim Rühsen
Subject: Re: [RFC] Adding a real HashTable implementation to gnulib
Date: Mon, 26 Aug 2019 16:24:01 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0

Hi Bruno,

sorry for the delay. This somewhat dropped off the tip of my list, but
lurks at the edge of my consciousness, coming up from time to time.

On 12/5/18 12:43 AM, Bruno Haible wrote:
> Hi Tim,
>> Currently we have
>> - positive integer (N): resize by old value * N
>> - negative integer (N): resize by old value + (-N)
> It's currently the opposite:
> - positive integer (N): resize to old size + N
> - negative integer (N): resize to old size * (-N)
>>>   - The ability to set a growth policy > 0. This leads to quadratic
>>>     run time behaviour. I mean, you make such an effort to have O(1)
>>>     access on average, and then the fixed-size increment of the size
>>>     turns the insertion operation into an average-O(n) operation.
>> In Wget2 we use reasonable default sizes for the hashmap to make
>> resizing more or less a corner case.
> The problem is that when you guessed wrong about the "corner case",
> the rehash will consume 99.9% of your program's run time.
>> How can we do it better ?
> 1. Remove the ability to "resize to old size + N",
> 2. (optional) When "resize to old size * FACTOR", allow the factor
>    to be 1.5 or 1.3 or something like that.

That's what I did meanwhile, FACTOR is now a float and that's it.

>>>   - Some functions take keysize and valuesize arguments, some don't.
>>>     I'm confused.
>> The "noalloc" functions just store the given pointers for key and value,
>> that means without allocation. The hashmap takes ownership of the
>> objects and has to release them when being removed/destroyed (or use a
>> NULL destructor when the values are static). This can be used to
>> generate a search index for an existing bunch of values (objects /
>> structures). Like a "unique index" for a field in a database table.
>> Example function: wget_hashmap_put_noalloc()
>> Then we have the functions that do (shallow) allocations/clones, like
>> wget_hashmap_put(). This is for using the hashmap as a container, the
>> original objects have to be (shallow) released outside of the container.
> Hmm. What you describe here is whether ownership is passed to the container
> or kept outside the container. My question was about the keysize and
> valuesize arguments, that is, about the nature of the key and value objects
> (a. a pointer, pointing to anything in memory, or
>  b. a region of memory, with a start address and an end address).
> I think both questions are orthogonal:
>   - In either a or b, you can keep the ownership outside the container
>     by using a NULL destructor function.
>   - For container-owned objects:
>     In case a, you need a clone or copy function indeed,
>     In case b, the container needs to clone precisely the given region of
>     memory.
> But I haven't thought long about it. What do you think?

I thought a while about this and finally decided to leave memory
allocation outside the container, in the responsibility of the caller.

That reduces the number of functions, makes things easier to understand,
reduce overall code. That moves the API usage more into the direction of
a no-brainer. The additional code complexity in the application (wget2)
was much less than I feared.

Thank you for your hints / thoughts and please excuse me for taking so
long to answer.

Regards, Tim

Attachment: signature.asc
Description: OpenPGP digital signature

reply via email to

[Prev in Thread] Current Thread [Next in Thread]