bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: RFC: modules for generic unordered sets and mappings


From: Paul Eggert
Subject: Re: RFC: modules for generic unordered sets and mappings
Date: Thu, 01 Jul 2010 17:44:46 -0700
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100423 Thunderbird/3.0.4

I've just gone through "du", and it's fresh in my
mind, so I thought I'd run through what I needed there, to see how
well it maps to the proposal.

I needed three kinds of hash tables.

* Table A is indexed by device number (a dev_t value) and the value is a
  secondary hash table.  Typically the number of devices is relatively
  small.

* Table B is indexed by inode number (an ino_t value) and has no value.
  All that matters is whether the key is present.  So it is a set,
  not a general mapping.  Often there are many, many inode numbers.

* The gnulib hash package uses void * keys, but having a void * value
  that exists only to point to an ino_t value wastes memory.  So, if
  an inode number is small (less than M, say), it maps to itself.  If
  it is large, it is a key into a third table (table C) whose values
  are mapped inode numbers.  (Values in table C are always M or
  greater.)  The mapped inode number (whether taken from table C, or
  used directly) is cast to void * and then used as the index into
  table B.  In practice, most inode numbers are less than M so this is
  a storage win.

All three tables are add-only; nothing is ever removed from any of them.
The tables never need to be freed (although "du" currently does free them,
to debug memory leaks in the underlying hash table implementation).

All that I really needed, by the way, was a single set, indexed by
dev+ino, saying whether the dev+ino key was present.  Doing this in
the obvious way, though, chewed up too much storage, much of it
redundant, as the device numbers were typically repeated.  Hence
the two-level Table A and Table B above.

(It would be helpful if sets of integers were implemented more
efficiently than simple hash tables using integer keys, but now I
realize I'm asking for quite a bit so perhaps I should quit while I'm
ahead.)

> There are two flavours, depending on the key type: either a pointer
> (no copying involved upon insertion), or a memory block of varying size
> (copied during insertion).

For du's inode-keyed tables, I'd like a key type that is an integer
(not a pointer or a memory block).  With the above implementation, a
size_t key is enough for du's needs.  uintptr_t would also be a
reasonable candidate, if you'd rather have the base code use uintptr_t
and the void * version be a thin wrapper around that.

If uintmax_t keys were supported then that would be simpler for du (it
wouldn't need table C), but this would most likely be a performance
hit on machines with 32-bit addresses.

>    Here the search method returns a void *, with value (void*)-1
>    denoting "not found". Hmm, or should the search method better take a
>    'bool *' argument???

Whatever it is, it should be simple and fast.

Also, there needs to be a fast and easy way to say "I want to add this
to the table, but only if it's not there already, and if it is there
already I want you to tell me what value is already there, and I want
to be able to tell that I didn't add anything to the table".




reply via email to

[Prev in Thread] Current Thread [Next in Thread]