[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Comparing two hash tables for equality?
From: |
Mark H Weaver |
Subject: |
Re: Comparing two hash tables for equality? |
Date: |
Sun, 26 Aug 2018 18:49:00 -0400 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) |
Hi Aleksandar,
Aleksandar Sandic <address@hidden> writes:
> I have been looking through the reference manual, but there does not seem to
> be a procedure for comparing two hash tables for equality. The procedure
> 'equal?' only returns true if the two tables are also 'eq?':
>
> scheme@(guile-user)> (define h1 (make-hash-table))
> scheme@(guile-user)> (define h2 (make-hash-table))
> scheme@(guile-user)> (equal? h1 h2)
> $1 = #f
> scheme@(guile-user)> (equal? h1 h1)
> $2 = #t
>
> Is this the intended behaviour?
Yes. That might seem surprising. I have a few things to say on that,
in no particular order:
An equality test on hash tables needs to know how to compare the keys
and how to compare the values. There's no way to pass those additional
arguments to 'equal?', so it can't do that job.
Hash table implementations typically don't offer an equality test on the
hash tables themselves, and I don't recall anyone ever asking for such
an operation before now. I guess that's because in the use case where
hash tables are beneficial -- when the tables are very large --
comparing them for equality is expensive.
While hash tables are indispensible for certain use cases, they also
have very significant downsides, and I tend to avoid them whenever
possible. Their most severe downside, in my opinion, is that they are
fundamentally incompatible with a functional programming style. They
invariably force all code that uses them to be written in an imperative
style. I cannot stress enough how important this is.
It's also not possible to efficiently create a new hash table based on
an existing one, but with one or more additional entries. In Scheme,
you can efficiently 'cons' or 'append' some entries onto the front of an
existing list without modifying the original list. That includes
association lists. To do the same with a hash table, you must make a
copy of the entire hash table and then add the new entries to the copy.
There can be no sharing of storage between multiple hash tables, as can
be done with lists, association lists, or balanced trees. Even if you
don't need sharing, hash tables also use more space than those other
data structures.
So, I would ask yourself whether the benefits of hash tables truly
outweigh the downsides in your particular use case.
If you're comfortable sharing details, I'd be curious to hear what
you're trying to accomplish here, at a higher level. Maybe there's a
better way.
Anyway, if you really need to compare hash tables for equality, here's
some suggested code that's simpler than the code you wrote, and should
be quite a bit more efficient as well. In particular, this code
performs no heap allocation:
--8<---------------cut here---------------start------------->8---
(use-modules (ice-9 match))
(define (hash-table-subset? val=? h1 h2)
"H1 and H2 must be hash tables which use equal? as the equality test
for keys. Return #t if and only if for every entry (K . V1) in H1,
there exists an entry (K . V2) in H2 such that (VAL=? V1 V2),
otherwise return #f."
(hash-fold (lambda (k v1 equal-so-far)
(and equal-so-far
(match (hash-get-handle h2 k)
((_ . v2) (val=? v1 v2))
(#f #f))))
#t
h1))
(define (hash-table=? val=? h1 h2)
"H1 and H2 must be hash tables which use equal? as the equality test
for keys. Return #t if H1 and H2 have the same keys (in the sense of
equal?) and each key has the same value (in the sense of val=?),
otherwise return #f."
(and (hash-table-subset? val=? h1 h2)
(hash-table-subset? val=? h2 h1)))
(define (equal*? a b)
(or (equal? a b)
(and (hash-table? a)
(hash-table? b)
(hash-table=? equal*? a b))))
--8<---------------cut here---------------end--------------->8---
You might not need 'equal*?' here. I included it only to match more
closely the behavior of your code.
For most purposes, 'hash-table=?' is actually more flexible than
'equal*?', as long as the "type" of values stored in a given hash table
are known. If the values stored in the hash table are numbers, then
'val=?' should be '='. If the values are themselves hash tables, then
'val=?' should be (lambda (a b) (hash-table=? foo=? a b)) for some value
of 'foo=?'. In some other cases, 'val=?' should be 'equal?'.
If you truly need hash tables whose values could contain either hash
tables or some other type, then something like 'equal*?' might be the
right tool. However, keep in mind that 'equal?' is usually not the
desired equality test for numbers, because (equal? 1 1.0) => #false and
(equal? +0.0 -0.0) => #false. 'equal?' tests for _operational_
equivalence, which makes distinctions important for memoization and some
other purposes, whereas '=' tests for mathematical equality and is
usually what you want for numbers.
Regards,
Mark