pspp-dev
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Merge the "charset" branch?


From: Ben Pfaff
Subject: Re: Merge the "charset" branch?
Date: Wed, 08 Apr 2009 21:59:04 -0700
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2 (gnu/linux)

John Darrington <address@hidden> writes:

> One small issue I've just noticed:
>
>  hash = hsh_hash_string (tocode) ^ hsh_hash_string (fromcode);
>
> Since the ^ operator is commutative, won't this create the same hash
> for complementary convertors?  It's going to be very common to have,
> say, a utf8-to-latin1 convertor and a latin1-to-utf8 convertor
> concurrently.

Good point.

I've been meaning to replace the PSPP hash functions for a while
now.  The FNV hash is not so great, and our implementations lack
a "basis" or "initval" argument that can be used to combine
hashes in a high-quality way (e.g. not XOR of their results).

So I've pushed a branch for review that fixes these problems.
It's named "hash", and here is the summary:

commit b4e3275011982e29b80589bef705fc8a0a0316dd
Author: Ben Pfaff <address@hidden>
Date:   Wed Apr 8 21:39:22 2009 -0700

    NPAR TESTS: Consistently order variables in summary statistics.
    
    The set of variables in the NPAR TESTS specs structure was ordered
    randomly, according to however the hash function happened to arrange them.
    Sort them by variable name, instead, so that they always appear in
    alphabetical order in, e.g., descriptive statistics output.
    
    The particular hash function PSPP uses now tends to order variables
    alphabetically anyhow.  The next commit changes the PSPP hash functions,
    so fixing this in advance prevents having to update any test output.

 src/language/stats/npar.q |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

commit e9c717e43278364a49b68db4718cab5c9229c8fb
Author: Ben Pfaff <address@hidden>
Date:   Wed Apr 8 21:55:31 2009 -0700

    Use Bob Jenkins lookup3 hash instead of FNV.
    
    The Jenkins lookup3 hash is superior to FNV in collision resistance,
    avalanching, and performance on systems that do not have fast
    multiplication.  It also provides a good way to combine the result of
    a previous hashing step with the current hash, using its "basis" argument.
    This commit replaces the PSPP implementation of FNV with the Jenkins
    lookup3 hash and updates all the current users.
    
    In addition, John Darrington pointed out that commit dd2e61b4a
    "Make create_iconv() properly distinguish converters by name"
    unintentionally introduced gratuitous hash collisions, by causing
    all converters where tocode and fromcode were the same to hash to
    value 0, and converters where tocode and fromcode were swapped to
    hash to the same value as each other.  Using the "basis" argument to
    the Jenkins hash properly, instead of just attempting to combine
    hash values with XOR, fixes this problem.

 src/data/attributes.c           |    6 +-
 src/data/file-handle-def.c      |   12 ++-
 src/data/file-name.c            |    6 +-
 src/data/short-names.c          |    8 +-
 src/data/value-labels.c         |    8 +-
 src/data/value.c                |    6 +-
 src/data/variable.c             |    4 +-
 src/language/stats/autorecode.c |    4 +-
 src/language/stats/crosstabs.q  |    2 +-
 src/libpspp/hash-functions.c    |  196 ++++++++++++++++++++++++++++-----------
 src/libpspp/hash-functions.h    |   12 +-
 src/libpspp/i18n.c              |    2 +-
 src/math/covariance-matrix.c    |   11 +-
 13 files changed, 180 insertions(+), 97 deletions(-)

-- 
"Premature optimization is the root of all evil."
--D. E. Knuth, "Structured Programming with go to Statements"




reply via email to

[Prev in Thread] Current Thread [Next in Thread]