aspell-devel

Re: Add option for forcing a 64bit hash on 32bit systems


From: Kevin Atkinson
Subject: Re: Add option for forcing a 64bit hash on 32bit systems
Date: Sun, 27 Dec 2020 16:11:16 -0500 (EST)
User-agent: Alpine 2.20 (DEB 67 2015-01-07)

On Sat, 26 Dec 2020, Érico Nogueira via The New Aspell Develpment Mailing List 
wrote:

On Sat Dec 26, 2020 at 5:26 PM -03, Kevin Atkinson wrote:
On Tue, 22 Dec 2020, Érico Nogueira wrote:

Aspell compiled dictionary formats are not really meant to be portable.
Compiling a dictionary is now very fast; it is even faster if checks are
disabled. One thing I will be open to is the creation of a portable
text-based format which can be compiled very quickly on startup. This will
take some refactoring, though, to make it work.

Allowing compilation at run time would allow for simpler packaging,
especially when cross compiling. So this would be very nice :)

The basic idea behind the format is just to store the sorted word list (optionally after passing it through "aspell clean", and maybe with soundslikes) in a file with a special header. The file will then be compressed using prezip (a special compression for sorted word lists) and then gzip.
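
As a rough illustration of that layout (not Aspell's actual format, which does not exist yet), the sketch below writes a sorted word list behind a header line and gzips it. The header string and the function names are hypothetical, and the prezip stage is left out; only the gzip step is shown.

```python
import gzip

# Hypothetical header line; the thread does not specify the real one.
HEADER = "aspell-portable-wordlist 1\n"


def write_portable(words, path):
    """Write a sorted, de-duplicated word list behind a header, gzip-compressed.

    The prezip stage described above is omitted in this sketch.
    """
    body = "\n".join(sorted(set(words)))
    with gzip.open(path, "wt", encoding="utf-8", newline="\n") as f:
        f.write(HEADER)
        f.write(body + "\n")


def read_portable(path):
    """Read the file back, verify the header, and return the word list."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        if f.readline() != HEADER:
            raise ValueError("not a portable word list")
        return [line.rstrip("\n") for line in f if line.strip()]
```

Keeping the list sorted is what would let a prefix-based compressor such as prezip work well before gzip is applied.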

When Aspell is given a dictionary in the portable format, it will first check whether a compiled dictionary already exists in either $XDG_CACHE_HOME/aspell/ or /var/cache/aspell/ (or maybe /usr/lib/aspell) and, if so, use that. Otherwise it will attempt to compile it to $XDG_CACHE_HOME/aspell/ and, if that fails, try compiling it to /tmp/aspell/.
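
The lookup order above could be sketched roughly as follows. This is only an illustration of the proposal, not existing Aspell behaviour: the function name and return values are hypothetical, the tentative /usr/lib/aspell location is left out, and the fallback to ~/.cache when XDG_CACHE_HOME is unset follows the XDG Base Directory convention rather than anything stated in this thread.

```python
import os
from pathlib import Path


def locate_or_plan(name, env=None):
    """Return ("use", path) for an existing compiled dictionary,
    ("compile", path) for the first writable target, or ("fail", None)."""
    env = os.environ if env is None else env
    # Assumption: fall back to ~/.cache when XDG_CACHE_HOME is unset or empty.
    xdg = Path(env.get("XDG_CACHE_HOME") or Path.home() / ".cache") / "aspell"

    # 1. Reuse an already compiled dictionary if one exists.
    for directory in (xdg, Path("/var/cache/aspell")):
        candidate = directory / name
        if candidate.is_file():
            return ("use", candidate)

    # 2. Otherwise pick the first directory we can create and write to.
    for directory in (xdg, Path("/tmp/aspell")):
        try:
            directory.mkdir(parents=True, exist_ok=True)
        except OSError:
            continue
        if os.access(directory, os.W_OK):
            return ("compile", directory / name)
    return ("fail", None)
```

A caller would compile the portable file to the returned path on "compile" and simply load it on "use".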

I am not sure when I will get around to doing this.  Patches are welcome. :)

The only compatibility option offered is forcing 32bit hashes for all
systems, which makes 64bit systems incapable of reading 64bit dictionaries.

To be clear, the only thing this does is change the type the hash function
uses from size_t to u32int in modules/speller/default/readonly_ws.cpp:
<< ...
All integers used in the dictionary are 32 bit, as 64-bit integers would be
overkill. The fact that a 64-bit hash function is used with 32-bit integers
is an oversight. I would rather that a 32-bit hash function is used on all
systems. The only reason that option exists is to avoid breaking dictionary
compatibility on 64-bit systems. I am open to enabling 32-bit hashes by
default on the next major version bump.
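
To see why the width of the hash type matters for compatibility, here is a small sketch. The hash below is a stand-in multiplicative string hash, not necessarily the one Aspell uses, and the function name is hypothetical; the point is only that the same arithmetic wrapped at 32 bits and at 64 bits yields different values once a word is long enough to overflow 32 bits.

```python
def toy_hash(word, bits):
    """Stand-in multiplicative string hash, wrapped to `bits` bits.

    bits=32 models a u32int result, bits=64 models size_t on a 64-bit system.
    """
    mask = (1 << bits) - 1
    h = 0
    for byte in word.encode("utf-8"):
        h = (h * 5 + byte) & mask
    return h


def bucket(word, bits, n_buckets):
    """Bucket index a hash table would derive from the hash value."""
    return toy_hash(word, bits) % n_buckets


short, long_word = "a", "internationalization"
print(toy_hash(short, 32) == toy_hash(short, 64))          # short words agree
print(toy_hash(long_word, 32) == toy_hash(long_word, 64))  # long words do not
```

Because the bucket index is derived from the hash value, a table laid out with one width will generally not be searchable with the other unless the bucket count happens to hide the difference, which is why a dictionary has to be read with the same hash width it was compiled with.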

Note that on Debian Aspell is compiled with --enable-32-bit-hash-fun.

I should also note that Debian compiles the dictionary as part of the post-install process. This saves a lot of space (often an order of magnitude) and avoids having to package architecture-specific dictionary packages.

I was looking into simplifying how we build the dictionaries in Void
Linux.  Since our (32-bit) ARM packages are cross compiled from 64-bit
hosts, including aspell dictionaries, I thought standardizing on a
64-bit format would be best. It might make sense, then, to clear up the
explanation in [1], since it isn't clear that 32-bit hashes are actually
preferred. Given your explanation, I will simply force the 32-bit hashes
for all platforms, which is definitely simpler.

How about if I add something like this:

  A 32-bit hash function is preferred as the hash table uses 32-bit integers on
  all platforms. Future versions of Aspell may default to forcing a 32-bit
  hash function.

Kevin
