Re: case-insensitive hash of strings

From: Bruno Haible
Subject: Re: case-insensitive hash of strings
Date: Tue, 21 Aug 2007 22:59:01 +0200
User-agent: KMail/1.5.4


> A couple of questions.  First, in hash-pjw.c, should we be using unsigned
> char instead of char to iterate through the NUL-terminated string?

I believe it should usually have no effect on the average number of
collisions (= average length of a non-empty hash bucket), but I would be
more comfortable with this change if you could post some concrete figures.
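
One way to obtain such figures is sketched below. It is only a sketch: the
rotate-and-add step is assumed to be in the style of hash-pjw.c rather than
copied from it, and the names hash_signed, hash_unsigned and
average_bucket_length are hypothetical. The signed variant uses signed char
explicitly, because the signedness of plain char is implementation-defined.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define SIZE_BITS (sizeof (size_t) * CHAR_BIT)

/* Rotate-and-add step (assumed, in the style of a PJW-like hash).  */
static size_t
hash_step (size_t h, size_t c)
{
  return c + ((h << 9) | (h >> (SIZE_BITS - 9)));
}

/* Bytes read as signed char: bytes >= 0x80 are negative and get
   sign-extended when converted to size_t.  */
size_t
hash_signed (const char *x, size_t tablesize)
{
  const signed char *s = (const signed char *) x;
  size_t h = 0;

  for (; *s; s++)
    h = hash_step (h, (size_t) *s);
  return h % tablesize;
}

/* Bytes read as unsigned char: every byte contributes 0..UCHAR_MAX.  */
size_t
hash_unsigned (const char *x, size_t tablesize)
{
  const unsigned char *s = (const unsigned char *) x;
  size_t h = 0;

  for (; *s; s++)
    h = hash_step (h, (size_t) *s);
  return h % tablesize;
}

typedef size_t (*hash_fn) (const char *, size_t);

/* Average number of keys per non-empty bucket, or -1.0 if the
   counter array cannot be allocated.  */
double
average_bucket_length (hash_fn hash, const char *const *keys,
                       size_t nkeys, size_t tablesize)
{
  size_t *counts = calloc (tablesize, sizeof *counts);
  size_t used = 0;
  size_t i;

  if (counts == NULL)
    return -1.0;
  for (i = 0; i < nkeys; i++)
    if (counts[hash (keys[i], tablesize)]++ == 0)
      used++;
  free (counts);
  return used ? (double) nkeys / (double) used : 0.0;
}
```

The two variants agree on pure ASCII input and can differ only when a key
contains bytes >= 0x80, so a meaningful comparison needs a key set with
non-ASCII strings, passed to average_bucket_length once with each variant.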

I would assume that the gcc-generated machine code for both cases is equally

> Second, would it be worth adding a case-insensitive version of hash_pjw,
> so that strings can be hashed to the same value regardless of their case?
>  It only makes sense for single-byte locales, but that's all the more that
> hash_pjw accommodates at the moment.

The majority of locales in use nowadays are multibyte locales (UTF-8,
GB18030 and EUC-*). Therefore I would concentrate on a solution that works
for both kinds of locales.
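
A minimal sketch of what such a solution could look like, assuming the
standard mbrtowc and towlower functions are acceptable: decode one
(possibly multibyte) character at a time, lowercase it, and feed the result
into the same kind of rotate-and-add step. The name hash_pjw_nocase is
hypothetical, and the mixing step is an assumption, not taken from
hash-pjw.c.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

#define SIZE_BITS (sizeof (size_t) * CHAR_BIT)

/* Hypothetical case-insensitive hash for the current locale's encoding.
   Works for single-byte and multibyte locales alike.  */
size_t
hash_pjw_nocase (const char *x, size_t tablesize)
{
  mbstate_t state;
  size_t remaining = strlen (x);
  size_t h = 0;

  memset (&state, 0, sizeof state);
  while (remaining > 0)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, x, remaining, &state);
      size_t c;

      if (n == (size_t) -1 || n == (size_t) -2)
        {
          /* Invalid or incomplete sequence: hash the raw byte and
             restart decoding from the initial state.  */
          c = (unsigned char) *x;
          n = 1;
          memset (&state, 0, sizeof state);
        }
      else
        {
          if (n == 0)
            n = 1;  /* defensive: a decoded NUL still consumes one byte */
          c = (size_t) towlower ((wint_t) wc);
        }

      h = c + ((h << 9) | (h >> (SIZE_BITS - 9)));
      x += n;
      remaining -= n;
    }
  return h % tablesize;
}
```

The caller has to have called setlocale (LC_ALL, "") for the multibyte
decoding to follow the user's locale; without it the "C" locale applies.
Note also that towlower maps character by character, so it does not
implement full case folding where one character corresponds to several.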

