Re: ASCII-folded search [was: Re: Upcoming loss of usability ...]

From: Stephen J. Turnbull
Subject: Re: ASCII-folded search [was: Re: Upcoming loss of usability ...]
Date: Thu, 18 Jun 2015 16:48:58 +0900

Eli Zaretskii writes:

 > No, it's much more complex than that.  For starters, normalization
 > won't convert u+2018 etc. to their ASCII counterparts.  The Unicode
 > Standard doesn't consider those even compatibility-equivalent.

True, but the OP asked for a "reasonable subset".  Given the context,
sure, we'd have to go beyond what NFD (or NFKD) gives, but that could
be done over time, starting with a few quotation characters (which can
probably be assembled by selecting on Unicode name).
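To make that concrete, here is a minimal sketch in Python (chosen only for brevity; none of this is Emacs code). It first checks that NFKD really does leave U+2018 alone, then assembles a small folding table by selecting on Unicode character names. The helper names `build_quote_table` and `fold_quotes` are hypothetical, and the "SINGLE"/"DOUBLE" name test is a heuristic assumption, not anything the Unicode Standard prescribes.

```python
import sys
import unicodedata

# Normalization alone does not help: NFKD leaves U+2018 unchanged.
assert unicodedata.normalize("NFKD", "\u2018") == "\u2018"


def build_quote_table():
    """Map non-ASCII quotation marks to ASCII quotes, selecting by name.

    Heuristic: any character whose Unicode name contains "QUOTATION MARK"
    is folded to '"' if the name also says DOUBLE, to "'" if it says
    SINGLE, and is left alone otherwise.
    """
    table = {}
    for cp in range(0x80, sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unnamed code points
        if "QUOTATION MARK" not in name:
            continue
        if "DOUBLE" in name:
            table[cp] = '"'
        elif "SINGLE" in name:
            table[cp] = "'"
    return table


QUOTE_TABLE = build_quote_table()


def fold_quotes(text):
    """Return a copy of text with the selected quotation marks folded."""
    return text.translate(QUOTE_TABLE)
```

For example, `fold_quotes("\u2018x\u2019")` gives `'x'` with plain ASCII apostrophes. The table could be extended over time with further character classes, which is the incremental approach suggested above.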

 > And for matching just the base characters (which is what I presume
 > is meant here by "ascii-folding"), we'd need to handle correctly
 > any number of combinations of pre-composed and decomposed character
 > sequences in both the search string and the text we search, and
 > implement that on the fly, since the buffer text obviously cannot
 > be transformed for these purposes.

That's not at all obvious, for two reasons.  (1) If the applications
producing and consuming the buffer text claim Unicode conformance, we
sure can transform it.  (2) Nobody said we have to do the
transformation in place.
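As an illustration of point (2), here is a rough Python sketch of searching a folded copy while leaving the original text untouched. The function names `fold_with_offsets` and `folded_search` are hypothetical, and the folding rule (canonically decompose each character, then drop combining marks) is an assumed stand-in for whatever heuristic would actually be chosen. Both the search string and the text go through the same folding, so pre-composed and decomposed forms match each other.

```python
import unicodedata


def fold_with_offsets(text):
    """Fold text to base characters, remembering where each came from.

    Returns (folded, offsets), where offsets[k] is the index in the
    original text of the character that produced folded[k].
    """
    folded = []
    offsets = []
    for i, ch in enumerate(text):
        for d in unicodedata.normalize("NFD", ch):
            if unicodedata.combining(d):
                continue  # drop combining marks, keep the base character
            folded.append(d)
            offsets.append(i)
    return "".join(folded), offsets


def folded_search(needle, text):
    """Find needle in text, comparing folded copies of both.

    Returns (start, end) indices into the ORIGINAL text, or None.
    """
    key, _ = fold_with_offsets(needle)
    hay, offsets = fold_with_offsets(text)
    pos = hay.find(key)
    if not key or pos < 0:
        return None
    start = offsets[pos]
    end = offsets[pos + len(key) - 1] + 1
    # Extend the match over combining marks that follow the last base char.
    while end < len(text) and unicodedata.combining(text[end]):
        end += 1
    return start, end
```

With this, searching for the pre-composed "caf\u00e9" in "x cafe\u0301 y" returns `(2, 7)`, the span of the decomposed word in the original string. The copy and offset list cost memory proportional to the text, which is the price of not transforming in place.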

 > Interested individuals can start by studying the following
 > references:

I don't think that's the place to start.  The whole idea is heuristic.
Sure, at some point we'd want to improve accuracy by applying those
TRs, but anyone who wants to do this can start with just the

I'm not offering to do this myself, so your advice is better than
mine.  But it *could* be done this way.
