Re: ASCII-folded search [was: Re: Upcoming loss of usability ...]

From: Eli Zaretskii
Subject: Re: ASCII-folded search [was: Re: Upcoming loss of usability ...]
Date: Thu, 18 Jun 2015 08:27:03 +0300

> From: "Stephen J. Turnbull" <address@hidden>
> Date: Thu, 18 Jun 2015 13:52:49 +0900
> Cc: address@hidden
> Marcin Borkowski writes:
>  > On the other hand, it would be great if we had an "ascii-folding"
>  > option, making (some reasonable subset of) Unicode "equivalent" to
>  > ASCII,
> I believe Emacs already implements NFD normalization.

Yes, see ucs-normalize-NFD-region and friends.

> All you need after that is to skip compose characters when
> searching.

No, it's much more complex than that.  For starters, normalization
won't convert U+2018 (LEFT SINGLE QUOTATION MARK) etc. to their ASCII
counterparts.  The Unicode Standard doesn't consider those even
compatibility-equivalent.  And
for matching just the base characters (which is what I presume is
meant here by "ascii-folding"), we'd need to handle correctly any
number of combinations of pre-composed and decomposed character
sequences in both the search string and the text we search, and
implement that on the fly, since the buffer text obviously cannot be
transformed for these purposes.
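
To make the two problems above concrete, here is a minimal sketch in
Python (not Emacs Lisp, and not how Emacs does or would do it).  The
helper names fold and fold_search are hypothetical.  It assumes that
"matching just the base characters" means dropping every character
with a non-zero canonical combining class after canonical
decomposition, and it folds a copy of the text while keeping a map
back to the original offsets, so the searched text itself is never
modified.  A real implementation would have to do this incrementally
rather than fold the whole text on each search.

```python
import unicodedata


def fold(text):
    """Return (folded, index_map).

    folded holds only the base characters of text; index_map[k] is the
    offset in text of the character that produced folded[k].
    """
    folded = []
    index_map = []
    for i, ch in enumerate(text):
        # Decompose one character at a time so offsets stay traceable.
        for d in unicodedata.normalize("NFD", ch):
            if unicodedata.combining(d) == 0:  # keep base characters only
                folded.append(d)
                index_map.append(i)
    return "".join(folded), index_map


def fold_search(needle, haystack):
    """Find needle in haystack, ignoring combining marks on both sides.

    Returns (start, end) offsets into the original haystack, or None.
    """
    folded_needle, _ = fold(needle)
    folded_haystack, index_map = fold(haystack)
    if not folded_needle:
        return None
    pos = folded_haystack.find(folded_needle)
    if pos < 0:
        return None
    start = index_map[pos]
    end = index_map[pos + len(folded_needle) - 1] + 1
    # Extend the match over combining marks that follow the last base char.
    while end < len(haystack) and unicodedata.combining(haystack[end]):
        end += 1
    return start, end


# Decomposed text, precomposed text, and a plain ASCII search string
# all match one another:
decomposed = "re\u0301sume\u0301 text"
precomposed = "a r\u00e9sum\u00e9"
print(fold_search("resume", decomposed))
print(fold_search("resume", precomposed))
print(fold_search("r\u00e9sum\u00e9", "resume"))

# U+2018 has no decomposition, not even a compatibility one, so
# normalization alone never yields the ASCII apostrophe:
print(unicodedata.normalize("NFKD", "\u2018") == "\u2018")
print(fold_search("'it'", "\u2018it\u2019"))
```

Handling quotation marks and similar characters would need an
explicit mapping table on top of this, since no normalization form
supplies one.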

So yes, this feature is sorely needed, but volunteers need to know
that the task is far from easy (or else it would have been done long
ago).  Interested individuals can start by studying the following
references:

  . Sections 5.18 "Case Mappings" and 5.19 "Mapping Compatibility
    Variants" of the Unicode Standard

  . UTN#5 "Canonical Equivalence in Applications"

  . UAX#15 (formerly UTR#15) "Unicode Normalization Forms"
