Re: Upcoming loss of usability of Emacs source files and Emacs.

From: Eli Zaretskii
Subject: Re: Upcoming loss of usability of Emacs source files and Emacs.
Date: Thu, 18 Jun 2015 10:41:02 +0300

> Date: Thu, 18 Jun 2015 09:08:06 +0200
> From: Ulrich Mueller <address@hidden>
> >>>>> On Thu, 18 Jun 2015, Eli Zaretskii wrote:
> >> ;; Ignore accent and umlaut marks when searching.
> >> ;; Works for Emacs 19.30 and later.
> >> (let ((eqv-list '("aAàÀáÁâÂãÃäÄåÅ"
> >>              "cCçÇ"
> >>              "eEèÈéÉêÊëË"
> >>              "iIìÌíÍîÎïÏ"
> >>              "nNñÑ"
> >>              "oOòÒóÓôÔõÕöÖøØ"
> >>              "uUùÙúÚûÛüÜ"
> >>              "yYýÝÿ"))
> >>       (table (standard-case-table))
> >>       canon)
> >>   (setq canon (copy-sequence table))
> >>   (mapcar (lambda (s)
> >>        (mapcar (lambda (c) (aset canon c (aref s 0))) s))
> >>      eqv-list)
> >>   (set-char-table-extra-slot table 1 canon)
> >>   (set-char-table-extra-slot table 2 nil)
> >>   (set-standard-case-table table))

Btw, the above doesn't work at all for me in Emacs 25: searching for
'a' doesn't find the variants with diacriticals.  Maybe I didn't use
it correctly -- is something else required beyond evaluating the
expression and making sure I-search does a case-insensitive search?

> > Also, this doesn't handle decomposed characters, as in 'å' written
> > as 'a' followed by U+030A COMBINING RING ABOVE.  So this is not
> > really Unicode-compliant, it's a half-measure of sorts.
>
> The above code snippet predates Unicode Emacs, so you cannot expect it
> to handle NFC and NFD and other intricacies of Unicode normalisation.
> (Also I've never seen anything other than the NFC forms, e.g., for
> German umlauts, in the texts that I usually edit.)

Mac OS X's HFS+ filesystem stores file names in (a variant of) NFD,
AFAIK.
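
To make the NFC/NFD distinction concrete, here is a minimal sketch.  It
is written in Python only because its standard `unicodedata` module
exposes the normalization forms directly; it is an illustration, not
Emacs code, and the variable names are made up for the example.

```python
import unicodedata

# 'å' as a single precomposed code point (the NFC form).
precomposed = "\u00e5"

# The NFD form splits it into a base letter plus a combining mark.
decomposed = unicodedata.normalize("NFD", precomposed)

print([hex(ord(c)) for c in decomposed])   # ['0x61', '0x30a']
print(precomposed == decomposed)           # False: different code points
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

The two strings render identically, but a search that compares code
points one by one treats them as different text.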

And diacriticals are only the tip of the iceberg.  E.g., when you
search for 'n', won't you want to find 'ⁿ' and '🄝' as well, at least
sometimes, and likewise with '²' and '⒉' and '🄃' when looking for '2'?
These require support for compatibility decompositions, not just for
canonical decompositions as in the case of diacriticals.
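
The difference between the two kinds of decomposition can be sketched
as follows.  Again this is Python rather than Emacs Lisp, used only
because `unicodedata` makes the Unicode tables easy to query; the
helper name `base_chars` is hypothetical.

```python
import unicodedata

def base_chars(text, form):
    """Decompose TEXT with the given normalization form and drop
    combining marks, leaving only the base characters."""
    decomposed = unicodedata.normalize(form, text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

# Canonical decomposition (NFD) is enough for diacriticals ...
print(base_chars("\u00e5", "NFD"))    # a

# ... but superscript n (U+207F) has no canonical decomposition,
# only a compatibility one, so NFD leaves it untouched.
print(base_chars("\u207f", "NFD") == "\u207f")   # True
print(base_chars("\u207f", "NFKD"))   # n
print(base_chars("\u00b2", "NFKD"))   # 2

# Some compatibility decompositions yield more than one character:
# U+2489 DIGIT TWO FULL STOP becomes "2" followed by ".".
print(base_chars("\u2489", "NFKD"))   # 2.
```

The last case shows that a compatibility-aware search cannot assume a
one-to-one mapping between characters in the text and characters in
the search string.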

> BTW, isearch-forward also doesn't match the decomposed å (a + U+030A)
> when searching for the precomposed å (U+00E5), and vice versa.  So by
> your above argument, search in Emacs isn't Unicode compliant anyway.

Of course, Emacs isn't Unicode-compliant -- this is why I said this
feature is sorely needed, and that your proposal is a half-measure.

> (But not sure if it should be, because I think that this would break
> Boyer-Moore.)

It's already broken for multibyte characters anyway.  And yes,
handling equivalence in searching complicates the algorithm even more,
but that's a necessary payment for the extended functionality.
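
As a rough sketch of what equivalence-aware searching involves, the
following folds both the search string and the text to a common key
and maps the match back to the original text.  It is a naive linear
scan in Python, not Boyer-Moore and not how Emacs does or would do it;
`fold` and `find_equivalent` are hypothetical names.

```python
import unicodedata

def fold(text):
    """Search key for TEXT: NFKD, combining marks dropped, case folded."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

def find_equivalent(needle, haystack):
    """Index in HAYSTACK where a match for NEEDLE starts, or -1."""
    key = fold(needle)
    if not key:
        return -1
    folded = []   # folded characters of the haystack
    origin = []   # origin[k] is the haystack index that produced folded[k]
    for index, ch in enumerate(haystack):
        for c in fold(ch):
            folded.append(c)
            origin.append(index)
    position = "".join(folded).find(key)
    return origin[position] if position >= 0 else -1

print(find_equivalent("a", "xyz \u00e5"))    # 4 (precomposed å)
print(find_equivalent("a", "xyz a\u030a"))   # 4 (decomposed å)
print(find_equivalent("n", "x\u207f"))       # 1 (superscript n)
print(find_equivalent("q", "abc"))           # -1
```

Because one character of the text can fold to zero, one, or several
key characters, the skip distances that Boyer-Moore precomputes from
the search string no longer correspond directly to positions in the
text, which is where the extra complexity comes from.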
