Re: search-default-mode char-fold-to-regexp and Greek Extended block cha
From: Robert Pluim
Subject: Re: search-default-mode char-fold-to-regexp and Greek Extended block characters
Date: Mon, 22 Jul 2019 20:39:22 +0200
>>>>> On Sun, 21 Jul 2019 13:03:37 +0200, Robert Pluim <address@hidden> said:
>>>>> On Fri, 19 Jul 2019 21:13:02 +0300, Eli Zaretskii <address@hidden> said:
Eli> (get-char-code-property ?ί 'decomposition) => (943) ; (#x03af) i.e (?ί)
Eli> (get-char-code-property ?ί 'decomposition) => (953 769) ; (#x03b9 #x0301)
Eli> Do we expand the decomposition property recursively? It sounds like
Eli> we don't, but maybe we should.
Robert> We donʼt. The following patch allows searching for ι (0x3b9) to match
Robert> both ί (0x3af) and ί (1f77). It doesnʼt recurse, but I have no idea if
Robert> there are longer chains of decompositions.
The answer to that, empirically, is 'yes', since with the following
patch the number of characters equivalent to ι increases, i.e.:
Standard =>
(aref char-fold-table ?ι)"\\(?:ι[̀́̄̆̈̓̔͂]\\|[ίιϊἰἱὶιῐῑῖ𝛊𝜄𝜾𝝸𝞲]\\)"
2 level decomposition =>
(aref char-fold-table ?ι)"\\(?:ι[̀́̄̆̈̓̔͂]\\|[ΐίιϊἰ-ἷὶίιῐῑῒῖῗ𝛊𝜄𝜾𝝸𝞲]\\)"
n level decomposition =>
(aref char-fold-table ?ι)"\\(?:ι[̀́̄̆̈̓̔͂]\\|[ΐίιϊἰ-ἷὶίιῐ-ΐῖῗ𝛊𝜄𝜾𝝸𝞲]\\)"
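To see the chain for a single character directly, something like the following sketch can be evaluated. It assumes only `get-char-code-property' as used in the quoted text above; `my-decomposition-chain' is a hypothetical helper for illustration and is not part of char-fold.el or of the patch.

```elisp
;; Sketch only: `my-decomposition-chain' is a hypothetical helper, not
;; part of char-fold.el or of the patch in this message.
(defun my-decomposition-chain (char)
  "Return the successive decompositions of CHAR as a list of lists.
Each step decomposes the first character of the previous step."
  (let ((chain nil)
        (current char)
        decomp)
    (while (progn
             (setq decomp (get-char-code-property current 'decomposition))
             ;; Drop a leading compatibility tag such as `compat' or `font'.
             (when (symbolp (car decomp))
               (setq decomp (cdr decomp)))
             ;; Stop when the character has no further decomposition,
             ;; i.e. the property is empty or starts with the
             ;; character itself.
             (and decomp (not (eq (car decomp) current))))
      (push decomp chain)
      (setq current (car decomp)))
    (nreverse chain)))

;; GREEK SMALL LETTER IOTA WITH OXIA, written as a code point so that
;; it cannot be normalized away in transit.
(my-decomposition-chain #x1f77)
```

Going by the values quoted above, this should evaluate to ((943) (953 769)): one step from the Greek Extended character to the Basic Greek one, and a second step to IOTA plus the combining acute accent.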
From 3628379cf461805008b34e01dba751183c0b857c Mon Sep 17 00:00:00 2001
From: Robert Pluim <address@hidden>
Date: Mon, 22 Jul 2019 20:27:59 +0200
Subject: [PATCH] Follow decomposition chains when constructing char-fold-table
To: address@hidden
* lisp/char-fold.el (char-fold-make-table): Decompose the
decomposition of each character, adding equivalences to the original
character, until no more decompositions are left.
---
etc/NEWS | 8 ++++++++
lisp/char-fold.el | 21 +++++++++++++++++++++
2 files changed, 29 insertions(+)
diff --git a/etc/NEWS b/etc/NEWS
index e9ec21bb4c..33fe7075ec 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1169,6 +1169,14 @@ and case-sensitivity together with search strings in the search ring.
 +++
 *** 'flush-lines' prints and returns the number of deleted matching lines.
+---
+*** 'char-fold-to-regexp' now matches more variants of a base character.
+The table used to check for equivalence of characters is now built
+using the complete chain of Unicode decompositions of a character,
+rather than stopping after one level, such that searching for
+e.g. GREEK SMALL LETTER IOTA will now also find GREEK SMALL LETTER
+IOTA WITH OXIA.
+
 ** Debugger
 +++
diff --git a/lisp/char-fold.el b/lisp/char-fold.el
index 9d3ea17b41..6842d38a62 100644
--- a/lisp/char-fold.el
+++ b/lisp/char-fold.el
@@ -78,6 +78,27 @@
                            (cons (char-to-string char)
                                  (aref equiv (car decomp))))))))
               (funcall make-decomp-match-char decomp char)
+              ;; Check to see if the first char of the decomposition
+              ;; has a further decomposition. If so, add a mapping
+              ;; back from that second decomposition to the original
+              ;; character. This allows e.g. 'ι' (GREEK SMALL LETTER
+              ;; IOTA) to match both the Basic Greek block and
+              ;; Extended Greek block variants of IOTA +
+              ;; diacritical(s). Repeat until there are no more
+              ;; decompositions.
+              (let ((dec decomp)
+                    next-decomp)
+                (catch 'done
+                  (while dec
+                    (setq next-decomp (char-table-range table (car dec)))
+                    (when (consp next-decomp)
+                      (when (symbolp (car next-decomp))
+                        (setq next-decomp (cdr next-decomp)))
+                      (if (not (eq (car dec)
+                                   (car next-decomp)))
+                          (funcall make-decomp-match-char (list (car next-decomp)) char)
+                        (throw 'done t)))
+                    (setq dec next-decomp))))
               ;; Do it again, without the non-spacing characters.
               ;; This allows 'a' to match 'ä'.
               (let ((simpler-decomp nil)
--
2.21.0.419.gffac537e6c