From: Eli Zaretskii
Subject: Re: Char-folding: how can we implement matching multiple characters as a single "thing"?
Date: Tue, 01 Dec 2015 17:50:12 +0200
> Date: Tue, 1 Dec 2015 14:18:30 +0000
> From: Artur Malabarba <address@hidden>
>
> There's also a 3rd option. I posted some code here a while ago that
> implemented char-folding by temporarily replacing the
> (current-case-table) with a char-fold-table. This was fast, and much
> nicer than the current regexps, but it had the limitation of only
> being a character-to-character relation. So it couldn't do something
> as basic as 'a' matching "ä" (because that's 1 char matching 2).
>
> However, it's possible that we could combine the two solutions, using
> this case-table for as much as possible and then using regexps for
> anything else. This way the regexp pattern that replaces each input
> character would likely be considerably smaller than 45 chars (I'd
> guess between 3 and 15 depending on the character).
> The number of branches would still scale badly with the input string
> size, but the smaller multiplicative factor should give us more leeway
> before scaling up to 10k chars.
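A minimal Python sketch of the hybrid idea quoted above, under stated assumptions. It is not Emacs's case-table machinery and not the code that was posted. Folding the searched text through a character-to-character table stands in for swapping the case table. The table preserves length, so match positions stay valid. Regexp branches are generated only for equivalents the table cannot express, such as "a" followed by U+0308 COMBINING DIAERESIS (the two-character spelling of "ä"). The tables `FOLD_TABLE` and `MULTI_CHAR` and the helpers `fold`, `char_regexp` and `hybrid_search` are hypothetical names chosen for illustration.

```python
import re

# Hypothetical 1-to-1 fold table, standing in for a char-fold case table:
# each key maps to exactly one canonical character.
FOLD_TABLE = {
    "\u00e4": "a",  # precomposed a-diaeresis
    "\u00e1": "a",  # precomposed a-acute
    "\u00e9": "e",  # precomposed e-acute
}

# Hypothetical multi-character equivalents that a 1-to-1 table cannot
# express: a base letter followed by a combining mark (2 code points).
MULTI_CHAR = {
    "a": ["a\u0308", "a\u0301"],  # a + COMBINING DIAERESIS / ACUTE
    "e": ["e\u0301"],             # e + COMBINING ACUTE
}


def fold(text):
    """Apply the 1-to-1 table. Length is preserved, so indices of a match
    in the folded text are also valid indices in the original text."""
    return "".join(FOLD_TABLE.get(c, c) for c in text)


def char_regexp(ch):
    """Regexp fragment for one already-folded input character. Only the
    multi-character equivalents need regexp branches; everything else is
    a plain literal."""
    options = MULTI_CHAR.get(ch)
    if not options:
        return re.escape(ch)
    # Two-character forms come before the bare letter, so a combining
    # mark following the letter is consumed as part of the match.
    branches = sorted(options, key=len, reverse=True) + [ch]
    return "(?:" + "|".join(re.escape(b) for b in branches) + ")"


def hybrid_search(needle, haystack):
    """Return (start, end) of the first char-folded match, or None."""
    pattern = "".join(char_regexp(c) for c in fold(needle))
    match = re.search(pattern, fold(haystack))
    return match.span() if match else None
```

For example, `hybrid_search("cafe", "cafe\u0301")` finds the decomposed spelling, and `hybrid_search("\u00e4", "xx\u00e4")` finds the precomposed one through the table alone. Only letters that have multi-character equivalents contribute an alternation group. Each such group still adds to the pattern, so its length continues to grow with the search string.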
My gut feeling is that if we go to the C level, we should implement
this properly. Coding another partial solution will almost certainly
bump into some subtle limitations. In particular, any solution that
requires a literal search to use regexps under the hood will present
restrictions, because it will not play well with other regexp-based
features, like word search and C-M-s itself.