Re: idn.el and confusables.txt
From: Ted Zlatanov
Subject: Re: idn.el and confusables.txt
Date: Sat, 14 May 2011 12:06:04 -0500
User-agent: Gnus/5.110018 (No Gnus v0.18) Emacs/24.0.50 (gnu/linux)
On Sat, 14 May 2011 19:42:39 +0300 Eli Zaretskii <address@hidden> wrote:
EZ> Isn't it better to design the table for efficient use to begin with?
Yes, and I ask you and the other experts on char-tables to help with
that design. I am far from an expert on that topic.
>> But I don't know if markchars.el needs to be terribly fast.
EZ> I hope we are not introducing another character property for a
EZ> single use. Some use, some day might need to do it fast.
This is premature optimization. I only have a single use in hand.
Let's make sure markchars.el is fast and we can optimize for other uses
when they are needed.
>> Two char-tables would be enough: one small table for the confusable ->
>> target mapping, and one even smaller for the reverse target ->
>> (confusable list) mapping. The reverse lookup table could be stored in
>> an extra slot of the primary lookup table.
EZ> Doesn't confusables.txt include both mappings already? If so, you
EZ> don't need the reverse table.
I thought the lookups would be faster with a reverse mapping in one of
the scenarios you listed (looking up all the characters that might be
confused with a given one). But I realized a separate reverse table
isn't needed.
Let's say C1, C2, and C3 are confusables mapped to C1. Then the mapping
is C1 -> (C2, C3); C2 -> C1; and C3 -> C1.
The algorithm is "if a character maps to an atom, it is confusable with
that atom; if it maps to a list, every character in the list is
confusable with it." So to find all the confusables mapped to a
character you need at most two lookups: one to get from the character
to its target, and one to get the target's list.
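
A minimal sketch of that single-table scheme, to make the two-lookup
claim concrete. The names my-confusables-table, my-confusables-add and
my-confusables-of are made up for illustration and are not part of
markchars.el or idn.el, and the three characters are picked by hand
rather than read from confusables.txt:

```elisp
;; Hypothetical single char-table holding both directions of the mapping.
;; A target maps to a list of its confusables; a confusable maps to its target.
(defvar my-confusables-table (make-char-table 'my-confusables)
  "Char-table: target -> (confusable ...), confusable -> target.")

(defun my-confusables-add (target &rest confusables)
  "Record that each of CONFUSABLES is confusable with TARGET."
  (aset my-confusables-table target confusables)
  (dolist (c confusables)
    (aset my-confusables-table c target)))

(defun my-confusables-of (char)
  "Return the characters confusable with CHAR, excluding CHAR itself."
  (let ((entry (aref my-confusables-table char)))
    (cond ((null entry) nil)                     ; CHAR has no entry
          ((listp entry) (copy-sequence entry))  ; CHAR is a target: one lookup
          (t                                     ; CHAR is a confusable: two lookups
           (cons entry
                 (remq char (aref my-confusables-table entry)))))))

;; C1 = Latin a, C2 = Cyrillic a (U+0430), C3 = Latin alpha (U+0251).
(my-confusables-add ?a #x430 #x251)

(my-confusables-of ?a)     ; => (1072 593)
(my-confusables-of #x430)  ; => (97 593)
```

The cost of keeping one table is that a lookup has to check whether the
entry is a list or a character, which is cheap next to maintaining a
second table.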
In addition to the character mapping we also need to record the
confusable's type, which can be SL/SA (single-script) or ML/MA
(mixed-script). I don't know where to store that. Maybe we can just
have two char-tables, one per type. There aren't going to be more types
AFAIK. But markchars.el can definitely use the knowledge of whether the
confusable is within a single script or not.
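
One possible shape for the two-table idea, as a rough sketch only. The
table and function names are hypothetical, the SL/SA and ML/MA
distinctions are collapsed into "single-script" and "mixed-script" as
described above, and the two entries are hand-picked for illustration
rather than taken from confusables.txt:

```elisp
;; Hypothetical: one char-table per confusable type.
(defvar my-confusables-single-script (make-char-table 'my-confusables-single)
  "Char-table of single-script confusable mappings.")
(defvar my-confusables-mixed-script (make-char-table 'my-confusables-mixed)
  "Char-table of mixed-script confusable mappings.")

(defun my-confusable-kinds (char)
  "Return the list of kinds for which CHAR has an entry.
Each element is the symbol `single-script' or `mixed-script'."
  (delq nil
        (list (and (aref my-confusables-single-script char) 'single-script)
              (and (aref my-confusables-mixed-script char) 'mixed-script))))

;; Illustrative entries: Latin alpha -> Latin a stays within one script,
;; Cyrillic a -> Latin a crosses scripts.
(aset my-confusables-single-script #x251 ?a)
(aset my-confusables-mixed-script #x430 ?a)

(my-confusable-kinds #x251)  ; => (single-script)
(my-confusable-kinds #x430)  ; => (mixed-script)
(my-confusable-kinds ?z)     ; => nil
```

With this layout a caller such as markchars.el could consult only the
table it cares about, instead of decoding a type stored alongside each
mapping.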
Does all of that make sense?
Ted