bug-gnu-emacs

bug#58070: [PATCH] Add tamil99 input method


From: Eli Zaretskii
Subject: bug#58070: [PATCH] Add tamil99 input method
Date: Tue, 27 Sep 2022 09:23:20 +0300

> Cc: 58070@debbugs.gnu.org
> From: Arun Isaac <arunisaac@systemreboot.net>
> Date: Tue, 27 Sep 2022 02:25:28 +0530
> 
> > This has the advantage that you can insert the vowel sign for any
> > consonant out-of-sequence i.e., you can say h j BACKSPACE s
> > to insert கி (and so do other rules).
> 
> I agree. Your imperative approach does have this advantage. But, it
> comes at the price of having to inspect the buffer at (point). The
> declarative approach does not need to inspect the buffer at all since it
> merely composes sequential keystrokes and doesn't know anything about
> what's already in the buffer. I personally think buffer inspection is a
> lot of code complexity for a simple input method like tamil99, but
> perhaps Eli should make the call on this.
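To make the declarative/imperative distinction concrete, here is a toy Python model. It is not Quail and not the patch under discussion. The key table is a hypothetical three-key subset: h and s follow the "h j BACKSPACE s" example above (க, and the vowel இ / vowel sign ி), while mapping j to ப is an illustrative assumption. The declarative model composes only from the keystroke sequence. The imperative model inspects the character before point.

```python
"""Toy contrast of two ways a tamil99-style input method could handle a
vowel key typed after a consonant. The key table is a HYPOTHETICAL subset,
not the real tamil99 layout definition."""

# Hypothetical subset: two consonant keys and one vowel key.
CONSONANTS = {"h": "\u0b95", "j": "\u0baa"}   # க, ப (j -> ப is assumed)
VOWELS = {"s": ("\u0b87", "\u0bbf")}          # (independent இ, vowel sign ி)
CONSONANT_CHARS = set(CONSONANTS.values())


def declarative(keys):
    """Compose only from the keystroke sequence; never look at the buffer."""
    buf = []
    prev_was_consonant = False
    for key in keys:
        if key == "BACKSPACE":
            if buf:
                buf.pop()
            prev_was_consonant = False  # the key sequence is broken
        elif key in CONSONANTS:
            buf.append(CONSONANTS[key])
            prev_was_consonant = True
        elif key in VOWELS:
            independent, sign = VOWELS[key]
            buf.append(sign if prev_was_consonant else independent)
            prev_was_consonant = False
    return "".join(buf)


def imperative(keys):
    """Decide what a vowel key inserts by inspecting the text before point."""
    buf = []
    for key in keys:
        if key == "BACKSPACE":
            if buf:
                buf.pop()
        elif key in CONSONANTS:
            buf.append(CONSONANTS[key])
        elif key in VOWELS:
            independent, sign = VOWELS[key]
            before_point = buf[-1] if buf else None
            buf.append(sign if before_point in CONSONANT_CHARS else independent)
    return "".join(buf)
```

With the keys h j BACKSPACE s, the declarative model produces கஇ (an independent vowel), while the imperative model produces கி. For a plain h s, both produce கி. The extra work in the imperative version is the single look at the character before point.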

I don't think I understand what you are talking about (I'm not an
expert on Quail).  Does this complexity slow down the input
noticeably?  Does it make the code much harder to understand, even if
you put enough comments there to explain what's going on?  If not,
then I don't think the added complexity should be a problem, and you
should decide based on other aspects.

And as I said earlier, we could have two input methods for Tamil, so
we don't necessarily have to decide which of the two is better.

> Also, while the out-of-sequence vowel insertion is a very clever
> feature, it shouldn't be required at all if we handled grapheme cluster
> boundaries correctly. See
> https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries

Well, we do; that's why cursor motion moves by grapheme clusters,
right?  Also, see below.
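The following Python sketch shows cluster-aware cursor motion under a strong simplification of UAX #29. It treats a cluster as one character plus any following combining marks (Unicode general category M*). The function names are hypothetical, and the sketch is not how Emacs computes cluster boundaries internally.

```python
import unicodedata


def boundaries(text):
    """Return code-point indices where a simplified grapheme cluster starts,
    plus len(text). Simplification: a cluster is one character followed by
    any combining marks (category M*). Real UAX #29 rules also cover Hangul,
    emoji sequences, ZWJ, CR LF, and more."""
    result = []
    for i, ch in enumerate(text):
        if i == 0 or not unicodedata.category(ch).startswith("M"):
            result.append(i)
    result.append(len(text))
    return result


def next_boundary(text, point):
    """Where a cluster-aware forward cursor motion from point would land."""
    for b in boundaries(text):
        if b > point:
            return b
    return len(text)
```

For "g\u0300a" (g̀a), the boundaries are [0, 2, 3]. One forward motion from 0 therefore lands at 2 and skips over the grave accent. For கி ("\u0b95\u0bbf"), the vowel sign is a combining mark, so the only stops are 0 and 2.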

> Let me explain with a Latin example for the benefit of non-Tamil
> readers. Suppose we had:
> 
> g̀|
> 
> where | is the position of the cursor. Now, if we press backspace, the
> entire g+grave accent grapheme cluster should be deleted. But, what
> actually happens is that the grave accent alone is deleted and we are
> left with a 'g' like so:
> 
> g|
> 
> A similar thing happens in Tamil. Now, based on user expectation, this
> may be acceptable in some languages. But, in Tamil, it is quite contrary
> to user expectation. If I have
> 
> கி|
> 
> and press backspace, I get:
> 
> க|
> 
> But, I want the whole "user-perceived character" (கி) deleted like so:
> 
> |

There's a problem with the above: in some situations you want deletion
by codepoints, in others you want deletion by grapheme clusters.  (It
is possible that with Tamil the former is rarely the case, but it is
definitely a frequent case with other scripts, in particular with
those that have diacriticals.)  Emacs 29 solves this by having
delete-forward-char, which is usually bound to the <Delete> key,
delete by grapheme clusters, while DEL (which deletes backward) and
C-d delete individual codepoints.  The primary motivation for DEL to
delete by codepoints is that it allows you to make sub-grapheme
corrections to stuff you just typed, for example if you typed an
incorrect accent.

Emacs 29 also has the composition-break-at-point variable, which you
could set non-nil, in which case <Delete> will also work by
codepoints.  So perhaps the out-of-sequence vowel insertion would be
possible without further complications if composition-break-at-point
is non-nil?
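The deletion split described above can be modeled as a toy, assuming the text is a Python string and point is a code-point index. Here delete_forward stands in for <Delete>, delete_backward stands in for DEL, and the break_at_point flag imitates setting composition-break-at-point non-nil. Clusters use a rough "base character plus combining marks" approximation of UAX #29. All names are hypothetical, and none of this is Emacs's actual code.

```python
import unicodedata


def cluster_end(text, pos):
    """Index just past the simplified cluster starting at pos: one
    character plus any following combining marks (category M*)."""
    end = pos + 1
    while end < len(text) and unicodedata.category(text[end]).startswith("M"):
        end += 1
    return end


def delete_forward(text, point, break_at_point=False):
    """Model of <Delete>: removes a whole cluster after point, or a single
    code point when break_at_point mimics composition-break-at-point."""
    if point >= len(text):
        return text, point
    end = point + 1 if break_at_point else cluster_end(text, point)
    return text[:point] + text[end:], point


def delete_backward(text, point):
    """Model of DEL: always removes the single code point before point."""
    if point == 0:
        return text, point
    return text[:point - 1] + text[point:], point - 1
```

For கி with point at the end, delete_backward leaves க, which is the behavior described earlier in the thread. From point 0, delete_forward removes the whole cluster. With break_at_point=True it removes only க and leaves the bare vowel sign. The sketch also shows why DEL by code points pairs naturally with out-of-sequence vowel insertion: after DEL, point sits right after a bare consonant again.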