bug-gnu-emacs

bug#41506: 28.0.50; RTL problem


From: Pip Cet
Subject: bug#41506: 28.0.50; RTL problem
Date: Sat, 06 Jun 2020 13:05:43 +0000
User-agent: Gnus/5.13 (Gnus v5.13)

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Pip Cet <pipcet@gmail.com>
>> Cc: 41506@debbugs.gnu.org
>> Date: Sat, 06 Jun 2020 07:58:24 +0000
>> 
>> when we're called with bidi_it->first_elt = true, it's possible we
>> shouldn't touch bidi_it->new_paragraph at all...
>
> Can you elaborate on why you think that?

Sorry, I shouldn't have said "touch" there. I meant "set", though I no
longer think that either.

> first_elt can be set when we are at the beginning of a paragraph or
> when we are in the middle of it, so its meaning is different from that
> of new_paragraph.

Indeed.

>> +     paragraph might start.  But don't do that for the first
>> +     element since this function will be called twice in that
>> +     case.  */
>
> Which code causes the two calls, and why is that significant in this
> case?

Maybe this code would be clearer:

      if (!bidi_it->first_elt)
        {
          bytepos++;
          pos++;
        }

We always look at the paragraph containing the next character to be
loaded by bidi_level_of_next_char. If first_elt is set, that is the
current character; otherwise, it's the one after that.
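
As a minimal sketch of that rule, outside of Emacs: the struct and
function below are made-up stand-ins (not the real struct bidi_it or
any function in src/bidi.c), and they track only the character
position, not the byte position.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the two iterator fields discussed above; this is
   NOT the real struct bidi_it from src/bidi.c.  */
struct toy_bidi_it
{
  ptrdiff_t charpos;   /* position of the current character */
  bool first_elt;      /* true if the character at charpos is still
                          to be loaded */
};

/* Return the position of the next character to be loaded: the
   current one if first_elt is set, otherwise the one after it.  */
ptrdiff_t
toy_next_char_pos (const struct toy_bidi_it *it)
{
  ptrdiff_t pos = it->charpos;
  if (!it->first_elt)
    pos++;
  return pos;
}
```

With charpos = 1, this returns 1 when first_elt is set and 2 when it
is not, which is the position whose paragraph gets examined.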

In the "\n\nש" case, this happens:

1. bidi_paragraph_init is called with first_elt = 1 at buffer position 1
2. new_paragraph is cleared to false
3. bidi_at_paragraph_end is called for buffer position 2. That looks
like a line ending a paragraph, though it's actually a line starting the
next paragraph. Still, it returns true.
4. new_paragraph is set again
5. bidi_paragraph_init is called with first_elt = 0 at buffer position 1

So everything happens to work in this case, even though several of the
assumptions in the bidi code are violated.  The code is written to
assume paragraphs contain at least two characters: that assumption means
it's valid for bidi_paragraph_init to clear new_paragraph. In this case
clearing it isn't valid, but the next line we look at, while not
actually ending a paragraph, looks as though it does...
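
Step 3 of the sequence above can be illustrated with a toy model. It
assumes a line counts as a paragraph separator when it holds only
spaces, tabs, or form feeds; the real check in bidi_at_paragraph_end
matches regexps rather than scanning bytes, and the function name
below is made up.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if the line starting at byte index START in the
   LEN-byte text S consists only of spaces, tabs, or form feeds.
   Hypothetical helper: an approximation of "this line looks like a
   paragraph separator", not the Emacs implementation.  */
bool
toy_line_looks_like_separator (const char *s, size_t len, size_t start)
{
  size_t i = start;
  while (i < len && s[i] != '\n')
    {
      if (s[i] != ' ' && s[i] != '\t' && s[i] != '\f')
        return false;
      i++;
    }
  return true;
}
```

For the text "\n\nש", buffer position 2 is byte index 1. The line
starting there is empty, so the helper returns true, even though that
empty line is really the start of the next paragraph.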

What I'm not sure about is "\n \nש". It could be either a single
two-line paragraph followed by ש, or a single-character paragraph
followed by another paragraph whose first line happens to contain only a
space character; in the first case, paragraph orientation would default
to L2R, in the second case, it would be R2L. Do you happen to know what
Unicode says for this case?
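
To make the two readings concrete, here is a rough sketch that picks a
paragraph direction from the first strong character in a byte range,
falling back to L2R. Everything in it is an assumption for
illustration: the names are made up, only ASCII letters and the Hebrew
letters U+05D0..U+05EA are treated as strong, and the paragraph
boundaries are passed in by hand rather than computed. It does not
say which reading is correct.

```c
#include <stddef.h>

enum toy_dir { TOY_NEUTRAL, TOY_L2R, TOY_R2L };

/* Scan the UTF-8 bytes in [START, END) of S and return the direction
   of the first strong character.  Only ASCII letters (strong L) and
   Hebrew letters U+05D0..U+05EA (strong R) are recognized; all other
   characters are treated as neutral.  */
enum toy_dir
toy_first_strong (const unsigned char *s, size_t start, size_t end)
{
  size_t i = start;
  while (i < end)
    {
      unsigned char c = s[i];
      if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
        return TOY_L2R;
      if ((c & 0xE0) == 0xC0 && i + 1 < end)
        {
          /* Two-byte UTF-8 sequence.  */
          unsigned cp = ((c & 0x1Fu) << 6) | (s[i + 1] & 0x3Fu);
          if (cp >= 0x05D0 && cp <= 0x05EA)
            return TOY_R2L;
          i += 2;
        }
      else
        i++;
    }
  return TOY_NEUTRAL;
}

/* Paragraph direction, defaulting to L2R when no strong character
   is found in the range.  */
enum toy_dir
toy_paragraph_dir (const unsigned char *s, size_t start, size_t end)
{
  enum toy_dir d = toy_first_strong (s, start, end);
  return d == TOY_NEUTRAL ? TOY_L2R : d;
}
```

For the five bytes of "\n \nש", the first reading puts the space in
the paragraph covering bytes [0, 3), which has no strong character and
so comes out L2R. The second reading puts it in the paragraph covering
bytes [1, 5), which contains ש and so comes out R2L.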

From c5232df875d62ead326d5e90f122ab9ac9798e59 Mon Sep 17 00:00:00 2001
From: Pip Cet <pipcet@gmail.com>
Date: Sat, 6 Jun 2020 13:02:55 +0000
Subject: [PATCH] Handle buffers containing two newlines followed by an RTL
 char

* src/bidi.c (bidi_paragraph_init): Correct handling of initial
newlines.  (Bug#41506)
---
 src/bidi.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/bidi.c b/src/bidi.c
index 1017bd2d52..8aa325fe6d 100644
--- a/src/bidi.c
+++ b/src/bidi.c
@@ -1714,8 +1714,12 @@ bidi_paragraph_init (bidi_dir_t dir, struct bidi_it *bidi_it, bool no_default_p)
       s = (STRINGP (bidi_it->string.lstring)
           ? SDATA (bidi_it->string.lstring)
           : bidi_it->string.s);
-      if (bytepos > begbyte
-         && bidi_char_at_pos (bytepos, s, bidi_it->string.unibyte) == '\n')
+      /* We always look at the paragraph containing the next character
+        to be loaded by bidi_level_of_next_char.
+
+        This code happens to work for a buffer containing two
+        newlines followed by an RTL character (Bug#41506).  */
+      if (!bidi_it->first_elt)
        {
          bytepos++;
          pos++;
-- 
2.27.0.rc0

