bug-gnu-emacs

bug#41506: 28.0.50; RTL problem


From: Eli Zaretskii
Subject: bug#41506: 28.0.50; RTL problem
Date: Sat, 06 Jun 2020 16:45:38 +0300

> From: Pip Cet <pipcet@gmail.com>
> Cc: 41506@debbugs.gnu.org
> Date: Sat, 06 Jun 2020 13:05:43 +0000
> 
> >> +   paragraph might start.  But don't do that for the first
> >> +   element since this function will be called twice in that
> >> +   case.  */
> >
> > Which code causes the two calls, and why is that significant in this
> > case?
> 
> Maybe this code would be clearer:
> 
>       if (!bidi_it->first_elt)
>       {
>         bytepos++;
>         pos++;
>       }

Could be; let's see what the conclusion of this discussion is.

> In the "\n\nש" case, this happens:
> 
> 1. bidi_paragraph_init is called with first_elt = 1 at buffer position 1
> 2. new_paragraph is cleared to false
> 3. bidi_at_paragraph_end is called for buffer position 2. That looks
> like a line ending a paragraph, though it's actually a line starting the
> next paragraph. Still, it returns true.
> 4. new_paragraph is set again
> 5. bidi_paragraph_init is called with first_elt = 0 at buffer position 1

A minor correction to item 3: the second newline in this example is
handled as belonging to the previous paragraph.  You can see that by
examining the behavior of the RIGHT and LEFT arrow keys: they behave
differently in R2L and L2R paragraphs.

> What I'm not sure about is "\n \nש". It could be either a single
> two-line paragraph followed by ש, or a single-character paragraph
> followed by another paragraph whose first line happens to contain only a
> space character; in the first case, paragraph orientation would default
> to L2R, in the second case, it would be R2L. Do you happen to know what
> Unicode says for this case?

It's not Unicode in this case, it's Emacs.  If UAX#9 is read and
followed strictly, then each \n ends a paragraph and begins a new one.
IOW, every physical line is a separate paragraph.  This is a direct
consequence of Newline's bidi class being B (paragraph separator):

  (get-char-code-property #x0a 'bidi-class) => B

(as mandated by 3.2 in UAX#9), and of rules P1--P3 in UAX#9.
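
The same lookup works for the other characters in the examples
discussed here.  A small sketch; the classes in the comments are the
ones assigned by the Unicode Character Database:

```elisp
;; Bidi classes of the characters in the "\n \nש" example.
(get-char-code-property #x0a 'bidi-class)   ; newline: B (paragraph separator)
(get-char-code-property #x20 'bidi-class)   ; space: WS (whitespace)
(get-char-code-property #x5e9 'bidi-class)  ; ש, HEBREW LETTER SHIN: R (strong R2L)
```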

However, since in Emacs the usual case is that hard newlines are used
to fill text, the default UAX#9 behavior would make no sense, as a
line that happens to start with a R2L character would be rendered
right-to-left, even if the previous line wasn't.  It would produce a
randomly jagged display of paragraphs that mix L2R and R2L characters
just because a line was broken in a different place by filling.

So we use the "higher-level protocols" fire escape (see 4.3 in UAX#9)
and define a "paragraph" differently, for the purposes of base
paragraph direction: we by default require that paragraphs be
separated by empty lines, see bidi-paragraph-separate-re.  Thus, the
above example by default treats the " \n" line as a paragraph
separator, and the ש after it as the start of a new paragraph.
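
One way to probe this from Lisp, as a rough sketch: build the example
text in a scratch buffer and ask for the base direction of the
paragraph around the ש.  This assumes that
current-bidi-paragraph-direction follows the same paragraph logic as
the display code; it is meant as a quick check, not a substitute for
looking at the actual display.

```elisp
;; Sketch: query the base paragraph direction at the ש in "\n \nש".
(with-temp-buffer
  (insert "\n \nש")
  ;; The buffer holds 4 characters, so the ש is at position 4.
  (goto-char (1- (point-max)))
  (current-bidi-paragraph-direction))
```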

(For completeness, we do support the strict interpretation of UAX#9:
if you set both bidi-paragraph-start-re and bidi-paragraph-separate-re
to "^", you get that.  Any code changes we come up with here must
therefore be tested at least with those settings as well.)
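
A minimal sketch of those settings for a test buffer follows.  The
function name is hypothetical and only for illustration; the two
variables are the real ones named above.

```elisp
;; Hypothetical helper for testing; not part of Emacs.
(defun my-bidi-strict-uax9-paragraphs ()
  "Make every line a separate bidi paragraph in the current buffer."
  ;; Change both variables together, so that paragraph start and
  ;; paragraph separator stay consistent with each other.
  (setq-local bidi-paragraph-start-re "^")
  (setq-local bidi-paragraph-separate-re "^"))
```

Call it in the buffer that holds the test text, then redisplay and
compare against the same text with the default settings.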




