bug#41506: 28.0.50; RTL problem
From: Eli Zaretskii
Subject: bug#41506: 28.0.50; RTL problem
Date: Sat, 06 Jun 2020 16:45:38 +0300

> From: Pip Cet <pipcet@gmail.com>
> Cc: 41506@debbugs.gnu.org
> Date: Sat, 06 Jun 2020 13:05:43 +0000
>
> >> + paragraph might start. But don't do that for the first
> >> + element since this function will be called twice in that
> >> + case. */
> >
> > Which code causes the two calls, and why is that significant in this
> > case?
>
> Maybe this code would be clearer:
>
>   if (!bidi_it->first_elt)
>     {
>       bytepos++;
>       pos++;
>     }

Could be; let's see what the conclusion of this discussion is.

> In the "\n\nש" case, this happens:
>
> 1. bidi_paragraph_init is called with first_elt = 1 at buffer position 1
> 2. new_paragraph is cleared to false
> 3. bidi_at_paragraph_end is called for buffer position 2. That looks
> like a line ending a paragraph, though it's actually a line starting the
> next paragraph. Still, it returns true.
> 4. new_paragraph is set again
> 5. bidi_paragraph_init is called with first_elt = 0 at buffer position 1

A minor correction to item 3: the second newline in this example is
handled as belonging to the previous paragraph. You can see that by
examining the behavior of the RIGHT and LEFT arrow keys: they behave
differently in R2L and L2R paragraphs.

> What I'm not sure about is "\n \nש". It could be either a single
> two-line paragraph followed by ש, or a single-character paragraph
> followed by another paragraph whose first line happens to contain only a
> space character; in the first case, paragraph orientation would default
> to L2R, in the second case, it would be R2L. Do you happen to know what
> Unicode says for this case?

It's not Unicode in this case, it's Emacs. If UAX#9 is read and
followed strictly, then each \n ends a paragraph and begins a new one.
IOW, every physical line is a separate paragraph. This is a direct
consequence of Newline's bidi class being B (paragraph separator):

  (get-char-code-property #x0a 'bidi-class) => B

(as mandated by 3.2 in UAX#9), and of rules P1--P3 in UAX#9.
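
To make the strict reading concrete, here is a minimal C sketch of rules P1--P3 applied literally: every newline ends a paragraph, and the base direction of each paragraph comes from its first strong character, defaulting to L2R. This is not code from bidi.c; all names (strict_paragraph_dir, classify, decode_utf8) are hypothetical, the input is assumed to be valid UTF-8, and the character classification is a crude stand-in that only knows ASCII letters, Hebrew letters and basic Arabic letters.

```c
#include <stddef.h>

/* Hypothetical names throughout; this is not code from bidi.c.  */
enum base_dir { DIR_L2R, DIR_R2L };
enum strong { STRONG_NONE, STRONG_L, STRONG_R };

/* Decode the UTF-8 sequence at S (assumed valid), store its byte
   length in *LEN, and return the code point.  */
static unsigned
decode_utf8 (const char *s, size_t *len)
{
  const unsigned char *p = (const unsigned char *) s;
  if (p[0] < 0x80)
    { *len = 1; return p[0]; }
  if ((p[0] & 0xE0) == 0xC0)
    { *len = 2; return ((p[0] & 0x1Fu) << 6) | (p[1] & 0x3Fu); }
  if ((p[0] & 0xF0) == 0xE0)
    { *len = 3; return ((p[0] & 0x0Fu) << 12) | ((p[1] & 0x3Fu) << 6)
                       | (p[2] & 0x3Fu); }
  *len = 4;
  return ((p[0] & 0x07u) << 18) | ((p[1] & 0x3Fu) << 12)
         | ((p[2] & 0x3Fu) << 6) | (p[3] & 0x3Fu);
}

/* Crude stand-in for a bidi-class lookup: ASCII letters are strong L,
   Hebrew letters (U+05D0..U+05EA) and Arabic letters (U+0620..U+064A)
   are strong R/AL, everything else is treated as not strong.  */
static enum strong
classify (unsigned c)
{
  if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
    return STRONG_L;
  if ((c >= 0x05D0 && c <= 0x05EA) || (c >= 0x0620 && c <= 0x064A))
    return STRONG_R;
  return STRONG_NONE;
}

/* Return the base direction of the N-th (0-based) paragraph of TEXT,
   where every newline ends a paragraph (P1).  The first strong
   character decides (P2); with none, the direction is L2R (P3).  */
enum base_dir
strict_paragraph_dir (const char *text, int n)
{
  const char *p = text;

  /* Skip the first N newline-terminated paragraphs.  */
  while (n > 0 && *p)
    {
      if (*p == '\n')
        n--;
      p++;
    }

  /* Scan up to the next paragraph separator for a strong character.  */
  while (*p && *p != '\n')
    {
      size_t len;
      enum strong s = classify (decode_utf8 (p, &len));
      if (s == STRONG_R)
        return DIR_R2L;
      if (s == STRONG_L)
        return DIR_L2R;
      p += len;
    }
  return DIR_L2R;
}
```

Under this sketch, filled text such as "abc\nש def\nghi" yields L2R, R2L, L2R for its three physical lines, since each line is judged on its own.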

However, since in Emacs the usual case is that hard newlines are used
to fill text, the default UAX#9 behavior would make no sense, as a
line that happens to start with an R2L character would be rendered
right-to-left, even if the previous line wasn't. It would produce a
randomly jagged display of paragraphs that mix L2R and R2L characters
just because a line was broken in a different place by filling.

So we use the "higher-level protocols" fire escape (see 4.3 in UAX#9)
and define a "paragraph" differently, for the purposes of base
paragraph direction: by default we require that paragraphs be
separated by empty lines, see bidi-paragraph-separate-re. Thus, the
above example by default treats the " \n" line as a paragraph
separator, and the ש after it as the start of a new paragraph.

(For completeness, we do support the strict interpretation of UAX#9:
if you set both bidi-paragraph-start-re and bidi-paragraph-separate-re
to "^", you get that. Any code changes we come up with here must
therefore be tested at least with those settings as well.)