

Re: Mixed L2R and R2L paragraphs and horizontal scroll

From: Eli Zaretskii
Subject: Re: Mixed L2R and R2L paragraphs and horizontal scroll
Date: Tue, 02 Feb 2010 21:30:54 +0200

> Date: Tue, 02 Feb 2010 09:08:40 +0100
> From: martin rudalics <address@hidden>
> CC: address@hidden, address@hidden
>  > I already implemented such a feature: a per-buffer variable that
>  > forces all paragraphs to be either L2R or R2L.  A value of `nil' means
>  > the direction of each paragraph is dynamically determined by applying
>  > the rules described in the Unicode Standard Annex 9 (UAX#9).
> I meant a function which does (1) set such a variable

You mean, besides "M-x set-variable RET"?

> and (2) apply it to one or all windows showing a buffer.

Currently, the variable is per-buffer, so it affects all the windows
showing that buffer.  Why would one need to do that only in some
windows showing a buffer?
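Here is a minimal Python sketch (not Emacs code) of the behavior described above. A per-buffer setting of None means each paragraph's direction comes from its first strong character. "L2R" or "R2L" forces every paragraph in the buffer to that direction. The names `Buffer`, `forced_direction` and `paragraph_direction` are hypothetical. In later Emacs releases a variable of this kind is exposed as `bidi-paragraph-direction`.

```python
import unicodedata
from dataclasses import dataclass
from typing import List, Optional


def first_strong_direction(text: str) -> Optional[str]:
    """Direction of the first strong character, or None if there is none."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "L2R"
        if bidi in ("R", "AL"):
            return "R2L"
    return None


@dataclass
class Buffer:
    paragraphs: List[str]
    # Hypothetical per-buffer override: None means "determine each
    # paragraph's direction dynamically"; "L2R" or "R2L" forces all of them.
    forced_direction: Optional[str] = None

    def paragraph_direction(self, index: int) -> str:
        if self.forced_direction is not None:
            return self.forced_direction
        # No strong character at all: fall back to L2R, as UAX#9 rule P3 does.
        return first_strong_direction(self.paragraphs[index]) or "L2R"
```

For example, `Buffer(["Hello, world", "שלום עולם"]).paragraph_direction(1)` gives "R2L", and after setting `forced_direction = "L2R"` the same call gives "L2R". The setting lives on the buffer object, so every window showing that buffer sees the same result.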

> Calling this function would temporarily override any L2R/R2L
> specifications specified for a file, buffer, or paragraph.

There are no specifications for a file (unless you set the variable
I'm talking about in file's local variables section).  As for
individual paragraphs, control of their base direction is not by some
Emacs setting, but by inserting special formatting characters at the
beginning of each paragraph.  These characters (LRM and RLM) are
supposed to be invisible by default, i.e. displayed as zero-width
space, but they have strong directionality: L for LRM and R for RLM.
Since UAX#9 says that a paragraph's base direction is determined by
its first strong directional character, either of these characters,
placed at the beginning of a paragraph, sets that paragraph's
direction to its own directionality.
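The first-strong-character rule, and why a leading LRM or RLM controls it, can be illustrated with Python's `unicodedata` module. This is a simplified take on UAX#9 rules P2 and P3 that ignores isolates. `base_direction` is a hypothetical helper, not an Emacs function.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK: bidi class L
RLM = "\u200f"  # RIGHT-TO-LEFT MARK: bidi class R


def base_direction(paragraph: str) -> str:
    """Base direction from the first strong character (simplified UAX#9 P2-P3)."""
    for ch in paragraph:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "L2R"
        if bidi in ("R", "AL"):
            return "R2L"
    return "L2R"  # no strong character: default to L2R
```

With this, `base_direction("שלום world")` is "R2L" because the Hebrew comes first. `base_direction(LRM + "שלום world")` is "L2R" because the invisible LRM is now the first strong character. Both marks have general category Cf (format character), which is why they are normally not displayed.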

It would be easy enough to write a command that inserts LRM or RLM at
the beginning of each paragraph in a buffer or region.  But that's
application level, and I still have a lot of turf to cover before I
get to that.
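Such a command could be sketched as a pure text transformation. This sketch assumes that paragraphs are separated by blank lines, a simplification of what `paragraph-start` allows. `mark_paragraphs` is a hypothetical name, not an existing Emacs command.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def mark_paragraphs(text: str, mark: str) -> str:
    """Insert MARK (LRM or RLM) at the start of every paragraph in TEXT.

    Assumption: a paragraph starts at the beginning of the text or at the
    first non-blank line after a blank line.
    """
    if mark not in (LRM, RLM):
        raise ValueError("mark must be LRM or RLM")
    out = []
    prev_blank = True  # the start of the text counts as a paragraph boundary
    for line in text.split("\n"):
        blank = not line.strip()
        # Skip paragraphs that already begin with a mark, so that running
        # the function twice changes nothing.
        already_marked = line.lstrip(" \t").startswith((LRM, RLM))
        if prev_blank and not blank and not already_marked:
            line = mark + line
        out.append(line)
        prev_blank = blank
    return "\n".join(out)
```

Applying it to only part of a string would give the region variant. Working line by line keeps the paragraph-boundary assumption explicit and easy to swap for a different rule.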

> BTW, do UAX#9 paragraphs require new definitions for `paragraph-start'
> or `paragraph-separate'?

In a sense, yes.  UAX#9 says:

   Paragraphs are divided by the Paragraph Separator or appropriate
   Newline Function [...].  Paragraphs may also be determined by
   higher-level protocols: for example, the text in two different
   cells of a table will be in different paragraphs.

and the table of Bidirectional Character Types says that a Paragraph
Separator type is assigned to the following characters:

   Paragraph separator, appropriate Newline Functions, higher-level
   protocol paragraph determination

Accordingly, in the Unicode Database, the characters CR and LF
(a.k.a. NL) that normally separate lines have the Paragraph Separator
(B) type.
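Python's `unicodedata` module exposes the bidi class from the Unicode Character Database. That makes it possible to list the B-type characters and to sketch the default UAX#9 split. Rule P1 breaks after every paragraph separator and keeps the separator with the preceding paragraph. Treating CR LF as a single separator is an assumption of this sketch, and the helper names are hypothetical.

```python
import sys
import unicodedata
from typing import List


def paragraph_separators() -> List[str]:
    """All characters whose bidi class is B (Paragraph Separator)."""
    return [chr(cp) for cp in range(sys.maxunicode + 1)
            if unicodedata.bidirectional(chr(cp)) == "B"]


def split_paragraphs_uax9(text: str) -> List[str]:
    """Default UAX#9 split (rule P1): break after every B character,
    keeping the separator with the preceding paragraph."""
    paragraphs: List[str] = []
    start = i = 0
    while i < len(text):
        if unicodedata.bidirectional(text[i]) == "B":
            # Assumption of this sketch: CR LF counts as one separator.
            if text[i] == "\r" and text[i + 1:i + 2] == "\n":
                i += 1
            paragraphs.append(text[start:i + 1])
            start = i + 1
        i += 1
    if start < len(text):
        paragraphs.append(text[start:])
    return paragraphs
```

The list returned by `paragraph_separators()` includes LF, CR and U+2029 PARAGRAPH SEPARATOR. `split_paragraphs_uax9("A filled\nparagraph\n")` yields two "paragraphs", one per line, for what Emacs treats as a single filled paragraph.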

This could sound like a disaster (each line being a separate
paragraph), since Emacs uses hard newlines to fill paragraphs.
Fortunately, UAX#9 leaves a fire escape: it says (see above) that
paragraphs can also be determined by ``higher-level protocols''.  I
used this fire escape to preserve the normal Emacs notion of a
paragraph, including the usual sense of `paragraph-start' and
`paragraph-separate'.  For instance, the code that determines the base
direction of each paragraph looks back for a position that matches
`paragraph-start', and then finds the first strong directional
character after that.
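The two steps just described can be sketched on a list of lines: look back for a line matching `paragraph-start`, then scan forward for the first strong character. The regexp only roughly models the default Emacs `paragraph-start`, "\f\\|[ \t]*$". Stopping the scan at the end of the paragraph is an assumption of this sketch. `base_direction_at` and `is_separator` are hypothetical helpers, not the actual display-engine code.

```python
import re
import unicodedata
from typing import List

# Assumption: a rough model of Emacs's default `paragraph-start`, "\f\\|[ \t]*$":
# a line starting with a form feed, or a blank line, separates paragraphs.
PARAGRAPH_START = re.compile(r"\f|[ \t]*$")


def is_separator(line: str) -> bool:
    return PARAGRAPH_START.match(line) is not None


def base_direction_at(lines: List[str], line_no: int) -> str:
    """Base direction of the paragraph containing lines[line_no]."""
    # 1. Look back for the nearest separator line; the paragraph starts
    #    right after it, or at the beginning of the buffer.
    start = line_no
    while start > 0 and not is_separator(lines[start - 1]):
        start -= 1
    # 2. Scan forward for the first strong character, stopping at the
    #    end of the paragraph.
    for line in lines[start:]:
        if is_separator(line):
            break
        for ch in line:
            bidi = unicodedata.bidirectional(ch)
            if bidi == "L":
                return "L2R"
            if bidi in ("R", "AL"):
                return "R2L"
    return "L2R"  # no strong character found
```

Take the lines ["Hello", "שלום", "", "שלום", "world"]. Line 4 ("world") gets R2L because its paragraph starts with Hebrew. Under the default UAX#9 split, that line would be a paragraph of its own and come out L2R.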

So UAX#9 does define a default notion of paragraph start that differs
from the Emacs one, but it gives us a way to preserve ours, which we did.

