emacs-devel

Re: Long lines and bidi


From: Eli Zaretskii
Subject: Re: Long lines and bidi
Date: Mon, 11 Feb 2013 18:42:12 +0200

> Date: Mon, 11 Feb 2013 09:43:17 +0400
> From: Dmitry Antipov <address@hidden>
> CC: Eli Zaretskii <address@hidden>, Paul Eggert <address@hidden>
> 
> Yet another interesting profile (generated by scroll-both micro-benchmark with
> r111730) is shown below.
> 
> Input is 4K lines, each line is ~27K bytes, Imla'ei (modern Arabic) script.

Can you publish the file, or the URL where you downloaded it from?

> IIUC this R2L text with long lines should push bidi really hard,
> but... bidi core routines (by itself) are almost irrelevant in the
> profile:

Actually, that's expected, see below.

>      39.96%        emacs  emacs                          [.] scan_buffer
>      28.72%        emacs  emacs                          [.] buf_charpos_to_bytepos
>      21.82%        emacs  emacs                          [.] buf_bytepos_to_charpos
>       0.59%        emacs  emacs                          [.] re_match_2_internal
>       0.51%        emacs  emacs                          [.] sub_char_table_ref
>       0.42%        emacs  emacs                          [.] mark_object
>       0.23%        emacs  emacs                          [.] composition_gstring_width
>       0.19%        emacs  libc-2.16.so                   [.] __memcpy_ssse3_back
>       0.18%        emacs  emacs                          [.] x_produce_glyphs
>       0.17%        emacs  emacs                          [.] move_it_in_display_line_to
>       0.17%        emacs  emacs                          [.] hash_lookup
>       0.17%        emacs  emacs                          [.] Fgarbage_collect
>       0.17%        emacs  emacs                          [.] lface_hash
>       0.16%        emacs  emacs                          [.] decode_coding_utf_8
>       0.16%        emacs  emacs                          [.] face_for_font
>       0.16%        emacs  emacs                          [.] composition_gstring_p
>       0.15%        emacs  emacs                          [.] compile_pattern
>       0.15%        emacs  emacs                          [.] get_next_display_element
>       0.14%        emacs  emacs                          [.] bidi_level_of_next_char
>       0.12%        emacs  emacs                          [.] font_range
>       0.12%        emacs  emacs                          [.] bidi_fetch_char
>       0.12%        emacs  emacs                          [.] internal_equal
>       0.11%        emacs  emacs                          [.] autocmp_chars
>       0.11%        emacs  emacs                          [.] char_table_ref
>       0.11%        emacs  libgtk-3.so.0.600.4            [.] 0x0000000000115bf0
>       0.10%        emacs  emacs                          [.] next_element_from_buffer
>       0.10%        emacs  emacs                          [.] composition_update_it
>       0.10%        emacs  emacs                          [.] boyer_moore

The Arabic script is a heavy user of character compositions: they are
important for correct shaping of the glyphs, without which any speaker
of Arabic will turn away in disgust.  The fact that you see functions
like composition_update_it, composition_gstring_p,
composition_gstring_width, and sub_char_table_ref all hint towards
this.  Character compositions work by scanning the vicinity of a
composable character using regular expression matching in Lisp.  That
is why you see re_match_2_internal relatively high in the profile.
The cost of handling these compositions can obscure the cost of any
bidi reordering.  To take this factor out of the measurement, turn off
auto-composition-mode.

More importantly, you cannot easily "push bidi really hard", not with
a file that consists of predominantly RTL characters.  That's because
such a file is as easy to display as a pure LTR text: the characters
are delivered for display entirely in their logical order in the
buffer, and only laid out starting at the right margin of the window
instead of at the left margin.
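
A rough way to check how directionally mixed a test file really is
can be sketched in Python with the standard unicodedata module.  This
is not Emacs code; the function names and sample strings below are
made up for illustration.

```python
import unicodedata
from collections import Counter

def bidi_class_histogram(text):
    """Count characters by Unicode bidirectional class (L, R, AL, EN, ...)."""
    return Counter(unicodedata.bidirectional(ch) for ch in text)

def strong_rtl_ratio(text):
    """Fraction of strong-direction characters that are RTL (R or AL)."""
    hist = bidi_class_histogram(text)
    rtl = hist["R"] + hist["AL"]
    ltr = hist["L"]
    strong = rtl + ltr
    return rtl / strong if strong else 0.0

# Hypothetical samples: an Arabic-only phrase, and Latin/Hebrew/digits mixed.
pure_rtl = "\u0645\u0631\u062d\u0628\u0627 \u0628\u0627\u0644\u0639\u0627\u0644\u0645"
mixed = "abc \u05d0\u05d1\u05d2 123 def"
```

A ratio close to 1.0 (or close to 0.0) suggests the text is
directionally uniform, so reordering is unlikely to be the bottleneck.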

To exercise bidi.c, you need heavily mixed RTL and LTR text, with
digits, punctuation, and lots of embeddings and directional overrides
(using the LRE, RLE, RLO, and LRO control characters), which push and
pop the reordering stack.  Only then will the reordering of characters
become non-trivial, and you _might_ see some bidi functions as hot
spots.  I say "might" because bidi.c uses a dynamic cache which allows
it to fetch and analyze each character only once, even if reordering
jumps here and there like a young goat.  Thus, the only overhead of
reordering is the logic that decides where in the cache is the next
character to deliver for display; the cache is accessed directly (it
is implemented as a linear array).
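
A test input of the kind described above could be generated along
these lines.  This is only a sketch in Python, not something from the
Emacs sources: the word lists, probabilities, and nesting limit are
arbitrary assumptions, and PDF (U+202C) is the Unicode control that
closes an embedding or override.

```python
import random

# Explicit directional formatting characters (Unicode UAX #9).
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"

# Illustrative token pools; any LTR/RTL words would do.
LTR_WORDS = ["foo", "bar", "baz"]
RTL_WORDS = ["\u05d0\u05d1\u05d2", "\u0645\u0631\u062d\u0628\u0627"]  # Hebrew, Arabic
NEUTRALS = ["123", "4.5", ",", "(", ")", "!"]

def make_stress_line(rng, n_tokens=200, max_depth=8):
    """Build one line mixing LTR, RTL, digits, punctuation, and nested
    embeddings/overrides.  Every opened level is closed with PDF."""
    out = []
    depth = 0
    for _ in range(n_tokens):
        roll = rng.random()
        if roll < 0.15 and depth < max_depth:
            # Push a new embedding or override level.
            out.append(rng.choice([LRE, RLE, LRO, RLO]))
            depth += 1
        elif roll < 0.30 and depth > 0:
            # Pop the most recent level.
            out.append(PDF)
            depth -= 1
        else:
            out.append(rng.choice(LTR_WORDS + RTL_WORDS + NEUTRALS))
        out.append(" ")
    out.append(PDF * depth)  # close whatever is still open
    return "".join(out)

line = make_stress_line(random.Random(0))
```

Writing a few thousand such lines to a file and visiting it should
give a more demanding workload than text in a single direction.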

There could be rare pathological situations where bidi.c needs to
examine lots (and I'm talking tens or hundreds of thousands) of
characters for some simple redisplay operation.  A few of these were
discovered and taken care of during late stages of v24.1 development,
but maybe there are some more.  These typically show up as heavy usage
of bidi_fetch_char or its subroutines, or of bidi_find_paragraph_start
and its subroutines.  I haven't seen such problems since last July.


