Re: [patch suggestion] Mitigating the poor Emacs performance on huge org

emacs-orgmode

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [patch suggestion] Mitigating the poor Emacs performance on huge org

From:	Ihor Radchenko
Subject:	Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers
Date:	Sat, 23 May 2020 21:53:42 +0800

The patch is attached

featuredrawertextprop-20200523.patch
Description: Text Data


Ihor Radchenko <address@hidden> writes:

> Hello,
>
> [The patch itself will be provided in the following email]
>
> I have five updates from the previous version of the patch:
>
> 1. I implemented a simplified version of element parsing to detect
> changes in folded drawers or blocks. No computationally expensive calls
> of org-element-at-point or org-element-parse-buffer are needed now.
>
> 2. The patch is now compatible with master (commit 2e96dc639). I
> reverted the earlier change in folding drawers and blocks. Now, they are
> back to using 'org-hide-block and 'org-hide-drawer. Using 'outline would
> achieve nothing when we use text properties.
>
> 3. 'invisible text property can now be nested. This is important, for
> example, when text inside drawers contains fontified links (which also
> use 'invisible text property to hide parts of the link). Now, the old
> 'invisible spec is recovered after unfolding.
>
> 4. Some outline-* function calls in org referred to outline-flag-region
> implementation, which is not in sync with org-flag-region in this patch.
> I have implemented their org-* versions and replaced the calls
> throughout .el files. Actually, some org-* versions were already
> implemented in org, but not used for some reason (or not mentioned in
> the manual). I have updated the relevant sections of manual. These
> changes might be relevant to org independently of this feature branch.
>
> 5. I have managed to get a working version of outline folding via text
> properties. However, that approach has a big downside - folding state
> cannot be different in indirect buffer when we use text properties. I
> have seen packages relying on this feature of org and I do not see any
> obvious way to achieve different folding state in indirect buffer while
> using text properties for outline folding.
>
> -----------------------------------------------------------------------
> -----------------------------------------------------------------------
>
> More details on the new implementation for tracking changes:
>
>> Of course we can. It is only necessary to focus on changes that would
>> break the structure of the element. This does not entail a full parsing.
>
> I have limited parsing to matching beginning and end of a drawer/block.
> The basic functions are org--get-element-region-at-point,
> org--get-next-element-region-at-point, and org--find-elements-in-region.
> They are simplified versions of org-element-* parsers and do not require
> parsing everything from the beginning of the section.
>
> For now, I keep everything in org.el, but those simplified parsers
> probably belong to org-element.el.
>
>> If we can stick with `after-change-functions' (or local equivalent),
>> that's better. It is more predictable than `before-change-functions' and
>> alike.
>
> For now, I still used before/after-change-functions combination.
> I see the following problems with using only after-change-functions: 
>
> 1. They are not guaranteed to be called after every single change:
>
> From (elisp) Change Hooks:
> "... some complex primitives call ‘before-change-functions’ once before
> making changes, and then call ‘after-change-functions’ zero or more
> times"
>
> The consequence of it is a possibility that region passed to the
> after-change-functions is quite big (including all the singular changes,
> even if they are distant). This region may contain changed drawers as
> well and unchanged drawers and needs to be parsed to determine which
> drawers need to be re-folded.
>
>> And, more importantly, they are not meant to be used together, i.e., you
>> cannot assume that a single call to `before-change-functions' always
>> happens before calling `after-change-functions'. This can be tricky if
>> you want to use the former to pass information to the latter.
>
> The fact that before-change-functions can be called multiple times
> before after-change-functions, is trivially solved by using buffer-local
> changes register (see org--modified-elements). The register is populated
> by before-change-functions and cleared by after-change-functions.
>
>> Well, `before-change-fuctions' and `after-change-functions' are not
>> clean at all: you modify an unrelated part of the buffer, but still call
>> those to check if a drawer needs to be unfolded somewhere.
>
> 2. As you pointed, instead of global before-change-functions, we can use
> modification-hooks text property on sensitive parts of the
> drawers/blocks. This would work, but I am concerned about one annoying
> special case:
>
> -------------------------------------------------------------------------
> :BLAH: <inserted outside any of the existing drawers>
>
> <some text>
>
> :DRAWER: <folded>
> Donec at pede.
> :END:
> -------------------------------------------------------------------------
> In this example, the user would not be able to unfold the folder DRAWER
> because it will technically become a part of a new giant BLAH drawer.
> This may be especially annoying if <some text> is more than one screen
> long and there is no easy way to identify why unfolding does not work
> (with point at :DRAWER:).
>
> Because of this scenario, limiting before-change-functions to folded
> drawers is not sufficient. Any change in text may need to trigger
> unfolding.
>
> In the patch, I always register possible modifications in the
> blocks/drawers intersecting with the modified region + a drawer/block
> right next to the region.
>
> -----------------------------------------------------------------------
> -----------------------------------------------------------------------
>
> More details on the nested 'invisible text property implementation.
>
> The idea is to keep 'invisible property stack push and popping from it
> as we add/remove 'invisible text property. All the work is done in
> org-flag-region.
>
> This was originally intended for folding outlines via text properties.
> Since using text properties for folding outlines is not a good idea,
> nested text properties have much less use. As I mentioned, they do
> preserve link fontification, but I am not sure if it worth it for the
> overhead to org-flag-region. Keeping this here mostly in the case if
> someone has any ideas how it can be useful.
>
> -----------------------------------------------------------------------
> -----------------------------------------------------------------------
>
> More details on replaced outline-* -> org-* function calls.
>
> I have implemented org-* versions of the following functions:
>
> - outline-hide-entry
> - outline-hide-subtree
> - outline-hide-sublevels
> - outline-show-heading
> - outline-show-branches
>
> The org-* versions trivially use org-flag-region instead of
> outline-flag-region.
>
> Replaced outline-* calls where org- versions were already available:
>
> - outline-show-entry
> - outline-show-all
> - outline-show-subtree
>
> I reflected the new (including already available) functions in the
> manual and removed some defalias from org-compat.el where they are not
> needed. 
>
> -----------------------------------------------------------------------
> -----------------------------------------------------------------------
>
> Further work:
>
> 1. after-change-functions use org-hide-drawer/block-toggle to
> fold/unfold after modification. However, I just found that they call
> org-element-at-point, which slows down modifications in folded
> drawers/blocks. For example, realigning a long table inside folded
> drawer takes >1sec, while it is instant in the unfolded drawer.
>
> 2. org-toggle-custom-properties is terribly slow on large org documents,
> similarly to folded drawers on master. It should be relatively easy to
> use text properties there instead of overlays.
>
> 3. Multiple calls to before/after-change-functions is still a problem. I
> am looking into following ways to reduce this number:
>  - reduce the number of elements registered as potentially modified
>    + do not add duplicates to org--modified-elements
>    + do not add unfolded elements to org--modified-elements
>    + register after-change-function as post-command hook and remove it
>      from global after-change-functions. This way, it will be called
>      twice per command only.
>  - determine common region containing org--modified-elements. if change
>    is happening within that region, there is no need to parse
>    drawers/blocks there again.
>
> P.S.
>
>>> It was mostly an annoyance, because they returned different results on
>>> the same element. Specifically, they returned different :post-blank and
>>> :end properties, which does not sound right.
>>
>> OK. If you have a reproducible recipe, I can look into it and see what
>> can be done.
>
> Recipe to have different (org-element-at-point) and
> (org-element-parse-buffer 'element)
> -------------------------------------------------------------------------
> <point-min>
> :PROPERTIES:
> :CREATED:  [2020-05-23 Sat 02:32]
> :END:
>
>
> <point-max>
> -------------------------------------------------------------------------
>
>
> Best,
> Ihor
>
> Nicolas Goaziou <address@hidden> writes:
>
>> Hello,
>>
>> Ihor Radchenko <address@hidden> writes:
>>
>>>> As you noticed, using Org Element is a no-go, unfortunately. Parsing an
>>>> element is a O(N) operation by the number of elements before it in
>>>> a section. In particular, it is not bounded, and not mitigated by
>>>> a cache. For large documents, it is going to be unbearably slow, too.
>>>
>>> Ouch. I thought it is faster.
>>> What do you mean by "not mitigated by a cache"?
>>
>> Parsing starts from the closest headline, every time. So, if Org parses
>> the Nth element in the entry two times, it really parses 2N elements.
>>
>> With a cache, assuming the buffer wasn't modified, Org would parse
>> N elements only. With a smarter cache, with fine grained cache
>> invalidation, it could also reduce the number of subsequent parsed
>> elements.
>>
>>> The reason I would like to utilise org-element parser to make tracking
>>> modifications more robust. Using details of the syntax would make the
>>> code fragile if any modifications are made to syntax in future.
>>
>> I don't think the code would be more fragile. Also, the syntax we're
>> talking about is not going to be modified anytime soon. Moreover, if
>> folding breaks, it is usually visible, so the bug will not be unnoticed.
>>
>> This code is going to be as low-level as it can be.
>>
>>> Debugging bugs in modification functions is not easy, according to my
>>> experience.
>>
>> No, it's not. 
>>
>> But this is not really related to whether you use Element or not.
>>
>>> One possible way to avoid performance issues during modification is
>>> running parser in advance. For example, folding an element may
>>> as well add information about the element to its text properties.
>>> This will not degrade performance of folding since we are already
>>> parsing the element during folding (at least, in
>>> org-hide-drawer-toggle).
>>
>> We can use this information stored at fold time. But I'm not even sure
>> we need it.
>>
>>> The problem with parsing an element during folding is that we cannot
>>> easily detect changes like below without re-parsing.
>>
>> Of course we can. It is only necessary to focus on changes that would
>> break the structure of the element. This does not entail a full parsing.
>>
>>> :PROPERTIES: <folded>
>>> :CREATED: [2020-05-18 Mon]
>>> :END: <- added line
>>> :ID: test
>>> :END:
>>>
>>> or even
>>>
>>> :PROPERTIES:
>>> :CREATED: [2020-05-18 Mon]
>>> :ID: test
>>> :END: <- delete this line
>>>
>>> :DRAWER: <folded, cannot be unfolded if we don't re-parse after deletion>
>>> test
>>> :END:
>>
>> Please have a look at the "sensitive parts" I wrote about. This takes
>> care of this kind of breakage.
>>
>>> The re-parsing can be done via regexp, as you suggested, but I don't
>>> like this idea, because it will end up re-implementing
>>> org-element-*-parser.
>>
>> You may have misunderstood my suggestion. See below.
>>
>>> Would it be acceptable to run org-element-*-parser
>>> in after-change-functions?
>>
>> I'd rather not do that. This is unnecessary consing, and matching, etc.
>>
>>> If I understand correctly, it is not as easy.
>>> Consider the following example:
>>>
>>> :PROPERTIES:
>>> :CREATED: [2020-05-18 Mon]
>>> <region-beginning>
>>> :ID: example
>>> :END:
>>>
>>> <... a lot of text, maybe containing other drawers ...>
>>>
>>> Nullam rutrum.
>>> Pellentesque dapibus suscipit ligula.
>>> <region-end>
>>> Proin quam nisl, tincidunt et, mattis eget, convallis nec, purus.
>>>
>>> If the region gets deleted, the modification hooks from chars inside
>>> drawer will be called as (hook-function <region-beginning>
>>> <region-end>). So, there is still a need to find the drawer somehow to
>>> mark it as about to be modified (modification hooks are ran before
>>> actual modification).
>>
>> If we can stick with `after-change-functions' (or local equivalent),
>> that's better. It is more predictable than `before-change-functions' and
>> alike.
>>
>> If it is a deletion, here is the kind of checks we could do, depending
>> on when they are performed.
>>
>> Before actual changes :
>>
>>   1. The deletion is happening within a folded drawer (unnecessary step
>>      in local functions).
>>   2. The change deleted the sensitive line ":END:".
>>   3. Conclusion : unfold.
>>
>> Or, after actual changes :
>>
>>   1. The deletion involves a drawer.
>>   2. Text properties indicate that the beginning of the propertized part
>>      of the buffer start with org-drawer-regexp, but doesn't end with
>>      `org-property-end-re'. A "sensitive part" disappeared!
>>   3. Conclusion : unfold
>>
>> This is far away from parsing. IMO, a few checks cover all cases. Let me
>> know if you have questions about it.
>>
>> Also, note that the kind of change you describe will happen perhaps
>> 0.01% of the time. Most change are about one character, or a single
>> line, long.
>>
>>> The only difference between using modification hooks and
>>> before-change-functions is that modification hooks will trigger less
>>> frequently. 
>>
>> Exactly. Much less frequently. But extra care is required, as you noted
>> already.
>>
>>> Considering the performance of org-element-at-point, it is
>>> probably worth doing. Initially, I wanted to avoid it because setting a
>>> single before-change-functions hook sounded cleaner than setting
>>> modification-hooks, insert-behind-hooks, and insert-in-front-hooks.
>>
>> Well, `before-change-fuctions' and `after-change-functions' are not
>> clean at all: you modify an unrelated part of the buffer, but still call
>> those to check if a drawer needs to be unfolded somewhere.
>>
>> And, more importantly, they are not meant to be used together, i.e., you
>> cannot assume that a single call to `before-change-functions' always
>> happens before calling `after-change-functions'. This can be tricky if
>> you want to use the former to pass information to the latter.
>>
>> But I understand that they are easier to use than their local
>> counterparts. If you stick with (before|after)-change-functions, the
>> function being called needs to drop the ball very quickly if the
>> modification is not about folding changes. Also, I very much suggest to
>> stick to only `after-change-functions', if feasible (I think it is), per
>> above.
>>
>>> Moreover, these text properties would be copied by default if one uses 
>>> buffer-substring. Then, the hooks will also trigger later in the yanked
>>> text, which may cause all kinds of bugs.
>>
>> Indeed, that would be something to handle specifically. I.e.,
>> destructive modifications (i.e., those that unfold) could clear such
>> properties.
>>
>> It is more work. I don't know if it is worth the trouble if we can get
>> out quickly of `after-change-functions' for unrelated changes.
>>
>>> It was mostly an annoyance, because they returned different results on
>>> the same element. Specifically, they returned different :post-blank and
>>> :end properties, which does not sound right.
>>
>> OK. If you have a reproducible recipe, I can look into it and see what
>> can be done.
>>
>> Regards,
>>
>> -- 
>> Nicolas Goaziou
>
> -- 
> Ihor Radchenko,
> PhD,
> Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
> State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong 
> University, Xi'an, China
> Email: address@hidden, address@hidden

-- 
Ihor Radchenko,
PhD,
Center for Advancing Materials Performance from the Nanoscale (CAMP-nano)
State Key Laboratory for Mechanical Behavior of Materials, Xi'an Jiaotong 
University, Xi'an, China
Email: address@hidden, address@hidden

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers, (continued)

Prev by Date: Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers
Next by Date: Re: org-babel-demarcate-block doesn't respect case preference
Previous by thread: Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers
Next by thread: Re: [patch suggestion] Mitigating the poor Emacs performance on huge org files: Do not use overlays for PROPERTY and LOGBOOK drawers
Index(es):
- Date
- Thread