emacs-devel

Re: Indentation and gc


From: Ihor Radchenko
Subject: Re: Indentation and gc
Date: Sun, 12 Mar 2023 14:50:33 +0000

Eli Zaretskii <eliz@gnu.org> writes:

>> Well. I do realize that there should be a limit, which is why I put it
>> as 100Mb.
>
> Yes, but what is that 100 MiB number based on? any measurements of the
> time it takes to run GC behind that number? or just a more-or-less
> arbitrary value that "seems right"?

That's what I tried to explain below. In the end, I took the Emacs Lisp
object usage, divided it by 10, and rounded down to the nearest hundred
megabytes. I did not try to be precise - just accurate to within an
order of magnitude.
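
For reference, the live-object figure can be estimated from a running
session without any third-party package. A rough sketch (the exact shape
of `garbage-collect's return value varies slightly across Emacs
versions, so treat this as illustrative):

```elisp
;; Estimate live Lisp data in bytes.  Each entry returned by
;; `garbage-collect' looks like (TYPE SIZE USED FREE), where SIZE is
;; bytes per object and USED is the live count; FREE may be absent.
(require 'cl-lib)
(let ((total 0))
  (dolist (entry (garbage-collect))
    (when (and (consp entry)
               (numberp (nth 1 entry))
               (numberp (nth 2 entry)))
      (cl-incf total (* (nth 1 entry) (nth 2 entry)))))
  (/ total 1024.0 1024.0))   ; MiB of live Lisp objects
```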

>> Strictly speaking, GC pauses scale with heap size.
>
> If by "heap size" you mean the total size of heap-allocated memory in
> the Emacs process, then this is inaccurate.  GC traverses only the
> Lisp objects, whereas Emacs also allocates memory from the heap for
> other purposes.  It also allocates memory from the VM outside of the
> "normal" heap -- that's where the buffer text memory usually comes
> from, as well as any large enough chunk of memory Emacs needs.

Thanks for the clarification.

>> Increasing GC threshold
>> will have two effects on the heap size: (1) thresholds larger than normal
>> heap size will dominate the GC time - Emacs will need to traverse all
>> the newly added data to be GCed;
>
> You seem to assume that GC traverses only the Lisp objects
> newly-allocated since the previous GC.  This is incorrect: it
> traverses _all_ of the Lisp objects, both old and new.

No, I am aware that GC traverses all the Lisp objects.
That's why I said that a large threshold only increases GC time
significantly when the threshold is comparable to the heap size (the
part containing Lisp objects). Otherwise, the heap size mostly
determines how long a single GC takes.

>> (2) too large thresholds will cause heap fragmentation, also
>> increasing the GC times as the heap will expand.
>
> Not sure why do you think heap fragmentation increases monotonically
> with larger thresholds.  Maybe you should explain what you consider
> "heap fragmentation" for the purposes of this discussion.

See my other reply with my measurements of memory-limit vs.
gc-cons-threshold. I assume that this scaling will not differ
drastically between users. We could ask others to repeat my
measurements, though.

>> I think that (2) is the most important factor for real world scenarios
>
> Not sure why you think so.  Maybe because I don't have a clear idea
> what kind of fragmentation you have in mind here.

I meant that as long as gc-cons-threshold is much lower (by 10x or so)
than the heap size (the Lisp object part), we do not need to worry
about (1). Only (2) remains a concern.

>> Emacs' default gives some lower safe bound on the threshold - it is
>> `gc-cons-percentage', defaulting to 1% of the heap size.
>
> Actually, the default value of gc-cons-percentage is 0.1, i.e. 10%.
> And it's 10% of the sum total of all live Lisp objects plus the number
> of bytes allocated for Lisp objects since the last GC.  Not 10% of the
> heap size.

Interesting. I thought the value was expressed in percent.
Then I should mention that I intentionally reduced gc-cons-percentage
in my testing, which I detailed in my other message.
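
For completeness, the testing setup amounts to something like the
following (values are the ones from my messages; the low percentage just
keeps the percentage-based trigger from masking the threshold):

```elisp
;; Make `gc-cons-threshold' the effective GC trigger by keeping the
;; `gc-cons-percentage' floor negligible.
(setq gc-cons-percentage 0.001)
(setq gc-cons-threshold (* 250 1024 1024)) ; 250Mb, as in my tests
```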

With Emacs defaults (0.1 gc-cons-percentage), I get:

memory-limit (kb)  gcs-done  gc-elapsed (s)
           526852       103     4.684100536

This is equivalent to a gc-cons-threshold between 4Mb and 8Mb.

A 10% percentage also means that the default 800kb gc-cons-threshold
does not matter much even in emacs -Q -- such a session already uses
over 8Mb of memory, so gc-cons-percentage should dominate the GC
trigger, AFAIU.
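
Anyone can reproduce these three figures in their own session with:

```elisp
;; `memory-limit' is Emacs's own estimate of memory used, in units of
;; 1024 bytes; `gcs-done' and `gc-elapsed' count the GCs performed and
;; the total GC time since startup.
(list (memory-limit) gcs-done gc-elapsed)
```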

Note that my proposed 100Mb gc-cons-threshold limit would correspond to
1Gb of live Lisp objects. For reference, this is what I have now (data
obtained via the memory-usage package):

   Total in lisp objects: 1.33GB (live 1.18GB, dead  157MB)

Even if Emacs uses several hundred Mbs of Lisp objects (a typical
scenario with third-party packages), my suggested gc-cons-threshold
does not look too risky, while still reducing GC when loading init.el
(when the heap is still small).
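
To make the init.el case concrete, the usual pattern (illustrative
values) is to raise the threshold early in init and drop it back once
startup finishes:

```elisp
;; Early in init.el: avoid GC pauses while loading packages.
(setq gc-cons-threshold (* 100 1024 1024)) ; the proposed 100Mb cap

;; After startup, fall back to a moderate value so the heap does not
;; keep growing unchecked.
(add-hook 'emacs-startup-hook
          (lambda () (setq gc-cons-threshold (* 8 1024 1024))))
```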

> How large is what you call "heap size" in your production session, may
> I ask?

See the above.

>> AFAIU, routine throw-away memory allocation in Emacs is not directly
>> correlated with the memory usage - it rather depends on the usage
>> patterns and the packages being used. For example, it takes about 10
>> complex helm searches for me to trigger my 250Mb threshold - 25Mb per
>> helm command.
>
> This calculation is only valid if each of these 10 commands conses
> approximately the same amount of Lisp data.  If that is not so, you
> cannot really divide 250 MiB by 10 and claim that each command used up
> that much Lisp memory.  That's because GC is _not_ triggered as soon
> as Emacs crosses the threshold, it is triggered when Emacs _checks_
> how much was consed since last GC and discovers it consed more than
> the threshold.  The trigger for testing is unrelated to crossing the
> threshold.

Sure. I ran exactly the same command repeatedly, just to get an idea of
what is possible. Do not interpret my results as precise - they are
only meant to give a sense of the order of magnitude of the allocated
memory.

>> To get some idea about the impact of gc-cons-threshold on memory
>> fragmentation, I compared the output of `memory-limit' with 250Mb vs.
>> default 800kb threshold:
>> 
>>  250Mb threshold - 689520 kb memory
>>  800kb threshold - 531548 kb memory
>> 
>> The memory usage is clearly increased, but not catastrophically, despite
>> using rather large threshold.
>> 
>> Of course, it is just init.el, which is loaded once.
>
> Correction: it is _your_ init.el.  We need similar statistics from
> many users and many different usage patterns; only then we will be
> able to draw valid conclusions.

Sure. Should we formally call for such benchmarks?

>> Memory fragmentation as a result of routine Emacs usage may cause
>> more significant memory usage increase.
>
> Actually, Emacs tries very hard to avoid fragmentation.  That's why it
> compacts buffers, and that's why it can relocate buffer text and
> string data.

Indeed. But despite all these best efforts, fragmentation still
increases if we delay GCs, right?

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>


