bug-gnu-emacs
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#39413: 26.2; Emacs gets hung


From: ISHIKAWA,chiaki
Subject: bug#39413: 26.2; Emacs gets hung
Date: Wed, 29 Apr 2020 06:36:46 +0900
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101 Thunderbird/68.7.0

Hi,

I am back to this strange guru meditation of emacs.
Emacs would eventually begin working, but the silent period where no key
or mouse event is acknowledged is rather longish.

Several weeks after I installed GNU Emacs 28.0.50 (that is what
|emacs-version| variable says. I installed from git repo.) , I began
noticing the occasional long pause (no response) again.

That is noticeable when I was running a mozilla thunderbird build
script in *shell* buffer. (M-X shell RETURN)
Obviously, the large log may cause some resource issues, but, say
the year before, I might have gotten the warning about memory running
low (I can't recall the exact warning. It asked something about "Am I
sure to do the operation?", like for example, when I invoked
|erase-buffer| to the very large *shell* buffer that contains many log
lines. Emacs responded something about dire situation and if I was
really invoking the function, and not by mistake, etc.
However, it never got seemingly hung for like 10-15 seconds.

Yesterday, this long pause happened again, and so I attached gdb to
the running emacs process and the following is what I found.

Basically, sweep_conses() takes unusually long time.
(It is possible that by the time I noticed the long hung and attached gdb, the major
portion of the problematic code had already been executed.)

This is the first stac ktrace when I attach gdb to emacs.
So it looks that it was trying to free cons cells.

#0  lisp_align_free (block=<optimized out>) at alloc.c:1266
#1  0x000055bc512500f2 in sweep_conses () at alloc.c:6907
#2  0x000055bc51255b27 in gc_sweep () at alloc.c:7134
#3  garbage_collect () at alloc.c:6105
#4  0x000055bc51255ed1 in maybe_garbage_collect () at alloc.c:5968
#5  0x000055bc51271785 in maybe_gc () at lisp.h:5068
#6  Ffuncall (nargs=4, args=args@entry=0x7fff92901a70) at eval.c:2778
#7  0x000055bc512a5838 in exec_byte_code
    (bytestr=<optimized out>, vector=<optimized out>, maxdepth=<optimized out>, args_template=<optimized out>, nargs=<optimized out>, args=<optimized out>) at bytecode.c:633 #8  0x000055bc51271707 in Ffuncall (nargs=4, args=args@entry=0x7fff92901f38) at eval.c:2808

    ... omitted ...

I invoked "finish" command to gdb. (A very handy command, indeed.)

finish
Run till exit from #0  lisp_align_free (block=<optimized out>) at alloc.c:1266
0x000055bc512500f2 in sweep_conses () at alloc.c:6907
6907              lisp_align_free (cblk);

 where
#0  0x000055bc512500f2 in sweep_conses () at alloc.c:6907
#1  0x000055bc51255b27 in gc_sweep () at alloc.c:7134
#2  garbage_collect () at alloc.c:6105
#3  0x000055bc51255ed1 in maybe_garbage_collect () at alloc.c:5968
    ... omitted ...

Another finish

Run till exit from #0  0x000055bc512500f2 in sweep_conses () at alloc.c:6907
gc_sweep () at alloc.c:7135
7135      sweep_floats ();
(gdb)


sweep_conses() took rather a long time. Maybe five seconds or so. (Could
be longer almost 10 seconds...)

Another finish.

 sweep_floats ();
(gdb) finish
Run till exit from #0  gc_sweep () at alloc.c:7135
garbage_collect () at alloc.c:7141
7141      check_string_bytes (!noninteractive);
(gdb)

sweep_float() ends quickly.

Another finish.

finish
Run till exit from #0  garbage_collect () at alloc.c:7141
0x000055bc51271785 in maybe_gc () at lisp.h:5068
5068        maybe_garbage_collect ();
(gdb)

check_string_bytes finished quickly.

finish
Run till exit from #0  0x000055bc51271785 in maybe_gc () at lisp.h:5068
deliver_input_available_signal (sig=29) at keyboard.c:7198
7198      deliver_process_signal (sig, handle_input_available_signal);
(gdb) Quit
(gdb) where

The above one took a fraction of a second. It was noticeable.
finish
Run till exit from #0  deliver_input_available_signal (sig=29) at keyboard.c:7198
<signal handler called>
(gdb) finish
Run till exit from #0  <signal handler called>
0x000055bc51271785 in maybe_gc () at lisp.h:5068
5068        maybe_garbage_collect ();
(gdb)

immediate return

Well, actually after a few more finish (which completed immediately),
emacs was responsive again.

I had a few similar experiences yesterday.


My observation:

So my take on this is sweep_conses() takes a rather long time and not
letting emacs to take any input. (And there could be other functions
before it.)

Question.: Has there been any change in emacs's alloc.c code in the
last year or two?  (Even an seemingly innocuous change might have
introduced a very bad CPU cache behavior that makes sweep_conses() to
perform badly. Or maybe it is now allowed to run without
time-constraint?)

I suspect that since the apparent long hung occurred during the time
when |*shell*| buffer was accumulating many lines from a largish
running build process, maybe the string allocation was not catching
up and causing excessive GC. In that case, the ordering of reclaiming
functions could be arranged to find string first?
(Addition to the buffer of lines read from a child process is
basically an insertion at the end of the buffer of strings, correct?)

OR maybe I am hitting the age old adagio of "GC considered
harmful". [1] Simply my virtual space is too large to be handled
efficiently.  I remember reading the paper the first time it appeared,
but I never thought it would hit me on my home PC. (The PC at the time
had 64KB-128KB of memory).

I am running emacs under Debian GNU/linux inside VirtualBox which is
assigned 16GB memory.

My suggestion:

I would think sweep_conses() may want to quit early, if possible, after
obtaining certain amount of space (or for a casual GC invoked
periodically, may want to quit after one second or something like
that. That is, we may want to set a elapsed time limit.)
In a really tight situation, we can invoke the reclaiming functions
with a flag telling them to use as much CPU/Elapsed time to reclaim as many octets as possible,
but for a periodic GC, it might make sense to return half-heartedly.

Oh, wait, have I set a variable to control periodic GC and forgot
about it?

I checked. I don't think I tweaked the following variables. C-H k output.

gc-cons-threshold is a variable defined in ‘C source code’.
Its value is 800000

gc-cons-percentage is a variable defined in ‘C source code’.
Its value is 0.1

As of now in a different incarnation of emacs process (this is a
differnt process from the one which I attached gdb.),

gc-elapsed is a variable defined in ‘C source code’.
Its value is 68.3795433

Hope this helps us shedding some light on the guru meditation of emacs
probably due to an unusually longish GC.

TIA

Chiaki

PS: Given that this may have something to do with the paging activity
of the emacs process, the linux version change (from 4 to 5) may have
affected the paging behavior of the program, but it is hard to say.  I
now see this linux instance is using 293M SWAP.  Compiling mozilla
thunderbird source runs |make -j6| and g++-9 compiles CPP source files
which are very large and so they exert heavy memory pressure. A bad
memory access pattern of emacs, not friendly to locality of access,
would add to page faults. Except, |xosview| program does not show any
paging activity. I am not sure if |xosview| is correct or not.


[1] "Address/memory management for a gigantic LISP environment or,
     GC considered harmful",  Jon L. White,
 LFP '80: Proceedings of the 1980 ACM conference on LISP and functional programming,
August 1980 Pages 119–127
https://doi.org/10.1145/800087.802797
https://dl.acm.org/doi/10.1145/1317203.1317206


On 2020/02/23 2:53, chiaki-ishikawa-thunderbird-account wrote:
Thank you. I will taken note of "finish" and see if I can observe the return from alloc.c code
if I get to see the next obvious hung.

Chiaki

On 2020/02/22 2:27, Noam Postavsky wrote:
tags 39413 + unreproducible
quit

chiaki-ishikawa-thunderbird-account <chiaki.ishikawa@ubin.jp> writes:

I printed stack backtrace and Emacs seemed to be in memory allocator.
Then I let it continue.
Still the emacs screen was locked up.
So I interrupted the execution, and printed stack backtrace.
Again Emacs was in memory allocator.
Eventually I gave up and killed emacs.

It seems that garbage collection or
routine in alloc.c was looping.
That's not enough evidence to show that the garbage collection was
looping.  There could be some higher level loop that does a lot of
allocation, so that if you stop at some random point you would have a
high probability of stopping in the alloc.c.  If you catch this in gdb
again, try running the 'finish' a few times to see if it can leave the
alloc.c code.

This is not a repeatable bug. I have no idea how to reproduce this. It
occurs every now and then.









reply via email to

[Prev in Thread] Current Thread [Next in Thread]