[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

wait_reading_process_ouput hangs in certain cases (w/ patches)

From: Matthias Dahl
Subject: wait_reading_process_ouput hangs in certain cases (w/ patches)
Date: Tue, 24 Oct 2017 20:52:20 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0

Hello all,

recursively calling accept-process-output with the first call waiting
for the output of a specific process, can hang Emacs (as-in: it is
waiting forever but can be canceled of course) even though it should
not be the case since data was read.

This is actually not all that uncommon. One example of this is a hang
seen with Magit opening its COMMIT_MSG buffer, reported here [1]. I've
myself run into this problem continuously which is why I started to
debug it in the first place.

The hang with Magit happens in flyspell.el which waits for output from
its spellchecker process through accept-process-output and specifies
that specific process as wait_proc. Now depending on timing (race),
wait_reading_process_output can call the pending timers... which in
turn can call accept-process-output again. This almost always leads
to the spellchecker output being read back in full, so there is no
more data left to be read. Thus the original accept-process-output,
which called wait_reading_process_output, will wait for the data to
become available forever since it has no way to know that those have
already been read.

Naturally one could argue that a timeout should be specified when
calling accept-process-output. But nevertheless I still think this is
a breach of contract. The caller expects accept-process-output to
return as promised when data has been read from the specified process
and it clearly isn't always the case and can hang forever, depending
on timing and the specifics of the data being read.

The attached patches fix this by introducing process output read
accounting -- simply counting the bytes read per process. And using
that data to strategically check in wait_reading_process_output for
data being read while we handed over control to timers and/or filters.

I haven't seen any ill side-effects in my tests and it clearly fixes
the problem seen in [1] as well as I would wager quite a few others
that were probably seen by user's of all kinds of setups that seemed
unpredictable and mysterious and never debugged.

As a side-note: I decided against an artificial metric and went with
simply counting the bytes read, since this does come in handy when
doing debugging and being able to see how much data was read from a
process during specific time intervals.

Also, this still leaves the possibility that wait_reading_process_output
could wait forever while being called without wait_proc and a timeout
set. This could be mitigated as well by some sort of a tick counter that
only increases when no wait_proc was specified and data from processes
were processed. I decided against implementing that for now since imho
the chances of that happening are marginal, if at all present. OTOH,
the semantics in that case are not all that clear and would further add
complexity to an already rather unhealthy function. I am naturally open
to other opinions and implementing this as well if requested. :-)

Any suggestions and/or comments are very welcome, as always.

Thanks for the patience to read this longish post. :-)

So long,

[1] https://github.com/magit/magit/issues/2915

Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu

Attachment: 0001-Add-process-output-read-accounting.patch
Description: Text Data

Attachment: 0002-src-process.c-wait_reading_process_output-Fix-wait_p.patch
Description: Text Data

reply via email to

[Prev in Thread] Current Thread [Next in Thread]