Re: wait_reading_process_ouput hangs in certain cases (w/ patches)

From: Matthias Dahl
Subject: Re: wait_reading_process_ouput hangs in certain cases (w/ patches)
Date: Thu, 26 Oct 2017 16:07:31 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0

Hello Eli,

Thanks for taking the time to review the issue and patches. :-)

On 25/10/17 16:53, Eli Zaretskii wrote:

> I'm not sure I understand the situation where this happens; can you
> elaborate?

Sure. Let's take the Magit issue [1] as an example:

When committing, Magit prepares a COMMIT_MSG buffer and does some
process magic of its own which is pretty much irrelevant for this.

At some point during that, while we are already in an instance of
wait_reading_process_output (due to sit_for), the post-command-hooks are
run.

And here things get interesting. Eventually flyspell-post-command-hook
is run, which executes flyspell-word synchronously. That basically
writes the word to be checked to the spellchecker process, waits for
the results via accept-process-output, and goes on. Of special note
here is that it specifies a wait_proc (the spellchecker process) and no
timeout whatsoever.

The output from the spellchecker is usually there instantaneously, so
the wait is unnoticeable, unless the wait_reading_process_output
instance that was invoked through that specific accept-process-output
decides to run the timers.

And here comes the catch: At this point, usually the spellchecker output
is already available but not yet read. When the timers run, one of them
calls accept-process-output again which will read the entire available
output of the spellchecker process. Since there will be no more data on
that fd unless some interaction happens with the process, our original
call to accept-process-output/wait_reading_process_output will wait
endlessly for the data to become available (due to wait_proc being set
without a timeout).

Thus, it appears that Magit hangs while in truth, flyspell hangs waiting
for spellchecker results that have already been read.

The gist of it is: If we have an active wait_reading_process_output call
with a wait_proc set but no timeout that calls out to either timers or
filters, it is entirely possible that those directly or indirectly call
us again recursively, thus reading the output we are waiting for without
us ever noticing it, if no further output becomes available in addition
to what was read unnoticed... like it happens with flyspell.

That is what my patches fix: They simply add a bytes-read counter to
each process structure, which we can check for changes at strategically
relevant points to decide whether some data came back unnoticed and, if
so, break out of wait_reading_process_output.
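
To make the idea a bit more concrete, here is a rough toy model in
Python. It is only a sketch and not the actual patch: Proc,
nbytes_read, wait_for_output and max_rounds are hypothetical names made
up for illustration, and max_rounds exists purely so the example
terminates where the real call would block forever.

```python
class Proc:
    """Toy stand-in for a process object (not Emacs's real structure)."""

    def __init__(self, pending):
        self.pending = pending    # output sitting on the fd, not yet read
        self.nbytes_read = 0      # hypothetical running total of bytes read

    def read_available(self):
        data, self.pending = self.pending, b""
        self.nbytes_read += len(data)
        return data


def wait_for_output(proc, run_timers, check_counter, max_rounds=5):
    """Return True once output from proc is noticed, False on a would-be hang.

    max_rounds only exists so the demo terminates; the situation described
    above has no timeout and would block forever instead.
    """
    for _ in range(max_rounds):
        before = proc.nbytes_read
        run_timers()              # may recursively read proc's output
        if check_counter and proc.nbytes_read != before:
            return True           # data was consumed while we called out
        if proc.read_available():
            return True           # normal path: we read the data ourselves
    return False


def demo(check_counter):
    proc = Proc(b"* ok\n")
    # A timer that, like the recursive accept-process-output, drains proc.
    return wait_for_output(proc, proc.read_available, check_counter)


hangs_without_counter = not demo(check_counter=False)
returns_with_counter = demo(check_counter=True)
```

Without the counter check, the outer wait never sees any data because
the timer already consumed it; with the check, the change in
nbytes_read is enough to break out.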

I know, flyspell should do its business asynchronously and also specify
a timeout since it is being run through hooks. Those are bugs in
themselves. But I also think that wait_reading_process_output violates
its contract and is buggy in this regard as well, since it should
function properly even if it calls out to filters or timers -- and it
clearly does not. I would wager that more hangs seen in the wild, which
were never debugged, could be attributed to this very bug.

I hope my rambling speech was somewhat helpful and clear and I could get
the problem described sufficiently without being too confusing. :-)

If there are any question marks left hanging over your head, please
don't hesitate to ask and I will try my best to clear them up -- but it
might end up being another longish mail, so be warned. ;)


[1] https://github.com/magit/magit/issues/2915

Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu
