Re: Examples of concurrent coproc usage?
From: Carl Edquist
Subject: Re: Examples of concurrent coproc usage?
Date: Thu, 14 Mar 2024 04:58:48 -0500 (CDT)
[My apologies up front for the length of this email. The short story is I
played around with the multi-coproc support: the fd closing seems to work
fine to prevent deadlock, but I found one bug apparently introduced with
multi-coproc support, and one other coproc bug that is not new.]
On Mon, 11 Mar 2024, Zachary Santer wrote:
Was "RFE: enable buffering on null-terminated data"
On Mon, Mar 11, 2024 at 7:54 AM Carl Edquist <edquist@cs.wisc.edu> wrote:
(Kind of a side-note ... bash's limited coprocess handling was a long
standing annoyance for me in the past, to the point that I wrote a bash
coprocess management library to handle multiple active coprocesses and
give convenient methods for interaction. Perhaps the trickiest bit
about multiple coprocesses open at once (which I suspect is the reason
support was never added to bash) is that you don't want the second and
subsequent coprocesses to inherit the pipe fds of prior open
coprocesses. This can result in deadlock if, for instance, you close
your write end to coproc1, but coproc1 continues to wait for input
because coproc2 also has a copy of a write end of the pipe to coproc1's
input. So you need to be smart about subsequent coprocesses first
closing all fds associated with other coprocesses.
https://lists.gnu.org/archive/html/help-bash/2021-03/msg00296.html
https://lists.gnu.org/archive/html/help-bash/2021-04/msg00136.html
Oh hey! Look at that. Thanks for the links to this thread - I gave them
a read (along with the old thread from 2011-04). I feel a little bad I
missed the 2021 discussion.
You're on the money, though there is a preprocessor directive you can
build bash with that will allow it to handle multiple concurrent
coprocesses without complaining: MULTIPLE_COPROCS=1.
Who knew! Thanks for mentioning it. When I saw that "only one active
coprocess at a time" was _still_ listed in the bugs section in bash 5, I
figured multiple coprocess support had just been abandoned. Chet, that's
cool that you implemented it.
I kind of went all-out on my bash coprocess management library though
(mostly back in 2014-2016) ... It's pretty feature-rich and pleasant to
use -- to the point that I don't think there is any going-back to bash's
internal coproc for me, even with multiple coprocess support. I
implemented it with shell functions, so it doesn't rely on compiling
anything or the latest version of bash being present. (I even added bash3
support for older systems.)
Chet Ramey's sticking point was that he hadn't seen coprocesses used
enough in the wild to satisfactorily test that his implementation did in
fact keep the coproc file descriptors out of subshells.
To be fair, coproc is kind of a niche feature. But I think more people
would play with it if it were less awkward to use and if they felt free to
experiment with multiple coprocs.
By the way, I agree with Chet's exact description of the problems
here:
https://lists.gnu.org/archive/html/help-bash/2021-03/msg00282.html
The issue is separate from the stdio buffering discussion; the issue here
is with child processes (and I think not foreground subshells, but
specifically background processes, including coprocesses) inheriting the
shell's fds that are open to pipes connected to an active coprocess.
Not getting a sigpipe/write failure results in a coprocess sitting around
longer than it ought to, but it's not obvious (to me) how this leads to
deadlock: the shell has already closed its read end of the pipe to that
coprocess, so you aren't going to hang trying to read from it.
On the other hand, a coprocess not seeing EOF will cause deadlock pretty
readily, especially if it processes all its input before producing output
(as with wc, sort, sha1sum). Trying to read from the coprocess will hang
indefinitely if the coprocess is still waiting for input, which is the
case if there is another copy of the write end of its read pipe open
somewhere.
If you've got examples you can direct him to, I'd really appreciate it.
[My original use cases for multiple coprocesses were (1) for
programmatically interacting with multiple command-line database clients
together, and (2) for talking to multiple interactive command-line game
engines (othello) to play each other.
Perl's IPC::Open2 works, too, but it's easier to experiment on the fly in
bash.
And in general having the freedom to play with multiple coprocesses helps
mock up more complicated pipelines, or even webs of interconnected
processes.]
But you can create a deadlock without doing anything fancy.
Well, *without multi-coproc support*, here's a simple wc example; first
with a single coproc:
$ coproc WC { wc; }
$ exec {WC[1]}>&-
$ read -u ${WC[0]} X
$ echo $X
0 0 0
This works as expected.
But if you try it with a second coproc (again, without multi-coproc
support), the second coproc will inherit copies of the shell's read and
write pipe fds to the first coproc, and the read will hang (as described
above), as the first coproc doesn't see EOF:
$ coproc WC { wc; }
$ coproc CAT { cat; }
$ exec {WC[1]}>&-
$ read -u ${WC[0]} X
# HANGS
But this can be observed even before attempting the read that hangs.
You can 'ps' to see the user shell (bash), the coprocs' shells (bash), and
the coprocs' commands (wc & cat). Then 'ls -l /proc/PID/fd/' to see what
they have open:
- The user shell has its copies of the read & write fds open for both
coprocs (as it should)
- The coproc commands (wc & cat) each have only a single read & write pipe
open, on fd 0 & 1 (as they should)
- The first coproc's shell (WC) has only a single read & write pipe open,
on fd 0 & 1 (as it should)
- The second coproc's shell (CAT) has its own read & write pipes open, on
fd 0 & 1 (good), but it also has a copy of the user shell's read & write
pipe fds to the first coproc (WC) open (on fd 60 & 63 in this case, which
it inherited when forking from the user shell)
(And in general, later coproc shells will have stray copies of the user
shell's r/w ends from all previous coprocs.)
So, you can examine the situation after setting up coprocs, to see if all
the coproc-related processes have just two pipes open (on fd 0 & 1). If
this is the case, I think that suffices to convince me anyway that no
deadlocks related to stray open fds can happen. But if any of them has
other pipes open (inherited from the user shell), that indicates the
problem.
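(If you want to run this check without wading through full ps output,
here is a minimal sketch; it leans on the NAME_PID variable that coproc
sets, and assumes a Linux-style /proc:)
$ coproc WC { wc; }
$ coproc CAT { cat; }
$ # list only the pipe fds open in the second coproc's shell:
$ ls -lgo /proc/$CAT_PID/fd/ | grep pipe
$ # healthy output shows pipes only on fd 0 & 1; any extra pipe fds
$ # (like the 60 & 63 above) are stray copies of the shell's WC pipes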
I tried compiling the latest bash with MULTIPLE_COPROCS=1 (version
5.2.21(1)) to test out the multi-coproc support.
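(For anyone who wants to reproduce this: I believe something like the
following works with bash's usual autoconf build, though the exact flags
may vary on your system:)
$ ./configure CFLAGS='-g -O2 -DMULTIPLE_COPROCS=1'
$ make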
I tried standing up the above WC and CAT coprocs, together with some
others to check that the behavior looked ok for pipelines also (which I
think was one of Chet's concerns):
$ coproc WC { wc; }
$ coproc CAT { cat; }
$ coproc CAT3 { cat | cat | cat; }
$ coproc CAT4 { cat | cat | cat | cat; }
$ coproc CATX { cat ; }
And as far as the fd situation goes, everything checks out: the user shell
has fds open to all the coprocs, and the coproc shells & coproc commands
(including all the cats in the pipelines) have only a single read & write
pipe open on fd 0 & 1. So, the multi-coproc code seems to be closing the
shell's copies correctly.
[The examples are boring, but their point is just to investigate the
stray-fd question.]
HOWEVER!!!
Unexpectedly, the new multi-coproc code seems to close the user shell's
end of a coprocess's pipes, once the coprocess has terminated. When
compiled with MULTIPLE_COPROCS=1, this is true even if there is only a
single coproc:
$ coproc WC { wc; }
$ exec {WC[1]}>&-
[1]+ Done coproc WC { wc; }
# WC var gets cleared!!
# shell's ${WC[0]} is also closed!
# now, can't do:
$ read -u ${WC[0]} X
$ echo $X
I'm attaching a "bad-coproc-log.txt" with more detailed ps & ls output
examining the open fds at each step, to make it clear what's happening.
This is a bug. The shell should not automatically close its read pipe to
a coprocess that has terminated -- it should stay open to read the final
output, and the user should be responsible for closing the read end
explicitly.
This is more obvious for commands that wait until they see EOF before
generating any output (wc, sort, sha1sum). But it's also true for any
command that produces output (filters (sed) or generators (ls)). If the
shell's read end is closed automatically, any final output waiting in the
pipe will be discarded.
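(To illustrate with a filter, here's a hypothetical session following the
same pattern as the wc example above:)
$ coproc SED { sed 's/^/got: /'; }
$ printf 'hi\n' >&${SED[1]}
$ exec {SED[1]}>&-
$ # sed sees EOF, writes "got: hi", and exits; if the shell then
$ # auto-closes ${SED[0]}, that final line is lost before we can read it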
It also invites trouble if the shell variable that holds the fds gets
removed unexpectedly when the coprocess terminates. (Suddenly the
variable expands to an empty string.) It seems to me that the proper time
to clear the coproc variable (if at all) is after the user has explicitly
closed both of the fds. *Or* else add an option to the coproc keyword to
explicitly close the coproc - which will close both fds and clear the
variable.
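(With today's syntax, that explicit close would look something like this;
I stick to the exec form here, since the coproc option is hypothetical:)
$ exec {WC[0]}<&- {WC[1]}>&-
$ # both ends now closed by the user; clearing WC at this point is safe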
...
Separately, I consider the following coproc behavior to be weird, fragile,
and broken.
If you fg a coproc, then stop and bg it, it dies. Why? Apparently the
shell abandons the coproc when it is stopped, closes the pipe fds for it,
and clears the fd variable.
$ coproc CAT { cat; }
[1] 10391
$ fg
coproc CAT { cat; }
# oops!
^Z
[1]+ Stopped coproc CAT { cat; }
$ echo ${CAT[@]} # what happened to the fds?
$ ls -lgo /proc/$$/fd/
total 0
lrwx------ 1 64 Mar 14 02:26 0 -> /dev/pts/3
lrwx------ 1 64 Mar 14 02:26 1 -> /dev/pts/3
lrwx------ 1 64 Mar 14 02:25 2 -> /dev/pts/3
lrwx------ 1 64 Mar 14 02:26 255 -> /dev/pts/3
$ bg
[1]+ coproc CAT { cat; } &
$
[1]+ Done coproc CAT { cat; }
$ # sad user :(
This behavior is not new to the multi-coproc support. But just the same
it seems broken for the shell to automatically close the fds to
coprocesses. That should be done explicitly by the user.
Word to the wise: you might encounter this issue (coproc2 prevents
coproc1 from seeing its end-of-input) even though you are rigging this
up yourself with FIFOs rather than bash's coproc builtin.
In my case, it's mostly a non-issue, because I fork the - now three -
background processes before exec'ing automatic fds redirecting to/from
their FIFOs in the parent process. All the automatic fds get put in an
array, and I do close them all at the beginning of a subsequent process
substitution.
That's a nice trick with the shell backgrounding all the coprocesses
before connecting the fifos. But yeah, to make subsequent coprocesses you
do still have to close the copies of the user shell's fds that the coprocess
shells inherit. It sounds like you are doing that (nice!), but in any
case it requires some care, and as these stack up it is really handy to
have something manage it all for you.
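(For the archives, here's a minimal sketch of that fork-first FIFO
pattern; the paths and the two example commands are my own invention, not
Zachary's actual setup:)
$ mkfifo wc.in wc.out cat.in cat.out
$ wc  <wc.in  >wc.out  &    # fork every background process first,
$ cat <cat.in >cat.out &    # before the parent opens any pipe fds
$ exec {wc_w}>wc.in {wc_r}<wc.out {cat_w}>cat.in {cat_r}<cat.out
$ # neither background process inherits fds to the other's pipes;
$ # subsequent subshells still need to close the parent's copies,
$ # e.g. with: exec {wc_w}>&- {cat_w}>&-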
(Perhaps this is where I ask if you are happy with your solution or if you
would like to try out something wildly more flexible...)
Happy coprocessing! :)
Carl
[Attachment: bad-coproc-log.txt]