coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed ou


From: Carl Edquist
Subject: Re: [PATCH] tee: Add --pipe-check to allow instantly detecting closed outputs
Date: Tue, 29 Nov 2022 15:48:08 -0600 (CST)

On Tue, 29 Nov 2022, Pádraig Brady wrote:

On 29/11/2022 17:32, Carl Edquist wrote:
 ...

 If this kind of detect-broken-output-pipe logic were added to filter
 utils generally, the above example (with 4 cats) would return to the
 shell immediately.

Right. Thanks for discussing the more general pattern.
I.e. that SIGPIPE doesn't cascade back up the pipeline,
only upon attempted write to the pipe.
So it's not really a tee issue, more of a general pattern.

So it wouldn't be wrong to add this to tee (by default),
but I'm not sure how useful it is given this is a general issue for all filters.

That makes sense; though I suppose it would continue to become more useful for these types of cases as it gets added to more filters :)

Also I'm a bit wary of inducing SIGPIPE as traditionally it hasn't been handled well:

But wait now, are we talking about inducing SIGPIPE?

In the current patch for tee, I think the idea was just to remove an output from the list when it's detected to be a broken pipe, allowing tee to exit sooner if all of its outputs are detected to be broken.

Similarly for the general filter case (usually with only a single output), the idea would be to allow the program to exit right away if it's detected that the output is a broken pipe.

https://www.pixelbeat.org/programming/sigpipe_handling.html

Actually, to me, if anything this page seems to serve as motivation to add broken-pipe checking to coreutils filters in general.

The article you linked ("Don't fear SIGPIPE!") discusses three topics:


1. First it discusses default SIGPIPE behavior - programs that do not specifically handle SIGPIPE do the right thing by default (they get killed) when they continue writing to their output pipe after it breaks.

We wouldn't be changing this for any filters in coreutils. Writing to a broken pipe should still produce a SIGPIPE to kill the program. (Even with broken-pipe detection, this can still happen if input becomes ready, then the output pipe's read end is closed, then the write is attempted.)

tee is actually a special case (if -p is given), because then SIGPIPE does not kill the process, but the write will still fail with EPIPE and tee will remove that output.

We also wouldn't really be inducing anything new for programs earlier in the pipeline. If they don't handle SIGPIPE, they will just get killed with it more promptly - they will end up writing to a broken pipe one write(2) call sooner.


2. The "Incorrect handling of SIGPIPE" section discusses programs that attempt to handle SIGPIPE but do so poorly. This doesn't apply to us either. Filters that add broken-pipe detection do not need to add SIGPIPE handling. And programs that handle it poorly, earlier in the pipeline, will have their problems regardless. (Again, just one write(2) call sooner.)


3. Finally, the "Cases where SIGPIPE is not useful" section actually highlights why we *should* add this broken-pipe checking to filters in general.

The "Intermittent sources" subsection discusses exactly what we are talking about fixing:

For example 'cat | grep -m1 exit' will only exit, when you type a line after you type "exit".

If we added broken-pipe checking to cat, then this example would behave like the user would have wanted - typing "exit" would cause grep to exit, and cat will detect it's output pipe breaking, and exit immediately.

The other example about 'tail' was fixed already, as this kind of checking was added to tail, as we've discussed. It's a good start! The more utils we add it to, the more will be able to benefit.

The "Multiple outputs" subsection is specific to tee, and if anything perhaps suggests that the '-p' option should be on by default. That is, it makes an argument for why it makes sense for tee to avoid letting a SIGPIPE kill it, but rather only to exit when all the input is consumed or all the outputs have been removed due to write errors.

The "Long lived services" subsection is a generalization of what was just said about tee - namely that it makes sense that some programs want to continue after a failed write attempt into a broken pipe, and such programs need to handle or ignore SIGPIPE. This is true for such programs already, and adding broken-pipe checking to a filter in the same pipeline doesn't change that at all. (Again, it will just cause them to get a SIGPIPE/EPIPE *promptly* - one write call sooner - when the final consumer of the output completes.)

...

Or perhaps when you mention "inducing SIGPIPE", you are referring to how tail(1) does things currently (when it detects a broken output), by attempting raise(SIGPIPE) followed by exit(EXIT_FAILURE). It seems this is just an attempt to make it look to the waiting parent process that tail died trying to write to a broken pipe (somewhat of a white lie). Most likely it could just exit(EXIT_FAILURE) without confusing the caller. So if you'd like to avoid that, it's probably not actually necessary for tail (or other filters) to kill themselves with a SIGPIPE. But it's not harmful either. It just produces the same effect it would have on its next write attempt (either way it gets a SIGPIPE). It does *not* attempt to handle the SIGPIPE. (The linked article's "Incorrect handling of SIGPIPE" is about programs that attempt to handle this signal, but do so poorly.)

...

So, in summary, when it comes to the idea of adding broken-pipe checking to tee and filters in general, I'm seeing only wins -- and things that won't change, but no losses :)


Well... Except that it's work to add this to each filter. But, to lower the bar there a bit, they would not all have to be done at once.

I can imagine you might want to have some common code, or at least a common idiom, for doing this type of thing in each filter. I would point out though a couple things. tail(1) is a somewhat of special case because it does the 'tail -f' thing of continually checking multiple input files for changes. (It doesn't quit when it hits EOF on input.) tee(1) is also somewhat special because it has multiple outputs, and it has to check for each of them breaking. So the implementation for each of these will likely be different from the more general filter case, where the inputs do not continue past their respective EOFs, and where there is only a single output to check.

I had a peek at the tail(1) implementation (in check_output_alive()), and I see it does a poll just on stdout, with a 0 timeout. I imagine in the more general filter case, you might do poll on both input and output(s) together, with an unlimited timeout. The 0 timeout thing in 'tail -f' mainly makes sense because it continually loops through multiple inputs, checking for new data in files after EOF is hit. (That is, something unique to tail.)

... Sorry to see the poll thing is complicated by cross-platform behavior differences :(

...

My apologies for the long email...  Hopefully some food for thought! :)

Carl


reply via email to

[Prev in Thread] Current Thread [Next in Thread]