Re: bash kill(1) doesn't report errors when $(ulimit -i) is exceeded
From: Cedric Blancher
Subject: Re: bash kill(1) doesn't report errors when $(ulimit -i) is exceeded
Date: Wed, 17 Jul 2013 01:52:36 +0200
On 16 July 2013 23:12, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Tue, Jul 16, 2013 at 1:31 PM, Lionel Cons <lionelcons1972@gmail.com> wrote:
>>
>> Either your ulimit -i is greater than 63000 or we have a Linux bug. If
>> ulimit -i is reached then kill(1) should fail.
>
> Traditionally kill() has never returned errors for things like this.
> In fact, quite arguably POSIX actively disallows kill() from returning
> errors for queue overflow: "The kill() function is successful if the
> process has permission to send sig to any of the processes specified
> by pid. If kill() fails, no signal shall be sent."
>
> Notice how "is successful" is not dependent on whether a signal was
> sent or not, it is dependent on whether you have _permission_ to send
> the signal to the specified process.
>
> Now, I don't think "POSIX requires" is all that big a deal, and
> there's a lot of gray areas where POSIX just doesn't talk about
> everything that could go wrong. So I don't think the above is a very
> strong argument for not possibly changing semantics, but I do argue
> that it's an argument for what traditional behavior is.
>
> I think you could quite validly argue for changing the Linux kernel
> semantics, but it has to come from that direction: talk about why you
> need it,
I think the issue came up when we (Pasteur, NHI and GE Healthcare)
seriously started to use realtime signals as a communication method in
biosh (a ksh93 variant with add-ons for fast batch processing of
bioinformatics data), perl and bash scripts. It turned out that you
can hit the ulimit -i limit (the maximum number of pending signals) if
your machine is fast enough. The ksh93 people (Roland Mainz and David
Korn) fixed the problem on their side: for realtime signals, kill(1)
now uses sigqueue() instead of kill() (together with a new option -q
to queue a payload value in the si_value field of siginfo_t) and
actually returns an error, so we can diagnose the problem instead of
accepting silent communication failures (which are hard to detect, and
even harder to track down).
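To illustrate the difference, here is a minimal C sketch of the sigqueue() approach. It is not the ksh93 code: the helper names send_rt_signal() and queue_until_full() are made up for this example, and it assumes Linux, where RLIMIT_SIGPENDING is the limit behind ulimit -i. The point is only that sigqueue() fails with EAGAIN once the pending-signal limit is reached, so the caller can report it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: queue signo to pid with an integer payload.
 * Returns 0 on success, -1 on failure with errno preserved. */
static int send_rt_signal(pid_t pid, int signo, int payload)
{
    union sigval value;
    value.sival_int = payload;

    if (sigqueue(pid, signo, value) == -1) {
        int saved = errno;
        if (saved == EAGAIN)
            fprintf(stderr, "sigqueue: pending-signal limit reached (ulimit -i)\n");
        else
            fprintf(stderr, "sigqueue: %s\n", strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}

/* Hypothetical demo: lower the soft RLIMIT_SIGPENDING (Linux-specific) to
 * `limit`, then queue SIGRTMIN to ourselves until sigqueue() fails.
 * Returns the number of signals queued, or -1 on an unexpected error. */
static int queue_until_full(int limit)
{
    struct rlimit old_rl, new_rl;
    sigset_t set, old_set;
    struct timespec zero = {0, 0};
    int queued = 0, failed = 0;

    if (getrlimit(RLIMIT_SIGPENDING, &old_rl) == -1)
        return -1;
    new_rl = old_rl;
    if ((rlim_t)limit < new_rl.rlim_cur)
        new_rl.rlim_cur = (rlim_t)limit;
    if (setrlimit(RLIMIT_SIGPENDING, &new_rl) == -1)
        return -1;

    /* Block SIGRTMIN so queued signals stay pending and are not delivered. */
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    sigprocmask(SIG_BLOCK, &set, &old_set);

    /* Try one more than the limit; stop at the first failure. */
    for (int i = 0; i <= limit; i++) {
        if (send_rt_signal(getpid(), SIGRTMIN, i) == -1) {
            failed = (errno != EAGAIN);
            break;
        }
        queued++;
    }

    /* Drain the pending signals before unblocking them again. */
    for (;;) {
        if (sigtimedwait(&set, NULL, &zero) == -1 && errno != EINTR)
            break;
    }
    sigprocmask(SIG_SETMASK, &old_set, NULL);
    setrlimit(RLIMIT_SIGPENDING, &old_rl);
    return failed ? -1 : queued;
}
```

Calling queue_until_full(8) on a Linux system that enforces the limit should stop with EAGAIN after at most 8 queued signals. The count can be lower because pending signals are accounted per user, not per process. The same loop written with kill() would keep returning 0, which is exactly the silent failure described above.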
The question in this case is: should bash switch over to sigqueue(),
or should kill(2) be fixed to return an error? Given that this silent
failure mode can wreak havoc, I'd opt for BOTH.
Ced
--
Cedric Blancher <cedric.blancher@gmail.com>
Institute Pasteur