
Re: bash kill(1) doesn't report errors when $(ulimit -i) is exceeded


From: Irek Szczesniak
Subject: Re: bash kill(1) doesn't report errors when $(ulimit -i) is exceeded
Date: Fri, 19 Jul 2013 20:56:45 +0200

On Wed, Jul 17, 2013 at 1:52 AM, Cedric Blancher
<address@hidden> wrote:
> On 16 July 2013 23:12, Linus Torvalds <address@hidden> wrote:
>> On Tue, Jul 16, 2013 at 1:31 PM, Lionel Cons <address@hidden> wrote:
>>>
>>> Either your ulimit -i is greater than 63000 or we have a Linux bug. If
>>> ulimit -i is reached then kill(1) should fail.
>>
>> Traditionally kill() has never returned errors for things like this.
>> In fact, quite arguably POSIX actively disallows kill() from returning
>> errors for queue overflow: "The kill() function is successful if the
>> process has permission to send sig to any of the processes specified
>> by pid. If kill() fails, no signal shall be sent."
>>
>> Notice how "is successful" is not dependent on whether a signal was
>> sent or not, it is dependent on whether you have _permission_ to send
>> the signal to the specified process.
>>
>> Now, I don't think "POSIX requires" is all that big a deal, and
>> there's a lot of gray areas where POSIX just doesn't talk about
>> everything that could go wrong. So I don't think the above is a very
>> strong argument for not possibly changing semantics, but I do argue
>> that it's an argument for what traditional behavior is.
>>
>> I think you could quite validly argue for changing the Linux kernel
>> semantics, but it has to come from that direction: talk about why you
>> need it,
>
> I think the issue came up when we (Pasteur, NHI and GE Healthcare)
> seriously started to use realtime signals as a communication method in
> biosh (a ksh93 variant with addons for fast batch processing of
> bioinformatics data), perl and bash scripts. It turned out that there
> is a chance of hitting the ulimit -i limit if your machine is too
> fast. The ksh93 people (Roland Mainz and David Korn) have fixed the
> trouble on their side by using sigqueue() (together with a new option
> -q to queue a payload value in the siginfo.value field) instead of
> kill() for realtime signals and by actually returning an error at the
> kill(1) level, so we can diagnose the trouble instead of accepting
> silent communication failures (which are hard to detect, and even
> harder to come by).
>
> The question in this case is: should bash switch over to use
> sigqueue(), or should kill(2) be fixed to return an error? Given the
> silent mode of failure, which can wreak havoc, I'd opt for BOTH.

I agree with Cedric that kill(2) should be fixed.
1. The POSIX standard only defines a minimum set of errno codes; it
doesn't prevent an implementation from returning additional ones (e.g.
see stat(), which may return EINTR on some platforms for NFS/DFS/AFS
file systems).
2. I think it is a POSIX bug that kill() and raise() can fail
silently. I'm going to report this to the Austin Group as a major bug.
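
To make the sigqueue() alternative from the quoted text concrete, here
is a minimal C sketch, not the ksh93 or bash implementation. The helper
names (send_rt_signal, block_signal, receive_rt_payload) are made up
for illustration. The point is only that sigqueue() is specified to
fail with EAGAIN when the queued-signal limit is reached, so the caller
can report the failure instead of losing the signal silently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: queue a realtime signal with an integer payload.
 * Returns 0 on success, otherwise the errno value set by sigqueue(),
 * e.g. EAGAIN when the queued-signal limit has been reached. */
static int send_rt_signal(pid_t pid, int signo, int payload)
{
    union sigval value;

    value.sival_int = payload;
    if (sigqueue(pid, signo, value) == -1)
        return errno;
    return 0;
}

/* Hypothetical helper: block signo in the calling thread so that queued
 * instances stay pending until they are fetched explicitly. */
static void block_signal(int signo)
{
    sigset_t set;
    int rc;

    sigemptyset(&set);
    sigaddset(&set, signo);
    rc = sigprocmask(SIG_BLOCK, &set, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Hypothetical helper: wait for one pending instance of signo and
 * return its integer payload, or -1 if sigwaitinfo() fails. */
static int receive_rt_payload(int signo)
{
    sigset_t set;
    siginfo_t info;

    sigemptyset(&set);
    sigaddset(&set, signo);
    if (sigwaitinfo(&set, &info) == -1)
        return -1;
    return info.si_value.sival_int;
}
```

A caller would block the signal first with block_signal(SIGRTMIN), then
check the return value of send_rt_signal() and print a diagnostic when
it is nonzero. That is the kind of error reporting at the kill(1) level
that Cedric describes above.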

Irek


