bug-bash

Re: AIX and Interix also do early PID recycling.


From: Michael Haubenwallner
Subject: Re: AIX and Interix also do early PID recycling.
Date: Wed, 25 Jul 2012 09:59:28 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.3) Gecko/20120327 Thunderbird/10.0.3

On 07/25/2012 03:05 AM, Chet Ramey wrote:
> Bash assumes that there's a PID space at least as
> large as CHILD_MAX, and that the kernel will use all of it before reusing
> any PID in the space.  Posix says that shells must remember up to CHILD_MAX
> statuses of terminated asynchronous children (the description of `wait'),
> so implicitly the kernel is not allowed to reuse process IDs until it has
> exhausted CHILD_MAX PIDs.

What about grandchildren?
They count for the kernel, but not for the top-level shell...

> The description of fork() doesn't mention this,
> however.  The Posix fork() requirement that the PID returned can't
> correspond to an existing process or process group is not sufficient to
> satisfy the requirement on `wait'.

OTOH, AFAICT, as long as a PID isn't waitpid()ed for, it isn't reused by fork().
However, I'm unable to find that in the POSIX spec.

> Bash holds on to the status of all terminated processes, not just
> background ones, and only checks for the presence of a newly-forked PID
> in that list if the list size exceeds CHILD_MAX.  One of the results of
> defining RECYCLES_PIDS is that the check is performed on every created
> process.

What if the shell does not do waitpid(-1), but waitpid(known-child-PID)?
That would mean calling waitpid(synchronous-child-PID) immediately, and
waitpid(asynchronous-child-PID) upon some "wait $!" shell command, falling
back to waitpid(-1) only when there's no PID passed to "wait".

> I'd be interested in knowing the value of CHILD_MAX (or even `ulimit -c')
> on the system where you're seeing this problem.

The AIX 6.1 I've debugged on has:
  #define CHILD_MAX 128
  #define _POSIX_CHILD_MAX 25
  sysconf(_SC_CHILD_MAX) = 1024

  $ ulimit -H -c -u
  core file size          (blocks, -c) unlimited
  max user processes              (-u) unlimited

  $ ulimit -S -c -u
  core file size          (blocks, -c) 1048575
  max user processes              (-u) unlimited

The Interix 6.1 system, where we have similar-looking stability problems, has:
  CHILD_MAX not defined
  #define _POSIX_CHILD_MAX 6
  sysconf(_SC_CHILD_MAX) = 512

  $ ulimit -H -c -u
  core file size         (blocks, -c) unlimited
  max user processes             (-u) 512

  $ ulimit -S -c -u
  core file size         (blocks, -c) unlimited
  max user processes             (-u) 512

> The case where last_made_pid is equal to last_pid is a problem only when
> the PID space is extremely small -- on the order of, say, 4 -- as long as
> the kernel behaves as described above.

I'm going to run this build job with 'truss -t kfork' again, to see whether the
kernel starts recycling PIDs after some too-small number of distinct PIDs...

Anyway - defining RECYCLES_PIDS for that AIX 6.1 has reduced the failures of
this one build job from ~37 to 0 in 50 runs.

/haubi/


