Re: Autotest and signal handling

From: Ralf Wildenhues
Subject: Re: Autotest and signal handling
Date: Sun, 16 Nov 2008 22:49:18 +0100
User-agent: Mutt/1.5.18 (2008-05-17)

Hi Eric,

* Eric Blake wrote on Sun, Nov 16, 2008 at 10:24:10PM CET:
> According to Ralf Wildenhues on 11/16/2008 7:43 AM:
> > Kudos to R. Stevens, by the way.
> Does he need mention in the changelog?

I see I should have been a bit more verbose,
but I sure don't mind adding "thanks to Richard
Stevens for writing APUE." to the ChangeLog entry,
<>.  :-)

> > +dnl So what we do is enable shell job control if available, which causes the
> > +dnl shell to start each parallel task as its own shell job, thus as a new
> > +dnl process group leader.  We then send the signal to all new process groups.
> It sounds like if the shell doesn't support job control, then we should
> not allow parallel tests?

TBH, I haven't tested a system/shell that doesn't support job control at
all, yet.  (Job control needs system support, so it's not only a feature
of the shell.)

I guess I could recompile bash with job control disabled, but that still
won't quite show the quirks of those ancient shells.  And I
didn't want to introduce a test for job control without being able to
test it.  Even the shells that disallow *changing* the -m switch
otherwise seem to work fine with signals; only a bit of extra status
output escapes when child scripts change status.
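
For illustration only, a probe for a switchable -m flag might look like
the sketch below.  It is an assumption rather than part of the patch, the
variable name at_job_control is made up, and it has not been tried on a
shell that lacks job control.  The subshell keeps the caller's own flags
untouched.

```shell
# Hypothetical probe: can this shell turn the monitor flag on and off?
# Running it in a subshell leaves the flags of the calling shell alone.
if (set -m && set +m) >/dev/null 2>&1; then
  at_job_control=yes
else
  at_job_control=no
fi
echo "job control switchable: $at_job_control"
```

A "yes" only says that the shell accepts the switch; it says nothing
about whether the system actually puts each job in its own process group.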

> > +for at_signal in 1 2 15; do
> > +dnl This signal handler is not suitable for PIPE: it causes writes.
> > +dnl The code that was interrupted may have the errexit, monitor, or xtrace
> > +dnl flags enabled, so sanitize.
> > +  trap 'set +e +x
> For portability to shells that only see options in the first argument to
> set, shouldn't this use 'set +ex'?

Maybe.  I'd probably make it 'set +x; set +e' in that case.
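
A minimal sketch of that variant, run in a child shell so that the signal
only hits the demo.  The variables at_caught and at_result are
hypothetical names, and the handler body is cut down to the flag reset:

```shell
# Child shell: enable errexit, install a handler that clears the flags
# with one 'set' per option, then signal ourselves and report "$-".
at_result=`sh -c '
  set -e
  trap "set +x; set +e; at_caught=yes" 15
  kill -15 $$
  echo "$at_caught:$-"
'`
echo "handler ran and remaining flags: $at_result"
```

After the handler has run, the flags reported after the colon should no
longer contain 'e' or 'x'.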

> > +   AS_WARN([caught signal $at_signal, bailing out])
> Is there any portable way to list the signal by name, not just number?

'kill -l 2' fails with Solaris 2.6 sh.  It supports 'kill -l', but
parsing that is ugly.  HP-UX 10.20 ksh supports only 'kill -l' and
ignores extra arguments (thus exits zero upon 'kill -l 2').

I suppose we could use
  at_signame=`kill -l $at_signal 2>&1 || echo $at_signal`
  set x $at_signame
  test $# -gt 2 && at_signame=$at_signal
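
Wrapped in a function for illustration (at_fn_signame is a made-up name,
not something in the patch), the idea is: a single word of output is
taken as the signal name, while an error message or a full signal listing
yields more than one word and makes us fall back to the number.

```shell
# Hypothetical wrapper: print a name for signal number $1, or the number
# itself when 'kill -l NUM' fails or prints more than one word.
at_fn_signame ()
{
  at_signal=$1
  at_signame=`kill -l $at_signal 2>&1 || echo $at_signal`
  # Count the words: "x" plus one word gives $# = 2.
  set x $at_signame
  test $# -gt 2 && at_signame=$at_signal
  echo "$at_signame"
}

at_fn_signame 2
```

Whether the name comes back with or without a SIG prefix is up to the
shell, so callers should not depend on its exact spelling.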

> > +dnl Unfortunately, ksh93 fork-bombs when we send TSTP, so send STOP
> > +dnl if this might be ksh (STOP prevents possible TSTP handlers inside
> > +dnl AT_CHECKs from running).  Then stop ourselves.
> > +     at_sig=TSTP
> > +     test "${TMOUT+set}" = set && at_sig=STOP
> Using the very same idea you questioned earlier.

Yes, I know.  :-/

> Is there any way to more reliably detect ksh than whether TMOUT is
> set?  Does this still work in bash/zsh if TMOUT was exported?

Well, sending STOP always "works", in the sense that it reliably stops
the child processes.  The difference is that, when the child processes
themselves start other processes in different process groups, e.g.,
themselves are shells and use job control, then those grandchildren
cannot be stopped.

Could be construed as a QoI (quality of implementation) issue.

> > +     kill -$at_sig $at_pids 2>/dev/null
> Is 'kill -TSTP' or 'kill -STOP' portable?  Or do we have to fall back to
> numeric arguments (for that matter, TSTP and STOP don't always map to the
> same signal numbers).

Good question.  Again, I haven't yet found problems with that.  Of
course, the 2>/dev/null doesn't exactly make them easy to find.
But since the numbers differ, there is little choice except to use the
names.

> > +dnl We got a CONT, so let's go again.  Passing this to all processes
> > +dnl in the groups is necessary (because we stopped them), but it may
> > +dnl cause changed test semantics; e.g., a sleep will be interrupted.
> > +   test -z "$at_pids" || kill -CONT $at_pids 2>/dev/null' TSTP
> What do we do about shells, like ash, that choke on "trap ... TSTP"?
> $ ash -c 'trap "" TSTP'
> trap: bad signal TSTP

Which version is that?  My dash 0.5.4 on GNU/Linux does not do this.
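
If some shell does reject it, one way to find out up front would be a
probe like the sketch below.  This is an assumption, not part of the
patch, and at_trap_tstp is a hypothetical variable name.  The subshell
matters because a failing special builtin may make the shell exit.

```shell
# Hypothetical probe: does 'trap' accept the signal name TSTP?
# The subshell contains any fatal "bad signal" error.
if (trap "" TSTP) >/dev/null 2>&1; then
  at_trap_tstp=yes
else
  at_trap_tstp=no
fi
echo "trap accepts TSTP: $at_trap_tstp"
```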

> > +# Apparently some shells don't get around to creating 'status' any more.
> > +# And ksh93 on FreeBSD uses 256 + 13 instead of 128 + 13
> That's screwy (since exit status is supposed to be 8 bits).  As a separate
> patch, can you please prepare a documentation patch for all the gotchas
> that you have discovered during this exercise, such as FreeBSD's broken
> exit status?

On my list; but I wanted to wait until I've tested a few more systems.
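
To illustrate the status quirk, here is a sketch of a helper that maps an
exit status back to a signal number under either convention.  The name
at_fn_status_signal is made up, and the only two encodings it knows are
the ones mentioned above (128 + N and 256 + N).

```shell
# Hypothetical helper: print the signal number encoded in exit status $1,
# or 0 if the status does not look like death by signal.
at_fn_status_signal ()
{
  at_st=$1
  if test "$at_st" -gt 256; then
    at_st=`expr "$at_st" - 256`   # 256 + N, as seen with ksh93
  elif test "$at_st" -gt 128; then
    at_st=`expr "$at_st" - 128`   # 128 + N, the usual convention
  else
    at_st=0
  fi
  echo "$at_st"
}

at_fn_status_signal 141
at_fn_status_signal 269
```

Both calls print 13, the number SIGPIPE has on the systems discussed
here.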

Cheers, and thanks for the quick feedback,
