From: Ralf Wildenhues
Subject: Re: lost output from asynchronous lists
Date: Tue, 28 Oct 2008 22:51:13 +0100
User-agent: Mutt/1.5.18 (2008-05-17)
Hi Stephane,
* Stephane Chazelas wrote on Tue, Oct 28, 2008 at 11:26:18AM CET:
>
> I have to admit I would have thought the code above to be safe
> as well and I wonder if it's the same on all systems. But I can
> reproduce the problem on Linux. As far as I can tell, if you
> don't use O_APPEND, the system doesn't guarantee the write(2) to
> be atomic, so I suppose you can get this kind of behavior if a
> context switch occurs in the middle of a write(2) system call.
Thanks for the feedback, that looks spot-on!
It is supported by the fact that the log:
> > <http://buildbot.proulx.com:9003/amd64-gnu-linux/builds/961/step-test/0>
shows that the per-test testsuite.log file contains all the output,
while the 'stdout' file does not. The former is always generated by
either
tee -a testsuite.log
or
cat >> testsuite.log
Also, I have not been able to provoke lossage on an unredirected
standard output (manually running ./micro-suite in the test dir).
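For the record, here is a minimal sketch of the failing pattern as I
understand it (hypothetical commands, not taken from the testsuite):
several background jobs share the file description created by a single
'>' redirection, and without O_APPEND their write(2)s can race on the
shared offset, so a chunk of output is occasionally clobbered; whether
and how often that happens will depend on the kernel.  Truncating the
file up front and then appending avoids the race, because with
O_APPEND every write lands atomically at end-of-file:

  # Racy: one plain '>' open, both writers share the file description.
  (
    ( n=0; while test $n -lt 1000; do echo "first $n"; n=`expr $n + 1`; done ) &
    ( n=0; while test $n -lt 1000; do echo "second $n"; n=`expr $n + 1`; done ) &
    wait
  ) > out
  wc -l out    # may occasionally come up short of 2000 lines

  # Fixed: truncate once, then let every writer append.
  : > out
  (
    ( n=0; while test $n -lt 1000; do echo "first $n"; n=`expr $n + 1`; done ) &
    ( n=0; while test $n -lt 1000; do echo "second $n"; n=`expr $n + 1`; done ) &
    wait
  ) >> out
  wc -l out    # 2000 lines expected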
> That wouldn't have anything to do with the shell.
Yep.
> Replacing foo.sh > stdout 2> stderr with
> : > stdout > stderr
> ./foo.sh >> stdout 2>> stderr
>
> should be guaranteed to work.
Yes. For shell portability, I'll write the first line as
: > stdout
: > stderr
though.
> I think
>
> { ./foo.sh | cat > stdout; } 2>&1 | cat > stderr
>
> should be OK as well as write(2)s to a pipe are meant to be
> atomic as long as they are less than PIPE_BUF bytes (a page size
> on Linux) and even if they were not atomic, I would still
> consider it a bug if one process' output to a pipe was to
> overwrite another one's.
I agree. However, this solution requires two or three more processes
than the first one (an extra cat for each stream, plus possibly a
subshell for the compound command).
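The principle, in a hypothetical one-stream sketch (writer1 and
writer2 stand in for whatever produces the output concurrently):

  # Both writers inherit the pipe's write end.  Writes of less than
  # PIPE_BUF bytes are atomic on a pipe, and a pipe has no file offset
  # to race on, so lines may interleave but are never torn or lost.
  { ./writer1 & ./writer2 & wait; } | cat > stdout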
Consequently, I think the patch below should fix the failure. I've
tried it out on a couple of GNU/Linux systems, and been unable to
provoke the failure after an hour or so. I've pushed the change,
and put Stéphane in THANKS.
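Roughly speaking, for a traced command the generated test script will
now run something like this (some-command standing in for the actual
test command):

  : > at-stdout
  : > at-stder1
  ( $at_traceon; some-command ) >> at-stdout 2>> at-stder1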
Cheers,
Ralf, a lot less worried about parallel Autotest now :-)
Fix parallel test execution output lossage.
* lib/autotest/general.m4 (_AT_CHECK): Truncate files to hold
standard output and standard error before the test, use append
mode for writing.
* THANKS: Update.
Caught by Bob Proulx' build daemons; analysis and suggested fix
by Stephane Chazelas.
diff --git a/lib/autotest/general.m4 b/lib/autotest/general.m4
index 4d7c0f5..03d3902 100644
--- a/lib/autotest/general.m4
+++ b/lib/autotest/general.m4
@@ -1893,16 +1893,22 @@ m4_define([AT_DIFF_STDOUT()],
#
# ( $at_traceon; $1 ) >at-stdout 2>at-stder1
#
+# Note that we truncate and append to the output files, to avoid losing
+# output from multiple concurrent processes, e.g., an inner testsuite
+# with parallel jobs.
m4_define([_AT_CHECK],
[{ $at_traceoff
AS_ECHO(["$at_srcdir/AT_LINE: AS_ESCAPE([$1])"])
echo AT_LINE >"$at_check_line_file"
+: >"$at_stdout"
if _AT_DECIDE_TRACEABLE([$1]); then
- ( $at_traceon; $1 ) >"$at_stdout" 2>"$at_stder1"
+ : >"$at_stder1"
+ ( $at_traceon; $1 ) >>"$at_stdout" 2>>"$at_stder1"
at_func_filter_trace $?
else
- ( :; $1 ) >"$at_stdout" 2>"$at_stderr"
+ : >"$at_stderr"
+ ( :; $1 ) >>"$at_stdout" 2>>"$at_stderr"
fi
at_status=$?
at_failed=false