Re: [Monotone-devel] parallel-tests merged


From: Zack Weinberg
Subject: Re: [Monotone-devel] parallel-tests merged
Date: Fri, 17 Aug 2007 17:56:36 -0700

On 8/17/07, William Uther <address@hidden> wrote:
> Hi,
>    This is failing miserably on my mac (MacOS X 10.4, Intel mac).  I
> first tried -j4, then I scaled back to just make check.  The results
> below are for the straight "make check".
[...]
>    1 _unit_tester_fail_check                       FAIL (gobbledygook:
> Check failed (return value): wanted 0 got -126
[...]

OH NO NOT THIS AGAIN.

This is a heisenbug of the very worst kind, which I saw during my own
testing but thought I had eliminated the provocation for.

Here's the deal.  The per-test child process is expected to write a
detailed human-readable log of its operations to a file "tester.log"
in the per-test directory.  It is also supposed to write a one-line
machine-parseable summary of the overall state of the test (passed,
failed, skipped, etc) to a file "STATUS" in the per-test directory.
The parent process interprets the STATUS file to accumulate statistics
about the run and print the success or failure messages.  (It is
necessary to do this dance because _exit() only passes back eight bits
of information to the parent.)  At the Lua level, there are two
different file handles - "test.log" and "s" (see run_one_test in
testlib.lua).  Lua file handles are a thin wrapper around C stdio
FILEs.
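
For readers who have not seen the test harness, the arrangement described above can be sketched roughly as follows.  This is a minimal POSIX illustration, not the monotone code: the function name run_child_and_read_status and the log text are made up, and the real child runs Lua rather than writing the files directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run one "test" in a child process.  The child writes
   a human-readable log and a one-line STATUS file into `dir`; the parent
   waits for it, then reads the first line of STATUS into `buf`.
   Returns 0 on success, -1 on any error. */
int run_child_and_read_status(const char *dir, const char *outcome,
                              char *buf, size_t buflen)
{
    char logpath[512], statuspath[512];
    assert(dir && outcome && buf && buflen > 0);
    snprintf(logpath, sizeof logpath, "%s/tester.log", dir);
    snprintf(statuspath, sizeof statuspath, "%s/STATUS", dir);

    fflush(NULL);              /* keep unflushed parent buffers out of the child */
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {            /* child: two independent stdio streams */
        FILE *log = fopen(logpath, "w");
        FILE *st = fopen(statuspath, "w");
        if (!log || !st)
            _exit(127);
        fprintf(log, "detailed, human-readable output goes here\n");
        fprintf(st, "%s\n", outcome);
        fclose(log);
        fclose(st);
        _exit(0);              /* the exit status alone cannot carry the summary */
    }

    int wstatus;               /* parent: wait, then interpret STATUS */
    if (waitpid(pid, &wstatus, 0) < 0)
        return -1;
    if (!WIFEXITED(wstatus) || WEXITSTATUS(wstatus) != 0)
        return -1;

    FILE *st = fopen(statuspath, "r");
    if (!st)
        return -1;
    if (!fgets(buf, (int)buflen, st)) {
        fclose(st);
        return -1;
    }
    fclose(st);
    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
    return 0;
}
```

The point of the sketch is only that the log and the summary travel through two separate files, and the parent trusts nothing but the contents of STATUS.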

When this bug hits, a block of text that is supposed to go to
tester.log winds up in STATUS instead, and the text that is supposed
to go to STATUS disappears into a black hole.  The code that reads
STATUS is, defensively, interpreting that as a failure.
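
A defensive reader of this kind might look something like the sketch below.  The keywords "ok" and "skipped" and the name parse_status are assumptions made for illustration; the actual STATUS format is whatever testlib.lua writes.  Anything that is not exactly one recognized line, including a stray block of log text, is counted as a failure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum test_outcome { OUTCOME_PASS, OUTCOME_FAIL, OUTCOME_SKIP };

/* Hypothetical parser for the contents of a STATUS file.  The keywords are
   illustrative, not the real format.  Unrecognized or multi-line content is
   treated as a failure rather than guessed at. */
enum test_outcome parse_status(const char *text)
{
    assert(text != NULL);
    size_t len = strcspn(text, "\n");      /* length of the first line */

    /* More text after the first newline: not a one-line summary. */
    if (text[len] == '\n' && text[len + 1] != '\0')
        return OUTCOME_FAIL;

    if (len == 2 && strncmp(text, "ok", 2) == 0)
        return OUTCOME_PASS;
    if (len == 7 && strncmp(text, "skipped", 7) == 0)
        return OUTCOME_SKIP;

    return OUTCOME_FAIL;                   /* empty or unknown content */
}
```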

A previous version of the code - never checked into the repository -
showed this bug about one time in four - only with ./run_unit_tests,
not always the same test cases, and never under strace.  I *thought*
that it was a problem with swapping out file descriptors 0, 1, and 2
behind stdio's back, so I took all the code that did that out and made
the log file be a separate file instead of the child's stdout/err.
That made the problem go away for me.  However, I never proved
conclusively what was causing it, and if you're seeing it with the
checked-in code, there must be something else wrong.
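
To make the suspected hazard concrete: text that is still sitting in a stdio buffer goes wherever the underlying descriptor points at flush time, not where it pointed when the text was written.  The sketch below demonstrates that general effect with dup2() on an ordinary file stream.  It is an illustration of the class of problem only, not a reproduction of this bug, and the function name is made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Writes a line to a stream opened on path_a, swaps the stream's descriptor
   to path_b behind stdio's back, then flushes.  Returns 1 if the line ended
   up in path_b, 0 if it did not, -1 on error. */
int swap_misroutes_buffered_text(const char *path_a, const char *path_b)
{
    static const char msg[] = "meant for A\n";
    char got[64] = "";
    assert(path_a && path_b);

    FILE *a = fopen(path_a, "w");          /* regular file: fully buffered */
    if (!a)
        return -1;
    int fd_b = open(path_b, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd_b < 0) {
        fclose(a);
        return -1;
    }

    fputs(msg, a);                         /* stays in the stdio buffer */
    if (dup2(fd_b, fileno(a)) < 0) {       /* stdio is not told about this */
        close(fd_b);
        fclose(a);
        return -1;
    }
    close(fd_b);
    fclose(a);                             /* flush now writes into path_b */

    FILE *b = fopen(path_b, "r");
    if (!b)
        return -1;
    if (!fgets(got, sizeof got, b))
        got[0] = '\0';
    fclose(b);
    return strcmp(got, msg) == 0;
}
```

Calling fflush() on the stream before the dup2() avoids the misrouting in this toy case, which is why unflushed buffers are the first thing to suspect whenever descriptors are swapped under stdio.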

I am at my wits' end with this bug.  There are a couple of other things
that could be tried - for instance, passing the text for STATUS back
from run_one_test to the C++ layer and writing it out there - but
without knowing where the problem comes from, we're just flailing
around in the dark.  (Did I mention the problem disappears under
strace?  Or if I stick in debugging printfs?)
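
As a rough idea of what writing STATUS outside of Lua could look like, here is a sketch that bypasses stdio entirely and uses open()/write()/close().  It is written in C rather than C++ to keep it short, the name write_status_file is hypothetical, and nothing here shows that it would avoid the bug; it only removes one layer of buffering from the picture.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write `text` to <dir>/STATUS with raw system calls,
   so no stdio buffer is involved.  Returns 0 on success, -1 on error. */
int write_status_file(const char *dir, const char *text)
{
    char path[512];
    assert(dir && text);
    int n = snprintf(path, sizeof path, "%s/STATUS", dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;                         /* path would be truncated */

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t left = strlen(text);
    while (left > 0) {                     /* write() may be partial */
        ssize_t w = write(fd, text, left);
        if (w < 0) {
            close(fd);
            return -1;
        }
        text += w;
        left -= (size_t)w;
    }
    return close(fd);
}
```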

zw
