bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] tests: fix test-execute with GNU make jobserver


From: Dmitry V. Levin
Subject: Re: [PATCH] tests: fix test-execute with GNU make jobserver
Date: Thu, 8 Apr 2021 01:24:40 +0300

Hi Bruno,

On Wed, Apr 07, 2021 at 11:30:47PM +0200, Bruno Haible wrote:
> Hi Dmitry,
> 
> Thanks for the proposed patch.
> 
> > On POSIX systems the GNU make jobserver is implemented as a pipe,
> > and these two unexpected descriptors make test-execute-child fail.
> 
> This problem description is a bit technical. It took me a bit of work
> to translate your description into a "how to reproduce". Namely,
> 
>   $ make -j 8 check

Just "make -j2 check", in fact, any parallelism would do, but it's not a
race, it's not due to any parallelized execution, it's a side effect of
the GNU make jobserver being enabled.

> produces this output:
> 
>   Test case 14: 0 1 2 3 4 10 
>   test-execute-main.c:295: assertion 'ret == 0' failed
>   Aborted (core dumped)
>   test-execute.sh: test case 14 failed
>   Test case 15: 0 1 2 3 4 10 
>   test-execute-main.c:305: assertion 'ret == 0' failed
>   Aborted (core dumped)
>   test-execute.sh: test case 15 failed
>   Test case 16: 0 1 2 3 4 
>   test-execute-main.c:315: assertion 'ret == 0' failed
>   Aborted (core dumped)
>   test-execute.sh: test case 16 failed
> 
> Now, there are several questions:
> 
> 1) Why is GNU make passing these descriptors 3 and 4 to the child
> process? The Makefile rule looks like this:
> 
> test-execute.sh.log: test-execute.sh
>         @p='test-execute.sh'; \
>         b='test-execute.sh'; \
>         $(am__check_pre) $(LOG_DRIVER) --test-name "$$f" \
>         --log-file $$b.log --trs-file $$b.trs \
>         $(am__common_driver_flags) $(AM_LOG_DRIVER_FLAGS) $(LOG_DRIVER_FLAGS) 
> -- $(LOG_COMPILE) \
>         "$$tst" $(AM_TESTS_FD_REDIRECT)
> 
> and it does not start with '+'. (Cf. [1].)
> 
> Oh, I see: $(am__check_pre) expands to something that contains
> $(TESTS_ENVIRONMENT), and $(TESTS_ENVIRONMENT) contains
>   MAKE='$(MAKE)'
> and that occurrence of $(MAKE) makes 'make' think that the rule may
> execute a sub-make.
> 
> 2) You suggest to deal with the file descriptors 3..19. Is this reliable?

Yes, it is reliable as it mirrors
        for (fd = 0; fd < 20; fd++)
piece of code in tests/test-execute-child.c file.

> The MAKEFLAGS variable in the environment is defined as
>   MAKEFLAGS="w -j8 --jobserver-auth=3,4 -- ..."
> So we could parse this string and look for the --jobserver-auth
> option.
> 
> Oh, wait: this option was named differently in older GNU make versions.[2]

Since tests/test-execute-child.c explicitly checks for any unexpected
descriptors, I think the most appropriate thing to do is to make sure
that no unexpected descriptors are inherited by the child.

The jobserver was the thing that triggered the test error, but there could
be other external sources of inherited descriptors, so the test should
rather be robust and make sure these external sources do not affect the
test.

> 3) You suggest to set the cloexec bit on these file descriptors, so that
> test-execute-child will have them closed. However, it is simpler to just
> close() them in test-execute-parent already, since test-execute-parent
> does not need these file descriptors either.

Yes, this is another option, in fact, it's more reliable on GNU/Linux:
The test currently relies on the fact that descriptor #10 is not
inherited, so if this is not the case, the test will fail.
If, however, all descriptors in the range are closed, the test succeeds.

I'd choose a simple close() rather than a bit more complex cloexec if
tests/test-execute-child.c did not contain this uncommented hunk:
        for (fd = 0; fd < 20; fd++)
          #ifdef __NetBSD__
          if (fd != 3)
          #endif

If for some reason __NetBSD__ is allowed to have descriptor #3 inherited,
then it shouldn't be closed in tests/test-execute-main.c either.

Bruno, while we are here, could you shed some light on this __NetBSD__
exception, ideally as a comment in tests/test-execute-child.c, please?

Maybe the reason is not strong enough to avoid closing this descriptor
at all costs.

> 4) How about other environments? I saw 'test-execute.sh' also fail in
> some recent Linux distro with KDE desktop. Apparently 'plasma' is passing
> one or two open file descriptors to all children (that is, to all processes
> running in this desktop environment). Do you think this is worth reporting?

If it isn't documented, then I'd say it's a bug, please report it.

> 5) Will glibc at some point export the close_range() function [3]?

Yes, glibc will provide close_range and closefrom functions sooner or later,
the patches are being reviewed:
https://sourceware.org/pipermail/libc-alpha/2021-March/123532.html
https://sourceware.org/pipermail/libc-alpha/2021-March/123535.html
If the review goes smoothly, they are likely to be available in glibc 2.34.

> If so, Gnulib should probably provide a substitute close_range(), and
> test-execute-main.c should better call this function, rather than rolling
> its own loop.

Yes, I think it makes sense to provide this API in gnulib, although it
won't be able to address all issues the syscall is intended to address.

I don't think, however, that the test-execute fix should wait for this
to happen.


-- 
ldv



reply via email to

[Prev in Thread] Current Thread [Next in Thread]