Re: Test-lock hang (not 100% reproducible) on GNU/Linux

bug-gnulib


From:	Pavel Raiskup
Subject:	Re: Test-lock hang (not 100% reproducible) on GNU/Linux
Date:	Wed, 04 Jan 2017 13:19:36 +0100
User-agent:	KMail/5.3.3 (Linux/4.8.15-300.fc25.x86_64; KDE/5.27.0; x86_64; ; )

Hi Bruno,

On Wednesday, January 4, 2017 11:54:27 AM CET Bruno Haible wrote:
> Hi Pavel,
> 
> > Can we assume all systems supporting pthreads are conforming to this
> > specs?  That was pretty big (and pretty recent) change of specification.
> 
> The change in the specification [4] mentions that the issue arose with glibc,
> and it points to two tests from the Linux test project [5][6]. Can you run
> these tests on your Koji system? It would be interesting to see if they
> fail or hang as well.

Thanks for the links!

> > If we really want to test _this_ behavior (writers preferred over readers,
> > i.e. no rdlock when at least one wrlock acquired), shouldn't we apply
> > something like the patch from [3]?  Then that test would be deterministic
> > for everybody; now it is really a matter of luck.
> > 
> > Can we define (document) what real issue we are trying to test with
> > test_rwlock?
> 
> The test_rwlock function in test-lock.c is meant to test the gl_rwlock_t API
> from glthread/lock.h (which, in our case, is just a transparent layer over the
> POSIX API) in a textbook situation. It is not meant to test against a
> particular bug.
> 
> If you can think of another simple textbook way to use rwlocks - without hacks
> and without using *_NP functions - then you're welcome to change or replace
> the test_rwlock function. But if you cannot, then the Austin group should
> throw the rwlocks out of POSIX. The sentence from [4]
>   "applications should probably be encouraged to use mutexes unless rwlocks
>    are needed for performance reasons"
> also goes into this direction.
> 
> > I'll try to ask the glibc guys, or the kernel guys (Fedora/RHEL).
> 
> Thanks! This is the only promising avenue, and you can do it because you
> can reproduce the issue on your Koji system (whereas for me, test_rwlock
> always completes after a couple of seconds).

Done here https://bugzilla.redhat.com/show_bug.cgi?id=1410052

But what I've done (as have others, at least in Fedora) is explicitly
disable 'test-lock' for Koji builds (temporarily, in rhbz#1406031, until
this gets resolved or worked around somewhere else).

> > shouldn't we apply something like the patch from [3]?
> 
> Well, the patch in [3] would make the test hang on all platforms, because the
> writers would never get a chance to take the lock. This is pointless.

I don't see how the patch changes the problem; IMO it just makes this
test case more deterministic.  There's no "infinite rdlock" in the
test case, right?  Only the critical section is a bit longer for readers,
so there's always _at least_ one reader in the critical section.  But what
happens (on all platforms, you claim!) is that a reader gets into the
critical section again every time, even when some writers are waiting on
the lock.

Based on the specs this should not happen, but it happens everywhere.  And
the result is that, on some platforms, the critical section is probably long
enough only because handling the rwlocks takes some time.

> > Simply, after spending some time on this issue (and I'm not the first
> > one), I'd like to see some fix in gnulib so nobody else in future will
> > face similar issues.
> 
> The pessimistic suggestion would be to change the gnulib documentation
> to state that rwlocks are never reliable, because of the way they are
> implemented in glibc, and should therefore never be used. If we agree on
> this, then I'm willing to put a  #if 0  around test_rwlock.

I don't want to claim rwlocks are not reliable.  IMO rwlocks do what we
ask them to do: one writer OR multiple readers.

The question is what the default policy should be: who should be
privileged by default (writers or readers).  The specs recently changed from
"unspecified" to "privileged writers" by default.  The *_np() functions don't
seem to be backed by POSIX.

But consider that writers stay in the critical section a bit longer;
then _readers_ won't ever get into the critical section (according to
the recent specs).  So it is really just a matter of policy (set explicitly
by applications in general) that we haven't specified in the textbook
example so far.

For me -- if test_rwlock() is just a textbook example -- I am not against
moving it to the docs.

Pavel

> Bruno
> 
> > [3] https://lists.gnu.org/archive/html/bug-gnulib/2017-01/msg00024.html
> [4] http://austingroupbugs.net/view.php?id=722&nbn=5
> [5] https://github.com/linux-test-project/ltp/blob/master/testcases/open_posix_testsuite/conformance/interfaces/pthread_rwlock_rdlock/2-1.c
> [6] https://github.com/linux-test-project/ltp/blob/master/testcases/open_posix_testsuite/conformance/interfaces/pthread_rwlock_rdlock/2-2.c
> 
>
