bug-gnulib

Re: Test-lock hang (not 100% reproducible) on GNU/Linux


From: Bruno Haible
Subject: Re: Test-lock hang (not 100% reproducible) on GNU/Linux
Date: Sat, 24 Dec 2016 18:52:07 +0100 (CET)

Hi Pádraig,

> Wow that's much better on a 40 core system:
> 
> Before your patch:
> =================
> $ time ./test-lock
> Starting test_lock ... OK
> Starting test_rwlock ... OK
> Starting test_recursive_lock ... OK
> Starting test_once ... OK
> 
> real    1m32.547s
> user    1m32.455s
> sys     13m21.532s
> 
> After your patch:
> =================
> $ time ./test-lock
> Starting test_lock ... OK
> Starting test_rwlock ... OK
> Starting test_recursive_lock ... OK
> Starting test_once ... OK
> 
> real    0m3.364s
> user    0m3.087s
> sys     0m25.477s

Wow, a 30x speed increase by using a lock instead of 'volatile'!

Thanks for the testing. I cleaned up the patch to do less
code duplication and pushed it.

Still, I wonder about the cause of this speed difference.
It must be the read from the 'volatile' variable that is problematic,
because the program writes to the 'volatile' variable only 6 times in total.

What happens when a program reads from a 'volatile' variable
at address xy in a multi-processor system? It must do a broadcast
to all other CPUs "please flush your internal write caches", wait
for these flushes to be completed, and then do a read at address xy.
But the same procedure must also happen when taking a lock at
address xy. So, where does the speed difference come from?
The 'volatile' handling must be implemented in a terrible way:
either GCC generates inefficient instructions, or the hardware
executes those instructions in a horrible way.

What is the hardware of your 40-core machine (just for reference)?

Bruno

Attachment: 0001-lock-test-Fix-performance-problem-on-multi-core-mach.patch
Description: Binary data

