Re: [lmi] Benchmarking: gcc-8 beats gcc-10 soundly?


From: Greg Chicares
Subject: Re: [lmi] Benchmarking: gcc-8 beats gcc-10 soundly?
Date: Sat, 19 Sep 2020 20:37:59 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.11.0

On 2020-09-19 15:48, Vadim Zeitlin wrote:
> On Sat, 19 Sep 2020 15:15:48 +0000 Greg Chicares <gchicares@sbcglobal.net> wrote:
> 
> GC> It looks like gcc-10 gives us slower lmi binaries. Picking
> GC> the third '--selftest' scenario as an index of performance
> GC> (results in microseconds--less is better):
> GC> 
> GC>      gcc-10   gcc-8  ratio
> GC>      ------   -----  -----
> GC>      102659   84947   1.21  32-bit
> GC>       50121   37410   1.34  64-bit
> GC> 
> GC> The fourth scenario is even worse:
> GC> 
> GC>       33250   20654   1.61  32-bit
> GC>       24616   13009   1.89  64-bit

With -O3, the 64-bit build performs thus on those two scenarios:
  naic, ee prem solve : 5.001e-02 s mean;      49710 us least of  20 runs
  finra, no solve     : 2.483e-02 s mean;      24580 us least of  41 runs
Thus, the -O3 to -O2 speed ratio is
  49710 / 50121 = .992
  24580 / 24616 = .999
which isn't worth the extra build time (82.89 vs 72.76 seconds).
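
For the record, here's one way to pull those "least of" figures out of
two captured '--selftest' transcripts and compute the ratios mechanically
(a sketch only--the file names o3.txt and o2.txt are hypothetical; I
actually compared the numbers by hand):
  paste <(awk '/us least of/ {print $(NF-5)}' o3.txt) \
        <(awk '/us least of/ {print $(NF-5)}' o2.txt) \
    | awk '{printf "%6d / %6d = %.3f\n", $1, $2, $1 / $2}'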

Data below.

>  I've already seen performance regressions in newer g++ versions, but I
> don't think I've seen anything nearly like 89% slowdown, so it's indeed
> very astonishing. But I have trouble seeing how it could not be true, if
> you consistently obtain such results. And you're not the only one, see e.g.
> this bug report https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96337

I had the thought that perhaps this is a MinGW-w64 snafu, which
would explain why they haven't officially released anything
beyond 8.x yet. Yet the bugzilla report doesn't seem to specify
a platform, while the phoronix link in that report specifies:
| Ubuntu 20.04 with the Linux 5.8 kernel

I guess I'd better try the flags phoronix tested:
| "-O3 -march=native", and "-O3 -march=native -flto"
Right now, lmi looks like the "SciMark" benchmark here:
  https://www.phoronix.com/scan.php?page=article&item=gcc-10900k-compiler&num=2
so maybe this will resolve the anomaly.
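
If I do test those, the obvious approach is to splice the flags into
workhorse.make the same way I substitute '-O3' below--something like
this (an untested sketch; whether '-march=native' and '-flto' behave
with the MinGW-w64 cross toolchain is an open question):
  sed -i workhorse.make \
    -e's/^\( *optimization_flag := \).*/\1-O3 -march=native -fno-omit-frame-pointer/'
and likewise appending '-flto' for the LTO variant.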

Am I reading that benchmark right? It seems to say that
  -O3 -march=native
greatly outperforms
  -O3 -march=native -flto
Okay, I am reading it right:
| For the very basic SciMark 2 benchmarks the LTO build hurt the
| performance compared to "-O3 -march=native" but this was another
| test where the -O2 performance is much slower on GCC 10
so maybe LTO is not yet ready for prime time...so I won't even
ask about its "WHOPR" mode, which seems to be an allusion to a
"two-fisted burger" at some US fast-food restaurant.

>  Unfortunately there is no clear conclusion there, as gcc developers can't
> reproduce the problem.

It seems really strange that they would say that. I guess
phoronix is just one guy, but he seems to be a serious person
with a serious audience.

> They do say that -O2 has been changed in 10.x, so it
> could be worth using -O3 with it and see if it helps. Should I/we do it or
> will you test this yourself?

We seem to have a test case that should be reproducible,
though it's far from ideally minimal. Here's what I did:

/opt/lmi/src/lmi[0]$grep O2 workhorse.make 
  optimization_flag := -O2 -fno-omit-frame-pointer
/opt/lmi/src/lmi[0]$sed -i workhorse.make -e's/O2/O3/'
/opt/lmi/src/lmi[0]$grep O2 workhorse.make            
/opt/lmi/src/lmi[1]$grep O3 workhorse.make 
  optimization_flag := -O3 -fno-omit-frame-pointer
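
(As a sanity check--not something I actually ran--one could confirm that
the substituted flag reaches the compiler command lines via a dry run:
  make $coefficiency --just-print install 2>&1 | grep -c -- '-O3'
which counts the lines of make's would-be commands that mention '-O3',
without building anything.)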

/opt/lmi/src/lmi[0]$make clean
rm --force --recursive /opt/lmi/gcc_x86_64-w64-mingw32/build/ship

/opt/lmi/src/lmi[0]$time make $coefficiency --output-sync=recurse install check_physical_closure 2>&1 | tee eraseme | less -SN
make $coefficiency --output-sync=recurse install check_physical_closure 2>&1  1721.58s user 80.19s system 2173% cpu 1:22.89 total
tee eraseme  0.00s user 0.01s system 0% cpu 1:22.89 total
less -SN  0.03s user 0.02s system 0% cpu 1:32.48 total
/opt/lmi/src/lmi[0]$wine /opt/lmi/bin/lmi_cli_shared.exe --accept --data_path=/opt/lmi/data --selftest
Test speed:
  naic, no solve      : 3.704e-02 s mean;      36788 us least of  27 runs
  naic, specamt solve : 5.292e-02 s mean;      52692 us least of  19 runs
  naic, ee prem solve : 5.001e-02 s mean;      49710 us least of  20 runs
  finra, no solve     : 2.483e-02 s mean;      24580 us least of  41 runs
  finra, specamt solve: 3.943e-02 s mean;      39101 us least of  26 runs
  finra, ee prem solve: 3.769e-02 s mean;      37410 us least of  27 runs

/opt/lmi/src/lmi[0]$git checkout -- workhorse.make 
/opt/lmi/src/lmi[0]$make clean
rm --force --recursive /opt/lmi/gcc_x86_64-w64-mingw32/build/ship
/opt/lmi/src/lmi[0]$time make $coefficiency --output-sync=recurse install check_physical_closure 2>&1 | tee eraseme | less -SN
make $coefficiency --output-sync=recurse install check_physical_closure 2>&1  1549.20s user 77.85s system 2236% cpu 1:12.76 total
tee eraseme  0.00s user 0.01s system 0% cpu 1:12.76 total
less -SN  0.02s user 0.01s system 0% cpu 1:14.99 total

