qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[RFC PATCH 0/4] Idea for using hardfloat in PPC


From: Víctor Colombo
Subject: [RFC PATCH 0/4] Idea for using hardfloat in PPC
Date: Wed, 5 Oct 2022 11:37:15 -0300

As can be seem in the mailing thread that added hardfloat support in
QEMU [1], a requirement for it to work is to have float_flag_inexact
set when entering the API in softfloat.c. However, in the same thread,
it was explained that PPC target would not work by default with this
implementation.
The problem is that PPC has a non-sticky inexact bit (there is a
discussion about it in [2]), meaning that we can't just set the flag
and call the API in softfloat.c, as it would return the same flag set
to 1, and we wouldn't know if it is supposed to be updated on FPSCR or
not.
Over the last couple years, there were attempts to enable hardfpu
for Power, like [3]. But nothing got to master.
[5] shows a suggestion by Yonggang Luo and commentaries by Richard and
Zoltan, about caching the last FP instruction and reexecuting it when
necessary.

This patch set is a proposition on the idea to cache the last FP insn,
to be reexecuted later when the value of FPSCR is to be read by a
program. When executed in hardfloat, the instruction "context" is saved
inside `env`, and is expected to be reexecuted later, in softfloat,
to calculate the correct value of the inexact flag in FPSCR.
The instruction to be cached is the last instruction that changes FI.
If the instructions does not change FI, it keeps the cache intact.
If it changes FI, it caches itself and tries to execute in hardfpu.
It might or might not use hardfloat, but as the inexact flag was
artificially set, it will require to be reexecuted later. 'Later'
means when FPSCR is to be read, like during a call to MFFS, or when
a signal occurs. There are probably other places, e.g. other mffs-like
instructions, but this RFC only addresses these two scenarios.
This is supposed to be more efficient because programs very seldomly
read FPSCR, meaning the amount of reexecutions will be low.

For now, this was implemented and tested for linux-user, no softmmu
work or analysis was done.
I implemented the base code to keep all instructions working with
this new behavior (patch 1), and also implemented two instructions
as an example on what it would be necessary to do for every instruction
to use hardfpu (patches 1 and 2).

My tests with risu and other manual tests showed the behavior seems to
be correct. I tested mainly if FPSCR is the same after using softfloat
or hardfloat.

However, the impact in performance was not the expected. In x86_64 I
had a small 3% improvement, while in a Power9 machine there was a small
performance loss, as can be seem below (100 executions).

|        | min [s] | max [s] | avg [s] |
| before | 122.309 | 123.459 | 122.747 |
| after  | 123.906 | 125.016 | 124.373 |

The test code can be found in [4].

The issue is most likely all the overhead with the caching, which is
negating the improvement from hardfpu execution.

With all that said, could you kindly take a look at my implementation
and see if it can be improved to result in better performance? Is there
any chance to save this idea?

Thank you very much!

[1] 
https://patchwork.kernel.org/project/qemu-devel/patch/20181124235553.17371-8-cota@braap.org/
[2] https://lists.nongnu.org/archive/html/qemu-ppc/2022-05/msg00246.html
[3] 
https://patchwork.kernel.org/project/qemu-devel/patch/20200218171702.979F074637D@zero.eik.bme.hu/
[4] https://gist.github.com/vcoracolombo/6ad884a402f1bba531e2e3da7e196656
[5] https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg00064.html

Víctor Colombo (4):
  target/ppc: prepare instructions to work with caching last FP insn
  target/ppc: Implement instruction caching for fsqrt
  target/ppc: Implement instruction caching for muladd
  fpu/softfloat: Enable hardfpu for ppc target

 fpu/softfloat.c                    |   6 +-
 target/ppc/cpu.h                   |  28 ++++++
 target/ppc/excp_helper.c           |   2 +
 target/ppc/fpu_helper.c            | 132 +++++++++++++++++++++++++++++
 target/ppc/helper.h                |   1 +
 target/ppc/translate/fp-impl.c.inc |   1 +
 6 files changed, 166 insertions(+), 4 deletions(-)

-- 
2.25.1




reply via email to

[Prev in Thread] Current Thread [Next in Thread]