qemu-devel

Re: TCG Floating Point Support (Work in Progress)


From: Matt
Subject: Re: TCG Floating Point Support (Work in Progress)
Date: Thu, 30 Sep 2021 19:47:22 -0700

Thank you, Alex, for your quick and thoughtful response.

> I've not reviewed the code as it is a rather large diff. For your proper
> submission could you please ensure that your patch series is broken up
> into discrete commits to aid review.

Of course.

> The phrase "if the user discovers some issues" is a bit of a red flag.
> We should always be striving for correct emulation of floating point.

I agree. This is an option that I added for use during feature
development. Ultimately I would like not to have such an option, and
for it to always *just work*.

> Indeed we have recently got the code base to the point we pass all of
> the Berkeley softfloat test suite. This can be checked by running:
>
>   make check-softfloat
>
> However the test code links directly to the softfloat code so won't work
> with direct code execution.

I had planned to leverage the existing softfloat test suite, and I
think this can be done with the right harnessing. It would be nice to
have a mechanism to test translation of individual TCG ops, e.g. to be
able to run translated blocks from test code and evaluate their
output. I'm not sure whether any such op-level testing is being done.
There are also guest tests in tests/tcg, which could be expanded to
include more FP tests.

> The existing 32/64 bit hardfloat
> optimisations work within the helpers. While generating direct code is
> appealing to avoid the cost of helper calls it's fairly well cached and
> predicted code. Experience with the initial hardfloat code showed there
> was still a performance win even with the cost of the helper call.

Unfortunately, even with the existing hardfloat support, the overhead
of the helper calls is still too costly for my particular application.

> I don't think you'll see the same behaviour emulating an x87 on anything
> that isn't an x87 because the boundaries for things like inexact
> calculation will be different. Indeed if you look at the existing
> hardfloat code function can_use_fpu() you will see we only call the host
> processor function if the inexact bit is already set. Other wrappers
> have even more checks for normal numbers. Anything that needs NaN
> handling will fall back to the correct softfloat code.

Fair points. Bit-perfect x87 emulation with this approach may
ultimately be unachievable, and I'm still evaluating the cases in which
it will not work. This has been a learning experience for me, and I
gladly welcome expert input in this matter.
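
For readers following along, the guard pattern described in the quoted
paragraph can be sketched roughly as below. This is a simplified
illustration and not QEMU's code: struct fp_status, FLAG_INEXACT,
can_use_host_fpu(), soft_add() and guarded_add() are hypothetical names,
and soft_add() is only a placeholder for a real softfloat routine. The
idea is that the host FPU is used only when the inexact flag is already
set, the rounding mode is round-to-nearest-even, and all operands and
the result are zero or normal; everything else takes the slow path.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified status word; this is not QEMU's float_status. */
enum { FLAG_INEXACT = 1u << 0 };
enum round_mode { ROUND_NEAREST_EVEN, ROUND_TOWARD_ZERO };

struct fp_status {
    unsigned flags;           /* accumulated (sticky) exception flags */
    enum round_mode rounding; /* guest rounding mode */
    unsigned soft_calls;      /* slow-path counter, for illustration only */
};

/* True for +/-0 and normal numbers; false for subnormals, infinities, NaNs. */
static bool is_zero_or_normal(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    unsigned exp = (unsigned)((bits >> 52) & 0x7ff);
    uint64_t frac = bits & ((UINT64_C(1) << 52) - 1);
    if (exp == 0) {
        return frac == 0;     /* zero is fine, subnormals are not */
    }
    return exp != 0x7ff;      /* exclude infinities and NaNs */
}

/* The host result can only be trusted when it cannot change guest state:
 * inexact is already set and the rounding mode matches the host default. */
static bool can_use_host_fpu(const struct fp_status *s)
{
    return (s->flags & FLAG_INEXACT) != 0 &&
           s->rounding == ROUND_NEAREST_EVEN;
}

/* Placeholder for a softfloat routine. A real one would compute the result
 * with integer arithmetic and update s->flags; this one only counts calls. */
static double soft_add(double a, double b, struct fp_status *s)
{
    s->soft_calls++;
    return a + b;
}

/* Fast path on the host FPU when safe, slow path otherwise. */
static double guarded_add(double a, double b, struct fp_status *s)
{
    if (can_use_host_fpu(s) && is_zero_or_normal(a) && is_zero_or_normal(b)) {
        double r = a + b;
        if (is_zero_or_normal(r)) {
            return r;         /* no overflow/underflow, inexact already set */
        }
    }
    return soft_add(a, b, s);
}
```

With a freshly cleared status word every call takes the slow path; only
once FLAG_INEXACT has been raised does guarded_add() start using the host
result directly, and an overflowing sum still falls back.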

Personally, I would accept minor accuracy differences in exchange for
a significant performance advantage in emulation of game code, but not
for scientific applications, which I understand may diminish the
upstream appeal of this x87 translation work.
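
As a rough illustration of the kind of minor accuracy difference at
stake, the sketch below uses float and double as stand-ins for a
narrower and a wider format. It is an analogy only, not a model of x87
extended precision versus SSE2 double precision, and the function names
sum_narrow() and sum_wide() are made up for this example. Rounding after
every step in the narrow format loses a bit that survives when the
intermediate sum is kept in the wider format and rounded once.

```c
/* Round to float after every addition. The volatile stores force the
 * rounding even on hosts that evaluate float expressions more widely. */
static float sum_narrow(float a, float b, float c)
{
    volatile float t = a + b;
    volatile float r = t + c;
    return r;
}

/* Keep the intermediate sum in double and round to float once at the end. */
static float sum_wide(float a, float b, float c)
{
    double t = (double)a + (double)b + (double)c;
    return (float)t;
}
```

Called with 1.0f, 0x1p-24f and 0x1p-24f, sum_narrow() returns 1.0f because
each half-ulp addend is rounded away in turn, while sum_wide() returns
1.0f + 0x1p-23f, one ulp higher.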

> I think there will be a wariness to merge anything that only works for a
> single frontend/backend combination. Running translated x86 on x86 is
> not the common case for TCG ;-)

Understood. Initially this works on a single frontend/backend
combination, with support for other guests and hosts to follow. It
will be a long road, but my next step is to produce a translation for
AArch64 systems.

> These are the things that make correct handling of floating point hard.

Agreed!

> I'll happily review patches on the list that provide for an accelerated
> FPU experience as long as the correctness is maintained.

Thank you!

Matt

On Thu, Sep 30, 2021 at 2:38 AM Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> Matt <mborgerson@gmail.com> writes:
>
> > Hello--
> >
> > I'm excited to share that I have been developing support for TCG
> > floating point operations; specifically, to accelerate emulation of
> > x86 guest code which heavily exercises the x87 FPU for a game console
> > emulator project based on QEMU. So far, this work has shown great
> > promise, demonstrating some dramatic performance improvements in
> > emulation of x87 heavy code.
>
> I've not reviewed the code as it is a rather large diff. For your proper
> submission could you please ensure that your patch series is broken up
> into discrete commits to aid review. It also aids bisection if
> regressions are identified.
>
> > The feature works in concert with unaccelerated x87 FPU helpers, and
> > also allows total soft float helper fallback if the user discovers
> > some issue with the hard float implementation.
>
> The phrase "if the user discovers some issues" is a bit of a red flag.
> We should always be striving for correct emulation of floating point.
> Indeed we have recently got the code base to the point we pass all of
> the Berkeley softfloat test suite. This can be checked by running:
>
>   make check-softfloat
>
> However the test code links directly to the softfloat code so won't work
> with direct code execution. The existing 32/64 bit hardfloat
> optimisations work within the helpers. While generating direct code is
> appealing to avoid the cost of helper calls it's fairly well cached and
> predicted code. Experience with the initial hardfloat code showed there
> was still a performance win even with the cost of the helper call.
>
> > For the TCG target,
> > I've opted to implement it for x86-64 hosts using SSE2, although this
> > could be extended to support full 80b double extended precision with
> > host x87 support. I'm also in early development of an implementation
> > for AArch64 hosts.
>
> I don't think you'll see the same behaviour emulating an x87 on anything
> that isn't an x87 because the boundaries for things like inexact
> calculation will be different. Indeed if you look at the existing
> hardfloat code function can_use_fpu() you will see we only call the host
> processor function if the inexact bit is already set. Other wrappers
> have even more checks for normal numbers. Anything that needs NaN
> handling will fall back to the correct softfloat code.
>
> I think there will be a wariness to merge anything that only works for a
> single frontend/backend combination. Running translated x86 on x86 is
> not the common case for TCG ;-)
>
> > There are still some significant tasks to be done, like proper
> > handling of exception flags, edge cases, and testing, to name a few.
>
> These are the things that make correct handling of floating point hard.
>
> > Once in a slightly more mature state, I do think this feature would
> > make a natural addition to upstream QEMU and plan to submit it for
> > consideration.
> >
> > I'm writing to the mailing list now to inform FPU maintainers and any
> > other interested parties that this work is happening, to solicit any
> > early feedback, and to extend an invitation to anyone interested in
> > collaborating to expedite its upstreaming.
>
> I'll happily review patches on the list that provide for an accelerated
> FPU experience as long as the correctness is maintained.
>
> > My initial TCG FP work can be found here:
> > https://github.com/mborgerson/xemu/pull/464/commits
> >
> > Thanks,
> > Matt
>
>
> --
> Alex Bennée
