lightning

Re: [PATCH] ppc: Fix extr_d() on 32-bit PPC


From: Paulo César Pereira de Andrade
Subject: Re: [PATCH] ppc: Fix extr_d() on 32-bit PPC
Date: Mon, 29 Aug 2022 17:31:39 -0300

On Mon, Aug 29, 2022 at 16:46, Paul Cercueil
<paul@crapouillou.net> wrote:
>
> Hi Paulo,

  Hi Paul,

> On Mon, Aug 29 2022 at 16:44:35 -0300, Paulo César Pereira de
> Andrade <paulo.cesar.pereira.de.andrade@gmail.com> wrote:
> > On Sat, Aug 27, 2022 at 20:30, Paul Cercueil
> > <paul@crapouillou.net> wrote:
> >>
> >>  Note that with your recent SQRT patches and this one, the checks all
> >>  pass on PowerPC 32-bit now.
> >
> >   That is great news.
> >   Thanks for letting me know!
>
> No problem.
>
> One thing bothers me though - extr_f() and extr_d() are defined to the
> same thing; but doubles and floats are different, how does this work?

  Based on
https://www.ibm.com/docs/no/aix/7.2?topic=processor-interpreting-contents-floating-point-register
my understanding, which matches what I have observed, is that on
powerpc a floating-point register always holds its value in double
precision format. Some instructions just operate on it as if it
were single precision, and there are instructions to "truncate" it
to 32 bit precision.

  I believe there might be issues with very large values when converting
an integer to a single/double precision value. For example, if doing
something like:

movi(Rx, 0x...);
extr_f(Fx, Rx);
truncr_f(Rx, Fx);

vs:

movi(Rx, 0x...);
extr_d(Fx, Rx);
truncr_d(Rx, Fx);

for very large values of 0x....
  I mean issues in the sense that different architectures might show
different values.

  But if anything, for the above example, powerpc should just show
more precision, and I believe the precise values here are not
something one should rely on.
  Either way, to get the same result as other arches, the first
example should add this pseudo patch:

 extr_f(Fx, Rx);
+extr_d_f(Fx, Fx);
 truncr_f(Rx, Fx);

that should truncate the double precision value to a single
precision value.
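
  To illustrate the precision difference discussed above, here is a
minimal sketch in plain C rather than lightning code. The function
names are hypothetical; the casts only mirror what extr_f/truncr_f,
extr_d/truncr_d and the extra extr_d_f step are expected to do, and
the `volatile` temporaries are there to force rounding to the stored
type.

```c
#include <stdint.h>

/* int -> float -> int, analogous to movi; extr_f; truncr_f */
static int32_t roundtrip_float(int32_t x)
{
    volatile float f = (float)x;   /* rounds to a 24-bit significand */
    return (int32_t)f;
}

/* int -> double -> int, analogous to movi; extr_d; truncr_d */
static int32_t roundtrip_double(int32_t x)
{
    volatile double d = (double)x; /* exact for every 32-bit integer */
    return (int32_t)d;
}

/* int -> double -> float -> int, analogous to adding extr_d_f
 * after a conversion that actually produced a double */
static int32_t roundtrip_double_then_float(int32_t x)
{
    volatile double d = (double)x;
    volatile float f = (float)d;   /* explicit narrowing to single */
    return (int32_t)f;
}
```

  With x = 16777217 (2^24 + 1), the double round trip preserves the
value, while both paths that go through single precision give
16777216, which is the kind of difference one could see between
architectures.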

> -Paul
>
> >>
> >>
> >>  On Sat, Aug 27 2022 at 16:17:30 +0100, Paul Cercueil
> >>  <paul@crapouillou.net> wrote:
> >>  > The FCFID instruction is only available on 64-bit PowerPC.
> >>  > Therefore it is necessary to use a software mechanism to convert
> >>  > integers to floating-point on 32-bit PowerPC.
> >>  >
> >>  > Tested and working on PowerPC-32 big-endian and little-endian.
> >>  >
> >>  > Signed-off-by: Paul Cercueil <paul@crapouillou.net>
> >>  > ---
> >>  >  lib/jit_ppc-fpu.c | 31 ++++++++++++++++++++++++-------
> >>  >  1 file changed, 24 insertions(+), 7 deletions(-)
> >>  >
> >>  > diff --git a/lib/jit_ppc-fpu.c b/lib/jit_ppc-fpu.c
> >>  > index 387cc6f..dd66d03 100644
> >>  > --- a/lib/jit_ppc-fpu.c
> >>  > +++ b/lib/jit_ppc-fpu.c
> >>  > @@ -484,23 +484,40 @@ _movi_d(jit_state_t *_jit, jit_int32_t r0, jit_float64_t *i0)
> >>  >       ldi_d(r0, (jit_word_t)i0);
> >>  >  }
> >>  >
> >>  > -/* should only work on newer ppc (fcfid is a ppc64 instruction) */
> >>  >  static void
> >>  >  _extr_d(jit_state_t *_jit, jit_int32_t r0, jit_int32_t r1)
> >>  >  {
> >>  >  #  if __WORDSIZE == 32
> >>  > -    jit_int32_t              reg;
> >>  > +    jit_int32_t              reg, freg, off1, off2;
> >>  > +
> >>  > +#  if __BYTE_ORDER == __BIG_ENDIAN
> >>  > +    off1 = alloca_offset - 8;
> >>  > +    off2 = alloca_offset - 4;
> >>  > +#  else
> >>  > +    off1 = alloca_offset - 4;
> >>  > +    off2 = alloca_offset - 8;
> >>  > +#  endif
> >>  > +
> >>  >      reg = jit_get_reg(jit_class_gpr);
> >>  > -    rshi(rn(reg), r1, 31);
> >>  > -    /* use reserved 8 bytes area */
> >>  > -    stxi(alloca_offset - 4, _FP_REGNO, r1);
> >>  > -    stxi(alloca_offset - 8, _FP_REGNO, rn(reg));
> >>  > +    freg = jit_get_reg(jit_class_fpr);
> >>  > +
> >>  > +    movi(rn(reg), 0x43300000);
> >>  > +    stxi_i(off1, _FP_REGNO, rn(reg));
> >>  > +    movi(rn(reg), 0x80000000);
> >>  > +    stxi_i(off2, _FP_REGNO, rn(reg));
> >>  > +    ldxi_d(rn(freg), _FP_REGNO, alloca_offset - 8);
> >>  > +    xorr(rn(reg), r1, rn(reg));
> >>  > +    stxi_i(off2, _FP_REGNO, rn(reg));
> >>  > +    ldxi_d(r0, _FP_REGNO, alloca_offset - 8);
> >>  > +    subr_d(r0, r0, rn(freg));
> >>  > +
> >>  >      jit_unget_reg(reg);
> >>  > +    jit_unget_reg(freg);
> >>  >  #  else
> >>  >      stxi(alloca_offset - 8, _FP_REGNO, r1);
> >>  > -#  endif
> >>  >      ldxi_d(r0, _FP_REGNO, alloca_offset - 8);
> >>  >      FCFID(r0, r0);
> >>  > +#  endif
> >>  >  }
> >>  >
> >>  >  static void
> >>  > --
> >>  > 2.35.1
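
  For reference, the software conversion in the quoted patch appears
to be the usual "magic number" technique: build a double whose high
word is 0x43300000 (the value 2^52) and whose low word is the integer
with its sign bit flipped, then subtract 2^52 + 2^31. A minimal sketch
in plain C, assuming IEEE 754 binary64 doubles; the function name is
hypothetical, and the bits are assembled in a 64-bit integer so the
endian-specific offsets used in the patch are not needed here.

```c
#include <stdint.h>
#include <string.h>

/* Convert a signed 32-bit integer to double without an
 * integer-to-float instruction (assumes IEEE 754 binary64). */
static double int32_to_double(int32_t x)
{
    /* High word 0x43300000 encodes 2^52, so the low 32 bits of the
     * significand count in units of exactly 1. */
    const uint64_t high = (uint64_t)0x43300000u << 32;

    /* Bias: 2^52 + 2^31. */
    uint64_t bias_bits = high | 0x80000000u;

    /* Value: 2^52 + (x + 2^31); flipping the sign bit maps the
     * signed range onto 0 .. 2^32 - 1. */
    uint64_t value_bits = high | ((uint32_t)x ^ 0x80000000u);

    double bias, value;
    memcpy(&bias, &bias_bits, sizeof bias);
    memcpy(&value, &value_bits, sizeof value);

    /* (2^52 + x + 2^31) - (2^52 + 2^31) == x, computed exactly. */
    return value - bias;
}
```

  The subtraction is exact because both operands and the result fit
in the 53-bit significand of a double.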

Thanks,
Paulo


