Re: Help with Hand-Optimized Assembly

help-gplusplus

From:	James Harris
Subject:	Re: Help with Hand-Optimized Assembly
Date:	Wed, 28 Mar 2012 18:29:58 -0000
User-agent:	G2/1.0

On Jan 12, 9:38 pm, Bill Woessner <woess...@nospicedham.gmail.com>
wrote:
> I'm a 100% total newbie at writing assembly. But I figured it would
> be a good exercise.  And besides, this tiny chunk of code is
> definitely in the critical path of something I'm working on.  Any and
> all advice would be appreciated.
>
> I'm trying to rewrite the following function in x86 assembly:
>
> inline double DiffAngle(double theta1, double theta2)
> {
>   double delta(theta1 - theta2);
>
>   return std::abs(delta) <= M_PI ? delta : delta - copysign(2 * M_PI, delta);
>
> }

I find the gas assembler syntax ghastly, so I converted your code to
NASM, creating the file DiffAngle.nasm shown below. I think it's easier
to read and, despite not being inlined, it seemed to be faster. Times
for 100 million calls:

  c function: 2.8 seconds
  asm func:   1.2 seconds
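
For anyone wanting to reproduce this kind of measurement, here is a
minimal sketch of a timing loop. It is not the harness behind the
figures above; TimeCalls, DiffAngleRef and kPi are hypothetical names
introduced for illustration, and the input sweep is an assumption.

```cpp
#include <chrono>
#include <cmath>

// Stand-in for M_PI, which not every <cmath> provides.
const double kPi = 3.14159265358979323846;

// Reference C++ version of the function from the original post.
double DiffAngleRef(double theta1, double theta2) {
  double delta = theta1 - theta2;
  return std::fabs(delta) <= kPi ? delta
                                 : delta - std::copysign(2 * kPi, delta);
}

// Times `calls` invocations of fn and returns the elapsed seconds.
// The results are summed into *sink so the loop has an observable
// effect and is less likely to be optimised away.
double TimeCalls(double (*fn)(double, double), long calls, double* sink) {
  double sum = 0.0;
  double theta = -6.0;
  auto start = std::chrono::steady_clock::now();
  for (long i = 0; i < calls; ++i) {
    sum += fn(theta, 0.0);
    theta += 0.001;
    if (theta > 6.0) theta = -6.0;  // sweep both branches of the function
  }
  auto stop = std::chrono::steady_clock::now();
  *sink = sum;
  return std::chrono::duration<double>(stop - start).count();
}
```

Passing the function by pointer keeps the comparison closer to like for
like, since the assembly routine cannot be inlined either.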

I've not written floating-point assembly code or linked C++ with
assembly before, so there's a chance something is not right. I'll list
the steps I took so they can be recreated, challenged or corrected.
Hopefully this will help with the things you asked about.

First the assembly code.

;
;DiffAngle
;
;Build with, for example,
;  nasm -f elf32 DiffAngle.nasm -l DiffAngle.list
;

bits 32
cpu ppro

%define PI           3.1415926535897932384626433832795
%define TWO_PI       6.283185307179586476925286766559
%define NEG_TWO_PI  -6.283185307179586476925286766559

global DiffAngle

section .text   ;the standard ELF code section

DiffAngle:
  fld     qword [esp + 4]
  fsub    qword [esp + 12]
  fxam
  fnstsw  ax
  fld     qword [two_pi]
  test    ah, 2
  fld     qword [neg_two_pi]
  fcmovne st0, st1
  fstp    st1
  fsubr   st0, st1
  fldpi
  fld     st2
  fabs
  fcomip  st0, st1
  fstp    st0
  fcmovbe st0, st1
  fstp    st1
  ret     0

section .data

two_pi:      dq NEG_TWO_PI  ;NB holds -2*pi: labels swapped, as in the original
neg_two_pi:  dq TWO_PI      ;NB holds +2*pi: labels swapped, as in the original

> double DiffAngle(double theta1, double theta2)
> {
>   asm(
>       "fldl    4(%esp);"
>       "fsubl   12(%esp);"
>       "fxam;"
>       "fnstsw  %ax;"
>       "fldl    TWO_PI;"
>       "testb   $2, %ah;"
>       "fldl    NEG_TWO_PI;"
>       "fcmovne %st(1), %st;"
>       "fstp    %st(1);"
>       "fsubr   %st(1), %st;"
>       "fldpi;"
>       "fld     %st(2);"
>       "fabs;"
>       "fcomip  %st(1), %st;"
>       "fstp    %st(0);"
>       "fcmovbe %st(1), %st;"
>       "fstp    %st(1);"
>       "rep;"
>       "ret;"
>       "NEG_TWO_PI:;"
>       ".long   1413754136;"
>       ".long   1075388923;"
>       "TWO_PI:;"
>       ".long   1413754136;"
>       ".long   -1072094725;"
>       );
>
> }
>
> This compiles, runs and produces the correct answers.  But I have a
> few issues with it:

I'm not sure your code is right. The constants seem to be the other
way round from what their labels say. I had to swap them over to get
the same results as your C++ function, but it is late... If someone
points out any faults I'll respond another day.

> 1) If I declare this function inline, it gives me garbage (like
> 10^-304)

To try it out I made a test routine called dtest1.c. Build steps on
Linux for the whole thing were

  nasm -f elf32 DiffAngle.nasm -l DiffAngle.list
  g++ dtest1.c -c
  g++ dtest1.o DiffAngle.o -o dtest1

> 2) If I compile with -Wall, I get a warning that the function doesn't
> return a value, which is absolutely true, but I don't know how to fix
> it.

In dtest1.c I had to include the following prototype

extern "C" {
  double DiffAngle(double, double);
}

so that g++ didn't expect a mangled routine name.
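
As an illustration of how that prototype might be used, here is a
sketch of a checking routine in the spirit of dtest1.c. It is not the
actual file: CountMismatches, DiffAngleRef and kPi are hypothetical
names, and the grid of test angles and the tolerance are assumptions.

```cpp
#include <cmath>

// C linkage: the linker looks for the plain symbol "DiffAngle", the
// name DiffAngle.nasm exports with "global", not a C++-mangled one.
extern "C" {
  double DiffAngle(double, double);
}

const double kPi = 3.14159265358979323846;  // stand-in for M_PI

// The C++ function from the original post, used as the reference.
double DiffAngleRef(double theta1, double theta2) {
  double delta = theta1 - theta2;
  return std::fabs(delta) <= kPi ? delta
                                 : delta - std::copysign(2 * kPi, delta);
}

// Counts inputs on which candidate and the reference disagree by more
// than a small tolerance.
int CountMismatches(double (*candidate)(double, double)) {
  int mismatches = 0;
  for (double theta1 = -6.0; theta1 <= 6.0; theta1 += 0.25) {
    for (double theta2 = -6.0; theta2 <= 6.0; theta2 += 0.25) {
      double expected = DiffAngleRef(theta1, theta2);
      double actual = candidate(theta1, theta2);
      if (std::fabs(expected - actual) > 1e-12) ++mismatches;
    }
  }
  return mismatches;
}
```

When the program is linked against DiffAngle.o, a main() could call
CountMismatches(DiffAngle) and report the count. The declaration alone
does not need the object file as long as DiffAngle is never referenced.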

> 3) I don't like how TWO_PI and NEG_TWO_PI are defined.  I had to steal
> it from some generated assembly.

These can be defined much more easily in the assembly code. The "dq"
directive defines what the assembler calls a quadword (8 bytes), here
placed in the .data section, and NASM accepts a floating-point literal
as its operand. For example,

two_pi:      dq 6.283185307179586476925286766559

>  It would be nice to use M_PI,
> 4*atan(1) or something like that.

I know 4*atan(1) is pi but I don't know what M_PI is supposed to be.
I've made no attempt to understand the maths of your solution; I just
copied your code. So both the blame for faults and the credit for
increased performance go to you. :-)
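
For reference, M_PI is a macro for pi that many <math.h> and <cmath>
implementations provide as an extension; it is not part of the C or
C++ standards. The sketch below compares a literal of that value with
4*atan(1); kPi, PiFromAtan and SignedTwoPi are hypothetical names used
only for illustration.

```cpp
#include <cmath>

// pi written out to more digits than a double can hold; this is the
// value M_PI expands to where the macro is available.
const double kPi = 3.14159265358979323846;

// pi computed at run time, as suggested in the original post.
double PiFromAtan() { return 4.0 * std::atan(1.0); }

// 2*pi carrying the sign of delta: the amount the C++ function
// subtracts when |delta| exceeds pi.
double SignedTwoPi(double delta) { return std::copysign(2.0 * kPi, delta); }
```

Either form of pi should agree to within rounding error, so the choice
is mostly about whether the value is fixed at build time or computed
at run time.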

James
