Re: Help with Hand-Optimized Assembly
From: James Harris
Subject: Re: Help with Hand-Optimized Assembly
Date: Wed, 28 Mar 2012 18:29:58 -0000
User-agent: G2/1.0
On Jan 12, 9:38 pm, Bill Woessner <woess...@nospicedham.gmail.com>
wrote:
> I'm a 100% total newbie at writing assembly. But I figured it would
> be a good exercise. And besides, this tiny chunk of code is
> definitely in the critical path of something I'm working on. Any and
> all advice would be appreciated.
>
> I'm trying to rewrite the following function in x86 assembly:
>
> inline double DiffAngle(double theta1, double theta2)
> {
> double delta(theta1 - theta2);
>
> return std::abs(delta) <= M_PI ? delta : delta - copysign(2 * M_PI,
> delta);
>
> }
The gas assembler format is ghastly, so I converted your code to Nasm,
creating the file DiffAngle.nasm as follows. I think it's easier to
read and, despite not being inlined, it seemed to be faster. Times for
100 million calls:
  C function:   2.8 seconds
  asm function: 1.2 seconds
I've not written floating-point assembly code or linked C++ with
assembly before, so there's a chance something is not right. I'll list
the steps I took so they can be recreated/challenged/corrected.
Hopefully this will help with the things you asked about.
First, the assembly code.
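;
;DiffAngle
;
;Build with, for example,
;  nasm -f elf32 DiffAngle.nasm -l DiffAngle.list
;
bits 32
cpu ppro
%define PI 3.1415926535897932384626433832795
%define TWO_PI 6.283185307179586476925286766559
%define NEG_TWO_PI -6.283185307179586476925286766559
global DiffAngle
section .text
DiffAngle:
        fld qword [esp + 4]         ;st0 = theta1
        fsub qword [esp + 12]       ;st0 = delta = theta1 - theta2
        fxam                        ;classify delta; C1 = its sign bit
        fnstsw ax                   ;status word to ax; C1 is bit 1 of ah
        fld qword [two_pi]
        test ah, 2                  ;ZF clear if delta is negative
        fld qword [neg_two_pi]
        fcmovne st0, st1            ;delta negative: take the value at two_pi
        fstp st1                    ;st0 = chosen constant, st1 = delta
        fsubr st0, st1              ;st0 = delta - chosen constant
        fldpi
        fld st2                     ;copy of delta
        fabs
        fcomip st0, st1             ;compare |delta| with pi, then pop
        fstp st0                    ;drop pi
        fcmovbe st0, st1            ;|delta| <= pi: result is delta itself
        fstp st1                    ;leave only the result in st0
        ret 0
section .data
two_pi: dq NEG_TWO_PI ;NB wrong value for this label: holds -2*pi
neg_two_pi: dq TWO_PI ;NB wrong value for this label: holds +2*pi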
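To make the FPU stack juggling easier to follow, here is a rough C++ model of the same sequence of steps. It is only a sketch for reading alongside the assembly: the name DiffAngleModel is made up, and std::signbit stands in for the fxam / test ah, 2 pair.

```cpp
#include <cassert>
#include <cmath>

// Illustrative C++ model of the assembly routine, one step per line.
// DiffAngleModel is a hypothetical name; it is not part of the original code.
inline double DiffAngleModel(double theta1, double theta2)
{
    const double pi = 3.14159265358979323846;           // fldpi
    const double delta = theta1 - theta2;               // fld / fsub

    // fxam + test ah,2 select on the sign bit of delta.
    const double offset = std::signbit(delta) ? -2.0 * pi : 2.0 * pi;

    const double wrapped = delta - offset;              // fsubr
    return std::fabs(delta) <= pi ? delta : wrapped;    // fabs / fcomip / fcmovbe
}
```

Like the assembly, the model always computes the wrapped value and then selects between it and delta, rather than branching.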
> double DiffAngle(double theta1, double theta2)
> {
> asm(
> "fldl 4(%esp);"
> "fsubl 12(%esp);"
> "fxam;"
> "fnstsw %ax;"
> "fldl TWO_PI;"
> "testb $2, %ah;"
> "fldl NEG_TWO_PI;"
> "fcmovne %st(1), %st;"
> "fstp %st(1);"
> "fsubr %st(1), %st;"
> "fldpi;"
> "fld %st(2);"
> "fabs;"
> "fcomip %st(1), %st;"
> "fstp %st(0);"
> "fcmovbe %st(1), %st;"
> "fstp %st(1);"
> "rep;"
> "ret;"
> "NEG_TWO_PI:;"
> ".long 1413754136;"
> ".long 1075388923;"
> "TWO_PI:;"
> ".long 1413754136;"
> ".long -1072094725;"
> );
>
> }
>
> This compiles, runs and produces the correct answers. But I have a
> few issues with it:
I'm not sure your code is right. The constants seem to be the other
way round from what their labels say they are. I had to swap them over
to get the same results as your C program, but it is late... If
someone points out any faults I'll respond another day.
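One way to check which label holds which value is to decode the two .long pairs from the quoted code. A minimal sketch in C++, assuming little-endian x86 and 64-bit IEEE-754 doubles; the function names are invented for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Rebuild a double from a ".long lo; .long hi" pair as laid out in memory
// on little-endian x86 (low word first). Assumes IEEE-754 doubles.
inline double double_from_longs(std::int32_t lo, std::int32_t hi)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "expects a 64-bit double");
    const std::uint64_t bits =
        (static_cast<std::uint64_t>(static_cast<std::uint32_t>(hi)) << 32) |
        static_cast<std::uint32_t>(lo);
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// The constants exactly as they appear after each label in the quoted code.
inline double value_at_label_NEG_TWO_PI()
{
    return double_from_longs(1413754136, 1075388923);
}

inline double value_at_label_TWO_PI()
{
    return double_from_longs(1413754136, -1072094725);
}

// Sanity check: the signs are the opposite of what the label names suggest.
inline void check_label_signs()
{
    assert(value_at_label_NEG_TWO_PI() > 0.0);
    assert(value_at_label_TWO_PI() < 0.0);
}
```

Decoded this way, the pair after NEG_TWO_PI comes out as roughly +6.2832 and the pair after TWO_PI as roughly -6.2832, so the labels are named the wrong way round even though the routine selects the right value.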
> 1) If I declare this function inline, it gives me garbage (like
> 10^-304)
To try it out I made a test routine called dtest1.c. Build steps on
Linux for the whole thing were:
  nasm -f elf32 DiffAngle.nasm -l DiffAngle.list
  g++ dtest1.c -c
  g++ dtest1.o DiffAngle.o -o dtest1
> 2) If I compile with -Wall, I get a warning that the function doesn't
> return a value, which is absolutely true, but I don't know how to fix
> it.
In dtest1.c I had to include the following prototype:
  extern "C" {
  double DiffAngle(double, double);
  }
so that g++ didn't expect a mangled routine name.
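For anyone wanting to repeat the timing, a test file along these lines should do. This is not the actual dtest1.c, just a hypothetical sketch: DiffAngleRef, Timing and time_calls are invented names, and the argument pattern is arbitrary.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// The assembly routine. Declaring it is harmless if it is never called;
// calling it needs a 32-bit build linked with DiffAngle.o, because the
// routine reads its arguments from the stack.
extern "C" double DiffAngle(double, double);

using AngleFn = double (*)(double, double);

// Plain C++ version of the quoted function, used as the baseline.
double DiffAngleRef(double theta1, double theta2)
{
    const double pi = 3.14159265358979323846;
    const double delta = theta1 - theta2;
    return std::fabs(delta) <= pi ? delta
                                  : delta - std::copysign(2.0 * pi, delta);
}

struct Timing {
    double seconds;   // wall-clock time for all calls
    double checksum;  // sum of results, so the loop cannot be optimized away
};

// Call fn the given number of times and report elapsed time and checksum.
Timing time_calls(AngleFn fn, long calls)
{
    assert(fn != nullptr && calls >= 0);
    const auto start = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (long i = 0; i < calls; ++i) {
        const double theta1 = static_cast<double>(i % 7) - 3.0;  // -3 .. 3
        sum += fn(theta1, 0.25);
    }
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return Timing{elapsed.count(), sum};
}
```

Calling time_calls(DiffAngleRef, 100000000L) and time_calls(DiffAngle, 100000000L) and comparing both the seconds and the checksums gives a speed comparison and a correctness check at the same time.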
> 3) I don't like how TWO_PI and NEG_TWO_PI are defined. I had to steal
> it from some generated assembly.
These can be defined much more easily in the assembly code. The "dq"
directive defines what the assembler calls a quadword (8 bytes), here
in the .data section, and Nasm accepts a floating-point literal as its
operand. For example,
  two_pi: dq 6.283185307179586476925286766559
> It would be nice to use M_PI,
> 4*atan(1) or something like that.
I know 4*atan(1) is pi but I don't know what M_PI is supposed to be.
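For reference, M_PI is the pi constant that <math.h> provides on POSIX systems; it is not part of ISO C or C++, so some compilers need it enabled or defined by hand. A small sketch comparing the two ways of getting 2*pi on the C++ side, with invented function names and a fallback definition in case the macro is missing:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

#ifndef M_PI
#define M_PI 3.14159265358979323846  // fallback where <cmath> does not supply it
#endif

// 2*pi from the POSIX macro.
inline double two_pi_from_macro() { return 2.0 * M_PI; }

// 2*pi from the identity pi = 4*atan(1).
inline double two_pi_from_atan() { return 8.0 * std::atan(1.0); }

// Raw 64-bit pattern of a double, i.e. what "dq <value>" would store.
// Assumes a 64-bit IEEE-754 double.
inline std::uint64_t bits_of(double value)
{
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "expects a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

bits_of(two_pi_from_macro()) gives 0x401921FB54442D18, which is the same pattern as the positive .long pair in the quoted code.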
I've made no attempt to understand the maths of your solution; I just
copied your code. So both the blame for faults and the credit for
increased performance go to you. :-)
James