Re: Help with Hand-Optimized Assembly
From: James Van Buskirk
Subject: Re: Help with Hand-Optimized Assembly
Date: Wed, 28 Mar 2012 18:29:57 -0000
"Terje Mathisen" <"terje.mathisen at tmsw.no"@giganews.com> wrote in message
news:5gh6u8-3062.ln1@ntp6.tmsw.no...
> sfuerst wrote:
>> There is a straight-forward algorithm using the fact that only one of
>> the bounds can be crossed...
>> Something like this:
>> (Inputs in %xmm0, and %xmm1, output in %xmm0)
>> subsd %xmm1,%xmm0
>> movsd plusM_PI(%rip), %xmm1
>> movsd minusM_PI(%rip), %xmm2
>> cmpgtsd %xmm0, %xmm1
>> cmpltsd %xmm0, %xmm2
>> andpd minus2M_PI(%rip), %xmm1
>> andpd plus2M_PI(%rip), %xmm2
>> addsd %xmm1, %xmm0
>> addsd %xmm2, %xmm0
>> I probably have some of the comparisons reversed by mistake... but you
>> get the idea. You can do both comparisons in parallel. Using sign
>> tricks doesn't seem to be profitable, as that increases the length of
>> the critical path.
> Very nice, and definitely much better than my approach!
> :-)
I really liked your approach more because it involves neither as many
loads nor as many long-latency operations like ADDSD and CMPccSD.
The code above has four such long-latency instructions on its
critical path, and I think we can do better with:
    subsd   xmm0, xmm1            ; delta = xmm0 - xmm1 {clock 1}
    movsd   xmm2, [signbits]      ; -0.0 {asynchronous}
    movaps  xmm3, xmm2            ; copy of the sign mask
    andps   xmm2, xmm0            ; sign(0.0,delta) {clock 4}
    andnps  xmm3, xmm0            ; abs(delta) {clock 4}
    xorps   xmm2, [minustwopi]    ; -sign(2*pi,delta) {clock 5}
    cmplesd xmm3, [pi]            ; -1 or 0 {clock 5}
    addsd   xmm2, xmm0            ; delta-sign(2*pi,delta) {clock 6}
    andps   xmm0, xmm3            ; delta or 0 {clock 8}
    andnps  xmm3, xmm2            ; 0 or delta-sign(2*pi,delta) {clock 9}
    orps    xmm0, xmm3            ; delta or delta-sign(2*pi,delta) {clock 10}
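
For anyone who wants to check the masking logic without an assembler, here is a rough C sketch of the sequence above, one statement per instruction. The names wrap_delta, bits_of and double_of are made up for illustration and do not come from this thread. It assumes both inputs lie in [-pi, pi], so that a single 2*pi correction is enough, and it says nothing about timing: a compiler is free to emit quite different instructions for it.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: move between a double and its raw 64-bit pattern. */
static uint64_t bits_of(double x)     { uint64_t u; memcpy(&u, &x, sizeof u); return u; }
static double   double_of(uint64_t u) { double x;   memcpy(&x, &u, sizeof x); return x; }

/* Hypothetical name. Returns a - b wrapped back into [-pi, pi],
 * assuming a and b are each already in [-pi, pi]. */
static double wrap_delta(double a, double b)
{
    const double   pi       = 3.14159265358979323846;
    const uint64_t signbits = UINT64_C(0x8000000000000000);     /* -0.0 */

    double   delta   = a - b;                                   /* subsd   */
    uint64_t d       = bits_of(delta);
    uint64_t sign    = d & signbits;                            /* andps:  sign(0.0,delta)   */
    uint64_t mag     = d & ~signbits;                           /* andnps: abs(delta)        */
    uint64_t corr    = sign ^ bits_of(-2.0 * pi);               /* xorps:  -sign(2*pi,delta) */
    uint64_t mask    = (double_of(mag) <= pi) ? UINT64_MAX : 0; /* cmplesd: all ones or 0    */
    double   wrapped = double_of(corr) + delta;                 /* addsd   */
    uint64_t keep    = d & mask;                                /* andps:  delta or 0        */
    uint64_t fix     = ~mask & bits_of(wrapped);                /* andnps: 0 or wrapped      */
    return double_of(keep | fix);                               /* orps    */
}
```

Calling wrap_delta(3.0, -3.0) should give 6 - 2*pi, and wrap_delta(1.0, 0.5) should pass 0.5 through unchanged, since only the out-of-range case selects the corrected value.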
--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end