Re: [Discuss-gnuradio] *much* faster filtering
From: Chen-Mou Cheng
Subject: Re: [Discuss-gnuradio] *much* faster filtering
Date: Tue, 10 May 2005 23:49:30 -0400
That's awesome! :)
BTW, I was reading the following comment in gr_fir_fff_simd.cc:
// Round input data address down to 16 byte boundary
// NB: depending on the alignment of input[], memory
// before input[] will be accessed. The contents don't matter since
// they'll be multiplied by zero coefficients. I can't conceive of any
// situation where this could cause a segfault since memory protection
// in the x86 machines is done on much larger boundaries.
You probably won't get segfaults. On the other hand, you may not get
correct answers either: if one or more memory locations preceding
input[] happen to hold infinity (or a NaN), multiplying by a zero
coefficient gives a NaN (not-a-number) instead of zero, and that NaN
then propagates into the filter output. We suspect we have encountered
such a situation, because the NaNs went away after we switched to
gr_fir_fff_generic.cc.
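
To make the concern concrete, here is a minimal sketch in plain C++. It is not the GNU Radio code: dot_product and padded_result are hypothetical names used only for illustration, and buf[0] merely stands in for whatever happens to sit in memory just before input[] once the address is rounded down. It assumes IEEE 754 float arithmetic (so no -ffast-math).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Plain dot product, standing in for the inner loop of a FIR filter.
float dot_product(const float *taps, const float *input, std::size_t n)
{
  assert(taps != nullptr && input != nullptr);
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; i++)
    sum += taps[i] * input[i];
  return sum;
}

// Hypothetical demo: buf[0] plays the role of the stray value located
// just before the real input samples (buf[1..4]). The taps are padded
// with one leading zero so that the stray value "shouldn't matter".
float padded_result(float stray)
{
  const float buf[5]  = { stray, 1.0f, 2.0f, 3.0f, 4.0f };
  const float taps[5] = { 0.0f, 0.5f, 0.5f, 0.5f, 0.5f };
  return dot_product(taps, buf, 5);
}
```

With a finite stray value such as padded_result(7.0f), the zero tap cancels it and the result is 5.0. With padded_result(std::numeric_limits<float>::infinity()), the first product is 0 * inf, which is NaN, and every later addition keeps it NaN, so std::isnan() on the result is true.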
On 5/10/05, Eric Blossom <address@hidden> wrote:
> Due to some fine assembly language hacking by Stephane Fillod, we now
> have SSE and 3DNow! versions of the guts of the "fcc" and "ccf" FIR
> filters. "fcc" is float input, complex output, complex taps. "ccf"
> is complex input, complex output, float taps. The "ccf" variant is
> especially handy when working with the usrp, since we're generally
> dealing with complex baseband data.
>
> The new code is more than 8 times faster on the P4!
>
> ----------------------------------------------------------------
>
> Pentium M (1.4 GHz)
>
> address@hidden tests]$ ./benchmark_dotprod_fcc
> generic: taps: 256 input: 4e+07 cpu: 110.310 taps/sec: 9.283e+07
> SSE: taps: 256 input: 4e+07 cpu: 22.379 taps/sec: 4.576e+08
> address@hidden tests]$ ./benchmark_dotprod_ccf
> generic: taps: 256 input: 4e+07 cpu: 118.765 taps/sec: 8.622e+07
> SSE: taps: 256 input: 4e+07 cpu: 22.093 taps/sec: 4.635e+08
> address@hidden tests]$ ./benchmark_dotprod_fff
> generic: taps: 256 input: 4e+07 cpu: 16.966 taps/sec: 6.035e+08
> SSE: taps: 256 input: 4e+07 cpu: 11.194 taps/sec: 9.148e+08
>
> Athlon 1800+ MP (1.5 GHz)
>
> address@hidden tests]$ ./benchmark_dotprod_fcc
> generic: taps: 256 input: 4e+07 cpu: 106.544 taps/sec: 9.611e+07
> 3DNow!: taps: 256 input: 4e+07 cpu: 17.698 taps/sec: 5.786e+08
> SSE: taps: 256 input: 4e+07 cpu: 21.805 taps/sec: 4.696e+08
> address@hidden tests]$ ./benchmark_dotprod_ccf
> generic: taps: 256 input: 4e+07 cpu: 102.456 taps/sec: 9.994e+07
> 3DNow!: taps: 256 input: 4e+07 cpu: 16.247 taps/sec: 6.303e+08
> SSE: taps: 256 input: 4e+07 cpu: 21.743 taps/sec: 4.71e+08
> address@hidden tests]$ ./benchmark_dotprod_fff
> generic: taps: 256 input: 4e+07 cpu: 13.662 taps/sec: 7.495e+08
> 3DNow!: taps: 256 input: 4e+07 cpu: 8.252 taps/sec: 1.241e+09
> SSE: taps: 256 input: 4e+07 cpu: 9.982 taps/sec: 1.026e+09
>
> P4 (1.7 GHz)
>
> address@hidden tests]$ ./benchmark_dotprod_fcc
> generic: taps: 256 input: 4e+07 cpu: 144.956 taps/sec: 7.064e+07
> SSE: taps: 256 input: 4e+07 cpu: 18.968 taps/sec: 5.399e+08
> address@hidden tests]$ ./benchmark_dotprod_ccf
> generic: taps: 256 input: 4e+07 cpu: 152.732 taps/sec: 6.705e+07
> SSE: taps: 256 input: 4e+07 cpu: 18.525 taps/sec: 5.528e+08
> address@hidden tests]$ ./benchmark_dotprod_fff
> generic: taps: 256 input: 4e+07 cpu: 18.059 taps/sec: 5.67e+08
> SSE: taps: 256 input: 4e+07 cpu: 6.792 taps/sec: 1.508e+09
>