From: Chen-Mou Cheng
Subject: Re: [Discuss-gnuradio] *much* faster filtering
Date: Tue, 10 May 2005 23:49:30 -0400

That's awesome! :)

BTW, I was reading the following comment in gr_fir_fff_simd.cc:

  // Round input data address down to 16 byte boundary
  // NB: depending on the alignment of input[], memory
  // before input[] will be accessed. The contents don't matter since
  // they'll be multiplied by zero coefficients. I can't conceive of any
  // situation where this could cause a segfault since memory protection
  // in the x86 machines is done on much larger boundaries.

You probably won't get segfaults; on the other hand, you may not get
correct answers either: if one or more of the memory locations preceding
input[] happen to hold an infinity (or a NaN), the result will be a NaN
(not-a-number), because zero times infinity is NaN, not zero. We suspect
we have encountered such a situation, because the NaNs went away after
we switched to gr_fir_fff_generic.cc.

On 5/10/05, Eric Blossom <address@hidden> wrote:
> Thanks to some fine assembly language hacking by Stephane Fillod, we now
> have SSE and 3DNow! versions of the guts of the "fcc" and "ccf" FIR
> filters.  "fcc" is float input, complex output, complex taps.  "ccf"
> is complex input, complex output, float taps.  The "ccf" variant is
> especially handy when working with the USRP, since we're generally
> dealing with complex baseband data.
> 
> The new code is roughly 8 times faster on the P4 (7.6x for fcc, 8.2x for ccf)!
> 
> ----------------------------------------------------------------
> 
> Pentium M (1.4 GHz)
> 
> address@hidden tests]$ ./benchmark_dotprod_fcc
>   generic: taps:  256  input: 4e+07  cpu: 110.310  taps/sec:  9.283e+07
>       SSE: taps:  256  input: 4e+07  cpu:  22.379  taps/sec:  4.576e+08
> address@hidden tests]$ ./benchmark_dotprod_ccf
>   generic: taps:  256  input: 4e+07  cpu: 118.765  taps/sec:  8.622e+07
>       SSE: taps:  256  input: 4e+07  cpu:  22.093  taps/sec:  4.635e+08
> address@hidden tests]$ ./benchmark_dotprod_fff
>   generic: taps:  256  input: 4e+07  cpu:  16.966  taps/sec:  6.035e+08
>       SSE: taps:  256  input: 4e+07  cpu:  11.194  taps/sec:  9.148e+08
> 
> Athlon 1800+ MP (1.5 GHz)
> 
> address@hidden tests]$ ./benchmark_dotprod_fcc
>   generic: taps:  256  input: 4e+07  cpu: 106.544  taps/sec:  9.611e+07
>    3DNow!: taps:  256  input: 4e+07  cpu:  17.698  taps/sec:  5.786e+08
>       SSE: taps:  256  input: 4e+07  cpu:  21.805  taps/sec:  4.696e+08
> address@hidden tests]$ ./benchmark_dotprod_ccf
>   generic: taps:  256  input: 4e+07  cpu: 102.456  taps/sec:  9.994e+07
>    3DNow!: taps:  256  input: 4e+07  cpu:  16.247  taps/sec:  6.303e+08
>       SSE: taps:  256  input: 4e+07  cpu:  21.743  taps/sec:   4.71e+08
> address@hidden tests]$ ./benchmark_dotprod_fff
>   generic: taps:  256  input: 4e+07  cpu: 13.662  taps/sec:  7.495e+08
>    3DNow!: taps:  256  input: 4e+07  cpu:  8.252  taps/sec:  1.241e+09
>       SSE: taps:  256  input: 4e+07  cpu:  9.982  taps/sec:  1.026e+09
> 
> P4 (1.7 GHz)
> 
> address@hidden tests]$ ./benchmark_dotprod_fcc
>   generic: taps:  256  input: 4e+07  cpu: 144.956  taps/sec:  7.064e+07
>       SSE: taps:  256  input: 4e+07  cpu:  18.968  taps/sec:  5.399e+08
> address@hidden tests]$ ./benchmark_dotprod_ccf
>   generic: taps:  256  input: 4e+07  cpu: 152.732  taps/sec:  6.705e+07
>       SSE: taps:  256  input: 4e+07  cpu:  18.525  taps/sec:  5.528e+08
> address@hidden tests]$ ./benchmark_dotprod_fff
>   generic: taps:  256  input: 4e+07  cpu:  18.059  taps/sec:   5.67e+08
>       SSE: taps:  256  input: 4e+07  cpu:   6.792  taps/sec:  1.508e+09
> 
>