
Re: [Discuss-gnuradio] Re-writing blocks using intel libraries


From: Matt Ettus
Subject: Re: [Discuss-gnuradio] Re-writing blocks using intel libraries
Date: Tue, 11 Dec 2007 19:55:10 -0800
User-agent: Thunderbird 2.0.0.9 (X11/20071115)


General curiosity questions:

 Are you using oprofile to measure performance?

I am a bit of a maverick, and for various reasons am using a pure C++ environment. I hacked my own 'connect_block' function (can't wait for v3.2, where these will be part of native gr). I am measuring performance using a custom block (gr_throughput) that simply reports the average number of samples processed per second.

While pure C++ may be desirable for some reasons, performance is not really one of them. When you use Python, it isn't running anything that is really performance critical.

 Which blocks are causing you the biggest problem?

I got a 2x improvement on all the filtering blocks.

That isn't surprising. I believe our SSE filtering code was optimized for prior generations of processors, so a new Core2-optimized version would be useful, and likely competitive with IPP. Also, are you sure that when you compile our code with Intel's compiler you are even getting the SSE versions? Or are the pure C++ versions being called?

Another thing, which I believe was mentioned earlier -- if you really care about FIR filter performance, you should be using the FFT versions of the filters. The difference in performance can be huge, making the 2x you get from IPP insignificant.

About a 40% improvement for sine/cosine generation blocks. This includes gr_expj, gr_rotate.
There is definitely room for improvement here.
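
For reference, the rotator's inner loop is basically one complex multiply per sample plus a periodic renormalization, which is exactly the kind of loop that vectorizes well. A rough sketch of that kernel (illustrative, not the actual gr_rotate code):

// Sketch of a phase-rotator kernel (illustrative, not the actual gr_rotate code).
// One complex multiply per sample; the phase is renormalized periodically so
// rounding error does not make the oscillator amplitude drift.
#include <complex>

typedef std::complex<float> gr_complex;   // matches GNU Radio's gr_complex

void rotate(gr_complex *out, const gr_complex *in, int n,
            gr_complex &phase, gr_complex phase_incr)
{
  for (int i = 0; i < n; i++) {
    out[i] = in[i] * phase;
    phase *= phase_incr;
    if ((i & 0x1ff) == 0)          // every 512 samples
      phase /= std::abs(phase);    // renormalize to unit magnitude
  }
}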

 Are your problems caused primarily by lack of CPU cycles, cache
 misses or mis-predicted branches?

I am not sure, since I am not at all a software expert (mostly dsp/comm). My guess is that the SSE instructions are not being used (or not used to their full extent). Even the 'multiply' block is VERY slow compared to a vector-by-vector multiplication in the Intel library. Some of the gr_blocks process each sample using a separate function call, e.g.
for (int n = 0; n < noutput_samples; n++)
        scale(in[n]);

Replacing this with a single vectorized function call is much faster.
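
(For example, the scaling loop above collapses into one library call. A sketch, assuming IPP's ippsMulC_32f is the right routine for a multiply-by-constant; check the IPP signal-processing documentation for the exact variant you need.)

// Sketch: one vectorized IPP call instead of a per-sample function call.
// ippsMulC_32f multiplies a whole float buffer by a constant in one call.
#include <ipps.h>

void scale_block(const float *in, float *out, int n, float k)
{
  ippsMulC_32f(in, k, out, n);   // out[i] = in[i] * k for all i
}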

Those function calls should be inlined if nothing else.

In any case, GCC is not vectorizing this, but it would be trivial to write it in SSE assembly or with intrinsics, which would allow this to be done in open-source code without having to resort to IPP. That would be a very useful contribution.
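
As a starting point, the same scaling loop written with SSE intrinsics might look roughly like this. It's only a sketch; no alignment handling or per-architecture tuning:

// Sketch: scaling a float buffer with SSE intrinsics (4 floats per iteration).
// No IPP required; compiles anywhere SSE is available (e.g. gcc -msse).
#include <xmmintrin.h>

void scale_sse(const float *in, float *out, int n, float k)
{
  __m128 vk = _mm_set1_ps(k);              // broadcast the scale factor
  int i = 0;
  for (; i + 4 <= n; i += 4) {
    __m128 v = _mm_loadu_ps(in + i);       // load 4 samples (unaligned OK)
    _mm_storeu_ps(out + i, _mm_mul_ps(v, vk));
  }
  for (; i < n; i++)                       // scalar tail for the leftovers
    out[i] = in[i] * k;
}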

Matt




