From: | Philip Balister |
Subject: | Re: [Discuss-gnuradio] Performance on ARM Cortex-A8 |
Date: | Fri, 15 Jul 2011 16:42:51 -0400 |
User-agent: | Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.18) Gecko/20110621 Fedora/3.1.11-1.fc14 Thunderbird/3.1.11 |
On 07/15/2011 04:24 PM, Marcus D. Leech wrote:
On 07/13/2011 04:40 AM, Riadh Elloumi wrote:
Hi all, I compiled DAB demodulation for ARM Cortex-A8 (TI OMAP 3). It successfully demodulates DAB+ but spends 13 seconds decoding 1 second of radio baseband (USRP file). I used all the optimized code for Cortex-A8, like dotprod_ccf_armv7_a.c. My compilation flags are: -mcpu=cortex-a8 -mfloat-abi=softfp -mfpu=neon -O2. I used fftw-3.2.2.
What does -mfloat-abi=softfp do? Does that cause software floating-point to be used? If it does, then your floating-point performance is going to be completely awful.
No, that only chooses the soft-float ABI; the hardware floating-point unit is still used. Basically, floating-point arguments and return values cannot be passed in VFP/NEON registers. This is not too bad, since we normally are passing pointers to arrays.
We can compile the entire system with the hard-float ABI, but it is not a big win and adds some complexity for people using certain binary-only libraries (which are usually built with soft float).
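To check which of these settings a given build actually ended up with, something like the following may help. It is only a sketch: it assumes a GCC-compatible compiler that predefines `__arm__`, `__ARM_PCS_VFP`, `__SOFTFP__` and `__ARM_NEON__`, and the function names `float_abi_name` and `neon_status` are made up for this example.

```c
#include <stdio.h>

/* Report the floating-point ABI the compiler was configured for.
 * Assumes GCC-style predefined macros; other compilers may differ. */
const char *float_abi_name(void)
{
#if !defined(__arm__)
    return "not a 32-bit ARM build";
#elif defined(__ARM_PCS_VFP)
    /* -mfloat-abi=hard: FP values travel in VFP/NEON registers */
    return "hard: hardware FP, FP values passed in VFP/NEON registers";
#elif defined(__SOFTFP__)
    /* -mfloat-abi=soft: no FP instructions are generated at all */
    return "soft: software floating point";
#else
    /* -mfloat-abi=softfp: FP instructions, soft-float calling convention */
    return "softfp: hardware FP, soft-float calling convention";
#endif
}

/* Report whether NEON code generation was enabled (e.g. -mfpu=neon). */
const char *neon_status(void)
{
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
    return "NEON enabled";
#else
    return "NEON not enabled";
#endif
}
```

Calling both from `main` and printing the strings with `printf` shows what the flags on the compile line resolved to. With the flags quoted earlier in this thread, the expectation is the "softfp" and "NEON enabled" branches.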
A good test for comparing oranges/oranges would be to construct a simple C program that does, let's say, 10e6 single-precision floating-point multiply/accumulate operations, and compare among platforms with similar clock speeds, etc.
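A rough sketch of what such a test program could look like follows. It is not taken from GNU Radio; the names `mac` and `time_mac` and the buffer contents are arbitrary choices for illustration, and `clock()` measures CPU time with limited resolution, so treat the numbers as approximate.

```c
#include <stdlib.h>
#include <time.h>

/* Single-precision multiply/accumulate over two arrays. */
static float mac(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Run `passes` passes over two n-element buffers (n * passes MACs total).
 * Returns elapsed CPU seconds, or -1.0 if allocation fails.
 * The accumulated sum is stored in *out so the compiler cannot
 * discard the loop as dead code. */
double time_mac(size_t n, int passes, float *out)
{
    float *a = malloc(n * sizeof *a);
    float *b = malloc(n * sizeof *b);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1.0;
    }
    for (size_t i = 0; i < n; i++) {
        a[i] = 1.0f;
        b[i] = 0.5f;
    }

    float sum = 0.0f;
    clock_t t0 = clock();
    for (int p = 0; p < passes; p++)
        sum += mac(a, b, n);
    clock_t t1 = clock();

    free(a);
    free(b);
    *out = sum;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

For example, `time_mac(10000, 1000, &result)` performs 10e6 multiply/accumulates on buffers small enough to stay in cache. Building the same file with identical optimization flags on each platform and printing the returned time gives a first-order comparison.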
From a quick look at Tom's oprofile results, first find out who is calling into libm and see if you can change the block to stop calling libm. For example, calculate sin/cos via a table approximation (I think GNU Radio already does that).
Then look at the signal processing blocks that are next in usage and do some NEON optimizations using ORC.
Philip
Why is GNU Radio so slow at demodulating DAB+? Do you have any figures for CPU consumption on ARM Cortex cores? Is there some optimization I missed for the platform?