On Tue, Jul 5, 2011 at 9:12 PM, Marcus D. Leech
<address@hidden> wrote:
> What sort of CPU are you using?
>
> --Colby
AMD Phenom II X6 1055T, with 6GB of 1333MT/s memory. Rough ballpark
calculations show that even a 4096-bin FFT shouldn't need more than
about 0.45 GFlop/sec at 25 Msps, and the CPU is easily capable of at
least 8 GFlop/sec per core. So I'm not sure why it's barfing at 25 Msps.
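One way to sanity-check that budget is to count operations per second for back-to-back FFTs. The helper below is a hypothetical sketch, not part of any library. It assumes the textbook 5*N*log2(N) flop count for a complex N-point FFT. That count is generous, so it lands at about 1.5 GFLOP/s rather than 0.45, but either figure is well under the 8 GFLOP/s per core quoted for the CPU.

```python
import math


def fft_gflops(sample_rate_hz: float, fft_size: int,
               flops_per_point_stage: float = 5.0) -> float:
    """Estimated GFLOP/s needed to FFT a continuous sample stream.

    Assumption: each complex N-point FFT costs
    flops_per_point_stage * N * log2(N) floating-point operations
    (5 is the common benchmarking convention; real libraries differ).
    """
    ffts_per_second = sample_rate_hz / fft_size          # non-overlapping blocks
    flops_per_fft = flops_per_point_stage * fft_size * math.log2(fft_size)
    return ffts_per_second * flops_per_fft / 1e9


if __name__ == "__main__":
    # 4096-point FFT fed continuously at 25 Msps
    print(f"{fft_gflops(25e6, 4096):.2f} GFLOP/s")  # prints "1.50 GFLOP/s"
```

Equivalently, the cost per input sample is 5*log2(4096) = 60 flops, and 60 * 25e6 = 1.5e9 flops per second, so the arithmetic alone should not be the bottleneck on this CPU.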
I've tried both the on-motherboard 1GigE interface and a PCI-resident
one. Neither makes any difference: I still get large numbers of 'O'
(overflow) indications at 25 Msps.
If I decimate by 3 or more before the FFT (after vectorizing), it runs
OK, consuming about 40% of the total system CPU and producing no 'O's.
I could then, in theory, process the FFT output vectors to extract
only the magnitudes of the bins that correspond to my channels of
interest. But decimating by 3 means losing sensitivity by a factor of
sqrt(3), which I'd rather not have to "swallow": the application is
radio astronomy, where sensitivity is quite important.
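The decimate-then-extract idea, and where the sqrt(3) comes from, can be sketched with NumPy. This is an illustration, not the actual flowgraph. `averaged_spectrum` and `noise_ratio` are hypothetical names, "decimate by 3 after vectorizing" is modelled as keeping one FFT-sized block in three, and the input is synthetic complex white noise. Averaging M power spectra shrinks the bin-to-bin noise roughly as 1/sqrt(M), which is the radiometer equation in miniature. Discarding two blocks in three should therefore raise the noise by close to sqrt(3).

```python
import numpy as np


def averaged_spectrum(samples, fft_size, keep_one_in=1, bins=None):
    """Average |FFT|^2 over consecutive blocks of fft_size samples.

    keep_one_in > 1 mimics decimating the vector stream before the FFT
    (keep one block, drop the next keep_one_in - 1). If bins is given,
    only those channels are kept from each spectrum.
    """
    n_blocks = len(samples) // fft_size
    blocks = samples[: n_blocks * fft_size].reshape(n_blocks, fft_size)
    blocks = blocks[::keep_one_in]                  # vector-level decimation
    power = np.abs(np.fft.fft(blocks, axis=1)) ** 2
    if bins is not None:
        power = power[:, bins]                      # channels of interest only
    return power.mean(axis=0)


def noise_ratio(fft_size=512, n_blocks=3000, keep_one_in=3, seed=0):
    """Ratio of fractional spectral noise: decimated vs. full stream."""
    rng = np.random.default_rng(seed)
    n = n_blocks * fft_size
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    full = averaged_spectrum(x, fft_size)
    dec = averaged_spectrum(x, fft_size, keep_one_in=keep_one_in)
    # Scatter across bins relative to the mean level ~ 1/sqrt(blocks averaged)
    return (dec.std() / dec.mean()) / (full.std() / full.mean())


if __name__ == "__main__":
    print(f"noise ratio: {noise_ratio():.2f}, sqrt(3) = {np.sqrt(3):.2f}")
```

Selecting bins with `bins=[...]` only saves work after the FFT. It does not recover the samples that the decimation has already thrown away.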
--