[Discuss-gnuradio] Re: TPB update


From: Eric Blossom
Subject: [Discuss-gnuradio] Re: TPB update
Date: Thu, 11 Nov 2010 18:40:51 -0800
User-agent: Mutt/1.5.21 (2010-09-15)

On Fri, Nov 12, 2010 at 10:05:28AM +1100, Balint Seeber wrote:
> Dear Eric,
> 
> I realised I was actually getting ahead of myself regarding scenario (1),
> because - of course - the sample rate means nothing in terms of timing if it
> is not a synchronous graph, and as I stated I didn't use Throttle. So the
> behaviour in (1) is expected. Would you agree?

Yes.
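
Right -- with no hardware device and no Throttle in the graph, the
scheduler just runs every block as fast as the CPU allows, so the
"sample rate" has no wall-clock meaning.  If you want it to, a
minimal sketch looks like this (3.3-era Python API; the parameters
are made up for illustration):

  from gnuradio import gr

  class throttled_graph(gr.top_block):
      def __init__(self, samp_rate=1e6):
          gr.top_block.__init__(self)
          # Complex sinusoid; a source by itself does not pace the graph.
          src = gr.sig_source_c(samp_rate, gr.GR_SIN_WAVE, 1e3, 1.0)
          # Throttle limits average throughput to samp_rate items/sec,
          # giving the sample rate wall-clock meaning.
          thr = gr.throttle(gr.sizeof_gr_complex, samp_rate)
          snk = gr.null_sink(gr.sizeof_gr_complex)
          self.connect(src, thr, snk)

  if __name__ == '__main__':
      throttled_graph().run()    # runs until interrupted

Keep in mind that Throttle only limits average throughput; it's not
a precise clock, and you don't want it in a graph that's already
paced by real hardware.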

> Still not sure about (3) though. Did the graph make it through okay?
> 
> Thanks very much once again,
> 
> Balint
> 


Using the single graph (the one you sent me):

Running case (1):

htop shows it burning 95% of one core and 25% of another.
Seems reasonable to me (this is on my 8-core Xeon).

I started oprofile, ran the flow graph for a while (> 10s), then
looked at the output of opreport:

  $ opreport --long-filenames  --symbols -t 0.5 >/tmp/report

It gives the report below, which isn't surprising.  That is, 57% of
the samples are in ccomplex_dotprod_sse (the inner loop of
gr_fir_ccc_simd::filter, used by the resampler), and 16% are in
gr_sig_source_c::work (generating the complex sinusoid).

The cycles chargeable to the resampler include ccomplex_dotprod_sse,
gr_fir_ccc_simd::filter, and gr_rational_resampler_base_ccc, which
comes out to roughly 69% (57.42 + 8.16 + 3.58).

(Percentages are normalized to the total samples counted.)


CPU: Core 2, speed 3000.07 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask
of 0x00 (Unhalted core cycles) count 100000

samples  %        app name                 symbol name
17535154 57.4244  /usr/local/lib64/libgnuradio-core-3.3.1git.so.0.0.0
                  ccomplex_dotprod_sse
4966060  16.2629  /usr/local/lib64/libgnuradio-core-3.3.1git.so.0.0.0
                  gr_sig_source_c::work(int, std::vector<void const*, std::allocator<void const*> >&, std::vector<void*, std::allocator<void*> >&)
2909663   9.5286  /no-vmlinux              /no-vmlinux
2490431   8.1557  /usr/local/lib64/libgnuradio-core-3.3.1git.so.0.0.0
                  gr_fir_ccc_simd::filter(std::complex<float> const*)
1094391   3.5839  /usr/local/lib64/libgnuradio-core-3.3.1git.so.0.0.0
                  gr_rational_resampler_base_ccc::general_work(int, std::vector<int, std::allocator<int> >&, std::vector<void const*, std::allocator<void const*> >&, std::vector<void*, std::allocator<void*> >&)
235207    0.7703  /lib64/libpthread-2.12.1.so pthread_mutex_lock



Running case (3):

htop shows it burning 95% of TWO cores and 25% of another.
Also seems reasonable to me.  One core for each of the two rational
resamplers, and 25% for the rest.
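
For readers of the archive without Balint's file, a graph shaped
roughly like this behaves the same way (my guess at the case (3)
topology; the resampler ratios are invented for illustration,
3.3-era Python API):

  from gnuradio import gr, blks2

  class case3(gr.top_block):
      def __init__(self, samp_rate=1e6):
          gr.top_block.__init__(self)
          src = gr.sig_source_c(samp_rate, gr.GR_SIN_WAVE, 1e3, 1.0)
          # Two rational resamplers; under the thread-per-block
          # scheduler each block runs in its own thread, so each
          # resampler can saturate its own core.
          rs1 = blks2.rational_resampler_ccc(interpolation=3,
                                             decimation=2)
          rs2 = blks2.rational_resampler_ccc(interpolation=2,
                                             decimation=3)
          snk = gr.null_sink(gr.sizeof_gr_complex)
          self.connect(src, rs1, rs2, snk)

The lighter blocks (source and sink) share whatever is left over,
which is the 25% on the third core.  Profiling this case the same
way: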

  $ opreport --long-filenames  --symbols -t 0.5 >/tmp/report3


CPU: Core 2, speed 3000.07 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask
of 0x00 (Unhalted core cycles) count 100000

samples  %        app name                 symbol name
3931690  63.0917  /usr/local/lib64/libgnuradio-core-3.3.1git.so.0.0.0
                  ccomplex_dotprod_sse
611059    9.8056  /no-vmlinux              /no-vmlinux
557861    8.9520  /usr/local/lib64/libgnuradio-core-3.3.1git.so.0.0.0
                  gr_fir_ccc_simd::filter(std::complex<float> const*)
550223    8.8294  /usr/local/lib64/libgnuradio-core-3.3.1git.so.0.0.0
                  gr_sig_source_c::work(int, std::vector<void const*, std::allocator<void const*> >&, std::vector<void*, std::allocator<void*> >&)
248420    3.9864  /usr/local/lib64/libgnuradio-core-3.3.1git.so.0.0.0
                  gr_rational_resampler_base_ccc::general_work(int, std::vector<int, std::allocator<int> >&, std::vector<void const*, std::allocator<void const*> >&, std::vector<void*, std::allocator<void*> >&)
55851     0.8962  /lib64/libpthread-2.12.1.so pthread_mutex_lock
31423     0.5042  /usr/local/lib64/libgnuradio-core-3.3.1git.so.0.0.0
                  gr_tpb_detail::notify_upstream(gr_block_detail*)


In this case, about 76% (63.09 + 8.95 + 3.99) comes from the two
rational resamplers, 9% from the sig gen, and ~1.4% is scheduler
overhead (pthread_mutex_lock and notify_upstream).  In reality, the
ticks in the kernel (/no-vmlinux) should be charged to overhead too.


Is there any chance that you had some kind of power control or
frequency scaling going on?  If it's a laptop, be sure that it's in
"performance" mode and not "I want the battery to last a long time"
mode.


Remember that Amdahl's Law gives the maximum speedup within a given
graph.  https://secure.wikimedia.org/wikipedia/en/wiki/Amdahl%27s_law
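
To make that concrete, here's a quick sketch plugging the case (3)
numbers into the law (treating the ~76% resampler work as the
perfectly parallelizable fraction, which is a rough assumption):

  def amdahl_speedup(p, n):
      """Upper bound on speedup with parallel fraction p on n cores."""
      return 1.0 / ((1.0 - p) + p / n)

  # Treat the ~76% spent in the two resamplers as the parallel part.
  for n in (1, 2, 4, 8):
      print "%d cores: max %.2fx" % (n, amdahl_speedup(0.76, n))
  # 2 cores -> ~1.61x, 8 cores -> ~2.98x; the serial part caps it.

No matter how many cores you throw at it, the ~24% that isn't in the
resamplers limits the overall speedup to a bit over 4x.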


In any case, I think you'll find that a combination of htop and
oprofile will help shed some light on where the cycles are being
burned.

Eric


