


From: Andy Walls
Subject: Re: [Discuss-gnuradio] Speed Optimization and Application for ATSC Receivers
Date: Sun, 06 Mar 2016 16:08:24 -0500

On Sun, 2016-03-06 at 08:49 -0500, address@hidden
wrote:
> Message: 5
> Date: Sun, 06 Mar 2016 06:45:13 +0000 (GMT)
> From: Joshua Lilly 


> Hello,
> My name is Josh and I am interested in getting involved in GNU radio.
> Specifically, I would like to work on the above project idea for
> google summer of code 2016 by implementing Viterbi and demux
> algorithms in volk and testing the speed improvements. I have
> experience with python, c/c++, boost, and profiling with valgrind. I
> currently have read the getting involved page, compiled the code, I am
> working my way through some of the tutorials, and I have read through
> the code in volk. Even if I don't get accepted to google summer of
> code, I would still like to get involved in fixing bugs, or something
> since this seems like a really awesome project.

Hi Josh:

I'm only a kibitzer when it comes to the project, so I can't say
anything about GSoC acceptance.


> If it isn't too much to ask could someone point me to a nice beginner
> bug to fix in order to get my hands in the code?

However, I can give you (and anyone who wants it) a relevant
beginner+intermediate task to get your hands in the code.  The
"intermediate" part comes from your request to play in volk, which I
don't consider stuff for beginners.

So we'll start with a very conceptually simple thing to improve: adding
constant(s) to a sample stream.  Specifically, measure and improve the
performance of the add_const_vXX and add_const_XX blocks in
gnuradio/gr-blocks/lib.

See the attached GRC flowgraph and hand-tweaked add_const_performance.py
python script.


1. Measure the baseline performance of both the add_const_vss and
add_const_ss blocks at the high sample rate of 160 Msps.

$ ps -eLo pcpu,pid,tid,cls,rtprio,pcpu,comm

shows the add_const_vss or add_const_ss thread hovering around 70% and
57% CPU respectively.

For meaningful measurements you must run the flowgraph at RT priority.
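
To pull just the add_const threads out of that ps listing, something like
the following could help.  It is only a sketch: the sample rows, PIDs and
thread names in SAMPLE_PS are made up for illustration, and the helper
names are hypothetical.  Note that ps reports %CPU averaged over the
lifetime of the thread, so let the flowgraph run for a while before
reading it.

```python
import subprocess

PS_ARGS = ["ps", "-eLo", "pcpu,pid,tid,cls,rtprio,pcpu,comm"]

# Hypothetical output, for illustration only (not a real measurement log).
SAMPLE_PS = """\
%CPU   PID   TID CLS RTPRIO %CPU COMMAND
 3.1  4242  4242  TS      -  3.1 python
70.2  4242  4251  RR      1 70.2 add_const_vss1
 9.8  4242  4252  RR      1  9.8 null_sink2
"""


def read_ps():
    """Run the ps command from step 1 and return its text output."""
    return subprocess.check_output(PS_ARGS, text=True)


def parse_ps_threads(ps_output, name_prefix="add_const"):
    """Return (tid, pcpu, comm) for threads whose name starts with name_prefix."""
    rows = []
    for line in ps_output.splitlines():
        fields = line.split(None, 6)
        if len(fields) < 7:
            continue
        try:
            pcpu = float(fields[0])
            tid = int(fields[2])
        except ValueError:
            continue  # header line or something unexpected
        comm = fields[6].strip()
        if comm.startswith(name_prefix):
            rows.append((tid, pcpu, comm))
    return rows


if __name__ == "__main__":
    for tid, pcpu, comm in parse_ps_threads(SAMPLE_PS):
        print("%-16s tid=%d  %.1f%% CPU" % (comm, tid, pcpu))
```

On a live system, pass read_ps() instead of SAMPLE_PS.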


2. For an immediate performance increase for most users, add a new
gnuradio/gr-blocks/grc/blocks_add_const_xx.xml to the build that allows
users to select the faster, non-vector version of the add const block
from the GUI.


3. Measure the baseline of where the most CPU is being consumed in these
blocks.
You can use perf tools or oprofile tools or whatever works for you.  
For meaningful measurements you must run the flowgraph at RT priority.
Odds are, it's the block's work() function that is consuming most of the
CPU.
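
Since both measurement steps depend on RT priority, here is a minimal
sketch of requesting it from a Python script using only the standard
library.  It assumes Linux and sufficient privileges (root or an rtprio
limit), and request_rt_priority is a hypothetical helper name.
GRC-generated scripts normally do the equivalent by calling
gr.enable_realtime_scheduling() and comparing the result with gr.RT_OK;
whether the attached script does so is not shown here.

```python
import os


def request_rt_priority(priority=1):
    """Try to switch the calling thread to SCHED_FIFO.

    Returns True on success and False if the platform does not support
    it, the priority is invalid, or the process lacks permission.
    Call it before the flowgraph is started.
    """
    if not hasattr(os, "sched_setscheduler"):
        return False  # not available on this platform
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        return False  # e.g. EPERM without privileges, EINVAL for bad priority
    return True


if __name__ == "__main__":
    if not request_rt_priority():
        print("Warning: no RT priority; CPU measurements will be noisy.")
```

If the call fails, treat the resulting numbers as indicative only.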


4. Create volk kernels to replace the main operations in the work()
functions of these blocks, if you can.  Since adding a constant is so
simple, and ORC is very good about optimizing simple things, the volk
implementations should include an ORC implementation if possible.  Odds
are the ORC implementation will beat hand-written SIMD versions for x86
processors.  Use volk_profile to prove my guess about ORC right or
wrong. :)


5. Create volk-ized versions of the add_const blocks and remeasure their
performance.  How much improvement did you get?
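
One simple way to express the improvement is the relative drop in thread
CPU usage, assuming both runs are at RT priority and at the same sample
rate.  The helper name below is hypothetical; the example numbers are the
70% and 57% baseline figures from step 1 (vector versus non-vector
block), not a volk result.

```python
def cpu_reduction_percent(before_pcpu, after_pcpu):
    """Relative reduction in CPU usage, as a percentage of the 'before' value."""
    if before_pcpu <= 0:
        raise ValueError("before_pcpu must be positive")
    return 100.0 * (before_pcpu - after_pcpu) / before_pcpu


if __name__ == "__main__":
    # add_const_vss (70%) versus add_const_ss (57%) from step 1.
    print("%.1f%% less CPU" % cpu_reduction_percent(70.0, 57.0))
```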


6. Don't forget to add QA tests for the new volk functions.
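
As a rough idea of what such a QA test checks, here is a pure-Python
model, not actual volk or GNU Radio QA code.  All names are hypothetical.
It assumes 16-bit samples that wrap on overflow, which should be verified
against what the existing blocks really do.  The point is to compare an
optimized implementation with a plain reference across edge values and
awkward lengths, so the leftover "tail" samples get exercised too.

```python
def wrap_int16(v):
    """Wrap an integer into the signed 16-bit range (assumed overflow behaviour)."""
    return ((v + 0x8000) & 0xFFFF) - 0x8000


def add_const_ref(samples, k):
    """Reference: add k to every sample, one at a time."""
    return [wrap_int16(s + k) for s in samples]


def add_const_unrolled(samples, k):
    """Stand-in for an optimized kernel: four samples per step, then the tail."""
    out = []
    n4 = len(samples) - len(samples) % 4
    for i in range(0, n4, 4):
        a, b, c, d = samples[i:i + 4]
        out.extend((wrap_int16(a + k), wrap_int16(b + k),
                    wrap_int16(c + k), wrap_int16(d + k)))
    for s in samples[n4:]:
        out.append(wrap_int16(s + k))
    return out


def check_kernel(kernel, lengths=(0, 1, 3, 4, 5, 17), k=100):
    """Return True if kernel matches the reference for every test length."""
    edge = [-32768, -1, 0, 1, 32767]
    for n in lengths:
        data = [edge[i % len(edge)] for i in range(n)]
        if kernel(data, k) != add_const_ref(data, k):
            return False
    return True


if __name__ == "__main__":
    print("unrolled kernel OK:", check_kernel(add_const_unrolled))
```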


As an alternative to the above:

1. Improve the performance of the nlog10_ff block by using log2,
algebra, volk, and skipping the add of k at the end, if k == 0.0.

2. Create a new approx_nlog10_ff block by taking advantage of the fact
that the log2 exponent in IEEE floats can be obtained with a mask and
shift operation.  Don't forget to add a GRC .xml file for the block and
QA test code.
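
The arithmetic behind both items can be sketched in Python.  This is an
illustration, not the block's code, and the function names are
hypothetical.  Item 1 uses n*log10(x) + k = (n*log10(2))*log2(x) + k with
the scale factor precomputed and the add skipped when k == 0.0.  Item 2
reads the exponent of a 32-bit IEEE float with a shift and mask; adding
the mantissa fraction as a linear correction is an extra assumption, not
something stated above.  It is only valid for positive, normal floats.

```python
import math
import struct

LOG10_OF_2 = math.log10(2.0)


def nlog10_via_log2(x, n=1.0, k=0.0):
    """n*log10(x) + k computed through log2, skipping the add when k == 0.0."""
    y = (n * LOG10_OF_2) * math.log2(x)
    return y if k == 0.0 else y + k


def approx_log2(x):
    """Approximate log2(x) from the bits of x stored as a 32-bit float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exponent = ((bits >> 23) & 0xFF) - 127        # unbiased exponent
    mantissa = (bits & 0x7FFFFF) / float(1 << 23)  # fraction in [0, 1)
    return exponent + mantissa                     # log2(1 + m) ~= m


def approx_nlog10(x, n=1.0, k=0.0):
    """Approximate n*log10(x) + k using approx_log2."""
    y = (n * LOG10_OF_2) * approx_log2(x)
    return y if k == 0.0 else y + k


if __name__ == "__main__":
    for x in (0.5, 3.0, 8.0, 1234.5):
        exact = 10.0 * math.log10(x)
        print("x=%-8g exact=%8.4f approx=%8.4f"
              % (x, exact, approx_nlog10(x, 10.0)))
```

With n = 10 the linear correction keeps the error under about 0.26, and
the result is exact for powers of two.  Whether that is accurate enough
depends on the application.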

> Thank you,
> Josh


Regards,
Andy

Attachment: add_const_performance.grc
Description: application/xml

Attachment: add_const_performance.py
Description: Text Data

