From: Joshua Lilly
Subject: Re: [Discuss-gnuradio] Speed Optimization and Application for ATSC Receivers
Date: Fri, 11 Mar 2016 22:56:15 -0500

Hey Andy,
Thanks for the reply. I will take another look at the code; I think I know
what to do now. I will make sure the mailing list is included from now on.
Thanks again.
Josh
> On Mar 11, 2016, at 10:20 AM, Andy Walls <address@hidden> wrote:
>
> Hi Josh:
>
> I misread your question. See my additional answer below.
>
>> On Fri, 2016-03-11 at 02:34 +0000, Joshua Lilly wrote:
>> Hey Andy,
>>
>> Just had a quick question about item number two on this list.
>>
>>
>>
>> 2. For an immediate performance increase for most users, add a new
>> gnuradio/gr-blocks/grc/blocks_add_const_xx.xml to the build that allows
>> users to select the faster, non-vector version of the add const block
>> from the GUI.
>>
>>
>> After reading through the tweaked Python script, it looked like the
>> add_const_xx block should consist of the add_const_ss block. However,
>> if that is the case, isn't this already taken care of by the add_xx
>> block?
>
> No. add_xx adds multiple input streams together. add_const_vxx adds a
> constant to the input stream.
>
> Drop both types of add blocks in the flowgraph within the GRC GUI, and
> you will immediately see the difference.
>
> Regards,
> Andy
>
>>
>>
>> Thanks for your help,
>>
>> Josh
>
>
>
>>
>> On Mar 06, 2016, at 01:08 PM, Andy Walls <address@hidden>
>> wrote:
>>
>>> On Sun, 2016-03-06 at 08:49 -0500, address@hidden
>>> wrote:
>>>> Message: 5
>>>> Date: Sun, 06 Mar 2016 06:45:13 +0000 (GMT)
>>>> From: Joshua Lilly
>>>
>>>
>>>> Hello,
> >>>> My name is Josh and I am interested in getting involved in GNU Radio.
> >>>> Specifically, I would like to work on the above project idea for
> >>>> Google Summer of Code 2016 by implementing Viterbi and demux
> >>>> algorithms in volk and testing the speed improvements. I have
> >>>> experience with Python, C/C++, Boost, and profiling with Valgrind.
> >>>> So far I have read the getting involved page, compiled the code,
> >>>> started working my way through some of the tutorials, and read
> >>>> through the code in volk. Even if I don't get accepted to Google
> >>>> Summer of Code, I would still like to get involved in fixing bugs or
> >>>> something similar, since this seems like a really awesome project.
>>>
>>> Hi Josh:
>>>
>>> I'm only a kibitzer when it comes to the project, so I can't say
>>> anything about GSoC acceptance.
>>>
>>>
> >>>> If it isn't too much to ask, could someone point me to a nice
> >>>> beginner bug to fix in order to get my hands in the code?
>>>
> >>> However, I can give you (and anyone who wants it) a relevant
> >>> beginner+intermediate thing to get your hands in the code. The
> >>> "intermediate" part comes from your request to play in volk, which
> >>> I don't consider stuff for beginners.
>>>
> >>> So we'll start with a very conceptually simple thing to improve:
> >>> adding constant(s) to a sample stream. Specifically, measure and
> >>> improve the performance of the add_const_vXX and add_const_XX blocks
> >>> in gnuradio/gr-blocks/lib.
>>>
> >>> See the attached GRC flowgraph and hand-tweaked
> >>> add_const_performance.py Python script.
>>>
>>>
>>> 1. Measure the baseline performance of both the add_const_vss and
>>> add_const_ss blocks at the high sample rate of 160 Msps.
>>>
>>> $ ps -eLo pcpu,pid,tid,cls,rtprio,pcpu,comm
>>>
> >>> shows the add_const_vss and add_const_ss threads hovering around 70%
> >>> and 57%, respectively.
> >>>
> >>> For meaningful measurements you must run the flowgraph at RT priority.
>>>
>>>
> >>> 2. For an immediate performance increase for most users, add a new
> >>> gnuradio/gr-blocks/grc/blocks_add_const_xx.xml to the build that
> >>> allows users to select the faster, non-vector version of the add
> >>> const block from the GUI.
>>>
>>>
> >>> 3. Measure the baseline of where the most CPU is being consumed in
> >>> these blocks. You can use perf tools or oprofile tools or whatever
> >>> works for you. For meaningful measurements you must run the flowgraph
> >>> at RT priority. Odds are, it's the block's work() function that is
> >>> consuming most of the CPU.
>>>
>>>
> >>> 4. Create volk kernels to replace the main operations in the work()
> >>> functions of these blocks, if you can. Since adding a constant is so
> >>> simple, and ORC is very good about optimizing simple things, the volk
> >>> implementations should include an ORC implementation if possible.
> >>> Odds are the ORC implementation will beat hand-written SIMD versions
> >>> for x86 processors. Use volk_profile to prove my guess about ORC
> >>> right or wrong. :)
>>>
>>>
> >>> 5. Create volk-ized versions of the add_const blocks and remeasure
> >>> their performance. How much improvement did you get?
>>>
>>>
>>> 6. Don't forget to add QA tests for the new volk functions.
>>>
>>>
> >>> As an alternative to the above:
>>>
>>> 1. Improve the performance of the nlog10_ff block by using log2,
>>> algebra, volk, and skipping the add of k at the end, if k == 0.0.
>>>
> >>> 2. Create a new approx_nlog10_ff block by taking advantage of the fact
> >>> that the log2 exponent in IEEE floats can be obtained with a mask and
> >>> shift operation. Don't forget to add a GRC .xml file for the block and
> >>> QA test code.
>>>
>>>> Thank you,
>>>> Josh
>>>
>>>
>>> Regards,
>>> Andy
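
To make the add_xx versus add_const distinction from the thread concrete, here is a minimal pure-Python sketch. It models only the arithmetic, not the GNU Radio scheduler or item types. The function names add_streams, add_const, and add_const_vec are hypothetical and are not GNU Radio APIs. Treating the vector variant as adding a constant vector element-wise to each vector item is an assumption about add_const_vXX.

```python
def add_streams(*streams):
    """Model of an add_xx-style block: sum several input streams item by item."""
    return [sum(items) for items in zip(*streams)]


def add_const(stream, k):
    """Model of a scalar add-const block: one input stream plus a constant k."""
    return [x + k for x in stream]


def add_const_vec(vector_stream, k_vec):
    """Model of a vector add-const block (assumed behaviour): each item is a
    vector, and the constant vector k_vec is added element-wise to it."""
    return [[x + k for x, k in zip(item, k_vec)] for item in vector_stream]


if __name__ == "__main__":
    print(add_streams([1, 2, 3], [10, 20, 30]))
    print(add_const([1, 2, 3], 5))
    print(add_const_vec([[1, 2], [3, 4]], [10, 20]))
```

With a vector length of 1 the vector and scalar models give the same numbers, which is why a separate non-vector block only matters for speed, not for results.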
>
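
The "log2, algebra" hint in the first alternate task can be read as the identity n*log10(x) + k = (n / log2(10)) * log2(x) + k, with the scale factor computed once outside the loop and the final add skipped when k is zero. The sketch below is plain Python showing only that algebra; nlog10_via_log2 is a hypothetical name, and the assumption that the block computes n*log10(x) + k is taken from the block's name and the mention of k in the thread.

```python
import math


def nlog10_via_log2(samples, n=1.0, k=0.0):
    """Compute n*log10(x) + k for each sample using log2 and one multiply."""
    scale = n / math.log2(10.0)  # hoisted out of the per-sample loop
    out = [scale * math.log2(x) for x in samples]
    if k != 0.0:  # skip the add entirely when k is zero
        out = [y + k for y in out]
    return out


if __name__ == "__main__":
    data = [0.001, 1.0, 2.0, 1000.0]
    for x, y in zip(data, nlog10_via_log2(data, n=10.0)):
        print(x, y, 10.0 * math.log10(x))
```

In a real block the log2 and multiply steps would presumably be handed to volk kernels; this sketch only checks that the rearranged formula gives the same values.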
>
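
The second alternate task relies on the layout of IEEE 754 single-precision floats: bits 23 to 30 hold the exponent with a bias of 127, so a mask and a shift give floor(log2(x)) for positive, normal values. Here is a rough Python sketch of that idea using the standard struct module. The names float32_exponent and approx_nlog10 are hypothetical, and using only the exponent (ignoring the mantissa) is one possible, fairly coarse approximation, not necessarily what an approx_nlog10_ff block would do.

```python
import math
import struct

EXP_MASK = 0x7F800000  # exponent field of an IEEE 754 binary32 value
EXP_SHIFT = 23         # number of mantissa bits below the exponent field
EXP_BIAS = 127         # binary32 exponent bias


def float32_exponent(x):
    """Return floor(log2(x)) for a positive, normal float32 value x."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return ((bits & EXP_MASK) >> EXP_SHIFT) - EXP_BIAS


def approx_nlog10(x, n=10.0, k=0.0):
    """Coarse n*log10(x) + k using only the exponent: log10(x) ~ e*log10(2)."""
    return n * float32_exponent(x) * math.log10(2.0) + k


if __name__ == "__main__":
    for x in (0.5, 1.0, 8.0, 1000.0, 1024.0):
        print(x, float32_exponent(x), approx_nlog10(x), 10.0 * math.log10(x))
```

The result is exact at powers of two and can be low by up to n*log10(2) elsewhere (about 3 dB for n = 10), so a practical version would likely add a correction term based on the mantissa bits.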