From: Marcus Müller
Subject: Re: [Discuss-gnuradio] questions about the GNURadio Scheduler
Date: Sun, 7 Feb 2016 22:35:31 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0
Hello Gonzalo,

On 07.02.2016 20:58, Gonzalo Arcos wrote:
Basically, each block_executor runs its own loop; roughly, it goes like this:

* Wait for additional input to come in, for output buffer space to be freed up (i.e. consumed downstream and hence ready for overwriting), or for new messages to arrive.
* If messages came in, handle them!
* Ask the block (via forecast()) whether it can run (general_)work().
* Run it!
* Notify upstream blocks of how many items you've consumed, freeing space in their output buffers.
* Notify downstream blocks of how many items you've produced, so they can start to work on them.
* Begin from the top.

Yes. Exactly! Indeed, GNU Radio asks A to produce up to [size of A's output buffer in items]/2, so that as soon as it's finished, B can start working while A goes back to work right away, maximizing parallelism. If A is faster than B, A's input buffer will be empty most of the time, while A's output buffer (which is B's input buffer) will be full; as long as A has no space to write items to, it won't be asked to work().

Yes! Current versions of GNU Radio (since 3.7.2 or so, I think) have proper thread names, so running `htop` or a similar Unix tool will show you which thread is running, consuming CPU, etc. For more in-depth analysis, I'd recommend having a look at `perf record` / `perf report` [1], or, even more advanced, GNU Radio's built-in performance counters and performance monitor: activate them as explained in [2], and add a "performance monitor" block to your GRC flow graph if you use GRC, or run `gr-perf-monitorx`.

The point is that you might, for example, be using a block that relies on a certain hardware accelerator which is "close" to one CPU but not to another. For most PC-style workstations this won't happen, and it's best to let GNU Radio and your OS figure out on their own which CPU to schedule threads on. I've personally yet to discover a case where pinning is useful.

Sure!
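If memory serves, activating the performance counters comes down to a few lines in your GNU Radio configuration file; [2] has the authoritative instructions, and the section and key names below are from memory, so double-check them against that page:

```ini
# ~/.gnuradio/config.conf -- key names from memory, see [2]
[PerfCounters]
on = True
export = True
```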
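To make the loop above concrete, here's a deliberately simplified Python sketch. This is NOT GNU Radio's actual implementation (the real block_executor is C++ and far more involved); names like forecast() and work() only loosely mirror the gr::block API, and the shared deques stand in for the real circular buffers and their notification machinery.

```python
from collections import deque

class ToyBlock:
    """Toy stand-in for gr::block with a 1:1 input-to-output item ratio."""
    def __init__(self, func):
        self.func = func

    def forecast(self, noutput_items):
        # How many input items are required to produce noutput_items (1:1 here).
        return noutput_items

    def work(self, in_items):
        return [self.func(x) for x in in_items]

def executor_pass(block, in_buf, out_buf, out_capacity):
    """One iteration of the simplified per-block executor loop."""
    # 1. "Wait" for input and output buffer space (here: just test availability).
    space = out_capacity - len(out_buf)
    if not in_buf or space == 0:
        return 0  # nothing to do this round
    # (message handling would happen here)
    # 2. Ask the block via forecast() how much input it needs ...
    needed = block.forecast(space)
    n = min(needed, len(in_buf))
    # 3. ... then run work() on what is actually available.
    produced = block.work([in_buf.popleft() for _ in range(n)])
    out_buf.extend(produced)
    # 4./5. Real GNU Radio would now notify the upstream block of items
    # consumed and the downstream block of items produced; the shared
    # deques model that bookkeeping directly here.
    return len(produced)

a = ToyBlock(lambda x: 2 * x)              # e.g. a block that doubles samples
in_buf, out_buf = deque([1, 2, 3]), []
executor_pass(a, in_buf, out_buf, out_capacity=4)
print(out_buf)  # [2, 4, 6]
```

Note how the output capacity caps how much the block is asked to produce; that's the same mechanism by which a full output buffer keeps a fast upstream block from being scheduled.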
Often, especially on HyperThreading machines, it makes a lot of sense to make a single operation finish really quickly, because all the data it accesses then needs to travel through the CPU caches only once. For example, even in relatively complex flow graphs with lots of blocks, where all CPU cores are kept busy all the time, FFTs run with multiple threads tend to increase overall performance. This is basically another variation of the "old" truth that on modern hardware it's typically better to process data uniformly in large chunks; that might increase latency, but the latency lost is typically made up for by higher system throughput.

Best regards,
Marcus

[1] https://lists.gnu.org/archive/html/discuss-gnuradio/2015-05/msg00320.html
[2] https://gnuradio.org/redmine/projects/gnuradio/wiki/PerformanceCounters
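The chunks-versus-items trade-off can be made concrete with a small plain-Python sketch (no GNU Radio involved; work() here is just a hypothetical scale-and-offset stand-in). Calling work() once per 4096-item chunk instead of once per item amortizes the per-call overhead, at the cost of buffering (and hence latency) of up to one chunk:

```python
def work(chunk):
    # Stand-in for a block's work(): scale-and-offset each sample.
    return [2.0 * v + 1.0 for v in chunk]

def run_per_item(samples):
    out = []
    for s in samples:
        out.extend(work([s]))              # one scheduler round-trip per item
    return out

def run_chunked(samples, chunk_size=4096):
    out = []
    for i in range(0, len(samples), chunk_size):
        out.extend(work(samples[i:i + chunk_size]))  # one call per chunk
    return out

samples = [float(i) for i in range(10000)]
assert run_per_item(samples) == run_chunked(samples)
```

Both paths compute the same result; only the number of calls (and thus the overhead and the cache behavior) differs.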