
Re: [Discuss-gnuradio] AMD 64 X2


From: Eric Blossom
Subject: Re: [Discuss-gnuradio] AMD 64 X2
Date: Fri, 4 Nov 2005 16:19:37 -0800
User-agent: Mutt/1.5.6i

On Fri, Nov 04, 2005 at 04:49:43PM -0500, cswiger wrote:
> Since AMD 64 dual core chips are getting affordable, any thoughts on if
> gnuradio can take advantage of them as it is? Quoth one review, they're:
> 
> "aimed at users who need high arithmetic performance and use mainly
> multithreaded applications."

Currently, if the flow graph is partitionable into two or more disjoint
subgraphs, each subgraph runs in its own thread.  Pretty much any
transceiver code falls into this category.

Taking advantage of more than one processor within the same subgraph
will take more work, but should be doable.
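For concreteness, here is a minimal sketch of such a graph: two chains that
share no edges, and so form disjoint subgraphs.  It is written against
today's gr.top_block / gr-blocks / gr-analog API for readability; the 2.x
code of this era used gr.flow_graph instead, and whether threading is
per-subgraph (as described here) or per-block (as in later releases) is a
scheduler detail.

#!/usr/bin/env python
# Sketch: a flow graph whose two chains share no edges, so they form
# disjoint subgraphs the scheduler can run independently.  Uses the
# modern gr.top_block API, not the 2.x gr.flow_graph API discussed
# in this thread.
from gnuradio import gr, blocks, analog

class two_subgraphs(gr.top_block):
    def __init__(self):
        gr.top_block.__init__(self)
        samp_rate = 32000
        nitems = 10 * samp_rate        # bound the run so run() returns

        # Subgraph 1: a toy "transmit" chain.
        tx_src  = analog.sig_source_f(samp_rate, analog.GR_SIN_WAVE, 1000, 1.0)
        tx_gain = blocks.multiply_const_ff(0.5)
        tx_head = blocks.head(gr.sizeof_float, nitems)
        self.connect(tx_src, tx_gain, tx_head,
                     blocks.null_sink(gr.sizeof_float))

        # Subgraph 2: a toy "receive" chain.  No edge connects it to
        # the chain above, so it is a separate subgraph.
        rx_src  = analog.noise_source_f(analog.GR_GAUSSIAN, 0.1, 0)
        rx_head = blocks.head(gr.sizeof_float, nitems)
        self.connect(rx_src, rx_head,
                     blocks.null_sink(gr.sizeof_float))

if __name__ == '__main__':
    tb = two_subgraphs()
    tb.run()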


From an earlier posting:

    The idea in the 2.x world is to dynamically partition the workload
    between the processors available, while taking advantage of
    thread/processor affinity.  To give a simplified example, imagine a
    signal processing graph with 8 blocks in it and a dual-processor
    system.  Topologically sort the graph and assign the first 4 blocks
    to cpu 0 and the final 4 blocks to cpu 1.  The cpus can run pretty
    much independently of each other with good memory and cache
    locality, with the proviso that there's a buffer shared at the
    boundary of the partition: cpu 0 writes into the buffer, cpu 1 reads
    from it.  Access to this buffer is of course serialized with a
    mutex, etc.

    When the producer/consumer rendezvous occurs, on average (assuming
    that each block requires a relatively constant amount of cpu, memory
    bandwidth, etc. for a particular throughput), either cpu 0 is going
    to be write-blocked or cpu 1 is going to be read-blocked.  If cpu 0
    is write-blocked (meaning that it's getting done with its work
    before cpu 1 is), we can change the partitioning by migrating the
    fifth block from cpu 1 to cpu 0, thereby changing the relative
    workloads.

    Assume we low-pass filter this repartitioning activity, so that
    we're not migrating the same block back and forth.  It should settle
    down to a partitioning that gets everything it can out of all the
    cpus available while maintaining good locality and low coordination
    overhead.
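That heuristic lends itself to a toy model.  The sketch below is purely
illustrative (none of the names are GNU Radio API, and the block costs,
measurement noise, smoothing factor, and threshold are made-up numbers):
eight blocks are split at a boundary index, an exponentially smoothed load
imbalance drives boundary migration, and a hysteresis threshold keeps a
single block from ping-ponging between cpus.

# Toy model of the repartitioning heuristic sketched above.
# Hypothetical throughout -- not GNU Radio API.  Eight blocks with
# fixed per-item costs are split between two cpus at a boundary
# index; whichever side finishes first is "blocked" at the shared
# buffer, and we nudge the boundary toward the slower side.
import random

costs = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]  # per-item cpu cost per block
boundary = 4          # costs[:boundary] on cpu 0, costs[boundary:] on cpu 1
imbalance_ema = 0.0   # low-pass filtered load imbalance
alpha = 0.1           # EMA smoothing factor
threshold = 5.0       # hysteresis: migrate only on a sustained imbalance

for step in range(200):
    # Simulated per-iteration load on each cpu, with measurement noise.
    load0 = sum(costs[:boundary]) * random.uniform(0.9, 1.1)
    load1 = sum(costs[boundary:]) * random.uniform(0.9, 1.1)

    # Positive imbalance: cpu 0 finishes early and is write-blocked;
    # negative imbalance: cpu 1 is read-blocked.
    imbalance_ema = (1 - alpha) * imbalance_ema + alpha * (load1 - load0)

    # Migrate one block across the partition only when the filtered
    # imbalance exceeds the threshold, then restart the filter.
    if imbalance_ema > threshold and boundary < len(costs) - 1:
        boundary += 1          # move a block from cpu 1 to cpu 0
        imbalance_ema = 0.0
    elif imbalance_ema < -threshold and boundary > 1:
        boundary -= 1          # move a block from cpu 0 to cpu 1
        imbalance_ema = 0.0

print("settled boundary:", boundary,
      "cpu0 load:", sum(costs[:boundary]),
      "cpu1 load:", sum(costs[boundary:]))

With these particular costs the boundary migrates from 4 to 5 within a few
iterations and then stays put, since the remaining imbalance never exceeds
the threshold -- the "settling" behavior described above.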

Eric



