discuss-gnuradio

Re: [VOLK] a += b*c ?


From: Marcus Müller
Subject: Re: [VOLK] a += b*c ?
Date: Thu, 18 Aug 2022 14:03:03 +0200

Do we document anywhere that an `add` implementation must be able to work in-place? If not, we should probably write that down :)

Also, on the API: C99-wise, I'm pretty sure this is a strict aliasing rule violation: pointers of different types mustn't point to the same data. The compiler is totally allowed to assume that the first and second arguments to volk_32f_x2_add_32f point to *distinct* objects, and hence could optimize as if the (const) a never changes once the function has been entered. But that is exactly what happens in the in-place call. I don't see how this can go wrong for an operation like addition, but I honestly think the semantics of type_x2_operation_type should be that two inputs, neither of which is the output, are passed. If we want in-place kernels, we should probably have them separately.
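
As a rough sketch of how such a split could look in C99: the `restrict` qualifier lets a signature state that the output does not overlap the inputs, while a separate in-place variant takes the accumulator as a single read-write argument. The names `add_32f_distinct` and `add_32f_inplace` are hypothetical and not part of VOLK.

```c
#include <stddef.h>

/* Hypothetical out-of-place add (not a VOLK function). `restrict` states
 * that out, a and b never overlap; calling it with out == a would be
 * undefined behavior. */
void add_32f_distinct(float* restrict out,
                      const float* restrict a,
                      const float* restrict b,
                      size_t num_points)
{
    for (size_t i = 0; i < num_points; ++i) {
        out[i] = a[i] + b[i];
    }
}

/* Hypothetical in-place add (not a VOLK function). The accumulator is one
 * read-write argument, so no const input is modified through another
 * pointer. */
void add_32f_inplace(float* restrict acc,
                     const float* restrict b,
                     size_t num_points)
{
    for (size_t i = 0; i < num_points; ++i) {
        acc[i] += b[i];
    }
}
```

With this split, `a += b` would be written as `add_32f_inplace(a, b, n)` rather than by passing `a` as both output and input.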

Cheers,
Marcus


On 8/16/22 14:13, Johannes Demel wrote:
Hi Randall,

in your case,

https://github.com/gnuradio/volk/blob/main/kernels/volk/volk_32f_x2_multiply_32f.h

followed by
https://github.com/gnuradio/volk/blob/main/kernels/volk/volk_32f_x2_add_32f.h

would be the way to go at the moment.
```
volk_32f_x2_multiply_32f(multiply_result, b, c, num_samples);
volk_32f_x2_add_32f(a, a, multiply_result, num_samples);
```

You're welcome to start a new kernel
```
volk_32f_x3_multiply_add_32f(out, a, b, c, num_samples);
```
In fact, it would be a great addition to VOLK.
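
A generic (non-SIMD) version of such a kernel might look like the sketch below. The function name `multiply_add_32f_generic` and the semantics `out[i] = a[i] + b[i] * c[i]` are assumptions inferred from the call above; an actual VOLK kernel would follow the project's kernel conventions and add SIMD variants.

```c
/* Sketch only, not an existing VOLK kernel.
 * Computes out[i] = a[i] + b[i] * c[i] for num_points elements.
 * Argument order mirrors the proposed call: (out, a, b, c, num_samples). */
void multiply_add_32f_generic(float* out,
                              const float* a,
                              const float* b,
                              const float* c,
                              unsigned int num_points)
{
    for (unsigned int i = 0; i < num_points; ++i) {
        out[i] = a[i] + b[i] * c[i];
    }
}
```

Compared with the two-call version, a fused kernel needs no `multiply_result` scratch buffer and makes one pass over the data instead of two.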

Cheers
Johannes


On 16.08.22 01:38, Randall Wayth wrote:
Thanks for the suggestions and apologies for not being 100% clear at the start.
I'm not looking for a dot product. I'm looking for
a[i] += b[i]*c[i]     specifically for floating point

So it would be the equivalent of IPP's ippsAddProduct_32f.
The application is to apply a window to a set of samples before accumulating, to implement a weighted overlap-add PFB. In my case the samples are real-valued, but I could also see a case for a and b being complex, or for b being 8- or 16-bit ints with a and c being floating point.
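
To make the use case concrete, here is a minimal sketch of the window-and-accumulate step, assuming `num_taps` consecutive blocks of `num_chans` real samples and a window of the same total length. Both function names are made up for illustration; `add_product_32f` stands in for the requested kernel.

```c
#include <stddef.h>

/* Stand-in for the requested kernel: a[i] += b[i] * c[i]. */
static void add_product_32f(float* a, const float* b, const float* c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        a[i] += b[i] * c[i];
    }
}

/* Hypothetical helper: weights num_taps blocks of num_chans samples with
 * the matching window segment and sums them into acc (length num_chans).
 * samples and window must each hold num_taps * num_chans floats. */
void wola_accumulate(float* acc,
                     const float* samples,
                     const float* window,
                     size_t num_chans,
                     size_t num_taps)
{
    /* Start from a cleared accumulator. */
    for (size_t i = 0; i < num_chans; ++i) {
        acc[i] = 0.0f;
    }
    /* One add-product call per tap block. */
    for (size_t t = 0; t < num_taps; ++t) {
        add_product_32f(acc,
                        samples + t * num_chans,
                        window + t * num_chans,
                        num_chans);
    }
}
```

The accumulated vector would then go into a `num_chans`-point FFT to complete the filter bank.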

Cheers,
Randall


