Re: conv2 performance
From: Michael D. Godfrey
Subject: Re: conv2 performance
Date: Mon, 01 Mar 2010 19:01:26 -0800
User-agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.8) Gecko/20100216 Thunderbird/3.0.2
On 3/1/10 10:28 AM, Robert T. Short wrote:
John Swensen wrote:
On Mar 1, 2010, at 10:22 AM, Robert T. Short wrote:
John Swensen wrote:
On Feb 28, 2010, at 10:53 AM, John W. Eaton wrote:
Maybe there is still room for improvement here. I would happily use a
free software library with a GPL-compatible license to implement this
function, but I don't know whether one is available.
jwe
I have recently been doing a lot of 2D convolutions. I think the
fastest method should not involve loops of any kind. Since
convolution in the time domain (or spatial domain, for images) is
multiplication in the frequency domain, the fastest method is to take
the FFT of both image and kernel, multiply them point-wise, then take
the inverse FFT. Since the FFT is usually provided by FFTW, this
should be optimized and quite fast. Of course, some padding has to
take place to make sure both 'images' are the same size. I was using
Matlab for this computation and the speed improvement of the FFT
method over the Matlab-provided conv2 was considerable (100 seconds
versus 2 seconds; I was convolving a 2048x2048 image with a 256x256
kernel).
I think the method is formally called the overlap-add method
(http://en.wikipedia.org/wiki/Overlap-add_method). I used a script
from MatlabCentral (no flaming please, as I already saw the discussion
that has been going on for a week or two). This is the method used by
many GPGPU implementations. There is an in-depth description of the
best way to do the padding in an NVidia white paper at
http://developer.download.nvidia.com/compute/cuda/sdk/website/projects/convolutionFFT2D/doc/convolutionFFT2D.pdf
John
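The FFT approach John describes can be sketched in a few lines. The following is an illustrative NumPy version (not Octave's or Matlab's actual code): both arrays are zero-padded to the full output size so the point-wise product in the frequency domain corresponds to a linear rather than circular convolution, and a slow direct version is included only as a reference to check the result against.

```python
import numpy as np

def conv2_fft(a, k):
    """Full 2-D linear convolution via FFT with zero-padding."""
    # Pad both arrays to the full output size so the frequency-domain
    # product gives linear (not circular) convolution.
    m = a.shape[0] + k.shape[0] - 1
    n = a.shape[1] + k.shape[1] - 1
    A = np.fft.fft2(a, (m, n))
    K = np.fft.fft2(k, (m, n))
    # Point-wise multiply in the frequency domain, then invert.
    return np.real(np.fft.ifft2(A * K))

def conv2_direct(a, k):
    """Reference direct (nested-sum) 2-D convolution, for checking."""
    m = a.shape[0] + k.shape[0] - 1
    n = a.shape[1] + k.shape[1] - 1
    out = np.zeros((m, n))
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            out[i:i + a.shape[0], j:j + a.shape[1]] += k[i, j] * a
    return out

rng = np.random.default_rng(0)
a = rng.standard_normal((32, 32))
k = rng.standard_normal((5, 5))
print(np.allclose(conv2_fft(a, k), conv2_direct(a, k)))  # True
```

The direct loop costs on the order of one multiply-add per (image pixel, kernel tap) pair, while the FFT path is O(n log n) in the padded size, which is where the 100 s versus 2 s gap comes from.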
Certainly that is a faster way, especially for large convolutions,
but I don't think conv or conv2 should do it this way. The straight
approach has important uses as well. Overlap/add and overlap/save
can also be used when the convolution is over multiple blocks and
that would be a useful library function as well, but again I think
that should be separate from conv and conv2.
Bob
I don't see why it should necessarily be separate; it could simply be
a conditional choice based on the size of the convolutions. Shouldn't
we try to implement something that is fast for both small and large
convolutions, without the user having to download an extra package?
One is already implemented and the other would take a little bit of
work.
John
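A size-based switch of the kind John suggests might compare rough operation counts for the two algorithms. The sketch below is purely illustrative; the cost model and its constants are guesses, not measured crossover points:

```python
import math

def pick_conv_method(na, nk):
    """Illustrative crossover rule: compare rough operation counts for
    direct vs FFT-based convolution. The constants are guesses; a real
    dispatch would be tuned by benchmarking."""
    n = na + nk - 1                 # linear-convolution output length
    direct_ops = na * nk            # one multiply-add per sample pair
    fft_ops = 3 * n * math.log2(n)  # two forward FFTs + one inverse
    return "fft" if fft_ops < direct_ops else "direct"

print(pick_conv_method(2048 * 2048, 256 * 256))  # fft
print(pick_conv_method(100, 3))                  # direct
```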
Sorry I sent that just to John instead of the list.
Personally, I am not in favor of complicated argument lists to decide
the algorithm a function should use. I know MATLAB uses this a lot,
but I think it obscures the basic simplicity of the function. Look
back through the history of conv and conv2 - for such simple
functions, you would think that stability would have been achieved
long ago, but just a few months ago I submitted a change to conv.
Adding more stuff inside will make it worse.
I feel that conv and conv2 should be MATLAB compatible both in form
and function - don't add other stuff. Create separate functions for
DFT-based convolutions (fftconv?). It would be worth adding
overlap-add and overlap-save functions as well (I might even have some
1-D functions around). I don't know whether this stuff should go in
the core either.
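A 1-D overlap-add routine of the kind Bob mentions might look like the following sketch (the function name and block size here are made up for illustration; this is not any existing Octave function). The signal is processed in fixed-size blocks, each block is convolved with the kernel via FFT, and the trailing L-1 samples of each block's result are added into the start of the next:

```python
import numpy as np

def overlap_add(x, h, block=256):
    """1-D linear convolution of a long signal x with a short impulse
    response h, processed block by block (overlap-add method)."""
    L = len(h)
    nfft = block + L - 1          # each block's linear-conv length
    H = np.fft.rfft(h, nfft)      # kernel spectrum, computed once
    y = np.zeros(len(x) + L - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        Y = np.fft.rfft(seg, nfft) * H
        # Each block's result overlaps the next by L-1 samples; add.
        y[start:start + nfft] += np.fft.irfft(Y, nfft)[:len(seg) + L - 1]
    return y

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)
h = rng.standard_normal(32)
print(np.allclose(overlap_add(x, h), np.convolve(x, h)))  # True
```

The advantage over one big FFT is that memory and transform size stay bounded by the block length, which matters when the signal is much longer than the impulse response.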
I agree with Michael about the MATLAB engineer's analysis being a bit
shallow, but I have seen similar analyses. I don't know the real
answer though.
BTW, the padding is not just to make the sequences the same size, but
(normally) to maintain linear rather than circular convolution. For
large images and long impulse responses, this can get pretty yucky.
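Bob's point about padding shows up immediately in a small example: FFTs taken at the unpadded length compute a *circular* convolution, so the tail wraps around and corrupts the first samples; zero-padding both sequences to the full linear-convolution length fixes it. A minimal NumPy illustration:

```python
import numpy as np

x = np.array([1., 2., 3., 4.])
h = np.array([1., 1.])

# Same-length FFTs give circular convolution: the tail wraps around.
circ = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, len(x))))
print(circ)  # [5. 3. 5. 7.] -- first sample corrupted by wrap-around

# Zero-padding to len(x)+len(h)-1 restores linear convolution.
n = len(x) + len(h) - 1
lin = np.real(np.fft.ifft(np.fft.fft(x, n) * np.fft.fft(h, n)))
print(np.allclose(lin, np.convolve(x, h)))  # True
```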
Bob
First, I would like to comment on John's suggestion that an
"automatic" switch of algorithms might be good. This is possibly true
if it can be made correct in the sense that it is entirely transparent
to the user. A key to doing this is to ensure that the differences
between step n and step n+1 of an iterative set of calls always have
the correct sign. An iterative numerical solution algorithm typically
uses first differences in a search for extreme values. If the sign
changes, an extreme point has been reached. It takes some care to
ensure that the signs only change when they should.
This non-uniform behavior of algorithms has been a source of serious
problems since the beginning of digital computing. It took a while to
get even the algorithms for the elementary functions to satisfy the
uniform convergence condition (IBM was one of the most prominent
offenders -- their sin and cos routines "wiggled" near 0 and pi). We
have Kahan to thank for straightening much of this out.
In any case (not necessarily including matlab compatibility) it seems
best to provide conv and conv2 (accelerated as much as reasonably
possible) and fftconv and fftconv2. Then the user can choose. For most
cases which require a large amount of compute time the dimensions will
be large enough to justify using fftconv(2). For simple, quick
computations the difference between conv(2) and fftconv(2) will, of
course, be very small.
Michael