octave-maintainers

Re: conv2 performance


From: Michael D. Godfrey
Subject: Re: conv2 performance
Date: Mon, 01 Mar 2010 19:01:26 -0800
User-agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.8) Gecko/20100216 Thunderbird/3.0.2

On 3/1/10 10:28 AM, Robert T. Short wrote:
John Swensen wrote:
On Mar 1, 2010, at 10:22 AM, Robert T. Short wrote:

John Swensen wrote:
On Feb 28, 2010, at 10:53 AM, John W. Eaton wrote:

Maybe there is still room for improvement here. I would happily use a
free software library with a GPL-compatible license to implement this
function, but I don't know whether one is available.

jwe


I have recently been doing a lot of 2D convolutions. I think the fastest method should not involve loops of any kind. Since convolution in the time domain (or spatial domain, for images) is multiplication in the frequency domain, the fastest method is to take the FFT of both image and kernel, multiply them element-wise, then take the inverse FFT. Since the FFT is usually provided by FFTW, this should be optimized and quite fast. Of course, some padding has to take place to make sure both 'images' are the same size. I was using Matlab for this computation and the speed improvement of the FFT method over the Matlab-provided conv2 was considerable (100 seconds versus 2 seconds; I was convolving a 2048x2048 image with a 256x256 kernel).
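For the record, the identity John is using can be sketched as follows (Python/NumPy here purely for illustration; Octave's fft2/ifft2 express the same idea). The naive shift-and-add reference below is only for checking the result:

```python
import numpy as np

def fft_conv2(a, k):
    # Full 2-D linear convolution via the FFT: pad both inputs to the
    # combined output size, multiply spectra element-wise, invert.
    out_shape = (a.shape[0] + k.shape[0] - 1, a.shape[1] + k.shape[1] - 1)
    A = np.fft.rfft2(a, out_shape)
    K = np.fft.rfft2(k, out_shape)
    return np.fft.irfft2(A * K, out_shape)

def direct_conv2(a, k):
    # Naive shift-and-add convolution, used here only as a reference.
    m, n = a.shape
    p, q = k.shape
    out = np.zeros((m + p - 1, n + q - 1))
    for i in range(p):
        for j in range(q):
            out[i:i + m, j:j + n] += k[i, j] * a
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))
ker = rng.standard_normal((5, 5))
assert np.allclose(fft_conv2(img, ker), direct_conv2(img, ker))
```

The two methods agree to rounding error; the asymptotic cost drops from O(N^2 M^2) for the direct method to O(N^2 log N) for the FFT method.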

I think the method is formally called the overlap-add method (http://en.wikipedia.org/wiki/Overlap-add_method). I used a script from MatlabCentral (no flaming please, as I already saw the discussion that has been going on for a week or two). This is the method used for many GPGPU implementations. There is an in-depth description of the best way to do the padding in an NVidia white paper that can be found at http://developer.download.nvidia.com/compute/cuda/sdk/website/projects/convolutionFFT2D/doc/convolutionFFT2D.pdf
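The block-based idea behind overlap-add can be sketched in 1-D like this (Python/NumPy for illustration; the block length 256 is an arbitrary choice, not a tuned value):

```python
import numpy as np

def overlap_add(x, h, block=256):
    # Overlap-add: FFT-convolve each block of x with h at a transform
    # length long enough for linear convolution, then add the
    # overlapping tails back into the running output.
    nfft = block + len(h) - 1
    H = np.fft.rfft(h, nfft)
    y = np.zeros(len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        yseg = np.fft.irfft(np.fft.rfft(seg, nfft) * H, nfft)
        # Each block contributes len(seg) + len(h) - 1 samples.
        y[start:start + len(seg) + len(h) - 1] += yseg[:len(seg) + len(h) - 1]
    return y

rng = np.random.default_rng(1)
x = rng.standard_normal(1000)
h = rng.standard_normal(31)
assert np.allclose(overlap_add(x, h), np.convolve(x, h))
```

This is what makes the FFT approach practical when the signal is much longer than the kernel: the transform size is tied to the block length, not to the full signal.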

John



Certainly that is a faster way, especially for large convolutions, but I don't think conv or conv2 should do it this way. The straight approach has important uses as well. Overlap/add and overlap/save can also be used when the convolution is over multiple blocks and that would be a useful library function as well, but again I think that should be separate from conv and conv2.


Bob

I don't see why it should necessarily be separate, rather than simply a conditional choice based on the size of the convolution. Shouldn't we try to implement something that is fast for both small and large convolutions, without the user having to download an extra package? One is already implemented and the other would take a little bit of work.
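A size-conditional dispatch of the kind described above could look something like this sketch (Python/NumPy for illustration; the threshold value is a made-up placeholder, not a measured crossover point):

```python
import numpy as np

def conv2_auto(a, k, fft_threshold=10_000):
    """Full 2-D convolution, choosing the algorithm by problem size.

    Hypothetical sketch: below `fft_threshold` multiply-adds, use the
    direct shift-and-add method; above it, use zero-padded FFTs.
    """
    m, n = a.shape
    p, q = k.shape
    out_shape = (m + p - 1, n + q - 1)
    if a.size * k.size < fft_threshold:  # rough direct-method cost estimate
        out = np.zeros(out_shape)
        for i in range(p):
            for j in range(q):
                out[i:i + m, j:j + n] += k[i, j] * a
        return out
    A = np.fft.rfft2(a, out_shape)
    K = np.fft.rfft2(k, out_shape)
    return np.fft.irfft2(A * K, out_shape)

small = conv2_auto(np.ones((4, 4)), np.ones((2, 2)))      # takes direct path
large = conv2_auto(np.ones((64, 64)), np.ones((16, 16)))  # takes FFT path
```

The hard part in practice, as the discussion below notes, is picking a crossover that is transparent to the user on all inputs, not just on average.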

John

Sorry I sent that just to John instead of the list.

Personally, I am not in favor of complicated argument lists to decide the algorithm a function should use. I know MATLAB uses this a lot, but I think it obscures the basic simplicity of the function. Look back through the history of conv and conv2 - for such simple functions, you would think that stability would have been achieved long ago, but just a few months ago I submitted a change to conv. Adding more stuff inside will make it worse.

I feel that conv and conv2 should be MATLAB compatible both in form and function - don't add other stuff. Create separate functions for DFT-based convolutions (fftconv?). It would be worth adding overlap-add and overlap-save functions as well (I might even have some 1-D functions around). I don't know whether this stuff should go in the core either.

I agree with Michael about the MATLAB engineer's analysis being a bit shallow, but I have seen similar analyses. I don't know the real answer though.

BTW, the padding is not just to make the sequences the same size, but (normally) to maintain linear rather than circular convolution. For large images and long impulse responses, this can get pretty yucky.
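A tiny 1-D example of the padding point (Python/NumPy for illustration): without padding, the FFT product gives circular convolution, and the tail wraps around onto the head.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 1.0])

# Linear convolution has length len(x) + len(h) - 1 = 5.
linear = np.convolve(x, h)  # [1, 3, 5, 7, 4]

# Transforms at length len(x) with no padding give *circular*
# convolution: the trailing 4 wraps onto the leading 1.
circular = np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(h, len(x)), len(x))
# circular == [5, 3, 5, 7]

# Zero-padding both transforms to length 5 restores linear convolution.
padded = np.fft.irfft(np.fft.rfft(x, 5) * np.fft.rfft(h, 5), 5)
assert np.allclose(padded, linear)
```

For a 2048x2048 image and a 256x256 kernel, that padding means transforms of at least 2303x2303 in each dimension, which is where the memory cost comes from.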

Bob

First, I would like to comment on John's suggestion that an "automatic" switch of algorithms might be good. This is possibly true if it can be made correct in the sense that it is entirely transparent to the user. A key requirement is that the differences between step n and step n+1 of an iterative sequence of calls always have the correct sign. An iterative numerical solver typically uses first differences in a search for extreme values; if the sign changes, an extreme point has been reached. It takes some care to ensure that the signs change only when they should.
This kind of non-uniform behavior in algorithms has been a source of serious problems since the beginning of digital computing. It took a while to get even the algorithms for the elementary functions to satisfy the uniform convergence condition (IBM was one of the most prominent offenders -- their sin and cos routines "wiggled" near 0 and pi). We have Kahan to thank for straightening much of this out.

In any case (not necessarily including matlab compatibility) it seems best to provide conv and conv2 (accelerated as much as reasonably possible) and fftconv and fftconv2. Then the user can choose. For most cases which require a large amount of compute time, the dimensions will be large enough to justify using fftconv(2). For simple quick computations the difference between conv(2) and fftconv(2) will, of course, be very small.

Michael


