Re: [Help-gsl] C/C++ speed optimization bible/resources/pointers needed

From:	Gordan Bobic
Subject:	Re: [Help-gsl] C/C++ speed optimization bible/resources/pointers needed, and about using GSL...
Date:	Fri, 27 Jul 2007 08:51:19 +0100 (BST)

On Fri, 27 Jul 2007, Michael wrote:

I am in the middle of programming a solution to an engineering problem
where speed is a huge concern. The project involves lots of
numerical integration, and there are several loops/levels of
optimization on top of the function evaluation engine.


A few general rules I've found to help a lot:

- Don't use more precision than you need. Don't use a double if a float will do. This is particularly important in code that the compiler can vectorize. Even if your SIMD vector unit can handle doubles, it can typically handle twice as many floats as doubles for the same operation in the same amount of time.

- Use static variables in your functions wherever possible. This especially goes for arrays, if you can get away with it. If you know the maximum size you'll need for an array, declare it static once and just reuse it. malloc()/new is slow.

- Use a good compiler for your target platform. Sadly, GCC isn't great. It's not even good. When it comes to tight vectorized loops and you don't want to reach for hand-crafted assembler, I have seen performance boosts of between 4.5x (on P3 / Core2) and 8.5x (P4) when using Intel's ICC compiler instead of GCC. GSL certainly compiles and works fine with ICC. IBM have a compiler for the PowerPC, and I believe Sun have their own optimizing compiler for the SPARC. They are almost certainly worth looking into.
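As a rough illustration of the static-array advice above, here is a minimal sketch. The names sum_sq_dev and MAX_SAMPLES are hypothetical and chosen only for this example; the assumption is that you know an upper bound on the array size in advance.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SAMPLES 1024  /* hypothetical upper bound on the array size */

/* Sum of squared deviations from the mean. The scratch array is
   declared static, so it is set up once and reused on every call
   instead of paying for a malloc()/free() pair each time. */
float sum_sq_dev(const float *x, size_t n)
{
    static float scratch[MAX_SAMPLES];
    float mean = 0.0f, acc = 0.0f;
    size_t i;

    assert(n > 0 && n <= MAX_SAMPLES);

    for (i = 0; i < n; i++)
        mean += x[i];
    mean /= (float)n;

    /* Two simple loops over float data, each a candidate for vectorization. */
    for (i = 0; i < n; i++)
        scratch[i] = x[i] - mean;
    for (i = 0; i < n; i++)
        acc += scratch[i] * scratch[i];

    return acc;
}
```

The trade-off is that a function with a static buffer is not re-entrant and not thread-safe, which should be acceptable in a single-threaded program but is worth keeping in mind.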

As you probably
know, the key to a successful optimization is a fast underlying
objective function evaluator. The faster it is, the more promising the
optimization result (perhaps a global optimum). However, our project
requires many numerical integrations, which prevents us from making it
super fast. At the heart of the numerical integration is a smart
integrator and a super-fast integrand function evaluator. Even worse,
our function evaluation is in the complex domain. So the key point is how
to arrange our C/C++ code to make it highly efficient in every aspect.
Could anybody give some advice/pointers on how to improve the speed of
a C/C++ program? How to arrange code? How to make it highly efficient
and super fast? What options do I have if I don't have the luxury of using
multi-threaded, multi-core or distributed computing? But I do have a
P4 at least. Please recommend some good bibles and resources! Thank
you!

On a P4, ICC will utterly annihilate GCC in terms of performance of the resulting code, especially when it comes to heavy maths. Get a copy and try it. Enable -vec-report3 and see which of your loops don't vectorize. Where possible, re-arrange them so that they do vectorize. The compiler often needs a hand with assorted little hacks to help it vectorize the code, but they are generally not ugly, and are most of the time limited to:

1) Copy the object property into a local variable. This will help persuade the compiler that there is no vector dependence it needs to worry about.

2) If you have a loop where you are operating with the iterator on an array, have a shadow iterator of a vectorizable type. Remember you cannot use mixed types in vector operations. For example, you can do:


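static unsigned int     i;
static float            ii;   // float shadow of i, same type as baz[]
static float            baz[16];

// ii is incremented together with i so it always holds the same value
for (i = 0, ii = 0; i < 16; i++, ii++)
        baz[i] *= ii; // vectorizes
        //baz[i] *= i; // doesn't vectorize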

3) If your function parameters are changing partially, evaluate them partially and cache the results for each part so you don't have to re-calculate. For example, if your function is something like:


Y = a * (bX^2 + c) + d;

Arrange your loops so the outer-most one works on the inner-most evaluation (in this case X*X, followed by multiplication by b, followed by addition of c, followed by multiplication by a, followed by addition of d in the innermost loop). You can then cache the values of X*X (which is, incidentally, much faster than pow(X,2)), b*X^2, b*X^2+c, etc., so when your outer parameters change, you don't have to re-calculate the inner terms. How you design your caches is also important. This can cause more overhead than it saves, so you have to optimize it very carefully while keeping your underlying algorithm structure in mind.

4) Keep your data sizes in mind. If your frequently used data doesn't fit in the CPU caches, you are likely to start experiencing slow-downs on the order of 20x or so due to latencies. Use a float when you don't need a double, as they are half the size.

5) Write the optimized code yourself. GSL and similar libraries are great for a rapid proof of concept prototype, but there is a price to pay in terms of performance when using generic code vs. bespoke code specifically optimized for a particular task.

6) Learn the compiler switches for your compiler. Test the accuracy of your resulting numbers. When you start cutting corners (e.g. "-ffast-math -mfpmath=sse,387" on GCC, "-fp-model fast=2 -rcd" on ICC) you may get more speed, but sometimes the precision on your floats will reduce. This may or may not be acceptable for your application.

7) Think about your algorithms. If you are doing a calculation that is a modulo of a power of 2 on an int, use a bit-wise AND instead. It is an order of magnitude faster. (e.g. instead of X % 4, use X & 0x3). Note that the two only agree for unsigned or non-negative values.

There are hundreds of little hacks you can do to speed your code up. It is impossible to summarize them all, and they will differ from project to project and they won't all be appropriate all the time. I hope this gets you started on the right path, though. :-)
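To make point 7 concrete, here is a small sketch of the bit-wise AND replacement for a power-of-two modulo. The helper name mod_pow2 is made up for this example, and it is restricted to unsigned values on purpose.

```c
#include <assert.h>

/* x mod m for unsigned x, where m must be a power of two.
   (Hypothetical helper, for illustration only.) */
unsigned int mod_pow2(unsigned int x, unsigned int m)
{
    /* A power of two has exactly one bit set, so m & (m - 1) is zero. */
    assert(m != 0 && (m & (m - 1)) == 0);

    /* m - 1 is a mask of the low bits, which are exactly the remainder. */
    return x & (m - 1);
}
```

For example, mod_pow2(13u, 4u) gives 1, the same as 13u % 4u. The restriction to unsigned matters: in C99, -5 % 4 is -1, while -5 & 3 is 3 on a two's complement machine. Many compilers already make this substitution themselves when the operand is unsigned and the divisor is a compile-time constant, so it is worth checking the generated code before doing it by hand.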
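As one possible reading of point 3 above, the partial-evaluation caching for Y = a * (bX^2 + c) + d could be sketched like this. All names here (cache_x, cache_bc, eval_y, MAX_X) are hypothetical, and the layout assumes that X changes least often and that a and d change most often; a different access pattern would call for a different cache design.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_X 256  /* hypothetical upper bound on the number of X samples */

static float x_sq[MAX_X];   /* cache of X*X, valid until X changes */
static float inner[MAX_X];  /* cache of b*X*X + c, valid until b or c changes */

/* Recompute the innermost term; call only when the X samples change. */
void cache_x(const float *x, size_t n)
{
    size_t i;
    assert(n <= MAX_X);
    for (i = 0; i < n; i++)
        x_sq[i] = x[i] * x[i];  /* X*X rather than pow(X, 2) */
}

/* Recompute the middle term; call only when b or c changes. */
void cache_bc(float b, float c, size_t n)
{
    size_t i;
    assert(n <= MAX_X);
    for (i = 0; i < n; i++)
        inner[i] = b * x_sq[i] + c;
}

/* Cheap outer evaluation; call whenever a or d changes. */
void eval_y(float a, float d, float *y, size_t n)
{
    size_t i;
    assert(n <= MAX_X);
    for (i = 0; i < n; i++)
        y[i] = a * inner[i] + d;
}
```

With this layout, a search over a and d only pays for one multiply and one add per sample, and the X*X and b*X*X + c terms are recomputed only when their own inputs change. Whether the extra memory traffic is worth it depends on how expensive the inner terms really are.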


Good luck.

Gordan
