Re: [Help-gsl] Re: C/C++ speed optimization bible/resources/pointers nee

help-gsl

From:	Gordan Bobic
Subject:	Re: [Help-gsl] Re: C/C++ speed optimization bible/resources/pointers needed, and about using GSL...
Date:	Mon, 06 Aug 2007 19:01:16 +0100
User-agent:	Thunderbird 2.0.0.6 (Windows/20070728)

Oliver Jennrich wrote:

On 7/27/07, Gordan Bobic <address@hidden> wrote:

On Fri, 27 Jul 2007, Jochen Küpper wrote:

[...example..]
Using floats instead of doubles can lead to quite significant performance
differences.

On your Pentium 3, which is not the average number cruncher these days.
An Opteron or any of the modern Intel CPUs would be more appropriate.

*sigh*

On an x86-64 Core2/1.9GHz, CentOS/x86-64 v5, ICC v9.1.051/x86-64
Using the small sample program I posted earlier.
Compiled with: icc -msse3 -xP -fp-model fast=2

Using floats: 2.65 seconds
Using doubles: 5.29 seconds

A 128-bit SSE register holds four floats but only two doubles, so each
vectorized operation processes twice as many floats as doubles. Thus it
goes twice as fast. How much more evidence do you require?


Now you guys got me interested.

Here is what I tried:

#include <stdio.h>
#include <math.h>
int main (void)
{
  unsigned int    j,k;
  unsigned int    i;
  /* Element type under test: swap double for float or long double
     here to reproduce the other timings below. */
  double a[] = {1,2,3,4,5,6,7,8};
  double b[] = {5,6,7,8,9,10,11,12};
  double c[] = {0,0,0,0,0,0,0,0};

  for (k=0;k<100000;k++){
    for (j=0;j<10000;j++){
      /* Innermost loop: the candidate for vectorization. */
      for (i = 0; i < 8; i++)
        {
          c[ i ] = (j*k*(a[ i ]+b[ i ]));
        }
    }
  }
  printf("%f\n", c[3]);
  return 0;
}

with gcc 4.1.1:

gcc -O3 -march=pentium-m -malign-double -mfpmath=sse -msse2 -Wall -o vect vect.c -ftree-vectorize -ftree-vectorizer-verbose=5

on an x86 Family 6 Model 13 Stepping 8 GenuineIntel ~1862 MHz.

The multiplication by j and k is just there so that -O3 doesn't optimize
the outer loops away, and to raise the overall times above the clock
noise.

The results are puzzling:

double, no vectorization: 23.797s
double vectorization: 23.858s
float, no vec: 15.561s
float, vec: 5.843s

long double, no vec (as sse2 is not enough...): 33.344s

Ok, I do understand why long double is slower than double (I think).
But why does vectorization not make the slightest bit of difference
when using doubles?

Assuming that GCC's optimizer doesn't do something daft here (and that's a pretty big assumption), you are only getting partial vectorization here. You cannot mix types in vectorizable statements. Mixing types makes them non-vectorizable. Use a shadow iterator of the same type as your other data elements:


 double kk, jj;  /* shadow iterators: same type as a[], b[] and c[] */
 for (k = 0, kk = 0; k < 100000; k++, kk++)
    for (j = 0, jj = 0; j < 10000; j++, jj++)
      for (i = 0; i < 8; i++)
          c[i] = jj * kk * (a[i] + b[i]);

where kk and jj are of the same type as a[], b[] and c[].
You'll find that goes faster and vectorizes better.

Gordan

Current Thread

[Help-gsl] Re: C/C++ speed optimization bible/resources/pointers needed, and about using GSL..., Bojan Nikolic, 2007/08/02
- Re: [Help-gsl] Re: C/C++ speed optimization bible/resources/pointers needed, and about using GSL..., Oliver Jennrich, 2007/08/06
  - Re: [Help-gsl] Re: C/C++ speed optimization bible/resources/pointers needed, and about using GSL..., Gordan Bobic <=
