On 7/27/07, Gordan Bobic <address@hidden> wrote:
On Fri, 27 Jul 2007, Jochen Küpper wrote:
[...example..]
Using floats instead of doubles can lead to quite significant performance
differences.
On your Pentium 3, not the average number cruncher these days.
An Opteron or any of the modern Intel CPUs would be more appropriate.
*sigh*
On an x86-64 Core2/1.9GHz, CentOS/x86-64 v5, ICC v9.1.051/x86-64
Using the small sample program I posted earlier.
Compiled with: icc -msse3 -xP -fp-model fast=2
Using floats: 2.65 seconds
Using doubles: 5.29 seconds
Twice as many floats vectorize per operation as doubles. Thus it goes
twice as fast. How much more evidence do you require?
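The "twice as many" figure follows from register width. A minimal sketch of that arithmetic, assuming 128-bit SSE registers and the usual x86 sizes of 4 bytes for float and 8 bytes for double (XMM_BYTES and lanes_per_register are names made up for this illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Width of one SSE/SSE2 XMM register in bytes (128 bits).
   Assumption for illustration; wider vector units change the number. */
#define XMM_BYTES 16

/* Hypothetical helper: how many elements of the given size fit
   in one XMM register. */
static size_t lanes_per_register(size_t elem_size)
{
    assert(elem_size > 0 && elem_size <= XMM_BYTES);
    return XMM_BYTES / elem_size;
}
```

Under those assumptions lanes_per_register(sizeof(float)) gives 4 and lanes_per_register(sizeof(double)) gives 2, so one packed instruction handles twice as many floats as doubles.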
Now you guys got me interested.
Here is what I tried:
#include <stdio.h>
#include <math.h>

int main ()
{
    const float foo = 29.123;
    unsigned int j,k;
    unsigned int i;
    double a[] = {1,2,3,4,5,6,7,8};
    double b[] = {5,6,7,8,9,10,11,12};
    double c[] = {0,0,0,0,0,0,0,0};

    for (k=0;k<100000;k++){
        for (j=0;j<10000;j++){
            for (i = 0; i < 8; i++)
            {
                c[ i ] = (j*k*(a[ i ]+b[ i ]));
            }
        }
    }
    printf("%f", c[3]);
    return 0;
}
with gcc 4.1.1
gcc -O3 -march=pentium-m -malign-double -mfpmath=sse -msse2 -Wall -o vect vect.c -ftree-vectorize -ftree-vectorizer-verbose=5
on a
x86 Family 6 Model 13 Stepping 8 GenuineIntel ~1862 MHz
The multiplication by j and k is just so that -O3 doesn't optimize
the outer loops to oblivion, and to raise the overall times above the
clock noise.
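The listing shows only the double variant. Here is a rough sketch of how the element type could be switched between runs without editing the arrays by hand. REAL and run_kernel are hypothetical names, not from the program as posted, and the trip counts are parameters only so that short runs are possible:

```c
/* REAL is a hypothetical macro, not part of the posted program.
   Build with -DREAL=float or -DREAL="long double" to change the type. */
#ifndef REAL
#define REAL double
#endif

/* Same loop nest as the posted program, with the trip counts passed in.
   Returns c[3], the value the posted program prints. */
static double run_kernel(unsigned int outer, unsigned int inner)
{
    REAL a[] = {1, 2, 3, 4, 5, 6, 7, 8};
    REAL b[] = {5, 6, 7, 8, 9, 10, 11, 12};
    REAL c[] = {0, 0, 0, 0, 0, 0, 0, 0};
    unsigned int i, j, k;

    for (k = 0; k < outer; k++) {
        for (j = 0; j < inner; j++) {
            for (i = 0; i < 8; i++) {
                /* j*k is evaluated in unsigned int, then converted to REAL */
                c[i] = (j * k * (a[i] + b[i]));
            }
        }
    }
    return (double)c[3];
}
```

Calling run_kernel(100000, 10000) from main and printing the result with "%f" should correspond to the posted program. Smaller counts such as run_kernel(3, 4) are handy for checking that every type gives the same answer before timing anything.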
The results are puzzling:
double, no vectorization: 23.797s
double vectorization: 23.858s
float, no vec: 15.561s
float, vec: 5.843s
long double, no vec (as sse2 is not enough...): 33.344s
Ok, I do understand why long double is slower than double (I think).
But why does vectorization not make the slightest bit of difference
when using doubles?
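To put numbers on the puzzle, the timings above can be turned into speedup ratios. A small sketch (speedup is a helper name made up for this illustration; the inputs are just the wall-clock figures reported above):

```c
#include <assert.h>

/* Hypothetical helper: speedup of the vectorized build relative to
   the non-vectorized build of the same element type. */
static double speedup(double scalar_seconds, double vector_seconds)
{
    assert(scalar_seconds > 0.0 && vector_seconds > 0.0);
    return scalar_seconds / vector_seconds;
}
```

With the reported times, speedup(15.561, 5.843) is roughly 2.66 for float, while speedup(23.797, 23.858) is about 1.00 for double, i.e. no measurable gain.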