Re: memchr2 speed, gcc
From: Brian Dessent
Subject: Re: memchr2 speed, gcc
Date: Mon, 03 Mar 2008 19:20:26 -0800
Bruno Haible wrote:
> Btw, how do you need to write code such that gcc uses the SSE3 instructions?
You mean auto-vectorization, as opposed to explicitly using the
intrinsics headers (pmmintrin.h for SSE3) or the __builtin_foo APIs?
I think you need to specify a -march=
that names an architecture that has sse3 (or just -msse3, but that
should be implied by an appropriate -march=) as well as
-ftree-vectorize. I think that -ftree-vectorize is enabled at -O3 but
I'm not positive.
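
To make that concrete, here is a minimal sketch of the kind of loop the vectorizer can usually handle. The function name add_arrays and the file name add.c are made up for illustration. Whether gcc actually emits SSE3-specific instructions rather than plain SSE/SSE2 depends on the loop and the compiler version, so check the assembly instead of taking my word for it.

```c
#include <stddef.h>

/*
 * Hypothetical example of a loop that is a candidate for auto-vectorization.
 * Suggested (not guaranteed) way to look at the result:
 *
 *   gcc -std=c99 -O3 -ftree-vectorize -msse3 -S add.c
 *
 * then read add.s and look for packed instructions such as addps.
 *
 * The restrict qualifiers promise that the three arrays do not overlap,
 * which removes one common reason for the vectorizer to give up or to
 * add runtime overlap checks.
 */
void add_arrays(float *restrict dst,
                const float *restrict a,
                const float *restrict b,
                size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

A simple counted loop with unit stride and no aliasing is about the easiest case there is. Loops with early exits (like a memchr-style search) are much harder for the vectorizer.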
Two other notes: starting with 4.2, the gcc default -mtune= is now
'generic' (instead of the old default of pentiumpro) which is meant to
be a blended tuning that is appropriate for a wide class of today's most
common architectures - Athlon, Opteron, Pentium M, Pentium 4, and Core
2. Thus with gcc >= 4.2 you would expect to see less difference between
[no -mtune= specified] and [-mtune=athlon specified] than with older
versions given this new default.
Also, gcc >= 4.2 offers -march=native and -mtune=native, which set the
arch and tune respectively to whatever is appropriate for the host
machine, based on cpuid.
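
If you want to confirm what a given -march= actually turned on, one way is to test the macros gcc predefines. A small sketch, assuming the __SSE3__ macro that gcc defines when SSE3 code generation is enabled. The function name is made up for illustration.

```c
/*
 * Hypothetical helper: reports whether this translation unit was compiled
 * with SSE3 code generation enabled (for example via -msse3, or a -march=
 * value that implies it). This is a compile-time property of the build,
 * not a runtime check of the CPU the program happens to run on.
 */
int sse3_enabled_at_compile_time(void)
{
#ifdef __SSE3__
    return 1;
#else
    return 0;
#endif
}
```

You can also ask the compiler directly, without writing any code, by dumping the predefined macros with something like "gcc -march=native -dM -E - < /dev/null" and looking for __SSE3__ in the output.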
Brian