Re: [Bug-gnubg] Vectorization of neural nets just commited into CVS
From: Jim Segrave
Subject: Re: [Bug-gnubg] Vectorization of neural nets just commited into CVS
Date: Sun, 1 May 2005 10:35:04 +0200
User-agent: Mutt/1.4.2.1i

On Thu 28 Apr 2005 (22:51 +0200), Øystein Johansen wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi,
>
> I've just added the code that adds neural net evaluation vectorized for
> sse. To compile: #define USE_SSE_VECTORIZE 1
>
> If someone can add the Makefile magic, that would be great.
>
> I have done vectorization of both Evaluate and EvaluateFromBase. I have
> not vectorized the evaluation with pruning nets.
>
> Also, I have not aligned the ar and arInput arrays in neuralnet. This
> may lead to some problems.

I'm getting core dumps from this when analysing a match:

Core was generated by `gnubg'.
Program terminated with signal 10, Bus error.
#0 Evaluate128 (pnn=0x8650060, arInput=0xbfbfa4f0, ar=0xbfbfa2c0,
arOutput=0xbfbfaf60, saveAr=0x8703410) at
/usr/include/xmmintrin.h:852
#1 0x08124038 in NeuralNetEvaluate128 (pnn=0x8650060,
arInput=0xbfbfa4f0,
arOutput=0xbfbfaf60, t=140836960) at neuralnet.c:1256
#2 0x0807bee9 in EvalRace (anBoard=0xbfbfaf80, arOutput=0xbfbfaf60,
bgv=VARIATION_STANDARD) at eval.c:2175
#3 0x0807ccc3 in EvaluatePositionFull (anBoard=0xbfbfaf80,
arOutput=0xbfbfaf60, pci=0xbfbfaf20, pec=0x82419a4, nPlies=0,
pc=CLASS_RACE) at eval.c:2899
#4 0x0807cf60 in EvaluatePositionCache (anBoard=0xbfbfaf80,
arOutput=0xbfbfaf60, pci=0xbfbfaf20, pecx=0x82419a4, nPlies=0,
pc=CLASS_RACE) at eval.c:3061
#5 0x0807d10f in EvaluatePosition (anBoard=0xbfbfaf80,
arOutput=0xbfbfaf60,
pci=0xbfbfaf20, pec=0x0) at eval.c:3125
(gdb) f 1
case NNEVAL_SAVE:
{
memcpy(pnn->savedIBase, arInput, pnn->cInput * sizeof(*ar));
=> Evaluate128(pnn, arInput, ar, arOutput, pnn->savedBase);
break;
}
case NNEVAL_FROMBASE:
{
int i;
(gdb) f 0
#0 Evaluate128 (pnn=0x8650060, arInput=0xbfbfa4f0, ar=0xbfbfa2c0,
arOutput=0xbfbfaf60, saveAr=0x8703410) at
/usr/include/xmmintrin.h:852
/* Load four SPFP values from P. The address must be 16-byte
aligned. */
static __inline __m128
_mm_load_ps (float const *__P)
{
=>return (__m128) __builtin_ia32_loadaps (__P);
}
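The inlines make it hard to tell which _mm_load_ps caused the failure,
but looking at the code for Evaluate128, one possibility is that
prWeight is not aligned on a 16-byte boundary. In the gdb output below,
pr (0xbfbfa2c0) ends in 0 and so is 16-byte aligned, while prWeight
(0x2947944c) ends in c, i.e. it sits 12 bytes past a 16-byte boundary: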
(gdb) p pr
$4 = (float *) 0xbfbfa2c0
(gdb) p prWeight
$5 = (float *) 0x2947944c
(gdb) p ari
$6 = 1.44269502
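
To make the alignment argument concrete, here is a minimal sketch of the
check, applied to the two addresses gdb printed. The helper names
is_aligned_16 and require_aligned_16 are hypothetical and not part of
gnubg; the only fact relied on is that _mm_load_ps needs an address
that is a multiple of 16.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not gnubg code): nonzero when addr is a multiple
   of 16, which is what _mm_load_ps requires of its pointer argument. */
int is_aligned_16(uintptr_t addr)
{
    return (addr & 0xF) == 0;
}

/* Hypothetical debug guard: fail with a readable assertion message
   instead of a bus error deep inside an inlined intrinsic. */
const float *require_aligned_16(const float *p)
{
    assert(is_aligned_16((uintptr_t)p));
    return p;
}
```

With the values above, is_aligned_16(0xbfbfa2c0) is true and
is_aligned_16(0x2947944c) is false, since 0x2947944c & 0xF is 0xC.
Wrapping each pointer in such a guard before the load would be one way
to find out which _mm_load_ps is at fault.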
1091      /* Calculate activity at hidden nodes */
1092      memcpy(ar, pnn->arHiddenThreshold, HIDDEN_NODES * sizeof(float));
1093
1094      prWeight = pnn->arHiddenWeight;
1095
1096      for (i = 0; i < pnn->cInput; i++)
1097      {
1098          float const ari = arInput[i];
1099
(gdb)
1100          if (ari)
1101          {
1102              float *pr = ar;
1103              if (ari == 1.0f)
1104              {
1105                  for( j = 32; j; j--, pr += 4, prWeight += 4 )
1106                  {
1107                      vec0 = _mm_load_ps( pr );
1108                      vec1 = _mm_load_ps( prWeight );
1109                      sum = _mm_add_ps(vec0, vec1);
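
If the weights are indeed the problem, one possible fix is to allocate
them on a 16-byte boundary. In the loop shown, prWeight advances four
floats (16 bytes) at a time, so an aligned base keeps every load
aligned. The following is only a sketch: alloc_floats_aligned16 is a
hypothetical name, it assumes a C11 library that provides aligned_alloc,
and it says nothing about how gnubg actually allocates arHiddenWeight.
The other option would be to switch to _mm_loadu_ps, which accepts
unaligned addresses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not gnubg code: allocate room for n floats on a
   16-byte boundary. Release the result with free(). */
float *alloc_floats_aligned16(size_t n)
{
    size_t bytes, rounded;

    /* Guard against overflow in n * sizeof(float). */
    assert(n <= (SIZE_MAX - 15u) / sizeof(float));

    bytes = n * sizeof(float);
    /* C11 aligned_alloc expects the size to be a multiple of the
       alignment, so round up to the next multiple of 16. */
    rounded = (bytes + 15u) & ~(size_t)15u;
    if (rounded == 0)
        rounded = 16;

    return aligned_alloc(16, rounded);
}
```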
--
Jim Segrave address@hidden