I have run some experiments, and it looks like the training dataset (for contact positions), with the current input features, does indeed benefit from some of the more modern methods. Briefly summarized:
Things that improve supervised learning on the dataset:
* Deeper nets: 5-6 hidden layers combined with ReLU activation functions.
* Adam (and AdamW) optimizer.
* A tiny bit of weight decay.
* Mini-batch training.
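The Adam/AdamW bullet plus "a tiny bit of weight decay" can be made concrete with the AdamW update rule, where the decay is decoupled from the adaptive gradient step. The sketch below is a minimal NumPy illustration, not anything from the actual GNU Backgammon code; the quadratic toy objective and all hyperparameter values are made up for the example.

```python
import numpy as np

def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-4):
    """One AdamW update: Adam moment estimates plus decoupled weight decay."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (mean of grads)
    v = beta2 * v + (1 - beta2) * grad**2       # second moment (uncentered var)
    m_hat = m / (1 - beta1**t)                  # bias correction
    v_hat = v / (1 - beta2**t)
    # weight_decay * w is applied directly to the weights, outside the
    # adaptive term -- the "tiny bit of weight decay" from the list above
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v

# Toy usage: minimize f(w) = 0.5 * ||w - target||^2 with AdamW
target = np.array([1.0, -2.0, 3.0])
w = np.zeros(3)
m = np.zeros(3)
v = np.zeros(3)
for t in range(1, 5001):
    grad = w - target
    w, m, v = adamw_step(w, grad, m, v, t, lr=0.01)
```

With plain Adam, L2 regularization folded into the gradient interacts with the adaptive scaling; AdamW's decoupled decay avoids that, which is one reason to prefer it when a little regularization helps.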
Things that do not work:
* Dropout.
* PCA of inputs.
* RMSProp optimizer (about the same performance as SGD).
I've tried training with Keras on GPUs, and the training is really fast. However, a plain CPU implementation of modern neural-network training algorithms is actually not much slower for me. Also, porting GPU code into the GNU Backgammon application might not end up faster, since a lot of cycles would be spent shuffling data back and forth between main memory and GPU memory.
So the process I ended up using was:
1. Test out what works with Keras+GPU.
2. Implement the working method in C code for the CPU.
3. Train the NN with that code.
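Steps 2-3 essentially amount to a hand-rolled mini-batch loop with backprop and Adam updates over a deep ReLU net. Here is a minimal NumPy stand-in for what such a CPU trainer does, not the actual C code: the random data, layer sizes (smaller than the 5-6 hidden layers mentioned above), and hyperparameters are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the contact-position data; the real inputs
# come from GNU Backgammon's feature encoding, not shown here.
X = rng.normal(size=(512, 8))
true_w = rng.normal(size=(8, 1))
Y = np.maximum(X @ true_w, 0)

sizes = [8, 32, 32, 1]                      # input, two hidden, output
Ws = [rng.normal(scale=np.sqrt(2.0 / a), size=(a, b))  # He init for ReLU
      for a, b in zip(sizes, sizes[1:])]
bs = [np.zeros(b) for b in sizes[1:]]
mW = [np.zeros_like(W) for W in Ws]; vW = [np.zeros_like(W) for W in Ws]
mb = [np.zeros_like(b) for b in bs]; vb = [np.zeros_like(b) for b in bs]

def forward(x):
    """Return all layer activations; hidden layers are ReLU, output linear."""
    acts = [x]
    for i, (W, b) in enumerate(zip(Ws, bs)):
        z = acts[-1] @ W + b
        acts.append(np.maximum(z, 0) if i < len(Ws) - 1 else z)
    return acts

def mse(pred, y):
    return float(np.mean((pred - y) ** 2))

loss_before = mse(forward(X)[-1], Y)

lr, b1, b2, eps, wd = 1e-3, 0.9, 0.999, 1e-8, 1e-4
t = 0
for epoch in range(200):
    perm = rng.permutation(len(X))
    for start in range(0, len(X), 64):      # mini-batches of 64
        idx = perm[start:start + 64]
        acts = forward(X[idx])
        delta = (acts[-1] - Y[idx]) / len(idx)   # d(loss)/d(output z)
        t += 1
        for i in reversed(range(len(Ws))):
            gW = acts[i].T @ delta + wd * Ws[i]  # tiny L2 weight decay
            gb = delta.sum(axis=0)
            if i > 0:                            # backprop through ReLU
                delta = (delta @ Ws[i].T) * (acts[i] > 0)
            # Adam updates with bias correction
            mW[i] = b1 * mW[i] + (1 - b1) * gW
            vW[i] = b2 * vW[i] + (1 - b2) * gW**2
            Ws[i] -= lr * (mW[i] / (1 - b1**t)) / (np.sqrt(vW[i] / (1 - b2**t)) + eps)
            mb[i] = b1 * mb[i] + (1 - b1) * gb
            vb[i] = b2 * vb[i] + (1 - b2) * gb**2
            bs[i] -= lr * (mb[i] / (1 - b1**t)) / (np.sqrt(vb[i] / (1 - b2**t)) + eps)

loss_after = mse(forward(X)[-1], Y)
```

Everything here is plain matrix arithmetic plus element-wise operations, which is why a straightforward C port (BLAS or hand-written loops) stays competitive with the GPU for networks of this size.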
I've only worked with the contact neural network, as I see some strange issues with the race dataset, and I think it requires a re-rollout.
-Øystein