From: Douglas Zare
Subject: [Bug-gnubg] Training neural nets: How does size matter?
Date: Wed, 28 Aug 2002 21:34:07 -0400
User-agent: Internet Messaging Program (IMP) 3.1

I'm training some neural nets other than gnubg's, and would love to exchange
ideas on training, architecture, etc. with the gnubg developers, among others.

I have a few questions that I hope some on this mailing list have the
experience to answer. Some were prompted when, to my surprise, a test network
with 250K parameters that I was training surpassed a network with 1000K
parameters on some benchmarks (though perhaps not in playing strength).

First, roughly what level of improvement do you expect from mature networks
with different numbers of hidden nodes? The quality of a neural net is hard to
quantify abstractly, so one could pin it down to, say, correct absolute
evaluations of non-contact positions for the racing net, or Elo rating, or
cubeless ppg against a decent standard.
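
For concreteness, here is a minimal sketch (Python/NumPy; the evaluation
function, positions, and reference values are hypothetical stand-ins) of
one such benchmark: mean absolute error of a net's cubeless equity
estimates against reference values on a fixed set of non-contact positions.

    import numpy as np

    def benchmark_mae(net_eval, positions, reference_equities):
        # net_eval: function mapping an encoded position to an equity estimate
        # positions: encoded positions, shape (n, input_dim)
        # reference_equities: exact equities, e.g. from a race database, shape (n,)
        estimates = np.array([net_eval(p) for p in positions])
        return np.mean(np.abs(estimates - reference_equities))

    # e.g. compare the 250K- and 1000K-parameter nets on the same set;
    # as noted above, a lower error here need not mean stronger play.
    # mae_small = benchmark_mae(small_net_eval, race_positions, exact_equities)
    # mae_large = benchmark_mae(large_net_eval, race_positions, exact_equities)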

I don't think Snowie 3's nets were mature, but if they and Snowie 4's nets are, 
then how much of an improvement should one expect to see if Snowie 4 has neural 
nets with twice as many hidden nodes?

Second, how many fewer nodes can you use at the same quality if you relieve
the net of predicting positions already covered by the racing database?
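
To make the division of labor concrete, a rough sketch (hypothetical
interfaces; gnubg's actual lookup code differs) of such a hybrid evaluator:

    def evaluate(position, race_db, net_eval):
        # position: encoded position (e.g. a tuple of checker counts)
        # race_db:  dict mapping covered positions to exact cubeless equities
        # net_eval: neural-net evaluation function
        key = tuple(position)
        if key in race_db:
            return race_db[key]   # covered: the net need not learn this
        return net_eval(position)

With such a split, the net's capacity is spent only on positions the
database does not cover, which is the saving I am asking about.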

Third, Tesauro mentions that a neural network seems to learn a linear
regression first. Are there other describable qualitative phases that one
encounters? For example, does a neural network with 50 nodes first imitate the
linear regression, then a typical mature 5-node network, then a 10-node one?
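
One could probe this directly. A minimal sketch (NumPy; the training data
X, y and the net's predictions are assumed given) that measures how closely
a net-in-training tracks the least-squares linear fit on the same data; the
same comparison could be run against a mature 5- or 10-node net's outputs:

    import numpy as np

    def linear_fit_predictions(X, y):
        # Least-squares linear regression predictions on the inputs.
        Xb = np.hstack([X, np.ones((len(X), 1))])   # add a bias column
        w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        return Xb @ w

    def agreement_with_linear(net_preds, X, y):
        # Correlation near 1.0 early in training would support the
        # "linear phase" picture.
        lin = linear_fit_predictions(X, y)
        return np.corrcoef(net_preds, lin)[0, 1]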

It may be wishful thinking, but if that is the case, it might be possible to
retain most of the information by training a smaller network to imitate the
larger network's evaluations. The smaller network would be faster to train,
and one could then pass the information back.
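
A sketch of that imitation step (Python/NumPy; one sigmoid hidden layer
with a linear equity output is my simplifying assumption, and random
stand-in data would replace real encoded positions):

    import numpy as np

    rng = np.random.default_rng(0)

    def init_net(n_in, n_hidden):
        # One sigmoid hidden layer, linear output.
        return {"W1": rng.normal(0, 0.1, (n_in, n_hidden)),
                "b1": np.zeros(n_hidden),
                "W2": rng.normal(0, 0.1, (n_hidden, 1)),
                "b2": np.zeros(1)}

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(net, X):
        h = sigmoid(X @ net["W1"] + net["b1"])
        return h, (h @ net["W2"] + net["b2"]).ravel()

    def imitation_step(small, X, teacher_out, lr=0.1):
        # One gradient step fitting the small net to the large net's
        # evaluations (mean squared error), by backpropagation.
        h, out = forward(small, X)
        err = out - teacher_out
        n = len(X)
        dW2 = h.T @ err[:, None] / n
        db2 = err.mean(keepdims=True)
        dh = (err[:, None] @ small["W2"].T) * h * (1.0 - h)
        dW1 = X.T @ dh / n
        db1 = dh.mean(axis=0)
        for k, g in zip(("W1", "b1", "W2", "b2"), (dW1, db1, dW2, db2)):
            small[k] -= lr * g
        return 0.5 * np.mean(err ** 2)

    # Usage: label sampled positions with the large net's evaluations,
    # then train the small net on those labels instead of game results.
    # X = encode(sampled_positions); _, teacher = forward(large_net, X)
    # for step in range(10000): loss = imitation_step(small_net, X, teacher)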

Are there thresholds in the number of nodes needed, with one hidden layer,
before particular backgammon concepts begin to be understood? In chess, people
say that with enough lookahead, strategy becomes tactics, but how many nodes
do you need before the timing issues of a high-anchor holding game are
understood by static evaluations? How many for a deep-anchor holding game?
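
In the same spirit, one could sweep the hidden-layer size against a
concept-specific benchmark; a skeleton (all names hypothetical, e.g.
rollout equities for a set of high-anchor holding-game positions):

    def concept_threshold(train_fn, benchmark, sizes=(5, 10, 20, 40, 80)):
        # train_fn(n_hidden)  -> evaluation function for a trained net
        # benchmark(net_eval) -> error on the concept's positions (lower is better)
        # A sharp drop in error between two sizes would mark a threshold.
        return {n: benchmark(train_fn(n)) for n in sizes}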

Douglas Zare