
Re: [gnubg] Temporal difference learning. Lambda parameter.


From: Timothy Y. Chow
Subject: Re: [gnubg] Temporal difference learning. Lambda parameter.
Date: Sun, 22 Dec 2019 16:59:24 -0500 (EST)

Philippe Michel wrote:

> The engine doesn't "plan ahead", does it? It approximates the
> probabilities of the game outcomes from the current position (or,
> for simplicity, its equity).
>
> My understanding is that its potential accuracy depends on the neural
> network (architecture + input features), and that the training method
> (including the training database, in the case of supervised learning)
> influences how close to this potential one can get, and how fast.

I haven't done any actual training of backgammon nets, but I think what Oystein was saying is that TD learning is a method of trying to figure out (crudely speaking) "where you made your mistake when you lost," and that it works well when you don't have to "backtrack too far" while readjusting your weights. But for positions that call for "long-term planning" (e.g., rolling a prime around the board), one intuitively expects TD learning not to work so well.
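To make the "backtracking" concrete, here is a minimal TD(lambda) sketch. This is nothing like gnubg's actual code; I'm assuming a linear value function with a single scalar output purely for readability. The point is the eligibility trace: it decays by lambda every move, so positions far in the past receive exponentially less credit for the final result, which is exactly why long-range plans are the hard case:

    import numpy as np

    def td_lambda_episode(states, reward, weights, alpha=0.01, lam=0.7):
        # states:  list of feature vectors, one per position in the game
        # reward:  final outcome, seen only at the end (e.g. 1 = win, 0 = loss)
        # weights: linear value function, V(s) = weights . s
        trace = np.zeros_like(weights)          # eligibility trace
        for t in range(len(states)):
            s = states[t]
            v = weights @ s                     # current estimate V(s_t)
            if t + 1 < len(states):
                # mid-game: TD error is just the change in the estimate
                delta = weights @ states[t + 1] - v
            else:
                delta = reward - v              # last move: the truth arrives
            trace = lam * trace + s             # decay old credit, add new
            weights = weights + alpha * delta * trace
        return weights

With lam near 1, the final result reaches back toward the opening; with lam = 0, each position is nudged only toward the evaluation of the very next one.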

It's true that once you have a reasonably good network, you can "fine-tune" it using other methods. For example, for a perfect bot, 0-ply, 1-ply, 2-ply, etc., should all give the same answer, but an actual bot won't, so you can get some improvement just by forcing the bot to iron out these inconsistencies. This can be done using various supervised training methods and not necessarily TD learning. But my understanding (which could be flawed) is that TD learning still enters the picture at the very first step, when you're starting from scratch (with only the rules and no heuristics).
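For what that fine-tuning could look like, here is a hedged sketch (the linear value function, the roll-probability vector, and all the names are my illustration, not gnubg's API): take the net's own 1-ply lookahead as a fixed supervised target and nudge the 0-ply output toward it:

    import numpy as np

    def consistency_step(weights, pos, successors, probs, alpha=0.001):
        # pos:        feature vector of the position being trained
        # successors: feature vector of the best reply for each dice roll
        # probs:      probability of each roll (sums to 1)
        v0 = weights @ pos                      # fast 0-ply estimate
        # 1-ply value: expectation over rolls of the best reply's value,
        # treated as a fixed target (semi-gradient), as in TD methods
        v1 = sum(p * (weights @ s) for p, s in zip(probs, successors))
        return weights + alpha * (v1 - v0) * pos

A perfect net would have v0 == v1 everywhere, so the update would vanish; where they disagree, the 0-ply output is pulled toward the deeper, more accurate estimate.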

If there's some area of the game where your network is still doing very poorly, then you may need to do more "from scratch" training, rather than just bootstrapping off what you already have. I think this is why Oystein is suggesting revisiting TD learning.

Tim


