Temporal difference learning. Lambda parameter.

From:

Øystein Schønning-Johansen

Subject:

Date:

Sat, 14 Dec 2019 13:13:01 +0100

Hi!

As we discussed a bit last week, I've started to think again about how we can improve a backgammon engine.

I think most of us agree that what can be improved is play in positions where some kind of long-term plan is needed, like "snake" positions and backgames.

The reinforcement learning that has been used up until now is plain temporal difference learning, as described in Sutton and Barto (and as done by several science projects), with TD(lambda=0).

Do you think the engine could get better at planning ahead if lambda is increased? Has anyone done a lot of experiments with lambda other than 0? (I don't think there is code in the repo to do anything other than lambda=0, so maybe someone with another research code base can answer?) Or maybe someone with general knowledge of RL can answer?
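
To make the question concrete, here is a minimal sketch of tabular TD(lambda) with accumulating eligibility traces. It runs on the small five-state random walk from Sutton and Barto, not on backgammon, and it is not code from the repo: the function name, parameters and default values are all made up for illustration. With lam=0 the traces vanish after one step and the update reduces to the plain one-step TD update; with larger lam, each TD error is also credited to states visited earlier in the episode.

```python
import random


def td_lambda_random_walk(lam, episodes=5000, alpha=0.02, gamma=1.0,
                          n_states=5, seed=0):
    """Estimate state values on a random walk with tabular TD(lambda).

    Illustrative only: states 1..n_states are non-terminal, 0 and
    n_states + 1 are terminal. Reaching the right end pays reward 1,
    everything else pays 0. Returns the values of the non-terminal states.
    """
    rng = random.Random(seed)
    right_end = n_states + 1
    values = [0.0] * (n_states + 2)  # terminal entries are never updated

    for _ in range(episodes):
        traces = [0.0] * (n_states + 2)  # eligibility traces, reset per episode
        state = (n_states + 1) // 2      # start in the middle

        while 0 < state < right_end:
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == right_end else 0.0

            # One-step TD error for the transition just observed.
            td_error = reward + gamma * values[next_state] - values[state]

            # Decay every trace, then mark the current state as eligible.
            for s in range(len(traces)):
                traces[s] *= gamma * lam
            traces[state] += 1.0

            # Credit the error to all states in proportion to their trace.
            # With lam == 0 only the current state has a non-zero trace.
            for s in range(len(values)):
                values[s] += alpha * td_error * traces[s]

            state = next_state

    return values[1:-1]


if __name__ == "__main__":
    for lam in (0.0, 0.8):
        estimates = td_lambda_random_walk(lam)
        print(lam, [round(v, 3) for v in estimates])
```

For this toy problem the true values are 1/6, 2/6, ..., 5/6, so both settings should land near those numbers. It says nothing by itself about whether a larger lambda helps a neural-network evaluator in backgames; it only shows what would change in the update rule.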

-Øystein
