bug-gnubg

Re: current development


From: Joseph Heled
Subject: Re: current development
Date: Thu, 5 Dec 2019 09:34:45 +1300

The main difference, if I understand correctly (and I know very little here), is to bootstrap from the ground up. That is, no pre-computed inputs: let the network figure it out by self-play.
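
Something along these lines, as a toy Python sketch - nothing to do with the actual gnubg code or net layout, and the 52-input raw encoding, the tiny net and the helper names are all made up for illustration. The point is only that the inputs are raw checker counts and the training signal is the self-play outcome:

    import numpy as np

    N_IN, N_HID = 52, 40                      # 24 points x 2 sides + bar/off (assumed layout)
    rng = np.random.default_rng(0)
    W1 = rng.normal(0.0, 0.1, (N_HID, N_IN))
    W2 = rng.normal(0.0, 0.1, N_HID)

    def encode_raw(points_me, points_opp, bar_me, bar_opp, off_me, off_opp):
        """Raw inputs only: checker count per point plus bar/off counts.
        No primes, blots, pip counts or other pre-computed features."""
        v = np.concatenate([points_me, points_opp,
                            [bar_me, bar_opp, off_me, off_opp]])
        return v.astype(np.float32) / 15.0

    def value(x):
        """P(win) estimate from the raw encoding."""
        h = np.tanh(W1 @ x)
        return 1.0 / (1.0 + np.exp(-(W2 @ h)))

    def td0_update(x, x_next, outcome, alpha=0.01, terminal=False):
        """TD(0): nudge V(x) toward V(x_next), or toward the game outcome
        at the end of a self-play game."""
        global W1, W2
        h = np.tanh(W1 @ x)
        v = 1.0 / (1.0 + np.exp(-(W2 @ h)))
        target = outcome if terminal else value(x_next)
        delta = target - v
        grad_W2 = v * (1.0 - v) * h
        grad_W1 = np.outer(v * (1.0 - v) * W2 * (1.0 - h**2), x)
        W2 += alpha * delta * grad_W2
        W1 += alpha * delta * grad_W1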

We have a great test case in that we can start with just racing.

That said, I think we will need a net for each match score, since cubeless -> cubeful is where things get messy.

Also, given that 0-ply rollouts are relatively fast, when playing against a human - if you can wait a second or two - you can play using cubeful 0-ply rollouts. Testing how good this is will be problematic.
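
Roughly what I have in mind, as a sketch (Python; play_out_game stands in for a single 0-ply playout and is not a real gnubg call):

    import time

    def rollout_equity(position, n_trials, play_out_game):
        """Average equity over n_trials 0-ply playouts of `position`."""
        return sum(play_out_game(position) for _ in range(n_trials)) / n_trials

    def choose_move(candidate_positions, play_out_game, budget_seconds=2.0):
        """Spread rollout trials round-robin over the candidate positions until
        the time budget runs out, then pick the best mean equity."""
        totals = [0.0] * len(candidate_positions)
        counts = [0] * len(candidate_positions)
        deadline = time.monotonic() + budget_seconds
        while time.monotonic() < deadline:
            for i, pos in enumerate(candidate_positions):
                totals[i] += play_out_game(pos)
                counts[i] += 1
                if time.monotonic() >= deadline:
                    break
        means = [t / c if c else float("-inf") for t, c in zip(totals, counts)]
        return means.index(max(means))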

-Joseph


On Thu, 5 Dec 2019 at 09:23, Øystein Schønning-Johansen <address@hidden> wrote:
But let's chat about the idea instead. What would it actually mean to 'apply "AlphaZero methods" to backgammon'?

AlphaZero (and AlphaGo, Lc0 and SugaR NN) is more or less the same thing as reinforcement learning in backgammon. So, from my understanding, it is rather AlphaZero that has applied the backgammon methods. Both the chess and Go variants train with reinforcement learning, pretty much like the original GNU Backgammon, Jellyfish and Snowie. In Go they had to make a move-selection subroutine based on human play and then add MCTS for training. The neural networks are also deeper and more complex, and the network input features are more complex as well; to some extent they resemble the convolutions known from convolutional neural networks. (And the inputs are not properly described in the high-level articles.)

Apart from that, it is actually the same thing: reinforcement learning.

But how can we improve? We believe (at least I do) that current backgammon bots are so strong that they play close to perfectly in standard positions. It is in uncommon positions that require long-term planning (like deep backgames and snake-rolling prime positions) that bots can still improve. Let me throw some ideas up in the air for discussion:

Can we make an RL algorithm that is so fast that it can learn on the fly? Say that during play we find a position where some indicator (which may be another challenge in itself) suggests that this is a position requiring long-term planning. If we then had the ability to RL-train a neural net for that specific position, that could be a huge improvement in my opinion. (Lots of details missing.)
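
Very roughly, something like this (Python pseudo-sketch; `indicator`, `self_play_episode`, `td_update` and the net interface are all hypothetical placeholders, not gnubg code):

    import copy

    def evaluate_with_local_training(position, net, indicator,
                                     self_play_episode, td_update, episodes=500):
        """If the indicator flags the position as a long-term-planning one,
        clone the net and fine-tune the clone with short self-play episodes
        that all start from this exact position."""
        if indicator(position) < 0.5:             # cheap "needs a plan?" test
            return net.evaluate(position)
        local_net = copy.deepcopy(net)            # never touch the global weights
        for _ in range(episodes):
            trajectory = self_play_episode(local_net, start=position)
            td_update(local_net, trajectory)      # reinforce from this position only
        return local_net.evaluate(position)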

And then, could the evaluations be improved if we specialized neural networks for specific position types, and then made a kind of net-selection system based on k-means clustering of the input features? I tried that many years ago with only four classes. Those experiments showed that it's not a hopeless approach, and with faster computers we could easily create many more than just four classes (four was only the first number that popped into my head in those days).
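
As a sketch of the selection part only (Python with scikit-learn's KMeans; k=16, the feature extraction and the per-cluster nets are placeholders I'm making up here):

    import numpy as np
    from sklearn.cluster import KMeans

    def build_selector(feature_vectors, k=16, seed=0):
        """feature_vectors: (n_positions, n_features) array of nn inputs."""
        return KMeans(n_clusters=k, random_state=seed, n_init=10).fit(feature_vectors)

    def evaluate(position, features_of, selector, nets):
        """nets: one specialised evaluator per cluster; route by nearest centroid."""
        x = np.asarray(features_of(position), dtype=np.float64).reshape(1, -1)
        cluster = int(selector.predict(x)[0])
        return nets[cluster].evaluate(position)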

Then the next idea: what about huge-scale distributed rollouts? Maybe we could have a system like BOINC to do rollouts on the fly? I'm not sure how this would be used in a practical sense, and I'm not sure how hard it would be to implement (with or without the BOINC framework), but I'm just brainstorming here.
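
Scheduling aside, pooling the results is the easy part. A sketch (Python, with local processes standing in for the grid; `run_trials` is a made-up worker entry point returning per-worker sums):

    import math
    from multiprocessing import Pool

    def pool_results(chunks):
        """Combine (sum, sum_of_squares, n) tuples returned by the workers."""
        s = sum(c[0] for c in chunks)
        sq = sum(c[1] for c in chunks)
        n = sum(c[2] for c in chunks)
        mean = s / n
        var = max(sq / n - mean * mean, 0.0)
        return mean, math.sqrt(var / n)           # pooled equity and its std. error

    def distributed_rollout(run_trials, position_id, total_trials=10368, workers=8):
        """run_trials(position_id, n) -> (sum, sum_of_squares, n) is the
        hypothetical worker entry point; here it runs in local processes."""
        per_worker = total_trials // workers
        with Pool(workers) as p:
            chunks = p.starmap(run_trials, [(position_id, per_worker)] * workers)
        return pool_results(chunks)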

-Øystein


On Wed, Dec 4, 2019 at 6:47 PM Joseph Heled <address@hidden> wrote:
I was intentionally rude because I thought his original post was inappropriate.

-Joseph

On Thu, 5 Dec 2019 at 06:42, Ralph Corderoy <address@hidden> wrote:
>
> Hi Joseph,
>
> > I thought so.
> >
> > I had the same idea the day I heard they cracked go, but just saying
> > something is a good idea is not helpful at all in my book.
>
> I think you're wrong.  And also a bit rude to boot.
>
> It's fine for Tim to suggest or ponder an idea to the list.  It may
> encourage another subscriber, or draw out news of what a lurker has been
> working on that's related.
>
> --
> Cheers, Ralph.
>

