Total Moves: 0
Death Zone Tragedies: 0
Glorious Goals: 0
Manhattan Efficiency:
Manhattan Efficiency Target:
Moves To Hit Target Efficiency:



The need for speed

Choose the speed at which the agent will move. The "Very fast" setting only renders every 1000 steps or when a goal is achieved.

The classics - Alpha: how quickly to learn

The learning rate or step size determines to what extent the newly acquired information will override the old information.

A factor of 0 will make the agent learn nothing, while a factor of 1 will make the agent consider only the most recent information.
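
To see the two extremes in action, here is a minimal sketch of the standard tabular Q-learning update. It is an illustration only, not necessarily how this demo is coded: the function name `q_update`, its parameters and the example numbers are all assumptions.

```python
def q_update(q_old, reward, best_next_q, alpha, gamma=0.9):
    """Blend the old estimate with the newly observed target.

    Hypothetical helper: alpha is the learning rate, gamma the discount factor.
    """
    # The "new information": the reward just received plus the discounted
    # value of the best action available from the next state.
    target = reward + gamma * best_next_q
    # alpha decides how far the old estimate moves towards that target.
    return (1 - alpha) * q_old + alpha * target


# Illustrative values only.
old_estimate, reward, best_next = 2.0, 1.0, 5.0
for alpha in (0.0, 0.5, 1.0):
    print(alpha, q_update(old_estimate, reward, best_next, alpha))
```

With alpha at 0 the old estimate comes back unchanged, and with alpha at 1 it is replaced outright by the new target. Values in between give a weighted average of the two.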

The classics - Gamma: how much do we value future rewards?

The discount factor gamma determines the importance of future rewards.

A factor of 0 will make the agent "myopic" (or short-sighted) by only considering current rewards, while a factor approaching 1 will make it strive for a long-term high reward.
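
As a rough illustration of the difference, the sketch below adds up a sequence of rewards, weighting each one by gamma raised to the number of steps the agent has to wait for it. The function name `discounted_return` and the reward sequence are made up for this example.

```python
def discounted_return(rewards, gamma):
    """Sum the rewards, each weighted by gamma ** (steps into the future)."""
    total = 0.0
    for step, reward in enumerate(rewards):
        total += (gamma ** step) * reward
    return total


# Illustrative sequence: a small reward now, a big one three steps later.
rewards = [1.0, 0.0, 0.0, 10.0]
for gamma in (0.0, 0.5, 0.9):
    print(gamma, discounted_return(rewards, gamma))
```

With gamma at 0 only the immediate reward counts, so the big payoff three steps away is invisible to the agent. As gamma moves towards 1, that distant reward dominates the total.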

Is this all there is?

Our agent can find a good route and then get stuck in a rut! Just like in life, if we want a chance of enjoying the better things, we need to get off the beaten track. Don't be deceived though: every reward carries some risk, and danger lurks off the track!

Also known as epsilon!
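
A common way to use epsilon is the epsilon-greedy rule, sketched below. This assumes the demo follows the usual rule, which the text here does not confirm; the name `choose_action` and the example values are hypothetical.

```python
import random


def choose_action(q_values, epsilon, rng=random):
    """Pick an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Off the beaten track: try any action at random.
        return rng.randrange(len(q_values))
    # Stay in the rut: take the action with the highest current estimate.
    return max(range(len(q_values)), key=lambda action: q_values[action])


# Illustrative estimates for three actions.
estimates = [0.1, 0.9, 0.3]
print(choose_action(estimates, epsilon=0.0))  # always the best-known action
print(choose_action(estimates, epsilon=1.0))  # always a random action
```

An epsilon of 0 means the agent never strays from the best route it knows, while an epsilon of 1 means it wanders at random and ignores everything it has learned.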


In life, our problems aren't constant and neither is our goal. It's just the same for our special agent. Choose how much the death zones, obstacles and the goal wander about. It's not cheating, it's just that sometimes bad things do happen. And sometimes we stumble unexpectedly onto our goal!
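
One way such wandering could work is sketched below: on each step, every object has some chance of shuffling one cell in a random direction. This is purely an assumption for illustration. The function `wander`, the `wander_prob` setting and the square grid are hypothetical and may not match what the demo actually does.

```python
import random


def wander(position, wander_prob, grid_size, rng=random):
    """Maybe move one cell in a random direction, staying on the grid."""
    if rng.random() >= wander_prob:
        return position  # stays put this step
    dx, dy = rng.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
    # Clamp to the grid so nothing wanders off the edge.
    x = min(max(position[0] + dx, 0), grid_size - 1)
    y = min(max(position[1] + dy, 0), grid_size - 1)
    return (x, y)


# Illustrative: a goal at (2, 2) on a 5x5 grid.
print(wander((2, 2), wander_prob=0.0, grid_size=5))  # never moves
print(wander((2, 2), wander_prob=1.0, grid_size=5))  # always moves one cell
```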

Take me away from here!

Want a new life? Try one of these pretrained models.

On my command!

It's great that the agent goes quickly, but if you want to control the exact pace of the moves, click here.

Footnote from the author: PBS