Reinforcement learning: exploitation and exploration
How is the agent to pick an action?
One possibility is exploitation: pick the
action that has the highest Q-value for the current state.
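A minimal sketch of pure exploitation, assuming the Q-values are kept in a dictionary keyed by (state, action) pairs. The names `q_table` and `greedy_action` and the example values are hypothetical, not from the slides.

```python
def greedy_action(q_table, state, actions):
    """Exploitation: return the action with the highest Q-value in `state`.

    Assumes q_table maps (state, action) -> float; unseen pairs count as 0.0.
    Ties go to the action listed first (Python's max keeps the first maximum).
    """
    return max(actions, key=lambda a: q_table.get((state, a), 0.0))


# Hypothetical Q-values for illustration only.
q_table = {("s0", "left"): 0.2, ("s0", "right"): 0.7}
choice = greedy_action(q_table, "s0", ["left", "right"])  # "right"
```

A purely greedy agent never tries "left" again here, so it can never find out whether its estimate for "left" is too low.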
But the agent can only learn about the value of actions that it tries.
Thus it should try a variety of actions.
This may involve ignoring what it thinks is best some of the time:
exploration.
Exploration makes more sense early on in learning when the agent
doesn't know much.
One possibility for selecting an action:
pick the "best" action with probability
$P = 1 - e^{-Ea}$, where $a$ is the number
of training samples (the "age" of the agent).
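A minimal sketch of this age-dependent rule. The slide only says the "best" action is picked with probability P; what happens otherwise is an assumption here: the agent explores by picking uniformly among the remaining actions, which keeps the overall probability of the best action equal to P. The function names and the `rng` parameter are hypothetical.

```python
import math
import random


def p_best(age, E):
    """Probability of choosing the "best" action: P = 1 - exp(-E * age)."""
    return 1.0 - math.exp(-E * age)


def select_action(q_table, state, actions, age, E, rng=random):
    """Exploit with probability p_best(age, E); otherwise explore.

    Assumption: when exploring, pick uniformly among the non-greedy actions
    (the slide does not say how the alternative is chosen).
    """
    # Greedy ("best") action; unseen (state, action) pairs count as 0.0.
    best = max(actions, key=lambda a: q_table.get((state, a), 0.0))
    others = [a for a in actions if a != best]
    if not others or rng.random() < p_best(age, E):
        return best
    return rng.choice(others)


# Hypothetical Q-values for illustration only.
q_table = {("s0", "left"): 0.2, ("s0", "right"): 0.7}
young = select_action(q_table, "s0", ["left", "right"], age=5, E=0.1)   # often explores
old = select_action(q_table, "s0", ["left", "right"], age=500, E=0.1)   # almost always "right"
```

At age 0 the agent always explores (P = 0), and as the age grows P approaches 1, so the agent shifts smoothly from exploration to exploitation.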
Here is how the probability of selecting the "best" action
depends on age
when E is 0.1.
Here is how it depends on age when E is 0.01.
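The plots are not reproduced here. As a rough numerical stand-in, this sketch evaluates $P = 1 - e^{-Ea}$ at a few ages for both values of E; the ages 10, 50, 100 and 500 are arbitrary choices for illustration.

```python
import math


def p_best(age, E):
    # P = 1 - exp(-E * a), the probability of choosing the "best" action
    return 1.0 - math.exp(-E * age)


for E in (0.1, 0.01):
    row = ", ".join(f"a={a}: {p_best(a, E):.3f}" for a in (10, 50, 100, 500))
    print(f"E={E}: {row}")
# E=0.1: a=10: 0.632, a=50: 0.993, a=100: 1.000, a=500: 1.000
# E=0.01: a=10: 0.095, a=50: 0.393, a=100: 0.632, a=500: 0.993
```

Because P depends only on the product Ea, the curve for E = 0.01 is the curve for E = 0.1 stretched ten times along the age axis: P reaches $1 - 1/e \approx 0.632$ at $a = 1/E$, that is, after 10 samples for E = 0.1 but 100 samples for E = 0.01.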
A smarter possibility would be to have the probability of picking an action
depend on how high its value is relative to the values of all of the other
possible actions.
Here is one way:
where the $v$'s range over all of the possible actions in state
$x_t$.
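The slide's formula did not survive here, so the following is an assumption rather than a reconstruction: one common way to make the probability of an action depend on its value relative to all the others is Boltzmann (softmax) selection, $P(a \mid x_t) = e^{Q(x_t,a)/T} / \sum_v e^{Q(x_t,v)/T}$, where the sum runs over all actions $v$ in state $x_t$. The temperature $T$ and the function names are hypothetical additions.

```python
import math
import random


def softmax_probs(q_table, state, actions, temperature=1.0):
    """Boltzmann (softmax) probabilities over `actions` in `state`.

    P(a) = exp(Q(s, a) / T) / sum_v exp(Q(s, v) / T)
    The temperature T is an assumed parameter: small T approaches greedy
    selection, large T approaches uniform random selection.
    """
    qs = [q_table.get((state, a), 0.0) / temperature for a in actions]
    m = max(qs)  # subtract the max before exponentiating to avoid overflow
    weights = [math.exp(q - m) for q in qs]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_action(q_table, state, actions, temperature=1.0, rng=random):
    """Sample one action according to its softmax probability."""
    probs = softmax_probs(q_table, state, actions, temperature)
    return rng.choices(actions, weights=probs, k=1)[0]


# Hypothetical Q-values for illustration only.
q_table = {("s0", "left"): 0.2, ("s0", "right"): 0.7}
probs = softmax_probs(q_table, "s0", ["left", "right"])
action = softmax_action(q_table, "s0", ["left", "right"])
```

With these values "right" gets probability $1/(1 + e^{-0.5}) \approx 0.62$ and "left" about 0.38: the better action is favored, but the worse one is still tried in proportion to how close its value is.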
Cognitive Science at Indiana University | Fall 2004