A reinforcement learning system.
#include <Learner.h>
Inherited by rl::QLearn and rl::WireFitQLearn.
Public Member Functions

virtual Action chooseBestAction (State currentState)=0
    Gets the action that the network deems most beneficial for the currentState.

virtual Action chooseBoltzmanAction (State currentState, double explorationConstant)=0
    Gets an action using the Boltzmann softmax probability distribution.

virtual void applyReinforcementToLastAction (double reward, State newState)=0
    Applies reinforcement to the last action.

virtual void reset ()=0
    Randomizes the learner.
A reinforcement learning system.
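The interface is meant to be driven in a sense-act-reinforce loop. The sketch below shows that call pattern against the abstract interface; the helpers getCurrentState, performAction, and computeReward, the rl::State and rl::Action qualifications, and the temperature value are illustrative assumptions rather than part of the documented API.

    #include <Learner.h>

    // Application-supplied helpers (hypothetical, not part of this API).
    rl::State getCurrentState();
    void performAction(const rl::Action &action);
    double computeReward();

    // Drive any concrete Learner (e.g. rl::QLearn) for a fixed number of steps.
    void trainEpisode(rl::Learner &learner, int steps) {
        for (int i = 0; i < steps; ++i) {
            // Pick an action with Boltzmann exploration; 0.2 is an
            // arbitrary temperature chosen for illustration.
            rl::Action action = learner.chooseBoltzmanAction(getCurrentState(), 0.2);
            performAction(action);

            // Reward the learner for that action given the resulting state.
            learner.applyReinforcementToLastAction(computeReward(), getCurrentState());
        }
    }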
virtual void rl::Learner::applyReinforcementToLastAction (double reward, State newState)  [pure virtual]

Applies reinforcement to the last action.
Given the immediate reward for the last action taken and the new state, this function updates the long-term reward value of the lastAction and trains the network in charge of the lastAction to output the corrected value.
Parameters
    reward      the reward given for the last action taken
    newState    the new state
Implemented in rl::WireFitQLearn, rl::QLearn, and rl::FidoControlSystem.
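As a rough illustration of the update described above, the sketch below computes a corrected long-term value in the style of one-step Q-learning. The function name, the per-action value vector, and the learning-rate and discount constants are assumptions for illustration, not the library's actual implementation.

    #include <algorithm>
    #include <vector>

    // Hypothetical sketch: given the stored value of the last action, the
    // immediate reward, and the network's value estimates for every
    // candidate action in the new state, compute the corrected value.
    double correctedValue(double lastActionValue, double reward,
                          const std::vector<double> &newStateValues,
                          double learningRate = 0.3, double discount = 0.9) {
        // Value of the best action reachable from the new state
        // (newStateValues is assumed non-empty).
        double bestNext = *std::max_element(newStateValues.begin(),
                                            newStateValues.end());

        // One-step target: immediate reward plus discounted future value.
        double target = reward + discount * bestNext;

        // Move the stored value partway toward the target; the network in
        // charge of the last action would then be trained to output this.
        return lastActionValue + learningRate * (target - lastActionValue);
    }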
virtual Action rl::Learner::chooseBestAction (State currentState)  [pure virtual]

Gets the action that the network deems most beneficial for the currentState.
Parameters
    currentState    the state for which to choose the action
Implemented in rl::WireFitQLearn, and rl::QLearn.
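For a discrete set of candidate actions, "most beneficial" amounts to an argmax over the network's value estimates. A minimal sketch, assuming the per-action values have already been computed:

    #include <cstddef>
    #include <vector>

    // Hypothetical sketch: return the index of the highest-valued action.
    std::size_t bestActionIndex(const std::vector<double> &values) {
        std::size_t best = 0;
        for (std::size_t i = 1; i < values.size(); ++i)
            if (values[i] > values[best]) best = i;
        return best;
    }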
virtual Action rl::Learner::chooseBoltzmanAction (State currentState, double explorationConstant)  [pure virtual]

Gets an action using the Boltzmann softmax probability distribution.
A non-greedy, stochastic selection heuristic is used so that the learner explores actions regardless of their expected reward. The lower the exploration constant (the temperature), the more likely the learner is to pick the best action for the current state.
Parameters
    currentState           the state for which to choose the action
    explorationConstant    the Boltzmann temperature constant, determining the degree of exploration
Implemented in rl::WireFitQLearn, and rl::QLearn.
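Under the Boltzmann distribution, an action a is chosen from state s with probability proportional to exp(Q(s, a) / T), where T is the exploration constant. Below is a minimal sampling sketch under that definition; the function name and the per-action value vector are assumptions, not part of the documented API.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <random>
    #include <vector>

    // Hypothetical sketch: sample an action index with probability
    // proportional to exp(value / temperature). As temperature -> 0 this
    // approaches greedy selection; large temperatures approach uniform.
    std::size_t boltzmannIndex(const std::vector<double> &values,
                               double temperature, std::mt19937 &rng) {
        // Subtract the maximum value before exponentiating so the weights
        // stay finite; this leaves the distribution unchanged.
        double maxValue = *std::max_element(values.begin(), values.end());

        std::vector<double> weights;
        weights.reserve(values.size());
        for (double v : values)
            weights.push_back(std::exp((v - maxValue) / temperature));

        // Sample an index with probability proportional to its weight.
        std::discrete_distribution<std::size_t> dist(weights.begin(),
                                                     weights.end());
        return dist(rng);
    }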
virtual void rl::Learner::reset ()  [pure virtual]

Randomizes the learner.
Implemented in rl::QLearn, rl::WireFitQLearn, and rl::FidoControlSystem.