Fido
rl::Learner Class Reference [abstract]

A reinforcement learning system.

#include <Learner.h>

Inherited by rl::QLearn, and rl::WireFitQLearn.

Public Member Functions

virtual Action chooseBestAction (State currentState)=0
 Gets the action that the network deems most beneficial for the currentState.

virtual Action chooseBoltzmanAction (State currentState, double explorationConstant)=0
 Gets an action using the Boltzmann softmax probability distribution.

virtual void applyReinforcementToLastAction (double reward, State newState)=0
 Apply reinforcement to the last action.

virtual void reset ()=0
 Randomizes the learner.

Detailed Description

A reinforcement learning system.

Member Function Documentation

virtual void rl::Learner::applyReinforcementToLastAction (double reward, State newState)
pure virtual

Apply reinforcement to the last action.

Given the immediate reward from the last action taken and the new state, this function updates the stored long-term reward value of the lastAction and trains the network in charge of the lastAction to output the corrected reward value.

Parameters
reward: the reward given for the last action taken
newState: the new state

Implemented in rl::WireFitQLearn, rl::QLearn, and rl::FidoControlSystem.

virtual Action rl::Learner::chooseBestAction (State currentState)
pure virtual

Gets the action that the network deems most beneficial for the currentState.

Parameters
currentState: the state for which to choose the action

Implemented in rl::WireFitQLearn, and rl::QLearn.

virtual Action rl::Learner::chooseBoltzmanAction (State currentState, double explorationConstant)
pure virtual

Gets an action using the Boltzmann softmax probability distribution.

A randomized search heuristic is used so that the network sometimes explores actions regardless of their reward value. The lower the exploration constant, the more likely it is to pick the best action for the current state.

Parameters
currentState: the state for which to choose the action
explorationConstant: the Boltzmann temperature constant, determining the degree of exploration

Implemented in rl::WireFitQLearn, and rl::QLearn.

virtual void rl::Learner::reset ()
pure virtual

Randomizes the learner.

Implemented in rl::QLearn, rl::WireFitQLearn, and rl::FidoControlSystem.


The documentation for this class was generated from the following file:
Learner.h