A reinforcement learning system.
#include <Learner.h>
Inherited by rl::QLearn and rl::WireFitQLearn.
Public Member Functions

virtual Action chooseBestAction (State currentState)=0
    Gets the action that the network deems most beneficial for the currentState.

virtual Action chooseBoltzmanAction (State currentState, double explorationConstant)=0
    Gets an action using the Boltzmann softmax probability distribution.

virtual void applyReinforcementToLastAction (double reward, State newState)=0
    Applies reinforcement to the last action.

virtual void reset ()=0
    Randomizes the learner.
A reinforcement learning system.
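The interface is meant to be driven in a sense-act-reinforce loop. The sketch below shows that call pattern against the abstract interface; the helpers getCurrentState, performAction, and computeReward, the rl::State and rl::Action qualifications, and the temperature value are illustrative assumptions rather than part of the documented API.

    #include <Learner.h>

    // Application-supplied helpers (hypothetical, not part of this API).
    rl::State getCurrentState();
    void performAction(const rl::Action &action);
    double computeReward();

    // Drive any concrete Learner (e.g. rl::QLearn) for a fixed number of steps.
    void trainEpisode(rl::Learner &learner, int steps) {
        for (int i = 0; i < steps; ++i) {
            // Pick an action with Boltzmann exploration; 0.2 is an
            // arbitrary temperature chosen for illustration.
            rl::Action action = learner.chooseBoltzmanAction(getCurrentState(), 0.2);
            performAction(action);

            // Reward the learner for that action given the resulting state.
            learner.applyReinforcementToLastAction(computeReward(), getCurrentState());
        }
    }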
virtual void rl::Learner::applyReinforcementToLastAction (double reward, State newState)  [pure virtual]

Applies reinforcement to the last action.
Given the immediate reward for the last action taken and the new state, this function updates the long-term reward value of the lastAction and trains the network in charge of the lastAction to output the corrected value.
Parameters
    reward      the reward given for the last action taken
    newState    the new state
Implemented in rl::WireFitQLearn, rl::QLearn, and rl::FidoControlSystem.
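As a rough illustration of the update described above, the sketch below computes a corrected long-term value in the style of one-step Q-learning. The function name, the per-action value vector, and the learning-rate and discount constants are assumptions for illustration, not the library's actual implementation.

    #include <algorithm>
    #include <vector>

    // Hypothetical sketch: given the stored value of the last action, the
    // immediate reward, and the network's value estimates for every
    // candidate action in the new state, compute the corrected value.
    double correctedValue(double lastActionValue, double reward,
                          const std::vector<double> &newStateValues,
                          double learningRate = 0.3, double discount = 0.9) {
        // Value of the best action reachable from the new state
        // (newStateValues is assumed non-empty).
        double bestNext = *std::max_element(newStateValues.begin(),
                                            newStateValues.end());

        // One-step target: immediate reward plus discounted future value.
        double target = reward + discount * bestNext;

        // Move the stored value partway toward the target; the network in
        // charge of the last action would then be trained to output this.
        return lastActionValue + learningRate * (target - lastActionValue);
    }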
virtual Action rl::Learner::chooseBestAction (State currentState)  [pure virtual]

Gets the action that the network deems most beneficial for the currentState.
Parameters
    currentState    the state for which to choose the action
Implemented in rl::WireFitQLearn, and rl::QLearn.
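For a discrete set of candidate actions, "most beneficial" amounts to an argmax over the network's value estimates. A minimal sketch, assuming the per-action values have already been computed:

    #include <cstddef>
    #include <vector>

    // Hypothetical sketch: return the index of the highest-valued action.
    std::size_t bestActionIndex(const std::vector<double> &values) {
        std::size_t best = 0;
        for (std::size_t i = 1; i < values.size(); ++i)
            if (values[i] > values[best]) best = i;
        return best;
    }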
virtual Action rl::Learner::chooseBoltzmanAction (State currentState, double explorationConstant)  [pure virtual]

Gets an action using the Boltzmann softmax probability distribution.
A non-greedy, stochastic selection heuristic is used so that the learner explores actions regardless of their expected reward. The lower the exploration constant (the temperature), the more likely the learner is to pick the best action for the current state.
Parameters
    currentState           the state for which to choose the action
    explorationConstant    the Boltzmann temperature constant, determining the degree of exploration
Implemented in rl::WireFitQLearn, and rl::QLearn.
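Under the Boltzmann distribution, an action a is chosen from state s with probability proportional to exp(Q(s, a) / T), where T is the exploration constant. Below is a minimal sampling sketch under that definition; the function name and the per-action value vector are assumptions, not part of the documented API.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <random>
    #include <vector>

    // Hypothetical sketch: sample an action index with probability
    // proportional to exp(value / temperature). As temperature -> 0 this
    // approaches greedy selection; large temperatures approach uniform.
    std::size_t boltzmannIndex(const std::vector<double> &values,
                               double temperature, std::mt19937 &rng) {
        // Subtract the maximum value before exponentiating so the weights
        // stay finite; this leaves the distribution unchanged.
        double maxValue = *std::max_element(values.begin(), values.end());

        std::vector<double> weights;
        weights.reserve(values.size());
        for (double v : values)
            weights.push_back(std::exp((v - maxValue) / temperature));

        // Sample an index with probability proportional to its weight.
        std::discrete_distribution<std::size_t> dist(weights.begin(),
                                                     weights.end());
        return dist(rng);
    }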
virtual void rl::Learner::reset ()  [pure virtual]

Randomizes the learner.
Implemented in rl::QLearn, rl::WireFitQLearn, and rl::FidoControlSystem.