A Learner that follows the Q-Learning algorithm.
#include <QLearn.h>
Inherits rl::Learner.
Public Member Functions

QLearn (net::NeuralNet *modelNetwork, net::Trainer *trainer_, double learningRate_, double devaluationFactor_, std::vector< Action > possibleActions_)
Initializes a QLearn object with a model network and the values of the learning rate and devaluation factor.

QLearn (std::vector< Model > models_, net::Trainer *trainer_, double learningRate_, double devaluationFactor_)
Initializes a QLearn object with a vector of networks and the values of the learning rate and devaluation factor.

QLearn ()
Initializes an empty, non-valid Q-learning object.

Action chooseBestAction (State currentState)
Gets the action that the network deems most beneficial for the current state.

Action chooseBoltzmanAction (State currentState, double explorationConstant)
Gets an action using the Boltzmann softmax probability distribution.

void applyReinforcementToLastAction (double reward, State newState)
Updates expected reward values.

void reset ()
Reverts the system to a newly initialized state.

void store (std::ofstream *output)
Stores this model in a stream.
A Learner that follows the Q-Learning algorithm.
QLearn::QLearn (net::NeuralNet *modelNetwork, net::Trainer *trainer_, double learningRate_, double devaluationFactor_, std::vector< Action > possibleActions_)
Initializes a QLearn object with a model network and the values of the learning rate and devaluation factor.
modelNetwork | a neural network that is used as a model architecture for the networks that will rate the reward of each action
trainer_ | the trainer that will train the neural networks that rate the reward of each action
learningRate_ | a constant between 0 and 1 that dictates how fast the robot learns from reinforcement
devaluationFactor_ | a constant between 0 and 1 that weighs future reward against immediate reward. A value of 0 makes the network value only immediate reward, while a value of 1 makes it weigh future reward equally with immediate reward.
possibleActions_ | all of the actions that this object could possibly choose
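For orientation, a heavily hedged construction sketch. Everything other than the QLearn signature above is an assumption: the net::NeuralNet constructor arguments, the use of net::Backpropagation as the net::Trainer implementation, and Action being a vector of doubles are guesses to be checked against NeuralNet.h and Trainer.h.

    #include <QLearn.h>

    // Sketch only: NeuralNet's constructor arguments and Backpropagation's
    // default constructor are assumptions, not confirmed by this page.
    net::NeuralNet *modelNetwork = new net::NeuralNet(3, 1, 2, 4, "sigmoid");
    net::Backpropagation trainer;  // assumed net::Trainer implementation
    std::vector<rl::Action> possibleActions = { {-1}, {0}, {1} };  // Action assumed to be std::vector<double>

    rl::QLearn learner(modelNetwork, &trainer,
                       0.4,   // learningRate_: between 0 and 1
                       0.6,   // devaluationFactor_: between 0 and 1
                       possibleActions);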
QLearn::QLearn (std::vector< Model > models_, net::Trainer *trainer_, double learningRate_, double devaluationFactor_)
Initializes a QLearn object with a vector of networks and the values of the learning rate and devaluation factor.
models_ | the vector of networks that will rate the reward of each action
trainer_ | the trainer that will train the neural networks that rate the reward of each action
learningRate_ | a constant between 0 and 1 that dictates how fast the robot learns from reinforcement
devaluationFactor_ | a constant between 0 and 1 that weighs future reward against immediate reward. A value of 0 makes the network value only immediate reward, while a value of 1 makes it weigh future reward equally with immediate reward.
QLearn::QLearn ()
Initializes an empty, non-valid Q-learning object.
virtual void rl::QLearn::applyReinforcementToLastAction (double reward, State newState)
Updates expected reward values.
Given the immediate reward from the last action taken and the new state, this function computes the correct long-term reward value for the lastAction and trains the network in charge of the lastAction to output that corrected reward value.
reward | the reward value from the last action |
newState | the new state (aka. inputs) to the system |
Implements rl::Learner.
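The description above corresponds to the standard Q-learning temporal-difference update; the helper below is purely illustrative and not part of the class API:

    // Illustrative: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)),
    // where alpha is learningRate_ and gamma is devaluationFactor_.
    double updatedReward(double oldReward, double reward, double bestFutureReward,
                         double learningRate, double devaluationFactor) {
        return oldReward + learningRate
             * (reward + devaluationFactor * bestFutureReward - oldReward);
    }

A typical interaction loop alternates action selection with reinforcement; getSensorValues and performAction are hypothetical application-side helpers:

    rl::State state = getSensorValues();                           // hypothetical helper
    while (true) {
        rl::Action action = learner.chooseBoltzmanAction(state, 0.2);
        double reward = performAction(action);                     // hypothetical helper
        rl::State newState = getSensorValues();
        learner.applyReinforcementToLastAction(reward, newState);
        state = newState;
    }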
Action rl::QLearn::chooseBestAction (State currentState)
Gets the action that the network deems most beneficial for the current state.
currentState | a vector of doubles representing the "inputs" or sensor values of the system |
Implements rl::Learner.
Action rl::QLearn::chooseBoltzmanAction (State currentState, double explorationConstant)
Gets an action using the Boltzmann softmax probability distribution.
A probabilistic search heuristic used so that the system explores actions regardless of their current reward estimates. The lower the exploration constant, the more likely the system is to pick the action it rates best for the current state.
currentState | a vector of doubles representing the "inputs" or sensor values of the system |
explorationConstant | a positive floating point number representing the exploration level of the system. Common values range from 0.01 to 1. The higher this number is, the more likely it is that the system will pick worse actions. |
Handles the case in which a floating point error results in no action being selected.
Implements rl::Learner.
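To illustrate the role of explorationConstant, a standalone sketch of Boltzmann (softmax) selection over a set of reward estimates; this is a generic illustration, not this class's internal code. Note the final fallback, which mirrors the floating point guard mentioned above.

    #include <cmath>
    #include <cstdlib>
    #include <vector>

    // Generic Boltzmann (softmax) selection sketch. A higher explorationConstant
    // flattens the distribution (more exploration); a lower value concentrates
    // probability on the highest-rated action.
    int boltzmannSelect(const std::vector<double> &rewardEstimates,
                        double explorationConstant) {
        std::vector<double> weights;
        double total = 0;
        for (double estimate : rewardEstimates) {
            double w = std::exp(estimate / explorationConstant);
            weights.push_back(w);
            total += w;
        }
        double pick = ((double)std::rand() / RAND_MAX) * total;
        double cumulative = 0;
        for (std::size_t i = 0; i < weights.size(); ++i) {
            cumulative += weights[i];
            if (pick <= cumulative) return (int)i;
        }
        return (int)weights.size() - 1;  // guard: floating point error left no pick
    }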
virtual void rl::QLearn::reset ()
Reverts the system to a newly initialized state.
Resets the system's model and wipes the system's memory of past actions, states, and rewards.
Implements rl::Learner.
void rl::QLearn::store (std::ofstream *output)
Stores this model in a stream.
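A brief usage sketch; the filename is an arbitrary example:

    #include <fstream>

    // Persist the learner's model to disk via the provided output stream.
    std::ofstream output("qlearn-model.txt");
    learner.store(&output);
    output.close();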