A Learner using Q-learning that works with continuous state and action spaces (Gaskett et al.).
#include <WireFitQLearn.h>
Inherits rl::Learner.
Inherited by rl::FidoControlSystem.
WireFitQLearn (unsigned int stateDimensions, unsigned int actionDimensions_, unsigned int numHiddenLayers, unsigned int numNeuronsPerHiddenLayer, unsigned int numberOfWires_, Action minAction_, Action maxAction_, unsigned int baseOfDimensions_, Interpolator *interpolator_, net::Trainer *trainer_, double learningRate_, double devaluationFactor_)
    Initializes a completely new WireFitQLearn object with all necessary values.

WireFitQLearn ()
    Initializes an empty, non-valid WireFitQLearn object.

WireFitQLearn (std::ifstream *input)
    Initializes a WireFitQLearn object from a stream.

Action chooseBestAction (State currentState)
    Gets the action that the network deems most beneficial for the current state.

Action chooseBoltzmanAction (State currentState, double explorationConstant)
    Gets an action using the Boltzmann softmax probability distribution.

void applyReinforcementToLastAction (double reward, State newState)
    Updates expected reward values.

void reset ()
    Resets the system's model to a newly initialized state.

void store (std::ofstream *output)
    Stores this model in a stream.
A Learner using Q-learning that works with continuous state and action spaces (Gaskett et al.).
A wire-fitted interpolator function is used in conjunction with a neural network to extend Q-learning to continuous actions and state vectors.
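A minimal usage sketch of the public interface: the learner picks an action for the current state, the action is executed, and the observed reward is fed back through applyReinforcementToLastAction. Here getSensorState and computeReward are hypothetical placeholders for the robot's own sensing and reward logic, and State/Action are assumed to be the rl-namespace vector types referenced in the member listings below.

```cpp
#include <WireFitQLearn.h>

// Hypothetical helpers standing in for the robot's sensing and reward logic.
rl::State getSensorState();                    // read sensors into a state vector
double computeReward(const rl::State &state);  // score the outcome of the last action

// One sense-act-learn loop using an already constructed learner.
void runEpisode(rl::WireFitQLearn &learner, int iterations) {
    for (int i = 0; i < iterations; i++) {
        rl::State state = getSensorState();
        rl::Action action = learner.chooseBoltzmanAction(state, 0.2);  // mild exploration
        // ... send `action` to the robot's actuators here ...
        rl::State newState = getSensorState();
        learner.applyReinforcementToLastAction(computeReward(newState), newState);
    }
}
```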
WireFitQLearn::WireFitQLearn(unsigned int stateDimensions, unsigned int actionDimensions_, unsigned int numHiddenLayers, unsigned int numNeuronsPerHiddenLayer, unsigned int numberOfWires_, Action minAction_, Action maxAction_, unsigned int baseOfDimensions_, Interpolator *interpolator_, net::Trainer *trainer_, double learningRate_, double devaluationFactor_)
Initializes a completely new WireFitQLearn object with all necessary values.
Parameters
- stateDimensions: the dimensions of the state vector being fed to the system
- actionDimensions_: the dimensions of the action vector that the system will output
- numHiddenLayers: the number of hidden layers of the neural network that approximates the reward value of each action
- numNeuronsPerHiddenLayer: the number of neurons per hidden layer of that neural network
- numberOfWires_: the number of multi-dimensional data points, or "wires", output by the neural network. Wires are interpolated so that the system can generate a continuous function of action versus expected reward. The more complex the task, the more wires are needed.
- minAction_: the minimum action vector that the system may output
- maxAction_: the maximum action vector that the system may output
- baseOfDimensions_: the number of possible discrete values in each dimension. For example, if baseOfDimensions=2, minAction={0, 0}, and maxAction={1, 1}, then possibleActions={{0, 0}, {0, 1}, {1, 0}, {1, 1}}.
- interpolator_: the object that interpolates the data points or "wires" of action versus reward output by the neural network
- trainer_: the model that trains the neural network, which is fed the state and outputs data points or "wires" of action versus reward
- learningRate_: a constant between 0 and 1 that dictates how fast the robot learns from reinforcement
- devaluationFactor_: a constant between 0 and 1 that weighs future reward against immediate reward. A value of 0 makes the network value only immediate reward, while a value of 1 makes it weigh future reward equally with immediate reward.
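A construction sketch under stated assumptions: Action is taken to be a vector type that accepts brace initialization, and rl::LSInterpolator / net::Backpropagation stand in for whatever concrete Interpolator and net::Trainer implementations your build of the library provides; swap in the ones you actually use.

```cpp
#include <WireFitQLearn.h>
// Headers for the concrete interpolator and trainer below are assumed to ship with the library.

// Assumed concrete implementations; replace with the Interpolator and
// net::Trainer subclasses available in your build.
rl::WireFitQLearn learner(
    3,                           // stateDimensions: three sensor values per state
    2,                           // actionDimensions_: actions are 2-dimensional vectors
    1,                           // numHiddenLayers
    12,                          // numNeuronsPerHiddenLayer
    4,                           // numberOfWires_: more wires for more complex tasks
    {-1, -1},                    // minAction_
    { 1,  1},                    // maxAction_
    3,                           // baseOfDimensions_: 3 discrete values per action dimension
    new rl::LSInterpolator(),    // interpolator_ (assumed concrete Interpolator)
    new net::Backpropagation(),  // trainer_ (assumed concrete net::Trainer)
    0.95,                        // learningRate_ (between 0 and 1)
    0.4                          // devaluationFactor_ (between 0 and 1)
);
```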
WireFitQLearn::WireFitQLearn()
Initializes an empty, non-valid WireFitQLearn object.
WireFitQLearn::WireFitQLearn(std::ifstream *input)
Initializes a WireFitQLearn object from a stream.
void WireFitQLearn::applyReinforcementToLastAction(double reward, State newState) [virtual]
Updates expected reward values.
Given the immediate reward from the last action taken and the new state, this function updates the expected long-term reward of the lastAction and trains the network in charge of the lastAction to output the correct reward value.
Parameters
- reward: the reward value from the last action
- newState: the new state (i.e., the inputs) to the system
Implements rl::Learner.
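The update is built on the Q-learning rule weighted by the learningRate_ and devaluationFactor_ constants passed to the constructor. A sketch of that target value follows; it illustrates the rule, not necessarily the literal computation performed inside getQValue.

```cpp
// Sketch of the Q-learning target: move the old estimate toward
// (immediate reward + discounted best reward obtainable from the new state).
double qTarget(double oldQ, double reward, double bestQNewState,
               double learningRate, double devaluationFactor) {
    return oldQ + learningRate * (reward + devaluationFactor * bestQNewState - oldQ);
}
```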
Action WireFitQLearn::chooseBestAction(State currentState) [virtual]
Gets the action that the network deems most beneficial for the current state.
Parameters
- currentState: a vector of doubles representing the "inputs" or sensor values of the system
Implements rl::Learner.
Action WireFitQLearn::chooseBoltzmanAction(State currentState, double explorationConstant) [virtual]
Gets an action using the Boltzmann softmax probability distribution.
An exploration heuristic (as opposed to purely random search) used so that the network explores actions regardless of their expected reward value. The lower the exploration constant, the more likely the system is to pick the best action for the current state.
Parameters
- currentState: a vector of doubles representing the "inputs" or sensor values of the system
- explorationConstant: a positive floating point number representing the exploration level of the system. Common values range from 0.01 to 1. The higher this number is, the more likely it is that the system will pick worse actions.
A fallback is used in case a floating point error results in no wire being chosen.
Implements rl::Learner.
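Under Boltzmann (softmax) selection, an action's probability of being chosen grows with exp(reward / explorationConstant). Below is a self-contained sketch of that selection rule over a set of candidate reward estimates, independent of this class's wire representation; the final return mirrors the floating-point fallback noted above.

```cpp
#include <cmath>
#include <cstdlib>
#include <vector>

// Minimal Boltzmann (softmax) selection over candidate reward estimates.
// Returns the index of the chosen candidate.
std::size_t boltzmannSelect(const std::vector<double> &rewards, double explorationConstant) {
    std::vector<double> weights;
    double total = 0;
    for (double r : rewards) {
        double w = std::exp(r / explorationConstant);  // higher reward => higher weight
        weights.push_back(w);
        total += w;
    }
    double pick = total * (std::rand() / (double)RAND_MAX);  // random point in [0, total)
    double cumulative = 0;
    for (std::size_t i = 0; i < weights.size(); i++) {
        cumulative += weights[i];
        if (pick <= cumulative) return i;
    }
    return rewards.size() - 1;  // fallback in case of floating point rounding
}
```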
double WireFitQLearn::getQValue(double reward, const State &oldState, const State &newState, const Action &action, const std::vector< Wire > &controlWires) [protected]
Updates the Q value according to the adaptive learning rule.
std::vector< double > WireFitQLearn::getRawOutput(std::vector< Wire > wires) [protected]
std::vector< Wire > WireFitQLearn::getSetOfWires(const State &state, int baseOfDimensions) [protected]
std::vector< Wire > WireFitQLearn::getWires(State state) [protected]
double WireFitQLearn::highestReward(State state) [protected]
std::vector< Wire > WireFitQLearn::newControlWires(const Wire &correctWire, std::vector< Wire > controlWires) [protected]
void WireFitQLearn::reset() [virtual]
Resets the system's model to a newly initialized state.
Implements rl::Learner.
void WireFitQLearn::store(std::ofstream *output)
Stores this model in a stream.
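Together with the std::ifstream constructor above, store() lets a trained model be persisted and restored. A minimal sketch (the file name is arbitrary):

```cpp
#include <WireFitQLearn.h>
#include <fstream>

// Save a trained learner to disk.
void saveLearner(rl::WireFitQLearn &learner) {
    std::ofstream out("learner.dat");
    learner.store(&out);
}

// Restore it later via the stream constructor.
rl::WireFitQLearn loadLearner() {
    std::ifstream in("learner.dat");
    return rl::WireFitQLearn(&in);
}
```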
unsigned int rl::WireFitQLearn::actionDimensions |
unsigned int rl::WireFitQLearn::baseOfDimensions |
double rl::WireFitQLearn::controlPointsGDErrorTarget |
double rl::WireFitQLearn::controlPointsGDLearningRate |
int rl::WireFitQLearn::controlPointsGDMaxIterations |
double rl::WireFitQLearn::devaluationFactor |
Action rl::WireFitQLearn::lastAction |
State rl::WireFitQLearn::lastState |
double rl::WireFitQLearn::learningRate |
Action rl::WireFitQLearn::maxAction |
Action rl::WireFitQLearn::minAction |
unsigned int rl::WireFitQLearn::numberOfWires |