Fido
rl::WireFitQLearn Class Reference

A Learner using Q-learning that works with continuous state and action spaces (Gaskett et al.). More...

#include <WireFitQLearn.h>

Inherits rl::Learner.

Inherited by rl::FidoControlSystem.

Public Member Functions

 WireFitQLearn (unsigned int stateDimensions, unsigned int actionDimensions_, unsigned int numHiddenLayers, unsigned int numNeuronsPerHiddenLayer, unsigned int numberOfWires_, Action minAction_, Action maxAction_, unsigned int baseOfDimensions_, Interpolator *interpolator_, net::Trainer *trainer_, double learningRate_, double devaluationFactor_)
 Initializes a completely new WireFitQLearn object with all necessary values. More...
 
 WireFitQLearn ()
 Initializes an empty, non-valid WireFitQLearn object. More...
 
 WireFitQLearn (std::ifstream *input)
 Initializes a WireFitQLearn object from a stream. More...
 
Action chooseBestAction (State currentState)
 Gets the action that the network deems most beneficial for the current state. More...
 
Action chooseBoltzmanAction (State currentState, double explorationConstant)
 Gets an action using the Boltzmann softmax probability distribution. More...
 
void applyReinforcementToLastAction (double reward, State newState)
 Updates expected reward values. More...
 
void reset ()
 Resets the system's model to a newly initialized state. More...
 
void store (std::ofstream *output)
 Stores this model in a stream. More...
 

Public Attributes

net::NeuralNet* network
 
Interpolator* interpolator
 
net::Trainer* trainer
 
unsigned int numberOfWires
 
unsigned int actionDimensions
 
double learningRate
 
double devaluationFactor
 
double controlPointsGDErrorTarget
 
double controlPointsGDLearningRate
 
int controlPointsGDMaxIterations
 
unsigned int baseOfDimensions
 
State lastState
 
Action minAction
 
Action maxAction
 
Action lastAction
 
net::NeuralNet* modelNet
 

Protected Member Functions

std::vector< Wire > getWires (State state)
 
std::vector< Wire > getSetOfWires (const State &state, int baseOfDimensions)
 
std::vector< double > getRawOutput (std::vector< Wire > wires)
 
double highestReward (State state)
 
Action bestAction (State state)
 
double getQValue (double reward, const State &oldState, const State &newState, const Action &action, const std::vector< Wire > &controlWires)
 
std::vector< Wire > newControlWires (const Wire &correctWire, std::vector< Wire > controlWires)
 

Detailed Description

A Learner using Q-learning that works with continuous state and action spaces (Gaskett et al.).

A wire-fitted interpolator function is used in conjunction with a neural network to extend Q-learning to continuous action and state vectors.
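
As a rough illustration of how the pieces fit together, the sketch below constructs a learner, asks it for an action, and feeds back a reward. The concrete Interpolator and net::Trainer subclasses, their headers, and the specific constructor arguments are assumptions; State and Action are treated as vectors of doubles, as the parameter documentation describes. Adjust all of this to your build of the library.

#include <WireFitQLearn.h>
#include <LSInterpolator.h>     // assumed concrete Interpolator; swap in whatever your build provides
#include <Backpropagation.h>    // assumed concrete net::Trainer; swap in whatever your build provides

int main() {
    rl::LSInterpolator interpolator;   // assumed Interpolator subclass
    net::Backpropagation trainer;      // assumed net::Trainer subclass

    // One-dimensional state, one-dimensional action in [-1, 1],
    // 3 wires and 3 discrete candidate values per action dimension.
    rl::WireFitQLearn learner(
        1,          // stateDimensions
        1,          // actionDimensions_
        1,          // numHiddenLayers
        4,          // numNeuronsPerHiddenLayer
        3,          // numberOfWires_
        {-1.0},     // minAction_ (assuming Action is a vector of doubles)
        {1.0},      // maxAction_
        3,          // baseOfDimensions_
        &interpolator,
        &trainer,
        0.95,       // learningRate_
        0.4);       // devaluationFactor_

    rl::State state = {0.5};           // assuming State is a vector of doubles
    rl::Action action = learner.chooseBoltzmanAction(state, 0.2);

    // ...carry out `action`, observe a reward and the resulting state...
    learner.applyReinforcementToLastAction(1.0, {0.6});
    return 0;
}
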

Constructor & Destructor Documentation

WireFitQLearn::WireFitQLearn ( unsigned int  stateDimensions,
unsigned int  actionDimensions_,
unsigned int  numHiddenLayers,
unsigned int  numNeuronsPerHiddenLayer,
unsigned int  numberOfWires_,
Action  minAction_,
Action  maxAction_,
unsigned int  baseOfDimensions_,
Interpolator *  interpolator_,
net::Trainer *  trainer_,
double  learningRate_,
double  devaluationFactor_ 
)

Initializes a completely new WireFitQLearn object with all necessary values.

Parameters
stateDimensions: the dimensions of the state vector being fed to the system
actionDimensions_: the dimensions of the action vector that will be outputted by the system
numHiddenLayers: the number of hidden layers of the neural network that approximates the reward value of each action
numNeuronsPerHiddenLayer: the number of neurons per hidden layer of the neural network that approximates the reward value of each action
numberOfWires_: the number of multi-dimensional data-points or "wires" that will be outputted by the neural network. Wires are interpolated so that a continuous function of action versus expected reward may be generated by the system. The more complex the task, the more wires are needed.
minAction_: the minimum action vector that the system may output
maxAction_: the maximum action vector that the system may output
baseOfDimensions_: the number of possible discrete values in each dimension. E.g., if baseOfDimensions=2, minAction={0, 0}, maxAction={1, 1}, then possibleActions={{0, 0}, {0, 1}, {1, 0}, {1, 1}}. See the sketch after this parameter list.
interpolator_: the object that will interpolate the data-points or "wires" of action versus reward that the neural network will output
trainer_: the model that will train the neural network, which is fed the state and outputs data-points or "wires" of action versus reward
learningRate_: a constant between 0 and 1 that dictates how fast the robot learns from reinforcement.
devaluationFactor_: a constant between 0 and 1 that weighs future reward versus immediate reward. A value of 0 makes the network value only immediate reward, while a value of 1 makes it consider future reward with the same weight as immediate reward.
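
A minimal sketch (not the library's code) of the candidate-action grid implied by baseOfDimensions_, minAction_, and maxAction_: each action dimension is split into baseOfDimensions_ evenly spaced values between its minimum and maximum, giving baseOfDimensions_^actionDimensions_ candidate actions in total. The helper name is hypothetical.

#include <cstddef>
#include <vector>

// Enumerate baseOfDimensions evenly spaced values per action dimension,
// between minAction and maxAction inclusive.
std::vector< std::vector<double> > enumerateActions(const std::vector<double> &minAction,
                                                    const std::vector<double> &maxAction,
                                                    unsigned int baseOfDimensions) {
    std::size_t dims = minAction.size();
    std::size_t total = 1;
    for(std::size_t d = 0; d < dims; d++) total *= baseOfDimensions;

    std::vector< std::vector<double> > actions(total, std::vector<double>(dims));
    for(std::size_t index = 0; index < total; index++) {
        std::size_t remainder = index;
        for(std::size_t d = 0; d < dims; d++) {
            std::size_t step = remainder % baseOfDimensions;
            remainder /= baseOfDimensions;
            double span = maxAction[d] - minAction[d];
            actions[index][d] = (baseOfDimensions <= 1)
                ? minAction[d]
                : minAction[d] + span * step / (baseOfDimensions - 1);
        }
    }
    return actions;
}
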
WireFitQLearn::WireFitQLearn ( )

Initializes an empty, non-valid WireFitQLearn object.

WireFitQLearn::WireFitQLearn ( std::ifstream *  input)

Initializes a WireFitQLearn object from a stream.

Member Function Documentation

void WireFitQLearn::applyReinforcementToLastAction ( double  reward,
State  newState 
)
virtual

Updates expected reward values.

Given the immediate reward from the last action taken and the new state, this function updates the long-term reward value of the lastAction and trains the network in charge of the lastAction to output the correct reward value.

Parameters
reward: the reward value from the last action
newState: the new state (i.e., the inputs) to the system

Implements rl::Learner.
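
A minimal sketch of the standard Q-learning target this update builds on; the library's actual wire-fitting step also retrains the control wires through the interpolator, so treat this only as the underlying idea. The function name is hypothetical.

// Move the old estimate toward reward + discounted best future reward.
double updatedQ(double oldQ, double reward, double maxQOfNewState,
                double learningRate, double devaluationFactor) {
    return oldQ + learningRate * (reward + devaluationFactor * maxQOfNewState - oldQ);
}
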

Action WireFitQLearn::bestAction ( State  state)
protected
Action WireFitQLearn::chooseBestAction ( State  currentState)
virtual

Gets the action that the network deems most beneficial for the current state.

Parameters
currentState: a vector of doubles representing the "inputs" or sensor values of the system

Implements rl::Learner.

Action WireFitQLearn::chooseBoltzmanAction ( State  currentState,
double  explorationConstant 
)
virtual

Gets an action using the Boltzmann softmax probability distribution.

A search heuristic used so that the neural network explores actions regardless of their expected reward value. The lower the exploration constant, the more likely the system is to pick the best action for the current state.

Parameters
currentState: a vector of doubles representing the "inputs" or sensor values of the system
explorationConstant: a positive floating point number representing the exploration level of the system. Common values range from 0.01 to 1. The higher this number is, the more likely it is that the system will pick worse actions.

Handles the case in which a floating point error results in no wire being chosen.

Implements rl::Learner.
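
A minimal sketch (not the library's implementation) of Boltzmann softmax selection over the wires' reward estimates: each candidate is chosen with probability proportional to exp(reward / explorationConstant), so a lower constant concentrates probability on the best-looking action. The function name is hypothetical.

#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Pick an index with probability proportional to exp(reward / explorationConstant).
std::size_t pickBoltzmann(const std::vector<double> &rewards, double explorationConstant) {
    std::vector<double> weights(rewards.size());
    double total = 0;
    for(std::size_t i = 0; i < rewards.size(); i++) {
        weights[i] = std::exp(rewards[i] / explorationConstant);
        total += weights[i];
    }

    double r = total * (static_cast<double>(std::rand()) / RAND_MAX);
    for(std::size_t i = 0; i < weights.size(); i++) {
        r -= weights[i];
        if(r <= 0) return i;
    }
    return rewards.size() - 1;  // fallback in case a floating point error left nothing chosen
}
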

double WireFitQLearn::getQValue ( double  reward,
const State &  oldState,
const State &  newState,
const Action &  action,
const std::vector< Wire > &  controlWires 
)
protected

Updates the Q-value according to adaptive learning.

std::vector< double > WireFitQLearn::getRawOutput ( std::vector< Wire >  wires)
protected
std::vector< Wire > WireFitQLearn::getSetOfWires ( const State &  state,
int  baseOfDimensions 
)
protected

Increment iterator vector

std::vector< Wire > WireFitQLearn::getWires ( State  state)
protected
double WireFitQLearn::highestReward ( State  state)
protected
std::vector< Wire > WireFitQLearn::newControlWires ( const Wire &  correctWire,
std::vector< Wire >  controlWires 
)
protected
void WireFitQLearn::reset ( )
virtual

Resets the system's model to a newly initialized state.

Implements rl::Learner.

void WireFitQLearn::store ( std::ofstream *  output)

Stores this model in a stream.
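
A minimal sketch of saving a learner and rebuilding it from the stored state, assuming the std::ifstream constructor reads the same format that store() writes. The helper name and file handling are illustrative.

#include <fstream>
#include <string>
#include <WireFitQLearn.h>

void saveAndReload(rl::WireFitQLearn &learner, const std::string &filename) {
    std::ofstream out(filename);
    learner.store(&out);                 // serialize the current model
    out.close();

    std::ifstream in(filename);
    rl::WireFitQLearn restored(&in);     // reconstruct the model from the stream
    in.close();
}
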

Member Data Documentation

unsigned int rl::WireFitQLearn::actionDimensions
unsigned int rl::WireFitQLearn::baseOfDimensions
double rl::WireFitQLearn::controlPointsGDErrorTarget
double rl::WireFitQLearn::controlPointsGDLearningRate
int rl::WireFitQLearn::controlPointsGDMaxIterations
double rl::WireFitQLearn::devaluationFactor
Interpolator* rl::WireFitQLearn::interpolator
Action rl::WireFitQLearn::lastAction
State rl::WireFitQLearn::lastState
double rl::WireFitQLearn::learningRate
Action rl::WireFitQLearn::maxAction
Action rl::WireFitQLearn::minAction
net::NeuralNet* rl::WireFitQLearn::modelNet
net::NeuralNet* rl::WireFitQLearn::network
unsigned int rl::WireFitQLearn::numberOfWires
net::Trainer* rl::WireFitQLearn::trainer
