A Learner using Q-learning that works with continuous state and action spaces (Gaskett et al.).
#include <WireFitQLearn.h>
Inherits rl::Learner.
Inherited by rl::FidoControlSystem.
WireFitQLearn (unsigned int stateDimensions, unsigned int actionDimensions_, unsigned int numHiddenLayers, unsigned int numNeuronsPerHiddenLayer, unsigned int numberOfWires_, Action minAction_, Action maxAction_, unsigned int baseOfDimensions_, Interpolator *interpolator_, net::Trainer *trainer_, double learningRate_, double devaluationFactor_)
    Initializes a completely new WireFitQLearn object with all necessary values.

WireFitQLearn ()
    Initializes an empty, non-valid WireFitQLearn object.

WireFitQLearn (std::ifstream *input)
    Initializes a WireFitQLearn object from a stream.

Action chooseBestAction (State currentState)
    Gets the action that the network deems most beneficial for the current state.

Action chooseBoltzmanAction (State currentState, double explorationConstant)
    Gets an action using the Boltzmann softmax probability distribution.

void applyReinforcementToLastAction (double reward, State newState)
    Updates expected reward values.

void reset ()
    Resets the system's model to a newly initialized state.

void store (std::ofstream *output)
    Stores this model in a stream.
A Learner using Q-learning that works with continuous state and action spaces (Gaskett et al.).
A wire-fitted interpolator function is used in conjunction with a neural network to extend Q-learning to continuous actions and state vectors.
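A minimal usage sketch of the public interface: the learner picks an action for the current state, the action is executed, and the observed reward is fed back through applyReinforcementToLastAction. Here getSensorState and computeReward are hypothetical placeholders for the robot's own sensing and reward logic, and State/Action are assumed to be the rl-namespace vector types referenced in the member listings below.

```cpp
#include <WireFitQLearn.h>

// Hypothetical helpers standing in for the robot's sensing and reward logic.
rl::State getSensorState();                    // read sensors into a state vector
double computeReward(const rl::State &state);  // score the outcome of the last action

// One sense-act-learn loop using an already constructed learner.
void runEpisode(rl::WireFitQLearn &learner, int iterations) {
    for (int i = 0; i < iterations; i++) {
        rl::State state = getSensorState();
        rl::Action action = learner.chooseBoltzmanAction(state, 0.2);  // mild exploration
        // ... send `action` to the robot's actuators here ...
        rl::State newState = getSensorState();
        learner.applyReinforcementToLastAction(computeReward(newState), newState);
    }
}
```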
WireFitQLearn::WireFitQLearn(unsigned int stateDimensions, unsigned int actionDimensions_, unsigned int numHiddenLayers, unsigned int numNeuronsPerHiddenLayer, unsigned int numberOfWires_, Action minAction_, Action maxAction_, unsigned int baseOfDimensions_, Interpolator *interpolator_, net::Trainer *trainer_, double learningRate_, double devaluationFactor_)
Initializes a completely new WireFitQLearn object with all necessary values.
Parameters
- stateDimensions: the dimensions of the state vector being fed to the system
- actionDimensions_: the dimensions of the action vector that the system will output
- numHiddenLayers: the number of hidden layers of the neural network that approximates the reward value of each action
- numNeuronsPerHiddenLayer: the number of neurons per hidden layer of that neural network
- numberOfWires_: the number of multi-dimensional data points, or "wires", output by the neural network. Wires are interpolated so that the system can generate a continuous function of action versus expected reward. The more complex the task, the more wires are needed.
- minAction_: the minimum action vector that the system may output
- maxAction_: the maximum action vector that the system may output
- baseOfDimensions_: the number of possible discrete values in each dimension. For example, if baseOfDimensions=2, minAction={0, 0}, and maxAction={1, 1}, then possibleActions={{0, 0}, {0, 1}, {1, 0}, {1, 1}}.
- interpolator_: the object that interpolates the data points or "wires" of action versus reward output by the neural network
- trainer_: the model that trains the neural network, which is fed the state and outputs data points or "wires" of action versus reward
- learningRate_: a constant between 0 and 1 that dictates how fast the robot learns from reinforcement
- devaluationFactor_: a constant between 0 and 1 that weighs future reward against immediate reward. A value of 0 makes the network value only immediate reward, while a value of 1 makes it weigh future reward equally with immediate reward.
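A construction sketch under stated assumptions: Action is taken to be a vector type that accepts brace initialization, and rl::LSInterpolator / net::Backpropagation stand in for whatever concrete Interpolator and net::Trainer implementations your build of the library provides; swap in the ones you actually use.

```cpp
#include <WireFitQLearn.h>
// Headers for the concrete interpolator and trainer below are assumed to ship with the library.

// Assumed concrete implementations; replace with the Interpolator and
// net::Trainer subclasses available in your build.
rl::WireFitQLearn learner(
    3,                           // stateDimensions: three sensor values per state
    2,                           // actionDimensions_: actions are 2-dimensional vectors
    1,                           // numHiddenLayers
    12,                          // numNeuronsPerHiddenLayer
    4,                           // numberOfWires_: more wires for more complex tasks
    {-1, -1},                    // minAction_
    { 1,  1},                    // maxAction_
    3,                           // baseOfDimensions_: 3 discrete values per action dimension
    new rl::LSInterpolator(),    // interpolator_ (assumed concrete Interpolator)
    new net::Backpropagation(),  // trainer_ (assumed concrete net::Trainer)
    0.95,                        // learningRate_ (between 0 and 1)
    0.4                          // devaluationFactor_ (between 0 and 1)
);
```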
WireFitQLearn::WireFitQLearn()
Initializes an empty, non-valid WireFitQLearn object.
WireFitQLearn::WireFitQLearn(std::ifstream *input)
Initializes a WireFitQLearn object from a stream.
void WireFitQLearn::applyReinforcementToLastAction(double reward, State newState) [virtual]
Updates expected reward values.
Given the immediate reward from the last action taken and the new state, this function updates the expected long-term reward of the lastAction and trains the network in charge of the lastAction to output the correct reward value.
Parameters
- reward: the reward value from the last action
- newState: the new state (i.e., the inputs) to the system
Implements rl::Learner.
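The update is built on the Q-learning rule weighted by the learningRate_ and devaluationFactor_ constants passed to the constructor. A sketch of that target value follows; it illustrates the rule, not necessarily the literal computation performed inside getQValue.

```cpp
// Sketch of the Q-learning target: move the old estimate toward
// (immediate reward + discounted best reward obtainable from the new state).
double qTarget(double oldQ, double reward, double bestQNewState,
               double learningRate, double devaluationFactor) {
    return oldQ + learningRate * (reward + devaluationFactor * bestQNewState - oldQ);
}
```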
Action WireFitQLearn::chooseBestAction(State currentState) [virtual]
Gets the action that the network deems most beneficial for the current state.
Parameters
- currentState: a vector of doubles representing the "inputs" or sensor values of the system
Implements rl::Learner.
Action WireFitQLearn::chooseBoltzmanAction(State currentState, double explorationConstant) [virtual]
Gets an action using the Boltzmann softmax probability distribution.
An exploration heuristic (as opposed to purely random search) used so that the network explores actions regardless of their expected reward value. The lower the exploration constant, the more likely the system is to pick the best action for the current state.
Parameters
- currentState: a vector of doubles representing the "inputs" or sensor values of the system
- explorationConstant: a positive floating point number representing the exploration level of the system. Common values range from 0.01 to 1. The higher this number is, the more likely it is that the system will pick worse actions.
A fallback is used in case a floating point error results in no wire being chosen.
Implements rl::Learner.
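Under Boltzmann (softmax) selection, an action's probability of being chosen grows with exp(reward / explorationConstant). Below is a self-contained sketch of that selection rule over a set of candidate reward estimates, independent of this class's wire representation; the final return mirrors the floating-point fallback noted above.

```cpp
#include <cmath>
#include <cstdlib>
#include <vector>

// Minimal Boltzmann (softmax) selection over candidate reward estimates.
// Returns the index of the chosen candidate.
std::size_t boltzmannSelect(const std::vector<double> &rewards, double explorationConstant) {
    std::vector<double> weights;
    double total = 0;
    for (double r : rewards) {
        double w = std::exp(r / explorationConstant);  // higher reward => higher weight
        weights.push_back(w);
        total += w;
    }
    double pick = total * (std::rand() / (double)RAND_MAX);  // random point in [0, total)
    double cumulative = 0;
    for (std::size_t i = 0; i < weights.size(); i++) {
        cumulative += weights[i];
        if (pick <= cumulative) return i;
    }
    return rewards.size() - 1;  // fallback in case of floating point rounding
}
```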
double WireFitQLearn::getQValue(double reward, const State &oldState, const State &newState, const Action &action, const std::vector< Wire > &controlWires) [protected]
Updates the Q value according to the adaptive learning rule.
std::vector< double > WireFitQLearn::getRawOutput(std::vector< Wire > wires) [protected]
std::vector< Wire > WireFitQLearn::getSetOfWires(const State &state, int baseOfDimensions) [protected]
std::vector< Wire > WireFitQLearn::getWires(State state) [protected]
double WireFitQLearn::highestReward(State state) [protected]
std::vector< Wire > WireFitQLearn::newControlWires(const Wire &correctWire, std::vector< Wire > controlWires) [protected]
void WireFitQLearn::reset() [virtual]
Resets the system's model to a newly initialized state.
Implements rl::Learner.
void WireFitQLearn::store(std::ofstream *output)
Stores this model in a stream.
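Together with the std::ifstream constructor above, store() lets a trained model be persisted and restored. A minimal sketch (the file name is arbitrary):

```cpp
#include <WireFitQLearn.h>
#include <fstream>

// Save a trained learner to disk.
void saveLearner(rl::WireFitQLearn &learner) {
    std::ofstream out("learner.dat");
    learner.store(&out);
}

// Restore it later via the stream constructor.
rl::WireFitQLearn loadLearner() {
    std::ifstream in("learner.dat");
    return rl::WireFitQLearn(&in);
}
```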
unsigned int rl::WireFitQLearn::actionDimensions |
unsigned int rl::WireFitQLearn::baseOfDimensions |
double rl::WireFitQLearn::controlPointsGDErrorTarget |
double rl::WireFitQLearn::controlPointsGDLearningRate |
int rl::WireFitQLearn::controlPointsGDMaxIterations |
double rl::WireFitQLearn::devaluationFactor |
Action rl::WireFitQLearn::lastAction |
State rl::WireFitQLearn::lastState |
double rl::WireFitQLearn::learningRate |
Action rl::WireFitQLearn::maxAction |
Action rl::WireFitQLearn::minAction |
unsigned int rl::WireFitQLearn::numberOfWires |