A highly effective reinforcement learning control system (Truell and Gruenstein).
#include <FidoControlSystem.h>
Inherits rl::WireFitQLearn.
Classes
  struct History

Public Member Functions
  FidoControlSystem (int stateDimensions, Action minAction, Action maxAction, int baseOfDimensions)
    Initializes a FidoControlSystem.
  std::vector< double > chooseBoltzmanActionDynamic (State state)
  void applyReinforcementToLastAction (double reward, State newState)
    Updates the control system's model by giving it reward for its last action.
  void reset ()
    Reverts the control system to a newly initialized state.
Public Member Functions inherited from rl::WireFitQLearn
  WireFitQLearn (unsigned int stateDimensions, unsigned int actionDimensions_, unsigned int numHiddenLayers, unsigned int numNeuronsPerHiddenLayer, unsigned int numberOfWires_, Action minAction_, Action maxAction_, unsigned int baseOfDimensions_, Interpolator *interpolator_, net::Trainer *trainer_, double learningRate_, double devaluationFactor_)
    Initializes a completely new WireFitQLearn object with all necessary values.
  WireFitQLearn ()
    Initializes an empty, non-valid WireFitQLearn object.
  WireFitQLearn (std::ifstream *input)
    Initializes a WireFitQLearn object from a stream.
  Action chooseBestAction (State currentState)
    Gets the action that the network deems most beneficial for the current state.
  Action chooseBoltzmanAction (State currentState, double explorationConstant)
    Gets an action using the Boltzmann softmax probability distribution.
  void applyReinforcementToLastAction (double reward, State newState)
    Updates expected reward values.
  void reset ()
    Resets the system's model to a newly initialized state.
  void store (std::ofstream *output)
    Stores this model in a stream.
Public Attributes
  const double initialExploration = 1
  const unsigned int samplesOfHistory = 10
  double explorationLevel
  double lastUncertainty

Public Attributes inherited from rl::WireFitQLearn
  net::NeuralNet * network
  Interpolator * interpolator
  net::Trainer * trainer
  unsigned int numberOfWires
  unsigned int actionDimensions
  double learningRate
  double devaluationFactor
  double controlPointsGDErrorTarget
  double controlPointsGDLearningRate
  int controlPointsGDMaxIterations
  unsigned int baseOfDimensions
  State lastState
  Action minAction
  Action maxAction
  Action lastAction
  net::NeuralNet * modelNet
Protected Member Functions
  std::vector< FidoControlSystem::History > selectHistories ()
  void trainOnHistories (std::vector< FidoControlSystem::History > selectedHistories)
  void adjustExploration (double uncertainty)
  double getError (std::vector< double > input, std::vector< double > correctOutput)
  std::vector< Wire > newControlWiresForHistory (History history)

Protected Member Functions inherited from rl::WireFitQLearn
  std::vector< Wire > getWires (State state)
  std::vector< Wire > getSetOfWires (const State &state, int baseOfDimensions)
  std::vector< double > getRawOutput (std::vector< Wire > wires)
  double highestReward (State state)
  Action bestAction (State state)
  double getQValue (double reward, const State &oldState, const State &newState, const Action &action, const std::vector< Wire > &controlWires)
  std::vector< Wire > newControlWires (const Wire &correctWire, std::vector< Wire > controlWires)

Protected Attributes
  std::vector< History > histories
Detailed Description

A highly effective reinforcement learning control system (Truell and Gruenstein).
Constructor & Destructor Documentation

FidoControlSystem::FidoControlSystem (int stateDimensions, Action minAction, Action maxAction, int baseOfDimensions)
Initializes a FidoControlSystem.

Parameters
  stateDimensions    the number of dimensions of the state being fed to the control system (i.e. the number of elements in the state vector)
  minAction          the minimum possible action (e.g. a vector of doubles) that the control system may output
  maxAction          the maximum possible action (e.g. a vector of doubles) that the control system may output
  baseOfDimensions   the number of possible discrete values in each action dimension. For example, if baseOfDimensions = 2, minAction = {0, 0}, and maxAction = {1, 1}, the possible actions are {0, 0}, {0, 1}, {1, 0}, and {1, 1}.
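For illustration, here is a minimal construction sketch: one state dimension and one action dimension over [-1, 1], discretized into three values. It assumes the Fido headers are on the include path and that rl::State and rl::Action are vectors of doubles, as they appear elsewhere on this page.

    #include <FidoControlSystem.h>

    int main() {
        // One state dimension; one action dimension ranging over [-1, 1].
        // By analogy with the baseOfDimensions example above, a base of 3
        // gives the three candidate action values -1, 0, and 1.
        rl::FidoControlSystem learner(1,        // stateDimensions
                                      {-1.0},   // minAction
                                      {1.0},    // maxAction
                                      3);       // baseOfDimensions
        return 0;
    }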
Member Function Documentation

void FidoControlSystem::applyReinforcementToLastAction (double reward, State newState)   [virtual]

Updates the control system's model by giving it reward for its last action.

Parameters
  reward     the reward associated with the control system's last action
  newState   the new state vector (needed because states may change after performing an action)

Implements rl::Learner.
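To show where this call sits in practice, the following is a minimal sketch of a sense-act-reward loop built from the members documented on this page. The helpers getSensorState() and performAction() are hypothetical placeholders for the robot or simulation being controlled, not part of the Fido library.

    #include <vector>
    #include <FidoControlSystem.h>

    // Hypothetical environment hooks -- placeholders, not part of the Fido library.
    rl::State getSensorState();
    double performAction(const std::vector<double> &action);   // carries out the action, returns a reward

    void runEpisode(rl::FidoControlSystem &learner, int steps) {
        for (int i = 0; i < steps; i++) {
            std::vector<double> action = learner.chooseBoltzmanActionDynamic(getSensorState());
            double reward = performAction(action);
            // Reward the last action; the new state is passed in because the
            // environment may have changed after the action was performed.
            learner.applyReinforcementToLastAction(reward, getSensorState());
        }
    }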
std::vector< double > FidoControlSystem::chooseBoltzmanActionDynamic (State state)
void FidoControlSystem::reset ()   [virtual]

Reverts the control system to a newly initialized state.

Resets the control system's model and wipes the system's memory of past actions, states, and rewards.

Implements rl::Learner.
Member Data Documentation

double rl::FidoControlSystem::explorationLevel
const double rl::FidoControlSystem::initialExploration = 1
double rl::FidoControlSystem::lastUncertainty
const unsigned int rl::FidoControlSystem::samplesOfHistory = 10