A highly effective reinforcement learning control system (Truell and Gruenstein)
#include <FidoControlSystem.h>
Inherits rl::WireFitQLearn.
Classes
    struct History

Public Member Functions
    FidoControlSystem (int stateDimensions, Action minAction, Action maxAction, int baseOfDimensions)
        Initializes a FidoControlSystem.
    std::vector< double > chooseBoltzmanActionDynamic (State state)
    void applyReinforcementToLastAction (double reward, State newState)
        Updates the control system's model by giving it reward for its last action.
    void reset ()
        Reverts the control system to a newly initialized state.
Public Member Functions inherited from rl::WireFitQLearn
    WireFitQLearn (unsigned int stateDimensions, unsigned int actionDimensions_, unsigned int numHiddenLayers, unsigned int numNeuronsPerHiddenLayer, unsigned int numberOfWires_, Action minAction_, Action maxAction_, unsigned int baseOfDimensions_, Interpolator *interpolator_, net::Trainer *trainer_, double learningRate_, double devaluationFactor_)
        Initializes a completely new WireFitQLearn object with all necessary values.
    WireFitQLearn ()
        Initializes an empty, non-valid WireFitQLearn object.
    WireFitQLearn (std::ifstream *input)
        Initializes a WireFitQLearn object from a stream.
    Action chooseBestAction (State currentState)
        Gets the action that the network deems most beneficial for the current state.
    Action chooseBoltzmanAction (State currentState, double explorationConstant)
        Gets an action using the Boltzmann softmax probability distribution.
    void applyReinforcementToLastAction (double reward, State newState)
        Updates expected reward values.
    void reset ()
        Resets the system's model to a newly initialized state.
    void store (std::ofstream *output)
        Stores this model in a stream.
Public Attributes
    const double initialExploration = 1
    const unsigned int samplesOfHistory = 10
    double explorationLevel
    double lastUncertainty
Public Attributes inherited from rl::WireFitQLearn
    net::NeuralNet *network
    Interpolator *interpolator
    net::Trainer *trainer
    unsigned int numberOfWires
    unsigned int actionDimensions
    double learningRate
    double devaluationFactor
    double controlPointsGDErrorTarget
    double controlPointsGDLearningRate
    int controlPointsGDMaxIterations
    unsigned int baseOfDimensions
    State lastState
    Action minAction
    Action maxAction
    Action lastAction
    net::NeuralNet *modelNet
Protected Member Functions
    std::vector< FidoControlSystem::History > selectHistories ()
    void trainOnHistories (std::vector< FidoControlSystem::History > selectedHistories)
    void adjustExploration (double uncertainty)
    double getError (std::vector< double > input, std::vector< double > correctOutput)
    std::vector< Wire > newControlWiresForHistory (History history)
Protected Member Functions inherited from rl::WireFitQLearn
    std::vector< Wire > getWires (State state)
    std::vector< Wire > getSetOfWires (const State &state, int baseOfDimensions)
    std::vector< double > getRawOutput (std::vector< Wire > wires)
    double highestReward (State state)
    Action bestAction (State state)
    double getQValue (double reward, const State &oldState, const State &newState, const Action &action, const std::vector< Wire > &controlWires)
    std::vector< Wire > newControlWires (const Wire &correctWire, std::vector< Wire > controlWires)
Protected Attributes
    std::vector< History > histories
A highly effective reinforcement learning control system (Truell and Gruenstein)
FidoControlSystem::FidoControlSystem (int stateDimensions, Action minAction, Action maxAction, int baseOfDimensions)
Initializes a FidoControlSystem.

Parameters
    stateDimensions   the number of dimensions of the state fed to the control system (i.e. the number of elements in the state vector)
    minAction         the minimum possible action (e.g. a vector of doubles) that the control system may output
    maxAction         the maximum possible action (e.g. a vector of doubles) that the control system may output
    baseOfDimensions  the number of possible discrete values in each action dimension. For example, if baseOfDimensions = 2, minAction = {0, 0}, and maxAction = {1, 1}, then the possible actions are {0, 0}, {0, 1}, {1, 0}, and {1, 1}.
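A minimal construction sketch (not taken from the library's documentation): it assumes rl::Action is a std::vector<double>, as the minAction/maxAction descriptions above suggest, and that FidoControlSystem.h is on the include path.

    #include <FidoControlSystem.h>

    int main() {
        // One state dimension, two action dimensions.  With baseOfDimensions = 2,
        // each action component takes one of two discrete values between the
        // bounds, so the action set is {0,0}, {0,1}, {1,0}, {1,1}.
        rl::FidoControlSystem learner(1,            // stateDimensions
                                      {0.0, 0.0},   // minAction
                                      {1.0, 1.0},   // maxAction
                                      2);           // baseOfDimensions
        return 0;
    }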
void FidoControlSystem::applyReinforcementToLastAction (double reward, State newState)  [virtual]
Updates the control system's model by giving it reward for its last action.

Parameters
    reward    the reward associated with the control system's last action
    newState  the new state vector (needed because states may change after performing an action)

Implements rl::Learner.
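A rough usage sketch of a sense-act-reward loop, continuing with the learner object from the construction example above. getEnvironmentState, performAction, and computeReward are hypothetical application-side hooks, and rl::State, like rl::Action, is assumed to be a vector of doubles defined by the library headers.

    #include <FidoControlSystem.h>
    #include <vector>

    // Hypothetical application hooks; not part of the library.
    rl::State getEnvironmentState();
    void performAction(const std::vector<double> &action);
    double computeReward(const rl::State &state);

    void trainLoop(rl::FidoControlSystem &learner) {
        rl::State state = getEnvironmentState();
        for (int step = 0; step < 1000; ++step) {
            // Pick an action using dynamically adjusted Boltzmann exploration.
            std::vector<double> action = learner.chooseBoltzmanActionDynamic(state);
            performAction(action);

            rl::State newState = getEnvironmentState();
            double reward = computeReward(newState);

            // Reward the last action and pass the resulting state so the
            // model can update its expected reward values.
            learner.applyReinforcementToLastAction(reward, newState);
            state = newState;
        }
    }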
std::vector< double > FidoControlSystem::chooseBoltzmanActionDynamic (State state)
void FidoControlSystem::reset ()  [virtual]
Reverts the control system to a newly initialized state.

Resets the control system's model and wipes the system's memory of past actions, states, and rewards.

Implements rl::Learner.
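A short sketch of when a caller might invoke it, e.g. before switching the same learner object to an unrelated task (continuing the hypothetical learner from the examples above):

    // After reset(), no trace of previous actions, states, or rewards remains;
    // the control system behaves as if it had just been constructed.
    learner.reset();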
double rl::FidoControlSystem::explorationLevel
std::vector< History > rl::FidoControlSystem::histories  [protected]
const double rl::FidoControlSystem::initialExploration = 1
double rl::FidoControlSystem::lastUncertainty
const unsigned int rl::FidoControlSystem::samplesOfHistory = 10