A Learner that follows the Q-Learning algorithm.
#include <QLearn.h>
Inherits rl::Learner.
Public Member Functions

QLearn (net::NeuralNet *modelNetwork, net::Trainer *trainer_, double learningRate_, double devaluationFactor_, std::vector< Action > possibleActions_)
Initializes a QLearn object with a model network and the values of the learning rate and devaluation factor.

QLearn (std::vector< Model > models_, net::Trainer *trainer_, double learningRate_, double devaluationFactor_)
Initializes a QLearn object with a vector of networks and the values of the learning rate and devaluation factor.

QLearn ()
Initializes an empty, non-valid Q-learning object.

Action chooseBestAction (State currentState)
Gets the action that the network deems most beneficial for the current state.

Action chooseBoltzmanAction (State currentState, double explorationConstant)
Gets an action using the Boltzmann softmax probability distribution.

void applyReinforcementToLastAction (double reward, State newState)
Updates expected reward values.

void reset ()
Reverts the system to a newly initialized state.

void store (std::ofstream *output)
Stores this model in a stream.
A Learner that follows the Q-Learning algorithm.
QLearn::QLearn (net::NeuralNet *modelNetwork, net::Trainer *trainer_, double learningRate_, double devaluationFactor_, std::vector< Action > possibleActions_)
Initializes a QLearn object with a model network and the values of the learning rate and devaluation factor.
modelNetwork | a neural network that is used as a model architecture for the networks that will rate the reward of each action
trainer_ | the trainer that will train the neural networks that rate the reward of each action
learningRate_ | a constant between 0 and 1 that dictates how fast the robot learns from reinforcement
devaluationFactor_ | a constant between 0 and 1 that weighs future reward against immediate reward. A value of 0 makes the network value only immediate reward, while a value of 1 makes it weigh future reward equally with immediate reward.
possibleActions_ | all of the actions that this object could possibly choose
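For orientation, a heavily hedged construction sketch. Everything other than the QLearn signature above is an assumption: the net::NeuralNet constructor arguments, the use of net::Backpropagation as the net::Trainer implementation, and Action being a vector of doubles are guesses to be checked against NeuralNet.h and Trainer.h.

    #include <QLearn.h>

    // Sketch only: NeuralNet's constructor arguments and Backpropagation's
    // default constructor are assumptions, not confirmed by this page.
    net::NeuralNet *modelNetwork = new net::NeuralNet(3, 1, 2, 4, "sigmoid");
    net::Backpropagation trainer;  // assumed net::Trainer implementation
    std::vector<rl::Action> possibleActions = { {-1}, {0}, {1} };  // Action assumed to be std::vector<double>

    rl::QLearn learner(modelNetwork, &trainer,
                       0.4,   // learningRate_: between 0 and 1
                       0.6,   // devaluationFactor_: between 0 and 1
                       possibleActions);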
QLearn::QLearn (std::vector< Model > models_, net::Trainer *trainer_, double learningRate_, double devaluationFactor_)
Initializes a QLearn object with a vector of networks and the values of the learning rate and devaluation factor.
models_ | the vector of networks that will rate the reward of each action
trainer_ | the trainer that will train the neural networks that rate the reward of each action
learningRate_ | a constant between 0 and 1 that dictates how fast the robot learns from reinforcement
devaluationFactor_ | a constant between 0 and 1 that weighs future reward against immediate reward. A value of 0 makes the network value only immediate reward, while a value of 1 makes it weigh future reward equally with immediate reward.
QLearn::QLearn ()
Initializes an empty, non-valid Q-learning object.
virtual void rl::QLearn::applyReinforcementToLastAction (double reward, State newState)
Updates expected reward values.
Given the immediate reward from the last action taken and the new state, this function computes the correct long-term reward value for the lastAction and trains the network in charge of the lastAction to output that corrected reward value.
reward | the reward value from the last action |
newState | the new state (aka. inputs) to the system |
Implements rl::Learner.
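The description above corresponds to the standard Q-learning temporal-difference update; the helper below is purely illustrative and not part of the class API:

    // Illustrative: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)),
    // where alpha is learningRate_ and gamma is devaluationFactor_.
    double updatedReward(double oldReward, double reward, double bestFutureReward,
                         double learningRate, double devaluationFactor) {
        return oldReward + learningRate
             * (reward + devaluationFactor * bestFutureReward - oldReward);
    }

A typical interaction loop alternates action selection with reinforcement; getSensorValues and performAction are hypothetical application-side helpers:

    rl::State state = getSensorValues();                           // hypothetical helper
    while (true) {
        rl::Action action = learner.chooseBoltzmanAction(state, 0.2);
        double reward = performAction(action);                     // hypothetical helper
        rl::State newState = getSensorValues();
        learner.applyReinforcementToLastAction(reward, newState);
        state = newState;
    }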
Action rl::QLearn::chooseBestAction (State currentState)
Gets the action that the network deems most beneficial for the current state.
currentState | a vector of doubles representing the "inputs" or sensor values of the system |
Implements rl::Learner.
Action rl::QLearn::chooseBoltzmanAction (State currentState, double explorationConstant)
Gets an action using the Boltzmann softmax probability distribution.
A probabilistic search heuristic used so that the system explores actions regardless of their current reward estimates. The lower the exploration constant, the more likely the system is to pick the action it rates best for the current state.
currentState | a vector of doubles representing the "inputs" or sensor values of the system |
explorationConstant | a positive floating point number representing the exploration level of the system. Common values range from 0.01 to 1. The higher this number is, the more likely it is that the system will pick worse actions. |
Handles the case in which a floating point error results in no action being selected.
Implements rl::Learner.
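To illustrate the role of explorationConstant, a standalone sketch of Boltzmann (softmax) selection over a set of reward estimates; this is a generic illustration, not this class's internal code. Note the final fallback, which mirrors the floating point guard mentioned above.

    #include <cmath>
    #include <cstdlib>
    #include <vector>

    // Generic Boltzmann (softmax) selection sketch. A higher explorationConstant
    // flattens the distribution (more exploration); a lower value concentrates
    // probability on the highest-rated action.
    int boltzmannSelect(const std::vector<double> &rewardEstimates,
                        double explorationConstant) {
        std::vector<double> weights;
        double total = 0;
        for (double estimate : rewardEstimates) {
            double w = std::exp(estimate / explorationConstant);
            weights.push_back(w);
            total += w;
        }
        double pick = ((double)std::rand() / RAND_MAX) * total;
        double cumulative = 0;
        for (std::size_t i = 0; i < weights.size(); ++i) {
            cumulative += weights[i];
            if (pick <= cumulative) return (int)i;
        }
        return (int)weights.size() - 1;  // guard: floating point error left no pick
    }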
virtual void rl::QLearn::reset ()
Reverts the system to a newly initialized state.
Resets the system's model and wipes the system's memory of past actions, states, and rewards.
Implements rl::Learner.
void rl::QLearn::store (std::ofstream *output)
Stores this model in a stream.
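A brief usage sketch; the filename is an arbitrary example:

    #include <fstream>

    // Persist the learner's model to disk via the provided output stream.
    std::ofstream output("qlearn-model.txt");
    learner.store(&output);
    output.close();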