rl::QLearn Class Reference

A Learner that follows the Q-Learning algorithm.

#include <QLearn.h>

Inherits rl::Learner.

Public Member Functions

 QLearn (net::NeuralNet *modelNetwork, net::Trainer *trainer_, double learningRate_, double devaluationFactor_, std::vector< Action > possibleActions_)
 	Initializes a QLearn object with a model network and the values of learning rate and devaluationFactor.
 
 QLearn (std::vector< Model > models_, net::Trainer *trainer_, double learningRate_, double devaluationFactor_)
 	Initializes a QLearn object with a vector of networks and the values of learning rate and devaluationFactor.
 
 QLearn ()
 	Initializes an empty, non-valid Q-learning object.
 
Action chooseBestAction (State currentState)
 	Gets the action that the network deems most beneficial for the current state.
 
Action chooseBoltzmanAction (State currentState, double explorationConstant)
 	Gets an action using the Boltzmann softmax probability distribution.
 
void applyReinforcementToLastAction (double reward, State newState)
 	Updates expected reward values.
 
void reset ()
 	Reverts the system to a newly initialized state.
 
void store (std::ofstream *output)
 	Stores this model in a stream.
 

Detailed Description

A Learner that follows the Q-Learning algorithm.

Constructor & Destructor Documentation

QLearn::QLearn (net::NeuralNet *modelNetwork,
                net::Trainer *trainer_,
                double learningRate_,
                double devaluationFactor_,
                std::vector< Action > possibleActions_)

Initializes a QLearn object with a model network and the values of learning rate and devaluationFactor.

Parameters
    modelNetwork	a neural network that is used as a model architecture for the networks that will rate the reward of each action
    trainer_	the trainer that will train the neural networks that will rate the reward of each action
    learningRate_	a constant between 0 and 1 that dictates how fast the robot learns from reinforcement
    devaluationFactor_	a constant between 0 and 1 that weighs future reward vs. immediate reward. A value of 0 will make the network only value immediate reward, while a value of 1 will make it consider future reward with the same weight as immediate reward.
    possibleActions_	all of the actions that this object could possibly choose
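
For illustration, a minimal construction sketch follows. Only the QLearn constructor call is taken from the signature above; the net::NeuralNet and net::Backpropagation constructor arguments, and the header names, are assumptions about the wider Fido API.

    #include <QLearn.h>
    #include <NeuralNet.h>        // assumed header for net::NeuralNet
    #include <Backpropagation.h>  // assumed header for net::Backpropagation
    #include <vector>

    int main() {
        // Two possible actions; an rl::Action is a vector of doubles.
        std::vector<rl::Action> possibleActions = { {0.0}, {1.0} };

        // Model architecture for the per-action reward networks. The argument
        // order (inputs, outputs, hidden layers, neurons per layer, activation)
        // is an assumption, not documented on this page.
        net::NeuralNet *modelNetwork = new net::NeuralNet(1, 1, 2, 4, "sigmoid");

        // net::Backpropagation stands in for a concrete net::Trainer; its
        // constructor arguments (learning rate, momentum, target error,
        // maximum epochs) are likewise assumptions.
        net::Backpropagation trainer(0.1, 0.9, 0.001, 10000);

        // learningRate_ = 0.4 and devaluationFactor_ = 0.5, both in [0, 1].
        rl::QLearn learner(modelNetwork, &trainer, 0.4, 0.5, possibleActions);
    }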
QLearn::QLearn (std::vector< Model > models_,
                net::Trainer *trainer_,
                double learningRate_,
                double devaluationFactor_)

Initializes a QLearn object with a vector of networks and the values of learning rate and devaluationFactor.

Parameters
    models_	the vector of networks that will rate the reward of each action
    trainer_	the trainer that will train the neural networks that will rate the reward of each action
    learningRate_	a constant between 0 and 1 that dictates how fast the robot learns from reinforcement
    devaluationFactor_	a constant between 0 and 1 that weighs future reward vs. immediate reward. A value of 0 will make the network only value immediate reward, while a value of 1 will make it consider future reward with the same weight as immediate reward.
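
The layout of rl::Model is not documented on this page, so the sketch below shows only the call shape; buildModels() is a hypothetical helper standing in for whatever code assembles one model per possible action, and the trainer arguments are assumptions as in the previous example.

    // buildModels() is hypothetical: it returns one rl::Model per action.
    std::vector<rl::Model> models = buildModels();
    net::Backpropagation trainer(0.1, 0.9, 0.001, 10000);  // assumed signature
    rl::QLearn learner(models, &trainer, 0.4, 0.5);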
QLearn::QLearn ( )

Initializes an empty, non-valid Q-learning object.

Member Function Documentation

void QLearn::applyReinforcementToLastAction (double reward, State newState)
virtual

Updates expected reward values.

Given the immediate reward from the last action taken and the new state, this function updates the expected long-term reward value of the last action and trains the network in charge of that action to output the correct reward value.

Parameters
    reward	the reward value from the last action
    newState	the new state (i.e. the inputs) to the system

Implements rl::Learner.
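
A minimal usage sketch, assuming a learner built as in the constructor example above: after the learner's last chosen action has been executed on the system, report the observed reward together with the state the system ended up in.

    // The learner remembers which action it chose last; pair that
    // memory with the observed outcome to update the reward model.
    double reward = 1.0;           // immediate reward for the last action
    rl::State newState = { 0.5 };  // sensor values after taking the action
    learner.applyReinforcementToLastAction(reward, newState);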

Action QLearn::chooseBestAction (State currentState)
virtual

Gets the action that the network deems most beneficial for the current state.

Parameters
    currentState	a vector of doubles representing the "inputs" or sensor values of the system

Implements rl::Learner.
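
A short exploitation sketch, reusing the assumed learner object from the examples above:

    // Pure exploitation: pick the action with the highest expected
    // reward for the current sensor readings.
    rl::State currentState = { 0.5 };
    rl::Action bestAction = learner.chooseBestAction(currentState);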

Action QLearn::chooseBoltzmanAction (State currentState, double explorationConstant)
virtual

Gets an action using the Boltzmann softmax probability distribution.

A randomized search heuristic used so that the neural network explores actions regardless of their current reward estimates. The lower the exploration constant, the more likely the learner is to pick the best action for the current state.

Parameters
    currentState	a vector of doubles representing the "inputs" or sensor values of the system
    explorationConstant	a positive floating point number representing the exploration level of the system. Common values range from 0.01 to 1. The higher this number is, the more likely it is that the system will pick worse actions.

If a floating point error results in no action being selected, a fallback action is returned.

Implements rl::Learner.
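
A hedged sketch of a typical exploration loop around this method; readSensors() and performAction() are hypothetical stand-ins for the surrounding system, and the annealing schedule is purely illustrative.

    rl::State readSensors();             // hypothetical: poll the system's sensors
    double performAction(rl::Action a);  // hypothetical: act, return the reward

    void explore(rl::QLearn &learner) {
        // Start exploratory and anneal toward greedy selection.
        double explorationConstant = 1.0;
        for (int step = 0; step < 1000; step++) {
            rl::State state = readSensors();
            rl::Action action = learner.chooseBoltzmanAction(state, explorationConstant);
            double reward = performAction(action);
            learner.applyReinforcementToLastAction(reward, readSensors());
            if (explorationConstant > 0.01) explorationConstant *= 0.99;
        }
    }

Decaying the exploration constant shifts the policy from exploration toward exploitation as the reward estimates improve.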

void QLearn::reset ( )
virtual

Reverts the system to a newly initialized state.

Resets the system's model and wipes the system's memory of past actions, states, and rewards.

Implements rl::Learner.

void QLearn::store (std::ofstream *output)

Stores this model in a stream.
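
A minimal persistence sketch, assuming the learner object from the examples above; the file name is arbitrary.

    #include <fstream>

    // Serialize the learner's models so they can be reloaded later.
    std::ofstream out("qlearn_model.txt");
    learner.store(&out);
    out.close();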

