
Policies and Value Functions

Define policy and value function approximators, such as actors and critics

A reinforcement learning policy is a mapping from the current environment observation to a probability distribution over the actions to take. A value function is a mapping from an environment observation (or observation-action pair) to the value of a policy, that is, its expected cumulative long-term reward. During training, the agent tunes the parameters of its policy and value function approximators to maximize the long-term reward.
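For reference, these definitions can be written in standard notation (a sketch; the discount factor γ and the rewards r_t are conventional assumptions, not defined on this page). The state-value and action-value functions of a policy π are:

V^{\pi}(s) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t+1} \,\middle|\, s_{0} = s\right]

Q^{\pi}(s,a) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t+1} \,\middle|\, s_{0} = s,\ a_{0} = a\right]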

Reinforcement Learning Toolbox™ software provides approximator objects for actors and critics. The actor learns the policy that selects the best action to take. The critic learns the value (or Q-value) function that estimates the value of the current policy. Depending on your application and selected agent, you can define policy and value function approximators using different approximation models, such as deep neural networks, linear basis functions, or look-up tables. For more information, see Create Policies and Value Functions.
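For example, the following is a minimal sketch of building a value-function critic and a discrete stochastic actor from small deep neural networks. The four-dimensional observation and two-element action set are assumptions chosen for illustration, and depending on your release you may need to wrap the layer arrays in dlnetwork objects before passing them to the approximator constructors.

% Illustrative specifications (assumed): 4-D continuous observation,
% discrete action set {-1, 1}
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 1]);

% Critic network: maps an observation to a scalar state value
criticNet = [
    featureInputLayer(4)
    fullyConnectedLayer(32)
    reluLayer
    fullyConnectedLayer(1)];
critic = rlValueFunction(criticNet, obsInfo);

% Actor network: maps an observation to a probability for each action
actorNet = [
    featureInputLayer(4)
    fullyConnectedLayer(32)
    reluLayer
    fullyConnectedLayer(2)
    softmaxLayer];
actor = rlDiscreteCategoricalActor(actorNet, obsInfo, actInfo);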

Blocks

Policy - Reinforcement learning policy

Functions


rlTable - Value table or Q table
rlValueFunction - Value function approximator object for reinforcement learning agents
rlQValueFunction - Q-value function approximator object for reinforcement learning agents
rlVectorQValueFunction - Vector Q-value function approximator for reinforcement learning agents
rlContinuousDeterministicActor - Deterministic actor with a continuous action space for reinforcement learning agents
rlDiscreteCategoricalActor - Stochastic categorical actor with a discrete action space for reinforcement learning agents
rlContinuousGaussianActor - Stochastic Gaussian actor with a continuous action space for reinforcement learning agents
getActor - Extract actor from reinforcement learning agent
setActor - Set actor of reinforcement learning agent
getCritic - Extract critic from reinforcement learning agent
setCritic - Set critic of reinforcement learning agent
getModel - Get function approximator model from actor or critic
setModel - Set function approximation model for actor or critic
getLearnableParameters - Obtain learnable parameter values from agent, function approximator, or policy object
setLearnableParameters - Set learnable parameter values of agent, function approximator, or policy object
rlOptimizerOptions - Optimization options for actors and critics
getGreedyPolicy - Extract greedy (deterministic) policy object from agent
getExplorationPolicy - Extract exploratory (stochastic) policy object from agent
rlMaxQPolicy - Policy object to generate discrete max-Q actions for custom training loops and application deployment
rlEpsilonGreedyPolicy - Policy object to generate discrete epsilon-greedy actions for custom training loops
rlDeterministicActorPolicy - Policy object to generate continuous deterministic actions for custom training loops and application deployment
rlAdditiveNoisePolicy - Policy object to generate continuous noisy actions for custom training loops
rlStochasticActorPolicy - Policy object to generate stochastic actions for custom training loops and application deployment
getAction - Obtain action from agent, actor, or policy object given environment observations
getValue - Obtain estimated value from a critic given environment observations and actions
getMaxQValue - Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations
evaluate - Evaluate function approximator object given observation (or observation-action) input data
gradient - Evaluate gradient of function approximator object given observation and action input data
accelerate - Option to accelerate computation of gradient for approximator object based on neural network
quadraticLayer - Quadratic layer for actor or critic network
scalingLayer - Scaling layer for actor or critic network
softplusLayer - Softplus layer for actor or critic network
featureInputLayer - Feature input layer
reluLayer - Rectified Linear Unit (ReLU) layer
tanhLayer - Hyperbolic tangent (tanh) layer
fullyConnectedLayer - Fully connected layer
lstmLayer - Long short-term memory (LSTM) layer for recurrent neural network (RNN)
softmaxLayer - Softmax layer
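As an illustration of how several of these functions fit together, the following sketch queries the actor and critic built in the earlier example (the observation values are arbitrary and assume the same 4-D observation specification):

% Arbitrary observation consistent with the assumed 4-D observation spec
obs = {[0.1; -0.2; 0.05; 0.3]};

% Sample an action from the stochastic actor
act = getAction(actor, obs);

% Estimate the value of the current policy at this observation
val = getValue(critic, obs);

% Inspect (and, if needed, overwrite) the learnable parameters
params = getLearnableParameters(critic);
critic = setLearnableParameters(critic, params);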

Topics