
Create Simulink Reinforcement Learning Environments

In a reinforcement learning scenario, where you train an agent to complete a task, the environment models the dynamics with which the agent interacts. As shown in the following figure, the environment:

  1. Receives actions from the agent.

  2. Outputs observations in response to the actions.

  3. Generates a reward measuring how well the action contributes to achieving the task.

Diagram showing an agent that interacts with its environment. The observation signal goes from the environment to the agent, and the action signal goes from the agent to the environment. The reward signal goes from the environment to the reinforcement learning algorithm inside the agent, which uses the available information to update a policy. The agent uses the policy to map an observation to an action. This arrangement is similar to a control diagram, in which a controller senses the error between a desired reference and the plant output and uses that error to act on the plant input.

Creating an environment model includes defining the following:

  • Action and observation signals that the agent uses to interact with the environment.

  • Reward signal that the agent uses to measure its success. For more information, see Define Reward Signals.

  • Environment dynamic behavior.

Action and Observation Signals

When you create an environment object, you must specify the action and observation signals that the agent uses to interact with the environment. You can create both discrete and continuous action spaces. For more information, see rlFiniteSetSpec and rlNumericSpec, respectively.
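As a minimal sketch, the following MATLAB code creates a continuous observation specification and both a continuous and a discrete action specification. The dimensions, limits, names, and action values are arbitrary choices for illustration, not values required by any particular model.

```matlab
% Continuous observation space: a 3-by-1 vector whose third element is nonnegative
obsInfo = rlNumericSpec([3 1], ...
    'LowerLimit', [-inf -inf 0]', ...
    'UpperLimit', [inf inf inf]');
obsInfo.Name = 'observations';

% Continuous action space: a scalar limited to the range [-1, 1]
actInfoContinuous = rlNumericSpec([1 1], 'LowerLimit', -1, 'UpperLimit', 1);
actInfoContinuous.Name = 'force';

% Discrete action space: at each step, the agent picks one of three values
actInfoDiscrete = rlFiniteSetSpec([-10 0 10]);
actInfoDiscrete.Name = 'force';
```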

What signals you select as actions and observations depends on your application. For example, for control system applications, the integrals (and sometimes derivatives) of error signals are often useful observations. Also, for reference-tracking applications, having a time-varying reference signal as an observation is helpful.
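The following plain MATLAB sketch shows how such an observation vector can be assembled from a tracking error. The sampled reference and plant output are made-up signals, and the cumulative sum is a simple forward-Euler stand-in for the Integrator block you would normally use inside a Simulink model.

```matlab
% Made-up sampled signals, for illustration only
Ts  = 0.1;               % sample time in seconds (assumed)
t   = 0:Ts:2;            % time vector
ref = ones(size(t));     % constant reference to track
y   = 1 - exp(-t);       % plant output rising toward the reference

err    = ref - y;        % tracking error
intErr = cumsum(err)*Ts; % forward-Euler approximation of the integral of the error

% Observation at the last sample: [integral of error; error; plant output]
obs = [intErr(end); err(end); y(end)];
```

Including both the error and its integral gives the agent information similar to what the proportional and integral terms of a PI controller use.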

When you define your observation signals, ensure that all the system states are observable through the observations. For example, an image observation of a swinging pendulum has position information but does not have enough information to determine the pendulum velocity. In this case, you can specify the pendulum velocity as a separate observation.
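For instance, the observation specification for a pendulum agent that sees a grayscale image plus the measured angular velocity can be written as an array of two specification objects, one per observation channel. The 50-by-50 image size, the [0, 1] pixel range, and the channel names are assumptions chosen for this sketch.

```matlab
% Channel 1: 50-by-50 grayscale image of the pendulum (carries position only)
imgInfo = rlNumericSpec([50 50 1], 'LowerLimit', 0, 'UpperLimit', 1);
imgInfo.Name = 'pendImage';

% Channel 2: angular velocity, which a single image cannot reveal
velInfo = rlNumericSpec([1 1]);
velInfo.Name = 'angularRate';

% Combine the channels into one observation specification array
obsInfo = [imgInfo velInfo];
```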

Predefined Simulink Environments

Reinforcement Learning Toolbox™ software provides predefined Simulink® environments for which the actions, observations, rewards, and dynamics are already defined. You can use these environments to:

  • Learn reinforcement learning concepts.

  • Gain familiarity with Reinforcement Learning Toolbox software features.

  • Test your own reinforcement learning agents.

For more information, see Load Predefined Simulink Environments.
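As a short sketch, the following code loads one predefined Simulink environment and queries its specifications. The keyword 'SimplePendulumModel-Discrete' selects a pendulum environment with a discrete action space; see Load Predefined Simulink Environments for the full list of keywords.

```matlab
% Load a predefined Simulink environment (pendulum with a discrete action space)
env = rlPredefinedEnv('SimplePendulumModel-Discrete');

% Inspect the predefined observation and action specifications
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
```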

Custom Simulink Environments

To specify your own custom reinforcement learning environment, create a Simulink model with an RL Agent block. In this model, connect the action, observation, and reward signals to the RL Agent block. For an example, see Water Tank Reinforcement Learning Environment Model.

For the action and observation signals, you must create specification objects using rlNumericSpec for continuous signals and rlFiniteSetSpec for discrete signals. For bus signals, create specifications using bus2RLSpec.
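For bus signals, a minimal sketch looks like the following. The bus name obsBus and its element names are hypothetical. Here bus2RLSpec finds the bus object by name in the base workspace and returns one specification object per bus element.

```matlab
% Define a Simulink bus object with three scalar elements (hypothetical names)
elems(1) = Simulink.BusElement;
elems(1).Name = 'sin_theta';
elems(2) = Simulink.BusElement;
elems(2).Name = 'cos_theta';
elems(3) = Simulink.BusElement;
elems(3).Name = 'dtheta';

obsBus = Simulink.Bus;
obsBus.Elements = elems;

% Create one observation specification object per bus element
obsInfo = bus2RLSpec('obsBus');
```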

For the reward signal, construct a scalar signal in the model and connect this signal to the RL Agent block. For more information, see Define Reward Signals.
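In a Simulink model, you typically build the reward from blocks or inside a MATLAB Function block. The following sketch writes a reward of that kind as a MATLAB anonymous function, loosely modeled on a water tank level controller: a bonus when the level error e is small, a small penalty otherwise, and a large penalty when the water level h leaves an allowed range. The thresholds and weights are illustrative assumptions. Whatever form you choose, the result must be a scalar.

```matlab
% Scalar reward as a function of tracking error e and water level h
% (thresholds and weights are illustrative assumptions)
rewardFcn = @(e, h) 10*(abs(e) < 0.1) ...    % bonus for tracking closely
                  - 1*(abs(e) >= 0.1) ...    % mild penalty for a larger error
                  - 100*(h <= 0 || h >= 20); % large penalty outside the allowed level range

rClose = rewardFcn(0.05, 10);  % small error, level in range
rFar   = rewardFcn(0.5, 10);   % large error, level in range
rOut   = rewardFcn(0.5, 25);   % large error, level out of range
```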

After configuring the Simulink model, create an environment object for the model using the rlSimulinkEnv function.
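For example, the following sketch creates an environment object for the model used in Water Tank Reinforcement Learning Environment Model. It assumes that the rlwatertank model is on the MATLAB path and that its agent block is named RL Agent. The observation vector holds the integrated error, the error, and the measured water height; the action is a scalar flow command.

```matlab
mdl = 'rlwatertank';
load_system(mdl)   % load the model into memory without opening it

% Observation: [integrated error; error; measured height], height nonnegative
obsInfo = rlNumericSpec([3 1], ...
    'LowerLimit', [-inf -inf 0]', ...
    'UpperLimit', [inf inf inf]');
obsInfo.Name = 'observations';
obsInfo.Description = 'integrated error, error, and measured height';

% Action: scalar flow command
actInfo = rlNumericSpec([1 1]);
actInfo.Name = 'flow';

% Full path to the RL Agent block inside the model
agentBlk = [mdl '/RL Agent'];

% Create the environment object that links the model to the agent block
env = rlSimulinkEnv(mdl, agentBlk, obsInfo, actInfo);
```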

If you have a reference model with an appropriate action input port, observation output port, and scalar reward output port, you can automatically create a Simulink model that includes this reference model and an RL Agent block. For more information, see createIntegratedEnv. This function returns the environment object, action specifications, and observation specifications for the model.
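A minimal sketch of this workflow follows. The reference model name myPlantModel and the new model name myPlantEnv are hypothetical placeholders; the reference model must already exist and expose the action input port, observation output port, and scalar reward output port described above.

```matlab
refModel = 'myPlantModel';   % hypothetical existing reference model
newModel = 'myPlantEnv';     % hypothetical name for the generated model

% Create a new model that references refModel and adds an RL Agent block, then
% return the environment object, the agent block path, and the specifications
[env, agentBlk, obsInfo, actInfo] = createIntegratedEnv(refModel, newModel);
```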

Your environment can include third-party functionality. For more information, see Integrate with Existing Simulation or Environment (Simulink).
