Semantic Segmentation on NVIDIA DRIVE

This example shows how to generate and deploy a CUDA® executable for an image segmentation application that uses deep learning. It uses the GPU Coder™ Support Package for NVIDIA GPUs to deploy the executable on the NVIDIA DRIVE™ platform. Code generation runs on the host computer, and the generated code is then built on the target platform by using the remote build capability of the support package. For more information on semantic segmentation, see Code Generation for Semantic Segmentation Network.

Prerequisites

Target Board Requirements

NVIDIA DRIVE PX2 embedded platform.
Ethernet crossover cable to connect the target board and host PC (if the target board cannot be connected to a local network).
NVIDIA CUDA toolkit installed on the board.
NVIDIA cuDNN library (v5 or later) on the target.
OpenCV 3.0 or later on the target for reading and displaying images and video.
Environment variables on the target for the compilers and libraries. For information on the supported versions of the compilers and libraries and their setup, see Install and Setup Prerequisites for NVIDIA Boards.

Development Host Requirements

GPU Coder™ for code generation. For an overview and tutorials, visit the GPU Coder product page.
Deep Learning Toolbox™ to use a DAG network object.
GPU Coder Interface for Deep Learning Libraries support package. To install this support package, use the Add-On Explorer.
NVIDIA CUDA toolkit on the host.
Environment variables on the host for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-party Products. For setting up the environment variables, see Environment Variables.

Create a Folder and Copy Relevant Files

The following line of code creates a folder in your current working directory on the host and copies all the relevant files into it. If you cannot generate files in this folder, change your current working directory before running this command.

gpucoderdemo_setup('segnet_deploy');

Connect to the NVIDIA Hardware

The GPU Coder Support Package for NVIDIA GPUs uses an SSH connection over TCP/IP to execute commands while building and running the generated CUDA code on the DRIVE platform. You must therefore connect the target platform to the same network as the host computer or use an Ethernet crossover cable to connect the board directly to the host computer. Refer to the NVIDIA documentation on how to set up and configure your board.

To communicate with the NVIDIA hardware, create a live hardware connection object by using the drive function. To create this object, you need the host name or IP address, the user name, and the password of the target board. For example:

hwobj = drive('drive-px2-name','ubuntu','ubuntu');

NOTE:

If the connection fails, a diagnostic error message is reported on the MATLAB command line. The most likely cause is an incorrect IP address or host name.
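A script can make the failure described in the note easier to spot by wrapping the connection call. The following is a minimal sketch that reuses the placeholder host name and credentials from the example above; the wording of the printed hint is illustrative, not output from the support package.

```matlab
% Minimal sketch: report a failed connection before stopping the script.
% 'drive-px2-name', 'ubuntu', 'ubuntu' are the placeholder values used above.
try
    hwobj = drive('drive-px2-name','ubuntu','ubuntu');
catch err
    % Show the diagnostic text raised by the connection attempt.
    fprintf('Connection to the DRIVE target failed: %s\n', err.message);
    fprintf('Check the host name or IP address first.\n');
    rethrow(err);   % keep the original error so later steps do not run
end
```

Rethrowing the error stops the remaining steps, which all depend on a valid hwobj.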

Verify the GPU Environment

Use the coder.checkGpuInstall function to verify that the compilers and libraries needed for running this example are set up correctly on the target.

envCfg = coder.gpuEnvConfig('drive');
envCfg.BasicCodegen = 1;
envCfg.Quiet = 1;
envCfg.HardwareObject = hwobj;
coder.checkGpuInstall(envCfg);
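Because Quiet is set, the check itself prints little to the command line. One way to inspect the outcome is to request an output argument, as in the sketch below. It assumes that coder.checkGpuInstall returns a results structure when called with an output, and that envCfg has been configured as above; the field names of the structure are not assumed here.

```matlab
% Sketch: capture the outcome of the environment check instead of
% relying on command-line messages. Requires envCfg from the lines above.
results = coder.checkGpuInstall(envCfg);

% Display whatever fields the structure contains.
disp(results);
```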

Get the Pretrained SegNet DAG Network Object

net = getSegNet();

The DAG network contains 91 layers including convolution, batch normalization, pooling, unpooling and the pixel classification output layers. To see all the layers of the network, enter net.Layers in the MATLAB® command window.

Generate CUDA Code for the Target Using GPU Coder

This example uses segnet_predict.m as the entry-point function for code generation. To generate a CUDA executable that can be deployed to an NVIDIA target, create a GPU code configuration object for generating an executable.

cfg = coder.gpuConfig('exe');

When there are multiple live connection objects for different targets, the code generator performs the remote build on the target for which the most recent live object was created. To choose a different hardware board for the remote build, use the setupCodegenContext() method of the respective live hardware object. If only one live connection object was created, it is not necessary to call this method.

hwobj.setupCodegenContext;

Use the coder.hardware function to create a configuration object for the DRIVE platform and assign it to the Hardware property of the code configuration object cfg.

cfg.Hardware = coder.hardware('NVIDIA Drive');

Use the BuildDir property to specify the directory on the target in which the remote build takes place. If the specified build directory does not exist on the target, the software creates a directory with the given name. If no value is assigned to cfg.Hardware.BuildDir, the remote build takes place in the last specified build directory. If there is no stored build directory value, the build takes place in the home directory of the user associated with the live connection object.

cfg.Hardware.BuildDir = '~/';

Certain NVIDIA platforms such as DRIVE PX2 contain multiple GPUs. On such platforms, use the SelectCudaDevice property in the GPU configuration object to select a specific GPU.

cfg.GpuConfig.SelectCudaDevice = 0;

The custom main file is a wrapper that calls the predict function in the generated code. Postprocessing steps are added in the main file by using OpenCV interfaces. The output of the SegNet prediction is an 11-channel image, where the channels hold the prediction scores of 11 different classes. In postprocessing, each pixel is assigned the class label that has the maximum score among the 11 channels. Each class is associated with a unique color for visualization. The final output is shown by using the OpenCV imshow function.
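The postprocessing described above can be prototyped on the host in MATLAB before reading the OpenCV version in main.cu. The following is a minimal sketch, not the code in main.cu: it uses random scores as a stand-in for the network output, and the color table built with hsv is a hypothetical choice rather than the colors used by the example.

```matlab
% Stand-in for the network output: 360-by-480-by-11 class scores.
rng(0);
numClasses = 11;
scores = rand(360, 480, numClasses, 'single');

% Per-pixel argmax over the channel (third) dimension gives the label.
[~, labels] = max(scores, [], 3);   % 360-by-480, values in 1..numClasses

% Hypothetical color table with one distinct RGB row per class.
cmap = hsv(numClasses);             % numClasses-by-3, values in [0,1]

% Replace each label with its color to get an RGB visualization.
rgb = ind2rgb(labels, cmap);        % 360-by-480-by-3
```

Calling imshow(rgb) or image(rgb) on the host then plays the role that the OpenCV imshow call plays on the target.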

cfg.CustomSource  = fullfile('main.cu');

In this example, code generation uses a single image as the input to the network. However, the custom main file is written to take a video as input and run the SegNet prediction on each frame of the video sequence. The compiler and linker flags required to build with the OpenCV library are added to the build information in the segnet_predict.m file.
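The shipped segnet_predict.m is not reproduced here. As a rough sketch of what such an entry-point function can look like, the hypothetical function below loads the network once, adds OpenCV flags to the build information, and calls predict. The function name, the MAT-file name SegNet.mat, and the pkg-config flag string are assumptions made for illustration.

```matlab
function out = segnet_predict_sketch(in) %#codegen
% Hypothetical stand-in for segnet_predict.m; the shipped file may differ.

% Load the network once and reuse it across calls.
persistent segnet;
if isempty(segnet)
    % 'SegNet.mat' is an assumed MAT-file name holding the DAG network.
    segnet = coder.loadDeepLearningNetwork('SegNet.mat');
end

% Assumed OpenCV flags, resolved through pkg-config on the target.
opencvFlags = '`pkg-config --cflags --libs opencv`';
coder.updateBuildInfo('addCompileFlags', opencvFlags);
coder.updateBuildInfo('addLinkFlags', opencvFlags);

% Run inference; the result has one score channel per class.
out = predict(segnet, in);
end
```

Keeping the flags in the entry-point function means the build information travels with the generated code, so the remote build on the target needs no separate makefile edits.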

Generate sample image input for code generation.

img = imread('peppers.png');
img = imresize(img,[360 480]);
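The codegen function uses the example value passed to -args to fix the size and class of the entry-point input, so it can be worth confirming both before generating code. A small check, assuming the 360-by-480 RGB input prepared above:

```matlab
% Recreate the sample input and confirm its size and class.
img = imread('peppers.png');        % sample image that ships with MATLAB
img = imresize(img, [360 480]);     % rows-by-columns; channels are kept

assert(isequal(size(img), [360 480 3]), 'Unexpected input size.');
assert(isa(img, 'uint8'), 'Unexpected input class.');
```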

To generate CUDA code, use the codegen function and pass the GPU code configuration object. After the code generation takes place on the host, the generated files are copied over and built on the target.

codegen('-config', cfg, 'segnet_predict', '-args', {img}, '-report');

Run the Executable on the Target

Copy the input test video to the target workspace directory, using the workspaceDir property of the hardware object. This property contains the path to the codegen folder on the target.

hwobj.putFile('CamVid.avi', hwobj.workspaceDir);

Use the runApplication() method of the hardware object to launch the executable on the target hardware.

hwobj.runApplication('segnet_predict','CamVid.avi');

The segmented image output is displayed in a window on the monitor connected to the target board.

You can stop the running executable on the target from the MATLAB environment on the host by using the killApplication() method of the hardware object. This method takes the name of the application, not the name of the executable.

hwobj.killApplication('segnet_predict');

Cleanup

Remove the files and return to the original folder.

cleanup