Developing OpenVINO™ Object Detection Inferencing Plugin for Unity* Tutorial: Part 3 – DLL Interface to Unity*

ID 774978
Updated 10/25/2021
Version Latest
Public

By Christian Mills

Overview

In Part 1 of the tutorial, we first installed Unity*, OpenVINO™, and the prerequisite software. We then downloaded some pretrained models in the OpenVINO™ Intermediate Representation format, along with some videos to test the models.

In Part 2, we walked through the steps to create a Dynamic link library (DLL) in Visual Studio to perform inference with the pre-trained deep learning models.

In this part, we will demonstrate how to create a Unity project to access the DLL as a plugin.

Create New Project

Open Unity* Hub and click the New button.

 

We can stick with the default 3D template and name the project OpenVINO_YOLOX_Demo. Take note of the path where the project will be generated and click Create.

Add GUI

At the moment, we cannot use our OpenVINO™ plugin inside the Unity* editor. There appears to be a conflict between one of this version of OpenVINO™'s dependencies and the Unity* editor. However, this should be resolved in future versions of OpenVINO™.

For now, we need to build our Unity* project without the plugin files, and then add them to the build folder where the project executable file is located.

We will be using the free Graphy asset to display frame rate and other performance metrics at runtime. A simple user interface is already set up in the prefab linked below.

Import Canvas Prefab

Download the Canvas.prefab file from the above link and drop it into the assets folder in the Unity* editor.

 

Drag and drop the prefab from the Assets folder into the Hierarchy tab. A TMP Importer popup window will appear. Click Import TMP Essentials. Close the popup window once the import is complete.


If we select the Game tab, we can see the interface we just added. Don't worry if it looks squished.

Install Graphy Package

The next step requires being signed in to a Unity* account. Open the Graphy Unity* Asset Store page linked below and click Add to My Assets.

  • Graphy Asset Store Page: (link)

The Add to My Assets button should change to Open In Unity*. Click the button again to open the asset in the Package Manager back in the editor.

The Package Manager window should pop up in the editor with the Graphy asset selected. Click the Download button in the bottom right corner.

Click Import once the package is finished downloading. An Import Unity* Package popup window will appear.

 

Click Import in the popup window. Close the Package Manager window once the import is complete. There should now be a new folder called Graphy - Ultimate Stats Monitor in the Assets folder.

Inside the new folder, open the Prefab folder and drag the [Graphy] prefab into the Hierarchy tab. We will see that our game scene gets updated.

With the [Graphy] object still selected in the Hierarchy tab, scroll down in the Inspector tab to the Graphy Manager (Script) section. Open the Graph modules position dropdown and select TOP_LEFT. Nothing will change in the Game view, but the position will be updated in builds of the project.

Create Video Player

For this demo, we will obtain our input images from a video or webcam feed playing in the scene.

Create the Videos Folder

In the Assets window, right-click an empty space, select the Create option, and click Folder. Name the folder Videos.

 

Double-click the Videos folder to open it.

Add Video Files

Drag and drop any video files from the file explorer into the Videos folder. We will be using the file names to populate the dropdown menu in the UI, so rename the files according to their target object class.

 

Create the Video Screen

Use a Quad object for the screen. Right-click an empty space in the Hierarchy tab. Select the 3D Object section and click Quad. Name it VideoScreen.

We will be updating the VideoScreen dimensions in code based on the resolution of the video or webcam feed.
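As a rough preview of that step, a hypothetical helper for resizing the Quad might look like the following sketch. The method name and parameters are assumptions for illustration, not the final implementation:

```csharp
// Hypothetical sketch: scale the VideoScreen Quad to match the source resolution.
// The method name and parameters are assumptions for illustration.
private void ResizeVideoScreen(Transform videoScreen, int width, int height)
{
    // A default Quad is 1x1 world units, so scaling by the pixel
    // dimensions makes one world unit correspond to one pixel
    videoScreen.localScale = new Vector3(width, height, videoScreen.localScale.z);
    // Center the screen in front of the camera
    videoScreen.position = new Vector3(width / 2f, height / 2f, 1f);
}
```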

Add Video Player Component

Unity* has a Video Player component that provides the functionality to attach video files to the VideoScreen. With the VideoScreen object selected in the Hierarchy tab, click the Add Component button in the Inspector tab.

Type video into the search box and select Video Player from the search results.

We do not need to manually assign a video clip as this will be done in code.

Make the Video Loop

Tick the Loop checkbox in the Inspector tab to repeat the video while the project is running.

Add Label Canvas

Attach the labels for each bounding box to a separate Canvas rather than the one containing the user interface. This will allow us to hide the user interface without hiding the labels.

Right-click an empty space in the Hierarchy tab. Select the UI section and click Canvas. Name it Label Canvas.

 

Enable Unsafe Code

In order to pass references to Unity* variables to the OpenVINO™ plugin we need to enable unsafe code. Open the Edit menu at the top of the Unity* Editor and select Project Settings.

Select Player from the side menu and open the Other Settings dropdown menu.

Scroll down to the Allow 'unsafe' Code checkbox and enable it.

Include Unlit Shaders

Use Unlit shaders for both the video screen and the bounding boxes. By default, Unlit shaders are not included in project builds, so we need to include them manually in the project settings.

While still in the Project Settings window, select the Graphics submenu and scroll down to the Always Included Shaders section. Update the Size value to add an extra Element spot.

Select the new bottom shader spot.

Type Unlit/Texture into the search box in the Select Shader window and select Unlit/Texture from the available options. We can then close the Select Shader window.

The Unlit/Color shader is required for the bounding boxes, so repeat these steps to add it as well.

Now close the project settings window.

Set Background Color

One last thing to make things a bit cleaner is to set the background color to a solid color. This will look a bit nicer when we play videos that have a different aspect ratio than the screen.

Select the Main Camera object in the Hierarchy tab. Over in the Inspector tab, open the Clear Flags dropdown and select Solid Color.

Click on the color bar next to Background to set the color.

We can make it pure black by setting the RGB color values to 0.

Set Up Project Code

Now, we can implement the code for the project.

ImageProcessing Compute Shader

The pixel data from Unity* gets flipped when loaded into the cv::Mat texture variable in the DLL. Flip the image before sending it to the plugin so that it will be in the correct orientation for the model. Implement these steps on the GPU in a compute shader.

Create the Asset File

Create a new assets folder called Shaders. Enter the Shaders folder and right-click an empty space. Select Shader in the Create submenu and click Compute Shader. Name it ImageProcessing.

Remove the Default Code

Open the ImageProcessing file in the code editor. By default, the ComputeShader will contain the following code.

// Each #kernel tells which function to compile; you can have many kernels 
#pragma kernel CSMain 

// Create a RenderTexture with enableRandomWrite flag and set it 
// with cs.SetTexture 
RWTexture2D<float4> Result; 

[numthreads(8,8,1)] 
void CSMain (uint3 id : SV_DispatchThreadID) 
{ 

    // TODO: insert actual code here! 
    
    Result[id.xy] = float4(id.x & id.y, (id.x & 15)/15.0, (id.y & 15)/15.0, 0.0); 
}

Delete the CSMain function along with the #pragma kernel CSMain.

Create FlipXAxis() Function

Next, add a Texture2D variable to store the input image. Name it InputImage and give it a data type of <half4>. Use the same data type for the Result variable as well.

We also need to know the height of the image to swap the pixel values. Store this value in an int variable, and store the new (x,y) coordinates for individual pixel values in an int2 variable.

Code:

// Each #kernel tells which function to compile; you can have many kernels 
#pragma kernel FlipXAxis 

// Create a RenderTexture with enableRandomWrite flag and set it 
// with cs.SetTexture 
RWTexture2D<half4> Result; 
// Stores the input image and is set with cs.SetTexture 
Texture2D<half4> InputImage; 

// The height of the input image 
int height; 
// Stores the new location for individual pixel values 
int2 coords; 

[numthreads(4, 4, 1)] 
void FlipXAxis(uint3 id : SV_DispatchThreadID) 
{ 
    // Update the y value for the pixel coordinates 
    coords = int2(id.x, (height - 1) - id.y); 
    Result[id.xy] = float4(InputImage[coords].x, InputImage[coords].y, InputImage[coords].z, 1.0f); 
}
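On the C# side, dispatching this kernel might look like the following sketch. The method name and texture variables are assumptions for illustration; note that the thread group counts are divided by 4 to match the (4, 4, 1) numthreads values:

```csharp
// Hypothetical sketch: flip rTex on the GPU using the FlipXAxis kernel.
// The method name and variables are assumptions for illustration.
private void FlipImage(ComputeShader imageProcessingShader, RenderTexture rTex)
{
    // Create a temporary texture with random write access for the kernel output
    RenderTexture tempTex = new RenderTexture(rTex.width, rTex.height, 0, rTex.format);
    tempTex.enableRandomWrite = true;
    tempTex.Create();

    int kernel = imageProcessingShader.FindKernel("FlipXAxis");
    imageProcessingShader.SetTexture(kernel, "InputImage", rTex);
    imageProcessingShader.SetTexture(kernel, "Result", tempTex);
    imageProcessingShader.SetInt("height", rTex.height);

    // One thread per pixel, with 4x4 threads per group
    imageProcessingShader.Dispatch(kernel, rTex.width / 4, rTex.height / 4, 1);

    // Copy the flipped result back into the source texture
    Graphics.Blit(tempTex, rTex);
    tempTex.Release();
}
```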

Create Utils Script

Create an Object struct for Unity* to match the one we defined for the OpenVINO™ code. Place the Object and the list of object classes inside a separate C# script called Utils so that we can access them from multiple scripts.

Create the Asset File

Create a new assets folder called Scripts. In the Scripts folder, right-click an empty space and select C# Script in the Create submenu. Name the script Utils.

Required Namespaces

We will start by adding the required namespaces at the top of the script.

  • System: Contains fundamental classes and base classes that define commonly-used value and reference data types, events and event handlers, interfaces, attributes, and processing exceptions
  • System.Runtime.InteropServices: Provides a wide variety of members that support COM interop and platform invoke services
    using UnityEngine; 
    using System; 
    using System.Runtime.InteropServices;

Object Struct

The Object struct in Unity* has the exact same variables as the one in the DLL. We need to specify how the data in the struct is organized so that we can properly modify the Object array in the PopulateObjectsArray() function. We can use the StructLayout attribute to indicate how the data is laid out. In our case, the data is laid out sequentially, so we specify LayoutKind.Sequential.

Code:

/// <summary> 
/// Stores the information for a single object 
/// </summary> 
// Indicate that the members of the struct are laid out sequentially 
[StructLayout(LayoutKind.Sequential)] 
public struct Object 
{ 
   // The X coordinate for the top left bounding box corner 
   public float x0; 
   // The Y coordinate for the top left bounding box corner 
   public float y0; 
   // The width of the bounding box 
   public float width; 
   // The height of the bounding box 
   public float height; 
   // The object class index for the detected object 
   public int label; 
   // The model confidence score for the object 
   public float prob; 

   public Object(float x0, float y0, float width, float height, int label, 
float prob) 
   { 

      this.x0 = x0; 
      this.y0 = y0; 
      this.width = width; 
      this.height = height; 
      this.label = label; 
      this.prob = prob; 
   } 
}

 

object_classes Tuple

Store the ordered list of object classes along with unique color codes in a Tuple array. The color codes will be used to set the color for the bounding boxes.

Note: The object classes need to be in the same order as what the model was trained on. This array will need to be updated when using models trained on a different dataset.

Code:

/// <summary> 
/// The color coded, ordered list of object classes the model was trained to detect 
/// </summary> 
public static Tuple<string, Color>[] object_classes = new Tuple<string, 
Color>[] 
{ 
   Tuple.Create("person",              new Color(0.000f, 0.447f, 0.741f)), 
   Tuple.Create("bicycle",             new Color(0.850f, 0.325f, 0.098f)), 
   Tuple.Create("car",                 new Color(0.929f, 0.694f, 0.125f)), 
   Tuple.Create("motorcycle",          new Color(0.494f, 0.184f, 0.556f)), 
   Tuple.Create("airplane",            new Color(0.466f, 0.674f, 0.188f)), 
   Tuple.Create("bus",                 new Color(0.301f, 0.745f, 0.933f)), 
   Tuple.Create("train",               new Color(0.635f, 0.078f, 0.184f)), 
   Tuple.Create("truck",               new Color(0.300f, 0.300f, 0.300f)), 
   Tuple.Create("boat",                new Color(0.600f, 0.600f, 0.600f)), 
   Tuple.Create("traffic light",       new Color(1.000f, 0.000f, 0.000f)), 
   Tuple.Create("fire hydrant",        new Color(1.000f, 0.500f, 0.000f)), 
   Tuple.Create("stop sign",           new Color(0.749f, 0.749f, 0.000f)), 
   Tuple.Create("parking meter",       new Color(0.000f, 1.000f, 0.000f)), 
   Tuple.Create("bench",               new Color(0.000f, 0.000f, 1.000f)), 
   Tuple.Create("bird",                new Color(0.667f, 0.000f, 1.000f)), 
   Tuple.Create("cat",                 new Color(0.333f, 0.333f, 0.000f)), 
   Tuple.Create("dog",                 new Color(0.333f, 0.667f, 0.000f)), 
   Tuple.Create("horse",               new Color(0.333f, 1.000f, 0.000f)), 
   Tuple.Create("sheep",               new Color(0.667f, 0.333f, 0.000f)), 
   Tuple.Create("cow",                 new Color(0.667f, 0.667f, 0.000f)), 
   Tuple.Create("elephant",            new Color(0.667f, 1.000f, 0.000f)), 
   Tuple.Create("bear",                new Color(1.000f, 0.333f, 0.000f)), 
   Tuple.Create("zebra",               new Color(1.000f, 0.667f, 0.000f)), 
   Tuple.Create("giraffe",             new Color(1.000f, 1.000f, 0.000f)), 
   Tuple.Create("backpack",            new Color(0.000f, 0.333f, 0.500f)), 
   Tuple.Create("umbrella",            new Color(0.000f, 0.667f, 0.500f)), 
   Tuple.Create("handbag",             new Color(0.000f, 1.000f, 0.500f)), 
   Tuple.Create("tie",                 new Color(0.333f, 0.000f, 0.500f)), 
   Tuple.Create("suitcase",            new Color(0.333f, 0.333f, 0.500f)), 
   Tuple.Create("frisbee",             new Color(0.333f, 0.667f, 0.500f)), 
   Tuple.Create("skis",                new Color(0.333f, 1.000f, 0.500f)), 
   Tuple.Create("snowboard",           new Color(0.667f, 0.000f, 0.500f)), 
   Tuple.Create("sports ball",         new Color(0.667f, 0.333f, 0.500f)),
   Tuple.Create("kite",                new Color(0.667f, 0.667f, 0.500f)), 
   Tuple.Create("baseball bat",        new Color(0.667f, 1.000f, 0.500f)), 
   Tuple.Create("baseball glove",      new Color(1.000f, 0.000f, 0.500f)), 
   Tuple.Create("skateboard",          new Color(1.000f, 0.333f, 0.500f)), 
   Tuple.Create("surfboard",           new Color(1.000f, 0.667f, 0.500f)), 
   Tuple.Create("tennis racket",       new Color(1.000f, 1.000f, 0.500f)), 
   Tuple.Create("bottle",              new Color(0.000f, 0.333f, 1.000f)), 
   Tuple.Create("wine glass",          new Color(0.000f, 0.667f, 1.000f)), 
   Tuple.Create("cup",                 new Color(0.000f, 1.000f, 1.000f)), 
   Tuple.Create("fork",                new Color(0.333f, 0.000f, 1.000f)), 
   Tuple.Create("knife",               new Color(0.333f, 0.333f, 1.000f)), 
   Tuple.Create("spoon",               new Color(0.333f, 0.667f, 1.000f)), 
   Tuple.Create("bowl",                new Color(0.333f, 1.000f, 1.000f)), 
   Tuple.Create("banana",              new Color(0.667f, 0.000f, 1.000f)), 
   Tuple.Create("apple",               new Color(0.667f, 0.333f, 1.000f)), 
   Tuple.Create("sandwich",            new Color(0.667f, 0.667f, 1.000f)), 
   Tuple.Create("orange",              new Color(0.667f, 1.000f, 1.000f)), 
   Tuple.Create("broccoli",            new Color(1.000f, 0.000f, 1.000f)), 
   Tuple.Create("carrot",              new Color(1.000f, 0.333f, 1.000f)), 
   Tuple.Create("hot dog",             new Color(1.000f, 0.667f, 1.000f)), 
   Tuple.Create("pizza",               new Color(0.333f, 0.000f, 0.000f)), 
   Tuple.Create("donut",               new Color(0.500f, 0.000f, 0.000f)), 
   Tuple.Create("cake",                new Color(0.667f, 0.000f, 0.000f)), 
   Tuple.Create("chair",               new Color(0.833f, 0.000f, 0.000f)), 
   Tuple.Create("couch",               new Color(1.000f, 0.000f, 0.000f)), 
   Tuple.Create("potted plant",        new Color(0.000f, 0.167f, 0.000f)), 
   Tuple.Create("bed",                 new Color(0.000f, 0.333f, 0.000f)), 
   Tuple.Create("dining table",        new Color(0.000f, 0.500f, 0.000f)), 
   Tuple.Create("toilet",              new Color(0.000f, 0.667f, 0.000f)), 
   Tuple.Create("tv",                  new Color(0.000f, 0.833f, 0.000f)), 
   Tuple.Create("laptop",              new Color(0.000f, 1.000f, 0.000f)), 
   Tuple.Create("mouse",               new Color(0.000f, 0.000f, 0.167f)), 
   Tuple.Create("remote",              new Color(0.000f, 0.000f, 0.333f)), 
   Tuple.Create("keyboard",            new Color(0.000f, 0.000f, 0.500f)), 
   Tuple.Create("cell phone",          new Color(0.000f, 0.000f, 0.667f)), 
   Tuple.Create("microwave",           new Color(0.000f, 0.000f, 0.833f)), 
   Tuple.Create("oven",                new Color(0.000f, 0.000f, 1.000f)), 
   Tuple.Create("toaster",             new Color(0.000f, 0.000f, 0.000f)), 
   Tuple.Create("sink",                new Color(0.143f, 0.143f, 0.143f)), 
   Tuple.Create("refrigerator",        new Color(0.286f, 0.286f, 0.286f)), 
   Tuple.Create("book",                new Color(0.429f, 0.429f, 0.429f)), 
   Tuple.Create("clock",               new Color(0.571f, 0.571f, 0.571f)), 
   Tuple.Create("vase",                new Color(0.714f, 0.714f, 0.714f)), 
   Tuple.Create("scissors",            new Color(0.857f, 0.857f, 0.857f)), 
   Tuple.Create("teddy bear",          new Color(0.000f, 0.447f, 0.741f)), 
   Tuple.Create("hair drier",          new Color(0.314f, 0.717f, 0.741f)), 
   Tuple.Create("toothbrush",          new Color(0.50f, 0.5f, 0f)) 
};

Create BoundingBox Script

Implement the functionality for creating bounding boxes in a new script called BoundingBox. The BoundingBox class will handle creating a single bounding box along with updating its object class, position, and size. We will be creating as many BoundingBox instances as needed to handle the number of objects specified by the GetObjectCount() function.

Required Namespaces

  • TMPro: Provides advanced text rendering capabilities
    using UnityEngine;
    using TMPro;

Properties

  • bbox: A new GameObject will be created for each bounding box
  • text: The text showing the label and confidence score will be stored in a separate GameObject as it needs to be attached to a Canvas
  • canvas: To store a reference to the Label Canvas object so that we can attach the text object to it
  • info: This will store the Object struct for the bounding box
  • color: This will store the color specific to the current object class
  • lineWidth: Set the line thickness for the bounding box based on the current screen size; feel free to make this larger or smaller based on personal preference
  • fontSize: Set the font size for the bounding box text based on the current screen size;  feel free to adjust this based on personal preference
  • textContent: Displays the text for the bounding box using the TextMeshProUGUI component; we will attach this component to the text object
  • lineRenderer: Draw the bounding box itself using a LineRenderer component; we will attach this component to the bbox object

Code:

// Contains the bounding box 
private GameObject bbox = new GameObject(); 
// Contains the label text 
private GameObject text = new GameObject(); 
// The canvas on which the bounding box labels will be drawn 
private GameObject canvas = GameObject.Find("Label Canvas"); 

// The object information for the bounding box 
private Utils.Object info;
// The object class color 
private Color color; 

// The adjusted line width for the bounding box 
private int lineWidth = (int)(Screen.width * 1.75e-3); 
// The adjusted font size based on the screen size 
private float fontSize = (float)(Screen.width * 9e-3); 

// The label text 
private TextMeshProUGUI textContent; 

// Draws the bounding box 
private LineRenderer lineRenderer;

InitializeLabel()

The text for each bounding box will contain the name of the object class and the confidence score from the model output.

Make the text the color specified by the object class, left-justified, and aligned in the middle of the text box. The default size of the text box might be too small for some of the longer object names, so we shall make it a bit wider as well.

The text will be positioned just above the top-left corner of the bounding box. We can use the WorldToScreenPoint() method to map the desired in-game position to screen space.

Code:

/// <summary> 
/// Initialize the label for the bounding box 
/// </summary> 
private void InitializeLabel() 
{ 
    // Set the label text 
    textContent.text = $"{text.name}: {(info.prob * 100).ToString("0.##")}%"; 
    // Set the text color 
    textContent.color = color; 
    // Set the text alignment 
    textContent.alignment = TextAlignmentOptions.MidlineLeft; 
    // Set the font size 
    textContent.fontSize = fontSize; 
    // Resize the text area 
    RectTransform rectTransform = text.GetComponent<RectTransform>(); 
    rectTransform.sizeDelta = new Vector2(250, 50); 
    // Position the label above the top left corner of the bounding box 
    Vector3 textPos = Camera.main.WorldToScreenPoint(new Vector3(info.x0, info.y0, -10f)); 
    float xOffset = rectTransform.rect.width / 2; 
    textPos = new Vector3(textPos.x + xOffset, textPos.y + textContent.fontSize, textPos.z); 
    text.transform.position = textPos; 
}

ToggleBBox()

Toggle the visibility for the bounding box and text using the SetActive() method. Use this to hide any extra bounding boxes and to unhide them when they are needed again.

Code:

/// <summary> 
/// Toggle the visibility for the bounding box 
/// </summary> 
/// <param name="show"></param> 
public void ToggleBBox(bool show) 
{ 
    bbox.SetActive(show); 
    text.SetActive(show); 
}

InitializeBBox()

The bounding box will be the color specified by the object class. The lineRenderer will consist of five points to create a closed box: one for each corner and one to close the box.

  1. Point #0: Top-left corner
  2. Point #1: Top-right corner
  3. Point #2: Bottom-right corner
  4. Point #3: Bottom-left corner
  5. Point #4: Top-left corner (close loop)

Code:

/// <summary> 
/// Initialize the position and dimensions for the bounding box 
/// </summary> 
private void InitializeBBox() 
{  

    // Set the material color 
    lineRenderer.material.color = color; 

    // The bbox will consist of five points 
    lineRenderer.positionCount = 5; 

    // Set the width from the start point 
    lineRenderer.startWidth = lineWidth;

    // Set the width from the end point 
    lineRenderer.endWidth = lineWidth; 

    // Get object information 
    float x0 = info.x0; 
    float y0 = info.y0; 
    float width = info.width; 
    float height = info.height; 

    // Offset value to align the bounding box points 
    float offset = lineWidth / 2f; 

    // Top left point 
    Vector3 pos0 = new Vector3(x0, y0, 0); 
    lineRenderer.SetPosition(0, pos0); 
    // Top right point 
    Vector3 pos1 = new Vector3(x0 + width, y0, 0); 
    lineRenderer.SetPosition(1, pos1); 
    // Bottom right point 
    Vector3 pos2 = new Vector3(x0 + width, (y0 - height) + offset, 0); 
    lineRenderer.SetPosition(2, pos2); 
    // Bottom left point 
    Vector3 pos3 = new Vector3(x0 + offset, (y0 - height) + offset, 0); 
    lineRenderer.SetPosition(3, pos3); 
    // Closing Point 
    Vector3 pos4 = new Vector3(x0 + offset, y0 + offset, 0); 
    lineRenderer.SetPosition(4, pos4); 

    // Make sure the bounding box is visible 
    ToggleBBox(true); 
}

SetObjectInfo()

Call the InitializeLabel() and InitializeBBox() methods from a new method called SetObjectInfo(). This method will also update the names for the bbox and text objects along with the color. Use this method to update the bounding boxes based on the latest model output.

Code:

/// <summary> 
/// Update the object info for the bounding box 
/// </summary> 
/// <param name="objectInfo"></param> 
public void SetObjectInfo(Utils.Object objectInfo) 
{ 

     // Set the object info 
     info = objectInfo; 
     // Get the object class label 
     bbox.name = Utils.object_classes[objectInfo.label].Item1;
     text.name = bbox.name; 
     // Get the object class color 
     color = Utils.object_classes[objectInfo.label].Item2;

     // Initialize the label 
     InitializeLabel(); 
     // Initialize the position and dimensions 
     InitializeBBox(); 
}

Constructor

There are a few steps that need to be taken whenever a new BoundingBox instance is created.

First, attach the textContent component to the text object and make the text object a child of the Label Canvas.

Next, add the lineRenderer component to the bbox object and give it a new material with an Unlit/Color shader.

Lastly, call the SetObjectInfo() method to update the bounding box and text.

Code:

/// <summary> 
/// Constructor for the bounding box 
/// </summary> 
/// <param name="objectInfo"></param> 
public BoundingBox(Utils.Object objectInfo) 
{ 
    // Add a text component to store the label text 
    textContent = text.AddComponent<TextMeshProUGUI>(); 
    // Assign text object to the label canvas 
    text.transform.SetParent(canvas.transform); 

    // Add a line renderer to draw the bounding box 
    lineRenderer = bbox.AddComponent<LineRenderer>(); 
    // Make LineRenderer Shader Unlit 
    lineRenderer.material = new Material(Shader.Find("Unlit/Color")); 

    // Update the object info for the bounding box 
    SetObjectInfo(objectInfo); 
}
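A consuming script might maintain a pool of these boxes, creating new instances only when the model detects more objects than already exist. A minimal sketch, assuming an objectInfoArray filled by the plugin and a boundingBoxes list (both hypothetical names here, for illustration):

```csharp
// Hypothetical sketch: reuse BoundingBox instances across frames.
// The objectInfoArray and boundingBoxes parameters are assumptions for illustration.
private void UpdateBoundingBoxes(Utils.Object[] objectInfoArray, List<BoundingBox> boundingBoxes)
{
    for (int i = 0; i < objectInfoArray.Length; i++)
    {
        if (i < boundingBoxes.Count)
        {
            // Reuse an existing box and make sure it is visible
            boundingBoxes[i].SetObjectInfo(objectInfoArray[i]);
            boundingBoxes[i].ToggleBBox(true);
        }
        else
        {
            // Create a new box for the extra detection
            boundingBoxes.Add(new BoundingBox(objectInfoArray[i]));
        }
    }

    // Hide any leftover boxes from previous frames
    for (int i = objectInfoArray.Length; i < boundingBoxes.Count; i++)
    {
        boundingBoxes[i].ToggleBBox(false);
    }
}
```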

Create ObjectDetector Script

Implement the functionality for calling the plugin functions and handling the model output in a new script. Open the Scripts folder in the Assets section and create a new C# script called ObjectDetector.

Add Required Namespaces

  • System: Contains fundamental classes and base classes that define commonly-used value and reference data types, events and event handlers, interfaces, attributes, and processing exceptions
  • System.Runtime.InteropServices: Provides a wide variety of members that support COM interop and platform invoke services
  • UnityEngine.UI: Provides access to UI elements
  • UnityEngine.Video: Provides access to the functionality for the Video Player component
  • UnityEngine.Rendering: Provides access to the elements of the rendering pipeline

Code:

using System.Collections.Generic; 
using UnityEngine; 
using UnityEngine.Video; 
using UnityEngine.Rendering; 
using System; 
using System.Runtime.InteropServices; 
using UnityEngine.UI;

Add Public Variables

  • videoScreen: We need a reference to the VideoScreen object so that we can update its dimensions and access the VideoPlayer component
  • imageProcessingShader: We need a reference to the imageProcessing shader so that we can access the FlipXAxis function
  • deviceDropdown: We need access to the device dropdown menu, and update the list of available compute devices
  • modelDropdown: The script will look for available model files when the application first starts and we will update the menu options based on what model files are found
  • videoDropdown: We will use this dropdown menu to switch between the available video files
  • inference: The Inference toggle button in the UI will be used to indicate whether we want to execute the model or just play the video feed
  • useWebcam: Webcam toggle button in the UI will be used to switch between using a video file or webcam feed as input for the model
  • useAsync: Use Async toggle button will be used to turn asynchronous GPU readback on and off (more on this later)
  • consoleText: We don't have direct access to the console while running the application. However, we can capture the text that gets sent to the console and display it in the user interface. This can be useful for debugging purposes
  • videoClips: We will select those video files that we want to have available in the Inspector tab.

Code:

[Tooltip("The screen for viewing preprocessed images")] 
public Transform videoScreen; 

[Tooltip("Performs the preprocessing steps")] 
public ComputeShader imageProcessingShader; 

[Tooltip("Switch between the available compute devices for OpenVINO")] 
public TMPro.TMP_Dropdown deviceDropdown; 

[Tooltip("Switch between the available OpenVINO models")] 
public TMPro.TMP_Dropdown modelDropdown; 

[Tooltip("Switch between the available video files")] 
public TMPro.TMP_Dropdown videoDropdown; 

[Tooltip("Turn inference on and off")] 
public Toggle inference; 

[Tooltip("Use webcam feed as input")] 
public Toggle useWebcam; 

[Tooltip("Turn AsyncGPUReadback on and off")] 
public Toggle useAsync; 

[Tooltip("Text area to display console output")] 
public Text consoleText; 

[Tooltip("List of available video files")] 
public VideoClip[] videoClips;

Create DLL Method Declarations

Specify the name of the .dll file. Store this in a const string variable called dll.

Indicate that a given method is from a DLL by using the DllImport attribute.

Note: The DLL functions that return string values have IntPtr as their return values here. We will use the Marshal.PtrToStringAnsi() method to get that string value at the memory location stored in the pointer.

Code:

// Name of the DLL file 
const string dll = "OpenVINO_YOLOX_DLL"; 

[DllImport(dll)] 
private static extern IntPtr GetAvailableDevices(); 

[DllImport(dll)] 
private static extern IntPtr InitOpenVINO(string model, int width, int 
height, int device); 

[DllImport(dll)] 
private static extern void PerformInference(IntPtr inputData); 

[DllImport(dll)] 
private static extern void PopulateObjectsArray(IntPtr objects); 

[DllImport(dll)] 
private static extern int GetObjectCount(); 

[DllImport(dll)] 
private static extern void SetNMSThreshold(float threshold); 

[DllImport(dll)] 
private static extern void SetConfidenceThreshold(float threshold);
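For example, converting the pointer returned by GetAvailableDevices() into a managed string and parsing it into the device list might look like the following sketch. The helper name, the deviceList parameter, and the comma delimiter are assumptions for illustration:

```csharp
// Hypothetical sketch: read the device string returned by the plugin.
// The method name, parameter, and delimiter are assumptions for illustration.
private void ParseAvailableDevices(List<string> deviceList)
{
    // Convert the IntPtr returned by the DLL into a managed string
    string devices = Marshal.PtrToStringAnsi(GetAvailableDevices());

    // Parse the individual device names (delimiter assumed to be a comma)
    foreach (string device in devices.Split(','))
    {
        deviceList.Add(device);
    }
}
```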

Add Private Variables

  • webcamDims: Sets the target resolution for a webcam to 1280x720
  • videoDims: Keeps track of the dimensions of the video screen for different methods in the script
  • targetDims: Set the default target dims to 640x640. However, this will get updated to maintain the aspect ratio for the current video source
  • imageDims: Stores the unpadded dimensions of the image being fed to the model
  • webcamTexture: Provides access to live video input from a webcam
  • videoTexture: Stores the current source video texture
  • rTex: Stores the texture used to create the input texture
  • inputTex: Contains the input texture that will be sent to the OpenVINO™ inference engine
  • performInference: Keeps track of whether to execute the OpenVINO™ model
  • webcamFPS: Sets the target frame rate for the webcam to 60fps
  • aspectRatioScale: Used to scale the input image dimensions while maintaining aspect ratio
  • currentDevice: Current compute device for OpenVINO™
  • inputData: Stores the raw pixel data for inputTex
  • objectInfoArray: This is the Object array that we will update using the PopulateObjectsArray() function in the plugin
  • boundingBoxes: Stores the bounding boxes for detected objects
  • deviceList: Parsed list of compute devices for OpenVINO™
  • openVINOPaths: File paths for the OpenVINO™ IR models
  • openvinoModels: Names of the OpenVINO™ IR model
  • videoNames: Names of the available video files
  • canvas: A reference to the canvas for the user interface
  • graphy: A reference to the Graphy on-screen metrics
  • width: A reference to the width input field for the target image dimensions
  • height: A reference to the height input field for the target image dimensions

Code:

// The requested webcam dimensions 
    private Vector2Int webcamDims = new Vector2Int(1280, 720); 
    // The dimensions of the current video source 
    private Vector2Int videoDims; 
    // The target resolution for input images 
    private Vector2Int targetDims = new Vector2Int(640, 640); 
    // The unpadded dimensions of the image being fed to the model 
    private Vector2Int imageDims = new Vector2Int(0, 0); 
    
    // Live video input from a webcam 
    private WebCamTexture webcamTexture; 

    // The source video texture 
    private RenderTexture videoTexture; 
    // The texture used to create the input texture
    private RenderTexture rTex; 

    // Contains the input texture that will be sent to the OpenVINO inference engine 
    private Texture2D inputTex; 
  
    // Keeps track of whether to execute the OpenVINO model 
    private bool performInference = true; 

    // The requested webcam frame rate 
    private int webcamFPS = 60; 

    // Used to scale the input image dimensions while maintaining aspect ratio 
    private float aspectRatioScale; 

    // Current compute device for OpenVINO 
    private string currentDevice; 

    // Stores the raw pixel data for inputTex 
    private byte[] inputData; 
    // Stores information about detected objects 
    private Utils.Object[] objectInfoArray; 

    // Stores the bounding boxes for detected objects 
    private List<BoundingBox> boundingBoxes = new List<BoundingBox>(); 
    // Parsed list of compute devices for OpenVINO 
    private List<string> deviceList = new List<string>(); 
    // File paths for the OpenVINO IR models 
    private List<string> openVINOPaths = new List<string>(); 
    // Names of the OpenVINO IR models 
    private List<string> openvinoModels = new List<string>(); 
    // Names of the available video files 
    private List<string> videoNames = new List<string>(); 

    // A reference to the canvas for the user interface 
    private GameObject canvas; 
    // A reference to the Graphy on-screen metrics 
    private GameObject graphy; 
    // References to input fields for the target image dimensions 
    private TMPro.TMP_InputField width; 
    private TMPro.TMP_InputField height;

Create Log() Method

Capture the console output using the Application.logMessageReceived callback. We need to append a function to this callback that updates consoleText with the latest console message. The function will be called every time a log message is received, and simply appends the latest message to the existing text for consoleText.

Code:

/// <summary> 
/// Updates on-screen console text 
/// </summary> 
/// <param name="logString"></param> 
/// <param name="stackTrace"></param> 
/// <param name="type"></param> 
public void Log(string logString, string stackTrace, LogType type) 
{ 
    consoleText.text = consoleText.text + "\n " + logString; 
}

Define OnEnable() Method

Append the Log() method to the Application.logMessageReceived callback in the OnEnable() method. The OnEnable() method is called when the GameObject the script is attached to becomes enabled.

Code:

// Called when the object becomes enabled and active 
void OnEnable() 
{ 
   Application.logMessageReceived += Log; 
}

Create InitializeVideoScreen() Method

Update the position, orientation, and size of the VideoScreen object in a new method called InitializeVideoScreen. The method will take in width and height values.

First, we will set the video player component to render to a RenderTexture and set videoTexture as the target texture.

The default shader assigned to the VideoScreen object needs to be replaced with an Unlit/Texture shader. This will remove the need for the screen to be lit by an in-game light.

Lastly, we will update the dimension and position of the VideoScreen object.

Code:

/// <summary> 
/// Prepares the videoScreen GameObject to display the chosen video source. 
/// </summary> 
/// <param name="width"></param> 
/// <param name="height"></param> 
private void InitializeVideoScreen(int width, int height) 
{ 
     // Set the render mode for the video player 
     videoScreen.GetComponent<VideoPlayer>().renderMode = VideoRenderMode.RenderTexture; 

     // Use the new videoTexture for the Video Player 
     videoScreen.GetComponent<VideoPlayer>().targetTexture = videoTexture; 

     // Apply the new videoTexture to the VideoScreen GameObject 
     videoScreen.gameObject.GetComponent<MeshRenderer>().material.shader = Shader.Find("Unlit/Texture"); 
     videoScreen.gameObject.GetComponent<MeshRenderer>().material.SetTexture("_MainTex", videoTexture); 

     // Adjust the VideoScreen dimensions for the new videoTexture 
     videoScreen.localScale = new Vector3(width, height, videoScreen.localScale.z); 
     // Adjust the VideoScreen position for the new videoTexture 
     videoScreen.position = new Vector3(width / 2, height / 2, 1); 
}

Create InitializeWebcam() Method

When using a webcam, first initialize the webcamTexture with the target resolution and framerate. We will not know if the target resolution is supported by the webcam until we try playing it.

If there is no physical webcam available, Unity* may try using a virtual camera from applications like OBS. We can tell when this happens by checking the actual dimensions of the webcamTexture. If the resolution is set to 16x16, then it is safe to say something went wrong. When this happens, we will disable the webcam functionality.

If the webcamTexture is properly initialized, we will limit the application's target frame rate to match the webcam's target frame rate. This can help the application run more smoothly.

We will also disable the Video Player component. Lastly, we will update the values for videoDims with the final dimensions for the webcamTexture.

Code:

/// <summary> 
/// Try to initialize and start a webcam 
/// </summary> 
private void InitializeWebcam() 
{  
     // Create a new WebCamTexture 
     webcamTexture = new WebCamTexture(webcamDims.x, webcamDims.y, webcamFPS); 

     // Start the Camera 
     webcamTexture.Play(); 

     if (webcamTexture.width == 16) 
     {
        webcamTexture.Stop(); 
        Debug.Log("\nUnable to initialize a webcam. Disabling option.\n"); 
        useWebcam.isOn = false; 
        useWebcam.enabled = false; 
     } 
     else 
     { 
        // Limit application framerate to the target webcam framerate 
        Application.targetFrameRate = webcamFPS; 

        // Deactivate the Video Player 
        videoScreen.GetComponent<VideoPlayer>().enabled = false; 

        // Update the videoDims.y 
        videoDims.y = webcamTexture.height; 
        // Update the videoDims.x 
        videoDims.x = webcamTexture.width; 
     } 
}

Create InitializeCamera() Method

Once the VideoScreen has been updated, either for a video or webcam feed, we need to resize and reposition the in-game camera. We will do so in a new method called InitializeCamera.

We can access the Main Camera object with GameObject.Find("Main Camera"). We will set its X and Y coordinates to match the center of the VideoScreen.

The camera also needs to be set to orthographic mode to remove perspective.

Lastly, we need to update the size of the camera. The orthographicSize attribute is actually half the visible height, so we need to divide videoDims.y (i.e. the height) by 2.

Code:

/// <summary> 
/// Resizes and positions the in-game Camera to accommodate the video dimensions 
/// </summary> 
private void InitializeCamera() 
{ 
   // Get a reference to the Main Camera GameObject 
   GameObject mainCamera = GameObject.Find("Main Camera"); 
   // Adjust the camera position to account for updates to the VideoScreen 
   mainCamera.transform.position = new Vector3(videoDims.x / 2, videoDims.y / 2, -10f); 
   // Render objects with no perspective (i.e. 2D) 
   mainCamera.GetComponent<Camera>().orthographic = true; 
   // Adjust the camera size to account for updates to the VideoScreen 
   int orthographicSize; 
   if (((float)Screen.width / Screen.height) < ((float)videoDims.x / videoDims.y)) 
   { 
        float scale = ((float)Screen.width / Screen.height) / ((float)videoDims.x / videoDims.y); 
        orthographicSize = (int)((videoDims.y / 2) / scale); 
   } 
   else 
   { 
        orthographicSize = (int)(videoDims.y / 2); 
   } 

    Debug.Log($"Orthographic Size: {orthographicSize}"); 
    mainCamera.GetComponent<Camera>().orthographicSize = orthographicSize; 
}
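To make the sizing logic concrete, here is a standalone sketch that mirrors the orthographic-size calculation above. It is written in Python purely so it can be run outside Unity*; the function name and sample resolutions are illustrative, not part of the project:

```python
def orthographic_size(screen_w, screen_h, video_w, video_h):
    """Mirror of the InitializeCamera() sizing logic: orthographicSize is
    half the visible height, enlarged when the screen is narrower than
    the video so the full video width still fits on screen."""
    screen_aspect = screen_w / screen_h
    video_aspect = video_w / video_h
    if screen_aspect < video_aspect:
        # Screen is narrower than the video: zoom out so the full width fits
        scale = screen_aspect / video_aspect
        return int((video_h / 2) / scale)
    # Screen is at least as wide as the video: half the video height is enough
    return int(video_h / 2)

# 16:9 screen with a 16:9 video: no extra zoom needed
print(orthographic_size(1920, 1080, 1280, 720))  # 360
# Portrait screen with a landscape video: the camera zooms out
print(orthographic_size(720, 1280, 1280, 720))   # 1137
```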

Create InitializeTextures() Method

Whenever the target input resolution is updated, we need to make sure that it maintains the same aspect ratio as the source video or webcam feed. Feeding a stretched or squashed image to the model could impact the model's accuracy.

Once we have adjusted the target input resolution, we can update rTex and inputTex with the new dimensions. We will also update the width and height input fields to make sure the user knows the adjusted resolution.

Code:

/// <summary> 
/// Calculate the dimensions for the input image 
/// </summary> 
/// <param name="newVideo"></param> 
private void InitializeTextures(bool newVideo = false) 
{ 
    if (newVideo) 
    { 
       // Calculate scale for new aspect ratio 
       int min = Mathf.Min(videoTexture.width, videoTexture.height); 
       int max = Mathf.Max(videoTexture.width, videoTexture.height); 
       aspectRatioScale = (float)min / max; 

       // Adjust the smallest input dimension to maintain the new aspect ratio 
       if (max == videoTexture.height) 
       { 
           imageDims.x = (int)(targetDims.y * aspectRatioScale); 
           imageDims.y = targetDims.y; 
       } 
       else 
       { 
           imageDims.y = (int)(targetDims.x * aspectRatioScale);
           imageDims.x = targetDims.x; 
       } 
   } 
   else 
   { 
       // Adjust the input dimensions to maintain the current aspect ratio 
       if (imageDims.x != targetDims.x) 
       { 
          imageDims.x = targetDims.x; 
          aspectRatioScale = (float)videoTexture.height / videoTexture.width; 
          imageDims.y = (int)(targetDims.x * aspectRatioScale); 
          targetDims.y = imageDims.y; 
       } 
       if (imageDims.y != targetDims.y) 
       { 
          imageDims.y = targetDims.y; 
          aspectRatioScale = (float)videoTexture.width / videoTexture.height; 
          imageDims.x = (int)(targetDims.y * aspectRatioScale); 
          targetDims.x = imageDims.x; 
        } 
    } 

    // Initialize the RenderTexture that will store the processed input image 
    rTex = RenderTexture.GetTemporary(imageDims.x, imageDims.y, 24, RenderTextureFormat.ARGB32); 
    // Update inputTex with the new dimensions 
    inputTex = new Texture2D(imageDims.x, imageDims.y, TextureFormat.RGBA32, false); 

    // Update the values for the width and height input fields 
    Debug.Log($"Setting Input Dims to W: {imageDims.x} x H: {imageDims.y}"); 
    width.text = $"{imageDims.x}"; 
    height.text = $"{imageDims.y}"; 
 }
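The aspect-ratio math in the newVideo branch can be sanity-checked in isolation. This small Python sketch (illustrative only; the Unity* project itself stays in C#) reproduces the calculation for landscape and portrait sources:

```python
def fit_input_dims(video_w, video_h, target=640):
    """Mirror of the newVideo branch of InitializeTextures(): keep the
    target size on the larger side and shrink the smaller side so the
    input matches the source aspect ratio."""
    aspect_ratio_scale = min(video_w, video_h) / max(video_w, video_h)
    if video_h >= video_w:
        # Height is the larger dimension: keep the target height
        return int(target * aspect_ratio_scale), target
    # Width is the larger dimension: keep the target width
    return target, int(target * aspect_ratio_scale)

print(fit_input_dims(1280, 720))  # (640, 360)
print(fit_input_dims(720, 1280))  # (360, 640)
```

A 1280x720 video therefore yields a 640x360 model input rather than a squashed 640x640 one.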

Create InitializeDropdowns() Method

This method will simply clear the default options for the dropdown menus in the UI, add the current options, and select the first option in the list.

Code:

/// <summary> 
/// Initialize the options for the dropdown menus 
/// </summary> 
private void InitializeDropdowns() 
{

    // Remove default dropdown options 
    deviceDropdown.ClearOptions(); 
    // Add OpenVINO compute devices to dropdown 
    deviceDropdown.AddOptions(deviceList); 
    // Set the value for the dropdown to the current compute device 
    deviceDropdown.SetValueWithoutNotify(deviceList.IndexOf(currentDevice)); 

    // Remove default dropdown options 
    videoDropdown.ClearOptions(); 
    // Add video names to dropdown 
    videoDropdown.AddOptions(videoNames); 
    // Select the first option in the dropdown 
    videoDropdown.SetValueWithoutNotify(0); 

    // Remove default dropdown options 
    modelDropdown.ClearOptions(); 
    // Add OpenVINO models to menu 
    modelDropdown.AddOptions(openvinoModels); 
    // Select the first option in the dropdown 
    modelDropdown.SetValueWithoutNotify(0); 
}

Create InitializeOpenVINO() Method

We will call the InitOpenVINO() function from the plugin in a new method called InitializeOpenVINO(). The InitOpenVINO() function will only be called when the inference toggle is checked.

Code:

/// <summary> 
/// Called when a model option is selected from the dropdown 
/// </summary> 
public void InitializeOpenVINO() 
{ 
      // Only initialize OpenVINO when performing inference 
      if (performInference == false) return; 

      Debug.Log("Initializing OpenVINO"); 
      Debug.Log($"Selected Model: {openvinoModels[modelDropdown.value]}"); 
      Debug.Log($"Selected Model Path: {openVINOPaths[modelDropdown.value]}"); 
      Debug.Log($"Setting Input Dims to W: {imageDims.x} x H: {imageDims.y}"); 
      Debug.Log("Uploading IR Model to Compute Device"); 

      // Set up the neural network for the OpenVINO inference engine 
      currentDevice = Marshal.PtrToStringAnsi(InitOpenVINO( 
           openVINOPaths[modelDropdown.value], 
           inputTex.width, 
           inputTex.height, 
           deviceDropdown.value));

      Debug.Log($"OpenVINO using: {currentDevice}"); 
}

Create InitializationSteps() Method

We will call the initialization methods in a new method called InitializationSteps(). This method will be called each time the model input is updated. This could be from changing the target input resolution, selecting a different video, or switching between using a webcam.

Code:

/// <summary> 
/// Perform the initialization steps required when the model input is updated 
/// </summary> 
private void InitializationSteps() 
{ 
    if (useWebcam.isOn) 
    { 
       // Initialize webcam 
       InitializeWebcam(); 
    } 
    else 
    { 
       Debug.Log($"Selected Video: {videoDropdown.value}"); 
 
       // Set Initial video clip 
       videoScreen.GetComponent<VideoPlayer>().clip = videoClips[videoDropdown.value]; 
       // Update the videoDims.y 
       videoDims.y = (int)videoScreen.GetComponent<VideoPlayer>().height; 
       // Update the videoDims.x 
       videoDims.x = (int)videoScreen.GetComponent<VideoPlayer>().width; 
    } 

    // Create a new videoTexture using the current video dimensions 
    videoTexture = RenderTexture.GetTemporary(videoDims.x, videoDims.y, 24, RenderTextureFormat.ARGB32); 

    // Initialize the videoScreen 
    InitializeVideoScreen(videoDims.x, videoDims.y); 
    // Adjust the camera based on the source video dimensions 
    InitializeCamera(); 
    // Initialize the textures that store the model input 
    InitializeTextures(true); 
    // Set up the neural network for the OpenVINO inference engine 
    InitializeOpenVINO(); 
}

Create GetOpenVINOModels() Method

Before performing the initialization steps, get a list of available OpenVINO™ models. The models folder from part 1 of this tutorial will be placed in the same directory as the executable for the Unity* application. Search through the subfolders to obtain the paths to the .xml model files.

The actual .xml files are all named yolox_10.xml, so use the names of their respective parent directories for the dropdown menu options.

Code:

/// <summary> 
/// Get the list of available OpenVINO models 
/// </summary> 
private void GetOpenVINOModels() 
{ 
    // Get the subdirectories containing the available models 
    string[] modelDirs = System.IO.Directory.GetDirectories("models"); 

    // Get the model files in each subdirectory 
    List<string> openVINOFiles = new List<string>(); 
    foreach (string dir in modelDirs) 
    { 
        openVINOFiles.AddRange(System.IO.Directory.GetFiles(dir)); 
    } 
    
    // Get the paths for the .xml files for each model 
    Debug.Log("Available OpenVINO Models:"); 
    foreach (string file in openVINOFiles) 
    { 
       if (file.EndsWith(".xml")) 
       { 
          openVINOPaths.Add(file); 
          string modelName = file.Split('\\')[1]; 
          openvinoModels.Add(modelName); 

          Debug.Log($"Model Name: {modelName}"); 
          Debug.Log($"File Path: {file}"); 
        } 
     } 
     Debug.Log(""); 
}

Define Start() Method

When the application starts, we will get references to the canvas, graphy, width, and height objects in the Hierarchy tab.

The OpenVINO™ plugin project has only been tested on Intel hardware, so we will confirm that Intel hardware is available. If it is, we will get the available OpenVINO™ models and call
the GetAvailableDevices() function to see what compute devices are available for OpenVINO™.

If there is no Intel hardware available, we will disable the option to perform inference, but still call the other initialization steps.

Code:

// Start is called before the first frame update 
void Start() 
{ 
    // Get references to GameObjects in hierarchy 
    canvas = GameObject.Find("Canvas"); 
    graphy = GameObject.Find("[Graphy]"); 
    width = GameObject.Find("Width").GetComponent<TMPro.TMP_InputField>(); 
    height = GameObject.Find("Height").GetComponent<TMPro.TMP_InputField>(); 

    // Check if either the CPU or GPU is made by Intel 
    string processorType = SystemInfo.processorType.ToString(); 
    string graphicsDeviceName = SystemInfo.graphicsDeviceName.ToString(); 
    if (processorType.Contains("Intel") || graphicsDeviceName.Contains("Intel")) 
    { 
         // Get the list of available models 
         GetOpenVINOModels(); 

         // Get an unparsed list of the available compute devices 
         string openvinoDevices = Marshal.PtrToStringAnsi(GetAvailableDevices()); 

         Debug.Log($"Available Devices:"); 
         // Parse list of available compute devices 
         foreach (string device in openvinoDevices.Split(',')) 
         { 
              // Add device name to list 
              deviceList.Add(device); 
              Debug.Log(device); 
         } 
      } 
      else 
      { 
          inference.isOn = performInference = inference.enabled = false; 
          Debug.Log("No Intel hardware detected"); 
      } 

      // Get the names of the video clips 
      foreach (VideoClip clip in videoClips) videoNames.Add(clip.name); 

      // Initialize the dropdown menus 
      InitializeDropdowns();

      // Perform the required initialization steps 
      InitializationSteps(); 
}

Create FlipImage() Method

Next, make a new method to execute the FlipXAxis() function in our ComputeShader. This method will take in the image that needs to be processed as well as a function name to indicate which function we want to execute.

Method Steps

  1. Get the ComputeShader index for the specified function
  2. Create a temporary RenderTexture with random write access enabled to store the processed image
  3. Execute the ComputeShader
  4. Copy the processed image back into the original RenderTexture
  5. Release the temporary RenderTexture

Code:

/// <summary> 
/// Perform a flip operation on the GPU 
/// </summary> 
/// <param name="image">The image to be flipped</param> 
/// <param name="functionName">The name of the function to execute in the compute shader</param> 
private void FlipImage(RenderTexture image, string functionName) 
{ 
    // Specify the number of threads on the GPU 
    int numthreads = 4; 
    // Get the index for the specified function in the ComputeShader 
    int kernelHandle = imageProcessingShader.FindKernel(functionName); 

    // Allocate a temporary RenderTexture 
    RenderTexture result = RenderTexture.GetTemporary(image.width, image.height, 24, image.format); 
     // Enable random write access 
     result.enableRandomWrite = true; 
     // Create the RenderTexture 
     result.Create(); 

     // Set the value for the Result variable in the ComputeShader 
     imageProcessingShader.SetTexture(kernelHandle, "Result", result); 
     // Set the value for the InputImage variable in the ComputeShader 
     imageProcessingShader.SetTexture(kernelHandle, "InputImage", image); 
     // Set the value for the height variable in the ComputeShader
     imageProcessingShader.SetInt("height", image.height); 
     // Set the value for the width variable in the ComputeShader 
     imageProcessingShader.SetInt("width", image.width); 
     

     // Execute the ComputeShader 
     imageProcessingShader.Dispatch(kernelHandle, image.width / numthreads, image.height / numthreads, 1); 
     

     // Copy the flipped image back into the original RenderTexture 
     Graphics.Blit(result, image); 
     
     // Release the temporary RenderTexture 
     RenderTexture.ReleaseTemporary(result); 
}
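The shader itself was written in an earlier part of the tutorial, but the operation is easy to describe on the CPU. Assuming FlipXAxis mirrors the image across the horizontal axis (i.e. reverses the row order), a Python reference looks like this; the function and sample data are illustrative only:

```python
def flip_x_axis(rows):
    """CPU reference for the FlipXAxis kernel: mirror the image across
    the horizontal axis by reversing the order of the pixel rows."""
    return rows[::-1]

image = [[1, 2],   # top row
         [3, 4],
         [5, 6]]   # bottom row
print(flip_x_axis(image))  # [[5, 6], [3, 4], [1, 2]]
```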

Create OnCompleteReadback() Callback

We currently need to download the pixel data for rTex from the GPU to the CPU. This normally causes a pipeline stall, as Unity* blocks execution on the main thread to keep the data from changing before it has finished downloading to the CPU. This can cause a noticeable performance bottleneck that increases with the amount of pixel data to download.

Unity* provides an alternative approach with AsyncGPUReadback that does not block the main thread. However, it adds a few frames of latency. This may or may not matter depending on the specific application.

This function will be called once the AsyncGPUReadback has completed. We can load the raw pixel data from the request directly to inputTex.

Code:

/// <summary> 
/// Called once AsyncGPUReadback has been completed 
/// </summary> 
/// <param name="request"></param> 
void OnCompleteReadback(AsyncGPUReadbackRequest request) 
{ 
     if (request.hasError) 
     { Debug.Log("GPU readback error detected."); 
       return; 
     } 
     
     // Fill Texture2D with raw data from the AsyncGPUReadbackRequest 
     inputTex.LoadRawTextureData(request.GetData<uint>()); 
     // Apply changes to Texture2D 
     inputTex.Apply(); 
}

Create UploadTexture() Method

In this method, we send the current pixel data from inputTex to the OpenVINO™ plugin. We use the unsafe keyword since we will create a pointer to the inputData array. Unity* does not allow unsafe code by default, so we need to enable it in the Project Settings.

Pin the memory for inputData using a fixed statement and get a pointer to the variable. Call the PerformInference() method with the pointer as input.

After executing the model, call the GetObjectCount() function so that objectInfoArray can be initialized with the correct length.

Lastly, we need to pin the memory for the objectInfoArray, before calling the PopulateObjectsArray() function.

Code:

/// <summary> 
/// Pin memory for the input data and send it to OpenVINO for inference 
/// </summary> 
/// <param name="inputData"></param> 
public unsafe void UploadTexture(byte[] inputData) 
{ 
   //Pin Memory 
   fixed (byte* p = inputData) 
   { 
       // Perform inference 
       PerformInference((IntPtr)p); 
   } 

   // Get the number of detected objects 
   int numObjects = GetObjectCount(); 
   // Initialize the array 
   objectInfoArray = new Utils.Object[numObjects]; 

   // Pin memory 
   fixed (Utils.Object* o = objectInfoArray) 
   { 
       // Get the detected objects 
       PopulateObjectsArray((IntPtr)o); 
   } 
}

Create UpdateBoundingBoxes() Method

After getting the object info for the current output predictions, update the boundingBoxes array with the new info. The position and dimensions for the bounding boxes need to be scaled based on the dimensions of the videoTexture. Flip the bounding box coordinates vertically.

Since the model does not keep track of unique objects across video frames, reuse the existing bounding box objects and add new ones only as needed.

We are unlikely to need the same number of bounding boxes all the time, so we will deactivate any extras.

Code:

/// <summary> 
/// Update the list of bounding boxes based on the latest output from the model 
/// </summary> 
private void UpdateBoundingBoxes() 
{ 
    // Process new detected objects 
    for (int i = 0; i < objectInfoArray.Length; i++) 
    { 
       // The smallest dimension of the videoTexture 
       int minDimension = Mathf.Min(videoTexture.width, videoTexture.height); 

       // The value used to scale the bbox locations up to the source resolution 
       float scale = (float)minDimension / Mathf.Min(imageDims.x, imageDims.y); 

       // Flip the bbox coordinates vertically 
       objectInfoArray[i].y0 = rTex.height - objectInfoArray[i].y0; 

       objectInfoArray[i].x0 *= scale; 
       objectInfoArray[i].y0 *= scale; 
       objectInfoArray[i].width *= scale; 
       objectInfoArray[i].height *= scale; 

       // Update bounding box list with new object info 
       try 
       { 
          boundingBoxes[i].SetObjectInfo(objectInfoArray[i]); 
       } 
       catch 
       { 
           // Add a new bounding box object when needed 
           boundingBoxes.Add(new BoundingBox(objectInfoArray[i])); 
       } 
    } 
    
    // Turn off extra bounding boxes 
    for (int i = 0; i < boundingBoxes.Count; i++) 
    { 

       if (i > objectInfoArray.Length - 1)
       { 
           boundingBoxes[i].ToggleBBox(false); 
       } 
    } 
}
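The coordinate mapping above reduces to one scale factor plus a vertical flip. This standalone Python sketch (names and sample numbers are illustrative, not part of the project) mirrors the per-object math in UpdateBoundingBoxes():

```python
def to_source_coords(x0, y0, w, h, video_dims, input_dims):
    """Map a detection from model-input space to source-video space:
    flip the y origin, then scale everything up to the source resolution."""
    video_w, video_h = video_dims
    in_w, in_h = input_dims
    # Ratio between the smallest video and smallest input dimensions
    scale = min(video_w, video_h) / min(in_w, in_h)
    # Flip the bbox coordinates vertically (rTex.height - y0)
    y0 = in_h - y0
    return x0 * scale, y0 * scale, w * scale, h * scale

# A box found in a 640x360 input maps onto the 1280x720 source at 2x
print(to_source_coords(100, 50, 40, 80, (1280, 720), (640, 360)))
# -> (200.0, 620.0, 80.0, 160.0)
```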

Define Update() Method

The UI and performance metrics can take up a lot of the screen, so we will toggle their visibility with the spacebar using the SetActive() method.

If we are using a webcam, we need to copy the pixel data from the webcamTexture to the videoTexture.

When not performing inference, just skip the rest of this method.

While performing inference, we need to copy the videoTexture to rTex so that we don't manipulate the video feed displayed to the viewers.

Flip the image, download the pixel data from the GPU to the CPU, execute the model, and update the bounding boxes.

Code:

// Update is called once per frame 
void Update() 
{ 
    // Toggle the user interface 
    if (Input.GetKeyDown("space")) 
    { 
       canvas.SetActive(!canvas.activeInHierarchy); 
       graphy.SetActive(!graphy.activeInHierarchy); 
    } 
   
    // Copy webcamTexture to videoTexture if using webcam 
    if (useWebcam.isOn) Graphics.Blit(webcamTexture, videoTexture); 

    // Skip the rest of the method when not performing inference 
    if (performInference == false) return; 

    // Copy the videoTexture to the rTex RenderTexture 
    Graphics.Blit(videoTexture, rTex); 

    // Flip image before sending to DLL 
    FlipImage(rTex, "FlipXAxis"); 

    // Download pixel data from GPU to CPU 
    if (useAsync.isOn) 
    {
         AsyncGPUReadback.Request(rTex, 0, TextureFormat.RGBA32, OnCompleteReadback); 
    } 
    else 
    { 
        RenderTexture.active = rTex; 
        inputTex.ReadPixels(new Rect(0, 0, rTex.width, rTex.height), 0, 0); 
        inputTex.Apply(); 
    } 

    // Send reference to inputData to DLL 
    UploadTexture(inputTex.GetRawTextureData()); 

    // Update bounding boxes with new object info 
    UpdateBoundingBoxes(); 
}

Create UpdateInputDims() Method

This method will be called when the input dimensions are updated from the user interface. We first need to parse the user input for the width and height input fields and use the values to update targetDims. We can then call InitializeTextures() and InitializeOpenVINO().

Code:

/// <summary> 
/// Called when the input dimensions are updated in the GUI 
/// </summary> 
public void UpdateInputDims() 
{ 
    // Parse the new width value 
    int newWidth; 
    int.TryParse(width.text, out newWidth); 
    // Parse the new height value 
    int newHeight; 
    int.TryParse(height.text, out newHeight); 
    // Update target dims 
    targetDims = new Vector2Int(newWidth, newHeight); 
    // Initialize the textures that store the model input 
    InitializeTextures(); 
    // Set up the neural network for the OpenVINO inference engine 
    InitializeOpenVINO(); 
}

Create UpdateInferenceValue() Method

This method will be called when the user interacts with the inference toggle. If the toggle is turned on, we will call the InitializeOpenVINO() method. Otherwise, we will disable all the bounding boxes.

Code:

/// <summary> 
/// Called when the value for the Inference toggle is updated 
/// </summary> 
public void UpdateInferenceValue() 
{ 
    // Update the performInference value 
    performInference = inference.isOn; 

    if (performInference) 
    { 
        InitializeOpenVINO(); 
    } 
    else 
    { 
       // Hide all bounding boxes when not performing inference 
       for (int i = 0; i < boundingBoxes.Count; i++) 
       { 
          boundingBoxes[i].ToggleBBox(false); 
       } 
    } 
}

Create UpdateNMSThreshold() Method

This method will be called when the NMS threshold value is updated. It simply parses the user input and then calls the SetNMSThreshold() function in the plugin.

Code:

/// <summary> 
/// Called when the NMS threshold value is updated in the GUI 
/// </summary> 
/// <param name="inputField"></param> 
public void UpdateNMSThreshold(TMPro.TMP_InputField inputField) 
{ 
    // Parse the input field value 
    float threshold; 
    float.TryParse(inputField.text, out threshold); 
    // Clamp threshold value between 0 and 1 
    threshold = Mathf.Min(threshold, 1f); 
    threshold = Mathf.Max(0f, threshold); 
    // Update the threshold value 
    inputField.text = $"{threshold}";
    SetNMSThreshold(threshold); 
}

Create UpdateConfidenceThreshold() Method

Likewise, this method is called when the confidence score threshold is updated, and calls the SetConfidenceThreshold() function in the plugin.

Code:

/// <summary> 
/// Called when the confidence threshold is updated in the GUI 
/// </summary> 
/// <param name="inputField"></param> 
public void UpdateConfThreshold(TMPro.TMP_InputField inputField) 
{ 
    // Parse the input field value 
    float threshold; 
    float.TryParse(inputField.text, out threshold); 
    // Clamp threshold value between 0 and 1 
    threshold = Mathf.Min(threshold, 1f); 
    threshold = Mathf.Max(0f, threshold); 
    // Update the threshold value 
    inputField.text = $"{threshold}"; 
    SetConfidenceThreshold(threshold); 
}

Create UpdateVideo() Method

This method will be called when a new video is selected from the video dropdown menu. We will only call the InitializationSteps() method while using the video player and not a webcam.

Code:

/// <summary> 
/// Called when a video option is selected from the dropdown 
/// </summary> 
public void UpdateVideo() 
{ 
   if (videoScreen.GetComponent<VideoPlayer>().enabled == false) return; 
   
   Debug.Log($"Selected Video: {videoDropdown.value}"); 
   InitializationSteps(); 
}

Create UseWebcam() Method

This method will be called when the user interacts with the useWebcam toggle. If the user wants to use a webcam, first confirm that a webcam is available. If not, we will switch the toggle back off.

If the user does not want to use a webcam, we will stop the webcam feed and re-enable the video player.

We need to call the InitializationSteps() method whether we are starting or stopping the webcam.

Code:

/// <summary> 
/// Called when the value for the Use Webcam toggle is updated 
/// </summary> 
public void UseWebcam() 
{ 
    if (useWebcam.isOn) 
    { 
        WebCamDevice[] devices = WebCamTexture.devices; 
        for (int i = 0; i < devices.Length; i++) 
        { 
            Debug.Log(devices[i].name); 
        } 
    
        if (WebCamTexture.devices.Length == 0) 
        { 
            Debug.Log("No webcam device detected."); 
            useWebcam.SetIsOnWithoutNotify(false); 
        } 
     } 
     else 
     { 
         // Stop the webcam 
         webcamTexture.Stop(); 
         // Activate the Video Player 
         videoScreen.GetComponent<VideoPlayer>().enabled = true; 
     } 

     InitializationSteps(); 
}

Define OnDestroy() Method

When the script stops running, we will remove the Log() method from the logMessageReceived callback.

Code:

// Called when the MonoBehaviour will be destroyed 
private void OnDestroy() 
{ 
    Application.logMessageReceived -= Log; 
}

Create Quit() Method

This method will be called when the Quit button is clicked in the user interface and will cause the application to exit.

Code:

/// <summary> 
/// Called when the Quit button is clicked. 
/// </summary> 
public void Quit() 
{ 
    // Causes the application to exit 
    Application.Quit(); 
}

That takes care of the required code for this project.

Attach Script to GameObject

To use the ObjectDetector script, attach it to a GameObject. Right-click an empty space in the Hierarchy tab and select Create Empty. Name the new object Object Detector.

With the object still selected, drag and drop the ObjectDetector script from the Assets folder into the Inspector tab.

Now we can drag and drop the objects in the Hierarchy tab into their associated spots in the Inspector tab.

Next, we will need to click the small lock button at the top of the Inspector tab to keep the Object Detector selected.

Now we can add the video files by selecting all of them in the Assets section and dragging them onto the Video Clips slot in the Inspector tab. We can unlock the Inspector tab after adding the videos.

Assign UI Events

The last step needed before we can build the project is to assign the UI events.

Input Dims

Open the UpdateInputDims container inside the Canvas object and select Width.

In the Inspector tab, scroll down to On End Edit.

Drag and drop the Object Detector object from the Hierarchy tab onto the None (Object) slot.

Click on the No Function dropdown menu and open the ObjectDetector section. Select the UpdateInputDims() method from the list.

Perform the same steps for the Height object.

NMS Threshold

Open the UpdateNMSThreshold container inside the Canvas object and select Threshold.

In the Inspector tab, scroll down to On End Edit. Drag and drop the Object Detector object from the Hierarchy tab onto the None (Object) slot. This time, select the UpdateNMSThreshold(TMP_InputField) option from the function menu.

Confidence Threshold

Open the UpdateConfidenceThreshold container inside the Canvas object and select Threshold. In the Inspector tab, scroll down to On End Edit. Drag and drop the Object Detector object from the Hierarchy tab onto the None (Object) slot. This time, select the UpdateConfThreshold(TMP_InputField) option from the function menu.

Device Dropdown

Select the Device object inside the Canvas. In the Inspector tab, scroll down to On Value Changed (Int32). Drag and drop the Object Detector object from the Hierarchy tab onto the None (Object) slot. This time, select the InitializeOpenVINO() option from the function menu.

Model Dropdown

Select the Model object inside the Canvas. In the Inspector tab, scroll down to On Value Changed (Int32). Drag and drop the Object Detector object from the Hierarchy tab onto the None (Object) slot. Again, select the InitializeOpenVINO() option from the function menu.

Video Dropdown

Select the Video object inside the Canvas. In the Inspector tab, scroll down to On Value Changed (Int32). Drag and drop the Object Detector object from the Hierarchy tab onto the None (Object) slot. This time, select the UpdateVideo() option from the function menu.

Webcam Toggle

Select the Webcam object inside the Canvas. In the Inspector tab, scroll down to On Value Changed (Boolean). Drag and drop the Object Detector object from the Hierarchy tab onto the None (Object) slot. This time, select the UseWebcam() option from the function menu.

Inference

Select the Inference object inside the Canvas. In the Inspector tab, scroll down to On Value Changed (Boolean). Drag and drop the Object Detector object from the Hierarchy tab onto the None (Object) slot. This time, select the UpdateInferenceValue() option from the function menu.

Quit

Select the Quit object inside the Canvas. In the Inspector tab, scroll down to On Click(). Drag and drop the Object Detector object from the Hierarchy tab onto the None (Object) slot. This time, select the Quit() option from the function menu.
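As an aside, the same UI events can also be wired up in code instead of through the Inspector. The sketch below is a hypothetical alternative, not part of the tutorial project: it assumes the field names shown here and the ObjectDetector methods defined earlier in this series. The underscore discards the int, string, or bool value that each UnityEvent passes, since the ObjectDetector methods read the UI state directly.

```csharp
using UnityEngine;
using UnityEngine.UI;
using TMPro;

// Hypothetical helper that performs the Inspector wiring in code.
// Assign the fields in the Inspector before entering Play mode.
public class UIEventWiring : MonoBehaviour
{
    public TMP_Dropdown deviceDropdown, modelDropdown, videoDropdown;
    public TMP_InputField widthField, heightField;
    public Toggle webcamToggle, inferenceToggle;
    public Button quitButton;
    public ObjectDetector detector;

    void Start()
    {
        // On Value Changed (Int32) events
        deviceDropdown.onValueChanged.AddListener(_ => detector.InitializeOpenVINO());
        modelDropdown.onValueChanged.AddListener(_ => detector.InitializeOpenVINO());
        videoDropdown.onValueChanged.AddListener(_ => detector.UpdateVideo());

        // On End Edit events
        widthField.onEndEdit.AddListener(_ => detector.UpdateInputDims());
        heightField.onEndEdit.AddListener(_ => detector.UpdateInputDims());

        // On Value Changed (Boolean) events
        webcamToggle.onValueChanged.AddListener(_ => detector.UseWebcam());
        inferenceToggle.onValueChanged.AddListener(_ => detector.UpdateInferenceValue());

        // On Click event
        quitButton.onClick.AddListener(detector.Quit);
    }
}
```

Wiring events in code keeps the assignments visible in version control, but the Inspector approach used in this tutorial requires no extra script.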

Build the Project

Now, build the completed project. First, press Ctrl+S to save the project. Then, open the File menu and select Build Settings...

Click Build in the popup window. You will be prompted to select a folder to store the files generated during the build.

Create a new folder in the default location and name it Build. Click Select Folder.

Once the build is complete, a File Explorer window will open with the project executable selected.

Add Models Folder

Copy and paste the models folder from Part 1 into the folder with the project executable.

Add Plugins Folder

Open the OpenVINO_YOLOX_Demo_Data folder inside the Build directory, then copy and paste the Plugins folder from Part 2 into it.

Run the Application

At last, we can test our project. Double-click the executable to run it. Remember that the application will be slow to launch the first time, as the cache files are generated.

Conclusion

We now have a general workflow for implementing real-time object detection with the OpenVINO™ Toolkit inside the Unity* game engine. The performance improvements from using OpenVINO™ make the larger variants of the YOLOX model usable without high-end GPUs. The XMX engines in upcoming Intel® Arc™ GPUs will make it possible to run these models without taking away compute resources from the main application.

Next Steps

As mentioned previously, these models can be trained for a wide variety of applications. They can even be combined with other models to add new capabilities like tracking unique objects across multiple frames.

Instructions for training these models on custom datasets can be found in the official documentation. Alternatively, online services like Roboflow can make training more convenient. Trained models need to be converted from PyTorch to ONNX (instructions) before being converted to OpenVINO™ (instructions).

Corresponding Tutorial Links

Developing OpenVINO™ Object Detection Inferencing Plugin for Unity* Tutorial: Part 1 – Setup

Developing OpenVINO™ Object Detection Inferencing Plugin for Unity* Tutorial: Part 2 – Creating the DLL

Related Links

In-Game Style Transfer Tutorial in Unity

Targeted GameObject Style Transfer Tutorial in Unity

Developing OpenVINO Style Transfer Inferencing Plugin Tutorial for Unity

Project Resources

GitHub Repository