Harlinn.AI.ONNX.DirectML
Harlinn.AI.ONNX.DirectML executes an AI workload asynchronously using DirectML. It is currently a successful experiment demonstrating how the ONNX Runtime, with the DirectML execution provider, can run inference on the GPU while simultaneously rendering a video smoothly on the same GPU.
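For reference, here is a minimal sketch of how an ONNX Runtime session can be attached to the DirectML execution provider through `OrtSessionOptionsAppendExecutionProvider_DML`. The model path and logging tag are placeholders, and this is not the app's actual initialization code:

```cpp
#include <onnxruntime_cxx_api.h>
#include <dml_provider_factory.h>

int main()
{
    Ort::Env env{ ORT_LOGGING_LEVEL_WARNING, "dml-demo" };

    Ort::SessionOptions options;
    // The DirectML execution provider requires memory pattern optimization
    // to be disabled and sequential execution to be used.
    options.DisableMemPattern();
    options.SetExecutionMode(ExecutionMode::ORT_SEQUENTIAL);

    // Attach the DirectML execution provider on adapter/device 0.
    Ort::ThrowOnError(
        OrtSessionOptionsAppendExecutionProvider_DML(options, 0));

    // "yolov9.onnx" is a placeholder path, not the repository's model file.
    Ort::Session session{ env, L"yolov9.onnx", options };

    // ... bind inputs, call session.Run(...), consume outputs ...
}
```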
This is demonstrated by the Yolo9 example app, which is a modified version of yolov9_npu.
The original yolov9_npu app did not perform as well as expected, because the inference load interferes with timely rendering of the video.
This experiment demonstrates that DirectX workloads can execute concurrently with good performance. It also demonstrates an execution model, based on the active object design pattern, that should enable concurrent execution of different inference workloads. This provides an elegant way to run multiple ONNX sessions, each with its own dedicated thread of execution, as sketched below.
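To illustrate the idea, here is a minimal, self-contained sketch of an active object: a dedicated worker thread draining a queue of posted tasks, where each task would typically wrap one `session.Run` call plus a completion callback. All names here (`InferenceActor`, `Post`, and so on) are hypothetical and not part of the library or this repository:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

class InferenceActor
{
public:
    using Task = std::function<void()>;

    InferenceActor() : worker_{ [this] { Run(); } } {}

    ~InferenceActor()
    {
        {
            std::lock_guard lock{ mutex_ };
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Called from any thread; the task executes on this actor's own thread,
    // so each ONNX session is only ever touched from one thread.
    void Post(Task task)
    {
        {
            std::lock_guard lock{ mutex_ };
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void Run()
    {
        for (;;)
        {
            Task task;
            {
                std::unique_lock lock{ mutex_ };
                cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
                if (done_ && tasks_.empty())
                    return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<Task> tasks_;
    bool done_ = false;
    std::thread worker_;  // Declared last so it starts after the other members.
};
```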
The Yolo9 app also displays the available metadata and type information for the model's inputs and outputs.
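The queries involved look roughly like the following sketch, which uses the standard ONNX Runtime C++ API (`GetInputCount`, `GetInputNameAllocated`, `GetInputTypeInfo`, `GetModelMetadata`); the Yolo9 app's actual output format differs:

```cpp
#include <onnxruntime_cxx_api.h>
#include <iostream>

void PrintModelInfo(const Ort::Session& session)
{
    Ort::AllocatorWithDefaultOptions allocator;

    // Model-level metadata, e.g. the producer name.
    Ort::ModelMetadata metadata = session.GetModelMetadata();
    std::cout << "Producer: "
              << metadata.GetProducerNameAllocated(allocator).get() << '\n';

    // Name, element type, and shape of each model input.
    for (size_t i = 0; i < session.GetInputCount(); ++i)
    {
        auto name = session.GetInputNameAllocated(i, allocator);
        Ort::TypeInfo typeInfo = session.GetInputTypeInfo(i);
        auto tensorInfo = typeInfo.GetTensorTypeAndShapeInfo();

        std::cout << "Input " << i << ": " << name.get()
                  << " element type " << tensorInfo.GetElementType()  // raw enum value
                  << " shape [";
        for (int64_t dim : tensorInfo.GetShape())
            std::cout << ' ' << dim;  // -1 denotes a dynamic dimension
        std::cout << " ]\n";
    }
    // Outputs are enumerated the same way via GetOutputCount,
    // GetOutputNameAllocated, and GetOutputTypeInfo.
}
```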