Hardware Acceleration#

oneVPL provides a new model for working with hardware acceleration while continuing to support hardware acceleration in legacy mode.

New Model to Work with Hardware Acceleration#

oneVPL API version 2.0 introduces a new memory model: internal allocation where oneVPL is responsible for video memory allocation. In this mode, an application is not dependent on a low-level video framework API, such as DirectX* or the VA API, and does not need to create and set corresponding low-level oneVPL primitives such as ID3D11Device or VADisplay. Instead, oneVPL creates all required objects to work with hardware acceleration and video surfaces internally. An application can get access to these objects using MFXVideoCORE_GetHandle() or with help of the mfxFrameSurfaceInterface interface.

This approach simplifies the oneVPL initialization, making calls to the MFXVideoENCODE_QueryIOSurf(), MFXVideoDECODE_QueryIOSurf(), or MFXVideoVPP_QueryIOSurf() functions optional. See Internal Memory Management.

Note

Applications can set device handle before session creation through MFXSetConfigFilterProperty() like shown in the code below:

mfxLoader loader = MFXLoad();
mfxConfig config1 = MFXCreateConfig(loader);
mfxConfig config2 = MFXCreateConfig(loader);
mfxSession session;

mfxVariant HandleType;
HandleType.Type = MFX_VARIANT_TYPE_U32;
HandleType.Data.U32 = MFX_HANDLE_VA_DISPLAY;
MFXSetConfigFilterProperty(config1, (mfxU8*)"mfxHandleType", HandleType);

mfxVariant DisplayHandle;
DisplayHandle.Type = MFX_VARIANT_TYPE_PTR;
HandleType.Data.Ptr = vaDisplay;
MFXSetConfigFilterProperty(config2, (mfxU8*)"mfxHDL", DisplayHandle);

MFXCreateSession(loader, 0, &session);

Work with Hardware Acceleration in Legacy Mode#

Work with Multiple Media Devices#

If your system has multiple graphics adapters, you may need hints on which adapter is better suited to process a particular workload. The legacy mode of oneVPL provides a helper API to select the most suitable adapter for your workload based on the provided workload description. The following example shows workload initialization on a discrete adapter:

mfxU32 num_adapters_available;
mfxIMPL impl;

// Query number of graphics adapters available on system
mfxStatus sts = MFXQueryAdaptersNumber(&num_adapters_available);
MSDK_CHECK_STATUS(sts, "MFXQueryAdaptersNumber failed");

// Allocate memory for response
std::vector<mfxAdapterInfo> displays_data(num_adapters_available);
mfxAdaptersInfo adapters = { displays_data.data(), mfxU32(displays_data.size()), 0u, {0} };

// Query information about all adapters (mind that first parameter is NULL)
sts = MFXQueryAdapters(nullptr, &adapters);
MSDK_CHECK_STATUS(sts, "MFXQueryAdapters failed");

// Find dGfx adapter in list of adapters
auto idx_d = std::find_if(adapters.Adapters, adapters.Adapters + adapters.NumActual,
    [](const mfxAdapterInfo info)
{
   return info.Platform.MediaAdapterType == mfxMediaAdapterType::MFX_MEDIA_DISCRETE;
});

// No dGfx in list
if (idx_d == adapters.Adapters + adapters.NumActual)
{
   printf("Warning: No dGfx detected on machine\n");
   return -1;
}

mfxU32 idx = static_cast<mfxU32>(std::distance(adapters.Adapters, idx_d));

// Choose correct implementation for discrete adapter
switch (adapters.Adapters[idx].Number)
{
case 0:
   impl = MFX_IMPL_HARDWARE;
   break;
case 1:
   impl = MFX_IMPL_HARDWARE2;
   break;
case 2:
   impl = MFX_IMPL_HARDWARE3;
   break;
case 3:
   impl = MFX_IMPL_HARDWARE4;
   break;

default:
   // Try searching on all display adapters
   impl = MFX_IMPL_HARDWARE_ANY;
   break;
}
printf("Choosen implementation: %d\n", impl);
// Initialize mfxSession in regular way with obtained implementation.

The example shows that after obtaining the adapter list with MFXQueryAdapters(), further initialization of mfxSession is performed in the regular way. The specific adapter is selected using the MFX_IMPL_HARDWARE, MFX_IMPL_HARDWARE2, MFX_IMPL_HARDWARE3, or MFX_IMPL_HARDWARE4 values of mfxIMPL.

The following example shows the use of MFXQueryAdapters() for querying the most suitable adapter for a particular encode workload:

mfxU32 num_adapters_available;
mfxIMPL impl;
mfxVideoParam Encode_mfxVideoParam;

// Query number of graphics adapters available on system
mfxStatus sts = MFXQueryAdaptersNumber(&num_adapters_available);
MSDK_CHECK_STATUS(sts, "MFXQueryAdaptersNumber failed");

// Allocate memory for response
std::vector<mfxAdapterInfo> displays_data(num_adapters_available);
mfxAdaptersInfo adapters = { displays_data.data(), mfxU32(displays_data.size()), 0u, {0} };

// Fill description of Encode workload
mfxComponentInfo interface_request = { MFX_COMPONENT_ENCODE, Encode_mfxVideoParam, {0} };

// Query information about suitable adapters for Encode workload described by Encode_mfxVideoParam
sts = MFXQueryAdapters(&interface_request, &adapters);

if (sts == MFX_ERR_NOT_FOUND)
{
   printf("Error: No adapters on machine capable to process desired workload\n");
   return -1;
}

MSDK_CHECK_STATUS(sts, "MFXQueryAdapters failed");

// Choose correct implementation for discrete adapter. Mind usage of index 0, this is best suitable adapter from MSDK perspective
switch (adapters.Adapters[0].Number)
{
case 0:
   impl = MFX_IMPL_HARDWARE;
   break;
case 1:
   impl = MFX_IMPL_HARDWARE2;
   break;
case 2:
   impl = MFX_IMPL_HARDWARE3;
   break;
case 3:
   impl = MFX_IMPL_HARDWARE4;
   break;

default:
   // Try searching on all display adapters
   impl = MFX_IMPL_HARDWARE_ANY;
   break;
}

printf("Choosen implementation: %d\n", impl);

// Initialize mfxSession in regular way with obtained implementation

See the MFXQueryAdapters() description for adapter priority rules.

Work with Video Memory#

To fully utilize the oneVPL acceleration capability, the application should support OS specific infrastructures. If using Microsoft* Windows*, the application should support Microsoft DirectX*. If using Linux*, the application should support the VA API for Linux.

The hardware acceleration support in an application consists of video memory support and acceleration device support.

Depending on the usage model, the application can use video memory at different stages in the pipeline. Three major scenarios are shown in the following diagrams:

digraph {
rankdir=LR;
labelloc="t";
label="oneVPL functions interconnection";
F1 [shape=octagon label="oneVPL Function"];
F2 [shape=octagon label="oneVPL Function"];
F1->F2 [ label="Video Memory" ];
}

digraph {
rankdir=LR;
labelloc="t";
label="Video memory as output";
F3 [shape=octagon label="oneVPL Function"];
F4 [shape=octagon label="Application" fillcolor=lightgrey];
F3->F4 [ label="Video Memory" ];
}

digraph {
rankdir=LR;
labelloc="t";
label="Video memory as input";
F5 [shape=octagon label="Application"];
F6 [shape=octagon label="oneVPL Function"];
F5->F6 [ label="Video Memory" ];
}

The application must use the mfxVideoParam::IOPattern field to indicate the I/O access pattern during initialization. Subsequent function calls must follow this access pattern. For example, if a function operates on video memory surfaces at both input and output, the application must specify the access pattern IOPattern at initialization in MFX_IOPATTERN_IN_VIDEO_MEMORY for input and MFX_IOPATTERN_OUT_VIDEO_MEMORY for output. This particular I/O access pattern must not change inside the Init - Close sequence.

Initialization of any hardware accelerated oneVPL component requires the acceleration device handle. This handle is also used by the oneVPL component to query hardware capabilities. The application can share its device with oneVPL by passing the device handle through the MFXVideoCORE_SetHandle() function. It is recommended to share the handle before any actual usage of oneVPL.

Work with Microsoft DirectX* Applications#

oneVPL supports two different infrastructures for hardware acceleration on the Microsoft Windows OS: the Direct3D* 9 DXVA2 and Direct3D 11 Video API. If Direct3D 9 DXVA2 is used for hardware acceleration, the application should use the IDirect3DDeviceManager9 interface as the acceleration device handle. If the Direct3D 11 Video API is used for hardware acceleration, the application should use the ID3D11Device interface as the acceleration device handle.

The application should share one of these interfaces with oneVPL through the MFXVideoCORE_SetHandle() function. If the application does not provide the interface, then oneVPL creates its own internal acceleration device. As a result, oneVPL input and output will be limited to system memory only for the external allocation mode, which will reduce oneVPL performance. If oneVPL fails to create a valid acceleration device, then oneVPL cannot proceed with hardware acceleration and returns an error status to the application.

Note

It is recommended to work in the internal allocation mode if the application does not provide the IDirect3DDeviceManager9 or ID3D11Device interface.

The application must create the Direct3D 9 device with the flag D3DCREATE_MULTITHREADED. The flag D3DCREATE_FPU_PRESERVE is also recommended. This influences floating-point calculations, including PTS values.

The application must also set multi-threading mode for the Direct3D 11 device. The following example shows how to set multi-threading mode for a Direct3D 11 device:

ID3D11Device            *pD11Device;
ID3D11DeviceContext     *pD11Context;
ID3D10Multithread       *pD10Multithread;

pD11Device->GetImmediateContext(&pD11Context);
pD11Context->QueryInterface(IID_ID3D10Multithread, &pD10Multithread);
pD10Multithread->SetMultithreadProtected(true);

During hardware acceleration, if a Direct3D “device lost” event occurs, the oneVPL operation terminates with the mfxStatus::MFX_ERR_DEVICE_LOST return status. If the application provided the Direct3D device handle, the application must reset the Direct3D device.

When the oneVPL decoder creates auxiliary devices for hardware acceleration, it must allocate the list of Direct3D surfaces for I/O access, also known as the surface chain, and pass the surface chain as part of the device creation command. In most cases, the surface chain is the frame surface pool mentioned in the Frame Surface Locking section.

The application passes the surface chain to the oneVPL component Init function through a oneVPL external allocator callback. See the Memory Allocation and External Allocators section for details.

Only the decoder Init function requests the external surface chain from the application and uses it for auxiliary device creation. Encoder and VPP Init functions may only request internal surfaces. See the ExtMemFrameType enumerator for more details about different memory types.

Depending on configuration parameters, oneVPL requires different surface types. It is strongly recommended to call the MFXVideoENCODE_QueryIOSurf() function, the MFXVideoDECODE_QueryIOSurf() function, or the MFXVideoVPP_QueryIOSurf() function to determine the appropriate type in the external allocation mode.

Work with VA API Applications#

oneVPL supports the VA API infrastructure for hardware acceleration on Linux. The application should use the VADisplay interface as the acceleration device handle for this infrastructure and share it with oneVPL through the MFXVideoCORE_SetHandle() function.

The following example shows how to obtain the VA display from the X Window System:

Display   *x11_display;
VADisplay va_display;

x11_display = XOpenDisplay(current_display);
va_display  = vaGetDisplay(x11_display);

MFXVideoCORE_SetHandle(session, MFX_HANDLE_VA_DISPLAY, (mfxHDL) va_display);

The following example shows how to obtain the VA display from the Direct Rendering Manager:

int card;
VADisplay va_display;

card = open("/dev/dri/card0", O_RDWR); /* primary card */
va_display = vaGetDisplayDRM(card);
vaInitialize(va_display, &major_version, &minor_version);

MFXVideoCORE_SetHandle(session, MFX_HANDLE_VA_DISPLAY, (mfxHDL) va_display);

When the oneVPL decoder creates a hardware acceleration device, it must allocate the list of video memory surfaces for I/O access, also known as the surface chain, and pass the surface chain as part of the device creation command. The application passes the surface chain to the oneVPL component Init function through a oneVPL external allocator callback. See the Memory Allocation and External Allocators section for details. Starting from oneVPL API version 2.0, oneVPL creates its own surface chain if an external allocator is not set. See the :ref`New Model to work with Hardware Acceleration <hw-acceleration>` section for details.

Note

The VA API does not define any surface types and the application can use either MFX_MEMTYPE_VIDEO_MEMORY_DECODER_TARGET or MFX_MEMTYPE_VIDEO_MEMORY_PROCESSOR_TARGET to indicate data in video memory.

oneAPI Specification 1.2-provisional-rev-1 documentation

Hardware Acceleration

Contents

Hardware Acceleration#

New Model to Work with Hardware Acceleration#

Work with Hardware Acceleration in Legacy Mode#

Work with Multiple Media Devices#

Work with Video Memory#

Work with Microsoft DirectX* Applications#

Work with VA API Applications#