Decoding Procedures#

There are several approaches to decode video frames. The first one is based on the internal allocation mechanism presented here:

MFXVideoDECODE_Init(session, &init_param);
sts=MFX_ERR_MORE_DATA;
for (;;) {
   if (sts==MFX_ERR_MORE_DATA && !end_of_stream())
      append_more_bitstream(bitstream);
   bits=(end_of_stream())?NULL:bitstream;
   sts=MFXVideoDECODE_DecodeFrameAsync(session,bits,NULL,&disp,&syncp);
   if (end_of_stream() && sts==MFX_ERR_MORE_DATA) break;
   // skipped other error handling
   if (sts==MFX_ERR_NONE) {
      disp->FrameInterface->Synchronize(disp, INFINITE); // or MFXVideoCORE_SyncOperation(session,syncp,INFINITE)
      do_something_with_decoded_frame(disp);
      disp->FrameInterface->Release(disp);
   }
}
MFXVideoDECODE_Close(session);

Note the following key points about the example:

The application calls the MFXVideoDECODE_DecodeFrameAsync() function for a decoding operation with the bitstream buffer (bits), frame surface is allocated internally by the library.

Attention

As shown in the example above starting with API version 2.0, the application can provide NULL as the working frame surface that leads to internal memory allocation.
If decoding output is not available, the function returns a status code requesting additional bitstream input as follows:
- mfxStatus::MFX_ERR_MORE_DATA: The function needs additional bitstream input. The existing buffer contains less than a frame’s worth of bitstream data.
Upon successful decoding, the MFXVideoDECODE_DecodeFrameAsync() function returns mfxStatus::MFX_ERR_NONE. However, the decoded frame data (identified by the surface_out pointer) is not yet available because the MFXVideoDECODE_DecodeFrameAsync() function is asynchronous. The application must use the MFXVideoCORE_SyncOperation() or mfxFrameSurfaceInterface::Synchronize to synchronize the decoding operation before retrieving the decoded frame data.
At the end of the bitstream, the application continuously calls the MFXVideoDECODE_DecodeFrameAsync() function with a NULL bitstream pointer to drain any remaining frames cached within the decoder until the function returns mfxStatus::MFX_ERR_MORE_DATA.
When application completes the work with frame surface, it must call release to avoid memory leaks.

The next example demonstrates how applications can use internally pre-allocated chunk of video surfaces:

MFXVideoDECODE_QueryIOSurf(session, &init_param, &request);
MFXVideoDECODE_Init(session, &init_param);
for (int i = 0; i < request.NumFrameSuggested; i++) {
    MFXMemory_GetSurfaceForDecode(session, &work);
    add_surface_to_pool(work);
}
sts=MFX_ERR_MORE_DATA;
for (;;) {
   if (sts==MFX_ERR_MORE_DATA && !end_of_stream())
      append_more_bitstream(bitstream);
   bits=(end_of_stream())?NULL:bitstream;
   // application logic to distinguish free and busy surfaces
   find_free_surface_from_the_pool(&work);
   sts=MFXVideoDECODE_DecodeFrameAsync(session,bits,work,&disp,&syncp);
   if (end_of_stream() && sts==MFX_ERR_MORE_DATA) break;
   // skipped other error handling
   if (sts==MFX_ERR_NONE) {
      disp->FrameInterface->Synchronize(disp, INFINITE); // or MFXVideoCORE_SyncOperation(session,syncp,INFINITE)
      do_something_with_decoded_frame(disp);
      disp->FrameInterface->Release(disp);
   }
}
for (int i = 0; i < request.NumFrameSuggested; i++) {
    get_next_surface_from_pool(&work);
    work->FrameInterface->Release(work);
}
MFXVideoDECODE_Close(session);

Here the application should use the MFXVideoDECODE_QueryIOSurf() function to obtain the number of working frame surfaces required to reorder output frames. It is also required that MFXMemory_GetSurfaceForDecode() call is done after decoder initialization. In the MFXVideoDECODE_DecodeFrameAsync() the oneVPL library increments reference counter of incoming surface frame so it is required that the application releases frame surface after the call.

Another approach to decode frames is to allocate video frames on-fly with help of MFXMemory_GetSurfaceForDecode() function, feed the library and release working surface after MFXVideoDECODE_DecodeFrameAsync() call.

Attention

Please pay attention on two release calls for surfaces: after MFXVideoDECODE_DecodeFrameAsync() to decrease reference counter of working surface returned by MFXMemory_GetSurfaceForDecode(). After MFXVideoCORE_SyncOperation() to decrease reference counter of output surface returned by MFXVideoDECODE_DecodeFrameAsync().

MFXVideoDECODE_Init(session, &init_param);
sts=MFX_ERR_MORE_DATA;
for (;;) {
   if (sts==MFX_ERR_MORE_DATA && !end_of_stream())
      append_more_bitstream(bitstream);
   bits=(end_of_stream())?NULL:bitstream;
   MFXMemory_GetSurfaceForDecode(session, &work);
   sts=MFXVideoDECODE_DecodeFrameAsync(session,bits,work,&disp,&syncp);
   work->FrameInterface->Release(work);
   if (end_of_stream() && sts==MFX_ERR_MORE_DATA) break;
   // skipped other error handling
   if (sts==MFX_ERR_NONE) {
      disp->FrameInterface->Synchronize(disp, INFINITE); // or MFXVideoCORE_SyncOperation(session,syncp,INFINITE)
      do_something_with_decoded_frame(disp);
      disp->FrameInterface->Release(disp);
   }
}
MFXVideoDECODE_Close(session);

The following pseudo code shows the decoding procedure according to the legacy mode with external video frames allocation:

MFXVideoDECODE_DecodeHeader(session, bitstream, &init_param);
MFXVideoDECODE_QueryIOSurf(session, &init_param, &request);
allocate_pool_of_frame_surfaces(request.NumFrameSuggested);
MFXVideoDECODE_Init(session, &init_param);
sts=MFX_ERR_MORE_DATA;
for (;;) {
   if (sts==MFX_ERR_MORE_DATA && !end_of_stream())
      append_more_bitstream(bitstream);
   find_free_surface_from_the_pool(&work);
   bits=(end_of_stream())?NULL:bitstream;
   sts=MFXVideoDECODE_DecodeFrameAsync(session,bits,work,&disp,&syncp);
   if (sts==MFX_ERR_MORE_SURFACE) continue;
   if (end_of_stream() && sts==MFX_ERR_MORE_DATA) break;
   if (sts==MFX_ERR_REALLOC_SURFACE) {
      MFXVideoDECODE_GetVideoParam(session, &param);
      realloc_surface(work, param.mfx.FrameInfo);
      continue;
   }
   // skipped other error handling
   if (sts==MFX_ERR_NONE) {
      disp->FrameInterface->Synchronize(disp, INFINITE); // or MFXVideoCORE_SyncOperation(session,syncp,INFINITE)
      do_something_with_decoded_frame(disp);
   }
}
MFXVideoDECODE_Close(session);
free_pool_of_frame_surfaces();

Note the following key points about the example:

The application can use the MFXVideoDECODE_DecodeHeader() function to retrieve decoding initialization parameters from the bitstream. This step is optional if the data is retrievable from other sources such as an audio/video splitter.
The MFXVideoDECODE_DecodeFrameAsync() function can return following status codes in addition to the described above:
- mfxStatus::MFX_ERR_MORE_SURFACE: The function needs one more frame surface to produce any output.
- mfxStatus::MFX_ERR_REALLOC_SURFACE: Dynamic resolution change case - the function needs a bigger working frame surface (work).

The following pseudo code shows the simplified decoding procedure:

sts=MFX_ERR_MORE_DATA;
for (;;) {
   if (sts==MFX_ERR_MORE_DATA && !end_of_stream())
      append_more_bitstream(bitstream);
   bits=(end_of_stream())?NULL:bitstream;
   sts=MFXVideoDECODE_DecodeFrameAsync(session,bits,NULL,&disp,&syncp);
   if (sts==MFX_ERR_MORE_SURFACE) continue;
   if (end_of_stream() && sts==MFX_ERR_MORE_DATA) break;
   // skipped other error handling
   if (sts==MFX_ERR_NONE) {
      disp->FrameInterface->Synchronize(disp, INFINITE); // or MFXVideoCORE_SyncOperation(session,syncp,INFINITE)
      do_something_with_decoded_frame(disp);
      disp->FrameInterface->Release(disp);
   }
}

oneVPL API version 2.0 introduces a new decoding approach. For simple use cases, when the user wants to decode a stream and does not want to set additional parameters, a simplified procedure for the decoder’s initialization has been proposed. In this scenario it is possible to skip explicit stages of a stream’s header decoding and the decoder’s initialization and instead to perform these steps implicitly during decoding of the first frame. This change also requires setting the additional field mfxBitstream::CodecId to indicate codec type. In this mode the decoder allocates mfxFrameSurface1 internally, so users should set the input surface to zero.

Bitstream Repositioning#

The application can use the following procedure for bitstream reposition during decoding:

Use the MFXVideoDECODE_Reset() function to reset the oneVPL decoder.
Optional: If the application maintains a sequence header that correctly decodes the bitstream at the new position, the application may insert the sequence header to the bitstream buffer.
Append the bitstream from the new location to the bitstream buffer.
Resume the decoding procedure. If the sequence header is not inserted in the previous steps, the oneVPL decoder searches for a new sequence header before starting decoding.

Broken Streams Handling#

Robustness and the capability to handle a broken input stream is an important part of the decoder.

First, the start code prefix (ITU-T* H.264 3.148 and ITU-T H.265 3.142) is used to separate NAL units. Then all syntax elements in the bitstream are parsed and verified. If any of the elements violate the specification, the input bitstream is considered invalid and the decoder tries to re-sync (find the next start code). Subsequent decoder behavior is dependent on which syntax element is broken:

SPS header is broken: return mfxStatus::MFX_ERR_INCOMPATIBLE_VIDEO_PARAM (HEVC decoder only, AVC decoder uses last valid).
PPS header is broken: re-sync, use last valid PPS for decoding.
Slice header is broken: skip this slice, re-sync.
Slice data is broken: corruption flags are set on output surface.

Many streams have IDR frames with frame_num != 0 while the specification says that “If the current picture is an IDR picture, frame_num shall be equal to 0” (ITU-T H.265 7.4.3).

VUI is also validated, but errors do not invalidate the whole SPS. The decoder either does not use the corrupted VUI (AVC) or resets incorrect values to default (HEVC).

Note

Some requirements are relaxed because there are many streams which violate the strict standard but can be decoded without errors.

Corruption at the reference frame is spread over all inter-coded pictures that use the reference frame for prediction. To cope with this problem you must either periodically insert I-frames (intra-coded) or use the intra-refresh technique. The intra-refresh technique allows recovery from corruptions within a predefined time interval. The main point of intra-refresh is to insert a cyclic intra-coded pattern (usually a row) of macroblocks into the inter-coded pictures, restricting motion vectors accordingly. Intra-refresh is often used in combination with recovery point SEI, where the recovery_frame_cnt is derived from the intra-refresh interval. The recovery point SEI message is well described at ITU-T H.264 D.2.7 and ITU-T H.265 D.2.8. If decoding starts from AU associated with this SEI message, then the message can be used by the decoder to determine from which picture all subsequent pictures have no errors. In comparison to IDR, the recovery point message does not mark reference pictures as “unused for reference”.

Besides validation of syntax elements and their constraints, the decoder also uses various hints to handle broken streams:

If there are no valid slices for the current frame, then the whole frame is skipped.
The slices which violate slice segment header semantics (ITU-T H.265 7.4.7.1) are skipped. Only the slice_temporal_mvp_enabled_flag is checked for now.
Since LTR (Long Term Reference) stays at DPB until it is explicitly cleared by IDR or MMCO, the incorrect LTR could cause long standing visual artifacts. AVC decoder uses the following approaches to handle this:
- When there is a DPB overflow in the case of an incorrect MMCO command that marks the reference picture as LT, the operation is rolled back.
- An IDR frame with frame_num != 0 can’t be LTR.
If the decoder detects frame gapping, it inserts “fake”’” (marked as non-existing) frames, updates FrameNumWrap (ITU-T H.264 8.2.4.1) for reference frames, and applies the Sliding Window (ITU-T H.264 8.2.5.3) marking process. Fake frames are marked as reference, but since they are marked as non-existing, they are not used for inter-prediction.

VP8 Specific Details#

Unlike other oneVPL supported decoders, VP8 can accept only a complete frame as input. The application should provide the complete frame accompanied by the MFX_BITSTREAM_COMPLETE_FRAME flag. This is the single specific difference.

JPEG#

The application can use the same decoding procedures for JPEG/motion JPEG decoding, as shown in the following pseudo code:

// optional; retrieve initialization parameters
MFXVideoDECODE_DecodeHeader(...);
// decoder initialization
MFXVideoDECODE_Init(...);
// single frame/picture decoding
MFXVideoDECODE_DecodeFrameAsync(...);
MFXVideoCORE_SyncOperation(...);
// optional; retrieve meta-data
MFXVideoDECODE_GetUserData(...);
// close
MFXVideoDECODE_Close(...);

The MFXVideoDECODE_Query() function will return mfxStatus::MFX_ERR_UNSUPPORTED if the input bitstream contains unsupported features.

For still picture JPEG decoding, the input can be any JPEG bitstreams that conform to the ITU-T Recommendation T.81 with an EXIF or JFIF header. For motion JPEG decoding, the input can be any JPEG bitstreams that conform to the ITU-T Recommendation T.81.

Unlike other oneVPL decoders, JPEG decoding supports three different output color formats: NV12, YUY2, and RGB32. This support sometimes requires internal color conversion and more complicated initialization. The color format of the input bitstream is described by the mfxInfoMFX::JPEGChromaFormat and mfxInfoMFX::JPEGColorFormat fields. The MFXVideoDECODE_DecodeHeader() function usually fills them in. If the JPEG bitstream does not contains color format information, the application should provide it. Output color format is described by general oneVPL parameters: the mfxFrameInfo::FourCC and mfxFrameInfo::ChromaFormat fields.

Motion JPEG supports interlaced content by compressing each field (a half-height frame) individually. This behavior is incompatible with the rest of the oneVPL transcoding pipeline, where oneVPL requires fields to be in odd and even lines of the same frame surface. The decoding procedure is modified as follows:

The application calls the MFXVideoDECODE_DecodeHeader() function with the first field JPEG bitstream to retrieve initialization parameters.
The application initializes the oneVPL JPEG decoder with the following settings:
- The PicStruct field of the mfxVideoParam structure set to the correct interlaced type, MFX_PICSTRUCT_FIELD_TFF or MFX_PICSTRUCT_FIELD_BFF, from the motion JPEG header.
- Double the Height field in the mfxVideoParam structure as the value returned by the MFXVideoDECODE_DecodeHeader() function describes only the first field. The actual frame surface should contain both fields.
During decoding, the application sends both fields for decoding in the same mfxBitstream. The application should also set mfxBitstream::DataFlag to MFX_BITSTREAM_COMPLETE_FRAME. oneVPL decodes both fields and combines them into odd and even lines according to oneVPL convention.

By default, the MFXVideoDECODE_DecodeHeader() function returns the Rotation parameter so that after rotation, the pixel at the first row and first column is at the top left. The application can overwrite the default rotation before calling MFXVideoDECODE_Init().

The application may specify Huffman and quantization tables during decoder initialization by attaching mfxExtJPEGQuantTables and mfxExtJPEGHuffmanTables buffers to the mfxVideoParam structure. In this case, the decoder ignores tables from bitstream and uses the tables specified by the application. The application can also retrieve these tables by attaching the same buffers to mfxVideoParam and calling MFXVideoDECODE_GetVideoParam() or MFXVideoDECODE_DecodeHeader() functions.

Multi-view Video Decoding#

The oneVPL MVC decoder operates on complete MVC streams that contain all view and temporal configurations. The application can configure the oneVPL decoder to generate a subset at the decoding output. To do this, the application must understand the stream structure and use the stream information to configure the decoder for target views.

The decoder initialization procedure is as follows:

The application calls the MFXVideoDECODE_DecodeHeader() function to obtain the stream structural information. This is done in two steps:
1. The application calls the MFXVideoDECODE_DecodeHeader() function with the mfxExtMVCSeqDesc structure attached to the mfxVideoParam structure. At this point, do not allocate memory for the arrays in the mfxExtMVCSeqDesc structure. Set the View, ViewId, and OP pointers to NULL and set NumViewAlloc, NumViewIdAlloc, and NumOPAlloc to zero. The function parses the bitstream and returns mfxStatus::MFX_ERR_NOT_ENOUGH_BUFFER with the correct values for NumView, NumViewId, and NumOP. This step can be skipped if the application is able to obtain the NumView, NumViewId, and NumOP values from other sources.
2. The application allocates memory for the View, ViewId, and OP arrays and calls the MFXVideoDECODE_DecodeHeader() function again. The function returns the MVC structural information in the allocated arrays.
The application fills the mfxExtMVCTargetViews structure to choose the target views, based on information described in the mfxExtMVCSeqDesc structure.
The application initializes the oneVPL decoder using the MFXVideoDECODE_Init() function. The application must attach both the mfxExtMVCSeqDesc structure and the mfxExtMVCTargetViews structure to the mfxVideoParam structure.

In the above steps, do not modify the values of the mfxExtMVCSeqDesc structure after the MFXVideoDECODE_DecodeHeader() function, as the oneVPL decoder uses the values in the structure for internal memory allocation. Once the application configures the oneVPL decoder, the rest of the decoding procedure remains unchanged. As shown in the pseudo code below, the application calls the MFXVideoDECODE_DecodeFrameAsync() function multiple times to obtain all target views of the current frame picture, one target view at a time. The target view is identified by the FrameID field of the mfxFrameInfo structure.

mfxExtBuffer *eb[2];
mfxExtMVCSeqDesc  seq_desc;
mfxVideoParam init_param;

init_param.ExtParam=(mfxExtBuffer **)&eb;
init_param.NumExtParam=1;
eb[0]=(mfxExtBuffer *)&seq_desc;
MFXVideoDECODE_DecodeHeader(session, bitstream, &init_param);

/* select views to decode */
mfxExtMVCTargetViews tv;
init_param.NumExtParam=2;
eb[1]=(mfxExtBuffer *)&tv;

/* initialize decoder */
MFXVideoDECODE_Init(session, &init_param);

/* perform decoding */
for (;;) {
    MFXVideoDECODE_DecodeFrameAsync(session, bits, work, &disp, &syncp);
    disp->FrameInterface->Synchronize(disp, INFINITE); // or MFXVideoCORE_SyncOperation(session,syncp,INFINITE)
}

/* close decoder */
MFXVideoDECODE_Close(session);

Combined Decode with Multi-channel Video Processing#

The oneVPL exposes interface for making decode and video processing operations in one call. Users can specify a number of output processing channels and multiple video filters per each channel. This interface supports only internal memory allocation model and returns array of processed frames through mfxSurfaceArray reference object as shown by the example:

num_channel_par = 2;
// first video processing channel with resize
vpp_par_array[0]->VPP.Width = 400;
vpp_par_array[0]->VPP.Height = 400;

// second video channel with color conversion filter
vpp_par_array[1]->VPP.FourCC = MFX_FOURCC_UYVY;

sts = MFXVideoDECODE_VPP_Init(session, decode_par, vpp_par_array, num_channel_par);

sts = MFXVideoDECODE_VPP_DecodeFrameAsync(session, bitstream, NULL, 0, &surf_array_out);

//surf_array_out layout is
do_smth(surf_array_out->Surfaces[0]); //The first channel contains decoded frames.
do_smth(surf_array_out->Surfaces[1]); //The second channel contains resized frames after decode. 
do_smth(surf_array_out->Surfaces[2]); //The third channel contains color converted frames after decode. 

It’s possible that different video processing channels may have different latency:

//1st call
sts = MFXVideoDECODE_VPP_DecodeFrameAsync(session, bitstream, NULL, 0, &surf_array_out);
//surf_array_out layout is
do_smth(surf_array_out->Surfaces[0]); //decoded frame
do_smth(surf_array_out->Surfaces[1]); //resized frame (ChannelId = 1). The first frame from channel with resize available 
// no output from channel with ADI output since it has one frame delay

//2nd call
sts = MFXVideoDECODE_VPP_DecodeFrameAsync(session, bitstream, NULL, 0, &surf_array_out);
//surf_array_out layout is
do_smth(surf_array_out->Surfaces[0]); //decoded frame
do_smth(surf_array_out->Surfaces[1]); //resized frame (ChannelId = 1)
do_smth(surf_array_out->Surfaces[2]); //ADI output (ChannelId = 2). The first frame from ADI channel 

Application can match decoded frame w/ specific VPP channels using mfxFrameData::TimeStamp, :cpp:member:mfxFrameData::FrameOrder` and mfxFrameInfo::ChannelId.

Application can skip some or all channels including decoding output with help of skip_channels and num_skip_channels parameters as follows: application fills skip_channels array with ChannelId`s to disable output of correspondent channels. In that case :cpp:member:`surf_array_out would contain only surfaces for the remaining channels. If the decoder’s channel and/or impacted VPP channels don’t have output frame(s) for the current call (for instance, input bitstream doesn’t contain complete frame or deinterlacing/FRC filter have delay) skip_channels parameter is ignored for this channel.

If application disables all channels the SDK returns NULL as mfxSurfaceArray.

If application doesn’t need to disable any channels it sets num_skip_channels to zero, skip_channels is ignored when num_skip_channels is zero.

Note

Even if more than one input compressed frame is consumed, the MFXVideoDECODE_VPP_DecodeFrameAsync() produces only one decoded frame and correspondent frames from VPP channels.

oneAPI Specification 1.2-rev-1 documentation

Decoding Procedures

Contents

Decoding Procedures#

Bitstream Repositioning#

Broken Streams Handling#

VP8 Specific Details#

JPEG#

Multi-view Video Decoding#

Combined Decode with Multi-channel Video Processing#