NVENC Video Encoder API
Programming Guide
NVIDIA® GPUs based on NVIDIA Kepler™ and later GPU architectures contain a hardware-
based H.264/HEVC/AV1 video encoder (hereafter referred to as NVENC). The NVENC hardware
takes YUV/RGB as input and generates an H.264/HEVC/AV1 compliant video bit stream. NVENC
hardware’s encoding capabilities can be accessed using the NVENCODE APIs, available in the
NVIDIA Video Codec SDK.
This document provides information on how to program the NVENC using the NVENCODE APIs
exposed in the SDK. The NVENCODE APIs expose encoding capabilities on Windows (Windows
10 and above) and Linux.
Developers are expected to understand the H.264/HEVC/AV1 video codecs and to be familiar
with Windows and/or Linux development environments.
NVENCODE API guarantees binary backward compatibility (any break in backward compatibility
will be explicitly called out). This means that applications compiled with older versions of the
released API will continue to work on future driver versions released by NVIDIA.
Developers can create a client application that calls NVENCODE API functions exposed
by nvEncodeAPI.dll for Windows or libnvidia-encode.so for Linux. These libraries are
installed as part of the NVIDIA display driver. The client application loads these libraries at
run time using LoadLibrary() on Windows or dlopen() on Linux.
The NVENCODE API functions, structures and other parameters are exposed in nvEncodeAPI.h,
which is included in the SDK.
NVENCODE API is a C API and follows a design pattern similar to C++ interfaces, wherein the
application creates an instance of the API and retrieves a function pointer table to further
interact with the encoder. For programmers who prefer a higher-level API with ready-to-use code,
the SDK includes sample C++ classes that expose the important API functions.
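A minimal Linux-side sketch of this bootstrap step (the Windows path would instead use LoadLibrary()/GetProcAddress() on nvEncodeAPI.dll): load the driver library at run time and populate the function pointer table via NvEncodeAPICreateInstance(). Error handling is trimmed; later sketches in this document assume the resulting nvenc table.

#include <dlfcn.h>
#include <cstdio>
#include "nvEncodeAPI.h"   // shipped with the Video Codec SDK

typedef NVENCSTATUS (NVENCAPI *NvEncodeAPICreateInstance_t)(NV_ENCODE_API_FUNCTION_LIST*);

int main()
{
    void* lib = dlopen("libnvidia-encode.so.1", RTLD_LAZY);
    if (!lib) { std::fprintf(stderr, "libnvidia-encode.so not found\n"); return 1; }

    auto createInstance = reinterpret_cast<NvEncodeAPICreateInstance_t>(
        dlsym(lib, "NvEncodeAPICreateInstance"));

    NV_ENCODE_API_FUNCTION_LIST nvenc = { NV_ENCODE_API_FUNCTION_LIST_VER };
    if (createInstance(&nvenc) != NV_ENC_SUCCESS) {
        std::fprintf(stderr, "NvEncodeAPICreateInstance failed\n");
        return 1;
    }

    // nvenc.nvEncOpenEncodeSessionEx, nvenc.nvEncInitializeEncoder, ... are now callable.
    dlclose(lib);
    return 0;
}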
The rest of this document focuses on the C-API exposed in nvEncodeAPI.h. NVENCODE API is
designed to accept raw video frames (in YUV or RGB format) and output the H.264, HEVC or AV1
bitstream. Broadly, the encoding flow consists of the following steps:
1. Initialize the encoder
2. Set up the desired encoding parameters
3. Allocate input/output buffers
4. Copy frames to input buffers and read bitstream from the output buffers. This can be done
synchronously (Windows & Linux) or asynchronously (Windows 10 and above only).
5. Clean-up - release all allocated input/output buffers
6. Close the encoding session
These steps are explained in the rest of the document and demonstrated in the sample
application included in the Video Codec SDK package.
3.1.1.1. DirectX 9
‣ The client should create a DirectX 9 device with behavior flags including
D3DCREATE_FPU_PRESERVE, D3DCREATE_MULTITHREADED and
D3DCREATE_HARDWARE_VERTEXPROCESSING.
‣ The client should pass a pointer to IUnknown interface of the created device
(typecast to void *) as NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS::device, and set
NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS::deviceType to
NV_ENC_DEVICE_TYPE_DIRECTX. Use of DirectX devices is supported only on Windows 10
and later versions of the Windows OS.
3.1.1.2. DirectX 10
‣ The client should pass a pointer to IUnknown interface of the created device
(typecast to void *) as NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS::device, and set
NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS::deviceType to
NV_ENC_DEVICE_TYPE_DIRECTX. Use of DirectX devices is supported only on Windows 10
and later versions of Windows OS.
3.1.1.3. DirectX 11
‣ The client should pass a pointer to IUnknown interface of the created device
(typecast to void *) as NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS::device, and set
NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS::deviceType to
NV_ENC_DEVICE_TYPE_DIRECTX. Use of DirectX devices is supported only on Windows 10
and later versions of Windows OS.
3.1.1.4. DirectX 12
‣ The client should pass a pointer to IUnknown interface of the created device
(typecast to void *) as NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS::device, and set
NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS::deviceType to
NV_ENC_DEVICE_TYPE_DIRECTX. Use of DirectX 12 devices is supported only on Windows
10 20H1 and later versions of Windows OS.
3.1.1.5. CUDA
‣ The client should create a floating CUDA context, pass the CUDA context handle as
NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS::device, and set
NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS::deviceType to
NV_ENC_DEVICE_TYPE_CUDA. Use of CUDA device for Encoding is supported on Linux and
Windows 10 and later versions of Windows OS.
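A minimal sketch of this CUDA path, assuming the NV_ENCODE_API_FUNCTION_LIST (nvenc) was populated as shown earlier; device ordinal 0 is an illustrative choice.

#include <cuda.h>
#include "nvEncodeAPI.h"

void* OpenCudaEncodeSession(const NV_ENCODE_API_FUNCTION_LIST& nvenc)
{
    cuInit(0);
    CUdevice dev = 0;
    cuDeviceGet(&dev, 0);                       // first CUDA device
    CUcontext ctx = nullptr;
    cuCtxCreate(&ctx, 0, dev);                  // floating CUDA context

    NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS params = { NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS_VER };
    params.device     = ctx;                    // CUDA context handle
    params.deviceType = NV_ENC_DEVICE_TYPE_CUDA;
    params.apiVersion = NVENCAPI_VERSION;

    void* encoder = nullptr;
    if (nvenc.nvEncOpenEncodeSessionEx(&params, &encoder) != NV_ENC_SUCCESS)
        return nullptr;                         // session could not be opened
    return encoder;                             // handle used by all subsequent API calls
}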
3.1.1.6. OpenGL
‣ The client should create an OpenGL context and make it current on the thread calling into
NVENCODE API (this associates the context with that thread).
NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS::device must
be NULL and NV_ENC_OPEN_ENCODE_SESSION_EX_PARAMS::deviceType must be set to
NV_ENC_DEVICE_TYPE_OPENGL. Use of the OpenGL device type for encoding is supported
only on Linux.
Low latency, with CBR:
1. Cloud gaming
2. Streaming
3. Video conferencing
(in a high-bandwidth channel with tolerance for bigger occasional frame sizes)

Ultra-low latency, with CBR:
1. Cloud gaming
2. Streaming
3. Video conferencing
(in a strictly bandwidth-constrained channel)
For each tuning info, seven presets from P1 (highest performance) to P7 (lowest performance)
have been provided to control the performance/quality trade-off. Using these presets will
automatically set all relevant encoding parameters for the selected tuning info. This is a coarse
level of control exposed by the API. Specific attributes/parameters within the preset can be
tuned, if required. This is explained in the next two subsections.
1. The client should call NvEncGetEncodePresetCount to retrieve the number of preset GUIDs
supported for the selected codec.
2. The client should use this count to allocate a large-enough buffer to hold the supported
preset GUIDs.
3. The client should then call NvEncGetEncodePresetGUIDs to populate this list.
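A hedged sketch of this enumeration, assuming nvenc (function list) and encoder (session handle) from the earlier sketches; HEVC is an illustrative codec choice.

#include <vector>

uint32_t presetCount = 0;
nvenc.nvEncGetEncodePresetCount(encoder, NV_ENC_CODEC_HEVC_GUID, &presetCount);

std::vector<GUID> presetGUIDs(presetCount);
uint32_t presetsReturned = 0;
nvenc.nvEncGetEncodePresetGUIDs(encoder, NV_ENC_CODEC_HEVC_GUID,
                                presetGUIDs.data(), presetCount, &presetsReturned);
// presetGUIDs[0 .. presetsReturned-1] can be compared against NV_ENC_PRESET_P1_GUID .. P7_GUID.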
‣ The client should specify the session parameters as described in Section Session
parameters.
‣ Optionally, the client can enumerate and select a preset GUID that best suits the current use
case, as described in Section Selecting Encoder Codec GUID. The client should then pass
the selected preset GUID using NV_ENC_INITIALIZE_PARAMS::presetGUID. This helps the
NVIDIA Video Encoder interface to correctly configure the encoder session based on the
encodeGUID, tuning info and presetGUID provided.
‣ The client should set the advanced codec-level parameter pointer
NV_ENC_INITIALIZE_PARAMS::encodeConfig::encodeCodecConfig to NULL.
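A hedged initialization sketch based on the steps above, assuming nvenc and encoder from the earlier sketches. The codec, preset, tuning info, resolution and frame rate are illustrative choices; as an alternative to leaving the codec-level configuration at its defaults, this sketch explicitly fetches the preset configuration so that later sections can tweak individual fields (set before the nvEncInitializeEncoder() call).

NV_ENC_INITIALIZE_PARAMS initParams = { NV_ENC_INITIALIZE_PARAMS_VER };
initParams.encodeGUID      = NV_ENC_CODEC_HEVC_GUID;
initParams.presetGUID      = NV_ENC_PRESET_P4_GUID;
initParams.tuningInfo      = NV_ENC_TUNING_INFO_LOW_LATENCY;
initParams.encodeWidth     = 1920;
initParams.encodeHeight    = 1080;
initParams.darWidth        = 1920;
initParams.darHeight       = 1080;
initParams.frameRateNum    = 30;
initParams.frameRateDen    = 1;
initParams.enablePTD       = 1;        // let the encoder interface decide picture types
initParams.maxEncodeWidth  = 1920;     // allows later resolution changes via NvEncReconfigureEncoder
initParams.maxEncodeHeight = 1080;

// Start from the preset/tuning-info defaults and keep a local copy that later
// sections tweak (look-ahead, AQ, intra refresh, ...).
NV_ENC_PRESET_CONFIG presetConfig = { NV_ENC_PRESET_CONFIG_VER, { NV_ENC_CONFIG_VER } };
nvenc.nvEncGetEncodePresetConfigEx(encoder, initParams.encodeGUID, initParams.presetGUID,
                                   initParams.tuningInfo, &presetConfig);
NV_ENC_CONFIG encodeConfig = presetConfig.presetCfg;
initParams.encodeConfig = &encodeConfig;

NVENCSTATUS st = nvenc.nvEncInitializeEncoder(encoder, &initParams);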
same, performance of 2-pass rate control mode is lower than that of 1-pass rate control
mode. The client application should choose an appropriate multi-pass rate control mode after
evaluating various modes, as each of the modes has its own advantages and disadvantages.
NV_ENC_TWO_PASS_FULL_RESOLUTION generates better statistics for the second pass, whereas
NV_ENC_TWO_PASS_QUARTER_RESOLUTION results in larger motion vectors being caught and fed
as hints to the second pass.
The client may choose to allocate input buffers through NVIDIA Video Encoder Interface by calling
NvEncCreateInputBuffer API. In this case, the client is responsible for destroying the allocated
input buffers before closing the encode session. It is also the client’s responsibility to fill the
input buffer with valid input data according to the chosen input buffer format.
The client should allocate buffers to hold the output encoded bit stream using the
NvEncCreateBitstreamBuffer API. It is the client’s responsibility to destroy these buffers
before closing the encode session.
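A minimal allocation sketch, assuming nvenc and encoder from the earlier sketches; 1920x1080 NV12 is an illustrative format, and a real application would allocate at least (1 + NB) buffers of each kind as noted later in this section.

NV_ENC_CREATE_INPUT_BUFFER inBuf = { NV_ENC_CREATE_INPUT_BUFFER_VER };
inBuf.width     = 1920;
inBuf.height    = 1080;
inBuf.bufferFmt = NV_ENC_BUFFER_FORMAT_NV12;
nvenc.nvEncCreateInputBuffer(encoder, &inBuf);        // handle returned in inBuf.inputBuffer
// Fill the buffer via nvEncLockInputBuffer()/nvEncUnlockInputBuffer() before encoding.

NV_ENC_CREATE_BITSTREAM_BUFFER outBuf = { NV_ENC_CREATE_BITSTREAM_BUFFER_VER };
nvenc.nvEncCreateBitstreamBuffer(encoder, &outBuf);   // handle returned in outBuf.bitstreamBuffer

// ... encode frames ...

// The client owns these buffers and must destroy them before closing the session.
nvenc.nvEncDestroyInputBuffer(encoder, inBuf.inputBuffer);
nvenc.nvEncDestroyBitstreamBuffer(encoder, outBuf.bitstreamBuffer);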
Alternatively, in scenarios where the client cannot or does not want to allocate input buffers
through the NVIDIA Video Encoder Interface, it can use any externally allocated DirectX resource
as an input buffer. However, the client must perform some simple processing to map these
resources to resource handles that are recognized by the NVIDIA Video Encoder Interface before
use. The translation procedure is explained in Section Input buffers allocated externally.
If the client has used a CUDA device to initialize the encoder session and wishes to use input
buffers NOT allocated through the NVIDIA Video Encoder Interface, the client is required to use
buffers allocated using the cuMemAlloc family of APIs. NVIDIA Video Encoder Interface supports
CUdeviceptr and CUarray input formats.
If the client has used the OpenGL device type to initialize the encoder session and wishes to use
input buffers NOT allocated through the NVIDIA Video Encoder Interface, the client is required
to provide the textures allocated earlier.
The client may generate a texture using glGenTextures(), bind it
to either the NV_ENC_INPUT_RESOURCE_OPENGL_TEX::GL_TEXTURE_RECTANGLE or
NV_ENC_INPUT_RESOURCE_OPENGL_TEX::GL_TEXTURE_2D target, allocate storage for it using
glTexImage2D(), and copy data to it.
Note that the OpenGL interface for NVENCODE API is only supported on Linux.
If the client has used a DirectX 12 device to initialize encoder session, then client must allocate
input and output buffers using ID3D12Device::CreateCommittedResource() API. The client
must perform some simple processing to map these input and output resources to resource
handles that are recognized by the NVIDIA Video Encoder Interface before use. The translation
procedure is explained in Section Input output buffer allocation for DirectX 12.
Note: The client should allocate at least (1 + NB) input and output buffers, where NB is the
number of B frames between successive P frames.
The output bitstream generated for the current input will then include SPS/PPS for H.264/HEVC or
Sequence Header OBU for AV1.
The client can call NvEncGetSequenceParams at any time, after the encoder has been initialized
(NvEncInitializeEncoder) and the session is active.
Once the encode session is configured and input/output buffers are allocated, the client can start
streaming the input data for encoding. The client is required to pass a handle to a valid input
buffer and a valid bit stream (output) buffer to the NVIDIA Video Encoder Interface for encoding
an input picture.
7. After the client has finished using the resource NvEncUnmapInputResource must be called.
8. The client must also call NvEncUnregisterResource with the handle returned by
NvEncRegisterResource before destroying the registered resource.
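A hedged end-to-end sketch of this register/map/unmap flow for a CUDA-device encode session, assuming nvenc and encoder from the earlier sketches and a frame already written to a pitched device allocation devPtr with row pitch pitch (hypothetical names, e.g. from cuMemAllocPitch()).

NV_ENC_REGISTER_RESOURCE reg = { NV_ENC_REGISTER_RESOURCE_VER };
reg.resourceType       = NV_ENC_INPUT_RESOURCE_TYPE_CUDADEVICEPTR;
reg.resourceToRegister = reinterpret_cast<void*>(devPtr);
reg.width              = 1920;
reg.height             = 1080;
reg.pitch              = static_cast<uint32_t>(pitch);
reg.bufferFormat       = NV_ENC_BUFFER_FORMAT_NV12;
reg.bufferUsage        = NV_ENC_INPUT_IMAGE;
nvenc.nvEncRegisterResource(encoder, &reg);          // handle in reg.registeredResource

NV_ENC_MAP_INPUT_RESOURCE map = { NV_ENC_MAP_INPUT_RESOURCE_VER };
map.registeredResource = reg.registeredResource;
nvenc.nvEncMapInputResource(encoder, &map);          // map.mappedResource -> NV_ENC_PIC_PARAMS::inputBuffer

// ... nvEncEncodePicture() using map.mappedResource as the input buffer ...

nvenc.nvEncUnmapInputResource(encoder, map.mappedResource);
nvenc.nvEncUnregisterResource(encoder, reg.registeredResource);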
The input picture data will be taken from the specified input buffer, and the encoded bit stream
will be available in the specified bit stream (output) buffer once the encoding process completes.
Codec-agnostic parameters such as timestamp, duration, input buffer pointer, etc. are passed
via the structure NV_ENC_PIC_PARAMS while codec-specific parameters are passed via the
structure NV_ENC_PIC_PARAMS_H264/NV_ENC_PIC_PARAMS_HEVC/NV_ENC_PIC_PARAMS_AV1
depending upon the codec in use.
The client should specify the codec-specific structure in NV_ENC_PIC_PARAMS using the
NV_ENC_PIC_PARAMS::codecPicParams member.
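A minimal synchronous submission sketch, assuming inBuf/outBuf from the buffer-allocation sketch above and that the client has already filled inBuf with one frame of NV12 data; frameIndex is a hypothetical application counter.

NV_ENC_PIC_PARAMS picParams = { NV_ENC_PIC_PARAMS_VER };
picParams.inputBuffer     = inBuf.inputBuffer;
picParams.outputBitstream = outBuf.bitstreamBuffer;
picParams.bufferFmt       = NV_ENC_BUFFER_FORMAT_NV12;
picParams.inputWidth      = 1920;
picParams.inputHeight     = 1080;
picParams.pictureStruct   = NV_ENC_PIC_STRUCT_FRAME;
picParams.inputTimeStamp  = frameIndex;

NVENCSTATUS st = nvenc.nvEncEncodePicture(encoder, &picParams);
// NV_ENC_SUCCESS: output is available; NV_ENC_ERR_NEED_MORE_INPUT: more frames are
// needed (e.g. B frames or look-ahead) before any output is produced.

if (st == NV_ENC_SUCCESS) {
    NV_ENC_LOCK_BITSTREAM lock = { NV_ENC_LOCK_BITSTREAM_VER };
    lock.outputBitstream = outBuf.bitstreamBuffer;
    nvenc.nvEncLockBitstream(encoder, &lock);
    // lock.bitstreamBufferPtr points to lock.bitstreamSizeInBytes bytes of encoded data.
    nvenc.nvEncUnlockBitstream(encoder, outBuf.bitstreamBuffer);
}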
If the client has used a DirectX 12 device to initialize the encoder session, the client must pass a
pointer to NV_ENC_INPUT_RESOURCE_D3D12 in NV_ENC_PIC_PARAMS::inputBuffer, containing
the registered resource handle and the corresponding input NV_ENC_FENCE_POINT_D3D12
for NVENC to wait on before starting the encode. The client must pass a pointer to
NV_ENC_OUTPUT_RESOURCE_D3D12 in NV_ENC_PIC_PARAMS::outputBuffer, containing the
registered resource handle and the corresponding output NV_ENC_FENCE_POINT_D3D12.
The NVENC engine waits until NV_ENC_INPUT_RESOURCE_D3D12::inputFencePoint is
reached before starting processing of the input buffer. The NVENC engine signals
NV_ENC_OUTPUT_RESOURCE_D3D12::outputFencePoint when processing of the resource is
completed, so that other engines which need to use these input and output resources can start
processing.
The NVIDIA Video Encoder Interface supports the following two modes of operation.
¹ To check the mode in which your board is running, run the command-line utility nvidia-smi (nvidia-smi.exe on Windows) included
with the driver.
‣ The client will receive the event's signal and output buffer in the same order in which they
were queued.
‣ The NV_ENC_LOCK_BITSTREAM::pictureType notifies the output picture type to the clients.
‣ Both the input and output sample (output buffer and the output completion event) are free
to be reused once the NVIDIA Video Encoder Interface has signalled the event and the client
has copied the data from the output buffer.
‣ NV_ENC_INITIALIZE_PARAMS::enableEncodeAsync = 1
‣ NV_ENC_LOCK_BITSTREAM::doNotWait = 0
‣ NV_ENC_INITIALIZE_PARAMS::enableOutputInVidmem = 0
Note: The impact of enabling these features on overall CUDA or graphics performance is minimal,
and this list is provided purely for information purposes.
NVENC can be used as a hardware accelerator to perform motion search and generate motion
vectors and mode information. The resulting motion vectors or mode decisions can be used, for
example, in motion compensated filtering or for supporting other codecs not fully supported by
NVENC or simply as motion vector hints for a custom encoder. The procedure to use the feature
is explained below.
For use-cases involving computer vision, AI and frame interpolation, Turing and later GPUs
contain another hardware accelerator for computing optical flow vectors between frames, which
provide better visual matching than the motion vectors.
8.1. Look-ahead
Look-ahead improves the video encoder’s rate control accuracy by enabling the encoder to buffer
the specified number of frames, estimate their complexity and allocate the bits appropriately
among these frames proportional to their complexity. This also dynamically allocates B and P
frames.
To use this feature, the client must follow these steps:
1. The availability of the feature in the current hardware can be queried using
NvEncGetEncodeCaps and checking for NV_ENC_CAPS_SUPPORT_LOOKAHEAD.
2. Look-ahead needs to be enabled during initialization by setting
NV_ENC_INITIALIZE_PARAMS::encodeConfig->rcParams.enableLookahead = 1.
3. The number of frames to be looked ahead should be set in
NV_ENC_INITIALIZE_PARAMS::encodeConfig->rcParams.lookaheadDepth, which can
be up to 32.
4. By default, look-ahead enables adaptive insertion of intra
frames and B frames. They can however be disabled by setting
NV_ENC_INITIALIZE_PARAMS::encodeConfig->rcParams.disableIadapt and/or
NV_ENC_INITIALIZE_PARAMS::encodeConfig->rcParams.disableBadapt to 1.
5. When the feature is enabled, frames are queued up in the encoder and hence
NvEncEncodePicture will return NV_ENC_ERR_NEED_MORE_INPUT until the encoder has
sufficient number of input frames to satisfy the look-ahead requirement. Frames should be
continuously fed in until NvEncEncodePicture returns NV_ENC_SUCCESS.
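A minimal sketch of steps 1 through 4, assuming nvenc, encoder, initParams and encodeConfig from the earlier initialization sketch (these fields are set on the NV_ENC_CONFIG passed at initialization, i.e. before the nvEncInitializeEncoder() call); the depth of 16 is an illustrative choice.

int lookaheadSupported = 0;
NV_ENC_CAPS_PARAM capsParam = { NV_ENC_CAPS_PARAM_VER };
capsParam.capsToQuery = NV_ENC_CAPS_SUPPORT_LOOKAHEAD;
nvenc.nvEncGetEncodeCaps(encoder, initParams.encodeGUID, &capsParam, &lookaheadSupported);

if (lookaheadSupported) {
    encodeConfig.rcParams.enableLookahead = 1;
    encodeConfig.rcParams.lookaheadDepth  = 16;   // up to 32
    // Adaptive I/B insertion stays enabled by default; disable it if not wanted:
    // encodeConfig.rcParams.disableIadapt = 1;
    // encodeConfig.rcParams.disableBadapt = 1;
}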
‣ For H.264 and HEVC, this will set the (N/2)th B frame as a reference, where N = number of
B frames. If N is odd, the ((N-1)/2)th frame will be picked as the reference.
‣ For AV1, this will set every other B frame as an Altref2 reference, except for the last B frame
in the Altref interval.
The API will fail if an attempt is made to reconfigure parameters whose reconfiguration is not supported.
Resolution change is possible only if NV_ENC_INITIALIZE_PARAMS::maxEncodeWidth and
NV_ENC_INITIALIZE_PARAMS::maxEncodeHeight are set while creating encoder session.
If the client wishes to change the resolution using this API, it is advisable to
force the next frame following the reconfiguration as an IDR frame by setting
NV_ENC_RECONFIGURE_PARAMS::forceIDR to 1.
If the client wishes to reset the internal rate control states, set
NV_ENC_RECONFIGURE_PARAMS::resetEncoder to 1.
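A hedged sketch of an on-the-fly resolution change, assuming nvenc, encoder and initParams from the earlier sketches and that maxEncodeWidth/maxEncodeHeight were set at initialization; 1280x720 is an illustrative target.

NV_ENC_RECONFIGURE_PARAMS reconf = { NV_ENC_RECONFIGURE_PARAMS_VER };
reconf.reInitEncodeParams = initParams;       // start from the current configuration
reconf.reInitEncodeParams.encodeWidth  = 1280;
reconf.reInitEncodeParams.encodeHeight = 720;
reconf.forceIDR     = 1;                      // next frame becomes an IDR frame
reconf.resetEncoder = 1;                      // reset internal rate control states
nvenc.nvEncReconfigureEncoder(encoder, &reconf);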
8.4.1. Spatial AQ
Spatial AQ mode adjusts the QP values based on spatial characteristics of the frame. Since
the low complexity flat regions are visually more perceptible to quality differences than high
complexity detailed regions, extra bits are allocated to flat regions of the frame at the cost of the
regions having high spatial detail. Although spatial AQ improves the perceptible visual quality
of the encoded video, the required bit redistribution results in PSNR drop in most of the cases.
Therefore, during PSNR-based evaluation, this feature should be turned off.
To use spatial AQ, follow these steps in your application.
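A minimal sketch of enabling spatial AQ, assuming the encodeConfig passed at initialization in the earlier sketch (set before the nvEncInitializeEncoder() call); the strength value is an illustrative choice.

encodeConfig.rcParams.enableAQ   = 1;   // enable spatial AQ
encodeConfig.rcParams.aqStrength = 8;   // 1 (least) .. 15 (most aggressive); 0 lets the driver choose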
8.4.2. Temporal AQ
Temporal AQ tries to adjust encoding QP (on top of QP evaluated by the rate control algorithm)
based on temporal characteristics of the sequence. Temporal AQ improves the quality of
encoded frames by adjusting QP for regions which are constant or have low motion across
frames but have high spatial detail, such that they become better reference for future frames.
Allocating extra bits to such regions in reference frames is better than allocating them to the
residuals in referred frames because it helps improve the overall encoded video quality. If
majority of the region within a frame has little or no motion, but has high spatial details (e.g.
high-detail non-moving background) enabling temporal AQ will benefit the most.
One of the potential disadvantages of temporal AQ is that enabling temporal AQ may result in
high fluctuation of bits consumed per frame within a GOP. I/P frames will consume more bits
than the average P-frame size, and B frames will consume fewer bits. Although the target bitrate will
be maintained at the GOP level, the frame size will fluctuate from one frame to next within a
GOP more than it would without temporal AQ. If a strict CBR profile is required for every frame
size within a GOP, it is not recommended to enable temporal AQ. Additionally, since some of
the complexity estimation is performed in CUDA, there may be some performance impact when
temporal AQ is enabled.
To use temporal AQ, follow these steps in your application.
1. Query the availability of temporal AQ for the current hardware by calling the API
NvEncGetEncodeCaps and checking for NV_ENC_CAPS_SUPPORT_TEMPORAL_AQ.
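A minimal sketch of this query plus the enable flag, assuming nvenc, encoder, initParams and encodeConfig from the earlier sketches (set before the nvEncInitializeEncoder() call).

int temporalAQSupported = 0;
NV_ENC_CAPS_PARAM capsParam = { NV_ENC_CAPS_PARAM_VER };
capsParam.capsToQuery = NV_ENC_CAPS_SUPPORT_TEMPORAL_AQ;
nvenc.nvEncGetEncodeCaps(encoder, initParams.encodeGUID, &capsParam, &temporalAQSupported);

if (temporalAQSupported)
    encodeConfig.rcParams.enableTemporalAQ = 1;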
Weighted prediction is not supported if the encode session is configured with B frames.
Weighted prediction is not supported if DirectX 12 device is used.
Weighted prediction uses CUDA pre-processing and hence requires CUDA processing power,
depending upon resolution and content.
Enabling weighted prediction may also result in very minor degradation in encoder performance.
During normal encoding operation, following steps need to be followed to mark specific frame(s)
as LTR frame(s).
1. Configure the number of LTR frames:
The frames previously marked as long-term reference frames can be used for prediction of the
current frame in the following manner:
1. The LTR frames that are to be used for reference have
to be specified using NV_ENC_PIC_PARAMS_H264::ltrUseFrameBitmap OR
‣ The absolute value of the QP as decided by the rate control algorithm, depending upon the rate
control constraints. In general, for a given emphasis level, the higher the QP determined by the
rate control, the higher the (negative) adjustment.
‣ Emphasis level value for the macroblock.
Note: The QP adjustment is performed after the rate control algorithm has run. Therefore, there
is a possibility of VBV/rate violations when using this feature.
Emphasis level map is useful when the client has prior knowledge of the image complexity (e.g.
NVFBC's Classification Map feature) and encoding those high-complexity areas at higher quality
(lower QP) is important, even at the possible cost of violating bitrate/VBV buffer size constraints.
This feature is not supported when AQ (Spatial/Temporal) is enabled.
Follow these steps to enable the feature.
1. Query availability of the feature using NvEncGetEncodeCaps API and checking for
NV_ENC_CAPS_SUPPORT_EMPHASIS_LEVEL_MAP.
2. Set NV_ENC_RC_PARAMS::qpMapMode = NV_ENC_QP_MAP_EMPHASIS.
3. Fill up the NV_ENC_PIC_PARAMS::qpDeltaMap (which is a signed byte array containing one
value per macroblock in raster scan order for the current picture) with values from the enum
NV_ENC_EMPHASIS_MAP_LEVEL.
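A minimal sketch of steps 2 and 3, assuming encodeConfig and picParams from the earlier sketches; widthInMbs/heightInMbs are hypothetical variables holding the frame dimensions in 16x16 macroblocks, and marking the top macroblock row is purely illustrative.

#include <algorithm>
#include <cstdint>
#include <vector>

encodeConfig.rcParams.qpMapMode = NV_ENC_QP_MAP_EMPHASIS;              // set at initialization

// Per frame: one signed byte per macroblock in raster-scan order.
std::vector<int8_t> emphasisMap(widthInMbs * heightInMbs,
                                static_cast<int8_t>(NV_ENC_EMPHASIS_MAP_LEVEL_0));
std::fill(emphasisMap.begin(), emphasisMap.begin() + widthInMbs,       // e.g. top MB row
          static_cast<int8_t>(NV_ENC_EMPHASIS_MAP_LEVEL_5));           // highest emphasis
picParams.qpDeltaMap     = emphasisMap.data();
picParams.qpDeltaMapSize = static_cast<uint32_t>(emphasisMap.size());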
‣ For AV1, HEVC or H.264 encoding, the recommended size for this buffer is:
Output buffer size = 2 * Input YUV buffer size +
sizeof(NV_ENC_ENCODE_OUT_PARAMS)
The first sizeof(NV_ENC_ENCODE_OUT_PARAMS) bytes of the output buffer contain the
NV_ENC_ENCODE_OUT_PARAMS structure, followed by the encoded bitstream data.
‣ For H.264 ME-only output, the recommended size of output buffer is:
Output buffer size = HeightInMbs * WidthInMbs *
sizeof(NV_ENC_H264_MV_DATA)
where HeightInMbs and WidthInMbs are picture height and width in number of 16x16
macroblocks, respectively.
‣ For DirectX 11 interface, this buffer can be created using DirectX 11
CreateBuffer() API, by specifying usage = D3D11_USAGE_DEFAULT; BindFlags
= (D3D11_BIND_VIDEO_ENCODER | D3D11_BIND_SHADER_RESOURCE); and
CPUAccessFlags = 0;
‣ For CUDA interface, this buffer can be created using cuMemAlloc().
3. Register this buffer using nvEncRegisterResource(), by specifying:
When operating in asynchronous mode, the client application should wait on the event before reading
the output. In synchronous mode, no event is triggered, and the synchronization is handled
internally by the NVIDIA driver.
To access the output, follow these steps:
1. Client must un-map the input buffer by calling nvEncUnmapInputResource() with
mapped resource handle NV_ENC_MAP_INPUT_RESOURCE::mappedResource returned by
nvEncMapInputResource(). After this, the output buffer can be used for further processing/
reading etc.
2. In case of encode, the first sizeof(NV_ENC_ENCODE_OUT_PARAMS) bytes of
this buffer should be interpreted as NV_ENC_ENCODE_OUT_PARAMS structure
followed by the encoded bitstream data. The size of the encoded bitstream is given by
NV_ENC_ENCODE_OUT_PARAMS::bitstreamSizeInBytes.
3. If CUDA mode is specified, all CUDA operations on this buffer must use the default stream.
To get the output in system memory, output buffer can be read by calling any CUDA API (e.g.
cuMemcpyDtoH()) with default stream. The driver ensures that the output buffer is read only
after NVENC has finished writing the output in it.
4. For DX11 mode, any DirectX 11 API can be used to read the output. The driver ensures that
the output buffer is read only after NVENC has finished writing the output in it. To get the
output in system memory, CopyResource() (which is a DirectX 11 API) can be used to copy
the data into a CPU-readable staging buffer. This staging buffer can then be read after calling
Map(), which is a DirectX 11 API.
will be added with every IDR frame in the encoded bitstream. Note that only a subset of fields
related to temporal scalability is currently supported in this SEI.
When temporal SVC is enabled, only base layer frames can be marked as long term references.
Temporal SVC is currently not supported with B-frames. The field
NV_ENC_CONFIG::frameIntervalP will be ignored when temporal SVC is enabled.
Intra Refresh
The reference picture invalidation technique described in Section Reference Picture Invalidation
depends upon the availability of an out-of-band upstream channel to report bitstream errors at the
decoder (client side). When such an upstream channel is not available, or in situations where the
bitstream is likely to suffer from frequent errors, the intra-refresh mechanism can be
used for error recovery. Also, when using an infinite GOP length, no intra frames are
transmitted and intra refresh may be a useful mechanism for recovery from transmission errors.
‣ For AV1, the number of tiles used during the intra refresh wave is automatically determined
by the driver based on the value of intraRefreshCnt and intraRefreshPeriod. Any
custom tiles configuration specified by the application will be ignored for the duration of the
intra refresh wave.
‣ If the application does not explicitly specify the number of slices, or if the specified number
of slices is less than 3, the driver will set 3 slices per frame during the intra refresh wave.
‣ For NV_ENC_CONFIG_H264::sliceMode = 0 (MB based slices), 2 (MB row based slices) and
3 (number of slices), the driver will maintain a slice count equal to the minimum of the slice
count derived from the slice mode setting and intraRefreshCnt during the intra refresh
period.
‣ For NV_ENC_CONFIG_H264::sliceMode = 1 (byte based slices), the number of slices during
an intra refresh wave is always 3.
For certain use cases, clients may want to avoid multiple slices in a frame. In such scenarios,
clients can enable single slice intra refresh.
‣ Query the support for single slice intra refresh for the current driver by calling the API
NvEncGetEncodeCaps and checking for NV_ENC_CAPS_SINGLE_SLICE_INTRA_REFRESH.
Intra refresh is applied in encode order and only on frames which can be used as reference.
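A minimal sketch of enabling intra refresh for H.264, assuming encodeConfig from the earlier initialization sketch (set before the nvEncInitializeEncoder() call); the period and count values are illustrative.

NV_ENC_CONFIG_H264& h264 = encodeConfig.encodeCodecConfig.h264Config;
h264.enableIntraRefresh = 1;
h264.intraRefreshPeriod = 30;   // pictures between the starts of two refresh waves
h264.intraRefreshCnt    = 5;    // pictures over which one refresh wave is spread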
Tuning Info          Preset
                     P1    P2    P3    P4    P5    P6    P7
High Quality         Yes   Yes   No    No    No    No    No
Low Latency          Yes   Yes   Yes   Yes   No    No    No
Ultra Low Latency    Yes   Yes   Yes   Yes   No    No    No
‣ NV_ENC_REGISTER_RESOURCE::bufferUsage = NV_ENC_OUTPUT_RECON
‣ Set NV_ENC_REGISTER_RESOURCE::bufferFormat to desired value.
NvEncRegisterResource() will return a registered handle in
NV_ENC_REGISTER_RESOURCE::registeredResource.
4. Set NV_ENC_MAP_INPUT_RESOURCE::registeredResource to
NV_ENC_REGISTER_RESOURCE::registeredResource, which was obtained in the previous
step.
5. Call nvEncMapInputResource(). It will return a mapped resource handle in
NV_ENC_MAP_INPUT_RESOURCE::mappedResource.
6. Call nvEncEncodePicture() by setting NV_ENC_PIC_PARAMS::outputReconBuffer to
NV_ENC_MAP_INPUT_RESOURCE::mappedResource and
NV_ENC_PIC_PARAMS::encodePicFlags to NV_ENC_PIC_FLAG_OUTPUT_RECON_FRAME.
new states, corresponding to each of these iterations, in its internal state buffers. The NVENC
state can then be advanced to any one of the iterations using the NvEncRestoreEncoderState() API.
Steps to enable iterative encoding:
1. Set NV_ENC_INITIALIZE_PARAMS::numStateBuffers to desired value when calling
NvEncInitializeEncoder() API. Maximum number of state buffers which can be allocated
is 16 for H.264 and HEVC and 32 for AV1.
The application can call the NvEncReconfigureEncoder() API to set the desired encoding parameters,
e.g. different QP values, before every iteration of the frame. Alternatively, it can set the
NV_ENC_PIC_PARAMS::qpDeltaMap array to the desired values for encoding the current iteration of
the frame.
Iterative encoding for H.264 and HEVC when picture type decision (PTD) is taken
by NVENCODE API
Follow these steps to encode the same frame multiple times:
1. Set NV_ENC_PIC_PARAMS::encodePicFlags, NV_ENC_PIC_PARAMS::frameIdx and
NV_ENC_PIC_PARAMS::stateBufferIdx to valid values, as mentioned in above section
and call NvEncEncodePicture() API. This API will return either NV_ENC_SUCCESS or
NV_ENC_ERR_NEED_MORE_INPUT status.
2. If it returns NV_ENC_SUCCESS, the application can do iterative encoding on this frame now.
3. If it returns NV_ENC_ERR_NEED_MORE_INPUT, the application cannot do iterative
encoding on this frame right now. The application must send the next frames for encoding until it
returns NV_ENC_SUCCESS, and can then do iterative encoding on the frame for which
NV_ENC_SUCCESS is returned.
4. Call NvEncLockBitstream() API to get the encoded output for the first iteration.
a). If NV_ENC_LOCK_BITSTREAM::frameIdxDisplay is same as
NV_ENC_PIC_PARAMS::frameIdx in step 1 or 3, NvEncLockBitstream() API must be
called for all remaining iterations.
b). In some cases, NV_ENC_LOCK_BITSTREAM::frameIdxDisplay may be different
than NV_ENC_PIC_PARAMS::frameIdx indicating different frame is received than
the one on which iterative encoding was done in step 1 or 3. In this
case, application must do iterative encoding on the frame corresponding to
NV_ENC_LOCK_BITSTREAM::frameIdxDisplay, if needed. NvEncLockBitstream() API
must be called for all iterations of this frame to get the encoded outputs,
followed by NvEncRestoreEncoderState() API, to restore the state. NVENCODE
API will save encoding parameters for all iterations of the frame corresponding to
NV_ENC_PIC_PARAMS::frameIdx in step 1 or 3 and will send them for encoding after
all previous frames (for which NV_ENC_ERR_NEED_MORE_INPUT status was returned) are
encoded and the encoder state is restored for all of them.
5. Call NvEncRestoreEncoderState() API for the chosen frame iteration.
a). If NV_ENC_ERR_NEED_MORE_INPUT was returned for any of the frames, NVENCODE API
will now send one of those frames for encoding.
b). If NV_ENC_PIC_PARAMS::encodePicFlags is not set to
NV_ENC_PIC_FLAG_DISABLE_ENC_STATE_ADVANCE for the frame which is sent for
encoding in step a), subsequent frame will also be sent for encoding.
6. If there are frames for which NV_ENC_ERR_NEED_MORE_INPUT was returned, application
must call NvEncLockBitstream() API. NV_ENC_LOCK_BITSTREAM::frameIdxDisplay will
indicate the frame on which iterative encoding can be done now.
7. Repeat above step for all frames for which NV_ENC_ERR_NEED_MORE_INPUT status was
returned.
Buffer reordering:
1. Driver does the buffer reordering when B frames are present. Buffer reordering is done for
output bitstream buffer, completion event and reconstructed frame buffer.
2. This reordering is done so that the application can get output in decode order and does not
have to take care of picture types.
3. For state buffer index, there is no reordering.
The following table describes the API calls and the buffers which will hold the encoded output,
reconstructed frame output and internal states:
Table 3. API calls for iterative encoding for H.264 and HEVC
S. no.   API call                                           Return parameters                                Comments
1        NvEncEncodePicture(I1, N1=1, O1, E1, R1, F1=0)     NV_ENC_SUCCESS
2        NvEncLockBitstream(O1)                             frameIdxDisplay=1, picType: NV_ENC_PIC_TYPE_I
Note: I1, N1, O1, E1, R1, S1 represent the input buffer, frame index, output
buffer, completion event, reconstructed buffer and state buffer index for the first
iteration of the first frame. F1=0 indicates that NV_ENC_PIC_PARAMS::encodePicFlags
is not set to NV_ENC_PIC_FLAG_DISABLE_ENC_STATE_ADVANCE. F1=1 indicates that
NV_ENC_PIC_PARAMS::encodePicFlags is set to
NV_ENC_PIC_FLAG_DISABLE_ENC_STATE_ADVANCE.
must be called for all iterations of this frame to get the encoded outputs,
followed by NvEncRestoreEncoderState() API, to restore the state. NVENCODE
API will save encoding parameters for all iterations for the frame corresponding to
NV_ENC_PIC_PARAMS::frameIdx in step 1 or 3 and will send them for encoding after
all previous frames (for which NV_ENC_ERR_NEED_MORE_INPUT status was returned) are
encoded and the encoder state is restored for all of them.
c). If NV_ENC_PIC_PARAMS::encodePicFlags was set to
NV_ENC_PIC_FLAG_DISABLE_ENC_STATE_ADVANCE for
NV_ENC_LOCK_BITSTREAM::frameIdxDisplay frame and there were frames prior to
it for which NV_ENC_ERR_NEED_MORE_INPUT status was returned, then the received
encoded frame will be non-displayable frame.
d). For any non-displayable frame, corresponding OVERLAY frame would be encoded just
once, after all frames prior to this frame for which NV_ENC_ERR_NEED_MORE_INPUT was
returned are encoded, regardless of the number of iterations for this frame.
5. Application must call NvEncRestoreEncoderState() API to restore this frame. It may
return NV_ENC_ERR_NEED_MORE_OUTPUT or NV_ENC_SUCCESS status.
a). If it returns NV_ENC_ERR_NEED_MORE_OUTPUT, the application must call the
NvEncRestoreEncoderState() API again with an output buffer as input in
NV_ENC_RESTORE_ENCODER_STATE_PARAMS::outputBitstream. The application must also
pass a completion event in NV_ENC_RESTORE_ENCODER_STATE_PARAMS::completionEvent
if the asynchronous mode of encoding is enabled.
b). If NvEncRestoreEncoderState() API returns NV_ENC_SUCCESS, NVENCODE API will
now send one of the frames for which NV_ENC_ERR_NEED_MORE_INPUT was returned for
encoding.
c). If NV_ENC_PIC_PARAMS::encodePicFlags is not set to
NV_ENC_PIC_FLAG_DISABLE_ENC_STATE_ADVANCE for the frame which is sent for
encoding in step b), subsequent frame will also be sent for encoding.
6. Call NvEncLockBitstream() API if there are frames for which
NV_ENC_ERR_NEED_MORE_OUTPUT was returned.
a). Application should expect the encoded output for these frames in same order in which
they were sent for encoding.
b). If NV_ENC_LOCK_BITSTREAM::frameIdxDisplay is not in same order, it indicates that
non-displayable frame is received.
c). Application can now do iterative encoding on the frame corresponding to
NV_ENC_LOCK_BITSTREAM::frameIdxDisplay, if needed.
d). For any non-displayable frame, corresponding OVERLAY frame would be encoded just
once, after all frames prior to this frame for which NV_ENC_ERR_NEED_MORE_INPUT was
returned are encoded.
7. Repeat steps 5) and 6) for all frames for which NV_ENC_ERR_NEED_MORE_OUTPUT was
returned.
The following table describes the API calls and the buffers which will hold the encoded output,
reconstructed frame output and internal states:
Note: I1, N1, O1, E1, R1, S1 represent the input buffer, frame index, output
buffer, completion event, reconstructed buffer and state buffer index for the first
iteration of the first frame. F1=0 indicates that NV_ENC_PIC_PARAMS::encodePicFlags
is not set to NV_ENC_PIC_FLAG_DISABLE_ENC_STATE_ADVANCE. F1=1 indicates that
NV_ENC_PIC_PARAMS::encodePicFlags is set to
NV_ENC_PIC_FLAG_DISABLE_ENC_STATE_ADVANCE. When the picture type decision is taken
by the application, there is no reordering for the reconstructed buffer. The application must use
NV_ENC_LOCK_BITSTREAM::frameIdxDisplay to track this buffer.
The NVIDIA hardware video encoder is used for several purposes in various applications. Some
of the common applications include: Video-recording (archiving), game-casting (broadcasting/
multicasting video gameplay online), transcoding (live and video-on-demand) and streaming
(games or live content). Each of these use-cases has its unique requirements for quality,
bitrate, latency tolerance, performance constraints etc. Although the NVIDIA encoder interface
provides the flexibility to control the settings with many APIs, the table below can be used as a
general guideline for recommended settings for some of the popular use-cases to deliver the
best encoded bitstream quality. These recommendations are particularly applicable to GPUs
based on second-generation Maxwell architecture and beyond. For earlier GPUs (Kepler and first-
generation Maxwell), it is recommended that clients use the information in Table 5 as a starting
point and adjust the settings to achieve appropriate performance-quality tradeoff.
Game-casting & cloud transcoding:
‣ Rate control mode = CBR
‣ Medium VBV buffer size (1 second)
‣ B Frames*
‣ Look-ahead
‣ Avoid using B-frames. B-frames require additional buffers for reordering, so avoiding
B-frames results in savings in video memory usage.
‣ Reduce the maximum number of reference frames. Reducing the maximum number of reference
frames results in the NVIDIA display driver allocating fewer buffers internally, thereby
reducing the video memory footprint.
‣ Use single-pass rate control modes. Two-pass rate control modes consume additional video
memory compared to single-pass modes due to additional allocations for first-pass encoding.
Two-pass rate control with a full-resolution first pass consumes more memory than with a
quarter-resolution first pass.
‣ Avoid Adaptive Quantization / Weighted Prediction. Features such as Adaptive Quantization /
Weighted Prediction allocate additional buffers in video memory. These allocations can be
avoided if these features are not used.
‣ Avoid Lookahead. Lookahead allocates additional buffers in video memory for frames that
are buffered in the lookahead queue.
‣ Avoid the temporal filter. The temporal filter requires neighbouring frames and allocates
additional buffers in video memory.
‣ Avoid UHQ tuning info. The UHQ tuning info enables look-ahead and the temporal filter, which
have higher memory requirements.
Note, however, that the above guidelines may result in some loss in encode quality. Clients are,
therefore, recommended to do a proper evaluation to achieve the right balance between encoded
quality, speed and memory consumption.
NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any other changes to this document, at any time without notice.
Customer should obtain the latest relevant information before placing orders and should verify that such information is current and complete.
NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgment, unless otherwise agreed in
an individual sales agreement signed by authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA hereby expressly objects to applying any
customer general terms and conditions with regards to the purchase of the NVIDIA product referenced in this document. No contractual obligations are formed
either directly or indirectly by this document.
NVIDIA products are not designed, authorized, or warranted to be suitable for use in medical, military, aircraft, space, or life support equipment, nor in applications
where failure or malfunction of the NVIDIA product can reasonably be expected to result in personal injury, death, or property or environmental damage. NVIDIA
accepts no liability for inclusion and/or use of NVIDIA products in such equipment or applications and therefore such inclusion and/or use is at customer’s own risk.
NVIDIA makes no representation or warranty that products based on this document will be suitable for any specified use. Testing of all parameters of each product
is not necessarily performed by NVIDIA. It is customer’s sole responsibility to evaluate and determine the applicability of any information contained in this document,
ensure the product is suitable and fit for the application planned by customer, and perform the necessary testing for the application in order to avoid a default of
the application or the product. Weaknesses in customer’s product designs may affect the quality and reliability of the NVIDIA product and may result in additional
or different conditions and/or requirements beyond those contained in this document. NVIDIA accepts no liability related to any default, damage, costs, or problem
which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.
Trademarks
NVIDIA, the NVIDIA logo, and cuBLAS, CUDA, CUDA Toolkit, cuDNN, DALI, DIGITS, DGX, DGX-1, DGX-2, DGX Station, DLProf, GPU, Jetson, Kepler, Maxwell, NCCL,
Nsight Compute, Nsight Systems, NVCaffe, NVIDIA Deep Learning SDK, NVIDIA Developer Program, NVIDIA GPU Cloud, NVLink, NVSHMEM, PerfWorks, Pascal,
SDK Manager, Tegra, TensorRT, TensorRT Inference Server, Tesla, TF-TRT, Triton Inference Server, Turing, and Volta are trademarks and/or registered trademarks
of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which
they are associated.
Copyright
© 2010-2024 NVIDIA Corporation. All rights reserved.