I have been working on video transcoding for my organization’s video processing pipeline, and I use FFmpeg for the transcoding itself. Most of our videos are user-generated content, and for some of them FFmpeg reports the following error/warning message during transcoding: "Application provided invalid, non-monotonically increasing dts to muxer in the stream."
What does this message mean and how can I avoid it?
To understand this warning, we first need to understand what DTS and PTS values are. When a player decodes and plays a video, the audio and video streams themselves carry information about how fast and when they should be played. Audio streams have a sample rate, and video streams have a frames-per-second value. However, if we synced the video simply by counting frames and dividing by the frame rate, it could drift out of sync with the audio.
Let’s explore the ideas behind PTS (Presentation Timestamp) and DTS (Decoding Timestamp) and why they matter for video playback. Understanding these timestamps is the key to understanding the warning and to making sure that transcoding and playback work properly.
1. Identifying PTS and DTS
Two important timestamps connected to video frames are the DTS (Decoding Timestamp) and PTS (Presentation Timestamp). They are essential in guaranteeing that the audio and video streams are correctly synced when being played back.
DTS helps video decoders preserve the proper frame order by indicating when a video frame has to be decoded. PTS allows for synchronized playing of audio and video by defining when a frame should be shown to the viewer.
2. Video and Audio Syncing
Maintaining synchronization between the audio and video streams is essential for a flawless viewing experience. This means both streams must follow the same timeline, so that each video frame is shown together with the audio that belongs to it.
But the audio sample rate and the video frame rate are independent of each other and will not always line up. Deriving display times only by counting frames and dividing by the nominal frame rate can gradually cause synchronization problems, for example when the real frame rate differs slightly from the nominal one or when frames are dropped. DTS and PTS solve this problem by attaching explicit timestamps to the data.
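To make the drift concrete, here is a small arithmetic sketch. The numbers are assumptions chosen for illustration: a clip whose true rate is 30000/1001 fps (about 29.97) and a naive player that assumes exactly 30 fps and only counts frames.

```python
from fractions import Fraction

# Assumed values for illustration only.
TRUE_FPS = Fraction(30000, 1001)   # actual frame rate of the clip
ASSUMED_FPS = Fraction(30, 1)      # what the naive player believes


def naive_display_time(frame_index: int, fps: Fraction) -> Fraction:
    """Display time in seconds if we only count frames and divide by fps."""
    return Fraction(frame_index) / fps


def drift_after(seconds_of_video: int) -> Fraction:
    """Seconds by which the naive clock lags after the given real duration."""
    frames_shown = int(seconds_of_video * TRUE_FPS)
    true_time = naive_display_time(frames_shown, TRUE_FPS)
    naive_time = naive_display_time(frames_shown, ASSUMED_FPS)
    return true_time - naive_time


drift = drift_after(3600)
print(f"drift after one hour: {float(drift):.2f} s")
```

With these assumed rates the video clock ends up roughly 3.6 seconds away from the audio after one hour, which is far more than a viewer would tolerate. Explicit timestamps avoid this because each frame states its own time instead of relying on a count.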
3. DTS and PTS’s Roles
PTS and DTS timestamps give exact control over when each audio and video frame is decoded and presented. They are essential for keeping the streams synchronized during playback.
While PTS directs the appropriate timing of the presentation of these frames to viewers, DTS helps the video decoder decide when to decode each video frame.
Decoding may be disrupted, and audio and video may fall out of sync, if the DTS values are not monotonically increasing. The warning message indicates that FFmpeg has met a DTS value in a stream that is not greater than the one before it.
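As a minimal sketch of what "not monotonically increasing" means, the check below walks a list of DTS values for one stream and reports every position where the value fails to increase. The function name and the sample values are hypothetical, and strict increase is assumed here; some container formats tolerate equal values.

```python
from typing import List, Tuple


def find_non_monotonic_dts(dts_values: List[int]) -> List[Tuple[int, int, int]]:
    """Return (index, previous_dts, dts) for each packet whose dts is not
    strictly greater than the dts of the packet right before it."""
    problems = []
    for i in range(1, len(dts_values)):
        prev, cur = dts_values[i - 1], dts_values[i]
        if cur <= prev:
            problems.append((i, prev, cur))
    return problems


# Hypothetical DTS values in time-base ticks: one repeat and one step back.
sample = [0, 512, 1024, 1024, 900, 2048]
print(find_non_monotonic_dts(sample))
# -> [(3, 1024, 1024), (4, 1024, 900)]
```

Running a check like this over the packet timestamps of a problem file can help tell apart a single glitch from a stream whose timestamps are broken throughout.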
4. Managing DTS that isn’t monotonic
To resolve the warning and get clean transcoding and playback, you can either correct the DTS values or try the -fflags +genpts option in FFmpeg. This option asks FFmpeg to generate missing PTS values when DTS values are present.
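For a pipeline that launches FFmpeg from code, the option could be passed as in the sketch below. The helper name and the file names are hypothetical, and no codec options are set, so FFmpeg would pick its defaults for the output format. Note that -fflags +genpts is placed before -i because it applies to the input.

```python
import shlex
from typing import List


def build_genpts_command(input_path: str, output_path: str) -> List[str]:
    """Build an ffmpeg argument list that enables genpts on the input."""
    return [
        "ffmpeg",
        "-fflags", "+genpts",  # input option: generate missing PTS
        "-i", input_path,
        output_path,
    ]


cmd = build_genpts_command("input.mp4", "output.mp4")
print(shlex.join(cmd))
# To actually run it (requires ffmpeg on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Whether this removes the warning depends on the input file, so it is worth checking the FFmpeg log again after the run.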
When DTS and PTS values are aligned properly, audio and video stay synchronized, providing a smooth and error-free viewing experience.
Effective video processing and transcoding require an understanding of the roles that DTS and PTS play in playback and synchronization, particularly when working with user-generated content. By taking care of non-monotonic DTS values, you can keep your video processing pipeline reliable and deliver high-quality video to your viewers.
Decoding timestamps (DTS) and presentation timestamps (PTS) are two fields that packets in a stream may carry. Codecs such as H.264 (AVC) and H.265 (HEVC) store video as three kinds of frames: I-frames, P-frames, and B-frames.
• I-frames contain a full image.
• P-frames depend on previous I- and P-frames and are like diffs or deltas.
• B-frames are like P-frames but depend on information found in frames that are displayed both before and after them.
When B-frames are not used, PTS and DTS are the same. The problem arises when B-frames are used: a B-frame can only be decoded after the later frame it refers to, so the decode order differs from the display order. Each packet carries PTS and DTS information, but what we need is the PTS of the newly decoded raw frame, so that we know when to display it.
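The reordering can be illustrated with a tiny hand-made stream. Everything in this sketch is hypothetical: four frames shown in the order I B B P, stored in the order I P B B because the P-frame must be decoded before the B-frames that depend on it. Timestamps are in arbitrary time-base ticks.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    frame_type: str  # "I", "P", or "B"
    pts: int         # when the frame is displayed
    dts: int         # when the frame is decoded


# Packets in storage (decode) order. DTS starts at -1 so that every frame
# is decoded no later than it is displayed (pts >= dts).
packets = [
    Packet("I", pts=0, dts=-1),
    Packet("P", pts=3, dts=0),
    Packet("B", pts=1, dts=1),
    Packet("B", pts=2, dts=2),
]

decode_order = [p.frame_type for p in packets]
display_order = [p.frame_type for p in sorted(packets, key=lambda p: p.pts)]

print("decode order: ", " ".join(decode_order))    # I P B B
print("display order:", " ".join(display_order))   # I B B P
```

The DTS values rise steadily in storage order while the PTS values jump around, which is normal. It is the DTS sequence, not the PTS sequence, that the muxer expects to keep increasing.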
This warning is thrown when FFmpeg finds that the decoding timestamps associated with samples or frames are not increasing monotonically. Usually, FFmpeg takes care of these kinds of issues on its own.
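One simple way such a sequence could be repaired is to bump every offending value to just above the previous one. The sketch below illustrates that idea only; it is not FFmpeg's actual code, and the function name and sample values are hypothetical.

```python
from typing import List


def enforce_increasing_dts(dts_values: List[int]) -> List[int]:
    """Return a copy in which each dts is strictly greater than the last,
    raising offending values to previous + 1."""
    fixed: List[int] = []
    for dts in dts_values:
        if fixed and dts <= fixed[-1]:
            dts = fixed[-1] + 1  # smallest value that keeps the order valid
        fixed.append(dts)
    return fixed


print(enforce_increasing_dts([0, 512, 1024, 1024, 900, 2048]))
# -> [0, 512, 1024, 1025, 1026, 2048]
```

A repair like this keeps the muxer satisfied but changes the timing of the adjusted packets slightly, so the result should still be checked by playing the output.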
With FFmpeg, you can also regenerate the PTS for a video using -fflags +genpts, but in my experience this usually does not solve the issue.
If your video plays fine, then you should not worry about these kinds of warnings.
When playing back multimedia, non-monotonically increasing DTS can result in a number of problems:
• Frames presented out of order may cause visual artifacts and a jumbled picture.
• Audio and video can drift apart when frames are not presented in the correct sequence, causing a mismatch between the audio and the corresponding video frames.
• To preserve synchronization, some multimedia players may drop frames, resulting in the loss of visual data.