Fraunhofer FOKUS, FAME | FAMIUM Deep Encode

Contact Person

Daniel Silhavy

Project Manager

Business Unit FAME

+49 30 3463-7680

FAMIUM Deep Encode

What is it about?

Video streaming content differs in complexity and requires title-specific encoding settings to achieve a given visual quality. A classic “one-fits-all” encoding ladder ignores these characteristics and applies the same encoding settings to all video files. In the worst case, this leads to wasted bandwidth and storage, quality impairments, and a bad user experience. Our title-based encoding solution has the potential to significantly decrease the storage and delivery costs of video streams while improving the perceptual quality.

Title-based encoding

The efficiency of a video codec is closely linked to the spatial and temporal redundancy of the input video. A video that contains a lot of movement and many scene changes (e.g., sports and action movies) is much “harder” to encode than a video in which most parts are redundant or change slowly over time (e.g., animated movies). Hence, different types of content require different bitrate settings to achieve a given quality. As a result, a title-based encoding ladder offers major advantages over the classic approach of a “one-fits-all” encoding ladder.
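To make the notion of spatial and temporal redundancy concrete, the following minimal sketch computes two simple complexity proxies on grayscale frames. The function names, the frame representation (nested lists of luma values), and the tiny example clips are assumptions made for illustration; they are not the features used by FAMIUM Deep Encode.

```python
from statistics import mean
from typing import List

Frame = List[List[int]]  # grayscale frame: rows of luma values in 0..255


def spatial_complexity(frame: Frame) -> float:
    """Mean absolute difference between horizontally and vertically adjacent pixels."""
    diffs = []
    for y, row in enumerate(frame):
        for x, value in enumerate(row):
            if x + 1 < len(row):
                diffs.append(abs(value - row[x + 1]))
            if y + 1 < len(frame):
                diffs.append(abs(value - frame[y + 1][x]))
    return mean(diffs) if diffs else 0.0


def temporal_complexity(frames: List[Frame]) -> float:
    """Mean absolute luma difference between consecutive frames."""
    diffs = [
        abs(a - b)
        for prev, curr in zip(frames, frames[1:])
        for row_prev, row_curr in zip(prev, curr)
        for a, b in zip(row_prev, row_curr)
    ]
    return mean(diffs) if diffs else 0.0


# Hypothetical 2x2 clips: one static, one where every pixel flips each frame.
static_clip = [[[10, 10], [10, 10]]] * 3
busy_clip = [[[0, 255], [255, 0]], [[255, 0], [0, 255]], [[0, 255], [255, 0]]]

print(temporal_complexity(static_clip), temporal_complexity(busy_clip))
```

A clip with low values for both proxies is highly redundant and should need less bitrate for the same quality than a clip with high values.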

Standard per-title encoding solutions identify the optimal encoding settings for a single asset in a complexity analysis step, prior to the actual production encodes. The most common way to determine this complexity is to perform multiple test encodes. Based on the resulting bitrate/quality value pairs (with quality measured as PSNR, SSIM, or VMAF), content-specific bitrate/resolution ladders are derived and applied. The major downside of such a solution is that the test encodes are computationally very heavy and therefore not applicable in a live video streaming scenario.
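The step from test encodes to a content-specific ladder can be sketched as follows. This is a simplified illustration, assuming that each test encode yields a resolution, a bitrate, and a VMAF score; the `EncodeResult` type, the `derive_ladder` function, and all numbers are hypothetical and only show the selection logic.

```python
from typing import Dict, List, NamedTuple


class EncodeResult(NamedTuple):
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # bitrate of the test encode
    vmaf: float        # measured quality score (0..100)


def derive_ladder(encodes: List[EncodeResult], target_bitrates: List[int]) -> Dict[int, int]:
    """For each target bitrate, pick the resolution whose test encode at or
    below that bitrate reached the highest quality."""
    ladder = {}
    for target in target_bitrates:
        candidates = [e for e in encodes if e.bitrate_kbps <= target]
        if candidates:
            best = max(candidates, key=lambda e: e.vmaf)
            ladder[target] = best.height
    return ladder


# Hypothetical test-encode results for one title (illustrative values only).
results = [
    EncodeResult(540, 1000, 72.0),
    EncodeResult(720, 1000, 65.0),
    EncodeResult(720, 3000, 88.0),
    EncodeResult(1080, 3000, 84.0),
    EncodeResult(1080, 6000, 95.0),
]
ladder = derive_ladder(results, [1000, 3000, 6000])
print(ladder)
```

With these made-up numbers, 540p is the better choice at 1000 kbps and 720p at 3000 kbps, which a fixed ladder would not capture. Each entry in `results` stands for one full test encode, which is where the computational cost comes from.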

FAMIUM Deep Encode

The FAMIUM Deep Encode tool leverages different machine learning techniques to avoid the computationally heavy test encodes. Examples of algorithms in this area include decision trees (e.g., Boosted Decision Trees) and Deep Neural Networks (e.g., Recurrent Neural Networks or Deep Convolutional Networks). Through constant validation and retraining of our models, the accuracy and efficiency of the algorithms improve with each input, and the entire system learns on its own.
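As a rough illustration of how a boosted decision tree model can replace test encodes, the sketch below fits a small gradient-boosted ensemble of depth-1 regression trees (stumps) that maps content features to a bitrate. It assumes two hypothetical input features (spatial and temporal complexity) and made-up training targets; the actual features, model architecture, and training data of FAMIUM Deep Encode are not described in this text.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, ...]  # feature vector, e.g. (spatial, temporal) complexity


def fit_stump(xs: List[Sample], residuals: List[float]) -> Callable[[Sample], float]:
    """Fit a depth-1 regression tree that minimises squared error."""
    best = None  # (error, feature index, threshold, left mean, right mean)
    for f in range(len(xs[0])):
        for threshold in sorted({x[f] for x in xs}):
            left = [r for x, r in zip(xs, residuals) if x[f] <= threshold]
            right = [r for x, r in zip(xs, residuals) if x[f] > threshold]
            if not right:  # split must leave samples on both sides
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, f, threshold, lm, rm)
    if best is None:  # no valid split: predict the mean residual
        m = sum(residuals) / len(residuals)
        return lambda x: m
    _, f, t, lm, rm = best
    return lambda x: lm if x[f] <= t else rm


def fit_boosted(xs: List[Sample], ys: List[float],
                rounds: int = 50, lr: float = 0.3) -> Callable[[Sample], float]:
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + lr * sum(s(x) for s in stumps)


# Hypothetical training data: (spatial, temporal) complexity -> bitrate in kbps
# needed to reach a fixed quality target. Values are illustrative only.
train_x = [(1.0, 0.5), (2.0, 1.0), (6.0, 5.0), (8.0, 7.0)]
train_y = [1500.0, 1800.0, 4200.0, 5000.0]
model = fit_boosted(train_x, train_y)

print(model((1.5, 0.7)), model((7.0, 6.0)))
```

Once trained, such a model only needs features extracted from the source video, so a prediction is far cheaper than a series of test encodes. A production system would use an established library and far more training data.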

Key facts

Title-based encoding solution for live and VoD content leveraging machine learning techniques like Decision Trees and Deep Neural Networks
Constant validation and retraining of the machine learning models to improve the accuracy and efficiency of the algorithms
Calculation of video quality metrics like PSNR, SSIM and VMAF
Detection of complex metadata like scene changes, object classification and labeling for each video
Codec-agnostic approach that can be applied to H.264, H.265, VP9, AV1, etc.
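Of the quality metrics listed above, PSNR is the simplest to compute: it is defined as 10 · log10(MAX² / MSE), where MSE is the mean squared error between the reference and the distorted frame. The following sketch applies that definition to 8-bit grayscale frames; the `psnr` function name and the nested-list frame representation are assumptions made for this example.

```python
import math
from typing import List


def psnr(reference: List[List[int]], distorted: List[List[int]],
         max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    squared_errors = [
        (r - d) ** 2
        for ref_row, dist_row in zip(reference, distorted)
        for r, d in zip(ref_row, dist_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical frames: no noise at all
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 frames: every pixel of the distorted frame is off by one.
reference = [[100, 120], [140, 160]]
distorted = [[101, 119], [141, 159]]
print(round(psnr(reference, distorted), 2))
```

SSIM and VMAF model perceived quality more closely and are considerably more involved, so they are usually computed with dedicated tools rather than by hand.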

Related Projects

The FAMIUM Packager leverages our per-title encoding solution to adjust the bitrate ladder before transcoding.
Our end-to-end ad insertion solution for MPEG-DASH and HLS benefits from content-specific encoding settings.
The MPEG-SAND-based Streaming Analytics framework uses the VMAF and PSNR values to calculate a session-specific QoE score.

More About this Subject

Watch the "AI-Powered Per-Scene Live Encoding" presentation by Anita Chen at the 2020 W3C Workshop on Web and Machine Learning