大象传媒

Machine Learning for Video Coding Optimisation

大象传媒 R&D is investigating Rate-Distortion Optimisation for faster, more efficient video compression.

Published: 8 April 2020
  • Saverio Blasi, Lead Research Engineer
  • Marta Mrak, Lead R&D Engineer

Fast and efficient video compression is vital for the 大象传媒, and we've written previously about how and why 大象传媒 Research & Development is using machine learning (ML) to optimise this process.

As part of this research, we are also investigating Rate-Distortion Optimisation (RDO) techniques for estimating the quality of a delivered video frame given a specific number of bits, using Convolutional Neural Networks (CNNs).

As well as compressing video into as few bits as possible, an encoder has to compress it in a way that provides the best possible viewing experience. It does this best when it can estimate what the quality of a delivered frame would be for a given number of bits.

This estimate can be achieved using Rate-Distortion Optimisation (RDO) techniques, so that we can select appropriate compression parameters. This enables delivery of the highest quality content for a given bit rate. To help streamline the process, we are investigating a novel approach based on machine learning.
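For orientation, RDO is commonly written as the minimisation of a Lagrangian cost. The article does not say which formulation is used in this work, so the expression below is the standard textbook form, and every symbol in it is an assumption introduced here: p is a candidate set of compression parameters drawn from the allowed set P, D(p) is the distortion, R(p) is the number of bits, and lambda is a multiplier that sets the trade-off between the two.

```latex
J(p) = D(p) + \lambda \, R(p),
\qquad
p^{*} = \arg\min_{p \in \mathcal{P}} J(p)
```

A larger lambda penalises bits more heavily and so favours stronger compression; a smaller lambda favours quality.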

What we're doing

Our aim is to achieve an efficient RDO design by focusing on the most bit-consuming part of the video: intra-predicted frames. Here, the compression comes from reducing redundant spatial information (neighbouring pixels in a frame which are typically similar).

Compression of other frames, i.e. inter-predicted (motion-compensated) frames, typically needs fewer bits than intra-prediction, because neighbouring frames are very similar.

We have focused on the challenge of deciding how many bits to spend on coding a given frame. This helps us to deliver the best video quality for a given bandwidth, without cutting bits so far that picture quality becomes unnecessarily poor.

Our approach

We are using neural networks to estimate the number of bits needed to represent a compressed frame, and the associated frame quality at that compression level. Typically, these values are known only once a frame has been encoded. Our approach aims to be faster by estimating them for multiple compression (RDO) parameters, without actually encoding the frame.
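To make the idea concrete, here is a minimal Python sketch of how estimated bits and distortion could drive a parameter choice without running an encoder. It is not the authors' implementation: the function `pick_qp`, the callables `estimate_bits` and `estimate_distortion`, and the toy numbers are all hypothetical stand-ins for whatever the trained estimators would return.

```python
from typing import Callable, Iterable, Optional, Tuple


def pick_qp(
    qps: Iterable[int],
    estimate_bits: Callable[[int], float],
    estimate_distortion: Callable[[int], float],
    bit_budget: float,
) -> Optional[Tuple[int, float, float]]:
    """Return (qp, bits, distortion) with the lowest estimated distortion
    among candidates whose estimated bits fit the budget, or None if no
    candidate fits."""
    best = None
    for qp in qps:
        bits = estimate_bits(qp)
        if bits > bit_budget:
            continue  # this setting would exceed the available bits
        dist = estimate_distortion(qp)
        if best is None or dist < best[2]:
            best = (qp, bits, dist)
    return best


# Toy stand-ins for the two estimators (made-up numbers, not real CNN output).
toy_bits = {22: 90_000.0, 27: 60_000.0, 32: 40_000.0, 37: 25_000.0}
toy_mse = {22: 4.0, 27: 9.0, 32: 20.0, 37: 45.0}

choice = pick_qp(toy_bits, toy_bits.__getitem__, toy_mse.__getitem__,
                 bit_budget=50_000)
print(choice)  # (32, 40000.0, 20.0)
```

The search itself is cheap; in this sketch the only real cost would be evaluating the two estimators once per candidate setting, which is the step the approach aims to make faster than a full encode.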

More specifically, our estimation of RDO parameters is achieved using two CNNs. CNNs have become increasingly popular in recent years for their performance in tasks such as video classification, segmentation and super-resolution. In our method, one CNN is used to estimate the number of bits, and the other is used to estimate the distortion (reduction in quality) that would be obtained after compressing an intra-frame.

CNN diagram

The first CNN (CNN #1) takes an original frame as input and estimates how many bits are needed to save it at a certain compression setting (i.e. quality level, defined by Quantisation Parameter, QP). The second CNN (CNN #2) takes the same frame and produces estimated distortion maps, i.e. the pixel-wise difference between the original frame and compressed frame.
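As an illustration of what a distortion map represents, the following Python sketch computes one from an original and a decoded frame and reduces it to two widely used summary figures, mean squared error (MSE) and PSNR. It is an assumption-laden example rather than the published method: the article does not say whether the map holds signed, absolute or squared differences (signed is assumed here), the helper names are hypothetical, and the 2x2 "frames" are made-up 8-bit values.

```python
import math
from typing import List, Sequence

Frame = Sequence[Sequence[int]]


def distortion_map(original: Frame, decoded: Frame) -> List[List[int]]:
    """Pixel-wise signed difference between two frames of equal size."""
    return [[o - d for o, d in zip(row_o, row_d)]
            for row_o, row_d in zip(original, decoded)]


def mse(dmap: Sequence[Sequence[float]]) -> float:
    """Mean of the squared per-pixel differences."""
    squares = [v * v for row in dmap for v in row]
    return sum(squares) / len(squares)


def psnr(mse_value: float, peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB for a given MSE (8-bit peak by default)."""
    if mse_value == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak * peak / mse_value)


original = [[100, 102], [98, 101]]
decoded = [[101, 100], [98, 103]]

dmap = distortion_map(original, decoded)
print(dmap)       # [[-1, 2], [0, -2]]
print(mse(dmap))  # 2.25
print(psnr(mse(dmap)))
```

An estimated map from a network could be summarised in the same way, which is what would let it stand in for the distortion measured after a real encode.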

With this approach, the estimated results are close to those which would be achieved with real encoding, thanks to accurate predictions enabled by the CNNs. Overall, this means that CNN-based estimation can help video compression in choosing the best compression parameters.

The open-source software is now available in the 大象传媒 GitHub:

More details about our approach can be seen in the paper presented at the IEEE International Conference on Visual Communications and Image Processing (VCIP 2018).

What's next?

Our initial results demonstrate that new machine learning algorithms can be used to create advanced video coding tools. We are actively working towards further optimisations by applying ML to discover new compression solutions, especially those that enable better prediction of pixels. We are also researching ML and AI tools that are interpretable, explainable and predictable, which will allow us to create simpler and more robust visual data processing solutions.

This work was co-supported by the  through an  in collaboration with the  .
