Sergio Grce
CEO, iSIZE
The amount of video content being distributed is only going in one direction, and that is up. According to Cisco, over half of global IP video traffic (56.8%) will be HD and just under a quarter (22.3%) will be Ultra HD by 2022. This demand for high-resolution video inevitably requires a trade-off somewhere along the line, either in bandwidth or in the end-user experience. Higher-resolution video, which consumers increasingly expect as standard, also typically means higher bitrates, which can result in slow starts, video buffering and high content delivery network (CDN) and storage costs. This is bad news for the viewer and bad news for the content provider.
The continuing surge in online media consumption means our industry faces two pressing challenges. First, there is unprecedented stress on network infrastructures worldwide, which not only creates content delivery bottlenecks, but also affects how content can be distributed efficiently to the ever-growing numbers of viewers. Second, this rapid rise in content consumption and delivery also has a huge impact on the industry’s environmental footprint.
In the perennial drive to balance efficiency and capacity, interest in the perceptual optimisation of video – in other words, the processing of digital video streams to deliver the uncompromising quality that users expect without a simultaneous uptick in bandwidth – is rising. Traditionally, the world of digital video has looked to compression technology to address these issues, working to increase the efficiency and sophistication of the codecs it uses – but this brings much higher levels of complexity and is highly processor-intensive.
We are now facing a situation where the increase in video encoding complexity is outpacing Moore’s Law. Even with more GPU and CPU capacity to encode video content, the sheer volume of content being produced and watched means we will very quickly outstrip the compute cycles available. In parallel, the carbon footprint of the internet is estimated to be greater than that of the aviation industry, and this is something we need to address.
As a company, we believe that the only way we will meet the growing demand for online video, while reducing processing, energy and storage requirements, is through disruptive innovation for video streaming. For us this takes the form of new deep perceptual pre- and post-processing, encoding and delivery tools that are device-aware and cross-codec compatible.
We are laser-focused on helping customers solve the challenges they face, and we are working towards this aim through our own R&D efforts as well as through projects such as the SEQUOIA R&D project partnership. Deep perceptual optimisation of video streams is a key focus for us as a way of reducing the bandwidth required for equal quality, and iSIZE has built up extensive expertise in this domain.
A unique approach for an urgent challenge
iSIZE believes a unique approach is needed for the increasingly urgent challenge of finding trade-offs between the various metrics, between bitrate and perception, and between more content and a lower environmental impact, all while managing processing and encoding complexity. Instead of relying on more complex codecs and greater GPU/CPU capacity, we believe the more sensible route is to reduce the bandwidth needed for high-quality video streaming. We have directed our patent-pending artificial intelligence (AI) features and machine learning, combined with the latest advances in perceptual quality metrics, to this aim. By reducing the bits required for elements of the image that perceptual metrics tell us are not important to human viewers, our technology can deliver perceptual quality that is optimally balanced against encoding bitrate.
If we are to make real headway as an industry, the most effective and efficient approach is to implement a server-side deep perceptual pre-processing step that enhances details in the areas of each frame that affect the perceptual quality score of the content after encoding. In this way, we do not change the encoding, packaging, transport or decoding mechanisms, unlike solutions such as LCEVC. Furthermore, we can be fully compatible with any encoding, streaming and playout device with zero modifications. By using a method that is codec-agnostic and optimises legacy encoders such as AVC as well as HEVC, AV1 and VVC, we no longer need to know the encoding specifics of each encoder, and so can remove an added layer of complexity.
Most pre-processing solutions use sharpening techniques to deliver perceptual optimisation. iSIZE comes at the problem from a different angle: we maintain the perceptual characteristics of the source and eliminate the need for the multi-pass encoding and in-the-loop integration used by many other optimisation tools. We have created a single-pass pre-processing solution that needs no metadata or integration with the subsequent encoding engine(s) and delivers significant gains in quality.
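The arrangement described above, a single-pass stage placed in front of an unmodified encoder, can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption and not iSIZE's implementation: `Frame`, `perceptual_preprocess`, `toy_encoder` and `run_pipeline` are hypothetical names, the pre-processor is a stub that only clamps sample values, and the "encoder" simply packs samples into bytes. The only point it makes is structural: the encoder receives ordinary frames and nothing else, so any standard encoder could take its place.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical frame type: one frame as a flat list of 8-bit luma samples.
Frame = List[int]
Encoder = Callable[[Iterable[Frame]], bytes]


def perceptual_preprocess(frames: Iterable[Frame]) -> Iterator[Frame]:
    """Single-pass stage: one frame in, one frame out, no side metadata."""
    for frame in frames:
        # A real system would run a trained model here. This stub only
        # clamps samples to the valid 8-bit range so the sketch is runnable.
        yield [min(255, max(0, sample)) for sample in frame]


def toy_encoder(frames: Iterable[Frame]) -> bytes:
    """Stand-in for any standard encoder; it sees plain frames only."""
    return b"".join(bytes(frame) for frame in frames)


def run_pipeline(frames: Iterable[Frame], encoder: Encoder) -> bytes:
    """Pre-process, then hand frames to an unmodified encoder."""
    return encoder(perceptual_preprocess(frames))
```

Because `run_pipeline` takes the encoder as a parameter, swapping one codec for another in this sketch means passing a different function; the pre-processing stage does not change.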
Deep learning for optimised results
iSIZE challenges the accepted norms within the video delivery industry by placing our technology before the encoder. We also ensure that our solution does not depend on a specific codec, and it optimises both for low-level metrics like SSIM (structural similarity index measure) and for higher-level, more perceptually oriented metrics like VMAF, Apple’s AVQT metric or AI-based perceptual quality metrics like LPIPS. In fact, we are able to deliver average bitrate savings in excess of 20% compared with the same encoder and codec on their own. On top of this, our technology has been designed in a way that does not break coding standards; this means it can easily be used in existing distribution chains and with existing client devices without causing disruption to customers’ workflows. Thanks to the single-pass approach and agnosticism to coding standards, we are also able to ensure easy deployment on custom hardware or high-performance CPU/GPU clusters.
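For readers less familiar with the metrics named above, the following Python sketch shows the core SSIM formula and the arithmetic behind a percentage bitrate saving. It is a simplification under stated assumptions: standard SSIM is computed over local windows and averaged across the frame, whereas `global_ssim` here treats the whole input as one window and uses population statistics. The constants 0.01 and 0.03 are the commonly used SSIM defaults. Both function names are hypothetical and have no connection to iSIZE's tooling.

```python
def global_ssim(ref, test, data_range=255.0):
    """Single-window SSIM between two equal-length sequences of luma samples.

    Simplified: real SSIM averages this formula over local windows.
    """
    if len(ref) != len(test) or len(ref) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(ref)
    # Stabilising constants, using the commonly cited defaults.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_r = sum(ref) / n
    mu_t = sum(test) / n
    var_r = sum((a - mu_r) * (a - mu_r) for a in ref) / n
    var_t = sum((b - mu_t) * (b - mu_t) for b in test) / n
    cov = sum((a - mu_r) * (b - mu_t) for a, b in zip(ref, test)) / n
    numerator = (2 * mu_r * mu_t + c1) * (2 * cov + c2)
    denominator = (mu_r * mu_r + mu_t * mu_t + c1) * (var_r + var_t + c2)
    return numerator / denominator


def bitrate_saving_percent(baseline_kbps, optimised_kbps):
    """Percentage bitrate reduction relative to the baseline encode."""
    return 100.0 * (baseline_kbps - optimised_kbps) / baseline_kbps
```

As a purely hypothetical illustration of the arithmetic, an encode that needs 5,000 kbps without pre-processing and 4,000 kbps with it at the same quality score would give `bitrate_saving_percent(5000, 4000)`, a 20% saving.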
In a nutshell, we have created a methodology that delivers significant savings in two key areas. First, it reduces the bitrate a standard codec requires to deliver a given quality level. Second, if bitrate saving is not the only goal, our technology can be used to make the actual encoding much faster: up to 500% faster, or more in the case of VP9, AV1 or VVC encoding.
Leveraging our knowledge and expertise in AI and deep neural networks, we have elegantly answered one of the growing challenges faced by the industry: sustainable distribution of Ultra High Definition content, while limiting the impact of video on internet traffic and reducing distribution costs.
At iSIZE we believe that, by proactively reducing energy consumption at all stages within the media value chain, this type of innovation can make a difference to every aspect of media distribution, delivering benefits for the whole sector. With efficiency a key buzzword and environmental ramifications a rising concern, the need to reduce energy consumption and eliminate complexity is front of mind for anyone who delivers content. We are already working with customers to roll our technology out in several vertical sectors, including gaming, social media and entertainment streaming, and will be making announcements in the months ahead.