Part 1: Achieving conformance through Video Intelligence & Generative AI
Audiences today have an international content palette. The worldwide success of international titles like “Parasite” and Spanish shows like “Money Heist” only confirms that content now traverses global boundaries. As OTTs go global and studios and sporting leagues launch their own OTT platforms, the need for technology that localizes this volume of content will grow enormously. With timelines that put distribution and post-production teams under undue pressure, and stringent, diverse global distribution norms for content, this is a perfect use case for Artificial Intelligence (AI).
Today, media companies create content in more than 30 languages, and the list keeps growing with rollouts in new territories. AI provides a competitive advantage by eliminating the redundancies in content localization workflows while enabling teams to focus on the creative elements of the job.
What are the challenges in localizing content?
Whether you are a broadcaster or a renowned brand like Pepsi, repurposing content in multiple languages is creatively demanding and effort-intensive. When media and entertainment companies look to distribute a TV show or a movie worldwide, they broadly face four kinds of challenges.
- Conformance – frame-level alignment of audio, video, and subtitles/closed captions
- Compliance – platform distribution requirements, audience ratings, edits of explicit portions, product placements
- Translation & Dubbing – dubbing and voiceovers, generating subtitles and closed captions
- Remastering & Upscaling – improving content quality, colorization, upscaling low-resolution content, and preventing video degradation
In this four-part blog series, we will discuss each of these challenges in detail and how AI is solving them.
Part 1 focuses on how AI helps achieve conformance.
What makes conformance a tricky challenge?
Conformance means achieving frame-level alignment between the video, audio, and text components of content. Different content providers and content intermediaries have their own standards that content must meet to be accepted. A study shows that almost 70% of submitted content gets rejected because of non-conformance, picture-quality issues, errors, or a lack of local context in the content.
The first task is to align the video content with the audio. Spoken English is concise enough to align with video in post-production, but doing so is a manual task nonetheless. Mapping the video content to languages like Spanish, French, or Russian, however, is tricky even for advanced editors working in Avid Media Composer or Adobe Premiere.
It gets more complicated when editors come across longer and more complex sentence structures, as in Spanish. When an English video needs to be synced with Spanish, the equivalent phrasing in Spanish is longer and, of course, takes more time to say. Over a minute or more of content, this causes the video to drift ahead of the audio: the viewer sees the picture before the corresponding line is spoken or completed, causing dissonance.
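To make the drift concrete, here is a minimal sketch with purely illustrative per-line durations, showing how small differences between the original picture and a longer dubbed track accumulate over a scene:

```python
# Minimal sketch with hypothetical durations: how a longer dubbed track
# drifts relative to the original picture, line by line.

# Each tuple is (seconds of picture for the line, seconds of dubbed audio).
# The numbers are illustrative, not measured from real content.
segments = [
    (3.2, 3.9),
    (2.5, 3.1),
    (4.0, 4.6),
    (1.8, 2.4),
]

drift = 0.0
for i, (video_s, audio_s) in enumerate(segments, start=1):
    drift += audio_s - video_s  # positive drift: the picture runs ahead of the audio
    print(f"After line {i}: cumulative drift = {drift:+.1f}s")

# Fractions of a second per line add up; over a minute of dialogue the viewer
# starts seeing shots before the corresponding words are spoken.
```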
Another challenge is aligning the speakers’ lip-sync with the flow of the video (note: not with the spoken words, though we will address that too).
The goal is for the audio to match what the viewer is seeing on screen. These challenges make distributing content via OTT a tedious and expensive process. Editorial teams need deep expertise in editing audio tracks, manipulating lip-sync, and slowing down or speeding up the video on the timeline, which is costly both in time to market and in editor hours.
During localization, sub-clips or proxy versions of the content are created, with edits for compliance, programming rules, duration changes due to language, and so on. For distribution on OTT, these audio versions need to be perfectly synchronized to a master video. AI can eliminate simple issues such as pauses or overruns that occur when audio tracks are misaligned.
Audio conformance
AI can learn from the editorial metadata and remove redundancies in alignment operations. The Video Intelligence solution can pinpoint areas of misalignment between audio and video without having editors go through the entire content. The benefits of such a solution are twofold:
- The solution frees up editors’ creative bandwidth by 4x and improves itself through “Active Learning.”
- AI provides the speed and accuracy needed for large-scale content localization operations. Human errors can get content rejected by OTT delivery platforms, and without fast edits, orders can quickly be lost in the programming rotation.
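For readers curious about what “pinpointing misalignment” can look like under the hood, here is a minimal sketch of one common heuristic: cross-correlating coarse energy envelopes of the master and localized audio tracks. It assumes both tracks are already decoded into mono NumPy arrays at the same sample rate; the function names and synthetic signals are illustrative, not part of any particular product.

```python
# Minimal sketch: estimate the offset of a localized audio track relative to
# the master by cross-correlating coarse RMS energy envelopes.
import numpy as np

def energy_envelope(signal, sample_rate, hop_s=0.05):
    """Downsample a waveform to a coarse RMS energy envelope (one value per hop)."""
    hop = int(sample_rate * hop_s)
    n = len(signal) // hop
    frames = signal[: n * hop].reshape(n, hop)
    return np.sqrt((frames ** 2).mean(axis=1))

def estimate_offset_seconds(master, localized, sample_rate, hop_s=0.05):
    """Return how many seconds the localized track lags the master (negative = leads)."""
    a = energy_envelope(master, sample_rate, hop_s)
    b = energy_envelope(localized, sample_rate, hop_s)
    corr = np.correlate(b - b.mean(), a - a.mean(), mode="full")
    lag_frames = int(corr.argmax()) - (len(a) - 1)
    return lag_frames * hop_s

# Synthetic check: the "localized" track is the master delayed by 2 seconds.
sr = 16_000

def tone(seconds, freq=440.0):
    t = np.arange(int(sr * seconds)) / sr
    return np.sin(2 * np.pi * freq * t)

master = np.concatenate([tone(1.0), np.zeros(2 * sr), tone(0.5),
                         np.zeros(3 * sr), tone(1.5), np.zeros(2 * sr)])
localized = np.concatenate([np.zeros(2 * sr), master])[: len(master)]
print(f"Estimated offset: {estimate_offset_seconds(master, localized, sr):+.2f}s")  # ~ +2.00s
```

Working on downsampled envelopes rather than raw samples keeps the comparison cheap, and applying the same idea scene by scene narrows down where the drift begins, so editors only review the flagged regions instead of the entire content.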
Subtitle conformance & closed captioning
Closed captioning and subtitling is another complex process, often riddled with flaws in sentence construction or in the precise timing of captions to the audio and video. Using AI, captions can be created or aligned automatically, with tight synchronization and in compliance with FCC specifications.
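As a simple illustration of the timing side of this problem, the sketch below groups hypothetical word-level timestamps (the kind a speech-to-text step produces) into SRT caption cues, splitting cues on length, duration, and pauses. The words, timings, and limits are assumptions for demonstration, not values from any specification.

```python
# Minimal sketch: turn word-level timestamps into SRT caption cues.

words = [
    ("Welcome", 0.00, 0.45), ("back", 0.45, 0.70), ("everyone.", 0.70, 1.30),
    ("Tonight", 1.90, 2.30), ("we", 2.30, 2.45), ("have", 2.45, 2.70),
    ("a", 2.70, 2.78), ("special", 2.78, 3.20), ("guest.", 3.20, 3.80),
]

def to_srt_time(seconds):
    """Format seconds as an SRT timestamp, e.g. 00:00:01,300."""
    total_ms = round(seconds * 1000)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def build_cues(words, max_chars=32, max_duration=3.5, max_gap=0.5):
    """Group (word, start, end) tuples into cues, splitting on length, duration, or pauses."""
    cues, current = [], []
    for word, start, end in words:
        if current:
            text_len = len(" ".join(w for w, _, _ in current)) + 1 + len(word)
            too_long = text_len > max_chars or end - current[0][1] > max_duration
            pause = start - current[-1][2] > max_gap
            if too_long or pause:
                cues.append(current)
                current = []
        current.append((word, start, end))
    if current:
        cues.append(current)
    return cues

for idx, cue in enumerate(build_cues(words), start=1):
    text = " ".join(w for w, _, _ in cue)
    print(f"{idx}\n{to_srt_time(cue[0][1])} --> {to_srt_time(cue[-1][2])}\n{text}\n")
```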
Deep-fake-enabled dubbing
Here is where things get really interesting with Generative AI capabilities. Using deep-fakes, you can make a character’s lip movements conform to the spoken words. This is the next level of customization, solving the lip-sync problem in a way that was not possible earlier.
Deep-fake-enabled localization automatically edits audio stems as per localization guidelines for applications such as marketing, compliance, and dubbing.
Recently, the Malaria Must Die campaign used a deep-fake video featuring footballer David Beckham to spread awareness in nine different languages.
What does a state-of-the-art content localization solution look like?
- Infrastructure-agnostic, whether on the cloud or on-premises
- Customizable to different platforms & distribution formats
- Actively learns from metadata generated through editorial operations
- Doesn’t feel like AI
Quantiphi, with its Video Intelligence and Generative AI capabilities, offers modular sets of services that fit seamlessly into existing content operations. We intend to make localization easier by eliminating redundant functions so that teams can focus their efforts on the creative and nuanced aspects of their work.
To learn more about Quantiphi’s Content Localization capabilities, email me at patrick.murphy@quantiphi.com.
Contributed by: Sankalp Chaudhary & Niraj Nishad