In Conversation with iSIZE

We are joined by Sergio Grce (CEO) and Yiannis Andreopoulos (CTO) from iSIZE to discuss how they recently raised $6.3 million in funding to tackle the environmental impact of streaming. We hear more about the impact of streaming and the challenges the industry is facing, along with how iSIZE is helping customers tackle these challenges and reduce their carbon footprint.

Revolutionizing OTT And E-Commerce Through Computer Vision

This blog post by Prachi Jain is the winning entry in Quantiphi’s February’21 QuriousWriter Blog Contest.

Think of the last movie you streamed. There must have been clothing, an accessory, or a food product that you saw on screen and couldn’t resist the temptation of buying. For instance, the elegant gown that Emma Watson wears in the movie Little Women or the watches that Robert Downey Jr. flaunts on screen.

Impulsive buying behaviour is something all of us have indulged in at some point. In fact, impulse buying is considered to be one of the significant factors in boosting sales volume in the retail sector.

Driven by the curiosity stimulated by triggers such as advertisements and online content, we are often tempted to look up the price and other details of the things that catch our attention. Adding these things to our wish lists gives us a feeling of gratification even when we cannot purchase the object of desire.

Matching your purchase intent to the media content you are watching is a revolutionary concept. While most advertising can seem tone-deaf and interruptive, here a brand is woven throughout the digital content and is far more likely to grab your attention.
This opens up multiple possibilities for hyper-commercializing OTT media streaming platforms with e-commerce. Retailers can offer viewers instant buying options as they watch their favourite shows.

Hyper-commercialization enables advertisers to influence consumers’ attitudes, awareness, and behaviours as they engage with the media. The holistic motivation behind the hyper-commercialization of content is to bring products closer to the purchase intent.

OTT Media: Next frontier of content consumption

Over-the-top (OTT) platforms refer to media streaming services offered via the internet directly to viewers. OTT bypasses traditional media channels like cable, broadcast, and satellite television. At present, streaming services form an integral part of the media, entertainment and gaming ecosystem, with capabilities to scale up and serve global and niche audiences alike.

Advertisers’ Interest in OTT platforms

OTT offers many benefits over traditional media. Research shows this audience segment is highly receptive to OTT advertising and rarely skips the ads. OTT enables viewers to watch their desired content at their convenience and on their preferred devices. It also improves data transparency, as every data point about the user is known. Consumption patterns on OTT devices offer valuable information for targeted advertising.

These channels make microtargeting possible because of the deep insights into audience segments and their viewership behaviour. OTT is similar to traditional media in terms of media format publishing and ad placement, which offers additional opportunities to target online consumers.

Revenue models for OTT

OTT businesses earn revenue via various models, such as the subscription, advertising, hybrid, and transactional models. Introducing shops within the content viewers are watching has the potential to create a significant revenue stream for OTT media services.

Following the trends in the OTT space, technological advancements will continue to empower the media industry to a great extent. 5G offers up to 100x faster speeds, low latency, smooth 4K and VR streaming, and reduced buffering. With the rise of 5G, the transition towards video streaming will surge.

With the outbreak of the pandemic, there has been a significant rise in content consumption over OTT due to easy accessibility, more available time, reduced social activity, and limited access to outdoor pursuits.

During the COVID-19 lockdown phases, subscription-led OTT services like Netflix, Disney+ Hotstar, and Amazon Prime launched new movies on their platforms to maximize viewership and subscription revenue while offering families fresh entertainment options. Since viewers prefer the platform with not only the best content but also the best UX features, such as easy navigation, legible fonts and colors, and readable, attractive thumbnails, the OTT space is witnessing fierce competition to impress consumers with UX.

How computer vision is transforming OTT

A video makes a variety of objects visible in the scene. Viewers are attracted to the vibrant clothing, jewelry, furniture, food, electronic appliances, and accessories they see while consuming content, and computer vision can easily identify these items. If viewers are offered an option to buy these products at the moment they like them, they are more likely to purchase them. This will certainly reduce the drop rate caused by the inaccessibility of these items. Integrating e-commerce options with an OTT app and aligning those options with the scene being viewed will reduce the decision-to-action time.

The metadata in the digital content is linked to the brand’s e-commerce site, pointing to the exact item so its availability can be checked. Other options include offering similar alternative brands and/or products, or a Google search for the products identified via computer vision.

Computer vision algorithms help detect objects in the scene and either fetch purchase options from the metadata or run an image search on those objects and store the resulting purchase link. The latter, however, is the less accurate option. The possible match combinations, exhaustively, are:

  • Same brand and product
  • Same brand but different product
  • Same product but a different brand
  • Similar product and similar brand
  • Similar brand but different product
  • Similar product but a different brand
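
To make the fallback concrete, here is a minimal Python sketch of the matching preference these combinations imply. The detector output, catalog shape, field names, and URLs are illustrative assumptions, not part of any specific product or API.

```python
def match_product(detection, catalog):
    """Pick the best purchase option for a detected on-screen item.

    Preference order mirrors the combinations above: exact
    brand + product first, then same-brand and same-product
    partial matches, then nothing (triggering an image search).
    """
    exact = [c for c in catalog
             if c["brand"] == detection["brand"]
             and c["product"] == detection["product"]]
    if exact:
        return exact[0]
    same_brand = [c for c in catalog if c["brand"] == detection["brand"]]
    if same_brand:
        return same_brand[0]
    same_product = [c for c in catalog if c["product"] == detection["product"]]
    if same_product:
        return same_product[0]
    return None  # fall back to an image search in the real flow

# Illustrative catalog and detection:
catalog = [
    {"brand": "AcmeWear", "product": "gown", "url": "https://example.com/g1"},
    {"brand": "AcmeWear", "product": "watch", "url": "https://example.com/w1"},
]
print(match_product({"brand": "AcmeWear", "product": "gown"}, catalog)["url"])
```

A production system would rank candidates by visual similarity rather than exact string matches, but the fallback ladder is the same.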

Impact on stakeholders

Hyper-commercialization comes with advantages for all the stakeholders – consumers of OTT, media houses owning OTT platforms, and content creators in the production houses. Customers gain the ease of buying their favorite items at the moment of inspiration, get more options to interact with the media, combine viewing time with shopping time, and enjoy a live purchasing experience. These customers save a lot of time researching products and gain the satisfaction of purchasing original, authentic products.

The unsurpassed joy of shopping while watching content is that viewers get to buy items used by their favorite celebrities and feel emotionally connected to them. This emotional appeal drives more traffic to the content and increases sales for the brand endorsed within it.

Production houses will have the opportunity to obtain sponsorships from brands and thus reduce production costs. Moreover, they can attune their content to gain maximum traction without altering the storyline, emotions, or intent of the script. This also gives brands an opportunity to create their own content with an interesting storyline. Hyper-commercialization benefits media houses by generating additional sales revenue, creating the possibility of commissions from brands for technical integration and publishing strategy, offering visibility to content consumers, and bringing products closer to them.

However, hyper-commercialization doesn’t always simplify things. The concept needs technical refinement to avoid inflating the media file size with metadata. The user interface must not be overloaded with e-commerce options, and in-scene shops can suffer from metadata-loading and detection-accuracy delays when enabled during motion-heavy scenes.

Customers may face issues watching content if this feature is not diligently implemented, or may not use it at all if it is buried in nested settings menus. Hence, a smart UX with an easy-to-understand flow that strikes a balance without compromising the viewing experience is indispensable. OTT companies also need to ensure that the links in the metadata are authentic, despite the commercial pressure to promote other brands over the original brand. Production houses face the risk of losing authenticity and, in turn, customer loyalty.

Conclusion

Hyper-commercialization in the OTT space is bound to gain importance in the near future. Spectacular advancements in OTT have just begun. Creating an ecosystem where all stakeholders are represented, especially through the integration of e-commerce and media, presents tremendous scope for growth.

Content Localization For Media Industry – The AI Way

Part – 1: Achieving conformance through Video Intelligence & Generative AI

Audiences today have an international content palette. The worldwide success of international titles like “Parasite” and Spanish shows like “Money Heist” only confirms that content now traverses global boundaries. As OTTs go global and studios and sporting leagues launch their own OTT platforms, the need for technology that localizes this volume of content will grow at an enormous scale. With timelines that put distribution and post-production teams under undue pressure, and stringent and diverse global distribution norms for content, this is a perfect use case for Artificial Intelligence (AI).

Today, media companies create content in more than 30 languages, and the list grows with each rollout in new territories. AI provides a competitive advantage by eliminating redundancies in content localization workflows while enabling teams to focus on the creative elements of the job.

What are the challenges in localizing content?

Whether you are a broadcaster or a renowned brand like Pepsi, the challenges of repurposing content in multiple languages are creatively demanding and effort-intensive. When media and entertainment companies look to distribute a TV show or a movie worldwide, they broadly face four kinds of challenges.

  • Conformance – frame-level alignment of audio, video, and subtitles/closed captions
  • Compliance – platform distribution requirements, audience ratings, edits of explicit portions, product placements
  • Translation & Dubbing – dubbing and voiceovers, generating subtitles and closed captions
  • Remastering & Upscaling – improving content quality, colorization, upscaling low-resolution content, and preventing video degradation

In this 4-part blog series, we will discuss each of the challenges in detail and how AI is solving them.

Part #1 focuses on how AI helps in achieving conformance.

What makes conformance a tricky challenge?

Conformance means achieving frame-level accuracy between the video, audio, and text components of content. Different content providers and intermediaries have their own standards for content to be acceptable. One study shows that almost 70% of submitted content gets rejected because of non-conformance, picture quality issues, errors, or a lack of local context.

The first task is to align the video content with the audio. Spoken English is concise enough to align with video in post-production, but it is a manual task nonetheless. Mapping the video content to languages like Spanish, French, or Russian, however, is tricky even for advanced editors using Avid Media Composer or Adobe Premiere.

This gets more complicated when editors come across longer, more complex sentence structures – in Spanish, say. When an English video needs to be synced with Spanish, the equivalent verbiage in Spanish is longer and, of course, takes more time to say. If the content runs a minute or more, the video can drift ahead of the audio: the viewer sees the video content before the audio is spoken or completed, causing dissonance.
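
The drift can be quantified with simple arithmetic. The 15% expansion factor below is an illustrative assumption (actual English-to-Spanish expansion varies by script), but it shows how quickly an uncorrected dub falls out of sync:

```python
expansion = 1.15                # assume the dub runs ~15% longer (illustrative)
english_segment_s = 60.0        # one minute of English dialogue
drift_s = english_segment_s * (expansion - 1)
print(f"Drift after one minute: {drift_s:.0f} s")        # 9 s

# Uncorrected, the same rate compounds linearly across a feature:
feature_s = 90 * 60             # a 90-minute film in seconds
print(f"Drift over the feature: {feature_s * (expansion - 1):.0f} s")  # 810 s
```

Even a few seconds of drift per minute is enough to break the picture-to-dialogue match, which is why editors must repeatedly re-anchor the audio to the video.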

Another challenge is aligning the speakers’ lip-sync with the flow of the video (note: not with the spoken words – but we will solve that too).

The goal is for the audio content to match what the viewer is seeing. These challenges make content sharing via OTT a tedious and expensive process. Editorial teams need deep expertise in editing audio tracks, manipulating lip-sync, and slowing down or speeding up the video on the timeline. This is costly both in time to market and in editor fees.

During localization, sub-clips or proxy versions of the content are created, including edits for compliance, programming rules, duration changes due to language, and so on. For distribution on OTT, the audio versions need to be perfectly synchronized to a master video. AI can eliminate simple defects, such as the pauses or overruns that occur when audio tracks are misaligned.

Audio conformance

AI can learn from editorial metadata and remove redundancies in alignment operations. The Video Intelligence solution can pinpoint areas of misalignment between audio and video without editors having to go through the entire content. The benefits of such a solution are two-fold.

  • The solution frees up editors’ creative bandwidth by 4x and improves itself through “Active Learning.”
  • AI provides the speed and accuracy needed for large-scale content localization operations. Human errors would get content rejected by OTT delivery platforms, and without fast edits, orders can quickly be lost in the programming rotation.
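
The post does not describe the solution’s internals, but one common signal-processing approach to pinpointing such misalignment is cross-correlating audio energy envelopes of the master and the localized track. The sketch below (pure Python, illustrative only, with a toy envelope) finds the frame offset that best aligns two envelopes:

```python
def best_offset(ref, dub, max_lag):
    """Return the lag (in frames) that best aligns `dub` to `ref`
    by maximizing their cross-correlation.  `ref` and `dub` are
    per-frame audio energy envelopes (lists of floats)."""
    def corr(lag):
        # Sum of products over the frames where both envelopes overlap.
        pairs = [(ref[i], dub[i + lag]) for i in range(len(ref))
                 if 0 <= i + lag < len(dub)]
        return sum(a * b for a, b in pairs)
    return max(range(-max_lag, max_lag + 1), key=corr)

# A toy reference envelope and a copy delayed by 3 frames:
ref = [0, 0, 1, 4, 9, 4, 1, 0, 0, 0, 0, 0, 0]
dub = [0, 0, 0, 0, 0, 1, 4, 9, 4, 1, 0, 0, 0]
print(best_offset(ref, dub, 5))  # 3
```

A real pipeline would compute envelopes from decoded audio, normalize them, and report per-scene offsets rather than a single global lag, but the correlation idea is the same.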

Subtitle conformance & closed captioning

Closed captioning and subtitling is another complex process, often riddled with flaws in sentence construction or in the precise timing of captions to audio and video. Using AI, captions can be created or aligned automatically, with perfect synchronization and in compliance with FCC specifications.

Deep-fake enabled dubbing

Here is where things get really interesting with Generative AI capabilities. Using deep-fakes, you can adjust a character’s lips to conform to the spoken words. This is the next level of customization, solving the lip-sync problem in a way that was not possible earlier.

Deepfake-enabled localization automatically edits audio stems as per localization guidelines for various applications, such as marketing, compliance, and dubbing.

Recently, Malaria Must Die ran a deep-fake campaign featuring ace footballer David Beckham to spread awareness in nine different languages.

What does a state-of-the-art content localization solution look like?

  • Infrastructure agnostic, whether on cloud or on-premises
  • Customizable to different platforms & distribution formats
  • Actively learns from metadata generated through editorial operations
  • Doesn’t feel like AI

Quantiphi, with its Video Intelligence and Generative AI capabilities, offers multiple modular sets of services that fit seamlessly into existing content operations. We intend to make localization easier by eliminating redundant functions so that teams can focus their efforts on the creative and nuanced aspects of their work.

To learn more about Quantiphi’s Content Localization capabilities, email me at patrick.murphy@quantiphi.com.

Contributed by: Sankalp Chaudhary & Niraj Nishad

Intelligent Content Creation and Distribution With Artificial Intelligence

The science of artificial intelligence (AI) has become indispensable to the art of captioning. The rise of AI in captioning and transcription solutions can be attributed to many factors. However, the one that stands out is the rise of voice technology. Speech-to-text technology is the most rapidly emerging technology in the closed-captioning arena. The growth in the speech-to-text market is fueled by:

  • Growth in smart speakers and intelligent voice assistants on mobile phones and other devices
  • Increased government spending on education for the disabled
  • Rising number of people with learning challenges
  • Rise of elderly populations’ reliance on technology

The speech-to-text API market is expected to grow from US$1.6 billion in 2019 to US$4.1 billion by 2024, at a CAGR of 20.6%, according to a MarketsandMarkets report. The report also notes that North America is expected to hold the largest share of the global speech-to-text API market, while Asia Pacific is expected to grow at the highest CAGR. North America is also the largest contributor of revenue to the speech-to-text API market.
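
As a quick sanity check on those figures, compounding US$1.6 billion at 20.6% annually over the five years from 2019 to 2024 does land on the US$4.1 billion estimate:

```python
# Cross-check the cited market figures: 1.6 * (1 + 0.206)^5 ≈ 4.1
start, cagr, years = 1.6, 0.206, 2024 - 2019
projected = start * (1 + cagr) ** years
print(f"Projected 2024 market: US$ {projected:.1f} billion")  # US$ 4.1 billion
```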

Based on the above figures, we know that speech-to-text is turning out to be the most significant factor for content creators and distributors when it comes to generating effective and accurate captioning. We have a very concrete example of how AI speech-to-text and translation engines increase both the speed and quality of content development. We at Digital Nirvana realize we are at the cusp of the golden age of AI and machine learning.

With AI-enabled automatic captioning, broadcasters can create content, make it searchable, and translate it into multiple languages, enabling content localization so that users across the globe can consume it. Speech-to-text engines generate quality metadata through translation – another application of natural language processing, which has taken a massive leap in the past few years. AI captioning solutions can take spoken language, turn it into text, and then convert it into any other language. That’s exactly what Digital Nirvana’s Trance does, with a very high degree of accuracy.

You can check out our blog, where we detail how AI is transforming the closed-captioning landscape through our success story. At Digital Nirvana, we understand that AI elevates the value of content beyond mere translation and captioning. Our advanced speech-to-text engines generate rich speech-to-text metadata, and our solution leverages AI to go beyond this, generating metadata using video intelligence as well.

<Download Trance brochure to know how you can leverage AI for your captioning workflows>

Our captioning solution, Trance, leverages AI to enable facial recognition and logo recognition, which is a massive boon for sports broadcasters. These broadcasters are obligated under FCC regulations to display a logo a certain number of times during live streaming of events and to classify advertisements. This is where our machine learning and AI workflows come into play: we can take an ad and, using speech-to-text, computer vision, and machine learning, automatically determine what the advertisement is about and whether it is a restricted or a free ad. This enhances the workflow tremendously and reduces the time to put an ad out into the market.

Let’s take a look at the critical features of Trance that content creators and broadcasters can leverage:

  • Web User Interface: Trance has a simple and intuitive user interface with user-specific access and customizations.
  • Transcription Page: This is where all the magic happens with the advanced speech-to-text engines equipped to handle various content types.
  • Automatic Formatting (Presets): Natural Language Processing (NLP)-based formatting of grammar and proper nouns.
  • Pro Captioning Page: Advanced caption editing features like importing captions to sidecar files.
  • Text Localization/Translations: Trance supports generating transcripts in more than 30 languages and translates them into more than 100 languages.

Digital Nirvana leverages two decades of speech-to-text and knowledge management expertise to deliver greater productivity and shorter turnaround times, improving both the speed and accuracy of the captioning process, all in an easy-to-use interface. Contact us for a personalized demo of Trance, where our experts will take you through the solution.

Accelerate your post-production caption generation with Trance from Digital Nirvana

Did you know that AI is the primary driving force behind captioning and transcription solutions? Media organizations leverage AI-driven solutions to elevate post-production workflows. A leading tennis broadcaster in America was grappling with an inefficient and time-consuming captioning process. They leveraged Digital Nirvana’s Trance captioning application and reduced turnaround time from hours to 30 minutes. Trance has exceeded our client’s expectations in several unique ways. Learn more about how we’ve positively impacted this organization.

Digital Nirvana’s Trance brings the AI advantage to your transcription, captioning, and translation workflows.

Convert raw video to compliant captioned content in 90 minutes using AI-driven Trance

Content consumption has grown exponentially in the past year, fueled by consumers’ insatiable demand. The resulting growth in the number of OTT platforms provides excellent opportunities, as well as challenges, for content providers. Their goal is to deliver high-value content tailored to the specific requirements of each platform, including multi-language transcriptions and compliant closed captions, and to do it as fast and efficiently as possible. Our client, a leading entertainment short-form news provider, chose the AI power of Trance to tackle these challenges. Trance empowered our client to generate captions within 90 minutes of asset ingest and deliver fully compliant content, conforming to exact style guideline requirements.

Digital Nirvana’s Trance brings the AI advantage to your transcription, captioning, and translation workflows.

Delivering fast, accurate captions for VOD sports programming: The AI-enabled Trance story

The FCC guidelines that went into effect in 2017 posed new challenges for our client, one of the largest sports and news networks in the U.S. The company needed a solution that would provide FCC caption compliance while also producing captions in the shortest time possible, regardless of volume. They leveraged Trance, Digital Nirvana’s AI-powered captioning and transcription solution, to generate highly accurate captions within 12 hours of original airing.

Digital Nirvana’s Trance brings the AI advantage to your transcription, captioning, and translation workflows.

Maximising Video Revenues through AI – On Demand Webinar

Artificial intelligence (AI) is transforming from a competitive advantage to an essential technology in the media sector.

In this webinar, you’ll discover how AI and machine learning can drive profitable growth for your organisation beginning now. Learn how leaders across business functions use AI to:

  • Engage customers
  • Wrangle big data
  • Improve efficiency
  • Maximise video revenues

Brought to you by Symphony MediaAI, a technology company with 30 years of media experience and deep roots in AI innovation.

How to secure e-learning content as piracy risks intensify

As e-learning plays an ever greater role in employee training and academic life, the value of the associated video content is soaring—but so, too, is its value to thieves.

Fortunately, e-learning producers and distributors can turn to cloud-based content protection solutions to minimize these threats. First, they must engage with a service that can cover all major DRMs and other piracy prevention techniques. Second, they must demand unfailing service performance and the lowest total cost of ownership (TCO).

Download the white paper to discover how modern content protection technologies can safeguard your valuable e-learning services while delivering the lowest TCO.