Newsbridge AI Research Leading Advancements in Multimodal Speaker Diarization

Investing in Machine Learning Techniques for Audio Data Processing 

To top off a year of record-breaking R&D advancements, Newsbridge had the pleasure of welcoming Yannis, its new AI Researcher, whose doctoral work lies in cutting-edge signal processing, machine learning, computer vision and speech analysis methods.

More specifically, Yannis specializes in Speaker Diarization, a technology that Newsbridge recognizes as a significant advancement as applied to Multimodal AI (audio analysis being an additional modality in the multimodal family)!

Currently, state-of-the-art diarization relies solely on audio processing. To challenge the latest technology advancements, Newsbridge is now combining diarization with other types of processing, i.e. video and other external metadata, to achieve 'next-gen diarization'. By applying diarization to advanced speaker and transcription detection capabilities, the company plans to stay a step ahead of existing state-of-the-art solutions.

Newsbridge believes this technology will be a crucial component applied to the platform in the years to come.

“By leading the way in multimodal diarization research, we plan on improving the accuracy of our underlying technology. By simultaneously leveraging and cross-analyzing multiple modes, such as face, object and context detection, we are improving diarization, consequently putting us at the research forefront of this state-of-the-art technology. The high-level objective is making the Newsbridge platform home to the most powerful Multimodal AI Indexing technology in the world.” – Frédéric Petitpont, Newsbridge CTO

Basic Speaker Diarization: 3 Steps to Determine the ‘Who Spoke When’ 

So let’s back up a bit: what exactly is Speaker Diarization, and what does the process look like? At a high level, Speaker Diarization can be summed up as ‘who spoke, and when’.

A visual representation of Speaker Diarization: The ‘Who Spoke and When’ technology

Let’s break basic Speaker Diarization down into 3 steps:

1) Homogeneous Segments: In Speaker Diarization, an audiovisual file is first broken down into bite-size homogeneous segments to better analyze various components. For reference, a segment is considered homogeneous when it contains the voice of only one person. This means that each extracted segment must correspond to a particular individual.

2) Segment Clustering: Next, the segments are clustered, meaning that all segments associated with a particular speaker are grouped together and annotated accordingly. To clarify, this means multiple segments from a single audiovisual file are broken down and attached to the corresponding speaker, with automated tags (i.e. speaker ID, speech time, pauses, etc.)

3) Annotation + Identified Speech Turns: We are then presented with a fully annotated audio file, with identified speech turns of the different corresponding speakers.
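The three steps above can be sketched in code. This is a minimal, hypothetical illustration of steps 2 and 3: it assumes each homogeneous segment has already been reduced to a fixed-size voice embedding (here, toy 2-D vectors; real systems use learned speaker embeddings such as x-vectors) and groups them with agglomerative clustering, the family of models mentioned later in this article.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Step 1 output (assumed): five homogeneous segments, each with an
# invented (start_sec, end_sec, embedding) triple for illustration.
segments = [
    (0.0, 2.1, [0.90, 0.10]),
    (2.1, 4.0, [0.10, 0.80]),
    (4.0, 6.5, [0.88, 0.12]),
    (6.5, 8.0, [0.12, 0.85]),
    (8.0, 9.3, [0.92, 0.08]),
]
X = np.array([emb for _, _, emb in segments])

# Step 2: cluster segments by voice similarity. A distance threshold is
# used instead of a fixed cluster count, since the number of speakers
# in a file is usually unknown in advance.
labels = AgglomerativeClustering(
    n_clusters=None, distance_threshold=0.5
).fit_predict(X)

# Step 3: annotate each segment with its speaker ID (speech turns).
turns = [(start, end, f"SPEAKER_{label}")
         for (start, end, _), label in zip(segments, labels)]
for start, end, speaker in turns:
    print(f"{start:5.1f}-{end:5.1f}s  {speaker}")
```

With these toy embeddings, the segments collapse into two speaker clusters that alternate over time, which is exactly the annotated speech-turn output described above.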

Newsbridge Speaker Diarization Research: Who Spoke What and When

Now that we know how basic speaker diarization works, relying on the single mode of audio, let’s look at what Newsbridge is doing to improve this cutting-edge technology.

When applied to the Newsbridge platform, there’s an important component added to the mix… ‘who spoke what, and when’.

So what does that mean?

Because Newsbridge already leverages built-in speech-to-text technology as part of its AI-powered media valorization platform, speaker diarization can eventually be improved by taking into account what individuals are actually saying (using NLP techniques applied to the auto-generated transcription text). This helps better match speaker turns, resulting in higher quality speech-to-text output. As this is internal research at Newsbridge, the implementation is set to debut in the next couple of years.
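One simple way to picture the 'who spoke what, and when' combination is to assign each transcribed word to the diarized speaker turn it overlaps most in time. The sketch below is purely illustrative (not Newsbridge's internal method); all timings and words are invented.

```python
# Diarized speaker turns: (start_sec, end_sec, speaker)
turns = [
    (0.0, 4.0, "SPEAKER_0"),
    (4.0, 9.0, "SPEAKER_1"),
]

# Word timings from a speech-to-text engine: (start_sec, end_sec, word)
words = [
    (0.2, 0.6, "Good"), (0.6, 1.1, "morning"),
    (4.3, 4.7, "Thank"), (4.7, 5.0, "you"),
]

def assign_speaker(word_start, word_end, turns):
    """Pick the speaker turn with the largest time overlap."""
    def overlap(turn):
        t_start, t_end, _ = turn
        return max(0.0, min(word_end, t_end) - max(word_start, t_start))
    return max(turns, key=overlap)[2]

# 'Who spoke what': each word attributed to a speaker
attributed = [(w, assign_speaker(s, e, turns)) for s, e, w in words]
print(attributed)
```

In a real pipeline this alignment would run the other way too: the text content itself (via NLP) would help refine where one speaker's turn ends and the next begins.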

Another important mode that will improve Speaker Diarization at Newsbridge is the who. Since the platform also analyzes video, the diarization pipeline will be able to take into account any publicly known (or previously tagged) speakers via facial recognition. If the speaker is recognized, the platform can draw on Wikidata information and detect whether he or she is a native speaker, further improving diarization.

Future Implications: How Will Professionals Use Speaker Diarization?  

Once applied to speech-to-text technology, Speaker Diarization has the potential to revolutionize a number of industries that work heavily in the automatic speech recognition space. By adopting this ‘who spoke when’ technology, the possibilities are seemingly endless.

For example, Speaker Diarization helps:

1) Calculate speaking times

This is especially relevant for high-profile political debates in which strict speaking-turn guidelines are given as part of public speaking ethics. In this way, analysts can measure and report speaker length, an important component of conversational analysis. It is also a great tool to analyze speaking parity between men and women, ensuring both are (quite literally!) equally heard.
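Computing speaking times from diarized output is straightforward: sum the duration of every turn attributed to each speaker. A minimal sketch, with invented turn timings:

```python
from collections import defaultdict

# Diarized speaker turns: (start_sec, end_sec, speaker)
turns = [
    (0.0, 42.5, "Candidate A"),
    (42.5, 80.0, "Candidate B"),
    (80.0, 110.0, "Candidate A"),
]

# Total per-speaker speaking time in seconds
speaking_time = defaultdict(float)
for start, end, speaker in turns:
    speaking_time[speaker] += end - start

for speaker, seconds in sorted(speaking_time.items()):
    print(f"{speaker}: {seconds:.1f}s")
```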

2) Automatically obtain subtitles associated with the person speaking

Currently, there’s no easy way for post-production teams to automatically align subtitles with speaking turns in a video. Most of this work is done manually, and it is often prone to errors (due to overlapping voices) and long turnaround times. By applying Speaker Diarization to speech-to-text detection, production teams can automate and improve this drawn-out process, auto-generating subtitles based on diarized speaking-turn segments. Improved subtitle quality is also a major win for accessibility, further assisting those who are deaf or blind (if vocal assistance is activated).
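Once turns are diarized and transcribed, generating speaker-labelled subtitles is mostly a formatting exercise. Below is a small sketch that emits SRT-style entries from diarized, transcribed turns; the timings and dialogue are invented for illustration.

```python
def to_timestamp(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

# Diarized + transcribed turns: (start_sec, end_sec, speaker, text)
turns = [
    (0.0, 2.4, "SPEAKER_0", "Welcome to the evening news."),
    (2.4, 5.0, "SPEAKER_1", "Thank you, it's great to be here."),
]

# One SRT entry per speaking turn, labelled with the speaker
srt_blocks = []
for i, (start, end, speaker, text) in enumerate(turns, 1):
    srt_blocks.append(
        f"{i}\n{to_timestamp(start)} --> {to_timestamp(end)}\n"
        f"[{speaker}] {text}\n"
    )
print("\n".join(srt_blocks))
```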

3) Better understand overlapping speeches

This applies to various scenarios, especially for journalists analyzing hundreds of media assets on any given subject (i.e. sports conferences, interviews, congressional speeches, debates, etc.) in which multiple speakers overlap one another. To publish their stories as quickly as possible, journalists need accurate transcriptions for quotes. They usually end up transcribing by ear, or, if they do have a speech-to-text tool, the output quality is often questionable at best.

4) Identify Audio Patterns

This could be a game-changer for corporate marketers working with and searching through their base of existing media assets. For example, after indexing media based on detected ads, music, jingles and applause, a user can search via audio (e.g. “I can’t remember the name of the ad, but I need the video with that ‘Rise & Shine’ song”).

5) Match Voice Fingerprints 

This could be especially useful for documentalists or archivists who are tasked with creating a digital archives library and analyzing individual video clips which may have ‘off-screen’ entities that are only heard and not seen. In this case, applied Speaker Diarization can assist with matching voice fingerprints to known individuals, simplifying the archive analysis process. Matching voice fingerprints is also effective for journalists combating deepfakes, allowing them to verify speaker identity.
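Voice fingerprint matching can be pictured as a nearest-neighbour search: compare an unknown voice embedding against a library of known speakers and keep the closest match. The sketch below uses cosine similarity on toy vectors; real fingerprints would come from a speaker-embedding model.

```python
import math

# Hypothetical library of known voice fingerprints (toy embeddings)
known_voices = {
    "Archived Reporter": [0.90, 0.10, 0.30],
    "Studio Anchor": [0.20, 0.80, 0.10],
}

# Fingerprint of an off-screen voice extracted from a clip (invented)
unknown = [0.85, 0.15, 0.28]

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Match the unknown voice to the most similar known speaker
best_match = max(known_voices, key=lambda name: cosine(unknown, known_voices[name]))
print(best_match)
```

A production system would also apply a minimum-similarity threshold, so that genuinely unknown voices are flagged rather than forced onto the nearest archived speaker.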

Final Thoughts: Continued Research for Commercialized Application   

As applied to Newsbridge’s Multimodal AI technology, Speaker Diarization is an important component in improving algorithm robustness. Depending on the quality of media assets, certain low-resolution videos and photos will be better analyzed, and thus better indexed, due to improved speech-to-text functionality.

More generally, as a technology that relies on both supervised and unsupervised Machine Learning Techniques along with Deep Learning and Agglomerative Clustering models, Multimodal Speaker Diarization is not yet a mainstream solution due to its complex nature. This is why continued research remains top priority for those interested in leading the way with commercialized application.

Newsbridge Plans Expansion with €4M in Investment led by Elaia, Signs with TF1 Group, FFF and Asharq News

Newsbridge, a cloud-based media valorization platform powered by AI, announces today its next round of investment, totaling €4 million, along with an impressive list of recently signed clients including TF1 Group, the French Federation of Football (FFF), and the recently launched Asharq News, which has an exclusive content agreement with Bloomberg Media.

Newsbridge Co-founders: Philippe and Frederic Petitpont

A new round of financing led by Elaia

The company’s latest round of €4 million in financing was spearheaded by European VC Elaia Partners, with participation from investors such as Ateme co-founder Dominique Edelin and Bpifrance, to accelerate Deep Tech R&D and scale its international strategy with a newly achieved product-market fit.

Co-founded by Frédéric and Philippe Petitpont in 2016, Newsbridge aims to develop the most advanced Multimodal Indexing AI technologies on the market. Leveraging Computer Vision and Deep Learning along with applied cognitive technologies, the platform helps Media and Sports Rights-Holders auto-index, manage and monetize voluminous amounts of audiovisual content.

For 2021, Newsbridge plans to continue its focus on Deep Tech R&D related to media valorization via Multimodal AI, along with international recruiting efforts. At the same time, the new product-market fit has enabled the company to kick-start international scaling initiatives and enhanced market penetration, focused on the sports, news and media sectors.

European VC Elaia Partners, which led the financing round, says:

“This past year, the pandemic induced rapid acceleration in digital transformation across the board. During this unprecedented time, Newsbridge’s solution and technology proved more relevant than ever. Within a short time period, the company achieved its product-market fit and successfully expanded its client portfolio.”

– Anne-Sophie Carrese, Partner at Elaia

This announcement comes on the wings of a successful past year in which the Covid crisis amplified the media and sports’ industries’ need for cloud-based and remote-friendly solutions, accelerating their digital transformation initiatives.

New clients trust the Newsbridge Platform

In 2020, the company focused on key product releases to amplify its Media Asset Management (MAM) solution with built-in AI indexing and content monetization features. International business development strategy was prioritized as well.

All of this proved to be the ideal configuration in establishing the company’s newly acquired product-market fit and acquiring 3 major clients including:

  • TF1 Group, the leader in free-to-air privately-owned television in France;
  • the French Federation of Football, France’s leading sports federation and a founding member of FIFA;
  • Asharq News, the 24/7 Arabic multiplatform news service reaching across the Arab world and beyond.

“We’re focused on strategic international business and augmenting our capacity to respond to our incoming international leads. We look forward to building on our vision of offering the most advanced media valorization platform for media and sports rights holders.”

– Philippe Petitpont, Newsbridge CEO

About Newsbridge

Newsbridge is a cloud-based platform for next gen media valorization offering Multimodal Indexing via Artificial Intelligence (AI).

Taking into account facial, object and scene recognition with audio transcription and semantic context, Newsbridge provides unprecedented access to content. Whether it be media logging, archiving, monetizing, or investigative research, the solution allows for smart media asset management.

Today our platform is used by TV Channels, Press Agencies, Major Sports Federations, Production Houses, Journalists, Editors and Documentalists to boost their media value chain.

About Elaia

Elaia is a European top-tier VC firm with a strong tech DNA. We back tech disruptors with global ambition from early stage to growth development. For the past 18 years, our commitment has been to deliver high performance with values.

We are proud to have been an active partner in over 70 startups including success stories such as Criteo (Nasdaq), Orchestra Networks (acquired by Tibco), Sigfox, Teads (acquired by Altice), Mirakl (valued $1.5B in Series D) and Shift Technology.

In Conversation with TMD

We talk to Tony Taylor, Executive Chairman at TMD about a number of topics around the company including:

  • TMD’s MAM system Mediaflex and how it enables you to Acquire, Manage and Deliver content
  • Global Customer Case studies including YLE, Wildbrain, Discovery US, National Film and Sound Archive of Australia, and DPS
  • The benefits of workflow orchestration and how important it is for a business to be able to implement automated workflows
  • TMD’s product Coeus, an intuitive cloud-based service for short and long term storage, management and protection of media content

IABM Journal 116

Published Q1 2021


Journal is the IABM magazine, released every quarter, covering hot topics within the industry and distributed widely throughout it.

In Conversation with Primestream

We are joined by Claudio Lisman, President & CEO at Primestream to discuss how the coronavirus pandemic has fundamentally changed drivers in the market and how this is influencing the way Primestream operate.

Claudio also tells us how these changes will affect them in the long term, what developments, trends and technologies they are working on, and what problems they will solve.

In Conversation with GB Labs

We are joined by GB Labs Chief Product Officer Howard Twine for an overview of their Unify Hub, looking at the technology and finding out how it can unify remote production workflows and enable collaboration.

Virtual Playout in the Cloud

Today's rapidly changing broadcast marketplace necessitates faster channel launch and much more flexible content management, all while dealing with a massive increase in 8K, HD, and UHD data requirements and the ongoing need to reduce infrastructure and production costs. The white paper reflects specifically on these requirements by investigating the possibilities provided by an open broadcast content management and playout scheme, which, unlike all other broadcast technology, is completely deployed in the Cloud.

The virtualized cloud playout platform would eliminate the need for capital technology investment and the purchase and maintenance of costly broadcast playout hardware, allowing TV channel owners around the world to outsource almost all of the technological equipment historically used for broadcasting.

The platform virtualizes the channel management process, which is managed from a standard enterprise device over an internet link secured by a firewall. Program playout can be automated to the degree necessary while also maintaining the ability to ingest live content.

The cloud-based interactive playout allows independent broadcasters to launch new television channels on short notice, whether as a permanent extension to an existing bouquet or as a supplemental program stream covering a temporary event. It is adaptable to a wide range of deployment models, allowing for stable service across baseband, hybrid IP, and virtualized, cloud-based networks.

The cloud-based virtual playout allows independent content providers to launch television networks without the high start-up costs that are traditionally associated with such projects, while still offering unrivalled versatility in format handling and transcoding. It supports almost every technological norm in today's rapidly changing broadcasting market, whether for cable, satellite, or internet streaming, live, near-live, or catch-up.

In Conversation with CGI

We are joined in this interview by Michael Thielen, Vice President for Radio solutions at CGI to discuss Remote Radio Production and how it has changed in the last 12 months.

We look at the challenges that broadcasters have faced during recent times and how CGI has been able to help its clients stay productive even with most of their staff members working remotely.

Michael looks ahead at whether broadcasters will move their production more and more to a non-stationary workflow and finally looks at other factors driving change in the Broadcast and Media market.

CGI dira Dimension specification

Download CGI's Whitepaper on Remote audio editing in radio production during the pandemic.

In Conversation with Platform Communications

We are joined in this interview by David Lawrence, Managing Director at Platform Communications to talk about the challenges in generating valuable leads in a more digital focused world.

We discuss how companies can generate leads effectively away from a traditional trade show environment and how, once they have generated them, these leads can be effectively nurtured.

David talks us through the key takeaways from Platform Communications’ recently published 5-step guide to lead generation (available to download here), and finally what steps and methodology companies will continue to use as we move out of the pandemic and back to face-to-face interaction.