MediaTech Intelligence

AI for audio: where automated production delivers benefits

Journal Article from Salsa Sound

Tue, 18 Jan 2022

Rob Oldfield

Co-Founder and CEO, Salsa Sound

Live sport is one of the most watched content genres and in an era where on-demand and non-linear viewing is edging out linear, it continues to buck the trend. Fans at home want a live experience that is better than being at the stadium, with multiple camera angles and immersive sound. Clubs and sports leagues also want their content to reach as many fans around the globe as possible – online, on linear and on social media.

What this means for broadcasters is creating flawless audio and video, making multiple language versions available and delivering live content as it happens to social media. The challenge facing the industry as a whole is that budgets and staffing numbers are not changing; everyone is trying to do more for less.

The job of a sound engineer is becoming harder rather than easier as new technologies and formats come to the fore, involving more time-intensive, manual processes. The average production can already involve creating over 16 mixes; anything more puts stress on an already stretched workflow – or requires an increased workforce.

Fans’ expectations are not getting any lower either – quite the opposite, in fact. So, how do broadcasters keep meeting fans’ expectations for live content experiences that continually push the envelope?

Getting immersive with AI

Enter AI. The technology enables sound engineers to automatically create bespoke, enhanced and 360 immersive audio experiences, with multiple simultaneous mixes that give fans the ultimate listening experience whatever their device or preferences.

By automating the manual-heavy processes, AI-based solutions free up sound engineers to focus on becoming creatives; they can craft rather than chase a mix. This means a far richer audio experience for the end viewer.

Bringing the kind of intelligent automation that AI makes possible into the workflow gives broadcasters a cost-effective way to meet the growing demand for coverage of even the most niche sports, while still delivering the high-end results that align with their brand.

In a traditional live production environment, audio is a job on its own, requiring a level of experience and expertise that other members of the production team do not have. When it comes to niche productions, everything – including cost – has to be pared right down. Here, automated mixes that can be set up ahead of time mean a small production crew – with one person handling audio, video and graphics – can take care of the job. It is all about making the job easier and giving broadcasters a way to meet the demand for niche content (e.g., reserve or development team matches) that audiences are showing a growing appetite for.

AI can automatically render to multiple formats, mix multiple language versions and even create different crowd flavours (for home and away matches, for example). Intelligent automation means each mix is made compliant with the loudness standards and parameters required by social media platforms, linear broadcast, VOD or OTT.
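To make the loudness point concrete, here is a minimal sketch of the arithmetic behind bringing one measured mix to several delivery targets. It is an illustration, not a description of how any particular product works. The names `TARGET_LUFS`, `gain_to_target`, `db_to_linear` and `plan_gains` are hypothetical, and the target values are placeholders: -23 LUFS is the EBU R 128 programme loudness target, while the social and OTT figures are assumptions that should be checked against each platform's current delivery specification.

```python
# Illustrative loudness targets in LUFS. Only the broadcast value comes from
# a published standard (EBU R 128); the others are assumed placeholders.
TARGET_LUFS = {
    "linear_broadcast": -23.0,  # EBU R 128 programme loudness target
    "social": -14.0,            # assumption: placeholder value
    "ott": -24.0,               # assumption: placeholder value
}


def gain_to_target(measured_lufs, target_lufs):
    """Gain in dB that moves a mix from its measured loudness to the target."""
    return target_lufs - measured_lufs


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)


def plan_gains(measured_lufs):
    """Return the dB gain needed for each delivery target."""
    return {
        name: gain_to_target(measured_lufs, target)
        for name, target in TARGET_LUFS.items()
    }


# Example: a mix measured at -20 LUFS integrated loudness.
gains = plan_gains(-20.0)
for name, gain_db in gains.items():
    print(f"{name}: {gain_db:+.1f} dB (x{db_to_linear(gain_db):.3f})")
```

A static gain offset like this only addresses integrated loudness. Real compliance also involves measuring the mix in the first place and keeping true peaks and loudness range within each platform's limits, which this sketch does not attempt.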

Working with what you have

Doing more for less – without adding complexity – is a top priority for any broadcaster today. What sets Salsa Sound apart is that we have developed our AI technology to integrate with and make use of existing infrastructure. We deliver all the automation capability and the ability to create stunning, immersive sound using a standard microphone set-up.

By taking audio feeds from existing broadcast microphones at a stadium, we can use AI algorithms that automatically detect, mix in and enhance the on-pitch/court/ring sounds. As a result, the sound engineer can create engaging real-time mixes without the need for additional kit.

The beauty of this approach is that you do not have to be a top-flight club or Tier 1 broadcaster to give viewers amazing, immersive sound. By making use of what is already in place, we are opening up the power of AI to smaller clubs, niche sports and even applications outside premium live sports broadcasting.

By adding automation to the workflow in an intelligent way that actually adds value, we can open up a world of possibility.

A new world of sound

Making effective use of AI is all about leveraging the data; if we can start to see microphones as data capture devices rather than just sound recorders, it opens up the possibility of mining that vast amount of data for sounds that give more meaning to video.

One key area where this approach comes into its own is highlights creation. AI technology can be trained, for example, to automatically select clips based on what is happening in the audio feed and to record those events as part of the metadata, picking up sections where the referee/umpire has been active or vocal, or where the crowd noise changes. Commentary is another area where you can use machine learning to generate speech-to-text output and, in turn, create powerful metadata describing the content. This can help with content tagging, searching or segmenting and provide a better service for content creators and viewers.
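As a rough illustration of treating a microphone as a data capture device, the sketch below flags moments where the crowd level jumps well above its recent average and turns them into simple metadata entries. It is a deliberately simple energy threshold rather than a trained model, and it is not a description of Salsa Sound's method. The function names, the `crowd_surge` tag and the `ratio` and `history` parameters are all hypothetical choices made for the example.

```python
import math


def frame_rms(samples, frame_len):
    """Split samples into non-overlapping frames and return each frame's RMS level."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return levels


def find_surges(levels, ratio=2.0, history=3):
    """Return indices of frames louder than `ratio` times the mean of the
    previous `history` frames."""
    surges = []
    for i in range(history, len(levels)):
        baseline = sum(levels[i - history:i]) / history
        if baseline > 0 and levels[i] > ratio * baseline:
            surges.append(i)
    return surges


def to_metadata(surges, frame_len, sample_rate):
    """Convert surge frame indices into timestamped metadata entries."""
    return [
        {"time_s": i * frame_len / sample_rate, "tag": "crowd_surge"}
        for i in surges
    ]


# Synthetic crowd feed: steady murmur, one loud burst, then murmur again.
sample_rate = 100  # samples per second, kept tiny for the example
crowd = [0.1] * 400 + [0.5] * 100 + [0.1] * 100

levels = frame_rms(crowd, frame_len=100)
surges = find_surges(levels)
metadata = to_metadata(surges, frame_len=100, sample_rate=sample_rate)
print(metadata)
```

In practice the detection step would be far richer, for example a classifier trained to recognise whistles, ball kicks or crowd reactions, but the output has the same shape: timestamps and tags that a highlights or search tool can consume.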

While sport, and football in particular, is a natural starting point for AI-based audio mixing, this is by no means where the possibilities end. Audio recognition is relevant to everything from game shows to other live entertainment genres.

We are still just scratching the surface of what AI can do to take audio in live production to a whole new level.