Blu Digital – Fere nihil sine deo – thoughts on AI, localization and humanity
Silviu Epure, Senior Vice President, Content Globalization, Blu Digital Group
“It was the best of times, it was the worst of times…”
What a glorious decade for global media distribution. Content consumption is higher than it’s ever been, borders have been stretched, pushed or removed entirely, “foreign” content is captivating “foreign” audiences and the inaccessible is finally becoming accessible to all.
This high-stakes global distribution marathon is made possible and fueled by the propagation of Localization and Accessibility. In simple terms, Localization takes content from one language and creates a set of parallel files (audio tracks, subtitle files, artwork, etc.) in a different language than the original. In contrast, Accessibility sets forth the creation of additional audio, video and text files that enhance the viewing experience of individuals with visual, auditory or cognitive disabilities.
This is the groundwork that gives every piece of content the ability to make its way into the hearts and minds of audiences anywhere in the world. Whether it’s bringing the vision of a Brazilian film director into the home of a Norwegian movie lover, or whether it’s helping a sightless child in France “see” an Australian animated series, the value of the myriad of writers, translators, adapters, voice actors, voice directors, audio engineers, audio mixers, production coordinators, producers and all other creatively talented people that make Localization and Accessibility happen cannot be overstated.
Nevertheless, as the demand for localization and accessibility increases and the flow of content rages, so do the logistical and capacity challenges that accompany it – limited talent availability, diminishing studio capacity, supply chain inability to keep up with demand, etc.
And so, in an effort to self-correct, the market trend has been to push localization producers into a frenzy, demanding that months of much needed creative production work should happen within weeks, that respect for artistic detail should take a back seat to “out the door-ism”, and linguistically nuanced subtleties (which enhance viewer experiences but which slow down deliveries) should find a warm place in the corner and die.
In contrast and in defiance of general market trends, a handful of global studios, broadcasters, streamers, distribution outlets and vendors continue to fight off internal / external pressures and stay true to high artistic standards of quality. Even through failure, the aim remains high on every minute of content created – that’s because this group of professionals knows all too well that even the best piece of global storytelling can be brought to its knees by a bad localization experience.
And then, there was Automation and Artificial Intelligence.
According to techopedia.com, “Automation is the creation and application of technologies to produce and deliver goods and services with minimal human intervention.” According to the same source, Artificial intelligence (AI) is “a branch of computer science that focuses on building and managing technology that can learn to autonomously make decisions and carry out actions on behalf of a human being.”
Automation has been used and refined in various industries since the 1700s, when the first spinning mill was created. Its ability to constantly improve efficiencies, increase outputs and reduce costs by speeding up tasks that were previously performed entirely by humans has been crucial to the development of our global supply of modern day goods and services.
AI, on the other hand, has fascinated writers, movie makers and computer scientists for decades, but it was not widely applied until its recent accelerated breakthroughs placed it front and center in our news cycles and, for some, our lives.
And so, the pattern becomes clear – in our collective efforts to satisfy our society’s ever-growing wants, we’re moving from “minimal human intervention” towards “on behalf of a human being”.
Is this the right path forward? I believe the answer to that question is nuanced and highly dependent on the field in which we are exploring its applications, on the ethical / moral / philosophic perspectives that we choose to employ in our analysis, as well as our openness and ability to accept and embrace change.
I do think one thing is certain though – from a business perspective, ignoring the patterns and the realities of the era is akin to working for Blockbuster, watching the first Netflix billboard go up and thinking that it’s not worth your limited attention.
Artistic, Creative, Human vs. Everything Else.
The rabbit hole of thought when it comes to AI’s pros and cons is deeper and wider than one can imagine at first. Once you jump in, you can praise, despise, embrace and fear its potential applications, all in the same breath. And although we don’t quite know the real side effects that come with ChatGPT passing the US Bar Exam, I’m sure it would be worthwhile to give them some more consideration before we unleash its true potential into the world.
In the localization space, just like in many other industries, the cost / time saving opportunities brought forth by automation and AI are simply too financially seductive and operationally valuable to be ignored. From automated workflows and AI driven speech-to-text transcriptions, all the way to instantaneous AI translations, neural / synthetic voices and deepfake video manipulation, the applications seem to be endless.
That being said, in an effort to ensure that these newfound processes and technological advances help heal the industry’s quality-quantity divide rather than deepen it, it’s important to acknowledge that every form of localization service incorporates two components:
— The first component includes highly creative tasks which require human talent, subjectivity, experience and artistic vision – tasks such as translation, voice acting and creative audio mixing.
“The” Translation
Translation can mean different things to different people. If you’re traveling abroad in search of making friends around the world, translation is simply a tool that you use to communicate. In that context, scrolling through a dictionary or using an AI powered headset / visor to help translate sentences serves the exact same function. The process doesn’t need to be artistic, creative or perfect.
However, translation in the context of subtitling or dubbing TV and film content requires a completely different analysis. Every movie and TV show ever made is a manifestation of a message created by the show’s writer / director / cast. When you’re translating TV and film content from one language into another, you’re not only translating the words spoken by the on-screen characters. You’re actually translating the message of the original creators into a new language and developing a new creative experience for members of a different culture.
Linguistic nuances, formality of context, cultural improprieties – choosing the right word, at the right time, in the right circumstances, for the right character – that is a truly artistic / creative / human endeavor that AI cannot get close to successfully mimicking (for now).
“The” Voice Acting
Up until 10 years ago I had never thought about how a dub is created. I had worked in media production and distribution for many years prior but I always looked at those “foreign language audio tracks” as lifeless imitations of the original content. It was only when I started producing dubs that I realized the enormity of the challenge and the importance of the result.
So let’s paint a mental picture together where You are the voice actor.
You step into a semi-dark room equipped with a microphone and a TV screen. You put on a pair of headphones and you’re asked to pay attention to the picture, the voice director, the audio engineer, the note taker and the script, all at the same time. You are also asked to perform in such a way that seems as though you’re there, in the action, living the same adventures, breathing the same air and having the same type of emotions as the character you see on screen. But don’t just mimic the emotions, do it in a way that is appropriate for your language, your friends and your culture. And don’t just say the words, sync them to the lips of the character that you see on screen. And not just the words – every audible gesture, every drop of saliva, every sigh. And now do it again for a different character, in a different way. And do it well.
Session after session, day after day, talented, dedicated voice actors put on those headphones, stand in front of that microphone and transform silence into sound. Through their performances, they create foreign language dialogue tracks that sound just as good as (or sometimes even better than) the original. They offer authentic, culturally appropriate, creative experiences without ever showing their faces. And although text-to-speech, synthetic voicing, audio cloning, and other facets of AI are making truly impressive advances in the space, they cannot begin to generate the qualitative results that these talented actors are producing or the human emotions that their performances evoke.
“The” Creative Audio Mixing
Simply put, audio mixing is the process of taking various audio tracks (dialogue, music, sound effects) and blending them all together to create an authentic / immersive audio experience.
In a dub, every time you hear a loud, echoey voice on stage or a whispered sob on a phone, that was the creative decision of an audio mixer, choosing the “just right” volume levels, reverbs, effects, modulations and everything else needed to create that perfect immersive experience. All of these voices may have been recorded in a silent soundproofed recording booth and yet it is knowledge, creativity, passion and dedication that make them sound as if they were recorded on the street / in a car / on stage or in outer space.
— The second component of localization includes logic-driven sequences of actionable tasks, such as asset / project management, transcription, timing, and technical QC (to name a few).
This component can and should be automated / AI driven because it provides the ability to optimize production capacity, support the supply chain and ensure both quantity and quality of output.
By choosing to spend less time dealing with spreadsheets, emails, manual file transfers and many other essential yet repetitive, action-driven production processes, we gain the ability to invest more human time into creative translations, voice recordings and mixing sessions.
The same is true about cost optimization – by using automated / AI driven workflows in transcription, timing and technical Quality Control, we can invest more resources into creative / linguistic / artistic QA.
Ultimately, I believe automation and AI can be a blessing or a curse depending on how we choose to use them. One road leads to optimization, growth and enhanced artistic experiences, while the other leads to the deepening loss of our creativity and quite possibly – our Humanity.
I hope we choose wisely.