MediaTech Intelligence

Meaningful metadata – the international standard

Journal Article from Spicy Mango

Fri 01 Oct 2021

Chris Wood
CTO, Spicy Mango

Machine learning technologies have advanced at an incredible pace over the last 36 months. As the world moves to more automated ways of working, I’m going to dive into how media supply chains are shaped, driven and limited by data.

AI and ML are always billed as the saviour. They appear as buzzwords on an array of the latest product datasheets, and occasionally there is good reason for it. The ability we now have to analyse video, whether clips or entire programmes, and generate meaningful metadata is second to none. From a short clip of some relatively mundane motoring content, we can identify people, places and objects, and not just at a high level. An implementation we’ve looked at recently can not only extract the colour, year and model of a Ford motorcar, but also tell us that there’s a wheel, a headlamp, a mirror. Additionally, the metadata produced can be timed, so we now have a system that can tell us not only that there’s a car in the video, but at what point in time the car arrives in the scene.
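To make the idea of timed metadata concrete, here is a minimal Python sketch. The Detection record, its field names and the sample values are assumptions made for illustration; they do not describe the output of any particular analysis product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Detection:
    """One timed label from a hypothetical video analysis service."""
    label: str
    start_s: float      # seconds from the start of the asset
    end_s: float
    confidence: float   # 0.0 to 1.0


def first_appearance(detections: list[Detection], label: str,
                     min_confidence: float = 0.8) -> Optional[float]:
    """Return the earliest start time for a label, or None if it never appears."""
    starts = [d.start_s for d in detections
              if d.label == label and d.confidence >= min_confidence]
    return min(starts) if starts else None


# Illustrative data only, not output from a real system.
SAMPLE = [
    Detection("headlamp", 14.0, 15.5, 0.91),
    Detection("car", 12.5, 31.0, 0.97),
    Detection("car", 48.0, 60.0, 0.95),
    Detection("mirror", 20.0, 21.0, 0.55),
]
```

Calling first_appearance(SAMPLE, "car") answers the question "when does the car arrive in the scene?" for this sample. The confidence threshold is a design choice: low-confidence labels such as the mirror above are ignored unless the caller lowers it.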

Moving forward, what do we do with this information? Can we make search more insightful for our users? What about counting occurrences of objects to influence recommendations? Linking to parts catalogues or online stores? How about generating caption details on screen or content categorisation and tagging? All incredibly valid use cases we couldn’t have dreamt of five years ago. Great benefits to the end consumer, but what about the precursor – and making this content available in the first place?

When media supply chains are built, they rely on a few elements: primarily, essences of video and audio, and a metadata payload to identify them. Through AI/ML, our ability to generate and augment that metadata with more information about the contents of the video is hugely useful. Downstream systems can make decisions in real time during ingest and processing about what to do with that asset, such as where it goes in the catalogue, how to categorise it into the correct price point or tier, and so on. In the case of sports content, identification of a goal can generate a clip from a highlights programme, or even the reverse, and automatically publish this to your OTT platform of choice.
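A metadata-driven decision at ingest can be as simple as a handful of rules. The sketch below is illustrative only: the Asset fields, the catalogue paths, the tier names and the action names are all hypothetical, not taken from a real platform.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical descriptive metadata attached to an incoming asset."""
    title: str
    genre: str
    tags: set[str] = field(default_factory=set)


def route_asset(asset: Asset) -> dict[str, str]:
    """Decide catalogue placement, price tier and next action from metadata alone."""
    if asset.genre == "sport" and "goal" in asset.tags:
        # A detected goal triggers automatic clipping and publishing.
        return {"catalogue": "sport/highlights", "tier": "premium",
                "action": "clip_and_publish"}
    if asset.genre == "sport":
        return {"catalogue": "sport", "tier": "premium", "action": "publish"}
    # Anything unrecognised falls through to a safe default.
    return {"catalogue": "general", "tier": "standard", "action": "publish"}
```

The point of the sketch is that the rules never look at the video itself. They are only as good as the metadata that arrives with the asset.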

On the technical front, our ability to analyse a piece of content and understand its makeup has been with us a while longer. Extracting file size, duration, codec, resolution and aspect ratio is well understood now. The smarts here relate to the way in which we use this information to drive downstream decision making. In OTT, we commonly leverage this data to make meaningful transcode choices, formatting for the correct devices and platforms.
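As a rough illustration of technical metadata driving a transcode choice, the Python sketch below picks output renditions that never exceed the source resolution. The ladder heights and bitrates are placeholder assumptions, not recommendations, and select_renditions is a hypothetical helper rather than part of any transcoder.

```python
# (height in pixels, video bitrate in kbps); illustrative values only.
LADDER = [(2160, 16000), (1080, 6000), (720, 3000), (480, 1500), (360, 800)]


def select_renditions(source_height: int, aspect_w: int = 16, aspect_h: int = 9,
                      ladder: list[tuple[int, int]] = LADDER
                      ) -> list[tuple[int, int, int]]:
    """Return (width, height, kbps) for every ladder rung at or below the source."""
    def even_width(height: int) -> int:
        # Most codecs expect even dimensions, so round to the nearest even width.
        return round(height * aspect_w / aspect_h / 2) * 2

    rungs = [(even_width(h), h, kbps) for h, kbps in ladder if h <= source_height]
    if not rungs:
        # Source is smaller than the lowest rung: keep its size, use the lowest bitrate.
        lowest_kbps = min(kbps for _, kbps in ladder)
        rungs = [(even_width(source_height), source_height, lowest_kbps)]
    return rungs
```

For a 720-line, 16:9 source, this yields the 720, 480 and 360 rungs and skips anything that would require upscaling.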

There is no doubt that the use of automation and analytics technologies will help a great deal, but there’s still a gap that the technology isn’t yet able to bridge. To make best use of automation and simplify our content chains and delivery ecosystems, an international standard for high quality metadata at source is still the key.

The modern supply chain

Having explored what these innovations mean for the consumer, how do we start to think about what they look like for the industry as a whole?

To make this useful, we need to understand what supply chains look like in today’s world (if you had asked me this question 15 years ago, you would have got a very different answer). The major brands we know and love are content businesses. Sure, they build and own technology and products, but fundamentally what is being delivered to the consumer is content. It’s the ‘product’ we’re all buying.

The notion that content starts and ends life within the same four walls doesn’t apply anymore. In fact, it is rarely even the same organisation! Assets are now transferred around the globe between content producers and service providers on a many-to-many basis. Our world is pillared by licence deals and syndication agreements, so the need to move assets in a supply chain is no longer as limited as it was when television had four channels and everything was taken care of under one roof.

Supply chains are now more complex than ever. Every organisation moving content and metadata operates with its own standard. Many of these standards are driven either by what is required to support processing or by years of platform development and integration with a variety of tools and systems. Change is often slow, and it’s very hard to adjust a system and workflow that hasn’t been designed from the ground up to be modular and to support dynamic change without affecting everything around it.

Efforts like the DPP initiative in the UK have worked hard to introduce standards and simplify the challenges of asset logistics. Looking at the member roster, and having worked with a number of these partners over the years, I can see that many are still some way away from a seamless, unified approach to logistics, which highlights how complex and fragmented the delivery ecosystem is.

Take this problem and multiply it for every syndication partner that’s using a different format and delivery method, with its own rights and licence rule variants, and we start to see how big the challenge is, with many custom integrations and content and metadata transformations all needed. Despite efforts, no one is singing from the same hymn sheet.

Lastly, there’s one other challenge that we haven’t yet touched upon. This is the availability of metadata (and I mean good metadata!) from source. Having been in this industry for a long time, I’m still surprised at how many supply chains are driven by Excel, PDF and Word documents; assets arriving via FTP with a basic document (sometimes even forwarded as an email) that incorporates no more than a title, description, series name, season identifier and episode number. Do we have a supply chain? Sure, but it’s pillared by large teams of people manually inputting data, often consolidated from disparate third party sources that are yet to be integrated.

AI/ML technologies haven’t yet evolved enough to analyse a piece of content and tell us who the director, the producer or the wardrobe assistant was, or even who the file should be syndicated to and when. Nor can they tell us the licence rules and distribution parameters.

Despite everything we can do with technology, the ability to create structured data that we can use downstream is still limited if what is received from source is either incomplete or inaccurate.
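One practical response is to check metadata for completeness the moment it arrives, before anything downstream depends on it. The Python sketch below assumes a simple dictionary record. The descriptive field names mirror the basic details mentioned earlier (title, description, series, season, episode), while the rights field names are hypothetical.

```python
# Descriptive fields that typically arrive with an asset.
REQUIRED_FIELDS = ("title", "description", "series_name", "season", "episode")
# Hypothetical rights fields that analysis of the content cannot supply.
RIGHTS_FIELDS = ("licence_start", "licence_end", "territories")


def missing_fields(record: dict,
                   required: tuple[str, ...] = REQUIRED_FIELDS + RIGHTS_FIELDS
                   ) -> list[str]:
    """Return the required fields that are absent or empty, in declaration order."""
    return [name for name in required if record.get(name) in (None, "", [])]


# Illustrative record of the kind that arrives in a spreadsheet or email.
TYPICAL_DELIVERY = {
    "title": "Episode Three",
    "description": "A short synopsis.",
    "series_name": "Example Series",
    "season": 1,
    "episode": 3,
}
```

Run against TYPICAL_DELIVERY, the check reports all three rights fields as missing, which is exactly the information someone would otherwise have to chase and key in by hand.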

So what is a data driven supply chain? I’d argue it’s the process of using data and information to make decisions on media asset logistics. Does this need AI and ML to get there? Not at all.