VP News, Associated Press
How the Associated Press applied AI to deliver automatic descriptions of live and non-live content
In 2019, the Associated Press (AP) – the world’s largest international news agency – initiated a project to leverage AI technology to shorten its production process, so that its customers could receive content more quickly while manual input was significantly reduced, freeing its own staff for more creative work. AP turned to Limecraft to enable the transformation, using Vidrovr for scene description, facial recognition and gesture detection alongside Trint for audio transcription; together these deliver a single, coherent and frame-accurate description of every individual shot.
At a recent BaM live!™ event, Sandy Macintyre, VP News at Associated Press, and Maarten Verwaest, CEO of Limecraft, gave a wide-ranging and honest account of how the project unfolded. Far from being a ‘fit and forget’ job, it proved to be a complex but ultimately very worthwhile undertaking. What follows is an edited transcript of the session, providing valuable insight for anyone considering tapping AI technology to enhance their processes: in short, the more you put in, the more you get out.
What drove Associated Press to investigate the role AI could play in its production process?
Sandy: “Our starting point was: can AI help us reduce the amount of time that we're spending on the very manual tasks, which in themselves are important, but are not part of the creative storytelling journalistic process? What we were trying to achieve was to remove what we might call the ‘grunt work’ from a workflow - the really manual stuff. That took us towards AI because one of the things that all news companies in the broadcast space spend an inordinate amount of time doing is transcription of interviews and shot listing - depicting frame-by-frame what is filmed and what's going into an edit. If AI could help us with this, we thought we could save literally tens of thousands of hours of manual time.
“To put this into context, as the world’s largest, and indeed oldest international news agency, we transmit to hundreds of broadcasters and digital publishers around the world; it all adds up to about 20,000 hours of live content or 100-150,000 edited items every year. That's literally hundreds of thousands of minutes of content that have to be manually transcribed and shot listed, so the scale is vast, and therefore the time saving could be huge. But we were also coming at this from another angle: if we can solve a problem for ourselves, we can also solve a problem for everybody who is a subscriber or customer of AP - around 700 broadcasters and probably twice that number of digital publishers; they get their content more quickly.”
What can you do with the ‘grunt’ time you’ve liberated?
Sandy: “If you go back to the days of film, you had a couple of hours between the time you shot something and the time something aired; when we got to tape, yes there was the editing process, but it was then running away someplace across town to the feed point to feed the video. This is just another big, fundamental change in how you spend that time. But crucially, at this time in the era of fake news, fact checking - getting it right - has never been more important. So if you think about most news organizations that are about both being first with the news and about being right and accurate, winning this time by using AI allows for speed and accuracy to come to the fore – and of course, more creativity in the editing process.”
Quality and accuracy are paramount
Maarten: “The quality of the output of artificial intelligence is a critical success factor for acceptance. If the word error rate is four or five percent, that’s not acceptable – the rate needs to be pushed below 2%, and that’s a challenge from an engineering point of view. We excel at pushing technology to a level where it becomes enjoyable for journalists to use. And what we found is that there is a huge gap between what a typical engineer seems to find acceptable and what the journalist will accept. There is nothing more frustrating than having to correct artificial intelligence again and again because it's been recognizing the wrong person, or the wrong word. And that's where technology has evolved a lot in the last 24 months; I would say that's where massive training comes in. People looking to engage with artificial intelligence should look for a proper human-in-the-middle interface - a proofreading interface: you have to accept that artificial intelligence will make mistakes from time to time, and that it needs correction and quality control, and that needs to go as smoothly as possible. With these three conditions - a good engine, proper training and a good user interface - you increase your chances of an acceptable solution.”
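The word error rate Maarten refers to is conventionally computed as the word-level edit distance between a reference transcript and the engine's hypothesis, divided by the reference word count. A minimal illustrative sketch follows - this is not AP's or Limecraft's actual tooling, just the standard formula:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference word count,
    computed as Levenshtein distance over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution/match
    return dp[len(ref)][len(hyp)] / len(ref)

# One wrong word in a 50-word transcript is exactly the 2% threshold cited.
ref = " ".join(f"w{i}" for i in range(50))
hyp = ref.replace("w7 ", "x7 ")
print(word_error_rate(ref, hyp))
```

At the 4-5% rates Maarten calls unacceptable, a journalist would be correcting roughly one word in every twenty - which is why the correction interface matters as much as the engine.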
What process did you adopt to roll out the AI?
Sandy: “I think we've got to be very honest and say that at the start of this process - at the start of all new technology processes - there is an element of fear: fear that people might lose their jobs, fear of change, fear that ‘I won't get the new ways’. So what we had to do was be very open and honest with people about what we were trying to achieve, which was to liberate quality time back to the folks who had to do this work. But we also had to recognize that because real time video is now such a big and growing part of the news ecosystem, that wasn't going away. And therefore, liberating that time was doubly important, because the volume has gone up. The terms of success on which we are judged are live, real time, and therefore we had to make these changes. So we effectively put together a coalition of the willing. And within that, we deliberately also asked people who were potential naysayers and doubters to come to the table because their input was super-important. And we knew that over time, they would probably become some of our biggest and most evangelical supporters, if we could get this right.
“But crucially as well, it had to be people who were actually doing this work, so that they could know the difference between how it was before, and how it will be. And also, so we could get to the tipping point between when it is faster to do it all manually, versus when AI wins the race - and also recognizing the point where, while AI still makes some mistakes, it's actually quicker to catch and correct those mistakes. So what you have ended up with is a shift to where I describe AI as the best Assistant Producer – the best intern you ever had - right now. But in time with the training, we're constantly giving knowledge back to it, creating that learning loop, so that Assistant Producer goes up through the ranks and potentially becomes the Executive Producer of this whole workflow process. That’s very much the journey that we’re on.”
All-in or phased process?
Sandy: “The first thing we quite quickly realized was that a ‘boil the ocean’ approach of throwing all our content into the AI bot and trying to get AI to recognize it just wasn't sustainable; the technology is not that smart yet. We quickly realized that a huge amount of content that appears on television screens across digital publishers every day is what you might call governmental, political, diplomatic content - Joe Biden getting off a plane, getting in a car, getting on a stage, making a speech, having a bilateral meeting etc. All of these things you could also apply to any world leader, foreign minister or celebrity. So we knew that if we could teach it to recognize these kinds of actions, we could take possibly 20 to 25% of the news content that flows through the AP system every day, and gain understanding of that – and that indeed is what we've done. We've been able to teach it, say, the top 300 names that appear most frequently on the screen in the world, and the actions that those people might take within a range of known domains.
“The second is that we absolutely did this offline - we got into the digital sandpit and we played there, so that it wasn't polluting the everyday workflows that continued. It's been a fast, iterative process which hasn't got too hung up early on about how it integrates with legacy systems. We got something that works in a beta phase in the sandpit, and then figured out how we would integrate it into the technical and editorial workflows - and let the people who've done that work in the sandpit be the ones who are leading the conversation about integration.
“This process also allows us to take a very hard line - drive it or park it. Is this going to work, how big a help is it going to be, is it worth persisting with? We thought we would get this done inside a year; perhaps more accurately, I naively thought we would get this done inside a year. And the truth is it’s probably taken us two and it will take us another one to get the amount of learnings in there for us to really drive change. It goes back to that analogy of this being as clever, sophisticated and experienced as your assistant producer. But when you create that learning loop of knowledge and experience in the real world, in 12 months’ time with the amount that will go through that learning loop, that machine will be way more experienced, way more helpful - way more like a senior member of the editorial team.”
What have been the catalysts to propelling AI into the demonstrably useful tool AP has now?
Maarten: “We’ve seen plenty of technologies like speech recognition and facial recognition maturing as standalone point solutions over recent years, and there are many AI companies out there. But users like news desks have realized that there was a serious effort required to integrate these point technologies into a workable solution. What's changed? From 2018 to 2020, Limecraft has been involved in MeMAD, a €3.5m R&D effort funded by the European Commission. That’s enabled us to combine all the different aspects - speech recognition, scene detection, gesture detection, face detection, optical character recognition, emotional tonality detection - into a single, coherent and phrase-accurate description as if it were produced by a journalist. And it’s made it searchable, effectively automating the shot-listing process in a quantum leap from technology into a solution that can be adopted by real-world users.”
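The merging step Maarten describes - folding the output of several per-modality engines into one description per shot - boils down to grouping time-coded annotations by interval overlap. The sketch below is purely illustrative; the class and field names are assumptions, not MeMAD's or Limecraft's actual data model:

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    source: str   # hypothetical engine name, e.g. "face", "speech", "ocr"
    start: float  # seconds from start of media
    end: float
    label: str

def shot_list(shots, annotations):
    """For each (start, end) shot boundary, collect the labels of every
    annotation whose time range overlaps it, producing one combined
    description per shot."""
    described = []
    for shot_start, shot_end in shots:
        labels = [a.label for a in annotations
                  if a.start < shot_end and a.end > shot_start]
        described.append(((shot_start, shot_end), "; ".join(labels)))
    return described

# Illustrative data: two shots, annotations from three separate engines.
shots = [(0.0, 4.0), (4.0, 9.0)]
annotations = [
    Annotation("face", 0.5, 3.0, "Joe Biden"),
    Annotation("speech", 1.0, 8.0, "delivers remarks"),
    Annotation("ocr", 4.2, 8.5, "lower-third: WHITE HOUSE"),
]
for (start, end), desc in shot_list(shots, annotations):
    print(f"{start}-{end}: {desc}")
```

The hard part in practice is not this grouping but normalizing the confidence scores and timecode conventions of each engine so their outputs can be compared at all - which is the integration effort Maarten says users underestimated.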
Where will AI take AP in the coming years?
Sandy: “We need to figure out how, or if, AI can help us with the tonality and the accidental bias of any content. AP prides itself on fair, accurate, impartial news, and therefore we need to ensure that our reporting reflects both fairness and the world we live in. I think there will definitely be an AI component to understanding whether you get the tone right - whether you are biased in favour of one side, one gender or one ethnic group over another. I think you can get an awful lot of information back from AI in this regard, and there is a project already underway which we are associated with, under the IBC Accelerator umbrella. It is beginning to take baby steps into doing this – and Hollywood and the movie industry is already running scripts through just this kind of tonality index; look out for this at IBC.”
Future direction of AI
Maarten: “Artificial intelligence is only as intelligent as it has been trained to be and training with accurate, normalized data is a critical success factor, as well as a scalability inhibitor if not done correctly. What we’ve done with AP is set up the feedback loop from a journalist into the ‘brain’, continuously updating the data model going forward. We hope this corpus of data will be exposed to third parties, and in the future, it's my hope that other customers like local news desks will also provide feedback to the overall data model. So we have this joint co-creation effort, and my hope is that AI evolves in that direction.”
You can watch the original video of this session at BaM Live!™ here.