The data domino effect

Journal Article from GB Labs

Thu 11 August 2022

By Duncan Beattie, GB Labs


When designing a storage system for a production or post facility, it is easy to concentrate on capacity and throughput and think that is all you need. However, a very real issue is data safety: how do you guard against data loss? That is a particular problem in our industry of media production. Once you start considering the issues around data loss, the potential problems escalate: a real domino effect.

Media production companies face significant challenges: perhaps an order of magnitude larger than in other industries. First, we tend to deal with large files, and therefore very large storage requirements. If you are shooting with today’s digital cinematography cameras you are generating many terabytes of raw content a day. If you are a busy post house with a number of edit suites, grading rooms and audio dubbing theatres, then you will have multiple simultaneous projects on the go. This creates a significant number of files, some of them very large.
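
As a rough illustration of that scale, the short sketch below estimates a single camera’s daily output; the data rate and shooting hours are assumed round numbers for illustration, not figures for any particular camera or production.

# Back-of-envelope estimate of daily raw footage from one camera.
# The data rate and hours are illustrative assumptions, not vendor specifications.
camera_rate_gbps = 2.0             # assumed raw recording rate in gigabits per second
shooting_hours = 6                 # assumed hours of recording per day

bytes_per_day = camera_rate_gbps * 1e9 / 8 * shooting_hours * 3600
print(f"One camera at {camera_rate_gbps} Gb/s for {shooting_hours} h a day "
      f"generates roughly {bytes_per_day / 1e12:.1f} TB of raw content.")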

In parallel, and compounding the challenge, there are significant time pressures. Often you only get paid for a production once it is delivered. There are few things more disastrous in the production industry than failing to deliver a project on time: at best there is the risk of significant reputational damage, and there may well be a financial penalty too.

So we start out with data that already represents a significant investment: the cost of a shoot includes cameras, crew, actors, locations and more. That data has to be secured to the highest standards: losing data and having to recreate an event is almost unthinkable. The cost of restaging a shoot will spiral: actors and key crew may be unavailable, and it may even be impossible to do it again (think live sport).

The next domino to drop is that while you are working out how to recover the lost data, you are not making any progress on current projects. Editors and others are standing idle and the delivery deadline looms.

In truth, few would contemplate any production set-up – even one based on a single edit workstation and its internal disk drives – that did not back up the data on ingest. But backing up takes a finite time, especially when dealing with large files, which means that there are inherent delays in the system.

Larger facilities will have central storage systems, serving multiple creative rooms. Here is another domino: a single storage system supporting multiple projects means that any data failure has the potential to delay many clients. And that enterprise-level data failure means there will be many editors, colourists and others standing idle, costing money and earning no revenue. Completed work will have to be remade, taking time and costing money.

RAID

Using a RAID-protected server gives some insurance against drive failures, although the performance of the server will be impacted while the RAID is rebuilt once the dead drive is replaced. But the whole server also needs to be regarded as a potential single point of failure, with consequent data loss or at least loss of productivity.
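
To give a sense of how long that rebuild window can be, the sketch below estimates it from drive capacity and an assumed sustained rebuild rate; both numbers are illustrative assumptions, and real rebuild times also depend on RAID level and concurrent client load.

# Rough estimate of how long a RAID set spends rebuilding a replaced drive.
# Capacity and rebuild rate are illustrative assumptions.
drive_capacity_tb = 18             # assumed size of the failed drive
rebuild_rate_mb_s = 150            # assumed sustained rebuild rate under client load

rebuild_hours = drive_capacity_tb * 1e12 / (rebuild_rate_mb_s * 1e6) / 3600
print(f"Rebuilding a {drive_capacity_tb} TB drive at {rebuild_rate_mb_s} MB/s "
      f"takes about {rebuild_hours:.0f} hours of degraded performance.")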

Again, you do not need me to tell you that you should be backing up your server. But now we are working in tens or hundreds of terabytes, so backing up is no trivial matter. And even if a good incremental backup system is implemented so there is relatively little overhead as backups are created, there will certainly be a long delay while the repaired server is rebuilt.
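
The principle of an incremental backup can be reduced to a few lines: only files that are new or have changed since the last run are copied. The sketch below is a minimal standard-library illustration with hypothetical paths, not a description of any particular backup product.

# Minimal incremental backup: copy only files that are new or have changed,
# judged by size and modification time. Paths are hypothetical placeholders.
import os
import shutil

SOURCE = "/mnt/primary/projects"   # hypothetical live storage mount
DEST = "/mnt/backup/projects"      # hypothetical backup mount

def needs_copy(src_path, dst_path):
    # Copy if the file is missing at the destination, or its size or
    # modification time differs from the source.
    if not os.path.exists(dst_path):
        return True
    src, dst = os.stat(src_path), os.stat(dst_path)
    return src.st_size != dst.st_size or src.st_mtime > dst.st_mtime

for root, _dirs, files in os.walk(SOURCE):
    target_dir = os.path.join(DEST, os.path.relpath(root, SOURCE))
    os.makedirs(target_dir, exist_ok=True)
    for name in files:
        src_path = os.path.join(root, name)
        dst_path = os.path.join(target_dir, name)
        if needs_copy(src_path, dst_path):
            shutil.copy2(src_path, dst_path)   # copy2 preserves timestamps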

Some will choose to back up to LTO tape, which is simple and efficient, but very slow to restore from: maybe a week to bring back all the data. Tape archives should be stored off-site, so there is also the physical delay of locating and retrieving the tapes before restoration can even start.

Others will prefer to back up to the cloud, but downloading many terabytes of data taxes even the fastest internet connections, not to mention the significant egress charges levied by the cloud provider. In either case, you are again facing expensive downtime.
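
To make that downtime concrete, the sketch below compares the time to restore a large archive from a single tape drive with downloading it from the cloud, plus a rough egress bill; every figure in it is an assumed round number for illustration, not a quote from any vendor or provider.

# Rough comparison of restoring a large archive from LTO tape versus the cloud.
# All capacities, speeds and prices are illustrative assumptions.
archive_tb = 300                   # assumed size of the archive to restore
tape_drive_mb_s = 300              # assumed sustained throughput of one tape drive
internet_gbps = 10                 # assumed dedicated download bandwidth
egress_usd_per_gb = 0.08           # assumed cloud egress price per gigabyte

tape_days = archive_tb * 1e12 / (tape_drive_mb_s * 1e6) / 86400
cloud_days = archive_tb * 1e12 * 8 / (internet_gbps * 1e9) / 86400
egress_usd = archive_tb * 1000 * egress_usd_per_gb

print(f"Tape restore on one drive: about {tape_days:.1f} days")
print(f"Cloud download at {internet_gbps} Gb/s: about {cloud_days:.1f} days")
print(f"Cloud egress charges: roughly ${egress_usd:,.0f}")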

You also need to consider how long it takes to write the data to the backup storage in the first place. In a production or a post house dealing in many terabytes of new data a day, the time to ingest the original material and to write the backups begins to represent a very significant overhead.

You may decide to run backups overnight when there is little other traffic on the server, but that risks losing a full day’s work on multiple projects. And when the facility gets busy there may be a temptation to stop backups to keep everyone working, which means you are back in the completely unsafe zone again, and running into a spiral of inaction from which you may never recover.
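
Whether the overnight window is even big enough can be sanity-checked in a couple of lines; the daily ingest volume, window length and backup throughput below are assumptions chosen purely to illustrate the calculation.

# Does a day's new material fit into the overnight backup window?
# All three figures are illustrative assumptions.
new_data_tb = 20                   # assumed new material generated per day
window_hours = 8                   # assumed quiet overnight window
backup_mb_s = 600                  # assumed sustained backup write throughput

required_hours = new_data_tb * 1e12 / (backup_mb_s * 1e6) / 3600
verdict = "fits" if required_hours <= window_hours else "does not fit"
print(f"Backing up {new_data_tb} TB needs about {required_hours:.1f} h, "
      f"which {verdict} in the {window_hours} h window.")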

Nearline

An option which will reduce risk significantly is to have a second online storage pool that backs up the live work server. This is sometimes called a nearline server, as it is designed to track the online server as closely as possible.

The advantage of a nearline storage pool duplicating all the data on the primary server is that should any sort of failure occur, there is a complete copy of all media available on the same network with no need to download or restore from tape. You can also arrange another layer of security – to the cloud or to tape – from this second server, without impacting the performance of the primary source.

At GB Labs we specialise in high performance, high security storage systems, particularly for media applications. Our very strong recommendation is that the minimum you should consider is a second storage pool within the production environment. Ideally you should be able to work directly from it if necessary. This means a structure very similar to the primary server, but the benefit is the secure knowledge that downtime and delays to production deliverables will be minimal.

Further, there should be a third level of security, either disk to disk to tape or disk to disk to cloud. With well-designed archive software, you will be able to add incrementally to the cloud or tape backup at any time over the 24-hour period, ensuring this third level of storage remains as closely synchronised as possible.

A subtle point in the system design is that you might choose the second storage pool as your point of ingest, with automation pushing new content on to the live server. This gives production teams the security of knowing that if they can see a file, there is already at least one backup of it.
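
One way to picture that automation is a watch-folder loop on the ingest pool that pushes each file to the live server only once it has finished arriving. The sketch below uses hypothetical paths and a naive “file has stopped growing” check; it stands in for whatever replication tooling a facility actually runs.

# Naive watch-folder automation: new material lands on the nearline ingest
# pool first and is copied to the live server only once its size has stopped
# changing, so the nearline copy always exists before editors see the file.
# Paths and the polling interval are hypothetical placeholders.
import os
import shutil
import time

INGEST = "/mnt/nearline/ingest"    # hypothetical ingest folder on the nearline pool
LIVE = "/mnt/primary/ingest"       # hypothetical destination on the live server
POLL_SECONDS = 30

last_size = {}                     # last observed size per file
pushed = set()                     # files already copied to the live server

while True:
    for name in os.listdir(INGEST):
        src = os.path.join(INGEST, name)
        if name in pushed or not os.path.isfile(src):
            continue
        size = os.path.getsize(src)
        if last_size.get(name) == size:
            # Size unchanged since the last poll: treat the transfer as
            # complete and copy it across, keeping the nearline original.
            shutil.copy2(src, os.path.join(LIVE, name))
            pushed.add(name)
        else:
            last_size[name] = size
    time.sleep(POLL_SECONDS)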

Copy protection

You can also consider immediate copy protection within the primary storage. This is often achieved through clustered storage systems.

A cluster is a collection of servers with a management layer that spreads the data across the pool. Should one node within the cluster fail, the way the data is copy protected means that there is no immediate loss of content and work can continue. But although a cluster is resilient as an entity, the entire cluster can still fail, resulting in the same data loss as if you had just a single server.
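
The copy protection a cluster provides comes down to placing every object on more than one node. The minimal sketch below keeps two replicas of each item and shows that losing any single node still leaves a readable copy; the node names and replica count are illustrative, and real clustered file systems use far more sophisticated placement and repair logic.

# Minimal illustration of replica placement across cluster nodes: each object
# is written to two different nodes, so no single node failure loses the only
# copy. Node names, objects and the replica count are illustrative.
NODES = ["node-a", "node-b", "node-c", "node-d"]
REPLICAS = 2

def placement(object_name):
    # Pick REPLICAS distinct nodes, starting from a hash of the object name.
    start = hash(object_name) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICAS)]

objects = ["scene01_take3.mov", "scene02_take1.mov", "final_mix.wav"]
layout = {name: placement(name) for name in objects}

failed = "node-b"                  # simulate a single node failure
for name, nodes in layout.items():
    survivors = [n for n in nodes if n != failed]
    print(f"{name}: stored on {nodes}, still readable from {survivors}")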

The alternative is a failover server, a completely identical server maintained with precisely the same data at the same time. Should the primary fail for any reason, all the clients can be switched to the secondary storage without any interruption to service. This has the benefit of zero downtime and zero data loss due to primary failure. The two servers can be in different locations, although for optimum performance they have to be on the same network.
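
Conceptually, the failover decision is nothing more than a health check on the primary, with clients redirected to the secondary when it stops answering. The sketch below shows that logic using a plain TCP probe and hypothetical addresses; in practice the switch is handled by the storage system’s own high-availability layer rather than a script like this.

# Conceptual failover monitor: probe the primary server and switch clients to
# the secondary if the primary stops responding several times in a row.
# Addresses, the port and thresholds are hypothetical placeholders.
import socket
import time

PRIMARY = ("primary-storage.local", 445)      # hypothetical primary SMB endpoint
SECONDARY = ("secondary-storage.local", 445)  # hypothetical failover endpoint
FAILURES_BEFORE_SWITCH = 3

def reachable(address, timeout=2.0):
    # True if a TCP connection to the address succeeds within the timeout.
    try:
        with socket.create_connection(address, timeout=timeout):
            return True
    except OSError:
        return False

active, failures = PRIMARY, 0
while active == PRIMARY:
    if reachable(PRIMARY):
        failures = 0
    else:
        failures += 1
        if failures >= FAILURES_BEFORE_SWITCH:
            active = SECONDARY     # in reality: repoint mounts or DNS here
            print("Primary unreachable, failing over to secondary storage")
    time.sleep(10)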

Adding it all together, we have an online or primary server delivering content across very high-performance networks to your team of creative artists. There is an exact copy of the primary server on a duplicate, with an automatic failover switch should there be a problem with the primary.

The secondary server also handles all ingest, ensuring that all data is backed up at least to another server before it is released for use. It is also responsible for off-site backups, to the cloud or to tape, scheduled to meet the practical data demands of the operation.

The storage management layer will also provide a dashboard of system health and escalating notifications should any problems arise.
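
The idea of escalating notifications can be boiled down to a simple rule: the longer a fault persists, the wider the alert goes. The thresholds and contact addresses below are hypothetical placeholders rather than a description of any particular dashboard or monitoring product.

# Illustration of escalating notifications: the longer a fault persists,
# the more people are alerted. Thresholds and contacts are hypothetical.
ESCALATION = [
    (0, "storage-admin@example.com"),        # notified immediately
    (15, "post-supervisor@example.com"),     # after 15 minutes
    (60, "facility-manager@example.com"),    # after an hour
]

def recipients(minutes_down):
    # Everyone whose escalation threshold has already been reached.
    return [contact for threshold, contact in ESCALATION if minutes_down >= threshold]

for minutes in (1, 20, 90):
    print(f"Fault ongoing for {minutes} min -> notify {recipients(minutes)}")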

GB Labs has developed the range of tools needed to provide the storage capacity, performance and resilience your business needs, helping you to deliver on time, every time, safe from the domino effect of data loss.
