COO and Principal Engineer, Imagine Products
Marketing, Imagine Products
Recently, the American Society of Cinematographers (ASC) released its Media Hash List (MHL) specification to bring standardization to the media transfer process. The raw amount of data produced in the media and entertainment industry has never been greater than it is today, and will continue to grow well into the future. Similarly, there are now more data transfer points than ever before.
Over the course of a production, data files are transferred numerous times and stored on a variety of media. It is not uncommon for a single file to have passed through several hard drives, gone up to the cloud and back again, and been archived to tape for long-term storage. All of these transfer points have made maintaining the data chain of custody more difficult than ever. The ASC MHL specification was developed specifically to address these problems through standardization and intelligent media management procedures.
Currently, most organizations have implemented their own internal data integrity and chain of custody processes. While this may work well enough for data transfers within an organization, complications and confusion often arise once data goes beyond the bounds of that organization. For the data chain of custody to be truly maintained, each organization in the data transfer process must follow the same agreed procedures. This is where standardization is essential. The ASC decided that standardizing the MHL would help reduce the time spent and the number of complications that occur when transferring media.
How Does It Work?
Products and updates built on the new MHL protocols will be more powerful than before. But how does it work? Files are tracked over time, and an integrity snapshot is taken at every transfer point. Each snapshot is then added to the list of previous transfer snapshots. Collectively, these snapshots provide a complete chain of custody history for the data set and greatly reduce the possibility of data loss. Should a problem occur at some point in the media transfer process, the chain of custody can be inspected to determine the exact point at which it occurred and to provide context on how to remedy it.
Snapshots ensure data integrity through the use of hashes. Simply put, a hash is a compact, fixed-size fingerprint computed from a collection of bytes. Since a file is nothing more than a collection of bytes, a hash value is an appropriate representation of a file. If two files have the same hash value, they can, for practical purposes, be considered identical; if the values differ, the files are definitely different. There are many algorithms that can be used to generate hash values, and they are discussed in greater detail later in this article.
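To make the idea concrete, here is a minimal Python sketch of comparing two files by hash. It is an illustration only, not code from any ASC MHL tool: the function names `file_hash` and `same_content` are hypothetical, and MD5 is used simply because it ships with Python's standard `hashlib` module.

```python
import hashlib


def file_hash(path, algorithm="md5", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat even for very large media files.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b, algorithm="md5"):
    """Treat two files as identical when their hash values match."""
    return file_hash(path_a, algorithm) == file_hash(path_b, algorithm)
```

Calling `same_content` on a source clip and its copy returns `True` only when the two files hash to the same value, which is the check a transfer tool performs after every copy.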
ASC MHL Format
ASC MHL is intended for media production workflows and establishes how and where integrity information is stored. Hashes for the media are recorded in an XML format that is both human-readable and machine-readable. This establishes a chain of custody that makes damaged or duplicated files easy to track.
Applying these guidelines involves documenting hash records in an ASC MHL manifest file. A manifest file contains hash records for one or more files or directories within its scope. The scope of an ASC MHL manifest is the directory containing the files and directories of the manifest’s managed data set. Once an ASC MHL manifest file is created, it must be treated as read-only and not altered in any way. Hash records in an ASC MHL manifest file use file system paths that are relative to its scope. This means that an ASC MHL manifest cannot contain hashes for files or directories that are outside the manifest’s defined scope. A manifest can also contain sections with information about the creator of the manifest and details of the process the creator used to create it. The hashes recorded in an ASC MHL manifest at a given moment represent a snapshot of the corresponding data set at that same moment.
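The scope rule above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it produces a plain dictionary rather than the actual ASC MHL XML, the function names `build_records` and `relative_record_path` are hypothetical, and MD5 is used only because it is available in the standard library.

```python
import hashlib
from pathlib import Path


def relative_record_path(scope, target):
    """Return target's path relative to scope.

    Raises ValueError when target lies outside the scope, mirroring the rule
    that a manifest cannot describe files beyond its own scope.
    """
    scope = Path(scope).resolve()
    target = Path(target).resolve()
    return target.relative_to(scope).as_posix()


def build_records(scope, algorithm="md5"):
    """Hash every file under scope, keyed by its scope-relative path.

    For brevity each file is read in one go; real media files would be
    hashed in chunks.
    """
    scope = Path(scope).resolve()
    records = {}
    for path in sorted(scope.rglob("*")):
        if path.is_file():
            key = relative_record_path(scope, path)
            records[key] = hashlib.new(algorithm, path.read_bytes()).hexdigest()
    return records
```

Because every key is relative to the scope, the same records remain valid after the whole directory is copied to another drive or mount point.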
Each file or directory can have hash values generated using various algorithms. Every hash value that is entered is labeled as one of three options: original, verified, or failed. A hash value classified as “original” is the first hash recorded for a specific file or directory. A hash value classified as “verified” was computed from the current copy of the file and matches the hashes recorded in previous ASC MHL manifests. Finally, a hash value labeled “failed” could not be verified against the hashes recorded in previous manifests. More broadly, a failed hash value signifies that the current copy of the file is not identical to the original. This lets media professionals quickly recognize when a file has been damaged or is incomplete, and it reassures users that their data is being tracked accurately. This is essential when storing media and offloading it onto other platforms because, previously, there was no standardized technique to confirm that data had not been lost in the storage or offloading process.
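The three labels can be expressed as a small decision function. The sketch below is a simplification, not the logic of any particular ASC MHL implementation: the function name `classify_hash` is hypothetical, and it assumes that verification means comparing the new hash with the earliest hash recorded for the same file and algorithm.

```python
def classify_hash(current_hash, previous_hashes):
    """Label a freshly computed hash as 'original', 'verified', or 'failed'.

    previous_hashes holds the hashes already recorded for this file with the
    same algorithm, oldest first. Assumption: the earliest recorded hash is
    the reference value to verify against.
    """
    if not previous_hashes:
        # Nothing recorded yet, so this is the first hash for the file.
        return "original"
    if current_hash == previous_hashes[0]:
        return "verified"
    return "failed"
```

For example, `classify_hash("abc", [])` yields `"original"`, while a later copy whose hash no longer matches the recorded value yields `"failed"`.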
An ASC MHL chain is a file that serves as a table of contents for an ASC MHL history. It contains the paths and file hashes of all ASC MHL manifests within the history, so that the integrity of those manifest files can be verified. An ASC MHL history is a directory that holds an ASC MHL chain and all of the ASC MHL manifests referenced from it. This directory is named “ascmhl”. The “ascmhl” directory is created at the top level of a media directory so that the history is automatically carried along when the media directory is moved or copied. The managed data set is the files and other directories stored alongside the “ascmhl” directory, within that same media directory. All ASC MHL manifests that are part of the same history must cover the same scope, which means that all record paths across the various manifests are relative to the same location in the file system. This makes it possible to track the data over time and confirm that it all belongs to a single, well-defined location.
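As a rough illustration of what the chain makes possible, the Python sketch below checks a set of manifest files against the hashes recorded for them. It makes several assumptions: the chain is represented as a plain list of (manifest file name, recorded hash) pairs rather than the real chain file format, the function name `verify_chain` is hypothetical, and MD5 stands in for whichever algorithm a tool actually uses.

```python
import hashlib
from pathlib import Path


def verify_chain(history_dir, chain_entries, algorithm="md5"):
    """Return the names of manifests that are missing or have been altered.

    chain_entries is a list of (manifest_name, recorded_hash) pairs, a
    simplified stand-in for the contents of a chain file.
    """
    history_dir = Path(history_dir)
    problems = []
    for manifest_name, recorded_hash in chain_entries:
        manifest_path = history_dir / manifest_name
        if not manifest_path.is_file():
            problems.append(manifest_name)  # manifest is missing
            continue
        actual = hashlib.new(algorithm, manifest_path.read_bytes()).hexdigest()
        if actual != recorded_hash:
            problems.append(manifest_name)  # manifest changed after it was recorded
    return problems
```

An empty result means every manifest in the history is present and unchanged, so the hash records inside those manifests can be trusted as the basis for verifying the media itself.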
ASC MHL manifests, and therefore an ASC MHL history, can contain hash records that use multiple types of hash algorithms, because requirements may differ depending on the management system. The MHL guidelines are being put in place because there needs to be standardization when it comes to elements such as the type of checksum being used. The following hash algorithms are supported under the new ASC MHL guidelines: MD5, SHA1, C4, XXH64, XXH3, and XXH128. While organizations may have specific hash types they prefer, each type of checksum has its pros and cons.
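When more than one hash type is required for the same file, the file only needs to be read once. The following Python sketch shows the idea; the function name `multi_hash` is hypothetical, and only MD5 and SHA1 are used because they are the two listed algorithms available in Python's standard `hashlib` module (the xxHash variants and C4 would need additional libraries).

```python
import hashlib


def multi_hash(path, algorithms=("md5", "sha1"), chunk_size=1024 * 1024):
    """Compute several digests of one file in a single read pass."""
    digests = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            # Feed the same chunk to every algorithm so the file is read once.
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}
```

Reading once matters for large camera files, where disk or network throughput, rather than the hash calculation, is often the limiting factor.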
Types of Checksum Verification
The XXH algorithms are generally the fastest and are the preferred option for many organizations. The xxHash family consists of non-cryptographic hash algorithms that work at speeds close to RAM limits. Non-cryptographic functions aim to avoid collisions for non-malicious input and are typically much faster as a result. Of the four types of xxHash algorithms, software usually implements three: xxHash-64, xxHash3-64, and xxHash-128, which correspond to the XXH64, XXH3, and XXH128 entries listed above. Each of these has a different speed and collision space.
The MD5 hash type was considered the standard for years before xxHash-64 arrived. Although MD5 is significantly slower than the xxHash algorithms, it is still a viable option, and many industry professionals are very familiar with it. Performance metrics comparing xxHash with other algorithms such as MD5 are available in the xxHash GitHub repository.
SHA checksums are generally used to maintain continuity with existing workflows. They are rigorous, but they are generally slower than most other verification methods, which is why many consider them outdated compared to xxHash-64.
Another verification type is C4. Like the SHA formats, C4 is slow but very robust. A distinctive advantage of C4 is that it produces URL-safe output, which means that if a file were renamed to its C4 value, that name could be used in a web address without modification.
Overall, the goal of the new ASC MHL guidelines is to reassure media professionals that their data is stored safely and completely. Avoiding data loss is critical during media transfers, and such loss has been difficult to track in the past. With the standardization these guidelines create, media professionals should be able to avoid data loss and verify the integrity of their files efficiently and consistently.
The ASC may provide the outline to follow, but companies like ours are responsible for modifying our products to abide by those guidelines. Imagine Products helped develop these guidelines, but their creation would not have been possible without extensive efforts from others such as Netflix and Pomfort, the creators of Silverstack. In collaboration with our partners, the team worked to code the command line interface, which led to the application of the new protocols and the creation of updated products. The results of these guidelines can be seen in our newly released ShotPut Pro® version 2022 for Mac, as well as in TrueCheck® and myLTO®.