Following our introductory article, this post, the second in our IMF series, describes the Netflix IMF ingest implementation and how it fits within our content processing pipeline. While conventional essence containers (e.g., QuickTime) commonly include essence data in the container file, the IMF CPL (Composition Playlist) is designed to contain essence by reference. As we will see, this has interesting architectural implications.
The Netflix Content Processing Pipeline
A simplified 3-step view of the Netflix content processing system is shown in the accompanying figure. Source files (audio, timed text or video) delivered to Netflix by content and fulfillment partners are first inspected for correctness and conformance (the ingestion step). Examples of checks performed here include (a) data transfer sanity checks such as file and metadata integrity, (b) compliance of source files with the Netflix delivery specification, (c) file format validation, (d) decodability of the compressed bitstream, (e) “semantic” signal domain inspections, and more.
In summary, the ingestion step ensures that the sources delivered to Netflix are pristine and guaranteed to be usable by the distributed, cloud-scalable Netflix transcoding engine. Following this, inspected sources are transcoded to create output elementary streams, which are subsequently encrypted and packaged into streamable containers. Because IMF is a source format, our IMF implementation is predominantly concerned with the first step.
IMF Ingestion: Main Concepts
An IMF ingestion workflow needs to deal with the inherent decoupling of physical assets (track files) from their playback timeline. A single track file can be applicable to multiple playback timelines, and a single playback timeline can comprise portions of multiple track files. Further, physical assets and CPLs (which define the playback timelines) can be delivered at different times (via different IMPs). This design necessarily assumes that assets (CPLs and track files) within a particular operational domain are cataloged by an asset management service. In this context, an operational domain can be as small as a single mastering facility or a playback system, or as large as a planetary data service. Such a service would provide locator translation for UUID references (i.e., locating physical assets somewhere in a file store) as well as import/export capability from/to other operational domains.
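To make the decoupling concrete, here is a minimal sketch of a timeline that references track files by UUID and a toy catalog that performs locator translation. The names `Resource`, `AssetCatalog`, and `resolve_timeline`, as well as the locator strings, are hypothetical and chosen for illustration only; they do not describe the actual Netflix service.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Resource:
    """A portion of a track file placed on a playback timeline (hypothetical model)."""
    track_file_id: str  # UUID of the referenced track file
    entry_point: int    # first edit unit used from the track file
    duration: int       # number of edit units used


class AssetCatalog:
    """Toy stand-in for an asset management service: UUID -> locator."""

    def __init__(self) -> None:
        self._locators: Dict[str, str] = {}

    def register(self, asset_id: str, locator: str) -> None:
        self._locators[asset_id] = locator

    def resolve(self, asset_id: str) -> str:
        # Raises KeyError when the asset has not been cataloged yet.
        return self._locators[asset_id]


def resolve_timeline(timeline: List[Resource],
                     catalog: AssetCatalog) -> List[Tuple[str, int, int]]:
    """Translate every UUID reference on a timeline into (locator, entry, duration)."""
    return [(catalog.resolve(r.track_file_id), r.entry_point, r.duration)
            for r in timeline]
```

Because the timeline stores only identifiers, the same track file can appear on several timelines, and a timeline can be stored before all of its track files have arrived; resolution simply fails until the catalog knows the asset.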
A PKL (equivalently an IMP) defines the exchange of assets between two independent operational domains. It allows the receiving system to verify complete and error-free transmission of the intended payload without any out-of-band information. The receiving system can compare the asset references in the PKL with the local asset management service, and then initiate transfer operations on those assets not already present. In this way de-duplication is inherent to inter-domain exchange. The scope of a PKL being the specification of the inter-domain transfer, it is not expected to exist in perpetuity. Following the transfer, asset management is the responsibility of the local asset management system at the receiver side.
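The de-duplication described above can be sketched as a simple set comparison. This is an illustrative sketch, not the actual implementation: `plan_transfers` is a hypothetical helper, and it assumes the PKL has already been parsed into a list of asset UUID strings and that the local catalog can be queried as a set of known UUIDs.

```python
from typing import Iterable, List, Set


def plan_transfers(pkl_asset_ids: Iterable[str],
                   cataloged_ids: Set[str]) -> List[str]:
    """Return the PKL asset IDs that still need to be transferred.

    Assets already known to the local asset management service are skipped,
    and repeated IDs within the PKL are only listed once. PKL order is kept.
    """
    seen: Set[str] = set()
    missing: List[str] = []
    for asset_id in pkl_asset_ids:
        if asset_id in cataloged_ids or asset_id in seen:
            continue
        seen.add(asset_id)
        missing.append(asset_id)
    return missing


# Usage: only the asset not yet cataloged is scheduled for transfer.
to_fetch = plan_transfers(["uuid-a", "uuid-b", "uuid-c"], {"uuid-a", "uuid-c"})
```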
The Netflix IMF Ingestion System
We have utilized the above concepts to build the Netflix IMF implementation. The accompanying figure describes our IMF ingestion workflow as a flowchart. A brief description follows:
- For every IMP delivered to Netflix for a specific title, we first perform transfer/delivery validations. These include but are not limited to:
- Checking the PKL and ASSETMAP files for correctness. The PKL file contains a list of UUIDs corresponding to the files that were part of the delivery, while the ASSETMAP file maps these asset identifiers (UUIDs) to locations (URIs), which can be, for example, HTTP URLs;
- Ensuring that checksums (actually, message digests) corresponding to files delivered via the IMP match the values provided in the PKL.
- Track files in IMF follow the MXF (Material eXchange Format) file specification, and are mandated in the IMF context to contain a UUID value that identifies the file. The CPL schema also mandates an embedded identifier (a UUID) that uniquely identifies the CPL document. This enables us to cross-validate the list of UUIDs listed in the PKL against the files that were actually delivered as a part of the IMP.
- We then perform validations on the contents of the IMP. Every CPL contained in the IMP is checked for syntactic correctness and specification compliance and every essence track file contained in the IMP is checked for syntactic as well as semantic correctness. Examples of some checks applicable to track files include:
- MXF file format validation;
- Frame-by-frame decodability of the video essence bitstream;
- Channel mapping validation for audio essence;
- We also collect significant descriptors such as sample rates and sample counts on these files.
Valid assets (essence track files and CPLs) are then cataloged in our asset management system.
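The transfer/delivery validations at the start of this workflow can be sketched as follows. This is a sketch under stated assumptions rather than our production code: it assumes the PKL and ASSETMAP have already been parsed into dictionaries, that ASSETMAP locations resolve to local file paths, and that the PKL carries base64-encoded SHA-1 digests (our understanding of common PKL practice; adjust if your packages declare a different algorithm). The function names are hypothetical. Reading the UUID embedded in an MXF or CPL file requires a format parser and is left out.

```python
import base64
import hashlib
from pathlib import Path
from typing import Dict, List


def sha1_base64(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-1 and return the base64-encoded digest."""
    digest = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def validate_delivery(pkl_hashes: Dict[str, str],
                      asset_map: Dict[str, Path]) -> List[str]:
    """Cross-check PKL entries against the ASSETMAP and the delivered files.

    pkl_hashes: asset UUID -> expected digest from the PKL (assumed format).
    asset_map:  asset UUID -> file location from the ASSETMAP.
    Returns a list of problems; an empty list means all checks passed.
    """
    problems: List[str] = []
    for asset_id, expected in pkl_hashes.items():
        path = asset_map.get(asset_id)
        if path is None:
            problems.append(f"{asset_id}: listed in PKL but absent from ASSETMAP")
        elif not path.is_file():
            problems.append(f"{asset_id}: file not found at {path}")
        elif sha1_base64(path) != expected:
            problems.append(f"{asset_id}: digest mismatch")
    # Assets mapped to a location but never announced in the PKL.
    for asset_id in sorted(asset_map.keys() - pkl_hashes.keys()):
        problems.append(f"{asset_id}: in ASSETMAP but not listed in PKL")
    return problems
```

Collecting all problems instead of failing on the first one is a design choice that suits redelivery workflows, since a partner can be told about every issue in a single report.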
- Further, upon every IMP delivery all the tracks of all the CPLs delivered against the title are checked for completeness, i.e., whether all necessary essence track files have been received and validated.
- Timeline inspections are then conducted against all CPL tracks that have been completed. Timeline inspections include, for example:
- detection of digital hits in the audio timeline
- scene change detection in the video timeline
- fingerprinting of audio and video tracks
At this point, the asset management system is updated appropriately. Following the completion of the timeline inspection, the completed CPL tracks are ready to be consumed by the Netflix transcoding engine. In summary, we follow a two-pronged approach to inspections. While one set of inspections is conducted against delivered assets every time there is a delivery, another set of inspections is triggered every time a CPL track is completed.
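The trigger for the second set of inspections can be sketched as a completeness check that runs after every delivery. This is only an illustration: `newly_completed_tracks` and the `(CPL id, track id)` key are hypothetical, and the sketch assumes each CPL track has been reduced to the set of track file UUIDs it references.

```python
from typing import Dict, List, Set, Tuple

# (CPL id, track id); both are hypothetical identifiers for this sketch.
TrackKey = Tuple[str, str]


def newly_completed_tracks(track_refs: Dict[TrackKey, Set[str]],
                           validated_assets: Set[str],
                           already_complete: Set[TrackKey]) -> List[TrackKey]:
    """Return CPL tracks whose referenced track files are now all validated.

    Tracks reported as complete on an earlier delivery are skipped, so each
    track triggers its timeline inspections only once.
    """
    completed: List[TrackKey] = []
    for key, needed in track_refs.items():
        if key in already_complete:
            continue
        if needed <= validated_assets:  # every referenced file is available
            completed.append(key)
    return sorted(completed)


# Usage: the video track still waits for "v2"; the audio track is ready.
refs = {("cpl-1", "video"): {"v1", "v2"}, ("cpl-1", "audio"): {"a1"}}
ready = newly_completed_tracks(refs, {"v1", "a1"}, set())
```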
Asset and Timeline Management
The asset management system is tasked with tracking associations between assets and playback timelines in the context of the many-to-many mapping that exists between the two. Any failures in ingestion due to problems in source files are typically resolved by redelivery by our content partners. For various reasons, multiple CPL files could be delivered against the same playback timeline over a period of time. This makes time versioning of playback timelines and garbage collection of orphaned assets important attributes of the Netflix asset management system. The asset management system also serves as a storehouse for all of the analysis data obtained as a result of conducting inspections.
Incorporating IMF primitives as first-class concepts in our ingestion pipeline has required a major architectural overhaul. We believe that the following benefits of IMF have justified this undertaking:
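As a rough illustration of how versioning and garbage collection interact, the sketch below treats an asset as orphaned when no current CPL version references it. The retention policy (only the latest version of each timeline keeps assets alive) is an assumption made for this example, and `find_orphans` is a hypothetical name; a real system might retain older versions for some time.

```python
from typing import Dict, List, Set


def find_orphans(cpl_versions: Dict[str, List[Set[str]]],
                 cataloged_assets: Set[str]) -> Set[str]:
    """Return cataloged track files not referenced by any current CPL version.

    cpl_versions maps a playback timeline id to its CPL versions, oldest
    first; each version is the set of track file UUIDs it references.
    """
    live: Set[str] = set()
    for versions in cpl_versions.values():
        if versions:
            live |= versions[-1]  # assumed policy: only the latest version counts
    return cataloged_assets - live


# Usage: "v1" was replaced by "v1-fixed" in a redelivered CPL, so it is orphaned.
versions = {"feature": [{"v1", "a1"}, {"v1-fixed", "a1"}]}
orphans = find_orphans(versions, {"v1", "v1-fixed", "a1"})
```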
- reduction of several of our most frustrating content tracking issues, namely those related to “versionitis”
- improved video quality (as we get direct access to high quality IMF masters)
- optimizations around redeliveries, incremental changes (like new logos, content revisions, etc.), and minimized redundancy (partners deliver the “diff” between two versions of the same content)
- metadata (e.g., channel mapping, color space information) comes in-band with physical assets, decreasing the opportunity for human error
- granular checksums on essence (e.g., audio, video) facilitate distributed processing in the cloud
The following challenges have come up as we roll out our IMF ingest implementation:
- Legacy assets (in the content vaults of content providers) as well as legacy content production workflows abound at this time. While one could argue that the latter will succumb to IMF in the medium term, the former are here to stay. The conversion of legacy assets to IMF is likely to be a long, drawn-out process. For all practical purposes, we need to work with a hybrid content ingestion workflow, one that handles both IMF and non-IMF assets. This introduces operational and maintenance complexities.
- Global subtitles are core to the Netflix user experience. The current lack of standardization around timed text in IMF means that we are forced to accept timed text sources outside of IMF. In fact, IMSC1 (Internet Media Subtitles and Captions 1.0), the current contender for the IMF timed text format, does not support some of the significant rendering features that are inherent to Japanese as well as some other Asian languages.
- The current definition of IMF allows for version management between one IMF publisher and one IMF consumer. In the real world, multiple parties (content partners as well as fulfillment partners) could come together to produce a finished work. This necessitates a multi-party version management system (along the lines of software version control systems). While the IMF standard does not preclude this, it is missing from existing IMF implementations and does not yet have industry mind-share.
In our next blog post, we will describe some of the community efforts we are undertaking to help move the IMF standard and its adoption forward.
By Rohit Puri, Andy Schuler and Sreeram Chakrovorthy