
SDI Capture

  • this post is about SDI capture methods, the most performant way to capture and manipulate video from cinema-quality cameras with AI in real time
  • introduction
    • this year I started a new job at Netflix working on the Studio side in applied research
    • it's been a really fun return to my roots
    • origin story
      • back in high school when I first learned to program, I ran a small video production company too
      • I was deep in After Effects and Premiere, filming in front of green screens, and doing a lot of comp work
      • shoutout to Andrew Kramer and VideoCopilot.net for the best introduction anyone could ask for
      • anyway, did a lot of filming and visual effects work alongside programming, thought that's what I might do forever
      • went to college for computer science, got deep into software and kind of forgot about video (except for my personal projects and photography of course)
    • fast forward to 2024
      • I'm back in the game!
      • going from dinky conference productions for small businesses in Tennessee 15 years ago to the literal cutting edge of video production at Netflix has been quite the jump
      • super super fun and a literal dream come true
    • the problem
      • part of my responsibilities now involve on-set processing and manipulation of video in real time using AI research
      • almost everything on set is Windows-based or handled by dedicated hardware, not exactly friendly to a Linux / AI research toolchain
      • we have the obvious problem of getting the ML inference pipeline to run in real time (but I already know a lot about how to optimize that, which is a whole other post); what about the entire rest of the pipeline, getting video from the studio into python-land and back out again?
      • I had no idea how to do this 2 months ago, so there's been lots of learning
      • what follows is an explanation of what I've learned so far and the best way to capture and manipulate video from cinema-quality cameras with AI in real time
  • background info
    • video production
      • SDI is a video interface standard that is used in many broadcast and professional video production environments
      • it's basically the USB of video
      • it can carry crazy high resolution and frame rates, uncompressed, up to 12 Gbps
      • there's usually splitters and repeaters and all kinds of gear converting the signals from raw camera data to apply creative looks and effects
      • there's also tons of in-camera options and effects that can affect the look, latency, and quality of the video
      • we want to introduce a step that applies AI transformations to the video in real-time
      • the goal is to do this as efficiently as possible with minimal latency, ideally with a single machine
    • capture cards
      • a capture card is a device that sits between the camera and the computer
      • it's responsible for converting the raw video signal from the SDI cable into bits for our software to process
      • there are many different capture cards, each with their own features and capabilities
      • AJA
        • de facto standard for broadcast and professional video production
        • very high quality, also very expensive
        • not exactly Linux friendly
      • DeckLink by Blackmagic Design
        • another popular choice
        • also very high quality
        • a Linux driver and source SDK are available, plus an ffmpeg plugin!
      • Magewell
        • another slightly less popular choice
        • best Linux support of all, with built-in v4l2 support (we'll get to that in a minute)
        • as plug-and-play as it gets, USB options that work just like a webcam
        • still pretty high quality
    • capture methods
      • irrespective of the card, you also have many options for capturing the video stream itself from that card
      • on Linux we have several decent options we'll explore
      • v4l2
        • what your typical webcam uses
        • super easy to use, just plug and play
        • a quick v4l2-ctl listing and a minimal OpenCV capture loop are sketched below
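A quick way to see what a card exposes, assuming it enumerates as /dev/video0 (your device path may differ):

```bash
# List every V4L2 device the kernel has enumerated
v4l2-ctl --list-devices

# Show the resolutions, frame rates, and pixel formats a device supports
v4l2-ctl -d /dev/video0 --list-formats-ext
```

And a minimal OpenCV capture loop, again assuming /dev/video0 carrying a 1080p60 signal:

```python
import cv2

# Open the device through the V4L2 backend explicitly
cap = cv2.VideoCapture("/dev/video0", cv2.CAP_V4L2)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1920)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 1080)
cap.set(cv2.CAP_PROP_FPS, 60)

while True:
    ok, frame = cap.read()  # frame is an HxWx3 BGR numpy array
    if not ok:
        break
    # ... run your AI transformation on `frame` here ...
    cv2.imshow("feed", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```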
      • ffmpeg
        • the swiss army knife of video processing
        • several options, including reading from a process pipe in python and a companion ffplay command line tool
        • an ffplay one-liner and a python example reading raw frames from the pipe are sketched below
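Previewing the feed, assuming the card also shows up as a v4l2 device at /dev/video0:

```bash
# Quick preview straight from the capture device
ffplay -f v4l2 -framerate 60 -video_size 1920x1080 /dev/video0
```

And a sketch of reading raw frames into python by having ffmpeg write to stdout:

```python
import subprocess
import numpy as np

W, H = 1920, 1080

# Ask ffmpeg for raw BGR frames on stdout so numpy can consume them directly
proc = subprocess.Popen(
    [
        "ffmpeg",
        "-f", "v4l2",
        "-framerate", "60",
        "-video_size", f"{W}x{H}",
        "-i", "/dev/video0",
        "-pix_fmt", "bgr24",  # match OpenCV's channel order
        "-f", "rawvideo",
        "pipe:1",
    ],
    stdout=subprocess.PIPE,
)

frame_bytes = W * H * 3
while True:
    buf = proc.stdout.read(frame_bytes)
    if len(buf) < frame_bytes:
        break
    frame = np.frombuffer(buf, dtype=np.uint8).reshape(H, W, 3)
    # ... process `frame` ...
```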
      • gstreamer
        • an open source media pipeline framework (not NVIDIA's own, though NVIDIA builds its hardware-accelerated video stack on top of it), supposedly well suited to our use case of processing video in real time
        • several options, including reading through OpenCV in python (when OpenCV is built with GStreamer support) and a companion gst-launch-1.0 command line tool
        • a gst-launch-1.0 one-liner and a python example are sketched below
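A preview pipeline, with the same /dev/video0 assumption:

```bash
# Capture, convert, and display in a single pipeline
gst-launch-1.0 v4l2src device=/dev/video0 ! \
  video/x-raw,width=1920,height=1080,framerate=60/1 ! \
  videoconvert ! autovideosink
```

And the same pipeline terminated in an appsink so OpenCV can read it (requires an OpenCV build with GStreamer support):

```python
import cv2

# appsink with drop=true discards stale frames to keep latency down
pipeline = (
    "v4l2src device=/dev/video0 ! "
    "video/x-raw,width=1920,height=1080,framerate=60/1 ! "
    "videoconvert ! video/x-raw,format=BGR ! "
    "appsink drop=true max-buffers=1"
)
cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("feed", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
```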
      • vendor SDKs
        • code from each vendor to read the raw video data from the card, usually in C++
        • a heavily simplified C++ sketch of the DeckLink capture flow is below
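This is a sketch of the DeckLink SDK's callback-driven capture flow, not a complete program: error handling and proper COM-style reference counting are elided, and exact names may vary by SDK version.

```cpp
#include <cstdio>
#include "DeckLinkAPI.h"  // ships with Blackmagic's Desktop Video SDK

// Invoked by the driver once per captured frame
class FrameReceiver : public IDeckLinkInputCallback {
public:
    HRESULT VideoInputFrameArrived(IDeckLinkVideoInputFrame* frame,
                                   IDeckLinkAudioInputPacket*) override {
        if (frame) {
            void* pixels = nullptr;
            frame->GetBytes(&pixels);  // raw pixel data (8-bit YUV here)
            // ... hand `pixels` off to the processing pipeline ...
        }
        return S_OK;
    }
    HRESULT VideoInputFormatChanged(BMDVideoInputFormatChangedEvents,
                                    IDeckLinkDisplayMode*,
                                    BMDDetectedVideoInputFormatFlags) override {
        return S_OK;
    }
    // IUnknown boilerplate stubbed out for brevity
    HRESULT QueryInterface(REFIID, void**) override { return E_NOINTERFACE; }
    ULONG AddRef() override { return 1; }
    ULONG Release() override { return 1; }
};

int main() {
    // Grab the first DeckLink card in the machine
    IDeckLinkIterator* it = CreateDeckLinkIteratorInstance();
    IDeckLink* card = nullptr;
    if (!it || it->Next(&card) != S_OK) return 1;

    IDeckLinkInput* input = nullptr;
    card->QueryInterface(IID_IDeckLinkInput, (void**)&input);

    FrameReceiver receiver;
    input->SetCallback(&receiver);
    input->EnableVideoInput(bmdModeHD1080p6000,  // 1080p60
                            bmdFormat8BitYUV,
                            bmdVideoInputFlagDefault);
    input->StartStreams();
    getchar();  // capture until a key is pressed
    input->StopStreams();
    input->DisableVideoInput();
    return 0;
}
```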
    • colorspaces and pixel formats
      • these are super complex and probably deserve a post of their own at some point, but for now, here's the important stuff
      • colorspace is the strategy we use for defining the possible values of a color; there's scene-referred, display-referred, and a whole lot of options here, but for now just know that the choice can affect latency because of how many bits are used to represent each color
      • pixel format is the strategy we use for encoding a color in a particular colorspace; the one you're probably most familiar with is RGB if you have a generic tech background, but video (and JPEGs, actually) tends to use YUV, and this too can affect latency
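As a concrete taste of the pixel format side, many SDI capture paths hand you packed YUV (e.g. UYVY) rather than RGB, and unpacking it is a single OpenCV call (the frame here is a dummy array just for illustration):

```python
import cv2
import numpy as np

# Packed UYVY is 2 bytes per pixel, so a 1080p frame has shape (1080, 1920, 2)
uyvy = np.zeros((1080, 1920, 2), dtype=np.uint8)  # dummy frame

# One call to unpack into the (1080, 1920, 3) BGR layout the rest of cv2 expects
bgr = cv2.cvtColor(uyvy, cv2.COLOR_YUV2BGR_UYVY)
print(bgr.shape)  # (1080, 1920, 3)
```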
  • the experiment
    • setup
      • i wanted to test the "glass-to-glass" latency of the entire pipeline
      • set up a stopwatch with the highest precision I could find, placed it next to my monitor, and pointed the camera at it
      • when I bring up the real-time feed on the monitor and take a photo, the difference in timestamps between the photo and the video feed is the latency! BOOM!
      • rented two cameras, a RED Komodo and a Blackmagic Micro Studio Camera 4K, with an ordinary USB webcam as a control
      • the RED Komodo is a very high quality camera, but also very expensive and heavy
      • the Blackmagic Micro Studio Camera 4K is also a pretty high quality camera, but light and (relatively) cheap
      • we want to isolate each component of the pipeline and test them individually as much as possible
        • e.g. if RED latency is 200ms but Blackmagic is 100ms, we know the difference isn't the capture card
      • constraints:
        • I only had about 1 week to build the entire pipeline and make our recommendations before flying to LA to start working on the project, which meant some compromises
        • I didn't want to spend more than 2 days on this hardware optimization experiment
        • our ML inference stack is far better optimized on Ubuntu than on Windows, so we'll use Linux for the pipeline for now
          • that means AJA is out for now, but we can revisit if we need to
        • the entire team knows python but little C++, so the DeckLink SDK is out too for now, but we can revisit if we need to
      • diagram of the pipeline
        • light from camera -> camera sensor -> in-camera processing -> SDI out (camera) -> SDI in (capture card) -> capture card hardware processing -> driver/OS processing -> python/SDK processing
      • we'll manipulate the following settings
        • camera used, check if it's the bottleneck
        • capture card used, determine its latency and best option
        • capture+display method, determine best option, understand display latency from opencv render
        • FPS, check if it's the bottleneck; obviously, the higher the FPS, the lower the per-frame latency
        • resolution, check impact on latency
        • in-camera settings (colorspace / lut), check impact on latency
        • capture card settings (RGB vs. YUV, MJPG flag), check impact on latency (see the sketch after this list)
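For reference, the capture-card-side knobs in that last item are mostly one-liners through OpenCV's V4L2 backend (device path hypothetical):

```python
import cv2

cap = cv2.VideoCapture("/dev/video0", cv2.CAP_V4L2)

# Request MJPG from the card instead of raw YUYV...
cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*"MJPG"))

# ...and let OpenCV convert whatever arrives into BGR frames
cap.set(cv2.CAP_PROP_CONVERT_RGB, 1)
```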
    • the test complications
      • the "millisecond precision" stopwatch was not nearly as precise as advertised and reviewed
      • I doubt anyone was actually taking high-FPS recordings and checking that the latency was at millisecond granularity
      • the screen appeared to have a ~30Hz refresh rate, so the latency readings were in ~33ms increments 😭
      • fix: do a ton of trials, measure frame delta on larger time scales, and average it all out
      • disappointing? yes, but also, we only really care about frame-level delay at 24FPS for cinema, so it still works
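To convince myself the averaging trick works, here's a toy simulation (all numbers hypothetical): both stopwatch readouts captured in a photo are snapped down to the last ~33 ms display tick, so any single reading is quantized, but the mean over many randomly timed trials converges on the true latency:

```python
import numpy as np

rng = np.random.default_rng(0)

tick = 33.3      # stopwatch display refresh (~30 Hz), in ms
latency = 80.0   # hypothetical true glass-to-glass latency, in ms

# Each photo captures the live stopwatch and the delayed one in the monitor
# feed, both rounded down to the most recent display tick
t = rng.uniform(0, 10_000, size=5_000)  # random shutter times
readings = (np.floor(t / tick) - np.floor((t - latency) / tick)) * tick

print(np.unique(readings))        # individual readings are only ~33 ms multiples
print(round(readings.mean(), 1))  # ...but the mean lands right around 80.0
```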
    • the results
      • most comparisons of the form "which card/setting is faster, X or Y?" were inconclusive (not enough statistical power to draw a conclusion, or too many context changes erasing or reversing the "gains")
      • most important factors are...
        • capture resolution (duh, ~60ms for 1080p, ~200ms for 4K, almost 4x to match the pixels you're pushing!)
        • FPS (duh; ignoring hardware, a frame is ~42 ms at 24 FPS vs. ~17 ms at 60 FPS, so there's a mandatory ~25 ms difference from physics alone)
        • client/driver choice (v4l2 faster than ffmpeg, which is faster than GStreamer)
        • camera itself
      • screenshot of giant spreadsheet with all the results
      • widget to play around with latency ranges and see the impact on the pipeline
      • hypothetical fastest possible configuration:
        • 1080p Blackmagic cam @ 60 FPS, Magewell USB Capture card, v4l2, RGB pixel format, no colorspace / LUT transformations
      • capture card made little difference
        • DeckLink can be faster, but not conclusively so
        • surprising given PCIe vs. USB; part of this is driver-related: v4l2 was by far the best method and DeckLink doesn't support it
        • both are pretty well optimized
      • pixel format not as big a hardware lift as one might hope
        • CV2 is pretty good at converting between pixel formats already, max savings of a few ms
      • input resolution doesn't really matter (it's the capture/processing resolution that counts)
        • hardware processing and scaling down in the camera vs. in the capture card is not a big deal
        • marginally faster to do it in camera
        • big difference was hardware vs. software, obviously
      • processing from python was sometimes FASTER than vendor-provided C++ options
        • GStreamer, the purpose-built C++ pipeline option, was 2x slower than python v4l2 -> opencv display when using the Magewell card, but the opposite held for DeckLink cards
        • never thought that would happen, even if inconsistent across configs!
      • conclusion: soooooo much of the performance depends on your very specific environment / client choice that it's very tough to make generalizable statements
        • unfortunately, that severely limits the usefulness of this article to you, the reader
        • ultimately, just like in code, a lot of these micro-optimizations weren't as big of a deal as you might think
        • you should probably measure your specific case and only care if it's noticeable!