Getting In Sync
Your guide to getting your audio and video in step with each other
Written by Terry Doner, Feb 20, 2021. Updated April 1, 2021.
In our audio/video systems it is important that audio and video stay in sync with one another. For example, it can be very distracting if an actor's lips and voice are out of step. Broadcast standards typically require a variance of less than 5 ms, but that is a very high target for most church budgets. Two frames would be a reasonable maximum target for low-budget productions; in a 30 fps system that is a limit of about 67 ms.
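As a quick sanity check, a tolerance expressed in frames converts to milliseconds like this (a minimal Python sketch):

```python
# Convert a sync tolerance expressed in video frames to milliseconds.
def frames_to_ms(frames, fps):
    """Duration of `frames` video frames, in milliseconds."""
    return frames / fps * 1000.0

print(frames_to_ms(2, 30))  # two frames at 30 fps ≈ 66.7 ms
print(frames_to_ms(2, 60))  # the same two frames at 60 fps ≈ 33.3 ms
```

The same two-frame budget is twice as tight at 60 fps, which is worth keeping in mind if you change frame rates.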
There are different kinds of synchronization problems, for example:
Two audio signals might take different paths through an audio console and end up mismatched in time. (Often measured in 'samples'; one sample is only about 10 microseconds at a 96 kHz sample rate.)
Video from two cameras might be misaligned with respect to each other
Video and audio are out of alignment with each other due to the different processing chains applied to each.
This article deals primarily with the third variety, although some of the techniques also apply to the second. In the following sections we will cover the causes of out-of-sync signals, how to measure the difference, and how to correct it.
The fundamental source of sync problems is that the audio and video signals are taking different pathways and these two paths are independent in their timing. Some examples:
Wireless video transmitter/receivers - cheap systems can easily add 100ms or more of delay
Audio processing via a DAW (Digital Audio Workstation). The system capacity, buffer sizes, interfaces and plug-ins used can easily add a second of audio delay if not careful.
Encoding/Decoding for a protocol like NDI-HX can add a few frames of video delay
Some video capture interfaces for computers can add significant delay to the video path.
In-camera electronic image stabilization can cause video delay.
Format conversions can add a frame or two of video delay (eg 1080p60 to 720p30)
Running digital-to-digital interfaces at different sample rates, for example a source at 48 kHz and the destination at 44.1 kHz. In this situation they may start out in sync, but the audio will drift over time.
Variable bit rate encoder options can cause timing problems; use constant bit rate options throughout your entire signal chain.
USB devices - often people don't have a choice about this, but some USB interfaces have been reported to be unstable with respect to timing.
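The sample-rate mismatch above can be estimated with a simplified model. This sketch assumes samples are produced at one rate and consumed at another with no sample-rate conversion in between; real devices may glitch or resample instead, so treat the number as an upper bound on drift:

```python
# Rough drift estimate when audio produced at src_rate is consumed at
# dst_rate with no resampling (a simplifying assumption; real systems vary).
def drift_seconds(duration_s, src_rate, dst_rate):
    """How far the audio lags (or leads) after duration_s of program time."""
    # src_rate samples per second of content, played back at dst_rate,
    # stretches playback by the ratio of the two clocks.
    return duration_s * (src_rate / dst_rate - 1.0)

print(drift_seconds(3600, 48000, 44100))  # ≈ 318 s of drift after one hour
```

Even tiny clock differences matter: two "48 kHz" devices whose crystals differ by 100 ppm would drift about a third of a second over the same hour.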
It is very unreliable to measure sync on YouTube Live - it can drift by up to 45 ms. (A similar drift is likely on other services as well.) It would be better to download the video and measure the sync on the local copy, as that eliminates the streaming variability.
If using a laptop (especially a Mac), plug it in, don't run off of battery as it may reduce system capacity to save battery time.
If using network links (eg NDI), do not use wifi - only wired connections.
If using wireless video systems, pay close attention to their latency characteristics.
If running Windows or Mac, look at Sweetwater's guides to computer optimization.
If using OBS, monitor its resource usage, the OBS stats are visible under View -> Stats, also dockable.
We are naturally accustomed to seeing things before we hear them (light travels faster than sound). So when setting sync it is better to have the video arrive ahead of the audio by a small margin.
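To put a number on that intuition, here is a small sketch of how long sound takes to reach a listener (using the usual ~343 m/s speed of sound in room-temperature air):

```python
# How long sound takes to cover a given distance. Light covers the same
# distance effectively instantly, so this is the "natural" AV offset a
# listener experiences in a room.
SPEED_OF_SOUND_M_S = 343.0

def sound_delay_ms(distance_m):
    return distance_m / SPEED_OF_SOUND_M_S * 1000.0

print(sound_delay_ms(10))  # a listener 10 m from the stage hears ≈ 29 ms late
```

So having the video lead the audio by 10-30 ms simply mimics sitting a few rows back in the room.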
There are many techniques which can be used to measure the delay.
Steven Ballast has this video about time alignment using OBS, but the techniques and the tools are universal (just not the OBS-specific solution). He has a web page as well; it is a good place to start, as he explains how to use it. The other two videos are alternatives. In all cases you need to play the test video and then observe the signal at the end of your chain to see how much drift there is.
This is a simple picture of one way to measure the delay. Using a video recorder to record the merged audio and video allows you to stop the recording on playback and read the delay more easily.
Once you determine whether it is the audio or the video that is ahead, you can then decide what to do:
One key concept is that you cannot 'make a signal go faster', you can only slow it down less. This usually comes up when the video is late and people want to insert a box that will speed it up. They don't exist. You can only change things that will slow it down less.
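That rule can be stated as a tiny decision function: measure each path's end-to-end latency, then delay whichever path arrives first (a sketch; the latency numbers are hypothetical measurements):

```python
# You cannot speed up the slower path -- only delay the faster one.
def correction(audio_ms, video_ms):
    """Return which path to delay, and by how much, to bring them in sync."""
    if audio_ms < video_ms:
        return ("delay audio by", video_ms - audio_ms)
    if video_ms < audio_ms:
        return ("delay video by", audio_ms - video_ms)
    return ("in sync", 0)

# Example: audio arrives in 20 ms, video takes 120 ms through the mixer.
print(correction(20, 120))  # ('delay audio by', 100)
```

In practice the common case is exactly this one: video is late, so the fix is an audio delay, not a (non-existent) video accelerator.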
You may also wish to measure the latency through a single device. The technique is similar for an audio device and a video device, but the details differ. If you have a drift-over-time problem then you may need to run the test for your full program length (if your program is an hour, run the test for an hour).
For audio, the source can be a click track for short tests, but for longer tests you will need something that also has some kind of global time marker; a voice-over would do. Or you can use a DAW like Reaper to generate an LTC timecode and a program like LTC Reader to display the result (the demo will work for five minutes; you can use it for the last five minutes of the test).
You can then bring both tracks into a DAW and examine the timing difference between the two waveforms.
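What the DAW comparison does can also be automated: slide one recording against the other and keep the lag with the best match. This is a minimal pure-Python sketch of that cross-correlation idea (real tools use the same principle on full-rate audio):

```python
# Estimate the delay, in samples, between a reference recording and a
# delayed copy by testing each lag and keeping the highest correlation.
def estimate_lag(reference, delayed, max_lag):
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(r * d for r, d in zip(reference, delayed[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

click = [0.0] * 100
click[10] = 1.0                   # a single click in the reference track
shifted = [0.0] * 25 + click      # the same click, 25 samples later
print(estimate_lag(click, shifted, 50))  # 25
```

Divide the lag by the sample rate to get time: 25 samples at 48 kHz is about 0.52 ms.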
For measuring video latency we again need a source; you could use a timecode generator or even a time-of-day clock like time.is (if you press the period key a few times you will get a millisecond display). We need a display (either a hardware device or a computer) for the original 'time' and a second one for the processed time. Using a video camera, record both displays. Then you can review the recording and calculate the difference between the two times across the span of your test.
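For a long test, the useful result is not one offset reading but the trend. Fitting a line to several (time, offset) readings tells you whether the error is fixed or drifting; this sketch uses hypothetical readings taken every 15 minutes:

```python
# Fit a least-squares line to (time, offset) measurements. A near-zero
# slope means a fixed offset; a non-zero slope means the sync is drifting.
def drift_rate(times_s, offsets_ms):
    """Slope in ms of drift per second of program time."""
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_o = sum(offsets_ms) / n
    num = sum((t - mean_t) * (o - mean_o) for t, o in zip(times_s, offsets_ms))
    den = sum((t - mean_t) ** 2 for t in times_s)
    return num / den

times = [0, 900, 1800, 2700, 3600]            # readings every 15 minutes
offsets = [0.0, 11.0, 23.0, 34.0, 45.0]       # hypothetical offsets in ms
print(drift_rate(times, offsets))  # ≈ 0.0126 ms/s, i.e. ~45 ms per hour
```

A fixed offset can be corrected with a delay; a drifting one means something in the chain (usually a clock) needs fixing first.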
There are several things that can create a delay in your audio.
Your audio console may have a delay setting turned on. Many digital consoles have the ability to add a delay to a signal path, so ensure that isn't the cause
DAW buffer size: There will be a buffer size setting (measured in samples) that affects the latency through the DAW's processing chain. For fastest processing, 128 samples or fewer might be a good choice. The tradeoff is that smaller buffers require more computing power, so you need the right balance between system capacity and latency.
DAW Interfaces: If the inputs or outputs of your DAW are analog there will be some latency due to device processing. Usually the only way of tuning this is to buy a faster device, or eliminate it. You eliminate the Analog-to-Digital and Digital-to-Analog latencies by keeping the signal digital as much as possible. Many devices or DAWs will report on the interface latency.
DAW Plugins: A plugin such as reverb or pitch correction adds latency to the signal. You need to choose these carefully and may need to limit the number of plugins you use. If you suspect a plugin, disable it and measure again.
DAW System Resources: A computer only has so much capacity. There are classes of resources: CPU, GPU, Memory, Disk, and Network. For DAW the main culprit which can cause delay is CPU. If you are recording as well, then maybe also Disk. Monitor your usage by looking at the system's resource monitor, "Task Manager" for Windows, "Activity Monitor" for Mac. It is a best practice to not ask your DAW computer to do more than one job; don't also run ProPresenter, and Lightkey, and OBS, etc on the same machine.
Also note that a software video mixer can create latency much like a DAW does.
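The buffer-size tradeoff mentioned above is simple arithmetic, shown here as a sketch (real round-trip latency adds input and output buffers plus converter delays on top of this):

```python
# Latency contributed by a single DAW buffer of a given size.
def buffer_latency_ms(buffer_samples, sample_rate):
    return buffer_samples / sample_rate * 1000.0

print(buffer_latency_ms(128, 48000))   # ≈ 2.7 ms per buffer
print(buffer_latency_ms(1024, 48000))  # ≈ 21.3 ms per buffer
```

This is why a large "safe" buffer setting, left over from a recording session, can quietly add tens of milliseconds to a live chain.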
There are a few ways to delay your audio:
Your digital audio console or DAW is very likely able to add delay to your signal. This is usually limited to the millisecond range, but some can create delays of seconds.
You can add a hardware device that adds a delay. See the list below.
Your video mixer may have an audio delay setting. Check your manual. Some ATEM models, OBS, and vMix have audio delay settings.
Here are some videos for vMix, OBS, ATEM Mini, and the X32.
To delay the audio in OBS, click on the gear icon in the audio mixer and select the "Advanced Audio Properties". In the image below, the "Sync Offset" is what you want to adjust.
Common causes of video delay:
Electronic Image stabilization in a camera can create a noticeable delay
Video conversions can add a frame of delay, especially cross-converters.
Computer video capture interfaces vary widely in their processing delays.
Wireless video transmitter/receiver systems, especially the cheap ones, can create substantial delay.
Video mixers can add 1 to 3 frames of delay (up to ~100 ms @ 30 fps). But if your audio and video are merged prior to the video mixer, it should keep the audio and video (as received) in sync - by that I mean it won't make them any worse.
There are devices that you can buy that can delay video. They are not cheap, see the list below. Some software, such as vMix and OBS can delay video. See the vMix video in the prior section.
To delay video in OBS, use the "Video Delay (Async)" filter on the source.
Note:
Delaying video requires the software to retain the video content in memory and this can be a substantial demand on the system. Setting long delay times can create system instabilities. This is true of all software.
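To see why long delays are demanding, here is a rough sketch of the memory needed to buffer uncompressed frames (assuming 4 bytes per pixel; actual software may store frames differently or compress them):

```python
# Rough memory required to hold a queue of delayed, uncompressed frames.
def delay_buffer_mb(delay_s, fps, width, height, bytes_per_pixel=4):
    frames = delay_s * fps
    return frames * width * height * bytes_per_pixel / (1024 * 1024)

print(delay_buffer_mb(1, 60, 1920, 1080))  # ≈ 475 MB for 1 s of 1080p60
```

A few seconds of 1080p60 delay can therefore consume gigabytes, which is why long software delays can destabilize a system.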
This is a list of hardware devices that can be added to your signal path to adjust the timing of audio or video. You can see from the prices that delaying video can be expensive!
In this scenario, there is a problem that needs to be remedied. This should not happen.
The first step is to identify the device or devices in your signal chain that are the cause of the problem. The test techniques discussed above can be used on each device. If the drift happens over the course of an hour, you will need to run the test for an hour.
Now that you know where the problem(s) is, you can focus on it. Here are a few tips.
If using digital-to-digital interfaces, check your sample rates, they should all be the same. If, for example, you have a source at 48kHz and the destination at 44.1 kHz, the audio will drift over time.
Check your frame rates on all devices. If your system standard is 30fps, then make sure that is what everything is using. (It is harder to mess this up and not have bigger problems).
If the device is a computer, check your system resources. Make sure your system (when under load) is not consuming all of any one resource.
As a rule of thumb, system CPU should not exceed 75%; likewise for GPU. Some software cannot effectively use more than one CPU core, so if a single process is using more than 75% of one core's capacity you might want to investigate that.
If using NDI, check the network utilization along the entire path. Again, don't exceed 75% of a port's bandwidth or of a switch's backplane capacity.
If recording, then ensure your disk bandwidth is also not constrained. You may need to run a disk benchmark to know what your maximum capacity is. Keep it under 75% for a dedicated drive - don't use a shared drive (shared with OS).
Check that all of your drivers are up-to-date.
Do not run any software that is not needed for the job.
If at all possible do not run laptops off battery - plug them in. This prevents OS energy-saving measures from kicking in.
Finally, over the course of your event, your system's temperature can rise to levels at which the OS restricts how much work can be done. This is called thermal throttling. There are utilities that can monitor temperature. If system utilization (CPU and GPU in particular) decreases as temperature increases, that is a good indicator. Resource utilization may suggest there is spare capacity, but due to excess heat it cannot be made available to you. The resolution is to add cooling to the system or to decrease the overall load.
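The 75% rule of thumb from the tips above can be wrapped in a small check. The readings here are hypothetical; pull real numbers from Task Manager, Activity Monitor, or a library such as psutil:

```python
# Flag any resource above the ~75% rule-of-thumb ceiling.
def overloaded(readings, threshold=75.0):
    """Return the names of resources whose utilization exceeds threshold."""
    return sorted(name for name, pct in readings.items() if pct > threshold)

# Hypothetical snapshot taken mid-event:
readings = {"CPU": 82.0, "GPU": 40.0, "Disk": 91.0, "Network": 12.0}
print(overloaded(readings))  # ['CPU', 'Disk']
```

Logging a snapshot like this every few minutes during a full-length rehearsal also makes thermal throttling visible: utilization that sags while temperature climbs is the telltale sign.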
References
Technical Standards for Delivery of Television Programmes to Sky - Page 12 has their specification of 5ms sync tolerance. I've read that the BBC has the same upper tolerance limit.
https://www.avsforum.com/threads/tools-for-measuring-audio-video-synchronization.2999524/