
Time Resolution in Digital Audio

After having covered the more or less well-known basics of digital audio in terms of bit depth and samplerate, I’d like to go one step further and explore another common buzzword: the time-domain resolution of digital audio.

If you have read more than one article here, it might have become obvious to you that I am a strong advocate for digital audio technology. Given my profession, this is not surprising. But of course I know about the skepticism that digital audio topics still traditionally provoke. Sometimes of course for good reasons, especially when it comes to actual signal processing. However, one of my goals with The Science of Sound is to promote an informed debate about the advantages and shortcomings of digital audio by helping understand the very foundation of the technology.

One of the concerns about digital audio I frequently encounter is about the resolution in the time domain. In this context, it is often suggested that the precision with which the exact timing of a sound event can be represented is limited to the sampling period. As time differences are a very important cue the ears use to determine sound source direction and spatiality, this coarse resolution must consequently be detrimental to the spatial impression of a sound recording.

Note that at the standard CD sampling rate of 44.1 kHz, the sampling period is around 23 microseconds. And this could indeed be a little too coarse, as hearing studies have found that some people can reliably identify interaural time differences as low as 15 microseconds for low-frequency sounds. That’s bad news, isn’t it?
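In numbers, as a quick sanity check:

```python
# One sampling period at CD rate vs. the ~15 microsecond ITD threshold
# reported in hearing studies.
fs = 44100                      # CD sampling rate in Hz
sampling_period_us = 1e6 / fs   # duration of one sample in microseconds
print(f"sampling period: {sampling_period_us:.1f} us")  # ~22.7 us
```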

But you might have guessed already that it’s not that bad in reality. Think about it for a second: how do “sound events” know anything about the sampling period, and how would a converter decide how to “quantize” them? In reality there is no such thing as a sound event, just continuous signals.

Now the sampling theorem states that a signal can be perfectly reconstructed from samples taken at regular intervals, as long as it is bandlimited to half of the sampling rate. That includes any phase information, which – very simply speaking – determines when stuff happens exactly.

A Sub-Sample Time Shift Example

Have a look at this funny little animation here.

[Animation: TimeDomainResolution]

This is a bandlimited version of a square wave period, as an A/D converter would see it after the antialiasing filter. It is time-shifted by different amounts of up to one sample. Notice the dots, which represent the actual recorded sample values that would land in your .wav file, and observe how they change with the time shift.
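To make this concrete, here is a minimal numerical sketch of the same effect. It uses a single sinc pulse instead of the square wave period from the animation, since a sinc is also bandlimited to the Nyquist frequency; `bandlimited_pulse` is just my own helper name:

```python
import numpy as np

def bandlimited_pulse(t):
    # numpy's sinc is sin(pi*x)/(pi*x), bandlimited to 0.5 cycles/sample,
    # i.e. exactly to the Nyquist frequency
    return np.sinc(t)

n = np.arange(-8, 9)              # sampling instants, in units of the sampling period
for shift in (0.0, 0.25, 0.5):    # sub-sample time shifts
    samples = bandlimited_pulse(n - shift)
    # the recorded sample values change with every sub-sample shift,
    # so the shift is fully captured in the sampled data
    print(f"shift {shift:4.2f}:", np.round(samples[7:10], 3))
```

Even a quarter-sample shift produces a clearly different set of sample values, which is exactly why the timing information survives sampling.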

This example shows quite well how the sampling theorem actually works. The original signal is the only one possible that contains no frequency components above the Nyquist frequency and still passes through all of the sampled points.

But what about this strange overshoot and the limited steepness of the steps? That doesn’t look like a real square wave. Isn’t that a problem?

Actually, no. These phenomena are a direct result of the band limitation. A steeper slope with less overshoot would be equivalent to a larger signal bandwidth, which would violate the sampling theorem and cause artifacts. A perfect step function actually has infinite bandwidth, at no extra benefit in terms of perception, as everything in our hearing range is unaffected by the increased bandwidth. It has to look strange in order to not sound strange!
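If you want to watch the overshoot appear from nothing but band limitation, a truncated Fourier series does the trick. This is a sketch; `bandlimited_square` is my own helper, built from the standard odd-harmonic series of a square wave:

```python
import numpy as np

def bandlimited_square(t, f0, bandwidth):
    # Fourier series of a unit square wave, truncated at `bandwidth` Hz:
    # only odd harmonics, each weighted by 4/(pi*k)
    x = np.zeros_like(t)
    k = 1
    while k * f0 <= bandwidth:
        x += (4 / np.pi) * np.sin(2 * np.pi * k * f0 * t) / k
        k += 2
    return x

t = np.linspace(0, 1e-3, 20001)                        # one period of a 1 kHz wave
x = bandlimited_square(t, f0=1000, bandwidth=22050)    # limited to CD Nyquist
print(f"peak value: {x.max():.3f}")                    # overshoots 1.0: Gibbs phenomenon
```

The peak value lands well above the square wave’s nominal amplitude of 1.0, and no amount of extra harmonics below Nyquist will make it go away.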

Time Resolution in Signal Processing

So recording and playback is fine. What about delay effects? Precision of delay is limited to sample steps, right?

Well, only if the developer was lazy. A very simple delay would just shift the audio playback by an integer number of samples, and that’s it. But most implementations use a process called interpolation. This is important if you need a very precise delay, as for example in physical modeling synthesis, or if you want to change the delay time smoothly without artifacts.

Interpolation means computing the reconstructed signal that you would get at the output of a D/A converter at time instances between the samples. The same principle is also used for samplerate conversion; there are several ways to look at it, but in essence it’s always the same idea. For a precision delay you would combine a sample-accurate delay with a filter that adds a sub-sample time shift to the signal, and you’re done. There are countless ways to build such a filter, from simple linear interpolation (which has some artifacts but can be sufficient, depending on the application’s requirements) to high-end sinc interpolation, which will cost a dime or two in terms of CPU load.
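As an illustration of the simplest variant, here is a sketch of a linear-interpolation fractional delay. The function name and structure are mine, not taken from any particular product:

```python
import numpy as np

def fractional_delay_linear(x, delay):
    """Delay x by `delay` samples, where `delay` may be non-integer.

    The integer part is a plain sample shift; the fractional part is
    linear interpolation between neighbouring samples, i.e. a first-order
    (and slightly lowpass) approximation of a true sub-sample shift."""
    n_int = int(np.floor(delay))
    frac = delay - n_int
    shifted = np.concatenate([np.zeros(n_int), x])    # integer-sample delay
    prev = np.concatenate([[0.0], shifted[:-1]])      # one further sample of delay
    return (1 - frac) * shifted + frac * prev         # crossfade = linear interpolation

x = np.zeros(8)
x[2] = 1.0                                  # unit impulse at sample 2
print(fractional_delay_linear(x, 1.5))      # impulse energy split between samples 3 and 4
```

A delay of 1.5 samples smears the impulse equally over two neighbouring samples, which is precisely the lowpass artifact mentioned above; sinc interpolation would distribute it over many samples instead and preserve the spectrum much better.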

Crunching the Numbers

Does that mean time domain precision is actually infinite? It must be limited in some way…

It actually is. But in a practically negligible way. Let’s go back a few paragraphs. I wrote that “the original signal is the only one possible that contains no frequency components above the Nyquist frequency and still passes through all of the sampled points”. Now it becomes clear how we can derive the actual time domain resolution: we need to find out by what amount we can shift a signal in time without changing the resulting sample values.

Obviously (?) the limit we will face has to do with the bit depth. The time shift must be large enough for the resulting sample values to change by at least one quantization step; otherwise they would stay the same after quantization.

I was once even bored enough to calculate that roughly. The result depends a lot on the signal and its level. For a sine signal, the limit gets smaller as frequency and/or level are increased. So for a worst case of a 10 Hz sine at -60 dBFS, quantized at 24-bit resolution, we get into a range of around 4 microseconds of resolution. That is already pretty coarse, but you wouldn’t hear that sound anyway. For a more realistic 100 Hz sine at -20 dBFS, we are in the range of 4 nanoseconds. By the way, the samplerate doesn’t even show up in the equations!
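For the curious, here is a back-of-the-envelope version of that calculation. It assumes the time shift must move the steepest part of a sine A·sin(2πft) by one quantization step q; the maximum slope of that sine is 2πfA, giving dt ≈ q/(2πfA). The exact constants depend on conventions (how you define the LSB, which point of the waveform you take), so this reproduces the order of magnitude of the figures above rather than the exact numbers:

```python
import math

def time_resolution(freq_hz, level_dbfs, bits=24):
    # smallest time shift that changes the steepest sample of a sine
    # by at least one quantization step
    q = 2.0 / 2**bits                  # one LSB, full scale = -1..+1
    amp = 10 ** (level_dbfs / 20)      # sine amplitude relative to full scale
    return q / (2 * math.pi * freq_hz * amp)

print(f"10 Hz at -60 dBFS:  {time_resolution(10, -60) * 1e6:.1f} microseconds")
print(f"100 Hz at -20 dBFS: {time_resolution(100, -20) * 1e9:.1f} nanoseconds")
# note: the sampling rate appears nowhere in the formula
```

Both results land in the microsecond and nanosecond ranges quoted above, and indeed the samplerate never enters the formula.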

Although this is a very rough estimate for a very artificial situation, I would suggest that we worry a little less about this in the future. In practice, the actual resolution will be limited by analog noise anyway. However, what we might need to worry about instead is the importance of a good signal-to-noise ratio – especially at low frequencies – for the spatial impression of a stereo recording!

So it seems like today we’ve pushed another elephant out of the room (but I guess I just let another one in). Soon we should be able to get to the real issues. Stay tuned for more!

  • Waldbaer

    This article really was an eye-opener for me; everything can be deduced from the central point: the animation with the italic sentence below it. Actually, I’m too lazy to verify this mathematically, but it definitely sharpened my view of the captured digital signal: I was one of the people who simply thought the signal was formed by just connecting the points with straight lines.

    So thanks for taking your time to write this blog, I’ll continue reading it and probably learn much more…

    The possibilities of interpolated delays sound interesting and should be nearly endless, so I’d really like to experiment with that. Do you have any suggestions for .au plugins for such interpolated/fractional delays? I found this one, which seems to be great, but is only available for Windows/VST so far: http://www.forward-audio.de/fa-fractionaldelay/

    • Hi Waldbaer! Sorry for the late answer. Eventide recently released a plugin for precise sub-sample delaying: Precision Time Align

      I think it’s a little expensive for what it does, but you might want to try the demo.

      Anyway, I don’t think there’s much use in practice for precise fractional delays in a music production context. Inside DSP algorithms of course, they’re very relevant.