Quantization Noise and Bit Depth

In these early days of The Science of Sound, I’d like to cover some of the basics, especially of digital audio, as these are a recurring source of confusion in many discussions. Today we’ll start by covering the “vertical” dimension of digital audio: quantization noise and bit depth.

Ah, digital audio. Pretty much every controversial debate about audio I have come across in my life somehow revolves around the analog vs. digital question. However, taking part in such debates while studying the mathematical foundations of digital signal processing was very healthy for me. As the audio community is very creative in finding reasons why digital audio is broken at its very foundation, I had the chance to take all that skepticism and dig really deep to find out what works and what doesn’t. Let me get one thing straight right from the beginning: digital audio is not broken at its very foundation. But it is awfully counterintuitive and confusing in a lot of ways, and there are a lot of ways it can go wrong. By the way, it’s no less counterintuitive and confusing than electronics. However, let’s relieve at least a little bit of the confusion.

Fixed Point Quantization

To represent a continuous signal in a computer, we need to chop it up into a finite number of pieces that we can then store as bits and bytes in memory. As an audio signal is continuous in both the time and level dimensions, we need to chop it up twice. The first step is chopping up the time axis by sampling, which will be covered in a future article. The second step is chopping up the level axis, which is done by a process called quantization.

The process is actually pretty simple. Say we have an audio signal that reaches at most ±1 V: we measure the voltage at each sampling interval and assign the level we get a number. As mentioned, we can only represent a finite number of different levels. Using N bits to represent our levels, we have a pool of 2^N numbers to choose from. Our signal has a peak-to-peak voltage of 2 V, so if we decide to use 16 bits, we can represent this voltage range at a step resolution of ca. 2 \mathrm{V} / 2^{16} \approx 0.03 \mathrm{mV}. That’s 65536 steps. With every further bit, the number of steps doubles.
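As a small illustration (my own Python sketch, not from the article), a uniform fixed point quantizer that maps a normalized signal onto 2^N steps can look like this:

```python
import numpy as np

def quantize(signal, n_bits):
    """Round a normalized signal (full scale +-1.0) to the nearest of 2**n_bits levels."""
    step = 2.0 / 2 ** n_bits          # peak-to-peak range of 2 divided into 2**n_bits steps
    return np.round(signal / step) * step

x = np.array([0.5, -0.123456789, 0.999])
xq = quantize(x, 16)
# The error never exceeds half a step: about 15 microvolts for a +-1 V signal
max_error = np.max(np.abs(x - xq))
```

With 16 bits, the step size works out to 2/65536 ≈ 0.00003 of full scale, matching the 0.03 mV figure above for a 2 V range.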

Quantization Noise

Of course, quantization to discrete steps introduces an error, which at any sample time is the difference between the actual level and the quantized level. You can think of it as an error signal that is added to the original signal; this is called quantization noise. To find out how big the problem is, we need to calculate the level of this noise. I’ll skip the derivation because I want to keep formulas to a minimum on The Science of Sound, but you can compute the signal-to-noise ratio (SNR), which is the level of the signal relative to the noise level, using this simple formula:

\mathrm{SNR} = N \cdot 6.02\,\mathrm{dB} + 4.77\,\mathrm{dB} - \mathrm{crest}

Where “crest” denotes the crest factor, which is the difference between the peak amplitude level and the average power level of the signal. This assumes a normalized signal where the peak level is 0 dBFS. For a pure sine, the crest factor is 3.01 dB; for music, it is typically around 12 dB. Note that there are a couple of different variants of this formula, mostly assuming a full scale sine signal. I like the above definition more because it also works for practically relevant signals. Anyway, for music quantized at 16 bits we get an SNR of around 89 dB. This is fairly nice, but a bit tight. Assume we play back the music so that the average level of the recording translates to a sound pressure of around 83 dB SPL, as is recommended because around that loudness our hearing’s frequency response is at its most even (more on that later). That means that the quantization noise level is still 6 dB below the threshold of hearing in this case. Just about enough.
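To make the numbers above easy to reproduce, here is the crest-factor variant of the formula as a small Python helper (my own sketch):

```python
def snr_db(n_bits, crest_db):
    """Quantization SNR in dB for a signal normalized to 0 dBFS peak."""
    return n_bits * 6.02 + 4.77 - crest_db

print(snr_db(16, 3.01))   # full-scale sine at 16 bits: ~98.1 dB
print(snr_db(16, 12.0))   # typical music at 16 bits: ~89.1 dB
print(snr_db(24, 12.0))   # typical music at 24 bits: ~137.3 dB
```

The last line anticipates the 24-bit figure discussed further below.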

A Quick Overview On Dithering

So far we have only looked at the average level, but we still don’t know how the noise sounds. And that is a bit of a problem, because quantization noise is not just noise: it depends on the signal. As a result, the noise power concentrates at frequencies where there is signal, and at harmonics thereof, especially with narrowband sounds like a sine or a decaying piano tone. If the signal is soft enough, these frequencies can become audible and sound similar to harmonic distortion. But there is a solution to this problem called dithering; you have probably heard of it. It is often stated that dithering just means adding noise to mask the quantization noise, which is not true. Dithering is indeed done by adding random numbers just below the smallest quantization step before quantizing. The result does look like added white noise, but a more accurate description is that the quantization noise power that would be concentrated at some frequencies is redistributed over the full spectrum, making the peak level in the noise spectrum smaller.
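As an illustrative sketch, here is one standard variant, TPDF dither (triangular probability density; the article does not prescribe a particular flavor, so treat this as my own example):

```python
import numpy as np

rng = np.random.default_rng(42)

def quantize_tpdf(signal, n_bits):
    """Quantize with TPDF dither: two uniform noises of +-0.5 LSB each sum to a
    triangular distribution spanning +-1 LSB, added before rounding."""
    step = 2.0 / 2 ** n_bits
    dither = (rng.uniform(-step / 2, step / 2, signal.shape)
              + rng.uniform(-step / 2, step / 2, signal.shape))
    return np.round((signal + dither) / step) * step

# A very quiet sine: undithered, its quantization error would correlate
# strongly with the signal and concentrate at harmonics
t = np.arange(1000)
x = 1e-3 * np.sin(2 * np.pi * 440 * t / 44100)
xq = quantize_tpdf(x, 16)
```

The per-sample error grows slightly (up to 1.5 LSB instead of 0.5 LSB), which is the price paid for decorrelating the noise from the signal.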

Dithering is a whole topic of its own, but I should note at this point that there are several ways to do it. They differ in the way the final dither noise spectrum is shaped. The simplest method results in flat white quantization noise. More advanced methods shape the noise spectrum to move more of the noise energy into frequency regions where our ears are less sensitive, further increasing the effective SNR in the most sensitive regions.

We have already seen that for standard CD audio with 16 bits, we can achieve a signal-to-noise ratio that is just good enough to keep quantization noise below the hearing threshold. But to reach that, we have to make the music as loud as possible so we get the best possible SNR. We can do that when mastering for CD, but working this way would be a pain when recording and mixing, where we typically deal with much more dynamic range and several signals at very different levels. Thus for recording and mixing, we usually use 24-bit audio, which gives us an SNR of up to 137 dB for normalized music signals. That’s quite a nice reserve to work with. However, we would still have to watch our gain staging, as reducing such a signal in level and turning it up again at a later stage raises the quantization noise level.

Enter: Floating Point Quantization

This is the number format mostly used for actual processing today. It’s the only practical format on desktop computers and also the format used in some modern DSPs. In most practical applications, 32-bit float numbers are used. You can think of these numbers as a 24-bit value (the mantissa) with an integrated gain trim that operates in steps of 6.02 dB (the exponent). The arithmetic circuits inside a processor always make sure that all the bits in the mantissa are used by adjusting the exponent accordingly. The result is that the signal resolution always stays the same, which means that the quantization noise level is always relative to the signal level. There is thus no need to maximize the signal level to get the best possible SNR.
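You can inspect this structure directly. The sketch below (my own illustration, using Python’s standard struct module) unpacks an IEEE 754 single-precision float into its three fields; note how halving a value leaves the mantissa untouched and only decrements the exponent, the “6.02 dB gain trim”:

```python
import struct

def float32_parts(x):
    """Unpack an IEEE 754 single into (sign bit, biased exponent, mantissa bits)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

print(float32_parts(1.0))   # (0, 127, 0)
print(float32_parts(0.5))   # (0, 126, 0) -- same mantissa, exponent one lower
```

The 23 stored mantissa bits plus the implicit leading 1 give the effective 24-bit resolution mentioned above.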

Still, the SNR of a 32-bit float signal is the same as that of a 24-bit fixed point signal. But the dynamic range is absolutely ridiculous: more than 1500 dB! That alleviates the need for obsessive gain staging and removes a lot of dynamic range problems. You can put several volume knobs in series, moving the signal up and down as you like, and the SNR always stays the same. By the way, this is the only case I can think of right now where dynamic range and SNR are in totally different ballparks. Usually, SNR is determined by some constant noise floor and the maximum level before distortion, as is the case with fixed point audio and all analog equipment.
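A quick numerical sketch of that claim (again my own illustration): attenuate a signal by 60 dB and bring it back up, once in 32-bit float and once through a 16-bit fixed point bottleneck at the low level:

```python
import numpy as np

x = np.sin(np.linspace(0, 2 * np.pi, 1000)).astype(np.float32)

# Floating point round trip: gain down by 60 dB, then back up
float_err = np.max(np.abs((x * 0.001) * 1000.0 - x))

# Same round trip, but quantized to 16-bit steps while the signal is quiet
step = 2.0 / 2 ** 16
fixed_err = np.max(np.abs(np.round((x * 0.001) / step) * step * 1000.0 - x))

# float_err stays at a few float32 ulps; fixed_err is orders of magnitude larger
```

In float, only the exponents shift, so the mantissa detail survives the trip; in fixed point, the quiet intermediate signal spans only a few dozen quantization steps and the detail is gone for good.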

To wrap up the essence: in modern floating point DAWs, we have to worry much less about things like gain staging and quantization errors. It is much harder to mess up the SNR than it was with fixed point systems, and we can get away without increasing bit depth in many more situations. On the other hand, this fact can also be a source of too much sloppiness, for both developers and users. It creates the illusion that proper gain staging is a thing of the past, which in many cases is not true. I observe that too many software tools today ignore healthy and reasonable operating level norms. For example, many modern software synthesizers are far too loud!

If you’d like to learn a bit more about quantization, you should have a look at the Wikipedia article about it. You can also look at how the SNR formula for sines is derived in this white paper from Analog Devices.

What’s your gain staging strategy? Do you even care? Leave a reply in the comments!

  • Hi Christian and thank you for your great work here.
    I learned very early in my working life that gain staging is the key to good sound. I noticed in my first jobs in sound reinforcement that, even without any processing, signals become so much more in-your-face and at the same time create fewer feedback (and other) problems if only the equipment in use is fed at its proper operating level.

    So that became a rule for me:
    Soundcheck was gain check (and getting rid of equipment that destroyed the sound rather than supporting it, like a lot of the graphic EQs of that time). And the same was clear in studio work: feed the components adequately and never exaggerate it (that was when I learned that you can’t get an idea of BD and HH levels with VU meters…).
    And so there was the “in your face” sound that you need in order to really make sense of dynamic shaping.
    It felt like exactly the opposite of the “opinion” of a majority who tried to get punch out of a signal with processing when it hadn’t been there in the first place.

    And of course all of this was taking place in the digital world, too.
    The first realization: shit in = shit out. If there was no punch in the recording, there would never be punch in the mix…
    Then all the colorful plug-ins flooded the market and made people dream even more of good sound without learning the essentials.

    So yes:
    Gain staging is essential at every stage of a production, no matter whether analog or digital, no matter how experienced one is.

    I have two strategies that help me keep the gains in a good range. Wait, there is a third one, which sits on top of those two: I separate strictly between recording, arrangement, and the mixdown. When it comes to the mixdown, I only accept signals that sound great and work well with the others. In my world, that has to be taken care of in the other stages.

    – The remaining things are easy: I do recordings (and bounces) with a maximum peak of -10 dB in general, and of course less when signals are known to be critical, and keep those levels in the channel during the mixdown.

    – I push my DAW outputs to +12 when starting a mix (i.e. Logic’s master and stereo out at +6) and reduce this to +6 after the first progress. If I see red lights on the master, I correct on channels or busses.

    …And one last thing, which is more general mixing advice: if you find a signal that is too quiet, don’t push it, but ask yourself what is in the way. I only push signals if I want to attract attention to them, and pull them back when another signal takes over.

    So, as so often in my comments, there is a little philosophy in between. I hope it helps anyway.

    • Thanks for commenting, Stephan! I think consistent gain staging and level calibration was one of the greatest revelations for me a few years ago. Calibrating my monitors and sticking to consistent levels improved my results by insane amounts!