
High Resolution Audio – The State of the Debate

A recently published meta-study confirms that listeners are able to discriminate high resolution audio recordings from standard ones, at least to some degree. It pays to take a closer look.

Probably the most intensely fought debate surrounding digital audio is whether the standard sampling rates for distribution formats like CDs and DVDs (44.1 or 48 kHz) are really sufficient. Claims that recordings made at double or even four times the standard rates have significantly higher quality are all over the place, coming from users as well as, obviously, from companies that sell equipment or high resolution recordings.

On the other hand, there is hardly any scientific evidence that the advantages of high resolution audio are really that drastic. Neither properly conducted blind tests nor human hearing research paint a clear picture.

There is no evidence that humans can perceive audio frequencies above 20 kHz (at least not at healthy and practically relevant levels). And statements about a lack of time resolution are usually based on a misunderstanding of the sampling theorem. Blind listening studies have so far produced mixed and debatable results in both directions.

I find this contradiction extremely fascinating, especially because it’s situated right at the outer limits of human hearing abilities. Clearly, to the average person listening on average equipment, there’s no difference. But what about a well-trained listener using high-end equipment?

Recently, a study published in the AES Journal created some buzz around this very topic. In some places, it’s celebrated as the final proof for the superior quality of high resolution audio recordings. But let’s take a closer look.

Looks Can Be Deceiving

The study I’m referring to is Josh Reiss’ “A Meta-Analysis of High Resolution Audio Perceptual Evaluation”, published in the June 2016 issue of the AES Journal. Being an Open Access paper, it’s also available to non-members. But before we dive right into it, there are a few interesting things to note.

If you first take a quick look at the press releases of both Queen Mary University and the AES, there’s a lesson to learn about the importance of peer review. In both press releases, Reiss is quoted as saying that “our study finds high resolution audio has a small but important advantage in its quality of reproduction over standard audio content”. This is an interesting conclusion, as the actual study doesn’t say anything like that anywhere.

The thing is, the study is about the ability of listeners to discriminate high resolution audio from standard formats. This doesn’t imply any judgement about subjective preference or even objective quality. So the press releases around the study draw a much stronger conclusion than the actual study itself.

The AES Journal is a peer-reviewed publication. That means that before a paper is published, it is reviewed by a board of independent scientists. This process is meant to ensure proper scientific rigor and stringent presentation and interpretation of the topic at hand. Before publication, such a paper usually goes through a couple of iterations where reviewers submit comments and critique, which the author then incorporates into the work. Obviously, to meet the reviewers’ quality criteria, Reiss had to choose his words much more carefully. For good reason, as a look inside the study reveals.

A Meta-Study on High Resolution Audio

So what’s in it? Reiss did a so-called meta-study. This means that no new experiments are conducted; instead, the results from a large set of previous studies are compiled, processed and evaluated. The purpose is to have a broader sample to analyze with statistical methods in order to either confirm or reject a hypothesis with more confidence.

This method is mostly used in medical and pharmaceutical research, and it’s the first time that such methods have been employed in the field of audio engineering. In this case, Reiss evaluated a total of 80 different studies, of which 62 were rejected for a number of reasons concerning test methodology and other factors. That leaves 18 studies to be incorporated in the meta-study.

It’s really worth reading the paper as a whole, as it illustrates some of the challenges behind scientifically sound blind listening tests. I told you it’s a lot of work! But for now, I’d like to pick a few points that I think are at the core of the question of whether we desperately need high resolution audio.

Significance

First, let’s be clear about what outcomes to expect. The task for a listener in these tests is to tell whether one recording is different from another. The usual way to do this is something like ABX testing, where three recordings are presented to the listener. One recording (X) is the reference, and the listener’s task is to decide whether A or B is the same as X.

If there is no perceivable difference, the outcome would be random. In this case, listeners are on average expected to be right in about 50% of trials. A score significantly below 50% would mean that they reliably picked exactly the wrong one (which would hint at a faulty experimental design).

An outcome of more than 50% hints at an actual ability to perceive the difference. But again, looks can be deceiving. It is entirely possible to get a score of, for example, 75% by rolling dice, especially when the number of trials is small. The question is: how high is the probability that we got this result by chance? Simply speaking, a result is statistically significant if the probability that it’s a result of pure chance is very low.
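To put a number on “by chance”, here is a minimal sketch that treats every ABX trial as a fair coin flip and adds up the probability of scoring at least a given number of correct answers. The function name and the trial counts are my own illustration and are not taken from Reiss’ paper.

```python
from math import comb

def chance_probability(correct: int, trials: int) -> float:
    """Probability of at least `correct` right answers in `trials` ABX
    trials when the listener is purely guessing (p = 0.5 per trial)."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# The same 75% score in a short and in a longer session (illustrative numbers)
print(chance_probability(6, 8))    # 6 of 8 correct: 37/256, about 0.145
print(chance_probability(30, 40))  # 30 of 40 correct: far less likely by luck
```

So 75% in eight trials happens to a pure guesser roughly one time in seven, while the same percentage over forty trials would be very hard to explain by luck alone.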

Results

Most of the 18 studies at hand actually had results near the 50% mark, which suggests that there is no audible difference. But there are a couple of studies that fall around the 60% mark. Interestingly, these around-60% studies were ones where participants were trained for the discrimination task beforehand.

That means in essence: trained or “expert” listeners are able to discriminate between standard and high resolution audio recordings in about 60% of trials.

Thus, statistically speaking, it is likely that there is an audible difference, although it’s still quite hard even for trained listeners to reliably discriminate.
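Why can a modest 60% hit rate still count as evidence? A rough sketch, assuming simple independent guesses and made-up trial counts (the meta-analysis itself uses more elaborate statistics), shows how the chance explanation fades as more trials are pooled:

```python
from math import comb

def chance_probability(correct: int, trials: int) -> float:
    """Probability of at least `correct` right answers out of `trials`
    when every answer is a pure 50/50 guess."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# Hypothetical trial counts, always with exactly 60% correct answers
for trials in (20, 100, 400):
    correct = trials * 6 // 10
    p = chance_probability(correct, trials)
    print(f"{correct} of {trials} correct: chance probability {p:.5f}")
```

With 20 trials, a guesser reaches 60% about one time in four, so nothing can be concluded. With several hundred pooled trials the same 60% becomes very unlikely to be luck, even though each individual listener is still wrong a lot of the time.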

Well, you can draw your own conclusions. But to me, listening to music is not a sport. Obviously, it takes training and likely a lot of concentration to have a small chance of discriminating standard and high resolution audio recordings at least a bit better than rolling dice. This has nothing to do with recreational listening.

Nevertheless, as a researcher and audio enthusiast, I’m eager to find out what exactly this audible difference is.

But the most important thing here: telling the difference is not the same as proving that high resolution audio is really superior, as is claimed in the aforementioned press releases. I’ll tell you why.

Alternative Interpretations

I mentioned before that we currently don’t have a good idea exactly why we would need high resolution audio and what exactly is better about it. It can’t be for the extended frequency range, it can’t be for time resolution. There might be something about pre- and post-ringing of ultra-steep reconstruction filters. But even there it’s not entirely clear what the audible artifacts of that could actually be like.

So let’s look at it from the other direction. Is it imaginable that high resolution audio actually creates more problems than it solves?

To me there are currently two hot candidates. The first one is that most digital to analog converters are able to operate at different sampling rates. The thing is that different operating conditions might lead to different characteristics. In his book “Mastering Audio”, mastering guru Bob Katz describes some interesting findings from extensive experiments that seem to show that exactly this might be the case. Perceivable differences between standard and high resolution audio recordings seem to be less likely when the digital to analog converter is operated at a high sampling rate the whole time and the audio resolution is limited by low-pass filtering instead.

A second issue that’s worth thinking about is intermodulation distortion. That’s a type of distortion where several frequency components interact in a nonlinear device so that not only harmonic overtones are created, but also difference tones that occur in a lower frequency range (don’t worry, there’ll be an article about that). Over at Ian Shepherd’s excellent site there’s a nice and short explanation of the problem. Visit Xiph.org for a more elaborate view and some audio files to test it with your own equipment.

The thing is that inaudible high frequency content can lead to noise and artifacts in the audible frequency range. This doesn’t happen if the inaudible but noise-inducing stuff is removed in the first place. What’s more, nonlinear distortion of audio devices often increases at high frequencies.
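The effect is easy to reproduce numerically. The following sketch assumes a 96 kHz sampling rate, two ultrasonic test tones at 24 and 25 kHz, and a hypothetical playback device with a small second-order nonlinearity (the coefficient `k2` is an arbitrary illustration value, not a measurement of any real equipment). It then measures how much signal shows up at the 1 kHz difference frequency.

```python
import cmath
import math

SAMPLE_RATE = 96_000  # Hz, assumed "high resolution" rate
NUM_SAMPLES = 9_600   # 0.1 s, so all test tones complete whole cycles

def tone_amplitude(signal, freq_hz):
    """Amplitude of the sinusoidal component at freq_hz (single-bin DFT)."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * n / SAMPLE_RATE)
              for n, s in enumerate(signal))
    return 2 * abs(acc) / len(signal)

def nonlinear_device(x, k2=0.1):
    """Hypothetical device model: unity gain plus a bit of
    second-order (quadratic) distortion."""
    return x + k2 * x * x

# Two ultrasonic tones, both inaudible on their own
clean = [math.cos(2 * math.pi * 24_000 * n / SAMPLE_RATE)
         + math.cos(2 * math.pi * 25_000 * n / SAMPLE_RATE)
         for n in range(NUM_SAMPLES)]
distorted = [nonlinear_device(x) for x in clean]

print(tone_amplitude(clean, 1_000))      # practically zero
print(tone_amplitude(distorted, 1_000))  # close to k2 = 0.1
```

The input contains nothing at 1 kHz, but after the nonlinearity a clearly audible-range component appears at 25 kHz − 24 kHz = 1 kHz. If the ultrasonic tones are filtered out before they reach the device, there is nothing left to intermodulate.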

To me, these two issues alone are much more likely to cause results like those described in Reiss’ study.

So what do you think about high resolution audio? Let’s discuss in the comments!

  • Hi Christian, you know I like reading whitepapers and so on, but my passion for high definition audio is driven by my ears, not by reading meta-studies and abstract arguments from colleagues. If you don’t like 96/24, why not just skip it? …it’s cheaper 🙂 Don’t blame it and tell people they are stupid because theory is the truth and everything has been said with Shannon. Anyway, I miss your personal experience in the article. You’ve added a link to the web listening test …I couldn’t resist and took it with an iPad 2 Air in a leather box during my morning visit to the toilet …no joke, sorry for that :-). …I will attach the screenshots. Regards Bodo

    • …screenshots

      • Christian Luther

        As a funny side note, when I took the test the other day I quite reliably picked the 320k one over the uncompressed. 😉

    • Christian Luther

      Hi Bodo! That listening test is about a whole different issue, as I said. And yes, it isn’t spectacularly hard to do. But anyway, congratulations!

      What I do here is express my opinion and back it up with what I consider logical reasoning. I don’t care if you change your mind or not, you can do whatever you want with this information. It’s not like it’s a matter of life or death.

      In my eyes, there’s no point in bragging about how good one’s ears are. I’ve made experiments too where the difference was clearly audible. But that doesn’t say anything about quality or preference (I tend not to be quite sure about which one I like better in such tests). As the article says, there are several possible explanations to be further explored. However, you are free to ignore that.

      • …for sure I can ignore, but I like reading your posts 🙂

        I’m just a bit tired of the typical line of argument …“you can’t hear it”
        Real scientists should face the truth that people can hear and feel it and search for the reasons why!

        The AES meta-study can be a good point in history to start the search for the “why and what”

        • Christian Luther

          I didn’t say “You can’t hear it.”, given the evidence (AND my own experience) that would be ridiculous, wouldn’t it?

          What I’m saying is that given all I know at this point, the reason for an audible difference might not necessarily be that we hear ultra-high frequencies, but other side effects like the two described at the end of the article.

  • I record with 48 kHz usually. For the dead simple reason that it is the highest sampling rate I can get across ADAT Lightpipe without having fewer channels (I could go for 96 kHz, having only half the channels). Another not so obvious reason is: CPU power in my old and weak DAW computer. I use oversampling for dynamics, so actually they run at 96 kHz internally. Imagine a couple of tracks at 96 kHz: then you have a lot of 192 kHz processing internally, which is expensive in terms of computing power. I think that it is important to work on the weakest member of the signal chain, especially if you’re on a budget. So I think a reasonable preamp, good AD converters and good mics are way more important for most people than going 96 kHz. And please don’t spoil it all with that cheap freeware reverb afterwards. 😉

    One thing that bothers me: Someone told me that with higher sampling rates, you gain dynamic range. I did not understand this. Why should that be the case? If it is true, can somebody explain?

    • Christian Luther

      It’s neither true nor false. Something in between. The dynamic range according to the definition stays the same. But if you double the sampling rate, the quantization noise (which is what limits dynamic range) spreads out over double the bandwidth. In total, dynamic range doesn’t increase, but the dynamic range in the original (lower) frequency range increases by 3 dB. With noise shaping, it would also be possible to move more of the noise power into the (new) upper octave, increasing dynamic range in the lower part of the spectrum even more.

      But that doesn’t help much in practice. Dynamic range of a complete system is mostly determined by the weakest part. And that’s the analog circuitry these days, not the digital resolution.

      • I fully agree that this aspect doesn’t help in practice. In most productions, you will not even use the full dynamic range of your equipment, especially not in pop or rock music. It also depends on other things, e.g. noise on an instrument somewhere in the back of the mix will be hardly audible, and even if it sits further up front, the ear gets used to (constant, white) noise very quickly and you don’t perceive it any more. (It’s different if the noise pumps up and down together with a compressor.)

        My conclusion: More dynamic range is in practice not a valid argument for higher sampling rates.