High Resolution Audio – The State of the Debate
A recently published meta-study confirms that listeners are able to discriminate high resolution audio recordings from standard ones, at least to some degree. It pays to take a closer look.
Probably the most intensely fought debate surrounding digital audio is whether the standard sampling rates of distribution formats like CDs and DVDs (44.1 or 48 kHz) are really sufficient. Claims that recordings made at two or even four times the standard rates have significantly higher quality are all over the place, from users as well as – obviously – from companies that sell equipment or high resolution recordings.
On the other hand, there is hardly any scientific evidence that the advantages of high resolution audio are really that drastic. Neither properly conducted blind tests, nor human hearing research show a clear picture.
There is no evidence that humans can perceive audio frequencies above 20 kHz (at least not at healthy and practically relevant levels). And statements about a lack of time resolution are usually based on a misunderstanding of the sampling theorem. Blind listening studies have so far led to mixed and debatable results in both directions.
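That time-resolution misunderstanding is easy to check numerically. Here's a minimal sketch (assuming NumPy; the tone frequency and delay are arbitrary illustration values): a 1 kHz tone delayed by one microsecond – a small fraction of the ~22.7 µs sample period at 44.1 kHz – still produces clearly different sample values, so timing information far finer than one sample period survives sampling, exactly as the sampling theorem predicts for band-limited signals.

```python
import numpy as np

fs = 44_100                      # standard CD sampling rate
t = np.arange(fs) / fs           # one second of sample instants
delay = 1e-6                     # 1 microsecond, ~1/23 of a sample period

a = np.sin(2 * np.pi * 1000 * t)            # 1 kHz tone
b = np.sin(2 * np.pi * 1000 * (t - delay))  # same tone, shifted by 1 us

# The sampled sequences differ measurably: sub-sample timing is captured.
print(np.max(np.abs(a - b)) > 0)  # → True
```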
I find this contradiction extremely fascinating, especially because it’s situated right at the outer limits of human hearing abilities. Clearly, to the average person listening on average equipment, there’s no difference. But what about a well trained listener using high-end equipment?
Recently, a study published in the AES Journal created some buzz around this very topic. In some places, it’s celebrated as the final proof for the superior quality of high resolution audio recordings. But let’s take a closer look.
Looks Can Be Deceiving
The study I’m referring to is Josh Reiss’ “A Meta-Analysis of High Resolution Audio Perceptual Evaluation”, published in the June 2016 issue of the AES Journal. Being an Open Access paper, it’s also available to non-members. But before we dive right into it, there are a few interesting things to note.
If you first take a short look at the press releases of both Queen Mary University and the AES, there’s a lesson to be learned about the importance of peer review. In both press releases, Reiss is quoted as saying that “our study finds high resolution audio has a small but important advantage in its quality of reproduction over standard audio content”. This is an interesting conclusion, as the actual study says nothing of the sort anywhere.
The thing is, the study is about the ability of listeners to discriminate high resolution audio from standard formats. This doesn’t imply any judgement about subjective preference or even objective quality. So the press releases around the study draw a much stronger conclusion than the actual study itself.
The AES Journal is a peer-reviewed publication. That means that before a paper is published, it is reviewed by a board of independent scientists. This process ensures proper scientific rigor and a stringent presentation and interpretation of the topic at hand. Before publication, such a paper usually goes through a couple of iterations where reviewers submit comments and critique, which the author then incorporates into his work. Obviously, to meet the reviewers’ quality criteria, Reiss had to choose his words much more carefully. For good reasons, as a look inside the study reveals.
A Meta-Study on High Resolution Audio
So what’s in it? Reiss did a so-called meta-study. This means that no actual experiments are conducted, but the results from a large set of previous studies are compiled, processed and evaluated. The purpose is to have a broader sample to analyze with statistical methods in order to either confirm or reject a hypothesis with more confidence.
This method is mostly used in medicine and pharmaceutical research, and this is the first time it has been employed in the field of audio engineering. In this case, Reiss evaluated a total of 80 different studies, of which all but 18 were rejected for a number of reasons concerning test methodology and other factors. That leaves 18 studies to be incorporated in the meta-study.
It’s really worth reading the paper as a whole, as it illustrates some of the challenges behind scientifically sound blind listening tests. I told you it’s a lot of work! But for now, I’d like to pick a few bullet points that I think are at the core of the question of whether we desperately need high resolution audio.
First, let’s be clear about what outcomes to expect. The task for a listener in these tests is to tell whether one recording is different from another. The usual way to do this is something like ABX testing, where three recordings are presented to the listener. One recording (X) is the reference, and the listener’s task is to decide which of A or B is the same as X.
If there is no perceivable difference, the outcome would be random. In this case, listeners are on average expected to be right in about 50% of trials. A score significantly below 50% would mean that they reliably picked exactly the wrong one (which would hint at a faulty experimental design).
An outcome of more than 50% hints at an actual ability to perceive the difference. But again, looks can be deceiving. It is entirely possible to score, say, 75% just by rolling dice. The question is: how high is the probability that we got this result by chance? Simply speaking, a result is statistically significant if the probability that it’s the product of pure chance is very low.
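How such a significance check works can be sketched in a few lines of plain Python: an exact one-sided binomial test against pure guessing. The trial counts below are hypothetical illustration values, not figures from any of the studies.

```python
from math import comb

def p_value(correct: int, trials: int) -> float:
    """One-sided binomial tail: probability of getting at least
    `correct` right answers in `trials` pure guesses (p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# 75% correct can mean very different things depending on the sample size:
print(round(p_value(6, 8), 3))   # → 0.145 — 6/8 is entirely plausible by chance
print(p_value(75, 100) < 1e-5)   # → True — 75/100 by chance is vanishingly unlikely
```

This is why a handful of lucky trials proves nothing, while the same percentage over many trials does.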
Most of the 18 studies at hand actually had results near the 50% mark, which suggests that there is no audible difference. But there are a couple of studies that fall around the 60% mark. Interestingly, these around-60% studies were ones where participants were trained for the discrimination task beforehand.
That means in essence: trained or “expert” listeners are able to discriminate standard and high resolution audio recordings about 60% of the time.
Thus, statistically speaking, it is likely that there is an audible difference, although it’s still quite hard even for trained listeners to reliably discriminate.
Well, you can draw your own conclusions. But to me, listening to music is not a sport. Obviously, it takes training and likely a lot of concentration to have a small chance of discriminating standard and high resolution audio recordings at least a bit better than rolling dice. This has nothing to do with recreational listening.
Nevertheless, as a researcher and audio enthusiast, I’m eager to find out what exactly this audible difference is.
But the most important thing here: telling the difference is not the same as proving that high resolution audio is really superior, as is claimed in the aforementioned press releases. I’ll tell you why.
I mentioned before that we currently don’t have a good idea exactly why we would need high resolution audio and what exactly is better about it. It can’t be for the extended frequency range, it can’t be for time resolution. There might be something about pre- and post-ringing of ultra-steep reconstruction filters. But even there it’s not entirely clear what the audible artifacts of that could actually be like.
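What such pre-ringing looks like can at least be illustrated numerically. A minimal sketch (assuming NumPy; the cutoff and tap count are arbitrary illustration values): a linear-phase brick-wall lowpass is essentially a truncated sinc, and since a sinc is symmetric around its center, the filter’s output starts “ringing” before the impulse that excites it.

```python
import numpy as np

fs = 44_100
cutoff = 20_000
n = np.arange(-200, 201)           # 401 taps, centered on zero
h = np.sinc(2 * cutoff / fs * n)   # truncated ideal lowpass (linear phase)
h /= h.sum()                       # normalize to unity gain at DC

# Fraction of the impulse response energy that occurs *before* the center tap:
pre = np.sum(h[:200] ** 2) / np.sum(h ** 2)
print(0.0 < pre < 0.5)  # → True: a real share of the response precedes the impulse
```

The steeper the filter, the longer its impulse response, and the further this pre-ringing extends in time.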
So let’s look at it from the other direction. Is it imaginable that high resolution audio actually creates more problems than it solves?
To me, there are currently two hot candidates. The first is that most digital-to-analog converters are able to operate at different sampling rates, and different operating conditions might lead to different characteristics. In his book “Mastering Audio”, mastering guru Bob Katz describes some interesting findings from extensive experiments that suggest exactly this might be the case. Perceivable differences between standard and high resolution recordings seem to become less likely when the digital-to-analog converter is operated at a high sampling rate the whole time and the audio resolution is limited by low-pass filtering instead.
A second issue worth thinking about is intermodulation distortion. That’s a type of distortion where frequency components interact with a device’s nonlinearities so that not only harmonic overtones but also difference tones are created, which can fall into a much lower frequency range (don’t worry, there’ll be an article about that). Over at Ian Shepherd’s excellent site there’s a nice and short explanation of the problem. Visit Xiph.org for a more elaborate view and some audio files to test it with your own equipment.
The thing is that inaudible high frequency content can lead to noise and artifacts in the audible frequency range. This doesn’t happen if the inaudible but noise-inducing content is removed in the first place. What’s more, nonlinear distortion in audio devices often increases at high frequencies.
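A toy demonstration of the effect (assuming NumPy; the quadratic transfer curve is a hypothetical stand-in for a real amplifier’s nonlinearity, not a model of any particular device): two purely ultrasonic tones at 24 and 26 kHz, passed through a mildly nonlinear stage, produce a difference tone at 26 − 24 = 2 kHz, squarely in the audible range.

```python
import numpy as np

fs = 96_000                       # high sample rate so the ultrasonic tones fit
t = np.arange(fs) / fs            # one second
x = np.sin(2 * np.pi * 24_000 * t) + np.sin(2 * np.pi * 26_000 * t)  # both inaudible

# Hypothetical mildly nonlinear "amplifier": a small quadratic term
y = x + 0.1 * x ** 2

# The quadratic term creates sum and difference tones; find the strongest
# component that falls inside the audible band (20 Hz .. 20 kHz):
spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), 1 / fs)
audible = (freqs > 20) & (freqs < 20_000)
peak = freqs[audible][np.argmax(spectrum[audible])]
print(peak)  # → 2000.0
```

Remove the ultrasonic content before the nonlinear stage and this 2 kHz artifact disappears entirely.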
To me, these two issues alone are much more likely to cause results like those described in Reiss’ study.
So what do you think about high resolution audio? Let’s discuss in the comments!