
Know Your Ears – The Outer And Middle Ear

Sound travels a long way until we finally consciously perceive it. Even after arriving at our ears, it is converted, transformed and decoded multiple times. This is the first part of a series that walks us through the long journey from vibrating air to our consciousness.

Knowing the exact processes involved in turning moving air into the perception of sound helps a great deal in making artistic decisions and realizing the effect that we intend to achieve. Many techniques and tricks in music production relate directly to the way the human auditory system processes sound and extracts information from it. Understanding these processes is key to creating more realistic, impressive and touching recordings. Apart from that, it’s amazingly fascinating!

A Quick Tour From Air To Brain

Let’s start with a quick overview of the whole human auditory system.

Sound is a mixture of traveling waves of air vibration which arrive at our ears from several directions. The outer ears “capture” this sound and guide it through the ear canal to the tympanic membrane. This small membrane picks up the air vibration much like the diaphragm of a microphone.

Attached to the tympanic membrane is a chain of small bones which transports the mechanical vibration to the inner ear. The inner ear – or cochlea – is a spiral structure filled with liquid. The mechanical vibration is imposed onto this liquid, which surrounds the long basilar membrane.

The basilar membrane acts like a filter bank, where different sound frequencies result in vibration at different locations along the membrane. These vibrations are sensed by the hair cells that are distributed along the basilar membrane. They convert vibration into nerve impulses that are then transported to the brain stem.
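To get a feel for this filter-bank idea, here is a minimal Python sketch that splits a signal’s energy into a few frequency bands. The band edges are arbitrary choices for illustration, and an FFT is only a very crude stand-in for the place coding along the basilar membrane:

```python
import numpy as np

def band_energies(signal, sample_rate, bands):
    """Estimate the energy per frequency band -- a crude analogue of the
    basilar membrane mapping different frequencies to different places."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return [spectrum[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in bands]

# A 1 kHz tone should put almost all of its energy into the middle band.
sr = 48000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)
bands = [(20, 500), (500, 2000), (2000, 8000)]  # illustrative band edges
energies = band_energies(tone, sr, bands)
```

A real cochlear model would use overlapping bandpass filters with frequency-dependent bandwidths, but the principle – one detector per frequency region – is the same.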

In the brain stem, several specialized neural facilities process the nerve impulses to enhance and extract the information needed for speech and music processing and for spatial analysis of the listening environment. The result is a large set of different representations of sound, which is further processed by increasingly complex neural networks of the brain. These recognize voices and instruments, understand speech, analyze rhythm and harmony, retrieve associations with past experiences from memory and much much more.

In this series, we are going to look at all these stages and focus on how sound is changed and transformed along the way. Today we’ll start with the outer and middle ear.

The Outer Ear

The pinna – the visible part of the outer ear – serves the function of capturing sound from our surroundings. It implements a directivity that enhances sound from the front, especially in the 2-4 kHz frequency range which is most important for speech. But there’s much more to it.

Probably the most interesting property of the outer ear (together with the head as a whole) is that it has different transfer characteristics depending on the direction the sound is coming from. This way, additional information is added to the sound that can later be evaluated to analyze the environment.

These direction-dependent filtering characteristics are known as head-related transfer functions or HRTFs. They describe the filter characteristic imposed onto a sound arriving from a specific direction. Imitating these characteristics over headphones is a great way to create realistic 3D audio, for example for games and virtual reality applications. Unfortunately, HRTFs vary greatly from person to person, so it is a challenge to find a one-size-fits-all solution without accepting compromises such as sound coloration or reduced precision of the spatial “image”.
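As an illustration, rendering a direction over headphones amounts to filtering the signal with a pair of head-related impulse responses (HRIRs), one per ear. The sketch below uses obviously fake, hand-made HRIRs just to show the mechanics; real HRIRs are measured per listener and per direction:

```python
import numpy as np

def binaural_render(mono, hrir_left, hrir_right):
    """Apply a pair of head-related impulse responses to a mono signal
    to place it at the direction the HRIRs were measured for."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=0)

# Toy HRIRs: the right ear gets the sound slightly later and quieter,
# roughly as it would for a source on the listener's left.
hrir_l = np.zeros(64)
hrir_l[0] = 1.0
hrir_r = np.zeros(64)
hrir_r[10] = 0.6          # ~0.2 ms interaural delay at 48 kHz

mono = np.random.default_rng(0).standard_normal(480)
stereo = binaural_render(mono, hrir_l, hrir_r)
```

Real HRIRs additionally encode the direction-dependent spectral coloration of the pinna, which is what lets us distinguish front from back and up from down.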

But I digress. For now the essence is that the pinna enhances typical speech frequencies from the front and adds information about sound source location to the incoming sound. The detailed analysis of HRTFs is a whole field of its own.

The Ear Canal

Let’s go further into the ear canal. Its main purpose is the protection of the sensitive and easily damaged tympanic membrane.

Of course this line of defense can easily be overcome using tools such as cotton buds. But as a general guideline it’s a good idea not to stick anything into the ear canal that is smaller than a fist. There are enough ways to non-intrusively clean the ears, for instance with some warm water.

The transmission of sound through the ear canal is not neutral. Its shape leads to even more amplification of speech-relevant frequencies in the 2-4 kHz range. While the pinna enhances this range only by a few dB, the boost through ear canal resonances can get into the 20 dB range!
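A quick back-of-the-envelope check shows why the resonance lands right in this range. The ear canal behaves roughly like a tube closed at one end (the tympanic membrane), whose first resonance sits at a quarter of the wavelength; the canal length used here is a textbook average, and individual ears vary:

```python
# Quarter-wave resonance of a tube closed at one end: f = c / (4 * L)
speed_of_sound = 343.0   # m/s in air at ~20 degrees C
canal_length = 0.025     # m, approximate average adult ear canal

resonance_hz = speed_of_sound / (4 * canal_length)
print(round(resonance_hz))  # → 3430, right in the speech-relevant range
```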

The Middle Ear

At the end of the ear canal, the tympanic membrane takes up the air vibration and translates it into mechanical vibration. This vibration must now be transported further to the cochlea, which is essentially a liquid-filled cavity. This is done via a chain of three small bones: malleus, incus and stapes.

But why use this complex and sensitive construction? Why not connect to the inner ear via the ear canal directly?

The problem is that air and liquid behave very differently with respect to sound waves. In engineering, this is described by their acoustic impedances, which differ strongly between the two media. Such an impedance mismatch would cause most of the acoustic energy to be reflected at the boundary instead of absorbed, which would greatly reduce the efficiency of the overall system.

The middle ear bone construction solves this problem by matching the impedances: the relatively large tympanic membrane concentrates the collected force onto the much smaller footplate of the stapes, and the ossicles act as a lever on top of that. The construction is optimized to enhance the energy transport from the tympanic membrane to the inner ear. Think of it as a kind of input transformer.
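A rough estimate with commonly cited anatomical figures shows the size of this “transformer” effect; the exact values vary between sources and individuals:

```python
import math

# Commonly cited anatomical approximations
area_tympanic = 55.0   # mm^2, effective area of the tympanic membrane
area_stapes = 3.2      # mm^2, area of the stapes footplate
lever_ratio = 1.3      # mechanical advantage of the ossicular lever

# Pressure gain = area ratio times lever ratio, expressed in dB
pressure_gain = (area_tympanic / area_stapes) * lever_ratio
gain_db = 20 * math.log10(pressure_gain)
print(f"{gain_db:.0f} dB")  # roughly 27 dB of pressure gain
```

That gain is in the right ballpark to compensate for the energy that would otherwise be lost at the air-to-liquid boundary.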

But apart from that, these little bones – or ossicles – also serve a protective purpose. Attached to the construction of ossicles and tympanic membrane are two small muscles, the tensor tympani and the stapedius, which are able to change the tension of the tympanic membrane and the ossicles in order to decrease their efficiency. As a result, less energy is transmitted to the cochlea, protecting it from damage through excessive sound pressure.

This automatic process is called the acoustic reflex. It typically starts to trigger at sound pressure levels around 10-20 dB below the threshold of discomfort, and it can reduce the level transmitted to the cochlea by up to 15 dB. That’s a pretty heavy compressor built right into your ear! And not the only one, by the way. Without this reflex, we would lose up to 15 dB of usable dynamic range at the loud end.
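In compressor terms, the reflex behaves roughly like a limiter with a fixed maximum gain reduction. Here is a toy model of that behavior; the threshold value is purely illustrative, not physiological data:

```python
def acoustic_reflex(level_db, threshold_db=90.0, max_reduction_db=15.0):
    """Toy limiter model of the acoustic reflex: above the threshold,
    transmission to the cochlea is attenuated, capped at ~15 dB."""
    reduction = min(max(level_db - threshold_db, 0.0), max_reduction_db)
    return level_db - reduction

# Quiet sounds pass unchanged; loud sounds are attenuated, but never
# by more than the maximum gain reduction.
print(acoustic_reflex(70.0))   # → 70.0 (below threshold, untouched)
print(acoustic_reflex(120.0))  # → 105.0 (reduction capped at 15 dB)
```

A real reflex also has attack and release times of tens to hundreds of milliseconds, which is why it offers little protection against very sudden impulses such as gunshots.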

What’s next?

So far we’ve followed the first stages of the auditory system that deal with transporting sound as vibration. Along this signal path, we already discovered a couple of EQs and even a compressor.

Obviously, sound gets treated heavily in and around our heads. Some of these effects are much stronger than the typical treatments used during music production. A 20 dB boost around 3 kHz and a compressor at 15 dB gain reduction surely belong to the class of heavy processing.

However, these effects can already help explain some higher-level phenomena such as the increased sensitivity to speech frequencies around 3 kHz or the effect that music sounds different depending on listening volume (described by the Fletcher-Munson equal-loudness contours).

This series will continue with more interesting sound processing tools built right into our auditory system, such as multiband compression, spectral analysis, correlation meters and more!

Image credit: perpetualplum via Foter.com / CC BY