Learn audio decoding and rendering with Cavern

Why is streaming sound quality actually poor?

Streaming audio usually sounds lackluster compared to the disc versions. Contrary to popular belief, this has nothing to do with the codecs used. While codecs destructively throw away information like high frequencies or the precise amplitude of frequency components, this lossy compression never causes the drastic changes seen in streaming audio tracks. Changes of this scale are unfortunately intentional. The property that's nearly always changed is called dynamic range, and the phenomenon behind it has had a name since it became an issue in the music industry in the early 2000s: the loudness war, which popularized the overuse of dynamic range compression.

What is the loudness war?

Simply put, it is the practice of mixing music and movies louder than ever before. In the music industry, this was first caused by the lack of volume regulation in radios. Louder songs stood out, people liked them better, and nobody wanted to be left behind. Each new song had to be the loudest, as that resulted in a better position on the charts.

Eventually, volume couldn't be increased further on the mixing side, because the limits of the signal were reached. This didn't stop the loudness war, as the industry found a solution. Using dynamic range compressors (especially sidechaining), louder instruments and effects could be gently attenuated without distortion, which freed up room to raise the overall volume again. Increasing the volume alone didn't cause irreversible damage, as listeners can control that at home. Compression, however, used to be unfixable at playback before Cavern XD, as the volume differences were already destroyed in the songs.
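The compress-then-boost trick described above can be sketched in a few lines. This is only an illustration, not how any particular mastering tool or Cavern works: the threshold, the ratio, and the two example levels are made-up values, and a real compressor also has attack and release times that are ignored here.

```python
def compress_db(level_db: float, threshold_db: float = -12.0, ratio: float = 4.0) -> float:
    """Static compressor curve: above the threshold, level grows only 1/ratio as fast."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio


def loudness_war_master(levels_db, threshold_db=-12.0, ratio=4.0):
    """Compress every level, then add makeup gain until the loudest one is back at 0 dBFS."""
    compressed = [compress_db(level, threshold_db, ratio) for level in levels_db]
    makeup_gain_db = 0.0 - max(compressed)
    return [level + makeup_gain_db for level in compressed]


# Hypothetical levels in dBFS: a quiet passage and a full-scale drum hit.
quiet_db, loud_db = -30.0, 0.0
mastered = loudness_war_master([quiet_db, loud_db])

print(loud_db - quiet_db)         # dynamic range before: 30.0
print(mastered[1] - mastered[0])  # dynamic range after: 21.0
```

The peak is still at full scale, but the quiet passage is now 9 dB louder. Turning the volume down at home brings both back down together, so the lost 9 dB of difference never returns.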

How did it reach the cinema?

The cinema is a controlled environment: it used to have a fixed program volume of 85 dB, a maximum of 105 dB, and constant oversight to keep it there. Cinemagoers had already paid for the movie, and if it was mixed too loud, they blamed the director, as the hands of the cinemas were tied. By the early 2010s, the oversight started to fade, and operators got their first chance to decrease the volume when needed.

As the screenings got quieter, even directors not participating in the loudness war had to increase volumes. If they wanted their dialog at the old 85 dB when playback was turned down to 79 dB, they had to mix it 6 dB above where it used to be. This means there's 6 dB less headroom. The problem got so bad that there are movies with practically no headroom, and Hollywood sound designers mix with hearing protection.
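The arithmetic behind the lost headroom is simple enough to write down. The sketch below assumes headroom means the gap between the 105 dB maximum and the 85 dB program level quoted earlier; the function name is made up for this example.

```python
REFERENCE_LEVEL_DB = 85.0  # fixed program volume from the text
MAXIMUM_LEVEL_DB = 105.0   # maximum level from the text


def remaining_headroom_db(playback_level_db: float) -> float:
    """Headroom left when the mix is raised to compensate for a turned-down cinema."""
    # The mix has to be this much hotter for dialog to land at the old level.
    mix_boost_db = REFERENCE_LEVEL_DB - playback_level_db
    # Assumption: headroom is the distance from program level to the maximum.
    original_headroom_db = MAXIMUM_LEVEL_DB - REFERENCE_LEVEL_DB
    return original_headroom_db - mix_boost_db


print(remaining_headroom_db(85.0))  # 20.0, nothing was turned down
print(remaining_headroom_db(79.0))  # 14.0, the 6 dB boost ate 6 dB of headroom
```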

Ending the war, but...

Some movies bypassed this practice by mixing the dialog so low that cinemas had no choice but to increase the volume. Some directors simply didn't bend the knee. Because home codecs had dialog normalization, which turned loud movie tracks down automatically, the practice was not as bad as in music, but it still spread.

For music, around the late 2010s, when streaming became the new norm, the loudness war finally seemed to be addressed. Major music streaming platforms enforced normalization, bringing every song to the same volume and leaving a huge headroom. This meant that dynamic songs using volume as an artistic tool, or emphasising punchy and lifelike drums, were rewarded by being allowed to use said headroom.
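Why normalization rewards dynamic songs becomes clear with a small calculation. The sketch below is a simplification under stated assumptions: the target loudness and both songs' measurements are hypothetical numbers chosen for the example, and real platforms each use their own targets and measurement methods.

```python
TARGET_LOUDNESS_DB = -14.0  # assumed normalization target, not taken from any platform


def peak_after_normalization(loudness_db: float, peak_db: float,
                             target_db: float = TARGET_LOUDNESS_DB) -> float:
    """Apply the single gain that brings a song to the target loudness; return its new peak."""
    gain_db = target_db - loudness_db
    return peak_db + gain_db


# Hypothetical measurements: both songs peak at full scale before normalization.
crushed_peak = peak_after_normalization(loudness_db=-8.0, peak_db=0.0)
dynamic_peak = peak_after_normalization(loudness_db=-14.0, peak_db=0.0)

print(crushed_peak)  # -6.0, the heavily compressed song gets turned down
print(dynamic_peak)  # 0.0, the dynamic song keeps its full peaks
```

Both songs now play at the same average loudness, but the drums of the dynamic one can hit 6 dB harder.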

Moving from the cinema to streaming was not the only drastic change in the industry. Another transition was in progress, from TVs to phones. These small speakers were incapable of producing a dynamic range even remotely close to what's present in movies. This had to be accounted for on either the mixing or the publishing side. Dynamic range compression appeared yet again, stronger than ever. It was not a software option in mobile clients, even though that would have been possible. It was burned into the audio tracks, and delivered even to the highest-end systems.

Case study: John Wick: Chapter 3 - Parabellum

John Wick: Chapter 3 is praised for its exceptionally dynamic sound, and Cavernize objectively rates it as one of the best and most dynamic tracks of all time. While this is true for its streaming sound too, cracks start to form, and they are very visible. Let's take a look at a small slice of waveform from its demo scene, from the Blu-ray version:

This is an insanely dynamic track with over 20 dB of dynamic range, the optimal and true to life representation of gunshots. If this is too much for the listener, night mode implementations could properly account for it, reducing the impact. Launching the streaming version of the same movie paints a different picture:

A reduction of 6 dB: impacts have only about half their original amplitude, and the movie itself is fainter now. While this is still 14 dB of dynamic range, which is a pretty good value, it's nowhere close to the original. The same damage to a movie that was already affected by the loudness war (such as Annihilation, for example) would be severely detrimental. Turning the volume back up just makes things worse:

Everything is double the volume now: speech, cars in the background, footsteps, steam... This was practically irreversible with legacy tools, until Cavern XD.
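The numbers in this case study can be checked with a short sketch. The levels below are hypothetical stand-ins for the waveforms, chosen only to match the 20 dB and 14 dB figures above; they are not measurements of the actual tracks.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude ratio."""
    return 10.0 ** (db / 20.0)


def dynamic_range_db(track: dict) -> float:
    return track["gunshot"] - track["background"]


def apply_gain(track: dict, gain_db: float) -> dict:
    """A volume knob: the same gain is added to every sound in the track."""
    return {name: level_db + gain_db for name, level_db in track.items()}


# Hypothetical levels in dB relative to full scale.
disc = {"background": -20.0, "gunshot": 0.0}
streaming = {"background": -20.0, "gunshot": -6.0}
turned_up = apply_gain(streaming, 6.0)

print(dynamic_range_db(disc))       # 20.0
print(dynamic_range_db(streaming))  # 14.0
print(dynamic_range_db(turned_up))  # 14.0, the volume knob restores nothing
print(round(db_to_amplitude_ratio(6.0), 2))  # 2.0, the background doubled in amplitude
```

After the volume is raised, the gunshot is back where it was on the disc, but the background sits 6 dB above its original level.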