Pro Audio Files

How Yanny vs. Laurel Reveals Flaws in How We Listen to Audio

If you’re anything like me, you spent most of May 16th, 2018 entrenched in debate with friends, family and coworkers. Is it ‘Yanny’ or ‘Laurel’?

Early in the morning, my wife asked me which I heard when she played the audio on her iPhone. I immediately said ‘Yanny’ and she looked at me, bewildered. I asked her, “How do you possibly hear ‘Laurel’?”

As someone who grew up infatuated with sound, and even studied it in college for seven years (Go Bees!), I was instantly fascinated by how we could each be hearing entirely different words. I spend most working hours of every day listening to and editing audio, and part of my profession involves understanding that our society listens to media on playback devices that range greatly in both quality and cost. So, I decided to take a listen on the monitors in my home studio, which certainly provide a better sonic experience than the speakers on an iPhone.

The word being spoken was clearly ‘laurel’. The version that you most likely heard originated from the vocabulary.com audio clip demonstrating proper pronunciation of the word ‘laurel’ (sorry, team ‘yanny’), and had probably been downloaded, re-uploaded and subsequently sonically mangled by the means with which we regularly share and consume media.

I then read plenty of articles and watched numerous videos explaining exactly why a certain percentage of the population heard one thing, and another percentage heard another. The gist is this:

Human speech is composed of many frequencies, in part because we have a resonant chest cavity, which creates lower frequencies, and a throat and mouth, which create higher frequencies. The word ‘laurel’ contains a combination of both, and both are therefore present in the original recording at vocabulary.com, but the clip that you most likely heard has accentuated higher frequencies due to imperfections introduced by data compression. To make it worse, many people first heard the clip through a speaker built into a cell phone, which is too small to accurately reproduce low frequencies.

To illustrate, I made two filtered versions of the original audio clip, one with the high frequencies removed and one with the low frequencies removed. The clip with preserved low frequencies sounds more like ‘laurel’; the clip with preserved higher frequencies sounds more like ‘yanny’. I suggest listening to these on the highest quality speaker system you have access to, for the reasons I mentioned earlier.

Here you can see the filters I used to remove the low and high frequencies from the audio clip. The orange areas represent the filtered portions of the frequency spectrum:

Filter used to remove everything above 1.5 kHz

Filter used to remove everything below 1.5 kHz
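
For readers who want to try a similar split themselves, here is a minimal sketch of a low-pass and a high-pass filter around 1.5 kHz. It is not the equalizer I used for the clips. It is a plain windowed-sinc FIR filter written with only the Python standard library, and the sample rate, filter length, and the 300 Hz and 3 kHz test tones are all assumptions chosen for illustration.

```python
import math

SAMPLE_RATE = 16000  # Hz; assumed for this sketch
CUTOFF_HZ = 1500     # the 1.5 kHz split point from the filters above
NUM_TAPS = 101       # assumed filter length; odd so the kernel has a center tap


def lowpass_kernel(cutoff_hz, sample_rate, num_taps):
    """Windowed-sinc low-pass kernel (Hamming window), unity gain at 0 Hz."""
    fc = cutoff_hz / sample_rate  # cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    kernel = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        kernel.append(ideal * window)
    total = sum(kernel)
    return [h / total for h in kernel]


def highpass_kernel(cutoff_hz, sample_rate, num_taps):
    """Spectral inversion: a delayed unit impulse minus the low-pass kernel."""
    kernel = [-h for h in lowpass_kernel(cutoff_hz, sample_rate, num_taps)]
    kernel[(num_taps - 1) // 2] += 1.0
    return kernel


def fir_filter(signal, kernel):
    """Direct convolution; output has the same length as the input."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j in range(min(i + 1, len(kernel))):
            acc += kernel[j] * signal[i - j]
        out.append(acc)
    return out


def tone_level(signal, freq_hz, sample_rate):
    """Amplitude of one frequency, measured by correlating with sine and cosine."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / sample_rate)
             for n, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / sample_rate)
             for n, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)


def split_demo():
    """Filter a 300 Hz + 3 kHz test signal both ways and measure each tone."""
    steady = 1600  # samples analysed: a whole number of cycles of both tones
    total = steady + NUM_TAPS - 1
    signal = [math.sin(2 * math.pi * 300 * n / SAMPLE_RATE)
              + math.sin(2 * math.pi * 3000 * n / SAMPLE_RATE)
              for n in range(total)]
    levels = {}
    for name, make_kernel in (("low", lowpass_kernel), ("high", highpass_kernel)):
        kernel = make_kernel(CUTOFF_HZ, SAMPLE_RATE, NUM_TAPS)
        filtered = fir_filter(signal, kernel)[NUM_TAPS - 1:]  # drop start-up transient
        levels[name] = (tone_level(filtered, 300, SAMPLE_RATE),
                        tone_level(filtered, 3000, SAMPLE_RATE))
    return levels


if __name__ == "__main__":
    for name, (low_tone, high_tone) in split_demo().items():
        print(f"{name}-pass: 300 Hz level {low_tone:.3f}, 3 kHz level {high_tone:.3f}")
```

Running it should show the low-pass version keeping the 300 Hz tone and nearly erasing the 3 kHz tone, with the high-pass version doing the opposite. That is the same split that pushes the clip toward ‘laurel’ or ‘yanny’.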

And here is a visual of how those filters changed the resulting audio:

Resulting spectrogram with the high frequencies intact.

Resulting spectrogram with the low frequencies intact.

This helpful interactive tool from The New York Times allows you to use a slider to more clearly hear one or the other. Pitch shifting the audio clip up seems to accentuate ‘laurel’ whereas shifting it down accentuates ‘yanny’.
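
I don’t know how the Times tool is built, so treat the following as a rough sketch of the simplest kind of pitch shift rather than a description of their slider. It reads the samples back faster or slower, like changing tape speed, which moves every frequency up or down by the same ratio and also changes the clip’s length. The function names, the sample rate and the 1 kHz test tone are assumptions for illustration.

```python
import math

SAMPLE_RATE = 16000  # Hz; assumed for this sketch


def make_tone(freq_hz, num_samples, sample_rate=SAMPLE_RATE):
    """Generate a sine tone to stand in for a real recording."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]


def speed_change(signal, ratio):
    """Read the signal `ratio` times faster, using linear interpolation.

    ratio > 1 raises the pitch and shortens the clip;
    ratio < 1 lowers the pitch and lengthens it.
    """
    out = []
    k = 0
    while True:
        pos = k * ratio          # fractional read position in the input
        i = int(pos)
        if i + 1 >= len(signal):
            break
        frac = pos - i
        out.append(signal[i] * (1 - frac) + signal[i + 1] * frac)
        k += 1
    return out


def estimate_frequency(signal, sample_rate=SAMPLE_RATE):
    """Estimate a single tone's frequency by counting upward zero crossings."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if a < 0 <= b)
    return crossings * sample_rate / len(signal)


if __name__ == "__main__":
    tone = make_tone(1000, SAMPLE_RATE)  # one second at 1 kHz
    for ratio in (0.5, 1.0, 1.5):
        shifted = speed_change(tone, ratio)
        print(f"ratio {ratio}: about {estimate_frequency(shifted):.0f} Hz, "
              f"{len(shifted)} samples")
```

A ratio of 2 raises everything by an octave, and a ratio of 2 ** (n / 12) shifts by n semitones. Tools that shift pitch without changing duration use more involved methods than this.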

In summary, this perfect storm of the human voice creating both low and high frequencies, the audio clip having been subjected to the data compression used to create smaller, more convenient files, and our tendency to listen on devices with subpar playback components led to an apparent near-even split of the population hearing ‘laurel’ or ‘yanny’.

So, while the scientific argument may be settled as to why this audio equivalent of ‘the dress’ happened, I’ve yet to see a single opinion stating why this is actually a problem for consumers and, more importantly, artists. While data compression has steadily improved over the past decade, there are still billions of videos and songs with data compression (and therefore, inferior quality) ready to be consumed via the internet. Furthermore, with which type of device do we increasingly experience the internet? Cell phones.

Even if users opt to play music and other media out of desktop or laptop speakers, earbuds or consumer-grade headphones, these systems are inferior to the technology used to produce the music they’re listening to. The sad truth is that we write, perform, record, mix and master a piece of music with an acute attention to detail and precision during each and every step of the process, only to have our target audience ultimately listen to a degraded file, out of an inferior playback system. Additionally, the equipment that artists and engineers use to record and mix audio has become cheaper, smaller, and more accessible, but not necessarily better.

Although audio is my passion, I understand why not everyone can be a purist. High-fidelity playback systems are expensive and not at all portable, and I love the near-instant access to a vast collection of music and media that the Internet provides us. It certainly seems that we as a society have chosen quantity over quality, and I can’t say I’d choose differently myself. My hope, though, is that we take this unique moment to realize that the means by which we consume media can potentially degrade the art we claim to adore, to the extent that its meaning is changed entirely.

Ian Vargo

Ian Vargo is a Producer, Mixer and Audio Professor based in Los Angeles. He has worked on numerous major label and independent records. Get in touch on his website or learn more from him in his new Mastering in the Box course.