
Auroville, 5
by Srajan Ebaen

Audiophiles familiar with the sound of live music agree: No matter how good our systems become, they never fool us into thinking we’re hearing the real thing. Our internal audio bullshit detector is surprisingly astute, acute, and accurate. Driving down a city boulevard with our car windows open, we distinguish—without the benefit of sight—a real band performing in a bistro from the sounds of a string quartet wafting out of the pad of a music lover enjoying his audio rig. Consensus about this near-instant ability to tell the real from the artificial is easy to come by. Just poll some seasoned audiophiles who attend enough live concerts to know the difference. They have worn hope and wallets thin in pursuit of a convincing facsimile.

Still, explanations for how we so readily distinguish between fake and real—and, by extension, how even the best systems fall short—remain vexingly elusive. It’s like an adolescent asking his parents how to know when one is truly in love. The briefest and most honest answer is always "Trust me, you’ll just know." All other answers point at fragments of the truth—case evidence from one’s own experience—but fail to give the complete sense of satisfaction and accomplishment we’d enjoy if we really nailed the answer. Of course, that has never prevented parents from trying to explain. Nor audiophiles from ruminating. Since this frustrates any chances at real success, while absolving us from blame, we’re free to give some thought to the quixotic enterprise of asking why even the most carefully assembled, expensive systems are bound to fail if we expect reality of them.

(1) Unless microphone placement duplicates the location and precise angle of our ears during the recording process, we’re capturing something other than what our ears heard. Our biological microphones plainly aren’t suspended above an orchestra, shoved into the flare of a saxophone, or tickling a singer’s lips in extreme closeup. Nor do we possess more than two physical recorders to begin with, and those twin precision instruments aren’t spaced farther apart than about seven or eight inches. Rather, the contours of our inner ears, the shape of our skulls, and the angles of our outer ears precisely and uniquely calibrate them. From this we must conclude two things.

(a) The perception perspective of the very first step in the recording process (microphone placement and using more than two mics) alters the relationship of listener distance, reflection angles, and the concomitant effects of separation versus blending a real-life listener experiences. We’ve indelibly changed the original event in ways that can never be reversed.

(b) Even if we used two microphones embedded in a fake head to preserve perception perspective, its shape and size could only precisely model one unique listener.

(2) For argument’s sake, let’s posit that we do construct an artificial head and its outer/inner ear. Let’s further suggest that we place the microphone diaphragm in the precise location of the biological sound membrane. Clearly we would suffer severe comb-filtering effects imposed by the reflective shape and length of the ear canal and the shadowing effect of the head. These alterations would remain uncorrected by the natural ear/brain mechanism of a living human being.

This unconscious biological error-correction mechanism is a learned process and is unique to each person. It’s a program written in response to certain sensory stimuli that arose in that person’s infancy. It’s like learning to walk. Each organism develops its own patterns that soon become embedded (conditioned). It transforms effortful volition—thinking about each step, calculating how to perform it in response to the environment—into elegantly instinctual behavior. Even if two people were to share precisely cloned anatomy—as truly identical twins might—their internal brain computers and software would not match. They would still decode the data coming in via their identical ear pathways through different neurological reactions.
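The comb filtering mentioned in (2) is easy to sketch numerically: summing a direct sound with a single delayed reflection carves regularly spaced notches into the frequency response, and moving the reflection by mere centimeters moves every notch. The sample rate and delay below are illustrative assumptions, not measurements of any real ear canal.

```python
import numpy as np

# Comb filtering, sketched: a direct sound summed with a single
# reflection delayed by D samples. The combined response
# |1 + e^{-jwD}| has evenly spaced notches ("comb teeth") at
# odd multiples of fs / (2 * D).
fs = 48_000              # sample rate in Hz (illustrative)
delay_s = 0.000125       # 0.125 ms delay, roughly a 4 cm path difference
D = round(delay_s * fs)  # delay in samples (6 here)

freqs = np.linspace(0, fs / 2, 2001)   # 0 Hz up to Nyquist
w = 2 * np.pi * freqs / fs             # digital frequency in rad/sample
mag = np.abs(1 + np.exp(-1j * w * D))  # response of y[n] = x[n] + x[n-D]

first_notch = fs / (2 * D)             # predicted first null: 4000 Hz
w0 = 2 * np.pi * first_notch / fs
print(f"first notch at {first_notch:.0f} Hz, "
      f"response there = {abs(1 + np.exp(-1j * w0 * D)):.2e}")
```

Since the notch positions depend entirely on the delay geometry, no single dummy-head shape could reproduce the filtering of every listener's anatomy—which is the point the essay is making.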

But there’s more. Even if the decoding mechanism was identical, how about the effects of consciousness? Our attention continuously filters what we perceive. Unlike a machine that’s not distracted by thought and emotion, and can attend to one "single-minded" purpose—responding to the sound pressure waves of a concert without preference, selectivity, shifts or lapses in attention—a human cannot. Compared to what the machine registers, the listener suffers severe dropouts. His attention is never just on the sound alone. Attention is infiltrated by thoughts, feelings, and other sensory inputs like sight, touch, and smell. Simultaneously, sounds, via cultural conditioning, exposure, and learned skills, are translated into music, whereas machines merely record the sounds, clinically and accurately.

Even if a listener could focus on the sonic event to the exclusion of "musical reconstruction" and all other senses, even if he could perceive all sounds equally, not giving preference or emphasis to certain ones, could his ear/brain mechanism record all sounds without any filtering, shaping, selecting, and mixing? Could a listener record without responding, since said response would introduce an immediate and uniquely subjective element? Clearly not. Human beings constantly respond to and interpret incoming sensory signals through the filter of both their superficial personalities and the deeper cellular mechanisms of body/mind that have been, and continue to be, conditioned by life.

To drive this point home, compare multiple generations of recordings (recording a recording of a recording of a recording) to your experience of listening to the same recording over and over again. Minus certain possible resolution losses, the machines will record the same event over and over in exacting sameness to create identical (or virtually identical) clones. However, our subjective experiences of hearing the same recording in sequence over and over again are anything but reruns. They are astoundingly different from each other. It’s thus plainly impossible to record the original event (as perceived, filtered, and altered by a listener) with a machine that concentrates solely on the aural dimension and doesn’t suffer the instability of human attention. Hence, the entire enterprise of "fidelity to the live event" is nothing but an impossible chase for fool’s gold. There are as many live events as there are listeners, and each of these events is far richer and more multi-dimensional than any microphones could ever capture.

Having successfully ridiculed this chase from a conceptual or "philosophical" perspective, let’s consider a few limitations on the playback end of things.

(3) Have you ever heard a live band in your living room? Do you know how loud a solo violin really is? Or a drum kit? Or a saxophone? If so, would you agree that attempting realistic playback levels with our stereo system almost invariably introduces distortion (dynamic compression, glare, brightness, a "something’s wrong, let’s turn this down") that hurts our ears? It’s as though our systems cannot reproduce realistic loudness without introducing reminders of artifice.

(4) Human hearing seems to be very sensitive to leading edges. Compared to how a live instrument slices into the air with unmitigated immediacy and directness, most systems veil, dull, or soften how they render the arising of sound out of silence. It’s as though they lack the necessary response speed to catapult sounds from zero to maximum loudness in a natural manner, unhampered by mechanical driver, crossover, or feedback loop delays and other technical limitations. Conversely, systems that minimize this leading edge handicap often sound too sharp, bright, and aggressive. Leading-edge fidelity as now attempted seems to introduce a new flaw that perceptibly misses reality.

(5) When a two-channel system reproduces more than two performers, all sounds originate from only two sources. On stage, there clearly are more. Conversely, a single piano plays back over two speakers whose combined vibrating surface areas are far smaller than the piano’s soundboard, and oriented vertically rather than horizontally to boot. Stereo systems conjure phantom images that have to stand in for actual, individual sources of sound. I believe that inherent in this setup is one strong reason why, with our eyes closed, we can distinguish between sounds arising from, say, six individual singers versus a recording of six singers played back over two speakers, or perceive the discontinuity when we hear a single sound source (a solo performer) reproduced by two speakers. Never mind surround sound. In the live venue, active sound sources behind the listeners don’t exist.

(6) Only high-power amp/high-sensitivity speaker systems stand a chance of possessing enough inherent acceleration potential to attempt a dynamic range that accurately tracks the input signal across all frequencies without compression.

(7) Multi-driver speakers introduce phase and time errors.

(8) Electronic systems suffer harmonic distortion that subtly alters the timbres of voices and instruments.

(9) Oversampling introduces pre- and post-ringing that is clearly measurable but absent in nature.
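Point (9) can be demonstrated with the textbook linear-phase lowpass used in oversampling and reconstruction: a windowed sinc. Because its impulse response is time-symmetric, it rings before the main event as well as after—and nothing in acoustic nature precedes its own cause. A minimal sketch, with filter length and cutoff chosen arbitrarily for illustration:

```python
import numpy as np

# A linear-phase windowed-sinc lowpass: the classic digital
# reconstruction/oversampling filter. Its impulse response is
# symmetric about its peak, so ringing appears both before and
# after the main tap ("pre-ringing").
N = 63                       # filter length (odd, for exact linear phase)
fc = 0.25                    # cutoff as a fraction of the sample rate
n = np.arange(N) - (N - 1) / 2
h = 2 * fc * np.sinc(2 * fc * n) * np.hamming(N)  # windowed sinc

center = (N - 1) // 2
pre = h[:center]             # taps before the main peak
post = h[center + 1:]        # taps after it
print(np.allclose(pre, post[::-1]))  # True — symmetric about the peak
print(np.max(np.abs(pre)) > 0.01)    # True — energy precedes the impulse
```

A minimum-phase filter would push all of that ringing after the transient, which is one reason some converter designers offer minimum-phase or "apodizing" reconstruction filters as alternatives.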

Someone more tech-savvy could significantly expand on this list to document all of the very real engineering limitations that face the audio designer when asked to keep up with reality. But as we’ve already seen, despite the most impressive technological advances, the expectation of ever capturing a live event completely is intrinsically at odds with reality. Plainly, the instruments in the service of such a task record aural events very differently from how we humans do. Considering the implications, it’s actually damn surprising how satisfying the "unreal" can be. That’s a function of how listener participation adds a transformative element. It disregards (or filters out) the reminders of artifice while adding (or enhancing) non-distractive elements that subjectively make the experience more real. A strong emotional response can arise even when the likeness rendered by the system is a mere skeleton to which our creative imagination must add meat.

Another way of saying this is that what makes up the complete listening experience contains elements that operate in dimensions (of consciousness, emotion, and attention) that are beyond the ability of instruments to measure or quantify. In the absence of such measurements, we cannot construct mechanical devices that can produce them. Listening to music will always remain a subjective enterprise. This fact justifies endless personal approaches to make it more pleasing or convincing, in accordance with the listener’s biological and psychological makeup. Cheers to the mystery of being human. It’s entirely beyond predictive measurability. The more subjective it becomes, the more real it turns, but only to the one experiencing it. Everyone else is free to disbelieve, disagree, and debate.

The Bible has it that when Pilate asked Jesus what the truth was, the Nazarene remained quiet, as though the truth could not be spoken, only experienced. So, a music lover’s truth arises in the silence of a mind less agitated and distracted. In this inner silence, sounds arise. Magically and without effort, we recognize them as music and are moved. As soon as we speak about this experience, it evaporates. Words and measurements cannot capture it or do it justice. Best to keep quiet. Good advice at the end of a short essay attempting to talk about a topic about which nothing conclusive can be said—except that it can’t. Still, those sensitive to the subject may sympathize with such impossible efforts. We do it whenever we try to spread the gospel about our beloved audio hobby. Perhaps it’s best to let the music speak for itself.

Visit Srajan Ebaen at his site