It Ain't Just Rocket Science
by Gary Schwede

This article was sourced for PFO by Clark Johnsen - thanks Clark!


A talk presented at the 21st Asilomar Microcomputer Workshop, April 20, 1995

Session Theme: Are We Digitizing The Richness Out of Life? This theme was suggested by a talk given by Neil Young at last year's workshop.

Introductory remarks by session chairman A. J. (Nick) Nichols (transcript approximate)

Gary Schwede has had a variety of experiences in science and engineering, including designing microcomputer systems, networking hardware, bicycle speedometers, and a variety of other digital and analog systems. He has, in fact, been a rocket scientist, building and firing rockets in the deserts of New Mexico. Most of his recent work has been in digital audio.

Speaker's Introduction

Because the raw bandwidth of audio signals, sometimes taken as about 20kHz, is small compared to other things like video, or processor MIPS rates, many people think it must not be very hard to deal with audio, relatively speaking. But audio is logarithmically wideband; that is, it covers a range of many octaves. That makes it much harder to process than the bandwidth alone might suggest. Leonard Laub just gave us an excellent introduction to some of the problems of digitizing continuous inputs in general. I'm only going to talk about compact disc audio, the CD standards, and I'll only talk about playback, not recording or processing. I won't talk about lossy compression, which is a big issue, as Leonard mentioned. I want to tell you about the problems with CDs and some ideas I have about making things better...

Fundamentally, digital computers are symbol manipulators.

Our digital computers use just two basic symbols. That way, we keep the state and operation of digital machines so far from the world's pervasive and haphazard perturbations, like thermal noise, that we can be very sure that the symbols we build up from those bits are stored and manipulated without error.

Furthermore, by building up machines that use elementary, robust, and, therefore, fast binary decisions, we can have very complex processing and very dense storage. So it is natural that we try to capture and transform even the things that really matter in the world to the domain of digital symbols.

That's what we do when we digitize.

But what really matters in the world? What is the "richness" in life?

I will say that the richness of the natural world, in contrast to the world of logical manipulations, manifests itself in ever-unfolding layers of fractal complexity and continuous variations. This is true also of some of the ways in which we communicate with one another, for instance, by making and listening to music.

For most people, music is far more than just sound. The true meaning and value of music are not evident in the artifacts of its creation or recording. A Stradivarius is an elegant and wonderful machine in itself, but Perlman's Strad is not what brings tears to my eyes when I hear him perform the Brahms Concerto in D.

And it certainly wasn't the signal-to-noise ratio at the University of New Mexico Arena (accurately known to the locals as "The Pit") that blew me away in 1973, the first time I went to a rock concert, and I heard Neil Young sing "Out on the Weekend."

I remember that with crystalline clarity. In my mind's ear, I can still hear it today. It was an important event in my life.

*Making and appreciating music must be far more subtle and complex than we realize; otherwise, music would be a science, not an art.*

From time to time, I've been involved in various innovations in digital audio recording, processing, and reproduction. My primary motivation to participate in this field is not cash—it's damned hard to make a buck in audio—it's my conviction that music is one of the genuine riches of life.

So, it troubles me deeply to hear Neil Young say something like "The music died when digital was born."

And it's not only Neil.

When the compact disc was introduced, Philips's marketing campaign promised "Pure, Perfect Sound Forever." But it soon became clear that many musicians and other careful listeners did not like that variety of "perfection." Some complained that digital sound was "harsh" or "unnatural" or "unmusical" or even that it caused muscular weakness, fatigue, and, possibly, mental illness (although I suspect that is due to the program material...)

So, despite its technically excellent specifications and measurements, "Pure, Perfect Sound" sounds bad, at least to some people.

Confronting dissatisfaction with a technology that measures well, as engineers we have two basic choices (and this is a recurrent dichotomy):

  • Ignore, ridicule, marginalize, or, in the extreme, outlaw the criticism [1]. Reaffirm the standard models, and only embroider them; or

  • Listen, without prejudice, to the critics and to the results, and try to understand what might be wrong with our models of either the technology or the desired product. If, in the course of this examination, deficiencies are uncovered, try to remedy them, improving the models and the technology.

In the field of audio engineering, an eternal flame burns with an intensity that rivals anything on the internet. The "Great Debate" pits so-called "objectivists" ("deaf, egghead meter-readers") against so-called "subjectivists" ("charlatans, flat-earthers, and snake-oil salesmen"). This flame has been burning since the beginning of audio recording, and shows no evidence of cooling off; indeed, the advent of digital audio has introduced a vast new fuel source.

The core of the controversy is measurement versus subjective evaluation. On the one hand, measurement is the essence of science. Measurement enables incremental progress in understanding the world and improving our technologies. On the other hand, subjective evaluation is the essence both of art and of value [2]. A product's worth is inescapably a subjective valuation.

The late Richard Heyser observed:

"One of the worst-kept secrets in audio engineering is that what we hear does not always correlate with what we measure."

The human auditory system is fantastically sensitive and still quite mysterious. We have only approximate models even of hearing—much less of appreciation. Music is a multidimensional experience, but the measurements we make are only two-dimensional plots of the variation of one fairly arbitrarily chosen abstract parameter against another.

About five years ago, I was engaged to design some very-high-quality digital audio processing and conversion equipment. One thing that interested me was the mismatch between the discrete-symbol-world of digital storage and processing and the continuous nature of the signals and the product we are trying to produce.

Another was the bizarre and (to some) seemingly irrational world of audiophilia.

Ever since the introduction of the CD, some audiophiles and manufacturers have been obsessed with "improving" CD sound. It has been claimed that discs made with gold metallization, rather than aluminum, are sonically superior. Installing rings of special damping material on CDs, cooling them to cryogenic temperatures, or painting their edges just the right shade of green have all been alleged to improve CD sound. Likewise, exotic materials or construction techniques applied to CD players are advertised to provide "more musical" reproduction—at a price, of course.

If we don't immediately reject these claims as absurd and false, and we admit the possibility that the audio output of any given CD player might not be perfect, then we must make some guess about why these things might help. An obvious, digitally oriented guess is that, somehow, more accurate data are being retrieved as a result of the "improvements."

To test that hypothesis, I went to a nearby Good Guys store and bought an inexpensive Sony CD player. I got a service manual for it, opened it up, and located error flag signals from three of the four stages of digital error-correction. I buffered these error flags and sent them to a counter which gave me totals for the errors at three stages of the ECC. I performed over 100 mostly-pleasant experiments with CDs from my collection. I played each disc all the way through, under various conditions, and recorded the error totals.

The results were boring. Unless a disc is noticeably damaged or dirty, or the player is deliberately jolted during play, incorrect data are essentially never emitted from the digital stage of even this inexpensive CD player.

The designers of the CD storage standard, the optical and servo mechanisms, the error correction, and the manufacturing technologies behind this consumer product have certainly accomplished their goals. They give us the right data, essentially every time. So the disc treatments cannot improve the data retrieval, and maybe the objectivists are right: "bits is bits!" and all CD players sound the same.

On the other hand, we don't listen to bits. We listen to continuous sound signals.

So what else might stabilizer rings, or green paint, or granite bases do? For that matter, what is it about a $4000 CD player that makes it so valuable?

The deeper I looked, the more problems became apparent. Digital audio isn't perfect. It's not even close.

(At this point in the presentation I have to switch to "egghead mode." It would be impossible to thoroughly explain the technical details in ten more minutes, so I'm just going to try to indicate some of the issues.)

The crux of the matter is that getting back from the noisy, but robust and predictable domain of bits to the quiet, fragile domain of continuous signals is prone to errors and distortions, quite different from those which arise in either domain by itself.

The problems are made much worse by some of the CD technology standards.

It's easy to notice some obvious shortcuts and deficiencies in my inexpensive example CD player. The disc transport servos demand huge currents, causing volt-sized transients on the DC power supply. Even though the analog section has its own regulator, it has only about 60dB PSRR (power-supply rejection ratio). So at about the -100dB level, where properly converted and dithered 16-bit signals still matter, we will certainly get feed-through, ground noise, and magnetic coupling effects. In fact, power-supply noise is a very common problem in digital audio systems. Digital-to-Analog (and A-to-D) converters rely on stable voltage references, and so they are notoriously sensitive to power supply noise.
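To put rough numbers on the power-supply point, here is a minimal sketch. The 60dB PSRR and the -100dB level come from the talk; the 1 V transient (one reading of "volt-sized") and the 2 V full-scale output are assumptions chosen only for illustration, and the function names are hypothetical.

```python
import math

def residual_after_psrr(transient_v: float, psrr_db: float) -> float:
    """Voltage left on the regulated rail after the regulator's rejection."""
    return transient_v / 10 ** (psrr_db / 20)

def level_re_full_scale_db(v: float, full_scale_v: float) -> float:
    """Level of a voltage relative to full scale, in dB (amplitude ratio)."""
    return 20 * math.log10(v / full_scale_v)

TRANSIENT_V = 1.0     # assumption: a "volt-sized" supply transient
PSRR_DB = 60.0        # from the talk
FULL_SCALE_V = 2.0    # assumption: full-scale analog output level
TARGET_DB = -100.0    # from the talk: level where 16-bit detail still matters

residual = residual_after_psrr(TRANSIENT_V, PSRR_DB)
level = level_re_full_scale_db(residual, FULL_SCALE_V)
extra_rejection = level - TARGET_DB  # further rejection needed downstream

print(f"rail residue: {residual * 1e3:.2f} mV")
print(f"residue re full scale: {level:.1f} dB")
print(f"extra rejection needed to reach {TARGET_DB:.0f} dB: {extra_rejection:.1f} dB")
```

Under these assumptions the regulator alone leaves the disturbance far above the -100dB level. How much of that rail residue actually reaches the audio output depends on the analog stage's own rejection, which the talk does not give.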

Louis Fielder of Dolby Labs [3] has developed specific perceptual models for "stringent applications" in audio. These suggest that truly high quality audio conversion requires dynamic range of about 122dB, and that distortion tones 129dB below the maximum input signal are perceptually significant. That means 16-bit converters are woefully inadequate. -129dB is an amplitude ratio of one part in 2.8 million. Measuring distortions that small in the enormously dynamic environment of a music signal is beyond the state of the art, even if we knew exactly what distortions to look for.
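The decibel arithmetic here can be checked with a short sketch. It uses the textbook rule of thumb for an ideal N-bit quantizer driven by a full-scale sine (about 6.02N + 1.76 dB); that rule and the function names are not from the talk. The "2.8 million" figure corresponds to 129dB taken as an amplitude ratio.

```python
import math

def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in dB to an amplitude (voltage) ratio."""
    return 10 ** (db / 20)

def ideal_dynamic_range_db(bits: int) -> float:
    """Rule-of-thumb SNR of an ideal N-bit quantizer, full-scale sine input."""
    return 6.02 * bits + 1.76

def bits_for_dynamic_range(db: float) -> int:
    """Smallest word length whose ideal dynamic range reaches `db`."""
    return math.ceil((db - 1.76) / 6.02)

print(f"129 dB as an amplitude ratio: 1 part in {db_to_amplitude_ratio(129):,.0f}")
print(f"ideal 16-bit dynamic range: {ideal_dynamic_range_db(16):.1f} dB")
print(f"bits needed for 122 dB: {bits_for_dynamic_range(122)}")
```

An ideal 16-bit converter comes out near 98dB, so reaching 122dB takes about 20 bits by this rule, before any real-world converter imperfections are counted.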

In recovering a continuous signal from the stored digital information, we get literally dozens of new sources of noise and distortion, either worse than or totally absent from analog recording technology. [4]

Reconstructing an analog audio signal is especially difficult because of the log-wideband nature of audio combined with the very fast current steps produced by the DAC. This can be attacked by minimalist and wideband design of the current-to-voltage converter (IVC) which follows the DAC. But the low negative feedback that goes with such a design tends to make the IVC more sensitive to fed-through and radiated digital noise from the optical system, demodulator, ECC, and microcontroller.

The 44.1kHz sampling rate of the CD standard leaves a ridiculously narrow antialias guard-band. It is impossible to avoid huge phase errors while still rejecting the folded alias of the audio signal. That is what led to the development of oversampling digital filters preceding the DAC. Oversampling makes rejecting the alias energy easy, but introduces round-off noise and another source of conducted and radiated digital noise just where we don't want it: near the DAC.
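How narrow the guard band is can be seen with a few lines of arithmetic. This is a sketch only: the 44.1kHz rate and the roughly 20kHz audio band come from the talk, while the 8x oversampling factor and the function name are assumptions chosen for illustration.

```python
import math

FS = 44_100.0         # CD sampling rate, Hz
AUDIO_TOP = 20_000.0  # nominal top of the audio band, Hz

def transition_band_octaves(fs: float, oversample: int = 1,
                            audio_top: float = AUDIO_TOP) -> float:
    """Octaves between the top of the audio band and the lowest alias
    (image) of that band, which the analog filter must reject."""
    first_image = oversample * fs - audio_top
    return math.log2(first_image / audio_top)

guard_hz = FS / 2 - AUDIO_TOP  # room between 20 kHz and the Nyquist frequency
print(f"guard band below Nyquist: {guard_hz:.0f} Hz")
print(f"analog transition band, no oversampling: {transition_band_octaves(FS):.2f} octaves")
print(f"analog transition band, 8x oversampling: {transition_band_octaves(FS, 8):.2f} octaves")
```

Without oversampling the analog filter has about a quarter of an octave in which to go from passing the signal to rejecting it, which is what forces steep filters and their phase errors. With the assumed 8x oversampling the analog filter gets roughly four octaves, and the steep filtering moves into the digital filter.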

(While we're on the subject, the various "one-bit" converter technologies are extreme examples of oversampling. Analog stages following sigma-delta converters have proportionally faster transient edges to handle, and the digital noises from their switched-capacitor filters are horrendous. It is not uncommon to find 100 mV of 58MHz noise pollution everywhere in the same box with a sigma-delta audio DAC.)

To isolate digital noise from delicate analog stages, most high-end systems put the disc drive in a separate box from the digital-to-analog converter. That really helps, but it introduces another mystery, and another problem.

The digital bitstream must be sent from the drive to the converter. There is a standard interface to do that, using a single self-clocking bi-phase serial channel. The interconnect can be an optical fiber, a single-ended coaxial cable, or a balanced wire pair. The mystery is this: some people claim to hear big differences in the sound depending on the construction of the digital interconnect! Are they nuts, or is something else going on?

That's as good an intro as any to the topic of clock jitter. Jitter has become the all-purpose digital audio nemesis. After the word got around that data errors were rare enough to ignore, the effect of every tweak and topology was somehow related to reducing clock jitter. But what is jitter, anyway?

The single serial bitstream actually contains two kinds of signals: the explicit digital sample data, and a buried, continuous signal, the steady clock that will eventually determine the timing of conversion of each sample back to analog.

(Remember the difference between robust digital symbols, immune to small perturbations, and delicate continuous signals, that can be corrupted by tiny errors.)
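To make the "buried clock" concrete, here is a minimal sketch of a self-clocking bi-phase code. The talk does not name the exact code; the sketch assumes biphase-mark coding, which is what the AES3 and S/PDIF interfaces use, and it leaves out their preambles and framing. The function names are hypothetical.

```python
def biphase_mark_encode(bits, start_level=0):
    """Encode data bits as two half-cell line levels per bit.

    The line toggles at the start of every bit cell (this carries the
    clock) and toggles again in mid-cell only when the bit is a 1.
    """
    level = start_level
    half_cells = []
    for bit in bits:
        level ^= 1              # boundary transition: present for every bit
        half_cells.append(level)
        if bit:
            level ^= 1          # extra mid-cell transition marks a 1
        half_cells.append(level)
    return half_cells

def biphase_mark_decode(half_cells):
    """Recover data bits: a mid-cell level change means 1, no change means 0."""
    return [int(half_cells[i] != half_cells[i + 1])
            for i in range(0, len(half_cells), 2)]

data = [1, 0, 1, 1, 0]
line = biphase_mark_encode(data)
print(line)
print(biphase_mark_decode(line) == data)
```

The decoded symbols depend only on whether a transition occurred, so they survive small timing errors. The clock, by contrast, has to be recovered from the exact timing of those same edges, and the spacing between edges changes with the data being sent.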

While the data are correct, any tiny error in the regularity of the conversion clock (that is, the jitter of that clock) causes a distortion of the converted signal. Jitter intermodulates in a particularly nasty way, producing FM sidebands around every frequency component in the audio signal. To avoid the problems we know about, jitter should be kept to 50 ps or less.
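The size of the effect can be estimated with two standard small-angle approximations, sketched below. Neither formula is from the talk: the first is the usual jitter-limited signal-to-noise ratio for a full-scale sine with random clock jitter, and the second is the level of each FM sideband produced by sinusoidal jitter. The 20kHz test tone and the function names are assumptions for illustration.

```python
import math

def jitter_limited_snr_db(signal_hz: float, jitter_rms_s: float) -> float:
    """SNR limit for a full-scale sine sampled with random clock jitter:
    SNR = -20 * log10(2 * pi * f * sigma)."""
    return -20 * math.log10(2 * math.pi * signal_hz * jitter_rms_s)

def sideband_level_dbc(signal_hz: float, jitter_peak_s: float) -> float:
    """Level of each sideband, relative to the tone, for sinusoidal jitter.
    Peak phase deviation is beta = 2 * pi * f * J; for small beta each
    sideband has relative amplitude beta / 2."""
    beta = 2 * math.pi * signal_hz * jitter_peak_s
    return 20 * math.log10(beta / 2)

TONE_HZ = 20_000.0  # assumption: worst case at the top of the audio band

for jitter_s in (50e-12, 2e-9):
    snr = jitter_limited_snr_db(TONE_HZ, jitter_s)
    print(f"{jitter_s * 1e12:7.0f} ps rms -> SNR limit {snr:.1f} dB")

print(f"50 ps peak sinusoidal jitter -> sidebands at "
      f"{sideband_level_dbc(TONE_HZ, 50e-12):.1f} dBc")
```

By this estimate, 50 ps keeps the jitter products of a 20kHz tone a little more than 100dB down, while a couple of nanoseconds raises them to roughly 72dB below the tone, well above the resolution of a 16-bit system.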

Because of the standard, the clock must originate in the disc drive unit. Born in squalor, it gets corrupted by all the nasty digital noises we're trying so hard to avoid. Power-supply noise makes it bad to begin with. As it propagates through divider chains and LSI chips shared with other functions, it accumulates still more jitter due to all the other activities in the vicinity and to logic threshold shifts in each gate it traverses. When the serial data are encoded and transmitted, the digital data themselves cause further threshold shifts in the clocking logic. Because standard optocouplers and RS-422 transceivers are not much faster than the bit rate, noticeably more jitter accumulates in the data link. And so on.

When the critical conversion clock finally makes it through the digital oversampling filter chip and is applied to the DAC, it has been corrupted by a long chain of digital logic. It's not too uncommon to see nanoseconds of jitter.

Sometimes, special circuits in the DAC box are used to try to clean up the damage to the clock signal. Many variations of this "reclocking" scheme exist, all to try to remedy something which could have been avoided in the first place.

If the system had been designed top-down, we'd have put the clock in the nice quiet analog section near the DAC, and slaved the digital drive unit to a buffered version going from the DAC box to the drive. That would avoid introducing jitter at the beginning.

Furthermore, if we had a higher-bandwidth interconnect for the data from the drive, we could put the digital oversampling filter where it belongs, in the digital part of the system, and not with the sensitive analog part.

In conclusion, I'm sure Neil is onto something.

We do not know really how to measure the quality of the final product. Ears and brains (and maybe souls) are far more sensitive than our instruments, and they are sensitive to dimensions other than those we measure.

In addition, there are many problems with digital audio standards and technology, including at least the following:

  • 16-bit samples are inadequate.

  • The sampling rate is too low: it forces us to put a digital filter in intimate congress with the converter circuitry.

  • The self-clocked serial interface is a terrible topology. The drive should slave to a clock from the converter. Only symbols should flow from the drive to the converter.

So music is a lot harder to digitize and to recover than we thought.

And yet, if we are open, and we listen, and we think, we can improve the real product, music, which is truly one of the riches of life.

Notes and References

[1] Lest you think I'm joking: David Clark [PFO Editor's note: not me, Dave Clark of PFO, but the other David Clark] and others have worked with politicians in the state of New York to try to outlaw the sale of audio equipment advertised (or even reviewed) in terms which are not measurable by double-blind testing. This is to be done in the name of "consumer protection."

[2] The recognition of subjective marginal utility as the essence of value is what separates a self-organizing, specialized, market-oriented economy from a centrally-controlled, dysfunctional one.

[3] Fielder, Louis D. (Dolby Laboratories, Inc.) "Human Auditory Capabilities and Their Consequences in Digital-Audio Converter Design" AES 7th International Conference, Toronto, preprint 4.A.

[4] Schwede, Gary W. "Mechanisms of Unmusicality in Compact-Disc Sound: Measurements, Observations, and Conjectures" Preprint 3145 (E-7) Presented at the AES 91st Convention, 1991 October, New York.

Q & A (transcript is approximate):

Q: Surely you don't really think the green paint could possibly affect the data?

A: No, the data are correct, but there is at least a possible mechanism for influencing the sound. If the paint absorbs some of the laser light bouncing around inside the CD, that might make the tracking and reading mechanisms not have to work so hard. That, in turn, could reduce power-supply noise, and jitter. How you got the data might matter. But I don't know how to measure it, just offhand.

Q: If CDs are so bad, what do you listen to at home?

A: Well, I kept my LPs, and I listen to them fairly often. But CDs are very convenient, and I've built some of my own gear. Some CDs sound pretty darn good, to me, my ears, on my system. I enjoy both.

Q: [from a human-interface researcher] Why can't Neil Young pass a double-blind test, if his ears are so good?

A1 [Gary]: First of all, what I'm interested in is "What should we be doing as engineers?" not "Prove us wrong, then we'll listen..." And why should Neil Young be bothered to take such a test?...

A2 [from an audience member]: It could be that the situation, the laboratory setup, might interfere with his ability to hear what he's trying to hear...

A3 [Gary]: Anyway, no test can prove a negative—that he can't ever hear the difference. My point is that we should listen to him, not just challenge his insights.