POSITIVE FEEDBACK ONLINE
- ISSUE 43
Jitter in Digital Audio Data Streams
Jitter is an often-debated subject on the web audiophile forums. Like cables, there tend to be believers and non-believers. The goal of this treatise is to educate and share the current state of my jitter understanding.
Most audiophiles do not even realize that they have jitter until it is reduced. I liken it to looking through a window made of really old glass, when glass had ripples and bubbles in it. There is a spreading and distortion that widens and defocuses some images and creates an overall mild distortion. It is still obvious what is on the other side of the window, but it is not coming through with crystal clarity. Reducing (you will notice that I do not say "removing") jitter is like replacing the glass with a clean, flat piece of glazing. Things are now visible in great detail and with a "vividness" that was not there with the rippling glass. Jitter can be blamed for much of the "fatigue" that results from listening to some digital playback systems, just like it is fatiguing peering through rippled glass for any length of time.
Jitter has been with us since the inception of the CD format by Sony and Philips in 1982. It is a pervasive problem with all digital audio. It has prevented digital audio, both CDs and computer-driven audio, from competing with good vinyl and tape for decades. It is only recently that manufacturers have become aware of the problem and developed improved chips and systems to deal with jitter.
What is Jitter?
Technically, playback jitter is the inaccuracy in the timing of the "ticks" of the clock that transfers the samples of digital data into the D/A converter chip. To move data in a digital system from one point to another, it is usually clocked. In order for the D/A conversion to work, a new data word must be presented to the D/A converter periodically, that is, at a fixed frequency. The system clock or clocks do this. In an ideal world, each transfer of data to the D/A on a clock tick would occur at a precise point in time. If some data transfers occur a tiny bit early and others occur a tiny bit late, this is jitter. It is rather like a timepiece that ticks away the seconds, where the duration of each second is not exactly a second, but over a large number of seconds the timepiece still tells accurate time. Some second intervals are slightly short and others are slightly long, but the average of all of them still gives accurate time. Likewise, when digital audio is played back, the clock "ticks" that present each new data word to the D/A converter can have shorter and longer intervals from one tick to the next. The effect of this jitter on the D/A conversion and the analog waveform is frequency modulation, the same type of modulation that is used for FM radio. The difference is that with FM radio, the carrier frequency is fixed and the music signal is what modulates the frequency of the carrier wave. With digital audio, the carrier is the music and the jitter is the modulation. This makes it a much more complex signal than even FM radio. The jitter associated with digital streaming audio is usually a mix of correlated and uncorrelated jitter: correlated jitter is somehow related to the music data or waveform, while uncorrelated jitter is usually random. Jitter has both an amplitude and a frequency component. We will later discuss which of these I believe is more important.
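To put rough numbers on the description above, here is a minimal sketch that simulates a sine wave whose samples are converted at slightly wrong instants. It assumes Gaussian, uncorrelated jitter and a full-scale sine; the function names and default values are illustrative only and do not come from any measurement in this article.

```python
import math
import random

def jitter_error_rms(signal_hz, sample_rate_hz, jitter_rms_s,
                     n_samples=20000, seed=1):
    """RMS error of a unit-amplitude sine converted at jittered instants.

    Each sample is correct for its ideal instant k/fs but is used at
    k/fs + dt, where dt is Gaussian with the given RMS (an assumption).
    The error is the waveform at the jittered instant minus the
    waveform at the ideal instant.
    """
    rng = random.Random(seed)
    w = 2.0 * math.pi * signal_hz
    sq_sum = 0.0
    for k in range(n_samples):
        t_ideal = k / sample_rate_hz
        dt = rng.gauss(0.0, jitter_rms_s)
        err = math.sin(w * (t_ideal + dt)) - math.sin(w * t_ideal)
        sq_sum += err * err
    return math.sqrt(sq_sum / n_samples)

def predicted_error_rms(signal_hz, jitter_rms_s):
    """Small-jitter approximation: error ~ slope * dt, RMS = 2*pi*f*sigma/sqrt(2)."""
    return 2.0 * math.pi * signal_hz * jitter_rms_s / math.sqrt(2.0)

if __name__ == "__main__":
    for f in (1000.0, 10000.0):
        sim = jitter_error_rms(f, 44100.0, 1e-9)
        est = predicted_error_rms(f, 1e-9)
        print(f"{f:7.0f} Hz: simulated {sim:.3e}, predicted {est:.3e}")
```

The point of the sketch is that the error scales with both the jitter and the signal frequency, so the same clock does more damage to high-frequency content than to low-frequency content.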
Difference between Audio Streaming and other data transfers
I am often asked how digital streaming audio differs from data transmissions to disks or printers. It is, after all, only data being transferred from one point to another. Actually, there is more to it than this. Digital audio streaming is a "real-time" process, meaning that the actual timing of the transfer of each data bit from the source to the D/A converter is important and must be as precise as possible. Data transfers to a disk or a printer are not real-time because there is no urgency for the data to arrive at the printer or disk to prevent errors from happening. The data arrives whenever it does and then the device does its job with the data, either writing it to the disk or storing it in a print buffer. If the data does not arrive in time to be written on a particular sector of a disk, the hardware just waits for the disk to rotate again. Streaming audio data, on the other hand, must arrive at precise time intervals in order for the D/A device to create an accurate representation of the original recording. If it does not "keep up" the pace, then dropouts will occur and the D/A converter will fall out of sync. The clock that moves the data into the D/A cannot be missing any "ticks" and each tick must be precisely placed in time. The audio data transfer must include both 1) accurate data and 2) accurate timing, whereas non-real-time transfers only require accurate data.
The recording of digital data is essentially a periodic sampling of a voltage (the voltage being the music waveform created from an instrument or microphone), where the period in theory is very precise. These captured sample voltages are then converted to digital data words and stored in memory or on recording media. There is no timing information actually stored in the data samples, but the timing is implicit in the samples themselves. There is however control information, which specifies the sample-frequency that should be used for playback, and other info such as pre-emphasis and word-length. If the timing that captured these data samples included jitter, then this is a characteristic of the samples and cannot be realistically eliminated during playback. This recording jitter is always there in the music file. Playback jitter is another matter however.
Playback Jitter Contributions
Playback jitter originates from a large number of contributors, which are usually additive. These range from the master clock, which has its own jitter, to logic devices, to mechanical systems for spinning a CD. One digital cable can even add more jitter than another. Each contributor adds more jitter to the signal as it makes its way to the D/A converter. The summation of this jitter is the system jitter.
Here is a lengthy, but probably not complete list of jitter contributors, including how each of these can or might add jitter to a digital audio system:
Jitter and USB
There is much misinformation on the forums about USB for audio streaming. USB is a fairly jittery interface on its own. Some of the integrated circuit devices that were created to provide easy plug-and-play USB audio interfaces don't do enough to reduce USB jitter IMO. Many manufacturers adopted these plug-and-play devices to quickly and cheaply add USB to their DAC products. Unfortunately, the less-than-stellar reviews that ensued had some of these manufacturers regretting these decisions. This gave USB a bad name in many circles.
Fortunately, there are other low-jitter USB interfaces available now that not only support 24/96, they even compete with the best CD playback devices. In 2009, I believe we will see USB support for 24/192 and even lower-jitter interfaces. USB is IMO the wired audio interface that will be most prevalent in the near future.
Jitter and Networked audio
Networked audio (Ethernet), both wired and WiFi, is a unique case. Because the data is transmitted in packets with flow-control, re-try for errors and buffering at the end-point device, it is not as much of a real-time transfer as USB, S/PDIF or Firewire. The computer transmitting the data packets must still "keep up" the pace to prevent dropouts from occurring, but the real-time nature of the transfer is looser. Unlike with other protocols, there can be dead-times when no data is being transferred. Networking also avoids the use of the audio stack of the computer audio system since it treats all data essentially the same. This avoids kmixer on XP systems and the audio stacks on Mac and PC Vista. Because of the packet-transfer protocol of Ethernet and data buffering at the end-point, the jitter of the clock in the computer is a non-issue. The only clock that is important is the one in the end-point device. Examples of end-point devices are: Squeezebox, Duet and Sonos. This would seem to be the ideal situation, which it certainly is. The only problem that can occur is overloading the network with traffic or WiFi interference, which may cause occasional dropouts. The problem for audiophiles is that the majority of these end-point devices were designed with high-volume manufacturing and low cost as requirements, with performance taking a lower priority. As a result, the jitter from these devices is higher than it could be. It should be the lowest of all the audio source devices available.
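The decoupling that buffering provides can be sketched with a toy model. Everything here is a hypothetical illustration: packets arrive with a random network delay, the end-point pre-fills a buffer, and samples leave on the end-point's own clock. Time is measured in sample periods of that local clock, and the local clock is treated as ideal, which a real end-point's clock is not.

```python
import random

def simulate_endpoint(n_packets=200, packet_samples=64, network_jitter=100.0,
                      prefill_packets=4, seed=7):
    """Toy model of a buffered network end-point (all parameters assumed)."""
    rng = random.Random(seed)

    # Packet i is sent at i * packet_samples and delayed by a random amount.
    # In-order delivery is assumed, so arrival times never go backwards.
    arrivals = []
    latest = 0.0
    for i in range(n_packets):
        raw = i * packet_samples + rng.uniform(0.0, network_jitter)
        latest = max(latest, raw)
        arrivals.append(latest)

    # Playback starts once the pre-fill packets are in the buffer.
    start = arrivals[prefill_packets - 1]

    # Samples leave on the local clock: one per unit of time.
    output_times = [start + j for j in range(n_packets * packet_samples)]
    underruns = sum(1 for j, t in enumerate(output_times)
                    if arrivals[j // packet_samples] > t)

    def spread(times):
        gaps = [b - a for a, b in zip(times, times[1:])]
        return max(gaps) - min(gaps)

    return {
        "arrival_gap_spread": spread(arrivals),
        "output_gap_spread": spread(output_times),
        "underruns": underruns,
    }

if __name__ == "__main__":
    print(simulate_endpoint())
```

In this model the packet arrival times are very irregular, yet the output intervals depend only on the local clock. Network timing shows up only as underruns (dropouts) when the buffer is too shallow for the delay variation.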
Jitter and Re-Clockers
There are a number of re-clockers available, some older and some newer technologies. There are three main types of re-clockers:
1) A true re-clocker that uses a free-running oscillator and stores data in a buffer
2) Re-clocker that uses a series of PLLs (Phase-Locked-Loops) to reduce jitter
3) Re-clocker that uses ASRC (Asynchronous Sample-Rate Conversion) to reduce jitter (also a PLL)
You will notice that I always use the terms: "reduce jitter" or "extremely low jitter", even with my own products. This is because it is impossible to completely eliminate jitter, contrary to the claims of some manufacturers.
The true re-clocker (1) can deliver the lowest jitter of the three types because it is not influenced by any outside signals.
A series of PLLs (2) will reduce jitter, but PLLs are affected by the jitter in the input signal to some extent. The more PLLs that are cascaded and the lower the corner frequency of each PLL loop filter, the better the jitter reduction will be. Some high-end DACs use this technique.
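The behaviour of PLL-based jitter reduction can be sketched by treating each PLL as a simple first-order low-pass filter acting on the incoming timing error. This is an idealization I am assuming for illustration: a real PLL also adds noise from its own oscillator, which this model ignores, and the `alpha` parameter is a hypothetical stand-in for the loop bandwidth.

```python
import math
import random

def pll_filter(jitter_in, alpha):
    """Idealized PLL: output timing error follows the input slowly.

    alpha (0..1) stands in for the loop bandwidth; smaller alpha means
    a lower corner frequency and heavier filtering.
    """
    out = []
    y = 0.0
    for x in jitter_in:
        y += alpha * (x - y)   # one-pole low-pass update
        out.append(y)
    return out

def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

if __name__ == "__main__":
    rng = random.Random(3)
    # Incoming random jitter, 1000 ps RMS (an arbitrary example value).
    incoming = [rng.gauss(0.0, 1000.0) for _ in range(20000)]
    one_pll = pll_filter(incoming, 0.05)
    two_plls = pll_filter(one_pll, 0.05)
    print(f"input      {rms(incoming):8.1f} ps RMS")
    print(f"one PLL    {rms(one_pll):8.1f} ps RMS")
    print(f"two PLLs   {rms(two_plls):8.1f} ps RMS")
```

In this model the output jitter is reduced but never reaches zero, and it still depends on what came in. Jitter that varies slowly, below the loop's corner frequency, passes through almost untouched.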
The ASRC up-sampler of (3) is somewhat sensitive to incoming jitter and has the disadvantage of changing the data by up-sampling it. If you don't like the sound of that particular hardware up-sampler, there is nothing you can do about it. Examples of this are in many modern DACs.
Re-clockers of the same type are not all equal either. The jitter of the master-clock in the re-clocker can vary. The design of the circuits, the power sub-system and the circuit-board layout all have a huge impact on the performance of a re-clocker. In order to achieve extremely low jitter levels, all of these disciplines must be mastered and the implementation must be flawless.
Low-jitter clock technology has improved dramatically in the last 2 years, so newer re-clockers will usually take advantage of this.
Jitter Correlation to Audibility
The correlation of jitter measurements to audibility is in its infancy IME. The problems start with the characterization of jitter. Generally, manufacturers of crystal oscillators specify jitter in terms of RMS jitter amplitude. The problem is that they often neglect to state that this is specified at 10kHz and higher. There is also no spectral or frequency content information specified. This makes it very difficult to tell which oscillators will have audible jitter or objectionable jitter.
For instance, Empirical Audio uses two oscillators that are both specified at 2psec RMS jitter. The two oscillators sound radically different to me when used in a re-clocker in a resolving audio system. This leads me to believe that the spectrum, or frequency content, of the jitter is as important as, or maybe even more important than, the amplitude. I also believe that correlated jitter, meaning jitter with a relationship to the data pattern or audio signal, is more audible than random jitter. This seems to be the consensus in a number of AES papers.
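Why a single RMS figure says so little can be shown with synthetic data. The sketch below builds two made-up jitter sequences, one random and one sinusoidal, scales both to exactly 2 psec RMS, and then looks for the largest single spectral line in each. These are not measurements of any real oscillator; the sequences and function names are assumptions for illustration.

```python
import cmath
import math
import random

def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

def scale_to_rms(xs, target):
    r = rms(xs)
    return [x * target / r for x in xs]

def largest_spectral_line(xs):
    """Largest single DFT bin (DC excluded), scaled so that a sine of
    amplitude A sitting exactly on a bin reports A."""
    n = len(xs)
    best = 0.0
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(xs))
        best = max(best, 2.0 * abs(acc) / n)
    return best

if __name__ == "__main__":
    n = 512
    rng = random.Random(11)
    white = scale_to_rms([rng.gauss(0.0, 1.0) for _ in range(n)], 2e-12)
    tone = scale_to_rms([math.sin(2 * math.pi * 8 * i / n) for i in range(n)],
                        2e-12)
    for name, seq in (("random", white), ("sinusoidal", tone)):
        print(f"{name:10s} RMS {rms(seq):.2e} s, "
              f"largest line {largest_spectral_line(seq):.2e} s")
```

Both sequences would carry the same "2psec RMS" figure on a datasheet, but the sinusoidal one concentrates all of its energy at one frequency, while the random one spreads it thinly across the band.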
Studies by the AES (analysis, not human testing) conclude that these are the thresholds of audibility:
 120psec P-P jitter audibility threshold for 16-bit DAC and 8psec P-P jitter audibility threshold for 20-bit DAC
 20psec P-P of data-correlated jitter audibility threshold at certain frequencies and "A simple model of jitter error audibility has shown that white jitter noise of up to 180psec P-P can be tolerated in a DAC, but that even lower levels of sinusoidal jitter may be audible"
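The 16-bit and 20-bit figures above are close to what a common back-of-envelope criterion gives: the timing error at which a full-scale 20kHz sine wave moves by half of one least-significant bit at its steepest point. Whether the AES studies used exactly this criterion is my assumption; the sketch only shows that the arithmetic lands near the quoted numbers.

```python
import math

def half_lsb_jitter_limit(bits, signal_hz=20_000.0):
    """Timing error (seconds) that shifts a full-scale sine by 1/2 LSB.

    A full-scale sine has amplitude 2**(bits - 1) LSB and a maximum
    slope of 2*pi*f*amplitude. Setting slope * dt = 1/2 LSB gives
    dt = 1 / (2**(bits + 1) * pi * f).
    """
    return 1.0 / (2 ** (bits + 1) * math.pi * signal_hz)

if __name__ == "__main__":
    for bits in (16, 20):
        limit_ps = half_lsb_jitter_limit(bits) * 1e12
        print(f"{bits}-bit: {limit_ps:.1f} psec")
```

Each extra bit of resolution halves the tolerable timing error, which is why the 20-bit figure is sixteen times smaller than the 16-bit one.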
Since many measurements (that don't specify any particular frequency content) performed by Stereophile in  are above 150psec or close to this, I do not believe that we have reached the limits of jitter audibility yet. I suspect that P-P jitter needs to be almost an order of magnitude smaller, or around 15psec to be inaudible in all systems.
I believe the ability of the human ear/brain, particularly the trained ear, to hear minute differences, particularly data-correlated jitter, is grossly underestimated. The live listening ABX studies published to date (that I have read) are inconclusive IMO. The systems used were not resolving enough IMO, the recording quality was not good enough, and the test signals were random rather than correlated and therefore inadequate to properly test for jitter audibility. I tend to believe the numbers arrived at by the AES analytical studies rather than the ABX listening tests.
There is a series of double-blind tests being performed by many audiophiles using synthetic jitter tracks provided by HDTracks. These may shed some new light on true audibility. Again, the effectiveness of these experiments is only as good as the quality of the tracks provided, the jitter that was synthesized and the audio systems that are used for testing. The results from the first set of jitter tracks show just how unresolving most audiophile systems are. There are a couple of listeners who could put the majority of the tracks in order of increasing jitter, but the majority could not hear any difference between the tracks, even though the jitter ranged from 0ns to 1000ns, I believe.
Another interesting thing about the audibility of jitter is its ability to mask other sibilance in a system. Sometimes, when the jitter is reduced in a system, sibilance from another component becomes obvious and even more objectionable than the original jitter was. Reducing the jitter is still the right thing to do, however, followed by replacing the objectionable component. The end result will be much more enjoyable.
Jitter can even be euphonic in nature if it has the right frequency content. Some audiophiles like the effect of even-order harmonics in tubes, and like tubes, jitter distortion can in some systems "smooth" vocals. Again, the right thing to do is reduce the jitter and replace the objectionable components. It is fairly easy to become convinced that reducing jitter is not necessarily a positive step, however this is definitely going down the garden path and will ultimately limit your achievement of audio nirvana.
Sibilance in a system caused by preamp, amps and other components and cables can also be so high that changes in jitter are not very audible. This is why there is such contention on the web forums about jitter and its importance. What matters in the end is if you are happy with the sound of your system, and whether or not you can hear this distortion.
Commonly asked questions about jitter
Q: What is the format for transmission over S/PDIF and does FLAC or AIFF affect the jitter of this?
A: All formats stored on disk result in the same data-stream over S/PDIF. These are converted by the player software before they are transmitted. The transmission formats are different than the stored formats. Transmission of digital data is specific to each interface, USB, Firewire or S/PDIF. Once these are received, they are all eventually converted to S/PDIF and then I2S or directly to I2S bus. The jitter of the S/PDIF signal is in theory independent of the stored data format, but since software is generating the master clock, it can have an effect with some software and operating systems when using interfaces such as Firewire and USB.
Q: Why re-clock the digital data?
A: Because jitter at the clock inputs of the D/A converter causes modulation of the analog output signal from the D/A converter. This is distortion. This modulation is a function of both the magnitude of the jitter and the spectrum (frequency content) of the jitter. This is one of the things that makes digital audio sound "digital" and not analog, along with sample rates that are not high enough. The evidence of this is really obvious when you compare several DACs to one another. With a high-jitter input signal, they all tend to sound radically different. With a low-jitter digital input signal, they all start to sound very similar. Each DAC behaves a bit differently in the face of jitter, the simplest ones tending to sound the worst with high-jitter input and the best with low-jitter input.
Q: If there is no clock in my computer interface, why do I need to re-clock?
A: The output from the PC, whether it is USB, Firewire or S/PDIF from a soundcard or Mac has the clock embedded in the data-stream. The clock is generated by the computer clock or by a local clock on the sound card. There is always a clock, or the data will not be transferred.
Q: Is the jitter different if I use a lossless format versus .wav?
A: Jitter is in theory independent of the format in which the data is stored. However, since the master clock in many computer audio systems is generated by software, these things can all have an effect on both jitter and absolute frequency. I don't rule it out, anyway.
Q: Can I use I2S interface from my computer to reduce jitter?
A: I2S is not a native interface for a PC or Mac, so it must be generated from another interface, such as USB, Firewire or S/PDIF. I2S is the native interface for the D/A chip, so all interfaces must end-up converted to I2S eventually. I2S is not the original clock in the PC, but is synchronous to the original clock.
I2S was created by Philips when the CD format was invented. It comprises three or four signals: SDATA, SCLK, L/RCLK and, optionally, MCLK. These are the standard interface on most D/A chips. If an I2S interface is thoughtfully designed, it can achieve lower jitter than an S/PDIF interface. The advantage of I2S is that it includes all of the relevant clocks and the serial data.
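For readers unfamiliar with how the I2S clocks relate to one another, here is a small sketch of the arithmetic. L/RCLK runs at the sample rate and SCLK must carry every bit of both channels in each sample period. The 32-bit slot width and the 256x master-clock ratio are common choices that I am assuming here; the actual values depend on the D/A chip in use.

```python
def i2s_clocks(sample_rate_hz, bits_per_slot=32, mclk_multiple=256):
    """Clock frequencies (Hz) for a stereo I2S link.

    bits_per_slot and mclk_multiple are assumed, typical values.
    """
    return {
        "L/RCLK": sample_rate_hz,                   # one cycle per stereo sample
        "SCLK": sample_rate_hz * 2 * bits_per_slot,  # two channels per sample
        "MCLK": sample_rate_hz * mclk_multiple,      # master clock for the D/A
    }

if __name__ == "__main__":
    for fs in (44_100, 96_000):
        print(fs, i2s_clocks(fs))
```

All three clocks are integer multiples of the sample rate, so in a well-designed interface they can be derived from a single low-jitter master oscillator.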
Q: If my DAC already has jitter reduction, what difference will a re-clocker make?
A: Most DACs use ASRC (Asynchronous Sample-Rate Conversion) to reduce the incoming jitter. All of these devices up-sample or re-sample the data using a local oscillator. To track the incoming data the re-sampling device must use a PLL to track the incoming stream. Since the local clock has its own jitter and a PLL is utilized, there is new jitter added and the PLL still has some sensitivity to incoming jitter. Re-clocking just before the DAC input can still make a big difference in overall jitter.
Q: If the clock is not present, will an external DAC just assume the input to be as per its own clock? If the rip were done by CDROM using the same clock freq as a DAC, will this give any added benefit?
A: DACs don't have clocks in general. The only clocks in typical DACs are for upsampling. DACs rely on the clock embedded in the incoming data-stream, whether it is S/PDIF, AES or Toslink. DACs recover the clock(s) using hardware. If the interface to the DAC is I2S, then the clocks are discrete so they drive the D/A directly without needing clock recovery.
Ripping has nothing to do with the timing accuracy of a data file. It is simply data. There is data and then there is the timing of when the data is presented to the D/A chip. This timing is not stored on the disk. Only the data is stored on the disk. The timing is recreated at playback time. No relationship to the music timing or beat.
Q: Can the original information without any timing errors be reconstructed using an external re-clocker?
A: True re-clockers generate a totally new clock, which is synchronous to, or tracks, the original clock, but with lower jitter. The original information is only data, not timing. The data is not changed at all in a true re-clocker. The timing is only implied by the standard frequency that is used at recording time, when the analog signal was converted to digital. If the A/D clock had jitter, then the recording timing will be inaccurate. This cannot be fixed once the data is stored as a digital recording. If the D/A clock has jitter, then the playback timing will be inaccurate. This jitter can be minimized with re-clockers, up-samplers, etc.
 P. S. Lidbetter, "Basic Concepts and Problems of Synchronisation of Digital Audio Systems", presented at the 84th Convention of the AES, March 1988.