FEEDBACK ONLINE - ISSUE 32
The Future Is LAME: The Truth About MP3
...LAME = Lame ain't an Mp3 Encoder
Shazzam! This article will have many people confused and bedazzled. Let 'er rip says the captain, err me: having used the LAME mp3 codec at its highest bit rate setting of 320 kbps, I am here to state that in most cases the mp3'ed LAME disc sounds better than the native .wav file. Shock. Gasp. Say whhhaaat? Yep, flush that Kool-aid chap and have a seat.
Let's back up for a minute and find out how we (I) got here. About a year or so ago, my good friend Walter Röhrl (Not his real name, but scout's honor: you know him under his real name, guaranteed! Who knows, if y'all are nice, he may come out after all!) and I were talking about digital in general, when he dropped what I could only describe as "the bombshell": "I mostly listen to mp3's encoded via LAME at either the 320kbps or 256kbps setting; I find that this improves the sound quality of the original disc; you'd be surprised how good even 128kbps can sound, provided you have a capable DAC." Fancy you saying that, Walter, since your DACs are considered to be among the best in the industry.
My surprise was greater than you could imagine when I heard Walter utter those words. He practically challenged me to download Poikosoft's Easy CD-DA software package, which contains all the necessary codecs and such to transfer your music collection to LAME encoded mp3s (or try any of the other codecs available: FLAC, Fraunhofer mp3, etc). You then convert those mp3s back to .wav files and burn them onto CD-R—voila, you are now listening to better-than-the-original sound.
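For readers who prefer the command line, the same round trip can be sketched with the lame executable driven from Python. This is a minimal sketch, assuming lame is installed and on the PATH; the file names and the helper functions build_roundtrip_commands and run_roundtrip are hypothetical and are not part of Easy CD-DA or of LAME itself.

```python
import shutil
import subprocess
from pathlib import Path


def build_roundtrip_commands(wav_in, bitrate_kbps=320):
    """Return the two lame invocations for a WAV -> MP3 -> WAV round trip.

    The output file names follow a made-up convention chosen for this sketch.
    """
    src = Path(wav_in)
    mp3 = src.with_suffix(".mp3")
    wav_out = src.with_name(src.stem + "_roundtrip.wav")
    # -b sets the constant bitrate in kbps; --decode turns an mp3 back into WAV.
    encode = ["lame", "-b", str(bitrate_kbps), str(src), str(mp3)]
    decode = ["lame", "--decode", str(mp3), str(wav_out)]
    return encode, decode


def run_roundtrip(wav_in, bitrate_kbps=320):
    """Run both steps in order; assumes the lame executable is on the PATH."""
    if shutil.which("lame") is None:
        raise RuntimeError("lame executable not found on PATH")
    for cmd in build_roundtrip_commands(wav_in, bitrate_kbps):
        subprocess.run(cmd, check=True)
```

Calling run_roundtrip("track01.wav") would leave track01.mp3 and track01_roundtrip.wav next to the original, ready to be burned alongside it for comparison.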
Now if Joe Schmoe would tell me to give this a try, I'd be suspicious—but Walter Röhrl? Alas, the results spoke for themselves: in particular, bass appeared tighter and punchier; there was that much more definition and spatial information, while treble became less edgy and smeared, sort of like cleaning your windows squeaky clean with a fresh rag and a bottle of Windex. This was a pretty impressive result for a codec that tosses out a bunch of data that we supposedly can't hear anyway.
The icing on the cake? Just mention "mp3" within esteemed audiophile circles and watch how it causes nothing but disdain, discomfort and (ultimately) disinformation being shared by those very same high-end "experts." I simply could not believe the results I was hearing, though I did ultimately accept them for being what they are: simply better than the original. Just the thought of tossing "Perfect Sound Forever" out the window and replacing it with a lossy codec ought to speak volumes for where we are with all things digital.
Lest you think that I am caught all by myself in this controversy, I did the safe thing—I shared my findings with some respectable audio colleagues. This went something like this: "In my hand I have a disc which contains 3 files—the original .wav, a corresponding flac and lastly the coveted LAME 320kbps file. I will not tell you which is which; you will figure that out on your own." 9 times out of 10, all candidates involved picked the LAME file as the "best" sounding. (Conversely, if not ironically, the original .wav file was considered to be the worst offender.) No joke. Bologna out the door. 99 bottles of beer on the wall be damned. Mind you, we aren't talking about el cheapo Best Buy combo amps and speakers; no sir, these are carefully crafted, set up and balanced reference systems from all walks of life.
As I began working more and more with these LAME encoded files, Walter Röhrl and I started to talk more turkey. In fact, Walter and his associates have done listening tests at mastering and recording studios, whereby multiple feeds were being monitored off the mic and console. DSD? 24/192? 16/44? Try again! Nope, apparently everyone involved would always pick the LAME feed as the best sounding. Now that must be a shocker—indeed, this is stuff like alien invasions, Area 51, Tesla having invented "beaming" machines, all swooped into one big pot.
Icing on the Cake
Fascinated by these findings, I proceeded to track down the gentleman responsible for much of the development that goes on at LAME. After all, what better place to get the info from than the horse's mouth: Gabriel Bouvigne. However, if you now think that Gabriel is in on the "Better Sound Through LAME" discovery (or controversy?!), you are on the wrong path. Such is the divide between mainstream audio and, well, high-end audio. Audiophiles are a bunch of freaks, looking for the next big thing that will propel us into ever higher nirvana; always waiting for the next golden goose. Gabriel, a pragmatist, doesn't believe all the mumbo-jumbo about LAME bettering the listening experience—he simply says "Placebo Effect." No matter how I approached it, his answers were always the same: there's no such thing as LAME making things sound better—it simply isn't coded with that in mind. (As you will see from the following interview, he does in the end concede—and therefore leaves an open door for the possibility—that LAME can in fact alter the sound quality of oh so early and crappy digital.)
What follows below are some notes about getting up and running, and an interview with Gabriel on all things LAME. Now, don't be "lame" and discard my findings—I am just the messenger. I report on what I hear. This is what I hear, so take it for what it's worth. I do believe however, that most people with a truly open mind will come to the same conclusion that I did. Just in case someone attempts to put words in my mouth, I must insist that my reported improvements strictly concern LAME-encoded files at the highest mp3 resolution of 320kbps. 128kbps mp3 on the other hand can sound quite good too (recall my write-up of Ray Kimber's DSD to 128kbps mp3 to DSD demo @ CES in January). That resolution, however, good as it may be, is a far cry from the performance improvement you will hear when dealing with 320kbps encoded files. As an aside: I have discovered that Ed Meitner's EMM Labs gear is perfectly suited for native mp3 playback, and you'll have the added benefit of listening to your mp3 discs with the very best DAC I know of. Now that's something. I guess this demonstrates the infancy of where we (still) find ourselves with all things digital. It took us almost 30 years to get 16-bit/44.1 kHz right—what's in store for LAME and professional quality mp3 decoders?
What I do know is that computers are inherently much more complex than any CD player you can possibly get your hands on today. Time will tell.
Recap/Reload/Digest (or throw up!)
So without further ado, here is a transcript of my interview with Gabriel Bouvigne, current developer of the LAME encoder.
What is your background, how did you get involved in the LAME project?
I have a degree in digital image processing, and I am working on video encoders at a French company. When I was a student, I was already interested in audio compression. I was working during my spare time on a very simple mp3 encoder (called Shine) when Mark Taylor (working at the Los Alamos National Laboratory) contacted me about the Lame project. This was in 1999, and today I am still involved in this project.
What exactly is your area of expertise & involvement in LAME?
In Lame, I am mostly working on the psychoacoustic model, bit allocation, and encoding speed. The psychoacoustic model is the part of the encoder that is evaluating the masking levels of the different parts of the frequency spectrum. Once we have this data, the bit allocation part of the encoder is trying to provide a good sound quality by using data from the psychoacoustic model, while choosing how to use a limited amount of available bits.
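To make the division of labor Gabriel describes a little more concrete, here is a toy sketch of how masking data can steer a limited bit budget. It is emphatically not LAME's algorithm: the greedy loop, the function name allocate_bits, the per-band signal-to-mask figures and the rule of thumb of roughly 6 dB less quantization noise per extra bit are all simplifying assumptions made for illustration.

```python
def allocate_bits(signal_to_mask_db, total_bits, db_per_bit=6.02):
    """Greedy toy allocator.

    signal_to_mask_db: made-up per-band signal-to-mask ratios in dB, standing
    in for what a psychoacoustic model would report.
    Each extra bit is assumed to lower a band's quantization noise by
    db_per_bit. Bits go, one at a time, to the band whose noise currently
    sits furthest above its masking threshold.
    """
    bits = [0] * len(signal_to_mask_db)
    for _ in range(total_bits):
        # Noise-to-mask ratio per band: positive means the noise is audible.
        nmr = [smr - db_per_bit * b for smr, b in zip(signal_to_mask_db, bits)]
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] <= 0:
            break  # every band's noise is already under its mask
        bits[worst] += 1
    return bits
```

With three bands at 30, 12 and -5 dB and a budget of six bits, the loud, poorly masked band takes most of the budget, while the band whose signal already lies below the mask gets nothing.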
What is your opinion on high-fidelity music playback?
In my opinion current digital technology is able to provide a very good quality. Using audio CD we can reach a very good recording quality. State of the art A/D converters and noise shaping-based dithering are able to really push the audio CD, with a far better quality than what was initially available in the 80's. I think that the digital part is really well mastered. However, once you are reaching the analog side, things are quite different. From the DAC up to your ears, there are a lot of components that can have a huge impact on the sound. Amplifiers and speakers can easily provide a poor sound if not carefully designed. That is probably where most of differences between two setups come from. Unfortunately, those components are often relatively expensive if you expect a correct sound quality.
Do you consider yourself someone who enjoys high quality music playback?
High quality music playback is something that I don't really practice often. Most of the time I am only playing music as background music (when not listening to the same 10 seconds of a critical music sample again and again in order to try tuning LAME). But I really hate bad musical systems or very entry-level ones. When treble is lacking, attacks are smashed or there is no deep bass but only a kind of boom-box instead, that's a really displeasing experience, and you do not have to be careful to notice it.
When you look back at history, what do you believe to be the "cause" of the birth of mp3?
The technical birth of mp3 was because of financial considerations. It was mainly created in order to reduce live transmission costs for high quality audio. The practical birth of mp3 as we know it, which is its widespread adoption by the public, was initially caused by piracy. We have to be honest about this. It was not industrial piracy, but a modern variation of cassette trading. Computers had only just become able to decode mp3 audio in real time, and considering the file size, young people (including me) quickly realized that it could be a very convenient way to trade music. We could then put a few dozen hours of music on our hard drive, and conveniently access this music.
When we look back to the beginnings of mp3, the early 90s and Fraunhofer's first mp3 encoder, it was pretty much a necessity due to lack of storage space. Today, we just have Hitachi announce 1TB hard drives; meanwhile, "lossless" codecs are in high demand, either via FLAC or AIFF, etc. How do you feel about LAME today? What is the reason for someone to choose LAME over FLAC or any other lossless codec?
Ubiquity and convenience.
First, mp3 is something that is now handled by nearly every digital device, being your computer, your portable audio player, your phone or your DVD player. It's everywhere. Because of this, having your music in mp3 format is quite convenient. You do not have to re-encode it to make it playable on another device, it will just work as is. The size is also still an advantage. On your computer, it provides an interesting saving of disk space, and on most portable audio players, lossless would be too big to be usable.
It's true that right now, someone could choose to use lossless to store audio on his/her computer instead of a lossy compression scheme like mp3. For 44.1kHz and stereo, that is starting to be manageable, considering the size of current hard drives. But what about multi-channel that you will have tomorrow? That will require a substantial size increase with lossless encoding. I think that it's quite likely that in the future we will still mostly use compressed audio. It might not be in mp3 format, but it will still be compressed.
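To put rough numbers on the disk-space argument, here is a small back-of-the-envelope sketch. It assumes uncompressed PCM as the reference (a lossless codec would land somewhere below the PCM figure, by an amount that depends on the music), and the six-channel case is only a hypothetical stand-in for the multi-channel audio Gabriel mentions. The function names are illustrative.

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels):
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


def megabytes_per_minute(bitrate_kbps):
    """Storage needed per minute of audio, in MB (1 MB = 10**6 bytes)."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return bytes_per_second * 60 / 1_000_000


cd_stereo = pcm_bitrate_kbps(44_100, 16, 2)    # CD-quality stereo
six_channel = pcm_bitrate_kbps(44_100, 16, 6)  # hypothetical 6-channel PCM
mp3_320 = 320                                  # highest standard mp3 bitrate
```

CD-quality stereo works out to 1411.2 kbps, or about 10.6 MB per minute, against 2.4 MB per minute for a 320kbps mp3; the hypothetical six-channel stream triples the PCM figure.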
Have a look at video. Is there anyone really asking for uncompressed video instead of MPEG-based DVD? On the contrary, the adoption of HD video is pushing even further the need for lossy compression.
Regarding LAME (and mp3 in general), I have mixed feelings. On the one hand, I know that it's not the best lossy format, and that it has some problems and some inefficiencies. But on the other hand, we have a huge user base, and I know that we can still push the quality further, even with the old mp3 format.
History again... the Fraunhofer codec was considered to be the best mp3 encoder until LAME came around—today, LAME is considered the de facto best mp3 encoder available. What makes LAME so special?
The goals and the development context of the Fraunhofer encoder and Lame are quite different. Fraunhofer's encoders are good encoders that are competitive at nearly all their compression levels, while providing quite good encoding speed. That makes them very convenient in user-friendly applications targeted at the average user. Within Lame, we mainly focused on high audio quality. We tried to always improve quality, version after version. Encoding speed has not been our main priority. We also have not yet addressed the low bit rate area, where Fraunhofer's encoders are providing far more acceptable quality than Lame. Lame is (right now) mainly an encoder able to achieve high audio quality, but Fraunhofer's encoders are more balanced, and in a commercial context this might be a more viable strategy than an encoder targeted at only a specific part of all users.
DRM is becoming ever more an issue—rumor has it one of the major labels is pondering removing DRM from their content altogether (Editor's note: as of several weeks ago, iTunes officially supports EMI's DRM-free music. No doubt Pandora's Box has been opened...). Will LAME ever include any DRM? How do you feel about the DRM issue?
First, you have to know that I worked 3 years in a DRM company, so my view might be a little twisted. Regarding LAME, it will never include any DRM by itself. That is because LAME is an encoding engine, and most of the DRM schemes are working on already encoded data. Someone could decide to add some DRM on top of LAME (we are not preventing it), but this would be outside of the scope of the LAME project.
Now, on the topic of current DRM. DRM means "Digital Rights Management," but it seems to me that most current schemes regarding audio are instead Digital Rights Restriction. DRM should be about protecting and managing the rights of content owners, but that does not mean that it should restrict what users can do.
Let's think about the following scheme: You buy an audio CD, and a software tool allows you to register it with the publisher's online server, by authenticating the physical CD. You are then given back a login/password. When you have the CD inside your computer, you can download from the server the tracks that are on your CD in WMA/MP3/AAC format (your choice), without having to encode them yourself. If you are at work, you can log into the server using your provided login/password to listen to a streaming version of the tracks. As a consumer you gain some convenience, and the publisher gains some knowledge about your portable devices (wma or aac format?) and how often you listen to those tracks at work. If you buy several audio CDs, the publisher can know that you like this band but also this other one.
That is valuable information for a publisher. To me, that would be a nice DRM scheme, where both publisher and consumer are winning.
Instead of such a thing, what do we have today? Some crippled CDs that are not Red-Book compliant (which means that you are not sure if they will work in your player), crippled download platforms without any interoperability, and prices of those inferior "fake" audio CDs that are not even lower than the regular price, although you can do less with them.
To me there is something seriously wrong with some of the current business models.
You have mentioned how you keep refining and improving the LAME encoder engine and algorithm—do you believe that the decoder in PC playback can influence the sound quality on the other end? It seems that decoders have been somewhat static in development compared to the dynamic changes LAME goes through with each revision.
In the MPEG audio world, decoding is defined by the standard. It means that decoders' quality is frozen: either they are compliant, or they are not, nothing else. That is why there are not many evolutions on the decoders' side.
There are only a few areas where decoders can impact sound:
But that's all. (Note: you have to know that the mp3 format is internally independent from any bit depth, so decoders are often using 24- or 32-bit precision internally)
While on the subject of decoders, can you comment on that subject for us please?
The market now has a plethora of 24/192 resolution files—are there any plans to offer still higher quality LAME files over 320kbps? Is there any reason why 320kbps would be the ceiling of LAME (I know that this has more to do with decoder support - let's just say hypothetically, I will elaborate later on this interesting dynamic...)?
The mp3 standard states that decoders are only required to support decoding up to 320kbps. Support of higher bitrates is possible, but that is not mandatory. However, in my opinion, using higher bitrates would not be that wise. The mp3 format has some design limitations making it inefficient in some specific cases, and even increasing bitrates a lot would not overcome them easily. A better choice would be to use another, more efficient, audio format, like MPEG-AAC.
Regarding 24/192, mp3 being restricted to 48 kHz, you would encode them down-sampled to 48 kHz, which, in my opinion, would not be a real problem. Lame itself is using floating point with 32 bits of accuracy internally and is able to handle a dynamic range of about 120dB, so feeding it with 24 bits data is not a problem.
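The figures in this answer are easier to place with a little arithmetic. The sketch below computes the theoretical dynamic range of an integer PCM word and the bandwidth available at a given sample rate. These are textbook ceilings for the formats, not measurements of any converter and not a statement about Lame's internals; the function names are illustrative.

```python
import math


def pcm_dynamic_range_db(bit_depth):
    """Theoretical ratio of full scale to one quantization step, in dB."""
    return 20 * math.log10(2 ** bit_depth)


def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


cd_range = pcm_dynamic_range_db(16)      # roughly 96 dB
hires_range = pcm_dynamic_range_db(24)   # roughly 144 dB
mp3_bandwidth = nyquist_hz(48_000)       # 24 kHz at mp3's top sample rate
downsample_factor = 192_000 // 48_000    # 192 kHz material is reduced 4:1
```

Down-sampling 192 kHz material to 48 kHz therefore still leaves 24 kHz of bandwidth, which is above the roughly 20 kHz lowpass Gabriel mentions later in the interview.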
You have said earlier that you believe any "improved" fidelity in comparing a 320kbps LAME file with the original .wav file to be the effect of placebo. I have personally "proven" this not to be the case: routinely, the LAME encoded 320kbps file sounds indeed better than the original .wav file - particularly the bass region is improved, as is the overall definition of the performance. Music seems to simply sound "tightened" up compared to the rather "dull" sounding original. Do you have any thoughts, or perhaps explanations, for this effect?
Lame by itself does not know what is "better sound." All it knows is what is perceptually closer to the original. Its intent is to sound as much like the original as possible, but not in a different ("better"/"worse") way. You can encounter differences between the original and the encoded version because of some points:
But you also know that to be sure of not being a victim of the Placebo Effect, you have to do some double-blind tests, don't you?
What 'pre-conditioning' is the audio subjected to, prior to encoding? For example, bandpass filtering, etc., especially the 'default' setting for any given bitrate setting.
According to the target bitrate (or quality level in VBR mode), we are selecting a default lowpass value. Based on this lowpass, Lame may also down-sample the audio signal. Down-sampling allows Lame to use smaller frequency bands in its coding, increasing coding efficiency.
Note: Lame has some knowledge about the mp3 format (obviously) that allows it to choose the optimal sampling frequency based on the lowpass value, taking into consideration some design issues of the mp3 format. That is why, as an example, when using a 15 kHz lowpass, it's using a sampling frequency of 44.1kHz and not 32kHz. Lowpass goes up to about 20 kHz at lower compression settings.
Is there a 'maximum' encoding level that guarantees no 'overflow' conditions in a decoder? Specifically, how much 'headroom' should a WAV file have to optimally feed the encoder?
I am not really sure I understand the question, but I think that you are asking about potential clipping issues.
When encoding/decoding a track, its peak level can be altered. If you are unlucky, in some cases if your input track was already approaching peak level, you could encounter clipping at the decoding stage. Nowadays, it can be especially problematic on tracks that belong to the current "loudness race" (btw, it could be nice if this loudness race could stop, and go back to using a correct dynamic range).
First, there is an important point to keep in mind regarding this encoding-induced clipping: it does not occur on the encoder side, but on the decoder side, even if it's induced by the encoding. That means that inside the mp3 file, there is no clipping. Mp3 can internally go over full scale.
You can then use tools like Mp3Gain on your already encoded files to test for potential decoding clipping, and based on this use the same tool to reduce the gain of the encoded file if necessary. The nice thing is that this gain alteration is a lossless (and so reversible) process, one that is just altering the gain value inside the mp3 frames without altering the encoded samples. You can then recover a file that will not clip on decoding (but with a reduced gain).
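The arithmetic behind such a gain correction can be sketched as follows. This is only an illustration of the idea, not Mp3Gain's code: it assumes the decoded peak is already known as a floating-point value where 1.0 is full scale, and that the frame gain moves in steps of a factor of 2^(1/4), about 1.5 dB each. The function names are hypothetical.

```python
import math

# Assumption: one gain step scales the signal by 2 ** (1/4), about 1.505 dB.
STEP_DB = 20 * math.log10(2 ** 0.25)


def steps_to_avoid_clipping(decoded_peak):
    """Smallest number of downward gain steps that brings the peak to <= 1.0.

    decoded_peak: largest absolute sample value after decoding, as a float
    where 1.0 is full scale (values above 1.0 would clip on output).
    """
    if decoded_peak <= 1.0:
        return 0
    over_db = 20 * math.log10(decoded_peak)
    # Tiny tolerance so a peak sitting exactly on a step is not rounded up.
    return math.ceil(over_db / STEP_DB - 1e-9)


def peak_after_steps(decoded_peak, steps):
    """Peak level once the gain has been lowered by the given number of steps."""
    return decoded_peak * 2 ** (-steps / 4)
```

A track whose decoded peak reaches 1.25 times full scale would need two steps, about 3 dB of attenuation, to come back under full scale; because only the gain value changes, raising it by the same two steps restores the original file.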
In Lame, we currently have no good provision to try avoiding potential clipping that could occur when decoding. It's also not possible to really give you a maximum "safe" input level. A higher input level and an increased compression ratio both increase the likelihood of clipping, but you're never sure about it before encoding.
My experience of MP3 encoding (using LAME) is that the so-called 'improvement' (vis-à-vis 44 kHz/16-bit LPCM) is highly program dependent. Some CDs, from the first year or so of the 'CD era,' are remarkably 'better' (or at least 'different') sounding after being encoded to 320K MP3 and then put back onto a CD for test listening. Somehow, the encoder is reducing the effects of 'primitive' A/D conversion, whilst supposedly being 'true to the perceptual model of the source.' What is happening here? (My tests controlled all playback variables - the pre/post audio was burned to the same CD-R.)
Is there a kind of common belief among you that Lame is improving sound? Honestly, Lame is not designed to improve sound, but rather to sound as much as possible like the input. It does not mean that in some cases it doesn't improve things, but it would be a side effect, and is clearly not our intent.
I am sorry to say this, but once again I am thinking about a potential Placebo Effect (based on my experience). I, along with most of the codec developers, am a believer in the benefits of double-blind testing and ABX results.
However, I also have to say that I am surprised about a potential Placebo Effect going this way, as usually the "audiophile placebo effect" is fighting against compressed audio. Not a potential judgment on my side, just a stipulation based on experience. Still, I would not discard the possibility that some of the earlier CD releases do indeed sound significantly different using Lame.
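For readers unfamiliar with how an ABX result is judged, the usual yardstick is the probability of scoring at least as well by pure guessing. The sketch below computes that one-sided binomial probability; the function name is illustrative, and the trial counts in the note that follows are invented examples, not results from any test described in this article.

```python
from math import comb


def abx_p_value(correct, trials):
    """Chance of getting at least `correct` answers right out of `trials`
    when every answer is a coin flip (probability 0.5 each)."""
    favorable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favorable / 2 ** trials
```

A listener who identifies X correctly in 9 of 10 trials would have had only an 11 in 1024 chance (about 1%) of doing so by guessing, whereas 6 of 10 is entirely consistent with guessing (about 38%).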
Are there plans for LAME to support MPEG-AAC? If so, when?
Lame will stay an mp3 encoder. While AAC is technically attractive and solves mp3 deficiencies, that is not in the scope of the Lame project. Perhaps there will be a similar project targeting AAC in the future, but right now the market, and users, are mostly using mp3. With unlimited time, I would be interested in working on an AAC encoder. But with our limited spare time, we have to make choices.
To close… is there anything else you would like our readers to know? Maybe some curious tidbit of trivial information not much publicized or perhaps something else… feel free to speak your mind.
Sure. Not an exclusive piece of information, but there is something I'd like to mention.
The quality of Lame is there because of the work of many people over some eight years, including a few very dedicated developers, but an important piece of the work comes from people conducting listening tests. Being part of collective listening tests or just private tests, some people spent a lot of hours to test critical or problematic samples, and reported results to us, version after version. This is very important, and is likely to be one of the major points that allowed us to increase quality.
So, while I have this opportunity, I'd like to publicly thank those testing contributors.
Thank you so much for your time and effort, Gabriel—I really appreciate all the insight you have given me, & I look forward to meeting you one day in person!
[For more information about LAME, see http://lame.sourceforge.net.]