
How Audio Compression Works

Introduction

I will be examining the different audio compression algorithms in popular use. This is not an attempt to recommend which encoder to use for backing up your music collection; rather, it is a look at the techniques being used to revolutionize the way the world listens to music. Whether you support free distribution of music or cheer for the record companies as they sue their own consumers, there is no denying that portable digital audio is taking over the music world.

How Compression Came To Be

The worldwide standard for portable audio is the audio CD. The audio CD format was developed by Philips and Sony in 1980[1] and has changed little since its introduction in 1982.[2] Taking a deeper look at a single song, stored in the CDDA (Compact Disc Digital Audio) format, you will find that a CD's sound quality is achieved by sampling the input 44,100 times per second (44.1 kHz). CDDA stores each sample in a 16-bit segment, twice over for stereo, for a total of roughly 1.4 Mbps, or approximately 10 MB per minute.[3] Depending on the size of the CD collection being backed up, terabytes of space may be required.
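As a quick sanity check on those numbers, a few lines of Python reproduce the CDDA data rate described above:

```python
sample_rate = 44_100          # samples per second, per channel
bits_per_sample = 16
channels = 2                  # stereo

bits_per_second = sample_rate * bits_per_sample * channels
print(bits_per_second)                        # 1,411,200 bps, i.e. ~1.4 Mbps
print(bits_per_second * 60 / 8 / 1_000_000)   # ~10.6 MB per minute
```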

The development of what was to become MP3 compression began in 1987 at the Fraunhofer IIS in Germany. Over the next four years, Fraunhofer IIS was able to achieve a 12:1 data reduction (112 kbps vs. 1.4 Mbps, both at 44.1 kHz).[4] The engineers at Fraunhofer continued to tweak their algorithm and improve sound quality, but due to the cost of hard drive space, there was not a large demand for their technology.

The technology/Internet explosion of the late 1990s changed everything. At first, only geeks knew what an MP3 was. One such nerd, Justin Frankel, was disappointed with the players that were available, so he programmed his own. In 1996, Winamp 1.0 was released to the world[5] and became the catalyst for the MP3 explosion (sadly, Nullsoft and Winamp are no more: the last original member quit recently and the brand name is owned by AOL). MP3s became so popular over the next two years that in 1998 Fraunhofer began requesting licensing fees from companies that used its algorithms within the MP3 standard.[6] Since use of the Fraunhofer codec was suddenly costly, other algorithms began to develop as free alternatives. LAME (a recursive acronym for "LAME Ain't an MP3 Encoder") was first introduced in 1998 and has since established itself as the premier MP3 encoder.[7] Other challengers to MP3's status as the de facto compressed audio standard include Ogg Vorbis, Windows Media Audio, and AAC.

What Is Compression?

The first concept in compression is lossy vs. lossless algorithms. Lossless algorithms are primarily concerned with retaining 100% of the audio stream. Some lossless file formats can reduce the size of files while retaining complete quality; however, the reduction in required storage space is modest. Lossy algorithms are not bound by a requirement to preserve all of the data: they analyze the input stream and trim data to reduce size, maintaining enough quality to appear to retain the complete acoustic range. A lossless algorithm can totally reconstruct the original data stream, while a lossy algorithm cannot because it discarded information in order to reduce the file size.

There are two kinds of lossless formats: compressed and uncompressed. Uncompressed lossless formats take the original CDDA data and convert it to the new file format, retaining all information and creating a virtual copy. WAV files are an example of an uncompressed format. Compressed lossless formats, such as Windows Media Audio 9 Lossless and Apple Lossless Encoding, remove redundant or unneeded information yet retain the original quality.
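The defining property is exact reconstruction. Here is a minimal sketch, using the general-purpose zlib compressor as a stand-in for the audio-aware predictors the codecs above actually use; a lossless round trip returns the input byte for byte:

```python
import zlib
import numpy as np

# One second of fake 16-bit stereo PCM, CD-style (a 440 Hz tone).
t = np.arange(44_100) / 44_100
tone = (np.sin(2 * np.pi * 440 * t) * 3_000).astype(np.int16)
pcm = np.column_stack([tone, tone])        # identical left/right channels

raw = pcm.tobytes()
packed = zlib.compress(raw, level=9)

assert zlib.decompress(packed) == raw      # lossless: byte-for-byte identical
print(len(packed) / len(raw))              # real lossless audio codecs do better
```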

The lossless file formats are great if you have hundreds or thousands of gigabytes of storage, but what if you don't? The average "hit song" requires approximately 35 MB of hard disk space in CDDA or an uncompressed lossless format. Compressed lossless files are smaller, but still quite large; the best that lossless compression can achieve is roughly a 2:1 reduction in file size. CDDA and WAV are old standards in the digital world; when they were introduced, computers ran on 5.25" floppy drives. Those magnetic disks could not hold 1/1000th of the data contained on a single CD. A more efficient compression algorithm would be required to store audio on common computers.

A lossy algorithm is much like the compressed lossless algorithm described above, but it goes a step further: it removes data that it deems unnecessary. Based on the average human's inability to distinguish minute details and to hear accurately at higher frequencies, much of the data is discarded, but enough detail is kept to trick the listener's auditory senses. What is discarded? How do these algorithms achieve such a reduction in size? How much detail is actually lost?

A Look Inside

I am going to examine each of the popular file formats: MP3, Ogg Vorbis, WMA, AAC, and AC-3. First up is the current king of audio compression, MP3.

Fraunhofer MP3

The Fraunhofer Institute in Germany created MP3 compression using "perceptual coding techniques addressing the perception of sound waves by the human ear."[8] To achieve CD quality at a lower bit rate, Fraunhofer performed extensive studies on how the human auditory system interprets sound and used the results to tune its compression algorithms. This process is called psychoacoustic modeling.

The first attempt at psychoacoustic coding was MPEG Layer-1, followed by MPEG Layer-2. These first two models greatly reduced the data rate required to reproduce CD-quality sound, and MPEG Layer-3 reduced it even further (Layer-1: 384 kbps, Layer-2: 192 kbps, Layer-3: 112 kbps).[9] I am not going into detail on the first two codecs because Layer-3 is essentially a more complex version that achieves better data rates (compare an MP2 and an MP3 at 128 kbps and the quality difference is very noticeable), but here is a visual representation of their differences:

[Figure 1. MPEG Layers 1 and 2. Diagram redrawn for resizing purposes from Haritaoglu, Esin Darici. ISO/MPEG Standardization. University of Maryland Institute for Advanced Computer Studies. June 18, 1997. Accessed November 29, 2004. Available from: http://www.umiacs.umd.edu/~desin/Speech1/node17.html#iso1.]

[Figure 2. MPEG Layer-3 (a.k.a. MP3). Diagram redrawn for resizing purposes from Haritaoglu. Available from: http://www.umiacs.umd.edu/~desin/Speech1/node19.html.]

Now it's time to get into exactly what function each of these blocks performs. Fraunhofer displays the following diagram on its website to break down the MP3 encoding process:

[Figure 3. MP3 encoding block diagram from Fraunhofer IIS. Accessed November 23, 2004. Available from: http://www.iis.fraunhofer.de/amm/techinf/layer3/layer3_block.gif.]

The first block is the filter bank; Fraunhofer uses a hybrid solution for the MP3 filter bank which combines a polyphase filter bank and a Modified Discrete Cosine Transform (MDCT). The polyphase filter bank performs the initial splitting of the audio stream into equidistant frequency sub-bands. Within these samples, the critical (primary) frequency is chosen and later used for compression.[10] One problem that can arise from this selection process is critical frequencies that overlap sub-bands, which would cause serious quality and clarity problems during encoding.[11] To prevent this, the MDCT is applied to the 32 sub-bands from the filter bank.[12] Each sub-band's range is expanded to overlap its neighboring sub-bands. The new samples are processed through the MDCT and summed with its inverse to cancel out any errors that may occur in the MDCT.[13] This process of cancellation is called time domain aliasing cancellation.[14] It is a time-consuming process, but the calculations and time required are reduced using recursive factorization.[15,16]
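To make the MDCT concrete, here is a minimal sketch of the transform in Python using the direct textbook formula. This is a simplification: real encoders window the input and use a fast FFT-based version, and MP3 runs the MDCT per polyphase sub-band rather than on the raw signal as done here.

```python
import numpy as np

def mdct(frame):
    """MDCT of one 2N-sample frame -> N coefficients (direct textbook formula)."""
    two_n = len(frame)
    n = two_n // 2
    k = np.arange(n)[:, None]                 # output coefficient index
    t = np.arange(two_n)[None, :]             # input sample index
    basis = np.cos(np.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
    return basis @ frame

# 50%-overlapped frames: every sample is covered by two consecutive MDCTs,
# which is what lets time domain aliasing cancellation remove, on decode,
# the aliasing each individual transform introduces.
signal = np.sin(2 * np.pi * 440 / 44_100 * np.arange(1152))
frames = [signal[i:i + 576] for i in range(0, len(signal) - 575, 288)]
coeffs = [mdct(f) for f in frames]
print(len(frames), "frames of", len(coeffs[0]), "coefficients each")
```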

The resulting samples are passed to the joint stereo coding portion of the encoder, where redundant and irrelevant information is removed from the bit stream.[17]

The perceptual model calculates the frequency limits for each of the sub-bands, which are used in determining the critical frequency. Another function of the perceptual model is determining the masking threshold within each sub-band. Masking is when one sound covers up another, rendering it inaudible. According to the psychoacoustic model, the encoder can then eliminate or reduce the number of bits allocated to every sound below the masking threshold within each sub-band.[18] Another factor in determining what gets dropped is the target bit rate of the encoding: the lower the bit rate, the more gets dropped.
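A toy illustration of that decision rule follows; all the dB figures are invented, and a real psychoacoustic model derives the thresholds from tonality estimates and spreading functions rather than taking them as given:

```python
import numpy as np

# Illustrative only: drop whatever falls below the masking threshold the
# perceptual model supplies for each sub-band.
def apply_masking(signal_db, threshold_db):
    audible = signal_db > threshold_db
    return np.where(audible, signal_db, -np.inf)   # -inf: allocate zero bits

signal_db = np.array([62.0, 41.0, 55.0, 18.0])     # per-sub-band level (dB)
threshold_db = np.array([30.0, 45.0, 30.0, 25.0])  # masking threshold (dB)
print(apply_masking(signal_db, threshold_db))      # bands 2 and 4 are dropped
```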

The data stream is now passed to a pair of loops: the noise control/distortion loop (control loop) and the rate loop. The entire data stream, like the sub-bands, is divided into time slices. The control loop checks each time slice to determine whether the noise within each sub-band exceeds the masking threshold, i.e. the allowed noise. If a time section's sub-band exceeds the allowed noise, the scalefactors for that sub-band must be adjusted and more time sections are required. The scalefactors relay the gain for each sub-band;[19] gain is the ratio of signal output to signal input.[20] However, creating more sections increases the bit rate, which is not desirable. To offset the increased bit rate requirement, the rate loop is initiated.[21]

The rate loop performs and checks the Huffman coding. Huffman coding analyzes a set of data and produces an optimal set of bit strings (symbols) to represent it: the more frequently a sample value occurs, the shorter its bit string, thereby performing lossless compression on the lossy data from the psychoacoustic portion of the encoding. Different Huffman code tables are used in different parts of the frequency spectrum to achieve maximum size reduction.[22] If Huffman coding produces more bits than the allowed bit rate permits, the rate loop decreases the number of time slices.[23]
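Huffman coding is easy to demonstrate. The sketch below builds a code from scratch for a short run of quantized values; note that MP3 itself selects among fixed, pre-computed tables rather than building codes on the fly:

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Build an optimal prefix code from symbol frequencies (classic Huffman)."""
    counts = Counter(symbols)
    # Heap entries: (weight, unique tiebreak, {symbol: code-so-far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, i2, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, i2, merged))
    return heap[0][2]

# Frequent quantized values get short codes, rare ones long codes.
quantized = [0, 0, 0, 0, 1, 0, -1, 0, 2, 1, 0, 0]
codes = huffman_codes(quantized)
bits = "".join(codes[v] for v in quantized)
print(codes)
print(len(bits), "bits vs", len(quantized) * 16, "for raw 16-bit samples")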

Once these two loops are satisfied with the outcome of their continual adjustments, the MP3 is ready for playback. One limitation of MP3 is that audio quality degrades below 112 kbps. The standard for encoding on the Internet is 128 kbps, but because hard drive space is so cheap, 192 kbps is becoming more common.

LAME MP3

The LAME ("LAME Ain't an MP3 Encoder") project began as a series of patches to the ISO Layer-3 reference code in 1998, after Fraunhofer IIS's decision to collect royalties on MP3 encoding. By 2000, LAME was a full-blown MP3 encoder.[24] "The goal of the LAME project is to use the open source model to improve the psycho acoustics, noise shaping and speed of MP3."[25] The LAME psychoacoustic and noise-shaping model, named GPSYCHO, modifies several aspects of standard MP3 encoding to improve sound quality. The first portion of the encoding process to be modified is the psychoacoustic model itself.[26]

The next departure from the ISO standard is how the outer (control) loop selects the size of the time sections. The ISO implementation selects the final size calculated during the control loop, but LAME keeps the best quantization found across all runs of the outer loop for each time section.[27] The optimal selection for some time slices may exceed the encoding bandwidth (for instance, 128 kbps), which is supposed to be against the rules…

An MP3 file is actually a series of frames, each containing the information required to decode the stream for audio playback. A frame may not use all the bits it is allocated, leaving spare space within that frame. The encoder can save these bits to the bit reservoir to use later, when a frame's optimal size is slightly larger than the allowed size. If you are using a variable bit rate (you set min/max bit rates), you do not need to worry about this problem, but for fixed bit rates it is a constant concern. LAME claims to handle the bit reservoir better than the ISO reference, though it acknowledges that Fraunhofer handles the bit reservoir extremely well.[28]
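A toy model of the reservoir bookkeeping looks like this (the budget, cap, and frame sizes below are illustrative, not LAME's actual figures): under-budget frames donate their spare bits, and an over-budget frame may borrow them back.

```python
# Hypothetical numbers throughout; only the accounting pattern is the point.
FRAME_BUDGET = 4_180          # bits per frame at some fixed rate
RESERVOIR_CAP = 4_088         # the reservoir cannot grow without bound

reservoir = 0
for needed in [3_900, 3_850, 4_600, 4_100]:   # optimal size of each frame
    available = FRAME_BUDGET + reservoir       # own budget plus saved bits
    spent = min(needed, available)
    reservoir = min(available - spent, RESERVOIR_CAP)
    print(f"needed={needed:5d} spent={spent:5d} reservoir={reservoir:5d}")
```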

LAME is free and arguably matches or beats Fraunhofer's quality at higher bit rates, while performing well at lower bit rates too.

Ogg Vorbis

Ogg Vorbis was introduced as a completely free alternative to MP3. Unlike LAME, which took the ISO standard and improved on each of its techniques, Xiph took only a few ideas from MP3 and added support for multiple (more than two) sound channels. The flow for encoding a Vorbis file is:

[Figure 4. Ogg Vorbis flow chart. Redrawn by the author from Kosaka, Atsushi; Okuhata, Hiroyuki; Onoye, Takao; Shirakawa, Isao; Yamaguchi, Satoshi. A Hardware Implementation of Ogg Vorbis Audio Decoder with Embedded Processor. Department of Information Systems Engineering, Osaka University, Japan. Accessed December 1, 2004. Available from: http://www.kmutt.ac.th/itc2002/CD/pdf/17_07_45/WA2_OF/2.pdf. Page 2.]

Ogg differs from MP3 encoding in its use of a Fast Fourier Transform rather than a filter bank. The MDCT is closely related to the FFT, and Vorbis uses the FFT analysis for tonal estimation while the MDCT is again used for noise analysis.[29] Once again a psychoacoustic model is used to discard unneeded information.

Floor mapping is like joint stereo coding in MP3, but it does not do the actual combining. Mapping bundles, or associates, like channels of the audio stream (left and right, for instance) for grouped encoding and decoding; the actual combining is done later. If channels within an audio stream are similar, they may be eligible for combination, which reduces the overall size of the output file.[30] A floor is a vector that "is a low-resolution representation of the audio spectrum for the given channel in the current frame."[31] There are two types of floor representations, but the first is not commonly used. The current representation (floor 1) describes the curve as a "piecewise linear interpolated representation"[32] on a graph with dB amplitude and frequency as the axes.

Floor removal takes the MDCT values and divides them by the floor values in order to flatten, or round off, the peaks of each frequency spectrum. This reduces the audio quality slightly, but the loss is essentially unnoticeable; it is much like having another psychoacoustic model further trim the unneeded audio data.[33] The data resulting from the floor removal is then converted to a Bark scale.[34]
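In miniature (the spectrum and floor values below are invented), floor removal is just a per-coefficient division, and the decoder reverses it with a multiplication:

```python
import numpy as np

# Divide the MDCT spectrum by the low-resolution floor curve so only a
# flattened residue needs to be coded. Real floors are piecewise linear in dB.
mdct_spectrum = np.array([120.0, 85.0, 40.0, 12.0, 6.0, 2.5])
floor_curve   = np.array([100.0, 80.0, 45.0, 10.0, 5.0, 2.0])

residue = mdct_spectrum / floor_curve    # near 1.0 everywhere: cheap to quantize
print(residue)

# Decode side: multiply back to approximate the original spectrum.
print(residue * floor_curve)
```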

Channel coupling then performs the actual combining, merging the bundled channels band by band to eliminate redundant information and greatly reduce the data requirements.

Quantization follows the same concept as in MP3, breaking the input stream into time slices, but is performed differently. Each residue vector (Ogg defines residue as the audio spectrum of one channel after the floor is subtracted and channel coupling is performed)[35] is encoded using a combination of Huffman coding and vector quantization codebooks,[36] rather than the double iterative process that MP3 uses. A codebook is a logical mapping of the vector sets to the Huffman codes that represent them,[37] and the encoder can choose the best-fit representation from any available codebook.[38] This makes it sound as though Vorbis uses only pre-determined codebooks, but on-the-fly codebook creation is also possible, and Vorbis does this early in the encoding process; these additional codebooks supplement the existing ones.
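Here is a stripped-down sketch of the vector quantization step (the codebook entries are invented): each residue vector is replaced by the index of its nearest codebook entry, and those small indices are what the Huffman stage then codes.

```python
import numpy as np

# Toy two-dimensional codebook; real Vorbis codebooks are far larger and are
# chosen (or built) per stream.
codebook = np.array([
    [0.0, 0.0], [1.0, 0.9], [-1.0, -0.8], [2.1, 1.8],
])

def quantize(vec):
    distances = np.linalg.norm(codebook - vec, axis=1)
    return int(np.argmin(distances))          # best-fit codebook index

residues = np.array([[0.1, -0.05], [0.95, 1.0], [2.0, 2.0]])
indices = [quantize(v) for v in residues]
print(indices)    # [0, 1, 3]; small integers that compress well with Huffman
```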

A huge advantage that Ogg Vorbis has over MP3 is its ability to encode multiple channels, providing for surround sound. MP3 Surround was released during the writing of this article and is available for testing without a license fee until December 31st. My emails to Fraunhofer requesting information beyond their "Introduction to MP3 Surround" went unanswered.

WMA

There are fewer and fewer areas in the computing world that Microsoft does not touch, and audio compression is not one of them. Windows Media Audio compression was originally developed as part of the Advanced Systems Format, whose two components are WMA and WMV (video), both designed for streaming over the Internet.[39] It is not currently possible to analyze the actual algorithm that Microsoft uses to encode files, because Microsoft protects company secrets better than the U.S. protects nuclear secrets, but some smart people have made educated guesses. The best estimate I could find holds that Microsoft may be using an approach similar to Ogg Vorbis: a twin vector quantization technique with on-the-fly codebook creation.[40] Whatever the algorithm Microsoft has developed, WMA has amazing clarity at extremely low bit rates and encodes very quickly.

AAC

In response to surround sound, Fraunhofer released AAC, or MPEG-2 (and later MPEG-4) Advanced Audio Coding. AAC is an improvement on MP3, with sampling frequencies ranging from 8 to 96 kHz and up to 48 channels (Ogg Vorbis is supposed to support up to 255).[41] AAC's popularity has increased greatly over the past couple of years thanks to Apple's iPod support and its availability on Apple's iTunes. The following diagram will look very familiar, with a few additional processes:

[Figure 5. AAC encoding flow chart from Fraunhofer IIS (see note 41).]

The first change is the filter bank, which uses only the MDCT, with an increased window size, rather than MP3's hybrid filter bank.

TNS, or Temporal Noise Shaping, is based on the principle that "a tonal signal in the time domain has transient peaks in the frequency domain. The dual of this, is that a signal which is transient in the time domain is 'tonal' in the frequency domain."[42] A tonal sound is a simple constant or repeating sound of a small frequency range,[43] and a transient is a loud, short-lived sound that causes a rapid change from small to large amplitude. By applying the relevant equations in the frequency domain, AAC is able to shape how quantization noise is distributed in the time domain.

Prediction is based on the fact that certain sound patterns are easy to predict. Prediction looks at the previous two frames to analyze sound patterns, and this information is used to reduce redundant data.[44]
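As a minimal sketch of the idea (the predictor below is a simple linear extrapolation from the last two frames with invented numbers; AAC's actual predictor adapts its coefficients), only the prediction error needs to be coded, and it is small whenever the spectrum changes slowly:

```python
import numpy as np

# Hypothetical spectral coefficients for three consecutive frames.
frames = [np.array([10.0, 4.0]), np.array([11.0, 4.2]), np.array([12.1, 4.4])]

# Linear extrapolation from the previous two frames (invented weights 2, -1).
predicted = 2.0 * frames[1] - 1.0 * frames[0]
error = frames[2] - predicted       # only this residual is quantized and coded
print(predicted, error)             # predicted [12.  4.4], error [0.1 0. ]
```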

AAC has gone through many revisions, the latest being High Efficiency AAC (HE-AAC). The improvements focus on quality at lower bit rates, and the updates are very successful at it.

AC-3

AC-3 is the official name of one of the most commonly used encoding processes; most readers will realize they are frequent users of AC-3 once they learn its unofficial name: Dolby Digital 5.1. AC-3 is (surprise!) the third iteration in Dolby's development of a data compression technique to provide multi-channel support for HDTV; AC development began in 1987. The primary goal in developing AC-3 was that it serve a broad range of uses, such as movie theaters, and that the decoder be low cost.

[Figure 6. AC-3 flow chart. Redrawn from Todd, Craig C.; Davidson, Grant A.; Davis, Mark F.; Fielder, Louis D.; Links, Brian D.; Vernon, Steve. AC-3: Flexible Perceptual Coding for Audio Transmission and Storage. Dolby Laboratories. March 1, 1994. Accessed December 5, 2004. Available from: http://www.mp3-tech.org/programmer/docs/ac3-flex.pdf. Page 6.]

The AC-3 filter bank is also an MDCT, but with a twist: it uses a Kaiser-Bessel derived window to produce its sub-bands (called windowed samples by Dolby).[45] These 512 overlapping samples are converted to 256 frequency domain points (critical frequencies).[46]

The frequency domain points are passed to the spectral envelope encoder, where each sample becomes an exponent and a mantissa. The collection of all the exponents is called the spectral envelope.[47] The spectral envelope needs to be reduced in size before it is transmitted, though. Only the lowest frequency sample's exponent is transferred in its entirety; all others are transferred as differentials. AC-3 uses four different strategies for combining the exponents to reduce the amount of space the envelope requires when transferred.[48]
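The split and the differential trick are easy to sketch (simplified, with made-up sample values; AC-3's actual exponent strategies group and share exponents in more involved ways):

```python
import math

def split(value):
    mantissa, exponent = math.frexp(value)    # value == mantissa * 2**exponent
    return exponent, mantissa

samples = [0.50, 0.27, 0.13, 0.07, 0.035]
exponents, mantissas = zip(*(split(s) for s in samples))
print(exponents)                              # (0, -1, -2, -3, -4)

# Only the first exponent is sent whole; the rest go as neighbor differences.
diffs = [exponents[0]] + [b - a for a, b in zip(exponents, exponents[1:])]
print(diffs)                                  # [0, -1, -1, -1, -1]: cheap to code
```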

AC-3's bit allocation is quite different from any of the other encoders covered so far: it uses a hybrid backward/forward adaptive bit allocation scheme. The primary difference between forward and backward adaptive allocation is that the forward implementation dedicates part of the bit stream to carrying bit allocation information, while backward allocation requires the decoder to calculate the bit rates itself, making the decoder rather resource expensive. The "Core Bit Allocator" is the backward implementation; the forward one is the "Bit Allocator."[49] Using psychoacoustic models, a predicted masking curve is found for each sub-band, compared against the hearing threshold, and the larger of the two is retained. The retained curve is then subtracted from the original signal level to determine the signal-to-noise ratio (SNR) for each sub-band. The SNRs are associated with the quantized mantissas from their respective sub-bands. The mantissas' accuracy depends on the number of bits available for each time slice, and all channels must compete for the allocated bits. If the limit is reached, the mantissas are rounded and accuracy is lost. If more bits are available, individual SNRs may be increased (essentially decreasing quantization step size or increasing time samples).[50]
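In sketch form (all dB values invented), the masking comparison and the resulting allocation might look like this, using the common rule of thumb of roughly 6 dB of SNR per quantizer bit:

```python
import numpy as np

predicted_mask = np.array([28.0, 35.0, 22.0])      # from the psychoacoustic model
hearing_threshold = np.array([25.0, 25.0, 30.0])   # absolute threshold of hearing
signal_level = np.array([60.0, 50.0, 45.0])

mask = np.maximum(predicted_mask, hearing_threshold)  # keep the larger curve
snr = signal_level - mask                             # dB above the retained curve
bits = np.ceil(snr / 6.02).astype(int)                # ~6 dB of SNR per bit
print(mask, snr, bits)             # [28. 35. 30.] [32. 15. 15.] [6 3 3]
```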

The envelope, SNRs, and mantissas are then packed into the AC-3 file format to await playback.

Conclusion

All of the above audio compression techniques have advantages and disadvantages depending on your compression needs. Your final choice may come down to what the audio player you purchase supports (most players from the last two years support a wide range of formats). Ogg Vorbis is increasingly desirable for professional-level encoding due to its high quality and the fact that it is free. If you are into video games, many of the latest titles now ship with audio in Ogg Vorbis format rather than MP3. WMA performs well at very low bit rates, but do you really want Microsoft controlling yet another aspect of your computer? MP3 is the long-reigning standard bearer for audio compression, and though sites such as Slashdot carry claims that MP3 is dead, I find that difficult to believe: MP3s and file-sharing programs are still blamed for slumping record sales, and who really wants to re-rip their entire music collection? All portable audio devices support the MP3 format, and the availability of MP3s on the Internet is astounding; just ask U2 and Eminem. One day MP3 may go the way of VHS tapes, but Fraunhofer is working diligently for MP3 to support new technology. Seeing how the competitors compress DVD audio (which raises the transfer rate from 1.4 Mbps to 9.2 Mbps at 192 kHz, 24-bit sound) will be fun to watch.

1 Erickson, Grant. A Fundamental Introduction to the Compact Disc Player. University of Minnesota. November 29, 1994. Accessed November 22, 2004. Available from: http://www.tc.umn.edu/~erick205/Papers/paper.html. Grant Erickson's article explained the development of audio sound and compact discs.

2 Ibid.

3 Fraunhofer IIS. Audio and Multimedia MPEG Audio Layer-3. Fraunhofer Institut Integrierte Schaltungen (IIS). Accessed on November 22, 2004. Available from: http://www.iis.fraunhofer.de/amm/techinf/layer3/index.html. Fraunhofer’s overview of development provided the details necessary to understand how MP3’s work.

4 Ibid.

5 In a magazine long, long ago. Probably available on the internet in many locations.

6 Scheirer, Eric. Frequently Asked Questions: MPEG, Patents, and Audio Coding. MIT Media Laboratory. October 21, 1998. Accessed November 26, 2004. Available from: http://web.media.mit.edu/~eds/mpeg-patents-faq. In FAQ #8, Eric Scheirer clarified which parts of the MP3 standard Fraunhofer requests royalties for.

7 LAME, About LAME. Accessed November 22, 2004. Available from: http://lame.sourceforge.net/about.html. The official LAME site provided the history of LAME.

8 Fraunhofer IIS. Accessed November 23, 2004. Provided the definition of psychoacoustics. Available from: http://www.iis.fraunhofer.de/amm/techinf/layer3/index.html.

9 Ibid. The table displays the lowest level of compression at which CD quality sound is achievable.

10 Heather, James. A Brief Overview of MPEG/Audio. Motorola. Accessed November 29, 2004. Available from: http://www2.cs.uregina.ca/~gerhard/courses/Audio/MPEG.ppt.pdf. Slide 7. Overview of the polyphase filter bank.

11 Ibid. Slide 9.

12 Wikipedia. Modified Discrete Cosine Transform. Wikimedia Foundation. November 29, 2004. Accessed November 30, 2004. Available from: http://en.wikipedia.org/wiki/MDCT. Provided an overview and in depth description of the MDCT.

13 Lincoln, Bosse. An Experimental High Fidelity Perceptual Audio Coder: Project in MUS420 Win 97. Stanford University. March 7, 1998. Accessed November 29, 2004. Available from: http://ccrma.stanford.edu/~bosse/proj/node27.html. Provided the author with a practical application of the MDCT.

14 Ibid.

15 McBride, Mark A. Iterative Recurrence Solutions. OmniNerd. September 24, 2004. Accessed December 6, 2004. Available from: http://www.omninerd.com/articles/articles.php?id=mcbride-200409-iterativerecurrencesolutions. Mark McBride's recursion article explains the basics of recursion.

16 Wikipedia, Modified Discrete Cosine Transform.

17 Fraunhofer IIS, accessed November 22, 2004. Available from: http://www.iis.fraunhofer.de/amm/techinf/layer3/index.html. The details section provided an overview of the Joint Stereo Encoder.

18 Heather. Slide 13. James Heather’s presentation explained the concept of masking.

19 Wesen, Bjorn. A DSP-based decompressor unit for high-fidelity MPEG-Audio over TCP/IP networks. Axis Communications, AB. 1997. Accessed November 30, 2004. Available from: http://www.sparta.lu.se/~bjorn/whitney/implemen.htm. Bjorn Wesen provided the actual function of the scalefactors within the data stream.

20 Wikipedia. Gain. Wikimedia Foundation. Accessed November 30, 2004. Available at: http://en.wikipedia.org/wiki/Gain. Provided the definition of gain in acoustics.

21 Fraunhofer. Accessed November 30, 2004. Fraunhofer IIS provided an overview of the two loops and their basic functions. Available from: http://www.iis.fraunhofer.de/amm/techinf/layer3/index.html.

22 Ibid.

23 Ibid.

24 LAME. Accessed November 27, 2004. Available from: http://lame.sourceforge.net/about.html.

25 LAME. Accessed November 27, 2004. Available from: http://lame.sourceforge.net/doc/html/history.html. LAME provides a consolidated collection of revisions by release version.

26 For a little more detailed information, search the history of modifications available at http://lame.sourceforge.net/doc/html/history.html.

27 LAME. Accessed November 27, 2004. Available from: http://lame.sourceforge.net/gpsycho/outer_loop.html. LAME provides an algorithm and description of their control loop implementation.

28 LAME. Accessed November 27, 2004. Available from: http://lame.sourceforge.net/gpsycho/gpsycho.html. An overview of GPSYCHO with comments and comparisons to Fraunhofer's MP3 codec.

29 Ibid. Atsuchi Kosaka and associates provided a description of how the MDCT is applied by Ogg Vorbis.

30 Xiph.org. Vorbis I Specification. Xiph.org Foundation. Accessed December 1, 2004. Available from: http://www.xiph.org/ogg/vorbis/doc/Vorbis_I_spec.pdf. Page 4. Xiph.org provides details on what floor mapping is and the process to calculate and execute it.

31 Ibid.

32 Ibid.

33 Kosaka. Accessed December 1, 2004. Page 2. Combining the information in Kosaka’s article with Xiph.org allowed for a more accurate description of MDCT application.

34 The Bark scale is described at http://en.wikipedia.org/wiki/Bark_scale.

35 Xiph.org, Vorbis I Specification. Page 48. Details how quantization is calculated in Ogg Vorbis.

36 Coleman, Mike. Vorbis Illuminated. Mathdogs. August, 2001. Accessed December 1, 2004. Available from: http://www.mathdogs.com/vorbis-illuminated/x62.html. Provided additional details to augment the Vorbis Specification about Huffman coding and quantization.

37 Coleman, Mike. Available from: http://www.mathdogs.com/vorbis-illuminated/bitstream-appendix.html. Clarified the use of codebooks within the Ogg Vorbis algorithm.

38 Xiph.org. Page 5. Codebook application described in the specification document.

39 Microsoft Corporation. Advanced Systems Format (ASF) Specification. June 2004. Accessed December 2, 2004. Available from: http://download.microsoft.com/download/e/0/6/e06db390-1e2a-4978-82bb-311810d8a28d/ASF_Specification.doc. The ASF documentation specifies the file format required for WMA.

40 Dipert, Brian. Digital audio gets an audition: Part two: lossy compression. Reed Electronics Group. January 18, 2001. Accessed December 2, 2004. Available from: http://www.edn.com/index.asp?layout=article&stt=001&articleid=CA74935&pubdate=1/18/01. Brian Dipert’s “Music Mysteries” section of the article is the best analysis of the WMA and examines a few possibilities for that algorithm.

41 Fraunhofer. MPEG-2 Advanced Audio Coding: Data Compression for the 21st Century. Accessed December 3, 2004. Available from: http://www.iis.fraunhofer.de/amm/techinf/aac/index.html. Fraunhofer’s AAC page provides an overview and diagram for AAC encoding.

42 Lincoln. Accessed December 3, 2004. Available from: http://ccrma.stanford.edu/~bosse/proj/node8.html. Bosse Lincoln explains the theory at the basis of TNS within AAC.

43 Robinson, Alan. Intro to Audio Lossy Compression and the MP3 Standard. Accessed December 4, 2004. Available from: http://wonka.hampshire.edu/~alan/research/mp3/introduction.html. Mr. Robinson's article provided a usable definition for a "tonal" frequency.

44 Doliwa, Peter. MPEG-4 Advanced Audio Coding. Accessed December 4, 2004. Available from: http://www.ibr.cs.tu-bs.de/lehre/ss04/skm/mpeg-4-aac.pdf. Doliwa’s article clarified that AAC uses previous frames to perform prediction.

45 Wikipedia, Modified Discrete Cosine Transform. Accessed December 5, 2004. Available from: http://en.wikipedia.org/wiki/Modified_discrete_cosine_transform. Wikipedia's MDCT article also describes and provides the formulas for the Kaiser-Bessel derived window, which is used by AC-3.

46 Todd. Page 7.

47 Ibid.

48 Ibid.

49 Ibid., 4-6

50 Ibid., 4-6, 11
