2007/04/01 | Etxebeste Aduriz, Egoitz (Elhuyar Zientzia) | Source: Elhuyar aldizkaria

Nowadays it is not hard to take the household's entire music collection along in the car, even if the collection is very large and the trunk is small. What's more, you can carry the whole collection and more in your pocket: 20,000 songs, or roughly 2,000 albums. All it takes is compressing the music.

Although there are several music compression formats, the most successful is undoubtedly MPEG-1 Audio Layer 3, much better known as MP3. MPEG, the group that created the famous MP3, is a working group of the International Organization for Standardization (ISO).

Storing digital sound requires a large amount of data; compressed in MP3 format, that data can be reduced considerably without any appreciable loss of sound quality. That is the key to MP3's success. But the success is also directly tied to the Internet: thanks to this format, music files became practical to share over the network.

MP3 files started to appear on the network in the mid-1990s. At the end of that decade, with the arrival of software such as Winamp (1997) and Napster (1999), compressing music became very simple for anyone, as did playing it, sharing it over the network or simply downloading it. This was a great boon for users, as it let them get a lot of music for free. Sharing music over P2P (peer-to-peer) networks has become common practice, which has led to various controversies and legal problems. It has also had a significant impact on the music industry.

MP3 players have also been very successful. No wonder: they are reasonably priced and let you store a lot of music in a very small space, to listen to wherever and whenever you want. MP3 has given us pocket music.

Figure: MP3 players have been very successful. (Photo: R. Etxebeste)

These are some of the factors that have contributed to the success of MP3. But, as we have said, the key lies in the level of compression achieved without appreciable loss of quality. So where is the mystery? How does MP3 fit music that needs so much space into such a small one? How can 20,000 songs fit in your pocket? Because the algorithms used to compress to MP3 are tailored to our ears. They remove the information our ears will not detect, and that can be a lot.

Sound with numbers

To understand how sound is compressed, you first have to understand how it is digitized. Sound is a wave, and to digitize it the wave must be represented with numbers. According to Nyquist's theorem, representing a wave of a given frequency numerically requires at least two measurements (samples) per cycle. Therefore, to represent a sound made up of many frequencies, the number of measurements per second must be at least twice the highest frequency. Humans can hear sounds up to about 20 kHz (20,000 cycles per second), so digitizing such a sound requires 40,000 measurements per second. CDs, for example, use 44,100 measurements per second: the established quality standard for digital sound is 44.1 kHz.
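The rule of two measurements per cycle can be written out as a short Python sketch. The function and constant names are illustrative assumptions, not part of any audio library; the figures are the ones quoted in the text.

```python
# Figures quoted in the text.
MAX_AUDIBLE_HZ = 20_000        # upper limit of human hearing
CD_SAMPLING_RATE_HZ = 44_100   # measurements per second on a CD

def min_sampling_rate(highest_frequency_hz):
    """Nyquist's rule as stated above: two measurements per cycle
    of the highest frequency present in the sound."""
    return 2 * highest_frequency_hz

def can_represent(sampling_rate_hz, highest_frequency_hz):
    """True if the sampling rate is high enough for that frequency."""
    return sampling_rate_hz >= min_sampling_rate(highest_frequency_hz)

print(min_sampling_rate(MAX_AUDIBLE_HZ))                   # 40000
print(can_represent(CD_SAMPLING_RATE_HZ, MAX_AUDIBLE_HZ))  # True
```

The CD rate of 44,100 measurements per second therefore leaves a small margin above the 40,000 that the audible range calls for.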

In addition, to store the intensity of the sound, a value must be assigned to each of these measurements. A single bit can take only the values 0 and 1, that is, it can say only whether there is sound or not. Two bits can represent four levels (zero plus three different intensities), and 16 bits can represent 65,536. With 16 bits per measurement, all the ups and downs and nuances of music are captured faithfully.
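The number of intensity levels simply doubles with every extra bit. A minimal sketch of that arithmetic (the function name is an illustrative assumption):

```python
def intensity_levels(bits):
    """Number of distinct values that `bits` binary digits can represent."""
    return 2 ** bits

for bits in (1, 2, 16):
    print(f"{bits:2d} bit(s): {intensity_levels(bits)} levels")
```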

Figure: According to Nyquist's theorem, representing a wave numerically requires two measurements per cycle. (Image: G. Roa)

In addition, to preserve stereo music you need two channels. All in all, CD-quality sound requires 1,411.2 kbps (16 bits/sample × 44,100 samples/s × 2 channels). In other words, storing one second of sound at that quality takes 1,411.2 kilobits (about 176 kB).
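The CD figure is a plain multiplication, reproduced here as a small Python sketch. The names are illustrative, and "kilo" is taken to mean 1,000, as in the figures above.

```python
BITS_PER_SAMPLE = 16
SAMPLES_PER_SECOND = 44_100
CHANNELS = 2

def pcm_bitrate_bps(bits_per_sample, samples_per_second, channels):
    """Bits needed per second of uncompressed digital sound."""
    return bits_per_sample * samples_per_second * channels

cd_bps = pcm_bitrate_bps(BITS_PER_SAMPLE, SAMPLES_PER_SECOND, CHANNELS)
cd_kbps = cd_bps / 1000          # kilobits per second
bytes_per_second = cd_bps // 8   # 8 bits per byte

print(cd_bps)            # 1411200
print(cd_kbps)           # 1411.2
print(bytes_per_second)  # 176400
```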

Honey is not for the donkey's mouth

This CD quality is very good, perhaps too good, because human beings cannot perceive all the information it contains. That, at least, is what psychoacoustics says. Psychoacoustics studies how we perceive the characteristics of sound, and that perception, of course, has its limits. For example, we can only hear sounds between 20 and 20,000 Hz, and the ability to hear high frequencies declines with age. In fact, few adults can hear above 16 kHz, and by the age of 60 to 70 the limit is around 10 kHz.

Well, MPEG algorithms use psychoacoustic models to identify the superfluous data and discard what we cannot hear. On the one hand, all sounds below 20 Hz and above 20 kHz can be removed outright. On the other hand, when the sound is in stereo, the two channels usually carry redundant information. Below a certain frequency we cannot tell where a sound is coming from, so below that frequency it is enough to encode a single channel.

Figure: When sounds of many frequencies occur at once, some mask others and we cannot hear them all.

But what psychoacoustic models exploit most is the masking effect, in which one sound covers another. A sound at a given frequency masks a weaker sound at a similar frequency, so that we cannot hear the weaker one. The closer the frequencies, the stronger the masking. For example, if a 1 kHz sound is accompanied by a 1.1 kHz sound that is 18 dB weaker, we hear only the first. If the second sound is at 2 kHz, it will be heard even when it is 18 dB weaker, because in that case a difference of 45 dB would be needed to mask it.
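The two examples above, in which a 1 kHz sound covers a weaker neighbour, can be turned into a toy check. This is only a sketch: the lookup table holds nothing but the two figures quoted in the text, whereas a real psychoacoustic model would use a continuous curve over all frequencies. All names are illustrative assumptions.

```python
# How many dB weaker a second tone must be for a 1 kHz tone to cover it.
# Only the two cases quoted in the text are known here (assumption: the
# threshold is inclusive, i.e. exactly 18 dB weaker is already inaudible).
REQUIRED_DROP_DB = {
    1_100: 18.0,  # second tone at 1.1 kHz
    2_000: 45.0,  # second tone at 2 kHz
}

def is_covered(second_tone_hz, drop_db):
    """True if a tone `drop_db` weaker than the 1 kHz tone is inaudible."""
    return drop_db >= REQUIRED_DROP_DB[second_tone_hz]

print(is_covered(1_100, 18.0))  # True: only the 1 kHz tone is heard
print(is_covered(2_000, 18.0))  # False: both tones are heard
print(is_covered(2_000, 45.0))  # True
```

Any tone for which the check returns True is information the encoder can leave out.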

This masking of one sound by another can be simultaneous, but it also occurs between sounds that are very close together in time. In addition, the resolution of our ear varies greatly with frequency. Sensitivity is highest between 2 and 4 kHz, the same range as the human voice. The encoder that compresses the sound takes all of this into account.

To do this, the frequency spectrum from 20 Hz to 20 kHz is first divided into several subbands. Psychoacoustic models are then applied to each band to work out which information matters more and which less. The number of bits used to store the information depends on its importance: some data can be discarded outright, and much of the rest can be stored using fewer than 16 bits.
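What storing a value with fewer than 16 bits costs can be seen with a simple uniform rounding scheme. This is a sketch for illustration only and not the quantizer an MP3 encoder actually uses; the function names and the convention that values lie between -1.0 and 1.0 are assumptions.

```python
def step_size(bits):
    """Spacing between adjacent levels when the range -1.0..1.0
    is divided into 2**bits steps."""
    return 2.0 / 2 ** bits

def quantize(value, bits):
    """Round a value to the nearest multiple of the step size."""
    step = step_size(bits)
    return round(value / step) * step

value = 0.3
for bits in (16, 8, 4):
    stored = quantize(value, bits)
    print(f"{bits:2d} bits: stored {stored:.6f}, error {abs(stored - value):.6f}")
```

The error never exceeds half a step, and the step doubles with every bit removed. The encoder's task is to spend few bits where the resulting error will go unheard and more bits where it would be noticed.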

Once the superfluous data has been removed and the rest encoded with fewer bits, all that remains is to compress the result with a standard algorithm. In the end, the 1,411.2 kbps of uncompressed sound can be brought down to between 32 and 320 kbps. Beyond a certain level of compression, however, the loss of quality becomes noticeable, so the bitrate is not usually taken below 128 kbps.
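To put these bitrates in perspective, the sketch below computes how many times smaller the compressed sound is than the CD original. The names are illustrative; the bitrates are the ones mentioned in the article.

```python
CD_KBPS = 1411.2  # uncompressed CD-quality sound

def compression_ratio(mp3_kbps):
    """How many times smaller the MP3 stream is than the CD stream."""
    return CD_KBPS / mp3_kbps

for kbps in (32, 128, 192, 320):
    print(f"{kbps:3d} kbps: about {compression_ratio(kbps):.1f} times smaller")
```

At 128 kbps the sound takes up roughly one eleventh of its original size.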

Figure: Human beings cannot perceive all the information contained in a CD.

On the other hand, the quality of the compression varies greatly with the encoder used, since not all encoders work in the same way. With a good encoder, most people will not notice any loss of quality at 128 kbps. When the material is harder to compress, or the listener has a trained ear, 192 kbps may be needed to avoid audible losses. Nor does the degree of compression have to be constant: each part of an MP3 file can have a different degree of compression, so that more bits are used, and quality is better, where the sound is more dynamic.
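When the degree of compression changes from one part of the file to another, the overall bitrate is an average. The sketch below assumes the file is split into parts of equal duration; the sequence of bitrates is made up purely for illustration.

```python
# Hypothetical bitrates (kbps) for six consecutive, equally long parts of
# a file: calm passages get fewer bits, dynamic passages get more.
part_bitrates_kbps = [128, 128, 192, 320, 128, 64]

def average_bitrate(bitrates_kbps):
    """Mean bitrate, valid when every part lasts the same time."""
    return sum(bitrates_kbps) / len(bitrates_kbps)

print(average_bitrate(part_bitrates_kbps))  # 160.0
```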

In short, MP3 is a made-to-measure format, tailored to our ears and able to make music small without diminishing its greatness.
