Digital Audio Concepts
Sound Basics
Sound originates from a disturbance of the air by any object. For example, two
hands clapping cause a disturbance of the air around the hands: the hands are
the source of the sound. The local region of air has increased energy caused by
the motion of the air molecules. This energy spreads outwards in sound waves.
A typical source of acoustic energy today is the loudspeaker: the cone of the
loudspeaker vibrates in the air causing disturbances dependent on the
electrical signals reaching the loudspeaker from the sound system.
Effectively, the loudspeaker converts electrical energy into sound energy,
which travels through the air as waves radiating from the loudspeaker.
Here is a graph of the variations in air pressure over time, for a pure sine
tone — the sound produced by a tuning fork or an electronic
oscillator.
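As a rough illustration of what such a graph plots, the sketch below evaluates the pressure variation of a pure sine tone at a few instants. The function name `sine_pressure` and the choice of eight points per cycle are assumptions made for this example, and the amplitude is in arbitrary units rather than a physical pressure.

```python
import math

def sine_pressure(t, frequency_hz, amplitude=1.0):
    """Pressure deviation (arbitrary units) of a pure sine tone at time t in seconds."""
    return amplitude * math.sin(2.0 * math.pi * frequency_hz * t)

# Evaluate one cycle of a 440 Hz tone at 8 evenly spaced instants.
period = 1.0 / 440.0
one_cycle = [sine_pressure(k * period / 8, 440.0) for k in range(8)]
```

The values rise from zero to a peak a quarter of the way through the cycle, return to zero, dip to a trough, and come back, which is the shape the graph shows.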
Frequency and Pitch
Subjectively, we often refer to a sound as having a high pitch or a low pitch.
But from a technical standpoint, the sensation of pitch depends upon
frequency — how many vibrations (periods or cycles) per second you
are hearing. Frequency is one of the most fundamental measurable properties of sound.
When we talk about the radiation of sound, we are talking about an energy
transfer among the molecules of air that produces a moving high pressure
(compression) zone traveling at about 1130 feet/second. For the sound to be
heard as a sustained tone, it must consist of a series of high and low pressure
zones following one another. The time between successive high pressure zones
is the period of one cycle, and frequency is defined as the number of cycles per second.
Formerly, frequency was described in units of cps (cycles per second). Today
it is more common to refer to a specific frequency in units called Hz (Hertz).
Hertz simply means cycles per second. If we talk about the fundamental
frequency of the concert "A" when played by a piano, then we are referring to
440 Hz. This tells us that when the piano plays a concert "A," it is
generating 440 cycles of high and low pressure that pass a given point each
second. The range of human hearing is roughly 20 Hz to 20,000 Hz.
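The relationship between frequency, period, and wavelength can be sketched in a few lines. This is a minimal example, assuming the speed of sound of 1130 feet/second quoted above; the function names are made up for illustration.

```python
SPEED_OF_SOUND_FT_PER_S = 1130.0  # figure quoted in the text; varies with temperature

def period_seconds(frequency_hz):
    """Duration of one cycle: the reciprocal of the frequency."""
    return 1.0 / frequency_hz

def wavelength_feet(frequency_hz, speed=SPEED_OF_SOUND_FT_PER_S):
    """Distance between successive high pressure zones."""
    return speed / frequency_hz

concert_a_period = period_seconds(440.0)       # about 0.00227 seconds
concert_a_wavelength = wavelength_feet(440.0)  # about 2.57 feet
```

At the limits of hearing the same formulas give wavelengths of roughly 56.5 feet at 20 Hz and well under an inch at 20,000 Hz.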
Intensity of Sound
Intensity is the magnitude of the variation in air pressure resulting from sound.
From a graphical point of view, intensity is directly related to the amplitude
of the wave. The intensity of sound is usually expressed in terms of decibels
(dB). The decibel is not an absolute measure of sound intensity; rather, it
defines a relationship between two sound intensities. The decibel is a
logarithmic ratio between what is defined as a zero decibel (0 dB) reference
and the measured sound intensity level.
| Threshold of hearing | 0 dB |
| Leaves rustling in the breeze | 20 dB |
| A quiet restaurant | 50 dB |
| Busy traffic | 70 dB |
| Vacuum cleaner | 80 dB |
| Threshold of pain | 120 dB |
| Jet at takeoff | 140 dB |
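The logarithmic ratio can be made concrete with a short sketch. It assumes the standard conventions: 10 times the base-10 logarithm for a ratio of intensities (power), and 20 times the logarithm for a ratio of pressure amplitudes. The function names are hypothetical.

```python
import math

def intensity_ratio_to_db(ratio):
    """Decibels for a ratio of two sound intensities (power quantities)."""
    return 10.0 * math.log10(ratio)

def pressure_ratio_to_db(ratio):
    """Decibels for a ratio of two pressure amplitudes."""
    return 20.0 * math.log10(ratio)

def db_to_intensity_ratio(db):
    """Inverse of intensity_ratio_to_db."""
    return 10.0 ** (db / 10.0)

# Threshold of pain (120 dB) relative to the threshold of hearing (0 dB).
pain_vs_hearing = db_to_intensity_ratio(120.0)
```

Under these conventions, the 120 dB threshold of pain corresponds to an intensity a million million times that of the 0 dB reference. Each 10 dB step in the table is a tenfold increase in intensity.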
Timbre, Tone Color
Sounds have another perceptual attribute: that of timbre or tone quality. We
may describe sounds as being tinny, full, brassy, trumpet-like, etc. Timbre
allows us to identify sounds.
Timbre is defined as that attribute of a sound that allows us to differentiate
between two sounds of the same pitch, intensity and duration. The shape of the
periodic wave producing the sound determines the relative strengths of the
harmonics — or additional higher frequency components. A naturally
occurring sound has a waveshape that is more complex than the sine wave example
in the graphic above.
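One way to see how harmonics shape a waveform is to add them together. The sketch below is only an illustration: the function name and the list of relative strengths are invented for the example and do not describe any particular instrument.

```python
import math

def harmonic_wave(t, fundamental_hz, strengths):
    """Sum of sine harmonics at time t; strengths[k] is the amplitude of harmonic k + 1."""
    return sum(
        a * math.sin(2.0 * math.pi * (k + 1) * fundamental_hz * t)
        for k, a in enumerate(strengths)
    )

# Hypothetical recipe: a fundamental plus two weaker harmonics.
strengths = [1.0, 0.5, 0.25]
period = 1.0 / 440.0
one_cycle = [harmonic_wave(k * period / 16, 440.0, strengths) for k in range(16)]
```

Changing the values in `strengths` changes the shape of the wave but not its period, so the pitch stays the same while the tone color differs.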
It used to be thought that timbre was related only to the relative strengths of
the harmonics produced by an instrument, but recent research in computer
synthesis of instruments has shown that the pattern of change over time of each
of the components contributes to timbre. Sound recognition is also dependent
on the sounds that are associated with the attack, for example the noise
at the start of a trumpet sound, and to a lesser extent on the release,
as when a piano key is released.
Digital Audio Basics
Converting analog signals
Here's what happens when sound is recorded digitally:
The analog signal is converted to digital form
The analog signal — a continuous variable defined with infinite precision
— is converted to a discrete sequence of measured values which are
represented digitally.
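The time-sampling half of this conversion can be sketched as follows. This is a simplified model, not a description of real converter hardware: the "analog" signal is stood in for by an ordinary function of time, and the names `sample_signal` and `tone` are assumptions made for the example.

```python
import math

def sample_signal(signal, sample_rate_hz, duration_s):
    """Measure signal(t) at equally spaced times t = n / sample_rate_hz."""
    n_samples = int(round(sample_rate_hz * duration_s))
    return [signal(n / sample_rate_hz) for n in range(n_samples)]

def tone(t):
    """Stand-in for a continuous signal: a 1000 Hz sine."""
    return math.sin(2.0 * math.pi * 1000.0 * t)

# One millisecond of the tone at 8000 samples per second gives 8 measurements.
samples = sample_signal(tone, 8000, 0.001)
```

The measurements here are still floating-point numbers. Limiting each one to a fixed number of bits is the separate step discussed under Quantization.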
Aliasing
We sample the signal only at equal time intervals. We don't know what happened
between the samples. Consider a "glitch" that happened to fall between
adjacent samples. Since we don't measure it, we have no way of knowing the
glitch was there at all.
In a less obvious case, we might have signal components that are varying
rapidly in between samples. Again, we could not track these rapid inter-sample
variations.
We must sample fast enough to see the most rapid changes in the signal.
If we do not, we cannot fully track those changes, and some higher
frequencies can be incorrectly interpreted as lower ones.
In the diagram, the high frequency signal is sampled just under twice every
cycle. As a result, each sample is taken at a slightly later part of
the cycle. If we draw a smooth connecting line between the samples, the
resulting curve looks like a lower frequency. This is called aliasing
because one frequency looks like another: it travels under an alias.
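The effect can be checked numerically. The sketch below uses the usual folding formula for the apparent frequency of a sampled sinusoid and then confirms that two different tones produce identical samples. The function name and the 1000 Hz sample rate are chosen only for illustration.

```python
import math

def alias_frequency(signal_hz, sample_rate_hz):
    """Apparent frequency after sampling: distance to the nearest multiple of the sample rate."""
    nearest_multiple = sample_rate_hz * round(signal_hz / sample_rate_hz)
    return abs(signal_hz - nearest_multiple)

SAMPLE_RATE_HZ = 1000.0

def cosine_samples(frequency_hz, count):
    """Samples of a cosine tone taken at SAMPLE_RATE_HZ."""
    return [math.cos(2.0 * math.pi * frequency_hz * n / SAMPLE_RATE_HZ) for n in range(count)]

high = cosine_samples(900.0, 20)  # well above half the sample rate
low = cosine_samples(100.0, 20)   # its alias
largest_difference = max(abs(h - l) for h, l in zip(high, low))
```

Here `alias_frequency(900.0, 1000.0)` gives 100.0, and `largest_difference` is zero apart from floating-point rounding. Once sampled, the 900 Hz tone cannot be told apart from a 100 Hz tone.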
Harry Nyquist showed in the 1920s that, to distinguish unambiguously
between all signal frequency components, we must sample at a rate at least
twice the frequency of the highest frequency component.
In the diagram, the high frequency signal is sampled twice every cycle. If we
draw a smooth connecting line between the samples, the resulting curve looks
like the original signal. This avoids aliasing.
The highest signal frequency allowed for a given sample rate, equal to half
the sample rate, is called the Nyquist frequency. Here are some standard
sampling rates.
| 96 kHz (96,000 samples per second) | DVD-Audio |
| 48 kHz | DVD-Video, DV cameras, DAT, samplers |
| 44.1 kHz | CD, DAT, samplers |
| 32 kHz, 22.05 kHz | Older samplers |
Most professional-level computer software supports all these rates.
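Since the Nyquist frequency is half the sample rate, the highest frequency each standard rate can represent follows directly. A small sketch, with a hypothetical function name:

```python
def nyquist_frequency(sample_rate_hz):
    """Highest signal frequency representable at the given sample rate."""
    return sample_rate_hz / 2.0

# The standard rates listed above, in samples per second.
STANDARD_RATES_HZ = [96000, 48000, 44100, 32000, 22050]

for rate in STANDARD_RATES_HZ:
    print(f"{rate} Hz sampling -> Nyquist frequency {nyquist_frequency(rate)} Hz")
```

For the CD rate of 44.1 kHz this gives 22,050 Hz, just above the 20,000 Hz upper limit of human hearing.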
Quantization
When the signal is converted to digital form, the resolution is limited by
the number of bits available to encode each sample: n bits allow 2^n
distinct values. We refer to this sort of resolution as sample word
length or bit depth.
The diagram shows an analog signal which is then converted to a digital
representation — in this case, with 8-bit word length. The smoothly
varying analog signal can only be represented as a "stepped" waveform due to
the limited resolution. The word length of hardware used for the sampling
process determines the available resolution and dynamic range.
The effect, called quantization error, looks much like low-level random
noise. The signal-to-noise ratio depends on the number of bits in the data
format: each additional bit improves it by about 6 dB.
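The "stepped" representation and its effect on signal-to-noise ratio can be sketched as follows. This is a simplified model assuming samples in the range -1.0 to 1.0 and evenly spaced levels. The function names are hypothetical, and the signal-to-noise figure is the standard textbook approximation for a full-scale sine wave, not a measurement of any particular hardware.

```python
import math

def quantize(x, bits):
    """Round x (nominally in -1.0..1.0) to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / levels
    code = math.floor(x / step + 0.5)                     # nearest integer code
    code = max(-levels // 2, min(levels // 2 - 1, code))  # clip to the available codes
    return code * step

def theoretical_snr_db(bits):
    """Approximate signal-to-noise ratio for a full-scale sine: 6.02 dB per bit + 1.76 dB."""
    return 6.02 * bits + 1.76

error_8_bit = abs(quantize(0.3, 8) - 0.3)    # at most half a step, i.e. 1/256
error_16_bit = abs(quantize(0.3, 16) - 0.3)  # far smaller
```

With this approximation, 8-bit samples give about 50 dB and 16-bit samples about 98 dB.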
©2003, Jeffrey Hass, John Gibson, Christopher Cook