Ambiophonics, 2nd Edition: Replacing Stereophonics to Achieve Concert-Hall Realism
Chapter 3
Ralph Glasgal
October 1999
www.ambiophonics.org
Understanding
Stereophonic
Sound Fields
Human
hearing using
two ears is
called
binaural and
was developed
by evolution.
Binaural sound
is what most
of us listen
to all the
time.
Audiophiles
sometimes
think of
binaural sound
as a recording
made with a
dummy head and
played back
through
earphones.
This is a poor
imitation of
the real thing
and is not
what we will
mean when we
refer to the
binaural
hearing
mechanism in
this book.
Stereophonic
sound, by
contrast, is
simply one
man-made
method of
recreating a
remote or
recorded sound
field in a
completely
different
space.
Stereophonic
sound fields
are almost
always
auditioned by
binaural
listeners
whose brains
must then
reconcile the
lack of a
binaural field
with the
presence of a
stereophonic
one. Commonplace
(but misnamed)
stereophonic
recordings
normally
consist of two
full-range,
unencoded,
discrete
channels, one
left and one
right. Despite
adjustments by
recording
engineers
based on what
they hear
over studio
stereo
monitors, such
recordings are
not inherently
stereophonic
and therefore
need not
suffer the
ills that
playback via
the stereo
triangle
engenders.
That is, the
microphones
don't know
that the sound
they pick up
is going to be
played back
via two widely
spaced
loudspeakers
and thus none
of the
imperfections
of the stereo
triangle
discussed
below apply to
the recording
before it is
played back.
Although
we later
describe an
Ambiophonically
optimized
recording
microphone
arrangement,
almost any mic
setup used to
produce two
channel
recordings
works
reasonably
well when
reproduced
Ambiophonically.
Indeed one of
the basic
premises of
this book and
the technology
it describes
is that the
usual
two-channel
recorded
program
material
contains
sufficient
information to
allow accurate
simulation of
a binaural
concert-hall
experience.
This is indeed
fortunate
since it
allows the
existing
library of LPs
and CDs to be
reproduced
with
unprecedented
realism and
shows that
multi-channel
mic'ing and
recording
methods, where
music is
concerned, are
actually
counterproductive
according to
the tenets of
binaural
technology.
That
as few as two
channels
should be more
than adequate
can be
intuitively
understood by
simply stating
that if we
deliver the
exact sound
required to
simulate a
live
performance at
the entrance
to each ear
canal, then
since we only
have two ear
canals, we
should only
need to
generate two
such sound
fields. The
questions are
why existing
stereophonic
and earphone
binaural
recording
techniques
fall short,
and what can
be done to
make up for
these
shortcomings
at least where
music
reproduction
is concerned.
Monophonic
Sound
Before
the advent of
stereo
recording we
had
single-channel
or monophonic
recordings.
Most
recordings
were made by
using one or
more
microphones
and mixing
their outputs
together
before cutting
the record or
making a tape.
Such a
monophonic
recording, if
reproduced by
two
loudspeakers,
can be thought
of as a
special case
of
stereophonic
sound
reproduction.
It is the case
where a sound
is the same at
both ears and
the interaural
cross-correlation
factor of the
sound is 1. In
a concert
hall, such a
signal coming
from the stage
is sensed as
coming from
that stage
regardless of
which
direction
concert goers
face.
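As a rough illustration of what a cross-correlation factor of 1 means, the sketch below computes a normalized correlation between two ear signals at zero lag. This is a simplification: the interaural cross-correlation is normally taken as the maximum over a small range of interaural lags, and the noise signals, sample count, and function name here are assumptions made only for the example.

```python
import math
import random


def correlation_at_zero_lag(left, right):
    """Normalized cross-correlation of two equal-length signals at zero lag."""
    numerator = sum(l * r for l, r in zip(left, right))
    denominator = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return numerator / denominator


rng = random.Random(0)  # fixed seed so the run is repeatable
mono = [rng.gauss(0.0, 1.0) for _ in range(20000)]   # hypothetical test signal
other = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # unrelated noise

same_at_both_ears = correlation_at_zero_lag(mono, mono)
unrelated_at_each_ear = correlation_at_zero_lag(mono, other)

print("identical signals:", round(same_at_both_ears, 3))
print("unrelated noise:  ", round(unrelated_at_each_ear, 3))
```

Identical signals give a value of 1, the monophonic case described above, while unrelated noise at each ear gives a value near zero.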
Let
us now
consider a
listener in
the balcony of
a large hall
during a live
concert. For
this listener,
the angle that
the stage
subtends is
very small.
Both ears get
essentially
the same
signal, the
direct sound
from the stage
is weak
because of
distance, and
the hall
ambience is
strong and
both are
largely the
same at each
ear. Thus, the
players seem
to be remote,
but still
front and
center.
However, the
balcony
listener is
enveloped in a
completely
realistic but
mostly
monophonic
reverberant
field and,
therefore,
hardly notices
that his
ability to
localize left
and right
sounds is
minimal. The
lesson we want
to draw from
this is that
mono
recordings can
be made to
sound every
bit as
realistic in
the home
concert hall
as stereo
recordings, if
you don't mind
the impression
of sitting
further back
in the
auditorium.
The same
applies to
recordings of
solo
instruments
such as the
piano or a
singer
standing in
the curve of
the piano.
(See "Caruso
On Stage"
from www.ambiophonics.org
for advanced
experiments in
mono
reproduction)
The
reproduction
of single
central mono
or panned
sources via
two spaced
front
loudspeakers
is also prone
to exactly the
same crosstalk
effects that
result from
stereophonic
reproduction,
but,
fortunately,
the solution
is the same
(see below)
for both mono
and two
channel
recordings. To
summarize, it
is possible to
have realism
without
separation,
via a
combination of
true hall
ambience with
a corrected
front stage
and this is
the main
thrust of the
Ambiophonic
method.
The
Stereophonic
Illusion
There
is a slightly
flawed theory,
still quoted
quite often,
that a perfect
replica of a
given
concert-hall
sound field
can always be
produced by
putting an
infinite
number of
stage-facing
microphones at
the front of
the stage, all
the way up to
the ceiling.
After being
stored on a
recorder with
an infinite
number of
channels, this
recording can
then be played
back through
an infinite
number of
point-source
loudspeakers,
each placed
exactly as its
corresponding
microphone was
placed. The
performance
replication of
such a wall
would not be
perfect,
because the
loudspeakers
would not
radiate sound
with the same
directional
characteristics
as the sound
impinging on
the microphone
and the final
result would
also be
impacted by
the quality of
the room into
which all
these speakers
were radiating
but at least
the stage
would be wide,
have depth,
and be
realistic
sounding.
As
the number of
microphones
and speakers
is reduced,
the quality of
the sound
field being
simulated
suffers. By
the time we
are down to
two channels,
height cues
have certainly
been lost and
instead of a
stage that is
audible from
anywhere in
the room we
find that
sources on the
stage are now
only
localizable if
we listen
along a line
equidistant
from the last
two remaining
speakers and
face them.
While there
are many two
channel
speaker
arrangements
possible, the
most popular
two-channel
reproduction
method is the
stereophonic
technique of
reproducing
two-channel
recordings
through two
loudspeakers
with the
listener and
the two
speakers
forming an
equilateral or
wider
isosceles
triangle.
Stereo takes
advantage of
one rather
unnatural
psychoacoustic
illusion,
which is that
as a recorded
sound source
moves on the
stage from the
left to the
right, and as
the playback
signal
likewise
shifts from
the left
speaker to the
right speaker,
most listeners
hear a virtual
sound or
phantom sound
image move
from one
speaker
position to
the other.
Compared to
real-life
hearing, the
phantom image
does not move
linearly: the
sound tends to
jump to the
speaker
location as
the source
moves to the
side. If
identical
sounds come
from each
speaker (the
monophonic
case above),
then most
central
listeners hear
a phantom
sound that
hangs in the
air at the
halfway point
on the line
between the
loudspeakers.
Just as there
are some
individuals
who cannot see
optical
illusions, so
there are a
few
individuals
who cannot
hear phantom
images. Just
as optical
illusions are
just that,
illusions
that no
sighted person
would confuse
with a true
three-dimensional
object, so
phantom stereo
illusions
could never be
confused with
a real
acoustical
sound field.
Nevertheless,
for some 70
years this
illusion of
frontal
separation and
space has been so
pleasing to
most listeners
that
stereophonic
reproduction
has remained
the standard
music
reproduction
technique ever
since Alan
Dower Blumlein
applied for
his patent at
the end of
1931. (See The
Blumlein
Conspiracy
from www.ambiophonics.org.)
The
illusion
created by
stereo
reproduction
techniques is
far from
perfect, even
if the highest
grade of
audiophile
caliber
reproducing
and recording
equipment is
used. The
first problem
is that the
image of the
stage width is
confined to
the arc that
the listener
sees looking
from one
speaker to the
other.
Occasionally,
an
out-of-phase
sound from the
opposite
loudspeaker,
an accidental
room
reflection, or
a recording
site anomaly
will make an
instrument
appear to come
from beyond
the speaker
position.
These images,
however, are
almost always
ephemeral and
often not
reproducible.
Thus, in non-Ambiophonic
systems, in
order to get a
useful stage
width with
stable
left-right
localization,
the
loudspeakers
must be placed
at a wide
enough angle
to mimic the
angular
proportions of
a concert hall
or theater
stage. As we
shall see in
Chapter 4, it
is better if
speakers are
put closer
together so as
to complement
the pinna part
of the
binaural
hearing
mechanism.
With most
stereo
systems, there
is a
"sweet
spot" at
the point of
the triangle
where the
listening is
best. This,
unfortunately,
is what we are
faced with
when only two
front channels
(or three for
that matter)
are available.
The
"sweet
spot" is
also a
characteristic
of the
Ambiophonic
reproduction
technique
described
below, although
the spot is
somewhat
larger and
less critical
in the case of
Ambiophonics.
It is
difficult
enough to
recreate
concert-hall
sounds from
two discrete
recorded
channels (and
even harder
using multiple
channels) for
one or two
listeners in
the home,
without trying
to do it for a
whole room
full of
people.
Stereophonic
Crosstalk
By
far the major
defect of
stereophonic
reproduction
is caused by
the presence
of crosstalk
at the
listener's
ears generated
by the
loudspeakers.
Again, the
crosstalk is
an artifact of
stereophonic
reproduction
and is not
present in the
recording. We
will show that
eliminating
this crosstalk
widens the
stereo
soundstage way
beyond the
position of
the
loudspeakers,
eliminates
spurious
frequency
response peaks
and dips (comb
filter
effects), and
allows the
speakers to be
moved much
closer
together
eliminating
the need for
phantom
imaging or a
center
channel.
In
a concert
hall, direct
sound rays
from a
centrally
located
instrument
reach each ear
simultaneously:
one ray per
ear. See
Figure 1,
left. By
contrast, for
a centrally
located
recorded sound
source,
reproduced in
stereo,
identical rays
come from the
right and left
speakers to
the right and
left ears, but
a second pair
of uninvited,
only slightly
attenuated,
longer, right
and left
speaker rays
also passes
around the
nose to the
left and right
ears. See
Figure 1,
right.
Figure
1

Comparison
of live
concert hall
listening
geometry
with home
stereophonic
listening
practice
showing the
additional
unwanted
crosstalk
sound rays
impinging on
the pinna
from too
large an
angle,
thereby
causing
unrealistic
playback
artifacts.
The
problem is
that these
unwanted rays,
which cross in
front of the
eyes and
diffract
around the
back and top
of the head,
are delayed by
the extra
distance they
travel across
the head. At
its greatest,
this distance
is just under
7 inches. For
an average
distance of
say 3 1/2
inches, it
takes sound
one-quarter of
a millisecond
to do this. A
quarter of a
millisecond is
half the
period and,
therefore,
half the
wavelength of
a 2000 Hz
tone. When two
signals, one
direct and one
a
half-wavelength
delayed, but
of similar
amplitude,
meet at the
ear,
cancellation
will occur. At
4000 Hz the
delay is one
full
wavelength and
the sounds
will add. Thus
at frequencies
from the
octave above
middle C and
up, all sounds
add or
subtract at
the ears to a
greater or
lesser degree,
depending on
the original
sound source
position, the
angle to the
speakers, the
listener's
head position,
nose size and
shape, head
size,
differing path
lengths around
the head, and
other
geometrical
considerations.
Note that if
the sound
source at the
recording
studio or the
listener at
home moves a
few feet or
inches to the
left or right,
a whole new
pattern of
additions and
subtractions
at different
frequencies
will assault
the listener.
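The arithmetic in this passage can be checked with a short sketch. It assumes a speed of sound of 343 m/s and models the ear signal as a direct ray plus one delayed, attenuated crosstalk ray; the crosstalk gain of 0.7 and all function names are illustrative assumptions, not measured values.

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly room temperature
INCH_M = 0.0254


def crosstalk_delay_s(extra_path_inches):
    """Extra travel time of the crosstalk ray for a given extra path length."""
    return extra_path_inches * INCH_M / SPEED_OF_SOUND_M_S


def first_null_hz(delay_s):
    """Lowest frequency at which the delay equals half a period."""
    return 1.0 / (2.0 * delay_s)


def first_peak_hz(delay_s):
    """Lowest frequency above 0 Hz at which the delay equals one full period."""
    return 1.0 / delay_s


def ear_level_db(freq_hz, delay_s, crosstalk_gain):
    """Level of direct ray plus delayed crosstalk ray, relative to direct alone."""
    total = 1.0 + crosstalk_gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return 20.0 * math.log10(abs(total))


delay = crosstalk_delay_s(3.5)  # the 3 1/2 inch average path from the text
null_hz = first_null_hz(delay)
peak_hz = first_peak_hz(delay)

print("delay (ms):", round(delay * 1000.0, 3))
print("first null (Hz):", round(null_hz))
print("first peak (Hz):", round(peak_hz))
print("level at null (dB):", round(ear_level_db(null_hz, delay, 0.7), 1))
print("level at peak (dB):", round(ear_level_db(peak_hz, delay, 0.7), 1))
```

With these assumptions the delay comes out close to a quarter of a millisecond, with the first null a little below 2000 Hz and the first peak a little below 4000 Hz, in line with the rounded figures above. Changing the path length moves every null and peak, which is the sensitivity to position that the text describes.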
This
interference
phenomenon is
called comb
filtering, and
largely
explains why
many critical
listeners are
so sensitive
to small
adjustments in
stereo
listening or
speaker
position, and
to relatively
minute
playback
system
electrical and
acoustical
delay or
attenuation
characteristics.
Bock
and Keele
measured comb
filter nulls
as deep as 15
dB for the
60-degree
stereo
loudspeaker
setup. Note
that for
extreme side
images the
comb-filter
effect is
minimal. Thus
the frequency
response of a
normal stereo
setup actually
depends on the
angular
position of
the original
instrument or
singer. As
indicated
above, it is
fascinating
that these
frequency
response
anomalies are
not clearly
audible as
changes in
tone but
rather
manifest
themselves as
imprecisions
in imaging and
a sense that
the music is
canned. But it
is possible to
hear the
change in
timbre caused
by comb
filtering.
Simply play
pink noise
from a test CD
over your
stereo system
and rotate the
balance
control from
hard left to
hard right. As
the image of
the noise
passes through
the center one
can clearly
hear a drop in
the treble
loudness of
the noise and
a distinct
change in its
character.
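The balance-control experiment can be imitated with a toy model of one ear. The sketch assumes a constant-power balance law, a crosstalk delay of 0.26 ms, and a crosstalk gain of 0.7; real balance controls, heads, and rooms differ, so the numbers are only indicative.

```python
import cmath
import math

DELAY_S = 0.00026                  # assumed crosstalk delay
CROSSTALK_GAIN = 0.7               # assumed attenuation around the head
TREBLE_HZ = 1.0 / (2.0 * DELAY_S)  # first null of this toy model
BASS_HZ = 100.0                    # reference frequency well below the null


def left_ear_level_db(freq_hz, balance):
    """Left-ear level; balance 0.0 = hard left, 0.5 = center, 1.0 = hard right."""
    left_gain = math.cos(balance * math.pi / 2.0)   # constant-power law (assumed)
    right_gain = math.sin(balance * math.pi / 2.0)
    crosstalk = CROSSTALK_GAIN * cmath.exp(-2j * math.pi * freq_hz * DELAY_S)
    # Direct ray from the left speaker plus delayed ray from the right speaker.
    return 20.0 * math.log10(abs(left_gain + right_gain * crosstalk))


def treble_relative_to_bass_db(balance):
    """How far the treble test frequency sits below or above the bass reference."""
    return left_ear_level_db(TREBLE_HZ, balance) - left_ear_level_db(BASS_HZ, balance)


for balance in (0.0, 0.25, 0.5):
    print(balance, round(treble_relative_to_bass_db(balance), 1))
```

In this model the response is flat with the balance at hard left, where only one speaker plays, and the treble dips most strongly as the image passes through the center.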
As
a possible,
but as yet not
truly proven,
example of how
interaural
crosstalk and
the family of
acoustic
notches it
produces can
heighten many
listeners'
sensitivity to
component
differences,
the case of
tube-versus-transistor
amplifiers
stands out.
There is often
a subtle but
audible
difference
between a
vacuum-tube
amplifier and
a transistor
amplifier used
alternately in
a given stereo
system that is
still
detectable
even after
distortion,
noise, power
and volume
characteristics
are matched.
This sonic
difference is
usually
described in
terms of the
stereophonic
sound stage
produced. The
image with one
amplifier is
said to be
more
transparent,
wider, deeper,
narrower,
shallower,
more detailed,
less ambient,
or have more
air than the
other.
However, if
you listen to
just one
channel with
just one
speaker and,
even better,
one ear, there
is of course
no stereo
effect, and no
crosstalk null
patterns and
so these
audible
differences
evaporate
entirely. The
apparent
difference in
sound-stage
imaging due to
changing from
tube to
transistor
amplifiers
seems due to
the different
output
impedances of
these two
devices,
leading to
subtle but
slightly
audible
changes in the
stereo
crosstalk
sound field.
Vacuum tube
amplifiers
have a higher
output
impedance than
transistor
amplifiers,
sometimes as
high as one or
two ohms. Thus
if two
loudspeakers
have slightly
different
reactive
treble
impedances
(due to
crossover
component or
tweeter
tolerances or
control
settings) or
the amplifier
output
impedances are
not precisely
matched due to
tube aging or
bias drift,
the delay
differences in
the treble
range between
the two
channels will
be
appreciably
greater in the
high-impedance
vacuum-tube
case than in
the more
constant
voltage
solid-state
case. Note
that a phase
shift
difference
between
speaker sounds
of only a few
degrees can
shift a stereo
crosstalk comb
filter null by
hundreds of
Hertz. Similar
arguments can
be made for
anything in a
system that
changes the
comb-filtering
pattern,
including
vacuum tube
amplifier
speaker
cables, if
length and
type are
seriously
mismatched.
Even a small,
one-degree
phase shift
change between
the left and
right channels
at 2000 Hz
will cause a
shift of 71
Hertz in the
position of a
crosstalk null
or peak.
Crosstalk
comb-filter
patterns are
thus a
function of
any asymmetry
in amplifier
output
impedances or
delays,
differential
delays in
cables, or
differential
speaker time
delay by
virtue of
their
positions
relative to
the listening
position or
their
impedance
networks. For
instance, a
vacuum-tube
driven left
midrange
speaker can
interact with
a right
tweeter to
produce
interaural
crosstalk
peaks and
nulls that are
otherwise not
present in the
solid-state
amplifier
case. Such
patterns may
be audible to
some
individuals.
Any changes in
the interaural
crosstalk
pattern are
interpreted by
the brain as a
spatial
artifact such
as more or
less depth,
air, or
hollowness. Of
course, any
change in
listener
position, or
speaker
location
causes similar
shifts in the
crosstalk
peaks and
nulls and
further
complicates
equipment
comparisons by
ear in stereo
or surround
sound. The
irregular
directional
and largely
unpredictable
frequency
response of
the standard
stereophonic
60 to 90
degree
listening
arrangement
would never be
accepted in an
amplifier, a
speaker, or a
cable. Why
such a basic
listening
system defect
continues to
be so
universally
tolerated and
studiously
ignored is
difficult to
fathom.
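The output-impedance argument made earlier in this section can be sketched as a simple voltage divider between the amplifier's output resistance and each speaker's complex impedance. The two treble impedances and the solid-state output resistance of 0.05 ohm are hypothetical values chosen only for illustration; the 2 ohm figure is the upper value mentioned in the text.

```python
import cmath
import math


def speaker_phase_deg(load_ohms, amp_output_ohms):
    """Phase of the speaker terminal voltage relative to the amplifier's
    internal source voltage, from the divider Z / (Z + R_out)."""
    ratio = load_ohms / (load_ohms + amp_output_ohms)
    return math.degrees(cmath.phase(ratio))


def interchannel_phase_deg(left_load, right_load, amp_output_ohms):
    """Phase mismatch between channels caused by unequal speaker impedances."""
    return abs(speaker_phase_deg(left_load, amp_output_ohms)
               - speaker_phase_deg(right_load, amp_output_ohms))


# Hypothetical treble impedances of two nominally identical speakers.
LEFT_LOAD = complex(6.0, 4.0)
RIGHT_LOAD = complex(6.0, 5.0)

tube_mismatch = interchannel_phase_deg(LEFT_LOAD, RIGHT_LOAD, 2.0)
solid_state_mismatch = interchannel_phase_deg(LEFT_LOAD, RIGHT_LOAD, 0.05)

print("tube amplifier mismatch (deg):       ", round(tube_mismatch, 3))
print("solid-state amplifier mismatch (deg):", round(solid_state_mismatch, 3))
```

Under these assumptions the same pair of slightly unequal speakers produces a much larger inter-channel phase difference with the higher output resistance, which is the mechanism the text proposes for the shifting crosstalk pattern.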
The
binaural
perception of
directional
cues depends
on both the
relative
loudness of
sound and the
relative time
of arrival of
sound at each
ear. Which
mechanism
predominates
depends on the
frequency and
the direction
of the sound.
Unfortunately,
since these
delay and
stereophonic
comb-filter
artifacts have
an effect
extending from
below 500 Hz
on up, they
very seriously
affect both
mechanisms and
thus impair
the ability of
the listener
to detect
angular
position with
lifelike ease.
It is also
these crossing
rays that
limit stereo
and surround
sound imaging
to the line
between the
two front
speakers. (See
below.) If we
are to achieve
anything close
to
concert-hall
realism, we
must eliminate
these
crosstalk
effects and
provide a
directionally
correct single
ray for each
ear. But first
we will need
to present
evidence of
the
extraordinary
sensitivity of
the ear pinna
to such comb
filter
patterns.
Imaging
Beyond the
Speaker
Positions
Another
problem with
stereophonic
crosstalk is
that it limits
the apparent
stage width.
For sound
sources that
originate,
say, far to
the right of
the right
microphone, we
can
temporarily
ignore the
left channel
microphone
pickup. Then
in the
stereophonic
listening
setup, the
right speaker
will send
unobstructed
sound to the
right ear and
a somewhat
modified
version of the
same sound to
the left ear.
The ear-brain
naturally
localizes this
everyday sound
situation to
the speaker
position
itself. Thus,
no matter how
low the left
channel volume
is, the
recorded image
can never
extend beyond
the right
speaker in
standard
stereo. See
Figure 2. If,
however, the
right speaker
sound ray
crossing over
to reach the
left ear could
be blocked or
attenuated,
then at least
the low and
mid frequency
sound could be
localized to
the extreme
right, well
beyond the
speaker
position and
just where the
recording
microphones
said the
source was
located. (High
frequency
localization
is discussed
in the next
chapter.)
Remember, the
microphones
don't know
that the
playback will
be in stereo
with crosstalk
and therefore
it is not the
recording
setup that
limits stage
width.
Clearly,
eliminating
the extra
sound ray
would result
in wide
spectacular
imaging even
from existing
two channel
media. In
Chapter 5
we will
discuss two
methods of
eliminating
crosstalk as
well as doing
away with the
stereo
triangle
altogether.
Figure
2

Images
in
stereophonic
systems are
restricted
to the arc
between the
speakers
because both
ears are
hearing the
same
loudspeaker.
Loudspeaker
Out-of-Phase Effects
In
stereo systems it is
necessary for the
right and left main
speakers to be in
phase or, better
expressed, to be of the
same polarity. Phase
in this case means
that if identical
electrical signals
are applied to each
speaker, the
speakers will both
generate a
rarefaction, or both
generate a
compression in
response to a
simultaneous input
pulse. When a
monophonic recording
is played through a
pair of out-of-phase
loudspeakers, the
sound at the ears
lacks bass, the
phantom center image
is not present, and
a hazy, undefined
sound field seems to
extend far beyond
the speakers to the
extreme sides and
sometimes even
rearward. Similar
effects, only
slightly less
pronounced, are also
present with two-channel
sources.
These
subjective effects
can be better
comprehended now
that we understand
all about stereo
crosstalk. It is
clear that equal but
out-of-phase very
low frequency
signals, with
wavelengths much
longer than the
width of the head
will always arrive
unattenuated and 180
degrees out-of-phase
at either ear and
therefore will
always largely
cancel. This factor
accounts for the
thinness of the mono
or central (L+R)
stereo sound.
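The low-frequency cancellation described above can be illustrated with the same one-ear model used for crosstalk generally: a direct ray plus a delayed ray from the other speaker, here with its polarity reversed. The delay of 0.26 ms and the unattenuated crosstalk (gain of 1.0) are assumptions that only approximate the low-bass case.

```python
import cmath
import math

DELAY_S = 0.00026  # assumed extra travel time of the crosstalk ray


def ear_level_db(freq_hz, polarity, crosstalk_gain=1.0):
    """Level at one ear for a mono signal fed to both speakers.

    polarity is +1 when the far speaker is wired in phase and -1 when it
    is wired out of phase with the near speaker.
    """
    crosstalk = polarity * crosstalk_gain * cmath.exp(-2j * math.pi * freq_hz * DELAY_S)
    return 20.0 * math.log10(abs(1.0 + crosstalk))


for freq_hz in (50.0, 150.0, 300.0):
    in_phase = ear_level_db(freq_hz, +1)
    out_of_phase = ear_level_db(freq_hz, -1)
    print(freq_hz, round(in_phase, 1), round(out_of_phase, 1))
```

In this sketch the out-of-phase level at 50 Hz is more than 20 dB below the single-speaker level, while the in-phase level is about 6 dB above it, and the cancellation weakens as frequency rises.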
At
somewhat higher
frequencies the
cancellation is not
total. The left ear
hears pure left
signal from the left
speaker that is
reduced only
somewhat by the now
slightly delayed and
thus only partially
out-of-phase
crosstalk from the
right speaker.
Similarly, at that
same instant the
right ear is hearing
a reduced but pure
right-speaker sound
that is similar in
amplitude but not
identical to the
pure left-ear sound
because the
resultant sounds are
still out-of-phase.
We know that a
midrange frequency
sound heard only in
the right ear seems
to come from the
extreme right and a
sound heard only in
the left ear seems
to come from the
extreme left. This
phenomenon is still
operative even if
the two sounds that
come from the sides
are identical in
amplitude and
timbre. Thus, one
can easily hear two
identical bells as
separate left and
right sound sources.
If, however, we
exchange the bells
for pink noise, then
we can hear the
noise only as
separate sources
when they are not
precisely in step
(uncorrelated).
Since our signals
are out of phase,
they are not
identical in time or
highly correlated and
are therefore audible as
separate entities.
Thus,
the inadvertent
crosstalk
elimination caused
by out-of-phase
speakers that occurs
at mid frequencies
widens the perceived
sound field. As the
frequency increases,
instead of simple
canceling, the
comb-filtering
effect predominates
and the position of
the images becomes
frequency, and
therefore program,
dependent, changing
so rapidly that no
listener can sort
out this hodgepodge
of constantly
shifting side
images. Most
listeners describe
this effect as
diffuse, unfocussed
or phasy. Even in
Ambiophonics, where
crosstalk is
eliminated, the
speakers should
still be properly
phased. In general,
mechanical or
software crosstalk
elimination is not
fully effective or
needed at very low
bass frequencies and
so the bass
out-of-phase
thinness effect,
while much reduced,
remains. In
Ambiophonics, the
audibility of the
out-of-phase effect
is much reduced. The
stage image still
extends from the
speakers outward
when the recording
calls for this. That
is, sound sources at
the extreme right
and left image just
as they do when the
speakers are
in-phase. This makes
sense, since we are
listening to one
sound source with
one ear.
To
repeat: in the
out-of-phase case,
for most of the
frequency range,
each ear is hearing
a signal that is
distinctive because
the signals are of
opposite polarity
and, therefore, the
ear localizes each
sound as originating
from beyond its
respective speaker.
A phantom center
image does not form
and the infamous
hole-in-the-middle
appears. In the
out-of-phase
Ambiophonic case the
speakers are very
close together.
Therefore, the
middle hole is
almost nonexistent
and the bottom line
is that, except for
extreme bass
response, front
speaker phasing or
other timing
anomalies are more
critical in stereo
than in Ambiophonics.
Absolute
Polarity
When
an instrument
produces a sound,
the sound consists
of a series of
alternating
rarefactions and
compressions of air.
The sonic signatures
of such acoustic
musical instruments
are determined by
the pressure and
spacing of these
rarefactions and
compressions.
Electronic recording
and reproduction
have now made it
possible to turn
rarefactions into
compressions and
vice-versa.
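What polarity inversion does to a waveform can be shown with a few lines of code. The test signal below, a fundamental plus a second harmonic, is a hypothetical example chosen because its compressions and rarefactions are not mirror images of each other.

```python
import math

SAMPLES = 1000  # one period of the hypothetical test waveform

# Fundamental plus second harmonic: positive and negative peaks differ in size.
original = [
    math.sin(2.0 * math.pi * k / SAMPLES) + 0.5 * math.cos(4.0 * math.pi * k / SAMPLES)
    for k in range(SAMPLES)
]

# Inverting polarity swaps compressions and rarefactions.
inverted = [-x for x in original]


def energy(signal):
    """Sum of squared sample values."""
    return sum(x * x for x in signal)


print("original peak, trough:", round(max(original), 2), round(min(original), 2))
print("inverted peak, trough:", round(max(inverted), 2), round(min(inverted), 2))
print("energy unchanged:", energy(original) == energy(inverted))
```

The inverted signal has the same energy, and therefore the same loudness by any simple measure, but its largest excursion now points the other way. That is the only difference a listener sensitive to absolute polarity would have to go on.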
The
significance of this
to the problem of
establishing a home
concert hall is not
entirely clear. But
a few people seem to
be able to hear a
difference between
correct and
incorrect polarity.
Therefore, care
should be taken that
all amplifiers,
speakers and
ambience sources,
taken together, do
not invert. Since
acoustic reflectors
in concert halls do
not invert polarity,
the key early
reflections, at
least, should not be
inverted
accidentally in home
reproduction either
and should be
delivered to the
ears with the same
polarity as the
direct sound which
is, one hopes, also
of the correct
absolute polarity.
If
you cannot tell one
polarity from the
other in your own
system, don't
despair. For a few
people, polarity is
only audible when
special test signals
are used. One
possible reason for
difficulty in this
regard is the nature
of many instruments.
A listener to the
left of a violinist
hears one polarity,
while a listener to
the right hears the
other polarity,
assuming the string
is vibrating in the
same plane as the
ears of both
listeners. But no
matter where you
stand around a
trumpet you get the
same polarity. The
inverted polarity
sound in this case
is inside the
trumpet. Indeed it
has been reported
that test subjects
are more likely to
hear polarity
differences where
wind instruments are
involved.
On
balance, one would
have to say that it
does not pay to
agonize over the
absolute polarity
effect unless you
are certain that you
or your friends are
sensitive to it.
