| The
Ear
Pinna
and
Realism
in Music
Reproduction |
| Commentary |
| Ralph
Glasgal |
| April
1999 |
Those
fluted, rather
grotesque,
protuberances
that extend
out from each
ear canal are
called pinnae.
The importance
of satisfying
one's pinnae
by reproducing
sound fields
that
complement
their complex
nature cannot
be
exaggerated.
Demonstrations
in the early
fifties of
live-versus-recorded
sound were
spectacularly
successful
because all
the sound,
reaching the
audience in a
real concert
hall, came
from an
appropriate
direction,
including the
ambience-free
direct sound
from the stage
loudspeakers
and of course
all the
ambient sound
produced by
the hall
itself. Indeed
G.A. Briggs,
of Wharfedale
fame, showed
that in a hall
as large as
Carnegie Hall,
even stereo
was not
essential to
provide pinna
pleasure. Like
fingerprints,
no two
individuals
have exactly
identical ear
pinnae.
Although the pinnae were thought to be vestigial
even as late as the mid-20th century, the intricacy
that characterizes these structures suggests not
only that their function must be very important
to the hearing mechanism but also that their
working must be of a very complex, personal and
sensitive nature. For
audiophiles in
search of more
realistic
sound
reproduction,
an
understanding
of how the
pinna, head,
and torso
interact with
stereophonic
or
surround-sound
fields is of
importance
since at the
present time a
major mismatch
exists.
Repairing the
discrepancy
between what
the present
playback
methods
deliver and
what the human
ear pinnae
expect and
require is the
last major
psychoacoustic
barrier to be
overcome, both
in hi-fi music
playback and
in the hot PC
multimedia
field. All
such
applications
are covered by
an audio
engineering
discipline
known as
Auralization.

Auralization Theory and Its Ambiophonic Subset

Auralization
is the process
of generating
or
regenerating
an imaginary
or an existing
acoustic sound
field of an
audible source
in a defined
space by
mathematical
modeling or
direct
recording and
then making
this field
audible in
such a way as
to duplicate
the binaural
listening
experience a
listener would
have had at a
specific
location in
that original
space. As live
music
enthusiasts
rather than
seekers after
virtual
computer
reality, we
are primarily
concerned with
only part of
the general
auralization
problem,
namely the
recreation of
horizontally staged acoustic,
usually
musical,
events
recorded in
enclosed
spaces such as
concert halls,
opera houses,
pop venues,
etc., where
the listening
position is
centered,
fixed and
usually close
to the stage.
I have called
this
two-channel
subset of the
broader
auralization
problem
Ambiophonics
because it is
both related
to and a
suitable
replacement
for
stereophonics.
Another way of
stating a
major goal of
Ambiophonics
and describing
a still-unsolved
problem of
virtual
reality or
surround
auralization
is the
externalization
of the
binaural
earphone
effect. In
brief, this
means
duplicating
the full,
everyday
binaural
hearing
experience,
either via
earphones,
without having
the sound
field appear
to be within
one's head, or
via
loudspeakers,
without losing
either its
directional
clarity or the
"cocktail
party"
effect whereby
one can focus
on a
particular
conversation
despite noise
or other
voices. So far
this goal has
eluded those
researchers
trying to
externalize
the binaural
effect over a
full sphere or
circle, but it
can be done
using
Ambiophonic
methods for
the front half
of the
horizontal
plane.

Externalizing the Binaural Effect

It
is intuitively
obvious, as
mathematicians
are fond of
observing,
that
duplicating
the binaural
effect at
home simply
involves
presenting at
the entrance
of the home
ear canal an
exact replica
of what the
same ear canal
would have
been presented
with at the
live music
event. But to
get to the
entrance of
the ear canal,
almost all
sound over
about 2 kHz
must first
interact with
the surface of
the pinna. The
pinna of your
ear is in
essence your
own personal
pseudo-random,
multi-frequency,
multi-directional
encoder or
acoustical
notch filter.
The pinna of
my ear has a
quite
different (and
undoubtedly
superior)
series of
nulls and
peaks than
does yours.
The sound that
finally makes
it to the ear
canal, in the
kilohertz
region, is
subject to
severe
attenuation or
boost,
depending on
the angle from
which the
sound
originates as
well as on its
exact
frequency.
Additionally,
sounds that
come from the
remote side of
the head are
subject to
additional
delay and
filtering by
the head and
torso and this
likewise very
individual
characteristic
is called the
Head-Related
Transfer
Function or
HRTF. In this
article when I
refer to pinna,
this should be
understood to
include the
shadowing,
reflection,
and
diffraction
due to the
head and
torso, and the
resonances in
the pinna
cavities,
particularly
the large bowl
known as the
concha. The
effects of the
head and torso
become
appreciable
starting at
frequencies
around 500 Hz
with the pinna
becoming
active over
1500 Hz.
Because the
many peaks and
nulls are very
close together
and sometimes
very narrow it
is exceedingly
difficult to
make
measurements
using human
subjects, and
not every bit
of fine
structure can
be captured,
particularly
at the higher
frequencies
where the
interference
pattern is
very hard to
resolve.
Because the
peaks or nulls
are so narrow
and also
because a null
at one ear is
likely to be
something else
at the other
ear, we do not
hear these
dips as
changes in
timbre or a
loss or boost
of treble
response, but
as we shall
see the brain
relies on
these
otherwise
inaudible
serrations to
determine
angular
position with
phenomenal
accuracy.
Much
research has
been devoted
to trying to
find an
average pinna
response curve
and an average
HRTF that
could be used
to generate
virtual
reality sound
fields for
military and
commercial use
in computer
simulations,
games, etc. So
far no average
pinna-HRTF
emulation
program has
been found
that satisfies
more than a
minority of
listeners and
none of these
efforts is up
to audiophile
standards.
Remember that
a solution to
this problem
must take into
account the
fact that each
of us has a
different
pattern of
sound
transference
around, over
and under the
head, as well
as differing
pinnae.
The
moral of all
this is that
if you are
interested in
exciting,
realistic
sound
reproduction
of concert
hall music, it
does not pay
to try to fool
your pinna. If
a sound source
on a stage is
in the center,
then when that
sound is
recorded and
reproduced at
home it had
better come
from speakers
that are
reasonably
straight ahead
and not from
nearby walls,
or
multi-channel,
surround or
Ambisonic
speakers. The
traditional
equilateral
stereophonic
listening
triangle is
quite
deficient in
this regard.
It causes
ear-brain
image
processing
confusion for
central sound
sources
because
although both
ears get the
same full
range signal
telling the
brain that the
source is
directly
ahead, the
pinnae are
simultaneously
reporting that
there are
higher
frequency
sound sources
at 30° to
the left and
at 30° to
the right. All
listeners will
hear a center
image under
these
conditions,
which is why
stereophonic
reproduction
has lasted 64
years so far,
but almost no
one would
confuse this
center image
with the real
thing.
Unfortunately
a recorded
discrete
center channel
and speaker is
of little help
in this
regard. We
will see later
that such a
solution has
its own
problems and
is an
unnecessary
expense that
does nothing
for the
existing
unencoded
two-channel
recorded
library.

Single Pinna Phenomena

A
very simple
experiment
demonstrates
the ability of
a single pinna
to sense
direction in
the front
horizontal
plane at
higher
frequencies.
Set up a
metronome or
have someone
tap a glass,
run water, or
shake a rattle
about ten feet
directly in
front of you.
Close your
eyes and
locate the
sound source
using both
ears. Now
block one ear
as completely
as possible
and estimate
how far the
apparent
position of
the sound has
moved in the
direction of
the still-open
ear. Most
audio
practitioners
would expect
that a sound
that is only
heard in the
right ear
would seem to
come from the
extreme right,
but you will
find that in
this
experiment the
shift is only
some ten or
twenty
degrees, and
if you have
great pinnae
the source may
not move at
all. This is
one case where
the pinna
directional
detecting
system is
stronger than
the intensity
stereo effect
and explains
why one-eared
individuals
can still
detect sound
source
position. It
also explains
why matrix or
vector
addition
methods, such
as Ambisonics,
which rely on
addition or
cancellation in
the vicinity
of the head at
frequencies in
excess of
1000 Hz, are
just not good
enough.
Martin
D. Wilde, in
his paper,
"Temporal
Localization
Cues and Their
Role in
Auditory
Perception"
(AES Preprint 3798,
October 1993),
states:
There
has been
much
discussion
in the
literature
whether
human
localization
ability is
primarily a
monaural or
binaural
phenomena.
But
interaural
differences
cannot
explain such
things as
effective
monaural
localization.
However, the
recognition
and
selection of
unique
monaural
pinna delay
encodings
can account
for such
observed
behaviour.
This is not
to say that
localization
is solely a
monaural
phenomena.
It is
probably
more the
case that
the brain
identifies
and makes
estimates of
a sound's
location for
each ear's
input alone
and then
combines the
monaural
results with
some
higher-order
binaural
processor.
Again,
any
reproduction
system that
does not take
into account
the
sensitivity of
the pinna to
the direction
of incidence
will not sound
natural or
realistic.
Two-eared
localization
is not
superior to
one-eared
localization;
they must both
agree at all
frequencies
for realistic
concert hall
music
reproduction.

Phantom Images at the Side

A
phantom front
center image
can be
generated by
feeding
identical
in-phase
signals to
speakers at
the front left
and front
right of a
forward-facing
listener. The
surround sound
crowd would be
ecstatic if
they could
produce as
good a phantom
image, to the
side, in the
same simple
way, by
feeding
identical
in-phase
signals just
to a right
front and a
right rear
speaker pair.
Unfortunately,
phantom images
cannot be
panned between
side speakers
the way they
can between
front speakers
without
involving the
other ear
through
speakers
operating
under
an Ambisonic or
other
interaural
coding scheme
or by using
dynamic,
individualized
pinna and head
equalization.
The reason
realistic
phantom side
images are
difficult to
generate is
that we are
largely
dealing with a
one-eared
hearing
situation. Let
us assume that
for a right
side sound
only
negligible
sound reaches
the remote
left ear. We
already know
that the only
directional
sensing
mechanism a
one-eared
person has for
higher
frequency
sound is the
pinna
convolution
mechanism.
Thus if a
sound comes
from a speaker
at 45 degrees
to the front,
the pinna will
locate it
there. If, at
the same time,
a similar
sound is
coming from 45
degrees to the
rear, one
either hears
two discrete
sound sources
or one speaker
predominates
and the image
hops backward
and forward
between them.
Of course,
some sound
does leak
around the
head to the
other ear and
depending on
room
reflections,
this affects
every
individual
differently
and
unpredictably.
The
sensitivity of
the ears, even
when working
independently,
to the
direction from
which a sound
originates,
mandates that
to achieve
realistic
Ambiophonic
auralization,
all signals in
the listening
room must
originate from
directions
that will not
confuse the
ear-brain
system. Thus
if a concert
hall has
strong early
reflections
from 55
degrees (as
the best halls
should) then
the home
reproduction
system should
similarly
launch such
reflections
from
approximately
this
direction. In
the same vein,
much stage
sound,
particularly
that of
soloists,
originates in
the center
twenty degrees
or so more
often than at
the extremes.
Thus it makes
more sense to
move the
front-channel
speakers to
where the
angle to the
listening
position is on
the order of
five to
fifteen
degrees
instead of the
usual thirty.
This
eliminates
most of the
pinna angular
position
distortion but
does limit the
maximum
perceived
stage width to
about 120°,
which is
double the
normal stereo
stage-image
width.
Remember that
in an
Ambiophonic
sound field a
slightly
narrowed stage
is simply
equivalent to
moving back a
few rows in
the auditorium
and has not
proven to be
noticeable
with most
recordings. In
the same vein,
simulated or
recorded early
reflections or
reverberant
tails from the
sides or rear
of a concert
hall should
either not
come to the
ears from the
front main
speakers at
all or should
be kept at as
low a level as
possible.
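
To make the geometry of this recommendation concrete, here is a minimal Python sketch (not part of the original commentary) that converts a speaker half-angle into a left-right spacing. It assumes both speakers sit at the same distance from the listener; the function name `speaker_spacing` and the 3 m listening distance are hypothetical, chosen only for illustration.

```python
import math


def speaker_spacing(distance_m: float, half_angle_deg: float) -> float:
    """Return the left-right speaker spacing in metres.

    Assumes (for illustration only) that both speakers are the same
    distance from the listener and sit at +/- half_angle_deg from
    straight ahead.
    """
    return 2.0 * distance_m * math.sin(math.radians(half_angle_deg))


if __name__ == "__main__":
    distance = 3.0  # hypothetical listener-to-speaker distance in metres
    for angle in (30.0, 15.0, 10.0, 5.0):
        spacing = speaker_spacing(distance, angle)
        print(f"+/-{angle:4.1f} degrees -> speakers {spacing:.2f} m apart")
```

At 30 degrees either side the spacing equals the listening distance, which is the familiar equilateral triangle; at 10 degrees it shrinks to roughly a third of that distance.
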

Pinna Considerations in Binaural or Stereo Recording

The
pinna must be
taken into
account when
recordings are
made,
particularly
recordings
made with
dummy heads.
For example,
if a
dummy-head
microphone has
molded ear
pinnae then
such a
recording will
only sound
exceptionally
realistic if
played back
through
earphones that
fit inside the
ear canal.
Even then,
since each
listener's
pinnae are
different from
the ones on
the
microphone,
most listeners
will not
experience an
optimum
binaural
effect. On the
other hand, if
the dummy head
does not have
pinnae, then
the recording
should either
be played back
Ambiophonically,
using
loudspeakers,
or through
earphones that
stand out from
the ears far
enough to
excite the
normal pinna
effect. (As in
the IMAX
system,
loudspeakers
can then be
used to
provide the
lost bass.)
But
one must also
take into
account the
head-related
effects.
Thus if
one uses a
dummy head
microphone
without pinnae,
then listening
with
loudspeakers
or
off-the-head
earspeakers
will produce
image
distortion,
due to the
doubled
transmission
around, over
and under both
the microphone
head and the
listener's
head. Even if
we go back to
a microphone
with pinnae,
and use
in-the-ear-canal
phones, a
particular
listener's HRTF
is not likely
to match that
of the dummy
head. Until a
personalized
binaural
system is
created,
binaural
recordings for
earphone-only
listening are
not likely to
fulfill their
promise.
Again, one
alternative
used in the
Sony IMAX
system is to
use
off-the-ear
earphones and
loudspeakers
simultaneously.
The surround
loudspeakers
provide the
personal pinna
response and
HRTF cues for
both front and
rear sounds
while the
earphones take
care of the
intra-aural
part of the
field or,
perhaps more
importantly,
ensure that
even listeners
with theater
seats off
center still
hear an image
that matches
the action.
This method is
great for
applications
where
360-degree
direct-sound
sources need
to be
reproduced, as
in movies. But
as we shall
see, IMAX, as
well as other
costly
surround sound
methods, is
both
unnecessary
and even
counterproductive
when
reproducing
staged musical
events.

Pinna Foolery or Feet of Klayman

Arnold
Klayman (SRS,
NuReality) has
gamely tackled
the
essentially
intractable
problem of
manipulating
parts of a
stereo signal
to suit the
angular
sensitivity of
the pinna,
while still
restricting
himself to
just two
loudspeakers.
To do this, he
first attempts
to extract
those ambient
signals in the
recording that
should
reasonably be
coming to the
listening
position from
the side or
rear sides.
There is
really no
hi-fi way to
do this, but
let us assume,
for argument's
sake, that the
difference
signal (L-R)
is good enough
for this
purpose,
particularly
after some
Klayman
equalization,
delay and
level
manipulation.
This extracted
ambient
information,
usually mostly
mono by now,
must then be
passed through
a filter
circuit that
represents the
side pinna
response for
an average
ear. Since
this pinna-corrected
ambience
signal is to
be launched
from the main
front
speakers,
along with the
direct sound,
in theory,
these modified
ambience
signals should
be further
corrected by
subtracting
the front
pinna response
from them. The
fact that all
this
legerdemain
produces an
effect that
many listeners
find pleasing
is an
indication
that the
pinnae have
been seriously
impoverished
by Blumlein
stereo for far
too long, and
is a tribute
to Klayman's
extraordinary
perseverance
and ingenuity.
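
The difference-signal step assumed above can be sketched in a few lines of Python. This is only an illustration of forming L-R from two channels, not a description of Klayman's or SRS's actual circuitry, which also involves the equalization, delay and level manipulation mentioned in the text. The function name `sum_and_difference` and the sample values are hypothetical.

```python
def sum_and_difference(left, right):
    """Split stereo samples into sum (L+R) and difference (L-R) signals."""
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff


if __name__ == "__main__":
    # Hypothetical samples: a centred soloist appears identically in both
    # channels, while the "ambience" appears with opposite polarity.
    soloist = [0.5, -0.2, 0.1, 0.4]
    ambience = [0.05, 0.02, -0.03, 0.01]
    left = [s + a for s, a in zip(soloist, ambience)]
    right = [s - a for s, a in zip(soloist, ambience)]

    _, diff = sum_and_difference(left, right)
    # The soloist cancels in L-R; what remains is twice the ambience
    # (up to floating-point rounding).
    print([round(d, 3) for d in diff])
```

Whatever is common to both channels cancels in the difference, and the result is a single signal, which is consistent with the remark that the extracted ambience is mostly mono. Real recordings do not separate this cleanly, which is the point of the caveat above.
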
While
Klayman's
boxes cost
relatively
little and are
definitely
better than
doing nothing
at all about
pinna
distortion,
any method
that relies on
average pinna
response or,
like matrixed
forms of
surround
sound,
attempts to
separate early
reflections,
reverberant
fields or
extreme side
signals from
standard or
matrixed
stereo
recordings of
music is
doomed to only
minor success.
The Klayman
approach must
also consider
that an
average HRTF
is also
required and
should be used
when moving
side images to
the front
speakers.
Someday we
will all be
able to get
our own
personal pinna
and HRTF
responses
measured and
stored on
CD-ROM for use
in Klayman-type
synthesizers,
but until
then, the
bottom line,
for
audiophiles,
is that the
only way to
minimize pinna
and
head-induced
image
distortion is
to give the
pinnae what
they are
listening for.
This means
launching all
signals as
much as is
feasible from
the directions
nature
intended and
requires that
pure ambient
signals such
as early
reflections
and hall
reverberation
(uncontaminated
with direct
sound) come
from
additional
speakers,
appropriately
located. It
implies that
recorded
ambient
signals
inadvertently
coming from
the front
channels have
not been
enhanced to
the point
where the
anomaly of
rear hall
reverb coming
strongly from
up front
causes
subconscious
confusion.
(Most CDs and
LPs are fine
in this regard
but would be
improved by a
more
Ambiophonic
recording
style.) It
means that
strong room
reflections
that allow
almost
undelayed
direct sound
to hit the
listener from
the wrong
angle, or
allow early
reflections to
come from the
sides, the
ceiling, the
floor or the
rear wall,
have been
eliminated
through
inexpensive
and simple
room treatment
and/or the use
of focused
(point source
or collimated)
loudspeakers.
Finally it
means moving
the left and
right main
loudspeakers
much closer
together, as
discussed
above.

Two-Eared Pinnae Effects

So
far we have
been
considering
single ear and
head response
effects. Now
we want to
examine the
even more
dramatic
contribution
of both pinnae
and the head,
jointly, to
the interaural
hearing
mechanism that
gives us such
an accurate
ability to
sense
horizontal
angular
position.
William B.
Snow, a
one-time Bell
Telephone Labs
researcher, in
1953, and
James Moir in
Audio
Magazine, in
1952, reported
that for
impulsive
clicks or
speech and, by
extension,
music,
differences in
horizontal
angular
position as
small as one
degree could
be perceived.
For a source
only one
degree off
dead ahead we
are talking
about an
arrival-time
difference
between the
ears of only
about ten
microseconds
and an
intensity
difference
just before
reaching the
ears so small
as not to
merit serious
consideration.
Moir went even
further and
showed that
with the sound
source indoors
(even at a
distance of 55
feet!), and
using sounds
limited to the
frequency band
over 3000 Hz,
the
angular
localization
got even
better,
approaching
half a degree.
It appears
that when it
comes to the
localization
of sounds like
music, the ear
is only
slightly less
sensitive than
the eyes in
the front
horizontal
plane.
It
is not a
coincidence
that the ear
is most
accurate in
sensing
position in
the high
treble range,
for this is
the same
region where
we find the
extreme
gyrations in
peaks and
nulls due to
pinna shape
and head
diffraction.
This is also
the frequency
region where
interaural
intensity
differences
have long been
claimed to
govern
binaural
perception.
However, it is
not the simple
amplitude
difference in
sound arriving
at the outer
ears that
matters, but
the difference
in the sound
at the
entrance to
the ear canal
after pinna
convolution
(another
favorite term
of the
auralization
fraternity).
Going even
further, at
frequencies in
excess of 2000
Hz it is not
the average
intensity that
matters but
the
differences in
the pattern of
nulls and
peaks between
the ears that
allow the
two-eared
person to
locate sounds
better than
the one-eared
individual.
Remember that
at these
higher audible
frequencies,
direct sounds
bouncing off
the various
surfaces of
the pinna add
and subtract
at the
entrance to
the ear canal.
This random
and almost
unplottable
concatenation
of hills and
deep valleys
is further
complicated by
later but
identical
sound that
arrives from
hall (but
hopefully not
home) wall
reflections or
from over,
under, the
front of, or
the back of
the head. This
pattern of
peaks and
nulls is
radically
different at
each ear canal
and thus the
difference
signal between
the ears is a
very leveraged
function of
both frequency
and source
position. In
their action a
pair of pinnae
are
exquisitely
sensitive
mechanical
amplifiers
that convert
small changes
in incident
sound angles
to dramatic
changes in the
fixed, unique
picket-fence
patterns that
each
individual's
brain has
learned to
associate with
a particular
direction.
Another way of
describing
this process
is to say that
the pinna
converts small
differences in
the angle of
sound
incidence into
large changes
in the shape
of complex
waveforms by
inducing large
shifts in the
amplitude and
even the
polarity of
the sinewave
components of
such
waveforms.
(Martin D. Wilde, see above, also posits that
the pinnae generate differential delays, or what
amount to reflections or echoes of the sound
reaching the ear, and that the brain is also adept
at recognizing these echo patterns and using them
to determine position. Since such temporal
artifacts would be on the order of a few
microseconds, it seems unlikely that the brain
actually makes use of this time delay data.)

Depth and Angular Perception at Higher Frequencies

To
put the
astonishing
sensitivity of
the ear in
perspective, a
movement of
one degree in
the vicinity
of the median
plane (the
vertical plane
bisecting the
nose)
corresponds to
a differential
change in
arrival time
at the ears of
only 8
microseconds.
Eight
microseconds
can be
compared to a
frequency of
120,000 Hz or a
phase shift of
15° at 5 kHz.
I think we can
all agree that
the ear-brain
system could
not possibly
be responding
to such
differences
directly. But
when we are
dealing with
music that is
rich in
high-frequency
components, a
shift of only
a few
microseconds
can cause a
radical shift
in the
frequency
location,
depths, and
heights of the
myriad peaks
and nulls
generated by
the pinnae in
conjunction
with the HRTF.
To repeat, it
is clear that
very large
amplitude
changes
extending over
a wide band of
frequencies at
each ear and
between the
ears can and
do occur for
small source
or head
movements. It
is these gross
changes in the
fine structure
of the
interference
pattern that
allow the ear
to be so
sensitive to
source
position.
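
The arithmetic behind these figures can be checked with a short Python sketch. It is an illustration under stated assumptions, not a method from the original commentary: the interaural delay uses Woodworth's rigid-sphere approximation for a distant source, and the head radius (0.0875 m) and speed of sound (343 m/s) are assumed typical values. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # assumed: air at about 20 degrees C
HEAD_RADIUS_M = 0.0875       # assumed: commonly used average head radius


def woodworth_itd(azimuth_deg: float) -> float:
    """Interaural time difference in seconds for a distant source.

    Uses Woodworth's rigid-sphere approximation (an assumption here),
    intended for azimuths between 0 and 90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


def phase_shift_deg(delay_s: float, freq_hz: float) -> float:
    """Phase shift in degrees that a pure delay produces at freq_hz."""
    return 360.0 * freq_hz * delay_s


if __name__ == "__main__":
    itd = woodworth_itd(1.0)
    print(f"ITD at 1 degree: {itd * 1e6:.1f} microseconds")
    delay = 8e-6  # the 8 microsecond figure quoted in the text
    print(f"A period of 8 us corresponds to {1.0 / delay:.0f} Hz")
    print(f"Phase shift at 5 kHz: {phase_shift_deg(delay, 5000.0):.1f} degrees")
```

Under these assumptions a one-degree offset gives a delay of roughly nine microseconds, and an 8-microsecond delay corresponds to a period of about 125 kHz and a phase shift of a little over 14 degrees at 5 kHz, in line with the rounded figures quoted above.
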
Thus,
just
considering
frequencies
below 10 kHz,
at least one
null of 30 dB
is possible
for most
people at even
shallow source
angles, for
the ear facing
the sound
source. Peaks
of as much as
10 dB are also
common. The
response of
the ear on the
far side of
the head is
more irregular
since it
depends on
head, nose and
torso shapes
as well as
pinna
convolution.
One can easily
see that a
relatively
minute shift
in the
position of a
sound source
could cause a
null at one
ear to become
a peak while
at the same
time a peak at
the other ear
becomes a null
resulting in
an interaural
intensity
shift of 40 dB!
When we deal
with broadband
sounds such as
musical
transients,
tens of peaks
may become
nulls at each
ear and
vice versa,
resulting in a
radical change
in the
response
pattern, which
the brain then
interprets as
position or
realism rather
than as
timbre.
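
A toy model helps show why small timing changes move peaks and nulls so dramatically. The Python sketch below is a deliberate simplification and not a model of any real pinna: it sums the direct sound with a single delayed reflection, and the reflection strength (0.97) and delays (around 60 microseconds) are hypothetical values chosen so that the null depth lands near the 30 dB mentioned above.

```python
import cmath
import math


def two_path_level_db(freq_hz: float, delay_s: float, gain: float) -> float:
    """Level in dB, relative to the direct sound alone, when the direct
    sound is summed with one reflection of relative amplitude `gain`
    (assumed 0 <= gain < 1) arriving `delay_s` seconds later."""
    total = 1.0 + gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return 20.0 * math.log10(abs(total))


def first_null_hz(delay_s: float) -> float:
    """Frequency of the lowest cancellation for a given reflection delay."""
    return 1.0 / (2.0 * delay_s)


if __name__ == "__main__":
    gain = 0.97  # hypothetical reflection strength
    for delay_us in (60.0, 62.5, 65.0):  # hypothetical delays
        delay = delay_us * 1e-6
        print(
            f"delay {delay_us:5.1f} us: first null near "
            f"{first_null_hz(delay):6.0f} Hz, level at 8 kHz "
            f"{two_path_level_db(8000.0, delay, gain):6.1f} dB"
        )
```

In this simplified two-path picture, changing the reflection delay by only a few microseconds moves the null by a few hundred hertz and changes the level at a fixed frequency by more than 10 dB. A real pinna involves many such paths at once, so the actual pattern is far more intricate.
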
In
setting up a
stereo
listening
system, it is
not possible
to achieve a
realistic
concert hall
sound field
unless the
cues provided
by the pinnae
at the higher
frequencies
match the cues
being provided
by the lower
frequencies of
the music.
When the pinna
cues don't
match the
interaural low
frequency
amplitude and
delay cues,
the brain
decides that
the music is
canned or that
the
reproduction
lacks depth,
precision,
presence, and
palpability or
is vague,
phasey, and
diffuse. But
even after
ensuring that
our pinnae are
being properly
serviced,
other problems
are inherent
in the old
stereo or new
multi-channel
surround-sound
paradigms. We
must still
consider and
eliminate the
psychoacoustic
confusion that
always arises
when there are
two or more
spaced
loudspeakers
delivering
information
about the same
stage position
but
communicating
with both
pinnae and
both ear
canals. We
must deal with
non-pinna-induced
comb-filter
effects and
the
stage-width
limitations
still inherent
in these
modalities
even after 64
years. But
this is a
subject for
other web
pages.

|