Ambiophonics, 2nd Edition: Replacing Stereophonics to Achieve Concert-Hall Realism
Chapter 4
Ralph Glasgal
September 2000

www.ambiophonics.org

Pinna Power

Those fluted, rather grotesque protuberances that extend out from each ear canal are called pinnae. The importance of satisfying one's pinnae by reproducing sound fields that complement their complex nature cannot be exaggerated. Like fingerprints, no two individuals' pinnae are exactly identical. The pinnae were thought to be vestigial even as late as the mid-20th century, but the intricacy that characterizes these structures suggests not only that their function is very important to the hearing mechanism but also that their working is of a very complex, personal and sensitive nature. For audiophiles in search of more realistic sound reproduction, an understanding of how the pinna, head, and torso interact with stereophonic or surround-sound fields is important, since at the present time a major mismatch exists. Repairing the discrepancy between what the present recording and playback methods deliver and what the human ear pinnae expect and require is the last major psychoacoustic barrier to be overcome, both in hi-fi music reproduction and in the hot PC multi-media field.

We wish to duplicate the normal biological binaural listening experience a listener would have had at a specific location in the original performance space. As live music enthusiasts rather than seekers after virtual computer reality, we are concerned with the recreation of horizontal-staged-acoustic, usually musical, events recorded in enclosed spaces such as concert halls, opera houses, pop venues, etc., where the listening position is centered, fixed and usually close to the stage. I have called this two-channel subset of the broader 360-degree movie requirement Ambiophonics because it is both related to and a suitable replacement for stereophonics. Another way of stating a major goal of Ambiophonics, and of describing a still-unsolved problem of virtual reality or surround auralization, is the externalization of the binaural earphone effect. In brief, this means duplicating the full, everyday binaural hearing experience, either via earphones, without having the sound field appear to be within one's head, or via loudspeakers, without losing either binaural's directional clarity or the "cocktail party" effect whereby one can focus on a particular conversation despite noise or other voices. So far this goal has eluded those researchers trying to externalize the binaural effect over a full sphere or circle, but it can be done using Ambiophonic methods for the front half of the horizontal plane.

Pinnae as Direction Finders

It is intuitively obvious, as mathematicians are fond of observing, that duplicating the binaural effect at home simply involves presenting at the entrance of the home ear canal an exact replica of what the same ear canal would have been presented with at the live music event. But to get to the entrance of the ear canal, almost all sound over about 1.5 kHz must first interact with the surface of a pinna. Each pinna of your ear is in essence your own personal high frequency direction finder. The pinna of my ear produces a quite different (and undoubtedly superior) series of nulls and peaks than does yours. The sound that finally makes it to the entrance of the ear canal, in the kilohertz region, is subject to severe attenuation or boost, depending on the angle from which the sound originates as well as on its exact frequency. Additionally, sounds that come from the remote side of the head are subject to additional delay and filtering by the head and torso, and this likewise very individual head plus pinna characteristic is called the Head-Related Transfer Function or HRTF. In this book I will try to distinguish between the functions of one pinna alone, both pinnae working together, the HRTF without any pinna effects, and finally the whole enchilada, which is understood to include the shadowing, reflection, and diffraction due to the head and torso, and all the resonances and delays in the pinna cavities, particularly the large bowl known as the concha.

The effects of the head and torso become appreciable starting at frequencies around 500 Hz, with the pinna becoming extremely active over 1500 Hz. Because the many peaks and nulls of the HRTF are very close together and sometimes very narrow, it is exceedingly difficult to make measurements using human subjects, and not every bit of fine structure can be captured, particularly at the higher frequencies where the interference pattern is very hard to resolve. Figure 4.1 shows a series of measurements recorded by Henrik Moller for several subjects, made using a small microphone placed right at the entrance to the ear canal. As the sound source moves about the head, both the variety and the complexity of the response are plainly evident. One can also see the obvious variation between different auditors. Note that when the sound source is at the far side of the head the curves include the head shadowing frequency response. Because the peaks or nulls are so narrow and also because a null at one ear is likely to be something else at the other ear, we do not hear these dips as changes in timbre or a loss or boost of treble response, but, as we shall see, the brain relies on these otherwise inaudible serrations to determine angular position with phenomenal accuracy.

Much research has been devoted to trying to find an average pinna response curve and an average HRTF that could be used to generate virtual reality sound fields for military and commercial use in computer simulations, games, etc. So far no average pinna-HRTF emulation program has been found that satisfies more than a minority of listeners, and none of these efforts is up to audiophile standards. Remember that a solution to this problem must take into account the fact that each of us has a different pattern of sound transference around, over and under the head, as well as differing pinnae.

The moral of all this is that if you are interested in exciting, realistic sound reproduction of concert hall music, it does not pay to try to fool your pinnae. If a sound source on a stage is in the center, then when that sound is recorded and reproduced at home it had better come from speakers that are reasonably straight ahead and not from nearby walls, surround or Ambisonic speakers. The traditional equilateral stereophonic listening triangle is quite deficient in this regard. It causes ear-brain image processing confusion for central sound sources because, although both ears get the same full range signal telling the brain that the source is directly ahead, the pinnae are simultaneously reporting that there are higher frequency sound sources at 30 degrees to the left and at 30 degrees to the right. All listeners will hear a center image under these conditions, which is why stereophonic reproduction has lasted 70 years so far, but almost no one would confuse this center image with the real thing. Unfortunately, a recorded discrete center channel and speaker is of little help in this regard. We will see later that such a solution has its own problems and is an unnecessary expense that does nothing for the existing unencoded two-channel recorded library.

Testing Your Single Pinna Power

A very simple experiment demonstrates the ability of a single pinna to sense direction in the front horizontal plane at higher frequencies. Set up a metronome or have someone tap a glass, run water, or shake a rattle about ten feet directly in front of you. Close your eyes and locate the sound source using both ears. Now, keeping your eyes closed, block one ear as completely as possible and estimate how far the apparent position of the sound has moved in the direction of the still-open ear. Most audio practitioners would expect that a sound that is only heard in the right ear would seem to come from the extreme right, but you will find that in this experiment the shift is seldom more than 5 degrees, and if you have great pinnae the source may not move at all. A variation of this experiment is to spin around with your eyes closed and then see how close you come to locating the sound source. In this case the shadowing effect of the head assists the pinna in the process until you are facing the source head on. These are both cases where the single-pinna directional detecting system is stronger than the interaural intensity effect, and they explain why one-eared individuals can still detect sound source positions.

Another moral of this experiment is that for most people, over the higher audible frequency range, which includes most musical transients and harmonics, the one-eared pinna/head directional sense is easily a match for the interaural or two-eared-intensity-time difference localization mechanism. Therefore, all recorded music signals, including direct sound, early reflections, and reverberation had better come from directions that please the pinnae, if you want your brain to accept the listening experience as real.

If you now switch to a fuller range music source, such as a small radio, and repeat the experiment above you will likely hear a greater image shift, since the external ear and head are less important to sound localization as the sound gets down to 400 Hz or so. Even the best stereo systems that seemingly have great localization based on lower frequency interaural time and intensity cues, still sound naggingly unrealistic because of the conflict between the interaural and the intraaural localization mechanisms inherent in the old fashioned stereo triangle.

The Department of the Interior

Eliminate the outer ears, and all the sound will appear to originate inside your head. Do you doubt this? Then open your mouth and hum or sing with your mouth open. You will hear this sound coming from the lip area. Now put both hands over your ears and the sound will jump up into the middle of your skull. Every child has tried this at one time except maybe you. What the effect illustrates is that in the complete absence of pinna and head shape filtering, the brain makes the only perfectly logical decision it can based on the sonic facts. That is, that the sound must originate from a point on the brain side of the eardrum, for how otherwise could the sound have avoided being modified by the pinna, the head, and the ear canal.

Now while listening to running water or other transient rich sound, bring the flat palms of your hands to within a half-inch of both your ears. You will hear the character of the sound change, usually in a manner that makes the sound seem closer to you. The presence of the additional mass and enclosed air trapped between your palm and ear interferes with the resonances in the cavities of the pinna and changes what you think you hear. 

These effects are why it is so difficult to get a natural externalized sound image using earphones. In-the-ear-canal phones, while quite realistic compared to stereo, are especially prone to producing very pronounced internalization. Again, it does not pay to fool pinna nature, and that is why the Ambiophonic method limits itself to using loudspeakers.

I Am Not Alone

Martin D. Wilde, in his paper, "Temporal Localization Cues and Their Role in Auditory Perception" AES Preprint 3798, Oct., 1993 states:

"There has been much discussion in the literature whether human localization ability is primarily a monaural or binaural phenomena. But interaural differences cannot explain such things as effective monaural localization. However, the recognition and selection of unique monaural pinna delay encodings can account for such observed behavior. This is not to say that localization is solely a monaural phenomenon. It is probably more the case that the brain identifies and makes estimates of a sound's location for each ear's input alone and then combines the monaural results with some higher-order binaural processor."

Again, any reproduction system that does not take into account the sensitivity of the pinna to the direction of music incidence will not sound natural or realistic. Two-eared localization is not superior to one-eared localization; the two must agree at all frequencies for realistic concert hall music reproduction.

Pinna and Phantom Images at the Sides

A phantom front center image can be generated by feeding identical in-phase signals to speakers at the front left and front right of a forward facing listener. Despite the inferiority of the phantom illusion, the surround sound crowd would be ecstatic if they could pan as good a phantom image, to the side, in a similar way, by feeding in-phase signals just to a right front and a right rear speaker pair. Unfortunately, phantom images cannot be panned this way between side speakers. The reason realistic phantom side images are difficult to generate is that we are largely dealing with a one-eared hearing situation. Let us assume that for a right side sound only negligible sound is reaching the remote left ear. We already know that the only directional sensing mechanism a one-eared person has for higher frequency sound is the pinna convolution mechanism. Thus if a sound comes from a speaker at 45 degrees to the front, the pinna will locate it there. If, at the same time, a similar sound is coming from 45 degrees to the rear, one either hears two discrete sound sources or one speaker predominates and the image hops backward and forward between them. Of course, some sound does leak around the head to the other ear and depending on room reflections, this affects every individual differently and unpredictably. One can also use Ambisonic or HRTF processing to position side virtual images but such methods usually do not sound realistic where music is concerned.

Apparent Front Stage Width

The sensitivity of the ears to the direction from which a sound originates mandates that, to achieve realistic Ambiophonic reproduction, all signals in the listening room must originate from directions that will not confuse the ear-brain system. Thus if a concert hall has strong early reflections from 55 degrees (as the best halls should) then the home reproduction system should similarly launch such reflections from approximately this direction. In the same vein, much stage sound, particularly that of soloists, originates in the center twenty degrees or so more often than at the extremes. Thus it makes more sense to move the front-channel speakers to where the angle to the listening position is on the order of ten degrees instead of the usual thirty. This eliminates most of the pinna angular position distortion.

One might suppose that, if a main speaker is in front, sounds that are meant to image to the extreme sides will suffer from pinna angle distortion and that we will just have traded the central pinna angle error of the stereo triangle for the side pinna angle error of Ambiophonics. But if you look at the curves of Figures 4.1 and 4.2 you will see that at the wider angles, beyond say 60 degrees, a sound coming from the side has a clear shot at the entrance to the ear canal, and thus the pinna response curve is relatively flat and the resulting error minimal. In practice Ambiophonics readily produces easy-to-listen-to images out to 85 degrees either side of center.

It should also be remembered that, in an Ambiophonic sound field, a seemingly narrower stage is simply equivalent to moving back a few rows in the auditorium and so has not proven to be noticeable. In the same vein, the sensitivity of the pinnae to the directions from which any sound comes dictates that reconstructed or recorded early reflections or reverberant tails attributed to the sides or rear of a concert hall should not come to the home ears from the main front speakers.

Pinna Considerations in Binaural or Stereo Recording

The pinna must be taken into account when recordings are made, particularly recordings made with dummy heads. For example, if a dummy-head microphone has molded ear pinnae then such a recording will only sound exceptionally realistic if played back through earphones that fit inside the ear canal. Even then, since each listener's pinnae are different from the ones on the microphone, most listeners will not experience an optimum binaural effect. On the other hand, if the dummy head does not have pinnae, then the recording should either be played back Ambiophonically, using loudspeakers, or through earphones that stand out in front of the ears far enough to excite the normal pinna effect. (As in the IMAX system, loudspeakers can then be used to provide the lost bass.)

But one must take the head-related effects into account as well. Thus if one uses a dummy head microphone without pinnae, then listening with stereo spaced loudspeakers would produce side image distortion, due to the doubled transmission around, over and under both the microphone head and the listener's head.

The Rule Is:

In any recording/reproduction chain there should be only one set of pinnae, and it had better be yours, and only one, but at least one, head, which need not necessarily be yours.

Normal two-channel recordings, whether LP, CD or DVD, are not inherently old stereo. No recording engineer takes into account the crosstalk and the pinna response errors in reproduction when microphones are selected and spaced. Panning equations used to shift sonic images likewise seldom consider the full extent of HRTF effects. This is fortunate, since the existing library of recordings is thus not obsoleted in the slightest where Ambiophonic reproduction and the pinna are concerned.

Pinna Foolery or Feet of Klayman

Arnold Klayman of SRS and NuReality (like many other companies) has gamely tackled the essentially intractable problem of manipulating parts of a stereo signal to suit the angular sensitivity of the pinna, while still restricting himself to just two loudspeakers. To do this, he first attempts to extract those ambient signals in the recording that should reasonably be coming to the listening position from the side or rear sides. There is really no hi-fi way to do this, but let us assume, for argument's sake, that the difference signal (L-R) is good enough for this purpose, particularly after some Klayman equalization, delay and level manipulation. This extracted ambient information, usually mostly mono by now, must then be passed through a filter circuit that represents the side pinna response for an average ear. Since this pinna-corrected ambience signal is to be launched from the main front speakers, along with the direct sound, these modified ambience signals are further corrected by subtracting the front pinna response from them. The fact that all this legerdemain produces an effect that many listeners find pleasing is an indication that the pinnae have been seriously impoverished by Blumlein stereo for far too long, and is a tribute to Klayman's extraordinary perseverance and ingenuity.
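
To make the difference-signal idea concrete, here is a minimal Python sketch of how subtracting the right channel from the left cancels whatever is common to both channels. It is not Klayman's actual circuit, which also involves equalization, delay and level manipulation. The function name sum_and_difference and all sample values are hypothetical, chosen only for illustration.

```python
def sum_and_difference(left, right):
    """Split a stereo pair into sum (L+R) and difference (L-R) sample lists."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return total, diff

# Hypothetical signals: a centered soloist appears identically in both
# channels, while the hall ambience differs from channel to channel.
center = [0.5, -0.2, 0.3, 0.0]
ambience_left = [0.05, 0.02, -0.04, 0.01]
ambience_right = [-0.03, 0.04, 0.02, -0.02]

left = [c + a for c, a in zip(center, ambience_left)]
right = [c + a for c, a in zip(center, ambience_right)]

total, diff = sum_and_difference(left, right)

# The centered soloist cancels in L-R; only the channel-to-channel
# ambience difference survives, and it is a single (mono) signal.
expected = [a - b for a, b in zip(ambience_left, ambience_right)]
print(diff)
print(expected)
```

Note that the result is one signal, not two, which is why the extracted ambience is described above as mostly mono.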

While Klayman's and other similar boxes cost relatively little and are definitely better than doing nothing at all about pinna distortion, any method that relies on average pinna response or, like matrixed forms of surround sound, attempts to separate early reflections, reverberant fields or extreme side signals from standard or matrixed stereo recordings of music is doomed to only minor success. The Klayman approach must also consider that an average HRTF is required and should be used when launching side images from the front speakers. Someday we will all be able to get our own personal pinna and HRTF responses measured and stored on CD-ROM for use in Klayman-type synthesizers, but until then, the bottom line, for audiophiles, is that the only way to minimize pinna and head-induced image distortion is to give the pinnae what they are listening for. This means launching all signals as much as is feasible from the directions nature intended and requires that pure ambient signals such as early reflections and hall reverberation (uncontaminated with direct sound) come from additional speakers, appropriately located. It implies that recorded ambient signals, inadvertently coming from the front channels, have not been unduly enhanced to the point where the anomaly of rear hall reverb coming strongly from up front causes subconscious confusion. (Most CDs and LPs are fine in this regard but would be improved by a more Ambiophonic recording style.) It means that strong room reflections that allow almost undelayed direct sound to hit the listener from the wrong angle or allow early reflections to come from the sides, the ceiling, the floor or the rear wall, have been eliminated through inexpensive and simple room treatment and/or through the use of focused (point source or collimated) loudspeakers. Finally, it means moving the left and right main loudspeakers much closer together, as discussed in the following chapters.

Two-Eared Pinnae Effects

So far we have been considering single ear and head response effects. Now we want to examine the even more dramatic contribution of both pinnae and the head, jointly, to the interaural hearing mechanism that gives us such an accurate ability to sense horizontal angular position. William B. Snow, a one-time Bell Telephone Labs researcher, in 1953, and James Moir of CBS in Audio Magazine, in 1952, reported that for impulsive clicks or speech and, by extension, music, differences in horizontal angular position as small as one degree could be perceived. For a source only one degree off dead ahead we are talking about an arrival-time difference between the ears of only about ten microseconds and an intensity difference just before reaching the ears so small as not to merit serious consideration. Moir went even further and showed that with the sound source indoors (even at a distance of 55 feet!), and using sounds limited to the frequency band over 3000 Hz, the angular localization got even better, approaching half a degree. It appears that when it comes to the localization of sounds like music, the ear is only slightly less sensitive than the eyes in the front horizontal plane.

It is not a coincidence that the ear is most accurate in sensing position in the high treble range, for this is the same region where we find the extreme gyrations in peaks and nulls due to pinna shape and head diffraction. This is also the frequency region where interaural intensity differences have long been claimed to govern binaural perception. However, it is not the simple amplitude difference in sound arriving at the outer ears that matters, but the difference in the sound at the entrance to the ear canal after pinna convolution.

Going even further, at frequencies in excess of 2000 Hz it is not the average intensity that matters but the differences in the pattern of nulls and peaks between the ears that allow the two-eared person to locate sounds better than the one-eared individual. Remember that at these higher audible frequencies, direct sounds bouncing off the various surfaces of the pinna add and subtract at the entrance to the ear canal. This random and almost unplottable concatenation of hills and deep valleys is further complicated by later but identical sound that arrives from hall (but hopefully not home) wall reflections or from over, under, the front of, or the back of the head. This pattern of peaks and nulls is radically different at each ear canal, and thus the difference signal between the ears is a very leveraged function of both frequency and source position. In their action a pair of pinnae are exquisitely sensitive mechanical amplifiers that convert small changes in incident sound angles to dramatic changes in the fixed, unique picket-fence patterns that each individual's brain has learned to associate with a particular direction.
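
The adding and subtracting described above can be illustrated with a deliberately crude model: the direct sound plus a single delayed copy of itself, as if bounced from one pinna surface. The Python sketch below assumes hypothetical reflection delays of 60 and 70 microseconds and a reflection gain of 0.9. None of these values comes from measurement; they were chosen only so that the first null lands in the treble range, and a real pinna produces many reflections at once.

```python
import math

def comb_gain_db(freq_hz, delay_s, reflection_gain=0.9):
    """Level in dB (relative to the direct sound alone) of the direct
    sound summed with one delayed, attenuated copy of itself."""
    phase = 2.0 * math.pi * freq_hz * delay_s
    real = 1.0 + reflection_gain * math.cos(phase)
    imag = -reflection_gain * math.sin(phase)
    return 20.0 * math.log10(math.hypot(real, imag))

def first_null_hz(delay_s):
    """Lowest frequency at which the delayed copy arrives in antiphase."""
    return 1.0 / (2.0 * delay_s)

# Hypothetical delays standing in for two slightly different source angles.
for delay_us in (60.0, 70.0):
    delay_s = delay_us * 1e-6
    null_hz = first_null_hz(delay_s)
    level_8k = comb_gain_db(8000.0, delay_s)
    print(f"delay {delay_us:.0f} us: first null near {null_hz:.0f} Hz, "
          f"level at 8 kHz {level_8k:.1f} dB")
```

In this toy model a change of only ten microseconds in the reflection delay moves the first null by more than a kilohertz, which is the kind of leverage attributed to the pinnae here.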

Another way of describing this process is to say that the pinna converts small differences in the angle of sound incidence into large changes in the shape of complex waveforms by inducing large shifts in the amplitude and even the polarity of the sinewave components of such waveforms. (Martin D. Wilde, see above, also posits that the pinnae generate differential delays, or what amount to micro reflections or echoes of the sound reaching the ear, and that the brain is also adept at recognizing these echo patterns and using them to determine position. Since such temporal artifacts would be on the order of a few microseconds, it seems unlikely that the brain actually makes use of this time delay data.)

Angular Perception at Higher Frequencies

To put the astonishing sensitivity of the ear in perspective, a movement of one degree in the vicinity of the median plane (the vertical plane bisecting the nose) corresponds to a differential change in arrival time at the ears of only 8 microseconds. Eight microseconds is the period of a 125,000 Hz tone, or a phase shift of about 14 degrees at 5 kHz. I think we can all agree that the ear-brain system could not possibly be responding to such differences directly. But when we are dealing with music that is rich in high-frequency components, a shift of only a few microseconds can cause a radical shift in the frequency location, depths, and heights of the myriad peaks and nulls generated by the pinnae in conjunction with the HRTF. To repeat, it is clear that very large amplitude changes extending over a wide band of frequencies at each ear and between the ears can and do occur for small source or head movements. It is these gross changes in the fine structure of the interference pattern that allow the ear to be so sensitive to source position.
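
The arithmetic behind these figures can be checked with a short Python sketch. It assumes a rigid spherical head of 8.75 cm radius, a sound speed of 343 m/s, and the textbook Woodworth approximation for interaural time difference. These are illustrative assumptions, not values taken from the measurements cited in this chapter.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius, not from the text
SPEED_OF_SOUND = 343.0   # m/s, assumed for room-temperature air

def itd_seconds(azimuth_deg):
    """Woodworth spherical-head estimate of the interaural time difference
    for a distant source at the given azimuth (0 = dead ahead)."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))

def phase_shift_deg(delay_s, freq_hz):
    """Phase shift, in degrees, that a pure delay causes at one frequency."""
    return 360.0 * freq_hz * delay_s

itd_one_degree = itd_seconds(1.0)
print(f"ITD for a source 1 degree off center: {itd_one_degree * 1e6:.1f} microseconds")
print(f"Tone whose period is 8 microseconds: {1.0 / 8e-6:.0f} Hz")
print(f"Phase shift of 8 microseconds at 5 kHz: {phase_shift_deg(8e-6, 5000.0):.1f} degrees")
```

Under these assumptions the one-degree time difference comes out at just under nine microseconds, in line with the eight to ten microseconds quoted in this chapter.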

Thus, just considering frequencies below 10 kHz, at least one null of 30 dB is possible for most people at even shallow source angles, for the ear facing the sound source. Peaks of as much as 10 dB are also common. The response of the ear on the far side of the head is more irregular since it depends on head, nose and torso shapes as well as pinna convolution. One can easily see that a relatively minute shift in the position of a sound source could cause a null at one ear to become a peak while at the same time a peak at the other ear becomes a null, resulting in an interaural intensity shift of 40 dB! When we deal with broadband sounds such as musical transients, tens of peaks may become nulls at each ear and vice versa, resulting in a radical change in the response pattern, which the brain then interprets as position or realism rather than as timbre.

In setting up a home listening system, it is not possible to achieve a realistic concert hall sound field unless the cues provided by the pinnae at the higher frequencies match the cues being provided by the lower frequencies of the music. When the pinna cues don't match the interaural low frequency amplitude and delay cues, the brain decides that the music is canned or that the reproduction lacks depth, precision, presence, and palpability or is vague, phasey, and diffuse. But even after ensuring that our pinnae are being properly serviced, other problems are inherent in the old stereo or new multi-channel surround-sound paradigms. We must still consider and eliminate the psychoacoustic confusion that always arises when there are two or three widely spaced front loudspeakers delivering information about a stage position but erroneously communicating with both pinnae and both ear canals. We must deal with non-pinna induced comb-filter effects and the stage-width limitations still inherent in these modalities even after 64 years. But this is a subject for the next chapter.
