3D Audio depends on how sound is processed as it reaches and passes through a listener's head. The geometry of a listener's anatomy, from the upper torso to the shape of the head and ears, affects various qualities of the final sound. As a result, each person hears sounds differently and thus relies on different personal cues for audio localization. Rather than set up and calibrate a model for each individual who wants to experience virtual 3D Audio (a resource-intensive and nigh impossible feat for the likes of game developers), we attempt to find a generalized, non-personalized model that can still deliver the 3D Audio effect.
Using direct auditory devices such as headphones presents a different set of working conditions compared to speakers. Because headphones play sound directly into a listener's ears, the sound will bypass a majority of the physical factors to which environmental sounds are subject. Without this processing, the human brain is unable to pick up precise directional cues. As a result, without optimization the source of sounds heard through headphones will often seem to originate within the listener's head. However, 3D Stereo Audio is achievable through binaural means precisely because headphones bypass most physical transformations. By strategically attenuating the audio signal for each ear, it is possible to recreate the illusion of a directional sound.
There are two different approaches we can take to modelling binaural sound. The first is a simple 2-D geometric model. This model treats the ears as two collinear points and uses the inverse law for pressure waves, together with simple trigonometry, to compute the amplitude and phase differences between the two ears for a given sound origin. The sound is then attenuated individually for each ear with the respective parameters to produce the illusion of a directional sound.
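The geometric model described above can be sketched as follows. This is a minimal illustration, not the exact implementation used in the experiment: the head radius, sample rate, and normalization are assumptions, and the per-ear phase difference is approximated as a whole-sample delay.

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s, at room temperature (assumption)
HEAD_RADIUS = 0.0875     # m, approximate half inter-ear distance (assumption)
SAMPLE_RATE = 44100      # Hz (assumption)

def geometric_binaural(mono, angle_deg, distance=1.0):
    """Attenuate and delay a mono signal for each ear using the simple
    2-D geometric model: ears as two collinear points, amplitude
    falling off with the inverse law for pressure waves."""
    angle = np.radians(angle_deg)  # 0 deg = listener's right, 90 = front
    src = distance * np.array([np.cos(angle), np.sin(angle)])
    left_ear = np.array([-HEAD_RADIUS, 0.0])
    right_ear = np.array([HEAD_RADIUS, 0.0])
    channels = []
    for ear in (left_ear, right_ear):
        d = np.linalg.norm(src - ear)
        gain = 1.0 / d                    # inverse law for pressure amplitude
        delay = int(round(d / SPEED_OF_SOUND * SAMPLE_RATE))  # phase as sample delay
        channels.append(np.concatenate([np.zeros(delay), gain * mono]))
    # Pad both channels to equal length and stack as stereo (left, right)
    n = max(len(c) for c in channels)
    return np.stack([np.pad(c, (0, n - len(c))) for c in channels], axis=1)
```

A sound placed at 0 degrees (the listener's right) then comes out louder and earlier in the right channel than in the left, which is exactly the interaural level and time difference the model relies on.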
The second model is the generalized Head Related Transfer Function (HRTF), which creates an even more realistic sense of directional sound. HRTFs are obtained by taking the Head Related Impulse Response (HRIR) and applying a Fourier Transform, producing a transfer function that can be applied to any desired sound. Compared to the simple geometric model, the HRTF captures additional locational cues, such as reverberation, that simple amplitude and phase changes to the source sound cannot account for. However, the downsides of the HRTF lie in its complexity: a separate function is required for each ear, and the most precise way of obtaining it is through measurement on each individual. Thus, rather than take measurements from everyone wishing to experience 3D Audio, mathematically deriving a general HRTF from an approximation of the human head should, in theory, be sufficient for the average listener.
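The HRIR-to-HRTF relationship can be sketched briefly. The function names and the toy impulse responses below are illustrative, not a real measured dataset: in practice the HRIRs would come from per-subject or generalized measurements, one per ear and per direction.

```python
import numpy as np

def hrtf_from_hrir(hrir, n_fft=512):
    """Fourier-transform a Head Related Impulse Response (time domain)
    into the Head Related Transfer Function (frequency domain)."""
    return np.fft.rfft(hrir, n=n_fft)

def apply_hrir(mono, hrir_left, hrir_right):
    """Filter a mono signal with a separate impulse response per ear.
    Time-domain convolution with the HRIR is equivalent to multiplying
    by the HRTF in the frequency domain."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    n = max(len(left), len(right))
    left = np.pad(left, (0, n - len(left)))
    right = np.pad(right, (0, n - len(right)))
    return np.stack([left, right], axis=1)  # stereo (left, right)
```

Because each ear gets its own filter, direction-dependent effects like pinna coloration and head shadowing are baked into the measured responses rather than modeled geometrically.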
Due to the symmetries inherent in the geometric method, testing was limited to 0-180 degrees for the sake of feasible accuracy: any angle beyond 180 degrees would be indistinguishable from its mirror image because of the cone of confusion (for example, 135 and 225 degrees would sound identical).
Volunteers were asked to identify the direction of the sound from 0 to 180 degrees, nominally in increments of 45 degrees (though they could report whatever angle they felt fit), with 0 degrees being the listener's right and 180 degrees their left. The following graph shows the results of the experiment:
As can be seen, there is considerable variance between the actual projected angle and the angle perceived by the volunteers, with differences of 45 degrees or more. As expected, the most distinguishable sounds were at 0 and 180 degrees. The 90-degree case (straight ahead) was more difficult than expected: rather than being perceived as a projection from the front, the sounds played at equal volume and timing in both ears, creating a delocalized effect.
Also interesting to note are the volunteers' opinions on the sound. Many reported feeling that one ear was better at hearing than the other, at least in the context of the experiment (e.g. 0 degrees was far more noticeable than 180 degrees). There are several plausible explanations. One possibility is that the sound card on my laptop does not play each audio channel at exactly the same level. Although the earphones are also a factor, this effect was replicated across different sets of listening devices, making this possibility less likely. It may also be a case of aural lateralization: just as people tend to have a dominant hand and a dominant eye, perhaps auditory directional cues rely on unequal hearing to better pinpoint a direction and mitigate confusion.
The lack of success is most likely attributable to the absence of secondary cues such as reverberation, as well as the roughness of the geometric tuning. While the former matters less in a wide, open area such as a field, where reflective surfaces are scarce, the rest of the head still affects the sound, and this quality is unaccounted for in geometric tuning. Additionally, said tuning is based on simple geometry; with a more accurate model, perhaps more realistic 3D Audio can be achieved.