What is 6DoF audio?

When we are out and about, we hear sounds around us in 3D and can locate the directions and distances of sound sources. The way we perceive these directions and distances depends on our own position as well as on the positions of the sound sources relative to our head.

Modern 3D audio rendering systems can provide a good level of realism and immersion both for synthetic content and for real recordings of sound fields. However, the case in which the listener moves translationally away from their original position can currently be handled only for synthetic content, such as VR, AR and games.

Emerging media formats such as MPEG-I promise position-independent immersive video content within which a viewer can not only rotate their head but also move translationally. 6DoF audio (or position-independent audio) is the term used for the audio content that accompanies such immersive video. This content calls for real recordings of both video and audio, and brings unprecedented challenges with respect to the theory and technical aspects of such recordings. Specific research questions in the field include:

  • How can we record an acoustic scene with a finite number of microphones, yet obtain a high-accuracy approximation of the scene at intermediate points?
  • How can we store, process, encode and transmit such content with a sufficiently low latency to allow interactive operation?
  • How can we transcode between different representations (e.g. scene-based, object-based and binaural) of the same scene? A minimal example of one such mapping is sketched after this list.
  • How do we measure the perceptual quality, immersiveness, fluency, and responsiveness of position-independent spatial audio?

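To make the transcoding question above more concrete, the following minimal sketch shows one direction of such a mapping: encoding an object-based source (a mono signal with an azimuth and elevation) into a first-order Ambisonics, i.e. scene-based, representation. The ACN channel order, SN3D normalisation and all function names here are illustrative assumptions for this sketch, not a reference to any particular codebase or standard implementation.

    # Minimal sketch (assumed names): encode a mono object source into
    # first-order Ambisonics using the ACN channel order and SN3D normalisation.
    import numpy as np

    def encode_object_to_foa(signal, azimuth_deg, elevation_deg):
        """Return a (4, N) array of first-order Ambisonics channels (W, Y, Z, X)."""
        az = np.radians(azimuth_deg)
        el = np.radians(elevation_deg)
        # Real spherical-harmonic gains for a plane-wave source (ACN/SN3D).
        gains = np.array([
            1.0,                      # W (omnidirectional)
            np.sin(az) * np.cos(el),  # Y (left-right)
            np.sin(el),               # Z (up-down)
            np.cos(az) * np.cos(el),  # X (front-back)
        ])
        return gains[:, None] * np.asarray(signal)[None, :]

    # Example: a 1 kHz tone placed 45 degrees to the left, at ear height.
    fs = 48000
    t = np.arange(fs) / fs
    foa = encode_object_to_foa(np.sin(2 * np.pi * 1000 * t),
                               azimuth_deg=45, elevation_deg=0)

A corresponding scene-to-binaural step would then render these channels through head-related transfer functions for the listener's current orientation and position, which is where listener movement enters the pipeline.
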
This project aims to answer some of these questions, develop novel solutions, and demonstrate their use in real-life recording scenarios, thereby furthering the state of the art in 6DoF audio.