VR audio

From XVRWiki
See also: Oculus Audio SDK

VR audio is a technology that simulates sound in a realistic manner for virtual reality (VR).

Localization is the process by which the human brain, using input signals from the ears, can precisely pinpoint the position of an object in 3D space based only on auditory cues. This characteristic of human biology is useful in many day-to-day activities, and it can also be used to create immersive VR experiences. Indeed, while humans have five senses, only two of these are currently relevant to VR: sight and hearing. Since these are the senses available for developing an immersive experience, they have to be exploited to the fullest by means of high-caliber 3D graphics and truly 3D audio. [1]

Head tracking is a necessity for virtual reality audio. People often pinpoint a sound by slightly moving or rotating the head. Therefore, it is essential to have truly 3D audio in a VR experience. [1][2]

Maintaining the audio cues that the brain needs to correctly localize sound is still a challenge. The ears pick up audio in three dimensions, and the brain processes multiple cues to spatialize the sound. One of the cues is proximity, with the ear closer to the sound source picking up sound waves before the other. Distance is another cue, changing the audio levels. But these cues do not apply to all directions. According to Lalwani (2016), “sounds that emerge from the front or the back are more ambiguous for the brain. In particular, when a sound from the front interacts with the outer ears, head, neck, and shoulders, it gets colored with modifications that help the brain solve the confusion. This interaction creates a response called Head-Related Transfer Function (HRTF), which has now become the linchpin of personalized immersive audio.” A person’s HRTFs are unique, since the anatomy of the ears differs from person to person. [2]

Historically, audio has been a vital part of the computer and video gaming experience. It evolved from simple wave generators to FM synthesis, to 8-bit mono samples and 16-bit stereo samples, to today’s surround sound systems on modern gaming consoles. However, virtual reality is changing the traditional way that sound is used in computer and gaming experiences. VR brings the experience closer to the user through a head-mounted display (HMD) and headphones, and head tracking changes how audio is implemented, making it interdependent with the user’s actions and movements. [3]

With the advent of VR, virtual reality audio has gained more interest. Companies want to implement a VR audio solution that realistically reproduces audio in a virtual environment without being computationally restrictive. The development of PC audio has been more tumultuous than that of graphics, but with the rise of VR, 3D audio is expected to gain traction and prominence. [4][5]

Importance of VR audio[edit]

VR audio is extremely important for increasing the user’s sense of presence by making the experience more immersive. VR developers cannot build a virtual experience that engages only the sense of sight and expect to truly create an immersive environment. For the alternate worlds of VR to become real to the human brain, immersive graphics have to be matched by immersive 3D audio that simulates the natural listening experience. When properly implemented, it can solidify a scene, conveying information about where objects are and what type of environment the user is in. Visual and auditory cues amplify each other, and a conflict between the two will affect immersion. Indeed, truly 3D audio is vital to augment the entire VR experience, taking it to a level that could not be achieved by graphics alone. [1][2][3][4][5][6]

Evolving VR audio[edit]

Les Borsai, VP of Business Development at Dysonics, has made some suggestions to move VR audio technology forward. He focuses mainly on three areas: better VR audio capture, better VR audio editing tools, and better VR audio for games. [6]

Improved VR audio recording means a device that captures true spherical audio for the best reproduction over headphones. This enables the user to hear sounds change relative to head movement and is essential for live-captured immersive content, adding an essential layer of contextual awareness and realism. According to Borsai, “the incorporation of motion restores the natural dynamics of sound, giving your brain a crystal-clear context map that helps you pinpoint and interact with sound sources all around you. These positional audio cues that lock onto the visuals are vital in extending the overall virtual illusion and result in hauntingly lifelike and compelling VR content.” [6]

The second suggestion, better VR audio editing tools, asserts that VR content creators need powerful but easy-to-use tools that encompass all the stages of VR audio production, from raw capture to the finished product. Preferably the solution should be modular and easy to use, since most content creators do not have the skill or time to focus on audio. Borsai’s suggestion of a complete audio stack includes “an 8-channel spherical capture solution for VR, plus post-processing tools that allow content creators to pull apart original audio, placing sounds around a virtual space with customizable 3D spatialization and motion-tracking control.” [6] His final suggestion touches on how significant developments in VR audio will come with the creation of plugins for the major game engines, such as Unity or Unreal. Borsai mentions that audio realism is essential to gaming, and that even the most subtle audio cues allow the player to interact with sound sources around them, resulting in increased overall immersion and natural reaction time. [6]

VR audio and the human auditory system[edit]

Humans depend on psychoacoustics and inference in order to locate sound sources within a three-dimensional space, taking into consideration factors like timing, phase, level, and spectral modifications. The main audio cues that humans use to localize sounds are interaural time differences, interaural level differences, and spectral filtering. [3][7]

Interaural time differences: This relates to the difference in a sound wave’s time of arrival at the left and right ears. The time difference varies according to the sound’s origin in relation to the person’s head. [7]

Interaural level differences: Humans are not able to discern the time of arrival of sound waves at higher frequencies. The level (volume) differences between the ears are used for frequencies above 1.5 kHz in order to identify the sound’s direction. [7]

Spectral filtering: The outer ears modify the sound’s frequencies depending on the direction of the sound. The alterations in frequency are used to determine the elevation of a sound source. [7]
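The interaural time difference described above can be illustrated with a simple spherical-head model. The sketch below uses Woodworth's classic approximation; the head radius and speed of sound are assumed values for illustration, not figures from this article.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air at ~20 °C (assumed)
HEAD_RADIUS = 0.0875     # m, typical adult head radius (assumed)

def interaural_time_difference(azimuth_deg):
    """Woodworth's spherical-head approximation of the ITD.

    azimuth_deg: source direction in the horizontal plane,
    0 = straight ahead, 90 = directly to one side.
    Returns the extra travel time (seconds) to the far ear.
    Valid for azimuths between 0 and 90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

# A source straight ahead reaches both ears simultaneously; a source
# directly to the side arrives at the far ear roughly 0.65 ms later.
print(interaural_time_difference(0))
print(interaural_time_difference(90))
```

That sub-millisecond difference is within the brain's resolution for low-frequency sounds, which is why the cue works at all.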

Researchers have been tackling the VR audio problem, trying to measure the individual audio modifications that allow the brain to localize simulated sounds with precision. In VR, the visual setting is predetermined, and the audio is best generated on a rendering engine that attaches sound to objects as they move and interact with the environment. Lalwani (2016) notes that “this object-based audio technique uses software to assign audible cues to things and characters in 3D space.” [2]

Head-related Transfer Functions (HRTFs)[edit]

The HRTF is the foundation for the majority of current 3D sound spatialization techniques. Spatialization, the ability to reproduce a sound as if positioned at a specific place in a 3D environment, is an essential part of VR audio and a vital aspect of producing a sense of presence. Direction and distance are spatialization's main components. Depending on its direction, a sound is modified differently by the human body and ear geometry, and these effects form the basis of the HRTFs used to localize it. [3]
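In practice, HRTF-based spatialization is commonly implemented by convolving a mono source with a pair of head-related impulse responses (HRIRs), one per ear, measured for the desired direction. A minimal NumPy sketch; the two toy HRIRs here are made-up placeholders standing in for real measured responses.

```python
import numpy as np

def spatialize(mono, hrir_left, hrir_right):
    """Position a mono signal by convolving it with the left- and
    right-ear head-related impulse responses for one direction."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])   # binaural output, shape (2, N)

# Toy HRIRs (assumed, for illustration only): the right ear hears the
# source two samples later and quieter, as if it came from the left.
hrir_l = np.array([1.0, 0.0, 0.0])
hrir_r = np.array([0.0, 0.0, 0.7])
stereo = spatialize(np.array([1.0, 0.5]), hrir_l, hrir_r)
```

A real renderer would interpolate between measured HRIRs as the head-tracked direction changes, but the per-ear convolution is the core operation.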

Accurately capturing an HRTF requires an individual with microphones placed in the ears inside an anechoic chamber. Once inside, sounds are played from every direction necessary and recorded by the microphones. Comparing the original sound with the recorded one allows for the computation of the HRTF. To build a usable sample set of HRTFs, a sufficient number of discrete sound directions need to be captured. [3]
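The comparison step above can be sketched as a frequency-domain ratio: dividing the spectrum of the signal recorded at the ear by the spectrum of the signal that was played yields the transfer function for that direction. This is a simplified illustration; real measurement pipelines also compensate for the loudspeaker and microphone responses.

```python
import numpy as np

def estimate_hrtf(played, recorded):
    """Estimate one ear's HRTF for one direction as the ratio of the
    recorded spectrum to the played spectrum."""
    eps = 1e-12  # guard against division by zero in empty bins
    return np.fft.rfft(recorded) / (np.fft.rfft(played) + eps)

# Synthetic check: if the "ear" hears a delayed, half-volume copy of
# the test signal, the HRTF magnitude is 0.5 at every frequency and
# the delay shows up only in the phase.
played = np.zeros(64)
played[0] = 1.0                      # unit impulse test signal
recorded = np.roll(played, 3) * 0.5  # 3-sample delay, -6 dB
hrtf = estimate_hrtf(played, recorded)
```

Repeating this for each measured direction and each ear produces the discrete HRTF sample set the article describes.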

While custom HRTFs to match a person’s body and ear geometry would be ideal, it is not a practical solution. HRTFs are similar enough from one person to the other to allow for a generic reference set that is adequate for most situations, particularly when combined with head tracking. There are different publicly available datasets for HRTF-based spatialization implementations such as the IRCAM Listen Database, MIT KEMAR, CIPIC HRTF Database, and ARI (Acoustics Research Institute) HRTF Database. [3]

While HRTFs help to identify a sound’s direction, they do not model the localization of distance. Several factors affect how humans infer the distance to a sound source, which can be simulated with different levels of accuracy and computational cost. These are loudness, initial time delay, direct vs. reverberant sound, motion parallax, and high-frequency attenuation. [3]
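Of the distance cues listed above, loudness and high-frequency attenuation are the cheapest to simulate. The sketch below applies the inverse-distance law for gain plus a crude linear-in-distance air-absorption term; the reference distance and absorption coefficient are assumed illustrative values, not standardized air-absorption curves.

```python
REFERENCE_DISTANCE = 1.0      # m at which gain is 1.0 (assumed)
ABSORPTION_DB_PER_M = 0.02    # high-frequency loss per meter (assumed)

def distance_gain(distance_m):
    """Inverse-distance law: each doubling of distance loses ~6 dB."""
    return REFERENCE_DISTANCE / max(distance_m, REFERENCE_DISTANCE)

def high_frequency_gain(distance_m):
    """Extra attenuation applied only to the high band, so distant
    sounds also get duller, not just quieter."""
    return 10 ** (-ABSORPTION_DB_PER_M * distance_m / 20)
```

A source at 2 m would thus be rendered at half amplitude with its high band rolled off slightly further; a fuller solution would add the initial time delay, direct-to-reverberant ratio, and motion parallax from the list above.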

Google and Valve’s VR audio[edit]

Google uses a technology called ambisonics to simulate sounds coming from virtual objects. The system surrounds the user with a high number of virtual loudspeakers that reproduce sound waves coming from all directions in the VR environment. The accuracy of the synthesized sound waves is directly proportional to the number of virtual loudspeakers. These are generated through the use of HRTFs. [7]
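The virtual-loudspeaker idea can be sketched with first-order ambisonics: a mono sample is encoded into four B-format components (W, X, Y, Z), and each virtual speaker's feed is a weighted sum of them. This is a minimal first-order sketch in the traditional FuMa convention, not Google's actual implementation, which uses higher orders and HRTF rendering of each virtual speaker.

```python
import math

def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode a mono sample into first-order B-format (FuMa weights)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2)                  # omnidirectional
    x = sample * math.cos(az) * math.cos(el)   # front-back axis
    y = sample * math.sin(az) * math.cos(el)   # left-right axis
    z = sample * math.sin(el)                  # up-down axis
    return w, x, y, z

def virtual_speaker_feed(bformat, speaker_az_deg, speaker_el_deg):
    """Basic decode of one B-format sample to one virtual loudspeaker:
    a cardioid-like pickup aimed at the speaker's direction."""
    w, x, y, z = bformat
    az = math.radians(speaker_az_deg)
    el = math.radians(speaker_el_deg)
    return 0.5 * (math.sqrt(2) * w
                  + x * math.cos(az) * math.cos(el)
                  + y * math.sin(az) * math.cos(el)
                  + z * math.sin(el))
```

A source encoded straight ahead comes out at full level from a front virtual speaker and at zero from a rear one; rendering each virtual speaker through an HRTF then yields the binaural output described above.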

Valve has made available the Steam Audio SDK, a free option for developers who want to use VR audio in their apps. Steam Audio supports Unity and Unreal Engine, and is available for Windows, Linux, macOS, and Android. Furthermore, it is not restricted to a specific VR device or to Steam. In a statement, Valve said that “Steam Audio is an advanced spatial audio solution that uses physics-based sound propagation in addition to HRTF-based binaural audio for increased immersion. Spatial audio significantly improves immersion in VR; adding physics-based sound propagation further improves the experience by consistently recreating how sound interacts with the virtual environment.” [4]

History[edit]

Before the current emergence of VR, interest in 3D audio was relatively low. Although sound has consistently improved over the years in terms of fidelity and signal-to-noise ratio, the real-time modeling of sound in a 3D space has not experienced the same level of consistent development. The true challenge for VR audio has been “reproducing the dynamic behavior of sound in a 3D space in real time.” The sound source and listener have to be computed in a 3D space (spatialization), so that as their positions change, the prerecorded audio samples are altered to adjust to the new spatial positions. Besides spatialization, the system also has to take into account the modifications made to a sound while it travels through an environment: the sound can be reflected, absorbed, blocked, or echoed. These effects on the sound are called audio ambiance, and accounting for all of them becomes computationally intensive. [1]

The capacity to create immersive, realistic VR audio already existed in the 1990s with a technology called A3D 2.0, developed by a company called Aureal. Mark Chase, in an article written for PC Gamer, said that “much of this technology relied on head-related transfer functions (or HRTFs), mathematical algorithms that take into account how sound from a 3D source enters the head based on ear and upper-body shape. This essentially helps replicate the auditory cues that allow us to pinpoint, or localize, where a sound is coming from.” [1]

The development of 3D audio was affected by a legal action from Creative against Aureal for patent infringement. The cost of the legal action damaged Aureal financially, leaving the company too crippled to continue. Creative then continued research on 3D audio, built on the backbone of DirectSound and DirectSound3D. [1]

DirectSound and DirectSound3D created a standardized, unified environment for 3D audio, helping it grow as a technology and be easily used by developers. It also allowed for the hardware acceleration of 3D sound. When Microsoft released Windows Vista, it stopped supporting DirectSound3D, affecting years of development by Creative. [1]

But with the advent of VR, audio that can truly simulate natural sound has become a research priority. In 2014, Oculus licensed VisiSonic’s RealSpace 3D audio technology, incorporating it into the Oculus Audio SDK. This technology follows the same principle that Aureal’s system used decades before, relying on custom HRTFs to recreate accurate spatialization over headphones. [1]

Microphones[edit]

AMBEO VR Mic

Dysonics RondoMic

References[edit]

  1. Chase, M. (2016). How VR is resurrecting 3D audio. Retrieved from http://www.pcgamer.com/how-vr-is-resurrecting-3d-audio/
  2. Lalwani, M. (2016). For VR to be truly immersive, it needs convincing sound to match. Retrieved from https://www.engadget.com/2016/01/22/vr-needs-3d-audio/
  3. Oculus. Introduction to virtual reality audio. Retrieved from https://developer.oculus.com/documentation/audiosdk/latest/concepts/book-audio-intro
  4. Lang, B. (2017). Valve launches free Steam Audio SDK beta to give VR apps immersive 3D sound. Retrieved from https://www.roadtovr.com/valve-launches-free-steam-audio-sdk-beta-give-vr-apps-immersive-3d-sound/
  5. Lang, B. (2017). Oculus to talk “Breakthroughs in spatial audio technologies” at Connect Conference. Retrieved from https://www.roadtovr.com/oculus-talk-breakthroughs-spatial-audio-technologies-connect-conference/
  6. Borsai, L. (2016). This is why it’s time for VR audio to shine. Retrieved from https://www.roadtovr.com/this-is-why-its-time-for-vr-audio-to-shine/
  7. Google. Spatial audio. Retrieved from https://developers.google.com/vr/concepts/spatial-audio