Visually Guided Hearing Aid: Hearing in plain sight


The Visually Guided Hearing Aid (VGHA) is an intriguing concept that spans several disciplines, and the complexity of the technologies it brings together makes it a project not for the faint of heart.

Although not a new concept, the VGHA, which grew out of ongoing work on spatial hearing at the Psychoacoustics Laboratory at Boston University, is a research prototype that incorporates acoustic beamforming, a technique that combines the signals from multiple microphones to focus amplification on sound arriving from a desired direction.

In the VGHA, the beam is steered by the user's gaze, improving the user's ability to focus on a sound despite competing sounds nearby.

“One of the early concepts that drove this work was the discovery that people can listen very selectively in space,” explained Gerald Kidd Jr., PhD, professor in the Department of Speech, Language and Hearing Sciences and director of the Psychoacoustics Laboratory at Boston University. “Along the left-to-right dimension, that is, azimuth, they can focus their attention on a particular point of interest and attenuate sound sources that are off-axis from the center of attention.”

Evidence for this spatial tuning effect in natural hearing, first described in 2000,1 then led to studies of hearing loss and of ways to improve hearing aids.

“Looking at how we could improve people’s hearing in situations with multiple spatially distributed sound sources, which has always been a problem for people who wear hearing aids, we worked on designing an algorithm to reproduce normal spatial hearing, that is, separating sources in azimuth, for those who cannot hear well,” Kidd said.

Down the Beamforming Path

Kidd and his colleagues worked with researchers from Sensimetrics Corporation of Malden, MA, who had studied beamforming as a way to improve hearing aids by raising the signal-to-noise ratio for sounds directly in front of the beamformer.

This research extended earlier work at the Research Laboratory of Electronics at the Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts.2 One option was to mount the microphones on an eyeglass frame.

In collaboration with Joseph Desloge, PhD, the 2 groups identified a problem with a beamformer mounted on an eyeglass frame: its beam could not be redirected independently of the head, for example, to follow a conversation among a group of people.

The listener’s head would have to turn each time a different person spoke to point the beamformer toward the talker.

Other ways of steering the beamformer were also considered, such as a manual dial or commands from a phone, to move the beam in a desired direction.

However, the most natural way to direct the beam was in plain sight: steering it with eye gaze. This linked auditory and visual attention, so that the 2 moved in tandem from left to right as the sound source changed.

It was seen as a powerful way to improve amplification for people who are hard of hearing, Kidd explained.

VGHA components

To accomplish this, Kidd and Desloge ultimately settled on a system using signals from a commercially available eye tracker combined with a bespoke microphone array consisting of 4 rows of 4 recessed microphones on a flexible strip that could be positioned across the top of the user’s head.3

Kidd explained that the microphone outputs (the acoustic component) are combined using a beamforming algorithm of the kind developed by the MIT group, among others.

“This configuration optimizes the response of the microphone array to the direction chosen by the user,” he said.

The signals from the microphone array are processed to respond maximally to the azimuth of the user’s gaze, as detected by the eye tracker.
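The steering idea can be sketched with the simplest beamformer, delay-and-sum, under assumptions that do not come from the source: a straight line of microphones along the left-right axis, a far-field source, free-field propagation, and a speed of sound of 343 m/s. The function names and the 4-microphone, 15 cm layout are hypothetical, and the head-related filtering the VGHA actually uses is ignored.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed


def arrival_offsets(mic_x, azimuth_deg, c=SPEED_OF_SOUND):
    """Arrival-time offset (s) of a far-field plane wave at each microphone.

    mic_x: positions (m) on an assumed straight left-right line, 0 = array center.
    azimuth_deg: 0 = straight ahead, positive = toward the listener's right.
    Microphones on the side nearer the source hear it earlier (negative offset).
    """
    return -np.asarray(mic_x, dtype=float) * np.sin(np.deg2rad(azimuth_deg)) / c


def simulate_plane_wave(source, mic_x, fs, azimuth_deg):
    """Hypothetical test input: delay one source signal by each mic's offset.

    Uses circular, band-limited delays via the FFT; odd lengths avoid a Nyquist bin.
    """
    n = source.size
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    offsets = arrival_offsets(mic_x, azimuth_deg)
    spectra = np.fft.rfft(source)[None, :] * np.exp(-2j * np.pi * np.outer(offsets, freqs))
    return np.fft.irfft(spectra, n=n, axis=1)


def delay_and_sum(signals, mic_x, fs, look_azimuth_deg):
    """Steer a delay-and-sum beam toward look_azimuth_deg.

    signals: (num_mics, num_samples). Each channel is shifted to undo its
    arrival offset for the look direction and the channels are averaged, so
    sound from that direction adds coherently and sound from elsewhere does not.
    """
    n = signals.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectra = np.fft.rfft(signals, axis=1)
    offsets = arrival_offsets(mic_x, look_azimuth_deg)
    aligned = spectra * np.exp(2j * np.pi * np.outer(offsets, freqs))
    return np.fft.irfft(aligned.mean(axis=0), n=n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 16000
    mic_x = np.linspace(-0.075, 0.075, 4)  # hypothetical 4-mic, 15 cm array
    talker = rng.standard_normal(4001)
    mics = simulate_plane_wave(talker, mic_x, fs, azimuth_deg=30.0)
    for look in (30.0, 0.0, -30.0):
        out = delay_and_sum(mics, mic_x, fs, look)
        print(f"look {look:+.0f} deg: output power {np.mean(out**2):.3f}")
```

In a gaze-steered system, `look_azimuth_deg` would simply be updated from the eye tracker on every processing block.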

The eye tracker must have both an outward-facing world-view camera and an inward-facing camera that tracks the location of the pupil.

Used together, the 2 cameras are calibrated so that the position of the pupil can be mapped to the point in the outward camera’s view where the eyes are looking, he explained.
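How a gaze point in the outward camera's image could become a steering angle can be sketched with an ideal pinhole-camera model. This is an illustrative assumption: it supposes the world-view camera looks straight ahead of the head, has its principal point at the image center, and has a known horizontal field of view. The function name and the 1280-pixel, 90 degree example are hypothetical; real eye trackers rely on their own calibration procedures.

```python
import math


def gaze_pixel_to_azimuth(u_px, image_width_px, horizontal_fov_deg):
    """Map a horizontal gaze coordinate in the world-view image to an azimuth.

    Ideal pinhole camera, optical axis pointing straight ahead of the head,
    principal point at the image center. Returns degrees: 0 = straight ahead,
    positive = right of center, negative = left.
    """
    cx = image_width_px / 2.0
    # Focal length in pixels implied by the horizontal field of view.
    fx = cx / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    return math.degrees(math.atan((u_px - cx) / fx))


# Example: a gaze point 3/4 of the way across a 1280-pixel-wide frame.
print(round(gaze_pixel_to_azimuth(960, 1280, 90.0), 1))  # -> 26.6
```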

The associated software contains a pre-measured set of head-related impulse responses.

“These responses provide the values across frequency for the algorithm that determines the optimal phase response, or time delay, and the amplitude used to weight each microphone’s response to optimize the response at a particular azimuth,” Kidd said.
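One standard way to turn measured transfer functions into per-frequency amplitude and phase weights is a minimum-variance distortionless-response (MVDR) beamformer. It passes the chosen azimuth unchanged while minimizing output power from other directions. This is a sketch of that general technique, not the VGHA's published algorithm. The noise model (uncorrelated sources at every measured azimuth, a rough stand-in for a diffuse field), the diagonal-loading value, and the function name are assumptions.

```python
import numpy as np


def mvdr_weights(H, target_idx, loading=1e-3):
    """Per-frequency beamformer weights from measured transfer functions.

    H: complex array (num_azimuths, num_freqs, num_mics) holding each
       microphone's transfer function (FFT of a measured impulse response)
       for every measured source azimuth.
    target_idx: index of the azimuth the beam should favor.
    Returns W of shape (num_freqs, num_mics); beam output is Y(f) = W[f]^H X(f).
    """
    num_az, num_freqs, num_mics = H.shape
    W = np.empty((num_freqs, num_mics), dtype=complex)
    for k in range(num_freqs):
        Hk = H[:, k, :]  # (num_az, num_mics): one steering vector per azimuth
        # Assumed noise covariance: average of d d^H over all measured azimuths.
        R = Hk.T @ Hk.conj() / num_az
        # Diagonal loading keeps the solve well conditioned.
        R = R + loading * np.real(np.trace(R)) / num_mics * np.eye(num_mics)
        d = Hk[target_idx]
        Rinv_d = np.linalg.solve(R, d)
        # Normalize so the target direction has unit gain (distortionless).
        W[k] = Rinv_d / (d.conj() @ Rinv_d)
    return W
```

For one short-time spectral frame `X` of shape `(num_mics, num_freqs)`, the beam output would be `np.einsum("km,mk->k", W.conj(), X)`. The weights would be computed once per measured azimuth and stored, which matches the idea of a pre-measured set that is simply looked up at run time.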

The impulse responses were recorded with the system mounted on a manikin, using sources at a number of different locations in the frontal hemifield, and the resulting spatial resolution was good.

When the system is worn, the eye tracker detects the angle at which the user’s eyes are pointed, and the software selects the head-related transfer function corresponding to that azimuth and convolves it with the incoming sound.
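The run-time step described here, looking up the stored filters for the gazed-at direction and convolving them with the microphone signals, can be sketched as a filter-and-sum operation. The dictionary layout of the filter bank, the nearest-neighbor lookup, and the function names are hypothetical; the source does not describe the VGHA's actual data structures.

```python
import numpy as np


def nearest_azimuth(gaze_az_deg, measured_az_deg):
    """Pick the measured azimuth closest to the current gaze direction."""
    measured = np.asarray(measured_az_deg, dtype=float)
    return float(measured[np.argmin(np.abs(measured - gaze_az_deg))])


def filter_and_sum(mic_signals, filter_bank, gaze_az_deg):
    """Filter each microphone with the FIR set for the gazed-at azimuth, then sum.

    mic_signals: (num_mics, num_samples) real array.
    filter_bank: hypothetical dict {azimuth_deg: (num_mics, num_taps) array},
                 one FIR filter per microphone for each measured azimuth.
    Returns the summed output of length num_samples + num_taps - 1.
    """
    az = nearest_azimuth(gaze_az_deg, list(filter_bank))
    taps = filter_bank[az]
    # Convolve each microphone with its own filter, then add the channels.
    return sum(np.convolve(x, h) for x, h in zip(mic_signals, taps))
```

A real-time version would process short blocks with overlap-add and crossfade between filter sets when the gaze moves, to avoid audible clicks.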

“It provides a highly directional response, tuned to sound coming from that location,” he said.

The filtering of the VGHA is sharpest at high frequencies (short wavelengths) and broadest at low frequencies (long wavelengths), so the beamformer is most precisely tuned at high frequencies.
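The frequency dependence follows from wavelength: the same array aperture is large relative to a short high-frequency wavelength but small relative to a long low-frequency one. A free-field delay-and-sum model shows the trend. Its assumptions (4 microphones on a 15 cm straight line, no head, far-field sources, c = 343 m/s) do not come from the source, so the numbers illustrate the principle only and are not VGHA measurements.

```python
import numpy as np


def array_gain_db(freq_hz, azimuth_deg, mic_x, look_deg=0.0, c=343.0):
    """Far-field gain (dB) of a delay-and-sum beam steered to look_deg.

    0 dB means no attenuation relative to a source in the look direction.
    """
    mic_x = np.asarray(mic_x, dtype=float)
    # Residual per-mic timing mismatch between the source and look directions.
    dt = mic_x * (np.sin(np.deg2rad(azimuth_deg)) - np.sin(np.deg2rad(look_deg))) / c
    response = np.mean(np.exp(2j * np.pi * freq_hz * dt))
    return 20.0 * np.log10(max(abs(response), 1e-12))


def half_beamwidth_deg(freq_hz, mic_x, step_deg=0.1):
    """Angle off straight ahead where the gain first falls below -3 dB.

    Returns None if the gain never drops that far within 90 degrees, i.e. the
    beam is wider than the frontal hemifield at this frequency.
    """
    for az in np.arange(0.0, 90.0 + step_deg, step_deg):
        if array_gain_db(freq_hz, az, mic_x) < -3.0:
            return float(az)
    return None


if __name__ == "__main__":
    mic_x = np.linspace(-0.075, 0.075, 4)  # hypothetical 4 mics over 15 cm
    for f in (250, 500, 1000, 2000, 4000):
        hbw = half_beamwidth_deg(f, mic_x)
        width = "wider than +/-90 deg" if hbw is None else f"+/-{hbw:.1f} deg"
        print(f"{f:5d} Hz: gain at 30 deg {array_gain_db(f, 30, mic_x):6.1f} dB, "
              f"-3 dB width {width}")
```

Off-axis sounds are therefore cut more at high frequencies than at low ones, which is the pattern of attenuation described for sounds outside the beam.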

Current state of the VGHA

Kidd explained that sounds falling outside the focus of the beam are strongly attenuated at high frequencies and less attenuated at low frequencies, depending on how far they are from the focus.

“It usually degrades the quality a bit for sounds that are off-axis,” he said. “However, the beamformer can provide a great improvement in signal-to-noise ratio for nearby sound sources and in rooms with good acoustics.”

The technology appears to be most effective in a small conference room with people seated around a table, or in a “cocktail party” scenario, where selective listening becomes difficult when multiple people speak at the same time.

In larger venues, such as a concert hall or theater, the technology would not currently help a listener pick out 1 talker from among others at a greater distance.

“The basic physics of this technology is that it will be more beneficial to nearby sound sources,” Kidd said.
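A simple diffuse-field room model offers one way to reason about why distance matters. This is an illustration, not an analysis from the source: direct sound weakens with distance while reverberation stays roughly constant, so a beam that favors the direct path has less to work with far from the talker. The Sabine-based critical-distance estimate is a textbook approximation, and the room sizes, reverberation times, and the 6 dB directivity index in the example are hypothetical.

```python
import math


def critical_distance_m(volume_m3, t60_s, source_q=1.0):
    """Distance (m) where direct and reverberant energy are equal.

    Classic Sabine-based estimate r_c ~ 0.057 * sqrt(Q * V / T60), metric units.
    """
    return 0.057 * math.sqrt(source_q * volume_m3 / t60_s)


def direct_to_reverberant_db(distance_m, volume_m3, t60_s, directivity_index_db=0.0):
    """Direct-to-reverberant ratio (dB) at the listening position.

    Direct sound falls about 6 dB per doubling of distance while the diffuse
    reverberant level stays roughly constant. A directional pickup with
    directivity index DI favors on-axis sound over diffuse sound by about DI dB.
    """
    rc = critical_distance_m(volume_m3, t60_s)
    return 20.0 * math.log10(rc / distance_m) + directivity_index_db


if __name__ == "__main__":
    # Hypothetical rooms and a hypothetical 6 dB beamformer directivity index.
    scenes = {
        "small conference room, talker 1.5 m away": (1.5, 100.0, 0.5),
        "concert hall, talker 20 m away": (20.0, 15000.0, 2.0),
    }
    for name, (r, v, t60) in scenes.items():
        print(f"{name}: {direct_to_reverberant_db(r, v, t60, 6.0):+.1f} dB")
```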

One step forward

A recent advance in Kidd’s research is the development of a triple beamformer that may provide better hearing for people with cochlear implants.4

By comparison, the original beamformer had a single-channel output, he explained: a spatial filter whose output was delivered to 1 or both ears, with no difference between the signals at the 2 ears.

“With single-channel output, while hearing is enhanced, our natural binaural hearing that provides a great advantage in localizing sounds is lost,” he said.

The investigators’ latest approach uses 3 beams. A central beam focuses on the primary sound source of interest, a second beam pointed to the right sends sound only to the right ear, and a third beam pointed to the left sends sound only to the left ear.
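The routing described above can be sketched as a simple mix. This is an illustrative assumption, not the published processing: the 3 beam outputs are taken as already computed, the central beam is assumed to go to both ears as in the original single-channel design, and `side_gain` is a hypothetical weight. The interaural level difference (ILD) shows which cue the single-channel output removes and the side beams restore; interaural time cues are not modeled.

```python
import numpy as np


def single_channel_output(center_beam):
    """Original design: the same beam signal goes to both ears."""
    return center_beam.copy(), center_beam.copy()


def triple_beam_output(center_beam, left_beam, right_beam, side_gain=1.0):
    """Triple-beam routing as described in the text.

    Assumed: the center beam feeds both ears; the left-pointing beam is added
    only to the left ear and the right-pointing beam only to the right ear.
    side_gain is a hypothetical mixing weight, not a published value.
    """
    left_ear = center_beam + side_gain * left_beam
    right_ear = center_beam + side_gain * right_beam
    return left_ear, right_ear


def ild_db(left_ear, right_ear):
    """Interaural level difference, left minus right, in dB."""
    return 10.0 * np.log10(np.mean(left_ear ** 2) / np.mean(right_ear ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    talker_on_left = rng.standard_normal(16000)
    # Hypothetical pickups of a talker off to the left: weak in the center
    # beam, strong in the left beam, weaker still in the right beam.
    center = 0.1 * talker_on_left
    left = 0.8 * talker_on_left
    right = 0.05 * talker_on_left
    print("single-channel ILD:", round(ild_db(*single_channel_output(center)), 1), "dB")
    print("triple-beam ILD:   ", round(ild_db(*triple_beam_output(center, left, right)), 1), "dB")
```

With the single-channel output the ILD is always 0 dB, so a talker outside the beam cannot be lateralized by level; the side beams reintroduce a left-right difference.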

“This approach restores some of the normal binaural hearing in addition to the benefits of the original beamformer,” he explained.

The main feature of the triple beamformer is that it retains the signal-to-noise-ratio benefit of the single-channel beam while improving spatial hearing, including the ability to locate sound sources outside the beam.5

In patients with bilateral cochlear implants, the devices mostly operate independently of each other, resulting in the loss of much normal binaural spatial hearing.

“The triple beamformer enhances interaural differences that lead to spatial localization and source segregation,” Kidd said. He believes this research approach holds promise for this patient population.

VGHA research is currently confined to the laboratory, where computers perform the signal processing. For the technology to become commercially available, the wearable system would need to be miniaturized and made cosmetically acceptable to wear on the head.

He foresees that product development might first target cochlear implant users.

Kidd concluded by expressing hope that the research will lead to a better hearing aid for situations such as the classic cocktail party problem.

“Also, the technology isn’t just for people who are hard of hearing,” he said. “We believe that even people with normal hearing could benefit from this technology and that it could have wide application.”

Kidd has no financial interest in this technology.


1. Arbogast TL, Kidd G Jr. Evidence for spatial tuning in informational masking using the probe signal method. J Acoust Soc Am 2000;108:1803–10.

2. Desloge JG, Rabinowitz WM, Zurek PM. Microphone-array hearing aids with binaural output. I. Fixed-processing systems. IEEE Trans Speech Audio Process 1997;5:529–42.

3. Kidd G Jr, Favrot S, Desloge JG, et al. Design and preliminary testing of a visually guided hearing aid. J Acoust Soc Am 2013;133:EL202–EL207.

4. Kidd G Jr, Jennings TR, Byrne AJ. Enhancing the perceptual segregation and localization of sound sources with a triple beamformer. J Acoust Soc Am 2020;148:3598.

5. Yun D, Jennings TR, Kidd G Jr, Goupell MJ. Benefits of triple acoustic beamforming in speech-on-speech masking and sound localization for bilateral cochlear implant users. J Acoust Soc Am 2021;149:3052–72. doi:10.1121/10.0003933

