A physiologically inspired model for solving the cocktail party problem.
Chou, Kenny F.
Colburn, H. Steven
MetadataShow full item record
Citation (published version)Kenny F. Chou, Junzi Dong, H. Steven Colburn, Kamal Sen. 2019. "A Physiologically Inspired Model for Solving the Cocktail Party Problem.." J Assoc Res Otolaryngol, Volume 20, Issue 6, pp. 579 - 593. https://doi.org/10.1007/s10162-019-00732-4
At a cocktail party, we can broadly monitor the entire acoustic scene to detect important cues (e.g., our names being called, or the fire alarm going off), or selectively listen to a target sound source (e.g., a conversation partner). It has recently been observed that individual neurons in the avian field L (analog to the mammalian auditory cortex) can display broad spatial tuning to single targets and selective tuning to a target embedded in spatially distributed sound mixtures. Here, we describe a model inspired by these experimental observations and apply it to process mixtures of human speech sentences. This processing is realized in the neural spiking domain. It converts binaural acoustic inputs into cortical spike trains using a multi-stage model composed of a cochlear filter-bank, a midbrain spatial-localization network, and a cortical network. The output spike trains of the cortical network are then converted back into an acoustic waveform, using a stimulus reconstruction technique. The intelligibility of the reconstructed output is quantified using an objective measure of speech intelligibility. We apply the algorithm to single and multi-talker speech to demonstrate that the physiologically inspired algorithm is able to achieve intelligible reconstruction of an "attended" target sentence embedded in two other non-attended masker sentences. The algorithm is also robust to masker level and displays performance trends comparable to humans. The ideas from this work may help improve the performance of hearing assistive devices (e.g., hearing aids and cochlear implants), speech-recognition technology, and computational algorithms for processing natural scenes cluttered with spatially distributed acoustic objects.
Rights© 2019 The Author(s). This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.