Investigation and evaluation of binaural models applied to speech perception in complex environments
The auditory system provides one of the most robust pathways humans use for everyday speech communication. Within the auditory system, binaural processing is especially important for helping listeners localize and perceive sounds. Previous research has shown that normal-hearing listeners are more robust than existing automatic speech recognition systems in the presence of multiple interfering sounds. However, the factors that matter for human speech perception in these environments are poorly understood. This thesis addresses the problem of speech intelligibility in complex environments by investigating the function of binaural processing in them. Two classical binaural processing models, the Equalization-Cancellation (EC) model and the Interaural Difference (ID) model, are extended and applied to speech perception in complex environments. These environments contain multiple interfering sounds of different types that mask a frontal speech target from different spatial locations. The model outputs are evaluated using both computational speech intelligibility measures and psychoacoustic measures, and model performance is compared with existing speech intelligibility data. The comparison is followed by a discussion of how well these binaural models account for binaural advantages in speech perception and of which factors are important in speech perception tasks.
Results of the current study show that (1) both models successfully predict speech intelligibility data in the presence of different noise maskers, whether modulated or not; (2) both models fail to capture the informational masking component when the maskers are speech, especially in the co-located condition; (3) several factors involved in speech perception can be identified, including signal-to-noise ratio, gap-listening, and informational masking; and (4) Equalization-Cancellation models can generate diotic outputs whose intelligibility is equivalent to that of the original dichotic stimuli, whereas the Interaural Difference model cannot, owing to the sparseness of the mask in the time-frequency domain.
Thesis (Ph.D.)--Boston University